PROCEEDINGS
EDITED BY C. WOODS, G. B. LUCK, R. BROCHARD,
F. SEDDON & J. A. SLOBODA
Proceedings
Parallel sessions (9.30-10.30)

Movement, meaning and flux (symposium)
Convenor: Davidson, J. Chair: Cohen, A. Discussant: Sundberg, J.
Meaning and the specification of motion in music
9.30 Guile, L.: The expressive world of flux
10.00 Correia, J.: Music performance as intimate self disclosure
10.30 Davidson, J.: Exploring the body in the production and perception of performance

Human voice development (symposium)
Convener: Welch, G. Chair: Trehub, S.
Genesis of singing behaviour
White, P.: Long-term average spectrum analysis of developmental changes in children's voices (Abstract)
Howard, D.: Listener perception of girls and boys in an English cathedral choir
McAllister, A.: Psychological and perceptual voice characteristics in 10 year old girls as manifested in voice range profiles
Thurman, L.: Vocal self-expression and self-determination (Abstract)

Categorisation, similarity perception, cue abstraction (first of 3 sessions)
Convenors: Deliège, I., Louhivuori, J. Discussants: Baroni, M., Cross, I., Mélen, M.
Perception of similarity and related theories
9.30 Anagnostopoulou, C.
Similarity and categorisation in Boulez' Parenthese from the third piano sonata: a formal analysis
10.00 Addessi, A.: Segmentation of post tonal music
10.30 Dibben, N.: Motive structure and the perception of similarity

Instrumental learning: participation and practice
Chair: Hallam, S.
Perceived social support and children's participation in music
9.30 McPherson, G.: Self regulation and musical practice: a longitudinal study
10.00 Jorgensen, H.: Practice planning and instrumental achievement
10.30 Renwick, J.: I've got to do my scale first: a case study of a novice's clarinet practice

Rhythm and meter perception
Chair: Parncutt, R.
Hierarchic representations on complex meters
9.30 Vazan, P.: Mental manipulation of meter (Abstract)
10.00 Hugardt, A.: Children, pulse and music
10.30 DiMatteo, R.: Time estimation and time related features of auditory environment

Music performance
Chair: Gabrielsson, A.
Coordinating duo piano performance
9.30 Ashley, R.: The pragmatics of conducting: analysing and interpreting conductor's expressive gestures
10.00 Edlund, B.: Listening to oneself at a distance
10.30 Keller, P.: Attentional resource allocation in musical ensemble performance

Following sessions

The application of a theory of intrinsic musicality in therapy (read by C. Trevarthen) (Abstract)
Convenor: Trevarthen, C. Chair: Sloboda, J. Discussant: Pavlicevic, M.

The perception and performance of vibrato
Convenors: Timmers, R., Desain, P. & Honing, H. Chair: Honing, H. Discussant: Rodet, X.

Categorisation, similarity perception, cue abstraction (second of 3 sessions)
Convenor: Deliège, I. Chair: Louhivuori, J. Discussants: Baroni, M., Cross, I., Mélen, M.

Melodic representation
Chair: Kopiez, R.

Song memory
Chair: Costa-Giomi, E.

Models for pulse and rhythm
Chair: Povel, D.
Keynote abstract
Over recent decades, voice research has revealed several links between the physiology and the acoustics
of voice production in various types of singing. Yet it has been difficult to identify acoustic parameters that
are relevant to perceptual voice parameters. There are reasons to assume, however, that perceptual
descriptions are closely related to voice production, so that much of the gap between perceptual and
acoustic descriptions of voice reflects the complex relations between production and acoustic characteristics.
Aims.
This paper aims to elucidate links between perceptual and acoustic aspects of voice timbre by reviewing
research on the relationships between variations of physiological parameters, such as subglottal pressure,
larynx height, voice source, and vocal tract dimensions, and their effects on the acoustic characteristics.
These relationships will be illustrated by reference to physiological differences between voice classifications:
male and female, young and old, and operatic, pop and amateur singers.
Main Contribution.
The paper will comment on possible relationships between acoustic, physiological, and perceptual
descriptions of singing voices.
Implications.
The review will highlight those acoustic voice parameters which should be of direct relevance to perceptual
descriptions of voice quality and thus attempt to identify acoustic parameters that are perceptually and
physiologically informative.
Symposium introduction
Title of Symposium: Movement, meaning and flux
Symposium Convenor: DAVIDSON, Jane W. Dr
Symposium Rationale: The papers in this symposium focus on the human body as the site out of
which musical understanding emerges. The symposium addresses an issue of critical importance,
since the role of the body in music perception and cognition has been largely ignored.
Aims: This symposium aims to explore the role of bodily movement in the creation and
comprehension of a musical work. Four different research approaches to the issue are included to
illustrate the central role of the body in music listening, performing, teaching and creation.
Symposium introduction
CATEGORISATION<==>Similarity Perception<==>CUE ABSTRACTION
URPM - CRFMW
University of Liège
B - 4000 Liège
e-mail: Irene.Deliege@ping.be
Over several decades, research on categorisation has occupied an important place in the cognitive sciences,
particularly in psychology and artificial intelligence. Work on categorisation in the domain of music
perception, however, has developed mainly over the last ten years.
2. Aims
The program of the symposium is intended to bring together a group of young researchers who have undertaken
the study of categorisation from a variety of perspectives in connection with music perception. It is expected
that a synthesis of results will be developed during the round-table discussions, with a view to defining
future research perspectives.
session I:
Sunday morning
• Irène Deliège (Liège, Belgium):
Similarity and categorisation in Boulez' Parenthese from the 3rd piano sonata:
A formal analysis
session II:
Sunday afternoon
• Emilios Cambouropoulos (Vienna):
Musical schemata in children from 10 to 12 years of age: A study on segmentation and mental line elaboration
session III:
Monday morning
• Marc Mélen and Julie Wachsman (Liège):
Cognition and affect in music listening: inter-relations between musical structure, cue abstraction and continuous
measures of emotion
• Tuomas Eerola, Petri Toiviainen, Topi Järvinen and Jukka Louhivuori (Jyväskylä, Finland):
Discussants:
Mario Baroni (Bologna), Ian Cross (Cambridge), Marc Mélen (Liège)
To close the symposium: KEYNOTE SPEAKER ASSOCIATED WITH THE SYMPOSIUM
Mario Baroni : MUSIC AND MEANING
Monday morning
Proceedings paper
Abstract
The relationship between music and motion has attracted interest over a broad sweep of
history and across a variety of disciplines including aesthetics, psychology, music theory
and neuroscience, and the relationship itself has been regarded variously as metaphorical,
semiotic, and physiological. This paper will argue that the relationship between music
and motion: i) is a fundamental aspect of music's impact and meaning; ii) is significantly,
but not entirely, concerned with self-motion (as argued by Todd); iii) should be regarded
as a truly perceptual relationship - even though the motion that is perceived may be
illusory (in the sense of being virtual rather than real). The aim of the paper is to clarify
the nature and status of the relationship between music and motion, rejecting both the
physiological reductionism of regarding it as 'hardwired' and the potentially dismissive
view of it as merely metaphorical. In place of these, the paper will argue for motion as
being perceptually specified by the acoustical information in music.
The theory proposed here has specific empirical implications - most obviously an
investigation of the various kinds of information in music that specify motion, and a
consideration of whether these function in anything like the same manner as for real
motion. It also has implications for theories of musical meaning, since it allows for the
integration of the sense of (self-)motion with the other kinds of events (physical,
structural, cultural) as they are specified in music.
1. Introduction
The close relationship between music and movement has been acknowledged since the time of
classical Greek writings on music. More recently, a number of authors - with perspectives ranging
from philosophy and semiotics to neuroscience - have discussed the relationship, proposing
explanations ranging from hardwired physiology to metaphor. This paper will present an overview of
some of this work, and will propose a perceptual approach to the issue based on ecological principles.
In essence, I will propose that the sense of motion when listening to music is an inevitable
consequence of the event-detecting nature of the auditory system, that there are some interesting
questions about what listeners perceive as being in motion, and that the varieties of motion specified
in musical sound are central to listeners' perceptions of meaning in music. This paper will argue that
the relationship between music and motion: i) is a fundamental aspect of music's impact and meaning;
ii) is significantly, but not entirely, concerned with self-motion (as argued by Todd); iii) should be
regarded as a truly perceptual relationship - even though the motion that is perceived may be illusory
(in the sense of being attributed to virtual, rather than real, objects).
2. Music and motion: brief overview.
In his historical overview of rhythmic theory, going back to Greek theory, Yeston (1976) observes
that
"In the broadest sense, the theory of musical rhythm has always been concerned with the
elucidation of musical motion - motion that is differentiated by the durational value,
pitch, or intensity of sounds, but which, at the same time, presumably exhibits certain
regularities." (p. 1)
His survey lies clearly within the tradition of structuralist musicology, of which Shove & Repp
(1995), who provide an important survey of motion literature relating to performance, comment:
"Traditionally, to explain the source of musical motion, theorists, philosophers and
psychologists alike have turned to musical structure, which by most accounts is abstract.
This has led some to believe that the motion heard is virtual, illusory or abstract. ...
Hidden from this view is perhaps the most obvious source of musical movement: the
human performer. Why have so many theorists failed to acknowledge that musical
movement is, among other things, human movement?" (p. 58)
Their aim is to examine the way in which the real (or imagined) motion of the performer may be
specified in sound, and the ways in which the movements of both performers and listeners might be
harnessed to increase their aesthetic awareness of music. In doing so, Shove and Repp demonstrate
that a number of authors in the early part of the twentieth century (Sievers, Becking, Truslit - see
Shove & Repp, 1995) were particularly interested in the relationship between body movement,
gesture and performance (both musical and literary). Each of these authors developed his own lexicon
of movement types, the function of which was both analytical and practical: each lexicon was
intended both to reveal and understand the inner dynamics of works of music (and literature), and to
help performers to find a fluent and expressive approach in performance. Becking, whose ideas have
subsequently been taken up and developed by Clynes (e.g. Clynes, 1983), distinguished a number of
types or styles of movement curve, and attributed these different styles of movement to the music of
different groups of composers. Truslit (1938; see Repp, 1993) had no interest in the composer
specificity of musical motion, but was more concerned with the relationship between the acoustic
surface of a musical performance and the underlying motion dynamics of the piece. His interest was
therefore in the specifics of individual pieces, and with the discovery of the particular movement
patterns that would help listeners and performers to understand and project the music in the most
effective manner. A particular feature of his theory which links it with more recent work by Todd (see
below) is the proposal that the combination of larger (more global) and smaller (more local)
movements in performance, as well as in the physical responses of listeners, reflects the organisation of
the motor system into two divisions controlling whole body movement and more peripheral limb
movement respectively. The former (the ventromedial system) is closely associated with the
vestibular apparatus (as Truslit observed) which is responsible for our sense of balance and
movement, thus suggesting a possible direct physiological link between the perception of sound and
movement.
Although Shove and Repp briefly consider the wider specification of motion in music (i.e. motion that
is not necessarily attributable to the performer, but to some other agent, such as the listener's self
motion, or the motion of some virtual agent) this is not the focus of their review. Todd, however, has
given rather more attention to listening and the sense of self-motion that it can induce, without
resorting to the 'abstraction' of musical structure to which Shove and Repp refer. In a paper concerned
with the relationship between tempo and dynamics in performance, Todd (1992) concludes with the
proposal "that musical expression has its origins in simple motor actions and that the performance and
perception of tempo/musical dynamics is based on an internal sense of motion", and that "expressive
sounds can induce a percept of self-motion in the listener..." (p. 3549). The basis for this, Todd
speculates, may lie in the physiology of the inner ear, and in the possibility that sound directly
activates the vestibular apparatus which is responsible for our sense of self-motion. In a subsequent
paper, however, Todd (1995) notes the need to be cautious about the relationship between musical
expression and physical motion, distinguishing between "purely metaphorical notions of musical
motions and any more psychologically concrete phenomena that correspond to the metaphorical."
(Todd, 1995; p. 1946)
Empirical evidence for listeners' sense of motion in rhythm is provided by Gabrielsson (1973), who
had listeners rate a battery of simple rhythmic figures on a large number of adjectival descriptors, and
analysed these judgements using factor analysis. Three main dimensions emerged: a 'structure' factor
was an indicator of metre (triple or duple) and pattern complexity; an 'emotion' factor was an indicator
of broadly positive and negative emotional attributions; and a 'motion' factor picked up listeners' sense
of the movement quality of each rhythm, using labels such as 'running', 'limping', 'flowing', 'crawling'
etc. The relationship between music and dance (as well as marching and other forms of physical
activity) is an obvious and ancient one, and can be seen in concert music in the persistence of dance
forms right up to the present day, quite apart from its more manifest importance in virtually every
form of popular and traditional music. But the striking feature of Gabrielsson's research is the
spontaneous emergence of this factor in the context of 'laboratory' conditions and very simple single
line rhythms played on a drum. The implication is that the motion character is a pervasive deep-seated
component of listeners' responses to music.
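Gabrielsson's procedure, rating each rhythm on many adjective scales and then factor-analysing the ratings, can be sketched with synthetic data. The numbers below (listeners, scales, loadings, noise level) are illustrative assumptions, not Gabrielsson's data; the sketch simply shows how a small number of latent dimensions re-emerge from the correlation structure of adjective ratings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not Gabrielsson's data): 12 adjective scales driven by
# three latent dimensions ('structure', 'emotion', 'motion'), four scales each.
n_obs, n_scales, n_factors = 200, 12, 3
loadings = np.zeros((n_scales, n_factors))
for k in range(n_factors):
    loadings[4 * k:4 * (k + 1), k] = 1.0       # each scale loads on one factor

latent = rng.normal(size=(n_obs, n_factors))   # per-rating factor scores
ratings = latent @ loadings.T + 0.5 * rng.normal(size=(n_obs, n_scales))

# Factor-extraction sketch: eigenvalues of the inter-scale correlation matrix,
# retaining factors by the Kaiser criterion (eigenvalue > 1).
corr = np.corrcoef(ratings, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_retained = int(np.sum(eigenvalues > 1.0))
print(n_retained)
```

With three genuine latent dimensions and moderate rating noise, three eigenvalues stand well clear of the rest, which is the pattern Gabrielsson's analysis revealed in listeners' judgements.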
3. The Metaphor of Motion.
In his book on the aesthetics of music, Scruton (1997) devotes considerable space to a consideration
of music's motion character. The starting point for Scruton's argument is the distinction between
sound and tone, identifiable in
"three important distinctions: that between the acoustical experience of sounds, and the
musical experience of tones; that between the real causality of sounds, and the virtual
causality that generates tone from tone in the musical order; and that between the
sequence of sounds and the movement of the tones that we hear in them. These
distinctions are parts of the comprehensive distinction between sound and tone... When
we hear music, we do not hear sound only; we hear something in the sound..." (Scruton,
1997: 19)
In common with many other writers, Scruton argues strenuously for a fundamental distinction
between the sounds of the 'everyday' world, and the tone(s) of music. This is important because it
places the motion experiences of listeners in a realm that is quite separate from those same listeners'
auditory experiences of the motional character of everyday objects in the 'real' world - the sound of
footsteps approaching, of cars passing, of balls bouncing, of bottles breaking, of water gushing and so
on. But despite (or perhaps even because of) the separation of musical motion from the motion of
objects in the world, Scruton places the sense of motion at the centre of musical experience.
"Whenever we hear music, we hear movement...", he writes (p. 55), and elsewhere "...we must hear
the movement in music, if we are to hear it as music." (p. 52).
His explanation for this sense of movement in music, relating as it does to an acousmatic space
divorced from the real spaces of the world, is that it depends on a deep-seated metaphor:
"[The] idea of musical movement is an irreducible metaphor, which can be explained
only through our response to music. It is associated with other metaphors - and in
particular with the metaphor of life. In hearing the movement in music we are hearing
life - life conscious of itself..." (p. 353)
And taking the metaphor of life further, Scruton suggests that we
"think of music as spread out in acousmatic space, where a new kind of individual is born
and lives out its life: an individual whose character is constantly changing in response to
the musical surroundings.
These musical individuals are not, of course, concrete particulars, like tables and chairs.
... They are heard as individuals; but any attempt to identify them must lean upon
acoustical criteria - according to which they are not individuals at all, but repeatable
patterns or types." (p. 72-3)
Scruton's approach relies on taking the first step of claiming that musical events are 'secondary
qualities' - i.e. not tied to the physical circumstances of the real world but separated from them, and
capable of 'behaving' in ways that are not constrained by the real world. His notion of movement,
gesture etc. is necessarily abstract and (in some sense) idealised - because the 'things' that are moving
are metaphysical rather than physical. The movement and space are metaphorical because the
properties of real space and movement have been transferred across to another domain where they
have no literal application. Scruton is aware of the arguments for the central role of metaphor in
human understanding (he makes reference to the work of Lakoff & Johnson on several occasions), but
he nonetheless draws a clear line between the tangible and practical world of sound, and the abstract
and incorporeal domain of tone - and in doing so aligns himself unambiguously with the aesthetic
tradition of music's autonomy.
But why make the first move of insisting upon the separation between sound and tone? The sounds of
music can and obviously do specify objects and events in the world (instruments and the people who
play them), and kinds of action, even when the nature of what is acting is unclear or uncertain (we
may hear blowing or scraping without knowing exactly what is being blown or scraped). In this most
obvious sense the sounds of music are the sounds of the 'real world'. A listening strategy that was
concerned solely with identifying instruments and playing actions would undoubtedly be a limited one
(although the importance of music's instrumental character has been seriously undervalued by the
psychology of music, aesthetics and music theory - all of which are unduly focused on pitch and
rhythm), but a range of other possibilities is opened up by considering the way in which musical
sounds may specify the objects and events of a virtual environment. By analogy, when looking at a
painting (representational or abstract), an important part of the perceptual experience is to see the
forms and colours as specifying a 'virtual space' which has properties that are reminiscent of the real
spaces that we know, or that intrigue us by defying the normal rules of space (the striking and
puzzling quality of some of M. C. Escher's drawings, for example, depends on this). The same
principle applies to hearing: sounds can specify a virtual domain that both abides by and defies the
normal laws of physics.
But scene analysis itself extends to some quite large-scale and complex features - and dynamic and
motional character is one such. In a passage which draws attention to the close relationship between
the way this character is conveyed in music and in 'real life', Bregman writes:
"Transformations in loudness, timbre, and other acoustic properties may allow the
listener to conclude that the maker of a sound is drawing nearer, becoming weaker or
more aggressive, or changing in other ways. However, in order to justify such
conclusions, it must be known that the sounds that bear these acoustic relations to one
another are derived from the same source. ... This strategy of allowing discrete elements
to be the bearers of form or 'transformation' only if they are in the same stream is the
foundation of our experience of sequential form in music as much as in real life."
(Bregman, 1990: 469)
As remarked earlier, if the sounds of music were restricted to specifying only real physical sources
(instruments and their modes of activation), an ecological approach to music perception might seem to
offer a limited prospect, and this is perhaps why Gaver (1993a) draws a sharp line between musical
and everyday listening in his important paper on the ecology of audition. But there are ways of
developing the ecological principle far beyond the recognition of instrumental sources, as I have
proposed elsewhere (Clarke, 1997; 1999; see also Windsor, 1995), and as Bregman also admits when
he adopts an idea of McAdams' - that of virtual sources. It is in this idea that the potential of an
ecological approach lies for the kind of motion perception that is the specific subject of this paper.
McAdams (1984) coined the term 'virtual source' by analogy with the terms virtual image or virtual
object in optics, which refer to the objects and images seen in mirrors and pictures, and which occupy
the virtual space behind the plane of the picture or mirror.
"A composer may want the listener to group the sounds from different instruments and
hear this grouping as a sound with its own emergent properties. This grouping is not a
real source, however. Stephen McAdams has referred to it as a 'virtual' source, perhaps
drawing the analogy from the virtual image of optics ... Experiences of real sources and
of virtual sources are both examples of auditory streams. They are different not in terms
of their psychological properties, but in the reality of the things that they refer to in the
world. Real sources tell a true story; virtual sources are fictional." (Bregman, 1990: 460)
In the case of a mirror, the virtual objects have a lawful relation with the real objects of which they are
a reflection, move in a fashion that is identical to the corresponding movements of their real
counterparts (of which they are a reflection), and are described by an optics identical to that of the real
world (albeit right/left reversed). In a picture, or film/animation, the objects have qualities that may
mimic those of the real world, and can do so very convincingly in the case of trompe l'oeil painting,
or computer animation, but achieve these qualities using quite different means. Gibson noted this in
his discussion of painting and drawing (Gibson, 1966; p. 240) when he pointed out (in relation to a
painting of Sir Thomas More by Holbein) that the sense of folding and texture on More's velvet sleeve
is achieved by Holbein using pigments of different hues, while the same visual effect produced by a
real piece of velvet is achieved by differential reflectance and shadow. Assuming that the painting of
the velvet is completely convincing, the same perceptual effect is produced by completely different
means: paintings specify the shapes and positions of a virtual space with quite different methods
(pigments, shading, geometry) than the 'same' effects in the real world (reflection, shadow,
orientation).
The comparison with painting is instructive because it suggests a way of understanding both what it is
that we perceive as moving in music, and how the effect is produced. Just as the disposition of
pigment in a painting can create a perceptual effect analogous to that produced by reflectance and
shadow in the real world, so music may create perceptual effects with the disposition of discrete
pitches and instrumental timbres in time that reproduce, or approximate to, those that we experience
with the continuous acoustical transformations that are characteristic of real world events. It is
important to be cautious about making too simple a translation from the spatial domain of pictures to
the temporal domain of music, however, and to avoid representing music as little more than 'sound
pictures'.
5. Ecology extended: motion, gesture, meaning
While it seems obvious that visual information can specify movement, there is more resistance to the
idea that the same might be true of sound. Certain kinds of acoustical information are readily accepted
as specifying motion (e.g. continuously changing left ear/right ear intensity balance or phase relation;
or the pitch shift of the Doppler effect) since these directly specify real movement in real space. The
experiences of vivid sound tracks or 'demonstration' sequences in surround-sound systems or Dolby
Digital cinemas are indicative of how powerful this effect can be even when artificially generated. But
the much more general possibility is that rhythmic effects, timbral changes, dynamic changes, pitch
patterns etc. have the capacity to specify motion in a virtual space - in the same way that the swirls
and textures and so on of computer animation specify motion in the virtual spaces of the film. Just as
the success of animation depends on the propensity of the visual system to detect movement in even
quite poor approximations to the changing visual arrays that specify real movement, so too the
perception of motion and gesture in music relies on the detection of motional and gestural invariants
in sound sequences which may be quite poor approximations to their real-world counterparts. The
temporal, timbral and dynamic components of music can be seen as having a direct capacity to specify
objects and events (for a detailed analysis, see Todd, O'Boyle & Lee, 1999).
This final section can do little more than sketch the outlines of such an approach - for two reasons:
first, a more complete account is beyond the scope of this paper; and second the empirical
investigations needed to explore the ideas presented here do not exist. But the principle itself can be
stated simply enough: since sounds in the everyday world specify (among other things) the motional
characteristics of their sources, it is inevitable that musical sounds will also specify movements and
gestures, both the real movements and gestures of their actual physical production (as discussed
above) and also the fictional movements and gestures of the virtual environment (cf. Windsor, 1995)
which they conjure up. In certain respects, this is not a new idea at all: Langer (1942) wrote of the
way in which musical meaning is tied up with its capacity to convey movement and gesture, but the
difference here is the claim that this relationship is truly perceptual rather than metaphorical, symbolic
or analogical. For obvious adaptive reasons, the auditory system is highly attuned to the
movement-specifying properties of sounds, and since the variety of movements by animate and
inanimate objects is unlimited, every musical sound will specify some kind of movement. The most
obvious way in which movement is specified is in the temporal properties of musical sound, since any
temporal property can specify movement. The crucial question is therefore what a listener hears as
being in motion.
As discussed earlier, Todd has proposed that the sense of motion in music is a sense of self-motion.
For him this is a necessary consequence of the vestibular explanation that he proposes: if musical
sound directly stimulates the vestibular apparatus, then this will induce a sense of self-motion in just
the same way as the well-documented manner in which visual information can. But if a perceptual
approach is adopted, self-motion is not the only kind of motion that listeners might experience -
although the relativity of motion means that it is always an option. When you look out of the window
of a stationary train and see another train move, the well-known phenomenon of experiencing either
illusory self-motion or the true movement of the other train is an illustration of this relativity. In a
similar way, sound specifies relative motion but cannot alone specify which object is moving in
relation to the other: in terms of pitch shift, the Doppler effect, for example, is identical whether
caused by a sound-emitting object approaching and passing a stationary observer, or a moving
observer passing a stationary sound-emitting object. In the absence of appropriate empirical evidence,
it is impossible to assert the extent to which listeners perceive musical motion as self-motion or the
movement of other objects, but intuition suggests the experience of self-motion is not uncommon. In
part this may be attributed to a simple principle of ecological acoustics: if all the separate sound
sources that are (or appear to be) specified in some music are heard to behave and move in a
correlated fashion, then the auditory scene specified is that of a listener moving in relation to a
collection of stationary sound sources (self-motion). If, however, the various sound sources move
relative to one another and in relation to the listener, then this specifies the movements of external
objects in relation to one another. In very simple terms this suggests, for instance, that music that has
broadly polyphonic and contrapuntal properties is likely to be heard in the latter category, while
monodic or homophonic music may more easily specify self-motion.
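The Doppler point can be made concrete with the standard classical formulas for sound in a stationary medium; the tone frequency and speeds below are illustrative assumptions. At everyday speeds, far below the speed of sound, the source-moving and observer-moving cases yield almost the same pitch shift, which is why the sound alone leaves the attribution of motion open.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def doppler_source_moving(freq, v_source, c=SPEED_OF_SOUND):
    # Stationary listener, source approaching head-on at v_source (m/s).
    return freq * c / (c - v_source)

def doppler_observer_moving(freq, v_observer, c=SPEED_OF_SOUND):
    # Stationary source, listener approaching head-on at v_observer (m/s).
    return freq * (c + v_observer) / c

freq = 440.0  # Hz, an arbitrary illustrative tone
for v in (5.0, 10.0, 20.0):
    f_src = doppler_source_moving(freq, v)
    f_obs = doppler_observer_moving(freq, v)
    print(f"{v:4.0f} m/s: source-moving {f_src:.2f} Hz, observer-moving {f_obs:.2f} Hz")
```

The two cases agree to first order in v/c and differ only by terms of order (v/c) squared, so at walking or driving speeds the resulting pitch trajectories are effectively indistinguishable.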
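The parsimony principle just described, correlated source motion specifying self-motion, can be caricatured in a few lines. Everything here (the trajectories, the correlation threshold, the function name) is a hypothetical illustration, not a model proposed in the paper:

```python
import numpy as np

def classify_scene(trajectories, threshold=0.99):
    """Toy classifier. trajectories is an array (n_sources, n_timesteps) of
    listener-relative positions. If every source's frame-to-frame motion is
    (near-)perfectly correlated with every other's, the parsimonious reading
    is a moving listener among stationary sources."""
    steps = np.diff(trajectories, axis=1)          # per-frame displacements
    corr = np.corrcoef(steps)                      # inter-source correlations
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    return "self-motion" if np.all(off_diag > threshold) else "object motion"

t = np.linspace(0.0, 1.0, 100)
# Listener accelerating past three fixed sources: identical relative motion.
walking = np.vstack([5.0 - 2.0 * t**2, 8.0 - 2.0 * t**2, 3.0 - 2.0 * t**2])
# Sources wandering independently relative to a fixed listener.
scatter = np.vstack([5.0 - 2.0 * t**2, 8.0 + np.sin(3.0 * t), 3.0 + 0.5 * t**2])
print(classify_scene(walking), classify_scene(scatter))
```

The first scene, in which all relative motion is shared, is classified as self-motion; the second, in which the sources move relative to one another, as the motion of external objects, mirroring the polyphonic versus homophonic contrast suggested above.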
Let us consider a single musical example which may illustrate some of these points: the pair of
dramatic orchestral crescendos on a single note (B) that occurs in the interlude between scenes 2 and 3
of the third act of Alban Berg's opera "Wozzeck". The movement that is specified by these sounds is a
paradoxical one. On the one hand the complete absence of pitch change specifies stasis, while on the
other the continuous change in both timbre (in the first of the two crescendos) and dynamic (in both)
specifies continuous, inexorable and perhaps headlong motion. The net result is, perhaps, a sense of
highly focused and unswerving approach - the auditory equivalent of what Gibson called 'looming' or
'time-to-contact'. Gibson writes of the information which specifies approach to/of an object as
follows:
"Approach to a solid surface is specified by a centrifugal flow of the texture of the optic
array. Approach to an object is specified by a magnification of the closed contour in the
array corresponding to the edges of the object. A uniform rate of approach is
accompanied by an accelerated rate of magnification. ... The magnification reaches an
explosive rate in the last moments before contact. This accelerated expansion ... specifies
imminent collision." (Gibson, 1958, cited in Gibson 1979, p. 231)
This description of the information specifying approach reads as a very close parallel to the sounds of
these two orchestral crescendos, if one makes appropriate sensory substitutions: continuous dynamic
increase substitutes for flow of optical texture, and the pitch stasis provides the centrifugal quality
(imagine how different the effect of the example would be if the music were to trace some kind of
continuous or stepwise pitch trajectory at the same time as the crescendo). Relating to Gibson's
mention of "an explosive rate" of magnification, and the specification of imminent collision, this
aspect of the Wozzeck example is largely in the hands of the conductor who controls by exactly how
much, and at what rate, Berg's dynamic markings should be realised. But performances and recordings
of the work usually do seem to reach "an explosive rate" of intensity increase, and thus the sense of
imminent collision.
The two crescendos differ in a number of respects, and give rise to interestingly different movement
effects as a result. The first crescendo is achieved not only by continuous increases of dynamic within
each of the instruments or instrumental groups of the orchestra, but also by a complex pattern of
successive instrumental entries, so that the timbre and texture of the orchestral sound changes
continuously as its dynamic increases. Arguably, this results in a less focused and static 'looming'
quality than is achieved by the second crescendo, which consists simply of a huge orchestral tutti
crescendo. A second difference concerns the respective goals or 'contact moments' of the two
crescendos. The first, after a build up on unison B, ends with a 6-note orchestral chord, which is
played as a rhythmic unison on a downbeat, and coincides with the first note of a striking rhythmic
figure played solo and triple forte by the bass drum. The orchestral downbeat has the attack and
unanimity of an impact, and the sense of motion that the first crescendo conveys is therefore of
approach, followed by impact - out of which the bass drum appears as if in a blinding flash. The
second crescendo, by contrast, is an orchestral unison throughout, and consists solely of a dynamic
crescendo which ends not with a downbeat - indeed without any final 'event' at all - but with the
equivalent of a cinematic cut straight into the next scene of the opera. It is as if the imminent collision
never materialises, and the listener (or the music) is shot out into a new and completely unpredicted
space - as if passing through an invisible barrier into a new domain at the moment when collision
seemed inevitable. These descriptions may seem fanciful and unfounded, but it would be relatively
straightforward to establish empirically what the musical/acoustical conditions are that specify
collision, or rupture, or emergence, or linear movement, or movement with frequent directional
change, and so on. It simply has not yet been tried; but by drawing on the parallels with optical flow, texture
gradients, and coordinated versus independent component behaviour, the basis for which is
already established within auditory scene analysis research, it is a perfectly tractable proposition.
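Gibson's accelerating 'magnification' has a straightforward auditory analogue that is easy to synthesise. The sketch below is purely illustrative (the pitch, duration and distance parameters are my own choices, not derived from Berg's score): it generates a fixed-pitch tone whose amplitude follows a 1/distance law for a source approaching at constant speed, so that intensity growth reaches an 'explosive rate' only in the final moments before contact.

```python
import numpy as np

def looming_tone(freq=246.9, duration=5.0, sr=22050, d0=50.0, v=9.9):
    """Fixed-pitch tone whose intensity follows a 1/d^2 approach law.

    freq   : pitch in Hz (246.9 Hz, the B below middle C, is illustrative only)
    d0, v  : initial distance and approach speed in arbitrary units, chosen so
             the source almost reaches the listener at t = duration.
    Amplitude ~ 1/d, so intensity (amplitude squared) ~ 1/d^2, giving the
    accelerating growth Gibson describes for visual looming.
    """
    t = np.arange(int(sr * duration)) / sr
    d = d0 - v * t                   # distance closes linearly (constant velocity)
    amp = 1.0 / d
    amp /= amp.max()                 # normalise the peak to 1.0
    return amp * np.sin(2 * np.pi * freq * t)

tone = looming_tone()
# Most of the energy growth is concentrated in the final tenth of the signal:
early = np.abs(tone[: len(tone) // 2]).max()
late = np.abs(tone[-len(tone) // 10 :]).max()
assert late > 5 * early
```

Varying the pitch over time, as suggested above, would be a natural experimental contrast: the claim is that a static pitch plus accelerating intensity growth is what specifies unswerving approach.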
An issue already broached and again pinpointed by the Wozzeck example is the question of who or
what is moving. As the discussion above illustrates, there is an ambiguity about the agency to which
the movements described above should be attributed. This is perhaps made all the more concrete and
particular by the operatic context from which this music comes, and the drama of which it is a part.
Does each listener hear him or herself as moving towards some collision, or one of the characters of
the opera moving towards some collision, or indeed some other person or object? Given the dramatic
context in which this music takes place (the main character, Wozzeck, has just murdered Marie, his
partner), it is likely that many listeners will hear this as Wozzeck 'rushing to meet his fate' (or fate
rushing to meet him), or perhaps death rushing to meet Marie (heard now in immediate retrospect).
But equally it is possible for a listener to hear this as self-motion and as an identification with one of
the opera's characters. Writing at a much more psychophysical level of a similar kind of subject/object
uncertainty or fluidity, and of the relationship between rhythm and motion, Todd et al. (1999) express
the situation as follows:
"The essence of the sensory-motor theory is that the experience of rhythm is mediated by
two complementary representations: a sensory representation of the motional-rhythmical
properties of an external source on the one hand, and a motor representation of the
[listener's] musculoskeletal system, on the other. For any individual, the sensory systems,
by learning, tune in to the temporal-motional properties of the physical environment,
whilst the motor control system, also by learning, tunes into the dynamic properties of its
musculoskeletal system. If the temporal/dynamical properties of the source and the
musculoskeletal system are matched, then the motor image will tend to synchronise with
the source." (Todd, O'Boyle & Lee, 1999, p. 26)
It could be objected once again that the discussion of the "Wozzeck" example is highly speculative,
subject to an enormous amount of interpretative licence, and out of step with an ecological approach
since it seems to depend on the interpretation of sense data in the light of verbal and dramatic
information, rather than specification by stimulus invariants. But this would be to overlook the fact
that all of the factors mentioned (the drama, the characters, the sounds) are indeed part of the available
information for a viewer/listener. That there is considerable leeway for different perceptions of this
short musical extract is not a problem for an ecological approach: many environmental circumstances
can be perceived in more than one way, and aesthetic objects are particularly (and deliberately)
multivalent - not only because they often contain deliberately partial and perhaps conflicting
perceptual information (and therefore have the capacity to specify more than one state of affairs), but
also because the viewers/listeners who encounter them, even when drawn from notionally the same
culture, may differ markedly in their previous experience of this or similar events. The differences in
exposure to the music of Berg, for example, among a sample of psychologists of music is likely to be
far greater than the differences between these same individuals' experiences of mountains.
6. Summary and Implications
To conclude, I have proposed in this paper that the sense of motion and gesture in music is a truly
perceptual phenomenon, and that the perceptual information that specifies motion is broadly speaking
the same as for the perception of motion in the everyday world. It is the temporal component
of music that is particularly rich in specifying motion. The sense of motion or self-motion draws the
listener into an engagement with the musical materials in a particularly dynamic manner (he or she
acts amongst them), and in so doing constitutes a vital part of musical meaning. The theory proposed
here has specific empirical implications - most obviously an investigation of the various kinds of
information in music that specify motion (and the work of Todd et al. is an important start in this
direction), and a consideration of whether these function in anything like the same manner as for real
motion. It also has implications for theories of musical meaning, since it allows for the integration of
the sense of (self-)motion with the other kinds of events (physical, structural, cultural) as they are
specified in music.
References
Bregman, A. S. (1990): Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge,
Mass.: MIT Press.
Clarke, E. F. (1997): Perception and Critique: Ecological Acoustics, Critical Theory and Music.
Proceedings of the International Computer Music Conference, Thessaloniki, Greece, 19-22.
Clarke, E. F. (1999): Subject-position and the specification of invariants in music by Frank Zappa and
P. J. Harvey. Music Analysis, 18, 347-374.
Clynes, M. (1983): Expressive microstructure in music, linked to living qualities. In J. Sundberg (Ed):
Studies of Music Performance. Publications issued by the Royal Swedish Academy of Music no. 39,
Stockholm.
Fowler, C. A. (1986): An event approach to the study of speech perception from a direct-realist
perspective. Journal of Phonetics, 14, 3-28.
Gabrielsson, A. (1973): Adjective ratings and dimension analyses of auditory rhythm patterns.
Scandinavian Journal of Psychology, 14, 244-260.
Gaver, W. W. (1993a): What in the world do we hear? An Ecological Approach to Auditory Event
Perception. Ecological Psychology, 5, 1-30.
Gaver, W. W. (1993b): How do we hear in the world? Explorations in ecological acoustics.
Ecological Psychology, 5, 285-313.
Gibson, J. J. (1966): The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979/1986): The Ecological Approach to Visual Perception. Hillsdale, NJ: Lawrence
Erlbaum.
Langer, S. K. (1942): Philosophy in a New Key. Cambridge, MA: Harvard University Press.
McAdams, S. (1984): Spectral Fusion, Spectral Parsing, and the Formation of Auditory Images.
Unpublished PhD thesis, Stanford University.
Repp, B. (1993): Music as motion: a synopsis of Alexander Truslit's (1938) "Gestaltung und
Bewegung in der Musik". Psychology of Music, 21, 48-72.
Scruton, R. (1997): The Aesthetics of Music. Oxford: Oxford University Press.
Shove, P. & Repp, B. (1995): Musical motion and performance: theoretical and empirical
perspectives. In J. Rink (Ed.): The Practice of Performance. Studies in Musical Interpretation.
Cambridge: Cambridge University Press, pp. 55-83.
Todd, N. P. (1992): The dynamics of dynamics: a model of musical expression. JASA, 91, 3540-3550.
Todd, N. P. (1995): The kinematics of musical expression. JASA, 97, 1940-1949.
Todd, N. P., O'Boyle, D. J. & Lee, C. S. (1999): A Sensory-Motor Theory of Rhythm, Time
Perception and Beat Induction. Journal of New Music Research, 28, 5-28.
Truslit, A. (1938): Gestaltung und Bewegung in der Musik. Berlin: Chr. Friedrich Vieweg.
Warren, W. & Verbrugge, R. (1984): Auditory perception of breaking and bouncing events: a case
study in ecological acoustics. Journal of Experimental Psychology: Human Perception and
Performance, 10, 704-712.
Windsor, W. L. (1995): A Perceptual Approach to the Description and Analysis of Acousmatic Music.
Unpublished PhD thesis, City University, London.
Windsor, W. L. (1996): Perception and Signification in Electroacoustic Music. In R. Monelle and C.
T. Gray (Eds.): Song and Signification. Edinburgh University Faculty of Music.
Yeston, M. (1976): The Stratification of Musical Rhythm. New Haven: Yale University Press.
Proceedings paper
It has long been understood that context is significant in our perception of musical behaviours. For
those of us brought up in the Western cultural tradition, a sense of audience is often an important
factor in the perception, cognition and perceived worth and significance of music, particularly in
relation to archetypal 'high art' and 'popular' genres. Any differential status and reward reflect the
relative meanings that we, as individuals and as groups, assign to the musical products of our
(sub-)culture (cf. Finnegan, 1989; Burns, 1999; Carterette & Kendall, 1999). Singing to oneself in the
car, in the bathroom or whilst undertaking some daily chore is likely to be perceived differently from
singing on the concert stage in front of a paying audience. The former is private and personal, relaxed
and unselfconscious. In contrast, public singing involves a greater sense of 'performance', of implied
'correctness' against some perceived expectation of what counts as 'appropriate' musical behaviour in
this context.
So, an appreciation of context provides the socio-cultural and socio-musical framing for the
interpretation of a particular musical behaviour. Similarly, that which counts as 'music' within a
particular culture is also socially located. Particular groupings and patterns of sounds are accepted as
belonging to, even as exemplars of, specific musical genres. Although all musics may share basic
generic acoustic elements (related to acoustic waveform, duration, intensity and frequency), it is the
specific configuration of these in particular combinations that gives rise to the many different and
distinct musical genres.
Nevertheless, although an appreciation of context and the structure of the musical artefact are
necessary ingredients for an understanding of observed musical behaviour, they are not sufficient. A
provide evidence that they are already able to discriminate prosodic features of the dominant
(maternal) language, such as intonation and rhythm (Mehler & Christophe, 1995, p. 947).
Further evidence of the interfacing of socio-cultural context with neuropsychobiological development
is provided by analyses of lullabies and infant songs from different cultures and also of infant-directed
speech. The musical structures of the lullabies and songs are characterised by perceived simplicity,
descending intervals and contours and relatively few contour changes. These features parallel some
prosodic features of infant-directed speech (Unyk et al., 1992, p. 25). During these first few months of
life, parents 'consistently guide the infant towards at least three levels of vocal expertise' (Papousek,
1996, pp. 44-45). These levels gradually emerge during preverbal vocal development.
● level 1: initial fundamental voicing develops into prolonged, euphonic cooing (around the age
of 8 weeks); subsequently leading to phrasing and melodic modulation (two months);
● level 2: the vocal stream becomes segmented into syllables due to the use of consonants;
mothers facilitate this development through the use of rhythmic games and rhythmic melodies;
● level 3: canonical syllables appear and are treated by parents as 'protowords' to which are
attributed meanings, being assigned in a declarative manner to the naming of persons, objects
and events.
Adult-child dialogue is characterised by 'rich melodic modifications' during these first months of life
(Papousek, op. cit., p. 48), with repetitions, frequent glides, a prevalence of basic harmonic intervals
(3rds, 4ths, 5ths, 8ves) and sometimes dramatic changes in intensity.
Given the bipotentiality of early interactive vocalisation, it is not surprising to discover that the
borders between singing and speech are often blurred, particularly for the young child (Davies, 1994;
Davidson, 1994; Welch, 1994; Sergeant, 1994; Welch, Sergeant and White, 1996). Such blurring is
characteristic of one of the first phases of singing development for many young children, with singing
often described as 'chant-like', with a relatively restricted pitch range and melodic phrasing (Welch,
2000a). Nevertheless, some children become relatively skilled in the dominant musical song genre(s)
at a very early age, probably because of a greater match between individual potential and the
existence of a particularly nurturing, socio-cultural musical pathway.
Human voice development is, therefore, characterised by the extent to which the socio-cultural
context (including the dominant soundscape and its musical genres) interfaces with emergent physical
and mental abilities. Anatomical and physiological development creates limitations to the range and
variety of vocal products. Compared to the adult, for example, the infant larynx is much smaller, with
different ratios of cartilage to ligament and less-defined mucosal tissues (Thurman & Klitzke, 2000).
The onset of puberty heralds the onset of major physical changes to the underlying vocal structures,
especially (but not only) in adolescent males. It is only after puberty that the internal vocal ligament
and mucosa become fully mature. Subsequently, as the vocal structures age, there can be a
degeneration of muscle and connective tissue alongside an increased ossification and calcification of
the laryngeal cartilages (Welch & Thurman, 2000). Even so, older singers are still capable of singing
expressively, not least because biological age is only loosely related to chronological age for those
individuals who have maintained healthy and well-conditioned vocal instruments. So, ongoing
changes in the physical bases for voicing across the lifespan in both structure and function have an
impact on vocal production in singing, particularly for the young, adolescent and old. Therefore,
certain aspects of a predominant musical genre (such as its vocal pitch range, tessitura, or vocal style)
may be beyond, or inimical to, the capabilities of a particular individual at a particular time in their
lives.
Similarly, psychological processing and development determine the salient features of both musical
input and sung output. In a supportive context, a mismatch between musical task and current vocal
competency could produce a greater motivation towards success and mastery. Equally, however, such
a mismatch in a different context could result in avoidance behaviours, or singing behaviours that are
judged to be consistently 'inadequate' when set against a music's internal rule system (as is the case
with singing 'out-of-tune'). The linguistic terms applied to singing demonstrate the ambivalence with
which societies nurture or hinder potential singing competency, as well as the development of singer
identity. Labels such as 'tone-deaf', 'monotone', 'growler', 'poor pitch singer' and having a voice that is
'breaking' or 'broken' may be contrasted with language which celebrates the vocal 'purity' of the
cathedral chorister and those 'gifted' individuals who are regarded as 'beautiful', 'powerful', or 'natural'
singers. Such positive and negative features are characteristic of the socio-musical environment and
are likely to be closely related to an individual's value-emotive response to singing as a pleasurable or
unenjoyable activity.
Summary
Overall, there is a sense in which the 'genesis' of singing behaviour can be seen to have both a micro
as well as a macro meaning. With regard to the latter, pre- and perinatal musical experiences provide
the socio-cultural framework within which subsequent vocal utterances are shaped according to
dominant musical conventions towards that which counts as 'singing' within a particular (sub-)culture.
Birth is the beginning of voice production and it is possible to trace the genesis of singing behaviours
from this time. Initial vocal exploration during the cooing and babbling stages includes variations in
vocal pitch range, timbre and loudness. Concomitantly, there is a highly significant interaction with
parents (carers) using infant-directed speech that can facilitate a growing mastery of musical artefacts
and conventions. After the age of five 'children can hold onto a stable tonality throughout a song...the
typical 5-year-old has a fairly large repertoire of nursery songs from his or her culture' (Dowling,
1999).
Yet the nature of the physical development of the vocal instrument across the lifespan suggests that
each vocal transformation has its own genesis, especially after adolescence. Each of the periods of
early childhood, later childhood, puberty, adolescence, early adulthood, older adulthood and
senescence is marked by particular and sometimes significant changes in vocal structure and
(potentially) function. These physical changes require behaviour modification, a different
co-ordination of muscle groups and, in some cases, relearning of singing behaviours because the
previous motor programmes do not produce the required or expected singing outcomes.
Individual singing potential and achievement should be seen in relation to each of these lifespan
periods. If there is a succession of appropriately supportive contexts where singing 'task' and response
are matched, it should be possible to celebrate relatively accomplished singing behaviours from early
childhood through to senescence.
Bibliography
Burns, E.M. (1999). Intervals, Scales, and Tuning. In D. Deutsch (Ed.). The Psychology of Music [2nd
Edition]. London: Academic Press. pp 215-264.
Carterette, E.C. & Kendall, R.A. (1999). Comparative Music Perception and Cognition. In D.
Deutsch (Ed.). The Psychology of Music. [2nd Edition]. London: Academic Press. pp 725-791.
Cottrell, S. (1998). Partnerships in the classroom. British Journal of Music Education, 15(3), 271-285.
Davidson, L. (1994). Songsinging by Young and Old: A Developmental Approach to Music. In R.
Aiello and J.A. Sloboda (Eds.). Musical Perceptions, Oxford: Oxford University Press. pp 99-130.
Davies, C. (1994). The Listening Teacher: An Approach to the Collection and Study of Invented
Songs of Children Aged 5 to 7. Musical Connections: Tradition and Change. Auckland, NZ:
International Society for Music Education. pp. 120-127.
Dowling, W. Jay. (1999). The Development of Music Perception and Cognition. In D. Deutsch (Ed.).
The Psychology of Music. [2nd Edition]. London: Academic Press. pp 603-625.
Durrant, C. (1998). Developing a choral conducting curriculum. British Journal of Music Education.
15(3), 303-316.
Finnegan, R. (1989). The Hidden Musicians. Cambridge: Cambridge University Press.
Folkestad, G. (1998). Musical Learning as Cultural Practice: As Exemplified in Computer-Based
Creative Music-Making. In B. Sundin, G.E. McPherson & G. Folkestad (Eds.). Children Composing.
Malmö: Lunds University. pp 97-134.
Gardner, H. (1983). Frames of Mind. London: Heinemann.
Hallam, S. (1998). The Predictors of Achievement and Dropout in Instrumental Tuition. Psychology
of Music, 26(2), 116-132.
Hargreaves, D.J. (1996). The development of artistic and musical competence. In I. Deliege & J.
Sloboda (Eds.). Musical Beginnings. Oxford: Oxford University Press. pp 145-170.
Harwood, E. (1998). Musical learning in Context: A Playground Tale. Research Studies in Music
Education, 11, 52-60.
Lecanuet, J-P. (1996). Prenatal auditory experience. In I. Deliege & J. Sloboda (Eds.). Musical
Beginnings. Oxford: Oxford University Press. pp 3-34.
Lecanuet, J-P. (1998). Foetal responses to auditory and speech stimuli. In A. Slater (Ed.). Perceptual
Development. Hove: Psychology Press. pp 317-355.
Mehler, J. & Christophe, A. (1995). Maturation and Learning of Language in the First Year of Life. In
M.S. Gazzaniga (Ed.). The Cognitive Neurosciences. Cambridge, Mass: MIT Press. pp 943-954.
Nettl, B. (1983). The Study of Ethnomusicology. Urbana: University of Illinois Press.
OfSTED. (1998). The Arts Inspected. Oxford: Heinemann.
Papousek, H. (1996). Musicality in infancy research: biological and cultural origins of early
musicality. In I. Deliege & J. Sloboda (Eds.). Musical Beginnings. Oxford: Oxford University Press.
pp 37-55.
Patel, A. D. & Peretz, I. (1997). Is music autonomous from language? A neuropsychological
appraisal. In I. Deliege & J. Sloboda (Eds.). Perception and Cognition of Music. Hove, UK:
Psychology Press. pp 191-215.
Sergeant, D.C. (1994). Towards a Specification for Poor-Pitch Singing. In G.F. Welch and T. Murao
(Eds.). Onchi and Singing Development. London: David Fulton. pp 63-73.
Sloboda, J.A. (1985). The musical mind: The cognitive psychology of music. Oxford: Clarendon
Press.
Spender, N. (1987). Psychology of Music. In R.L. Gregory (Ed.), The Oxford Companion to the
Mind. Oxford: Oxford University Press. pp 499-505.
Thurman, L. & Klitzke, C. (2000). Highlights of physical growth and function of voices from
pre-birth to age 21. In L. Thurman and Welch, G.F. (Eds.). Bodymind and Voice: Foundations of
Voice Education. [2nd Edition]. Iowa: National Center for Voice and Speech.
Trehub, S.E., Schellenberg, G. & Hill, D. (1997). The origins of music perception and cognition: A
developmental perspective. In I. Deliege & J. Sloboda (Eds.). Perception and Cognition of Music.
Hove, UK: Psychology Press. pp 103-128.
Unyk, A.M., Trehub, S.E., Trainor, L.J. & Schellenberg, E.G. (1992). Lullabies and Simplicity: A
Cross-Cultural Perspective. Psychology of Music, 20(1), 15-28.
Van Lancker, D. (1997). Rags to Riches: Our Increasing Appreciation of Cognitive and
Communicative Abilities of the Human Right Cerebral Hemisphere. Brain and Language, 57(1), 1-11.
Webster, P.R. (1998). Young Children and Music Technology. Research Studies in Music Education,
11, 61-76.
Welch, G.F. (1994). The Assessment of Singing. Psychology of Music, 22, 3-19.
Welch, G.F. (1998). Early Childhood Musical Development. Research Studies in Music Education,
11, 27-41.
Welch, G.F. (2000a). Singing Development in Early Childhood: the Effects of Culture and Education
on the Realisation of Potential. In P.J. White (Ed.) Child Voice. Stockholm: Royal Institute of
Technology. [in press].
Welch, G.F. (2000b). 'The Ontogenesis of Musical Behaviour: A Sociological Perspective'. Research
Studies in Music Education. [in press].
Welch, G.F., Sergeant, D.C. & White, P. (1996). 'The singing competences of five-year-old
developing singers'. Bulletin of the Council for Research in Music Education, 127, 155-162.
Welch, G.F. & Thurman, L. (2000). Vitality, health, and vocal self-expression in older adults. In L.
Thurman and Welch, G.F. (Eds.). Bodymind and Voice: Foundations of Voice Education. [2nd
Edition]. Iowa: National Center for Voice and Speech.
Young, S. (1999). Just Making a Noise? Reconceptualizing the Music Making of Three and Four Year
Olds in a nursery setting. Early Childhood Connections. 5(1), 14-22.
Proceedings paper
Abstract
The presence of girls in English cathedral choirs is becoming increasingly commonplace, but there are those who believe that they are unable to carry out
this role appropriately in this traditionally male-dominated arena. It is suggested by some that girls are unable to produce a sound that is in keeping with the
musical traditions of the choral sung divine offices. The aim of this paper is to explore whether or not listeners can tell the difference between the blended
sound of boys and girls in a cathedral musical context. A perceptual experiment was conducted to determine the extent to which listeners can tell whether
boy or girl choristers were singing the top line in snippets from professional English cathedral choir recordings where the lower three parts and the acoustic
environment remained essentially constant. Overall, the results suggest that listeners can tell the difference between girls and boys, and that this difference
is statistically significant. In addition, the data indicate that this ability improves with age and that girls are more accurate than boys. It is also clear from
the data that identification abilities vary between some of the musical settings selected as stimuli.
Background
The English cathedral choir has for centuries been the preserve of male singers only with the upper musical line being sung by boys, or trebles, in
pre-pubescence. The traditional musical repertoire composed for sung divine offices has therefore been written for a top line sung by boy trebles and there
are those who would argue that their sound is therefore the one specifically desired by composers. As girls have been admitted into cathedral choirs over the last
five years or so to provide an alternative top line to the trebles, a debate has been provoked over whether or not girls can provide a sound that is appropriate
in the musical context of the English cathedral sung divine office. In general, it is either the girls or the boys who sing the top line for the regular divine
offices, allowing the boys to sing fewer services per week than previously, although they might well be brought together for concerts or large festivals.
Whilst discussions over the appropriateness of the place of girls in cathedral choirs might for some have more bearing on sexist rather than purely musical
issues, important musical questions are posed given that there are basic differences between boys and girls in this age range in terms of their vocal
development which relates to adolescent physiological change.
In order to explore whether or not listeners can tell the difference between boys and girls singing the top line in choral music, a number of potential
variables require consideration. These include the presence or absence of organ or other accompaniment, the potential influence of other sung parts (usually
alto, tenor and bass), the acoustics of the environment, and the recording conditions. In addition, the musical repertoire chosen could influence listener
decisions. Professional recordings of one particular cathedral choir were traced in which the majority of these variables remained relatively constant, and
these formed the basis for the listening material. The remainder of this paper describes the data preparation, the conduct of the tests, the results obtained
and their statistical analysis, and draws conclusions.
Method
The purpose of the perceptual listening test was to investigate whether or not listeners can perceive the difference between boys or girls singing the top line
of snippets of traditional cathedral choral music. The musical material was taken from two professionally recorded compact disks (CD) of one English
Cathedral choir. The girl or boy choristers sang with the lay clerks and were recorded in the cathedral on each disk respectively. The material therefore
exhibited minimal variability in terms of acoustic factors, other than those arising from the top line itself, that could cue differences between the two
recordings. The musical repertoire was, however, different. Indeed, it is highly unlikely that professional recordings would be made
of identical pieces of music in such circumstances. The recordings are of Wells Cathedral choir. The CD with the boy choristers singing the top line was
recorded in March 1998 and is titled 'The Glorious Renaissance' (No. GCD4019). That involving the girl choristers is 'I look from afar' (No. LAMM102D)
which was recorded in November 1997.
In order to keep the listening test relatively short with a view to gaining a large number of responses, it was decided to select ten snippets each of boys and
girls singing the top line, and each snippet was to last approximately 20s. This gives a total listening time of less than 7 minutes. The snippets themselves
were chosen such that they contained a fairly continuous top line throughout. In some cases, organ accompaniment was used.
The snippets themselves were extracted from the relevant CD in stereo via a SoundBlaster Gold card using the Goldwave waveform editing package
running on a standard PC computer. Each was edited to last approximately 20s, and then amplitude normalised to keep the volume levels on replay similar
for each. The onset and offset amplitude envelopes were linearly ramped (from 0% to 100% at the onset, and from 100% to 0% at the offset) in order to avoid audible clicks at the start and finish of each
CD test track. The 20 snippets were archived in a random order onto recordable audio compact disks from which the tests themselves were played using
standard stereo audio equipment. A response sheet was produced that requested the age and sex of the listener, and a tick-box response against boys or girls
for each of the 20 snippets with the instruction: Please indicate whether you think boys or girls are singing in the choral snippets.
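The normalisation and ramping steps can be sketched as follows. This is a reconstruction of the general procedure only, not the authors' actual Goldwave processing; the 50 ms ramp duration and 0.9 peak target are assumptions, since the paper does not state them.

```python
import numpy as np

def prepare_snippet(samples, sr=44100, ramp_s=0.05, peak=0.9):
    """Peak-normalise a snippet and apply linear fade-in/fade-out.

    samples : 1-D float array of audio samples
    ramp_s  : ramp duration in seconds (an assumed value; the paper only says
              the envelopes were ramped linearly from 0% to 100%)
    peak    : target peak amplitude after normalisation (also assumed)
    """
    out = samples.astype(float).copy()
    out *= peak / np.abs(out).max()   # amplitude normalisation for similar replay levels
    n = int(sr * ramp_s)
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp                   # linear fade-in avoids an onset click
    out[-n:] *= ramp[::-1]            # linear fade-out avoids an offset click
    return out

# A two-second noise burst stands in for an extracted CD snippet.
snippet = prepare_snippet(np.random.default_rng(1).standard_normal(44100 * 2))
assert snippet[0] == 0.0 and snippet[-1] == 0.0
```

The same treatment applied to every snippet keeps replay level and onset/offset behaviour from becoming an unintended cue.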
The listening tests were conducted with groups of listeners in different acoustic environments using standard CD replay equipment. However, in every case
the listening conditions remained constant during each test with the listeners sitting in a fixed position relative to the loudspeakers. In total, 189 listeners
agreed to take part in the experiment, of whom 81 were female and 108 male, and their distribution with respect to sex and age is shown in table 1.
Sex      (counts per age band, youngest to oldest)      Total
Male     11   36   23   9   21   5   2   1              108
Female   19   22   16   8   13   2   1   0               81
Data analysis
For any experiment involving psychoperceptual judgements it is important to ensure not only that enough data is gathered to provide a fully representative
statistical sample of the population as a whole, but also that all possible sub-populations are well sampled. Further, an appropriate range of stimuli
must be selected in order to reduce possible researcher bias at the design level.
In this study, the experimental design first required that each listener must give a specific answer - boys or girls. Null or "don't know" answers were
forbidden in advance. Further, all listeners were exposed to the various stimuli in a similar acoustic environment and were unaware of the accuracy of their
previous responses. This means that each individual listener response forms a Bernoulli trial and that the group responses to any one specific stimulus
forms a sequence of independent Bernoulli trials with a constant probability, p, of success. Under such circumstances the total number of successful
responses over n trials, X, is a random variable which follows a binomial distribution:
P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}    (1)
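As a numerical illustration of the binomial form in equation (1), the probability mass function can be computed directly with the standard library. This is a sketch for the reader, not part of the original analysis.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)
```

For example, under pure guessing (p = 0.5) the probability that exactly 94 of the 189 listeners identify a given stimulus correctly is `binomial_pmf(94, 189, 0.5)`.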
However, to allow a reasonable sampling of the population of stimuli, the experimental design allowed for the initial trials to be carried out with 20
different stimuli, ten of each sex. There existed the implicit assumption that the designers neither had a priori knowledge of the relative difficulties of
correctly identifying any of the stimuli, nor of each of the sex subgroups, nor of any known probability density function describing the relative difficulties
for the full population of appropriate stimuli. Hence each different stimulus will in general be represented by a different binomial distribution, for n_i trials
of the i-th stimulus, with neither the individual probability P_i nor their probability distribution known in advance, as follows:
P(X_i = x) = \binom{n_i}{x}\, P_i^{x} (1 - P_i)^{n_i - x}    (2)
The overall characteristics of this design are that the 20 stimuli represent a reasonable compromise between the limitations of pragmatic experimental
logistics and the need to provide an adequate sample of the population of suitable stimuli. In combination with 189 listeners therefore, an overall total of
3780 success/fail Bernoulli trials are available for statistical analysis. The resulting dataset is large enough to allow sensible analyses of sub-populations
which can be grouped not only by listener age and/or sex, but also by their relative success in recognising the voices of boys or girls within the stimuli.
However, the full set of 3780 results and any sub-grouping which includes counts over more than one stimulus, will consequently not follow a simple
distribution - in general it will instead have a more complicated composite form, as it arises from the sum of binomial distributions of different means and
variances.
Hence, in analysing the success counts, it is not appropriate to use any statistical tool that presumes normality or makes any other distribution-specific
assumptions. In such circumstances, it is common to use an approximation based on the argument that although the underlying probability distribution is
unknown, any calculated sample mean is itself a random variable which the central limit theorem allows us to assume will follow a normal distribution - an
approximation which improves as the size of the sample is increased. With the further assumption that the measured sample standard deviation is
representative of the population standard deviation, the confidence interval for the true population mean µ is then given approximately by:
\bar{x} - z \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{s}{\sqrt{n}}    (3)
where n is the dataset size, \bar{x} is the sample mean, s is the sample standard deviation and z is the point of the standard normal distribution corresponding to the chosen confidence level.
A confidence level of 99% was chosen for the current work, which is equivalent to a value of z = 2.575. This approach allows the calculation of sample
means and standard deviations for all desired sub-populations as well as the estimation of 99% confidence limits on those means. The inclusion of the
group size within the calculation ensures that the calculated confidence limits correctly reflect the underlying group size, thereby avoiding dubious
conclusions being drawn from inappropriately small data sets.
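The interval in equation (3), with z = 2.575 for the 99% level used here, can be sketched as follows; the success/fail Bernoulli responses are assumed to be coded as 1s and 0s.

```python
from math import sqrt

def confidence_interval(data: list, z: float = 2.575):
    """Approximate CI for the population mean: x-bar +/- z*s/sqrt(n),
    using the sample standard deviation s as the population estimate."""
    n = len(data)
    mean = sum(data) / n
    var = sum((d - mean) ** 2 for d in data) / (n - 1)  # sample variance
    half_width = z * sqrt(var) / sqrt(n)
    return mean - half_width, mean + half_width
```

Because the group size n appears in the half-width, small sub-groups automatically get wide intervals, which is exactly the safeguard against over-interpreting small datasets that the text describes.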
Results
Figure 1 shows a histogram of the number of correct identifications given by subjects; it can be seen that the mode is 12 correct responses. It is
interesting to note that three listeners achieved 19 correct responses whilst one achieved just 6. The most important aspect of these data is the
considerable variation in listeners' overall ability to label individual stimuli correctly, as indicated in the analysis discussion above.
Table 2 provides details of the percentage (rounded) of correct listener responses to each stimulus. It can be seen that there is considerable variation
between the stimuli themselves, suggesting that some were rather easier to identify (e.g. stimuli 3, 4, 15, 17, 20) compared to others (e.g. stimuli 1, 2, 6, 8,
14, 16). These considerations informed the choice of statistical test employed as described above. These data are plotted in the form of a histogram in figure
2 which serves to emphasise the variation between the stimuli themselves. It is clear that stimulus 16, which was correctly identified by only 37% of the
listeners, could well contain some strong acoustic 'anti-cue' to the identity of the sex singing the top line.
Figure 2: Histogram of the number of correct listener responses (%) to each stimulus.
Table 2: Percentage (rounded) of correct listener responses to each stimulus.
Stimulus:     1   2   3   4   5   6   7   8   9  10
% correct:   43  49  73  76  57  42  63  45  72  65
Stimulus:    11  12  13  14  15  16  17  18  19  20
% correct:   62  68  65  46  77  37  73  74  50  74
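As a quick check on the Table 2 figures, the overall hit rate and the extreme stimuli can be computed directly (illustrative only):

```python
# Percentages of correct listener responses for stimuli 1-20 (Table 2).
pct_correct = [43, 49, 73, 76, 57, 42, 63, 45, 72, 65,
               62, 68, 65, 46, 77, 37, 73, 74, 50, 74]

mean_pct = sum(pct_correct) / len(pct_correct)
print(mean_pct)              # 60.55 - well above the 50% expected by chance

hardest = pct_correct.index(min(pct_correct)) + 1   # 1-based stimulus number
easiest = pct_correct.index(max(pct_correct)) + 1
print(hardest, easiest)      # stimulus 16 (37%) hardest, stimulus 15 (77%) easiest
```

The 37% figure for stimulus 16 is the only one below chance, consistent with the suggestion that it carries some acoustic 'anti-cue'.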
Table 3: Statistical analysis of data by listener sex (male listeners, n=108; female listeners, n=81; all listeners, n=189).
Table 4: Statistical analysis of data by listener age (child, 0-17 years, n=51; adult, over 17 years, n=138; all listeners, n=189).
A statistical analysis was carried out to investigate three questions with respect to boys and girls singing the top line:
1. Can listeners tell the difference?
2. Does the sex of the listener affect the ability to tell the difference?
3. Does the age of the listener affect the ability to tell the difference?
In order to investigate these questions, the sample means and 99% limits were calculated for:
● male listeners and female listeners (table 3), and
● child listeners and adult listeners (table 4).
Statistical significance at the 0.01 level can be ascertained with reference to the 99% limit values. If the mean plus or minus the 99% limit given does not
overlap the mean with which it is being compared and vice versa, then the difference is significant at the 0.01 level.
In regard to the first question posed, it can be observed that all listener groups represented have an ability that is better than chance at the 0.01 level to
identify the sex of the choristers singing the top line. It should be noted in addition that for all listener groups given in tables 3 and 4, boys are recognised
more often than girls when singing the top line, and this is statistically significant at the 0.01 level.
In answer to the second question, there is no statistically significant difference between the abilities of the male and female listeners to identify who is
singing the top line.
The third question has been investigated by splitting the listener group into children and adults by taking under eighteen year olds as children. These data
are given in table 4, and it can be observed that there is a statistically significant difference at the 0.01 level between the identification abilities of adults and
children in identifying either boys or girls. In all cases, the adults recognise the sex of the choristers singing the top line more often than the child
listeners. In addition, this significant difference is also exhibited for all stimuli between the adult and child listeners, whereas this is not the case between
male and female listeners (see table 3).
Discussion
It is clear from these data that there is no single obvious acoustic feature that enables listeners to distinguish boy and girl choristers singing
the top line in choral music. However, it has been shown that listeners can distinguish differences between individual untrained boy and girl singers from
approximately the age of 8, and that this ability becomes more accurate as the age of the singers increases (Welch, 2000). In a study of unison singing
reported in Sergeant and Welch (1997), it was found that on average listeners could not do better than chance in identifying a group of unison boys from
unison girls. However, they indicate that the discrimination ability varied depending on the stimulus; similar comments might be made about the variation
exhibited in figure 2. In these experiments, the music used was all from the cathedral repertoire and it is not believed that there was any feature of the
selected material that carried obvious identity clues (e.g. all hymns by boys and all anthems by girls).
Listeners are able, on average, to discriminate between girls and boys singing the top line, and all were better at identifying the boys than the girls. Perhaps
this is due to the cathedral tradition of all male choirs which almost all listeners will be familiar with, even if it is only through listening to broadcasts of
large state occasions such as the Coronation, Royal Weddings, State Funerals or Carols from King's. Are there some acoustic cues inherent in the blended
boy chorister sound that provide the listeners with an acoustic 'fingerprint' or hallmark which the blended girl chorister sound either cannot achieve, for
physiological reasons, or has yet to achieve? Clearly, analysis of the acoustic properties of the snippets used in this test could provide some clues.
Adult listeners are able to identify the sex of the choristers more often than children listeners which suggests that there may be an issue of either familiarity
with the sound or listening experience. Could direct working familiarity with the blended sound of cathedral choirs be an important issue here? In order to
understand better this effect, it might be appropriate to carry out further listening tests with ex-choristers, current lay clerks, cathedral organists and
choirmasters as opposed to listeners who have no direct working experience with cathedral music. On the other hand, this could be a result of increased
listening experience emerging from 'lifelong learning' through broadcasts, recorded music, concerts and sacred services. Perhaps it arises from an instinct to
be aware of and look after our young which provides a heightened sense of the differences between the vocal outputs from boys and girls.
The analysis presented in this paper poses a number of further questions to ask of the data in terms of whether there are finer structures that might be
observed between listeners of different ages and sex, and the extent to which any trends therein are common to both girl and boy chorister stimuli. It is also
clear from the analysis to date that some stimuli are easier to identify than others and the snippets employed in the test will be subject to acoustic analysis,
both standard spectrography (e.g. Baken, 1987) and hearing modelling spectrography (e.g. Howard et al., 1995) the results of which can be considered in
terms of timbre (e.g. Howard and Tyrrell, 1997). Identification of acoustic differences is likely to lead to a greater understanding of the effect of chorister
training. This could in turn lead to the development of real-time feedback devices for use in practice as support tools in the vocal training process.
Acknowledgements
The authors would like to thank all the listeners who took part in these tests.
References
Baken, R.J. (1987). Clinical measurement of speech and voice, London: Taylor and Francis.
Howard, D.M., Hirson, A., Brookes, T., and Tyrrell, A.M. (1995). Spectrography of disputed speech samples by peripheral human hearing
modelling, Forensic Linguistics, 2, (1), 28-38.
Howard, D.M., and Tyrrell, A.M. (1997). Psychoacoustically informed spectrography and timbre, Organised Sound, 2, (2), 65-76.
Sergeant, D.C. and Welch, G.F. (1997). Perceived similarities and differences in the singing of trained children's voices, Choir Schools Today, 22,
9-10.
Proceedings paper
The VRP recordings followed the recommendations of the Union of European Phoniatricians (Schutte &
Seidner, 1983). However, SPL was measured using a flat frequency curve rather than dB (A). The
microphone distance was 30 cm. The subjects were asked to sing as softly and as loudly as possible at each
pitch on the vowel [α]. A synthesiser, CASIO SA-20, was used to give reference pitches. Whenever
required, the experimenter also sang the pitches.
Data from a previous study of boys of the same age are used for comparison (McAllister et al, 1994).
Results
Nineteen of the 22 girls could be inspected using indirect microlaryngoscopy, see Table 1. In 10 children
stroboscopy was also used. Seven girls had incomplete glottal closure, i.e. posterior glottal chinks, and three
had complete closure all along the vocal folds according to the stroboscopic examination. No girl in the
present study had vocal nodules and/or hourglass chinks.
Table 1. Results from glottal inspection with and without stroboscopy. Note that the vocal folds of three
girls could not be inspected.
                     (a)  (b)  (c)  (d)  (e)  Total N
Micro-laryngoscopy    3    1    9    -    6      19
Stroboscopy           3    -    7    -    -      10
(a) complete glottal closure; (b) incomplete glottal closure all along the vocal folds; (c) incomplete glottal closure along the posterior half or two
thirds; (d) hourglass chinks; (e) inspected but closure not evaluated.
The perceptual evaluation showed that two girls were slightly hoarse. However, as can be seen in Figure 1
the girls had a lower mean hoarseness value than the boys in their peer group (girls' mean: 23.1 mm; boys'
mean: 33.4 mm rated hoarseness). The perceptual evaluation rated 13 girls
as free of hoarseness and any related voice dysfunction. One girl had a mutational voice according to a
growth index (Taranger et al, 1976) and the perceptual evaluation.
Figure 1. Rank ordered mean hoarseness values for girls (unfilled squares) and boys (filled diamonds). Note
the knee in the distribution around 40 mm for both boys and girls.
All 22 girls could complete the VRP recording with at least an octave in range. The mean fundamental
frequency range in semitones for these girls with no singing experience apart from that provided within the
regular school system was 25 st, from G3 to G#5 (Giss5) (196-830 Hz); see Table 2. The one girl with a mutational
voice had a somewhat larger fundamental frequency range than her peers, 33 st.
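The reported 25-semitone range can be checked from the frequency endpoints, since the interval between two frequencies f1 and f2 in equal-tempered semitones is 12·log2(f2/f1). This is a verification of the quoted figures, not part of the original analysis.

```python
from math import log2

def semitone_range(f_low: float, f_high: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    return 12 * log2(f_high / f_low)

print(round(semitone_range(196.0, 830.0)))  # 25 st, i.e. G3 up to G#5
```

The same formula confirms that 830 Hz corresponds to G#5 (Giss5 in Swedish note naming), roughly a quarter-tone above the exact equal-tempered G#5 at 830.6 Hz relative to A4 = 440 Hz.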
The mean maximum dynamic range for the whole group, defined as the difference between the upper and
lower VRP contours at a given fundamental frequency, was 20.7 dB. Only very minor differences could be
observed between the different voice-groups. However, regarding mean F0 range the girls with glottal
chinks had a somewhat larger F0 range in semitones (st) than the controls.
The averaged VRP for 10 boys of the same age-group with normal voices as compared to that of the present
11 girls with normal voices showed that the lower contour of the girls was somewhat lower than that of the
boys, see Figure 2. This may reflect the willingness of the vocal folds to vibrate at low driving pressures.
Regarding the upper contour the boys had higher values practically throughout the frequency range
indicating an ability in the vocal folds to vibrate at higher subglottal pressures.
Figure 2. Mean VRP-data for 10 boys (filled diamonds) and 10 girls (unfilled squares) with normal voices.
Discussion
Children have been reported to have a somewhat elevated lower VRP curve as compared to adult voices and
sometimes a more restricted dynamic range (Kotby et al., 1995). In the present study of 10-year old girls the
lower contour was similar to that found in adult female voices (Coleman, Mabis, Hinson, 1977; Gramming,
1991).
All girls could complete the VRP-recording. However, in the previous study of boys of the same age 28%
could not sing the desired pitch and thus could not complete the VRP recording.
Pedersen and co-workers (1986; 1987) found that register transitions manifested in the VRP as a dip of 5 dB or
more in the upper contour. This could not be confirmed in the present investigation.
Conclusions
VRP analysis appears to be a useful method for evaluation of ten-year old children's voices. All girls could
produce data for a VRP recording. As compared with adult women, girls seem to have somewhat
compressed dynamic VRP contours reflecting restricted dynamic vocal capabilities.
Key Words: Voice range profiles, girls' voices, pitch range, mutation, dynamics.
References
Coleman RF, Mabis JH, Hinson JK. Fundamental frequency-sound pressure level profiles of
adult male and female voices. J Speech Hear Res 1977;20:197-204.
Gramming P. The Phonetogram. An Experimental and Clinical Study, Diss., Dept. of
Otolaryngology, Malmö General Hospital, Sweden: Lund University, 1988.
Gramming P. Vocal loudness and frequency capabilities of the voice. J Voice 1991;5:2:144-57.
Klingholz F, Jolk A, Martin F. Stimmfelduntersuchungen bei Knabenstimmen (Tölzer
Knabenchor). Sprache-Stimme-Gehör 1989;13,107-11.
Klingholz F, Martin F, Jolk A. Die Bestimmung der Registerbrüche aus dem Stimmfeld.
Sprache Stimme-Gehör 1985;9:109-11.
McAllister, A., Sederholm, E., Sundberg, J., Gramming, P. Relations between Voice Range
Profiles and Physiological and Perceptual Voice Characteristics in Ten-year-old children. J
Voice 1994, 3:230-239.
Ohlsson A-C, Järvholm B, Löfqvist A. Vocal symptoms and vocal behaviour in teachers. Scand
J Logoped Phoniat 1987;12:61-9.
Pabon P, Plomp R. Automatic phonetogram recording supplemented with acoustical voice
quality parameters. J Speech Hear Res 1988;31:710-22.
Pedersen MF, Munk E, Bennet P, Møller S. The change of voice during puberty in choir
singers measured with phonetograms and compared to androgen status together with other
phenomena of puberty. Proceedings of the Tenth International Congress of Phonetic Sciences,
1983:604-608.
Pedersen MF, Møller S, Krabbe S, Munk E, Bennet P, Kitzing P. Change of voice in puberty in
choir girls. Acta Otolaryngol (Stockholm) 1984;Suppl. 412;46-9.
Schutte HK, Seidner W. Recommendation by the Union of European Phoniatricians (UEP):
Standardizing Voice Area Measurements/Phonetography. Fol Phoniat 1983;35:286-88.
Sederholm E, McAllister A., Sundberg J, Dalkvist J. Perceptual analysis of child hoarseness
using continuous scales. Scand J Logoped Phoniat 1993;18:73-82.
Taranger J, Engström I, Lichtenstein H, Svennberg-Redegren I. The somatic development of
children in a Swedish urban community. A prospective longitudinal study. VI. Somatic
pubertal development. Acta Paediatr Scand 1976;Suppl. 258.
Proceedings abstract
Leon Thurman, Ed.D., Fairview Voice Center, Fairview-University Medical Center, Minneapolis, Minnesota,
USA
Background
During prenatal development, human biological processes evolve a unique bodymind. Newborn human
bodyminds possess a unique array of primary capability-ability clusters that gradually process experiences
into the formation of a unique neuropsychobiological self. Three interrelated primary capability-ability
clusters are predominantly involved in self-determination: (1) interactive-expressive, (2) imitative, (3)
exploratory-discovery. The neuromuscular coordinations that produce voice are an integral part of the
interactive-expressive cluster, and the other two clusters are prominently used in the development of spoken
and sung self-expressive abilities. During childhood, adolescence, and adulthood, patterned genetically
triggered brain growth spurt cycles occur that amplify those capability-ability clusters and bring new ones
on-line. Human capabilities can be converted into optimal abilities only with optimal environmental support.
Aim
This paper argues that (1) all human beings who have relatively normal vocal anatomy and physiology are
capable of learning skilled, expressive speech prosody and singing abilities, and (2) optimum development
of these abilities can play a major role in the development of empathic social relatedness, constructive
personal competence, and self-reliant autonomy, primary characteristics of constructive selfhood.
Main Contribution
The neuropsychobiological benefits of self-expressive speaking and singing will be presented, along with
decrements to self-identity formation that can result from lack of, or suboptimal development of,
self-expressive speaking and singing. Evidence in support of the paper's contributions will be from the
neuropsychobiological sciences and from case histories of voice disordered patients and voice education
clients at Fairview Voice Center.
Implications
Evidence cited in this paper may be used to: (1) challenge social myths about the genetic heritability of
"good speaking voices" and "good singing voices", (2) point parents, general educators, and music
educators toward ways to optimally support the development of self-expressive speaking and singing skills,
and thus, enhance the development of constructive selfhood, that is, empathic social relatedness,
constructive personal competence, and self-reliant autonomy.
Proceedings paper
In this paper, a sketch will be presented of the various avenues of thought that have led to the development
of a theoretical framework that allows for the investigation of the psychological organisation underlying
musical listening. The model is based, first, on the Gestalt principles and the contribution of Lerdahl and
Jackendoff to the theory of grouping; and then, on the opposition between the idea of SIMILARITY which
links objects together and that of DIFFERENCE which distinguishes objects from each other. These
oppositions have led to the development of some new concepts -cue abstraction and the formation of
imprints - which have provided us with an approach to the study of the activity of categorisation that
underlies the mental organisation of musical information.
The Gestalt heritage
It is nowadays well-known that the Gestalt laws formalize, in essence, a spontaneous and unconscious
tendency of human psychological mechanisms to define units in the perceptual field generated by the
properties of objects and the relationships between them: proximity, similarity, common fate and good
continuation. These factors determine how a perception is segmented and how it is organized into groups
by generating boundaries between regions.
The goal of the first chapter of Lerdahl and Jackendoff's A Generative Theory of Tonal Music (1983) was to
present a systematic formalisation of the general laws of perception as applied to musical rhythm. This
theory has already been described on more than one occasion (e.g. Deliège, 1987a); there is therefore little
point in dwelling on it here.
As I pointed out in a study on this subject published in 1987, a common idea, that can be applied regardless
of the grouping principle involved, accounts for the emergence of boundaries within a total structure: a
grouping boundary is invariably perceived when there is a perceived difference between the groups
adjoining the boundary, as opposed to a similarity between the elements within the groups (Deliège 1987a).
The perception of SIMILARITY and DIFFERENCE
This causes us to turn our attention to the rôle of similarity in perceptual processes, one of the most
important problems in contemporary cognitive psychology (as emphasized recently by Jean-François Le
Ny (1997)). The importance of the idea of SIMILARITY and, through this idea, the various degrees of
equivalence that follow from it - identity, repetition, invariant-variant relationship - has been described
many times in the context of musical experience and practice by composers, theorists and music
psychologists as diverse as Schoenberg (1967), Webern (1933/1980), Souris (1976), C. Deliège (1984,
Chapter VI) and Imberty (1997). Incidentally, the subject is still topical in the field of psychology if one is
to judge from the number of recent meetings, symposia and publications dedicated to this subject.
Consider, for example, Simcat 97, the Interdisciplinary Workshop on Similarity and Categorisation
organised at the University of Edinburgh in November 1997; and the special issue of the review Computing
in Musicology devoted to melodic similarity, published recently by MIT Press.
Also, in the context of an experimental study that I carried out in 1991 (I. Deliège 1991) on the perception
of invariance/variance that took Steve Reich's Four Organs as a starting point, my conclusions converged
with those from ethnomusicology. According to Gilbert Rouget (1990), the perception of SIMILARITY
could be more developed in certain ethnic groups. For example, Simha Arom (1985) encountered in Central
Africa concepts of similarity and of identity for which the perceptual scales were more flexible than our
own and went beyond what we would customarily include under these headings in our daily perceptual
experience. In particular, he cites among others two musical fragments - one rhythmic, the other melodic -
as examples of the sense of similarity experienced by African musicians (see Figure 1). The rhythmic
sequences are, it seems, judged to be the "same" when each has the same total number of events, the
arrangement and rhythmic articulation of these events being insignificant. As for the two melodic
sequences, these are perceived to be "identical" because they have a similar melodic contour and because
each only uses notes from a single pentatonic scale. For all that, the idea of universality that is recognized
as being a feature of the problem of SIMILARITY in the field of psychology is neither breached nor
reduced, but we have just pointed out the effect of cultural milieu as a factor that could influence the ways
in which psychological mechanisms adapt to their musical environment: the analysis and understanding of
these processes could therefore force us to introduce tools that are better adapted to the problem than the
Gestalt laws alone.
Figure 1: The two musical fragments (one rhythmic, one melodic) cited by Arom as examples of perceived similarity.
It was these ideas that led me, a decade or so ago, to the development of a model based on the Principles of
Similarity and Difference (I. Deliège 1989) as the organising principles underlying musical listening. These
principles arose directly from the Gestalt principles of which they constitute an extension, or, more
precisely, a generalisation insofar as the psychological mechanisms concerned, previously analysed in
terms of proximity, similarity, common fate, good continuation etc., are henceforth divided only into two
categories: that of Similarity which proposes that small differences between elements within a constituent
are minimised; and that of Difference, which assumes that the contrast between elements adjacent to a
boundary are emphasized. The Principles of Similarity and Difference are thus defined, on the one hand as
laws of description and structural analysis of musical perception, while, on the other hand, their function is
based on a preliminary act of categorisation, that is to say a line of thought that focuses on the explanation
of the underlying mental processes. I want to emphasize this latter aspect in particular because it distances
the model from its Gestalt heritage, whilst also developing it: the first priority of the Gestalt theorists was to
emphasize the importance of the structural relationships that govern perception and to express the
principles underlying these structural relationships in the form of laws that were supposed to respect the
holistic character of perceived totalities; but, as Johnson-Laird has pointed out, the description of mental
processes was not necessarily privileged (1988/1993, p.23). This aspect of cognitive processes that divides
up structures according to the categories provided by the proposed principles turns out to be essential in the
study of the formation of a mental schema of a musical work.
References to theories of categorisation
Recently, George Vignaux wrote (1999, p.13):
Similarities allow us to group objects together, differences allow us to set them apart. Meaning
arises from this double-game of similarity (which gathers together families, species and
moments in time) and difference (which sets families, species and moments apart either in
time or in form.)
Thus, the main ideas of Similarity and Difference are recognized once again as being fundamental to the
activity of thinking about the world, organising it and classifying the objects and phenomena that one finds
in it. It remains only to specify how one can gain access to these concepts.
For Bruner, "all perceptual experience represents the final product of a process of categorisation" (1958,
p.3). In other words, when faced with the daily environment, an individual refers to the knowledge that he
has acquired and stored in memory and classifies what he perceives according to the collection of
categories that he has already developed. On the other hand, an unfamiliar environment arouses a degree of
uncertainty inherent in novel experiences. In this case, the product that Bruner talks of gradually becomes
reality through intermediate attempts at classification - a play of equivalences and comparisons - that
continue until the need to understand the structures encountered succeeds in assimilating these new
structures to the world that the perceiver has internalised:
J.S. Bruner wrote that an object, an event or a sensation that is unclassifiable...is a
phenomenon that is so rare that the possibility of such a phenomenon existing is doubtful. If a
perceptual experience could be so virginal and bereft of all categorisation, then it would be a
pure diamond enclosed in the silence of internal experience (ibid., p.4).
One can easily imagine how such operations may be organised in the case of everyday experiences, but the
case of musical listening raises problems of a quite different nature. To propose the hypothesis that the
comparisons and classifications of structures are generated during the listening process seems rather
inexplicit so long as their object has not yet been identified. In other words, it is essential that the terms on
which the comparisons are made be specified if one wishes to provide a valid description of the process by
which a subject achieves an understanding of the musical environment during listening.
The dimension of time in which the work unfolds poses a particular problem. Yet, the idea that such
processes may intervene, at the conclusion of the listening process, with reference to certain incidental,
scattered memories, cannot be supported. Indeed, being able to remember events requires that the mental
schema starts being built up at the beginning of the listening process and progresses and evolves over the
course of listening to the work. Modelled on the theory of discourse perception advanced by Bartlett and
later by Kintsch and van Dijck, namely a focussing of attention on a selection of key elements that reduces
the complete collection of events to a size that can be managed by memory, I have proposed since 1987 the
hypothesis that an analogous selection process - that I call cue abstraction - operates over the course of
musical listening (I. Deliège 1987b, 1989, 1991). However, as the product of such a selection process in
this particular case lacks semantic reference, it is necessary to specify its substance.
A cue is a salient element which is prominent in the musical surface. This idea of key structures that play a
foreground rôle in the whole musical work, is undoubtedly an example of the idea of Figure/Ground
discrimination applied to the perceptual organisation of musical structures, an idea that again brings us
close to the Gestalt-influenced model. But beyond its connections with the original structures, a cue
generates different cognitive strategies by virtue of the very fact that it is "emergent" - a property that
confers upon it, cognitively speaking, a clearer definition with respect to the rest of the musical
environment. As I have pointed out elsewhere (I. Deliège 1989, p.307), the idea of a cue that is presented
here refers to that aspect of a sign that Charles Peirce talks about when he defines a cue (or index) as "a
sign that refers back to the object that it denotes because it is in a real sense affected by this object" (2.248,
p.140). This idea was echoed by Ignace Meyerson (1948/1995) for whom a cue has a close and permanent
connection with its original structures: "it is a fragment of this reality ... bound to the fragments that it
evokes by natural ties" (p.77).
As soon as it has been abstracted, the cue plays an active rôle on more than one count in the listening
process. In the first place, it has caught the attention of the listener and thus becomes fixed all the more
effectively in long-term memory; but, in conjunction with the storage mechanism, it "summarizes" the
sequences from which it arose into a succinct representation, a sort of label, that lightens the memory load
required to internalize the whole structure. Musical time is thus progressively marked out, the different cues
that are abstracted during listening acting as waymarkers or milestones. This gives rise to the notion of a
mental line which makes reference to a symbolic "musical space" in which the fundamental articulations of
the mental schema of the work are drawn. In addition, a cue always acts in concert with the other main axes
of the model - the Principles of Similarity and Difference - in the context of which a cue acts as a
cornerstone in the process of musical categorisation, a rôle that confers upon it its primary dynamic
function. The cue therefore provides us with the basic point of reference for the comparisons between
musical structures that occur throughout the listening process.
Over the past ten to fifteen years, the theoretical aspects of the model have been investigated
experimentally in various ways, the principal areas of study being processes of segmentation, categorisation
strategies and the organisation of long-term memory in the mental schema. This research has been carried
out on a collection of works chosen from different periods of the musical repertoire (Bach, Schubert,
Wagner, Debussy, Reich, Berio, Boulez, etc.). Among the most significant contributions of this work is the
discovery that categorisation is always observed to act in the case of musical listening, even when the
studies have not been directly aimed at studying the categorisation processes in question. The results of
studies on segmentation show this explicitly (I. Deliège 1989, 1990, 1997): a cue acts as a driving force in
combination with the Principle of Similarity, which groups the structures established at the beginning of the
model, while the Principle of Difference intervenes to signal the introduction of a new cue structure and
the beginning of a new group that will be developed along similar lines. Finally, an insistence upon using
the same cue for as long as the composer employs it, through literal repetitions or more or less
varied elaborations, generates a prototype figure, the imprint, in cognition. That is, human memory cannot
record in detail all the varied ways in which a cue is presented and so it finds a kind of "mean" that captures
the main features of all the presentations whilst still effecting that simplification of the musical environment
that is common to all processes of categorisation (Gineste 1997, p.95). Besides the importance of change,
the stress is placed here on Similarity and focuses on what happens between the boundaries defined by the
principle of Difference. Thus, with the intervention of processes of categorisation, the schema of the
complete piece is built up - a schema where the categories of Similarity take priority in the development of
musical time, the principle of Difference operating only briefly at certain points to punctuate the boundaries
between groups.
At this point in my exposition and in relation to the idea of psychological constants that I have stressed
since my earliest work, it is interesting to note a parallel between the fundamental principles of the theories
of Eleanor Rosch (1978) and the various articulations of the model based upon the Principles of Similarity
and Difference that have just been described. In effect, this constitutes a musical application that is quite
close to that proposed by Rosch for the case of categorisation of the environment and thus suggests the
existence of deep analogies between the cognitive foundations underlying information processing in quite
different domains.
With respect to the concept of category, Rosch defines two dimensions: horizontality and verticality.
Horizontality consists of arranging the members of a collection along a scale, for example, all the possible
variations of the same motif. This generates the concept of prototypicality which specifies the "central"
member of a category - the prototype - which is the best representative of the category as a whole. On the
other hand, verticality provides a hierarchical perspective on the cognitive organisation in question. At a
so-called basic level, the elements are mutually independent: they each have their own characteristics but
belong to the same higher, superordinate level. Finally, the lower or subordinate level can be understood as
being a sort of equivalent of the idea of horizontality, described above, inasmuch as it involves the use of a
single basic element in all the desired variants. For example, the superordinate level mammals contains, at
the basic level, elements such as dog and cat; the subordinate level being occupied by the different breeds
of dog (poodle, greyhound, basset, etc.) and cat (persian, chartreux, siamese, etc.).
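Rosch's vertical dimension can be sketched as a small data structure. The following is a minimal illustration using the mammal example from the text; the representation and the function name are invented here for illustration and are not part of Rosch's or Deliège's formalism:

```python
# Three-level category hierarchy: superordinate -> basic -> subordinate.
# The names mirror the mammal example in the text; the structure itself
# is an illustrative choice, not a claim about Rosch's theory.
taxonomy = {
    "mammals": {                                   # superordinate level
        "dog": ["poodle", "greyhound", "basset"],  # basic -> subordinates
        "cat": ["persian", "chartreux", "siamese"],
    }
}

def level_of(term, taxonomy):
    """Return the hierarchical level of a term, or None if absent."""
    for superordinate, basics in taxonomy.items():
        if term == superordinate:
            return "superordinate"
        for basic, subordinates in basics.items():
            if term == basic:
                return "basic"
            if term in subordinates:
                return "subordinate"
    return None

print(level_of("mammals", taxonomy))   # superordinate
print(level_of("cat", taxonomy))       # basic
print(level_of("poodle", taxonomy))    # subordinate
```

The horizontal dimension corresponds to the lists at the lowest level: the breeds of dog are the "variants" among which a prototype (the most representative member) would be located.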
A parallel between these theoretical principles and those described with respect to music could allow us to
see horizontality as being instantiated in the variations developed from a single cue - a dimension that could
prove particularly fertile in the case of musical composition and listening. Viewed from the perspective of
Similarity, they generate an Imprint, that is, an analogue of the prototype viewed from the angle of
typicality. Concerning the hierarchical aspects of the principle of verticality in the field of categorisation of
musical structures, one could find a parallel, for the basic level, in the various cues abstracted within a
piece. They satisfy the criterion of independence that Rosch's theory requires and generate their
subordinates, that is, the collection of structures derived from them in the form of variations, which
connects them with the notion of horizontality defined above. Finally, the effect of the Principle of
Difference is to segment the work into periods: those structures that are gathered around each cue that is
abstracted at the basic level (that is, the collection of variations derived from the cue) are grouped together at
the higher, superordinate level.
In conclusion, it seems that the principles governing the psychological organisation of music perception
adhere quite closely to the properties of Rosch's model. This positive feature as regards the development of
the present model of music perception should not, however, be understood as an attempt to veil the
complexity of that cognitive ability which is musical listening. As in any model, the project aims to gain an
understanding of the psychological landscape of a particular domain and then to predict and discern as
many aspects as possible. It is dangerous, however, to believe that a model could be complete and that it
could reproduce in all respects the phenomenal reality that it aims to account for. A recent warning from
Jean-Pierre Dupuy (1999) emphasizes, precisely with respect to this, that
a scientific model is right from the start an imitation that has the same relationship with reality
that a "reduced model" has with the object for which it is intended to be a more easily
manipulable copy. (p.18)
Subsequent stages of the research will have to put forth models in which the listening process is imitated
more and more literally. This will allow us to understand aspects of music cognition that we have not yet
even glimpsed.
References
Arom, S. (1985). De l'écoute à l'analyse des musiques centrafricaines, Analyse Musicale, 1, 35-39.
Bruner, J. S. (1958). On perceptual readiness. Psychological Review, 64, 123-152.
Deliège, C. (1984). Les fondements de la musique tonale, Paris, Lattès.
Deliège, I. (1987a). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's
grouping preference rules. Music Perception, 4 (4), 325-360.
Deliège, I. (1987b). Le parallélisme, support d'une analyse auditive de la musique : Vers un modèle des
parcours cognitifs de l'information musicale. Analyse musicale, 6, 73-79.
Deliège, I. (1989). Approche perceptive de formes contemporaines. In S. McAdams and I. Deliège (Eds.), La
Musique et les Sciences cognitives. Bruxelles: Pierre Mardaga, pp. 305-326.
Deliège, I. (1991). L'organisation psychologique de l'écoute de la musique. Des marques de sédimentation
- indices et empreinte - dans la représentation mentale de l'oeuvre. Unpublished doctoral thesis, Université
de Liège.
Proceedings paper
Peers are another key social group that can influence the behaviour and attitudes of young people (Best, 1983;
Boulton & Smith, 1994; Sroufe, Bennet, Englund, & Urban, 1993; Thorne & Luria, 1986; Thorne, 1986).
Research looking at the role of peers in the continuing motivation and participation in sports and the arts in
early adolescence, found that talented adolescents' relationships with peers appeared to serve an important
motivational function with respect to continued commitment to their talent area (Patrick, Ryan, Alfeld-Liro,
Fredricks, Hruda & Eccles, 1997). This is consistent with previous research on the role of peer support and
involvement with regard to participation in sport (Scanlan, Carpenter, Lobel & Simons, 1993). Berndt and
colleagues (1990) found that discussions among pairs of friends influenced their decisions in motivation-related
dilemmas (e.g. whether to complete a homework assignment) and that children tended to make friends who
were most similar in their decisions. Studies also suggest that peer group pressure to conform operates in the
domain of music. Children will hide their real musical interests in order to conform to group norms and avoid
the judgement and response of their peers (e.g. Finnas, 1987). They may even consider abandoning playing
instruments if the negative feedback from peers, such as bullying or name calling, becomes too much (Howe &
Sloboda, 1992). O'Neill and Boulton (1995b, see also O'Neill, 1997) found that both female and male
participants in their study thought a child of the same sex as themselves would be liked less, and bullied more,
by other children if they played an instrument that was considered 'gender inappropriate' (e.g. a boy playing the
flute, a girl playing the drums). Davidson, Howe, and Sloboda (1997) found it was important for children to
have the opportunity for informal musical engagement, such as playing with friends or family, as well as
opportunities for formal practice. Therefore children who have friends, in or out of school, who play
instruments may have more access to this kind of informal engagement, which can make music more
fun and acceptable in their peer group.
Research has shown that the teaching context can have a profound influence on children's performance
achievement and engagement. Indeed research investigating the development of children's musical skills has
placed great emphasis on the effect of teachers' expectations on learners' achievement, with low achievement
and low teacher expectation being highly correlated (e.g. Rosenthal & Jacobson 1968; Blatchford, Burke,
Farquhar, Plewis & Tizard 1989). An instrumental music teacher is often the first significant adult a child will
come into one-to-one contact with, other than members of the child's family or a child-minder. A music teacher
may spend up to one hour each week with a child, often for several years. It is a unique and possibly critical
learning relationship which may have long-lasting effects on the child. A study by Sloboda (1989) of
autobiographical memories of emotional responses to music in childhood found that adults who were not
involved in music, or who considered themselves to be unmusical, were more likely to report that they had
negative musical experiences in educational contexts during childhood (i.e., where some attempt to perform or
respond to music was criticised by early teachers), than adults who considered themselves to be musical.
Despite the important role music teachers play, few studies have explored the characteristics associated with
effective teachers of young instrumentalists. According to findings by Sosniak (1985, 1990), Sloboda and Howe
(1991) and Davidson et al. (1997), successful young musicians were more likely to have a first music teacher
who was reported to have characteristics such as warmth, enthusiasm and encouragement. These 'warmth'
characteristics were considered more important than these teachers' ability to display impressive
technical skills on an instrument. Children who had given up lessons did not differentiate between their initial
teacher's 'personal' and 'professional' characteristics in the same way. This suggests that in the early stages of
learning, the personal characteristics of teachers are important for promoting children's musical development
and their continuing of lessons. North, Hargreaves and O'Neill (2000) found significant sex differences in the
reasons young people gave for playing instruments. Boys were more concerned than girls with creating an
external impression, such as being trendy or creating an image, whereas a non-significant trend indicated that
girls were more concerned with pleasing others, such as parents, teachers and friends. O'Neill (1997) reports
that more girls than boys are involved in, and successful at, musical activities at school, with approximately
twice as many girls learning to play instruments. Girls also achieve a higher percentage of passes at all levels
than boys in school music examinations (DES, 1991).
There is little doubt that children's participation and level of engagement are inextricably linked to their social
and cultural environment. However, it remains unclear how children's perception of different social support
influences their involvement in music. The present study aims to address this by examining the relationship
between children's perceptions of social support by parents, peers and teachers and their levels of participation
in music.
Method
Participants
The present study is part of a longitudinal project investigating the social and motivational factors influencing
young people's participation and achievement in music. During Year 1, 1209 children (585 girls, 624 boys)
aged 10-11 years (mean age 10.5, SD 0.49), attending 35 primary schools in North Staffordshire,
participated in the project. Children were recruited through their schools and parental consent was obtained.
Procedure
The children completed a questionnaire designed to assess their level of engagement in music and perceived
social support. All items were answered using 7-point Likert-style response scales, except where categorical
responses were required, such as "Do you play an instrument? yes or no". All scales have good psychometric
properties (details given below). The questionnaires were administered with verbal instructions to the children
on a classroom basis in the selected schools. The children completed the questionnaires independently typically
within 45 minutes.
Measures
The children's level of musical participation was measured in terms of how often children reported playing
instruments, for example, how often they played an instrument by themselves or with friends. The 'Playing
Instruments' scale began with the phrase "How often do you...." and had anchors of (1) never to (7) very often.
The 10-item scale had good internal reliability (Cronbach's alpha = .87). (See Appendix for full list of scale
items for all measures).
The child's perceived social support from parents, peers and the school music teacher were measured by
separate support scales. The 'Parent' support scale began with the phrase "If you played a musical
instrument, how much do you think your parents would......" and had anchors of (1) not very much to (7) a lot.
The 12-item parent support scale assessed perceived support and expectations from parents, for example, how
much the children thought their parents would be pleased or help them to play an instrument. A principal
component analysis using varimax rotation confirmed there was one factor accounting for 50% of the variance.
The 'Parent' scale had excellent internal reliability (α = .9096). The 9-item teacher support scale began with the
same phrase and also assessed perceived support and expectations, for example, how much the children thought
their teachers were pleased with the work they do in class or wanted them to pass exams. Principal component
analysis confirmed the 'Teacher' scale also had one factor, accounting for 54% of the variance. The 'Teacher'
scale had high internal reliability (α =.8928). Peer support was measured by the 'Friendship' support scale which
also began with the same phrase as the parent and teacher scales: "If you played a musical instrument, how
much would your friends........" and had scale anchors of (1) not very much to (7) a lot. The 10 item friendship
support items measured social and active support from peers. Principal component analysis confirmed that the
scale also had one factor, accounting for 46% of the variance. The 'Friendship' scale had high internal reliability
(α =.8689).
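The internal-reliability figures quoted for each scale (Cronbach's alpha) can be computed directly from the item responses. A minimal sketch follows, using invented 7-point Likert data rather than the study's own; the standard formula is alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per scale item). Uses sample variances (n - 1)."""
    k = len(items[0])
    columns = list(zip(*items))
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Fabricated 7-point Likert responses: 6 children x 4 items.
responses = [
    [7, 6, 7, 6],
    [2, 3, 2, 2],
    [5, 5, 6, 5],
    [1, 2, 1, 2],
    [6, 7, 6, 7],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(responses), 2))  # high alpha: items covary strongly
```

Because the fabricated items rise and fall together across respondents, alpha here comes out well above the conventional .70 threshold, as with the scales reported above.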
Results
The results are presented in three sections. First, details of the principal component analysis performed on the
social support and playing instruments scales are presented, with details of the composite scale scores that were
created for further analysis. Second, descriptive statistics of the relations between the children's level of
participation and perceived social support are described. Finally, multiple regression analyses of the social
support predictors of children's level of participation are presented.
Principal Component Analysis
Playing Instruments
A principal component analysis with varimax rotation confirmed that there were two factors, accounting for
60% of the variance. The scale had high internal reliability (α = .8720), and the two factors were interpreted as
formal (M = 2.90, SD =1.7) and informal playing (M = 2.51, SD =1.4). Scale scores were created from the
mean composite scores for 'Formal' (5 items, α = .8407) and 'Informal' (4 items, α = .7238) playing and used in
all further analysis. The possible scale range was from (1) minimum to (7) maximum. The scale name 'Formal'
refers to playing that mostly occurs within the school setting, including the Year 6 music class and instrumental
lessons, whereas 'Informal' playing refers more to out of school playing, such as with friends or family
members.
Relations Between the Children's Level of Participation and Perceived Social Support
The children were assigned to a categorical cohort based on their self-reported level of participation in playing
instruments. The three cohorts are (1) Players (those who presently play an instrument), (2) Gave-ups (children
who had previously played an instrument but given up), and (3) Non-players (those who had never played an
instrument). These cohorts were used in further analyses to examine how children's perceived support
influences their level of participation in playing instruments.
Using GLM multivariate analysis, a 2 (Sex) x 3 (Cohort) x 3 (Support) analysis of variance (ANOVA)
identified significant main effects of Sex on the 'Parent' (F(1) = 46.275, p < .0001), 'Friendship'
(F(1) = 63.149, p < .0001), and 'Teacher' (F(1) = 14.587, p < .0001) support scales. Significant main effects
were also identified for Cohort on the 'Parent' (F(2) = 53.931, p < .0001), 'Friendship' (F(2) = 19.426,
p < .0001), and 'Teacher' (F(2) = 39.008, p < .0001) support scales. No significant interaction effect was
found. (See Table 1 for the means and standard deviations).
for the means and standard deviations).
Table 1
Mean Scores and Standard Deviations for Perceived Support as a Function of Gender and Cohort
Further post-hoc analyses of cohort and support identified significantly higher levels of perceived support from
parents, friends and teachers among children who reported they currently played an instrument compared to
those who had given up or never played. Children who had previously given up also reported higher perceived
support from parents than children who had never played an instrument. Examining the means also highlights
that the significant sex difference is due to girls reporting more perceived support from parents, friends and
teachers than boys.
Regression
The objective of this study was to examine how children's perception of social support influenced their level of
participation in instrumental music. To address this issue, two multiple regression analyses were calculated in
which either 'Formal' or 'Informal' playing of instruments was the outcome variable, with sex and the three
social support scales as the predictor variables.
Predicting Informal Playing
In order to predict informal playing a hierarchical multiple regression was calculated. As sex was found to
correlate significantly with informal playing (see Table 2) it was entered as a predictor at step 1. The three
support scales were entered simultaneously at step 2. By examining the individual regression coefficients for
each of these predictors at this step we could determine which, if any, of them were unique predictors of
informal playing. The three sex x support interaction terms were entered simultaneously at step 3. This latter
step allowed us to examine if the support variables differed in terms of their predictive power as a function of
sex. The results are summarised in Table 3. After the variance shared with sex had been controlled the three
support variables together accounted for a significant proportion of the variance in informal playing.
Additionally, the analysis revealed that all three support variables emerged as unique predictors of the
dependent variable. None of the interaction terms at step 3 either collectively or individually emerged as a
significant predictor.
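The step-wise logic just described (sex at step 1, the three support scales at step 2, comparing R-squared across steps) can be sketched as follows. The data are fabricated for illustration, numpy is assumed available, and the authors' step 3 (interaction terms) is omitted for brevity:

```python
import numpy as np

# Fabricated stand-ins for the study's variables; only the step-wise
# hierarchical-regression logic mirrors the analysis described above.
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n).astype(float)   # 0 = boy, 1 = girl
parent = rng.normal(4.0, 1.5, n)            # 'Parent' support scale
friend = rng.normal(4.0, 1.5, n)            # 'Friendship' support scale
teacher = rng.normal(4.0, 1.5, n)           # 'Teacher' support scale
informal = (0.5 * sex + 0.3 * parent + 0.2 * friend
            + 0.2 * teacher + rng.normal(0.0, 1.0, n))

def r_squared(y, predictors):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Step 1: sex alone.  Step 2: add the three support scales simultaneously.
r2_step1 = r_squared(informal, [sex])
r2_step2 = r_squared(informal, [sex, parent, friend, teacher])
print(f"Step 1 (sex only):   R^2 = {r2_step1:.3f}")
print(f"Step 2 (+ supports): R^2 = {r2_step2:.3f}")
print(f"Delta R^2:           {r2_step2 - r2_step1:.3f}")
```

The increment in R-squared at step 2 is what the text means by the support variables accounting for "a significant proportion of the variance" after the variance shared with sex has been controlled; significance of the increment would in practice be tested with an F-change test.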
Table 2
Means, Standard Deviations, and Intercorrelations for Children's Level of Participation, Perceived Social
Support and Sex
Further analysis indicated that only the sex x teacher support interaction term was a unique predictor at this
step. In order to interpret this interaction effect we calculated the zero order correlation between teacher support
and formal playing separately for girls and boys.
The correlation coefficient between teacher support and formal playing was higher for girls (r = .404, p < .01)
than for boys (r = .360, p < .01), indicating that perceived teacher support was more strongly associated with
formal playing among girls than among boys.
Discussion
The significant effects of sex and cohort on the levels of perceived support confirm the importance of
perceived social support on children's level of involvement in playing instruments. Higher levels of perceived
support from parents, friends, and teachers were reported by children who currently play an instrument
compared to those who had given up or never played. Yet children who had previously given up playing an
instrument still reported higher levels of perceived support from parents than those who had never played. The
finding that children presently engaged in playing instruments perceive higher levels of support could be
interpreted as a result of their engagement leading to higher levels of involvement from parents, friends and
teachers. But as children who have given up still report higher levels of perceived support from parents than
those who had never played, it suggests that there is more than just actual engagement causing the difference in
perception. It is likely that children who perceive higher levels of parental support for playing instruments also
perceive a higher value for engagement in the activity. This suggests that valuing the playing of
instruments, combined with perceiving higher levels of support, encourages children to engage in playing
instruments more than children who do not perceive it as a valued activity in their social and cultural
environment.
Girls also reported more perceived support from parents, friends, and teachers than boys. As previous research
has found that girls engage more in musical activities than boys, and have higher levels of achievement, it is not
surprising that girls are found to perceive more support. It is likely that those who perceive support for engaging
in an activity will engage more and do better than those who do not perceive such support. However, the reverse
is also true: those who perceive low support for their engagement in playing an instrument will be less
motivated to continue and therefore will not achieve high levels of success, possibly abandoning playing
altogether. As music is also perceived as a 'girl' activity, girls are also less likely to be bullied for playing an
instrument. It is more likely that their friends will also play instruments, providing the opportunity for 'fun'
informal playing as well as structured formal practice.
The two types of playing identified suggest that children can engage in playing instruments in different ways,
with formal playing taking place mostly at school in a structured format and informal playing occurring out of
school, with less-structured opportunities for engagement with friends and family. Using multiple regression
analyses to explore the data separately for each type of playing it was found that all three support variables
emerged as unique predictors of informal playing. This finding suggests that all the perceived supportive
relationships can influence a child's engagement in informal playing of instruments, and that, in line with
Wentzel's (1998) findings, they can be additive rather than compensatory. As informal playing may take place in a number
of settings it is likely that whichever support relationship is most present will be the most important. In the case
of formal playing, the sex x teacher interaction term was the only unique predictor. As formal playing takes
place at school, it is not surprising to find that perceived teacher support is the significant unique predictor.
Further analysis of this finding also identified that it was girls who reported the higher levels of perceived
support from teachers. Indeed, as mentioned previously, girls reported significantly more perceived support
across all three support groups. The findings of this study highlight the importance of children's perceived social
support for their engagement in playing instruments. They also demonstrate that the different types of engagement
can require different sources of support, and that if sufficient support is found in the most prominent support
relationship, such as teachers for formal playing at school, this can lead to higher and more successful levels of
engagement in instrumental music.
Keywords: social support, children, participation
References
Berndt, T.J., Laychak, A.E., & Park, K. (1990). Friends' influence on adolescents' academic achievement
motivation: An experimental study. Journal of Educational Psychology, 82, 664-670.
Best, R. (1983). We've all got scars: What boys and girls learn in elementary school. Bloomington: Indiana
University Press.
Blatchford, P., Burke, J., Farquhar, C., Plewis, I., and Tizard, B. (1989). Teacher expectations in infant school:
Associations with attainment and progress, curriculum coverage and classroom interaction. British Journal of
Educational Psychology, 59, 19-30.
Boulton, M.J. and Smith, P.(1994). Bully/victim problems in middle school children. British Journal of
Developmental Psychology, 12, 315-329.
Csikszentmihalyi, M., Rathunde, K., and Whalen, S. (1993). Talented teenagers: The roots of success and
failure. Cambridge: Cambridge University Press.
Davidson, J.W., Howe, M.J.A., Moore, D.G, and Sloboda, J.A. (1996). The role of parental influences in the
development of musical ability. British Journal of Developmental Psychology, 14, 399-412.
Davidson, J.W., Howe, M.J.A., and Sloboda, J.A.(1997). Environmental factors in the development of musical
skill over the life span. In Hargreaves, D.J., and North, A.C. (Eds.). The social psychology of music. Oxford:
Oxford University Press. pp188-206.
Department of Education and Science (1991). Music for ages 5 to 14: Proposals of the Secretary of State for
Education and Science and Secretary of State for Wales. HMSO.
Eccles, J.A., Wigfield, A. and Schiefele, U. (1998). Motivation to succeed. In Damon, W. (Ed.). The Handbook
of Child Psychology, Vol 3, pp 1017-1095.
Finnas, L. (1987). Do young people misjudge each other's musical tastes? Psychology of Music, 15, 152-166.
Howe, M.J.A., and Sloboda, J.A. (1992).Problems experienced by talented young musicians as a result of the
failure of other children to value musical accomplishments. Gifted Education, 8, 1, 16-18.
North, A.C., Hargreaves, D.J., and O'Neill, S.A.(2000). The importance of music to adolescents. British Journal
of Educational Psychology, 70, 2, pp. 255-272.
O'Neill, S.A.(1994). Musical development: Aural. In A. Kemp (Ed.). Principles and processes of music
teaching. Reading: International Centre for Research in Music Education. pp. 1043.
O'Neill, S.A. and Boulton, M.J. (1995a). Is there a gender bias toward musical instruments? Music Journal,
60,358-359.
O'Neill, S.A. and Boulton, M.J.(1995b). Children's perceptions of the social outcomes of playing musical
instruments. Proceedings of the British Psychological Society, 3,1, 87.
O'Neill, S. A. (1997). Gender and music. In Hargreaves, D.J., and North, A.C. (Eds.). The social psychology of
music. Oxford: Oxford University Press. pp 46-63.
Patrick, H., Ryan, A. M., Alfeld-Liro, C., Fredricks, J.A., Hruda, L.Z., and Eccles, J.S. (1997). Commitment to
developing talent in adolescence: The role of peers in continuing motivation for sports and the arts. Paper
presented at the biennial meeting of the Society for Research in Child Development, Washington, DC.
Rosenthal, R. and Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart and Winston.
Scanlan, T.K., Carpenter, P.J., Lobel, M., and Simons, J.P.(1993). Sources of enjoyment for youth sport
athletes. Pediatric Exercise Science, 5,275-285.
Sloboda, J.A.(1989). Music as a language. In F. Wilson and F. Roehmann (Eds.). Music and child development.
St. Louis, Miss.: MMB Music Inc. pp.28-43.
Sloboda, J.A. and Howe, M.J.A. (1991). Biographical precursors of musical excellence: An interview study.
Psychology of Music, 19, 3-21.
Sosniak, L.A. (1985). Learning to be a concert pianist. In B.S. Bloom (Ed.). Developing Talent in Young
People. New York: Ballantine.
Sosniak, L.A.(1990). The tortoise, the hare, and the development of talent. In M.J.A. Howe (Ed.). Encouraging
the Development of Exceptional Abilities and Talents. Leicester: The British Psychological Society.
Sroufe, L., Bennet, C., Englund, M., and Urban, J. (1993). The significance of gender boundaries in
preadolescence: Contemporary correlates and antecedents of boundary violation and maintenance. Child
Development, 64, 455-466.
Thorne, B. and Luria, Z. (1986). Sexuality and gender in children's daily worlds. Social Problems, 33, 176-190.
Thorne, B. (1986). Boys and girls together... But mostly apart: Gender arrangements in elementary schools. In
W. Hartup and Z. Rubin (Eds.). Relationships and development. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Wentzel, K.R., and Asher, S.R. (1995). Academic lives of neglected, rejected, popular, and controversial
children. Child Development, 62, 1066-1078.
Wentzel, K.R. (1998). Social relationships and motivation in middle school: The role of parents, teachers, and
peers. Journal of Educational Psychology. Vol. 90, No. 2, 202-209.
Yoon, K.S. (1997). Exploring children's motivation for instrumental music. Paper presented at the biennial
meeting of the Society for Research in Child Development, Washington.
Proceedings paper
One reads these cycles by moving through them in a clockwise fashion, starting from the "12 O'clock"
position. Most metric patterns are more interesting than this, in that they involve a hierarchy of
coordinated time-cycles created by temporal connections between "non-adjacent" events on the "outer
rim" of the metric diagram:
The outer rim represents the basic cycle of a meter; it represents the lowest/shortest/fastest level of the
metric hierarchy--not beats, typically, but beat subdivisions. The interior line-segments represent
higher levels of metric structure. The meter in this example is based on an 8-cycle, and the various
pathways within the cycle correspond to different levels of the metric hierarchy: the outer level
defines the cycle, the next defines the beats, the next the half-bar level, and the red loop the measure
itself.
Metric well-formedness may be expressed in terms of the following rules for constructing cyclical
representations:
Given a basic cycle of N elements, additional levels may be constructed, provided:
(a) each line segment connects non-adjacent time-points on the cycle (with the exception
of (d) noted below);
(b) each and every series of segments that represents a metric level must start and end at
the same location (for convenience, notated here as the "12 O'clock" position), forming a
sub-cycle;
(c) no crossing of line segments is permitted;
(d) the highest level of metric structure is represented by a loop to and from the cardinal
metric position.
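As an illustrative sketch only, and not part of the formalism above, rules (a) and (b) can be checked mechanically once a candidate metric level is written as a list of segment spans in basic-cycle units; rules (c) and (d) concern how levels nest and loop, and are not modelled here. The function name is my own:

```python
def is_well_formed(n_cycle, spans):
    """Check a candidate metric level against rules (a) and (b).

    n_cycle: number of time-points in the basic cycle (e.g. 8)
    spans:   segment lengths, in basic-cycle units, traversed
             clockwise from the 12 o'clock position.
    """
    # Rule (b): the series of segments must return to its starting
    # point, i.e. the spans must exhaust the cycle exactly once.
    if sum(spans) != n_cycle:
        return False
    # Rule (a): each segment must connect non-adjacent time-points,
    # so every span must skip at least one point (length >= 2).
    return all(s >= 2 for s in spans)

print(is_well_formed(8, [2, 2, 2, 2]))  # True: the beat level of an 8-cycle
print(is_well_formed(9, [2, 3, 4]))     # True: locally well-formed
print(is_well_formed(9, [1, 4, 4]))     # False: violates rule (a)
```

Note that, as the printout shows, the 2+3+4 sub-cycle discussed below passes these purely local tests; it is only the global evenness constraint that rules it out.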
Note that in the previous example each level follows these rules recursively--what were non-adjacent
points on the basic cycle become adjacent on the "first interior sub-cycle" of the diagram. N.B.: since
my main interest is the relationship between the basic cycle and the beat level of the measure, I will
omit higher levels of metric structure in my subsequent graphs. Also, to avoid unnecessary clutter, I
will omit the directional arrows, as one may assume all basic cycles and sub-cycles involve clockwise
directed motion.
These local constraints on metric well-formedness capture some basic aspects of metric structure. One
is that higher levels of meter are usually comprised of two or three elements from the level
underneath. Another is that one point in the measure--the downbeat--is of cardinal importance in the
alignment and coordination of metric processes. What they do not capture is a global constraint on the
spacing of higher-level articulations relative to the basic cycle, the principle of maximal evenness.
Maximal evenness is a concept developed in the study of pitch-class sets (Clough and Douthett,
1991). A maximally-even pattern is one in which a subset of M elements is spaced "as far apart as
possible" on the circle that represents their N-element superset. We may therefore note:
(a) the basic cycle itself is, by definition, maximally even;
(b) regular meters are, by definition, maximally even, since each beat is comprised of
the same number and kind of sub-division units;
(c) complex meters are also maximally even.
While the first two points are not remarkable, why should complex meters tend toward maximal
evenness? On a cyclical representation of a metric pattern, the fundamental constraint on the
formation of its metric hierarchy is the number of time-points in the basic cycle. This number
determines the various configurations that are possible within it, and so one can represent a
cyclically-defined set of meters in terms of the different interior patterns it may contain.
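Clough and Douthett's construction can be sketched as follows; the function names are my own, and the rotation index of their J-function is fixed at zero for simplicity:

```python
from math import floor

def maximally_even(n, m):
    """Positions of a maximally even subset of m elements in an
    n-element cycle (after Clough & Douthett's J-function)."""
    return [floor(i * n / m) for i in range(m)]

def spans(n, positions):
    """Clockwise distances between consecutive subset members."""
    return [(positions[(i + 1) % len(positions)] - p) % n
            for i, p in enumerate(positions)]

beats = maximally_even(9, 4)
print(beats)            # [0, 2, 4, 6]
print(spans(9, beats))  # [2, 2, 2, 3], a short-short-short-long pattern
```

Applied to the 9-cycle with four beats, the construction yields the 2+2+2+3 (SSSL) pattern discussed next.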
Consider a 9-cycle. According to the metric well-formedness rules given above, it may contain the
following three- and four-beat sub-cycles: (a) a pattern of three evenly-spaced beats (familiarly, 9/8),
and (b) a complex four-beat pattern of 2+2+2+3, a short-short-short-long (SSSL) series.
As an aside, note, in the case of the complex meter, that the location of the long relative to the
downbeat may shift, but the SSSL series remains unaffected, since the L always loops back to the first
S. Why not, however, have a sub-cycle of 2+3+4?
A common argument against this configuration is that the segment which spans four articulations of
the basic cycle "naturally" devolves to 2+2, since the "duple" unit tends to persist in the listener's
perception and anticipation. This is a level-specific rationale. Maximal evenness provides a global
rationale, one that assumes that the listener will gravitate towards the most parsimonious attending
strategy. The simplest attentional frameworks are comprised of categorically-equivalent spans on each
and every metric level. As a metric pattern, the 2-3-4 sub-cycle involves three categorically different
time intervals, a short, a medium, and a long. The 2-3-4 sub-cycle is of course not maximally even.
But this pattern of time intervals does align with the maximally-even 2-3-(2-2) sub-cycle, and this
pattern involves only two categorically distinct time-intervals. Thus I would conjecture that while a
four-beat pattern is nominally more complex than a three-beat pattern, because the 2-3-2-2 pattern is
both maximally even and involves fewer distinct durational categories, it is in fact the preferable
attending strategy. This is an instance of a general problem: in inferring a meter from a durational
surface, when are long surface durations "split" into shorter sub-articulations when metrically
interpreted? An answer may be: whenever splitting a long duration preserves maximal evenness.
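The splitting conjecture can be made concrete in a small sketch, assuming (my reading, not a definition given in the paper) that a beat pattern counts as maximally even when it is a rotation of the Clough-Douthett pattern for its cardinality:

```python
from math import floor

def me_pattern(n, m):
    """Span pattern of the maximally even m-in-n subset."""
    pos = [floor(i * n / m) for i in range(m)]
    return [(pos[(i + 1) % m] - p) % n for i, p in enumerate(pos)]

def is_maximally_even(n, pattern):
    """True if `pattern` (spans summing to n) is, up to rotation,
    the maximally even pattern for its number of beats."""
    m = len(pattern)
    rotations = [pattern[i:] + pattern[:i] for i in range(m)]
    return me_pattern(n, m) in rotations

print(is_maximally_even(9, [2, 3, 4]))     # False: three duration classes
print(is_maximally_even(9, [2, 3, 2, 2]))  # True: splitting the 4 restores evenness
```

Splitting the long span of 2+3+4 into 2+3+2+2 thus moves the pattern into the maximally even class, in line with the answer proposed above.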
This brief examination of the metric possibilities of the 9-cycle is but one instance of how cyclical
representations allow one to examine the formal properties of various metric systems. One may also,
for instance, consider the differences in N-element basic cycles when N is a prime versus non-prime
number. Similarly, an examination of the 12-cycle shows that it contains a great number of sub-cyclic
configurations, from symmetrical 3, 4, and 6 beat patterns to a wide variety of complex meters. This
perhaps explains its cross-cultural ubiquity, as it is so rife with metric possibilities.
Here the beats follow a 3-3-2 pattern of basic cycle elements; the ratio of the long beat to the short
beat is, as is typical, 3:2. Since we prefer a meter in which all beat-level periodicities fall in the range
of maximal pulse salience, we can note the following effect of tempo changes on this pattern:
Long Beat Interval Short Beat Interval Basic Cycle Interval
600ms 400ms 200ms
750ms 500ms 250ms
825ms 550ms 275ms
900ms 600ms 300ms
1050ms 700ms 350ms
1200ms 800ms 400ms
Only when the interval of the basic cycle falls within the 250-300ms range do both the long and short
beats fall within or near the range of maximal pulse salience. This suggests that complex meters may
be more sensitive to tempo constraints than simple meters, and that tempo constraints may play a role
in limiting the range of possible sub-cycles when the basic cycle itself is made up of many (that is,
more than 16) elements.
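The effect of tempo on the 3-3-2 pattern can be sketched numerically. The 500-900 ms band used below is my own illustrative stand-in for the region of maximal pulse salience; the paper itself gives no explicit endpoints:

```python
def beat_intervals(basic_ms, pattern=(3, 3, 2)):
    """Beat-level intervals (ms) for a beat pattern measured in
    multiples of the basic-cycle interval."""
    return [units * basic_ms for units in pattern]

# Assumed, for illustration only: maximal pulse salience spans
# roughly 500-900 ms.
SALIENT = range(500, 901)

for basic in (200, 250, 275, 300, 350, 400):
    beats = beat_intervals(basic)
    verdict = "all salient" if all(b in SALIENT for b in beats) else "outside range"
    print(basic, beats, verdict)
```

Under these assumed endpoints, only the 250-300 ms basic-cycle rows come out "all salient", matching the observation above.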
This diagram is for the second measure of Chopin's E-major etude, taken from the grand average of
timings from 27 performances by nine different pianists (see Repp 1998a, p. 268). As can be seen,
neither the basic cycle nor the 4-beat sub-cycle involves isochronous time intervals. Indeed, the 4 beat
sub-cycle bears more than a little resemblance to the 4-in-9 sub-cycle given above, though of course
here the last beat is considerably shorter than the others, whereas in the 4-in-9 sub-cycle, the last beat
is considerably longer. Notice also the range of time intervals that occurs here on the basic cycle
itself, from a minimum of 436ms to a maximum of 617ms.
What is going on in this measure? In a word, rubato. The story goes something like this: In the first
part of the measure we had a slight bit of expressive timing variation, with the 2nd half of each beat
being stretched by about 40-50ms; in this fashion, the first two beats follow a predictable pattern
(more on this in a moment). On the third beat, however, we have a more dramatic bit of rubato
(corresponding with the onset of a sustained tone in the melody), as the first half of beat 3 is ≈100ms
longer than the first half of beats 1 and 2. Now the pianist must regain the time s/he has taken, and the
remaining time-intervals make up the "stolen time"; as such they are correspondingly short. Notice,
however, that even under this constraint the second half of the last beat has a discernible stretch
relative to its first half.
Here is where a bit of mathematical graph theory may be of use. Let us suppose that the rubato on the
third beat of this measure had not occurred. In that case, the timing data might have looked something
like this:
Here we see a pattern on the basic cycle of a regular alternation of slightly shorter--slightly longer
time intervals, what I have labeled T1 and T2. Notice that each "odd" location (the filled dots) on the
graph is symmetrically positioned, as each sub-cycle re-integrates to a constant timing value. In graph
theory, cycles with such symmetrical properties are reducible to more compact graphic
representations, what are referred to as voltage graphs (a term obviously borrowed from their usage in
electrical engineering). The right-hand panel above shows the voltage graph for an 8-cycle comprised
of alternating T1 and T2 values, values for the different "voltages" between the on-beat and off-beat
timepoints. The voltage graph above will generate various basic cycles when "counted" according to a
particular arithmetic modulus. Thus the 8-cycle is generated by the given voltage graph, modulo 4. In
the case of the E-major etude, we can specify that the ratio between T1 and T2 should be ≈48:52.
Other timing ratios may be specified, such as a shift from "straight" to "swung" 8th notes in a jazz
performance style.
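A voltage graph of this kind can be unrolled programmatically; the helper below and the 500 ms beat duration are my own illustrative assumptions:

```python
def generate_cycle(voltages, modulus):
    """Unroll a voltage graph into a basic cycle of time intervals:
    the tuple of 'voltages' (T1, T2) is traversed `modulus` times."""
    return [v for _ in range(modulus) for v in voltages]

# A beat duration of 500 ms split ~48:52 between on-beat and
# off-beat intervals, as in the simplified etude example.
t1, t2 = 240, 260
print(generate_cycle((t1, t2), 4))
# [240, 260, 240, 260, 240, 260, 240, 260]
```

A shift to "swung" 8th notes would simply substitute a different voltage pair, e.g. a roughly 2:1 split of the beat, with no change to the generating procedure.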
Because the rubato performance reported by Repp lacks the symmetry of the simplified version given
above, one cannot reduce it to a simple voltage graph (nor can one use a voltage graph to generate its
complete metric cycle). Similarly, one cannot reduce the 4-in-9 sub-cycle to a corresponding
voltage graph, since it too lacks the requisite symmetry. Let me add that it may be possible to make
alternative graphic representations of the "rubato 8" and 4-in-9 patterns, using what are known as
permutation voltages, and so it would not be correct to infer that such graphs are irreducible. But the
geometric similarities between the "rubato 8" and the "4-in-9" are highly suggestive.
To the extent that a meter can be reduced to a simple voltage graph, one need not include higher-level
timings as part of a structural representation, which is to say, as part of the listener's temporal
attending strategy. In these instances, the higher levels "take care of themselves" as a byproduct of the
cyclical generation from the underlying voltage graph. By contrast, complex meters (such as the
4-in-9) and simple meters with multi-leveled expressive variation (the rubato 8) require more levels of
structure in their representation(s); the rubato 8 example shows how low-level timing changes can
"trickle up" to affect higher levels of attending/anticipation. This implies that as attentional strategies,
these meters require a greater interplay of top-down and bottom-up information--indeed, one cannot
build up higher levels from lower levels, but must instantiate the metric hierarchy in toto (see London
1995, p. 73).
Complex meters (such as the 4-in-9) and simple meters performed with a high degree of expressive
variation (such as the rubato 8) have a number of formal and cognitive similarities, from maximal
evenness of events on each level to timing constraints on local and global levels. These meters are
thus in many ways more alike than they are different. Given that most human musical performance
involves multi-leveled expressive variation, one is led to question the validity of the simple-complex
metric distinction. While one may draw this distinction in theory, in practice, metric attending is
almost always complex.
Works Cited
Berz, W. L. (1995). Working Memory in Music: A Theoretical Model. Music Perception, 12,
353-364.
Clough, J. and J. Douthett (1991). Maximally Even Sets. Journal of Music Theory, 35, 93-173.
Fraisse, P. (1982). Rhythm and Tempo. In The Psychology of Music, ed. D. Deutsch. New York,
Academic Press: 149-180.
Hirsh, I. J., C. B. Monahan, et al. (1990). Studies in Auditory Timing: 1. Simple Patterns. Perception
and Psychophysics, 47, 215-226.
Komar, A. J. (1971). Theory of Suspensions. Princeton, Princeton University Press.
Lerdahl, F. and R. Jackendoff (1983). A Generative Theory of Tonal Music. Cambridge, MIT Press.
London, J. M. (1995). Some Examples of Complex Meters and Their Implications for Models of
Metric Perception. Music Perception, 13, 59-78.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A partial
connection between music perception and performance. Perception and Psychophysics, 57,
1217-1232.
Repp, B. H. (1998a). Obligatory 'expectations' of expressive timing induced by perception of musical
structure. Psychological Research, 61, 33-43.
Repp, B. H. (1998b). The Detectability of Local Deviations from a Typical Expressive Timing
Pattern. Music Perception, 15, 265-289.
Repp, B. H. (1999). Detecting Deviations from Metronomic Timing in Music: Effects of Perceptual
Structure on the Mental Timekeeper. Perception and Psychophysics, 61, 529-548.
Roederer, J. G. (1995). The Physics and Psychophysics of Music: An Introduction. New York,
Springer Verlag.
Sloboda, J. A. (1983). The Communication of Musical Metre in Piano Performance. Quarterly Journal
of Experimental Psychology, 35A, 377-396.
Todd, N. P. M. (1995). The Kinematics of Musical Expression. Journal of the Acoustical Society of
America, 97, 1940-1950.
Zuckerkandl, V. (1956). Sound and Symbol: Music and the External World. New York, Pantheon
Books.
Proceedings paper
Recent interest in performance skill acquisition, social interaction and expressive body movement has
opened up exciting new areas of research in musical performance - namely, that of the methods by
which skilled musicians communicate in ensembles. In such situations, verbal, musical and visual
cues must be established and shared between co-performers to enable and drive successful rehearsals
and performances.
Verbal feedback is a fundamental mode of communication in rehearsal situations. Murnighan and
Conlon (1991), for instance, studied group function among string quartet players and found that topics
were often discussed in rehearsal for points of clarification and unity.
It is important to note, however, that much of the exchange between musicians is unspoken. Indeed, a
substantive literature demonstrates that expert musicians can effectively communicate their musical
ideas through non-verbal means by varying such features of the music as timing (Povel, 1977; Shaffer,
1980, 1981, 1984; Shaffer, Clarke & Todd, 1985; Clarke, 1982, 1985; Repp, 1996, 1997), intensity
(Patterson, 1974; Kamenetsky, Hill & Trehub, 1997) and pitch (Schoen, 1922; Bartholomew, 1934;
Deutsch & Clarkson, 1959). Nevertheless, some features of the musical score can be more or less
emphasised depending on moment-by-moment and quite spontaneous modifications to interpretations
(Sloboda, 1985). So, it appears that rehearsals are occasions for co-performers to learn the score,
plan the co-ordination of timing, and establish the general expressive features of the music. In a live
performance situation, variations which occur spontaneously are critically dependent on performers
being able to detect and act immediately upon another's ideas.
Besides communicating information about the co-ordination of timing, expressive ideas and personal
support between co-performers, Davidson (1993, 1994, 1995) has shown that specific movements
reveal much about performers' expressive interpretation of musical structure. For example, in a
performance of one of Beethoven's Bagatelles for piano, a pianist used highly distinctive head shaking
movements consistently whenever he played a cadence, and a wiggle movement of the upper torso
when playing ornaments. These movements were found to be absolutely integral to the production of
the music.
Aside from verbal and musical exchanges, visual cues are integral to ensemble performance. Clayton
(1985) discovered that without visual feedback co-performers found it extremely difficult to
co-ordinate musical timing. Additionally, Yarbrough (1975) showed that co-performer interactions are
successful when there are high levels of eye contact and use of facial and bodily expressions.
Of course, moment-by-moment behaviours of all kinds - be they verbal, musical or visual - are
embedded within a socio-cultural framework. Thus, when considering co-ordination between
co-performers, researchers must be aware of overriding socio-cultural factors which shape the
interaction processes (e.g. social etiquette and a learned cultural aesthetic).
In research terms, the vital social communication aspects of rehearsal and performance have been
largely ignored (Davidson, 1997). In this paper, we hope to provide initial insight into these salient
aspects of musical skill and development by examining and detailing the video data collected from
two pianists when rehearsing and performing piano duos and duets. We focus on the exchanges which
take place between players during the rehearsals and a subsequent performance of an entire recital
programme. The piano duo was specifically chosen because both performers contribute similar
musical elements to the performance and both use comparable instrumental techniques.
Method
Participants
Two highly skilled male pianists with a mean age of 23 years participated in the study. Both had a
wide range of solo and accompanying experience. Both had played the piano since the age of seven
years and so had about fifteen years of learning and performing experience. They accompanied at
least two concerts a week, and played solo repertoire in addition. Although they had met informally
on two previous occasions, they did not know one another prior to working for the current project.
Materials
A VHS camcorder was used to record all rehearsals and the performance given by the two pianists.
Two Sony Walkman tape recorders were used to record all independent, individual practice by the
performers and to collect data from post-performance interviews with them. Based on observations of
the taped material, the researchers designed a semi-structured interview schedule to ask the pianists
questions about the process of rehearsing and performing together.
Procedure
The researchers approached the two pianists and offered them the opportunity to perform a 30 minute
lunchtime recital of piano duos and duets at the University of Sheffield. Both agreed and were set a
performance date for ten weeks later. They were asked to select their own repertoire, arrange all their
own rehearsals and prepare as normally as possible, but with the additional request of recording all
joint rehearsals and the performance on video tape and all individual practice sessions on cassette
tape. The pieces they performed were:
1. Variations on a Theme by Beethoven by C. Saint-Saëns for two pianos
2. Second Movement of Concerto in C for two keyboards by J. S. Bach
3. Sonate for four hands by F. Poulenc
Explorations of the Data
General observations of the pianists' practice and performance were made across all of the prepared
compositions. Once these data were collected, the analyses took the following forms:
1. a preliminary exploration of the content of the obtained data by both researchers;
2. systematic observations leading to both qualitative and quantitative measures of eye contact,
non-verbal gestures and spoken communication;
3. thematic content analyses of interview data.
Summary of Results
General observations
Prior to this project, neither of the pianists had played the pieces. In a brief telephone conversation
arranging the first rehearsal, both mentioned that they had previously heard the Saint-Saëns and the
Poulenc and that these might be interesting pieces to explore.
Throughout the rehearsal period, the Saint-Saëns and Poulenc were practised most (i.e. on four
occasions prior to the performance). These were at fortnightly intervals up to the date of the recital.
Alongside these pieces, they also rehearsed the first movement of Bach's Concerto in C. At the end of
the second rehearsal, however, they decided to run through the second movement of this concerto in
the next session in order to create a more musically diverse recital programme - both the Saint-Saëns
and the Bach first movement included fast fugue sections. The second movement of the Bach,
therefore, was only rehearsed twice prior to the performance, with both rehearsals taking place three
and two days, respectively, beforehand. Surprisingly, the pianists never individually practised the
pieces, only ever practising during the video recorded sessions. Both musicians attributed this
apparent lack of practice to their fluent sight-reading abilities.
Discussion
Regardless of the repertoire being played, it seems that these two pianists were able to converse
"musically", with information being given and received, modified and consolidated with negligible
verbal interaction. In association with the musical refinements, gestural cues and eye contact became
gradually more refined, synchronous and fluent over the rehearsal period. The performance itself
clearly reflected all the refinements the rehearsal process had brought.
In line with previous research, it was discovered that the pianists were concerned about using
expressive devices, such as highlighting structural features in the music. They also spoke of the
importance of a shared emotional state and conception of each piece's narrative.
In terms of eye contact, the current study consolidates Clayton's (1985) finding that eye contact
between co-performers is critical in the co-ordination of musical content. The performers purposefully
synchronised their glances at major structural points in the compositions.
The movement gestures used could be categorised in terms of illustrators and emblems as suggested
by Davidson (1997). The high hand lifts of the two pianists illustrated the mutual energy and force
both were bringing to the performance and were simultaneously emblematic of a nineteenth century
extravagant pianistic style.
Irrespective of the composition being played, the swaying movement was a feature which developed
throughout the course of the rehearsals and was at its most obvious during performance. It was allied
to phrase structure and overall tempo of each specific piece, but most significantly was highly
illustrative of the emotional intention of the performers. This observation seems to reflect the
theoretical proposals of Runeson and Frykholm (1983) who argue that there is an attunement of
kinematics to dynamics in human actions. That is, internal states and intentions become manifest in
movement. "Movements specify the causal factors of events" (Runeson & Frykholm, 1983, p. 585).
Swaying as the key source of expressive movement in pianists has already been described by
Davidson (1997, 2000). In her theoretical proposal, it seems that the hip region of the pianist acts as a
fulcrum and generator of expressive movement.
In line with work by Cutting and Proffitt (1981), the swaying could represent the global level in a
hierarchy of expressive gestural information, with the hands providing a local indicator. From our
observations, this would seem to be the case. As mentioned above, however, we are aware of
socio-cultural influences which affect this movement production (e.g. adopting nineteenth century
emblematic gestural piano style).
This research allows us to gain initial insight into how two individuals articulate their ideas in both the
construction and execution of an ensemble piece of music. Further research is necessary to validate
and ground these findings.
References
Bartholomew, W. T. (1934). A physical definition of "good voice-quality" in the male
voice. Journal of the Acoustical Society of America, 6, 25-33.
Clarke, E. F. (1982). Timing in the performance of Erik Satie's "Vexations." Acta
Psychologica, 50, 1-19.
Clarke, E. F. (1985). Some aspects of rhythm and expression in performances of Erik
Satie's "Gnossienne No. 5". Music Perception, 2, 299-328.
Clayton, A. M. H. (1985). Coordination Between Players in Musical Performance.
Unpublished PhD Thesis, University of Edinburgh.
Cutting, J. E., & Proffitt, D. R. (1981). Gait perception as an example of how we may
perceive events. In R. D. Walk & H. L. Pick (Eds.), Intersensory Perception and Sensory
Integration (pp. 249-273). New York: Plenum.
Davidson, J. W. (1993). Visual perception of performance manner in the movements of
solo musicians. Psychology of Music, 21, 103-113.
Davidson, J. W. (1994). Which areas of a pianist's body convey information about
expressive intention to an audience? Journal of Human Movement Studies, 26, 279-301.
Davidson, J. W. (1995). What does the visual information contained in music
performances offer the observer? Some preliminary thoughts. In R. Steinberg (Ed.), The
Music Machine: Psychophysiology and Psychopathology of the Sense of Music (pp.
105-113). Springer Verlag.
Davidson, J. W. (1997). The social in musical performance. In D. J. Hargreaves & A. C.
North (Eds.), The Social Psychology of Music (pp. 209-228). Oxford: Oxford University
Press.
Davidson, J. W. (in press for 2000). Understanding the expressive movements of a solo
pianist. Deutsche Jahresbuch fur Musikpsychologie.
Deutsch, J. A., & Clarkson, J.K. (1959). Nature of the vibrato and the control loop in
singing. Nature, 183, 167-168.
Kamenetsky, S. B., Hill, D. S., & Trehub, S. E. (1997). Effect of tempo and dynamics on
the perception of emotion in music. Psychology of Music, 25, 149-160.
Murnighan, J. K., & Conlon, D. E. (1991). The dynamics of intense work groups: A
study of British string quartets. Administrative Science Quarterly, 36, 165-186.
Patterson, B. (1974). Musical Dynamics. Scientific American, 231, 78-95.
Povel, D. J. (1977). Temporal structure of performed music: Some preliminary
observations. Acta Psychologica, 41, 309-320.
Repp, B. (1996). Patterns of note onset asynchronies in expressive piano performance.
Journal of the Acoustical Society of America, 100, 3917-3932.
Repp, B. (1997). Some observations on pianists' timing of arpeggiated chords.
Psychology of Music, 25, 133-148.
Runeson, S., & Frykholm, G. (1983). Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectations, gender, recognition,
and deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Schoen, M. (1922). An experimental study of the pitch factor in artistic singing.
Psychological Monographs, 31, 230-259.
Shaffer, L. H. (1980). Analysing piano performance. In G. E. Stelmach & J. Requin
(Eds.), Tutorials in Motor Behaviour. Amsterdam: North-Holland.
Shaffer, L. H. (1981). Performances of Chopin, Bach, and Bartok: Studies in motor
programming. Cognitive Psychology, 13, 326-376.
Shaffer, L. H. (1984). Timing in solo and duet piano performances. Quarterly Journal of
Experimental Psychology, 36A, 577-595.
Shaffer, L. H., Clarke, E. F., & Todd, N. P. (1985). Metre and rhythm in piano playing.
Cognition, 20, 61-77.
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford:
Oxford University Press.
Yarbrough, C. (1975). Effect of magnitude of conductor behaviour on students in mixed
choruses. Journal of Research in Music Education, 23, 134-146.
Proceedings paper
Introduction
The theory put forward in this paper is based on the view that musical sound originates from an inner
impulse to move. During a musical performance, the many and varied movements of the body are the
vehicles through which a musical idea becomes audible. Clarke and Davidson (1998) suggest that to a
large extent the expressive intentions generated by a performer have their origin in corporeal
experiences. Therefore it may be proposed that an audible trace of the original kinetic impulse exists
within the sound spectrum of music. It is interesting to note that musical utterances of the same
intensity, contour, timing profile or overall kinetic quality can be achieved by the extremely varied
corporeal requirements of different instruments. For instance, the arm-wrist-hand movements
controlling a bow, or the tongue and breath manipulation on a mouthpiece can produce a very similar,
if not identical, musical communication. This may indicate the existence of a kinetic or body-based
'language' that can be expressed by any body movements and is detectable in the resulting sound
stream. The kinetic quality of a musical gesture, or indeed any kind of motion, is what I shall be
referring to as 'dynamics' throughout this paper. In the following theory, 'dynamics' constitute
perceptually available elements present within both musical and physical movement.
Motion in music and motion in the physical world share an essential character: both are constantly
moving or evolving. It could be said that both are fundamentally in a constant state
of flux. Apart from the temporal flux of music and movement, there are countless other elements that
are constantly changing. Consider the possible parameters in flux during a single musical utterance.
The musical tones can move up or down, the degree of weight placed on each note can change
dramatically, the tone can expand or contract. Further to this, any musician would agree that the
degree of fluctuation possible within each of these factors is immense. However, it is flux that
presents, perhaps, some of the most difficult problems to psychologists interested in the perception of
any kind of motion. The moment-to-moment generation and perception of music and movement is
difficult to study because of the number of complex factors that are constantly changing. The main
purpose of this paper is to describe the nature and extent of our ability to perceive flux information
from both our physical and acoustic environments. In addition, the possible usage and meaning of flux
information is also discussed.
The Transmodal Communication of Dynamics
In her work on 'intuitive parenting', Papousek (1996) observes how adults involved in active
communication with an infant will synchronise the force and temporal pattern of their own modalities
in order to convey a unified message to the child. For instance, Papousek shows that when seeking to
calm a child the adult may use quiet, slow, repetitive sounds alongside soft, slow and repetitive
movements, thus conveying the same energy profile, or what I term 'dynamics', through two
modalities. In order to comprehend a transmodal communication, the infant would have to form some
kind of abstract, amodal representation of the significant perceptual elements. It is widely believed by
many developmentalists that humans are born with the ability to perceive the amodal qualities of
events and the environment (see Bower, 1974; Spelke, 1987). Indeed, Bower (1974) maintains that
infants learn to distinguish between sensory modalities at a later stage in development. Meltzoff
(1981) showed that infants could reliably copy the dynamic profile of visual stimuli. In one
experiment, infants were presented with a visual tongue protrusion and were able to repeat the action
with obvious attention to the manner in which it was performed. The ability of infants to achieve
intermodal matching as well as transmodal communication has been well documented by
developmentalists (Stern 1985, Maratos 1998). From the evidence presented above, it appears that we
can comprehend the transmodal communication of dynamic elements from infancy.
Stern (1985) asserts that the infant's entire world is one of "shapes, intensities and temporal patterns".
The infant experiences everyday events of waking, becoming tired or becoming hungry as
certain dynamic profiles without realising the function and significance of these events as an adult
would. Further to this, Stern notes that it is more probable that infants will perceive the actions of
adults - such as the way in which an adult reaches for the feeding bottle, or the way in which the adult
picks up the child - in terms of the dynamic profile of the action as opposed to knowledge of the
action's goal. This may indicate the use of a direct perceptual mechanism as opposed to a more
cognitive process. In the above experiments and observations, it can be seen that infants display the
ability to recognise and repeat dynamic communication. This dynamic communication appears to be
based on energy fluctuations, on the flux of motion and could, perhaps, be described as a prelinguistic
language. If there is, as the evidence suggests, an amodal 'dynamic' language then what kind of
information can it communicate?
The idea that bodily movement betrays inner states, as McLaughlin notes, can be traced right back to Freud, who remarked that:
" He that has eyes to see and lips to hear may convince himself that no mortal can keep a secret. If his
lips are silent, he chatters with his fingertips; betrayal oozes out of him at every pore." (Freud1905 in
McLaughlin 1992)
It is apparent that our motion perception is a highly evolved, incredibly sensitive system, which is
needed, amongst other reasons, to gain knowledge of the identity and intentions of others.
Dynamics in Music: The Perception of Motion in Music
The question concerning many music psychologists today is whether our perception of motion in
music functions in a similar way to physical motion perception. Many psychologists, philosophers and
writers refer to the link between musical and physical motion, often using a combination of anecdotal
and empirical evidence in an effort to explain a perceptual link which clearly exists. Storr (1992)
discusses how music simply 'resonates' with our physical experiences. Ansdell (1995) describes the
ability of music to 'animate' us, that is, to provide an impetus for movement. Truslit, who likens
music to an "invisible, imaginary dance" (see Repp 1993), describes how musical phrases/gestures can
be thought of as physical 'shapes'. In one experiment, Truslit asked musicians to interpret the manner
in which they should play a scale from looking at a drawn, linear shape. He describes how the
performances changed markedly according to which 'shape' was being read, and notes that he could
successfully predict the gestural quality of the performance from the given shape.
Evidence supporting our ability to think in shapes as opposed to discrete, individual elements can be
seen in the work of Dowling (1982). He describes how our memory displays a definite preference for
remembering melodic sequences as a single contour (i.e. a single 'shape') rather than as a group of
discrete tones. Furthermore, Chang and Trehub (1977) revealed that infants as young as five months
old could recognise contour changes in an atonal melody, thus revealing that the infants could recognise
a contour as an entity in itself. Davidson (1993) found that subjects could distinguish between three
different manners of performance (exaggerated expression, normal expression and deadpan
expression) from the performers' movements alone, as well as from the music alone. This experiment
reveals that expressive information can be co-specified in a performer's music and movement.
The above studies may support the claim that there are similar perceptual elements available in both
physical and musical motion. Clearly many more studies need to be carried out in order to explore
motion perception in music; however, it has been shown both that movement perception is a highly
evolved, sensitive system and that similar motion information can be specified in the acoustic
spectrum. It remains to be seen whether the motion information available in music can signify
psychological states and intentions in the same way and to the same extent as physical motion.
The Language of Dynamics: The Invariant Perceptual Elements of Motion
Aristotle, describing Plato's 'Theory of Forms' notes that if thought needs an object on which to think
then there must be fixed concepts of aspects of the environment. These fixed concepts would enable
us to recognise objects for what they are, even though they appear somewhat different when they are
in motion, as "....there is no knowledge of things which are in a state of flux", (from 'Metaphysics', see
Flew 1989). This, rather neatly, presents the central conundrum facing psychologists interested in the
perception of motion. The psychologist Gibson provided the first comprehensive theoretical proposal
that tackled this problem. Gibson (1960) suggests that in order to perceive a stable environment, we
must be able to detect constant, invariant elements in the visual and acoustical array. These
'invariants', as Gibson calls them, consist of certain ratios and proportions that can be directly
perceived (i.e. without further cognition) by the observer. For instance, all the elements we need to
specify the cause of an acoustic event, Gibson argues, are available in the 'wave train'. The complex
frequency array specifies the vibratory event and the sequence of transients provide information
pertaining to the temporal profile of the event. However, I find that this theory could be somewhat
misleading for the psychologist studying expressive motion, as Gibson implies that flux information is
to a large extent discarded in the process of perceiving invariants: "The hypothesis is that constant
perception depends on the ability of the individual to detect invariants, and that he ordinarily pays no
attention whatever to the flux of changing sensations" (Gibson 1960). For instance, when considering
the perception of a flat object, it would indeed be necessary to perceive the invariant properties of
the object - its flatness, density and weight - and discard the enormous amount of changing sensory
data from our fingertips. However, when it comes to the perception of expressive motion, it is just
this very complex, constantly changing information that we need to perceive. Clearly there must be
invariant perceptual factors available in motion, but they are invariants that change from moment to
moment. Mechanisms must exist to perceive invariant factors at different stages of flux and the
perceptual system would need to recognise these factors as essentially transient in form. When
listening to musical motion, there may be certain fluctuating elements of the acoustical spectrum.
These would enable us to comprehend the meaning of the motion, as acoustical information contains a
deluge of highly complex frequency and temporal information. Deutsch (1999) notes that there exist
many ordering and grouping mechanisms that serve to recreate the original acoustic event. Grouping
mechanisms rely, to some extent, on the perception of constant factors within the acoustical array.
The Meaning of Flux: Physical Gestures Heard in Sound
The dance theorist Rudolf Laban describes the content factors necessary to produce movement for the
purpose of dancer/actor training. Laban (1960) states that in order to move we need to manipulate the
weight of our bodies, the space around us, the temporal profile of our actions and the flow or
continuity between movements. These four factors, (or 'motion factors' as Laban refers to them) of
Weight, Time, Space and Flow can be combined and manipulated to produce the entire spectrum of
human movement. Laban describes the construction of eight fundamental actions or gestures from
combinations of the most basic opposite attitudes that can be displayed towards the motion factors of
Weight, Time and Space. (Note that the element of Flow only comes into play when two or more
consecutive actions are carried out.) Thus our attitude can be firm or gentle towards Weight, quick or
slow towards Time, and direct or flexible towards Space. An action such as 'dabbing' can be produced
by displaying a light attitude towards Weight, a direct spatial trajectory, and a quick attitude towards
Time. Changing the attitude of one of these elements results in a different action. For instance, if the
spatial trajectory becomes more flexible in approach then the dabbing action becomes one of
'flicking'.
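Laban's scheme is combinatorial: two opposite attitudes for each of the three motion factors yield 2 × 2 × 2 = 8 fundamental actions. The sketch below encodes this table in Python; only 'dab', 'flick', 'punch' and 'press' are named in the text above, while the remaining action names follow Laban's standard effort table rather than this paper:

```python
# Laban's eight fundamental actions as combinations of opposite
# attitudes towards the motion factors Weight, Time and Space.
# (Only dab, flick, punch and press are named in the text above;
# the other names follow Laban's standard effort table.)
ACTIONS = {
    # (Weight, Time, Space) : action
    ("firm",   "quick", "direct"):   "punch",
    ("firm",   "quick", "flexible"): "slash",
    ("firm",   "slow",  "direct"):   "press",
    ("firm",   "slow",  "flexible"): "wring",
    ("gentle", "quick", "direct"):   "dab",
    ("gentle", "quick", "flexible"): "flick",
    ("gentle", "slow",  "direct"):   "glide",
    ("gentle", "slow",  "flexible"): "float",
}

def change_attitude(action, factor, new_value):
    """Return the action obtained by flipping one motion factor."""
    weight, time, space = next(k for k, v in ACTIONS.items() if v == action)
    attitudes = {"Weight": weight, "Time": time, "Space": space}
    attitudes[factor] = new_value
    return ACTIONS[(attitudes["Weight"], attitudes["Time"], attitudes["Space"])]

# The example from the text: making a dab's spatial trajectory
# flexible turns it into a flick.
print(change_attitude("dab", "Space", "flexible"))  # flick
print(change_attitude("dab", "Weight", "firm"))     # punch
```

As the text notes, changing a single attitude moves the action to a neighbouring cell of this table.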
I propose that these 'motion factors' are also active motion formatives within a musical utterance. A
performer can manipulate the degree of Weight placed on a note, the movements of notes within a
virtual musical Space, the temporal profile, and the manner in which musical ideas travel from one to
the next. The physical actions described above, and many more, are in fact used by the musician in
order to create a vast diversity of expressions. For instance, to produce a piano, staccato note on the
clarinet, the action of the tongue against the reed would be one of dabbing. Increase the Weight or
force of the tongue movement, combined with an increase in the force of the air column, and the action
goes towards punching. Decreasing the 'quickness' of the dabbing, on the other hand, would produce a more
sustained action, referred to by Laban as 'pressing'.
Weight, Time, Space and Flow can be seen in actions and heard in music. Within a single movement,
these motion factors constantly fluctuate, yet, a physical or acoustic 'punch' is still recognised as a
single gesture. This may function in the same way as Clarke (1987) describes our perception of a
non-musical sound event, such as the jangle of a bunch of keys. Clarke notes that we would recognise
the source of the sound as a unit of information i.e. 'keys' and not as a highly complex sound
spectrum. Using Gibson's (1960) theory this is possible through the perception of invariants in the
acoustic array, which signify the identity of the event. My conjecture is that the motion factors may
constitute fluctuating invariants that combine to form meaningful units of perceptual information
pertaining to the expressive motion of physical and acoustic events. For music, this means that a
passage of music may be heard as a complex array of gestures and other body-based movement
sequences. However, this theory does not suggest that musical motion must allude to physical motion
for it to be meaningful. Rather, it is proposed that the perceptual elements, which are available in both
musical and physical motion constitute a meaningful language in themselves.
The production of a singular gesture is clearly only part of our movement potential. The human body
is capable of complex sequences of gestures, different gestures produced simultaneously and motion
sequences without distinct gestures. It is feasible within Laban's theory to account for all of these
possibilities, although it is beyond the scope of this paper to describe the entire theory and how
practitioners have interpreted it. This theory does not attempt to describe the endless possible
variations of a movement, but the basis of how all movements and motion sequences may be
constructed.
The Meaning of Flux: Musical Motion as Interpersonal Communication
Laban (1960; see also Carpenter 1965) addresses the interpersonal meaning of motion through
describing the 'inner participations' of the four motion factors. These 'inner participations' constitute
the mental attitudes that can be displayed by manipulation of the physical motion factors. In the
creation of meaning, Weight corresponds to a person's Intention, Space to their Attention, Time to
their Decision and Flow to their Progression. For an in-depth discussion of this process see North
(1990), who uses this aspect of Laban's theory for personality assessment through movement
observation.
The motion factors and their corresponding 'inner participations' are used for the training of actors at
the 'Drama Centre', London. Briefly, an actor can explore their Intention towards another character by
manipulating their use of the element of Weight, both in their bodily movements and in their voice.
For instance, by increasing the weight of their hand tapping on a table, perhaps combined with an
increase in the force of their vocal utterances, an actor can display increased Intention to exert their
will upon another character/event. Similarly, if an actor wanted to display fixed pinpointed Attention
they would use direct actions in space, be it a direct hand movement or a fixed glare. Flexible
movements in space, by contrast, can display non-directed Attention, where the hand waves around and the eyes do not
come to fix upon a certain point.
If we can hear motion sequences and gestures within music, is it possible that we can also perceive
information pertaining to inner attitudes within the acoustic array? I believe that it is feasible that a
sustained increase in the force of a musical phrase may be perceived as increased Intention. This
musical phrase could even be perceived as 'dominating' over other musical lines that display less
Intention or force. Similarly, a highly flexible musical line with no appreciable direction could be
perceived as lacking Attention. However, attributing inner psychological states to musical sequences
could only ever be one of many ways to create meaning from a musical experience. Many differing
and plausible accounts of the meaning of music exist today, perhaps reflecting the multiplicity of
meanings which music can elicit for different people. For instance, Meyer (1956) suggests that the
stimulation of emotions from 'arousal and inhibition' sequences in music creates meaning. This can
occur where knowledge of the musical syntax used leads us to expect certain outcomes, which can
then be delayed or changed unexpectedly. Nketia (1966) describes the meaning of traditional African
music as, amongst other things, reinforcing social structures and community consensus for its
participants. Salgado Correia and Davidson (1999) describe how performers create meaning in a work
by means of their own individual 'metaphorical projections', which are grounded in body-based
experience.
Applications
The implications of the perception of dynamic elements in music are many for the fields of music
psychology, music education, and for the wider domain of non-verbal communication. However, one
application of the theory proposed in this paper that warrants special mention is in the field of Music
Therapy. Many Music Therapists note that it is difficult to describe the musical material that arises out
of a session with a client (Pavlicevic 1998, Ansdell 1997, Bunt 1994). Traditional and structural
musical analysis, which tends to separate music into elements of melody, harmony and rhythm, may
only explain a small part of the client's musical communication. This becomes especially true when
the client has had little musical training/experience prior to therapy. Thus, Music Therapists have
relied on descriptive language in order to discuss the complex feelings and interpersonal musical
dialogues that arise during a session.
Analysing a musical communication in terms of 'dynamics', or motion impulses, may provide more
insightful information for the therapist to gain knowledge of the nature and manner of the client's
expression. An analysis of this type could be implemented on many levels: from simply noting which
motions are present or absent in a client's expressions to a fuller personal analysis of the type
described by North (1990) using the 'inner participation' theory mentioned above.
Bunt, describing Music Therapy with children, notes that the therapist must not simply imitate the
music of a child in order to join with them, but they must somehow get, "behind the sounds to make
contact with the child's world of feeling" (Bunt 1994). Possessing an in-depth understanding of human
motion expression could help the therapist understand, complement and suggest new directions for a
client's expressive musical communication.
Summary
It was proposed at the beginning of this paper that both music and movement share a common
'language' of dynamic, motion-based elements. Evidence has been presented which suggests that
common, transmodal perceptual elements are available in the acoustic and visual array. Further
studies have shown that these elements contain information pertaining to the expressive qualities of
motion. Laban's 'motion factors' have been utilised to describe the possible construction and operation
of these perceptual elements. Continuing from this theory, the perception of inner attitudes and mental
states from motion perception has also been discussed. It was proposed that the motion-based
perceptual elements, termed 'dynamics', could provide meaningful information alone, or they could
signify information similar to that of interpersonal communication. However, it has been noted that
this could account for only one facet of the meaning and significance of music for both listener and
performer.
References
Ansdell, G. (1995) Music for Life. Aspects of Creative Music Therapy with Adult Clients. London:
Jessica Kingsley.
Bower, T. J. R. (1974) Development in Infancy. San Francisco: W.H. Freeman and Company.
McLaughlin, J. (1992) Nonverbal Cues. In S. Kramer and S. Akhtar (Eds.). When the Body Speaks. Psychological Meanings
in Kinetic Clues. Northvale, New Jersey: Jason Aronson Inc. pp. 131-161.
Nketia, J.H.K. (1966) A Review of the Meaning and Significance of Traditional African Music.
Accra: Institute of African Studies. University of Ghana.
North, M. (1990) Personality Assessment Through Movement. 2nd Edition. Plymouth: Northcote
Papousek, M. (1996) Intuitive Parenting: A Hidden Source of Musical Stimulation in Infancy. In I.
Deliege and J. Sloboda (Eds.). Musical Beginnings. Origins and Developments of Musical
Competence. Oxford: Oxford University Press. pp. 88-108.
Repp, B.H. (1993) Music as Motion: A Synopsis of Alexander Truslit's (1938) Gestaltung und
Bewegung in der Musik. Psychology of Music 21, 48-73.
Runeson, S. and Frykholm, G. (1983) Kinematic Specification of Dynamics as an Informational Basis for
Person-and-Action Perception: Expectation, Gender Recognition, and Deceptive Intention. Journal of
Experimental Psychology: General, Vol. 112, No. 4, 585-615.
Salgado Correia, J. and Davidson, J.W. (1999) Meaningful Musical Performance. Unpublished paper.
Spelke, E.S. (1976) Infant's Intermodal Perception of Events. Cognitive Psychology 8, 553-560.
_____ (1987) The Development of Intermodal Perception. In P. Salaptek and L. Cohen (Eds.).
Handbook of Infant Perception. Vol. 2. From Perception to Cognition. Orlando, Florida: Academic
Press, Inc. pp. 233-267.
Stern, D.N. (1985) The Interpersonal World of the Infant. A view from Psychoanalysis and
Developmental Psychology. New York: Basic Books, Inc.
Storr, A. (1992) Music and the Mind. London: HarperCollins Publishers.
1 Introduction
Categorisation is the process of detecting structures and similarities between the
objects in the world, and grouping similar objects together into classes. This process
lies at the basis of most human cognitive activities. Equally, similarity and
difference relations play a fundamental role in the internal structure of a musical
piece, and in our musical understanding (e.g., [Deliège, 96]). Many theories and
analytical methods in music, such as traditional morphological analysis, paradigmatic
analysis, pitch class set theory, motivic analysis and so forth, are based on
similarity relations.
A problem with a categorisation-based approach to music analysis is that often
the categories in musical pieces are chosen intuitively, making it difficult to justify
the choice of a specific class for a musical segment, and introducing inconsistencies
into the analysis. In this paper we address this problem by presenting a formal
approach to categorisation which is based on a clustering algorithm that operates
on well-defined descriptions of musical segments, and we apply this approach to
the analysis of a musical piece, namely, Boulez’ Parenthèse, a movement from his
3rd piano sonata.
The rest of this paper is organized as follows: in the next section, we briefly
describe Boulez’ 3rd piano sonata and Parenthèse, and we discuss the challenges
that this piece poses to the analyst. Then, we explain in detail our formal approach
to categorisation, including segmentation of the piece, representing the segments
in terms of musical features, and clustering these representations with a computational
algorithm. Section 4 describes the categorisation experiments that were
carried out, and the results of these experiments are presented in section 5. In
section 6 we discuss these results and suggest directions for future research.
(Many thanks to Gert Westermann and Fred Howell.)
In order to capture this structure, the analysis of the entire piece can be split
into three parts: first, the analysis of the six obligatory fragments, second, the
analysis of the optional fragments in parentheses, and third, the relation between
obligatory and optional fragments. In this paper, we demonstrate a full analysis
of the first part, that is, the obligatory fragments of the piece.
Within Parenthèse we observe different similarity relations between its segments:
first, the dodecaphonic "repetition-difference", which is based on the use
of pitch class sets, and second, similarity relations in musical properties such as
rhythm and tempo, tonal centres, intervals, contour, and way of playing.
The method of analysis that we present in this paper aims to bring out these
relations. The aim is, on the one hand, piece-specific: to demonstrate the structure
of the obligatory part of the piece. On the other hand, a more general aim is to
demonstrate how the formal method of analysis, that has previously been shown
to work for monophonic pieces ([Anagnostopoulou and Westermann, 97]) can
be applied to a non-monophonic, atonal piece of music with very rich internal
relations, and where a hierarchical segmentation is needed.
3 The Analysis
The analysis method is a formalised and extended version of Paradigmatic Analysis
[Ruwet, 1996; Nattiez, 1975]. The formalisation consists in dividing the
analysis process into discrete steps, fully specifying the representations at each
step, and performing the clustering of the musical segments with a well-defined
algorithm.
The analysis process is illustrated in figure 1. First the piece [1] is broken down
hierarchically into smaller segments, and then each of the segments is described
as a set of properties. The description of the segments is then turned into an appropriate
computational input in the form of feature vectors, and the classification
algorithm takes this input and produces a hierarchic classification of the segments.
The result of this process is a categorisation analysis that makes similarity relations
explicit. In the following sections, we describe each step in detail.
[Figure 1: the analysis process. Segmentation of the piece → segments as sets of properties → transformation into feature vectors → classification of segments (Sim-Cat analysis), with a list of musical properties feeding the segment descriptions and a re-evaluation loop back into the process.]
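The step in figure 1 from "segments as sets of properties" to feature vectors can be pictured as a small transformation. The fragment below is a hypothetical sketch (the paper does not prescribe any implementation, and the function name and toy data are invented): each segment becomes a binary vector with one component per property in a fixed property list.

```python
# Hypothetical sketch of the pipeline step from "segments as sets of
# properties" to "feature vectors": each segment becomes a binary
# vector with one component per property in a fixed list.

def to_feature_vectors(segment_properties, property_list):
    """Map {segment: set-of-properties} to {segment: 0/1 tuple}."""
    return {
        seg: tuple(1 if p in props else 0 for p in property_list)
        for seg, props in segment_properties.items()
    }

# Invented toy description of two segments, using property names
# drawn from Table 1 of the paper.
properties = ["3-1(12)", "longn", "cresc", "semit"]
segments = {
    "1a": {"3-1(12)", "longn"},
    "1b": {"longn", "cresc", "semit"},
}
print(to_feature_vectors(segments, properties))
# → {'1a': (1, 1, 0, 0), '1b': (0, 1, 1, 1)}
```

These 0/1 tuples are the kind of input a clustering algorithm can consume directly.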
3.1 Segmentation
In most formal methods of analysis, the music piece is first split into segments. It
is important to consider that the precise way in which the piece is segmented has
a profound influence on the outcome of the analysis.
[1] By using the term piece we mean the obligatory fragments that are analysed here.
In Parenthèse, segmentation is an easier task than for most pieces, since in
most places the segmentation points are clearly indicated by the composer. We
define segment boundaries
intermediate level where we combine certain adjacent low-level segments: for example,
the low-level segments 1a and 1b form the intermediate-level segment 1ab.
By this we hope to capture similarities that exist between the different segmentation
levels.
Two kinds of properties can be distinguished:
- properties that are true for a part of the segment, for example, the existence of a specific interval in the segment
- properties that are true for the whole of the segment, for example, a rising melodic movement
In our approach we mainly make use of the second kind of properties, with the
exception of specific rhythmic and intervallic patterns that describe merely part of
a segment.
Table 1 shows the properties that we use in the analysis, and the segments in
which they are found. The properties considered here are:
- the existence of various pitch-class sets and certain common subsets that they share. The reason to consider the common subsets is to reinforce similarity between the sets that the composer uses, which are indeed very similar to each other. In order for a pitch-class set to be true for a segment, all the notes of the segment have to belong to the pitch class set.
- tempo and dynamic descriptions. The composer is very specific about which tempo and dynamic descriptions he uses, and these are important for the distinction of the segments and the overall structure of the piece, so in a classification task of this piece they should be taken into account.
- tonal centres, which in this case are single tones rather than keys, and relations between tones; significant intervals that the composer seems to favour; and contour information.
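In code, each such property can be read as a predicate over a segment, and the pitch-class-set condition reduces to a subset test. A minimal Python sketch (the function name and pitch-class encoding are assumptions; it tests membership under transposition only, and the paper does not state whether inversions of a set are also admitted):

```python
# Hypothetical sketch: does a pitch-class set property hold for a
# segment? The property is taken to be true when all of the segment's
# pitch classes fit inside some transposition of the given set.
# (Assumption: transpositional equivalence only.)

def property_holds(segment_pcs, pc_set):
    segment = {p % 12 for p in segment_pcs}
    for t in range(12):  # try all twelve transpositions of the set
        transposed = {(p + t) % 12 for p in pc_set}
        if segment <= transposed:
            return True
    return False

# A chromatic trichord fits the chromatic set {0, 1, 2} ...
print(property_holds([8, 9, 10], [0, 1, 2]))  # True
# ... but a segment containing a tritone does not.
print(property_holds([8, 9, 2], [0, 1, 2]))   # False
```

Running every such predicate over every segment yields exactly the kind of property-by-segment table shown in Table 1.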
3.3 Classification
The classification of the segments, which are represented as feature vectors, is carried
out with a computational algorithm. This approach differs from the traditional
Ruwet/Nattiez Paradigmatic Analysis in that
[Table 1: property × segment matrix over the lowest-level segments 1a, 1b, 1c, 2a, 2b, 2c, 3, 4a, 4b, 4c, 4d, 5, 6a, 6b; the column alignment of the "y" entries is not reproducible here. Row labels, in order: pitch-class sets and common subsets 3-1(12), 4-1(12), 7-2, 6-9, 5-2, 5-5, all, inv 012, inv +3, inv +5, inv +7; rhythmic patterns longn, Q,Q, 4note, SQdot, triplex; tempo and playing directions exact, précip, cédé, mf+, cresc, dimin, steady; tonal centres and intervals Gis/Aes, G,Gis,A, D, Cis,D,Dis, semit, tritone, third; contour wob, down1, down2, up2.]
Table 1: The lowest-level obligatory segments (1a, ..., 6b) and the properties that
are true for each segment. When a property exists in a segment, this is marked
by a "y". When a property is true for a bigger segment but not for
the lowest level, it is marked in the lowest-level segments that the bigger
segment is made from, using "y-", "-y", "-y-", according to which adjacent segment the
property is shared with. The first part of the table contains the pitch-class sets and
their common subsets, the second part contains the rhythmic patterns, the third
part contains the directions by the composer on tempo and way of playing, the
fourth part contains tonal centres and specific intervals, and the last part contains
contour information.
Figure 3: Construction of a GNG network. Small circles represent input data, and
large circles connected with edges are the units of the network.
a cluster, expressed in the probability distribution of the feature values of their
cluster members, and the distances between the units can be measured to gain
information about the similarity between the classes.
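The paper performs this clustering with a Growing Neural Gas (GNG) network. As a rough illustration of the same idea, grouping binary feature vectors by distance and reading off classes at a chosen granularity, here is a plain single-linkage agglomerative sketch under Hamming distance (an illustrative substitute, not the paper's algorithm; the segment names and vectors are invented toy data):

```python
# Illustrative substitute for the paper's GNG clustering: single-linkage
# agglomerative clustering of binary feature vectors under Hamming
# distance. Segment names and vectors below are invented toy data.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def cluster(vectors, n_classes):
    """Merge the closest clusters until n_classes remain."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_classes:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented toy data: four segments, three binary properties each.
segments = {
    "1a": (1, 0, 0),
    "1b": (1, 1, 0),
    "2a": (0, 0, 1),
    "2b": (0, 1, 1),
}
# Cutting at different numbers of classes exposes the hierarchy, much
# as observing the GNG network while units are inserted does.
print(cluster(segments, 2))  # → [['1a', '1b'], ['2a', '2b']]
print(cluster(segments, 4))  # → [['1a'], ['1b'], ['2a'], ['2b']]
```

Requesting 5, 7 or 8 classes from such a procedure corresponds to the class counts examined in the experiments below.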
4 Experiments
We performed four experiments:
In the first experiment, the classification algorithm was trained on the feature
vectors that represent the segments on the smallest level only: 1a, 1b, 1c, 2a, 2b,
2c, 3, 4a, 4b, 4c, 4d, 5, 6a, 6b. The properties that stretch over adjacent smallest-level
segments were not taken into account.
In the second experiment, the algorithm was again trained on feature vectors
representing the smallest-level segments, but this time they were enhanced with
those features that stretch over segment boundaries. For example, if segment 1
has a property a that is not reflected in its sub-segments 1a, 1b, and 1c, then here
these sub-segments inherited this global feature.
In the third experiment, all segmentation levels were represented in parallel
and the algorithm was trained on the full set of lowest-level segments 1a,. . . ,6b,
the highest level segments 1,. . . ,6, and certain middle-level segments such as 1ab,
4bcd, and so on. In contrast to experiment 2, the lowest-level segments were only
represented by their own properties and not the shared ones.
In the fourth experiment, we considered only a selection of eight segments
drawn from all the levels: 1ab, 1c, 2, 3, 4a, 4bcd, 5 and 6.
By comparing the developing network architecture over a period of insertion
of units, we were able to observe the hierarchy of classes.
5 Results
Table 2 shows the results of experiments 1 and 2, when the number of classes is 5
(that is, when the network has inserted 5 units). Table 3 shows the results in the
same two experiments, when there are 7 and 8 classes.
The results of experiments 1 and 2 are all intuitively acceptable, although those
from experiment 2 seem slightly better. Table 2 shows the results of experiment
1 with 5 classes: here, 1a, 2b, 2c, 4b, 4c, 6b belong to the same class. This
classification would have been better if segments 1a and 6b were in a different
class from the others, since they contain long notes whereas the other segments
contain shorter notes. This difference could be enhanced by introducing an extra
Class | Exp 1 | Exp 2
Class I | 2a, 4d | 2a, 2b, 2c, 4b, 4c, 4d
Class II | 3, 4a | 1c, 5a
Class III | 1a, 2b, 2c, 4b, 4c, 6b | 1b, 6a
Class IV | 1b, 6a | 3, 4a
Class V | 1c, 5a | 1a, 6b
Table 2: The experimental results in the first two experiments when the number
of classes is 5.
2 classes | 2a, 2b, 2c, 4b, 4c, 4d, 1b, 6a, 1a, 6b | 1c, 5a, 3, 4a
4 classes | 1c, 5a | 3, 4a
8 classes | 3 | 4a
9 classes | 2b | 4c
In experiment 3 all levels of segments are taken into account. The results for
5 and 8 classes are shown in table 4. In this case we often get segments and
their subsegments classified in the same category, since they share many of their
properties (for example, segments and subsegments of 2 and 4). This problem
cannot be avoided in such a setting, and the results need further interpretation in
order to be valid. For this reason, 5 classes seem to be too few for an
acceptable classification. When the number of classes increases to 8, the results
improve: 3 and 4a are correctly classified into a category of their own, and the
same holds for 1b and 6a. It is interesting to see segment 4 in a category of its
own, since it is the longest segment of all. Segments 2 and 4bcd are placed in the
same category and are an example of similarity across levels. In general, 8 classes
seem to be sufficient for demonstrating the symmetry of the segments, although
one needs to consider carefully which segments denote this and which are merely
related subsegments of the same bigger segment.
Experiment 4 is the simplest experiment because we consider only a selection
of 8 segments across levels. These are chosen in order to show the structure of the
piece. Table 5 shows the resulting classification when having 4 classes: the first
and last segment, 1ab and 6, are classified together, and the same is true for 1c
and 5, 2 and 4bcd and 3 and 4a. These segments are almost mirror images of each
other, and define the symmetrical structure of the piece.
Class Exp 3 (5 classes) Exp 3 (8 classes)
Class I 1c, 3, 4a, 4, 4ab 1c, 5a, 4ab
Class II 1a, 2b, 2c, 4b, 4c, 6b, 2bc 1a, 2b, 2c, 4b, 4c, 6b
Class III 2a, 4d, 2, 2ab, 4cd, 4bcd 1, 6, 1ab, 1bc
Class IV 1b, 6a, 1, 6, 1ab 2a, 4d, 2, 2ab, 4cd, 4bcd
Class V 1bc 4
Class VI 1b, 6a
Class VII 2bc
Class VIII 3, 4a
Table 4: The experimental results in the third experiment when the number of
classes is 5 and 8.
Class Experiment 4
Class I 1c, 5
Class II 2, 4bcd
Class III 1ab, 6
Class IV 3, 4a
Table 5: The experimental results in experiment 4, with 4 classes.
6 Conclusions
We presented a formal method of analysis based on categorisation of music seg-
ments according to similarity. We applied this method to the analysis of Boulez’
Parenthèse from the 3rd piano sonata, taking into account the obligatory fragments
of the piece. The resulting hierarchic classification defines the similarity and dif-
ference relations between classes and between segments. We demonstrated how a
classification analysis is appropriate for this piece and how it brings out the
symmetrical structure that the composer intended. This method of analysis,
previously shown to work on more traditional kinds of music, proves appropriate
here for an atonal, non-monophonic piece of music.
The results give many interesting insights into the obligatory fragments. In
terms of internal relations, it is a very rich piece: each note is situated in its
position for a variety of reasons, forming part of an overall plan. More specifically, we see
that the piece also has an interesting tonal structure, evolving mainly around G
sharp at the beginning and end, and around D in the middle of the piece. The pitch
class sets used are very similar to each other, with segments 2 and 4 sharing sets, and
the same for segments 1, 3 and 6. Dynamics and tempo seem to be very important
for the segmentation and the differences between subsegments, whereas contour
information seems to reflect the symmetrical structure of the piece.
The issue of hierarchic segmentation in a classification task poses interesting
challenges to the analyst. When classifying all the levels at the same time, on the
one hand we obtain interesting similarities across levels, but on the other we
obtain redundant similarities between segments and their own subsegments.
The results depend on the initial representation, that is, the choice of properties
according to which each segment is described. A different choice of properties
would yield different results. However, a poor resulting classification would
indicate that the initial properties were not chosen carefully and need
re-evaluation. The analyst can then revise the properties and reclassify,
repeating this procedure until an acceptable classification is produced.
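As an illustration, this classify-and-revise procedure can be sketched as a simple similarity-based clustering over segment property vectors. This is only a minimal sketch, not necessarily the method used in the paper; the segment names and binary property values below are illustrative, not the actual analysis data.

```python
# Minimal sketch: classify music segments by similarity of their property
# vectors. Each segment is assumed to be described by binary properties
# (e.g. "contains long notes"); the vectors below are invented examples.

def hamming(a, b):
    """Number of properties on which two segments differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(segments, k):
    """Agglomerative clustering: start with one class per segment, then
    repeatedly merge the two closest classes until k classes remain.
    Class distance = minimum pairwise segment distance (single linkage)."""
    classes = [[name] for name in segments]
    while len(classes) > k:
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = min(hamming(segments[a], segments[b])
                        for a in classes[i] for b in classes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        classes[i] += classes.pop(j)
    return classes

# Hypothetical property vectors for four segments
segments = {
    "1a": (1, 0, 1, 0),
    "6b": (1, 0, 1, 1),
    "3":  (0, 1, 0, 0),
    "4a": (0, 1, 0, 1),
}
print(classify(segments, 2))  # prints [['1a', '6b'], ['3', '4a']]
```

Re-running such a loop with a revised property set is the programmatic analogue of the analyst's re-evaluation step: a poor grouping prompts a change in the representation, not in the clustering itself.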
The principles of similarity and difference are common to the vast majority of
the musical repertoire. It can be argued that they are responsible to a large
extent for cohesion and coherence within a musical piece.
Self-Regulation and Musical Practice: A Longitudinal Study
Proceedings paper
In contrast to these expertise-oriented perspectives, musical practice can also be studied in terms of the
self-regulated processes that students use to study their instrument. For many schoolchildren, practice plays a role
that is close to homework (Xu & Corno, 1998). Effective practice, like efficient homework, requires
self-regulation, which is evident when students are "metacognitively, motivationally, and behaviorally active
participants in their own learning process" (Zimmerman, 1986, p. 308). In this conception, self-regulation is not
seen as a fixed characteristic, but rather as a set of context-specific processes that students select from in order to
accomplish a task (Zimmerman, 1998). The degree to which these self-regulatory processes are employed by
students depends on six dimensions, which appear to be consistent across a range of diverse disciplines such as
music, sport and academic learning (Zimmerman, 1994, 1998). Reinterpreted for musical practice, these
dimensions incorporate:
1. Motive - feeling free to and capable of deciding whether to practise
2. Method - planning and employing suitable strategies when practising
3. Time - consistency of practice and time management
4. Performance outcomes - monitoring, evaluating and controlling performance
5. Physical environment - structuring the practice environment (e.g., away from distractions)
6. Social factors - actively seeking information that might assist (e.g., from another family member, teacher,
practice diary or method book).
We consider this self-regulatory perspective to be particularly attractive. Not only does it enable us to clarify key
processes involved in efficient musical practice and to compare these with other disciplines, but it may lead to a
more complete understanding of musical learning, with implications for optimising practice. Consequently, the
present study is grounded in this perspective.
Parameters of the Study
So far, most research has concentrated on defining the processes that lead to expert performance, often through the
use of retrospective accounts and studies in which performers are asked to prepare researcher-assigned pieces for a
formal performance. Relatively little research has studied practice in naturalistic settings, free of
researcher-imposed restrictions. Another gap in the existing literature concerns the very beginning stages of learning
an instrument and particularly what young children actually do when practising their instrument at home. To
expand knowledge in this area, we analysed the videotapes of children's home practice using a procedure that
attempted to make our observations as 'normal' and therefore as ecologically valid as possible. The data obtained
from the analyses of these videos were used to supplement information obtained from regular, detailed interviews
with a larger sample of 156 children in eight primary schools (K-6) who commenced learning at the same time,
and from interviews with their mothers, classroom teachers, and instrumental teachers. Our purpose was to
synthesise findings from the larger sample of interview data with new information obtained from analysing
children's videotaped practice, in a way that would shed light on the self-regulatory processes outlined earlier.
Procedure
At the beginning of the study all the children and their parents were invited to participate in the videotaping of
practice. Before the taping commenced, the 27 parents and their children who agreed to participate were
interviewed in order to explain the purpose of the study and in order for the researchers to stress that the home
practice sessions should be as normal as possible, and representative of how each child generally practised his or
her instrument. After viewing all videotapes, seven children (three females and four males) were selected for the
analysis reported here. The rest were excluded because their videotaping of practice was irregular or because
the child's behaviour appeared to be unduly influenced by the recording situation. Two were novices, three had
learnt another instrument (e.g., piano) which they ceased playing before joining the school instrumental program,
and two were continuing to play piano while beginning their new band instrument. The sample consisted of two
trumpets, two clarinets, and one each of flute, saxophone and cornet.
Tapes of 14 practice sessions undertaken in Year 1 of the study (two for each of the seven children) and 10
sessions from Year 3 (two for each of the five children still learning) were selected for analysis. They were coded
using the software package, The Observer (Noldus Information Technology, 1995), which allows the researcher to
play the videotape at various speeds through a computer interface, and to use various 'channels' to code behaviour.
This process provides highly rigorous data that can be revisited by repeatedly viewing the videotape, although this
rigour comes at a high cost in terms of research time: a 10-minute practice session can take up to 5 hours to code.
Results and Discussion
Results for the analyses can be discussed according to the six self-regulatory processes as defined by Zimmerman
(1994, 1998).
Motive
To understand this dimension of self-regulation, it is necessary to examine the degree to which the children feel
free to and capable of deciding whether or not to practise. Results from interviews before the children commenced
learning (see McPherson, accepted) show that they were able to differentiate between their interest in learning a
musical instrument, the importance to them of being good at music, whether they thought their learning would be
useful to their short and long-term goals, and also the cost of their participation, in terms of the effort needed to
continue improving. During the first year of learning, the children's initial motives and expectations for learning,
as measured by whether they predicted that they would play until the end of primary school, high school or into
their adult lives, coupled with how much practice they undertook, provided a powerful predictor of their
achievement. Children who made the least progress tended to express more extrinsic reasons for learning, such as
being part of the school band because their friends were also involved. In contrast, children who made rapid
progress were more likely to express intrinsic reasons, such as always having liked music or wanting to play
particular pieces for their own personal enjoyment (our presentation will highlight findings from the larger sample
relevant to this aspect of self-regulation).
Method
The method dimension focuses on how the children practised, in terms of the essential conditions that allowed
them to choose or adapt a particular method when they practised. Statistics generated by The Observer revealed
that most of the children's practice consisted of simply playing the piece through without any other strategy being
used (see Table 1 - Year 1: 94%; Year 3: 96%). This type of playing was accompanied by foot-tapping for 4% of
the time in Year 1, declining to 2% in Year 3. It was also introduced by counting the beat aloud for 1% of the time
in Year 1, but this behaviour had disappeared by Year 3. Other strategies such as singing, silent fingering, and
silent inspection of the music each accounted for less than 2% of the total time in both years. No evidence of
chanting or using a metronome was observed.
Interviews with the instrumental music teachers revealed that the standard advice about practice given to the
students was to work for 15-20 minutes a day, 5 days a week, and that this should consist of repeating pieces and
exercises until a degree of fluency is reached. Contrary to this advice, the vast majority (Year 1: 90%; Year 3:
92%) of playing time was spent playing through a piece or exercise only once. Although the children would
occasionally stop and repeat a small section after an error, as soon as they finally reached the end of the piece they
seemed content to move on to another task. This trend was remarkably stable across the 3 years (see Table 1). As
a result, there was virtually no evidence of the deliberate practice strategies which typify expert musicians. For
these children, practice involved a rather superficial coverage of performance literature, with little evidence during
the first three years of the types of self-regulatory strategies that would enable them to more efficiently control
their own learning.
Time
How children plan and manage their time has important implications for how efficient their practice will be. In
Year 1, only 73% of the students' observed videotaped practice was spent playing their instrument. This
percentage rose to 84% by Year 3, suggesting that these five subjects were beginning to spend their time more
efficiently. As shown in Table 1, the vast majority of this playing time was spent on repertoire (Year 1: 84%; Year
3: 93%) with approximately equal time spent on ensemble parts and solo pieces. Technical work (scales and
arpeggios) took up the remainder of playing time (Year 1: 15%; Year 3: 7%), while the presence of playing by
ear, improvising, and playing from memory was negligible. This pedagogically unbalanced 'diet' (McPherson,
1998) is surprising, and reveals that the 'informal' practice found by Sloboda et al. (1996) in more experienced
young musicians had not yet emerged in this group of beginners.
Interestingly, the remainder (Year 1: 27%; Year 3: 16%) of the children's practice time was spent on non-playing
activities. These activities show an interesting pattern of change with skill acquisition. Time spent looking for
printed music to play rose from 45% of non-practising time in Year 1 to 76% in Year 3. Time spent talking or
being spoken to fell from 32% in Year 1 to only 8% in Year 3, largely a function of the reduced presence of other
people in the room in the later sessions. Between Year 1 and Year 3, daydreaming fell from 4% to 3% of
non-playing time, responding to distractions fell from 4% to 2%, and expressions of frustration fell from 3% to
1%. Time spent resting between pieces rose from 3% of non-practising time in Year 1 to 6% in Year 3, possibly a
function of the longer pieces played at this stage.
Table 1 also reveals marked differences between individuals. For example, in Year 1, the least efficient learner
spent only 57% of his time actually playing, while the most efficient learner spent 82% of his time practising.
Research in academic subjects shows that many children actively avoid studying or use less time than allocated
(Zimmerman & Weinstein, 1994). This was also true in our analysis of the videotapes. For our least efficient
learner, 21% of his total session time was spent talking with his mother about his practice tasks in a highly
unfocussed manner, where the child's repeated errors became the primary focus, and a source of considerable
frustration. With some children, there was a high level of reference to the time, with frequent behaviours such as
calling out to a parent to ask if they were "allowed to stop yet". For our sample it appears that a minimum time
limit was often enforced, yet the efficient use of that time was not.
Performance Outcomes
A typical self-regulated approach to practice involves an ability to react by choosing, modifying and adapting
one's playing based on the feedback obtained when performing. We chose to assess this type of performance
outcome by analysing the nature of the children's errors (Palmer & Drake, 1997). Our two trumpet-players had to
be eliminated from this analysis because their pitching was too inaccurate: using aural analysis alone meant that
only the clearer pitching of beginner woodwind instruments and the cornet could be assessed. Clear pitch and
sound-production errors were coded; no attempt was made to assess rhythmic accuracy, which varied enormously
among the sample.
As Table 1 shows, nearly half the errors made by subjects in the 1st year of learning were ignored, which points to
a general inability by the children to self-regulate the accuracy of their music reading. The attentional demands of
learning an unfamiliar instrument, together with the considerable cognitive challenge of learning to read music at
the same time, left the students largely unable to verify their own accuracy. However, there were very large
individual differences between subjects in their self-regulation of accuracy. Table 1 shows the total errors per
minute and the ignored errors per minute for the two subjects with the highest (KR) and lowest (WD) error rate. It
reveals that, while subject KR made many more errors than WD, she also ignored a far higher proportion of
these errors than WD did. WD's regulation of his own accuracy was remarkable in Year 1 and also when we
analysed his practice in Year 3. Most notably, his rate of improvement is very high on the second run-through of a
piece: in Year 1 his error rate fell from 1.4 per min on the first run-through to 0.6 per min on the second
run-through, suggesting that he possessed an outstanding ability to retain a mental representation of his
performance between run-throughs, and to use this as a basis for learning from his errors. In Year 3, the same
phenomenon prevailed with WD. Table 1 shows the ratio of the error rate on the second run-through to that on the
first run-through. Although the frequency of WD's errors had risen (Year 1: 1.4; Year 3: 6.7) because
of a steep increase in the difficulty of the repertoire he was playing, the error rate on the second run-through of his
practice was only 34% of that on the first.
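The run-through measure just described is a simple ratio of error rates. A minimal sketch, using the Year 1 figures quoted above for subject WD (the function name is ours, not the study's):

```python
# Sketch of the run-through improvement measure described above:
# the error rate on the second run-through as a fraction of the first.

def runthrough_ratio(first_rate, second_rate):
    """Return the second-run error rate (errors per minute) as a
    fraction of the first-run rate; lower means faster improvement."""
    return second_rate / first_rate

# WD's Year 1 figures from the text: 1.4 -> 0.6 errors per minute
print(round(runthrough_ratio(1.4, 0.6), 2))  # prints 0.43
```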
Such large individual differences in children's ability to self-regulate the accuracy of their playing can partly be
explained by considering the enormous demands placed on working memory for children simultaneously learning
to read notation, to manipulate the keys or valves on their instrument, and to adjust their embouchure according to
aural feedback. The tradition from which these students come places great importance on learning to read notation
from the first lesson, and for many of them, there is insufficient opportunity to learn to associate their nascent
aural schemata with the notation. The most accurate students in the study were relieved of this high cognitive load
because they had learnt how to read music on instruments such as the piano or recorder before starting on their
band instrument. The seven children fell into three clear groups concerning prior learning, and these corresponded
closely to their ability to monitor their playing. The two children who had previously learnt the piano and were
continuing made an average of 2.6 errors per minute in Year 1; those who had previously learnt an instrument but
had discontinued averaged 7.3 errors per minute; and those who were complete beginners (the two trumpeters)
made too many errors to count. Thus, the children with prior experience in learning another instrument, and for
whom reading had become to a certain extent automatised, displayed a more refined ability to monitor and control
their own playing.
Physical environment
Self-regulated learners are aware that their physical environment should be conducive to efficient learning. There
was a wide range of locations chosen by the children for practice, ranging from the privacy of a bedroom to a
shared family space. Some children would appear in different rooms in different sessions, suggesting that they
were choosing a quiet space according to the family situation on the day. This appeared to give the children access
to help from other family members when they needed it, but also meant that some needed to spend some of their
practice time coping with distractions from siblings, pets, and a television in the next room. Data obtained from
both the videotaped practice sessions and child/mother interviews show that the physical environment was
mostly well-equipped with a music stand and an appropriate chair (on the videotapes only one child stood while
practising). However, differences between children were noticeable. One child practised (in his pyjamas) while
sitting cross-legged on his pillow with the bell of his trumpet resting on the bed. The poor posture of this young
learner could be contrasted with some of his peers, who were more capable of holding their instrument correctly
while sitting or standing with a straight back and suitable playing position.
Social factors
When faced with difficulties, self-regulated learners actively seek help from knowledgeable others. The
observation of family involvement reveals a rich pattern (see Table 1) with a noticeable decline in the
participation of parents between the 1st and 3rd years of learning. In Year 1, one or both parents were present in
the room for 65% of the observed time. (This level of participation may, of course, have been affected by the role
some parents took in being a camera-operator). This time spent in the practice room further broke down into three
parental behaviours: 6% involved a parent 'teaching' the child (i.e. taking a very active instructive role); another
12% can be described as 'guiding' (e.g. "What piece are you going to do first?"); and the remainder (81%) was
spent less actively, simply 'listening'. A large amount of maternal involvement
with some of the children consisted of bolstering motivation and delivering praise ("That sounds fantastic!").
Discussion between parent and child about appropriate practising strategies was found in only one subject, and
this was highly argumentative - certainly falling outside of the parental involvement that might be called
"autonomy-supportive" (Grolnick, Kurowski, & Gurland, 1999). Nevertheless, by the 3rd year of the study, a
higher level of autonomy was observed, with parents present for only 22% of the time, and now almost exclusively
in a semi-listening but supportive capacity.
In Year 1, five of the seven children showed high usage of a practice diary in which the teacher had written down
set tasks. The two trumpeters, who showed poor monitoring of their errors, were not observed referring to a diary
at all. By Year 3, only two children continued to refer to their diary, possibly implying that the other three children
were capable of remembering what had been assigned by their teacher.
Conclusions
Zimmerman (1998) concludes that the self-regulatory processes identified here are distinguishing characteristics
of experts across a number of diverse disciplines that include music, sport and professional writing. He also
believes that they can be found, to a greater or lesser extent, in the early stages of learning. It can therefore be
speculated that musicians who display these characteristics early in their development will be more likely to
practise harder and more efficiently, will display higher self-efficacy about their own capacity to learn, and will be
more likely to achieve at a higher level. Early results from our interview and videotape research show that the
practice habits of the children we studied varied considerably and that there were important differences between
the children on each of the six self-regulatory processes, even from the very earliest practice sessions.
Our results lead us to conclude that a majority of our learners possessed the will to learn their instrument, but not
necessarily the level of skill required to ensure efficient and effective practice. By this we mean that the young
learners were typically excited about learning their instrument and came to their learning as optimistic, keen
participants. However, while their instrumental teachers were making them aware of what to practise, many had
very little idea of how to practise. An important implication therefore is that teachers should spend time during
their lessons demonstrating and modelling specific strategies that their students can try when practising, such as
how to correct or prevent certain types of performance errors. However, such strategies will be ineffective unless
the learners also develop their capacity to monitor and control their own learning. Consequently, teachers should
also devise strategies whereby learners can be encouraged to reflect on the adequacy of their own practice habits,
and especially on how they might invent better ways (such as self-reflective comments in their diaries) that will
help them practise more efficiently. Our preliminary findings suggest that the skills of knowing how to
self-monitor, set goals and use appropriate strategies take time to develop in young children. Helping children to
reflect on their own progress and ability to employ self-regulatory processes may go some way to improving
instrumental instruction, especially for children who do not pick up these skills implicitly.
Realising that our study only scratches the surface of the complex issues which surround the self-regulatory
behaviour of young musicians, we intend to build on the findings reported here in order to construct a more
detailed profile of the participating instrumentalists from the data yet to be analysed. At their most basic level, our
early results, combined with the extensive body of evidence from academic learning, confirm that the six
self-regulatory processes are used to greater or lesser degrees in young musicians as a means of improving
performance. Every time a young musician self-initiates practice, consciously plans what to practise, chooses to
correct their performance, structures their learning environment, or actively seeks information from
knowledgeable others, they come one step closer to refining the self-regulatory processes that will eventually
become automatised. For researchers, the challenge involves expanding and clarifying these issues in a way that
will provide useful information that teachers can use to cater for the wide range of abilities which they encounter
in their everyday teaching.
Note
This research has been supported by a large Australian Research Council Grant (No. A79700682), awarded for 3 years in 1996.
References
Chaffin, R., & Imreh, G. (1997). "Pulling teeth and torture": Musical memory and problem solving.
Thinking and Reasoning, 3(4), 315-336.
Ericsson, K. A. (1997). Deliberate practice and the acquisition of expert performance: An overview.
In H. Jørgensen & A. C. Lehmann (Eds.), Does practice make perfect? Current theory and research
on instrumental music practice (pp. 9-51). Oslo: Norges musikkhøgskole.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review, 100(3), 363-406.
Grolnick, W. S., Kurowski, C. O., & Gurland, S. T. (1999). Family processes and the development of
children's self-regulation. Educational Psychologist, 34(1), 3-14.
Gruson, L. M. (1988). Rehearsal skill and musical competence: Does practice make perfect? In J. A.
Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and
composition (pp. 91-112). Oxford: Clarendon Press.
Hallam, S. (1995). Professional musicians' approaches to the learning and interpretation of music.
Psychology of Music, 23, 111-128.
McPherson, G. E. (1998). Music performance: Providing instruction that is balanced, comprehensive
and progressive. In C. van Niekerk (Ed.). Conference Proceedings of the 23rd International Society
for Music Education World Conference (pp. 397-410). Pretoria: University of South Africa.
McPherson, G. E., & McCormick, J. (1998). Motivational and self-regulated learning components of
musical practice. In T. Murao (Ed.), Proceedings of the 17th Research Seminar of the International
Society for Music Education (pp. 121-130). Johannesburg: University of Witwatersrand.
McPherson, G. E. (accepted). Commitment and practice: Key ingredients for achievement during the
early stages of learning a musical instrument. Paper accepted for the XXIV International Society for
Music Education Research Commission, Salt Lake City, USA, July 10-15, 2000.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of
Music, 17, 95-109.
Nielsen, S. G. (1999). Learning strategies in instrumental music practice. British Journal of Music
Education, 16(3), 275-291.
Noldus Information Technology. (1995). The Observer, base package for Windows: Reference
manual, version 3.0 edition. Wageningen, The Netherlands.
Palmer, C., & Drake, C. (1997). Monitoring and planning capacities in the acquisition of music
performance skills. Canadian Journal of Experimental Psychology, 5(4), 369-384.
Sloboda, J. A., & Davidson, J. W. (1996). The young performing musician. In I. Deliège & J. A.
Sloboda (Eds.), Musical beginnings: Origins and development of musical competence (pp. 171-190).
Oxford: Oxford University Press.
Sloboda, J. A., Davidson, J. W., Howe, M. J. A., & Moore, D. G. (1996). The role of practice in the
development of performing musicians. British Journal of Psychology, 87(4), 287-309.
Williamon, A., & Valentine, E. (in press). Quantity and quality of musical practice as predictors of
performance quality. British Journal of Psychology.
Xu, J., & Corno, L. (1998). Case studies of families doing third-grade homework. Teachers College
Record, 100(2), 402-436.
Zimmerman, B. J. (1986). Becoming a self-regulated learner: Which are the key subprocesses?
Contemporary Educational Psychology, 11, 307-313.
Zimmerman, B. J. (1994). Dimensions of academic self-regulation: A conceptual framework for
education. In D. H. Schunk & B. J. Zimmerman (Eds.), Self-regulation of learning and performance:
Issues and educational applications (pp. 3-21). Hillsdale, NJ: Erlbaum.
Zimmerman, B. J. (1998). Academic studying and the development of personal skill: A
self-regulatory perspective. Educational Psychologist, 33(2/3), 73-86.
Zimmerman, B. J., & Weinstein, C. E. (1994). Self-regulating academic study time: A strategy
approach. In D. H. Schunk and B. J. Zimmerman (Eds.), Self regulation of learning and
performance: Issues and educational applications (pp. 181-199). Hillsdale, NJ: Erlbaum.
Proceedings abstract
Mental manipulation of meter
Peter Vazan & Michael F. Schober, New School for Social Research
vazanp01@newschool.edu
Background
The fact that arbitrarily many rhythms can be constructed from a given meter,
and that arbitrarily many meters can generate a given rhythm, suggests that
rhythm and meter are independent. However, the fact that people do not
recognize the same rhythmic pattern presented in different metrical contexts
indicates that rhythm and meter are intricately related.
Aims
In this study we assess the psychological status of meter and its dependency on
the stimulus' accent structure. We do this by examining people's ability to
hear the same pattern in a new way by imposing a different ground (i.e., meter)
on an unchanging rhythmic figure.
Method
Results
Participants took much longer to impose new metrical structures than to play
rhythmic variations. This was especially true for syncopated alignments:
participants took four times as long to impose a new meter whose underlying
beats were not aligned with the sounding events as they did to tap a syncopated rhythm.
Some participants were entirely unable to impose a new meter on some trials.
Conclusions
The difference between constructing arbitrarily many rhythms from one meter and
generating arbitrarily many meters from one rhythm is a principled one--it is
the difference between what is embedded and what embeds (i.e., the framework).
Our findings that people have trouble imposing new meters in the absence of
facilitating acoustic cues show that pattern structure provides strong
constraints on how a sequence is perceived. This suggests that we do not
arbitrarily choose and construct our perceptual frameworks and cognitive
reference points, but rather that they are given and structured by our
environment and the world around us.
Proceedings paper
conversation, participants most often interact by taking 'turns,' and concurrent verbalization with two
persons is rare; in conductor/ensemble frameworks, 'both' participants are producing their outputs
simultaneously under most circumstances. (There are other kinds of activities where these kinds of
simultaneous interactions may be seen: dance, some kinds of sports--rowing, football, and others--and
physical theatre come to mind. We lay these aside for the time being, despite their clear interest to the
study of musical interactions.)
A second comparison with language is instructive. In conversation, both parties use verbalization as
their typical means of interaction. In conductor/ensemble frameworks, the conductor contributes
primarily by means of gesture, and the ensemble contributes primarily by means of sound. In fact, this
two-modality interaction may even be an asset in coordinating synchronized activity. Our overall
framework is one of a closely time-coupled feedback or contribution system in which each
participant's output is closely monitored by the other, who in turn modifies his/her activity and who
contributes this modified output back to the interaction.
Conducting and pragmatic analysis: what we can learn from studies in linguistics
Conductors produce a large number of physical gestures--facial expressions, hand motions, moves
involving the entire torso--which we hypothesize are related to musical structure and musical
expression in some way(s). The question is: how are these gestures to be interpreted? Our approach is
to interpret conductors' gestures with frameworks derived from studies in linguistics. Therefore we
will begin by looking at how researchers in linguistics have related gesture to language.
Physical activities of conductors
A conductor's visual output to the ensemble incorporates a number of interrelated but separate
'streams' of information. At this time we identify four streams (facial expression, handshape, hand
movement, and torso placement and movement), but classify all of these as 'gesture' for simplicity.
We are particularly interested in those gestures which conductors make which are not explicitly
related to giving information about meter and tempo (that is, not related to 'keeping the beat'). This
paper focuses on handshape and hand movement, specifically with the non-baton-holding hand
(typically the left hand) and with facial expression.
Categorization of types of gesture: 'Kendon's continuum'
The study of gesture's relationship to language is being pursued by a relatively small group of
scholars. One of the foremost of these researchers is David McNeill, of the University of Chicago; his
book Hand and Mind (1992) is a summary of years of investigation into the relationship of gesture
and language to thought. One of the topics of the book is the way in which gestures can be
categorized. McNeill begins with what he calls 'Kendon's continuum' (after Kendon 1988). This
involves placing gestures at some point along a line, represented by the following schema (McNeill
1993, p. 37):
Gesticulation > Language-like gestures > Pantomimes > Emblems > Sign languages
McNeill describes the traversal of this line, from left to right, as having certain characteristics: "(1) the
obligatory presence of language declines; (2) the presence of language properties increases, and (3)
idiosyncratic gestures are replaced by socially regulated signs" (ibid.). Thus, gestures may be
categorized by their independence from spoken language, their independently-perceived linguistic
properties, and the relative presence or absence of criteria of well-formedness for some gesture. Let us
now proceed to consider the relationships between this framework for understanding gesture and
speech and the connection between conductors' gestures and ensembles' musical sounds.
● Cutoff (release): left (or right) hand makes a tight loop with a sudden ending
● Cue (for entrance of a part): eye contact with player(s) to be cued, then a pointing or
downward-stroking motion.
However, these descriptions are subject to more localized difference, raising the question of standards
of well-formedness for these emblems; in addition, the degree of change called for in dynamics
requires interpretation. This will be considered in more detail in a subsequent section of this paper.
Pantomime and 'language-like gestures' in conducting
McNeill defines pantomime as depicting objects or actions, without the need for co-occurring speech
(McNeill p. 37). Conducting treatises rarely mention behaviors which resemble pantomime, and when
they do, it is with disdain. Consider the following (from Farberman, p. 27): "The Shakers shake the
baton throughout a beat pattern or in the direction of individual players. They are especially addicted
to prolonged shaking on holds. (Why don't the players reciprocate by shaking their instruments at the
conductor?) The sound scatters as a result of the very wide vibrato employed by the observant
orchestra, a direct result of the shaking baton." Other kinds of pantomime can be observed in
conductors, especially mouthing the words of texts or pantomiming the bowing of stringed
instruments; nevertheless, such behavior is largely unmentioned in treatises. Moving leftward along
Kendon's continuum from pantomime, there seems to be no analog to the language-like gestures
defined by McNeill, where a gesture fills a slot for a term in an otherwise auditory speech stream (for
example, "the parents were all right, but the kids were [gesture]" (McNeill p. 37)).
The leftmost edge of Kendon's continuum includes gestures which are typically co-generated with
speech, have no conventional meanings, and are highly dependent on context for their interpretation.
Many of the 'expressive' gestures of conductors seem to fall into this category; there are relatively few
standards of well-formedness at work. Rather, these kinds of gestures, which are the focus of
McNeill's research efforts, "...are free to incorporate only the salient and relevant aspects of the
context. Each gesture is created at the moment of speaking and highlights what is relevant..." (McNeill
p. 41). These encompass some of the most interesting and least discussed of conductors' gestures.
Grice's Cooperative Principle and maxims: A theoretical approach to pragmatics
One highly influential theory of linguistic pragmatics was laid out by the philosopher H. Paul Grice.
The fundamental components of Grice's theory were presented at Harvard in 1967 as the William
James Lectures and published piecemeal over a number of years (especially Grice 1975, 1978; for a
rather more readable discussion, see Levinson 1983, Ch. 3). These lectures deal with the gaps that
occur in conversational interchange if it is understood only in terms of the logical syntax and
semantics of the statements made by the participants. In order to show how people 'make sense' of
conversations where logic alone would be insufficient, Grice describes what he calls the 'Cooperative
Principle' (CP), which is a generalized proposal for understanding how persons interact in rational
ways. The CP states that one should behave so as to:
Make your contribution such as is required, at the stage at which it occurs, by the
accepted purpose or direction of the talk exchange in which you are engaged.
Grice then gives a set of four 'maxims,' some of which have submaxims, which elaborate on how this
CP can be implemented. These are:
The maxim of Quality: try to make your contribution one that is true, specifically:
(i) do not say that which you believe to be false
(ii) do not say that for which you lack adequate evidence
The maxim of Quantity
(i) make your contribution as informative as is required for the current purposes of the
exchange
(ii) do not make your contribution more informative than is required
The maxim of Relevance: make your contributions relevant
The maxim of Manner: be perspicuous, and specifically
(i) avoid obscurity
(ii) avoid ambiguity
(iii) be brief
(iv) be orderly
These provide the basis for Grice's analysis of how speakers can go beyond the actual words and
forms of sentences they hear, by a process of implicature. Implicature allows one to make sense of
interchanges such as the following:
"do not let your gestures become too involved or confusion will result." [Rudolf p. 248]
The maxim of Relevance (make your contributions relevant)
"The genuinely inspired musical leader concentrates upon meeting the demands of the
music and of the orchestra; he has no time or energy for superficial gestures having only
audience appeal." [Rudolf p. 240]
The maxim of Manner (be perspicuous, and specifically avoid obscurity, avoid
ambiguity, be brief, and be orderly)
"The two extremes to be avoided are shyness and exhibitionism." [Rudolf p. 240]
"...the conductor's...technique...may be defined as a highly individualized craft to evoke
specific responses on the part of the players with the most effective gestures..." [Rudolf
p. 314]
"When and how to use the left hand are matters of individual taste, but it should always
tell the orchestra something essential. If the conductor uses the left hand continually, the
players will ignore it." [Rudolf p. 243]
A deeper look: examining the criteria for implicature to take place
As Grice states his framework, there are a number of conditions which must be met for implicature to
happen: cancellability, nondetachability, calculability, and nonconventionality. Let us take these
conditions one by one and compare them to the conductor/ensemble framework. Only one musical
case per condition will be mentioned; in general, many parallel cases exist as well.
1) The implicature must be able to be canceled by the addition of information that undercuts the
implicature or makes it clear that a speaker is 'opting out' of the CP.
● What kind of additional information might cause an implicature from conductor expression to
be canceled? A variety of examples might be adduced. Take, for example, co-occurring gestures
which undercut the implied intent of some conductor gesture: a hand cue to enter is given at the
appropriate time for a certain performer awaiting a cue, but the conductor's gaze is directed
elsewhere. The implication that the waiting performer should come in is undercut, causing at
the least confusion and, in more extreme cases, some considerable degrees of ensemble chaos.
● A quote from Farberman is in order here: "Many beginning conductors are guilty of indicating
pulse with their head. It is a disruptive gesture that disarms the baton of authority by
supplying an alternative beat...A head pulse, or beat, most likely will not correspond with the
baton beat. Thus, at the point of attack, the conductor offers the orchestra a choice of two
pulses. Most often the result is a weak, imprecise orchestral entrance." [Farberman p. 8,
emphasis in the original]
2) The inference in the implicature must inhere in the semantics of the statements themselves and not
just their surface forms (Grice's term is 'nondetachability'). The use of synonyms in place of the exact
words of an utterance should not undermine the implicature.
● With regard to nondetachability, the application to conductor/ensemble interaction seems more
problematic, as the semantics of the various gestures themselves are often unclear.
● Again, from Farberman: "Is there a correct physical [i.e. conductor's gestural] response to a
musical problem? A single 'correct' musical/physical solution to a musical problem does not
exist. ... It is a given that any two conductors confronted with the same musical problems will
view them differently and devise distinct musical, thus physical, solutions. Even conductors
who could agree fully on the musical meaning of a score would produce dissimilar results
because of their individual motor and muscular skills and unique body structures. ... Conductors
must think of stroke choices just as string players think of bowing possibilities, there are
generally several solutions for most problems. In theory any baton stroke can be used for
any solution, so ALL strokes may be 'correct.' But in practice, the 'correctness' of the stroke
depends on who chooses what stroke and when, and how and to what effect it is used."
[Farberman p. 178, emphasis in the original]
3) Implicatures must be calculable; that is, given some unclear communication, one must be able to
construct a chain of explanation, using what is given in the gestures, and the CP and maxims, to show
how a reasonable interpretation preserving the CP is to be made.
● One might make the case that syncopation provides a good example of a musical context which
shows how calculable implicatures might be made. This example is a little more involved than a
simple quote but is, we hope, illuminating.
● In The Grammar of Conducting, Max Rudolf discusses syncopation in the following way:
"Syncopated passages without accents require no special beat. The gestures must be very
definite and the rhythm steady. You must beat, so to speak, between the notes, not on them. ...
Syncopated notes with accents are indicated on the preceding beat, which is staccato. The
sharpness of the beat increases with the degree of the accent. In contrast with an ordinary
accent, which is on the count, this staccato beat is not prepared. The beat itself is the
preparation for the syncopated note that comes after the count. Again, never beat the
syncopation, beat the rhythm!" [Rudolf, 1949, pp. 207-208]
● What is the player to make of this situation? Normally the increased emphasis on the beat would
indicate that the note on the beat would receive an accent. However, in this case there may be
no note initiated on the beat itself (the example given in the Rudolf text, from Stravinsky's
L'Oiseau de Feu, has only tremolos on the beat, giving no overt sense of rhythm). The chain of
reasoning by the players might run something like this: (1) The conductor is telling us to accent.
(2) However, there is no note to accent at the place s/he is indicating [causing the player to
wonder if the maxim of Relevance--or possibly that of Quality--is being violated]. (3) This
conductor knows what s/he is doing, so the indicated 'accents' must have something to do with
our musical purpose [the player is preferring to believe that the CP is still in force and that the
maxims of Relevance and Quality are somehow still being preserved]. (4) Therefore, it makes
most sense to infer that the accents are simply displaced from the nominally-indicated metric
position, and should be applied to the intervening notes that are on the normally-unaccented
positions [since to show accents on those positions would require indicating twice as many
beats as the conductor is currently giving us, and that would be a violation of the maxim of
Quantity, by giving more information than is really necessary, since the conductor knows we
can make the appropriate implicature to arrive at the correct interpretation].
4) The meanings understood in implicature are nonconventional--that is, they are not part of the
conventional meanings of the words themselves (not part of the literal meanings of the words, but
derived from them through the CP and maxims).
Only gestures which are fully lexicalized would have meanings so governed by convention that they
needed no contextual interpretation; the discussion of dynamic change to follow in the next section
will explicate this further.
Two test cases for pragmatics and conducting
Having assembled all this theoretical machinery, it is time to apply it to musical situations and see
what results. The method we use employs videotapes of conductors in a variety of ways. In a
paper, of course, this method is difficult to convey. Interested readers may contact us for the actual
examples used and more detailed analysis of the examples. In the following we look at processes of
implicature in two cases: interpreting a dynamic-change gesture and interpreting a pantomime gesture.
Case 1: Dynamic alterations and implicature
Gestural patterns indicating changes in dynamics are a mainstay of the conductor's art. Commonplace
as they are, there are a few interesting pragmatic issues surrounding the communication of dynamic
changes between conductor and ensemble. One of these seems to be related to what Levinson,
following Gazdar, calls scalar implicature (Levinson pp. 132-136; Gazdar 1979). Briefly put, the idea
is this: when someone says "John wasted some of his money at the casino," the implicature is that
John did not waste ALL of his money at the casino, even though in a strict logical sense that could be
true as well. Or, if one says "John has three children," one might reasonably interpret the sentence as "John
has three children and no more," when in fact, if John had eleven children, the statement that he has
three children would still be true. Given some ordered scale of terms, such as <none, few, some, many, all>,
to state one of these terms implicates "this level and no more" of that to which the term is being
applied, by means of the maxim of Quantity. Such inferences are the stuff of which scalar
implicatures are made.
We can make a connection to interpreting conductors' gestures regarding changes in dynamics within
the same framework. Our ordered scale of dynamic levels might be <pp, p, mp, mf, f, ff>. Given a
starting dynamic level, under most circumstances a conductor's crescendo or decrescendo gesture can
be interpreted to mean that the ensemble should move from the current dynamic level to the next
higher or lower level, on the same basis as that used for the scalar implicatures just described.
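This step-wise reading of a dynamic-change gesture can be sketched as a toy model. The code below is purely illustrative; the scale, function name, and gesture labels are our own choices, not part of the paper's method:

```python
# Toy model of scalar implicature for dynamic-change gestures.
# Given an ordered scale of dynamic levels, a crescendo or decrescendo
# gesture is read as "move one step along the scale", not "jump to an
# extreme" -- the minimal change selected by the maxim of Quantity.
DYNAMIC_SCALE = ["pp", "p", "mp", "mf", "f", "ff"]

def implied_level(current, gesture):
    """Return the dynamic level implicated by a conductor's gesture.

    A literal (logical) reading would permit any louder or softer
    level; the scalar implicature selects the adjacent one.
    """
    i = DYNAMIC_SCALE.index(current)
    if gesture == "cresc.":
        return DYNAMIC_SCALE[min(i + 1, len(DYNAMIC_SCALE) - 1)]
    if gesture == "decresc.":
        return DYNAMIC_SCALE[max(i - 1, 0)]
    return current

# From mf, a decrescendo gesture implicates mp -- not a subito pp.
print(implied_level("mf", "decresc."))  # -> mp
```

A subito piano, on this model, corresponds to skipping steps on the scale: logically permitted by the gesture, but not the implicated reading.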
However, this is a matter of inference. Max Rudolf observes, disapprovingly, that "Many players have
a tendency to play loudly at once when they see 'cresc.' and softly when their parts indicate 'decresc.'
or 'dimin.'" (Rudolf, p. 60). This indicates that, to his mind, such players are behaving improperly,
when in fact a subito forte or subito piano would be one not-forbidden interpretation of the dynamic
indication. How can this be? Probably to Rudolf, the crescendo or decrescendo indicator is a scalar
indicator, from which the implicature is something like "from our current level, say mf, make a
decrescendo, which implies mp, and not more change than that." The player giving a more abrupt
change in the dynamics is, literally and logically, not wrong--but has failed to produce an appropriate
implicature.
For testing this implicature we used a videotape of a conductor leading an orchestra in a passage by
Mozart. At one point a decrescendo is desired, and the conductor uses the typical palm-down gesture
to indicate this. At this point in the soundtrack for the video, two different versions were produced.
One was the original audio as recorded by the orchestra, and the other was a modified version where
the decrescendo was more in the nature of a subito piano. A group of 8 musicians experienced in
ensemble playing was shown both versions and asked which one most closely reflected the
conductor's intentions, and why. All of the musicians remarked that both were possible interpretations
but that the less-marked version (the original) was preferable. When asked to give the reasons for their
choice, most gave as their reason that the motion made implied a modest decrease in volume but not a
sudden one, lending support to the notion of scalar implicature as experienced in musical
performance.
Case 2: 'Overacting' and implicature
Conducting treatises make many comments on how much guidance a conductor should give an
ensemble, usually to the effect that one should be economical, to the point, and specific in gesture and
word (Farberman, Chapter 27, and Rudolf, Chapter 31, are typical examples). One might see these as
specific applications of the maxims of Quantity and Manner. In fact, many of the behaviors which
musicians specifically dislike in conductors can be traced to conductors' violations of these maxims,
in the general domain of 'overacting.' Let us consider one such case from one of our videos.
In this instance, the conductor is mouthing the words to the music (a large choral/orchestral work of
Mozart). This kind of gesture falls under the category of 'pantomime,' as it mimics the actions of the
chorus. In this instance the maxim of Relevance is being observed as far as the chorus goes, but this is
not the case with the orchestra. One might predict, then, that orchestra members would respond more
negatively than chorus members, because of their somewhat differing interpretations of Relevance.
However, even for the chorus, another maxim might be violated here: Quantity. Mouthing the words
to the chorus is almost always unnecessary (and universally derided by teachers of conducting,
although one sees this behavior even with very prominent conductors); it violates the injunction to
make contributions no more informative than necessary. There may be a related phenomenon at work
as well; there is research which indicates that speakers have a tendency to gesture, but not listeners.
When two participants in a conversation trade roles, the one who had been speaking typically stops
gesturing, only to resume again when s/he adopts the role of speaker again. Thus, for a conductor to
constantly engage in large amounts of gesturing, over and above that which is necessary, might be
interpreted as not sharing responsibility adequately with the orchestra.
To test these implicatures, we played this videotape for the same group of ensemble musicians and
asked them to describe their reaction to the conductor. As predicted by the violation of the maxims of
Quantity and Relevance, all members of the group indicated their dislike for the conductor's mouthing
of the words. When asked why they disliked this, there were two answers: for the singers it was
unnecessary, and for the instrumentalists it was unrelated to their parts.
Concluding remarks
Linguistic theories relating to the study of gesture co-occurring with speech, and to the
interpretation of incomplete or uncertain parts of dialogue, have been shown to be relevant to
understanding the ways in which conductors and ensembles communicate with each other. This paper
has focused on only a few aspects of this, with many other dimensions left for a more extensive
discussion. Our view is that the theories and methods developed in linguistics have much to offer the
study of musical behaviors, especially with regard to interactions between musicians.
References
Farberman, H. (1997) The Art of Conducting Technique. Miami: Warner Brothers.
Gazdar, G. (1979) Pragmatics: Implicature, Presupposition, and Logical Form. New York: Academic
Press.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan, eds., Syntax and Semantics, vol
3. New York: Academic Press, pp. 41-58.
Grice, H.P. (1978). Further notes on logic and conversation. In P. Cole, ed., Syntax and Semantics,
vol. 9. New York: Academic Press, pp. 113-128.
Lerdahl, F., and Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge,
Massachusetts: MIT Press.
Proceedings paper
INTRODUCTION
Charles Rosen's statement 'musicology is for musicians what ornithology is for the birds'
reflects how, in our culture, musical analysis and musical performance diverge. Analysis
tends to describe music as pure sound relations, whilst performance may be regarded as an
embodied, shareable, meaningful action-device: the two forms seem to result from very
different cognitive operations or realms. Consequently, it is not clear to what extent analytical
knowledge of large-scale structure is important for performers to shape their own
performances; or, generally, what role analytical thought has in the creative process of
shaping musical performances.
We also often hear people saying that a particular musical performance was 'very intelligent'
or 'insightful', but what precisely is meant? Ask a number of people the question, and a
divergence between what they say, imagine and think about music will emerge. This range
of individual differences presents the teacher with a great challenge: how to encourage,
increase knowledge, and assist development, yet allow an individual learner to preserve his or
her own ideas about music. Thus, within the educational setting, clear goals and objectives
are necessary. To achieve these goals, a discourse which reflects our experience of music is
required. Musical works need to be submitted, like any other 'signifying system' or 'text' in
every culture, to hermeneutic processes in order to be understood, and discourse is, of
course, a necessary tool, if not an indispensable one, to these hermeneutic processes.
The need for the hermeneutic becomes obvious when one considers that a score itself,
though having a huge variety of indications (from composer's notes accompanying the score
to the title and expression markings), does not provide the necessary acculturation for one to
be able to play stylistically - that is, to be authentic to both the cultural style and the
individual interpretative style. Yet, the indetermination of the score provides a reason why
the interpreter can and should be creative.
In the existing pedagogical literature, the processes involved in conceiving, rehearsing and
performing the expressive elements of a musical work are not given serious attention, with
the emphasis in method books being on technical and formal aspects of playing an
instrument. Anecdotal evidence from a variety of music lesson contexts illustrates that
interpretation is discussed, but is mainly regarded as an account of the notation which draws
upon a large stock of standardized expressive effects acquired through stylistic imitation of a
stereotype - either coming from the teacher, a particular 'school' of playing, or a famous
interpreter.
The teacher's job is to engage the student in creative reading, translation and construction
processes, and, thus, to oppose the general tendency to encourage technically focused
practice, which seems to be potentially constraining. I believe that an expressive and
communicative engagement with music from the very beginning of practice teaches the
students to think expressively and to play intentionally. In the current paper, a series of
practical investigations where cultural symbols were used as hermeneutic triggers and
explored as a means of stimulating creativity in the student will be examined. The basis of
the approach was that the student was guided to develop her/his own ideas about
interpretation, rather than depending upon imitative models. These investigations will be
triangulated with data collected from class observation, interviews with internationally
recognised performers, and recent developments in neuroscience. The main two arguments
of this paper are firstly, that music reveals the bodily origin of meaning, and secondly, that
the performer operates in the moment to make any particular performance more or less
meaningful to him/herself, co-performers and the audience.
In the case of music revealing the bodily origin of meaning, I propose that the focus of
discourse about music, at least in instrumental practice and teaching, should be taken out of
the usual context where either simplistic imitation or else forms of abstract theorizing are
used to convey how the interpretation should be achieved. As will hopefully become clear
from the examples, teaching should be focused on the creative use of action or movement
metaphors and/or expressive physical gestures to ground the understanding of musical
gesture. Musical meaning is inseparable from its physical, presential and temporal
experience.
The performer operates in the moment. Given the 'here-and-now' of the performance
context, the motor programming achieved through practice and experience (both physical
and psychological) and the mental state of the performer during the performance ritual, I
propose that the performer is most likely operating in a concatenationist fashion (cf.
Levinson, 1997). When the performer is grounding his or her performance in expressive
actions and gestures emerging from their individual stock of human traits and experiences,
they are creating the conditions to function, as exclusively as possible, from core
consciousness (cf. Damasio, 1999). During their training, performers should also be
prepared to deal with this physical/psychological state.
NOTE
For all the practical work described in this paper, a number of qualitative research
methodologies were drawn upon: participant observation (myself as teacher and co-worker
with ten students as they prepared for examinations over a period of six months); video
analysis (systematic and critical reflections on recorded lessons and performances
commentaries provided by students and teachers); semi-structured interviewing, with
interpretative phenomenological analysis being used as a framework for data analysis (see
Smith, 1995).
Instrumental class observation shows clearly how teachers communicate to their students
how they feel about the phrasing or the whole atmosphere of the piece: they use physical
gesture and body movement - especially for situations where expression has to do with
external/visible movement - and a wide range of metaphors, especially for situations where
expression has to do with 'inner motion' or 'inner reaction' (like fear, joy, excitement,
elevation, contraction, tenderness, tension build ups and relaxation, suspense or anger).
When rehearsing, performers make a decision, consciously or not, to adopt a particular
context or semantic field for the concerned musical work. It does not make any difference if
the context was chosen in the sequence of a complicated formal analysis or if it was just
suggested by free association. Inspired from the chosen context, performers use movement
images and action-metaphors in order to inject sounds with the right emotional content or, as
Trevarthen (1999) puts it, they use them to coordinate their emotional acting and its
channelling into an imagined narrative of purposes.
For each musical phrase or situation, they take the emotional content (that is, intrinsic
relations of movement and rest, speed and slowness, tension and relaxation, etc.) from the
metaphorical referent, as if abstracting it from the original context of experience in which it
was formerly integrated. When applying this emotional content to the musical sounds, they
make them expressive, but fairly abstract. Kendall Walton (1997) neatly describes this
process as creating 'a smile without a face'. The imagination of the active listener will work to
find a new face to that smile. It is in this sense that one may say that music reveals the
bodily origin of meaning.
From the observations and my own experimentation with students, it seems reasonable to
conclude that the students could easily focus on musical expression after negotiating the
musical material with their bodies, or, in other words, after embodying the musical meaning.
Suggested by the work-in-context, the action metaphors 'inspire' expressive actions and
gestures, which emerge out of our stock of human traits and experiences. To explore the
metaphors is to capture their physical qualities, their intrinsic relations of movement and rest,
speed and slowness, and simultaneously, to explore the flexibility of the musical material to
express these relations. The metaphors work as a way into processes of symbolic
activity, linking both cognitive and bodily structures. Musical communication seems to
happen when it has this bodily basis. To teach musical interpretation is to teach how to
reach within for one's deep bodily structured experiences. In this sense, it is absolutely right
to say that music develops self-knowledge, or, strictly speaking, it involves knowledge
beyond the boundaries of the self.
what he/she decided in the rehearsals. However, as soon as he/she starts to play, the pulse and the emotional variation of movement are back in place, with a strong feeling for the context
where the action happens, the 'here-and-now' of the performance. It is then that new
emotional variations may happen. Variations in emotional intensity, of course, but also
variations caused by the necessity of integrating new elements and factors which occur in
the real time of the performance.
Focused on the flux of emotions, performers, even if only for short periods of time, have
reported feelings of 'the self' being freed, feelings of acting spontaneously in a state of
euphoria, as though they are 'flying', 'taking off' or 'going with the flow', even going into a
form of trance, and so on (see Davidson, 1997, for further examples of such reports). 'The
self' could be described as flowing, to use Csikszentmihalyi's (1990) term. In this state: '... people typically feel strong, alert, in effortless control, unselfconscious and at the peak of their abilities'. Thus the performer gains new insights, becoming spontaneous and genuinely creative in the moment.
To perform is, perhaps paradoxically, not so much to reproduce what was memorised in
rehearsal but to re-live - here-and-now - the devised emotional narrative. When performing,
the performer is both free to be surprised by what comes from the embodied automatic actions assimilated in rehearsal, and free to react spontaneously to any new happenings coming not only from the outward context of the performance but also from its inward context:
"The unconsciously generated emotions and motives are integrated with the discriminating
and strategic operations of consciousness, memory and skill, modulating them. Musicality
implicates and expresses both 'ergic' and 'trophic' representations in the moving mind or
spirit" (Wallin, 1991, quoted in Trevarthen, 1999: 165)
So it can happen that conscious and unconscious reactions to the 'here-and-now' context
take over, and that is when performers experience 'becoming' - a type of experience
described by Deleuze and Guattari (1980): their sense of self is suspended, and they open
themselves to the ground (to their multiple stocks of impressions, emotions, body motions,
etc.) with an acute sense of the here-and-now. Their accounts of time vanishing, of not being in control, of becoming one with the sound, and their dream-state comparisons are all clear signals of the process of self-effacement and the consequent disclosure in their grounded 'becoming'.
"I lost the feeling that I was controlling what was happening.... I became a single
sound and body moving. Curiously, the feeling of time passing vanished, and
there was no moving ahead. It was like the same instant was presenting itself
from many different perspectives. The performance moment was an action
without phases, and I was listening to the music but from a very, very distant
place within myself. Recalling what was going on in my mind during that
performance is as frustrating as trying to recall a dream from which you have just
awoken." Flautist reporting a concert experience...
Therefore, 'becoming' is an act of intimate disclosure. Along this line, becoming takes place in music performances where the musicians take on the role of operators of the moment: expressive actions and gestures emerge that are based directly on their stock of human traits and experiences, which might be considered a ground beyond personality or self-driven concerns.
"... for me, the great concert is a concert where I no longer remember what I did... I took off... it happens when you are very centred... I manage to bring it about every time I manage to kill my ego..." (Patrick Gallois, quote taken from an interview, translated from the French)
CONCLUSION
Thus, a meaningful musical performance is one which is grounded in, and reveals, its bodily origins. What seems really decisive in performance is the gratifying and convincing experience of 'becoming'. If the discourse about music, at least in instrumental practice and teaching, focuses on the creative use of action or movement metaphors and/or expressive physical gestures to ground the musical gestures, then students are prepared both to develop their own ideas about the works they are playing and to deal with the physical/psychological state of performance, thereby creating the conditions for experiencing 'becoming'.
REFERENCES
Barthes, R. (1977). Image-Music-Text (trans. S. Heath). London: Fontana Paperbacks.
Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. New York: HarperCollins.
Damasio, A. (1999). The Feeling of What Happens. Orlando: Harcourt Brace.
Davidson, J.W. (1997). The social in musical performance. In D.J. Hargreaves and A.C. North (eds), The Social Psychology of Music (pp. 209-228). Oxford: Oxford University Press.
Deleuze, G. and Guattari, F. (1980). Mille Plateaux. Paris: Les Éditions de Minuit.
Johnson, M. (1987). The Body in the Mind. Chicago: University of Chicago Press.
Kivy, P. (1989). The Corded Shell. Philadelphia: Temple University Press.
Smith, J.A. (1998). Doing interpretative phenomenological analysis. In M. Murray and K. Chamberlain (eds), Qualitative Health Psychology: Theories and Methods. London: Sage.
Swanwick, K. (1999). Teaching Music Musically. London: Routledge.
Trevarthen, C. (1999-2000). Musicality and the intrinsic motive pulse: evidence from human psychobiology and infant communication. Musicae Scientiae, Special Issue, 155-215.
Walton, K. (1997). Listening with imagination: is music representational? In J. Robinson (ed.), Music and Meaning. Ithaca and London: Cornell University Press.
Proceedings paper
INTRODUCTION
The analysis of post-tonal music presents problems different from those of tonal music; various procedures and methods have been used to tackle them, both from the point of view of music theory and analysis (Forte, Lerdahl, Narmour, Hasty) and from that of the psychology of perception (Meyer, Imberty, Deliège).
The study we present here is part of the research proposed by the "Gruppo di Analisi e Teoria Musicale" (GATM), a group whose aim is to study
common procedures for the analysis of twentieth century post-tonal music. The group has recently launched a project to investigate the "macroform"
of such music. In this context the term "macroform" is used to indicate a higher system of segmentations which in turn contains segmentations of a
lower order (bibl.). It is necessary first of all to establish a very clear distinction between the concepts of "segmentation" and "macroform":
1. The term segmentation is used to indicate the exact point where two sections are separated: we are therefore dealing with a local
phenomenon, brought about by the presence of a contrast or discontinuity that involves one or more parameters of the musical material
(duration, dynamics, timbre, density, register, etc.);
2. The term macroform is used to indicate the result of a process of memorisation based on the division of a musical text into parts, each having
structural coherence and homogeneity. Such a division is not necessarily caused by the hierarchy of the segmentations: a strong segmentation
does not always, in fact, produce a division into parts; in the same way, it may happen that a division into parts does not coincide with a point
of strong segmentation.
One of the most important methods of research used by the group was the experimental study of perceived answers. The analysis of the results of
listening tests is an important tool for the creation of a theory about macroforms, since in this way the investigation does not simply follow the
already extensively trodden paths concerning the study of the rules of composition, but attempts to tackle the varied and complex problems involved
in actually listening to post-tonal music. In the post-tonal repertory the Nattiezian distinction between the poietic dimension and the aesthesic
dimension takes on a particularly important significance, as a great deal of the difficulties in comprehending this type of music arise from this point.
Generally speaking, the approach taken in most musicological literature tends to be one that favours the study of compositional techniques as
opposed to the analysis of listening strategies. It concentrates more on "what the composer did" than "what comes out of listening to his music". The
file:///g|/Sun/Addessi.htm (1 of 14) [18/07/2000 00:28:14]
Addessi
project set up by the GATM study group, on the other hand, aims to investigate both aspects at the same time, through a research method based on the analysis of perceived answers. In this context the results obtained from studies already performed on perceptive
analysis were an important point of reference, especially those carried out by Michel Imberty (1981, 1987) and Irène Deliège (Deliège 1989;
Deliège and El Ahmadi 1990; Deliège and Mélen 1997).
As far as the repertory is concerned, the investigation has concentrated mainly on string quartets. The choice to work on a timbrally homogeneous
repertory was dictated by the need to limit the number of variables, given the great variety of styles present within post-tonal music. So far analysis
(analysis of perception/ analysis of compositional techniques) has been made of the following pieces: A. Webern, String Quartet op. 5, first
movement; D. Milhaud, String Quartet, first movement; B. Maderna, String Quartet (1942) (Addessi and Caterina 2000).
[Fragment of the analysts' macroform proposals: Macroform 2, Part I: bars 1-11 (sec. 0'00"-27"/30")]
EXPERIMENTAL STUDY
Method
Participants: 43 students took part in the experiment: 25 non-musicians (university students) and 18 musicians (conservatory graduates and conservatory teachers).
Materials: G. Kurtág, Quartet Op. 1, V movement (1959), duration 2'02" (CD WDR Auvidis Montagne MO 789007)
Equipment: EPM programme (this computer programme was devised at the University of Padua by Christian Temporali, and allows each subject to
indicate, by clicking on the mouse, the segmentations perceived while listening in real time), paper questionnaire.
Experimental procedure: each participant was given a questionnaire which, in addition to the written answers, involved three tasks to be carried out using the EPM programme. The tasks were as follows: 1. Listen to the piece to become familiar with it; 2. Listen to the piece and, while listening, record on the computer all the points of separation perceived; 3. Repeat the previous test, without worrying about any possible differences in the answers; 4. Listen to the piece and indicate the main sections perceived on a line drawn on one sheet of the questionnaire; 5. Listen to the piece again and indicate, using the EPM programme, the main sections perceived; 6. Describe in words the characteristics of the sections indicated; 7. Indicate, choosing from a series of possibilities, which element or elements influenced the division of the piece into parts.
Experimental hypotheses
The operative hypotheses we will deal with in this paper regard test number 5, which involves the perception of the macroform (division into main
sections):
1. The macroform of the piece listened to which obtains the highest frequency of replies will correspond to one of the three macroforms
proposed by the analytical group.
2. The results obtained from the subjects who are musicians will have a significantly higher correspondence to the analytical proposals than those obtained from the non-musicians.
Seconds: 1/7 | 8/13 | 14/25 | 26 | 27/34 | 35 | 36/41 | 42/46 | 47/54 | 55/56 | 57/65 | 66/72 | 73/82 | 83/92 | 93/100 | 101 | 102/108 | 109/113 | 114/end
Bars: 1-3 | 4-5 | 6-10(3) | 10(4) | 11-13(3) | 13(4) | 14-16(3) | 16(4)-18(3) | 18(4)-20 | 21 | 22-25 | 25(2)-27 | 28-30 | 31-33 | 34-36 | 37(1)-37(3) | 37(4)-39 | 40-41(3) | 41(4)-end
Range: A | B | C | D | E | F | G | H | I | L | M | N | O | P | Q | R | S | T | U
Participants (musicians)
1 B E G O S
2 E G I M O P S
3 B E G O S
4 B C E G I L M P Q S T U
5 E M O S
6 E M O
7 G O
8 E O S
9 E G I M O S
10 E I O P
11 E G I O
12 A B E G I
13 E
14
15 E P
16 A B E I O P S T U
17 A E G I M O
18 G M O
Participants (non-musicians)
1 C I P T
3 B E G I L O
4 E I O
5 E Q
6 E O T
7 E G M O S
8 B E O
9 G O
10 E O S
11 E G O T
13 B E H O P T U
14 E G O P
16 E I S
17 E I O
18 E G I O S
19 C E G
20 B E I O P Q T U
21 G O S
22 E I O S U
23 E G O P
24 B C E M O S T U
25 B C E G O S T U
26 C E I M O P Q S T U
27 A E I O
28 E I Q
Analysts' macroforms
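The correspondence between the seconds categories and the letter labels used in the table above can be sketched as a small lookup function (an illustrative reconstruction from the table header, not part of the study's software; treating click times as whole seconds is an assumption of this sketch; categories J and K do not appear in the table):

```python
# Seconds-range -> letter-category mapping reconstructed from the table header.
BOUNDS = [
    (1, 7, "A"), (8, 13, "B"), (14, 25, "C"), (26, 26, "D"), (27, 34, "E"),
    (35, 35, "F"), (36, 41, "G"), (42, 46, "H"), (47, 54, "I"), (55, 56, "L"),
    (57, 65, "M"), (66, 72, "N"), (73, 82, "O"), (83, 92, "P"), (93, 100, "Q"),
    (101, 101, "R"), (102, 108, "S"), (109, 113, "T"),
]

def category(seconds):
    """Return the letter category for a segmentation click at `seconds`."""
    for low, high, label in BOUNDS:
        if low <= seconds <= high:
            return label
    return "U" if seconds >= 114 else None  # U = 114 s to the end of the piece
```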
We elaborated an empirical measure, or index, to check how near or far the perceived macroforms (those of the subjects) were from those proposed by the analysts. In each perceived macroform we checked all the points of subsegmentation and compared these points with those in the analysts' macroforms. We adopted the following formula: I = (c*100)/A, where c is the number of points coinciding with the points in the analysts' macroforms (macroform 1, macroform 2 and macroform 3) and A is the number of all the segmentation points indicated by the participant. If A is smaller than the number of proposed points in the analysts' macroforms, then I is not computed.
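Expressed as code, the index might look like this (a minimal sketch; representing each macroform as a set of letter categories, and the function name, are assumptions of this illustration, not the authors' implementation):

```python
def correspondence_index(perceived, analyst):
    """Empirical index I = (c * 100) / A, where c is the number of a
    participant's segmentation points that coincide with points in an
    analyst's macroform, and A is the total number of points the
    participant indicated. Returns None when A is smaller than the
    number of proposed points, the case in which I is not computed."""
    perceived, analyst = set(perceived), set(analyst)
    A = len(perceived)
    if A < len(analyst):
        return None  # I is not computed in this case
    c = len(perceived & analyst)
    return c * 100 / A

# Illustrative values only, using letter categories as in Table 1:
print(correspondence_index({"E", "G", "O", "S"}, {"E", "O"}))  # 50.0
```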
Table 2. Correspondence of macroforms proposed by the musical analysts and macroforms perceived by our participants

Participants            N     Mean Index of Correspondence    Std. Deviation
Musicians
  Macroform 1          16     28.9583                         15.7894
  Macroform 2          14     48.1548                         18.3280
  Macroform 3          12     47.8472                         18.5948
It can be seen from Table 2 that macroform n. 2 (I = 48.1548, musicians; I = 52.4127, non-musicians) is the one closest to the perceived macroforms in musicians as well as in non-musicians. The results for macroform n. 3 are slightly lower (I = 47.8472, musicians; I = 43.2494, non-musicians), while macroform n. 1 is clearly the furthest from the perceived macroforms (I = 28.9583, musicians; I = 32.4009, non-musicians). The results obtained for macroform n. 1 can be explained if we consider that many participants did not indicate the 47/54-seconds category as a point of segmentation. There are no significant differences in the I values between musicians and non-musicians according to the Mann-Whitney test:
therefore our second experimental hypothesis, which predicted that musicians would give answers closer to the hypotheses made by the analysts,
has not been verified.
Table 1 also shows other aspects of the perception of macroforms that are quite interesting, especially considering the results that are clearly far from the analysts' predictions.
For example, one of the points of segmentation chosen by many subjects, concerning bars 14-16 (36/41 seconds category), was not considered
suitable by the analysts. Our participants did, in fact, perceive bars 12-14 as a section; these bars, on account of their brevity, had been considered
by the analysts as belonging to the following section of the piece, even though they are quite well discernible due to the contrast between "ostinato"
and "non-ostinato". The rhetorical function of a "bridge" may also have been attributed to these bars, as had been done by the analysts. Our results show that the internal characteristics of the sections, and therefore their homogeneity and contrast with the neighbouring sections, seem to be more important for the listener than the duration of each section as far as the macroform is concerned: in any case, listeners seem to prefer such elements when they are given a choice. In this sense the analysts' basic hypothesis, which proposed that the macroform of Kurtág's passage depends on the presence or absence of the "ostinato" (contrast between "ostinato" and "non-ostinato") and on the content homogeneity of the individual sections, predicted that these aspects would also operate in the minds of the subjects and thus be reflected in their answers and macroforms. The results seem in this case to support the idea that, although the section is rather small, the listeners were able to memorize not the presence of a cue (Deliège), represented by the "ostinato", but rather its absence. Furthermore, the somewhat brief duration of the section could have allowed the listeners to perceive the rhetorical function of these bars as a "bridge", as had been hypothesized by the analysts.
A similar case can be observed at bar 25(2), where macroform n. 3 proposes the beginning of a new section (bars 25(2)-28), characterized by the suspension of the "ostinato". In this case our subjects did not indicate the beginning of the section (in fact there are no answers in this category): they simply stopped indicating the repetitions of the "ostinato", as they had done up to the preceding bar (see the answers in the preceding category). The total absence of indications in these bars, where the "ostinato" disappears, compared to those given in the preceding bars, which coincide with the entrances of a very compact series of "ostinati", tells us that the subjects perceived these bars as different from the previous and following ones, even though they did not indicate them in particular and did not consider them a section of the passage, simply because of the lack of "ostinati".
Many participants marked the beginning of a new section in the 83/92 seconds category, at bars 31-33, whereas none of the three analysts'
macroforms regarded these bars as segmentation points. The analysts' macroforms give the beginning of a new section at the end of bar 29, where,
after the conclusion of the preceding section with notes of long duration, an "ostinato" which had already been proposed in preceding sections
returns. Perhaps many listeners wanted to wait and be sure that the "ostinato" was not there "by chance", but was actually the first element of a
series characterizing a new section. Only when the "ostinato" had been repeated twice did the listeners register the beginning of a new section. This
event has been studied by Deliège and El Ahmadi (1990), who hypothesized the existence of a lapse of time during which listeners must decide if
the new cue that allows them to identify a new section will come again or will be left out (the "tiling" zones). The tendency towards the perception of segmentation points within the 83-92 seconds category, and not in the previous one, is still more evident in tests n. 2 and n. 3 (which will not be discussed in this paper), where there are more answers in the 83-92 seconds category than in the previous one. It should be borne in mind that in tests n. 2 and n. 3 the subjects had listened to the piece fewer times than in test n. 5. This leads us to conclude that most of the subjects are able to anticipate the beginning of the section at bar 29, without waiting for confirmation of the presence of the
"ostinato", after many listenings (6) and therefore only after the memorization of the passage.
Finally, another difference between the macroforms of the analysts and those of the listeners can be found in the sections indicated in the final bars (102/108 seconds, bars 37(4)-39; 109/113 seconds, bars 40-41(3); 114/end, bars 41(4)-end). Following the rule of the "ostinati", the
analysts decided on a single section from bar 29 (3) until the end of the piece, since the same "ostinato" is repeated eight times and each time is
clearly recognizable. In the final bars, however, the "ostinato" is interrupted by long pauses which may have induced the subjects, both musicians
and non-musicians, to identify sections. The presence of these pauses certainly creates a strong case for segmentation: one may wonder, however, if
the pauses by themselves are able to create these sections or whether their contextual rhetorical function of closing the piece is somehow involved.
Besides, the pauses may also act as an element of variation in the repetition of the "ostinato", introducing discontinuity that may have led to the
perception of the segmentations. These results support Deliège's studies on the relationship between variant and invariant elements in the
memorization of a heard musical passage (Deliège and Mélen 1997).
CONCLUSIONS
The three macroforms hypothesized by the analysts represented the basic framework for many of the macroforms perceived by the listeners: the
second macroform in particular represented a kind of macroform prototype that was perceived, with certain variations, by all listeners. A
macroform prototype is, therefore, at the basis of many macroforms actually perceived by the listeners. The contrast between "ostinato" and
"non-ostinato" seems to have been the principal criterion for subdividing the piece into different parts for both listeners and analysts, thus bringing
about the tendency towards the macroform prototype proposed by the analysts. In this sense, the presence or absence of an "ostinato" (even though
not always the same "ostinato") may have acted as a recognition "cue" during the memorization of the macroform of a heard musical passage
(Deliège).
The variations observed in the actually perceived macroforms seem to depend on a series of preferential choices that the subjects made by applying
in a more or less consistent manner the criteria of repetition (indicating a section for each repetition of the same "ostinato" or only the first time that
the "ostinato" presented itself), the rule of difference-sameness (such as in the case of the pauses inserted in the last bars among the repetitions of the
same "ostinato") and the rhetorical rule of the bridge and of conclusion.
The differences between the two groups of subjects, musicians and non-musicians, are not significant. This result is in line with the findings of our study group, as well as with the results of research by Deliège and Imberty. Above all, it would seem to support the hypothesis that, at some levels of analysis (macroform), the competences possessed by musicians do not affect the memorization of a musical passage and the perception of a macroform. We could observe, however, that the non-musicians occasionally gave solutions closer to the analysts' macroforms, particularly to macroform n. 2. Therefore, although there are no significant differences between the two groups of participants, the tendency that emerges from the answers leads us to suppose that the criteria used by the analysts in hypothesizing their three macroforms are nearer to a perceptive analysis than to an analysis of the musical score. However, at least the initial presupposition of the analyses has been respected: i.e., that the musical scores would be able to offer us clues explaining (with the help of data inferred from the analysis of the structures) the reasons for the segmentations and the divisions into parts proposed by the listeners.
References
Addessi, A. R. and Caterina, R. (2000). Perceptual musical analysis: segmentation and perception of tension. Musicæ Scientiæ, in print.
Bigand, E., Parncutt R. and Lerdahl F. (1996). Perception of musical tension in short chord sequences: the influence of harmonic function,
sensory dissonance, horizontal motion, and musical training. Perception and Psychophysics, 58/1, 125-41.
Cross, I. (1998). Music analysis and music perception. Music Analysis, 17 (2), 3-20.
Deliège, I. (1989). A perceptual approach to contemporary musical forms. In S. McAdams and I. Deliège (eds), Music and Cognitive
Sciences. Contemporary Music Review, 4, 213-230.
Deliège, I. and El Ahmadi, A. (1990). Mechanisms of cue extraction in musical groupings: A study of perception on Sequenza VI for viola
solo by L. Berio. Psychology of Music, 18 (1), 18-44
Deliège, I., Mélen, M., Stammers, D. and Cross, I. (1996). Musical schemata in real-time listening to a piece of music, Music Perception, 14
(2), 117-160.
Deliège, I., Mélen, M. (1997). Cue abstraction in the representation of musical form. In I. Deliège and J. Sloboda (eds), Perception and
Cognition of Music (pp. 387-412). Hove: Psychology Press.
Dibben, N. (1999). The perception of structural stability in atonal music: The influence of salience, stability, horizontal motion, pitch
commonality, and dissonance. Music Perception, 16 (3), 265-294.
Imberty, M. (1981). Les écritures du temps. Sémantique psychologique de la musique (tome 2). (Le scritture del tempo. Milano:
Ricordi-Unicopli, 1990). Paris: Bordas.
Imberty, M. (1987). "L'occhio e l'orecchio. Sequenza III di Berio". In L. Marconi and G. Stefani (eds), Il senso in musica (pp. 163-186).
Bologna: CLUEB.
Imberty M. (1993). "Teorie musicali e teorie della memoria". In M. Baroni, M. Imberty and G. Porzionato, Memoria musicale e valori sociali,
«Quaderni della SIEM», 4, Milano: Ricordi.
Lerdahl, F. (1989). Structure de prolongation dans l'atonalité. In S. McAdams and I. Deliège (eds), La musique et les sciences cognitives (pp. 103-135). Bruxelles: Mardaga.
Meyer, L. B. (1996). Commentary. Music Perception, 13/3, 455-84.
Krumhansl, C. L. (1996). A perceptual analysis of Mozart's Piano Sonata K 282: Segmentation, tension and musical ideas. Music Perception,
13/3, 401-32.
Proceedings paper
Introduction
Practising is an all-important part of instrumental study. Does this practising require some sort of planning? And does practice planning improve the instrumental achievement of students in higher instrumental study? If we consider practising an activity directed by aims, it is highly relevant to ask how students plan and what effects planning has on achievement.
It is important to distinguish between at least two levels, or domains, of planning. One is the planning inherent in the formulation of performance aims and means during practising, and the development of mental representations for performance. This is a domain that has attracted growing research interest (see Sloboda (1982) and Gabrielsson (1999) for overviews of research, and Nielsen (1999) and Sullivan and Cantwell (1999) for recent examples).
The other planning domain, on which this study is based, is the overall planning of practice activities. By this I mean questions such as how students co-ordinate practice sessions in relation to other study activities, when and how they plan their practice activities, why they plan, etc. These are more global features of practice planning, and they correspond to the planning activities carried out by teachers. That is why research on teachers' planning activities, and on how teachers think about planning, is relevant to this project. There is, however, no research relating teachers' «achievements» to their planning behaviour. The only field with research questions comparable to mine is the study of time management behaviours among college and university students. I will return to these in my concluding discussion.
Research on this global aspect of instrumental practice planning is mostly neglected. Two previous reports from the research project presented here have concentrated on different types of planning behaviour among higher instrumental students (Jørgensen, 1997a) and on their time perspective in planning (Jørgensen, 1997b). For professional musicians, there is a study by Hallam (1997) in which the organisation of practice forms part of the investigation. There is, however, no previous study of the relationship between global aspects of practice planning and achievement.
The study
The participants were students at an Academy of Music, in its four-year undergraduate program. They were enrolled in the instrumental, vocal and church music institutes. Planning behaviour was registered through a questionnaire. All questions related to a «normal» study week or study period, excluding periods where examinations etc. may disturb the regular, usual type of behaviour. Instrumental achievement was measured as the instrumental performance grade on the major instrument in the 2nd and 4th (final) year of study.
Grades are given on a five-point scale, where 1 is best and 5 is «fail». An examination concert is the context for giving the grade. All except one student in this study got a grade on one of the three highest levels. This leaves us, for all practical purposes, with three grade groups: the «excellent» (1), the «very good» (2), and the «acceptable» (3).
My research questions are:
1. Do students in different grade groups differ in their co-ordination of practice sessions with other
study activities?
2. Do students in different grade groups differ in when they carry out planning activities?
3. Do students in different grade groups differ in their time perspective in planning practising on
repertoire and technical exercises?
4. Do students in different grade groups differ in respect to systematic planning?
The study has been carried out over several years, and some research questions have been replicated.
Results
Coordination of practice with other study activities
On busy study days and weeks, the students usually attend several classes and rehearsals. Most of these activities are on their weekly schedule, with fixed times for each, while some are not, being organised more ad hoc. This leaves the time between the scheduled activities for practising, and the students have to coordinate their practice sessions in relation to their other study activities. The question to the students concerned the time perspective of this coordination: Did they include practice sessions in their week-plan? Did they coordinate practice sessions in relation to other activities at the beginning of each study day, taking one day at a time? Or did they fit them in during the course of the day, without any previous planning? When answering, they had to choose the alternative that best fitted their own behaviour. This posed a problem for some, who commented that their study weeks were so different that they used all three alternatives over a period of time. The answers will accordingly reflect a form of forced choice for some students, but I do not consider this a serious threat to the validity of the question.
Results from this part of the study are from 1991. The dominant type of coordination behaviour, for students from all three institutes and all four study years, is to plan the coordination at the beginning of each study day, taking one day at a time. This was the case for 58% of the instrumental students (N=78), 46% of the vocal students (N=11) and 41% of the organ (church music) students (N=17).
For students in their 2nd and 4th study year, a proportion of 10-15% in all three grade groups coordinated their practice sessions by including them in their week-plan. There is, however, a difference between students with the lowest grade (3) and those with the two highest grades (1 and 2): among the former, more than 50% fit in their practice sessions during the course of the day, while only 24-29% of the latter do so. The tendency, accordingly, is that the majority of students in the two highest grade groups coordinate their practice one day at a time, while the majority of students in grade group 3 fit in their practice sessions during the day, with no previous planning. The differences are not statistically significant by chi square (2nd study year: N=27, chi square=5.682, df=6, p=0.460; 4th year: N=23, chi square=4.246, df=4, p=0.374). My conclusion is that:
● Students in different grade groups did not differ significantly in practice coordination activity,
with the exception of a tendency for students in the lowest grade group to fit in practising
during the day, without previous planning.
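As an illustration of the kind of test reported above, the Pearson chi-square statistic for a grade-group by coordination-type contingency table can be computed as follows (the counts shown are invented for illustration, not the study's data; in practice a library routine such as scipy.stats.chi2_contingency would normally be used):

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Invented counts: 3 grade groups (rows) x 3 coordination types (columns)
stat, df = chi_square([[4, 9, 3], [3, 7, 2], [1, 2, 5]])
```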
Time period for planning
When do students plan? Planning can be carried out at quite different points in time: before, during, and after practising. Are there differences between students in different grade groups in their utilisation of these time periods for planning? The five periods I concentrated on were «before a practice day», «at the beginning of a practice session», «during practising», «shortly after practising», and «between practice days and practice sessions». The students were asked whether they «always», «often», «sometimes», «seldom» or «never» used each of these time periods for planning.
This part of the study is based on information from the students in 1995 and 1996. For all students in
the three institutes (N=109), 55% planned «always» or «often» before a practice day in 1995, with
28% «sometimes» and 17% «seldom» or «never». The distribution for planning in the beginning of a
practice session was 76% with «always» or «often», 13% «sometimes» and 11% «seldom» or
«never». 50% planned «always» or «often» during practising, with 30% «sometimes» and 21%
«seldom» or «never». Shortly after practising 24% planned «always» or «often», with 31%
«sometimes» and 45% «seldom» or «never». And between practice days and practice sessions 34%
planned «always» or «often», 33% «sometimes» and 34% «seldom» or «never». We can see that the
different time periods had different popularity among the students, with planning immediately
before a practice session as the most popular period for planning, and the period immediately after
practice sessions as the least popular. The distribution in 1996 was very similar.
The analysis was carried out with information from the two different student populations (1995 and
1996), each of them with students getting a grade in their 2nd study year, and others getting a grade in
their 4th and final year. This established four groups for analysis for each of the five practice
behaviours. Based on chi square analysis of differences between the grade groups in each of the four
analysis groups, I got the following results (with p-values from the four groups):
Students in the three grade groups did not differ significantly in their tendency to plan:
● Before a practice day (1995: p=0.551 (2nd year, N=28), p=0.617 (4th year, N=27); 1996:
p=0.069 (2nd year, N=15), p=0.174 (4th year, N=24))
● In the beginning of a practice session (1995: p=0.611 (2nd year), p=0.281 (4th year); 1996:
p=0.760 (2nd year), p=0.333 (4th year))
● During practising (1995: p=0.463 (2nd year), p=0.567 (4th year); 1996: p=0.216 (2nd year),
p=0.332 (4th year))
● Shortly after practising (1995: p=0.356 (2nd year), p=0.373 (4th year); 1996: p=0.933 (2nd year),
p=0.295 (4th year))
● Between practice days and sessions (1995: p=0.258 (2nd year), p=0.920 (4th year); 1996:
p=0.947 (2nd year), p=0.892 (4th year))
Systematic planning
The students were also asked: «Do you regard yourself as a person who uses practice planning in a
systematic way?». The answers were approximately normally distributed, with 5% regarding
themselves as «very systematic planners», 20% as «very systematic to average systematic», 50% as
«average systematic», 18% as «average systematic to very unsystematic», and 7% as «very
unsystematic planners». My research question was now: Do students in different grade groups differ
with respect to systematic planning?
Looking at the three different grade groups in each of the four analysis groups (see above), the main
conclusion is that:
● there is no significant difference between grade groups in their evaluation of their own use of
planning in a more or less systematic way (p=0.561, 0.510, 0.235 and 0.291 for, respectively,
year 2 and 4 in 1995, and year 2 and 4 in 1996)
Discussion
The main result may seem surprising: There seems to be no systematic and statistically significant
difference between students with different grades regarding several types of practice planning
behaviour.
Since this result is from an exploratory project in a field with no previous research, we have to look at
research on other students' planning behaviour, outside music, for comparison and discussion. Even
here the research activity is very limited, but there are some studies of students' time management
behaviours and their academic performance. Macan et al. (1990) developed a «Time Management
Behavior Scale», based on «tips, ideas, and techniques repeated throughout several how-to books on
time management» (op.cit. 761). When they related the students' grade point average to the overall
score on the scale, the correlation was 0.23. Correlations between grade point average and the four
factors were: «Setting goals and priorities», 0.10; «Mechanics - Planning - Scheduling», 0.20;
«Perceived control of time», 0.22; and «Preference for disorganization», 0.17. All correlations are
positive, indicating a positive relationship between certain types of planning behaviour and academic
achievement. The values are, however, so small that their main message is that this relationship is
negligible.
Britton and Tesser (1991) developed a time-management questionnaire with 35 items, each answered
on a 5-point scale. Their theory was derived from research on computer operating systems, and based
on the supposition that the information-processing resources of college students are managed by some
mental system analogous to the time-management component of a computer's operating system. They
used «cumulative grade point averages» over all four college years as a dependent measure, and
identified three factors in the questionnaire. The correlations between grade points and the three
factors were: for «Short-range planning», 0.25; for «Time attitudes», 0.39; and for «Long-range
planning», -0.10. The first and last of these correlations are negligibly low, while the «time attitudes»
factor shows a positive and sufficiently high correlation to be of interest. I will return to this factor.
A third study, by Trueman and Hartley (1996), is also relevant for my discussion. They used a
shortened version of Britton and Tesser's scale on students in psychology in a British university. The
correlation between academic performance (on first year examination) and the whole scale was 0.15,
between academic performance and «Daily plan» it was 0.04; and between academic performance and
«Confidence in long-term planning» it was 0.19. All of these correlations are negligibly small.
These three research efforts from study contexts other than instrumental music, in my view, support
my own conclusion: There is no general and systematic relationship between certain types of planning
activity and academic achievement. Even if there are several limitations in my research and in the
three reported time-management studies, both in the measures of dependent and independent
variables and in the possible neglect of important aspects of planning, the low correlations in the
time-management studies among college and university students, together with the non-significant
differences between achievement groups in practice planning behaviour among the instrumental
students, suggest that there is no general and strong relationship between planning and achievement
that is relevant for all students. The most important result from the time management studies is the suggestion given by the
«Time attitudes» factor in the Britton and Tesser study. This factor suggests that for many of the
students, it is more important how they experience their own control (or lack of control) over their
study time, than how they manage and plan the distribution and use of this time.
References
Britton, B.K. and Tesser, A. (1991). Effects of time-management practices on college grades. Journal
of Educational Psychology, 83, 405-410.
Gabrielsson, A. (1999). Music Performance. In: Deutsch, D. (Ed.), The Psychology of Music. 2nd ed.
San Diego: Academic Press.
Hallam, S. (1997). Approaches to instrumental music practice of experts and novices: Implications for
education. In: Jørgensen H. and Lehmann A. C., (Eds.). Does practice make perfect? Current theory
and research on instrumental music practice, pp. 89-107. Oslo, Norway: Norges musikkhøgskole.
Jørgensen, H. (1997a). Higher instrumental students' planning of practice. In: Proceedings, Third
Triennial ESCOM Conference, pp. 171-176. Uppsala, Sweden, 7-12 June 1997.
Jørgensen, H. (1997b). Higher level students' time perspective in planning instrumental and vocal
practising. In: Proceedings, IV International Symposium of RAIME, pp. 52-61. Dundee: Northern
College.
Macan, T.H., Shahani, C., Dipboye, R.L. and Phillips, A.P. (1990). College students' time management:
Correlations with academic performance and stress. Journal of Educational Psychology, 82, 760-768.
Nielsen, S. (1999). Regulation of learning strategies during practice: A case study of a single church
organ student preparing a particular work for a concert performance. Psychology of Music, 27,
218-229.
Sloboda, J.A. (1982). Music Performance. In: Deutsch, D. (Ed.), The Psychology of Music. New
York: Academic Press.
Sullivan, Y.M. and Cantwell, R.H. (1999). The planning behaviours of musicians engaging traditional
and non-traditional scores. Psychology of Music, 27, 245-266.
Trueman, M. and Hartley, J. (1996). A comparison between the time-management skills and academic
performance of mature and traditional-entry university students. Higher Education, 32, 199-215.
Proceedings paper
Introduction
There is a substantial amount of psychological research investigating human "musical" timing. This
research has been carried out for more than a century, and most of it adopts an experimental approach.
Examples of such research areas are: frequency regions in relation to perception and performance of
beat; synchronization abilities; perception and performance of changes in tempo; expert behaviour
compared to untrained; differences between perception and performance; personal spontaneous
tempo; developmental aspects; and effects of different manipulations such as rhythm training and
education, and even administering whiskey to the subjects (Harrel, 1937).
Some of the earliest research efforts in this field focused on stability in different tempo regions and
on synchronization (Stevens, 1886; Dunlap, 1910; Harrel, 1937). One main finding is that there seems
to exist a range of frequencies that is consistently regarded as easy to perceive, easy to perform, and
experienced as a possible foundation for the "experienced beat". This frequency span is typically found
approximately between 60 and 120 bpm, sometimes higher. The centre frequency, around 80 bpm, is
often regarded as the best suited for these different tasks (Stevens, 1886; Brown, 1979; Fraisse, 1982;
Grieshaber, 1987; Duke, 1989).
Previous research has also suggested a concept called "spontaneous tempo", "personal tempo", or
"mental tempo": a voluntary tempo that is characteristic of the individual (Fraisse, 1982).
In such research it has been argued that different individuals have their own typical and specific
spontaneous tempo, which can be expected to be relatively invariant over time.
Earlier research, unfortunately, has for the most part focused on adults (Pouthas, 1996) and is most
often aimed at producing results on a general level.
In practising music and in music education it is usually assumed that children in a class or a group
can experience the tempo, or the pulse, in such a way that they can act together upon that experience in
musical activities, e.g. singing, playing and dancing.
Tempo is of vital importance in both the experience of music and in the performance of it.
Synchronization is of vital importance in anticipating musical events and the ability to make music,
alone or especially together with others. Individuals differ, however, quite markedly in their
synchronization to the music played.
Practically all research on the subject also confirms this individual variability in synchronization. The
timing precision, nevertheless, is generally found to be very high, in the range of tens of
milliseconds (e.g. Dunlap, 1910; Bartlett & Bartlett, 1959; Keele & Ivry, 1988; Franek, Mates, Radil,
Beck & Pöppel, 1991). Fraisse (1982) states that synchronizing is also possible in cases of more
complex rhythms and in cases of accelerating and decelerating sounds, even if this diminishes the
timing precision.
The present investigation focuses on these two major aspects of time in music, tempo and
synchronization.
The purpose is to investigate the concepts of spontaneous tempo and synchronization to regular
external stimuli, to see if children exhibit individually stable ways of handling these concepts and how
these behaviours may change over time. The concept of synchronization is investigated in two ways.
The first is synchronization to a steady tempo and the second is synchronization to a slowly changing
tempo, which was intended to show how synchronization is performed over a greater frequency
region.
Three fundamental questions were asked:
1. Is there individually typical stability to be found regarding these two concepts in 8-year-old school
children?
2. If such stability is found, to what extent and in what way does it vary between the children?
3. If such stability is found, does it still exist five years later, or in what respects has it changed?
Method
The group in the investigation consisted of two school classes from two schools in the same
neighbourhood in Arvika, Sweden. There were 18 girls and 12 boys.
The thirty children were tested in 1992, when they were eight years old, and again in 1997; both
test years involved three test sessions.
The research design adopts an experimental approach in which children in an individual setting are
tested regarding spontaneous tempo and synchronization to external pulse flow. The two concepts are
measured by computer and the external pulse flow is also computer generated.
The test equipment consists of a computer with a MIDI pad and a sound module connected to it. The
sound module is adjusted to produce the distinct sound of a snare drum. This is amplified and played
back through a speaker. The drumpad is put on a table at a height that enables the children to beat the
pad with an ordinary drumstick in a comfortable manner while standing.
Findings from an earlier investigation (Hugardt, 1987) indicated that, in an individual setting,
children could produce a spontaneous, steady pulse flow when beating a drum. In the present
investigation, the drum was substituted by a drum-pad connected to the computer.
The computer allows for very accurate measurements and is reasonably portable, so that the
investigation could be carried out in the schools where the children are, rather than in a laboratory.
A computer program, specially designed for this investigation, was developed. This software
measures and analyses the spontaneous tempo, synchronization to steady pulse flow, and
synchronization to slowly changing pulse flow.
Results
High individual stability was detected both in spontaneous tempo and synchronizing behaviour in
both the 1992 and the 1997 test sessions.
Spontaneous tempo
The individual stability in spontaneous tempo was generally higher in the 1992 test results.
The measure used to express deviation in spontaneous tempo from occasion to occasion (or stability
in spontaneous tempo) was the mean deviation from the individual mean frequency; this was 10.7%
in 1992 compared to 13.8% in 1997.
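The stability measure described above, the mean deviation from the individual mean frequency expressed as a percentage of that mean, can be sketched as follows; the three session tempi are hypothetical.

```python
def mean_deviation_percent(tempi):
    """Mean absolute deviation from the individual mean tempo,
    expressed as a percentage of that mean."""
    mean = sum(tempi) / len(tempi)
    return 100 * sum(abs(t - mean) for t in tempi) / (len(tempi) * mean)

# Hypothetical spontaneous tempi (bpm) from one child's three sessions:
stability = mean_deviation_percent([140, 152, 146])
print(f"{stability:.2f}%")  # ≈ 2.74%
```

A child with this profile would count among the 18 (in 1992) deviating less than 10% from occasion to occasion.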
The mean spontaneous tempo for all children was fast in both years compared to earlier suggestions of
around 100 bpm (Fraisse, 1982). In 1992 it was 144.6 bpm and in 1997 it was slightly faster, 149.6
bpm.
The ranges in spontaneous tempo in the 90 measurements were remarkably similar in the two years:
between 52 and 297 bpm in the 1992 measurements and between 59 and 272 bpm in 1997.
In 1992 there were 18 children deviating less than 10%, compared to 16 in 1997, and there were six
children deviating more than 20% in both years. These were, however, not the same children in both
years. The conclusion was that even if the deviations between occasions were greater in 1997 than in
1992, the increase was only about 3 percentage points. The absolute amount of deviation from the
individual mean frequency for most children in both test years indicated consistency in spontaneous
tempo performance. This was supported by ANOVA tests on the three test sessions in each year, and
on all six test occasions in 1992 and 1997 together, which all revealed significant differences between
the children, indicating individual stability in performance in each test year and also in the
longitudinal perspective.
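The ANOVA logic here is a one-way analysis treating each child as a group, with the test occasions as repeated observations; a significant between-child difference indicates stable, distinct individual tempi. The tempi below are hypothetical.

```python
from scipy.stats import f_oneway

# Hypothetical spontaneous tempi (bpm) over six test occasions,
# one list per child (illustrative values only):
child_a = [140, 145, 150, 142, 148, 151]
child_b = [90, 95, 88, 93, 91, 96]
child_c = [200, 210, 205, 198, 207, 212]

f_stat, p = f_oneway(child_a, child_b, child_c)
print(f"F = {f_stat:.1f}, p = {p:.2g}")
# A small p indicates that the children's mean tempi differ from
# each other, i.e. individually stable, distinct spontaneous tempi.
```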
Synchronization to steady tempo
The children exhibited great differences in synchronizational precision. Despite this, the mean
deviation from stimulus tempo in both test years for all children was below ten percent of the stimulus
tempo.
The most striking difference between the 1992 and the 1997 test results was the dramatic drop in
deviation from the stimulus in synchronizing behaviour, indicating a higher precision in synchronizing
in the 1997 measurements. The average mean deviation for all children was reduced from 7.83% in
1992 to 3.46% in 1997. The standard deviation was reduced from 6.86 to 1.67, indicating both low
deviation and low variation from occasion to occasion in 1997. The range in deviation from stimulus
tempo also decreased: from 2-36% in 1992 to 1-9% in 1997.
It was apparent that it was foremost the children with the highest deviation readings in 1992 that
dramatically reduced their deviation in the 1997 measurements.
Individual stability in synchronization to steady tempo
ANOVA tests on the three test occasions in both 1992 and 1997 indicate that the mean deviation
values for each child were significantly different from those of the other children, indicating
individual stability in this respect. When the results from the six test occasions in both test years are
analyzed together, the ANOVA test still displays significant differences between the children's
performances, indicating individually stable performances even in the longitudinal perspective.
Most children exhibited low variability in their synchronizational performance from occasion to
occasion, both in 1992 and in 1997. In 1992, four children displayed a considerably higher deviation
from stimulus tempo in each test occasion. These most deviating children in 1992 also varied most in
their amount of deviation from stimulus tempo from occasion to occasion. No such extreme result
was to be found in the 1997 test sessions, where all children performed in a much more uniform way.
Synchronization to slowly changing tempo
This part of the investigation displayed great similarities to the steady tempo measurements, in the
differences between the children and in the average amount of deviation.
The deviation from the stimulus in the synchronization to slowly changing tempo was nevertheless
generally a little higher than the deviation in the steady tempo measurements. The deviation had
dropped in the 1997 test sessions compared to 1992, but the drop was far from as big as in the steady
tempo results. The average deviation from stimulus tempo in the slowly changing tempo measurements
only decreased from 9.47% in 1992 to 8.37% in 1997.
There were, just as in the steady tempo measurements, extreme cases of high deviation in 1992,
which exhibited a remarkable drop in deviation in the 1997 measurements. The range between
smallest and largest deviation also decreased: from 4-26% in 1992 to 4-19% in 1997.
Individual stability in synchronization to slowly changing tempo
ANOVA tests revealed that there was, as in the synchronization to steady tempo measurements, a
significant difference between the children in each year. When the test results from both years are
analyzed together, the ANOVA test still displays a significant difference between the children's
performances, indicating individually stable performances even in this longitudinal perspective.
The most deviating children also exhibited the greatest variation in performance from occasion to
occasion in the 1992 test sessions, while this tendency was gone in 1997. This result also corresponds
to the findings in the steady tempo measurements.
Gender results
Substantial differences between boys and girls were found in the synchronizational performances in
the 1992 measurements. The mean deviation from stimulus tempo for girls in the steady tempo
measurements was 6.2%, and for boys 10.3%. The corresponding figures in the slowly changing
tempo measurements were 8.1% for girls and 11.5% for boys.
t-tests on these differences were nevertheless not overwhelmingly significant, with p=0.068 in the
steady tempo measurements and p=0.023 in the changing tempo measurements.
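The gender comparison corresponds to an independent-samples t-test on the children's mean deviations; the values below are hypothetical, for illustration only.

```python
from scipy.stats import ttest_ind

# Hypothetical mean deviations (% of stimulus tempo) in the steady
# tempo task, one value per child (illustrative values only):
girls = [4.1, 5.8, 6.0, 7.2, 5.5, 6.9, 8.0, 4.9]
boys = [9.5, 11.2, 8.8, 12.4, 10.0, 7.9]

t_stat, p = ttest_ind(girls, boys)
print(f"t = {t_stat:.2f}, p = {p:.3f}")
```

With group sizes as small as these (and as in the study, 18 girls and 12 boys), p-values near the 0.05 threshold should be read with caution, which is presumably why the differences are described as not overwhelmingly significant.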
Conclusions
The results from the investigation give the following answers to the questions asked in the
introduction:
There was individually typical stability to be found regarding performance of spontaneous tempo and
synchronization in the 8-year-old school children.
The range in variation between children was large. In the spontaneous tempo measurements the range
was over 200 bpm from the slowest to the fastest tempo. The synchronization to external stimuli also
displayed great differences between the children in their synchronizational precision, and these have
proved to be individually stable.
The stability in performance found in these children in 1992 was still present in 1997. The variation in
individual spontaneous tempo from occasion to occasion was somewhat higher in 1997, indicating a
small drop in individual stability here. The deviation from stimulus tempo in the synchronization task,
on the other hand, was lower in the 1997 sessions, indicating a higher synchronizational precision.
The differences between the highly deviating children and the rest in the synchronization task in 1992
were dramatically reduced, giving a much more uniform picture in 1997.
Discussion
The increase in accuracy in synchronization to external stimuli with age has been documented earlier
(e.g. Grieshaber, 1987), and it is interesting to note that this investigation conforms to this notion, and
that the children at the same time maintain their individually stable performance. It is also interesting
to note that children who were stable in high deviation in their synchronization when they were eight
years old have made the greatest change, and dropped most in deviation from 1992 to 1997. The
synchronization to slowly changing tempo exhibits a drop in timing precision compared to the steady
tempo measurements, a result that conforms to earlier findings (Fraisse, 1982). The individually
stable performance is nevertheless present in both synchronization to steady tempo and to slowly
changing tempo.
Altogether, the results suggest the importance of paying attention to individual differences, while the
developmental aspects of the results stress the importance of not regarding individual differences as
static. In practising music and in music education, paying attention to this last remark might be of
vital interest.
References
Bartlett, N.R. and Bartlett, S.C. (1959). Synchronization of a motor response with an anticipated
sensory event. Psychological Review, 66, 203-218.
Brown, P. (1979). An enquiry into the origins and nature of tempo behaviour. Psychology of Music,
7(1), 19-35.
Duke, R.A. (1989). Musicians' perception of beat in monotonic stimuli. Journal of Research in Music
Education, 37, 61-71.
Dunlap, K. (1910). Reactions to rhythmic stimuli with attempt to synchronize. Psychological Review,
17, 399-416.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music. New York:
Academic Press, pp. 149-180.
Franek, M., Mates, J., Radil, T., Beck, K. and Pöppel, E. (1991). Sensorimotor synchronization: Motor
responses to regular auditory patterns. Perception & Psychophysics, 49(6), 509-516.
Grieshaber, K. (1987). Children's rhythmic tapping: A critical review of research. Bulletin of the
Council for Research in Music Education, 90, 73-81.
Harrel, T.W. (1937). Factors influencing preference and memory for auditory rhythm. Journal of
General Psychology, 17, 63-104.
Hugardt, A. (1987). Puls och rytmik, barn och motorik, ingen är den andre lik. Göteborg: Göteborg
University.
Keele, S.W. and Ivry, R. (1988). Modular analysis of timing in motor skill. In G. Bower (Ed.), The
Psychology of Learning and Motivation, vol. 21. San Diego: Academic Press, pp. 183-228.
Pouthas, V. (1996). The development of the perception of time and temporal regulation of actions in
infants and children. In I. Deliège and J. Sloboda (Eds.), Musical Beginnings: Origins and
Development of Musical Competence. Oxford: Oxford University Press, pp. 115-141.
Stevens, L.T. (1886). On the time-sense. Mind, 11, 393-404.
Proceedings paper
Introduction
It is likely that Schumann referred to the inner hearing that turns sounds into music, and that he, as the
ardent romanticist he was, wanted to maintain intuition at the expense of intellectual reasoning. It may
perhaps be taken as a sign of more prosaic times if attention is instead paid to the conditions that
determine how musicians perceive the vibrations produced by their instruments.
There are several reasons why the musicians themselves are not the best judges of the sounds out of
which they make music. The player, actually producing the music, is apt to "hear" a confluence of
physical sound waves and sensations emanating from the bodily motions that generate these sounds,
and it is also likely that musicians sometimes confuse the actually emitted sound sequence with their
musical intentions - they may hear what they wish to be heard rather than what there is to be heard.
Finally, it is obvious that players more often than not listen to themselves from a peculiar and
misleading acoustic perspective, very different from the one that really should count in professional
work, viz. that of the audience.
The directions of sound propagation and also the frequency-dependent angles of diffusion may be
such that part of the immediately emitted sound is likely not to hit the musicians' ears. This means that
the players' perception of the direct sound is often biased towards low frequencies, and that
musicians are more or less dependent on reflected sound to get an idea of the spectral quality, a
reflected sound which, due to absorption, is impoverished with respect to high-frequency partials. On
the other hand, due to the very short distance to the instrument, players do hear a lot of secondary
sounds associated with the tone production - sounds that, particularly if they have high frequencies,
are not audible at greater distances because of air absorption. Proximity also causes musicians to hear
themselves as very loud in relation to their fellow players (a violin in a string quartet is a case in
point) though perhaps not as loud as they really are (trumpets, if you ask the woodwind players seated
in front of them). Finally, musicians and listeners alike hear a mixture of direct and reflected sound,
but in the ears of the former direct sound is bound to dominate over reflected sound, and therefore
musicians at play are poor judges as regards the effects of reverberation on their playing.
A few further examples may serve to illustrate the problems involved. All brass players (except those
playing the French horn) are behind the bells of their instruments, which is hardly a good position if
you want to get full and reliable information as to your actual sound quality and loudness. Singers are
even worse off since their sound perspective is dominated by low-frequency biased sound
transmitted via bone conduction. Organists and conductors deal with a multitude of different
intensities and sound qualities, and whereas the spread of sound sources might help to separate the
various components, it is still very difficult to form a correct idea of the joint effect of, and the
balances within, the organ registers and orchestral parts as they are heard in the auditorium - indeed,
conductors sometimes step back from the orchestra during rehearsals to find out.
Clever musicians have somehow learnt from long experience to cope with the fact that they cannot
always trust their auditory feedback when they play - at least we like to think that they are not victims
of the peculiarities of their listening conditions. But this experience is hard-earned; there is a lot of
trial and error, and much waste of time, involved in this process of learning.
It is true that ever since tape recorders came into general use, musicians have had equipment at
hand that makes it possible to listen to themselves at a distance. But it seems that tape recorders have
been little used to guide artistic judgement in practising and rehearsals. The reason for this is probably
that one can only listen to tapes afterwards - there is no push to corrective change when it
would be most effective, i.e. during the very act of performing. The ideal thing would be immediate
feedback: to listen to oneself at a distance while playing.
Theoretical considerations
In this paper, a method is proposed, tried out, and evaluated that, in a number of ordinary situations,
makes it possible to judge one's own playing with "distant ears".
In short, the method works as follows. In order to prevent, as much as possible, the player from
hearing the sound from the instrument in the natural, airborne way, he or she wears high-performance
protective earmuffs. The sound is instead picked up by microphones mounted at some distance in the
room and then relayed to earphones in the earmuffs.
In order to work satisfactorily from the perceptual point of view, distant listening must fulfill two
conditions. First, the proximate sound travelling directly from the instrument to the player, and
leaking somewhat through the earmuffs, must seem to be exchanged for the distant sound fed back
from the microphones to the earphones. Secondly, the dominating distant sound must not confuse the
player because of its somewhat delayed arrival.
The first condition implies that the distant sound led back to the ears must have a substantially higher
intensity level than the remainder of the proximate sound, finding its way into the ears in spite of the
efforts to muffle it. Otherwise the proximate sound will not be properly masked.
The intensity of the direct sound inevitably decreases in proportion to the square of the distance from
the instrument. On the other hand, and depending on the amount of absorption, the room will be more
or less uniformly filled with reflected sound. The intensity of the distant sound received by the
microphones is therefore the sum of the direct sound spread from the instrument, and thus reduced in
intensity, and a considerable increment due to reflected sound. Indeed, a few metres away from the
instrument the reflected sound begins to dominate over the ever-weaker direct sound, determining the
sound intensity and making it practically constant no matter the further distance from the instrument.
The microphones should be mounted outside this reverberation radius (which depends on the amount
of reflected sound in the room) if one wants to gain information as to how the music is heard by the
audience; cf. Sundberg 1991, p. 176.
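The paper gives no formula for the reverberation radius, but a common approximation for an omnidirectional source, stated here as an assumption, is d ≈ 0.057·√(V/T60), with room volume V in cubic metres and reverberation time T60 in seconds. The room values below are hypothetical.

```python
import math

def reverberation_radius(volume_m3, rt60_s):
    """Approximate distance beyond which reflected sound dominates
    over direct sound, using the common approximation
    d = 0.057 * sqrt(V / T60) for an omnidirectional source.
    (This formula is an assumption, not taken from the paper.)"""
    return 0.057 * math.sqrt(volume_m3 / rt60_s)

# Hypothetical rehearsal room: 200 m^3, reverberation time 0.8 s.
radius = reverberation_radius(200, 0.8)
print(f"{radius:.2f} m")  # ≈ 0.90 m
```

This matches the paper's observation that the reflected field takes over within a few metres of the instrument, so microphones placed several metres away sample essentially the same field the audience hears.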
First-rate protective earmuffs of the kind used in the present experiments reduce the sound level by
approx. 16 dB at 125 Hz, 23 dB at 250 Hz, 32 dB at 500 Hz, and 39 dB at 1000 Hz. The masking
effect of tones within the same critical bandwidth as the tone to be masked is approx. 20 dB; cf.
Sundberg 1991, p. 67. Excepting perhaps very low tones (which generally have weak fundamentals
anyway), it seems that the intensity difference between the relayed distant sound and the muffled
proximate sound may allow proper masking without excessive amplification of the signals from the
microphones - as a last resort, the relayed sound can of course be amplified until it drowns the
proximate sound.
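This arithmetic can be checked band by band: the proximate sound leaking through the earmuffs is the proximate level minus the attenuation, and the relayed sound must exceed that residue by roughly the 20 dB masking margin. The 90 dB proximate level at the player's ears is a hypothetical value.

```python
# Earmuff attenuation (dB) per octave band, from the text:
attenuation = {125: 16, 250: 23, 500: 32, 1000: 39}
MASKING_MARGIN = 20  # approx. dB needed to mask within a critical band

def required_relayed_level(proximate_db, freq_hz):
    """Minimum level of the relayed distant sound needed to mask
    the proximate sound leaking through the earmuffs."""
    return proximate_db - attenuation[freq_hz] + MASKING_MARGIN

# Hypothetical unmuffled proximate level of 90 dB at the player's ears:
for f in (125, 250, 500, 1000):
    print(f"{f} Hz: relayed level >= {required_relayed_level(90, f)} dB")
```

The required relayed level falls with frequency (94 dB at 125 Hz down to 71 dB at 1000 Hz for this example), which is why only the lowest bands risk needing extra amplification.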
The dual fact that masking implies adding the intensities of the sounds involved, and that
amplification of the distant sound above its actual level may be necessary, means that distant listening
is not suitable for checking the authentic loudness of the direct sound at the location of the
microphones - a minor drawback in most applications since the much more important relative
intensity differences within the distant sound are preserved. When setting the volume of the relayed
sound, proper masking must be the primary consideration; next comes a level that makes possible a
comfortable and attentive study of the distant sound. Only third, and only if it is of any interest,
might one try to adjust the volume so as to approximate the authentic intensity.
Turning to the second condition, the time interval between the muffled proximate sound and the
stronger distant sound is also critical - the distant sound having travelled through the room to the
microphones is bound to arrive at the player's ears somewhat later than the proximate sound. But
double onsets (pre-echoes) must be avoided, and so must any sense of delayed onset in general -
discrepancies between motor and auditory onsets may be gravely confusing for players.
If, however, the time difference between the arrival of the early proximate sound and the late distant
sound does not exceed a certain value, and if the intensity of the delayed distant sound is greater than
that of the proximate sound, a variant of the "precedence effect" may be taken to apply. (Cf. Benade
1976, p. 204; Hall 1980, p. 363.)
The precedence effect as used in public-address systems regulates the relative positions (the times of
sound arrival) and the intensity levels of sound source and loudspeakers, and it means that the
amplified sound is added to the original one in such a way that you localize the sound at the
original source, and take it to start at the onset of the original sound. In order for this
illusory fusion of the two sounds to work properly, two limits must be respected. The
relayed sound must not reach the listener more than approx. 30 ms after the direct sound, and it must
not be more than approx. 10 dB louder than the direct sound.
In this specific application, however, the 10 dB maximum intensity difference between proximate and
distant sound can be disposed of - there is no need to secure correct localization, since both the
proximate and the distant sounds are heard as being within the earmuffs. Respecting only the 30 ms
time-difference limit, the listener will hear the late distant sound as starting at the onset of the early
proximate sound, and the illusion necessary to avoid double or delayed onsets has been attained.
The second condition for distant listening thus stipulates a maximum delay of approx. 30 ms, which in
turn introduces a limit for how far the microphones can be placed from the instrument: since the
velocity of sound is approx. 343 m/s, the distance from instrument to microphones should not amount
to more than approx. 10 meters. But this is more than many rooms measure, and also (for practically
all music rooms) well beyond the reverberation radius, outside which reverberated sound dominates
the aural impression.
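The distance limit follows from a one-line computation, using the speed of sound and the delay limit quoted above (any electronic latency in the transmission chain, ignored here, would shrink the figure further):

```python
SPEED_OF_SOUND = 343.0   # m/s, at room temperature
MAX_DELAY = 0.030        # s, precedence-effect limit quoted in the text

def max_microphone_distance():
    """Largest instrument-to-microphone distance that keeps the relayed
    sound within the ~30 ms precedence-effect window."""
    return SPEED_OF_SOUND * MAX_DELAY

print(round(max_microphone_distance(), 1))  # ~10.3 m, i.e. "approx. 10 meters"
```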
To conclude, it should be observed that the masking situation involved, and hence the criteria of
masking, are somewhat unusual. Masking and masked sound are the same except for the crucial quality
differences, and this sameness is furthermore a necessary condition for the early-onset illusion of the
precedence effect. This implies that for the present purposes, masking is a fact when the ordinary
sound of the instrument, recognized by any musician playing it, is said to be replaced by a distinctly
different sound, associated with how the instrument sounds at a distance. (In practice, the sound
immediately heard to be replaced when the distant sound is switched on is of course the very dull,
muffled proximate sound.)
It thus appears that the distant-listening method might work, but rather than indulging in further
theoretical considerations, suitable equipment, procedures and applications should be tried out in
practical trial-and-error experiments. However, before presenting such a pilot study, another crucial
problem should be briefly mentioned.
There may be some instruments that do not lend themselves to distant listening. Singers, who would
benefit most from the opportunity to judge their voice at a distance, have their instruments inside their
own heads and must therefore be excluded. The low-frequency biased, bone-conducted sound that
determines the impression of one's own voice cannot, of course, be quenched by ear-protecting
devices. The violin, pressed against the jaw-bone, the clarinet, more gently held against the teeth, and
the brass instruments, fed by the vibrations of the lips, might possibly also leak too much sound
directly into the skull. And loud low-frequency sounds in general may be problematic since they are
well transmitted through solids - the vibrations from double basses standing on a podium might for
instance be taken up by the feet and relayed to the ears via the skeleton.
Experimental procedure
It was decided to test the distant-listening method for eight different instruments - violin, violoncello,
flute, clarinet, trumpet, trombone, piano, and organ - as well as for a baritone singer in order to find
out what insights a singer might eventually gain from the experience. The subjects were asked to
prepare and perform a few short excerpts from their repertoire, varying with respect to tempo,
dynamics and articulation.
In addition, and beyond the original plan, distant listening was also tried by some further musicians passing by -
two horn players, a vibraphone player, two more singers (a soprano and a baritone) - and by the author
at the piano. The author, wearing earmuffs with earphones supplying distant sound, sometimes
accompanied the subjects at the piano in order to find out what assistance one might get from distant
listening when it comes to balancing parts in ensemble playing.
Excepting the organ session in St. Andreas Church in Malmö (a rather large, modern church with
fairly generous reverberation) the experiments were carried out in the Rosenberg Hall at Malmö
College of Music, a newly-built concert hall of moderate size, featuring adjustable reverberation by
means of curtains.
The typical set-up of the equipment (it was sometimes changed or simplified) was as follows. A first
pair of microphones were mounted just above the musician's head in order to pick up the proximate
sound from the player's perspective; a second and third pair of microphones were used for the distant
sound. The nearest of these microphones were set up at a distance of 5-6 meters - just outside the
reverberation radius as measured by means of a sound-level meter - whereas the other pair was
mounted as far away as the musicians could accept - beyond a distance of 8-9 meters the subjects
began to notice double onsets. The signals from the various microphones were fed into a
mixer/amplifier, and from there on to a tape recorder and (in the case of distant sound) to the
earphones in the earmuffs.
To test equipment of different technical quality, two kinds of microphones were used alternately to
supply the distant sound - Sony ECM 909A microphones representing good standard quality, and
professional Brüel & Kjær 4006 microphones. At the other end of the transmission chain, the standard
earphones of the Peltor HT7A protective earmuffs were used alternately with no-communication
Peltor H7 earmuffs, combined with Sony MDR E 565(B) free-style earplugs devised for musical use.
And sometimes the Tascam M 216 mixer/amplifier unit was exchanged for the standard amplifier of
the Revox B77 tape recorder. The amplification volume corresponding to equal sound pressure at the
distant microphones and within the earmuffs, respectively, was determined for each combination of
components by means of a sound-level meter (Brüel & Kjær 2225) as well as aurally; this volume was
used as a normal, starting value for amplification in the experiments.
Besides testing different equipment and procedures and learning about the limitations of the method,
the distant-listening experience was also evaluated by the musicians. In addition to reporting on the
perceptual qualities of the various distant-listening conditions, they were asked whether the equipment
or the musical situation was disturbing, and if they considered distant listening musically suggestive -
the latter question leading to talks that turned out to be quite informative.
Since it was of interest to study spontaneous modifications in performance due to distant listening, the
subjects played some of their short music selections first without, and then with the earmuffs
supplying distant sound; these renderings were recorded for subsequent musical evaluation. In order
to simplify comparisons of the playing characteristics, the left and right channels of the tape recorder
registered the sound from the proximate and the distant microphones, respectively.
The purpose of the experiments was to find out if and when the distant-listening method works, and to
have its merits, if any, assessed by the musicians taking part as subjects.
The aim was not primarily to establish the optimal equipment, but rather to test the method in various
technical and musical conditions. Expensive equipment is likely to give the most satisfactory auditory
results, but it was considered more important to try the method at more modest levels, allowing distant
listening to be put to everyday and handy use.
Turning to the musical evaluation of the method, main importance was attached to some core issues in
music making, such as the character of the sound at a distance as opposed to the biased ordinary
sound perspective, questions of loudness balance, and the influence of reverberation on performance.
Special applications will require more sophisticated equipment and procedures. Thus, if you want to
carefully study matters of sound quality, it is of course crucial to use first-rate microphones and
earphones, and also a high-quality amplifier. If you are particularly interested in knowing exactly what a
listener hears, you must ensure that the recording of the distant sound models human binaural hearing
as closely as possible; dummy-head microphones might perhaps be used to this end. And if you want
to try to get some idea of the actual dynamic level of your playing as it is heard at a certain distance, it
is necessary to use a sound-level meter in order to check that the intensity of the sound fed into the
ears equals that received by the microphones.
Turning to the individual musicians' evaluation of the method, but avoiding duplication of frequently
shared points of view, a number of observations made by the subjects will be reported and briefly discussed.
The violinist sometimes noticed a slight over-hearing directly from the instrument. This effect was not
due to bone conduction since it remained when the violin was not in contact with the jaw. (When
playing the piano, the author could also hear some such additional sound when he tipped his head
backwards and slightly to the side.) It seems probable that for certain sound-propagation angles (or
certain positions of the head) sound might more easily leak in under the edges of the earmuffs. When
using the most distant microphones, the violinist had some difficulty co-ordinating properly with the
piano. He found the method especially useful for improving details of bowing technique - he could
distinguish those noise components that really count at a distance and work with them. Engaged in
selecting a new violin, he wanted to use the method in order to compare different instruments with
regard to how they might sound to an audience.
The violoncello player found the distant-sound perspective with its brightness and transparence quite
inspiring, and she was especially interested in the opportunity to get an idea of the acoustic
environment - the curtain at the rear wall was operated to test how a varied amount of reverberation
influenced performance. Both wearing distant-listening equipment, this subject and the author
rehearsed the exposition from the first movement of Brahms' E-minor Sonata, evaluating the
potentials of the method as an aid to achieve a good balance between the instruments.
The first of the two flute subjects was very pleased that distant listening relieved him of the
ordinary condition of hearing himself differently with the left and the right ear. (For flutists the sound is
very loud in the right ear; for violinists it is the left ear that is exposed.) He also
appreciated the possibility to hear to what extent the distant tone was free from noise associated with
the blowing. Having brought his baroque and classical flutes in addition to his modern one, he played
the same passages on all three instruments and found it quite informative to listen to the distant-ear
impressions. He also played notes in different registers and at different dynamic levels on these flutes,
and compared the sound at various microphone distances with the dB-values obtained from the
sound-level meter at these positions.
The other flutist found the distant sound richer in overtones, and used the method to instantly evaluate
the effects of different (both proximate and distant) microphone positions. He also found it quite
useful to study the relationships between various attack articulations and the distinctness of tone onset
as heard at a distance, and to check the balance of multiphonic and whistling effects.
The clarinet player did not find that the difference between distant and ordinary listening was very
great, but used the method with profit to evaluate timbre differences associated with various
fingerings.
The subject playing the trumpet did not hear the distant sound as quite representative, but he found it
very interesting to play with the relaying microphones as far away as 11 meters although the
substantial delay robbed him of the immediate auditory feedback necessary to control the onsets. This
condition reminded him of the fact that trumpet players (and other musicians seated far back in the
orchestra) must play slightly ahead of the conductor's signs in order to make for good joint precision.
(This idea might perhaps be developed into a practising method helping students to acquire a feeling
for the proper degree of temporal "push" in large-ensemble playing. Conducted by an assistant at the
position of the distant microphones, one player is seated close to the conductor whereas the other one,
wearing earmuffs and being seated far away from the conductor, tries to play in exact co-ordination
with his colleague.)
As already mentioned, the trombone player tended to hear both the bright distant sound and some
amount of low-frequency biased sound, a situation that made for intonation problems. The intonation
was appreciably improved, however, when the intensity of the distant sound was raised to secure full
masking. He took the opportunity to test the rule that the listeners' auditory impression of the
trombone is more favourable if the player does not stand face to face with the audience, directly
exposing the listeners to all high-frequency components.
The pianist was perhaps the most enthusiastic. He found the rich, balanced and transparent distant
sound of the grand piano much more attractive than the ordinary sound heard at the keyboard, being
dominated by low-frequency components and by quite a lot of thumping mechanical noise. (The
author can also testify to this - the difference is quite extraordinary.) He also stressed that it was of
great value to hear more of the reverberation in the hall, and he could immediately use this
information to refine articulation and pedaling.
Due to the position of the organ, the distant microphones had to be placed at a substantially greater
distance down at the floor of the church, but in spite of this the organist did not complain about
delayed feedback. (Organists are likely to have acquired great tolerance with respect to late onsets.)
He considered the method profitable for judging registrations - especially those involving both the
subjectively quite offensive Brustwerk just above the organist's head, and the Rückpositiv behind his
back with its shut-off sound. Since it gave an idea of what it takes to achieve an impression of
musically effective silence in a highly reverberant room, he found distant listening useful for checking
articulation and choice of tempo. He also mentioned that organists sometimes do use microphones
suspended under the vaults to gain a distant perspective of the musical events issuing from the gallery;
they tend to use ordinary headphones, not protective earmuffs, however, and thus they are not likely
to mask effectively the proximate sound from the organ or the choir.
Turning finally to the baritone singer, he could only hear the distant quality of his voice as part of a
mixture of relayed and bone-conducted sound, but on the other hand he found it quite interesting to
sing wearing just silent earmuffs. This condition, implying a substantial reduction of emitted sound in
favour of internally transmitted vibrations, offered opportunities to check the head resonances for
various pitches and vowels - a suggestion that deserves to be studied and that may be put to
pedagogical use. Both wearing distant-listening equipment, the singer and the author at the piano
worked with problems of balance: at what accompaniment loudness and in which voice registers does
the pianist run the risk of dominating the singer? Distant listening made it possible to gain an
objective idea of the actual relative intensities involved: habit prompts a singer to stand more or less
with his/her back towards the piano - an unfavourable listening position not only for the pianist,
getting little or no direct sound, but also for the singer, hearing the piano quite loudly. The lid of the
grand piano was sometimes closed or almost so, sometimes against current practice left wide open.
The quality of the piano sound was of course appreciably improved in the latter condition, which did
not necessarily result in drowning the singer. It seems that the custom of closing the piano may derive
more from the fact that the singers, standing in the middle of the acoustic draught, do not want to feel
overwhelmed, than from well-founded considerations with regard to sound balance.
It thus appears that distant listening works in a number of ordinary musical conditions, and also that it
might yield valuable musical insights. Distant listening can of course not be used all the time or even
very often, but frequent use is not necessary in order to gain musical experience and to arrive at
specific musical conclusions. It seems that this method, sparingly applied, may be an important
resource within higher musical education, offering students opportunities for reconsideration of
ingrained performance habits. Distant listening may also be used with some profit when it comes to
certain difficult problems in professional work. While the method may certainly be developed in
various ways to satisfy specific demands, distant listening can already be applied in a variety of
musical situations as a tool for improving professional training and refining artistic efforts.
References
Benade, Arthur H. (1976). Fundamentals of Musical Acoustics. New York: Oxford University Press.
Hall, Donald E. (1980). Musical Acoustics. Belmont: Wadsworth Publishing Company.
Sundberg, Johan (1991). The Science of Musical Sounds. San Diego: Academic Press.
Acknowledgements
I wish to express my thanks to Johan Sundberg (Royal Institute of Technology, Stockholm) and
Anders Jönsson (Department of Audiology, University of Lund) for their constructive interest. I am
also grateful for the open-minded co-operation of my subjects, all teachers or students at Malmö
College of Music: Anders Frostin (violin), Hege Waldeland (violoncello), Anders Ljungar-Chapelon
and Terje Thiwång (flutes), Christophe Liabäck (clarinet), Peter Meyer (trumpet), Mattias Cederberg
(trombone), Andrzej Ferber (piano), Hans Hellsten (organ), and Johan Weigel (baritone).
Proceedings paper
1. Background Over the past ten years, I have worked at describing and interpreting the
body movements of musicians in an attempt to understand the relationships between the
physical control of an instrument, the musical material being performed and the performer's
implicit and explicit expressive intentions. To date, my work has suggested that the interface
between physical execution and the expression of mental states is a subtle and complex
one. For instance, performers appear to develop a specific vocabulary of expressive
gestures, yet these gestures - though perceptually discrete - co-exist and are even
integrated to become part of the functional movement of playing. Additionally, the 'meaning'
of these individual gestures - unlike the specific emblems used to accompany speech -
appears to change dramatically according to context. In parallel with these highly
individualised concerns of function and expression for each performer, there is the matter of
how both musical and extra-musical concerns are coordinated between co-performers using
body movements. There is also the question of how both group and individual concerns are
communicated to the audience.
2. Aim I wish to explore the interaction between individual performance body style, musical
expression and communication in order to understand how a coordinated and meaningful
performance is created. I shall explore this question through case studies of three singers
from different Western styles of performance: classical, jazz and pop.
3. Contribution This work builds on previous research to bring music production and
perception research within a social psychological framework.
4. Implications In theoretical understanding and practical teaching the implications of this
paper are far-reaching, regarding the performance process as a 'tuning-in relationship'
between co-performers and audience - each party familiarising him or herself with the
gestures of the other individuals and interpreting them from a basis of shared stocks of
knowledge.
Extended Paper
Background
Davidson (1993, 1995) demonstrated that information about both structural features and
expressive intention is communicated to observers through body movement. For example,
she showed that performances of the same piece of music with three different expressive
intentions (to perform the piece without expression, with normal expression and with
exaggerated expression) could be distinguished by observers from the performer's body
movements alone. This explains the finding in piano playing that there was both continuously available expressive
information (the swaying motion) and much more local information (for instance, specific
information limited just to two seconds of the performance): that is, some areas of the body
are global indicators of expression whilst other local parts of the body provide more specific
information. In music, of course, there are many potential demands on the player. Adhering
to the score means that there will be differing technical requirements made on the body. This
in turn will affect the presentation of expressive intentions, making some areas clearer
indicators of expression than others. Additionally, key musical structures may be the
individual points around which expression of the intention is most pronounced. Given the
strong evidence that structure and expression are closely related, significant structural
moments are likely to provide the focal points around which specific examples of expression
will be organised, accounting for the very local nature of some of the expressive moments.
All of the above mentioned studies show the critical role of movement and give some
insights into the specifics of the types of movement being used, and even perhaps why the
movements are used. However, on the basis of these data, it would be misleading to imply
that bodily movement in performance can be accounted for simply in terms of the primary
processes of physiology, sensori-motor coordination, and the cognitive mechanisms of
expression. There is a powerful social component to the way in which we use and present
our bodies - in musical performance no less (and arguably rather more) than in other
aspects of our lives. As argued elsewhere (see Clarke and Davidson, 1998), Gellrich (1991)
has shown how a set of specifically learned mimetic movements and gestures furnish a
performance with expressive intention, and suggested that these gestures can have both
negative and positive effects on the production of the performance. In the positive sense,
they can provide the observer with information which assists in understanding the
performance since the gestures can intensify and clarify meaning, even when the movement
itself is 'superfluous' to the production of the musical whole. In other words, there can be a
'surface' level of movement - a kind of rhetoric - which the performer adds to the
performance. On the negative side, if these gestures are not consistent with the intentions of
the performer, they can create physical tensions in the performer, inhibiting technical
fluency, and disturbing observers with the incongruity between the gesture adopted and the
performance intention.
Support for Gellrich's observations about the negative consequences of incongruous mimetic
gestures is found in the work of Runeson and Frykholm (1983) who demonstrated that
covert mental dispositions become specified in movement and can be detected by
observers. Using the simple task of lifting a box, they asked observers to report what they
could see, and discovered that the box weight, and how much that weight differed from the
lifter's expectation about the weight, could be detected. Most relevantly, attempts to give
false information about the box weight were detected by the onlookers. Thus in this case, the
lifter's expectation, the deceitful attempt and the real weight of the box are specified.
Clearly, 'surface' gestures may contribute significantly to the production and perception of a
musical performance. Indeed, a further interpretation of the finding that some two second
excerpts of the pianist's performances in Davidson's study were more richly informative than
others could be that mimetic gestures are used at certain points during the performance, and
that these movements heighten the expressive impact of a specific moment. For instance, a
large head shaking gesture may have occurred which could have had its own distinguishable
form, yet been part of the all-pervasive swaying movement. A fairly extensive literature on
physical gesture in spoken language (cf. Ekman and Friesen, 1969; Ellis and Beattie, 1986)
indicates that gestural repertoires emerge which are associated with specific meanings, and
it could be that the pianist in Davidson's studies had developed specific gestures for
particular musical expression - a gestural movement repertoire.
The current study will explore how and why these identifiable gestures are used, and the
extent to which social factors such as performance etiquette influence the shaping of them.
The current paper builds on Davidson's previous work in that the social context is explored,
examining how style and culture influence the movement patterns. Furthermore, the notion
of a 'centre of moment' for musicians other than pianists will be considered, as the cases here
will be singers.
Methodological Note
All the data used in this study came from video recordings of performances. The study
examines live performances by a classical singer, a jazz singer and a pop singer. These raw
data were subjected to repeated observations by the author to explore the nature of the
expressive gestures used. The first study, an analysis of one of the author's own
performances, provided the grounding for the subsequent analyses. Interpretations of the
analyses were obtained by asking two independent evaluators for their feedback and
commentaries on the raw material and the interpretations of it.
Results
The self-analysis revealed that the classical singer regularly engaged in a forward and
backward rocking motion, shifting her weight from side to side. There were also individual
gestures during the course of the full one-hour programme; these were few in number, but
those used were of seemingly very different types. Observations suggested that
they were expressive of the following:
i) movements directly related to and reactive of material in the texts of the poems - a priest
giving a sermon was portrayed with outstretched preacher-like gestures;
ii) movements linking together sections of the music or ideas between musical passages -
hands in a slow moving 'begging dog' position to connect one phrase end to the opening of
the next song;
iii) gestures with clear technical orientation - a lifting and turning hand and forearm
'illustrating' the action of the soft palate lifting;
iv) movements of direct instructional nature about musical entrances and exits, as signals to
the accompanist - head nods to indicate 'now'.
These movements were regarded as a combination of being:
a) performance process-oriented - to assist the moment-by-moment issues of co-ordination:
making the performance start, remain fairly co-ordinated and finish;
b) expressive of emotional intention;
c) rhetorical in terms of both the narrative of the poem and the music. Additionally, they revealed
a 'story' about how the singer had been trained to move to produce the performance. The
palate-lifting gesture, for example, had presumably been learned in singing lessons and
then been integrated as an expressive gesture in the performance.
The subsequent analyses of the other two singers revealed similar features, in that both included
many of the elements listed above. However, both displayed a lot more in terms of
non-musical performance elements. For example, the jazz performer engaged in dance
movements with her co-performers, while the pop singer engaged directly with the audience,
using her body to make sexually enticing gestures and signals.
Discussion
This work certainly adds to the previous study: by looking at singers, who have text as
well as music to communicate, the different types of movements being used can be more
readily deciphered - linking narrative and gesture, for instance. It is possible that such
differences also occur in instrumental performance, but there it is perhaps more difficult to
differentiate between the narrative gesture and the gesture used for primarily technical ends
but since synthesised into an expressive movement vocabulary (like the palate-lifting
movement used by the classical singer).
Furthermore, it is evident that socially motivated movements are used a great deal in singing
performance: musical content being co-ordinated through interactive movements between
co-performers, but also movements reflecting how singers typically or stylistically move
within a certain performance context.
In a recent paper, Cook (2000) has argued that the movements of a musical performance
show how music and action combine to create a 'different' work, not simply a piece of
music: rather, a performance 'multi-media', as Cook terms it. The analysis of the three
singers allowed these issues to be explored. Additionally, it enabled the author to
make a reflexive turn and note that within the tradition of classical concert singing it was
rather less likely for 'multi-media' devices to be used, whereas jazz and, to the largest extent,
pop performances were structured around the body as the provider of an additional element
or embellishment to the music.
References
Clarke, E.F. & Davidson, J.W. (1998) The body in performance. In W. Thomas (Ed.)
Composition-Performance-Reception. Aldershot: Ashgate.
Cook, N. (2000) Demise of the Work Ethic: Jimi Hendrix's Improvisation as Performance Art.
Royal Musical Association Conference : Performance 2000, University of Southampton,
April.
Cutting, J.E. and Kozlowski, L.T. (1977) Recognising friends by their walk: Gait perception
without familiarity cues. Bulletin of the Psychonomic Society, 9, 353-56.
Cutting, J.E., Proffitt, D.R. and Kozlowski, L.T. (1978) A biomechanical invariant for gait
perception. Journal of Experimental Psychology: Human Perception and Performance, 4,
357-72.
Cutting, J. E. and Proffitt, D.R. (1981) Gait perception as an example of how we may
perceive events. In R.D. Walk and H.L. Pick (eds) Intersensory Perception and Sensory
Integration, New York: Plenum.
Davidson, J.W. (1993) Visual perception of performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-13.
Davidson, J.W. (1994) What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Davidson, J.W. (forthcoming September 2000) Understanding the expressive movements of
a solo pianist. Deutsche Jahresbuch fur Musikpsychologie
Ekman, P. and Friesen, W.V. (1969) The repertory of nonverbal behaviour: Categories,
origins, usage, and coding. Semiotica, 1, 49-98.
Ellis, A. and Beattie, G., (1986) The Psychology of Language and Communication, London:
Weidenfield and Nicolson.
Gellrich, M. (1991) Concentration and Tension, British Journal of Music Education, 8,
167-79.
Runeson, S. and Frykholm, G. (1983) Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectations, gender recognition, and
deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Proceedings paper
In contrast to this similarity-based classification, a second strand of research has highlighted the importance of
explicitly defined concepts, or theory-based classification (e.g. Murphy & Medin, 1985; Rips, 1989). According to
this research, similarity is insufficiently clear and constrained to act as an explanation of categorisation (e.g. 'whale'
and 'bat' are members of the category 'mammal' despite perceptual dissimilarity) and suggests that we categorise
not on the basis of clusters of similarity but on the basis of selecting the concept that best explains the instance to
be categorised. The role of this kind of classification is also supported by developmental research, where it has
been argued that children's concepts are first based upon similarity but are later replaced by more theory-like
categorisations (Keil, 1989). (This distinction between similarity-based and theory-based classification resembles
the distinction between 'natural' and 'artificial' categories, and Zbikowski's Type 1 and Type 2 classifications).
Overlapping with this, a distinction is often drawn between perceptual ('surface') similarity based on immediately
obvious (usually visual) features of objects (e.g. colour, shape), and theory-based ('deep') similarity, which suggests
that it is not only the surface appearance of objects/events that determines conceptual distinctions but
that aspects of their deeper character are also involved (Keil, 1989; Medin & Ortony, 1989; Rips, 1989).
One question raised by these uses of the terms 'surface' and 'deep', therefore, is whether they have the same
meaning in the psychological as in the musicological context. In both cases the notion of a polarity between 'deep' and 'surface'
similarity seems to lead to a rather limited understanding of what might be responsible for creating similarity.
Hampton has suggested a revised model of similarity and categorisation in which he cites evidence for the role of
similarity in categorisation (Hampton, 1997): namely, the fuzziness of concepts, the degree of flexibility and
context sensitivity of conceptual categories and the ability of similarity to pervade attempts to reason logically. He
concludes that similarity-based categorisation is a widespread phenomenon but that it should be broadened to
encompass information which goes beyond the perceptual appearance of objects and that although we have the
ability to think in a precise, logical fashion, this is generally more difficult and requires training. In sum, he
suggests that similarity forms the basis of people's concepts most of the time (Hampton, 1997: 109).
Adopting this understanding of similarity and the continuity between 'surface' and 'deep' structures we
conceptualise similarity relationships in music in terms of a continuum of levels of structure, ranging from very
deep through to very surface. We can map this onto Dowling's (1982) conceptual framework for levels of pitch
organisation. Dowling describes musical systems at four levels of abstraction from the actual notes of real melodies
(Figure 1). The first, most abstract level of the psychophysical scale represents the physical materials of the pitch
continuum produced by logarithmic frequency. The second level of tonal material represents the set of pitches in
use within a particular musical system. For Western tonal music this represents the twelve chromatic notes which
divide the octave (comprising all the notes available in instruments of fixed pitch such as the piano). At the third
level, the tuning system is a subset of the available pitches used in actual musical pieces, and for Western tonal
music this typically comprises the diatonic scale. The final, most concrete level of mode provides an anchor for
frequencies, a tonal focus in the tuning system, and a tonal hierarchy determining which notes are more important
within the pitches of the tuning system. This level of mode equates roughly to tonality.
Figure 1
Dowling's pitch framework (shown for Western tonal music)
Our preliminary review of the features of music to which listeners have been found to show sensitivity in various
settings and under various conditions will thus adapt this framework to consider many dimensions of music in this
'levels of depth/abstractness' manner. The theoretical views of similarity reviewed above can now be applied to
consider how listeners might make sense of relationships between different parts of a piece of music whilst they are
listening to it. We first apply the framework outlined above to consider how the piece might be perceivable, before
looking at research focusing specifically on how the piece is actually perceived.
The music analysis literature implies that compositions are unified by their underlying thematic connections, but
that surface differentiation is included to create interest and variety. An analysis of any given piece of music should
thus show that the more surface features of similarity do not necessarily serve to emphasise the underlying thematic
similarity. There are (functional) places in a piece where we might expect all the levels of similarity relations to
reinforce one another. For example, at the start of a piece it would be important to establish the pattern of what will
follow by highlighting the critical features. This notion is consistent with theories of key derivation (Brown, Butler
& Jones, 1994) and of metre induction (Povel & Essens, 1985) which indicate that the clearest and most
unambiguous statements occur at the beginning of a piece of music to orient the listener and provide a guiding
framework. It also concurs with the notion of psychological essentialism (Medin & Ortony, 1989), which proposes
that features of the world tend to co-occur with both surface and structural similarities in order to assist us in
making sense of important events and objects in the world.
This then suggests that there will be a network of similarity relations within a single piece of music. Some parts of
the music will be strongly related at a range of different levels: for example, the exposition and recapitulation in
sonata form (which are often identical, with the exception of occurring at different time points). Other parts of the
music may share more surface similarities but with underlying fundamental differences. These would be the
similarity relations that 'deceive' the casual listener. Still other parts of the music may share deeper similarities but
not more surface similarities (again, deceiving the casual listener that they are different when after repeated
listening, for example, the similarities may reveal themselves). It is also important to note that there will be parts in
the majority of musical styles which are dissimilar on many levels.
Another important point relates to the way that the particular musical style, and further, the particular musical piece
itself, both set constraints on similarity. For instance, a tonal classical piece will have certain boundaries around the
acceptable tonalities employed within it - an example of style-specific similarity. A piece written for solo piano
will typically not suddenly introduce the sound of a domestic hoover in its final bars. Less extremely, each piece of
music sets up its own similarity criteria. It is thus difficult to set out precisely how similarity relations operate at
more generic levels since these are likely to be highly context-specific (to draw on another notion of psychological
similarity). This issue is theorised elsewhere through the notion of frames (Barsalou, 1992; and its application to
music in Zbikowski, 1999).
Empirical studies of the perception of similarity in music have found evidence for prototype effects, and
differences in the extent to which different kinds of motivic transformations affect recognition and similarity
judgements. Using specially constructed stimuli, Welker (1982) found a melodic prototype received high
recognition and similarity judgements whilst higher-order transformations led to decreases in accuracy and lower
similarity ratings. However, Rosner & Meyer (1986) found surface features of contour and rhythm influenced
similarity judgements as much as underlying melodic processes. Judgements are also influenced by experience and
exposure to the particular piece in question: Francès (1988) found musically inexperienced subjects unable to
identify correctly the two themes of a classical piece, erroneously labelling them as new material (having been
unduly influenced by surface differences), while Pollard-Gott (1983) showed that after a single hearing of a
classical tonal piece, listeners' similarity and descriptive ratings of passages drawn from the piece were primarily
influenced by surface features of the passages such as loudness and complexity of texture. Thematic relationships
only emerged as important with subsequent hearings. Similar findings have been obtained by Dowling and Bartlett
(1981) showing listeners' failure to confuse 'related' thematic material with the 'target' materials in a recognition
task, which contrasts with short-term comparison studies where contour-preserving variants of melodies are often
confused as 'same' (Bartlett & Dowling, 1980). Furthermore, Chapin (1982) found 8 year old children performed
better on a thematic recognition task than adult pianists and choir members, suggesting that different strategies may
be employed at different points in development.
Some of the contradictions which these findings raise may be related to the experimental approaches adopted.
Where instructions, training and musical materials are relatively complex (Francès, 1988; Pollard-Gott, 1983),
thematic recognition or robust similarity judgements based upon thematic criteria appear to be slow to emerge,
whilst in less cognitively demanding situations such relationships appear more easily extractable. In a slightly
different vein, Leonard Meyer, Eugene Narmour and Robert Gjerdingen have provided stimulating discussions of
the perception of archetypes and schemata in music, and Lawrence Zbikowski shows how categorisation can be
used to account for the role and effects of motives and motivic structure (Zbikowski, 1999), but as yet there have
been only limited attempts to investigate such issues empirically.
Empirical Study
Method
Forty university students participated in the study: twenty with and twenty without musical training. The materials
were two different pieces of piano music: Beethoven's piano sonata Op. 10 no. 1, first movement, and Schoenberg's
Klavierstück Op. 33a. The Beethoven piece was chosen on the basis of Réti's analysis of it, which reveals a single
basic shape underlying a variety of surface manifestations. The Schoenberg piece was selected as an example of a
dodecaphonic piece which is built on a hexachordal combinatorial series, providing a basic shape underlying both
themes in serial sonata form, and because Schoenberg makes explicit reference to the importance of motives and of
motivic development in his own writings (e.g. Schoenberg, 1967). As well as incorporating two themes, the piece
involves a 'modulation' back to the home transposition at the recapitulation. Both pieces thus use the principle of
developing variation within a sonata form, with more than one thematic group, and are written for the same
instrument, and neither were particularly well known to the participants.
Nine extracts were selected from each piece on the basis of the theoretical framework elaborated above, with some
sharing many features on many levels, others sharing more surface elements but not deeper elements, still others
sharing deeper elements but not surface elements, and others with very low levels of similarity at any level. The
extracts were paired and the order of presentation to subjects began for each piece with the opening motive as the
first extract presented. This was to set up the piece-specific criteria for real-life similarity perception.
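Assuming the nine extracts per piece were paired exhaustively, the number of unordered pairs can be checked directly (a hypothetical sketch; the extract names are invented placeholders). Nine extracts give C(9, 2) = 36 pairs, which matches the 35 degrees of freedom reported for the repeated-measures factor in the analysis below.

```python
# Sketch (assumption): exhaustive pairing of the nine extracts per piece.
# Extract names are invented placeholders, not labels from the study.
from itertools import combinations

extracts = [f"extract_{i}" for i in range(1, 10)]  # nine extracts per piece
pairs = list(combinations(extracts, 2))            # every unordered pair, once

print(len(pairs))  # C(9, 2) = 36
```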
Following one complete hearing of each piece, listeners heard the 36 pairs of extracts and judged each pair for
similarity on a scale of 1 to 11, where 1 represented minimal similarity and 11 represented maximal similarity.
They also heard each extract individually and provided a series of adjective ratings on bipolar scales, although
these results are not discussed here due to limitations of space.
Results
Similarity judgements
A two-way analysis of variance with pair-wise comparisons as the repeated measure revealed a significant effect of
fragment pairs for both the Beethoven (F(35,1190)=13.862, p<.0001) and the Schoenberg piece (F(35,1190)=23.196,
p<.0001). In neither case was there any effect of experience (trained versus untrained), although the interaction
between the similarity ratings on pairs of fragments and experience approached significance in the case of the
Schoenberg piece (F(35,1190)=1.408, p=.0588).
The similarity ratings made on pairs of fragments were analysed for each piece separately using multi-dimensional
scaling to reveal any underlying criteria in the similarity judgements. Given that the interaction between similarity
judgements and experience approached significance for one of the pieces, the results for the trained and untrained
listeners were analysed separately.
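The paper does not state how the ratings were prepared for the scaling routine; a common preprocessing step, sketched here as an assumption with invented numbers, is to invert each mean similarity rating on the 1-11 scale into a dissimilarity and assemble the symmetric matrix that an MDS routine takes as input.

```python
# Hypothetical preprocessing sketch: mean similarity ratings (1-11 scale, as in
# the study) inverted into dissimilarities and placed in a symmetric matrix,
# the usual input to multidimensional scaling. All values are invented.

def dissimilarity_matrix(mean_ratings, n_extracts, scale_max=11):
    """mean_ratings maps pairs (i, j) with i < j to a mean similarity rating."""
    d = [[0.0] * n_extracts for _ in range(n_extracts)]
    for (i, j), s in mean_ratings.items():
        d[i][j] = d[j][i] = float(scale_max - s)  # high similarity -> low distance
    return d

# Toy data for three extracts: 0 and 1 judged very similar, 0 and 2 dissimilar.
ratings = {(0, 1): 10.0, (0, 2): 2.0, (1, 2): 3.0}
d = dissimilarity_matrix(ratings, 3)
```

The diagonal stays at zero (an extract is maximally similar to itself), and the matrix can then be passed to any standard MDS implementation.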
Beethoven similarity ratings
Four dimensions emerged as the best solution (stress levels of the MDS solution were approximately .05 in each
case - a relatively low value). Interpretations of these dimensions are given below.
Musicians:
• Pitch shapes and intervals. All the melodic material is very similar, so the differences that there are seem to
be due to emphases on particular notes: one cluster of fragments is characterised by a G to G contour with
Ab-G motion, while another is characterised by the interval of a minor sixth (G to Eb) and semitone
appoggiaturas. Absolute pitch seems to be an important factor here rather than scale step or degree.
• Thematic function and tonality. Fragments are polarised with the initial statement of the first theme in the
minor key judged furthest from the beginning of the development (same material as theme 1) in the major
key, with the 'new' development theme close to this.
• Well-formedness and closure. At furthest remove are those fragments with perfect cadences and one which
ends on a prolonged dominant, with more and less cadencing sequences in between.
• Major/minor and modulation. The 'most' minor within the similarity space of the MDS is the subdominant
minor (F minor) rather than the home key (C minor), which appears in the middle of the space. A fragment
with a major/minor contrast is the most major (Ab to F minor), with other major key fragments close by,
perhaps because the major-minor contrast highlights the 'majorness' of the major key.
Non-musicians:
• Pitch shapes. Although pitch shapes emerge as an important similarity criterion again, in this case the
emphasis is on particular notes rather than shapes, i.e. those fragments with the same pitches are judged as
most similar (Eb, G, etc.) rather than those with the same intervals.
• Thematic function and salience. The first and second themes are clustered together at one end of the MDS
with restatements and recapitulations at the other. This factor is more apparent for the non-trained listeners
than it is for the trained listeners: the non-trained listeners treat the first and second themes as very similar on
this dimension, whereas for the trained listeners the preparation for the second theme seems more salient, and
the first theme is polarised by mode.
• Rate of harmonic change. One-bar harmonic changes are positioned furthest from the more flowing
harmonic changes which happen over two or four bars.
• Major/minor and metric emphasis. This dimension appears to be a combination of tonality and of which
beat in the bar is emphasised, with syncopated and minor-mode fragments maximally distant from metrically
clearer 4/4, major fragments.
In sum, the non-trained listeners appeared to be focusing more on less detailed levels: pitch shapes alone rather
than pitch shapes and intervals, thematic function and salience rather than thematic function and tonality, and
temporal aspects like harmonic change and metric emphasis rather than more syntactic and style-specific
parameters such as modulation or closure. The non-trained listeners also seemed to be focusing on the identity of
the two main themes and differentiating less between other material which is often grouped together.
Schoenberg similarity ratings
Musicians:
• Thematic function and salience. The first statements of thematic material are grouped together and judged
maximally different from restatements of material which are also grouped together. Less salient material is
placed between these extremes.
• Register and (conjunct/disjunct) melodic motion. Three groups of fragments emerge in the MDS: fragments
that are static and registrally constrained, those involving intervallic oscillations, and fragments characterised
by stepwise descents.
• Texture. On this dimension, three groups of fragments emerge characterised by different textures: melody
with accompaniment, fragments including both chords and arpeggios, and fragments using homophonic
chordal movement in crotchets.
• Global contour. Fragments on this dimension are characterised by different kinds of contour: from those
descending and then ascending, to those ascending and then descending, then ascending fragments only.
Non-musicians:
• Thematic function and salience. As with the trained musicians, first statements of thematic material are
grouped together at one extreme of the dimension, and restatements of the same thematic material are
grouped at the other extreme, with less salient material in between.
• Length and tempo. Shorter and faster extracts appear to be polarised from longer and slower ones.
• Melodic motion (conjunct/disjunct). The influence of conjunct or disjunct melodic movement is clearer
here than in the case of the trained musicians: fragments involving stepwise versus disjunct melodic
movement are clearly arranged along this dimension.
• Global contour and complexity. Those fragments involving simple ascending and descending movement
are placed furthest from those with more complex contour and more changes in contour on this dimension.
Perhaps unsurprisingly, given the findings of previous research, there is no contribution of row structure to
listeners' ratings. Both trained and untrained listeners appear to be focusing on the thematic function and/or
salience of fragments in the Schoenberg piece (unlike the situation for the Beethoven piece where tonality appeared
to be interacting with thematic function for the trained musicians). Once again, the influence of temporal aspects of
the music appears to be a stronger factor for the untrained listeners.
Conclusions
The results of this study suggest that in tonal music, listeners' judgements are characterised by pitch, tonality and
thematic function, with additional influences for non-musicians of tempo-based features (harmonic rhythm and
metric emphasis). In atonal music, listeners' judgements are characterised by thematic function, but also show the
influence of factors which could be characterised as being more 'surface' features of the music, namely melodic
motion and texture, length, tempo, and contour. These similarity judgements seem to be piece- and/or style-specific
(though in order to verify this the empirical study would need to be repeated with a range of other tonal and serial
pieces).
If these differences are indeed style-specific, there may be two possible reasons for this. The first explanation is
that these differences are a result of differing degrees of familiarity with the two styles: tonal music can be assumed
to be the more familiar of the two styles on the basis of likely exposure, and it may be that the structural principles
underlying it are better internalised than those for serial music. A second possible explanation is that the differences are
due to 'essential' features of the musical styles: perhaps tonal music is more 'obvious' in its delineation of motivic
and thematic similarity - a possibility which would require further research. One further point to emerge from this
study is that the similarity judgements are also context-sensitive: listeners treated the pieces as setting the
constraints on similarity, rather than using global context-independent features like loudness or pitch height. This is
congruent with observations that categorisation and similarity are context-dependent.
We wish to highlight here, however, one point emerging from the similarity ratings: the polarisation of first
statement material versus elaboration/restatement on one dimension. This polarisation runs exactly counter to the
intuitive expectation that a dimension of 'theme' would emerge (i.e. that listeners would judge all statements of the
first theme as similar and maximally different from all statements of the second theme). In fact, listeners appear to
be judging a theme and its restatement as maximally dissimilar on one dimension; no dimension of 'theme' emerges
as such. There are two possible explanations for this. First, listeners may be sensitive to the rhetorical role of
material, hearing the extracts in terms of their characteristics as 'statement' or 'elaboration' / 'restatement'. Although
music theorists recognise the rhetorical character of material (e.g. Agawu, 1991), its pertinence for the listener
remains to be verified. Therefore we consider a second possible explanation: it may be that listeners' similarity
judgements are influenced by the relative salience of extracts. According to theories of salience and the asymmetry
of similarity judgements (e.g. Tversky, 1977), well-formed objects or events are judged both more similar and
more different than less salient material. This would explain the polarisation found on this dimension, where
thematic material (which could be assumed to be well-formed and salient) is judged as being maximally different
from its own elaborations, and where extracts from the development and coda (which might be assumed to be less
salient) are located in the middle of the dimension.
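Tversky's (1977) contrast model, which underlies this salience-based explanation, can be sketched with invented feature sets; the weights and features below are illustrative assumptions, not drawn from the study.

```python
# Tversky's feature-contrast model: s(a, b) = theta*|A∩B| - alpha*|A-B| - beta*|B-A|.
# With alpha > beta, features distinctive to the subject of comparison count
# against similarity more than features distinctive to the referent, so a
# feature-rich (salient, well-formed) theme and its sparser elaboration are
# judged asymmetrically. Weights and feature sets here are invented.

def tversky_similarity(a, b, theta=1.0, alpha=0.7, beta=0.3):
    return (theta * len(a & b)
            - alpha * len(a - b)   # features distinctive to the subject a
            - beta * len(b - a))   # features distinctive to the referent b

# Hypothetical feature sets: the well-formed theme is assumed feature-rich,
# its elaboration sparser.
theme = {"opening_motive", "tonic_key", "eight_bar_phrase", "balanced_cadence"}
elaboration = {"opening_motive", "eight_bar_phrase", "dominant_key"}

s_theme_to_elab = tversky_similarity(theme, elaboration)
s_elab_to_theme = tversky_similarity(elaboration, theme)
# The elaboration is judged more similar to the theme than the theme is to
# the elaboration, mirroring Tversky's variant-to-prototype asymmetry.
```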
This finding questions the basis for salience and well-formedness in music. Salience, well-formedness, and
thematic function have not previously been found to be important, perhaps because it is harder to pin down and
formalise 'salience' than it is to describe harmonic relations or melodic progressions. Indeed, this study suggests
that further research is needed in order to determine what constitutes salience and well-formedness in music. The
present findings suggest only that they are realised differently in different styles or pieces: for example,
well-formedness in Schoenberg appears to correspond to a hexachordal statement of the row, whereas
well-formedness in Beethoven is an eight-bar phrase with antecedent and consequent, and balanced harmonic
movement.
The implication of this main effect of salience for compositional practice is that material which is well-formed
causes perception of both more similarities and more differences. This is an effective compositional strategy and is
congruent with the notion of 'connected antithesis' as central to the creation of tension and formal structure since it
maximises diversity and unity. If the unity of the piece is assumed, then diversity will be the major outcome of
similarity judgements, which makes the task of empirical research into this area all the more problematic.
This consideration of compositional practice and theory also highlights the importance of considering not only how
far this focus on coherence and unity is a compositional strategy but to what extent it is a received listening
ideology.
Acknowledgements
This research was funded by an Arts and Humanities Research Board small research grant.
References
Agawu, V.K. (1991). Playing With Signs: A Semiotic Interpretation of Classic Music. Princeton, NJ: Princeton
University Press.
Alegant, B. (1996). Unveiling Schoenberg's op.33b. Music Theory Spectrum, 18(2), 143-166.
Barsalou, L.W. (1992). Cognitive Psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Bartlett, J.S. & Dowling, W.J. (1980). Recognition of transposed melodies: a key-distance effect in developmental
perspective. Journal of Experimental Psychology: Human Perception and Performance, 6, 501-515.
Brown, H., Butler, D. & Jones, M.R. (1994). Musical and Temporal Influences on Key Discovery. Music
Perception, 11, 371-407.
Chapin, S.S. (1982). Extracting an Unfamiliar Theme from its Variations. Psychomusicology, 2, 48-50.
Deliège, I. & Mélen, M. (1997). Cue abstraction in the representation of musical form. In: I. Deliège & J.A.
Sloboda (Eds.), Perception and Cognition of Music, Hove: Psychology Press.
Dowling, W.J. (1982). Musical scales and psychophysical scales: Their psychological reality. In: R. Falck & T.
Rice (Eds.), Cross-cultural perspectives on music, Toronto: University of Toronto Press, 20-28.
Eysenck, M.W. & Keane, M.T. (1990). Cognitive Psychology: A Student's Handbook. Hove: Lawrence Erlbaum.
Francès, R. (1958/1988). The Perception of Music. London: Lawrence Erlbaum (translated by W.J. Dowling).
Hampton, J. (1997). Similarity and Categorization. Proceedings of SimCat 1997: An Interdisciplinary Workshop on
Similarity and Categorisation, M. Ramscar, U. Hahn, E. Cambouropolos and H. Pain. (Eds.), Dept. of Artificial
Intelligence, Edinburgh University, 37-41.
Keil, F.C. (1989). Concepts, Kinds and Cognitive Development. Cambridge, MA: MIT Press.
Komatsu, L.K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Medin, D.L., Goldstone, R.L. & Gentner, D. (1993). Respects for similarity. Psychological Review, 100(2),
254-278.
Medin, D. & Ortony, A. (1989). Psychological essentialism. In: S. Vosniadou & A. Ortony (Eds.), Similarity and
Analogical Reasoning, Cambridge: Cambridge University Press.
Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85,
207-238.
Meyer, L.B. (1973). Explaining Music: Essays and Explorations. Berkeley, CA: University of California Press.
Murphy, G.L. & Medin, D.L. (1985). The role of theories in conceptual coherence. Psychological Review, 92,
289-316.
Pollard-Gott, L. (1983). The emergence of thematic concepts in repeated listening to music. Cognitive Psychology,
15, 66-94.
Povel, D.-J. & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Réti, R. (1951). The thematic process in music. New York: The Macmillan Company.
Rips, L.J. (1989). Similarity, typicality and categorisation. In S. Vosniadou & A. Ortony (Eds.) Similarity and
Analogical Reasoning. Cambridge: Cambridge University Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General,
104(3), 192-233.
Rosch, E. & Mervis, C.B. (1975). Family resemblances. Cognitive Psychology, 7, 573-605.
Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M. & Boyes-Braem, P. (1976). Basic objects in natural
categories. Cognitive Psychology, 8, 382-439.
Schoenberg, A. (1967). Fundamentals of musical composition. (G. Strang & L. Stein, Eds.). London: Faber & Faber
Limited.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Welker, R.L. (1982). Abstraction of themes from melodic variations. Journal of Experimental Psychology: Human
Perception & Performance, 8, 435-447.
Zbikowski, L.M. (1999). Musical coherence, motive, and categorization. Music Perception, 17(1), 5-42.
Proceedings paper
(a) finding music, (b) daydreaming, (c) talking, (d) fiddling with instrument, (e) expressing
frustration, (f) resting, and (g) being distracted.
Clarissa was 9 years and 7 months old when she joined her school's band program. She chose clarinet
because "It sounded fun", because "My best friend plays clarinet", and also because "The clarinet
teacher looked nice." Before starting clarinet lessons, Clarissa had learnt Suzuki violin for 4 years, but
this was discontinued when she started clarinet. Her intrinsic motivation for learning an instrument
appeared to be something of a family 'story': "I saw someone playing violin on TV when I was about 3
and I asked my Mum if I could learn an instrument."
Clarissa's school academic results show that she was intellectually very capable. Grade 3 literacy and
numeracy tests that are routinely conducted in New South Wales schools ranked her in the top 8% of
her age group. In addition, she scored in the top 20% of her age group for musical aptitude, as
measured by Gordon's (1982) Intermediate Measures of Music Audiation. Despite these
above-average academic and music aptitude results, Clarissa's actual musical achievement was much
closer to the average of our sample of 157 children. Her score on the Watkins-Farnum Performance
Scale (Watkins & Farnum, 1954), a standardised sight-reading test, was 15 in Year 1 of the study,
compared with a mean of 15.2 (SD = 10.9). One year later, her score of 17 had not risen substantially,
compared with the total sample's mean of 24.5 (SD = 13.0). In Year 3, Clarissa's score rose more
sharply to 28, bringing it closer to the sample mean of 29.2 (SD = 14.7).
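The gap between Clarissa and the sample can be made explicit as z-scores computed from the means and standard deviations just reported (a simple arithmetic sketch; the year-2 figures are those introduced above as 'one year later').

```python
# z-scores for Clarissa's Watkins-Farnum scores against the sample statistics
# reported in the text: (her score, sample mean, sample SD) for each year.

def z_score(score, mean, sd):
    return (score - mean) / sd

scores = {1: (15, 15.2, 10.9), 2: (17, 24.5, 13.0), 3: (28, 29.2, 14.7)}
zs = {year: round(z_score(*vals), 2) for year, vals in scores.items()}
# Year 1 sits almost exactly at the mean, Year 2 falls well below it,
# and Year 3 returns close to the mean, matching the narrative above.
```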
Previous research with schoolchildren has established the link between musical achievement and
accumulated practice time (Sloboda, J. W. Davidson, Howe, & Moore, 1996). The slow pace of
Clarissa's musical achievement might thus be partly explained by the quantity of practice she
undertook: regular interviews with her mother in Year 1 provided us with an estimate of only
6.8 minutes of average daily practice, slightly lower than the total sample's average of
7.3 minutes (SD = 4.7).
We chose Clarissa for this case study because she fell mid-way on our measures of practice time and
musical achievement, and also on many behavioural measures in the sub-sample of subjects whose
videotaped practice we analysed (McPherson & Renwick, 2000). Her videotaped behaviour suggests
that it may not have been simply the small quantity of practice that led to sub-optimal musical
achievement, but also the cognitive and motivational quality of that practice. The purpose of this
study, therefore, was to investigate the cognitive and motivational processes of Clarissa's practice
behaviour in Year 1, and to compare these with results from Year 3. This provided the basis for us to
speculate about possible explanations for the wide variance found in Clarissa's task engagement and
persistence.
Year 1
In the two recorded practice sessions analysed from Year 1, Clarissa was sitting on the couch in her
family's lounge room. Interview data tells us that practice sessions regularly occurred at 7:45 a.m. and
that this was the last activity in Clarissa's routine before leaving for school. In the interview at the end
of Year 1, Clarissa stated that both parents and her younger sister often listened to her practice, and
that they would "stand and watch and clap afterwards." This family support was evident in the
behaviour of both parents and the sister on the videotapes: practice was clearly an activity in which
the whole family was involved.
The role of the parents was mostly restricted to listening and making encouraging comments, although
there were some instances where Clarissa and her mother discussed the material that should be
practised:
Gottfried, 1985) as demonstrating a lack of intrinsic motivation, but with Clarissa, it seemed that there
was a strong association between intrinsic motivation and the pleasure that she took in playing easier,
familiar melodies. The less familiar, more difficult pieces were played so haltingly and with so many
errors that they seemed to seriously undermine Clarissa's intrinsic desire to practise this type of piece.
As she said in an interview at the end of Year 1: "I don't like learning hard pieces because I find it
annoying." When asked, in the same interview, what she considered the most important thing to do
when practising, Clarissa stated that it was to "play my favourite songs."
One possible explanation for this clear distinction between 'hard' and 'easy' pieces in Clarissa's
practice accuracy and self-reports may be related to the clarity of the goal-state in the problem-solving
activity of decoding the notation (J. E. Davidson & Sternberg, 1998). For familiar children's melodies
such as Old Macdonald, Clarissa seemed quite capable of using her mental representation of the
melody as a template against which to measure her performance. Her success in playing this melody
correctly seemed to give her pleasure. However, with melodies that were not familiar, her attentional
resources seemed to be exhausted by the act of decoding the musical notation while playing, and she
therefore seemed unaware of the majority of the errors that she failed to correct. Interestingly, this
may have been less of a problem when Clarissa was learning violin through the Suzuki method
because recorded models of set pieces are used as a reference point and could have been used by
Clarissa and her parents as a way of checking her accuracy. As her mother stated in an interview:
With Suzuki, I knew every note. I don't feel I know enough about what she should be
doing with clarinet; that's the style of teaching.
It would seem from the child's and mother's comments, as well as our observational data, that the
instructional method that Clarissa was using in Year 1 (typical of most of the children in our study) was
prone to inducing cognitive overload (Sweller, Van Merrienboer, & Paas, 1998) because its lack of
sequencing and of a clear goal-state led to excessive demands on working memory. Given that
Clarissa's motivational orientation was most optimally engaged when she played her "favourite
songs", it is not surprising that she did not show the persistence necessary to overcome the cognitive
overload inherent in her approach to new material.
Year 3
The observational and interview data for Year 3 reveal some interesting factors that changed over the
intervening 2 years. One clear change was in the practising environment. The time set for practice
remained at 7:45 a.m., but the venue had changed from the lounge room to the quieter dining room. In
the videotapes, Clarissa appeared alone, and in the interviews she stated that her parents now only
listened to her practice "sometimes".
The content of practice had progressed to more difficult material, but was still dominated by repertoire
for 92% of the time she spent playing. Despite Clarissa's insistence in the Year 3 interview on the
importance of practising technical work, this aspect made up only 8% of her time spent playing on the
videotapes. Time spent not practising was almost exclusively taken up with looking for music to play
(94%), with the remainder (6%) spent resting. This increased resting time may reflect the longer
pieces now being practised. Because there were no other people present, no talking was observed.
Clarissa's predominant approach to practising remained playing a piece or exercise through once and
correcting some errors on the way: 96% of playing time occurred within the first run-through of a
piece. This behaviour contrasts with Clarissa's description of her own practising methods in the Year 3
interview:
I normally play the piece all the way through and then come back to the bits that are bad
Comparison of the pattern of errors (Table 1) reveals additional large differences. With Golden
Wedding, Clarissa was more likely to repeat sections longer than two notes, and less likely to repeat
only one or two correct notes than in the three classical pieces. In other words, in this one piece, she
demonstrated behaviour that Gruson (1988) found to be associated with expertise level. It was also
only in Golden Wedding that Clarissa showed any signs of deliberately altering her tempo when
repeating sections, another important component of the approach of experts (Miklaszewski, 1989;
Nielsen, 1999). The appearance of this strategy in Clarissa's otherwise novice behaviour was
unexpected.
These large differences in practice behaviour between pieces led us to investigate possible
explanations. When Clarissa was interviewed she explained that her desire to learn Golden Wedding,
unlike other pieces, was strongly motivated by her intrinsic interest in the piece. Apparently, in one of
her instrumental lessons, Clarissa's teacher had mentioned that he played a 'jazzy' version of La
Cinquantaine in his big band, and he demonstrated so that she could hear the transformation. Strongly
motivated by her desire to play in a jazz style, Clarissa asked her teacher to notate the theme of
Golden Wedding, so that she could practise it at home. Thus, rather than the task being chosen by the
teacher, as is the usual practice in most lessons, Golden Wedding was chosen by the student.
The notated version of Golden Wedding, hastily sketched out by the teacher, appears on the videotape
to be acting as only a rough prompt for which notes to play. The aural memory of the teacher's
performance was possibly a more vivid prompt. There is a phrase where the melody climbs to notes in
the clarinet's range that Clarissa does not know well, and she uses a trial-and-error approach to find
these notes, by reference to her mental representation. Thus, it would appear that Clarissa was able to
return in Golden Wedding to the pleasurable activity she reported in Year 1 of playing her "favourite
songs" by ear, at the same time as she demonstrated highly atypical task engagement.
Related literature and discussion of findings
Our subject's motivational pattern in Year 3, as she approached adolescence and the transition to high
school, appears from the interview data to be highly multifaceted. For instance, what might seem to
be a noticeable increase in autonomy, as reflected in an increasing desire to practise alone, is qualified by
Clarissa's remarks about her practice being contingent on extrinsic rewards (i.e., pocket money).
When asked to respond to 14 Likert scale items which provided possible reasons for practising her
instrument, the strongest response ("very true of me") was on the scale "Because that's what I'm
supposed to do". Enjoyment-related reasons were scored "not very true of me". Thus, it would appear
that Clarissa's general motivational orientation to school activities was that they were part of her
larger set of obligations:
Playing my clarinet is part of my morning routine, which is part of my job list and I get
paid my pocket money if I do everything on the job list.
Nevertheless, recent research (Pintrich & Schrauben, 1992) has suggested that high levels of extrinsic
motivation can occur together with high levels of intrinsic motivation. Several comments made by
Clarissa in her Year 3 interview revealed a sense of achievement motivation. When asked what was
the most exciting thing that had happened to her musically, she answered:
When I graduated from Book 1 Suzuki when I played the violin.
However, she now seemed unsure of her own achievement level on the clarinet, making this a less
potent motivator:
I don't know if I am going well on the clarinet. Not many people have made any
comments on my playing, so I am not sure.
By Year 3, Clarissa seemed to have changed her attitude towards 'hard' pieces. Asked if she liked to
learn them, she replied: "Yes. It makes the pieces a challenge."
Existing research demonstrates that primary-school children can differentiate levels of intrinsic
motivation for different school subjects, and that each individual also shows a more general, less
domain-specific motivational orientation (Gottfried, 1985). For instance, while Clarissa stated in
Year 3 that practising was more boring than fun, she also said: "I like to practise most of the things
my teacher gives me," which shows that she was able to distinguish between her intrinsic interest in
the repertoire and her dislike for the process of learning it. She was also able to identify tasks that
have extrinsic utility (Eccles, Wigfield, & Schiefele, 1998) but are not inherently enjoyable.
Asked what were the bad things about learning clarinet, she replied: "You have to keep on playing
your scales, but you need them to play songs."
Recently, there has been a resurgence of research in the area of interest in learning (Krapp, Hidi, &
Renninger, 1992; Schiefele, 1991). This field investigates the domain-specific aspects of intrinsic
motivation, often by observing the effects of differential levels of interest on individuals' processing
of text. Clarissa's observed practice and comments on Golden Wedding can be explained according to
this literature in terms of actualised interest, which Schiefele (1991) defines as content-specific
intrinsic motivation. Asked what were the good things about learning the clarinet, Clarissa replied:
"You get to play fun, jazzy songs." Interest has been found to enhance the subjective quality of the
learning experience and also to influence the quality of learning results, with high-interest subjects
engaging in more intensive and meaning-oriented processing of text (Schiefele, 1991). Interest
(Schiefele, 1991) and task value (McPherson & McCormick, 1998; Pintrich & De Groot, 1990) have
been found to be associated with the use of higher-order learning strategies, as we observed in
Clarissa's practice of Golden Wedding. The notion of task-specificity that is inherent in research on
interest can be observed in the large differences between Clarissa practising her standard repertoire
and her self-selected jazz piece. Similarly, research in expectancy-value motivation is beginning to
clarify these processes in music. For example, O'Neill (1999) found that student musicians'
perceptions of the importance of a musical task predicted the amount of practice they undertook,
while McPherson (2000) found that estimates children made before lessons commenced of how long
they intended learning their instrument predicted their practice time and musical achievement 9
months later. The present observational case study extends and supports these empirical findings.
Conclusions
The results of this case study suggest that with strong enough motivation, even quite young music
learners can engage in the types of self-regulatory behaviour that will enhance their achievement. The
effect that was found in Clarissa's practice of Golden Wedding of intrinsic interest leading to
considerably greater persistence, strategy use, and monitoring of accuracy seems intuitively obvious.
Nevertheless, the majority of instrumental teachers, at least in the English-speaking world, continue
to choose most of the repertoire played by their students, and to base their lessons around a
teacher-directed model where the prime focus of attention is 'learning the notes' (Reid, 1997). We
speculate, therefore, that encouraging students to practise repertoire which they select themselves and
find personally interesting can lead to an increase in cognitive engagement and more efficient
learning. Consequently, a detailed and systematic analysis of data collected in the 3-year longitudinal
study will continue as one means of clarifying this important issue.
Note
This research has been supported by a large Australian Research Council grant (No. A79700682),
awarded for a 3-year study in 1996.
References
Davidson, J. E., & Sternberg, R. J. (1998). Smart problem solving: How metacognition
helps. In D. J. Hacker, J. Dunlovsky, & A. C. Graesser (Eds.), Metacognition in
educational theory and practice (pp. 47-68). Mahwah, NJ: Erlbaum.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In W. Damon
(Series Ed.) & N. Eisenberg (Vol. Ed.), Handbook of child psychology: Vol. 4. Social,
emotional, and personality development (5th ed., pp. 1017-1095). New York: Wiley.
Feldstein, S., & O'Reilly, J. (1988). Yamaha band student: A band method for group or
individual instruction (Book 1). Van Nuys, CA: Alfred Publishing Corporation.
Gordon, E. E. (1982). Intermediate measures of music audiation. Chicago: G. I. A.
Publications.
Gottfried, A. E. (1985). Academic intrinsic motivation in elementary and junior high
Proceedings paper
Introduction
The evaluation of short temporal intervals is an interesting topic in research on human cognition
because the basic mechanisms involved are still not clearly understood. Several models attribute the
ability to evaluate time to an internal clock, and have revealed a correlation between the accuracy of
time evaluation and the degree of attention devoted to time (Thomas and Weaver, 1975). Other
models, derived from the influential hypothesis proposed by Ornstein (1969), consider time
evaluation a by-product of cognitive processing, and regard memory coding as the main determinant
of time evaluation accuracy. In recent years, however, many experimental studies have revealed a
more complex picture. Neurophysiological studies (Rammsayer, 1994; Macar & Casini, 1998)
suggest a possible dissociation between the mechanisms underlying the processing of very short
temporal intervals (on the order of a few milliseconds) and longer ones (from a few seconds to a few
minutes). As early as 1984, Fraisse drew a distinction between time perception and time evaluation:
time perception is tied to the psychological present, whereas time evaluation involves memory
(Fraisse, 1984). As Block (1990) pointed out, time evaluation behavior depends on the interaction of
several factors, such as the subject's individual characteristics, internal attributes of the time period,
cognitive activities carried out during the time period, and other time-related behaviors imposed by
environmental or experimental demands. Moreover, Zakay (1990) showed that different methods of
assessing time evaluation involve different cognitive processes (STM or LTM). In particular, a
prospective paradigm (in which subjects know in advance that a temporal judgment will be required)
taps experienced time, whereas a retrospective paradigm (in which subjects are unaware of the
temporal task) taps remembered time.
Previous work has shown that, in the prospective condition, duration estimates are longer when they
are requested after a short interval occupied by interfering cognitive tasks than when they are
requested immediately (Bueno, 1994; Vitulli & Shepard, 1996; Vitulli & Crimmins, 1998). The
interfering task forces subjects to rebuild the duration from memory. We should expect the opposite
pattern (shorter estimates) when an empty time interval is used in a remote estimation paradigm. In
this case the delay reduces the availability of temporal information, and the time estimates should
reflect the attentional resources required by event processing (Kahneman, 1973). Events requiring
complex processing were evaluated as shorter than events requiring simple processing (Craik and
Lockhart, 1972).
Both Poynter (1983, 1989) and Jones and Boltz (1989), although starting from different points of
view, stressed the importance of the segmentation of the flow of events in time evaluation. According
to Jones and Boltz, some natural events are characterized by a high degree of structural coherence
and tend to elicit an automatic mode of attending because their temporal course is more predictable.
On the other hand, events characterized by a low degree of coherence require a controlled mode of
attending and an active segmentation of the flow of stimulation by means of counting and grouping
strategies.
Some studies report that, in the retrospective condition, overestimation of duration increases as event
coherence decreases (Boltz, 1995). Other studies show that, in the prospective condition, irregular
events are underestimated more than regular events (Macar, 1996). Event structure depends on the
interplay between two levels of organization: a vertical organization, corresponding to the periodic
grouping of high-level time units, and a horizontal organization, corresponding to the serial
development of low-level time units. We suppose that both the vertical and the horizontal
organization contribute to modulating attentional factors and memory representation in time
evaluation. In this study an attempt was made to verify this hypothesis.
Method
Twenty-four first-year psychology students participated. Subjects were asked to reproduce the
duration of structured auditory events in a prospective paradigm.
The auditory events were derived from musical fragments composed according to two different
criteria: Tonal (TO) or Non-Tonal (NT) composition, corresponding to the periodic organization; and
Salient (SA) or Non-Salient (NS) characterization, corresponding to the serial organization. These
musical fragments were edited so that only the temporal relations between consecutive pulses were
preserved. Three analyses of variance carried out on the effective durations (mean = 7.44 s; SD =
1.26), on the number of beats (mean = 4.87 beats; SD = 0.81), and on the number of pulses (mean =
22.25 pulses; SD = 5.14) showed no significant differences among the classes.
The role of structural factors in time reproduction was evaluated in relation to the effect of delayed
estimation condition. In the immediate condition the reproduction started shortly after the end of the
sequence. In the delayed condition a twenty-second empty time interval was introduced between the
end of the sequence and the beginning of the reproduction.
A 2 × 2 × 2 within-subjects experimental design was used. The repeated-measures factors were the
delay condition (immediate or delayed), the periodic organization (tonal or non-tonal sequences), and
the serial organization (salient or non-salient sequences) of the event structure. The dependent
variable was the Directional Error, calculated as the ratio between the reproduced duration and the
effective duration of each auditory sequence. Music education, experienced difficulty, interest in the
task, and the attention needed to complete the task were assessed by means of a self-evaluation scale.
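The dependent variable can be written out explicitly. This is a minimal sketch of the stated definition (the function and variable names are ours, not the authors'):

```python
def directional_error(reproduced: float, effective: float) -> float:
    """Directional Error as defined in the text: the ratio of the
    reproduced duration to the effective duration of a sequence.
    Values below 1 indicate under-reproduction; above 1, over-reproduction."""
    return reproduced / effective

# e.g. reproducing a 7.44 s sequence (the mean effective duration
# reported above) as 6.5 s gives a ratio below 1 (under-reproduction):
print(directional_error(6.5, 7.44) < 1)
```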
Results and Discussion
Results show that both the serial and the periodic organization affect time estimation. The serial
organization contributes to the overall accuracy of the estimates independently of the delay of the
reproduction: time reproductions of Non-Salient sequences are less accurate and shorter than those of
Salient sequences in both the immediate and the delayed condition.
The periodic organization, on the other hand, also enhances the processing of events: time
reproductions of Non-Tonal sequences are less accurate and shorter in the delayed condition than in
the immediate condition.
This pattern of results suggests that the serial organization (Salient vs. Non-Salient) modulates
attentional factors, whereas the periodic organization (Tonal vs. Non-Tonal) affects the memory
representation of auditory events in time estimation.
References
Block, R.A. (1990). Models of psychological time. In R.A. Block (Ed.), Cognitive models of
psychological time (pp. 1-36). Hillsdale: Lawrence Erlbaum Associates.
Boltz, M.G. (1995). Effects of event structure on retrospective duration judgments. Perception
& Psychophysics, 57(7), 1080-1096.
Bueno, M.B. (1994). The role of cognitive changes in immediate and remote prospective time
estimations. Acta Psychologica, 85(2), 99-121.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: A framework for memory
research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1-36.
Jones, M.R., & Boltz, M.G. (1989). Dynamic attending and responses to time. Psychological
Review, 96(3), 459-491.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Macar, F. (1996). Temporal judgments on intervals containing stimuli of varying quantity,
complexity, and periodicity. Acta Psychologica, 92(3), 297-308.
Macar, F., & Casini, L. (1998). Brain correlates of time processing. In V. De Keyser, G.
d'Ydewalle & A. Vandierendonck (Eds.), Time and the dynamic control of behavior (pp.
71-82). Seattle: Hogrefe & Huber Publishers.
Michon, J.A. (1990). Implicit and explicit representations of time. In R.A. Block (Ed.),
Cognitive models of psychological time (pp. 37-58). Hillsdale: Lawrence Erlbaum Associates.
Ornstein, R.E. (1969). On the experience of time. Harmondsworth: Penguin Books.
Poynter, W.D. (1983). Duration judgement and the segmentation of experience. Memory &
Cognition, 11, 77-82.
Poynter, W.D. (1989). Judging the duration of time intervals: A process of remembering
segments of experience. In I. Levin & D. Zakay (Eds.), Time and human cognition: A life-span
perspective (pp. 305-321). Amsterdam: Elsevier.
Rammsayer, T.H. (1994). A cognitive-neuroscience approach for elucidation of mechanisms
underlying temporal information processing. International Journal of Neuroscience, 77(1-2),
61-76.
Thomas, E.A.C., & Weaver, W.B. (1975). Cognitive processing and time perception.
Perception and Psychophysics, 17, 363-367.
Vitulli, W.F., & Crimmins, K.A. (1998). Immediate versus remote judgements: delay of
response and rate of stimulus presentation in time estimation. Perceptual & Motor Skills, 86(1),
19-22.
Vitulli, W.F., & Shepard, H.A. (1996). Time estimation: effects of cognitive task, presentation
rate, and delay. Perceptual & Motor Skills, 83(3), 1387-1394.
Zakay, D. (1990). The evasive art of subjective time measurement: some methodological
dilemmas. In R.A. Block (Ed.), Cognitive models of psychological time (pp. 59-84). Hillsdale:
Lawrence Erlbaum Associates.
Proceedings paper
of information about the aggregate structure. Therefore, the resource demands of aggregate pattern
processing differ from those associated with attending to one's own part mainly in terms of the
grouping process that combines elements from different parts. This process, which may be termed
'between-part grouping', requires considerable attentional flexibility as it involves scanning between
parts continuously, and in real time, to determine their interrelationship.
It is assumed here that, apart from between-part grouping, attending to one's own part, on the one
hand, and to the aggregate structure, on the other, are subserved by common processes. Accordingly, the
tracking and representation formation processes associated with the perception of aggregate structures
operate similarly to those employed in relation to one's own part (except that tracking other parts
occurs only via the auditory channel). Likewise, retrieval demands arise if the performer recalls
performance goals relating to the aggregate structure. The importance of such performance goals has
been demonstrated by Goodman (1998, 2000), who investigated how 'model' conceptions of aggregate
structures develop and influence interactions between individual performers in ensembles.
Clearly, due to the substantial overlap between resources involved in attending to one's own part and
the aggregate structure, these two components of prioritised integrative attending should be
susceptible to mutual interference. Two primary sources of interference are identified in ARAMEP:
(a) the degree to which tracking one's own part disrupts the tracking of other parts, and (b) the
disruption of between-part grouping caused by the structural complexity of the interrelationship
between parts. Following from principles in multiple task theory, interference to tracking is related
mainly to the difficulty of one's own part, whereas interference to between-part grouping is largely a
function of the compatibility of one's own part and the aggregate structure. The factors that influence
these considerations are discussed next.
Factors that influence prioritised integrative attending
The difficulty of one's own part is a subjective consideration insofar as it is the product of attentional
and motor constraints that operate within the individual performer. Thus, difficulty is prone to be
affected by general, relatively extramusical factors (i.e., qualities of the performer rather than the
music) that are related to both attention and motor control, such as anxiety, arousal, mastery of
instrumental technique, familiarity with the music in question, and other factors relating to musical
expertise (see Sloboda, 1996). However, there are also some more objective determinants of
difficulty. These include several musical factors that have been found to affect attentional processes
under a wide variety of circumstances, e.g., rhythmic complexity (e.g., Bharucha & Pryor, 1986;
Essens, 1995; Handel, 1973; Handel, 1992; Jones & Boltz, 1989; Klein & Jones, 1996; Povel &
Essens, 1985), pitch-related factors such as the size of the interval between adjacent tones and
melodic contour (e.g., Dowling, 1978; Dowling & Bartlett, 1981; Dowling, Kwak, & Andrews, 1995),
and tonality and harmonic context (e.g., Dawe, Platt, & Racine, 1993; Holleran, Jones, & Butler,
1995; Krumhansl, Sandell, & Sargeant, 1987; Palmer & Krumhansl, 1987; Schmuckler, 1989; Smith
& Cuddy, 1989). It is assumed in ARAMEP that these factors disrupt the tracking of other parts to the
extent that they augment the demands associated with producing and monitoring one's own part.
It is also assumed in ARAMEP that the compatibility of one's own part and other parts may affect
both tracking and between-part grouping, but only the latter is of primary concern here. In multiple
task theory, two broad varieties of compatibility have been identified: spatial and temporal (Wickens,
1989, 1991). Both spatial and temporal compatibility are relevant in music, where their analogues are
pitch and rhythm, respectively (see Jones, 1981, 1992).
The compatibility of the relationship between parts in terms of pitch depends upon factors such as
tonality and pitch range. In particular, the between-part grouping process should be more difficult
when parts are in different keys (as in polytonality, see Krumhansl, 1990; Thompson & Mor, 1992) or
when there is a large pitch separation between parts. The latter is exemplified in auditory streaming
demonstrations where (at certain presentation rates) a sequence of alternating high- and low-pitch
tones is perceived as a single stream when pitch separation is narrow, whereas the sequence
segregates into a stream of high tones and a stream of low tones when pitch separation is wide (see
Bregman, 1990; Jones & Yee, 1993).
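The streaming demonstration just described can be caricatured as a threshold rule. This toy sketch is our own illustration, not a model from the streaming literature, and it deliberately ignores the dependence on presentation rate noted above:

```python
def n_streams(pitches, threshold=5):
    """Toy segregation rule: an alternating tone sequence splits into
    two streams when the adjacent pitch separation (in semitones)
    exceeds `threshold`; otherwise it is heard as a single stream."""
    gaps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return 2 if max(gaps) > threshold else 1

print(n_streams([60, 62, 60, 62]))  # narrow separation: one stream
print(n_streams([60, 72, 60, 72]))  # wide separation: two streams
```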
Temporal compatibility in ensemble music is a matter of rhythmic complexity, i.e., the coherence of
the temporal relationship between the parts comprising the multipart texture. Lack of coherence has been
found to interfere with between-part grouping in studies of auditory streaming (Jones, Kidd, &
Wetzel, 1981), and related work has been done with polyrhythm (Handel, 1984, 1989; Pressing,
Summers, & Magill, 1996; Summers, Rosenbaum, Burns, & Ford, 1993). Rhythmic complexity, in
general, can be defined according to how a pattern's structure fits within a metric framework, i.e., a
cognitive/motor schema consisting of hierarchically nested pulsations generated in the performer. In
situations requiring prioritised integrative attending, multipart rhythmic complexity concerns the
degree to which one's own part and the aggregate structure can be accommodated by the same metric
framework.
Combined effects of spatial and temporal compatibility arise within textural factors such as density,
i.e., the number of parts in the ensemble, and how differentiated they are in terms of rhythm (even if
low in complexity), tessitura, and timbre. Between-part grouping should generally deteriorate with
increases in the number of parts and differences in rhythm, tessitura, and timbre.
The findings of a survey-based study conducted by Keller (2000b) with practising ensemble
musicians are consistent with the above ideas about difficulty and compatibility. Musicians were
asked to rate how influential various extramusical factors (anxiety, arousal, and technical mastery)
and musical factors (complexity relating to rhythm, texture, and several pitch-related factors) were
upon their ability to engage in prioritised integrative attending. A particular order of importance
emerged, with extramusical factors generally being rated as more influential than musical factors.
Within the musical factors, rhythm and texture were considered to be most influential, followed by
tonality, melodic contour, and pitch-interval size. In response to open-ended questions about
situations in which prioritised integrative attending is compromised, musicians listed several
additional influential factors (e.g., an uncomfortable performance environment and poor balance in
terms of loudness between parts). The most noteworthy outcome of this study is that the rhythmic
complexity of one's own part and the rhythmic complexity of the relationship between parts were
claimed to be particularly influential. This finding implies that metric frameworks may have a special
role in prioritised integrative attending. Specifically, metric framework generation may facilitate the
processing and representational efficiency, and thereby the attentional flexibility, required in
ensemble performance (Keller, 1999).
A weakness of the above approach is that it relies solely upon introspective self-reports by musicians.
Nevertheless, the proposed link between metric frameworks and prioritised integrative attending is
supported by some recent, more rigorous experimental research (see Keller, 1999). The experiments
employed novel dual task paradigms that were intended to simulate the attentional demands arising in
ensemble performance. Both recognition memory and reproduction-based tasks were used. In all
tasks, participants were presented with multipart patterns, and required to attend simultaneously to a
particular 'target' part and the aggregate structure. The target parts comprised either metrical patterns
(i.e., patterns that fit a metric framework) or nonmetrical patterns (i.e., patterns that do not fit a metric
framework), and the aggregate structures were always metrical. Across experiments it was found that
the processing of aggregate structures suffered greater interference when target parts were
nonmetrical than when they were metrical. Thus, attentional flexibility was enhanced when
participants were able to use a common metric framework for both components of prioritised
integrative attending.
Mechanisms underlying prioritised integrative attending
In ARAMEP, it is claimed that metric framework generation provides an attentional scheme that
guides the tracking, between-part grouping, representation formation, and retrieval processes that
were identified earlier as sub-skills of prioritised integrative attending. The notion of meter as an
attentional scheme is a central concern in Mari Riess Jones' dynamic attending approach to rhythmic
behaviour (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). It is proposed in the dynamic
attending approach that attentional activity fluctuates lawfully in response to the structure of musical
patterns. Specifically, attentional energy surges towards temporal locations at which events are
expected to occur. In metrical patterns, due to underlying periodicities, events are statistically more
likely to occur at strong metric locations (such as the beginning of bars and beats) than at weak
locations (Palmer & Krumhansl, 1990). Therefore, when processing metrical patterns, expectancies
occur periodically and the attender's 'biological rhythms' synchronise with the pattern's structure.
Consequently, the attender experiences periodic fluctuations in attentional activity that mirror metric
structure. In other words, there is a greater attentional activity at strong metric locations than at weak
locations. Gjerdingen (1989, p. 78) has suggested that it might even appear as if the attender is
"'paying more attention' to events occurring on strong beats".
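The idea of metrically graded attention can be illustrated with a small, purely hypothetical sketch (the function and the numeric levels below are invented for illustration and are not drawn from the dynamic attending literature): timepoints that begin larger metric units coincide with more metric levels and so receive more 'attentional weight', peaking at the downbeat.

```python
# Hypothetical sketch: metric strength as nested periodicities. A position's
# strength is the number of metric levels (bar > half-bar > beat > subdivision)
# whose cycle begins there, mirroring the claim that attentional activity is
# greater at strong metric locations than at weak ones.

def metric_strengths(positions=16, levels=(16, 8, 4, 2, 1)):
    """Count how many metric levels coincide at each of `positions`
    equally spaced timepoints in one bar (levels given as period lengths)."""
    return [sum(1 for period in levels if pos % period == 0)
            for pos in range(positions)]

strengths = metric_strengths()
# The downbeat (position 0) carries the most weight; isolated sixteenth-note
# positions carry the least.
```

Under this toy scheme, attentional energy would simply be read off as proportional to each position's strength.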
Although the concept of metric fluctuations in the activity of attentional resources is generally useful
when accounting for efficiency and flexibility in processing, it is not sufficient for explaining
prioritised integrative attending. If it is assumed that resources are limited in the sense that there is a
level ceiling on resource capacity, then an increase in attentional activity at strong metric locations
would bring resource consumption closest to full capacity at the corresponding points in time. This
would logically lead to greater scarcity of the resources necessary for processes such as tracking other
parts and between-part grouping at strong, relative to weak, metric locations. This proposition is
counterintuitive, as it implies that it should be more difficult for ensemble performers to engage in
prioritised integrative attending at these specific locations. Although this has not been investigated
directly, research on asynchrony between ensemble members (e.g., Rasch, 1988; Shaffer, 1984)
appears to indicate that this is not necessarily the case (see Keller, 2000a).
Therefore, to accommodate prioritised integrative attending, ARAMEP incorporates a two-factor
account of attentional resource allocation that specifies how variations in resource availability
compensate for fluctuating resource activity. In this account, resource availability refers to the
proportion of the attender's resources that are free to serve in a given task at a particular point in time,
and resource activity refers to the proportion of the attender's available resources that are actually
employed in the service of the task.
It is proposed here that, to overcome capacity limitations, both resource availability and resource
activity have the potential to be modulated in tandem in a manner that is highly plastic and efficient.
In musical contexts, this potential is released by the generation of metric frameworks. This conception
of resource allocation, which is depicted in Figure 1, is based upon several assumptions. First,
resource availability fluctuates across timepoints in a manner that mirrors the profile of metric
frameworks if an appropriate cognitive/motor schema is invoked. Second, the default pattern of
resource activity mirrors the distribution of resource availability across time. Furthermore, activity is
regulated by a feedback mechanism that is sensitive to availability limits, and thus functions to ensure
that activity remains within these limits.
This tight relationship between resource availability and resource activity enables the efficient
processing of metrical patterns. In accordance with the dynamic attending approach (e.g., Large &
Jones, 1999), it is assumed here that resource activity becomes focused at space/time regions where
target events occur, or are expected to occur. Consequently, due to variability in the concentration of
events at different metric locations in music, resource activity is usually increased at strong locations.
When resource availability and resource activity operate in concert, however, compensatory increases
in availability accompany the momentary increases in activity. Therefore, sufficient resources are
available so long as the pattern continues to conform to the established metric structure, as a greater
proportion of resources is 'on standby' at strong metric locations. Note that this account differs from
the dynamic attending concepts described earlier mainly in that it addresses resource availability in
addition to resource activity: both factors come to share metric structure. The current account also
deviates from traditional resource theory in that it emphasises lawful fluctuations in resource
availability at the timescale where meter resides.
Figure 1: Both resource availability and resource activity mirror metric structure (the dots represent
metric pulsations).
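A minimal sketch of this coupled allocation, assuming a simple proportional scaling (both the function and all numbers below are invented for illustration): availability tracks metric strength, and the feedback mechanism caps activity so that it never exceeds the momentary availability.

```python
# Hypothetical simulation of the two-factor account for a metrical pattern:
# resource availability mirrors metric strength, and resource activity follows
# processing demand but is clipped by a feedback mechanism sensitive to
# availability limits.

def allocate(metric_strengths, demand):
    """Return (availability, activity) per timepoint. `demand` is the raw
    activity each event would consume; activity is capped at availability."""
    availability = [0.2 * s for s in metric_strengths]   # assumed scaling
    activity = [min(d, a) for d, a in zip(demand, availability)]
    return availability, activity

avail, act = allocate([5, 1, 2, 1, 3, 1, 2, 1],
                      [0.9, 0.1, 0.3, 0.1, 0.7, 0.1, 0.3, 0.1])
# Activity stays within availability at every timepoint: the allocation
# remains coupled so long as the pattern conforms to the metric structure.
```

Decoupling, on this sketch, would correspond to holding availability flat (as for a nonmetrical pattern) while demand continues to fluctuate, so that demand frequently meets or exceeds the cap.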
The two-factor conception of resources described above becomes particularly useful when attempting
to explain resource allocation during prioritised integrative attending. At weak metric locations,
resource activity associated with processing one's own part is typically relatively low, and therefore
the performer is free to track other parts and engage in between-part grouping. At strong metric
locations, even though resource activity associated with processing one's own part may be higher,
tracking and between-part grouping are enabled by increased resource availability. Thus, the efficient
distribution of attentional resources provides a foundation for flexibility in attending. These benefits
cease to exist in the absence of metric framework generation.
It is assumed in ARAMEP that resource availability is no longer modulated systematically when
metric frameworks, or other appropriate schemas, are not generated. Resource availability is instead
constant over time (or characterised by minute random fluctuations) in these situations - such as when
attention is directed to a nonmetrical pattern. Nevertheless, resource activity continues to fluctuate,
typically quite wildly, in response to the unfolding nonmetrical pattern. Thus, resource activity and
resource availability may become decoupled: fluctuations in activity are not compensated for by
corresponding fluctuations in availability.
Furthermore, when attending to a nonmetrical pattern, resource availability should eventually begin to
decrease. Expectations about the temporal location of events comprising nonmetrical patterns lack
precision. Therefore, there must be a corresponding increase in the size of the temporal region during
which the attender is prepared for the events. As is the case in metrical patterns, this preparedness is
manifested as increased resource activity. However, in nonmetrical patterns, these relatively high
levels of resource activity must extend over regions of greater duration than those circumscribed by
strong locations in metric frameworks. Based upon the assumption that sustained focused activity is
effortful and leads to decreases in resource availability (see Koelega, 1996), the present account
postulates that adequate increases in resource availability are not sustainable over the extended
regions of high resource activation demanded by nonmetrical patterns.
In any case, when attending to nonmetrical patterns, resource availability and resource activity are less
likely to be correlated than when attending to metrical patterns. This independence becomes
especially problematic in multipart contexts where prioritised integrative attending is required. This is
because resource activity associated with attending to nonmetrical target parts frequently nears, or
even meets, resource availability. Recovering from these frequent disturbances interferes with
processing and leaves little scope for the flexible attending that underlies tracking other parts and
between-part grouping.
Interactions of factors and mechanisms
The primary goal of ARAMEP is to explicate how the musical and extramusical factors identified
earlier interact with the resource availability and resource activity mechanisms described above. It is
claimed that this interaction is initiated when the factors impact upon the 'state' of the performer and
their environment, i.e., the surrounding musical context. ARAMEP is concerned with three aspects of
the performer/context relationship: intrinsic cognitive-motivational, intrinsic executive, and extrinsic
states.
Intrinsic states in general occur within the performer, and include the degree to which his or her
perceptual, cognitive, and motor resources are prepared to deal with the ensemble interaction at hand.
Two varieties of intrinsic state are distinguished: cognitive-motivational and executive. Intrinsic
cognitive-motivational state refers to high level cognitive phenomena such as attentional sets, and
motivational or emotional factors such as mood. Musical factors that contribute to intrinsic
cognitive-motivational state include performance goals and musical schemas - e.g., metric
frameworks and abstract knowledge of tonality. Extramusical factors that influence intrinsic
cognitive-motivational state include anxiety, arousal, and motivation. Intrinsic executive state
incorporates the strategies that are available to the performer for producing target behaviour. Intrinsic
executive state is defined jointly by the performer's mastery of instrumental technique and the
schemas that guide their motor actions (e.g., performance plans).
In contrast to the intrinsic states, extrinsic state is a product of the performer's external environment,
which comprises the ensemble sound, physical surroundings, and ambient social context. Extrinsic
state is affected by musical factors such as rhythm, texture, tonality, melodic contour, and
pitch-interval size, and extramusical factors such as acoustic conditions, lighting, background noise,
and comfort of the performance space.
It is proposed here that the three performer/context states are linked systematically to resource
availability and resource activity (see Figure 2). Intrinsic cognitive-motivational state modulates both
resource availability and resource activity, whereas extrinsic and intrinsic executive states have a
direct influence only upon resource activity. In other words, the temporal profile of resource activity
is determined by the full range of musical and extramusical factors, but the timecourse of resource
availability is determined exclusively by schematic musical factors - metric frameworks in particular -
and physiological extramusical factors - anxiety, arousal, and motivation. Nevertheless, resource
availability may be affected indirectly by the other factors through causal links between the three
performer/context states (see Figure 2). Extrinsic state affects both intrinsic executive state (e.g.,
comfort may influence technique) and intrinsic cognitive-motivational state (e.g., ensemble sound
structure determines which schemas are invoked, and social context influences anxiety); intrinsic
cognitive-motivational state modulates intrinsic executive state (e.g., anxiety affects motor control);
and intrinsic executive state influences extrinsic state (e.g., motor control affects ensemble sound). A
feedback link from intrinsic executive system to the intrinsic cognitive-motivational system is also
assumed to exist for the endogenous regulation of motor control (e.g., error correction based on
proprioceptive feedback, rather than external acoustic sources).
Figure 2: Interaction between performer/context states and resource availability and resource activity.
The connections between performer/context states and resource availability and resource activity in
ARAMEP embody predictions about how the musical and extramusical factors under discussion
interfere with prioritised integrative attending. Basically, different factors lead to three different types
of interference: (Type A) decreased resource availability, (Type B) increased resource activity, or
(Type C) decoupled availability and activity. Type A interference occurs when global decreases in
resource availability are brought about by factors such as extreme high or low levels of anxiety and
arousal (see Kahneman, 1973). On the other hand, the increases in resource activity that characterise
Type B interference are related to rhythmic, textural, and pitch-based complexity. Finally, in Type C
interference, the decoupling of resource availability and resource activity occurs exclusively as a
result of rhythmic complexity. Interactive effects may also arise through combinations of musical and
extramusical factors. Rhythmic complexity should be a particularly potent contributor in these
interactions, given the intimate relation between metric frameworks and resource allocation
mechanisms postulated in ARAMEP. If this claim is valid, then music educational techniques for
dealing with rhythmic complexity by optimising metric framework generation should facilitate the
development of prioritised integrative attending skills.
Conclusions
Prioritised integrative attending in ensemble performance is a multifaceted skill composed of
sub-skills including tracking multiple sound sources and grouping together their elements in order to
derive the aggregate structure. ARAMEP accounts for how the attentional flexibility required for
these sub-skills is influenced by a range of musical factors (e.g., rhythmic and pitch-based
complexity) and extramusical factors (e.g., anxiety and arousal). The central claim is that these factors
act directly upon cognitive/motor mechanisms that regulate attentional resource allocation during
performance. The role of metric framework generation is paramount in this process. Through metric
framework generation, resource availability and resource activity are modulated - in real time and
continuously - in a manner that is both plastic and efficient, and hence conducive to attentional
flexibility. Thus, the sub-skills involved in prioritised integrative attending encounter minimal
interference when availability and activity maintain a lawful relationship. However, the decoupling of
resource availability and activity that occurs in the absence of metric frameworks curbs attentional
flexibility. The challenge for future research is to demonstrate empirically the distinction between
coupled and decoupled resource availability and activity.
By identifying the factors and mechanisms underlying prioritised integrative attending, ARAMEP has
potential to serve as a heuristic for directing research into ensemble skills. It also highlights the
complexity of the task faced by ensemble musicians. ARAMEP implies that ensemble performance
requires special capabilities that transcend the skills commonly identified in music education as
indices of a performer's general musical ability (see Shuter-Dyson, 1999). Technical competence as an
instrumentalist, accurate perceptual-motor skills, and artistry as an interpreter of music are necessary,
but by no means sufficient, for excellence as an ensemble performer. Specific prioritised integrative
attending sub-skills are fundamental to ensemble cohesion and coherence.
References
Allport, D.A., Antonis, B., & Reynolds, P. (1972). On the division of attention: A
disproof of the single channel hypothesis. Quarterly Journal of Experimental
Psychology, 24, 225-235.
Bharucha, J.J., & Pryor, J.H. (1986). Disrupting the isochrony underlying rhythm: An
asymmetry in discrimination. Perception and Psychophysics, 40, 137-141.
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, Massachusetts: The MIT
Press.
Casey, J.L. (1991). Teaching techniques and insights for instrumental music educators.
Chicago: GIA Publications.
Crowder, R.G. (1993). Systems and principles in memory theory: Another critique of
pure memory. In A.F. Collins & S.E. Gathercole (Eds.), Theories of memory (pp.
139-161). Hove, UK: Lawrence Erlbaum Associates.
Damos, D.L. (1991). Dual-task methodology: Some common problems. In D.L. Damos
(Ed.), Multiple-task performance (pp. 101-119). London: Taylor & Francis.
Dawe, L.A., Platt, J.R., & Racine, R.J. (1993). Harmonic accents in inference of metrical
structure and perception of rhythm patterns. Perception and Psychophysics, 54, 794-807.
Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for
melodies. Psychological Review, 85, 341-354.
Dowling, W.J., & Bartlett, J.C. (1981). The importance of interval information in long
term memory for melodies. Psychomusicology, 1, 30-49.
Dowling, W.J., Kwak, S., & Andrews, M.W. (1995). The time course of recognition of
novel melodies. Perception and Psychophysics, 57, 136-149.
Drake, C., & Palmer, C. (2000). Skill acquisition in music performance: Relations
between planning and temporal control. Cognition, 74, 1-32.
Essens, P.J. (1995). Structuring temporal sequences: Comparison of models and factors
of complexity. Perception and Psychophysics, 57, 519-532.
Fink, I., & Merriell, C. (with the Guarneri String Quartet). (1985). String quartet playing.
Neptune City, NJ: Paganiniana Publications.
Gabrielsson, A. (1999). The performance of music. In D. Deutsch (Ed.), The Psychology
of Music (2nd ed.) (pp. 501-602). San Diego, CA: Academic Press.
Gjerdingen, R.O. (1989). Meter as a mode of attending: A network simulation of
attentional rhythmicity in music. Intégral, 3, 67-91.
Goodman, E. (1998, July). 1 + 1 = 2? The ensemble performance of Chopin's Cello
Sonata. Paper presented at the Tenth International Conference on Nineteenth-Century
Music, University of Bristol, United Kingdom.
Goodman, E. (2000, March). Ensemble rehearsal: Analysis and psychology in practice.
Paper presented at the SMA and SPRMME Study Day 'Pathways to Musical
Understanding: Analysis and Psychology in Conjunction', University of Reading, United
Kingdom.
Gruson, L.M. (1988). Rehearsal skill and musical competence: Does practice make
perfect? In J.A. Sloboda (Ed.), Generative processes in music (pp. 91-112). Oxford:
Clarendon Press.
Handel, S. (1973). Temporal segmentation of repeating auditory patterns. Journal of
Experimental Psychology, 101, 46-54.
Handel, S. (1984). Using polyrhythms to study rhythm. Music Perception, 1, 465-484.
Sloboda, J.A. (1982). Music performance. In D. Deutsch (Ed.), The Psychology of Music
(pp. 479-496). New York: Academic Press.
Sloboda, J.A. (1985). The musical mind. Oxford: Oxford University Press.
Sloboda J.A. (1996). The acquisition of musical performance expertise: Deconstructing
the "talent" account of individual differences in musical expressivity. In K.A. Ericsson
(Ed.), The road to excellence: The acquisition of expert performance in the arts and
sciences, sports, and games (pp. 107-126). Mahwah, NJ: Lawrence Erlbaum.
Smith, K.C., & Cuddy, L.L. (1989). Effects of metric and harmonic rhythm on the
detection of pitch alterations in melodic sequences. Journal of Experimental Psychology:
Human Perception and Performance, 15, 457-471.
Smyth, M.M., Morris, P.E., Levy, P., & Ellis, A.W. (1987). Cognition in action. London:
Lawrence Erlbaum Associates.
Summers, J.J., Rosenbaum, D.A., Burns, B.D., & Ford, S.K. (1993). Production of
polyrhythms. Journal of Experimental Psychology, 19, 416-428.
Thompson, W.F., & Mor, S. (1992). A perceptual investigation of polytonality.
Psychological Research, 54, 60-71.
Wickens, C.D. (1980). The structure of attentional resources. In R. Nickerson (Ed.),
Attention and Performance VIII (pp. 239-257). Hillsdale, NJ: Erlbaum.
Wickens, C.D. (1984). Processing resources in attention. In R. Parasuraman & D.R.
Davies (Eds.), Varieties of attention (pp. 63-102). Academic Press.
Wickens, C.D. (1989). Attention and skilled performance. In D. Holding (Ed.), Human
skills (pp. 71-105). New York: John Wiley & Sons.
Wickens, C.D. (1991). Processing resources and attention. In D.L. Damos (Ed.),
Multiple-task performance (pp. 3-34). London: Taylor and Francis.
Proceedings Keynote
parents, and have high academic expectations. In addition to demographic differences, it is possible
that the students who participate in music instruction for an extensive period of time have different
interests or personal characteristics than do students who never take or discontinue music lessons. The
present study was conducted with children who did not seek to participate in formal music instruction
and who came from a less privileged environment than the one described by Duke et al. Its purpose
was to investigate the effects of three years of piano study on their cognitive development, academic
achievement, and self-esteem. A more detailed description of the study's sample, methodology, and
results regarding the cognitive benefits of piano instruction can be found in Costa-Giomi (1999).
Methodology
Sample
One hundred and seventeen 4th-grade children (58 girls and 59 boys) who had never participated in
formal music instruction, did not have a piano at home, and whose family income was below $40,000
Cdn (approximately $27,000US) per annum participated in the study. Thirty percent of the children
lived in single-parent families and 25% had unemployed parents whose welfare subsidies were less
than $20,000. Sixty-seven children were assigned to the experimental group and 50 children to the
control group.
Treatment
Each child in the experimental group received, at no cost to the families, three years of individual
piano lessons and an acoustic piano. The lessons, which were taught at the children's schools, were 30
minutes long during the first two years and 45 minutes during the third year. Nine experienced
teachers followed a traditional piano curriculum characteristic of Canadian conservatories.
Testing
Prior to the treatment, children in both the control and experimental groups were administered five
standardized tests with adequate reliability levels for the age of the sample: Level E of the Developing
Cognitive Abilities Test (DCAT), the tonal and rhythmic audiation subtests of the Musical Aptitude
Profile, the fine motor subtests of the Bruininks-Oseretsky Test of Motor Proficiency, the language
and mathematics subtests of the level 14 of the Canadian Achievement Test 2 (CAT2), and the
Coopersmith Self-Esteem Inventories (long form). At the end of the first, second, and third year of
instruction, children took the appropriate level of the DCAT (i.e., E, F, and G levels respectively), and
the self-esteem test. At the end of the second and third year of piano instruction, children also took the
language and math subtests of the CAT2, levels 15 and 16 respectively. Tests were administered to
mixed groups of experimental and control subjects in the same order in all schools. I studied children's
academic performance in school through the analysis of their school report cards from third grade
(one year prior to the start of piano instruction) to sixth grade and focused on the effects of piano
instruction on four subjects: music, math, English, and French. The piano teachers completed weekly
progress reports on children's attendance at the lessons and their practice routines throughout the
three years of instruction.
Results
In order to determine the effects of piano instruction on cognitive abilities, self-esteem, and academic
achievement, I compared the experimental and control groups throughout the duration of the project
through ANOVAs with repeated measures followed by Scheffé post hoc comparisons. I conducted
these analyses with a number of additional independent variables: sex, income (<$20,000, $20,000 -
$30,000 Cdn., or $30,000 - $40,000 Cdn.), family structure (single- or two-parent family), and
parental employment (0, 1, or 2 employed parents). I also compared the children who dropped out of
the piano lessons during the duration of the study with those who never participated in formal music
instruction and who participated in the piano lessons for three years.
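The study's analyses were repeated-measures ANOVAs with Scheffé post hocs. As a much-simplified sketch of the underlying logic only (all scores below are invented, and only a one-way between-groups F is computed, not the repeated-measures design actually used), the F statistic compares between-group to within-group variability:

```python
# Hypothetical, simplified illustration of a one-way group comparison;
# the scores are invented and do not come from the study.

def one_way_f(groups):
    """F statistic for a one-way ANOVA over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-group sum of squares: group means vs the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: scores vs their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_f([[102, 98, 105, 110], [95, 97, 92, 100]])
# Larger F values indicate a bigger between-group difference relative to
# within-group variability.
```

A repeated-measures design additionally partitions out between-subject variability across testing occasions, which is why the study could track group differences year by year.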
Cognitive abilities
Results showed that the experimental group's spatial scores were significantly higher than those of the
control group after one and two years of instruction and that the groups' spatial scores did not differ
prior to the treatment or at the end of the intervention. After two years of treatment, the general
cognitive ability scores of the experimental group were also significantly higher than those of the control
group. No differences in quantitative and verbal abilities between the control and experimental groups
could be established throughout the duration of the project. Gender, income, family structure, and
parental employment did not interact with the treatment in a significant way. No differences between
the drop-out and control groups or between the drop-out and experimental groups could be established
for any of the subtest scores or the total scores of the DCAT.
I conducted multiple regression analyses to explore the effects of specific components of the treatment
on the cognitive development of children in the experimental group. These components were
dependent upon the subjects' motivation to learn the piano, a variable which could have affected the
relationship between music and cognitive development. Motivation to learn the piano, as measured by
lessons missed and average practice time per week, explained 21% of the variance in spatial abilities
and 22% of the variance in total cognitive abilities after three years of piano instruction.
Self Esteem
The results of the analyses showed that the total self-esteem scores of the experimental group
increased significantly from 1994 to 1997 but those of the control group did not. It was noticed that
the School self-esteem scores of the experimental group tended to increase throughout the three years
while those of the control group tended to decrease. The analyses which included gender, income,
family structure, and parental employment as additional variables yielded similar results to the ones
reported earlier and showed no interactions between piano instruction and these variables. Further
analyses indicated that the total self-esteem scores of children who completed the three years of
instruction improved significantly, while those of the dropout and control groups did not.
Academic achievement
The results of the analyses did not show any significant effect of piano instruction on children's total
scores in the language and math subtests of the academic achievement test. The analyses of partial
scores in each of the two math subtests and two language subtests revealed no significant difference
between the control and experimental groups. It was noticed that the math computation scores of the
experimental group tended to be higher than those of the control group especially after two years of
instruction. The analyses which included gender, income, family structure, and parental
employment as additional variables yielded similar results to the ones reported earlier. The analysis of
math computation scores which included income as an independent variable showed a significant
interaction between Year and Group which was not found in the previous analysis. There was a more
pronounced improvement in the experimental group's math computation scores than in those of the
control group.
When re-analyzing the data to study differences in academic achievement among the children who
completed the three years of piano lessons and those who dropped out of or never participated in
piano instruction, it was found that the experimental group obtained higher total language scores than
the control group after two years of instruction.
School performance
The analyses of data from the school report cards showed that piano instruction affected children's
school music marks. The experimental group obtained significantly higher music marks than the
control group after two years of instruction. It was also found that the marks of the control group
varied significantly throughout the three years of the project while those of the experimental group did
not.
Although the analysis of math marks also showed a significant effect of piano instruction on school
math performance, post-hoc analyses did not reveal any differences between the control and
experimental groups. Similarly, no effects of piano instruction on children's school performance in
language subjects were found even when including sex, income, family structure, and parental
employment as additional independent variables into the analyses. The consideration of these
variables did not modify the results presented regarding children's math and music marks either. The
analyses of school marks of children who completed, did not complete, or never participated in the
three years of piano instruction, yielded similar results to the ones reported earlier. The dropout
group's marks did not differ from either the control group's or the experimental group's marks.
Other effects
The analysis of data gathered through regular interviews with the parents, teachers, and students
showed a few interesting trends. Almost half of the parents of children in the experimental group
reported that during the three years of the project they provided their other children with music
lessons. With the exception of one child in the control group, children who expressed interest in
pursuing music careers were all in the experimental group. More children in the experimental group
than in the control group showed interest in playing musical instruments at the end of the project.
Parents and children in the experimental group attended more concerts and recitals than those in the
control group throughout the three years of the project. No differences in personal traits, as rated by
the parents, between the control and experimental groups could be established either before or after
the treatment.
Discussion
The results of the study suggest that one of the benefits associated with piano instruction is the
development of children's self-esteem. The total self-esteem scores of the experimental group showed
a more pronounced (and statistically significant) improvement than those of the control group during
the three years of instruction regardless of family income, sex, family structure, and parental
employment. It is important to mention that the treatment of this project involved not only the piano
lessons but also many other special events such as owning a piano, playing in recitals, and getting
individual attention from a caring teacher. Although traditional piano instruction involves all these
elements and, as such, contributes to the development of self-esteem in children, future research might
try to establish the individual contribution of each of these elements.
As discussed elsewhere (Costa-Giomi, 1999), the results of the study corroborate that piano
instruction produces temporary improvements of general and spatial cognitive abilities. Children
receiving piano lessons obtained significantly higher general and spatial cognitive scores than those
not participating in formal music instruction after one and two years of treatment. However, no
differences between the groups could be established after the third year of instruction. Additional
findings suggest that motivation to study piano plays an important role in the relationship between
music instruction and cognitive development. Apparently, children who apply themselves benefit to a
larger extent than do those who do not practice or miss lessons.
The results of the study did not show any significant contribution of piano instruction to children's
academic achievement in language and math as measured by standardized tests. I noticed, however,
that children receiving the piano lessons tended to obtain higher math computation scores than those
not participating in formal music instruction after two years of treatment and that those who
completed three years of piano instruction obtained higher language scores than those who
discontinued the lessons.
The effects of piano instruction on children's academic achievement in school, as measured by school
report cards, were similar to the ones measured through standardized tests. No benefits of piano
instruction were evident from the analyses of school marks in language subjects and math. Although
the math marks of the children receiving and not receiving the lessons changed significantly from
third grade to sixth grade, the results did not indicate any clear overall gain or loss for either group.
The lack of other effects of piano lessons on school performance is in fact a positive sign that
this type of instruction does not necessarily add excessive strain to children's academic
responsibilities. Children in the experimental group were able to meet the academic demands of
school on top of those associated with the study of piano.
Participation in piano instruction had other interesting effects on children and their families. Families
whose children were taking lessons showed a greater interest in music participation than those
whose children were not engaged in formal music instruction. The siblings of almost half of the
children who were offered the piano lessons through this project subsequently started formal music
instruction at their parents' initiative, and families with children involved in the piano lessons attended
more concerts than those with children in the control group. Participation in music instruction actually
opened other career options for the children in the experimental group, options not even considered by
those with no formal music instruction experience.
In summary, the results of the study show that piano instruction benefits children in various ways but
that the scope of these benefits may be more limited than previously suggested.
References
Costa-Giomi, E. (1999). The effects of three years of piano instruction on children's cognitive
development. Journal of Research in Music Education, 47, 198-212.
Duke, B., Flowers, P., & Wolfe, D. (1997). Children who study piano with excellent teachers in the
United States. Bulletin of the Council for Research in Music Education, 132, 51-85.
Harding, J. A. (1990). The relationship between music and language achievement in early childhood.
Dissertation Abstracts International, 52 (10A), 3148.
Hurwitz, I., Wolff, P., Bortnick, B., & Kokas, K. (1975). Nonmusical effects of the Kodaly music
curriculum in primary grade children. Journal of Learning Disabilities, 8(3), 45-52.
Kemp, A. (1996). The Musical Temperament: Psychology and Personality of Musicians. New York,
NY: Oxford University Press.
Kooyman, R. J. (1989). An investigation of the effect of music upon the academic, affective, and
attendance profile of selected fourth grade students. Dissertation Abstracts International, 49 (11-A),
3265.
Lamar, H. B. (1990). An examination of the congruency of music aptitude scores and mathematics
and reading achievement scores of elementary children. Dissertation Abstracts International, 51
(3-A), 778-779.
Legette, R. M. (1994). The effect of a selected use of music instruction on the self-concept and
academic achievement of elementary public school students. Dissertation Abstracts International, 54
(7-A), 2502.
Linch, S. A. (1994). Differences in academic achievement and level of self-esteem among high
school participants in instrumental music, non-participants, and students who discontinue instrumental
music education. Dissertation Abstracts International, 54 (9-A), 3362.
Lomen, D. O. (1970). Changes in self-concept factors: A comparison of fifth-grade instrumental
music participants and nonparticipants in target and nontarget schools in Des Moines, Iowa.
Dissertation Abstracts International, 31, 3962A.
Michel, D. E. (1971). Self-esteem and academic achievement in black junior high school students:
effects of automated guitar instruction. Council for Research in Music Education, 24, 15-23.
Michel, D. E. & Farrell, D. M. (1973). Music and self-esteem: disadvantaged problem boys in an all
black elementary school. Journal of Research in Music Education, 21, 80-84.
Persellin, D. (2000, March). The effect of activity-based music instruction on spatial-temporal task
performance of young children. Paper presented at the National Biennial Music Educators National
Conference. Washington, DC.
Rauscher, F., Shaw, G., Levine, L., Ky, K., & Wright, E. (1994, August). Music and spatial task
performance: A causal relationship. Paper presented at the Annual Convention of the American
Psychological Association, Los Angeles, CA.
Schreiber, E. H. (1988). Influence of music on college students' achievement. Perceptual and Motor
Skills, 66, 338.
Wagner, M. & Menzel, M. (1977). The effect of music listening and attentiveness training on the
EEGs of musicians and nonmusicians. Journal of Music Therapy, 14, 151-164.
Wamhoff, M. J. (1972). A comparison of self-concept of fourth grade students enrolled and not
enrolled in instrumental music in selected schools in the Barstow, California School District.
Dissertation Abstracts International, 33, 626A.
Wood, A. L. (1973). The relationship of selected factors to achievement motivation and self-esteem
among senior high school band members. Dissertation Abstracts International, 35, 1150A.
Ilari, B. Motivation and expertise: The role of teachers, parents and ensembles in the development of instrumentalists
Miura, M. A conceptual design of a CAI system for basse donnee in harmony theory
Mullensiefen, D. The effects of background music on memory under the perspective of the irrelevant speech effect and context-dependent memory
Pitts, S. Starting to play a musical instrument: parents' and children's expectations of the learning process
Stepanauskas, D. The social cost of expertise: personality differences in string players and their implications for the audition process and musical training
Proceedings paper
'A music system, its style, its main characteristics, its structure are all very closely associated with the
particular way in which it is taught. Not only what is taught but also the activities involved in learning
can tell us what is valued in a music.' (Bruno Nettl 1983, 331-332).
'Colourstrings' teaching is specially designed as pre-school instrumental education. The method was
first developed for violin by Dr. Géza Szilvay in the early 1970s. My research is based on the Violin
ABC books and on practical observations of lessons at East Helsinki Music School, Finland. In this
paper, I shed light on two important ideas related to the cross-modal associations that characterise
this education-to-performance approach.
In his 'Colourstrings' teaching, Szilvay's first goal is to construct the hearing of the young child. Many
of the ideas developed in his instrumental approach for violin are an extension and application of
Kodaly's principles. Szilvay does not use conventional music notation during the initial year(s) of
learning. The first page of book A (fig. 1.) is a picture the child could even have drawn him/herself
when he/she was confronted with the sound of the open violin strings for the first time. It seems as if
the child has differentiated the four specific timbres of the strings, interpreted and expressed
him/herself through a medium he/she is familiar with: drawing pictures and using colours. As such,
the code is fully comprehensible; the child understands the relation between the pictorial code and
sound instantly. These four connoted characters awaken the interest and involvement of the child and
play a crucial part in the creative processes of structured hearing and musical cognition while playing
the violin.
Fig. 1. : The visual representation of the auditory sound image of the four violin strings: the opening
page of book A of the Violin ABC of Dr. G. Szilvay.
The musical code, designed for and by adults, is often forced prematurely upon the pre-school
instrumentalist. With this background, I am particularly keen to defend the meaningfulness of the code
as used in 'Colourstrings' when teaching pre-school children an instrument, in this case the violin.
My study addresses two issues:
-The importance of a colour code in preparation of sight reading conventional notation.
-The semantic meaning and importance of colours in stimulating the personal engagement of the
young player.
I have decided to investigate the stage when 'Colourstrings' pupils have adequately assimilated body
movements in relation to the colour code. The sound of the four strings, coded as a green bear, a red
father, a blue mother and a yellow bird, assists the children in identifying the sound and directing the
required position of the right arm. With practice, the movements become trained and ingrained. The
code functions as a memory aid and programs the inner body and inner ear in preparation for this
process. The young child masters the soundscape he/she is creating and re-creating. Colours and
pictures encode additional concepts needed in this process of re-creation. This code is comprehensible
to the five-year-old and inspires him/her to investigate further with ease. The child, curious and
motivated, practises more and, most importantly, shows enjoyment in practising.
Imagination is the first goal of music education: musical cognition is built on intersense
imagery.
The following tests were given in September 1999, after one month of tuition, and in February 2000,
after six months of tuition. The children were assessed in groups and placed in rows so that they could
not copy each other. The advanced group was evaluated after one year of playing, in September 1999.
For the beginners' groups, colours were added at critical places in string crossings. The most advanced
children did not need any colour indications and performed the string crossings without hesitation.
Test 1.
After one month of tuition, the six children of the beginners' group were tested in the following way:
Colours or characters were called out in a free order. With closed eyes, the pupils selected the
required position of the right arm, without making a sound so as not to reveal the 'answer' to the others.
This resulted in a 'miming' of the string attack and an upward circular right arm movement.
Test 2.
Six months later, the ability to read conventional notation was tested with the same group.
The beginners' group was confronted with new material, conventionally notated. The notation
included a coloured fingering where confusion about the string crossing might arise.
Test 3.
After one year of playing, the advanced group's level of sight-reading from conventional notation was
tested.
The group was confronted with a sight-reading task that included more than one string crossing.
Neither colours nor fingerings were given.
Results.
Test 1: after one month of 'Colourstrings' training, the string crossing reflexes were synchronised and
well assimilated by all.
Test 2: the passage was performed without hesitation when colours and/or coloured fingerings were
added.
Test 3: there proved to be no need to add colours above the conventional notation; all children
performed their string crossings as expected.
Test 1 reveals that from the first month of 'Colourstrings' tuition, the right arm's spatiotemporal
movements are accurately trained to perform string crossings. With closed eyes, the miming of the
string attack is faultless. The sight-reading results in tests 2 and 3 are different. In comparing tests 2
and 3, we notice that there is a transition period of approx. 6 months when a coloured fingering cues
string crossings. After one year of tuition, perfect assimilation of the right arm movements results in a
well-directed performance of colour-free sight-readings including string crossings.
In general, 'Colourstrings' pupils prove to be efficient sight-readers. The audiation or inner hearing is
first colour (timbre) specific before it is pitch specific. From imagining what should be heard and
what should have moved differently, the children gain control and become 'masters of themselves'.
Confronted with a colour, the child instantly processes aural and bodily imagery, anticipating in the
mind what the produced sounds should sound like and which movements are involved. The mental link
between the visual colours and the aural timbre of the string is an internal imaginative process. The
four chosen colours are a mnemonic device for practically overcoming string crossings but equally
function as a specific timbre semantic when shifting the left hand and playing on the same string are
involved.
Discussion.
Test 3 shows that the transition to conventional notation happens after approximately one year. As
children 'grow out of the colours', the temporary effect of 'coloured hearing' is not lasting. However,
as a mnemonic, colours simplify the process of comprehending and reinforce the development of
reading. Intuitively, the child is eager and demonstrates enjoyment: he/she becomes committed. I
would like to link the two issues of my study. The musical grammar, which Szilvay presents at the
child's level of thinking, is easy to decode. As such, the young player is stimulated to repeat the
experience and practises more. He/she is capable of encoding and eager to show this progress through
playing.
However, are colours and pictures the only mnemonic aid? Social learning - in 4 group lessons a week
is likely to play an important part in the progress of the groups studied here. According to Elliott
(1993), constructive knowledge is worthwhile, as the social environment stimulates it. The social
environment is first created by setting up a primary school within a music school, second by
organizing players of similar ability in the same group.
Dr. Szilvay was motivated to develop the use of colours because of the inadequacy of conventional
notation for the mind of the pre-school child. When the conventional code system as we know it does
not awaken any meaning in the young learner, the child does not see any application for it in relation to
his/her playing. By using 'a sign system based on previous experiences' (Dewey/Reichling), Szilvay
conveys to the child that the meaning of music is to reveal his/her inner world and communicate with
the outer world through playing. Szilvay's first message to the child is that music is an expression: it is
the revelation of inner feelings. Through colours, the child's
self-transformation is not based on mechanical routines and abstract comprehension, but is an active
process to which the child can respond affectively. What is transformed here as well is the mind
of the adult. 'Colourstrings' teaching depends on the teacher's ability to reach the artist in the child.
The teacher has to know 'how to 'music' and how to teach others 'to music.' (Elliott 1991-93, 24) The
application of colour in code, mind and sound demands in the first place an adaptation of the mind of
the adult/teacher to the mind of the child.
I have mentioned two other factors which need consideration when interpreting the observations and
tests done with the first violin class of East Helsinki Music School: social learning and the teacher
involved. I acknowledge that more factors are involved, for example the children's home background.
These issues I have deliberately limited in this presentation, but they remain open to further
discussion.
If colours and images speak, they can make the bow sing and a pizzicato chuckle.
There are three implications I want to highlight here. First, the importance of colours in developing
the identity of a string and its position, so that the process of sight reading, based on a faster
assimilation of body control, can take place at an early stage in the development of the young
instrumentalist. Second, my hypothesis is that colours can be seen as a semantic of interpretation and
thereby stimulate children to become improvisers and interpreters from the start. Third, music is
understood not only as a performance practice but also as the inner musical 'chat' the child has
developed in the process of structuring his/her hearing. The child has understood that he/she can encode an
imagined aural experience visually.
The application of colours in constructing the inner hearing has been applied to other instrumental
teaching. 'Colourstrings' methods exist for violin, cello, double bass, guitar, piano and flute. Each uses
colours to simplify the code to the pre-school child. Colours speak to the young child and intensify the
visual perception linked to specific aural timbre expectations. These will be heard in imagination so
that precise right arm movements can be directed. The colours then function as a mnemonic aid in
developing accurate sight-reading skills. As the 'colour hearing' does not last, there are no problems in
the transition to reading the conventional code. Tests and almost thirty years of teaching experience
worldwide have shown that children learn the conventional code faster, are better sight-readers and, as
such, can join string group activities at an early stage in their development. Through colours, the
learning process has been intensified.
Moreover, the meaning of the code and the musical theory is learned in a playful way. As the child
can make sense of colours, music reading becomes more valuable. In addition, colours speak to the
emotions of the child and suggest that the expression of inner feelings can add other dimensions to
decoding or reading. In my opinion, the emphasis on timbre, which Dr. Szilvay initiates in his
'Colourstrings' approach, is at the basis of the process of musicing. As Davidson claims: 'My choice of
string on which I play a given note now reflects my understanding of the role timbre plays in my
interpretation. As my options expand, the choices I make become increasingly significant. This
dramatically transforms my perspective of the musical experience as well as the nature of
performance.' (L. Davidson 1994, 103) Thus, visual colours as a semantic of the string timbre imply an
interpretation and technical ability in the service of expressing feelings. The child has
quasi-independently discovered how to music musically. The musical knowledge, the violin aesthetics and
the ability to imagine, be personally engaged and give an emotional response, are all combined in the
colour code. It is this total comprehension, without notes or words but through colour and pictures,
which is at the basis of making music in an intellectual and musical way.
It must be noted that, from as young as pre-school age, the child not only enjoys the practice of
decoding and re-creating, but equally improvises and interprets known or new tunes. He/she not only
does so through playing, but is equipped with the knowledge to encode what he/she has created. As
the child's musical and expressive knowledge and inner hearing have developed from listening for the
colours s/he was looking at, the choice of timbre according to his/her own inner feelings when
improvising a tune will correspond with a visual colour in the young composer's code. As the mind
makes new combinations, the child not only possesses the skills to play these almost simultaneously -
as in the case of improvisation - but can also encode the ongoing musical 'chat' in his/her mind.
Already from this stage, expressive meaning is engraved in the code. 'Music is born out of the
need to express ourselves and to communicate aesthetically through the abstractions and
characteristics of sound.' (R. Aiello 1994, 44). Music does not only mean sound, but from now on is
recognised as a system of coding.
In 'Colourstrings' teaching, Dr. Szilvay re-structures and re-values the instrumental approach of the
very young, based on the relative theories of Kodaly. He aims at developing a musically constructed
knowledge by applying a colour code appropriate to the level of comprehension of the child, who
becomes a musical master of him/herself from the very start. At this point, Szilvay goes beyond Kodaly:
even before reading notes, the pre-school violinist has grasped the concept of musicing in relation to a
compact but meaningful picture borrowed from his/her own familiar world and has experienced music
making and music coding/decoding in several dimensions.
REFERENCES
Aiello, R. (1994). Music and Language: Parallels and Contrasts. In R. Aiello (Ed.), Musical
Perceptions. New York: Oxford University Press.
Proceedings paper
Discussion of the beneficial effects of music on child development often centers on cognitive
development, such as spatial-temporal reasoning (e.g. Rauscher et al., 1997). In contrast, the present
research addresses the effects of musical activities on an area of social and emotional development,
the self-concept. This research investigates why some young children think of themselves as musical,
and hence are motivated to do music activities, whereas others do not think of themselves as musical.
Because music programs are often the first to go when the budgets of schools in the U.S. are cut, it is
important for us to understand the consequences of a lack of music education for the development of
children's musical self-concept and their possible involvement in future musical activities.
Even when elementary schools in the U.S. have music programs, it is common for children not to
have an opportunity to learn to play a musical instrument in school until fourth grade. Unfortunately,
by the time children are 9 years old, we may have missed valuable years in terms of the formation of
their self-concepts. According to Piaget, children enter concrete operations when they grasp the ideas
of conservation and seriation between the ages of 5 and 8 (Boden, 1979). Commensurate with these
newfound cognitive abilities, children begin to think of themselves and other people in terms of traits,
and to use social comparison to rank individuals, including themselves, relative to others in different domains
(Sameroff & Haith, 1996). For example, a child might go from thinking of someone as a person who
makes cookies and cakes to thinking of someone as a person who is good at baking. Because the
transition from thinking of oneself and others as doing things to thinking of oneself and others as
having abilities takes place between the ages of 5 and 8, it is critical for developmental psychologists
to study the early formation of self-concepts in children as young as 5 to 8 years old.
Unfortunately, however, most studies of children's self-concepts have been carried out with older
children, because it is easier to administer measures to children who can read. Although it is possible
to read 64 self-concept questions aloud to young children (Marsh, Craven, & Debus, 1991), it is
probably not too engaging a task for young participants. A fairly recent methodological innovation is
the use of puppets with very young children (Eder, 1990; Ablow & Measelle, 1993), but a pictorial
instrument is most widely used to study the self-concepts of young children (Harter & Pike, 1983).
The Pictorial Scale of Perceived Competence and Social Acceptance for Young Children (PSPCSA)
is developmentally appropriate and user-friendly, but neglects to measure any kind of musical or
artistic self-concept that children may have. For the present research, a new instrument was devised
which measures musical self-concept and artistic self-concept, as well as self-concept in other areas.
To do this, the format of the PSPCSA was retained, but the content - pictures and questions - was
modified.
In addition to assessing children's self-concepts, the present study assessed family musical
environment through a parental questionnaire. Although the family environment that children grow up
in probably affects their self-concepts, including their musical self-concept, surprisingly little research
has been conducted on family influence on children's musical development. No research directly
addresses the initial formation of young children's musical self-concept, which is likely a necessary
precursor for individuals to continue to be involved in music as they get older and perhaps later decide
to become professional musicians. A study of concert pianists from the U.S. found that 19% came
from families with no prior musical involvement (Sosniak, 1985). Perhaps what might matter more to
a child's musical development than whether a child's family members are musically involved
themselves is whether a child's family is supportive of the child's musical activities. For example, in a
British study of musically gifted 10- to 18-year-olds, 86% of the students benefited from some form
of parental encouragement or pressure to practice (Sloboda & Howe, 1991). Note that this study did
not include children as young as those in the present research. Because parents' attitudes towards
music may affect their children's musical self-concepts, an additional component of this study was a
parent questionnaire assessing attitudes towards music and family musical environment. Not much
research has focused on parents' attitudes towards music, with such notable exceptions as the 1994
Gallup poll measuring Americans' attitudes towards music, and a measure of the home musical
environment which correlated with second graders' musical ability as assessed by teachers (Brand,
1986).
METHOD
Participants
Participants were 88 children between the ages of 5 and 8, and their parents. The sample consisted of
43 first-grade students and 45 second-grade students. There were 46 girls and 42 boys. The Northern
California school district from which students were drawn is about 44% European American, 29%
Latino, 19% Asian American, and 6% African American. Most were attending a public elementary
school which was the first to participate in a pioneering music program, Guitars in the Classroom,
with the remainder attending a neighboring public school without a music program.
Materials
Self-Concept: A new adaptation of the Pictorial Scale of Perceived Competence and Acceptance for
Young Children (Harter & Pike, 1983) was used. Whereas the original measure assessed scholastic
competence, physical competence, peer acceptance, and maternal acceptance, the revised measure
assesses musical competence, artistic competence, scholastic competence, physical competence,
prosocial competence, and acceptance of physical appearance. Child participants point to which of
two pictured children is more like them, then answer a second question asking to what degree they are
like the pictured child. This two-step process results in each variable being measured on a 4-point
forced choice scale.
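The two-step procedure above can be sketched as a small scoring function. This is a minimal sketch under the assumption that "a lot like the more competent pictured child" earns the highest score; the function and parameter names are illustrative, not taken from the instrument itself.

```python
def score_item(chose_competent_child: bool, a_lot_like: bool) -> int:
    """Convert the two forced choices into a single 1-4 score.

    Step 1: the child points to one of two pictured children (the more
    or the less competent one). Step 2: the child says whether he/she
    is 'a lot' or 'a little' like that child.
    Assumed scoring direction: 4 = a lot like the competent child.
    """
    if chose_competent_child:
        return 4 if a_lot_like else 3  # a lot like the competent child -> 4
    return 1 if a_lot_like else 2      # a lot like the other child -> 1

def subscale_mean(item_scores: list[float]) -> float:
    """Average the item scores within one area, e.g. musical competence."""
    return sum(item_scores) / len(item_scores)
```

Averaging the item scores within each area would then yield per-area self-concept means of the kind reported in Table 1.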
Questionnaire: Parents were asked questions about the family's active experience with music, the
family's listening habits, and the parents' attitudes towards music. Some questions were adapted from
Brand's (1986) Home Musical Environmental Scale (HOMES).
Procedure
Child and parent participation was on a voluntary basis. Child participants were individually
administered the revised version of the Pictorial Scale of Perceived Competence and Acceptance for
Young Children at the school site. Parent participants filled out the questionnaires at home and sent
them back to school with their children. Data were collected at the beginning and at the end of the
school year.
Design
A 2 (grade) x 2 (school) x 2 (gender) x 2 (time of testing, within subjects) ANOVA was used to analyze child self-concept
data. Chi-squares and correlations were used to analyze parent questionnaire data.
RESULTS
Child Self-Concept
Music vs. Other Activities: For the overall mean values of the different areas of self-concept, see Table
1. Apparently, the first and second graders studied are more confident in their abilities in everything
measured except singing and playing musical instruments.
TABLE 1 - MEAN VALUES OF CHILD SELF-CONCEPT IN DIFFERENT AREAS
(columns: Self-Concept Area, Fall, Spring)
Developmental Differences: See Figure 1. Not surprisingly, second graders rated themselves better at
reading and at math than did first graders, an assessment that may have a basis in reality. In contrast,
however, second graders rated themselves worse at singing and at playing musical instruments than
did first graders.
Gender Differences: See Figure 2. Girls rated themselves better at singing than did boys. Boys rated
themselves better at climbing than did girls.
musical abilities in the spring. Specifically, parents' endorsement of the statement "I can't imagine life
without music" was positively correlated with child's later self-rated singing ability (r=.34, p<.05).
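A correlation between a yes/no endorsement and a 4-point self-rating is a point-biserial correlation, which reduces to an ordinary Pearson correlation with the endorsement coded 0/1. The sketch below shows the computation; the data are invented for illustration only and are not the study's data.

```python
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented data: 1 = parent endorsed "I can't imagine life without
# music", 0 = did not; paired with the child's spring self-rated
# singing score on the 1-4 scale.
endorsed = [1, 1, 0, 0, 1, 0, 1, 0]
singing = [4, 3, 2, 1, 4, 2, 3, 2]
r = pearson_r(endorsed, singing)
```

With the real data, a significance test against a null of zero correlation would give the reported p-value.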
DISCUSSION
In the early elementary years, children's self-concepts in music are low compared to their
self-concepts in other areas, such as physical activity. The current research, which found that first- and
second-graders' self-concepts in playing a musical instrument and in singing were lower than their
self-concepts in other areas, corroborated Eccles et al.'s (1993) finding that first-, second-, and
fourth-graders' self-concepts in playing a musical instrument were lower than their self-concepts in
other areas. In fact, although in the current study children's self-concepts in most areas increase or
remain relatively stable from first to second grade, their self-concepts in music undergo a decline
during this period. In contrast, Eccles et al. (1993) did not find significant differences in children's
self-concepts in most areas from first to second grade, but did from second to fourth grade. The
difference between the findings of the current study and Eccles et al.'s (1993) findings perhaps can be
explained by the differing methodologies used to assess self-concept. Eccles et al. (1993) read
questions aloud to children, who used a pictorial representation of a 7-point Likert scale to respond.
Since first- and second-graders have not necessarily mastered the ability to make multiple
comparisons yet, and hence would not necessarily understand the relatively complex 7-point Likert
scale, this method would be less sensitive to measuring their self-concept than our method of asking
two dichotomous questions in succession. In any case, Eccles et al.'s (1993) findings suggest that it is
crucial to supply musical activities to children before third or fourth grade, before they become
convinced that they are not good at music and fail to develop any interest in doing musical things. Our
findings, using a methodology more compatible with the cognitive limitations of first- and
second-graders, suggest that it may be critical to introduce children to music even earlier. This
empirical finding is supported by developmental psychology theory asserting that the ages from six to
eight are vital to the early formation of self-concept, i.e. when children begin to think of themselves
and other people in terms of traits, not just their physical characteristics and the things they do
(Damon & Hart, 1982), and when social comparison becomes very important (Sameroff & Haith,
1996).
Previous research regarding children's gender role development and music has found girls and boys to
gravitate towards different musical instruments as early as in third grade (Abeles & Porter, 1978), and
that girls express more favorable attitudes towards music than boys in the third through sixth grades
(Nolin, 1973). Children's gender role development with respect to music has rarely been studied
before third grade. In a notable exception, the self-concepts of first-, second-, and fourth-graders were
studied (Eccles et al., 1993); girls perceived themselves as better at music and reading than boys
perceived themselves to be, and boys perceived themselves as better at sports and math than girls
perceived themselves to be. In the current study, two of these findings were corroborated: girls have
more confidence in their singing abilities than boys have in theirs, and boys have more confidence in
their climbing abilities than girls have in theirs. Future research could investigate the phenomenon of
boys already thinking that they are not as good at singing relative to girls as early as first grade by
administering a measure of gender role development to participants. Then it would be possible to
discover whether girls are more likely to perceive themselves as good at music, or whether
feminine-stereotyped individuals are more likely to perceive themselves as good at music.
Parental perceptions of children's competence in certain areas can affect the development of children's
competence in those areas. With regard to math, English, and sports, parents' perceptions of their
children's abilities have been shown to be influenced by children's gender, and not necessarily by
children's actual performance (Jacobs & Eccles, 1992). Since previous research has not been carried
out to study parents' perceptions of their children's competence in music, this exploratory research is
noteworthy for showing a link between positive parental attitudes towards music (belief in the
importance of music for themselves personally) and child musical self-concept. Future data collection
could be improved by having parents rate their attitudes on a Likert scale rather than by globally
endorsing or failing to endorse a statement. In previous research with adolescents talented in music
and other arts, most participants reported that someone's approval helps motivate them to do their
music or other art; a mother's approval is most frequently mentioned (Chin, 1997). Since many of the
adolescent participants in the aforementioned study had been involved in music and other arts for
several years, it is of vital importance to carry out more research with parents of young children.
REFERENCES
Abeles, H. F., & Porter, S. Y. (1978). The sex-stereotyping of music instruments. Journal of Research
in Music Education, 22.
Ablow, J. C., & Measelle, J. R. (1993). Berkeley Puppet Interview: Administration and scoring system
manuals. Berkeley: University of California.
Boden, M. A. (1979). Piaget. Brighton: Harvester Press.
Brand, M. (1986). Relationship between home musical environment and selected musical attributes of
second-grade children. Journal of Research in Music Education, 34, 111-120.
Chin, C. S. (1997). The social context of artistic activities: Adolescents' relationships with friends and
family. Unpublished manuscript, University of California at Santa Cruz.
Damon, W., & Hart, D. (1982). The development of self-understanding from infancy through
adolescence. Child Development, 53, 841-864.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in
children's self- and task-perceptions during elementary school. Child Development, 64, 830-847.
Eder, R. A. (1990). Uncovering young children's psychological selves: Individual and developmental
differences. Child Development, 61, 849-863.
Harter, S., & Pike, R. (1983). The Pictorial Scale of Perceived Competence and Social Acceptance for
Young Children. Denver, CO: University of Denver.
Jacobs, J. E., & Eccles, J. (1992). The impact of mothers' gender-role stereotypic beliefs on mothers'
and children's ability perceptions. Journal of Personality and Social Psychology, 63, 932-944.
Marsh, H. W., Craven, R. G., & Debus, R. (1991). Self-concepts of young children 5 to 8 years of
age: Measurement and multidimensional structure. Journal of Educational Psychology, 83, 377-392.
Nolin, W. H. (1973). Attitudinal growth patterns toward elementary school music experiences.
Journal of Research in Music Education, 21, 123-134.
Rauscher, F. H., Shaw, G. L., Levine, L. J., Wright, E. L., Dennis, W. R., & Newcomb, R. L. (1997).
Music training causes long-term enhancement of preschool children's spatial-temporal reasoning.
Neurological Research, 19, 2-8.
Sameroff, A. J., & Haith, M. M. (1996). The five to seven year shift: The age of reason and
responsibility. Chicago: University of Chicago Press.
Proceedings paper
Children have difficulty in perceiving harmonic changes until the age of eight or nine (Bentley, 1966;
Franklin, 1956; Hufstader, 1977; Imberty, 1969; McDonald & Simons, 1989; Merrion, 1989; Moog,
1976; O'Hearn, 1984; Petzold, 1966; Schultz, 1969; Shuter-Dyson & Gabriel, 1981; Simons, 1986;
Taylor, 1969; Vera, 1989; Zimmerman, 1971). Young children do not react adversely to dissonant
accompaniments to a melody (Antochina, 1939; Believa-Exempliarskaia, 1925; Bridges, 1965; Moog,
1976; Rupp, 1915, cited in Funk, 1977; Revesz, 1954; Sloboda, 1985; Teplov, 1966), or dissonant
chords and intervals (Valentine, 1913; Yoshikawa, 1973) and are inconsistent when identifying
dissonant and consonant stimuli (Zenatti, 1974). They often fail to perceive the difference between a
theme and its harmonic variations (Hufstader, 1977; O'Hearn, 1984; Pflederer & Sechrest, 1968;
Taylor, 1969) and to identify the number of tones present in a chord (Vera, 1989). They also have
difficulty in expressing their perception of harmony verbally (Hair, 1981) or through the use of visual
representations (Hair, 1987). These findings have often been taken as an indication that young
children are incapable of perceiving harmony. However, studies have shown that kindergarten
children can recognize simple chord changes (Costa-Giomi, 1994a, 1994b) and 6-year-olds can
identify a chord that is different between pairs of short progressions (Zenatti, 1969). First graders
readily discriminate between pairs of chords (Hair, 1973) and seem confused when asked to sing a
familiar song with unfamiliar accompaniments (Sterling, 1985).
Research has provided little information about how to help young children learn harmonic elements.
Are there any factors that affect children's harmonic perception and understanding and that teachers
can manipulate in order to teach harmony to children effectively? The present study addressed this
question. The purpose of the study was to identify developmental trends in young children's
perception of simple accompaniments to familiar songs and musical factors that affect their harmonic
perception.
Methodology
Children attending kindergarten through third grade at a public school in Montreal participated in the
study. The school had no formal music program. The classroom teachers, who developed singing
activities, did not teach harmonic concepts to the children and did not use any harmonic musical
instrument to accompany the children's singing. There were 18 children in kindergarten, 30 in first
grade, 22 in second grade, and 21 in third grade.
Children were provided with 10 weeks of music instruction. The 30-minute weekly lessons were
taught by a music specialist and focused on harmonic elements. The goals of the short music program
were for the children to learn to identify and play a simple chord progression (I V I) on the
omnichord, to sing short songs with an accompaniment of tonic and dominant chords, to identify
chord changes in more complex chord progressions, to perceive the difference between chord changes
and chord position changes, and to learn that most songs end on a tonic chord. Children learnt songs
based on I and V, including the three songs used in the posttest, played the accompaniment to these songs
individually on the omnichord, wrote the accompaniment of the songs on the board, played games
based on the aural discrimination of chord changes and chord position changes, and practiced how to
identify the chords in simple progressions played on the omnichord.
The posttest had two parts. The first part was a paper-and-pencil group test and the second was an
individual test requiring singing, the performance of a simple accompaniment on the omnichord, and
selected perceptual tasks. The present manuscript reports the results of the first part of the test only.
The first part of the test consisted of two different tasks, one requiring the discrimination of various
accompaniments to familiar songs and the other requiring the identification of the chords of a
simple accompaniment. First, the music specialist sang a familiar song to the children in four different
ways accompanying herself with the omnichord. For each rendition of the song, children were asked
whether the song sounded right. The rendition considered to be correct was the one children had heard
during the treatment and that was based on the conventional tonic and dominant chords. In one of the
incorrect versions, the chords of the accompaniment were presented in reversed order, that is, tonic
chords were replaced with dominant chords and dominant chords with tonic chords. In the second
incorrect version, the accompaniment was transposed a fifth higher in the middle of the song while
the melody was sung in the original key throughout the performance. In the third incorrect version,
both the melody and the accompaniment were transposed a fifth higher in the middle of the song.
Kindergarten children were presented with the four renditions of only one song: "Firilala."
The other children were presented with the corresponding renditions of two additional songs "Blue
Bird" and "Row Row Row Your Boat." The order of presentation of the four renditions was different
for each of the three songs. All children listened to the stimuli in the same order.
To complete the second task, children were asked to identify the eight chords of the accompaniment
of a familiar song. Children listened to the music teacher sing the refrain of the song "Firilala" six
times with a simple omnichord accompaniment. While they listened to the stimuli, they wrote the I
and V chords on the answer sheet, which included drawings of eight birds representing the eight
chords of the refrain (the song "Firilala" is about a bird wedding). Kindergarten children were not
asked to complete this task.
Results
Task 1
Children's responses to each of the four renditions of the songs were considered correct and given 1
point or incorrect and given 0 points. Because kindergarten children only listened to one song, two
analyses of variance (ANOVA) with repeated measures were performed with the data. One ANOVA
included the responses of first-, second-, and third-graders who listened to the four renditions of three
songs. The other ANOVA was based on the responses of children in kindergarten, first, second, and
third grade to the four renditions of the song "Firilala".
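The 0/1 scoring just described can be sketched in a few lines of Python. The response data and the grade and rendition labels below are hypothetical, serving only to illustrate how per-grade accuracy would be computed before the ANOVAs:

```python
from collections import defaultdict

def score(response, rendition):
    # 1 point if the child's judgement matches the expected answer:
    # "right" for the correct rendition, "not right" for the three
    # incorrect ones; 0 points otherwise.
    expected = "right" if rendition == "correct" else "not right"
    return 1 if response == expected else 0

# Hypothetical responses: (grade, child) -> {rendition: response}
responses = {
    ("grade1", "child_a"): {"correct": "right", "reversed": "right",
                            "acc_transposed": "not right",
                            "both_transposed": "not right"},
    ("grade2", "child_b"): {"correct": "right", "reversed": "not right",
                            "acc_transposed": "not right",
                            "both_transposed": "not right"},
}

# Mean accuracy per grade, pooled over the four renditions
totals = defaultdict(list)
for (grade, _), answers in responses.items():
    for rendition, response in answers.items():
        totals[grade].append(score(response, rendition))

means = {g: sum(v) / len(v) for g, v in totals.items()}
```

Tables of such per-grade, per-rendition means are what the repeated-measures ANOVAs reported below operate on.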
The results of the first ANOVA showed that grade level and song affected children's responses,
F[2,438] = 5.20, p < .01 and F[2,438] = 13.68, p < .01 respectively. Three interactions were
significant: grade x song, F[4,438] = 5.24, p < .01, grade x rendition, F[6,438] = 2.49, p = .02, and
song x rendition, F[6,438] = 2.32, p =.04. Analyses of simple effects indicated that song affected the
responses of first graders but not those of the older children. While first graders responded more
accurately when presented with the renditions of "Firilala" than those of the other songs, second- and
third-graders responded quite evenly to the three songs. The analyses of simple effects also showed
that grade affected children's responses to two of the incorrect renditions of the songs, the one in
which the dominant and tonic chords were switched and the one in which both melody and
accompaniment were transposed. While these two renditions elicited the lowest scores from first
graders, they elicited the highest scores from children in second and third grade. Further analyses of
simple effects indicated that song affected children's responses to the two incorrect renditions that
included transpositions. While these renditions of "Firilala" elicited the highest responses from most
children, the same renditions of the other songs elicited the lowest scores.
The results of the second ANOVA performed with the data from the song "Firilala" showed that
children's responses differed according to grade and rendition F(3,270) = 4.17, p < .01, F(3, 270) =
4.85, p < .01 respectively. The interaction between grade and rendition was significant, F(9,270) =
2.66, p < .01. Analyses of simple effects indicated that grade affected children's responses to two of
the renditions: the one in which the dominant and tonic chords were switched and the correct rendition.
The comparison of means showed that rendition affected the performance of kindergarteners but not
that of the older children. Further analysis determined that kindergarten children provided more
accurate responses when presented with the two renditions which included transpositions than when
presented with the other renditions.
Task 2
The data from the second task was analyzed in order to see developmental trends in the way young
children perceive chords. An idea that was stressed during the training was that the accompaniments
of most songs usually end on the tonic chord. I found that 60% of first graders, 95% of second
graders, and 91% of third graders identified the last chord of the accompaniment as I.
The last two chords of the accompaniment were tonic chords. I found that 27% of first graders, 32% of
second graders, and 62% of third graders were able to identify these chords accurately.
The first four chords of the accompaniment were I I V V. I found that 20% of first graders, 64% of
second graders, and 86% of third graders identified this simple progression accurately.
The last four chords of the accompaniment were I V I I. Only one child in first grade, two in second
grade, and two in third grade identified the chords of this more complex progression accurately. The
five children who were able to do so also identified the first four chords of the song accurately. These
five children were the only subjects who identified all eight chords of the accompaniment correctly.
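The chord-identification task above amounts to matching each child's written sequence against the target progression I I V V I V I I. A minimal sketch, with a hypothetical answer sheet from one child:

```python
# Target accompaniment of the "Firilala" refrain, as described above
TARGET = ["I", "I", "V", "V", "I", "V", "I", "I"]

def phrase_correct(answer, start, end):
    """True if the child's chords match the target over positions start..end-1."""
    return answer[start:end] == TARGET[start:end]

# Hypothetical answer sheet: first phrase identified, second phrase missed
answer = ["I", "I", "V", "V", "I", "I", "I", "I"]

first_phrase_ok = phrase_correct(answer, 0, 4)    # I I V V
second_phrase_ok = phrase_correct(answer, 4, 8)   # I V I I
last_chord_ok = answer[-1] == TARGET[-1]          # most songs end on the tonic
all_eight_ok = first_phrase_ok and second_phrase_ok
```

In this scheme a child counts toward the "identified all eight chords" group only when both phrase checks pass, mirroring how the five fully accurate children were identified.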
Discussion
The results of the study indicate that there are certain factors that affect children's performance in
harmonic perception tasks and show developmental trends in the way children perceive simple
harmonic progressions. Children's familiarity with a song affects the accuracy with which they
discriminate among various accompaniments to its melody. Although children learnt the three songs
used as stimuli during the training, they were more familiar with one of the songs (i.e., "Firilala") than
with the others, because "Firilala" was sung every week for 10 weeks while the other songs were only
introduced during the fifth week of instruction. Children were more successful in discriminating
between an incorrect and a correct rendition of "Firilala" than in doing so between the renditions of
the other songs. This was particularly true for the younger children. Perhaps, by listening to and
singing the same song for many weeks, children become more aware of the various features of the
song including those on which they would not spontaneously focus their attention. It is known that
young children tend to focus their attention on musical elements other than harmony but that they may
be prompted to focus on this element if presented with simple stimuli (Costa-Giomi, 1994a, 1994b). The
results of this study suggest that teachers can direct students' attention to the harmonic features of the
music more successfully by using the songs that are most familiar to the children. This practice might
be effective especially when introducing complex accompaniments. In this study, the stimuli that were
most difficult to discriminate were two renditions of the less familiar songs; in fact, the same
renditions of the most familiar song elicited the highest scores.
Only a few students could identify the eight chords of the accompaniment to the refrain of "Firilala"
accurately, indicating that the identification of chords is a difficult task for young children.
Even in third grade, children were usually unable to identify the tonic and dominant chords of simple
accompaniments. However, children were able to identify chords with different degrees of accuracy
depending on their grade level. Third graders were more successful in identifying the chords than
were second graders, and in turn, second graders were more accurate in their identifications of the
progressions than were first graders. The accompaniment children were asked to identify was
composed of two phrases. The first phrase, which was quite simple (I I V V), was identified by 86%
of the third-grade children and 68% of the second graders. Despite its apparent simplicity, only 20%
of the first grade children could identify the four chords accurately. The second phrase of the
accompaniment was more complex because it presented two chord changes (I V I I). The difficulty of
this progression was reflected in the low number of children who identified it accurately.
Interestingly, the five children who were able to do so were distributed among the three grade levels
indicating that even first graders may be able to identify the chords of a tonic-dominant progression.
Although children in general applied the knowledge they had learnt during the lessons when taking
the test, the younger ones were not as consistent as the older students in their use of new knowledge.
For example, most children remembered that the last chord of the accompaniment was likely to be the
tonic but 40% of first graders failed to do so. It is clear that young children benefit from the repetition
of simple concepts, especially those that are more foreign to them.
Teachers should be aware of the difficulties young children experience when presented with simple
harmonic tasks and should consider carefully the inclusion of harmonic concepts in the early
childhood music curriculum. It seems important that they provide children with opportunities to
apply harmonic concepts through performance activities in addition to perceptual tasks.
References
Bentley, A. (1966). Musical ability in children and its measurement. New York: October House Inc.
Bridges, V. (1965). An exploratory study of the harmonic discrimination ability of children in
kindergarten through grade three in two selected schools. Unpublished Doctoral dissertation, Ohio
State University, Columbus.
Costa-Giomi, E. (1994b) Recognition of Chord Changes by 4- and 5-year-old American and
Argentine Children. Journal of Research in Music Education, 42, 68-85.
Franklin, E. (1956). Tonality as a basis for the study of musical talent. Goteberg: Gumpert.
Funk, J. D. (1977). Some aspects of the development of music perception. Dissertation Abstracts
International, 38, 1919B. (University Microfilms No. 77-20,301).
Hair, H. I. (1973). The effect of training on the harmonic discrimination of first-grade children.
Journal of Research in Music Education, 21, 85-90.
Hair, H. I. (1981). Verbal identification of music concepts. Journal of Research in Music Education,
29, 11-21.
Hair, H. I. (1987). Descriptive vocabulary and visual choices: children's responses to conceptual
changes in music. Bulletin for the Council of Research in Music Education, 91, 59-64.
Hufstader, R. A. (1977). An investigation of a learning sequence of music listening skills. Journal of
Research in Music Education, 25, 184-196.
Imberty, M. (1969). L'acquisition des structures tonales chez l'enfant. [The acquisition of tonal
structures in children]. Paris: Klincksieck.
McDonald, D. T. & Simons, G. M. (1989). Musical growth and development birth through six. New
York: Schirmer.
Merrion, M. (1989). What works: instructional strategies for music education. Reston: Music
Education National Conference.
Moog, H. (1976). The musical experience of the pre-school child. (C. Clarke, trans.), London: Schott
& Co., Ltd. (original work published in 1968).
O'Hearn, R. N. (1984). An investigation of the response to change in music events by children in
grades one, three, and five. Dissertation Abstracts International, 46, 371A.
Petzold, R. G. (1966). Auditory perception of musical sounds by children in the first six grades.
(Cooperative Research Project No.1051). Madison: University of Wisconsin. (ERIC Document
Reproduction Service No. ED 010 297).
Pflederer, M. & Sechrest, L. (1968). Conservation-type responses of children to musical stimuli.
Bulletin for the Council of Research in Music Education, 13, 19-36.
Schultz, S. W. (1969). A study of children's ability to respond to elements of music. Unpublished
doctoral dissertation, Northwestern University, Evanston, IL.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability. (2nd edition). London:
Methuen.
Sloboda, J. A. (1985). The musical mind: the cognitive psychology of music. Oxford: Clarendon Press.
Sterling, P. A. (1985). A developmental study of the effects of accompanying harmonic context on
children's vocal pitch accuracy of familiar melodies. Dissertation Abstracts International, 45, 2436A.
Taylor, S. (1969). The musical development of children aged seven to eleven. Doctoral dissertation.
University of Southampton, UK.
Teplov, B. M. (1966). Psychologie des aptitudes musicales [Psychology of musical aptitudes]. Paris:
Presses Universitaires de France.
Valentine, C. W. (1913). The aesthetic appreciation of musical intervals among school children and
adults. British Journal of Psychology, 6, 190-216.
Vera, A. (1989). El desarrollo de las destrezas musicales [The development of musical abilities].
Infancia y Aprendizaje, 45, 107-121.
Yoshikawa, S. (1973). Yoji no waon-Kan no hattatsu [A developmental study of children's sense of
tonality]. Ongaku-Gaku, 19 (1), 5-72.
Zenatti, A. (1969). Le developpement genetique de la perception musicale [The genetic development
of musical perception].
Proceedings paper
BACKGROUND
The practice of music therapy in Britain began in the 1950s, with the British Society of Music
Therapy being established in 1958. Music Therapy is a specialised and rapidly developing profession.
Due to the diverse activities and approaches within the discipline, finding an adequate definition is
difficult. To summarise, music therapy can be viewed as the use of sounds and symbols within an
evolving relationship between child or adult and therapist to support and encourage physical, mental,
social and environmental well-being (Bunt, 1994).
Documentation of music therapy practice has become an integral part of a therapist's activities. In
addition, there is a growing interest in quantitative and qualitative research methodologies that can
help us understand in more detail the process and outcomes of this intervention (Purdie, 1997).
Consequently, this has led to investigations of the influence of music therapy on specific aspects of
behaviour such as communication (MacDonald, O'Donnell and Davies, 1999). This research has also
focused on the utility of music therapy as a unique therapeutic method within clinical practice
(Aldridge, 1993).
There has been over the last two decades an upsurge in qualitative social psychological research based
on communication, language and texts. This is due partly to an interdisciplinary trend towards
communication-orientated research in sociology, women's studies, anthropology, media studies and so
forth. Language is now viewed as an active site for the continuing negotiation of various meanings
and often investigations examine dyads or group situations. It has been suggested that the most
important element of task activity in groups is the dialogue among group members (Miell &
MacDonald, in press; Tolmie, Howe, Mackenzie and Geer, 1993). Rogoff (1990) considers children
to be apprentices
in thinking, active in their efforts to learn, observe, and participate with peers and with skilled
members of society. Central to Rogoff's theory of guided participation is the notion of
intersubjectivity, or sharing of focus and purpose between children and their more skilled partners and
their challenging and exploring peers. Progress occurs when children internalise or appropriate social
processes. Taking this view, the music therapy sessions can be seen as socially collaborative situations
where a skilled member of society (music therapist) is guiding the children through a musical and
verbal language.
Within her discussion of guided participation and intersubjectivity, Rogoff (1990) also talks about
how she sees the creative process occurring among individuals. She suggests that the mutual
involvement of people working on similar issues is part of the social context of creativity. In the case
of music therapy this context is the music therapy session and the creative musical relationships that
develop over time between therapist and client group. The sessions therefore can be seen as a
collaborative situation where the music therapist employs both musical and verbal dialogues to reach
the objectives of the sessions as well as develop the children's creativity and imagination.
AIMS
Taking the above view of social group processes and musical communication, this study investigates
the effects of music therapy on social development, as examined through the dialogues employed
within the sessions, and on communication, taken as the musical relationships that develop over 10
weekly sessions of music therapy.
METHOD
The participants were 7 children aged between 8 and 11 years (Group 1 = 2 boys and 2 girls; Group 2
= 3 boys). The sessions were conducted as part of a program of music therapy in special education
schools. After the 10 sessions of music therapy, the videotape of each session was analysed and
transcribed to determine how musical and verbal dialogues developed.
The sessions were conducted on a weekly basis, on Friday mornings, for 20 minutes in the school's
music room. A professional music therapist conducted all the sessions.
The music therapist developed a structure for the sessions: hello song, egg-shaking game,
instrument playing/rhythm game, goodbye song. Analysis of the sessions focused on each of the
games and interpreting how the musical relationships were developing and how verbal dialogue
supported these relationships.
RESULTS
Transcripts from the videos show how the musical and verbal dialogues reflect communication and
social development over the 10 weeks, and how verbal dialogue supported the development of the
musical relationships. Analysis of the music therapist's dialogue indicated how the music therapist
acts as a facilitator, guiding the children through musical communication and assisting the
development of verbal dialogue. For example, the music therapist's simple and repetitive dialogue
('Listen to each other... try again... go slower...') allows the musical relationships to be the main form
of communication. Consequently, the music therapist's dialogue develops a scaffold which enhances
the musical communication. It has also been shown that repetitive and structured tasks assist children
with moderate learning disabilities, as they help the child to master a task and feel confident with
their work.
Analysis of musical development indicates how the children's skill in the games advances, as well as
their imagination and confidence within these games. For example, when Sessions 1 and 10 are
compared, both Group 1 and Group 2 show improvement in their confidence to play and experiment
with the instruments.
DISCUSSION
Chomsky (1990) claims that the structure of language does not allow direct expression of our thought,
as the knowledge that we possess is not always reducible to words. Language has the limitations of
representing what one thinks without necessarily being what one thinks. This also relates to
Wittgenstein's (1953) arguments that we can never be entirely sure that we do in fact correctly
understand precisely what is intended, that language is not simply a matter of transmitting intentions
and knowledge. What is being proposed here is that through the analysis of the musical relationships
and verbal dialogue within the music therapy session, we can develop a clearer picture of how the
therapy sessions evolve, and of how the music therapist guides the children within the session,
assisting them in their development as apprentices in musical thinking.
The research shows examples of how the children's musical expressions can be demonstrated both
musically and verbally as both the children and the music therapist establish a common understanding
of the session by projecting their thoughts and ideas directly into the musical games within the
sessions. In this way music therapy can be viewed as a form of musical and verbal discourse, a
discourse that is through music rather than about music. The children and the music therapist do not
therefore need to discuss their ideas, as these become apparent through direct action, the musical
games. It is through this direct action that a common understanding of the musical games develops
between the music therapist and the children, and it is from this that the musical relationships,
imagination, creativity and confidence of the children develop. It is this shared musical reality, which
is the principal form of communication, that the verbal dialogue assists and is based around. The
simple, repetitive statements from the music therapist and the short statements from the children
indicate that the musical relationships are what is important within the music therapy sessions.
CONCLUSION
The findings suggest that music therapy can be analysed through the musical and verbal dialogue that
develops within the sessions. These forms of dialogue are unique to the music therapy session and
assist the development of the musical communication, experimentation, creativity and imagination of
children with moderate learning disabilities. The results of the experiment also indicated how the
music therapist acts as a facilitator to the musical relationships, by employing certain dialogue
techniques. The therapist assists, through guided participation, the development of the children's
musical relationships. The results of the experiment are encouraging since there is a limited amount of
research on the effects of music therapy with groups of children with moderate learning disabilities
that focuses on the development of musical relationships and on the verbal dialogue within music
therapy sessions.
KEY WORDS
Music therapy, learning disabled children, social and communication skills, musical relationships and
verbal dialogue.
REFERENCES
Aldridge, D. (1993) Music therapy research 1: A review of the medical research literature within a
general context of music therapy research. The Arts in Psychotherapy, 20, 11-35.
Bunt, L. (1994). Music Therapy: An Art Beyond Words. London: Routledge.
Chomsky, N. (1990). Language and mind. In D. H. Mellor (Ed.), Ways of Communicating: The
Darwin College Lectures. Cambridge: Cambridge University Press.
MacDonald, R.A.R., O'Donnell, P.J., & Davies, J.B. (1999). Structured music workshops for
individuals with learning difficulty: an empirical investigation. Journal of Applied Research in
Intellectual Disabilities, 12(3), 225-241.
Miell, D. & MacDonald, R.A.R. (in press). Children's creative collaborations: The importance of
friendship when working together on a musical composition. Social Development.
Purdie, H. (1997). Music therapy with adults who have traumatic brain injury and stroke. The British
Journal of Music Therapy, 11(2), 45-51.
Tolmie, A., Howe, C., Mackenzie, M. and Geer, K. (1993). Task design as an influence on dialogue
and learning: Primary school group work with object flotation. Social Development, 2(3), 189-211.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. Oxford
University Press.
Wittgenstein, L. (1953). Philosophical Investigations (G.E.M. Anscombe, Trans.). Oxford: Blackwell.
Proceedings paper
Introduction
To qualify for a career as a professional musician, it is necessary to receive many years of high quality
instrumental teaching (Ericsson 1997; Manturzewska 1990; Sosniak 1990). The years the students
spend in higher music education institutions are crucial in this respect, and students often consider the
study of their principal instrument to be most important (Nielsen 1998). The quality of instrumental
teaching is therefore of vital concern to institutions of higher education.
Student evaluation of teaching is one means, in use in many institutions, of developing the quality of
teaching. In this context student evaluation is used formatively, to improve teaching, and not
summatively as a basis for decisions on tenure, merit pay and so on (Centra 1993). The question
addressed in this paper is whether student evaluation of individual teaching represents a special challenge to
the teacher-student relationship as compared to evaluation of class teaching. Research on higher music
education substantiates the close and personal relationship that normally develops between the
instrumental teacher and his student (Kingsbury 1988; Nettl 1995; Nielsen 1998). In such dyadic
teacher-student settings, it is vital to develop and preserve a good relationship between the two
parties. Tiberius and Flak (1999) claim that in every relationship between teacher and student there will
be some disappointment and negative emotions. Dyadic teaching and learning represent a special
challenge, however, because «... the overt civility of dyadic relationships can mask unexpressed
tensions and (...) these tensions, if not addressed, can increase to the explosive point, at which the
relationship itself is destroyed» (ibid. p.3). Therefore, they conclude, it is important to «...structure a
relationship that can handle conflicts and tensions routinely and thereby prevent escalation» (p.5).
Student evaluation can be understood as a routine built into the relationship with the purpose of
unmasking tensions in a controlled manner, thereby enabling the parties to address the problems.
On the other hand, being subject to evaluation is not always pleasant; student evaluation reflects on
the teacher's self-respect as a professional and can sometimes be experienced as wounding,
threatening and demoralising (Braskamp & Ory 1994:128; Moses 1986; Ryan et al. 1980; Seldin 1989,
1993; Strike 1991). It is therefore a common recommendation in evaluation literature that student
evaluation should be conducted anonymously. In individual instrumental teaching, however, it is often
difficult, and perhaps not even very productive, to maintain anonymity. It is therefore a relevant
question to ask whether student evaluation of individual instrumental teaching might, in some cases, actually harm the teacher-student relationship.
Method
Since so little research has been done in this particular field, I chose to do an exploratory study. To
understand how student evaluation works, it is vital to understand how the persons involved
themselves look at it. I therefore conducted semi-structured qualitative research interviews (Kvale
1996) with principal instrument teachers and their students. The sample consists of 9 instrumental
teachers with long teaching experience. The interviews with the teachers indicated that there were
three different approaches to the use of student evaluation represented. I therefore chose one
representative for each of these approaches (teachers A, C and H) and interviewed 9 of their students
(students a1-3, c1-3, and h1-3). The students in the sample had all completed a minimum of two years
of study at the Academy, and they all had music performance as a central part of their programme.
Results
The results indicate that both teachers and students are well aware of the fact that the evaluation in
reality is not anonymous. The teacher has such a limited number of students, and normally knows
each of them so well that he can identify them. There might be ways to reduce the possibility of
revealing the identity of the students, but both students and teachers claim that student evaluation
might not serve its purpose then. If the evaluation does not reveal the needs and opinions of the
individual student to the teacher, it will not be of much help to him in tailoring his teaching for that
particular student. When discussing student evaluation in this particular context, it is important to
understand that, whatever the procedures are, it is in reality not anonymous.
A fundamental question, then, is whether the students dare to be honest in their evaluation. If not, student
evaluation loses its point. The results indicate that this can be a problem. One reason for this seems to
be that the student might be afraid of hurting his teacher's feelings, and therefore does not dare to be
frank and honest. This fear is something that preoccupies several of my informants:
This is how teacher F sees this:
... it is something to do with the «chemistry» also, you are kind to each other. They are much more
afraid of hurting the teacher in a way. (Teacher F)
Teacher H also indicates that students might be considerate in what they say:
In a way you have to attach more importance to any hint of objection that crops up and then decide
whether this is only a considerate way of saying that this is hopeless, because they don't dare to express
themselves more strongly. (Teacher H)
When I ask the teachers about their reactions to critical evaluations, some of them answer that
normally they do not feel upset; they can handle whatever comments they receive professionally.
Others again admit that they can feel hurt when being criticised. Teacher C describes her reactions in
this way:
...you are, quite naturally, a bit hurt by negative [comments], especially when you believe you are as
good as I believe I am. What? I, who am «world famous»! and so on. And it definitely hurts a bit.
(Teacher C)
Teacher J experienced some years ago that several of his students filed a complaint against him. This
is how he describes his reaction to being criticised:
When you are as fond of the students as I actually was − I loved the job because of the students − then it
comes as such a disappointment that you cannot describe it with words. ... As a teacher, you have to find
the balance between humility, self confidence and joie de vivre. My self confidence is still there,
strangely enough, even if the joie de vivre received a blow that lasted several years. I still haven't got
over it entirely. I felt as if something died inside me at that time. (Teacher J)
Teacher J says that this experience makes him oppose the use of student evaluation, because the
thought of being evaluated at the end of the year will make him apprehensive. He fears this will
interfere with his teaching and make him a bad teacher.
When I ask the students if they are afraid of hurting their teacher, I receive different points of view.
One of the students answers that:
...you have to realise that getting a good education is your own responsibility. You can't be afraid of
hurting a teacher. You have to tackle the problem yourself and try to criticise. (student c2)
Others again feel that it is difficult to criticise the teacher, and two of the female students imply that
girls in particular might be afraid of hurting their teacher:
Afraid of hurting the teacher, yes, we probably are. I think that's true. I don't know about the boys, but I
have talked with a lot of girls, and I think many girls are afraid of hurting their teacher. (student a1)
It is probably typical of girls that we care more about people. You feel it hurts to criticise someone.
(student c1)
It is evident from the interviews that many of the principal instrument teachers invest a lot more time and
personal commitment in their students than would be expected of a university professor. This personal
relationship between the teacher and his student can be understood as a fundamental trait of individual
instrumental teaching. There seem to be at least two possible reasons for this: Firstly, this type of
teaching implies a one-to-one teacher-student relationship that often lasts several years, years that are
of vital importance in a young musician's life. One student for example, compares the relationship
between teacher and student to a parent-child relationship in order to describe the bonds between
them. Secondly, it seems that the characteristics of the subject matter, the music, force both student and
teacher to expose themselves emotionally, and therefore to come closer to each other on a personal
level. Teacher F touches on this when she says:
F: With regard to having a close relationship - A lot of people say that the teacher-student relationship
should not become too personal, but I find that difficult to regulate. We talk a lot about real feelings
during the lessons, not just 4th finger on f sharp, right. We talk about what this music expresses. It might
sound sentimental, but you have to open up your whole register of feelings, and then you cannot just sit
there and keep a distance to the student. ... You cannot be close in your teaching without being close as
a human being.
I: And I suppose the students might feel the same way, and then they are perhaps afraid of hurting you?
The results indicate that there might be a price to pay for the closeness between the principal
instrument teacher and his student: Some students might not dare to voice any criticism for fear of
hurting the person they feel attached to, thereby destroying the openness and intimacy that is so vital in
this type of teaching.
Another reaction to a negative evaluation can be anger and hostility. Such feelings can in themselves
be a strain on the relationship between teacher and student. In addition, they might lead to reprisals
against the student. Several of my informants comment on the fact that the instrumental teacher is in a
position where he has the means to retaliate in different ways, and that fear of reprisals might stop the
student from expressing any criticism of the teacher or his teaching.
Two of the teachers expressed their concern in this connection:
... the teacher can decide whether you are going to get a job engagement or not, then it is hopeless when
you know that you will be studying with that teacher for the next three years. There is no question of
making any criticism, as they know it will not improve things. The only thing that might happen is that
the relationship might become worse. You will definitely be out of favour with the teacher. (Teacher C)
...they will feel that they might insult me, or that I somehow might reject them if they have something
negative to say. ...In individual teaching, and in the milieu here as a whole, they are more careful not to
come into conflict with anyone. ... If they come into conflict with someone they may have the
impression that it could harm their career. (Teacher E)
When I ask the students if this is something they worry about, I get different answers. A few of them
say that they have never thought about it, but several students say that fear of reprisals has kept them
from being frank and open either with their present teacher or with former teachers.
The fact that the student often "surfs on the contacts that his teacher has in the job market" as one
student puts it, implies that the teacher has an instrument of power that he has the potential to use on
the student. Two students comment on that:
...you know very well that it is preferable not to get onto bad terms with your teacher, because then you
will not get jobs. ... I am very much aware of the fact that if I got into a major conflict with her, I would
have a problem getting those jobs, and those are jobs that I really want. Then it becomes just hopeless.
(student c1)
...because often if you get onto bad terms with your teacher, it implies that you will have difficulties in
the free lance market and the like. It is a problem, really a problem. (student c2)
The principal instrument teacher can choose to use his contacts in the job market for the benefit of his
student, or he can choose not to use them. It is not surprising, then, that the student in some cases
thinks it wiser to stay friends with his teacher by holding his tongue.
As we saw earlier, instrumental teaching is often described as learning by apprenticeship. I was
therefore interested in finding out if the roles of the master/teacher and apprentice/student are
perceived as consistent with student evaluation.
Teacher C's answer indicates that student evaluation is not a natural part of this teaching tradition:
It is not the usual way of thinking; to let the students evaluate. I don't think it is common among my
colleagues or myself. In this master-apprentice tradition you are what you are, namely in this case, a
musician ... It is not natural for the master to ask for an evaluation, because the master is, per definition,
a master ... student evaluation is not perceived as natural within the master-apprentice tradition, it just
isn't; you only destroy yourself. (Teacher C)
At the same time, both this teacher and most of the others I interviewed were anxious not to be
identified with a master role in the sense of someone who has all the answers, and several of them
expressed a strong wish to reduce their authority towards the students.
Some of the students I interviewed also state very clearly that they do not feel the authority of the
teacher as a hindrance in the sense that the teacher would object to being evaluated. On the contrary,
several comment on the fact that they perceive their teachers as being anti-authoritarian and open to
feed-back and criticism.
Nevertheless, it might not always feel natural for the students to evaluate and be critical of their
teacher. One student expresses himself in this way:
But it is sometimes a bit ridiculous that you as a 20-year old should criticise a teacher who has 30 years
of experience. ... I have that much respect for C's experience not to criticise her teaching in this way.
You have to accept it as it is. (student c2)
Teacher J seems to agree. He claims that student democracy has gone too far, and that student
evaluation is neither appropriate nor necessary:
If we are supposed to have the best teachers in Norway here, I feel that, in a way, this should be quite
unnecessary. (Teacher J)
And even if the teachers might wish to play down their authority, the students might not perceive it in
the same way. One of the students claims that the teachers might not realise how strong their authority
in reality is, and that they perhaps underestimate their power over the students:
At least I feel that in this master-apprentice relationship in which we actually find ourselves, the teacher
has a lot of power. ... this power is not obvious to the person possessing it, only to the one who might be
exposed to it. I have been teaching enough myself to know that you don't feel very powerful when you
stand there [in front of a class or student], but nevertheless you are, because it is your agenda, it is your
word that counts. It is easy to forget, it is all too easy to forget, when I teach. And I suppose it is as
easy to forget for an instrumental teacher, also because you have such a friendly relationship with your
student. (student a1)
Furthermore, evaluating the teacher might for some students be incompatible with having great
professional confidence and trust in his teacher, a trust that seems to be fundamental in learning by
apprenticeship:
In my opinion, to put up too much resistance against the teacher or the type of system he has, just doesn't work,
especially in the type of teaching tradition that we have. I think you have to decide to go along with him
entirely, or otherwise you have to find yourself another teacher. (student a1)
Teacher E expresses somewhat the same attitude when looking back to his own student days:
E: It is a question of faith, to subject oneself to teaching. It is a question of believing in it.
E: Yes, for me it was. I had to make a choice: either I was suspicious and distrustful, or I just had to «swallow»
what he came with. And then, in a way, you have put behind you that dispassionate and critical attitude. You
have to have faith in the person and trust that this will work out. (Teacher E)
The results indicate that there might be some role expectations built into this kind of teaching that can
make it difficult for the student to have a dispassionate and appraising attitude towards his own
teacher. The teacher's professional authority per se sometimes seems to be an obstacle, even if the
teacher himself does not necessarily stress his authority or expect any reverence. The reason might
just as well be that the student needs to have complete faith in his teacher as a professional authority.
But the results also indicate that some teachers might feel student evaluation to be alien to the kind of
roles he and his students have within this teaching tradition.
Conclusion
We have seen that the teacher-student relationship plays a decisive role in the student's development
towards becoming a professional musician. His professional trust in his teacher is a fundamental
condition in this relationship. In his book Personal Knowledge, Michael Polanyi (1958:53) underlines
the importance of this almost blind trust when he writes: «You follow your master because you trust
his manner of doing things even when you cannot analyse and account in detail for its effectiveness.»
It seems, however, that it is not always easy to combine this trust with a more democratic relationship
and a dispassionate and appraising attitude. Both teachers and students might feel that student
evaluation confuses the roles.
Individual instrumental teaching is a kind of teaching that normally creates closeness between teacher
and student, but it also presupposes closeness to succeed. The results of this study indicate that
students might be very anxious not to destroy this intimacy and confidence. In this situation student
evaluation can be a double-edged sword. On the one hand it can help the student to express any
negative feelings he might have in a regulated and accepted context, and thereby contribute to
reducing the tension in the dyadic relationship. Gaining insight into the needs and feelings of the
student will also enable the teacher to adapt his teaching and thereby prevent future disappointments
and frustrations. On the other hand it seems as if student evaluation in some cases actually results in a
deterioration of the relationship because the teacher cannot handle negative evaluations and feels hurt
or even becomes hostile. In other words, in some cases student evaluation can be counterproductive.
The students seem to be painfully aware of this possible outcome, and their strategy, in some cases,
seems to be to keep quiet. They prefer to live with the problems, rather than tackle them by criticising,
or they change teachers if it becomes too much of a strain. In many cases this fear of the teacher's
reaction might be groundless. Many teachers probably handle criticism professionally and do not let
the student notice any negative reaction. But at the same time, the students' tales of the experiences they
have had trying to voice criticism to teachers through the years give grounds for concern. This
underlines how important it is that the teacher has a highly developed professionalism and
ethical awareness in his role as a teacher. If not, student evaluation might actually make things worse
for the student.
Instrumental teachers are, naturally, only human beings: human beings who invest a lot of time,
commitment and professional reputation in their work as teachers. Disappointment and anger are
therefore understandable reactions when the student is dissatisfied, or does not want to accept what
one has to offer. In such a close relationship both parties are dependent on each other for support and
acknowledgement, and, therefore, they have power over each other. As the Danish philosopher Knud
Eilert Løgstrup (1999) says, in every relationship we hold something of another human being's life in
our hand: We are each other's destiny and therefore have power over each other. The teacher is the
student's destiny, but the student is to a large extent also the teacher's destiny. We understand that
when we listen to teacher J tell about his reactions to being criticised. Intimate relationships imply that
one has to reveal oneself to the other person, and for that trust is a precondition. «Acknowledgement,
respect and consideration can only develop between persons who dare to expose themselves to each
other in the conviction that they will not be rejected by the other party» (Bergem 1998:80). Criticism
and negative evaluations can easily be regarded as a rejection of what one stands for both as a teacher
and as a musician, and a natural reaction to this might be a feeling of hurt or anger. That is why it is so
important that the teacher is aware of the ethical demands that are ingrained in his role as a teacher.
The teacher is always the stronger party in a relationship that is by definition asymmetric, no matter how
close it might be. This imposes on the teacher an ethical responsibility towards the student: He has to
control his own reactions, and put his own needs aside in favour of the student´s.
A first condition for being able to act morally is to be aware that one is facing an ethical demand
(ibid.): The teacher must realise what a high price the student might have to pay for his honesty if it is
not met in a decent and ethically justifiable manner. Otherwise, the teacher risks, if only through
ignorance or thoughtlessness, misusing his power to preclude the student from the possibility of
criticising; criticism that might be both justified and necessary in order to improve his learning.
References:
Bergem, T. (1998). Læreren i etikkens motlys (The teacher in the light of ethics). Oslo:
adNotam Gyldendal.
Braskamp, L. A., & Ory, J. C. (1994). Assessing Faculty Work: Enhancing Individual and
Institutional Performance. San Francisco: Jossey-Bass Publishers.
Centra, J. A. (1993). Reflective Faculty Evaluation. San Francisco: Jossey-Bass
Publishers.
Ericsson, K. A. (1997). Deliberate practice and the acquisition of expert performance: An
overview. In H. Jorgensen and A. C. Lehmann (Eds.), Does practice make perfect?
Current theory and research on instrumental music practice (pp. 9-51). Oslo: Norges
musikkhøgskole.
Kingsbury, H. (1988). Music, Talent and Performance: A Conservatory System.
Philadelphia: Temple University Press.
Kvale, S. (1996). InterViews. An introduction to Qualitative Research Interviewing.
Thousand Oaks: Sage Publications.
Proceedings paper
Motivation and Expertise:
The role of teachers, parents and ensembles in the development of instrumentalists
Beatriz Ilari
bilari@po-box.mcgill.ca
McGill University
Faculty of Music
555 Sherbrooke Street West, Montreal, QC H3A 1E3, Canada
The outstanding performance of remarkable individuals has long interested scholars, educators and
researchers (Ericsson & Charness, 1994; Csikszentmihalyi, 1996). Expertise is the term used to designate
optimal human performance and there are many reasons that explain our interest in it. From a sociological
viewpoint, we understand that exceptional performance in certain domains is a culturally valued behavior
(Simonton, 1999) and is often synonymous with success. Many terms, such as gifted, prodigy and genius,
designate and distinguish exceptional performers from the rest of the population, although these terms
have changed over time (Csikszentmihalyi, 1996). From an educational viewpoint,
we are fascinated with the possibilities of understanding the cognitive processes employed by these
outstanding performers, as such understanding could perhaps shed light on the development of educational
strategies which might help a large number of individuals to achieve. According to Collins, Brown and
Newman (1989), as learners we tend to compare our performance to that of the expert, in order to
situate our knowledge within the domain and improve our skills. Yet, the notion of expertise has gone
through several transformations. In the past, experts were able to handle different tasks in different domains
but, as culture has evolved, domains have split into sub-domains and specialization has been a natural trend
(Csikszentmihalyi, 1996). According to Sternberg (1998), expertise involves acquisition, storage and
utilization of at least two types of knowledge: explicit knowledge of a domain and implicit or tacit
knowledge of a field. It is thus presumable that experts have knowledge of the facts, formulas and main
ideas of a domain, as well as a "non-verbalized knowledge form", both of which are needed to succeed in a field.
In the last fifteen years, many cognitive scientists have studied the expert performance of individuals in
domains that use symbolic representation such as mathematics, calculation, chess and music (Ericsson &
Charness, 1994). In the domain of music some studies have been carried out to investigate the performance
of expert musicians. While some studies concentrated on the thought processes employed by expert
musicians (see McPherson, 1997; Whitaker, 1996; Younker & Smith, 1996), others focused on the factors
or motives that lead performers to devote a large amount of time to music.
Motivation plays a very important role in the understanding of expert performance. The literature suggests
two types of motivation often related to music: intrinsic and extrinsic. Sloboda (1993) explains that
intrinsic motivation is developed from intense pleasurable experiences with music, which might lead to a
deep and fulfilling personal commitment to music whereas extrinsic motivation is more related to achieving
certain goals such as parental/peer approval or winning competitions than to music itself. It seems that all
individuals have a mixture of the two types of motivation (Sloboda, 1993).
Research has shown that, on average, elementary schools start providing instrumental instruction at age
nine (Martignetti, 1966; Mackenzie, 1991). Children who are engaged in private instruction usually start
earlier depending on instrument choice and family/cultural influence. Klinedinst (1992) concluded that
approximately 25 percent of students who start instrumental lessons discontinue them after one year of
instruction. Henson (1974) found that the decision to drop out of instrumental education usually happens
during the first three years of instruction. As Sloboda and Howe (1991) pointed out, only a minority of
beginners will persist in taking lessons until they reach a high level of musical competence.
Parental involvement and support seem to be a strong motivational factor in instrumental instruction.
Parents who support their children in the early years, regardless of their musical competence, are an
important source of motivation (Doan, 1973; Davidson, Sloboda & Howe, 1995). A supportive
environment is clearly important for the success of the young instrumentalist (Allen, 1974; Allen, 1998;
Bonifatti, 1997; Davidson, Howe, Moore & Sloboda, 1996; Henson, 1974; Martignetti, 1966; Webber,
1974).
The relationship between music teacher and student is also very important when studying motivation for
instrumental learning. Sandene (1997) suggested that students are often discouraged by the negative attitude
of teachers who are too concerned with achievement and have an ego-goal orientation, creating fear instead
of pleasure and enjoyment during lessons. Davidson & Scutt (1999) studied the teacher, student and
parents' interaction before, during and after musical examinations and concluded that, although the music
learning process is of a "triadic nature" involving teacher, student and parents, the teacher is still the central
figure, responsible for shaping the learning experience by mediating the relationship between parents and
students. The study also emphasized that teachers' comments play a critically important role in the
learning process, as parents usually count on teachers' opinions and ideas. Sloboda and Howe (1991), who
studied the lives of young musicians, found that teachers are extremely important, especially in the early
years of instruction. While young musicians who persist in instrumental instruction learn how to
differentiate and distinguish between professional and personal qualities of their instructors, children who
drop out of instruction cannot make such judgements (Davidson, Howe & Sloboda, 1995). A student's first
instrumental teacher is very important, and personal warmth seems to be an essential characteristic when
working with young musicians (Howe & Sloboda, 1991).
Many researchers have looked at the role of affect in motivation. Enjoyment and pleasure seem to be
important for keeping students motivated in instrumental instruction. Csikszentmihalyi (1990) believes that
music instruction often overemphasizes how children perform rather than what they experience. Sloboda
(1993) suggests that when too much emphasis is placed on achievement, especially in the early stages of
learning, intrinsic motivation is often inhibited and students experience anxiety instead of pleasure. Many
parents and teachers have expectations which are too high and generate great stress in the child, instead of
enjoyment. Indeed, as Howe and Sloboda (1991) suggested, many students discontinue their musical
instruction due to a lack of enjoyment when playing and practicing their instruments.
Disinterest in music seems to affect students' decisions to continue music instruction. Many students lose
interest in music due to an inadequate choice of instrument (Martignetti, 1966; Henson, 1974; Allen, 1995).
The difficulty of the instruments is also mentioned as a cause for loss of interest in music as suggested by
Martignetti (1966) and Henson (1974). Casey (1964) suggested that loss of interest is sometimes related to
students' inability to achieve a satisfactory level of performance. Loss of interest in music is often not
easy to explain in words, although it is responsible for many student dropouts from instruction.
Students' beliefs about success and failure in music contribute enormously to their persistence in instrumental
instruction. Many students believe that they are not "good enough" for music and therefore drop out
of instruction. Asmus (1986) investigated children's beliefs about success and failure in music. He found that
students, while young, tend to believe that effort is what justifies their success or failure in music. As
students mature, their beliefs change and they tend to attribute their success or failure in music to ability.
Students' beliefs should be considered and taken into account, as they have clear effects on achievement
(Asmus, 1986; Chang & Costa-Giomi, 1993).
Other studies that investigated the role of motivation in instrumental music education suggest that, while in
school, students are often forced to choose between music and other classes or activities, and consequently
drop out of instruction (Martignetti, 1966; Casey, 1964; Henson, 1974).
While most of the studies tend to look at children's motivation, a rather small number of studies have
investigated the perceptions and opinions of adult instrumentalists, recalling their experiences as students.
This is not to say that instrumental music education should be viewed as, or geared towards, a musical career choice; but
when a student chooses to become a professional musician, it can be assumed that she or he is
motivated for music. In a longitudinal study, Manturzewska (1990) investigated the life-span development
of Polish professional musicians and concluded that family environment and intrinsic motivation are the
most important contributors to an effective and meaningful instrumental education. The study also
mentioned the importance of teachers, colleagues and social and emotional support in the motivation of
instrumentalists. Sosniak (1985) interviewed North American professional pianists in the beginning of their
careers, as well as their parents and confirmed the assumption that long term support from parents and
teachers is essential for a successful instrumental education. The importance of instrumental practice was
also emphasized by the vast majority of musicians interviewed by Sosniak (1985).
The purpose of this exploratory study was to compare the opinions of Brazilian and Canadian adult
instrumentalists on motivation in instrumental music education. The variables studied were teachers'
influence, parental involvement, musical background, and participation in ensembles. The study examined
whether Brazilian and Canadian musicians answer in a similar manner, or agree about the relevance of these
variables as motivating factors.
The hypotheses for this exploratory study were:
1. Teachers are the most important influences in instrumentalists' lives, regardless of cultural differences.
2. Both Canadian and Brazilian musicians consider participation in ensembles essential to students'
development.
3. All musicians, regardless of cultural background, have thought about discontinuing their musical
activities.
METHOD
Thirty-one musicians participated in the exploratory study: 18 Brazilian and 13 Canadian. The Brazilian
musicians (12 females and 6 males) ranged in age from 23 to 43 (mean age of 31), started playing their
instruments between ages 5 and 23 (mean age of 12), and had been playing their instruments for 11 to 20 years
(mean of 16). The Canadian musicians (7 females and 6 males) ranged in age from 20 to 37 (mean age of 26),
started playing their instruments between ages 4 and 22 (mean age of 10), and had been playing their
instruments for 2 to 30 years (mean of 14). The instruments played by these musicians were violin (6),
viola (2), cello (1), double bass (3), flute (1), oboe (1), clarinet (1), French horn (2), trombone (1),
percussion (1), piano (6), voice (1), guitar (1), and multiple instruments (4). All musicians in the study
expressed their desire to continue their careers as professional musicians.
A survey in Portuguese and English was developed to gather information on the instrumentalists'
main musical activity, musical background, important influences in their studies, intentions to drop out of
instruction or discontinue musical activities, family musical background and support, and participation in
ensembles and the importance of such practices. Subjects were also asked to write a short and concise
definition of a good instrumental teacher.
All Brazilian surveys were translated into English. To verify the consistency of the translations, all
answers were checked by an English-Portuguese translator. For each question of the survey, verbal answers
were analyzed, classified, and categorized with the use of numbers. Similar answers received the same
numbers, and profiles were created to assist in the interpretation of the results. Since many questions
were of a rather descriptive nature, many respondents answered in essay form. Such answers were then
classified and categorized with multiple numbers, which explains why there was often a larger number of
answers than of respondents.
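The multiple-number coding described above can be sketched as a simple tally. The category numbers and answers below are hypothetical illustrations, not the study's actual data:

```python
from collections import Counter

# Hypothetical coding: each essay-style answer may touch on several
# categories, so it receives multiple category numbers.
coded_answers = {
    "respondent_1": [1, 4],  # e.g. mentioned both a teacher and an ensemble
    "respondent_2": [1],
    "respondent_3": [2, 6],
}

# Tally responses per category across all respondents.
tally = Counter(num for nums in coded_answers.values() for num in nums)

total_responses = sum(tally.values())    # 5
total_respondents = len(coded_answers)   # 3

# With multi-number coding, responses can outnumber respondents,
# as the study observed.
assert total_responses >= total_respondents
```

Category percentages such as those in Table 2 would then be computed over the total number of responses rather than the number of respondents.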
2. Enchantment - heard a particular piece or performer/group and was instantly drawn to music.
3. Had a musician in the family, so that music was a "natural" or mandatory choice.
an instrument. Surprisingly, while a larger number of the Brazilian musicians in this study attributed their
initial motivation for playing an instrument to enchantment, only one Canadian musician answered in a
similar manner. The question that remains is whether this answer is connected to a cultural issue or simply
to chance. Further research would be needed to address it.
Still on the initial motivation to start instrumental instruction, it was found that 47% of Canadian and
28% of Brazilian responses related to the idea that the motivation to play an instrument often comes
from some innate desire that is not always easy to explain in words. This finding suggests that there might be
some individuals naturally drawn to music. As researchers, we still need to investigate these desires or
natural attractions in order to understand their nature, genetic or not, and to situate and frame learning within
such a context.
Teachers seem to be the most important influences in the lives of instrumentalists regardless of their
cultural background: 44% of the Brazilian and 37% of the Canadian responses emphasized the
importance of teachers in the lives of performers. However, Brazilian and Canadian musicians differed in
their second choice of influence; while 27% of Canadian responses addressed the importance of
participating in an ensemble as a motivator for instrumental music education, Brazilians suggested other
issues as important motivators, such as participation in a particular ensemble or the desire
to have a "particular life style" peculiar to artists. Table 2 shows the responses of both
Brazilian and Canadian subjects.
Table 2. Most significant influences in musicians' lives

Category Number   Description                      Brazilian Responses   Canadian Responses
2                 Musician/Performer/Group         7%                    21%
3                 Musical Work                     3%                    5%
5                 Participation in a competition   -                     5%
6                 Other                            27%                   5%
As mentioned earlier, when discussing the major influences in their musical lives, most subjects mentioned
an inspiring teacher as the main influence. This supports the hypothesis of teachers as the most
important influences in the lives of instrumentalists. However, many teachers also play an important role in
discouraging students. Swanwick (1994) believes that instrumental instruction often involves
elements of luck and chance, as it is done on a one-to-one basis and is rather idiosyncratic, depending
mainly on the teacher's approach. Concern with the education of instrumental teachers has grown
enormously, and many schools in the United States offer programs developed exclusively for those who
wish to dedicate their lives to instrumental music education.
The most difficult answers to categorize were those defining a good teacher. Although the
definitions of a good instrumental teacher were similar in content, Brazilian and Canadian
instrumentalists answered in different manners. Canadian musicians tended to be more objective and
concise, using fewer words to describe their ideal teachers:
Supportive and demanding.
Original.
Brazilian musicians tended to use more words to describe their ideal masters:
Someone who is updated, open minded and allows creativity and new ideas.
One who stimulates students to love and get more involved technically and artistically speaking.
One who helps students develop a critical view of his/her own development and shares the "secrets" of the
profession.
However, both groups, Brazilian and Canadian, agreed that a good teacher is someone
who has a good technique, is knowledgeable about his or her instrument, and is able to relate it to important
elements of music history and theory. Other ideas presented in the surveys related to the
teacher's patience, respect for the student, and attitudes of faith towards students:
A good teacher has a great deal of faith in his/her students' ability to become a good performer. They
communicate clearly and don't give up on trying to find ways to express important concepts to his/her
students.
An ideal teacher is someone who can look at their students and guide them to what he/she believes is their
largest potential; if the student is talented for chamber music, the teacher should emphasize that, if the student
is talented for solo performance , then that should be emphasized and so on. That is the art of teaching, is
knowing how to bring out the best of each student in a respectful way.
It seems that there will never be one single definition of a good teacher, as people are different and learn in
different ways and at different paces. What does seem to be a consensus is the need for knowledge: teachers
must be good instrumentalists themselves. A good relationship with the instrumental teacher is also
desirable and might enormously influence the student's motivation.
The issue of continuing or discontinuing musical activities generated the largest controversy. Table 3
presents the results for Brazilian and Canadian subjects regarding the question:
"During your life as a music student, did you ever consider discontinuing your music lessons
and activities?"
Table 3. General Responses; Table 3a. Brazilian Responses by Gender; Table 3b. Canadian Responses by Gender
Interestingly, 88% of the Brazilian subjects mentioned a desire to stop playing their instruments
while still studying, whereas only 38% of the Canadian subjects expressed such an intention. Many reasons were
given to explain the desire to discontinue instrumental instruction. Some were of a rather philosophical
nature, with subjects questioning the validity of music in their lives and their positioning towards it:
I went through a period of questioning. Why music in such a messy world?
I felt a lack of motivation to create new things. It seemed that there was nothing new to be learned as there was a constant repetition of ideas in
instruction.
Many musicians mentioned the difficulties of music itself and the type of commitment that is necessary to
achieve a satisfactory performance level:
I wasn't sure if I was good enough for music.
I thought about discontinuing music when I found out how much work is needed in order to achieve a satisfactory performance level.
Other instrumentalists thought about discontinuing their instruction when they found out about the working
environment and its conditions. This answer appeared solely in the surveys of the Brazilian musicians, and
was the main reason given to explain the desire to discontinue instrumental instruction. Knowing that
Brazil has no strong tradition of instrumental music education, and that students often face many difficulties
in pursuing their education, it is not difficult to understand the wish to drop out of instruction. Another
explanation, also of a socio-economic nature, is the fact that Brazilians are often forced to start
working professionally while still studying, and are often not ready to face the challenges and routine of
the working environment; they are then more likely to be disappointed with that environment and
its realities. Also, as Gainza (1984) pointed out, musicians in Latin America deal with constant political and
economic changes that thoroughly affect their studying conditions. Still, more studies are needed to
investigate these important social, economic, and cultural issues.
The data obtained from the question on discontinuing instruction clearly indicate the existence of two
different types of motives, which we could call internal and external. External motives are those external
to the individual, such as financial difficulties or a lack of qualified teachers. Internal
motives are those particular to each individual, exemplified by the answers that
referred to periods of questioning the validity of music in one's life or to a loss of interest in music. These
external and internal motives seem to be associated with the concepts of extrinsic and intrinsic
motivation. Although difficult to classify, in this particular study the vast majority of responses related to
continuing or discontinuing music instruction were of an internal nature. This suggests that intrinsic
motivation plays a very important role in the development of performers. The difficult issue lies in finding
ways to help students develop intrinsic motivation. Addressing this question, Deci (1995)
suggested that, to foster intrinsic or self-motivation, the emphasis should be on creating the
conditions within which people will motivate themselves, rather than on trying to motivate them. Deci (1995)
added that extrinsic motivation works only when the goals and results are known in advance and matched.
Nevertheless, there are many ways to foster intrinsic motivation through external factors.
Family support plays an essential role in instrumental music education. The vast majority of musicians
mentioned an amateur musician in the family. Others described great inspiration from siblings, parents,
or relatives who were involved in music, and how music seemed a natural path to them. The few
musicians who had no family support mentioned difficulties in staying motivated for music, and stronger
desires to discontinue their instruction.
Most subjects mentioned that they had family support while playing their instruments. Among Canadian
subjects, 92% said they had parental support while pursuing their instrumental instruction and 8% answered
that they had "more or less" parental support; no Canadian subject mentioned a lack of parental support.
Among Brazilian musicians, 78% mentioned support, 17% used the term "more or less", and only 5% felt a
lack of support from parents. Interestingly, 81% of the total number of subjects (Brazilian and Canadian)
commented on the existence of at least one person in the family playing a musical instrument. The
instruments played by the family members varied: violin, piano, guitar, accordion, flute, recorder, cello,
bassoon, oboe, and voice.
Another agreement found in this study relates to ensemble experience. All respondents, without exception,
mentioned the importance of ensembles in instrumental music education. Ensemble experience is seen as a
way of broadening students' perspectives on music, as the student learns more repertoire and develops
attention, focus, concentration, discipline, and intonation. Students also interact with other people, learning
respect and leadership; this is especially important considering that the vast majority of musical activities
involve ensemble work. One subject mentioned that, through ensemble experience, students learn early to
sacrifice their own musical ideas for the greater goal of the grand musical scheme, something they will
experience at some point in their musical careers. Ensemble experience was thus considered an important
form of education.
One question that was raised related to the number of Brazilian and Canadian musicians involved in
orchestral activities. Since there was a larger number of orchestral musicians among the Brazilian subjects,
it was hypothesized that Brazilian students probably enroll in orchestral activities earlier than Canadian
students due to financial needs, as most Brazilian youth orchestras offer small stipends or salaries to their
participants. Another hypothesis was that there are fewer ensemble possibilities for students in Brazil than
in Canada, which might explain the larger participation in orchestras. Still, further research is needed to
answer these questions.
CONCLUSION
The present study investigated the role of teachers, parents, family, and participation in ensembles in
students' decisions to continue playing musical instruments. In agreement with previous research on
motivation in instrumental instruction, this study found that teachers and family provide a very important
source of motivation for instrumentalists regardless of their culture. The same can be said about
participation in ensembles: as many musicians described, by participating in ensembles students learn to
share musical ideas, gain sight-reading skills, and develop perception, attention, concentration, and
leadership.
However, cultural aspects should be taken into account when considering students' motivations for starting or
discontinuing music training. In this particular study, most musicians had parental support and had a
musician, professional or amateur, in the family. Perhaps a study conducted with a larger number of
subjects would provide different results. Interestingly, it was evident that, depending on cultural
background, subjects responded to the questionnaires in different ways, using more or fewer words and
different expressions reflecting different understandings. These cultural aspects are extremely important, as
they show us differences and similarities among people, and should be carefully observed in further
research.
Nevertheless, intrinsic motivation seems to be a key concept in the development of performers, regardless
of cultural background. It is still unclear how people gain and lose interest in music, or what causes them
to initiate or terminate musical activities and performance. We could speculate that intrinsic motivation is
related to feelings of competence and autonomy, as suggested by Deci (1995) and Csikszentmihalyi (1990).
Perhaps teachers could help students develop a sense of competence and autonomy by setting goals that
are neither beyond nor below each student's capacity or needs, creating conditions for the development of
intrinsic motivation.
Still, there are many questions related to motivation for music that we are not yet able to answer
precisely. Are there genetic differences that determine people's involvement in activities that require a great
deal of practice, patience, and concentration, such as music? Can we compare music to passion in the sense
that, in both, we get "enchanted" and, if we are able to maintain this "enchantment" for long periods of time,
we might develop strong skills and the intrinsic motivation to make a long-term commitment? In other words,
can we as teachers help students keep enjoying music throughout the years while teaching them the necessary,
and often difficult, skills?
In summary, there is still much to be discovered about human motivation for music. Hopefully, research in
this area will help us understand and teach the beauties and challenges of music in a more meaningful way,
one that considers enjoyment and pleasure. A student's relationship with the instrument can be a very rich
one; we just have to help students find their own paths of development and enjoyment through instrumental
music. As Leonard Bernstein (1964), the American conductor and composer, once said:
"Why? Motivated by what? That, thank Heaven, is still a glorious mystery; and it is a mystery that
enshrouds every artist I know, rich or poor, successful or not, old or young. They write, they paint, they
perform, produce, whatever, because life to them is inconceivable without doing so."
REFERENCES
Allen, M.L. (1998). An Investigation of Selected Retention Variables Among Middle School String Students. Paper presented
at the Music Educators National Convention. Phoenix, AZ.
Anderson, M. (1996). A study of motivation and how it relates to student achievement. Canadian Music Educator, 38, 29-31.
Asmus, E.P. (1994). Motivation in music teaching and learning. Quarterly Journal of Music Education - Teaching and
Learning, 5, 5-32.
Asmus, E.P. (1986). Student beliefs about the causes of success and failure in music: A study of achievement motivation.
Journal of Research in Music Education, 34, 262-278.
Austin, J. (1991). Competitive and non competitive goal structures: an analysis of motivation and achievement among
elementary band students. Psychology of Music, 19, 142-158.
Borges-Scoggin, G.A. (1993). A study of the pedagogy and performance of string instruments in Brazil and the social,
cultural and economic aspects affecting their development. (Doctoral Dissertation, University of Iowa, 1993). Dissertation
Abstracts International, 49. (University Microfilms No. 9421227-dd).
Chang, M.L. & Costa-Giomi, E. (1993). Instrumental Student Motivation: An exploratory study. Missouri Journal of Music
Education, 30, 18-25.
Collins, A., Brown, J.S. & Newman, S.E. (1989). Cognitive apprenticeship: teaching the craft of reading, writing and
mathematics. In L.B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser . Hillsdale:
Erlbaum, 453-494.
Csikszentmihalyi, M. (1996). Creativity - Flow and the psychology of discovery and invention. New York: HarperCollins.
Csikszentmihalyi, M. & Csikszentmihalyi, I.S. (1993). Family influences on the development of giftedness. In G.R. Bock
(Ed.), Proceedings of the Symposium on the Origins and Development of High Ability, held at the CIBA Foundation, London,
January 25. Chichester: John Wiley and Sons, 187-206.
Csikszentmihalyi, M. (1990). Flow - The psychology of optimal experience. New York: Harper & Row.
Davidson, J.W. & Scutt, S. (1999). Instrumental learning with exams in mind: a case study investigating teacher, student and
parent interactions before, during and after music examinations. British Journal of Music Education, 16, 79-95.
Davidson, J.W., Sloboda, J.A. & Howe, M.J.A. (1995). The role of parents and teachers in the success and failure of
instrumental learners. British Journal of Developmental Psychology, 14, 399-412.
Deci, E. (1995). Why we do what we do - Understanding self-motivation. New York: Penguin Books.
Doan, G. (1973). An investigation of the relationships between parental involvement and the performance ability of violin
students. (Doctoral Dissertation, Ohio State University, 1973). Dissertation Abstracts International, 49.
Ericsson, K.A. & Charness, N. (1994). Expert Performance - Its structure and acquisition. American Psychologist, 8, 725-747.
Eisner, E. (1993). Objectivity in Educational Research. In: M.Hammersley (Ed.) Educational Research: Current Issues.
Toronto: Paul Chapman Publishing Company.
Gainza, V. (1984). Music in the Americas. International Society for Music Education Yearbook, 11, 30-37.
Henson, M. (1974). A study of dropouts in the instrumental music programs in Fulton County and City of Atlanta School
Systems. (Doctoral Dissertation, Florida State University, 1974). Dissertation Abstracts International, 49. (University
Microfilms No. 1976-01738-dd).
Howe, M.J.A. (1993). The early lives of child prodigies. In G.R. Bock (Ed.), Proceedings of the Symposium on the Origins
and Development of High Ability, held at the CIBA Foundation, London, January 25. Chichester: John Wiley and Sons,
85-105.
Howe, M.J.A. & Sloboda, J.A. (1991). Young musicians' accounts of significant influences in their early lives. 1.
The family and the musical background. British Journal of Music Education, 8, 39-52.
Howe, M.J.A. & Sloboda, J.A. (1991). Young musicians' accounts of significant influences in their early lives. 2.
Teachers, practicing and performing. British Journal of Music Education, 8, 53-63.
Klinedinst, R.E. (1992). Ability of selected factors to predict performance and retention of fifth grade instrumental music
students. Bulletin of the Council for Research in Music Education, 111, 49-52.
Lehmann, A.C. (1997). The acquisition of expertise in music: Efficiency of deliberate practice as a moderating variable in
accounting for sub-expert performance. In I. Deliège & J. Sloboda (Eds.), Perception and Cognition of Music. East Sussex:
Taylor and Francis, 161-190.
Mackenzie, C. (1991). Starting to play a musical instrument: a study of boys' and girls' motivational criteria. British Journal of
Music Education, 8, 15-19.
McPherson, G. E. (1997). Cognitive Strategies and Skill Acquisition in Musical Performance. Bulletin of the Council for
Research in Music Education, 133, 64-71.
Manturzewska, M. (1990). A biographical study of the life span development of professional musicians. Psychology of
Music, 18, 112-139.
Martignetti, A. (1966). Causes of elementary instrumental music dropouts. Journal of Research in Music Education, 13,
177-183.
Sandene, B. (1997). An investigation of variables related to student motivation in instrumental music. (Doctoral dissertation,
University of Michigan, 1998). Dissertation abstracts International, 58/10. (University Microfilms No. AAT 9811178).
Simonton, D.K. (1999). Talent and its development: An emergenic and epigenetic model. Psychological Review, 106,
435-457.
Sloboda, J.A. (1993). Musical ability. In G.R. Bock (Ed.), Proceedings of the Symposium on the Origins and Development of
High Ability, held at the Ciba Foundation, London, Jan. 25. Chichester: John Wiley & Sons, 106-118.
Sloboda, J.A. & Howe, M.J.A. (1991). Biographical precursors of musical excellence: an interview study. Psychology of
Music, 19, 3-21.
Sloboda, J.A. & Howe, M.J.A. (1992). Transitions in the early careers of able young musicians: choosing instruments and
teachers. Journal of Research in Music Education, 40, 283-294.
Sosniak, L.A. (1985). Learning to be a concert pianist. In: B.S. Bloom (Ed.) Developing Talent in Young People. New York:
Ballantine Books, 143-167.
Sternberg, R. (1998). Abilities are forms of developing expertise. Educational Researcher, 27, 11-20.
Swanwick, K. (1994). Musical Knowledge - Intuition, analysis and music education. London: Routledge.
Whitaker, N.L. (1996). A theoretical model of the musical problem solving and decision making of performers, arrangers,
conductors, and composers. Bulletin of the Council for Research in Music Education, 128, 1-13.
Younker, B.A. & Smith, W.H. (1996). Comparison and modeling musical thought processes of expert and novice composers.
Bulletin of the Council for Research in Music Education, 128, 25-35.
Proceedings paper
Toshio IRITANI
CHÔFU WOMEN'S COLLEGE
Kawasaki-shi, Kanagawa-ken
Japan 215-8542
1. Introduction
The purpose of this paper is to argue that there is a different method for adults' learning of music
(especially piano playing): a much faster and more practical way of getting acquainted with the piano.
In formal piano education, music teachers normally start with lessons on simple melodies; students then
gradually go on to more complex melodies, using such traditional textbooks as the "Beyer", which is
written for early beginners.
In the case of adults, however, such a method of learning is not necessary, and this author would
like to propose a more economical, practical, and efficient method of learning music. This is
especially true with respect to piano playing.
The structure of music is very similar to the structure of a sentence written in a foreign language. An
adult who has experience of learning a foreign language, or is acquainted with different kinds of
languages, can probably learn faster to read scores written in staff notation. Just
as written and spoken language consists of sounds, rules for word combination and word order
(grammar), stops, phrases, articulation, paragraphs, and chapters, so music consists
of musical sounds (octaves) that go up or down in steps from lower to higher pitches and vice versa.
Each sound is combined and transformed in harmony, forming parts and phrases.
In addition, in contrast with young children, adults have wide experience and knowledge of a good
number of good melodies. Adults have memorized these melodies and can even sing them when an old
memory reactivates the melodies in their brains. The problem is how to transform these good melodies
into playing on the piano keyboard by reading the notes written by the composer. Even when adults know
the melodies, they still must know the basic rules of musical notation and how it is expressed and
understood. This is the same way that children or mature adults know, consciously and unconsciously, the
rules of grammar and how sentences and phrases are composed before they speak.
My own experience is that I started to play the piano after passing the age of sixty, have been learning
to play for five years, and still have a piano teacher, but I have now reached the stage of playing
Beethoven's Opus 57 (the so-called "Appassionata"), including the first and second movements. In the past
I successfully played in recitals for small groups, to great applause, after learning some easy classical
music, such as Mozart's Andante Cantabile (K. 545), Schumann's Träumerei, Chopin's Nocturne Opus 9, No. 2,
the Prelude Opus 28, No. 15 (the so-called "Raindrop" Prelude), and the Grande Valse Brillante, Opus 18. I
would like to explain the steps by which I succeeded in playing the piano so quickly and was able to have
my performances greatly applauded.
2. The Basic Theory of Adults' Learning of Music: The Cognition and Comprehension of the
6. Also to be understood are the equivalent time values among the notes.
7. Other special notations, such as the slur, crescendo, decrescendo, turn, etc.
8. Performance directions (usually written at the start of a classical piece, usually in Italian, such
as allegro, adagio, andantino, a tempo, etc.); one must decide the speed of play from these
directions. (The above are based on E. Taylor's Music Theory in Practice, 1990, pp. 4-23.)
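The equivalent time values in item 6 can be illustrated with a small worked example, taking the quarter-note beat as the unit (a common convention; the specific note symbols from the original list are assumed):

```python
# Duration of common note values, measured in quarter-note beats.
beats = {
    "whole": 4.0,
    "half": 2.0,
    "quarter": 1.0,
    "eighth": 0.5,
    "sixteenth": 0.25,
}

# One whole note equals two half notes, which equal four quarter notes.
assert beats["whole"] == 2 * beats["half"] == 4 * beats["quarter"]

# A dotted note lasts 1.5 times its plain value: a dotted half = 3 beats.
dotted_half = 1.5 * beats["half"]
```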
In addition, there are many more special signs that are occasionally encountered in compositions, such
as the natural (cancel), staccato, trill, tremolo, etc. Students should also be assisted with fingering
positions by a piano teacher.
The next step is to apply the acquired notational knowledge to the text written by the composer. This
process occasionally helps to activate a melody stored in the adult's past memory.
1. Practice with Some Simple Melodies to Get Acquainted with the Above Notations
Here are two abbreviated melodies. The first, written by Beethoven, is the last part of the "Chorus" in
the 9th Symphony; the second is the beginning of Johannes Brahms's Symphony No. 1. Both melodies are
heard quite often and everybody knows them quite well. If an adult tries to play these streams of notes
while paying attention to their basic time values, he or she can play very easily and comfortably.
A. Chopin's Example
Chopin produced a number of difficult classical piano works in different genres (Mazurkas, Polonaises,
Waltzes, Ballades, Nocturnes, etc.) which liberalized tonal structures, combined elements of Polish dance,
used bright tones, and expressed his delicate moods and sentiments in sublimated forms. This is especially
evident in his Nocturnes, Etudes, and Waltzes, which use a number of alternations of sharps and flats in
the staves. Some of his earlier and simplest works can be analyzed in the following way.
1. There is a continuation of the same tones (cf. the beginning and middle of the "Raindrop" Prelude,
and also the start of the Grande Valse Brillante, Opus 18).
2. There is also a repetition of a group of harmonic melodies.
3. In these two compositions there is no major modulation of the tonal structure, compared with his more
difficult pieces, so an inexperienced pianist can follow the stream of melodies in such phrases after
grasping the whole structure of the two examples. Here are two relatively simple examples of Chopin's
work mentioned above.
(cf. Chopin's Prelude Opus 28, No. 15 (the "Raindrop" Prelude), in the opening phrase and the middle
part depicting the rainfall, and the opening paragraph of the Grande Valse Brillante, Opus 18.)
B. Beethoven's Sonatas
Although Beethoven composed 32 different sonatas in his life, and the style and contents of his works are
found to be quite different from one stage of his career to another, the characteristic tonal elements of his
phrases (mostly melancholic, suddenly bursting out in tones) can be traced in each of his compositions.
His best-known sonatas were composed in the period when Beethoven was in his late twenties and early
thirties. These sonatas are his Opus 13 ("Pathétique"), Opus 27-2 ("Moonlight"), and Opus 57
("Appassionata").
In the latter two sonatas, some of the easy parts are found as follows. The "Moonlight" starts with four
groups of sol-do-mi in the first bar of the treble clef and changes gradually to la-do-mi, la-do-mi, la-re-fa,
and la-re-fa in the next bar, under the performance direction adagio sostenuto.
While the opening passage of Opus 57 ("Appassionata") is very fast (allegro assai), followed by slow and
fast bursting phrases alternately in 12/8 time, the second part consists of harmonic melodies written in both
treble clefs. The melodies are la-do, la-do-fa, sol-si-mi, three sol-sis, la-do, sol-si (upper tones), sol-si
(middle), and sol-si (lower tones), and the second tones written in the treble clef consist of the repetition of
mi (upper tones).
The second part starts with a bass-clef la-do-mi-la, la-do-mi-la, la-do-mi-la, and la-do-mi-la and goes up to
a series written in the treble clef. This series is do-mi, la-la (a combination of upper and lower tones),
do-do, la-la, sol-sol, sire-sire, re-re, sol-sol, la-la, and mi, fa-re, do-mi, do-si, and mi-re-do-mi, etc., which
constitutes a song of praise for a lover (who may have been one of Beethoven's sweethearts).
(cf. the beginnings of Beethoven's "Moonlight" and Opus 57 ("Appassionata").)
2. Piano playing is analogous to the mechanical learning of other skills such as using computer keyboards,
word processors, e-mail, and the Internet, which are all recently developed technological innovations. The
only difference between music and these other techniques lies in the skill of hearing musical sounds and
memorizing them distinctively. With regard to finger movements, the mimicry of a piano teacher's motor
movements seems to be very important.
3. The problem of the speed written at the head of each composition (the performance direction, such as
allegro, presto, or largo) and the problem of good coordination of the left and right hands still exist, but
these will be improved by further practice and by listening to many performances by experienced pianists.
4. On looking back, I can see that my first learning (the understanding of the music fundamentals) was
rather slow, but from the beginning I had a skill for expressing the melodies that I have heard since my
childhood. I also had a strong motivation to become a good piano player and musician, and I did not forget
to practice; what seemed difficult at the beginning could be overcome by later rest and practice. In this
context, I think what Professor Bartlett called "effort after meaning" was activated in my mind concerning
the memorization of melodies; that is to say, a schema of tonal elements was enlivened unconsciously in
my brain (Bartlett, 1932, 1995).
Now I have had a good experience of the deep feelings and delicate emotions of composers, and of how
they expressed themselves in their compositions.
I can now identify with them in the expression of melodies, harmonies, and rhythms with certain forms of
musical notation, phrasing and articulation, modulation, ornamentation, and pauses.
Notes
1. 1) Beethoven's Symphony No. 9 adapted from Tomoe Kitamura (1994), Piano Lessons for Adult
Beginners, Ongaku-no-tomo-sha, p. 19.
2) Brahms's Symphony No. 1 from James Bastien (1981), Favorite Classic Melodies, Kjos West,
San Diego, California, p. 9.
2. 1) Mozart, KV 545, Zen-on Piano Library (1956), Mozart Sonaten 2, p. 236.
2) Mozart, KV 525, Eine kleine Nachtmusik, Zen-on Piano Library (1988), p. 4.
3. 1) Chopin, Opus 28-15, Zen-on Piano Library (1955), pp. 29-30.
2) Chopin, Grande Valse Brillante, Opus 18, Zen-on Music for Piano, No. 128, p. 1.
4. 1) Beethoven, Moonlight Sonata, Zen-on Music for Piano, No. 1, p. 1.
2) Beethoven, Opus 57 (Appassionata), G. Henle Verlag, p. 4.
References
1. Taylor, E. (1990). Music Theory in Practice. London: The Associated Board of the Royal Schools of
Music.
2. Baxter, H. and Baxter, M. (1993). The Right Way to Read Music. U.K.: Right Way.
3. Keller, H. (1955). Phrasierung und Artikulation. Translated by Uemura, K. and Fukuda, T. (1969).
Tokyo: Ongaku-no-tomo-sha.
4. Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology. New
York: Liveright; reprinted in 1970.
5. Bartlett, F.C. (1932). Remembering: A Study in Experimental and Social Psychology. Cambridge:
Cambridge University Press; reprinted in 1995.
Proceedings paper
Competence Beliefs
These refer to individuals' beliefs about their ability and their own evaluations of their competence in
different domains. The beliefs relate to their achievement performance, choice of achievement tasks,
the amount of effort applied, cognitive strategy use, achievement goals and overall self-worth
(Wigfield and Eccles, 1994). Harter (1992) states that an individual's perception of competence may
contribute directly to motivation. Researchers have been interested in how these beliefs may change
over time (see for example, Wigfield, Eccles, Mac Iver, Reuman, and Midgley, 1991; Wigfield,
Eccles, Yoon, Harold, Arbreton, Freedman-Doan, and Blumenfeld, 1997).
Expectancies for Success
These are defined in terms of individuals' beliefs about how well they will do on an upcoming task
(Wigfield, 1994), i.e. the confidence they have in their ability. Eccles et al. (1983) state that
achievement expectancies play a significant role in an individual's academic choice. Consequently, it
is important to identify those components that are shaping those expectancies. They state that these
expectancies are most directly influenced by self-concept of ability and the estimation of the difficulty
of the task.
Eccles et al. (1983) define self-concept of ability as "the assessment of one's own competency to
perform specific tasks or to carry out role-appropriate behaviours" (p.82). It is formed through a
process of observing and interpreting one's own behaviours and the behaviours of others. In terms of
an individual's perception of the difficulty of the task, Eccles et al. conclude that these perceptions
may influence self-concept of ability in that individuals who rate a task as more difficult develop
lower estimates of their own abilities for that task. One other important factor that is believed to be
influential in shaping an individual's self-concept is the perception of others' expectations.
Self-efficacy
Proposed by Bandura (1994), it is defined as individuals' confidence in their ability to accomplish a
task. Eccles, Wigfield and Schiefele (1998) state that Bandura's theory emphasises expectancies for
success, distinguishing between two types of expectancy beliefs: outcome
expectations (beliefs that certain behaviours will lead to certain outcomes) and efficacy expectations
(beliefs about whether one can perform the behaviours necessary to produce an outcome). These two
are distinguished in that an individual may believe that a particular behaviour produces a particular
outcome, but may not believe that they can do that behaviour.
An overview of the literature on self-related beliefs has been presented in order to provide a
framework for the analysis in the present study. In order to understand why and how musicians talk
about these constructs in relation to their own performances, qualitative research methods are most
appropriate. Henwood and Pidgeon (1992) state that qualitative analysis is the representation of reality
through the eyes of the participants, and it is this reality of musical performance that the present study
aims to explore. Discourse analysis is the method that will be employed in order to identify and
interpret the individual's construction of self. It is considered that how an individual talks about these
constructs provides a sense of how their identity is constructed. This is determined by the different
social encounters in which they are involved.
SOCIAL CONSTRUCTIONISM AND DISCOURSE ANALYSIS
According to Burr (1995), an individual's identity is constructed socially, out of the discourses
(spoken interaction) culturally available to us. It is the product of social encounters and relationships
with others. Therefore, identity is created rather than discovered. Neimeyer (1998) refers to two "arenas"
as the concepts that underlie the social constructionist movement: 'language' and 'the self'.
Widdicombe and Wooffitt (1995) state that the focus should be on the text or discourse within or
through which selves and identities are socially constructed.
Language is organised into discourses or interpretative repertoires, which in turn employ various
devices that can be used in the identification and interpretation of the individual's construction of self.
Sherrard (1999) defines a repertoire as a recognisable, self-contained point of view. Within this notion
of interpretative repertoire, three aspects of language have been proposed (see, for instance Potter and
Wetherell, 1987). Firstly, that when analysing the accounts that individuals give, there is always
variation, which, in this particular movement, carries more importance than that of consistency.
Secondly, talk has a variety of different functions, other than just the process of transmitting
information. Thirdly, an individual's talk (or writing) is drawn or constructed from existing resources.
It is these resources that have been labelled repertoires. Burman and Parker posit that these repertoires
are not newly created when individuals speak, but instead have been borrowed or remoulded for a
particular purpose at that particular time.
Within the repertoires, a number of different devices may emerge. These could include for example,
resolutions which Sherrard (1999) states are attempts by the individual to resolve contradictions when
they become apparent, and subject positions, which Burr defines as the positions that individuals
occupy in talk and the implications that this has for the individual involved. She illustrates this idea of
positioning in the area of gender, drawing on how men and women are positioned within particular
discourses and what this might be saying in terms of the power relations between them. Other devices
such as dilemmas or metaphors presented by the individual may also be apparent in the text.
Method
Participants
Seven musicians (three females and four males) from the university music department, preparing for
final year recitals were asked to participate. Although participants were approached randomly, a balance
between gender and choice of instruments was sought. The range of instruments included guitar, flute, piano,
clarinet and saxophone. The conditions of being able to do the recital were that they were all of
diploma standard and had to achieve a minimum of 60% in the prerequisite module. In addition, three
of the seven musicians chose to do two recitals, which required auditions. Choosing a second recital
had no bearing on the interviewee's level of musicianship and was considered in the same light as
those that chose to do only one recital.
Procedure
The seven participants were interviewed separately at two points in time. The interviews were
semi-structured and were tape recorded (with permission) for later transcription. All participants were
assured that their data would remain anonymous.
The first interview took place one month before the recitals and lasted approximately one hour. The
second interview of approximately fifteen minutes was conducted no more than seven days after the
recital, in order to confirm the accuracy of the interpretations made during the first interview. This
was to clarify and expand on any areas that were ambiguous and also to record participants' reflections
on their recital performances.
Data Analysis
The raw data was first transcribed before analysis. Through discursive analysis (Potter and Wetherell,
1987), specific attention will be given to how the musicians construct their conceptions of ability and
their evaluations of the performance outcome. In particular, it will be noted where an individual may
apportion blame or offer justification for the performance outcome.
To increase the credibility of the research, the interview transcripts were read and re-read, with all
interviews analysed by two researchers. On discussion the same interpretations were concluded from
the analysis. Within the interviews themselves, the second, follow-up interview was an opportunity to
confirm with the participants that the interpretation by the researcher of the initial interview was
correct. This idea of objectivity, in this instance, participant validation, has been proposed by
Henwood and Pidgeon (1995) in terms of making the research more credible.
Reflexivity involves a reflection of the ways in which the researcher can influence the research
process. The researcher needs to be aware how their biases, interests, values and experiences can
affect the research and subsequent interpretations (Banister, 1994), as well as focusing on the power
imbalance between the interviewer and interviewee.
feeling nervous, you can let it all get on top of you" [Mark].
Mark suggests that the control comes from within, that the decision to choose to 'lose the situation' is
his. This suggests a reflection of his belief in his ability, i.e. if he believes in his ability then he should
be able to stay in control of the situation. It may be interpreted that he is suggesting that those who are
out of control are those who are less confident in their ability. This idea is supported as Mark makes
reference to nerves equalling the level of control the performer has:
"If it's not going well maybe you're too nervous and it's out of your control, if you do feel
that then I just try and kind of make sure that you kind of step back from it and think
right this is me playing and I'm in control, I can play this music...I'm going to really listen
to the notes I'm playing, and then you step back, you get out of that feeling of
nervousness ....if you do feel it's all running away from you it's because you are not all in
control, it's at that point that you've got to think I'm in control of this" [Mark].
Mark describes how he maintains control of the situation. At no point does he attribute a lack of
control to factors outside the individual. Because he is able to do this, he comes across as a performer
with a certain maturity, a musician that is at ease with his own ability. This idea is supported at
various times throughout the interview, as will be illustrated further in the analysis.
Effort
For most of the musicians, there are accounts of how much effort they have been putting into the
recital. Through mentioning their efforts they illustrate that they have taken responsibility for the
factors that are within their control, such as putting in the work that is required. This way, a
potentially negative outcome cannot be attributed to a lack of effort, as the following quotes suggest:
[When speaking of the music] "There is a lot of work gone into that for me, to give an
individual performance of it...I'd have probably didn't realise how much hard work
actually goes into it because even now, a few weeks before, there is still a lot of work
that needs to be done, so you think no matter how long, I mean even though I started
preparing for these about last year, you think well you know, should I have started
preparing earlier should I have thought about them the year before, I think its hard for me
to do that with other, with other things going on as well, with another subject" [Rachel].
[After the recital] " I was still just trying to enjoy it and been just thinking well I've put a
lot of work into this I've been practising for a year I don't want it to all go wrong on the
final day...at first it was a bit daunting but just as it went on I was thinking oh yeah this is
me I can play these pieces, all this work that's been put in" [Rachel].
Rachel implies in her discourse that she has put a lot of effort into the recital. She makes it explicit
that she started working on the recital over a year ago, and points out how much work is involved by
stating that there is still a lot of work that needs to be done. However, she justifies not spending
any more time on it by stating that she has other subjects competing for her time. It would be
reasonable to assume that a year spent on working towards a recital is long enough, and this way it
could not be concluded that Rachel did not put the required effort in. Her discourse not only illustrates
effort, but also commitment to the recital, as she has spent a year working towards it. Because Rachel
has invested so much time, she is more concerned about it going wrong on the day. It may be
interpreted that because she has convinced the interviewer that so much time went into the recital, a
negative outcome may be attributed more to her musical ability as a result.
However, for one musician, Nicola, her aim is to create the impression that she has not put the work
in. Through making reference to her laziness, or presenting herself as such, she provides a reason to
attribute a negative recital to nothing other than a lack of effort. This has the effect of taking the
emphasis away from attributing a negative outcome to her ability as a musician:
"...I am a bit lazy, I don't want to have to put the work in, I'd just like to be able to get up
there and do it...I think in a sense I prefer to just go in on the day and do it you know, I'm
afraid of getting bored....I don't know I seem to have a fairly laid back attitude to it but
that's probably more to do with laziness than, because then I do occasionally have stress,
panic moments when I think that I'm not going to be able to do it and I think well you
know, I should have started earlier, but on the whole I probably don't really think about it
so much, when it gets to the last couple of weeks I just get on with working on it because
I know that I have to then".
AI: "So what are you scared about?"
Nicola: "the worry about the overall degree...it will probably be the kind of thought on
the day that I've probably messed it up by not doing enough practice and it will mess up
my whole degree...because I leave it to the last minute that's part of what I worry about
just the fact that there are parts of it that could go wrong that I perhaps could have
practised more" [Nicola].
Nicola's justification for not putting in the work is that it helps her concentrate more and stops her
from getting bored. All her stressful moments or concerns about it going wrong are attributed to her
lack of practising, or not starting the preparation early enough. However, this also has the opposite
effect in that if the recital goes well, it is likely to be attributed to her ability, as she has already
presented herself as someone who did not work for the recital. Nicola points out further that her
laziness towards her recital is a result of who she is, as later on in the interview she states that she
approaches subjects other than music in a similar way.
Musical Skills and Musical Awareness
For a few of the musicians, a significant proportion of the interview focuses on the music that they are
playing. Mark, in particular, provides great detail as to what he is trying to achieve through the music,
which is the main focus of his interview. His discourse offers an understanding of what he believes is
involved in being a performer and in what he wants to achieve through the performance. This style of
discourse serves as a way of representing him as an individual with a high level of musical ability. He
appears as though he has a lot of experience in performing and has a good knowledge of what is
required of him as a musician, as the following quotes illustrate:
[In the solo pieces] "I'm very aware of the tone produced, and very aware of the
dynamics... its like if you are an artist and you've got a palette of paint, you can like sort
of dip into all different sorts of colours, just by changing the way you focus on the sound,
so that's what I concentrate on, and you just try and make some coherent musical idea out
of what I'm playing".
[Of the music] " I know exactly where it is going....if you think of it like where's this
within itself, where is that phrase musically going, how is it all related, what are the
important notes here, what's leading to it, then you just make sense of it musically and it
all becomes, one part of the coherent idea, so it makes it easier to perform".
"I quite like doing solo stuff because then, its all on you... you can really concentrate on
the sound you're producing, ...you're very free to interpret it as you like...if it's a piece
that really moves you, you listen to it and it really does something for you then, when
you come to interpret it, you put so much more into it...if you don't understand the
piece... then you don't make such musical sense of the notes" [Mark].
Mark's analogy of an artist is used to describe the process he uses to communicate the music. It also
suggests that he has worked hard at preparing for the recital (e.g. "I know exactly where it is going").
In addition, playing solo music gives him the opportunity to demonstrate this ability to others.
Through his discourse, he presents himself as a very confident and competent player, which is also
supported at other times, for example:
"I'm fairly competent on the clarinet....I know I'm one of these kind of people who
whatever will get through it" [Mark].
However, it may also be interpreted that the attention to detail he gives when talking about the music
in the interview takes the focus off him as a player, as often he makes generalised statements as to
what is required when performing.
Kate also gives detailed descriptions about her approach to the music. In this instance though, it is
used as way of justifying the amount of effort that is required. She also equates this higher level of
performing with experience and years of practice, which, in a general sense, is what music ability
means to her.
[Speaking about the music] "You need to give it the right sound and it's that side of
things that needs more effort too sometimes".
AI: "What do you mean by the right sound?"
Kate: "Well you can't just play the notes, and you can't just put the dynamics in because
that is not enough, its got to have a line and its got to have expression in it.... Its
definitely the sound and the feeling attached to it that you give people, and give yourself,
which I think is one of the harder things to do and I think it is the thing that comes
between years of practice and years of playing, it's the experience element I think again
that you need".
In her description, Kate presents the pace of the pieces as the main source of what makes them
technically demanding. This is also a feature of Paul's discourse as he talks about the music, although
his concern is to make it appear easy to the audience:
"The other pieces is sort of like very, technically involved, its very sort of, well,
ornamented and its quite virtuosic really, it's a bit of a nightmare (laughing), in my book,
I have to move my fingers pretty fast and stuff but yeah its great fun, hopefully that will
come across as well, the fun aspect of it will come across, trying to make something that
is incredibly hard seem quite simple and fun" [Paul].
As he talks about another one of his pieces, Paul indicates that not only was he having problems
learning the piece, but his accompanist was also experiencing difficulty. This lends support to his own
interpretation that the music is technically demanding. His confusion in relaying the different time
signatures of the piece adds to the overall impression that the piece he is learning is very difficult. The
fact that he states that he has only just worked it out implies how difficult it is. The pace of the piece
is also expressed in the way that he describes collapsing in a heap at the end of it, suggesting that it is
an exhausting piece to play:
[Speaking of one piece] "..Just looked at it and though, my God (laughing), where do I
possible start, its got an introduction part which is absolute hell to be honest, it's written
in simple time in respect that its sort of 12/4, 12/4?, no it isn't in 12/4, 12/8, ok and its in,
then it swaps between 4/4 and things but, that's quite simple, but the piano is into 12/4
and 12/8 and we're all different time signatures and we like what's going on, so its taken a
while to break that down into actual manageable parts..... we've just worked it out how to
actually play that part, but, the introduction sorts of like, its very sort of like, this is me,
very sort of, very showy-off-ee, but you just go out there and give it to them...it just
explodes at the end and when I finish I'll just collapse in a heap, but yeah, it will be good,
I'm looking forward to it" [Paul].
Through mentioning the technical demands of his programme, Paul presents himself as someone who
is a competent musician in that despite his struggle to learn and play the piece, he is able to make it
through to the end.
control, and this idea is supported by the way that Kate describes that she 'thinks I am getting above it
now... and I'm making it go the way I want it to go'. In this way, Kate is able to attribute a negative
performance outcome to the degree of risk in the programme. It may be suggested that the greater the
degree of risk the more the outcome can be attributed to external factors.
The risk is presented both in terms of her degree and herself as a musician. Its cost is in terms of being
awarded a low mark and potential negative evaluations being made of herself as a musician. It may be
suggested that the word 'risk' is a negative term.
This notion of risk serves as an interpretative repertoire for understanding her experience. Risk is used
in terms of representing the consequences of choosing a technical programme. When compared to the
other participants, the notion of challenge is another interpretative repertoire, which tends to carry a
more positive meaning. For example:
"there is a lot of work gone into that for me, to, to give an individual performance of it
whereas a lot of music you can see the phrases and the dynamics which you can follow,
but on this there isn't anything so I thought that was more challenging to do something
like that, to put more of my own thoughts into the music" [Rachel].
"I'm doing a concerto... and it's quite challenging as well it's got a few really quite
technically difficult passages" [Nicola].
In this context, challenge suggests a belief in ability, i.e. that the individual is secure enough in their
belief to push themselves out of 'the boundaries' which ultimately means taking responsibility for the
outcome. Kate knows her own boundaries, she implies that she is still taking a risk even within what
she believes she can play. When Kate does use the word challenge it is used in a positive way as a
motivator, i.e. without challenge no effort would be made. Its use in this context is in referring to a
global belief that without challenge the recital would be too easy. It is presented in a depersonalised
way as she takes the focus away from herself (i.e., through the use of 'you' rather than 'I').
The Audience
For one participant, Simon, his expectations for a successful recital are weighted heavily according to
how he views the expectations of his audience. For Simon, the real issue is how he is going to be
judged as a musician. It may be suggested that he is concerned that his own perceptions of his musical
ability will not match the perceptions of the audience or judges. Therefore, the greater the discrepancy
between his own and others' perceptions, the more nervous he is likely to be. How good he really is
will be indicated through audience feedback and the recital mark. However, through his discourse,
Simon suggests a lack of confidence in his ability. His perceptions of the audience's evaluations are
always negative; it never occurs to him that they may make positive evaluations:
"Perhaps it is a little bit of paranoia but you think that the audience can pick up on
absolutely everything you do wrong...when you are playing that's the way it seems and
once you make the first mistake it's as if the audience are, they're kind of bearing with
you rather than listening to you, that's not true but that's the way it seems, well for me
anyway when I'm playing....I know it might seem a bit selfish but it's as if the audience
are a distraction, it's as if there against you rather than being on your side ... I don't know,
its probably the nervous paranoia type thing again in that perhaps there are people in the
audience thinking, well just waiting for you to slip up and completely fall down"
[Simon].
"You are being assessed, there are people there who are going to be sitting there judging
how good I am as a musician...I'm worried that the people assessing me will think I'm
terrible or whatever...it is a personal thing, how well you can play your instrument, its
very very personal, for somebody to actually put a number on it... if they put a low
number on it then its going to be soul destroying" [Simon]
For Simon the mistakes lead to a negative view of the performer, and after the first one has been made
it is as if the audience is waiting for their negative evaluation to be confirmed. This also suggests a
lack of belief in his ability, as it implies that he knows he is going to make mistakes before he even
begins the recital.
For Ben, this idea of the audience evaluating him is also significant, particularly when the audience do
not know him. If the audience know him and his ability, and the recital is not as good as it could be,
then the audience are less likely to attribute the outcome to his musical ability but rather to the recital
situation. Those that do not know him are left making an assessment of his ability:
"You don't want the recital to reflect badly on your recital marks, because there is an
audience there, there is a number of people that are going to get an interpretation of you,
perhaps a lot of people there won't know me so I think it will be a bad thing for them to
get a bad interpretation of me, I think that's what worries me the most" [Ben].
For Ben, what adds to these evaluations is how he is going to be compared to the other performers.
The order of the recitals means that he is performing last and this puts pressure on him to do well. He
illustrates a lack of belief in himself, as he does not want to be compared to the other musicians that
have gone before him:
"I'm one of the last, I have to sit through an awful lot of recitals, an awful lot of people
doing well, and that kind of compounds the anxiety... however much they say that they
don't I'm sure that your recital's gauged against other peoples" [Ben].
General Discussion
The analysis in this paper has dealt with two main questions: 1) in what ways do the musicians
construct their notion of ability? and 2) how do the musicians apportion blame or responsibility in terms
of their ability and the recital outcome? In light of the discursive analysis, each will be discussed in
terms of what this tells us about their sense of identity.
In what ways do the musicians construct their notion of ability?
One of the main ways in which the musicians present their self-beliefs about ability is through their
discussions of the various difficulties associated with their programmes. Kate and Paul gave examples
where the fast pace of the music gives the impression that it is very difficult. Others emphasised the
difficulty of the music by discussing how long it took them to learn it. Thus, ability means being able
to perform successfully a technically demanding programme.
In contrast, for a few of the musicians, a significant proportion of the interview focused on the music
that they were playing not in terms of its perceived difficulty, but in terms of being able to
communicate musical ideas. This was evident in Mark's account where he was keen to display an
understanding of the music, which in turn suggested a high degree of musical awareness. In other
words, musical ability is presented not only as mastering the technical demands of the music and
displaying this mastery to the audience, but also as being able to communicate the musician's own
interpretation of the music.
For the musicians, how they perceive their ability is related to how they evaluate the outcome of the
recital in terms of how much it is going to be affected by nerves or poor audience evaluations. This
idea was illustrated in Kate's discourse when she described the outcome of the recital as dependent on
the amount of risk that was taken in the programme choice.
For the musicians, their sense of self is related to the idea that being a musician means having a high
musical ability. How the participants construct this notion of ability is therefore a reflection of their
perceptions of self as a musician.
How do the musicians apportion blame or responsibility in terms of their ability and the recital
outcome?
For all musicians the outcome of the recital was described as being dependent on the actual day. This
was suggested by the perceived level of control over the performance that emerged in their discourse.
Those with less belief in their ability took less responsibility for the recital outcome,
and in these instances they tended to apportion blame to nerves and the perceived negative evaluations
made by the audience.
It may be interpreted that the construction of identity as a musician is determined to a large extent by
the musicians' beliefs in their ability and in their perceptions of the recital outcome. This was reflected
in two extremes presented in the discourse of the musicians: from those who took little personal
responsibility for the recital outcome through to those who strove to adopt complete control over the
performance situation.
For most of the musicians, it is important to be seen to be putting a lot of effort into the recital. This
way a potentially negative outcome cannot be attributed to a lack of effort. Again, two extremes are
presented here. One musician is eager to illustrate a total lack of effort in terms of the recital
preparation; this way, if the outcome of the recital is negative, it can be attributed to a lack of
preparation rather than a lack of musical ability. At the other extreme, most of the musicians want to
illustrate that they have worked hard; this way, if the recital outcome is negative, they can justify the
result in terms of other factors which are out of their control, such as debilitating nerves during the
performance. Whatever the technique used, the musicians want to avoid any negative evaluations of
their ability. Their aim is to create a positive impression of themselves as competent musicians. It
may be suggested that by creating these impressions, the participants are in conflict with their own
perceptions of their musical ability. Through this discourse, they are therefore able to maintain their
beliefs about their identity as a musician.
Conclusion
In this study the theoretical orientation of social constructionism and discursive analysis was
employed to explore issues of self-belief and identity in seven musicians preparing for a music recital.
There is a need in the music performance literature to provide fuller accounts of musicians' experiences of
preparing for a public performance. The analysis has not only provided a deeper understanding of how
undergraduate music students' identity is constructed within the context of music performance, but
also framed the way in which they viewed motivation. This new insight into music and motivational
theory has important implications for the development of performing musicians in terms of why they
are choosing to perform and the process of their preparation.
As there has been little research to date that has produced in-depth data on motivation and identity in
musicians, future research could explore these issues in relation to other constructs of achievement
motivation. Those in particular that can be explored are the individuals' construction of task values
and goals in terms of the recital, and what this tells us about their sense of identity as a musician.
References
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood
Cliffs, NJ: Prentice-Hall.
Banister, P. (1994). Report writing. In P. Banister, E. Burman, I. Parker, M. Taylor & C. Tindall,
Qualitative methods in psychology (pp. 160-179). Buckingham: Open University Press.
Burman, E. & Parker, I. (Eds.). (1993). Discourse analytic research. London: Routledge.
Burr, V. (1995). An introduction to social constructionism. London: Routledge.
Harter, S. (1982). The perceived competence scale for children. Child Development, 53, 87-97.
Harter, S. (1992). The relationship between perceived competence, affect, and motivational
orientation within the classroom: Processes and patterns of change. In A.K. Boggiano & T.S. Pittman
(Eds.), Achievement and motivation. (pp.77-114). Cambridge: Cambridge University Press.
Henwood, K.L. & Pidgeon, N.F. (1992). Qualitative research and psychological theorising. British
Journal of Psychology, 83, 97-111.
Neimeyer, R. (1998). Social constructionism in the counselling context. Counselling Psychology
Quarterly, 11 (2), 135.
Potter, J. & Wetherell, M. (1987). Discourse and social psychology: Beyond attitudes and behaviour.
London: Sage.
Sherrard, C. (1999). Repertoires in discourse: Social identification and aesthetic taste. In N. Hayes
(Ed.). Doing qualitative analysis in psychology. (pp. 69-83). Hove, UK: Erlbaum Psychology Press.
Widdicombe, S & Wooffitt, R. (1995). The language of youth subcultures. London: Harvester
Wheatsheaf.
Wigfield, A. & Eccles, J.S. (1994). Children's competence beliefs, achievement values, and general
self-esteem. Journal of Early Adolescence, 14 (2), 107.
Wigfield, A., Eccles, J.S., Mac Iver, D., Reuman, D.A., & Midgley, C. (1991). Transitions during
early adolescence: changes in children's domain-specific self-perceptions and general self-esteem
across the transition to junior high school. Developmental Psychology, 27 (4), 552-565.
Wigfield, A., Eccles, J.S., Yoon, K.S., Harold, R.D., Arbreton, A.J.A., Freedman-Doan, C., &
Blumenfeld, P.C. (1997). Change in children's competence beliefs and subjective task values across
the elementary school years: a 3-year study, Journal of Educational Psychology, 89 (3), 451-469.
Proceedings paper
Conversely, Reinecke (1974) stated that "no evidence has been found to prove that one specific musical
piece has only one 'right' tempo"(p.414). Here one may conclude that, in a single-movement
composition or between the movements of large-scale compositions, the relation of tempi to each other
may be constant and in a definite and unambiguous relationship to an "inner" or "base" (Margulis,
1984) tempo, which, on the other hand, cannot be determined by the musical structure in a precise and
absolute way. This may perhaps be why composers set metronome marks on their music.
Although tempo is considered to be a prominent factor in harmonic rhythm, it is surprising that music
theorists have paid relatively little attention to it. Indeed, there are apparently no theories of music that
assert that, because all note values are obviously relative to each other, a specific time value can only be
determined by relating the speed of the temporal structure of music to "real" (externally
metered) time. While Glenn Gould (1982) considered the tempo of a composition to be "one constant
reference point," Cooper and Meyer (1966), on the other hand, criticised the notion of fixed
relationships of pulse and the concomitant belief in an absolute tempo:
Tempo, though it qualifies and modifies [pulse, meter, and rhythm], is not itself a mode of
organization. Thus a rhythm or theme will be recognizably the same whether played faster or
slower. And while changes of tempo will alter the character of the music and perhaps influence
our impression of what the basic beat is (since the beat tends to be perceived as being moderate in
speed), tempo is not a relationship. It is not an organizing force... It is important to recognize that
tempo is a psychological fact as well as a physical one (p. 3).
Concurring with Cooper and Meyer, Kramer (1988) stated: "If we consider tempo as both the rate of
beats and the rate of information, then we can incorporate into this broad concept both the objectively
measured and the subjectively felt."
Physiological Basis of Tempo Consistency
The histories of performance practice and psychology teach us that people have long attempted to
define relationships between "real" time (physical, actual or clock-time) and "musical" time
(psychological, psychical or virtual time). For example, there is an ample and conflicting literature
documenting attempts to support the belief that human pulse serves as a physiological basis of time
sense and musical tempo. As early as 1696, Loulie constructed a pendulum with 72 different swing
durations in an attempt to measure the musical effect according to an average number of pulse strokes.
Winckel (1967) stated quite explicitly that this kind of measurement would not do. Jacques-Dalcroze
(1912) also supported the view that the human heart provides a basis for rhythm. Jones (1976) noted that
with increased arousal by means of stimulants, familiar patterns of music unfold more slowly than
usual. Conversely, reporting on his experimentation with mescaline, Huxley (1960) found that music
perception did not distort. Further, when Fuchs (1953) "metronomized" Bach's Mass in B Minor--each
movement separately and on various days--he found that his beat was consistently near 80 beats per
minute. Fuchs concluded: "the pulse can certainly measure music. But just as certainly it does not rule
it" (p. 34).
Conversely, Radocy (1980) pointed out that people perceive music of varying rhythmic regularity and
tempo regardless of the speed of physiological processes. Moreover, measuring the principal tempo of
an extensive number of selected recordings known as the Carnegie set, Hodgson (1951) proposed that
all music is based on one fundamental psychological tempo range between 60 and 70 beats per minute,
and it is this psychological range that largely governs our decisions about musical tempo. From a
phenomenological point of view, Clifton (1984) made the following comment: "The "time sense"
cannot be attributed to a specific organ or physiological function. If the term makes sense at all, it can
only refer to the activity of human consciousness" (p. 56).
movement. As Donington (1963) claims, "Dance steps can only be performed correctly within narrow
margins of speed." (p.392) Another criticism of this work must be directed at the impreciseness of the
apparatus, although it is obvious that the researchers did the best they could with the tools available at
the time.
Fifty-four years later, Halpern (1988) conducted a two-part study with college students unselected for
musical ability; it is remarkably similar in purpose and design to the 1934 work by Farnsworth and his
associates, although Halpern does not note the connection. In her
investigation, nineteen well-known popular songs served as stimuli and were presented to subjects by
an Apple II computer controlling a synthesiser (Study 1). Instead of manipulating the tempo lever of a
player piano, as was the case in Farnsworth's study, subjects could change the tempo of the tunes by
manipulating the software interface on the computer until they sounded "correct." Moreover, instead of
tapping on a telegraph key, subjects were instructed to set a metronome to coincide with what they
imagined to be the "correct" tempo of the songs. Results reported a generally positive relationship
between the metronomic evaluations and the setting of the tempi on the computer, i.e. between
"imagined" and "perceived" correct or preferred tempi for each tune. The results are indeed similar to
those found by Farnsworth and his associates concerning the positive correlation between the tapping
task and the setting of the tempo lever. It was also found that imagined tempi seemed to regress to a
middle range of approximately 100 beats per minute, between the faster and slower perceived tempi. In
Study 2, though, which utilised 10 of the tunes of Study 1 and only the "imagery" task (i.e. the
metronome setting), it was reported that the mean preferred tempo was 109 beats per minute,
significantly faster than the mean imagined tempo from Study 1 and much closer to the mean tempo of
120 beats per minute reported in the Farnsworth et al. study. Both parts of Halpern's research suggest that
familiar, popular tunes are represented in our mind with a particular tempo.
Interesting as these results may be, they do not demonstrate whether judgements of correct tempo are
consistent across separate trials over an extended period of time, especially when subjects are presented
with musical compositions chosen because they represent a wide range of musical styles and
familiarity. It also seemed important to investigate how tempo judgements might differ among subjects
with different musical backgrounds.
To investigate these issues, Lapidaki & Webster (1991) conducted a study in which subjects were 15
highly experienced musicians (5 composers, 5 performers, and 5 music education specialists) recruited
from a pool of professors and graduate students of a School of Music in the Midwestern United States
and 5 nonmusicians who were professors and graduate students from other departments of the
university and had little formal music education and involvement in musical activities. Three music
examples (J. S. Bach's "Air in D Major" from the Suite Number 3 in D major; F. Chopin's Prelude
Number 7, Op. 28, and A. Schoenberg's second piece from "6 kleine Stücke," op. 19) were chosen
because they represented a wide range of musical styles and familiarity. All subjects were tested
individually at three sessions at three-day intervals. For each of the three testing sessions, subjects were
asked to make correct tempo judgements of each of the three compositions. The initial tempo of the
presentation of the compositions was varied systematically in each session.
The findings of Lapidaki & Webster's study (1991) showed that when tempo is judged by highly skilled
musicians in repeated listening tasks of the same compositions, initial tempo has a dominant effect on
correct tempo judgements. Simply stated, no single correct tempo emerged as a consistent entity of
individual or group performance across the three trials. The sample of adult nonmusicians indicated a
basis for a similar conclusion. Nevertheless, this tended to vary according to the composition in
question. These results did not support the observations reported by Farnsworth et al. (1934), Halpern
(1988), and Levitin & Cook (1996) that one tempo is consistently associated with particular listening
examples. On the contrary, listeners' perceptions of correct tempo for a particular composition varied
dramatically from trial to trial. Few statistically significant differences in consistency of tempo
judgements were found as a result of musical background and compositional style. Many of these
tendencies suggested important questions for further study.
It was obvious, however, that additional work was necessary with larger and more varied musical
samples and with better measures of individual familiarity with, and preference for, the judged
compositions. Also of interest would be how these judgements may differ among subjects from
different age groups and musical backgrounds.
The majority of empirical studies on tempo perception have been carried out on adults (Farnsworth et
al., 1934; Halpern, 1988; Hodgson, 1951; Lapidaki & Webster, 1991; Levitin & Cook, 1996; Lund,
1939). However, there is general agreement that the experience of musical time is not separable from
the subjects' age (Bamberger, 1994; Petzold, 1966; Shuter-Dyson & Gabriel, 1981; Zenatti, 1993). To
counter this deficiency, it has proved necessary to investigate the following question: Is the capacity for
consistent tempo judgements for particular pieces of music affected by the age of listeners (e.g.,
preadolescents, adolescents, and adults)? Once the age question has been answered, it might then be
possible to set varied music educational standards for each age level by considering the often
overlooked development of temporal perception in students and, in turn, create a more effective
condition for the growth of musical experience.
Furthermore, the capability to perceive different musical parameters, such as tonality, harmony, form,
and rhythm, without being able to identify and analyse them, is considered to be the outgrowth of
implicit musical knowledge or acculturation (Hargreaves, 1986; Francès, 1988; Bigand, 1993). In other
words, in this situation what listeners know is not something they are aware of knowing, but rather it is
acquired from knowledge that is implicitly or subconsciously built into their auditory systems through
common everyday exposure to music in their cultural environment. There is general agreement among
researchers, on the other hand, that this knowledge becomes explicit or conscious only after musical
training (Dowling, 1993). In essence, musicians presumably possess a fuller understanding and
appreciation of a piece of music, due in part to their command of a sophisticated scheme or set of
rules for encoding its musical events in terms of musical meanings and thus assigning to it a stable
structural description (Sloboda, 1994; Dowling, 1994; Wolpert, 1990; Lerdahl & Jackendoff, 1983).
The study was therefore concerned with whether the musical background of listeners, that is, the level
of formal music education and/or participation in specialised musical activities, affected the consistency
of their perception of the correct tempo.
Purpose
The present study was designed to investigate the consistency of "correct" tempo as it might exist in
compositions of various musical styles when evaluated by subjects of differing musical
background, age, familiarity with, and preference for the selected music. It should be noted that the study
was about the extent to which individuals can set consistent tempi across four separate trials: no attempt
was made to establish whether or not these tempi were correct as compared with those set by the
composers in the original pieces. Along these lines, it was reasoned that if a correct tempo did exist,
subjects ought to be able to arrive at consistent judgements about the tempo of examples despite the
examples being presented with differing initial tempi in every session.
Is there an "absolute" or "right" tempo which may be considered as a unifying construct of the music
examples chosen and whose function is the synthesis of finite, juxtaposed musical elements in relation
to "real" time? Is the concept of a particular tempo represented in the mind as a consistent musical
entity like pitch, perhaps due to a distinct, yet unconscious, psychobiological clock "programmed"
during the listening process?
To investigate these issues, we reasoned that if an "absolute" tempo did exist, subjects ought to be able
to arrive at a consistent decision about the tempo of examples if these judgements occurred over a
period of several days and if the initial tempo of each hearing was varied systematically. We also
wondered whether listeners from different age groups with high levels of formal music education and
listeners with little formal music education would demonstrate different levels of consistency. Finally,
what effect would the style of the listening examples have on consistency of judged tempo?
Research Questions
(1) Is there a consistent judgement of correct tempo across four separate sessions of the same musical
examples using varying initial tempi for each trial?
(2) Is the consistency of tempo judgement affected by the age of the listener?
(3) Is the consistency of tempo judgement affected by the musical background of the listener?
(4) Is the consistency of tempo judgement affected by the style of music?
(5) Is the perception of tempo affected by the familiarity with
a) the individual pieces and
b) their overall style?
(6) Is the consistency of tempo judgement affected by the listener's preference/liking for a particular
musical example?
Methodology
Apparatus
The software program employed for both recording and playback of performance data was the
professional MIDI sequencing program Performer from Mark of the Unicorn. This program was chosen
in large part because of its ability to alter the graphic window display on the computer screen so that the
metronome controls could be easily manipulated. In addition, the program had the capacity to vary the
tempo precisely, without altering any other musical attributes (e.g., pitch, timbre, articulation, etc.).
The tempo of each musical example (that is, the initial tempo) could be easily set by the experimenter
prior to each session of each musical example. The mouse was used by the experimenter to manipulate
the tempo, following the explicit directions of each subject. Set in manual tempo mode, the tempo slider
of the graphic window display on the Macintosh was used to display and change the tempo in real time
in the metronome window. To change tempo, the experimenter dragged the triangular indicator along
the slider: to the left decreased the tempo, to the right increased it. The experimenter could also use the
arrows at either end of the slider: the + (plus) arrow increased the tempo and the - (minus) arrow
decreased it. Subjects were not asked to use the mouse themselves, since to do so would have required
training for a number of subjects.
Selection of Musical Examples
In all trials subjects listened to the following six compositions: C-major and A-minor Two-Part
Inventions by J. S. Bach (Bach I and II, respectively), Clair de Lune by Claude Debussy, Piano Piece
by Michalis Lapidakis, Yesterday by the Beatles, and The Children of Piraeus (Never on Sunday) by
Manos Hadjidakis. These works were chosen because they represented a wide range of musical styles
(Baroque, Impressionistic, contemporary idiom, rock ballad, and dance music), familiarity, and
preference.
Subjects
Subjects (n=90) were recruited from three age groups: 30 adults (25-52 years), 30 adolescents (junior
and senior high school students), and 30 preadolescents (fifth and sixth grade children). Individuals of
each age group were selected on the basis of musical background and willingness to participate. Within
each age group, half the subjects were musicians, half were nonmusicians.
Procedures
For the four testing sessions, subjects were asked to listen to each composition and tell the experimenter
to alter the tempo upwards ("faster") or downwards ("slower") until the tempo was right; that is, the
most appropriate tempo for that composition, in the opinion of the listener. Once the six compositions
were judged, the subject was asked to return in at least four days' time for the next session. This slow
pacing of trials was adopted in order to prevent memory carryover from one trial to another.
Each session for each subject systematically varied the order of the compositions and the initial tempo
(I.T.) of the listening examples in order to eliminate the possibility of contextual cues. Two initial
tempi were used: M.M. ♩ = 20 (slow I.T.) and M.M. ♩ = 200 (fast I.T.); all tempo judgements in the
Lapidaki & Webster study (1991) had lain within this range. Each initial tempo was presented twice:
either in the first and third or in the second and fourth trials.
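The counterbalancing just described can be sketched as follows. How subjects were assigned to the two initial-tempo orders is not stated in the text, so alternating by subject index here is purely an illustrative assumption, and the function name is hypothetical.

```python
# Sketch of the initial-tempo (I.T.) counterbalancing described above:
# two initial tempi (MM 20 and MM 200), each presented twice across the
# four trials, either in trials 1 & 3 or in trials 2 & 4. The assignment
# of subjects to the two orders is an assumption made for illustration.

SLOW_IT, FAST_IT = 20, 200  # metronome marks used in the study

def initial_tempo_schedule(subject_index):
    """Return the initial tempo (beats per minute) for trials 1-4."""
    if subject_index % 2 == 0:
        return [SLOW_IT, FAST_IT, SLOW_IT, FAST_IT]  # slow I.T. in trials 1 & 3
    return [FAST_IT, SLOW_IT, FAST_IT, SLOW_IT]      # slow I.T. in trials 2 & 4

# Every schedule uses each initial tempo exactly twice.
for s in range(4):
    assert sorted(initial_tempo_schedule(s)) == [20, 20, 200, 200]
```

Either way, each listener hears each initial tempo twice, so trial-to-trial consistency can be assessed both within and across initial tempi.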
In order to examine subjects' familiarity with the listening examples a questionnaire form was handed to
them at the beginning of the first testing session. Subjects had to answer questions concerning their
familiarity with the particular example and its relevant musical style, after they judged the correct
tempo of each example.
Finally, with regard to the question of their individual preference/liking for a particular musical
example, subjects were asked to rate it on a scale ranging from 1 (least-liked or poor) to 4 (most-liked
or excellent), after they judged the correct tempo of the example at the fourth testing session. This
information was recorded and used in later analyses.
Results
To test the hypothesis that listeners would render consistent tempo judgements independently of the
initial tempi, a one-way repeated measures ANOVA was performed for each musical example, with
trial (four levels) as the independent variable and tempo judgement as the dependent variable. The .05 level of significance was
adopted as the alpha level for these tests.
Results for these analyses show that listeners' judgements of the most appropriate tempo differed
significantly across the four trials, i.e. they did not exhibit consistency (Bach I, F=84.43, p <
.0001; Bach II, F=86.27, p < .0001; Debussy, F=80.37, p < .0001; Lapidakis, F=139.07, p < .0001;
Beatles, F=59.02, p < .0001; Greek dance, F=78.86, p < .0001).
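The analysis above can be sketched as a textbook one-way repeated-measures F computation; this is not the authors' code, and the judgement values below are hypothetical, chosen only to show how systematic trial-to-trial differences produce a large F statistic.

```python
# Sketch of a one-way repeated-measures ANOVA testing tempo-judgement
# consistency across four trials. Data are HYPOTHETICAL judgements
# (beats per minute), not the study's data.

def repeated_measures_anova(data):
    """data: one list per subject, one tempo judgement per trial.
    Returns the F statistic for the trial (within-subject) effect."""
    n = len(data)          # subjects
    k = len(data[0])       # trials
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_trials - ss_subjects   # subject-by-trial residual
    df_trials, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_trials / df_trials) / (ss_error / df_error)

# Hypothetical judgements: each listener drifts with the initial tempo,
# so judgements differ systematically across trials (i.e. inconsistency).
judgements = [
    [60, 96, 64, 100],
    [72, 110, 70, 104],
    [66, 90, 75, 95],
    [58, 102, 62, 108],
]
f_stat = repeated_measures_anova(judgements)
print(f"F({len(judgements[0]) - 1}, "
      f"{(len(judgements[0]) - 1) * (len(judgements) - 1)}) = {f_stat:.2f}")
```

For these hypothetical data the result is F(3, 9) ≈ 42, significant at any conventional alpha level: large trial-to-trial differences, exactly the pattern the study reports.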
Further examination of the results revealed that the means of tempo judgements for the trials with the
fast initial tempi were higher than the means for the trials with the slow initial tempi for all
musical examples: the slower initial tempo generally evoked slower preferences, and vice versa.
Furthermore, in order to ascertain which age group exhibited the highest degree of consistency, the
individual deviation scores (IDS) averaged over the four trials of each piece were used as an additional
measurement of tempo judgement consistency for each musical example. IDS reflects the standard
deviation of the four different tempo judgements (Y1, Y2, Y3, and Y4) at the four trials for an
individual. IDS gives a more global sense of the deviations within each group. IDS was used as a primary
response variable to answer questions about consistency associated with other factors of interest, such as
age, musical background, and musical style. To compare consistency across the musical styles, an
analysis with IDS as the
response variable was performed. The results revealed that the style of rock ballad exhibited the highest
degree of consistency (M=23.27, SD=22.54) followed by the styles of Greek dance music (M=30.90,
SD=25.02), Impressionism (M=35.51, SD=26.29), and Baroque (M=36.51, SD=29.53; Bach I,
M=36.53 and Bach II, M=36.49), respectively (F=13.68, p < .0001). The tempo judgements for the
contemporary idiom were the least consistent of all styles (M=52.55, SD=31.56). In other words,
the following consistency scale, from most to least consistent, was observed in subjects' tempo
judgements with respect to musical style: rock ballad < Greek dance music < Impressionism < Baroque < contemporary idiom.
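The IDS measure described above amounts to the standard deviation of one listener's four tempo judgements for a piece, averaged over listeners to compare groups or styles. Whether the authors used the population or sample form of the standard deviation is not stated, so the population form is assumed in this sketch, and the data are hypothetical.

```python
# Sketch of the individual deviation score (IDS): the standard deviation
# of a listener's four tempo judgements (Y1..Y4) for one piece; lower
# IDS means greater consistency. The population form (pstdev) is an
# assumption; the study does not specify which form was used.
from statistics import mean, pstdev

def ids(judgements):
    """IDS for one listener and one piece."""
    return pstdev(judgements)

# A perfectly consistent listener has IDS = 0.
assert ids([96, 96, 96, 96]) == 0.0

# Group-level summary for one piece: mean IDS over listeners
# (hypothetical judgements in beats per minute).
group = [[90, 110, 90, 110], [100, 100, 104, 104], [80, 120, 85, 115]]
print(mean(ids(j) for j in group))
```

Averaging IDS within each style (or age or background group) then yields the group means such as M=23.27 for the rock ballad reported above.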
A repeated measures MANOVA was performed using tempo judgements for each example averaged
over the four trials and the 5 familiarity levels as variables. Results indicated that familiarity with
musical examples significantly influenced tempo judgements (p< .001).
Furthermore, a repeated measures MANOVA was employed using tempo judgements averaged over the
four trials and preference levels as variables. Results revealed that tempo judgements were significantly
affected by subjects' preference for the musical examples (p < .05).
The musical ability of 'absolute tempo'
A closer look at the range separating the fastest from the slowest tempo judgements of individual
subjects for each piece often revealed strikingly small discrepancies. It appears that a relatively small
number of listeners (e.g., adult musicians and non-musicians) possess an exceptional ability with
respect to acute stability of large-scale timing in music. This ability to give consistent tempo
judgements for a piece of music over time, in conditions seemingly devoid of an external tempo reference (a score
or the body interaction involved in performance) may be referred to as absolute tempo, analogous to
absolute pitch.
It must also be noted that "absolute tempo" has been observed with musical examples that were
thoroughly known by the subjects. Nevertheless, this finding should be treated with caution, since these
subjects did not exhibit the ability of absolute tempo with respect to all pieces for which they had the
same level of familiarity. In contrast to absolute pitch, the same person seems to follow a different
cognitive strategy of timing for each individual piece, which leaves one wondering whether the
stability in question is to some extent discrete rather than continuous.
Interestingly enough, these subjects reported that they were surprised when they heard that their right
tempo choices were virtually identical across trials. Thus, it would seem that physical, psychological,
and environmental factors, such as fatigue, mood, or time of day, did not have an effect on their tempo
judgements. One reason might be that music engages and programs psychobiological clocks or neural
oscillations (Goody, 1977; Epstein, 1985; Clynes, 1986; Pöppel, 1990) which function subconsciously
but give conscious read-outs and thereby guide the listeners' choice of right tempo in an exact and stable
manner.
Recommendations for Music Education
Perhaps the most important insight gained from this study is that right tempo judgements lie deeply
within the human ear, which intuitively attempts to supply its own right tempo to melody, phrasing,
harmony, rhythm, and other long-scale musical events, in order to ensure their meaningful coordination
and motion through real time. Along these lines, it becomes obvious that music educators can guide
students to achieve a better sense of recognition and mastery of all kinds of relations in a piece of music
by helping them develop a more refined or discerning concept of tempo (Lapidaki, 1992 & 2000).
To help students of all ages to find a use for the concept of tempo in music, music educators may
consider the design of this research, which proposes a fascinating, creative, and, most importantly, an
intrinsically musical activity reflecting our need to organise and control the passage of time in music by
means of digital technology (Lapidaki, 1990).
In this context, the finding that most listeners did not exhibit the musical ability of absolute tempo
becomes a secondary issue. Indeed we all vary in the abilities with which our aesthetic perceptions
operate. After all, we are not metronomes.
References
Aldwell, E. & Schachter, C. (1978). Harmony and voice leading (Vol. 1). New York:
Harcourt Brace Jovanovich.
Bamberger, J. (1994). Coming to hear in a new way. In R. Aiello (Ed.), Musical
perceptions (pp. 131-151). New York: Oxford University Press.
Berry, W. (1986). Form in music (2nd edition). Englewood Cliffs, N. J.: Prentice-Hall.
Bigand, E. (1993). Contributions of music to research on human auditory cognition. In S.
McAdams & E. Bigand (Eds.), Thinking in sound. The cognitive psychology of human
audition (pp. 231-277). Oxford, UK: Clarendon Press.
Braun, F. (1927). Untersuchungen über das persönliche Tempo [Investigations on the
personal tempo]. Archiv der gesamten Psychologie, 60, 317-360.
Clifton, T. (1984). Music as heard: A study in applied phenomenology. New Haven: Yale
University Press.
Clynes, M. (1986). When time is music. In J. R. Evans & M. Clynes (Eds.), Rhythm in
psychological, linguistic, and musical processes (pp. 169-224). Springfield, IL: Charles
C. Thomas.
Clynes, M., & Walker, J. (1982). Neurobiologic functions of rhythm, time and pulse in
music. In M. Clynes (Ed.), Music, mind and brain: The neuropsychology of music (pp.
171-216). New York: Plenum Press.
Clynes, M., & Walker, J. (1986). Music as time's measure. Music Perception, 4 (1),
85-119.
Cooper, G., & Meyer, L. (1966). The rhythmic structure of music. Chicago: The
University of Chicago Press.
Donington, R. (1963). The interpretation of early music. New York: St. Martin's Press.
Donington, R. (1980). Tempo. In S. Sadie (Ed.), The new Grove dictionary of music and
musicians (Vol. 18). New York: Macmillan.
Dowling, W. J. (1994). Melodic contour in hearing and remembering melodies. In R.
Aiello (Ed.), Musical perceptions (pp. 173-190). New York: Oxford University Press.
Dowling, W. J. (1993). Procedural and declarative knowledge in music cognition and
education. In T. J. Tighe & W. J. Dowling (Eds.), Psychology and music. The
understanding of melody and rhythm (pp. 5-18). Hillsdale, NJ: Erlbaum.
Epstein, D. (1985). Tempo relations: A cross-cultural study. Music Theory Spectrum, 7,
34-71.
Farnsworth, P., Block, H., & Waterman, W. (1934). Absolute tempo. Journal of General
Psychology, 10, 230-233.
Forte, A. (1979). Tonal harmony in concept and practice. New York: Holt, Rinehart &
Winston.
Francès, R. (1988). The perception of music (trans. by W. J. Dowling). Hillsdale, NJ:
Erlbaum.
Sachs, C. (1953). Rhythm and tempo: A study in music history. New York: W.W. Norton.
Goody, W. (1958). Time and the nervous system. The Lancet, 7031, 1139-1141.
Gould, G., & Page, T. (Winter 1982-83). Excerpts from an interview with Tim Page. The
Piano Quarterly, 120, recording.
Halpern, A. R. (1988). Perceived and imagined tempos of familiar songs. Music
Perception, 6 (2), 193-202.
Hargreaves, D. J. (1988). The developmental psychology of music. Cambridge, UK:
Cambridge University Press.
Harrison, R. (1941). Personal Tempo. Journal of General Psychology, 24 & 25, 343-379.
Hodgson, W. (1951). Absolute tempo: Its existence, extent, and possible explanation.
Proceedings of the Music Teachers National Association, XLIII, 158-169.
Huxley, A. (1960). The doors of perception. London: Chatto & Windus.
Jacques-Dalcroze, E. (1912). Rhythm, music and education. London: Chatto & Windus.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception,
attention, and memory. Psychological Review, 83, 323-355.
Kirkpatrick, R. (1984). Interpreting Bach's well-tempered clavier: A performer's discourse
on method. New Haven: Yale University Press.
Kramer, J. D. (1988). The time of music: new meanings, new temporalities, new listening
strategies. N. Y.: Schirmer Books.
Lapidaki, E. (2000). Stability of tempo perception in music listening. Music Education
Research, 2 (1), 25-44.
Lapidaki, E. (1992). Time. In B. Reimer & J. Wright (Eds.), On the nature of musical
experience (pp. 246-248). Niwot, CO: The University Press of Colorado.
Lapidaki, E., & Webster, P. (1991). Consistency of tempo judgements when listening to
music of different styles. Psychomusicology, 10 (1), 19-30.
Lapidaki, E. (1990, July). L' imagination au pouvoir: Some riddles on the issue. Paper
presented at the International Symposium on Research and Teaching in the Philosophy of
Music Education, Bloomington, IN.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA:
MIT Press.
Lester, J. (1982). Harmony in tonal music (Vol. 2). New York: Alfred A. Knopf.
Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: additional evidence that
auditory memory is absolute. Manuscript submitted for publication.
Loulie, E. (1696). Elements ou principes de musique [Elements or principles of music].
Paris: Presses Universitaires de France.
Lund, M. (1939). An analysis of the "true beat" in music. Unpublished doctoral
dissertation, Stanford University.
Margulis, V. (1984). Tempo relationships in music. Isny, Germany: Rudolf Wittner, GmbH
and Co.
Miles, D. W. (1937). Preferred rates in rhythmic response. Journal of General Psychology,
16, 427-469.
Mishima, J. (1956). On the factors of mental tempo. Japanese Psychological Research, 4,
27-38.
Pöppel, E. (1990). Unmusikalische Grenzüberschreitungen? [Unmusical crossings of
limits?]. In C. R. Pfaltz (Ed.), Musik in der Zeit [Music in time] (pp. 105-124). Basel,
Switzerland: Helbing & Lichtenhahn.
Petzold, R. G. (1963). The development of auditory perception of musical sounds by
children in the first six grades. Journal of Research in Music Education, 21, 99-105.
Piston, W. (1978). Harmony. New York: W. W. Norton.
Radocy, R. E. (1980). The perception of melody, harmony, rhythm, and form. In D. A.
Hodges (Ed.), Handbook of music psychology. National Association for Music Therapy.
Reckziegel, W. (1961). Musikanalyse: Eine exakte Wissenschaft? [Musical analysis: An
exact science?]. In H. Heckmann (Ed.), Elektronische Datenverarbeitung in der
Musikwissenschaft [Electronic data processing in musicology]. Regensburg: Gustav Bosse
Verlag.
Reinecke, H. P. (1974). Vom musikalischen Hören zur musikalischen Kommunikation
[From musical hearing to musical communication]. In B. Dopheide (Ed.), Musikhören
[Musical hearing]. Darmstadt: Wissenschaftliche Buchgesellschaft.
Rimoldi, H. J. A. (1951). Personal tempo. Journal of Abnormal and Social Psychology, 46,
280-303.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability (2nd ed.).
London: Methuen.
Sloboda, J. A. (1994). Music performance: expression and the development of excellence.
In R. Aiello (Ed.), Musical perceptions (pp. 152-172). New York: Oxford University Press.
Wagner, C. (1974). Experimentelle Untersuchungen über das Tempo [Experimental
investigations of tempo]. Österreichische Musikzeitschrift, 29, 589-604.
Wallin, J. (1911). Experimental studies of rhythm and time (Parts 1 & 2).
Psychological Review, 18, 100-133 & 202-222.
Winckel, F. (1967). Music, sound, and sensation: A modern exposition. New York: Dover
Publications.
Winckel, F. (1962). Optimum acoustic criteria of concert halls for the performance of
classical music. The Journal of the Acoustical Society of America, 34 (1), 81-86.
Wolpert, R. S. (1990). Recognition of a melody, harmonic accompaniment, and
instrumentation: musicians and nonmusicians. Music Perception, 8, 95-106.
Zenatti, A. (1993). Children's musical cognition and taste. In T. J. Tighe & W. J. Dowling
(Eds.), Psychology and music. The understanding of melody and rhythm (pp. 177-196).
Hillsdale, NJ: Erlbaum.
SHORT NOTE OF BIOGRAPHIC DETAILS
Eleni Lapidaki is a professor of music education and psychology at the Department of Musical Studies,
Aristotle University of Thessaloniki, Greece, and a research fellow at the Center for the Study of
Education and the Musical Experience (CSEME), School of Music, Northwestern University, U.S.A.
Her dissertation from Northwestern University was given the "Outstanding Dissertation in Music
Education Award" by the Council for Research in Music Education (CRME) at the 1998 Music
Educators National Conference (MENC) in Phoenix, AZ. Her research concerns a closer interaction
between the artistic, scientific, and pedagogical aspects of temporal experience in music. It was
published in the book On the Nature of Musical Experience (Eds. Bennett Reimer & Jeffrey Wright), in
Psychomusicology (co-authored with Peter Webster), and in Music Education Research (Vol. 2, No. 1, 2000).
She also presented her research at the 1999 "Research in Music Education: An International
Conference," University of Exeter, and at the 1999 Conference of the Society of Music Perception and
Cognition (SMPC).
Proceedings paper
Patrick J. O'Donnell
Department of Psychology, University of Glasgow, Adam Smith Building, Hillhead, Glasgow, G12
8RT, UK.
Introduction
Previous research has demonstrated developments in musical and psychological variables for
individuals with learning disabilities following a 10-week music intervention (MacDonald, O'Donnell
& Davies, 1999). The psychological mechanism underpinning these developments was highlighted as an
area for future research and is the focus of the study presented here. The purpose of this study was to
investigate developments in joint attention made by individuals with a learning difficulty who
participated in structured music workshops.
Individuals with learning disabilities are one of the main target populations for music interventions
with a therapeutic focus (Aldridge, 1993; Oldfield & Adams, 1990; MacDonald & O'Donnell, 1996).
In addition, a wide range of outcomes has been claimed in the published research, including
behavioural improvements, reduction in anxiety, improved motor coordination and enhanced
communication skills (Aldridge, 1993). Moreover, a number of authors have suggested that music
interventions may offer an environment within which individuals with a learning disability can
develop social, cognitive, and physical skills that may enhance their life experiences (Aldridge, 1993;
Oldfield & Adams, 1990; Wigram, 1995). It has been noted, however, that there is still a need for
empirical research that further investigates the process and outcomes of therapeutic and educational
music programmes for individuals with learning disabilities (Radhakishnan, 1991; Purdie, 1997;
MacDonald, O'Donnell & Davies, 1999; Schalkwijk, 1994). Such studies will develop understanding
of the nature of music interventions in a modern health care context (Bunt, 1994; Oldfield & Adams
1990; MacDonald, O'Donnell, & Dougall, 1996; Wigram, 1995).
MacDonald, O'Donnell and Davies (1999) reported studies that focused on investigating the outcomes
of music programmes for individuals with learning difficulties. The results highlighted significant
improvements in musical ability, communication skills, and self-perception of musical ability. While
the studies produced results supporting the efficacy of the intervention, the psychological mechanism
underpinning these developments remained unclear and a crucial area for future research. It is this
issue that the present paper addresses.
Joint attention
Joint attention structure is well explored in language development work (Hughes, 1998; Morales,
Mundy, & Rojas, 1998; Sigman, 1998). For example, mothers who spend a longer time in
linguistically active joint attention have children with larger vocabularies and more developed
syntactic structures (Tomasello, 1992, 1995; Tomasello & Todd, 1983). In this context, joint attention
is defined as a shared focus of attention to the same object by caregiver and child. A crucial point here
is that the music workshop environment and the environment in which the participants were tested
contained examples of classic joint attention situations.
Recent evidence suggests that joint attention is disrupted in children of non-typical development
(Harris, Kasari, & Sigman, 1996; McCathren, Yoder, & Warren, 1995). For example, children with
Down syndrome find situations of joint attention particularly difficult (Kasari, Freeman, Mundy, &
Sigman, 1995; Roth & Leslie, 1998). The problems found with the management of joint
attention-based language in children with Down syndrome may have implications for
developmentally disadvantaged groups generally, since the basic mechanism involves the
management of a limited cognitive resource in what is a dual task situation. A study comparing
handicapped and non-handicapped infants argued that joint attention deficits could be ameliorated by
an appropriate intervention strategy (Yoder & Farran, 1986). Given the nature of the Gamelan
workshop environment, it is suggested here that development in joint attention is a possible benefit for
the participants.
To further understand this link between joint attention and the workshop/assessment environment, it is
important to consider in more detail the characteristics of joint attention situations. Individuals must
maintain attention on the object, orientate to the speaker, perhaps by changing visual direction,
reorientate to the object, retain a working memory representation of the speaker's utterance, apply it to
the object in hand, reconstitute the object as a focus of shared attention, and ensure that the correct
features and implications are being shared between the participants, perhaps recheck by using NVC
cues, and finally organise some response. All of these skills are involved in the musical and
assessment tasks used in this experiment.
In summary, MacDonald, O'Donnell and Davies (1999) have reported musical and psychological
developments as a result of a music intervention. The reasons as to why these developments take
place, however, are an important area for research. Given that individuals with learning difficulties
have been shown to have deficits in joint attention, and that the musical and assessment environment
contain examples of joint attention situations, it is suggested that participants at the music workshops
will display development in joint attention.
Methods
Participants
The study contained 40 participants. These individuals had either mild or moderate learning disabilities
and, at the time of the study, were receiving health care from a number of institutions in central
Scotland. Participants were randomly assigned to either the experimental group or an intervention control
group. Although 20 individuals (10 males and 10 females) were originally pre-tested in the
experimental group, one female was dropped from the study because of illness, and only 19 (10 males
and 9 females) participants were post-tested and used in the analysis. The chronological age ranged
from 17 to 58 years and their mean age was 40.4 years (S.D.=8.41). The intervention control group
contained 21 participants. The chronological age range of the group was 25-43 years, with a mean age of 37.6
(S.D.=7.01).
Equipment
The music intervention was a percussion-based workshop programme focused upon playing a series
of instruments from Indonesia known as a Javanese Gamelan. Gamelan is a generic name for a set of
tuned percussion instruments consisting of gongs, metallophones, cymbals, and drums that can be
found throughout Malaysia and Indonesia and range in size from 4 to 40 instruments (Lindsay, 1989).
Use of the Gamelan for a population of individuals with learning disabilities was outlined by Sanger
& Kippen (1987), who describe a particular musical and social event in which the Gamelan was used
as part of a 2-week music programme.
Participants in both groups were pre- and post-tested on the following validated measurement
instruments: (a) The Elmes test of musical attainment (MacDonald & O'Donnell, 1994, 1996); (b) The
communication assessment profile for adults with a mental handicap (CASP) part 2 section 3 (van der
Gaag, 1988, 1989, 1990); and (c) Self-Perception of Gamelan ability visual analogue question
(MacDonald & O'Donnell, 1996). In addition, all assessment sessions were videotaped using a VHS
video recorder.
Procedure
Ethical approval was obtained from The Greater Glasgow Research and Ethical Committee, and
participation in both the experimental and control group was voluntary. Participants were able to
withdraw at any time. Participants in the experimental group were pre-tested and then attended
weekly workshops for 10 weeks. These workshops lasted approximately 90 minutes and began with
rhythm exercises. The purpose of this warm-up session was to relax the group and help set up
cohesive group dynamics that are essential to the success of a workshop. The rest of the time was
usually given over to playing the Gamelan. Various methods were employed by the workshop leader
to communicate the musical ideas to the participants. Initially, participants were asked to repeat a
rhythmic pattern being played on one of the Sarons. More complex patterns were played as the
workshop progressed, and there was opportunity for improvisation within the context of any piece of
music. The participants also had the opportunity to select a particular part of the Gamelan. The
emphasis was on group involvement and rhythmic awareness through musical participation while at
the same time attempting to cater for the individual needs of participants.
The organisational vehicle for delivery of the musical intervention was Sounds of Progress (SOP), a
music and theatre company that draws 75% of its musicians from within the special needs sector. SOP
provides opportunities for people with special needs to explore their creativity through music.
Ongoing work includes providing music therapy, delivering music workshops, and recording and
performance projects. The company encourages musicians to develop their skills to the highest
standard and has an explicit educational objective in terms of developing the musical skills and
awareness of all individuals who participate in SOP activities. The company focuses on enhancing a
wide range of musical skills; developing rhythmic ability on percussion instruments, singing
skills, and compositional and improvisational skills are among its objectives.
Participants in the intervention control group were pre-tested and then attended communal cooking or
art classes once a week for 10 weeks. The art group met once a week for approximately 90 minutes.
There were two groups of six individuals with one occupational therapist present in each group. The
two groups each worked on a separate piece of art. This was a large painting (10m X 5m) to be
completed over the 10-week period. The cooking group also met once a week for 10 weeks with two
sub groups containing six individuals. The class lasted approximately 90 minutes with one
occupational therapist present for each group. Each week the groups would prepare and eat a meal
together. Following the 10-week programme, all participants were post-tested.
All the assessment sessions during pre- and post-testing were videotaped, and key aspects of non-verbal
communication were focused upon in the analyses. Conceptually, the foci of measurement were
indicators of attention structure, joint attention structure, and divided attention (between the
experimenter and the task). Since the different aspects of the testing situation have different task
demands, measures were constructed for the different components. The order of testing involved first
a clapping sequence or set of sequences, then a Gamelan tone copying session, a Gamelan note
sequence set of tasks, and a communication testing session. Also included was self-assessment of
musical confidence. The Communication task involves two phases: scanning a set of pictures on a
page and identifying the one demanded by the experimenter followed by responding to the
experimenter pointing at a picture; and asking what the object is and what it does. As has been argued
earlier, all the tasks involve examples of joint attention. The targets of the operationalised measures
were precisely these features of joint attention.
In the clapping task, joint attention was defined as both participants focusing on the experimenter's
hands during the clapping, or the participant focusing in pace when listening to the sound. Some
participants had trouble doing either and allowed their attention to wander, usually by looking at the
experimenter's facial cues. Times were measured to the second and expressed as a percentage of time
in joint attention. The number of times the participant interrupted the experimenter was also recorded
since it is a measure of failed attention. The participant has failed to appreciate the shared object.
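As a rough illustration of this scoring (the seconds below are invented for demonstration, not taken from the study), the percentage-of-time measure reduces to simple arithmetic:

```python
# Illustrative sketch only: the study timed joint attention to the nearest
# second and expressed it as a percentage of total task time.

def percent_joint_attention(joint_seconds: int, total_seconds: int) -> float:
    """Share of the task spent in joint attention, as a percentage."""
    return 100.0 * joint_seconds / total_seconds

# e.g. 45 s of a 60 s clapping task spent focused on the experimenter's hands
print(percent_joint_attention(45, 60))  # 75.0
```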
On the Gamelan sequence task, the number of times gaze was directed away from the keyboard
during the experimenters' demonstration was measured. On the CASP, two measures were taken. The
participants' delay in pointing, timed from the moment at which the experimenter finished the
instruction, was recorded (repeated or misheard instruction trials were ignored). Also, the participants'
delay in giving the name was recorded on the second part of the CASP. The CASP sessions involved
participants switching attention backwards and forwards from the experimenter to the task in hand,
the CASP booklet. Some participants coped with this by focusing gaze almost exclusively on the
booklet and relying on listening to the experimenter to orientate to the task demands. Others switched
attention to the experimenter at the beginning and end of the individual trials. Still others allowed
their attention to be taken up by checking the experimenter's facial cues even during any given trial.
The number of switches of attention was therefore recorded as a measure of attention allocation
strategy. Number of irrelevant comments made during the CASP session was used as an indicator of
attention distraction. Both the experimental and the control groups underwent this assessment prior to
and after the interventions.
Results
Table 1 summarises the non-verbal communication measures for the control and experimental
groups. It highlights that there are significant improvements in 5 of the 6 measures of NVC for the
experimental group, while there are no significant changes in the control group on any of the
measures. Specifically, the experimental group show significant improvement in joint attention as
defined by: time watching the experimenter clap; delay in pointing to the communication booklet;
time studying the communication booklet; number of attention switches and number of irrelevant
remarks.
Table 1. NVC Measures for Control and Music Groups for T1 and T2: Sign Test
Variable | Experimental: t1, t2, p | Control: t1, t2, p
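The proceedings reproduce only the sign-test probabilities, not the raw scores. As a minimal sketch of the test itself, with invented pre/post values rather than the study's data, an exact two-tailed paired sign test can be computed as follows:

```python
from math import comb

def sign_test(pre, post):
    """Exact two-tailed paired sign test; tied pairs are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    # Under H0 the sign of each difference is a fair coin flip,
    # so p is twice the binomial tail probability P(X <= k).
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Invented example: seconds in joint attention before and after training
pre  = [12, 15, 10, 18, 14, 11, 16, 13, 12, 17]
post = [20, 19, 14, 22, 18, 15, 21, 16, 17, 23]
print(sign_test(pre, post))  # 0.001953125 (all 10 pairs improved)
```

Note that "improvement" runs in opposite directions across the paper's measures (e.g. more time watching the experimenter, but fewer attention switches); the test itself is direction-agnostic.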
Discussion
The results clearly show significant improvements on measures of non-verbal communicative
functioning in the experimental or music group, and not in the control group, which was exposed to a
joint social task (a cooking or art class). All the measures showed improvement from time one to time
two, with the exception of the interrupt measure, which did not offer sufficient variability to be
analysed. Note that the number of irrelevant comments goes up in the control group; in fact, they become
more talkative in an irrelevant way as time goes on. What is perhaps more surprising is the lack of
progress made by this group. After all, they are tested on the CASP for the second time, and they
have, by the time of second testing, extensive experience of a social task. The control task, however,
does not show generalisation. The CASP improvement in the first instance and the related
improvements on the NVC communicative measures indicate that, for the experimental group, some
aspect of the music training is generalising to communicative competence. In operational terms, the
music group spends more time attending to the experimenter when he is giving instructions, but less
time when he is not. They listen rather than make comments, and they concentrate on the task when
the instructions finish without switching attention back and forth to the experimenter. They listen
when they should, and when they should concentrate on the CASP, they shut out the experimenter.
The earlier analysis of the cognitive demands of the CASP in the introduction emphasised the
complex nature of the assessment and its relationship to joint attention. Why should the music training
generalise to this task? The first point to be made is that music participants do get better on the music
variables and that this improvement correlates with improvement on the CASP (MacDonald &
O'Donnell, 1996). Given the importance of, for example, rhythmic processes to speech and to motor
coordination, it is possible that the training has developed mechanisms supporting language. A crucial
feature of the Gamelan training is its mix of music and cooperative sociability. The explanation for
some task generalisation should emphasise the precise nature of the experimental group's activity.
What marks the Gamelan is the combination of listening to instructions, paying attention to other
people's performance and appropriately executing one's own. It does involve executing a planned
sequence in the context of a joint attention task. If the joint attention parallels between the CASP and
Gamelan are pursued, then the question becomes: "Why does practice on one joint attention problem
generalise to another?" An answer needs brief reference to research on joint attention structures.
Recent work on the topic suggests that joint attention is subject to developmental processes interacting
with experiences, and that it is partly an acquired skill (Hughes, 1998; Roth & Leslie, 1998;
Tomasello, 1995). In the case of the Gamelan workshops, practice will improve the music task, and
also certain features of communication. Participants will improve their recognition of others' interrupt
and activity signals. But what do participants learn that generalises? One possibility is that they learn
to concentrate on one thing at a time. Participants can either attend to the music or try to worry about
others simultaneously. What they must learn to do is to inhibit other people at least for a time. The
skill that might generalise is learning not to attend to the social cues unless at particular moments. The
evidence from the CASP-based non-verbal measures shows that they are paying attention to one thing at
a time, suppressing irrelevant talk and looks, and being more focused on and quicker to the task.
In conclusion, it can be argued plausibly that music interventions are an ideal focus for the
development of joint attention skills. Research evidence suggests these joint attention skills can be
taught (Girolametto, Verbey, & Tannock, 1994). A common object that the child or client follows
naturally is the key element and it may be that music provides the ideal common focus of attention.
References
Aldridge, D. (1993). Music therapy research 1: A review of the medical research literature within a
general context of music therapy research. The Arts in Psychotherapy, 20, 11-35.
Bunt, L. (1994). Music therapy: An art beyond words. London: Routledge.
Girolametto, L., Verbey, M., & Tannock, R. (1994). Improving joint engagement in parent child
interaction: An intervention study. Journal of Early Intervention, 18, 155-167.
Harris, S., Kasari, C., & Sigman, M. D. (1996). Joint attention and language gains in children with
Down syndrome. American Journal on Mental Retardation, 100, 608-619.
Hughes, C. (1998). Executive function in preschoolers: Links with theory of mind and verbal ability.
British Journal of Developmental Psychology,16, 233-253.
Lindsay, J. (1989). Javanese Gamelan: Traditional orchestra of Indonesia. Oxford: Oxford
University Press.
Kasari, C., Freeman, S., Mundy, P., & Sigman, M. D. (1995). Attention regulation by children with
Down syndrome: Coordinated joint attention and social referencing looks. American Journal on
Mental Retardation, 100, 128-136.
MacDonald, R. A. R., & van der Gaag, A. (1990). The validation of a language and communication
assessment procedure for use with adults with intellectual disabilities. Health Bulletin, 48(5), 254-260.
Wigram, T. (1995). A model of assessment and differential diagnosis of handicap in children through
the medium of music therapy. In T. Wigram, B. Saperston & R. West, (Eds.), The art and science of
music therapy (pp. 181 - 194). London: Harwood Academic Publishers.
Yoder, P. J., & Farran, D. C. (1986). Mother-infant engagements in dyads with handicapped and
nonhandicapped infants: A pilot study. Applied Research in Mental Retardation, 7, 51-58.
Proceedings paper
Discrimination and Interference in the Recall of Melodic Stimuli among School Children
This study represents an extension of research published over three decades ago concerning melodic memory
(Collings, 1966; Madsen, Collings, McLoed & Madsen, 1969; Madsen & Staum, 1983). Research concerning
memory has been long and continuous, seemingly because memory appears to be an important ingredient in
human life, general education and especially in music education (Deutsch, 1971; 1972; 1973; 1975; Dowling,
1973; Murdock, 1974; Tanguiane, 1994).
It has been speculated that interference in memory comprises 85-95% of all forgetting. In verbal memory, the
detrimental effect of prior learning (proactive interference), and the detrimental effect of temporally later learning
(retroactive interference), are affected by the number of items in a sequence, by the length or amount of time
occurring between items, by the length of time between sequence presentation and beginning of recall, and by the
number of prior sequences learned (Murdock, 1974).
In tonal memory, interference for pitch appears to be generalized across all octaves (Deutsch, 1973), inhibited by
semantic content (Long, 1977), and enhanced by the interpolation of familiar melodies (Dowling, 1973). Memory
for tones also appears to be affected by attending to interpolations (Deutsch, 1971), by the similarity and
repetitions of tones within sequences (Deutsch, 1972, 1975) and by the placement of the test tone within an
interpolated sequence (Deutsch, 1975).
Although many studies have been concerned with memory perception of musical stimuli (Attneave & Olson,
1971; Cuddy & Cohen, 1976; Davies & Jenning, 1977; Davies & Yelland, 1977; Dowling, 1978; Dowling &
Fujitani, 1971), some controversy exists suggesting that memory for single tones is distinctly different from
process which functions as a melodic gestalt for musicians (Davies, 1980; Taylor, 1980). Croonen (1991)
describes two experiments where a tonic triad was placed at various places through a series of tones and indicates
that the location of a tonic triad in a tone series is important for recognition. Tsuzaki (1991) examined the effects
of musical context to determine under what conditions melodies retain their gestalt properties and found that "the
significant effect of the starting tone in the diatonic condition suggests that the presentation of the diatonic scale
might have imposed a strong anchoring point."
Duration of sequences and time allowed for recall have also been associated with the ability to recall tonal entities
(Deutsch, 1972; Massaro, 1970, 1971; Wickelgren, 1969), as has the length of sequences (Taylor, 1972), and the
length of pause occurring between test tone and the beginning of interpolated material (Deutsch, 1978). Presently,
there are various theories of music memory that attest to the ongoing controversy in exploring this important
activity (Booth & Cutietta, 1991; Cutietta & Booth, 1995; Large, Palmer & Pollack, 1995; Tanguiane, 1994).
Additionally, appropriate methodology in investigating music cognition has also been advanced (Krumhansl,
1990).
In early music research by Collings (1966), college music majors were tested for retention of major/minor
melodies with duple and triple meter after one, two, three, four, and five interpolations of similar melodies.
Results indicated that even the slightest differences in melodies were differentiated and that subjects were better
able to discriminate a duple rhythmic background in combination with a harmonic minor scale. A similar study
was administered to non-music majors that yielded similar results (Madsen et al., 1969).
A large-scale study by Madsen and Staum (1983) assessed 400 college students' ability to recognize a target
melody after several other similar melodies were played. Results showed that
Subjects were able to discriminate effectively throughout the stimulus presentations with
an extremely high accuracy for those specific melodies that were identical to the test
melody. In addition, when melodies were identical except for slight modifications,
melodies presented in duple meter appeared less susceptible to interference than
melodies in triple meter or than melodies having modal changes. As would be expected,
accuracy declined across interpolated melodies; however, even after 8 interpolated
melodies, subjects recalled the test melody with at least 43% accuracy (Madsen &
Staum, 1983).
Robertson (1998) recently completed a replication of the Madsen and Staum study in assessing differences
between Asian subjects compared to subjects in the original study. Her results indicated that there were indeed
patterns of differences when comparing Chinese subjects with their United States counterparts with Chinese
subjects demonstrating greater accuracy for both the simple and duple meters. In another study examining the
effects of age, music experience, and style of musical stimuli on recall of transposed melodies the authors
concluded that both "age and experience effected different aspects of the task, with experience becoming more
influential when interference was provided" (Halpern, Bartlett & Dowling, 1995). Additionally, Boltz (1998)
investigated the relationship between an event's temporal (e.g., rhythm, rate, total duration) and nontemporal
information (e.g., sequence of pitch intervals) and found that the nature of encoding is strongly dependent on the
structure of environmental events and the degree of learning experience. Thus, this entire line of research indicates
that more research should be done with subjects of various ages as well as with different levels of musical sophistication.
It was the purpose of the present investigation to assess the nature of forgetting (interference) in retaining melodic
sequences and to identify the characteristics of the perceptual memory process for these same musical stimuli for
grade school students compared to adults. This study represents an extension of the Madsen & Staum (1983) and
the Robertson (1998) line of research using identical stimuli with the exception that all melodies were re-mastered
using current technology.
Method
Subjects
Sixty-two elementary-aged students (ages 9-12, mean age 10.88) were randomly selected for the sample. These
children attended a large university developmental research school. This school has
a long-standing policy of selecting students such that the total student body is representative of the greater
demographic distribution of a much larger community of approximately 250,000.
Sixty young adults (mean age 22.87) who had had no private instruction and less than three years of formal group
music study, and 60 trained musicians (mean age 23.66) with a minimum of 10 years of formal individual
and/or group music instruction, were selected from a large university. Untrained and musically trained adult
subjects were randomly selected from intact classes within this large university with a total population of 30,000;
the trained musicians were selected from the same institution's school of music, which has a total music
population of approximately 1,000.
Apparatus
All melodies were programmed using Opcode's Studio Vision Pro sequencer. The tone source was a Kawai
G-Mega using an acoustic piano patch. The tone module was set for equal temperament, A = 440 Hz. All MIDI
velocities were set to 64. Melodies were digitally recorded at a sample rate of 44.1 kHz, 16-bit stereo using Pro
Tools hard disk recording software. The results were burned to an audio compact disc.
Melodies
Melodies used in this study were identical to those used in both the Madsen & Staum (1983) and the Robertson
(1998) studies except that they were re-programmed using current technology (see above).
Four melodies were used for the study, each derived from a descending diatonic scale beginning and ending on
Eb. Each melody was altered in mode (major and minor) and meter (compound and simple) to produce, in total,
sixteen different melodies modeled after Collings (1966). Thus, each of the four original melodies appeared in four
versions: (a) compound major (CM), (b) compound minor (cm), (c) simple major (SM), and (d) simple minor (sm). Each of the
sixteen melodies was used one time each as the test melody for the sixteen trial tests. These same sixteen melodies
were also used as distraction for each other.
Within each test trial, nine consecutive melodies were heard. The test melody always occupied the first
position, and the same melody appeared again in one of the other eight recall positions within the trial.
Each of the eight recall positions was tested twice within the sixteen examples. Additionally, in half of the trials
the test melody also appeared in another recall position with a meter change only (R= rhythm) and in the other
half, with a mode change only (M= mode). All other melodies for each example were randomly selected from the
group of sixteen original melodies with the condition that no melody appeared twice within one example.
Additionally, the eight melodies that followed the initial melody were randomly determined. Thus, for every trial,
an initial test melody was played, the identical melody was played again, the same melody with a mode or meter
change was played, as were six other interpolated melodies. Each melody lasted 10 seconds and was
recorded with a 3-second pause between interpolated melodies and a 15-second pause between the end of one
example and the beginning of the next. Therefore, each trial spanned one minute, 54 seconds from the beginning
of the first melody to the end of the last melody.
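The trial structure described above can be sketched in a short Python illustration. The melody labels, the `alter` helper, and the generator below are our own hypothetical notation; only the structural constraints (nine melodies per trial, test melody first, one identical and one altered recall, no filler repeated, 10-second melodies with 3-second pauses) come from the text.

```python
import random

# Sixteen melody labels: melody number 1-4, meter C(ompound)/S(imple),
# mode M(ajor)/m(inor). The labels themselves are hypothetical shorthand.
MODES = ["M", "m"]
METERS = ["C", "S"]
MELODIES = [f"{n}{me}{mo}" for n in range(1, 5)
            for me in METERS for mo in MODES]   # 16 melodies in all

def alter(melody, alteration):
    """Apply a meter change ('R') or a mode change ('M') to a melody label."""
    n, meter, mode = melody[0], melody[1], melody[2]
    if alteration == "R":
        meter = "S" if meter == "C" else "C"
    else:
        mode = "m" if mode == "M" else "M"
    return n + meter + mode

def build_trial(test_melody, identical_pos, altered_pos, alteration, rng=random):
    """Return a 9-slot trial: the test melody in position 1, the identical
    melody again at identical_pos (2-9), an altered copy at altered_pos,
    and randomly chosen fillers elsewhere (no filler appears twice)."""
    altered = alter(test_melody, alteration)
    fillers = [m for m in MELODIES if m not in (test_melody, altered)]
    rng.shuffle(fillers)
    trial = [None] * 9
    trial[0] = test_melody
    trial[identical_pos - 1] = test_melody
    trial[altered_pos - 1] = altered
    for i in range(9):
        if trial[i] is None:
            trial[i] = fillers.pop()
    return trial

# Timing check: 9 melodies x 10 s + 8 inter-melody pauses x 3 s = 114 s,
# i.e., one minute 54 seconds, as stated in the text.
total_seconds = 9 * 10 + 8 * 3
```

For example, `build_trial("1CM", 4, 6, "M")` places the identical melody in slot 4 and a mode-changed copy in slot 6 of the nine-melody sequence.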
Procedures
A special answer sheet was developed for the elementary population and the regular music teacher did the testing.
Thus the procedure was nested within the natural environment of these youngsters. When subjects arrived in the
experimental room they were given an answer sheet with written instructions to be followed along with taped
directions. The task involved listening to the first melody, then comparing each additional melody with the very
first one by marking S (same as the first melody) or D (different from the first melody) in each of eight
consecutive boxes as each additional melody was played. Subjects were given three practice examples to complete
before testing was initiated. The duration of the test was approximately 30 minutes with an additional ten-minute
break after the first eight melodies. Subjects were randomly assigned to the two testing conditions: Half of the
subjects were given the first part of the test (examples 1-8) first, while the remaining subjects were first
administered examples 9-16. Two separate experiments were administered on subsequent days. In Experiment A,
the effect of simple and compound alterations was controlled by placing all compound major and
minor identical melodies in Recall Positions 2, 4, 6, or 8 and all simple major and minor identical melodies in
Recall Positions 3, 5, 7, or 9. In Experiment B, the effects of modal alterations were controlled by placing all major
simple and compound identical melodies in Recall Positions 2, 5, 6, or 9 and minor melodies in Positions 3, 4, 7,
or 8.
Adult populations were also tested on two successive days with the orders of the examples reversed for half of the
subjects on both experiments as described above.
Results
Responses were initially calculated by determining the percentage of individuals who perceived each melody as
identical to the initial test melody whether or not it was actually the same. In this manner each group was given an
overall percentage score across the various melodies whereby rank orders could be determined. Each group's
percentage scores of correct responses were then computed across the eight interpolated positions. Discrimination
across melodic interpolations was indeed apparent for each group and indicated that subjects discriminated
differentially across the 16 examples in each temporal position.
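The per-position scoring described above can be illustrated with a minimal sketch. The response strings below are hypothetical, and `percent_correct_by_position` is simply one way to compute per-position accuracy from the S/D marks subjects made in the eight recall boxes.

```python
# A minimal scoring sketch. Each subject marks S (same) or D (different)
# for the eight recall positions of a trial; the answer key gives the
# correct mark for each position. Data here are hypothetical.
def percent_correct_by_position(responses, key):
    """responses: list of 8-character 'S'/'D' strings, one per subject.
    key: 8-character string of correct marks for this trial.
    Returns the percentage of correct responses at each position."""
    n = len(responses)
    return [round(100 * sum(r[i] == key[i] for r in responses) / n, 1)
            for i in range(8)]

responses = ["SDDSDDDD", "SDDSDDDS", "DDDSDDDD"]  # three hypothetical subjects
key = "SDDSDDDD"  # identical melody actually recurred at two recall positions
print(percent_correct_by_position(responses, key))
# -> [66.7, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 66.7]
```

Rank orders of this kind, computed per group and per position, are what the analyses below summarize.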
Statistical analysis indicated that there were differences across the three groups, as would be expected (H = 12.40,
p < .001). Percentages of total correct responses were significantly different across populations: 53% for the
children, 69% for untrained subjects, and 74% for trained musicians. Even after 1 to 8 extremely similar melodies, most
subjects were able to identify the original melody. Recall position 1 yielded 66% accuracy for children, 89% for
untrained adults, and 89% for trained musicians; recall position 8 evidenced 39% for children, 62% for
non-trained adults, and 66% for trained musicians.
Further nonparametric analyses indicated that while there was no significant difference for the adults between
experiments A and B, there was a significant difference for the children.
Experiment A: Grade School
A rank ordering of responses for each placement yielded the highest ranks (first and second ranks) for
differentiation of the identical melody when it was in fact identical, in the temporal positions of 2, 4, 5, and 8. In
position 3, the identical responses were ranked first and third; for position 6, first and fifth; for position 7, first and
seventh; and for position 9, second and sixth. It was also noted that in two positions a relatively greater percentage of subjects
perceived the recall melodies as the same when they were the identical melody with a mode or meter change. In
fact, a consistently higher trend of reporting "same" occurred for identical melodies with modal alterations than
with meter alterations.
Experiment B: Grade School
The most important finding for this group in this trial arrangement was that in every case the percentage of correct
responses was less than for the same melodies when presented in a different arrangement (experiment A above).
The rank order indicates that only in positions 2 and 8 did subjects choose the correct answers as their top choices.
In positions 3, 5 and 6 the correct responses were ranked first and third. In position 4 correct responses were
ranked as third and fifth, and in position 7 correct responses were ranked first and fourth (irrespective of ties).
Position 9 received the lowest correct response where correct responses were rated sixth and eleventh indicating
much greater interference.
Experiment A: Adult Nonmusic
It should be remembered that this group represented a replication of the original study (Madsen & Staum, 1983)
using current technology and re-mastered tapes. Results indicated almost identical scores in both Experiments A
& B. In positions 2, 3, 4, 5, 7, and 9 the top responses were always correct. In position 6 the highest response was
ranked first but the second response was ranked third. In position 8 the correct responses were ranked second and
fourth.
Experiment B: Adult Nonmusic
Correct responses were ranked both first and second for positions 2, 3, 6, and 9, while all other positions were
ranked first and third.
Experiments A & B: Adult Musicians
Correct responses from the Adult Music Group were ranked in first and second places throughout the entire 32
trials.
While it is not surprising that musicians were better at this task, it is interesting that there was a great deal of
variability across the positions for all groups. For example, the collective correct response for the grade school
subjects was slightly more accurate on position 4 in experiment A than either the adult nonmusic group or the
musicians. While the musicians were better overall, the adult nonmusic group was better on the third position
when compared to the music group.
Discussion
Consistent with previous studies (Collings, 1966; Madsen et al., 1969; Madsen & Staum, 1983; Robertson, 1998),
it appears that subjects are able to discriminate among melodies that are identical or very similar to an initial
melody over an extended period of time. The most important finding of this research indicates that even young
children have the ability to discriminate among highly similar melodies even with many interpolations. It should
be remembered that all 144 melodies were actually derived from one basic melody and all tested melodies were
extremely similar structurally.
While in previous research (Madsen & Staum, 1983) non-music-major adults correctly perceived 69% of the
melodies as being identical (even with 9 interpolated melodies), young children in the present study correctly
perceived 53%. In Experiment A the correct total for the children was 59%, dropping to 46% in Experiment B.
While students completed Experiment B on a different day, it is speculated that the previous listening for
Experiment A produced inattentiveness during the subsequent testing of Experiment B; the drop in total
correct responses may have been due entirely to fatigue, given that correct responses were almost identical across
both sessions for both adult groups. Apparent differences among the three groups and within the two experimental orders are
interesting and may be important in future research, especially when the two tasks are considered separately. For
example, when correct responses are contrasted for Experimental Order A versus B only the children's group
demonstrated a significant difference between the two experimental configurations. Statistical analyses of these
data were purposefully conservative to avoid the "noise" inherent in such a large study. Further analyses might
tease out other relationships. Specifically, issues concerning relationships among the duple and compound meters
as well as the major versus minor relationships need further investigation.
It may be speculated that melodies are a form of "chunking" similar to that which has long been reported in verbal
memory (Miller, 1956). The musical "chunk" forms a comprehensible unit that is less vulnerable to interference
than unrelated single items. The degree of similarity of interfering stimuli, and the length or numbers of melodic
"chunks" retainable in a memory span, are still questions to be clarified in understanding melodic perception and
memory.
Implications of these data for music perception seem both important and interesting. As stated previously,
musicians are called upon to exhibit much fine discrimination and the ability to remember various musical
events. Additionally, the apparent ability of professional musicians to remember literally thousands of separate
"tunes" seems to be a very special ability, especially since many of these melodies seem to be quickly learned
after just initial modeling. The present research suggests that the apparent ability of even unsophisticated
youngsters to remember melodies is highly developed. Obviously, more research is warranted.
References
Attneave, F., & Olson, R.K. (1971). Pitch as a medium: A new approach to psychophysical scaling. American
Journal of Psychology, 84, 147-166.
Baddeley, A.D. (1976). The Psychology of Memory. New York: Basic Books, Inc.
Boltz, M. G. (1998). The processing of temporal and nontemporal information in the remembering of event
durations and musical structure. Journal of Experimental Psychology: Human Perception & Performance, 24(4),
1087-1104.
Booth, G.D., & Cutietta, R.A. (1991). The applicability of verbal processing strategies to recall of familiar songs.
Journal of Research in Music Education, 39(2), 121-131.
Croonen, W. (1991). Recognition of tone series containing tonic triads. Music Perception, 9(2), 490-498.
Collings, D.S. (1966). Principles of retroactive inhibition applied to melodic discrimination. (Unpublished
master's thesis, The Florida State University.)
Cuddy, L.L., & Cohen, J.A. (1976). Recognition of transposed melodic sequences. Quarterly Journal of
Experimental Psychology, 28, 255-270.
Davies, J.B. (1980). Memory for melodies and tonal sequences: A brief note. (Unpublished paper, University of
Strathclyde, Glasgow, Scotland.)
Davies, J.B., & Jennings, J. (1977). Reproduction of familiar melodies and the perception of tonal sequences.
Journal of the Acoustical Society of America, 61(2), 534-541.
Davies, J.B., & Yelland, A. (1977). Effects of two training procedures on the production of melodic contour in
short-term memory for tonal sequences. Psychology of Music, 5(2), 3-9.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168,
1604-1605.
Deutsch, D. (1972). Effect of repetition of standard and comparison tones on recognition memory for pitch.
Journal of Experimental Psychology, 93(1), 152-162.
Deutsch, D. (1973). Octave generalization of specific interference effects in memory for tonal pitch. Perception
and Psychophysics, 13(2), 271-275.
Deutsch, D. (1975). Facilitation by repetition in recognition memory for tonal pitch. Memory & Cognition, 3(3),
263-266.
Deutsch, D. (1978). Interference in pitch memory as a function of ear or input. Quarterly Journal of Experimental
Psychology, 30(2), 283-287.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5(3), 322-337.
Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological
Review, 85(4), 341-354.
Dowling, W.J., & Fujitani, D.S. (1971). Contour, interval and pitch recognition in memory for melodies. Journal
of the Acoustical Society of America, 49(2), 524-531.
Halpern, A.R., Bartlett, J.C. & Dowling, W.J. (1995). Aging and experience in the recognition of musical
transpositions. Psychology & Aging, 10(3), 325-342.
Krumhansl, C.L. (1990). Tonal hierarchies and rare intervals in music cognition. Music Perception, 7(3), 53-96.
Large, E.W., Palmer, C., & Pollack, J.B. (1995). Reduced memory representation for music. Cognitive Science,
19(1), 53-96.
Long, P.A. (1977). Relationship between pitch memory in short melodies and selected factors. Journal of
Research in Music Education, 25(4), 272-282.
Madsen, C.K., Collings, D., McLeod, B., & Madsen, C.H., Jr. (1969). Music and language arts. (Paper presented
at NAMT Regional Convention, Atlanta.)
Madsen, C.K. & Staum, M.J. (1983). Discrimination and interference in the recall of melodic stimuli. Journal of
Research in Music Education, 31(1), 15-31.
Massaro, D.W. (1970). Consolidation and interference in the perceptual memory system. Perception and
Psychophysics, 7(3), 153-156.
Massaro, D.W. (1971). Effect of masking tone duration on perceptual auditory images. Journal of Experimental
Psychology, 87(1), 146-148.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits of our capacity for processing
information. Psychological Review, 63, 81-97.
Murdock, B.B., Jr. (1974). Human Memory: Theory and Data. New York: John Wiley & Sons.
Robertson, B.J. (1998). Discrimination and interference in the recall of melodic stimuli by Asian students and
their spouses. (Unpublished Master's Thesis, Florida State University).
Taylor, J.A. (1972). Perception of melodic intervals within melodic context. (Doctoral dissertation, University of
Washington, 1971). Dissertation Abstracts International, 32, 6481A-6482A.
Taylor, J.A. (1989). Psychomusicology: Perceptual cognitive research with implications for the composer and
performer. (Paper presented at the College Music Society Annual Meeting, Denver.)
Tanguiane, A.S. (1994). A principle of correlativity of perception and its application to music recognition. Music
Perception, 11(4), 39-48.
Wickelgren, W.A. (1969). Associative strength theory of recognition memory for pitch. Journal of Mathematical
Psychology, 6, 13-61.
Proceedings paper
1. INTRODUCTION
Harmony theory and counterpoint are the main subjects in music education for traditional music and even for most contemporary
music. These subjects are based on Western classical music as established in the 17th and 18th centuries. In most university music courses in
Japan, harmony theory is taught using a standard but rather old textbook written by Yujiro Ikenouchi in 1964. Students are required to
learn harmony theory at the beginning of their courses, where "students" means not only those majoring in composition but also
those majoring in performance. The harmonic structures of most prevailing music, including popular music, are more or less of the same origin
as German-Austrian or Italian Baroque music.
Contemporary harmony theory consists of so-called inhibition rules, so students are required to understand the individual inhibition rules
together with the inter-relations among them. Learning harmony theory by oneself, however, is said to be difficult: although the individual
rules are not hard to understand, the mutual dependencies among them are complicated. The hardest thing for students
is that they cannot judge by themselves the correctness of their answers to given bass or soprano parts, because no exercise book
gives all the allowable answers to its subjects, owing to the abundance of possible solutions. Students are therefore forced to rely on their
instructors for judgements. Proposed here is a system that can judge whether a student's answer violates any of the inhibition rules and
thereby lets students study harmony theory by themselves. It is expected that studying with such a system, provided it gives appropriate
comments, has the same effect as studying under a tutor.
One approach to realizing such a system is to generate all the possible answers to given bass or soprano subjects. We have
constructed a prototype system that works on a subset of inhibition rules implemented in algorithms. This system, however, is not suitable
for supplementing and modifying rules or for pointing out errors in students' answers.
As conventional CAI systems usually employ a knowledge base, our system also employs one. To express the
inhibition rules of harmony theory, however, conventional methods such as the IF-THEN scheme, frame representation, or semantic networks
seem unsuitable. For example, if one tries to express an inhibition rule using the IF-THEN scheme, the description of
the conditional part becomes huge and complicated. Proposed in the second half of this paper is a "Rule Unit" model for expressing
inhibition rules: a rule unit represents a single rule, and a network of rule units represents the total system of inhibition rules in
harmony theory.
2. AN OUTLINE OF THE HARMONY THEORY
Basse donnée is the task of assigning the upper three voices to a given bass sequence while observing the inhibition rules designated in
harmony theory. A sequence of upper three voices is regarded as correct if it violates no inhibition rule. As the task of basse donnée
requires no aesthetic evaluation, there can be many allowable answers for a given bass sequence. No exercise book
gives all the allowable answers, because hundreds of correct answers are possible for a given bass sequence; this difficulty forms a
high barrier to students' self-learning of harmony theory. Moreover, some rules stated in one book are described in different forms in
other books, and some rules taught by one teacher differ from what others teach. Sometimes a teacher's personal or special rules are
even inconsistent with the rules described in textbooks. The structure of the rule system thus depends on textbooks and teachers and is very
complicated. In this paper, a rule commonly stated in all textbooks is denoted "Rc", a rule written in a textbook i but not included
in {Rc} is denoted "Rbi", and a rule given by a teacher Tj but not included in {Rc} or {Rbi} is denoted "Rtj". As music is a field of
art, the degree of requirement differs from rule to rule: some inhibition rules must be strictly observed while others need not be, and some
inhibition rules written in a textbook are modified or neglected by some teachers. Consequently, the Rcs, Rbis, and Rtjs are not in a simple
inclusion relation to one another, and students are confused by the complexity.
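The classification of rules into {Rc}, {Rbi}, and {Rtj} can be sketched as set operations. The rule names below are hypothetical examples; a real system would also have to model a teacher overriding or neglecting a textbook rule, which is what prevents a simple inclusion relation.

```python
# Hypothetical rule names illustrating the {Rc}, {Rbi}, {Rtj} classification.
Rc = {"parallel_fifths", "parallel_octaves", "augmented_second"}  # common to all textbooks
Rb = {1: {"hidden_fifths_outer"}, 2: {"doubled_leading_tone"}}    # Rbi: textbook i only
Rt = {"Tj": {"no_unison_crossing"}}                               # Rtj: teacher Tj only

def effective_rules(textbook, teacher):
    """The rule set a student under textbook i and teacher Tj must observe."""
    return Rc | Rb.get(textbook, set()) | Rt.get(teacher, set())

rules = effective_rules(1, "Tj")
# Note: a teacher may also modify or neglect a rule from Rc or Rbi; modelling
# that requires subtraction/override operations, not union alone.
```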
3. A CONCEPTUAL DESIGN OF A TRAINING SYSTEM FOR HARMONY THEORY AND ITS IMPLEMENTATION
3.1 A CONCEPTUAL DESIGN
Described here is a conceptual system for learning and practicing harmony theory, aimed mainly at beginners. The purpose of the system is not
to assign chords to a given melody, as many composition systems do, but to let students practice the inhibition rules of harmony
theory. The targets of the system are not only beginners but also those who have mastered the rules to some degree and want to
systematize their knowledge. For the latter group, it is desirable for the system to be able to show all the possible chord progressions,
accepting a partially correct chord progression assigned by the student. The system is expected to carry out the tasks of both basse donnée
and soprano donnée. In the case of basse donnée, for example, the system first provides the student with a bass sequence on which a
specific inhibition rule can be practiced; the student is supposed to have some knowledge of the rule he/she is going to practice. The
system generates appropriate bass subjects using a random number generator. The student's answer is fed to the system, which
judges whether the answer is allowable by referring to a set of registered rules. If the answer violates one or more rules, the
system is expected to point out the student's knowledge deficiency or misunderstanding of the rule concerned and to provide supplemental
exercises on the violated rule.
A brief conceptual view of the system is depicted in Fig. 1; the knowledge base in Fig. 1 is a set of inhibition rules.
Receiving a bass sequence, the system searches for allowable chord progressions observing the chord-function rules, chord-progression rules
and note-assignment rules. As the system generates all the allowable chord progressions, the user selects one of them, and according to the
chosen chord-progression ID the system then generates all the allowable assignments of the upper three voices for the specified chord
progression. The system then asks the user to choose one of the note-assignment IDs and shows the sequence of upper three voices for the
specified ID on screen in score format (top), together with its MIDI code (top right of the lower half), as shown in Fig. 3. Users can
specify what additional information is to be shown on the display besides the standard score elements: chord function, chord degree,
inversion type, cadence type, and allocation type. Here "chord function" means one of Tonic (T), Dominant (D), or Subdominant (S), and
"chord degree" expresses a chord by the interval number of its root counted from the tonic; a superscript 1 or 2 is added to the chord
degree when an inversion is applied. The cadence type, denoted K1, K2, or K3, is determined each time a tonic chord appears, based on the
foregoing chords, and the allocation type distinguishes standard from non-standard assignment, Dense (D) from Sparse (S) for standard
chords, octave (Oct) assignment, and a missing third.
Clickable buttons are displayed in bold characters and marks, while inactive buttons are somewhat faded at each stage of
user input; every operation is possible by clicking the mouse button. Users can listen to the chord progression as MIDI sound by clicking
the "Play" button. The system also provides facilities for checking the validity of answers to given bass sequences: users can specify
part of a chord progression and make the system show all the allowable chord sequences that match the partial chords specified by the
users. Exploiting this facility, users can check whether their answers violate any of the rules without inputting every note of their answer
into the system.
4.2 PERFORMANCE EXAMPLE
Figure 3 shows a display example for a given bass exercise sequence from a textbook (Ikenouchi et al., 1964). First, the user inputs an
exercise sequence by clicking the "Input subject" button and then clicking the chapter number and the subject number; the
system reads the bass sequence from data files and displays it on screen as a subject for exercise. If the user immediately wants to see the
output calculated by the system, he/she just clicks the "Search chord progression" button. The system then shows the total number of
possible chord progressions at the functional level, i.e., individual note assignment is out of scope at this stage. The user is then asked
to choose a chord-progression ID from those listed in a pull-down display; this is done in the input box for "chord progression ID" by
choosing one of the ID numbers shown in the box.
If the user wants to see the chord function assigned to each bass note, he/she puts a mark in the box "Show chord function and cadence";
if the user wants to know the chords in degree number together with inversion type, he/she puts a mark in the box "Show chord
progression". To check whether his/her answer is allowable, the user inputs some part of the answer by filling in the three input boxes
labeled "Restriction on any voice on any chord" at the left of the screen, specifying the position (the bass-note ID), the voice
(soprano/alto/tenor), and the note number in MIDI format. On clicking the "Search" button, the system searches for all
the allowable note assignments under the specified condition and displays the total number of allowable note assignments in the "Note
assignment ID" box at the top center. The user chooses an ID in the "Note Assignment ID" box, and the system shows the result both
in score format in the top window and in MIDI code in the "Details" box. The resulting chord progression can be confirmed by listening
to it via the "Play" button.
After trying this system, a lazy student majoring in piano performance said, "It seems unnecessary to study harmony theory now that we
have this system".
In general, the matching conditions required in Rp are looser than those required in any of the Rck (1 ≤ k ≤ m), which treat detailed cases
under the conditions described in Rp. So rule Rp should be adopted only if none of the Rck is adopted. For example, if Rp is a rule
concerning the chord progression V-VI, Rc1 a rule concerning V-VI-II, and Rc2 one concerning V-VI-IV, then Rc1 and Rc2 are detailed rules
under Rp, and Rp should not be adopted if either Rc1 or Rc2 is adopted.
"Invalidation" means that adoption of a rule kills other rule(s). We call a rule that invalidates other rule(s) Ri a "Killer rule" Rk as depicted
in Fig 6(b). An example of a killer rule is one concerning chord progression V-VI that invalidates a rule concerning progression between
root positions having no common notes. With adopting Rk, Ri is not necessarily observed, i.e. Ri should be invalidated.
The mutual relations among rules make it hard to extend a rule system implemented in algorithms. With the rule unit model, however, it
becomes much easier to construct the total system and to extend it with rules having mutual relations, though the initial cost is high.
(a) Parent/children relation
When a set of data enters a rule unit for the first time, the unit checks whether the rule is to be adopted for the input data by filtering with the corresponding conditional description. If the unit judges that it must apply the rule to the input, it investigates whether the rule has any child rules. If the rule has no child rule, it is immediately applied to the input data; otherwise the unit stores the number of child rules in a counter in the structured data and sends the data to the top rule unit Rc1 among the child units. Rc1 checks whether it is receiving the data for the first time. If so, it checks whether the data satisfy the conditions of Rc1. If they do, the main rule of Rc1 is applied to the data, and the parent rule Rp and the remaining child rules Rc2 through Rcm are ignored. Since Rc1 is a child rule of Rp, the counter is decremented by one to record the number of unfinished child units. The system then checks the counter. If the counter is zero, the rule unit just processed was the last child unit under the parent unit concerned, and the data are returned to the parent unit. If the counter is non-zero, the data are sent to the succeeding child unit, and so on until no child unit is left. If no child unit matches the data, the counter is reduced to zero and the data are sent back to the parent unit. In this case the data enter the parent unit for the second time, and the rule Rp is then immediately applied to them.
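The control flow described above can be sketched as follows. Class and attribute names are my own, and the counter bookkeeping is collapsed into a simple loop over child units; this is a minimal rendering of the scheme, not the paper's implementation:

```python
class RuleUnit:
    """Minimal sketch of a rule unit (names are mine, not the paper's).
    A parent rule is adopted only if none of its child rules matches,
    mirroring the counter-based control flow described above."""

    def __init__(self, name, condition, children=()):
        self.name = name
        self.condition = condition        # data -> bool: does the rule match?
        self.children = list(children)

    def process(self, progression):
        """Return the name of the single rule adopted for this input,
        or None if this unit's condition does not match at all."""
        if not self.condition(progression):
            return None
        for child in self.children:       # visit child units in order
            adopted = child.process(progression)
            if adopted is not None:       # first matching child wins;
                return adopted            # parent and later children are skipped
        return self.name                  # no child matched: adopt the parent

# The V-VI example from the text, encoded with hypothetical conditions
rc1 = RuleUnit("Rc1", lambda p: p[:3] == ["V", "VI", "II"])
rc2 = RuleUnit("Rc2", lambda p: p[:3] == ["V", "VI", "IV"])
rp = RuleUnit("Rp", lambda p: p[:2] == ["V", "VI"], [rc1, rc2])

print(rp.process(["V", "VI", "IV"]))  # -> Rc2 (Rp suppressed)
print(rp.process(["V", "VI", "I"]))   # -> Rp (no child matched)
```

Exactly one rule name is returned for any matching input, which is the "only one single rule in a rule chain" property the scheme aims at.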
The scheme described above realizes a simple control mechanism ensuring that only one rule in a rule chain is adopted while the other rules in the chain are suppressed.
A rule unit outputs its resultant judgement J for the input data, but since J is at present a binary variable, the system cannot express the "degree of goodness" of the chord progression given as its input. A quantitative evaluation of this "degree of goodness" is expected to be realized by introducing an analog value for J according to the degree of importance of the rule concerned. In this case, the value of J should be set high when the data violate an important inhibition rule. Implementing this mechanism requires two components: a hierarchical control system that handles meta-rules specifying dynamic priorities among rule units, and a simple threshold logic that rejects a chord progression when the sum of all the analog J values output by the rule units exceeds a certain value.
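A minimal sketch of how such threshold logic might look; the rule tests, weights, and threshold below are invented for illustration and are not taken from the paper:

```python
# Sketch of the proposed threshold logic: each violated rule contributes an
# analog judgement J weighted by its importance, and the progression is
# rejected when the summed output exceeds a threshold. The violation labels,
# weights, and threshold are hypothetical.
def evaluate(violations, weighted_rules, threshold):
    """violations: set of detected violation labels;
    weighted_rules: list of (violation_test, weight) pairs."""
    total = sum(w for test, w in weighted_rules if test(violations))
    return total, total <= threshold      # (badness score, accepted?)

rules = [
    (lambda v: "parallel_fifths" in v, 1.0),  # important inhibition rule
    (lambda v: "doubled_third" in v, 0.3),    # minor stylistic violation
]
print(evaluate({"doubled_third"}, rules, threshold=0.5))           # accepted
print(evaluate({"parallel_fifths", "doubled_third"}, rules, 0.5))  # rejected
```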
6.2 INVALIDATION OF A RULE
There are some rules that should be ignored under certain conditions. To realize this, a control mechanism for invalidation among rules is implemented in the rule unit system. A control signal Inv is sent from the killer unit to the unit to be invalidated through a channel connecting the rule units concerned. The relation between a killer unit and an invalidated unit is depicted in Fig. 6, where Rk denotes a killer unit and Ri an invalidated unit. When Rk is adopted, invalidation information is sent to Ri, which is thereby invalidated.
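The killer/invalidated relation could be rendered roughly like this; the class and the data fields are hypothetical, with only the unit names Rk and Ri taken from the text:

```python
class InvalidatableUnit:
    """Sketch of a rule unit that can be switched off by an Inv signal
    sent from a killer unit (hypothetical class; only the unit names
    Rk and Ri follow the text)."""

    def __init__(self, name, condition):
        self.name = name
        self.condition = condition
        self.invalidated = False
        self.targets = []                 # units this unit invalidates

    def kills(self, other):
        self.targets.append(other)        # open a channel to the victim unit

    def process(self, data):
        if self.invalidated or not self.condition(data):
            return False                  # rule not adopted
        for target in self.targets:       # adoption sends Inv down the channel
            target.invalidated = True
        return True                       # rule adopted

# Rk: the V-VI rule; Ri: the rule on root positions with no common notes
ri = InvalidatableUnit("Ri", lambda d: d.get("no_common_notes", False))
rk = InvalidatableUnit("Rk", lambda d: d.get("progression") == ("V", "VI"))
rk.kills(ri)

data = {"progression": ("V", "VI"), "no_common_notes": True}
print(rk.process(data))  # True: Rk adopted, Inv sent to Ri
print(ri.process(data))  # False: Ri has been invalidated
```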
R21 Connection between root position and first inversion |have same notes|
R22 Connection between root position and first inversion |don't have same notes|
R23 Connection between first inversions
R24 2nd chord |first inversion| -> 5th chord (root position)
R25 2nd chord |first inversion| -> 5th chord |first inversion|
R26 Any chord -> 2nd chord |first inversion|
R27 Any chord -> 2nd chord |first inversion, root note high, and sparse|
R28 5th chord (first inversion, having two 6th notes) -> any chord (root position, having any 5th note)
R29 Position |second inversion|
R30 Connection between root position and second inversion
R31 Connection between first position and second inversion
R33 1st chord (second position) -> 5th chord functions as Dominant
R34 1st chord -> 5th chord (second position) -> 1st chord |root position|
R35 1st chord |root position| -> 5th chord (second position) -> 1st chord
R36 1st chord -> 6th chord (second position) -> 1st chord
R37 1st chord |second position| -> 5th chord
R38 2nd chord or 2nd chord |root position| -> 1st chord |second position| -> 5th chord
R39 2nd chord |root position, root note high, and octave position| -> 1st chord |second position| -> 5th chord
R40 4th chord or 4th chord |root position| -> 1st chord |second position| -> 5th chord
7. CONCLUSIONS
Described in the first half of the paper are the design and user interface of a system that can algorithmically generate all the allowable note assignments to allowable chord progressions for a given bass sequence. The system can also be used to check whether a chord progression with concrete note assignments given by a student is allowable, by referring to a definite set of inhibition rules in harmony theory within triads. Functions and the user interface are described in detail, with some examples of the system's performance.
A rule unit model is proposed as a scheme for representing the system of inhibition rules in harmony theory, demonstrating that it can describe even complicated rules, including those having exceptions and those having invalidation relations. The proposed rule unit system will replace the functional basis of the current "Basse Donnée System", which generates all the allowable chord progressions for given bass sequences. The system is currently implemented as an algorithmic scheme, but will be reconstructed using the rule unit model proposed in the second half of the present paper. A driving mechanism for the rule unit model has been constructed, and a student model is in the conceptual design stage.
Acknowledgement
This work was partly supported by a Grant-in-Aid for Scientific Research (No. 10680395) from the Ministry of Education, Science and Culture, Japan.
REFERENCES
Ikenouchi, T., Shimaoka, J., et al. (1964). Harmony Theory: Theory and Practice. Ongaku no tomo-sha. (In Japanese.)
Proceedings paper
Generally speaking, the irrelevant speech effect (ISE) is an impairment in serial short-term memory tasks caused by simultaneous background speech or noise that is irrelevant to the primary task. This impairment is also observed when the task items are presented visually; hence the ISE is not due to acoustic confusion.
In general terms the ISE might be characterised as follows:
● below 90 dB, the magnitude of the effect is independent of the volume of background speech (Colle,
1980; Salamé & Baddeley, 1983; Salamé & Baddeley, 1987)
● the magnitude of the effect is independent of the meaningfulness of the background speech and even of the language presented in the background (Colle & Welsh, 1976; Salamé & Baddeley, 1982; Salamé & Baddeley, 1987; Jones, Miles & Page, 1990)
● generally, the ISE is believed to occur only in serial recall experiments (Salamé & Baddeley, 1990), but LeCompte (1994) and Beaman and Jones (1997) also found it in free recall, recognition, and paired-associate experiments.
● most researchers locate the ISE in the retention or rehearsal phase of the memory process, but recent results (Meiser, 1997) showed that it can be found in the encoding phase as well.
The ISE is an experimentally well-investigated phenomenon. Within the above-mentioned experimental restrictions, it can be replicated reliably.
At the moment, there are two competing interpretations of the ISE.
One interpretation, within the scope of Baddeley's model of working memory, explains the ISE as two processes drawing on the same cognitive resource (e.g. Baddeley, 1986, p. 89):
Acoustically presented speech-like sounds gain automatic access to the so-called phonological loop in working memory. The phonological loop is the place where verbal information is stored for short periods of time. The verbal target items of the memory task are stored in the phonological loop as well. But since the capacity of the loop is restricted, fewer items can be stored there when the irrelevant, acoustically presented material accesses the loop at the same time. Thus a reduction in the number of recalled items is observed in the retrieval phase. According to this interpretation, the magnitude of the effect depends on the phonological similarity between the target items and the irrelevant background material (e.g. Gathercole & Baddeley, 1993, p. 13).
A different interpretation of the ISE is given by Dylan Jones with the 'changing-state hypothesis' within his Object-Oriented Episodic Record model of working memory. According to Jones' model (e.g. Jones, 1993), the items of a serial recall experiment which the subjects are supposed to remember form a stream of objects that is stored in an amodal short-term store (blackboard). In this store the stream can be rehearsed, and thus the serial information of the items remains intact. If a second stream of objects enters the blackboard, the two streams disturb each other and the serial information of the events may be lost. In other words, according to this interpretation the magnitude of the effect depends on the similarity between neighbouring objects within one stream. This means that if the adjacent events of the irrelevant stream are very dissimilar, the result will be a stream with clearly noticeable serial information. Such a sequence disturbs the rehearsal of the target items to a large degree. The characteristic feature of such a stream is called 'changing state' by Jones et al.
Despite numerous experiments in recent years, the discussion of the two interpretations has not yet come to an end. According to some recent studies (Klatte, Kilcher & Hellbrück, 1995; Meiser, 1997), however, the changing-state criterion has proved to be a useful experimental variable which, in combination with Baddeley's working memory model, could account for a broader range of experimental data than either theory alone.
In recent years there have been a handful of studies carried out within the ISE experimental framework using
music instead of speech as irrelevant acoustic background material. The results of these experiments can be
summarised as follows:
● sine tones and real music can disrupt memory processes to the same degree as spoken syllables (Jones
& Macken, 1993; Klatte & Hellbrück, 1993).
● with music, higher-level perceptual factors (such as distraction or auditory streaming) seem to moderate the ISE (Nittono, 1997).
● instrumental as well as vocal music can produce the ISE (Klatte & Hellbrück, 1993; Klatte, Kilcher &
Hellbrück, 1995), although instrumental music might cause a smaller effect (Salamé & Baddeley,
1989).
● the changing-state criterion seems useful for predicting memory disruption by musical stimuli (Morris, Jones & Quayle, 1989; Klatte, Kilcher & Hellbrück, 1995).
All in all, the changing-state criterion seems to be decisive for producing the ISE with music. For musical sounds, changing state might be defined as pronounced segmentation of the auditory stream produced by sudden, irregular energy transitions in the acoustic spectrum (cf. Klatte, Kilcher & Hellbrück, 1995). This criterion is applied in the experiment described below.
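As a rough operationalisation of this criterion (my own sketch, not the authors' procedure), one can count sudden frame-to-frame energy jumps in a signal: a staccato-like stream yields many such transitions, a steady legato-like stream few. The frame length and decibel threshold below are invented:

```python
import numpy as np

def energy_transitions(signal, frame_len=1024, threshold_db=6.0):
    """Count sudden frame-to-frame energy jumps, a crude proxy for the
    'changing-state' character of an auditory stream (threshold invented)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + 1e-12        # avoid log(0)
    jumps_db = 10.0 * np.abs(np.diff(np.log10(energy)))
    return int(np.sum(jumps_db > threshold_db))

rng = np.random.default_rng(0)
# Staccato-like stream: loud bursts alternating with near-silence
staccato = np.concatenate(
    [np.concatenate([rng.normal(0.0, 1.0, 1024), np.full(1024, 1e-4)])
     for _ in range(8)])
# Legato-like stream: continuous sound at a steady level
legato = rng.normal(0.0, 1.0, staccato.size)

print(energy_transitions(staccato), energy_transitions(legato))
```

The staccato stream produces far more above-threshold transitions than the legato one, in line with the segmentation idea above.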
3. Context-dependent memory
The notion of context-dependent memory is related to the everyday experience that memory performance is enhanced when the retrieval situation resembles the learning situation. The starting point for the investigation of context-dependent memory (CDM), with the external surroundings defined as context, may have been an experiment by Godden and Baddeley (1975). Scuba divers were asked to learn simple words on land or underwater and to recall these words later in either of the two conditions. The divers who had the same conditions during learning and retrieval showed far better memory performance. This superior memory performance under matching learning and retrieval conditions is defined as the CDM effect.
In the last 25 years, many studies have been carried out within this theoretical framework using a variety of contexts, such as different rooms, indoor vs. outdoor settings, or even the presence or absence of the smell of chocolate (Schab, 1990). Steven Smith (1988) gives an overview of the studies conducted up to the late eighties.
Unfortunately, the phenomenon of context-dependent memory has proved to be rather unreliable. For example, Fernandez and Glenberg (1985) failed to produce a CDM effect under exactly the same conditions that Smith had used several years earlier (Smith, 1979) to demonstrate a CDM effect between different rooms. According to writers summarising the CDM literature, this unreliability is one of the hardest aspects of CDM to explain (e.g. Eich, 1985; Smith, 1988; Thompson & Davies, 1988; Roediger & Guynn, 1996).
Still, CDM effects have been observed using music as a context stimulus in memory experiments.
The conclusions that can be drawn from these 'music-dependent memory' experiments are the following:
● music may serve as a context stimulus to produce significant effects in a CDM experimental design (Smith, 1985; Balch, Bowman & Mohler, 1992; Thaut & de l'Etoile, 1993; Balch & Lewis, 1996).
● the musical tempo seems to be the decisive parameter in making the music 'feel' the same to the subjects (Balch & Lewis, 1996).
● the so-called mood-mediation hypothesis (Eich, 1995) may serve as a theoretical explanation for the musical CDM. According to this hypothesis, different musical tempi might induce different moods (or at least different levels of arousal) in the subjects, which make them perceive the musical selections as different (Balch & Lewis, 1996).
According to these conclusions, a CDM effect might be expected when two otherwise identical musical stimuli differ only in tempo.
4. The experiment
As described in the introduction, this experiment aims partly at revealing the mechanisms that account for the
effects of background music while subjects are engaged in a short-term memory task. Two candidate
mechanisms that describe the relation between the physical structure of music and memory performance
have been found in the literature. In the case of the irrelevant speech effect, clear and irregular segmentation of the musical stream is thought to impair verbal short-term memory. The CDM effects found in the study of Balch and Lewis (1996) were caused by different tempi in the musical background selections.
All the effects described above were observed in typical laboratory settings where most of the variables relevant to memory performance were controlled within the experimental design. Unlike those studies, the second aim of this experiment is to replicate the two effects in a so-called field experiment whose conditions come much closer to an everyday learning situation than the typical laboratory setting. In that way, I hope to answer the question of whether the two effects can be useful in explaining short-term memory performance in everyday life.
Experimental Design:
The experimental design was developed so that both effects - the ISE and the CDM - could be tested in one experiment. The design comprised five conditions. In four of them, music was played both while the subjects were presented with the target items and during recall. The fifth condition had no music during either the learning or the recall phase. Two versions of the same musical piece were used. One had a clear changing-state character and was considerably faster (CS-Version), while the other was a slower version with presumably less changing state (Non-CS-Version). The five experimental conditions followed the permutated design shown in figure 1:
Experimental condition Music during presentation Music during recall
A CS-Version CS-Version
B CS-Version Non-CS-Version
C Non-CS-Version CS-Version
D Non-CS-Version Non-CS-Version
E Silence Silence
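The four music conditions in figure 1 amount to the Cartesian product of the two versions over the two phases, plus a silent control; a small sketch (labels as in the figure):

```python
from itertools import product

# The two versions crossed over the two phases give conditions A-D;
# condition E (silence in both phases) is the control.
versions = ["CS-Version", "Non-CS-Version"]
conditions = {label: combo
              for label, combo in zip("ABCD", product(versions, repeat=2))}
conditions["E"] = ("Silence", "Silence")

for label, (presentation, recall) in conditions.items():
    print(f"{label}: presentation={presentation}, recall={recall}")
```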
After the experiment, the subjects were asked to fill in a questionnaire about their demographic data, their musical preferences, and their habits concerning learning with music in the background. In a closing discussion after the experimental session, the subjects were asked about their subjective impressions of the task, the experimental music, and their learning habits in general.
Task and material:
In the first phase of the experiment, the subjects were asked to memorise 25 simple German words taken from a standardised test of amnesia (Metzler, Voshage & Rösler, 1992) while listening to one of the musical versions in the background. Every word was presented for five seconds. After the 25 words had been presented once, the same words were shown again in a different order, with the intention of extending the exposure to the music to about 250 seconds. The presentation of the words was followed by a short break of two minutes, during which a completely different musical piece was played to prevent the subjects from remaining in the mood state induced by the first background music.
After the break, the subjects were asked to write down all the words they could remember in any order while
one version of the experimental music was played in the background. The recall phase lasted as long as the
presentation phase.
The music played in both cases was the Praeludium No. 8 from Book I of the Well-Tempered Clavier by J.S. Bach.
For the CS-Version, an interpretation by Glenn Gould characterised by pronounced staccato playing was used. This version of the praeludium was sped up with a digital audio editor without changing the pitch.
The Non-CS-Version was an interpretation of the same praeludium by Friedrich Gulda, marked by extensive use of the pedal and legato playing. It is this difference in articulation between Gould's and Gulda's interpretations that defines the changing-state criterion in this experiment. This feature proved sufficient to produce the ISE in a similar experiment with real music described by Klatte and colleagues (Klatte, Kilcher & Hellbrück, 1995, Exp. 4). Gulda's interpretation was slowed down considerably (with the pitch remaining the same), so that the duration of the CS-Version was only 33% of that of the Non-CS-Version. The intensity of the music was monitored with an analogue Minuphone sound level meter and ranged between 55 and 85 dB(A) for both versions, following the natural intensity changes of the music.
This design is obviously much closer to the experimental procedure used by Balch and Lewis in their 1996 CDM study than to the ordinary ISE experiment. Unlike the experimental conditions that have proved to produce the ISE reliably, free recall was used instead of serial recall; the target items were whole (but simple and short) words rather than digits or single letters; and the durations of the presentation, break, and recall phases were much longer than the usual 10 seconds. All these changes were meant to create a learning situation much closer to a real-life condition than a laboratory setting. The intention of this study was not simply to produce the ISE, but rather to find it in a situation very similar to the conditions under which people really learn, for example in a classroom. In any case, the possibility of finding an ISE under these modified conditions cannot be precluded. LeCompte, for instance, showed that the ISE can be detected under certain conditions with the free-recall paradigm as well (LeCompte, 1994), and according to a classroom study by Hygge, unpredictable noises with irregular fluctuations in noise level can impair complex learning tasks over longer experimental durations (Hygge, 1993).
Subjects and location:
279 German school children participated in the experiment; 97% of the subjects were aged 11 to 13. All of them attended a "Gymnasium", the most demanding school form in German secondary education. The subjects were tested in groups of about ten. The experiment was carried out in their own classrooms, which represented the most familiar learning environment for the subjects.
Results:
The results of the five experimental groups were expected to show an inhibiting effect of the CS-Version of the musical selection and an enhancing effect when the same music was played during learning and recall (conditions A, D and E). Thus group D (Non-CS-Version during learning and recall) or group E (silence in both situations) was expected to attain the highest scores, while groups B and C (CS-Version in one situation and Non-CS-Version in the other) were expected to show the worst memory performance.
Figure 2 shows the actual mean number of words correctly recalled for the five experimental groups.
Figure 2
A two-factor (2 music conditions during learning × 2 music conditions during recall) analysis of variance (ANOVA) was performed. Neither of the main factors was significant, F(1, 273) = 2.19, p = 0.14 for the learning condition and F(1, 273) = 1.47, p = 0.227 for recall, nor was the interaction between the two factors, F(1, 273) = 1.21, p = 0.27.
Although the mean score of group B was significantly higher at the 5% level, according to one-tailed t-tests, than the scores of all the other groups, a one-way ANOVA with the five groups as levels of the factor revealed no significant F-value, F(4, 273) = 1.46, p = 0.21. This result was confirmed by the non-parametric Kruskal-Wallis test computed along with the one-way ANOVA, p = 0.27. T-tests for the mean differences of the other pairs of groups did not reach an acceptable level of significance.
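For illustration, the one-way ANOVA F statistic reported above can be computed from group scores as follows. The scores here are synthetic (the study's raw data are not available), so the sketch shows only the computation, not the reported result:

```python
import numpy as np

def one_way_anova(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    scores = np.concatenate(groups)
    grand_mean = scores.mean()
    k, n = len(groups), scores.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Synthetic recall scores for five hypothetical groups of equal size
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=3.0, size=56)
          for m in (10.0, 11.5, 10.0, 10.0, 10.0)]
f, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```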
The results were checked against some of the variables examined in the questionnaire. Only two variables exhibited a significant relation to the results of the memory test.
Firstly, those pupils who indicated on the questionnaire that they had been disturbed by the music during the experiment had significantly worse test results (ANOVA: F(2, 204) = 3.259, p = 0.04). At the same time, subjects who said they never listened to classical music felt disturbed by the experimental music in a significantly greater proportion than pupils who listened to it sometimes or regularly (Pearson chi-square test: value 14.053; df 4; two-tailed significance 0.007).
Secondly, a significant relation between the subjects' sex and their memory performance was found. Girls remembered on average 1.77 words more than boys, which led to a highly significant p-value of p < 0.001 in the Mann-Whitney U-test. As the proportion of girls and boys differed across the five experimental groups, a multiple regression analysis was performed with five dichotomous variables (sex and four dummy variables representing the experimental conditions). Only the variables sex and experimental condition B turned out to have significant beta weights (sex: standardised beta = 0.248, p < 0.001; condition B: standardised beta = 0.153, p = 0.34). The corrected R²-coefficient for the global regression model was only 0.058. Thus it is not possible to explain the test results by sex and experimental condition alone. The distribution of the standardised residuals of the regression model approximated the normal distribution, so the data gave no hint of other systematic influence factors that might have been overlooked in the regression model.
5. Discussion:
First of all, the girls' better results can be explained easily. A female advantage in access to and use of elaborate memory strategies up to early adulthood has frequently been reported in the literature on memory development (e.g. Cox & Waters, 1986; Waters & Schreiber, 1991; Jones et al., 1996).
With the exception of group B, all groups achieved almost the same mean results in the memory test. The better performance of group B was completely unexpected, because according to the CDM the different versions of the music during learning and recall should have impaired its results. Additionally, the CS-Version in the learning phase should have had a detrimental effect as well. It is very unlikely that the subjects of group B simply had better memories, because group B consisted of 62 subjects who came from four different classes at three different schools. The other experimental groups consisted of subjects similarly distributed over different classes and schools. Hence, differences in memory performance between classes should have levelled out across the five experimental groups.
Generally speaking, then, these results allow the conclusion that the two effects did not play an observable role in this experiment. They may have been overridden by other factors that become more important when real music is used instead of artificial sounds and learning takes place outside a psychological laboratory. Two candidate factors in this experiment might have been musical preferences and the use of mnemonic strategies.
The statistically significant relationships between a low frequency of listening to classical music, rating the experimental music as disturbing, and inferior memory performance favour the interpretation that musical preferences may have played a crucial role in this experiment.
In most of the closing discussions after the experimental sessions, subjects reported that they had used some kind of mnemonic technique to commit the words to memory. This was of course favoured by the free-recall retrieval mode and by the use of meaningful words as target items. The significantly better performance of the girls, who presumably made use of their developmental advantage in this respect (see above), is indirect evidence of the important role of elaborate memory strategies.
Of course, it is not known whether these factors are indeed responsible for overshadowing the predicted memory effects. But the fact is that these assumed influence factors are poorly integrated, if at all, into the theoretical concepts of the ISE and CDM: musical preferences are not even mentioned in either theory. Memory techniques play no role in the ISE concept because a serial rehearsal strategy is always presupposed (apart from telephone numbers, serial rehearsal is used in very few real-life situations). In CDM theory, only the relation of target items to environmental cues is theorised as a memory technique.
In summary, the failure to produce the predicted memory effects and the assumed influence of factors outside the theoretical framework lead to the conclusion that the irrelevant speech effect and context-dependent memory alone are of little use in describing memory performance under the influence of background music outside the psychological laboratory.
Influential factors such as attitudes toward and preferences for background music in general, preferences concerning the chosen experimental music, and the memory techniques employed by the subjects should be controlled in a future experiment. The question of whether and how the physical structure of music affects memory processing can probably be asked again only once at least the above-mentioned factors are controlled within the experimental design.
References
Baddeley, A. (1986). Working memory. Oxford: Oxford University Press.
Balch, W., Bowman, K. & Mohler, L. (1992). Music-dependent memory in immediate and
delayed word recall. Memory & Cognition 20 (1), 21-28.
Balch, W. & Lewis, B. (1996). Music-dependent memory: the roles of tempo change and mood
mediation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22,
1354-1363.
Beaman, C. & Jones, D. (1997). Role of serial order in the irrelevant speech effect: tests of the
changing state hypothesis. Journal of Experimental Psychology: Learning, Memory, and
Cognition 23, 459-471.
Behne, K.-E. (1999). Zu einer Theorie der Wirkungslosigkeit von (Hintergrund-)Musik. In K.-E.
Behne, G. Kleinen & H. de la Motte-Haber (Eds.). Musikpsychologie Bd. 14. Göttingen:
Hogrefe. pp 7-23.
Carlson, J. & Hergehahn. (1967). Effects of rock-n-roll and classical music on learning of
nonsense syllables. Psychological Reports 20, 1021-1022.
Cockerton, T., Moore, S. & Norman, D. (1997). Cognitive test performance and background
music. Perceptual and Motor Skills 85, 1435-1438.
Colle, H. (1980). Auditory encoding in visual short-term recall: effects of noise intensity and
spatial location. Journal of Verbal Learning and Verbal Behavior 19, 722-735.
Colle, H. & Welsh, A. (1976). Acoustic masking in primary memory. Journal of Verbal
Learning and Verbal Behavior 15, 17-31.
Cox, D. & Waters, H.S. (1986). Sex differences in the use of organization strategies: a
developmental analysis. Journal of Experimental Child Psychology 41, 18-37.
Davidson, C. & Powell, L. A. (1986). The effects of easy-listening background music on the
on-task-performance of fifth-grade children. Journal of Educational Research 80 (1), 29-33.
Drewes, R. & Schemion, G. (1992). Lernen bei Musik: Hilfe oder Störung? In K.-E. Behne, G.
Kleinen & H. de la Motte-Haber (Eds.). Jahrbuch der Deutschen Gesellschaft für
Musikpsychologie Bd. 8. Wilhelmshaven: Noetzel. pp 46-66.
Eich, E. (1985). Context, memory, and integrated item / context imagery. Journal of
Experimental Psychology: Learning, Memory, and Cognition 11, 764-770.
Eich, E. (1995). Mood as a mediator of place dependent memory. Journal of Experimental
Psychology: General 124 (3), 293-308.
Fernandez, A. & Glenberg, A. (1985). Changing environmental context does not reliably affect
memory. Memory & Cognition 13, 333-345.
Gathercole, S. & Baddeley, A. (1993). Working memory and language. Hove, Hillsdale:
Lawrence Erlbaum Associates.
Godden, D. & Baddeley, A. (1975). Context-dependent memory in two natural environments:
on land and underwater. British Journal of Psychology 66, 325-331.
Greenberg, R. & Fisher, S. (1971). Some differential effects of music on projective and
structured psychological tests. Psychological Reports 28, 817-818.
Hagemann, H.W. & Schürmann, P. (1988). Der Einfluß musikalischer Untermalung von
Hörfunkwerbung auf Erinnerungswirkung und Produktbeurteilung: Ergebnisse einer
experimentellen Untersuchung. Marketing ZFP 4, 271-276.
Hygge, S. (1993). Classroom experiments on the effects of noise on long term recall and
recognition in children aged 12-14 years. In A. Schick (Ed.). Contributions to psychological
acoustics. Oldenburg: Bibliotheks- und Informationssystem der Universität Oldenburg. pp
627-641.
Jones, D. (1993). Objects, streams, and threads of auditory attention. In A. Baddeley & L.
Weiskrantz (Eds.). Attention: selection, awareness, and control. Oxford: Clarendon Press. pp
87-104.
Jones, D. & Macken, W. (1993). Irrelevant tones produce an irrelevant speech effect:
implications for phonological coding in working memory. Journal of Experimental Psychology:
Learning, Memory, and Cognition 19, 369-381.
Jones, D., Miles, C. & Page, J. (1990). Disruption of proofreading by irrelevant speech: effects
of attention, arousal or memory?. Applied Cognitive Psychology 4, 89-108.
Jones, M., Yokoi, L., Johnson, D., Lum, S., Cafaro, T. & Kee, D. (1996). Sex differences in the
effectiveness of elaborative strategy use: knowledge access comparisons. Journal of
Experimental Child Psychology 62, 401-409.
Kerr, W. (1945). Experiments on the effect of music on factory production. Applied
Psychological Monographs 5, 1-40.
Proceedings paper
The term "aural training" has been applied to many different skills, from solfège singing lessons, tapping rhythms, singing scales, and writing down dictations to discriminating musical textures, forms, and timbres (Henson 1987). In fact, aural awareness can be seen as an ability concomitant with any musical activity, but in this work I am concerned with the concept of aural training focused on the teaching of Standard Musical Notation (henceforth SMN).
The use of notation is a constitutive feature of Western music. In its origin, notation was simply a tool for remembering tunes, a mnemonic device that really did work: all music prior to this century has reached us by notational means. And it is precisely that usage which still mainly justifies its teaching in the conservatories: to give access to the vast "written" repertory accumulated from the Renaissance to our days. Important as this goal may be, its status has been erroneously exaggerated in conservatory academic circles to the point of being considered the main goal of music theory courses.
Western tonal music grew increasingly dependent on symbols, in both compositional and performance terms, not only to meet the needs of recording and transmission but also to achieve high levels of complexity. If in its origin notation was only a kind of by-product of music, over the centuries notational symbols in turn also shaped the way of thinking about music (cf. López Puccio 1978). If we consider notation as the final output of the cognition of music, as the formalization of abstract relations among sounds, the usefulness of any notation for music teaching and learning becomes evident. When we put musical ideas on paper, they "hold still" and "talk back" (Bamberger 1991: 118), and this two-way confrontation is central to the development of musical understanding, as it was throughout the history of Western music.
The above arguments justify my view that music literacy - by means of any notational system - is paramount for the study of music, and not just an accessory skill: it is essentially "an act of reflection . . . of which features are most salient and in need of recording" . . . [a] "post hoc consideration of a piece of music" (Serafine 1988: 37). Accordingly, we can assume that any attempt to teach notation should start by re-constructing this sequence in the novice music learner, guiding reflection on the features that are prominent in his/her representation of the music. To do that, we must first consider how the illiterate listener listens to music.
FIGURAL AND FORMAL KNOWLEDGE
How people listen to music
Musical listening is a multidimensional experience. The physical stimulus offers us many interacting dimensions -
rhythm, melody, harmony, timbre, text - integrated in a perceptual continuum. To this wholeness our cognition adds even
more layers of "meaning", ranging from low level grouping strategies involving a few events to abstract relations at
higher structural levels, and also aesthetic and emotional responses:
"the sense made of phenomena is always a construction . . each of the individuals finds in the material and
thereby gives existence to aspects that simply does not exist for the other. For the person who attends the
metric aspects of rhythm, figures [in the sense of figural knowledge] remain unrecognised; for the person
who attends to figures, the classification of events according to their shared duration remains inscrutable"
(Bamberger 1991: 29)
Figural knowledge (sometimes called functional knowledge) refers to the kind of global, continuous apprehension of the musical phenomenon that gives instantaneous access to a holistic representation of the music. Our cognition keeps track of all music parameters simultaneously, assigning each surface event a function inside a figural representation that is highly dependent on the surrounding context: "events within a figure are contextually bound" (Upitis 1987: 41). Usually a minor alteration in a single dimension, or the slightest attempt to break down this functional relation into its different constituents, is enough to alter, or altogether lose, the musical identity of the figure.
Formal knowledge (sometimes called metrical knowledge) stands for a completely different kind of music apprehension. It is concerned with those aspects of the musical stimulus that can be counted, measured and classified, mostly in terms of proportional durations and frequency ratios, and it comes into play when a conventional representation of music, as in SMN, is needed.
Figural apprehension is the kind of knowledge involved in several different responses to music - aesthetic, emotional, kinaesthetic - that involve some kind of meaning. The transit from a figural mode to a formal one may be conceived in relative terms: any time we integrate a formal concept into a context, we are moving up in the hierarchy and giving rise to a figural concept. Where structural components are involved, it could be said that figural knowledge gives access to a higher hierarchical level that shapes the meaning of individual events; and vice versa, to formalise, in the context of this work, means to access a lower level in a hierarchical representation of music. However, the figural and formal modes are not mutually exclusive, nor can we imagine either of them in a pure form. While most of the time both figural and formal modes alternately contribute to music cognition, their contributions may be considered asymmetrical. If a "purely" figural approach is in a sense limited, because it has restricted access to formal knowledge, this fact does not diminish its musical value. A purely formal approach, on the contrary, is limited in a more important way, since no formal knowledge is musical without the meaning added by the figural dimension. Presumably a musically literate musician can switch from one mode of perception to the other at will, though I suspect that there is always a mismatch between the two modes: our figural perceptions always extend beyond what we are able to formalise; we always ‘know more than we can tell’.
The figural - formal transaction
The teaching of notation - certainly a complex formalization - consists in enabling the learner to perform a fluid interplay between both modes. At the very beginning the learner needs to switch from his/her figural representation to a formal one, a process that we call the figural - formal transaction. The crucial point is how to integrate both dimensions in the learner’s representation: one is intuitive, continuous and holistic; the other is rational, discrete and analytical. When we are perceiving in a figural mode we do not hear each note in the music; rather, "a non-linear relationship [exists] between the notes in the score and what people hear when they listen to a performance of it" (Cook 1994: 79). The formalization needed for notation goes exactly the opposite way: it replaces a figural experience with a formal description, "a meaningful continuity with a meaningless particularity" (Cole 1974, cited by Terry 1994: 104). It is this wholeness that makes it so difficult to determine exactly all the processes involved in the figural experience of music, but certainly grouping, beat and meter abstraction, perception of regularities like pattern alternation or repetition, segmentation, and awareness of tension-relaxation schemas can be considered kinds of figural apprehension. One central contention of this work is that when we "rescue" our students from that undifferentiated sea of music, when we guide the first transactions between their figural and formal knowledge, if we fail to establish the right first link between the two modes of representation, it will probably never be easily and naturally restored; or it will be restored only by those exceptional individuals who are able to find their way alone in that tempestuous sea.
Pedagogical implications: modelling the learner’s cognition
The formalization required for SMN necessarily involves some kind of categorisation of these figural perceptions - making discrete something that is naturally continuous - and categorisation in turn involves a finite number of categories within a given continuous dimension. If we assume that possible figural hearings are infinite and formal representations are finite, we reach the very obvious conclusion that two or more figural representations may have, depending on the level of analysis, a common formal representation; and also the opposite, that a formal representation - no matter how detailed it may be - will have more than one figural representation. This is so because the figural experience is holistic and is automatically affected by the context. A higher hierarchical organisation can always be preserved over a theoretically infinite number of surface (low level) features. Conversely, a formal construct may preserve its integrity while its figural function is affected by an infinite number of contextual, higher level frames of interpretation. Example 1 shows two cases of formalizations coupled with different figural interpretations. In both cases the figural interpretation involves the consideration of a context operating at a level higher than the one at which the formalization operates.
Example 1
[Musical notation not reproduced: a single formalization of three events, paired with different figural interpretations depending on stress and context.]
What puzzles the learner in his/her first contacts with SMN is the finite, relatively limited number of concrete means of notation (musical figures) - a characteristic inherent to any symbolisation - and the consequently infinite number of musical "faces" that those symbols may adopt in different contexts:
"mistake[s] can be traced to a mismatch between internal mental representation and conventional
descriptions, [and so] it is useful to help students confront these differences. In this instance, that would
mean helping students to move back and forth between metric and figural hearings and between metric and
figural descriptions of a rhythm or melody" (Bamberger 1991: 66-7).
The usual strategy of ear training pedagogy is to present as many examples of formal configurations as possible, each one tied to a unique figural representation. This one-to-one relationship obscures the listener’s active role in the construction of these figural meanings from a neutral notation, and consequently the student achieves, at best, a figural - formal transaction that is rigid and inflexible. He/she finds it difficult to distinguish one mode of representation from the other, and has trouble differentiating between the aspects of his/her figural experience that can be reflected by notation and those that notes cannot portray, no matter the degree of detail in the score’s specifications.
This dialogue between the specifications present in the notation and the sense that the learner makes of them is of crucial importance and must be carefully aided. It could be said that the role of the teacher is to model the learner’s cognition by helping to disentangle both kinds of representations while allowing a fluid interplay between them. Even a single error, a small mismatch between these two dimensions, may be catastrophic for any further learning. The didactic strategy proposed here is in a sense opposed to the traditional one: it consists in presenting the learner with as many as possible of the different figural representations that a single formal representation can stand for.
To learn what is already known
How can extant psychological studies inform an ear training pedagogy guided by the conceptual framework outlined so far? One assumption of this approach to the teaching of notation is that formal knowledge can only be constructed from figural knowledge. In this conception the figural mode is not an imperfect or preliminary form of knowledge but a foundation, and in turn also a goal, of the formal one. If we find that every member of a culture is able to construct a figural representation of music, that implies that every person within the range of standard perceptual and cognitive abilities should be able to learn to read and write music.
Enculturated adults automatically process enormous amounts of musical information. Exposure to the music of the culture, together with the innate cognitive machinery for structuring the environment, gives rise to an ‘implicit knowledge’ that can be found in most people. This ability consists mainly in grouping musical sounds and giving them meaning inside a culture-driven pattern (cf. Dowling 1999). This musical competence is relevant here because it seems to involve similar processes in both musicians and non-musicians. If we turn to experiments concerned with some of the different processes involved - like grouping (cf. Deliège 1987), the perception of tensing-relaxing patterns (cf. Bigand 1993) and of underlying hierarchic structures (cf. Serafine, Glassman and Overbeeke 1989) - we find that naive listeners indeed perform very similarly to musicians. It seems that musical training does not substantially affect the ability of normal adults to respond to the figural aspects of music, but rather that "music perception is fundamentally similar in listeners with varying degrees of sophistication" (Trehub, Schellenberg & Hill 1997: 104).
There is also enough evidence to assume that the processes involved in this musical competence develop gradually from the pre-natal stage (cf. Lecanuet 1996) and through childhood as a result of normal exposure to music. Infants group auditory information according to Gestalt principles of proximity and similarity in ways similar to those used by adults (Demany 1982; Fassbender 1993; Thorpe and Trehub 1989; Thorpe, Trehub & Morrongiello 1988), can discriminate between different rhythmic and melodic patterns (Chang and Trehub 1977a-b), and are sensitive to phrase structure (Krumhansl & Jusczyk 1990; Jusczyk & Krumhansl 1993). These and other cognitive processes involved in the figural apprehension of music - like the understanding of tonal closure and harmonic relationships - develop steadily through childhood and "are generally well in place in human cognition by the age of 10 or 11 years" (Serafine 1988: 224).
This challenges the traditional conception of aural training pedagogy, which treats the learner as if his/her first contact with music took place in the conservatory classroom. If the music theory teacher assumes valuable pre-existing musical knowledge in the learner, the process of formalization will consist simply in learning "what he/she already knows" (Bamberger 1991: 259). The transition from figural to formal will consist simply in putting names to "things" that the novice already perceives and notices. In that sense the job of the teacher is not to teach but just to show.
When a person listens to a given piece she recognises - understands - several characteristics of the music in a straightforward manner. The crudest, most basic formalization of that understanding is the discrimination between that precise piece and another, even if she could not rationalise the difference. In that case she "knows more than she can tell". That situation of the novice learner differs only in degree from an experience that any musician knows: to listen to some sounds or a musical passage, to perceive and understand the function, the "meaning", but not be able to write it down or play it on the piano: "that sounds like a cadence but I don’t know exactly what the chords or the inversions are". Musicians and aural training teachers, whatever their audio-analytic sophistication, are continuously faced with passages, chords and musical relations that they "understand" but are not able to formalise.
But if the average listener can indeed make all the subtle discriminations involved in the "musical" apprehension of sounds, why is it that not everybody can learn to formalise this knowledge in the form of SMN? I argue that when people fail to understand the basis of music notation, it is because aural training didactics fails when it comes to formalising their pre-existing musical knowledge: the "units of perception" do not match the "units of description" (cf. Bamberger 1991: 8).
One of the central contentions of this work is that, with adequate teaching, almost every musically enculturated adult should be able to learn the basics of music reading and writing. We only need to find means by which the learner can profit from what he/she already knows. If learners do not find bonds between their percepts and the formalizations required for notation, the music teacher will simply be, as the saying goes, "answering questions that the learners have never asked themselves", and that knowledge will never become operative. I propose that the best way to establish links between figural and formal percepts in music is to draw on the extant figural knowledge of the learner and to exploit any previously acquired abilities in general auditory pattern processing. In the following section I present some brief comments and suggestions on how a didactic approach to the teaching of rhythm could take these premises into account.
A DIDACTIC APPROACH TO RHYTHMIC TEACHING
The key didactic issue is to find the kind of formal description of rhythm that the novice can most easily access, the one that is closest to his/her figural description. The crucial question for the aural training teacher is how to lead the learner’s rhythmic processing beyond basic cognitive mechanisms like regularity extraction and segmentation into groups (cf. Drake 1998). What is the next step that the learner should take towards a formalization of his/her perceptions of uniformity and change in the rhythmic flow? At this point the usual focus in ear training courses is to start from the metric aspects of rhythm. Traditional pedagogy has assumed that, since everyone can perceive an underlying beat, learners can easily use it as a temporal ruler to measure the length of the events that group over the pulsed field. But, as we have seen, that is an "after the fact" reasoning that assumes a metrical "skill" nonexistent in the learner. I argue that the next stage in the formalization of rhythmic perception should rely not on the metric aspects of groups of events, but on the functional relations among them. Let us now examine which functional relations in a group can be most easily formalised (I use the word functional instead of figural because it reflects more accurately the idea of the relatedness of one event to another, or others, in a group).
Speech and music prosody
The grouping of events - a functional perception - takes place within the framework of "a regular pattern of strong and weak beats to which he [the listener] relates the actual musical sounds" (Lerdahl & Jackendoff 1983: 12), called the metrical hierarchy. It seems evident that the novice’s first attempts at the formalization of rhythmic groups are most conveniently made at a single level of the metrical hierarchy and without loading long term memory - that is, weak beats grouped around strong ones - and speech prosody offers precisely an example of that kind of grouping. Since most music in the Western tradition can be interpreted in terms of figural units that can be described in prosodic terms, I propose that the figural-formal transaction should start with the formalization of rhythmic groups in terms of prosodic units. Prosody is the overall acoustical profile of a spoken or chanted utterance that results from the organisation of weak discrete elements - syllables - around stressed ones; the term stress designates perceptual salience achieved by any means, such as dynamic changes, lengthening of sounds, pauses, etc. In speech comprehension, language prosody plays a paramount role (cf. Cutler, Dahan & Donselaar 1997, Pynte 1998), and the same is true for music. We can imagine the musical parallel when we chant a melody in a monotone, halfway between speech and singing. In the absence of accurate melodic or rhythmic information, we can still keep track of its structure relying solely on the information carried by the prosodic profile of the output: almost any music can be reproduced, or mimicked, with variable degrees of precision by the human vocal apparatus.
Both speech utterances and music involve the perception of auditory patterns within a framework of sequential and hierarchical cognitive processing, where continuous sound strings are analysed into discrete phonemes or notes. For language and music reading, the opposite path is followed: a symbol string is decoded to produce a spoken/sung/performed output. These continuous strings of syllables or sounds, as they unfold in time, are structured around metrical stresses that determine the prosodic profile of the phrase. The similar nature of constituent marking in speech and music (cf. Carlson, Friberg, Frydén, Granström & Sundberg, 1989) indeed suggests the existence of shared cognitive mechanisms for grouping in language and music (cf. Pinker 1997: 535, Fassbender 1996: 80).
The first didactic work with prosody must start with the distinction between stressed and unstressed events in a sound string. This is not a trivial point, since, as we have seen, in the multidimensionality of the listener’s experience it may be difficult to keep track of the salience of events along a single dimension. But the formal processing of speech sentences - a well learned process in most schooled adults - constitutes a clear example of the grouping of events (syllables) along a single temporal dimension. The syllable is the basic rhythmic unit of Spanish, and the prosodic profile of an utterance is shaped by the way in which weak syllables group around stressed ones. One of the most characteristic features of the spoken language is that all the syllables in a rhythmic group, whether stressed or unstressed, tend to follow each other at more or less evenly spaced intervals of time. The identification of stressed syllables in words and sentences may thus be used as a common cognitive framework for the processing of musical events. By splitting words into syllables and imposing a steady beat, we can easily bring our students to perceive the prosodic profile of an utterance and to transfer it to SMN. In a typical didactic sequence we can ask the student:
● to recite a sentence following a steady beat, assigning one syllable to each beat
● to transfer the metrical structure of the utterance to music notation - establishing a correspondence between strong and weak syllables and strong and weak musical beats - by replacing the syllables with any conventional music symbol (Example 2).
Example 2
Es - ta ca - si - ta tie - ne ven - ta - nas
|    |    |    |    |    |    |    |    |    |    (a steady beat, one syllable per beat)
[The original example repeats the sentence with the syllables replaced by conventional music symbols; that notation is not reproduced here.]
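The two steps of this didactic sequence can be sketched computationally. In the following illustrative Python sketch, the syllable split and the stress positions of the sentence are hand-annotated assumptions (a real implementation would need a syllabification and stress dictionary for Spanish), and the function name is ours:

```python
# Step 1 of the didactic sequence: one syllable per beat.
# Step 2: mark which beats carry a lexical stress ('>') and which are weak ('.').
# Syllables and stress positions are hand-annotated for this illustration.
syllables = ["Es", "ta", "ca", "si", "ta", "tie", "ne", "ven", "ta", "nas"]
stressed = {0, 3, 5, 8}  # ES-ta, ca-SI-ta, TIE-ne, ven-TA-nas (assumed)

def beat_grid(syllables, stressed):
    """Return one mark per beat: '>' for a stressed syllable, '.' for a weak one."""
    return [">" if i in stressed else "." for i in range(len(syllables))]

print(beat_grid(syllables, stressed))
# → ['>', '.', '.', '>', '.', '>', '.', '.', '>', '.']
```

The resulting grid is exactly the "metrical structure" the student is asked to transfer to notation: one symbol per beat, differentiated only by stress.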
Once the metrical structure is established, the further identification and description of prosodic constituents follows. Each prosodic unit, or foot, is organised around a metrical stress. The formalization of these feet in the learner’s representation must consist in distinguishing the different ways in which syllables/events organise around this stress. Since the same metrical structure allows several prosodic interpretations, depending on the grouping, the different groups/feet that unstressed syllables/events may form around metrical stresses can be made evident to the learner by interchanging different words to match a given metrical structure (Example 3).
Example 3
Es - ta ca - si - ta tie - ne ven - ta - nas
[The original example shows different words fitted to the same metrical structure, making alternative feet evident; the notation is not reproduced here.]
In a typical analytic-reading activity the learner is asked to search for all the possible figural interpretations, that is, all the possible prosodic feet that could be formed, following the rule that each of them must contain one, and no more than one, metrical stress (or head). In that way, for instance, a 4 bar rhythmic phrase will present 4 different groups, each "wrapped" around a head. The feet can be kept similar among themselves, within the possibilities of the phrase structure, or different feet may be combined (Examples 4 and 5).
Examples 4 and 5
[Musical notation not reproduced: two groupings of the same phrase, one with similar feet (Example 4) and one combining different feet (Example 5).]
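The reading rule above - one and only one head per foot - can be sketched as a grouping function over a grid of stressed ('>') and weak ('.') beats. This is only one of several legal groupings (here every foot opens with its head; feet that close on a head would be equally valid, which is exactly the interpretive freedom the analytic-reading activity explores), and the function name is illustrative:

```python
def feet(grid):
    """Partition a beat grid ('>' = head, '.' = weak beat) into prosodic feet,
    each containing exactly one head. This grouping opens every foot on its
    head; other legal groupings exist, as the analytic-reading task explores."""
    groups, current = [], []
    for mark in grid:
        # a new head closes the previous foot, provided that foot has its head
        if mark == ">" and ">" in current:
            groups.append(current)
            current = []
        current.append(mark)
    if current:
        groups.append(current)
    return groups

# Four heads in the phrase give four feet, each "wrapped" around a head.
print(feet([">", ".", ".", ">", ".", ">", ".", ".", ">", "."]))
```

Enumerating the alternative partitions that satisfy the same one-head rule would give the learner the full set of figural interpretations the activity asks for.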
But rhythmic teaching must also go on to help the grasping of those groupings that do depend on gross variations in the rate of appearance of events, like the relations expressed by conventional figures. These relations are indeed very few and simple: patterns composed of a few events whose attack points stand in relations of multiples of two and three; further rhythmic complexity is achieved through a hierarchical arrangement of these basic "rhythmic contours" (Monahan 1993: 127). In traditional aural-training settings the confusion between absolute and relative duration is perpetuated by the explicit taxonomy of contents, which presents as different several groupings-of-figures that in fact share a common figural grouping when transposed to another metrical hierarchy (i.e., the groups half note - two crotchets and crotchet - two quavers are metrically (formally) different but figurally identical).
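The distinction drawn in that parenthesis is easy to verify arithmetically: once durations are expressed as ratios rather than absolute values, the two groupings collapse into one contour. A minimal sketch (durations in beats; the function name is illustrative):

```python
from fractions import Fraction

def contour(durations):
    """Normalise durations to ratios of the first event, so that groups which
    are transpositions of each other across metrical levels compare equal."""
    ratios = [Fraction(d).limit_denominator(64) for d in durations]
    return [r / ratios[0] for r in ratios]

half_plus_two_crotchets = [2, 1, 1]        # half note + two crotchets
crotchet_plus_two_quavers = [1, 0.5, 0.5]  # crotchet + two quavers

# Formally different symbols, figurally the same 2:1:1 contour.
print(contour(half_plus_two_crotchets) == contour(crotchet_plus_two_quavers))  # True
```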
The conflict arising from this ambiguity is surely one of the factors responsible for the mismatch between novices’ figural and formal representations mentioned in previous sections. I argue that the formalization of these hierarchical transpositions should come only after the basic relationships of rhythmic contours have been grasped at a single metric level. By introducing whole-beat rests into our ‘elemental’ prosodic phrases we can convey a whole range of time-span proportions without the burden of introducing different durational symbols. Since the goal is not to teach relations among figures but among time spans, this method allows the learner to grasp the concept of duration as an abstract relation between time spans, rather than a property of the figures themselves. A whole range of rhythmic contours can be conveyed using no more than one metrical level (Example 7). Only after these rhythmic contours have been formalised does the "transposition" to other hierarchical levels, symbolised by the different range of durational figures, make sense.
Example 7
[Musical notation not reproduced: equivalent rhythmic contours expressed with whole-beat rests at a single metrical level.]
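The whole-beat-rest device can be made concrete in the same sketch style: on a single metrical level, a pattern of attacks ('X') and whole-beat rests ('.') already encodes a range of time-span proportions without any new durational symbol. One illustrative assumption here is that each event's time span runs until the next attack, or to the end of the pattern:

```python
def time_spans(pattern):
    """Time span of each attack = number of beats until the next attack
    (or the end of the pattern). 'X' = attack, '.' = whole-beat rest."""
    attacks = [i for i, beat in enumerate(pattern) if beat == "X"]
    edges = attacks + [len(pattern)]
    return [edges[k + 1] - edges[k] for k in range(len(attacks))]

# One note symbol plus rests yields a 2:1:1 contour without introducing
# half notes or quavers -- relations among time spans, not among figures.
print(time_spans(["X", ".", "X", "X"]))  # → [2, 1, 1]
```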
By constructing and deconstructing the different figural representations corresponding to different preferred hearings, learners may gain conscious access to the factors that govern the formation of those representations. These factors - Gestalt principles of temporal proximity, similarity and good continuation, and abstract concepts like balance between phrases, symmetry and motivic repetition - must be made evident to students from the beginning of teaching. These elemental, crudely stated "well-formedness rules" are perfectly accessible to the music novice and constitute a basic premise for all further learning. The educative power of the concept of preferences among competing organising principles cannot be overestimated. This concept does not play an accessory role in music reading; on the contrary, good music reading consists in evaluating in real time the odds that each different musical figure will arise from the current notation. This ability to foresee the chances of alternative groupings must not be considered just one more didactic device for gaining proficiency in note reading, but a fundamental component of musical understanding. From this standpoint the ambiguity inherent in music notation represents not a marginal effect of an inefficient system, but rather the space left for a "cognitive" interpreter.
Rhythmic complexity in most music is achieved by simple preferences competing at multiple levels. But, regrettably, the prevailing didactic approach to rhythmic training in conservatories and music schools reflects a noteworthy lack of contact with the principles of structural organisation that are at the heart of "musical intuitions". I argue that this lack of musicality is the consequence of confounding structural complexity with sheer difficulty, a confusion that in turn reveals a profound ignorance of the ways in which human cognition "composes" and in turn interprets music pieces. If it is possible "to explain artistically interesting aspects of musical structure in terms of principles that account for simpler musical phenomena" (Lerdahl & Jackendoff 1983: 2-3), this implies that artistic music teaching should always remain close to those simpler phenomena: "structural simples" (Bamberger 1991) like Gestalt principles, temporal symmetry or balance, alternation between stability and tension, and so forth.
CONCLUDING REMARKS
Aural training pedagogues must take into account what is now widely accepted in the music cognition literature: that artistic content - or musical meaning - is not an unexplainable achievement of humans, but rather the output of very refined cognitive processes that are common to the species and that are revealed by the widespread receptive musical competence of people. This implies that formal music learning - involving SMN or not - is accessible to any person with standard cognitive abilities and exposure to music. The lack of "ear" can be seen only as the failure to develop spontaneously the links between the inborn and acquired cognitive processes underlying musical understanding and a formal system of description. Keeping this in mind would reverse the attribution of responsibility for learning that we find in typical music settings, where the success of the learner depends mainly on his or her ability, or talent, to pick up the musical knowledge provided by the teacher, and a failure to do so is considered, most of the time, as beyond the reach of music pedagogy. A "cognitive teaching", on the contrary, would be attentive to the way in which learners make sense of the musical phenomenon, and their success would depend heavily on how that implicit knowledge is didactically developed - transformed from figural into formal - by the teacher.
What can be done to improve communication between pedagogues and psychologists, and who can do it? Within the great diversity of lines and approaches, it could be said that researchers in perception have done their job well. They have been steadily improving their methods, technologies and - more importantly - their theoretical frameworks during the last three decades, and there is a growing concern with achieving musical significance in their work. Perhaps music educators would be grateful if music psychologists focused more on the developmental aspects of the phenomena they investigate, and indeed it is very rare to find in academic papers an explicit reference to the transfer of their findings to music pedagogy. But with regard to the teaching profession, I am inclined to think that, after all, psychological findings are there waiting for music educators to make sense of them. If these findings are too many for the professional teacher to handle, it seems useful to outline a framework - articulating pragmatic and theoretical concerns - within which the relevance of psychological data for pedagogic ends could be assessed. This work is an attempt to foresee how that kind of synthetic framework could enrich professional music teaching.
REFERENCES
Aguilar, M. C. (1978). Método para leer y escribir música a partir de la percepción. Bs. As.: María del
Carmen Aguilar Ed.
Bamberger, J. (1991). The Mind behind the Musical Ear. Cambridge, MA: Harvard University Press.
Bigand, E. (1993). The influence of implicit harmony, rhythm and musical training on the abstraction of
"tension - relaxation schemas" in tonal musical phrases. Contemporary Music Review, 9 (1-2), 123-37.
Butler, D. (1997). Why the Gulf Between Music Perception Research and Aural Training? Bulletin of the
Council for Research in Music Education, 123, 38-48.
Carlson, R., Friberg, A., Frydén, L., Granström, B. & Sundberg, J. (1989). Speech and music performance:
parallels and contrasts. Contemporary Music Review, 4, 389-402.
Chang, H. W. & Trehub, S. E. (1977a). Auditory processing of relational information by young infants.
Papousek, M. (1987). Melodies in motherese in tonal and nontonal languages: Mandarin Chinese, Caucasian
American, and German. Presentation at the Ninth Biennial Meeting of the International Society for the Study
of Behavioural Development, Tokyo, Japan, July 1987.
Pinker, S. (1997). How the Mind Works. Allen Lane The Penguin Press.
Povel, D. J. & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics, 30,
565-72.
Pynte, J. (1998). The Role of Prosody in Semantic Interpretation. Music Perception, 16 (1), 79-98.
Rakowski, A. (1999). Perceptual dimensions of pitch and their appearance in the phonological system of
music. Musicae Scientiae, 3 (1), 23-39.
Serafine, M. L. (1988). Music as Cognition. New York: Columbia University Press.
Serafine, M. L., Glassman, N. & Overbeeke, C. (1989). The Cognitive Reality of Hierarchic Structure in
Music. Music Perception, 6 (4), 397-430.
Terry, P. (1994). Musical Notation in Secondary Education: Some Aspects of Theory and Practice. British
Journal of Music Education, 11, 99-111.
Thorpe, L. A. & Trehub, S. E. (1989). Duration illusion and auditory grouping in infancy. Developmental
Psychology, 25, 122-7.
Thorpe, L. A., Trehub, S. E., Morrongiello, B. A. & Bull, D. (1988). Perceptual grouping by infants and
preschool children. Developmental Psychology, 24, 484-91.
Trehub, S., Schellenberg, E. & Hill, D. (1997). The origins of music perception and cognition: A
developmental perspective. In I. Deliège and J. A. Sloboda (Eds.) Perception and Cognition of Music, pp.
103-28. Hove: Psychology Press.
Upitis, R. (1987). Children’s Understanding of Rhythm: The Relationship between Development and Music
Training. Psychomusicology, 7 (1), 41-60.
Proceedings paper
Introduction
An established approach to the study of instrumental teaching is to investigate the interaction between
teacher and student. Some research has focused on student and teacher behaviour in this situation
(e.g. Hepler 1986, Persson 1994a, 1994b); other studies have examined the teaching of expert teachers
in order to describe the methods and strategies used in their instruction (e.g. Kennel 1997, Gholson
1998). Further studies have dealt with the effectiveness and evaluation of instrumental teaching (e.g.
Abeles 1975, Rosenthal 1984).
This research has contributed interesting perspectives and insights into what happens in the
teaching-learning situation. What these studies have in common, however, is that they treat
student-teacher interaction as an isolated practice. Teacher and student behaviour are analysed as a
two-part relationship, and the strategies of teaching practice are investigated through what may be
called an individualistic, single-context model. In my view, what has received little attention is how
this kind of teaching practice is socially and historically bounded. We know little about what shapes
the teaching activity and what cultural factors determine the choices made - in other words, what
makes the teaching the way it is. Understanding the positions of student and teacher in this kind of
pedagogic practice, as well as the logic and intentionality of their interaction, depends on viewing the
activity from a relational perspective. By understanding instrumental teaching as cultural practice,
what takes place in the teaching-learning situation can be examined in relation to professional
standards, values and practices in the wider music community. This paper discusses a theoretical basis
for understanding instrumental teaching as cultural practice, and gives examples of research questions
that are worth elucidating within such a theoretical framework.
For me, recent research on apprenticeship and on learning as situated in social practices (Lave &
Wenger 1991, Nielsen & Kvale 1997) has been an important gateway to this field. I will therefore start
by giving a short description of learning within a master-apprentice construction, and then bring these
concepts into the further discussion of instrumental teaching as cultural practice.
Within music education there is a need for studies in all fields and contexts. Naturally, choices and
limitations have to be made for individual research projects, but an important challenge is to maintain
a relational view of the totality even when focusing on particular parts of the activity. From now on I
will concentrate on teaching within this kind of pedagogical construction, more specifically on
principal-instrument teaching within higher music education. Given my own background and
experience, I have primarily the Norwegian education of classical musicians in mind.
What the student and teacher bring into the teaching-learning situation will determine what can
happen there. Among other things, the degree of agreement between the participants' frames of
reference will be important for their communication. My experience is that there is often remarkable
agreement between student and teacher in their understanding and expectations of the activity of
instrumental teaching. This suggests that many aspects of the practice are taken for granted by those
involved, which in turn may be traced to established cultural practices that have developed over time
and thereby achieved a degree of stability in the community.
According to Bourdieu's theory, cultural practices live within social fields, through ways of thinking
and behaving, rituals, myths and standards inscribed in everyday practice. A social field may be
understood as a network of relations between individual practitioners, groups of practitioners and
institutions, and is characterised by an ongoing social struggle between parties and groups in the field
over the power to define which values, standards and rules should be regarded as fundamental to the
social practice (Bourdieu and Wacquant 1992). The professional music community, for example, can
be regarded as such a field, with established ensembles, concert organisers, the record industry and
educational institutions among its central agencies. Viewing higher music education in this
perspective raises the question of how the structure and dynamics of this field create possibilities and
limitations for the content and form of the education (compare Krüger 1994). Which parties and
groups have the legitimacy to influence the education? How do different practices in the field
contribute to producing and maintaining conceptions of, for example, good music and professional
performance? For our current topic, how do these mechanisms give form to instrumental teaching?
And, in turn, how does this teaching practice influence other practices in the field?
For Bourdieu, central premises are that actions and social practices are basically relational, and that
the individual is constituted through cultural practices. The individual is also an agent of such
practices: the individual acts "on behalf of" a wider structure, and the culture in a sense operates
through its participants and is kept alive, or continued, through their actions. Instrumental teaching
may be seen as a micro-practice within a social field, where student and teacher also act as cultural
agents who, through their teaching-learning activity, contribute to the furtherance of particular
cultural practices. A master-apprentice relationship will often be asymmetric, since the teacher serves
as an agent for the professional practice in which the student wants to participate. It may well be this
familiarity with the culture that constitutes the teacher's authority and position as a master of the
discipline. In apprenticeship learning, the visibility of the profession is regarded as an important
advantage, as the apprentice has the opportunity to observe professional practitioners of the discipline
as a basis for his or her own learning process (Nielsen & Kvale 1999).
Among other things, the institutional culture operates through the language used, in speech as well as
in written curricula and other documents (Säljö 1992). The culture may also come into view through
procedures of employment, for example in the description of a position. Furthermore, the didactic
structure of the education, for instance the apprenticeship way of organising teaching and learning, is
both an effect and a producer of the institutional culture. Higher music education can be viewed as
organised trajectories of participation, where aspects of the institutional culture may determine the
social practices in which students can take part and to which they can relate their experiences.
In order to grasp the mechanisms of institutional cultures, Bernstein's notion of educational codes may
be a useful approach. Bernstein uses the concepts of framing and classification as central dimensions
of an educational code. Classification refers to the degree of separation between different subjects,
while framing concerns the extent to which teacher and student themselves can control the selection
and organisation of subject matter in the teaching-learning process (Bernstein 1971). He also
distinguishes integrated and collective codes as two general codes of knowledge, characterised by
weak and strong classification respectively. Such concepts may be useful for investigating the
horizontal and vertical relations of the discipline of instrumental teaching. Relevant aspects to study
in this connection are the maintenance of dividing lines towards other subjects in the education, and
the particular teaching-learning practice in relation to the degree of external influence exerted through
the written curriculum and other administrative documents and rules. As an example, Kingsbury
(1988) asserts that teaching at an American conservatoire seemed to a great extent to be a matter
between the individual teacher and his or her students, with staff or other teachers only infrequently
interfering. To what extent does this also describe European music education? Further, this way of
thinking can be elaborated into a wider social field: what is the relationship between the musical
profession in general and the teaching practice within the education? There may be a connection
between these levels; the teaching could, for instance, remain rather private precisely because the
professional practice of orchestral musicians is quite strongly regulated. If there is a high degree of
congruence among the cultural practices for which the teachers serve as agents, this might reduce the
need for institutional regulation of individual practice. A space for variation can then arise: when
there is unity concerning the content of the message, teaching practices that appear different may
nevertheless socialise students into similar professional practices.
Understanding instrumental teaching as cultural practice raises research questions concerning the
relationship between the participants in the teaching-learning process, how this practice is situated
within an institutional culture, and how instrumental teaching in turn takes shape through its
interrelations with other institutions and with musical and social practices within the social field. It is,
however, hard to see how empirical data from all these arenas could be handled in a single research
project. An important challenge is therefore to develop analytical approaches capable of grasping
such research questions. We need to find a way to "open up" micro-practices within a social field,
such as instrumental teaching, in order to "read" relational mechanisms out of the interactions that
take place in this practice.
Discourses may be understood as collective conceptions that are kept alive through social practice.
Such conceptions may concern ideas about mechanisms for inclusion into the professional
community, about the central knowledge of the discipline, and about how professional competency is
acquired and developed. They may include ideas about the need for a special kind of talent, about
criteria for assessing repertoire and performance, and about the weighting of performing ability
against creative ability. The discourses thus make up a repertoire for action. They set boundaries for
what is meaningful in various situations, and therefore also for what may happen in a
teaching-learning situation.
Säljö (1999) asserts that human knowledge development is largely a question of learning to master
discourses. In the education of musicians, one can say that students develop professional competency
by relating to, internalising and further developing the dominant discourses of their profession. They
gain access to these discourses through their interaction with the principal-instrument teacher, but
also through their encounters with other students and with actors and practices within the musical
community. The discourses they encounter, and through which meaning is generated from
experience, largely determine the students' learning and professional growth. With a focus on the
teaching situation itself, this becomes a question of what discursive practice the teacher brings to the
teaching. The discourses the teacher draws upon are manifested in his or her actions - verbal and
non-verbal - and in the relationship between them; the way instructions are given, the use of musical
examples, and the way the teacher positions himself or herself physically in relation to the student
can all be regarded as a "staging" of discourses. Thus the teaching can be regarded as "an ensemble of
discursive practices" (Krüger 1999). Pedagogical practice in such a one-to-one situation should
nevertheless be understood as social interaction. From the perspective of instrumental teaching at a
higher level, it is of special interest to see how this interaction is constructed in the encounter between
the discursive practice of the teacher and that of the student. Both the fact that the teaching takes
place individually, and the fact that it is an education of practitioners involving mature students, may
make the student's influence on this construction stronger than it would be in an ordinary classroom
situation.
Using discourse as an analytical concept in research implies investigating how the participants
construct the didactical space within which learning can occur. It means making explicit the
discursive rules and structures that organise the practice. How does one proceed? First, it is necessary
to gain insight into the practice as it is actually performed, not only as it is talked about and described
by the participants; observation is therefore fundamental. However, interviews may also provide
important insight, especially into how those involved think about and experience their situation. A
combination of interview and observational data may therefore provide a sound basis for the analysis.
Both interviews and observation should be relatively open in form, so that the persons interviewed
and observed have the opportunity to express themselves and act as naturally as possible. The
researcher should avoid forcing onto the material pre-determined categories that have little grounding
in the practice under study.
Traditionally, discourse analysis as a methodological approach has focused on verbal language and
other sign systems of social practice. Epistemologically, the method is grounded in social
constructionism and in more recent philosophy of language. The argument has been made that
language is not only shaped by our conceptions of reality, but that language itself in fact shapes that
reality. Linguistic analysis is one possible approach: concepts that appear central to the interaction,
the use of metaphors, and the use of personal pronouns may all express basic conceptions of the
subject and of the participants' experience of their position in the community of practice.
However, in research on instrumental teaching a focus on verbal language alone may significantly
limit the understanding of what actually takes place in the interaction. In this kind of practice, verbal
comments, gestures and musical expressions are typically woven together to form meaningful
statements. To understand how meaning is shaped in the teaching-learning situation and how the
interaction is constructed, it is therefore important to consider all kinds of social action. In order to
reveal discursive patterns, the following questions should be raised: What topics can be identified in
the interaction? What actions are repeated, and what patterns organise them? When, and how, does
the teacher intervene in the student's playing? What examples are used, and what narratives from
professional practice are brought into the situation? What is made the object of explicit discussion
and choice, and what seems to be taken for granted by the participants? Such questions may reveal
basic ideas and styles of reasoning about the subject and about the musical profession. They may
open up instrumental teaching as a micro-practice within a social field, to be explored in relation to
other practices and discourses that operate within the music community in general.
Epilogue
Within the discipline of instrumental teaching there will be a need for research at all levels and in all
arenas. The theoretical perspective discussed in this paper might primarily contribute by opening up
cultural practices and exposing aspects and mechanisms that are taken for granted, as a basis for
reflection on and refinement of the discipline. For some time now there has been a focus on the
importance of educating reflective practitioners, and of encouraging professional practitioners to
reflect upon the knowledge, values and norms embedded in their work (Schön 1983, 1987). There
are, however, reasons to question whether the determining mechanisms of this kind of social practice
can be grasped through individually based reflection. In many cases this is not only a question of the
individual's personal theory of practice, but also of ideas and conceptions that operate within the
institutions, the systems, the culture:
"If we are to engage in something called action research or "reflective teaching", we need
to ask what systems of ideas organize how we construct the objects that we are calling
schooling, children, teaching, learning, and so on. (...) The ordering of reason through the
commonsense and ordinary languages of schooling needs to be brought into focus."
(Popkewitz 1993: 27)
By focusing on collective mechanisms rather than on the individual, we might also gain new insight
into the individual participant's actions in the teaching-learning situation. For example, we could
better understand how the instrumental teacher constructs the didactical space through his or her
enactment of cultural or discursive practices. Returning to the matter of apprenticeship learning, the
following questions arise: how do the social practices of music education create possibilities and
limitations concerning the students' (a) participation in professional communities of practice, (b)
access to examples, narratives and models of professional practice, and (c) process of acquiring a
professional identity? This is not only a question of how a musical profession is learned, but also of
what kind of musician one becomes: what kinds of participation, knowledge and musical preferences
are given legitimacy in the education and are thereby communicated in the social practices of
instrumental teaching. From my point of view, this approach is an important supplement to studies in
which teacher and student behaviour are investigated on the basis of a single-context model.
References
Abeles, H. F. (1975). Student perceptions of characteristics of effective applied music instructors.
Journal of Research in Music Education, 23 (2), 147-154.
Bernstein, B. (1971). On the classification and framing of educational knowledge. In Class, Codes
and Control. London: Routledge and Kegan Paul.
Bourdieu, P. & Wacquant, L. (1992). An Invitation to Reflexive Sociology. Chicago: The University
of Chicago Press.
Gholson, S. (1998). Proximal positioning: A strategy of practice in violin pedagogy. Journal of
Research in Music Education, 46 (4), 535-545.
Hepler, L. E. (1986). The measurement of teacher/student interaction in private music lessons and its
relation to teacher field dependence/independence. Dissertation Abstracts International, 47, 2939-A.
Jørgensen, M. W. & Phillips, L. (1999). Diskursanalyse som teori og metode [Discourse analysis as
theory and method]. Frederiksberg: Roskilde Universitetsforlag.
Kennel, R. (1997). Teaching music one-to-one: A case study. Dialogue, 21, 69-81.
Kingsbury, H. (1988). Music, Talent and Performance: A Conservatory Cultural System.
Philadelphia: Temple University Press.
Krüger, T. (1994). Musikklærerutdanningen som et sosialt felt [Music teacher education as a social
field]. In P. Dyndahl & Ø. Varkøy (Eds.), Musikkpedagogiske perspektiver. Oslo: Ad Notam
Gyldendal.
Krüger, T. (1999). Undervisning som et ensemble av diskursive praksiser [Teaching as an ensemble
of discursive practices]. In Pedagogikk - normalvitenskap eller lappeteppe? Rapport: 7. nasjonale
fagkonferanse i pedagogikk, bind 2. Forskningsrapport 43/1999, pp. 219-225. Lillehammer:
Lillehammer College.
Lave, J. & Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge:
Cambridge University Press.
Nielsen, K. (1998). Apprenticeship in Music: Learning at the Academy of Music as Socially Situated.
Doctoral dissertation, Aarhus University, Institute of Psychology.
Nielsen, K. & Kvale, S. (1997). Current issues of apprenticeship. Nordic Journal of Educational
Research, 17, 130-139.
Nielsen, K. & Kvale, S. (Eds.) (1999). Mesterlære: Læring som sosial praksis [Apprenticeship:
Learning as social practice]. Oslo: Ad Notam Gyldendal.
Persson, R. S. (1994a). Concert musicians as teachers: On good intentions falling short. European
Journal for High Ability, 5 (1), 79-91.
Persson, R. S. (1994b). Control before shape - on mastering the clarinet: A case study on
commonsense teaching. British Journal of Music Education, 11, 223-238.
Popkewitz, T. S. (1993). Professionalization in teaching and teacher education: Some notes on its
history, ideology and potential. Paper, also published in Teaching and Teacher Education, 1994,
10 (1).
Rosenthal, R. K. (1984). The relative effects of guided model, model only, guide only, and practice
only treatments on the accuracy of advanced instrumentalists' musical performance. Journal of
Research in Music Education, 32 (4), 265-273.
Schön, D. A. (1983). The Reflective Practitioner: How Professionals Think in Action. Arena/Ashgate
Publishing Limited.
Schön, D. A. (1987). Educating the Reflective Practitioner: Towards a New Design for Teaching and
Learning in the Professions. San Francisco: Jossey-Bass.
Säljö, R. (1992). Institutioner, professioner och språk [Institutions, professions and language].
Unpublished paper.
Säljö, R. (1999). Kommunikation som arena för handling - lärande i ett diskursivt perspektiv
[Communication as an arena for action: learning in a discursive perspective]. In C. A. Säfström &
L. Östman (Eds.), Textanalys: Introduktion till syftesrelaterad kritik. Lund: Studentlitteratur.
Proceedings paper
Abstract
Recent research in the psychology of music has helped to form a clearer picture of the processes involved in learning
a musical instrument, in particular of the need for both extrinsic support and intrinsic motivation to sustain
commitment to learning. The research presented in this paper brings a new dimension to that understanding, as it
captures the thoughts and ambitions of children and their parents before the child starts lessons, drawing on
qualitative interview data to gain a clearer picture of what inspires a child to start learning an instrument in the first
place.
Introduction
Research in recent years has advanced our understanding of the processes involved in learning a musical instrument,
with notable investigations into motivation (O'Neill, 1996), practising strategies (Gruson, 1988; Hallam, 1997; 1998),
and the involvement of parents and teachers (Davidson, Howe & Sloboda, 1997). Much of this research has centred
on the established student population, profiling those children who have already begun lessons, through interview,
observation and skills analysis. As a result, we now have a clearer picture of the complex social, musical and
developmental questions involved in musical instrument learning, even though investigation of the initial motivations
and ambitions of learners has previously been largely retrospective.
The data presented here offer new insight into the stage just before learning commences, in which parents and
children are considering the task that they are about to undertake and predicting the level of commitment that learning
a musical instrument will involve. Quantitative analysis of data from the same longitudinal study has already revealed
that children bring to their music instruction expectations and values that potentially shape and influence their
subsequent development. If this preliminary evidence (McPherson, 2000) is correct, then children as young as eight
are able to differentiate between their interest in learning a musical instrument, the importance to them of being good
at music, whether they think that their learning will be useful to their short and long-term goals, and also the cost of
their participation, in terms of the effort needed to continue improving. Interestingly, children who displayed
short-term commitment to learning their instrument achieved at a lower level, irrespective of whether they were
undertaking low, moderate or high levels of musical practice; students who expressed medium-term commitment
achieved higher average scores, which increased according to the amount of their practice during the period studied.
The highest achieving students were those who displayed long-term commitment to playing coupled with high levels
of practice.
These results are consistent with findings in more general educational research (Eccles, Wigfield & Schiefele, 1998;
Wigfield et al., 1997), which show that young children's beliefs about their own personal competence and their
valuing of an activity predict how much effort they will exert on a task, their subsequent performance, and their
feelings of self-worth, even after previous performance is controlled for.
Evidence to date shows that children are able to distinguish between what they like or think is important for them
against perceptions of their own competence in a particular field (Eccles et al., 1993; Wigfield, 1994), and that
expectancy beliefs, such as self-concept, ability perceptions, and expectations for success, are effective predictors of
achievement (Wigfield & Eccles, 1992).
Methodology
The families participating in this longitudinal study include 156 young brass and woodwind players taken from eight
different primary schools in Sydney, each of which has an established instrumental teaching and school band
programme. The cohort was chosen with a balance of gender, socio-economic status and school background, and
reflects the variety of backgrounds and experiences that children bring to musical instrument learning. Over the three
year study, parents and children participated in regular interviews. The children also worked with the investigator in
research sessions that mapped out their progress across a number of performance areas, including sightreading,
performing rehearsed repertoire, memorising, playing by ear and improvisation. For the purposes of this paper,
qualitative data are drawn from the initial interviews with parents and children conducted immediately before the
child started instrumental lessons. In these interviews, participants were asked to state their reasons for beginning
musical instrument tuition, and to predict the effort and involvement that this tuition would require.
The results have been analysed for emergent themes, allowing an exploration of the different perceptions held by
parents and children about what learning an instrument and participating in the school band actually entails. For the
purposes of this paper, the focus is on the types of reasons given by parents and children, rather than their validity as
a predictor of success. It will become apparent that the perspective on learning held by the two generations can often
be significantly different, but further analysis will be necessary to assess whether these differences are in any way
responsible for dropout rates and loss of motivation.
Results
Taking an overall view of children's and parents' responses to questions about motivation and expectations on starting
to play an instrument reveals many emerging links across and within the two groups. Unsurprisingly, parents tend to
be more articulate about their educational reasons for encouraging their child to start instrumental lessons and join the
band - two activities that are connected for the majority of the children in the survey - whereas children's comments
focus more on having fun and enjoying the experience of being in the school band.
Extracting themes by frequency of occurrence, the responses can be broadly grouped in the following categories:
Table A: Children's responses

2. Following a role model or peer group
   'My sister is in school band and she says it's really fun'
   'Lots of other people are doing it - my friends are all in band'
   'My brother suggested it because he plays drum kit and we could form a band'

3. Fulfilling an ambition
   'Since year one I've always wanted to play trumpet'
   'I wanted to play it since last year when I saw the band playing'

4. Following parental advice or expectations
   'Mum thought it would be really nice - she likes hearing instruments and she likes teaching them'
   'My last year's school report said my music was bad so Mum said I should do the band to help me'

5. Extra-musical benefits
   'The band camps sounded interesting - being away from home and trying out different sorts of foods'
   'I like the way that everyone has to wake up early [for band rehearsals] and we don't have an excuse to wake up
   early in the morning unless I have to go somewhere'
   'It's a nice heavy instrument so you can like get muscles and stuff carrying it around'
   'I wanted to get one of those little trolleys so that I could use it for other stuff too'

Table B: Parents' responses

1. Decision generated by the child
   'We're not a music learning family - I'd never offered it to him'
   'He was hanging out to be old enough to join. We discussed it with him and decided to let him, to help boost his
   confidence, which has worked'
   'It's not something I would have chosen for her, but she convinced me. I'm happy and I hope it lasts'

2. Following a role model or peer group
   'He knew what band was about because his brother's been in it for three years, so he's been looking forward to
   joining'
   'She's been to many band practices with her brothers and always assumed she'd join too'
   'It came about through her sister learning, and seeing Lisa Simpson play sax'

3. Following school expectations
   'There's never been any question about it, it's always been assumed she'd join when she got to Year 3 - it's so
   much a part of the school'
   'No real decision - it's part of the school ethos'

4. Following parental expectations
   'We talked about it as a family, the cost and commitment, and said we wanted her to stick with it for a year'
   'We just presumed she would, and she just expected to go in the band'
   'It was always expected by her and by us. If she'd said she didn't want to, we would have had to look seriously at
   the family dynamics'
Discussion
A surface-level analysis reveals immediately that for children who are starting musical instrument tuition and band
membership the focus is on having fun and being with friends, whereas for parents, a longer-term perspective on the
educational value is prominent, along with a much clearer acceptance that band participation is an accepted part of
the primary school programme. In other words, the decision seems more momentous to the child, who has often been
waiting for some time to join the band and anticipates gaining enjoyment and satisfaction from doing so. For parents,
by contrast, working from the broader perspective of what other children and their own offspring have previously
done, learning an instrument is often seen as a 'foregone conclusion', expected by them or by the school. Thus parents and their children
perceive the significance of the decision differently, creating a potential conflict before the instrumental tuition has
even begun. Typically, parents are more aware of the serious implications of learning, including the costs and
commitment to practice and rehearsals, whereas children are looking forward to gaining enjoyment from being a
member of the band. Once again, the two parties are starting out with different expectations, where the child
anticipates fairly immediate, pleasurable results, whilst the parent looks to the possible challenges of sustaining
motivation beyond the initial enthusiasm. In some families that were interviewed, deliberate efforts had been made by
the parents to point out these longer term consequences to the child through negotiating expected practice times, or
stipulating that the instrument should be tried for at least a year.
There are also differences in the way that parents and children perceive the school opportunities, which are usually
centred around band rehearsals, often held weekly before school. Many children cite the experience of having heard
the band as being a motivation for taking up an instrument, and whether they have connected with the sounds they
have heard or the prospect of going on band camp, it is clear that membership of the band is an important part of
school for them. Parental reactions to this question differ, with some parents sharing the children's
impatience to be in the right year for joining the band, and others feeling somewhat pressured into allowing their
child to participate. Particularly where the child is the first of the family to be involved, some parents claimed to be
unaware of the opportunities available until their child came home with information, and as a result those parents
expressed concerns about the costs and commitment involved. Others acknowledge the prominence of the band in the
school's ethos, and appear to have received clear information from an induction evening held in school. Getting
accurate information across to all parents is always difficult, but in this case it serves as an illustration of the different
expectations and levels of support that different children and their families bring to the band.
Children and parents involved in our study were also asked about the child's choice of instrument, and their responses
demonstrated that children, in particular, often had very clear ideas about the instrument that they wanted to play.
Their reasons ranged from having seen a sibling, friend or public figure play the instrument, to liking the sound or
some physical feature of the instrument. Parents' reasons ranged across many categories, but were often more
practical than the children's, including considerations of how loud the instrument would sound in a small flat, or how
easily the child would be able to carry it on public transport. All these reasons were overshadowed by the audition
process in operation at most of the primary schools involved, where children would be tested on different instruments
and informed of their suitability. Many children therefore ended up playing instruments other than the one they had
been hoping for, and although most parents and children seemed to be resigned to the school's choice, it is possible
that the mismatch between the ideal choice and the reality could put a dampener on their initial enthusiasm.
Most parents anticipated that their child's 'worst thing' would be practising, and indeed, quantitative analysis of data
from the same study has revealed that unrealistic expectations about practice can predict failure to continue beyond
the first nine months of tuition on the instrument (McPherson, 2000). These effects were particularly marked where
the mother had not previously learnt a musical instrument, suggesting that parents learn from their own and other
children's experiences, placing only children and eldest children from 'non-performing' families at something of a
disadvantage. The reluctance of some of these mothers to allow their children to become involved in band comes
across clearly in the qualitative data, where they express concerns about the pressures and commitments involved in
learning, contrasting with those parents who are more relaxed about letting their children take the opportunities in the
hope of them being of benefit.
Conclusions
From the data presented here, parental and child attitudes to learning can be seen to be varied and strongly held,
painting a vivid picture of the different expectations that children and their families bring to instrumental learning.
These results do not attempt to analyse the direct effects of different attitudes upon children's success in learning, as
the relationship between such complex factors is beyond the scope of this paper. However, it is evident that this
previously neglected area of research has much to offer our understanding of instrumental learning, motivation and
musical perception amongst young children.
Broadly speaking, it is apparent that the majority of parents have a clearer idea of the potentially negative aspects of
instrumental learning, such as cost, commitment and practice, than do their children, who are more aware of the
elements of sociability and fun involved in being in the band. It could be argued that to make children more aware of
the realities of learning an instrument, as some of the parents in our study tried to do, could be potentially off-putting,
squashing the enthusiasm that sustains most of the children through the initial stages of learning. We would suggest
that more useful intervention could be made after the first few months of learning, to address the difficulties that arise
when the less committed novices encounter the inevitable need for increased effort for less reward, having got past
the initial stages of learning. Identifying and making goals more manageable at this stage would help to give children
a clearer perspective of the task that they had embarked upon, whilst educating parents in strategies for supporting
practice would create closer links between parental and child expectations of learning. It is certainly the case that
further investigation is necessary in order to foster the ideals and hopes that children bring to learning, and to sustain
their enjoyment and development of their instrument, and of music.
References
Davidson, J. W., Howe, M. J. A. & Sloboda, J. A. (1997) 'Environmental factors in the development of musical
performance skill in the first twenty years of life', in Hargreaves & North (Eds), The Social Psychology of
Music (pp. 188-206). Oxford: Oxford University Press.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in children's self-
and task perceptions during elementary school. Child Development, 64, 830-847.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In W. Damon (Series Ed.) and N.
Eisenberg (Ed.), Handbook of child psychology (5th ed., Vol. 3): Social, emotional and personality
development. (pp. 1017-1095). New York: Wiley.
Gruson, M. L. (1988) 'Rehearsal skill and musical competence: Does practice make perfect?', in Sloboda (Ed.)
Generative Processes in Music: The psychology of performance, improvisation and composition. Oxford:
Clarendon Press.
Hallam, S. (1997) 'Approaches to instrumental practice of experts and novices: Implications for education', in
Jørgensen & Lehmann (Eds), Does practice make perfect? Current theory and research on instrumental music
practice (pp. 89-107). Oslo: Norges Musikkhøgskole.
Hallam, S. (1998) Instrumental Teaching: A practical guide to better teaching and learning. Oxford:
Heinemann.
McPherson, G. E. (2000). Commitment and practice: Key ingredients for achievement during the early stages
of learning a musical instrument. Paper to be presented at the Eighteenth International Research Seminar of the
International Society for Music Education, Salt Lake City, July, 2000.
O'Neill, S. A. (1996) Factors influencing children's motivation and achievement during the first year of
instrumental music tuition. PhD thesis, University of Keele.
Wigfield, A., & Eccles, J. S. (1992). The development of achievement task values: A theoretical analysis.
Developmental Review, 12, 265-310.
Back to index
Proceedings abstract
Is the Mozart effect "debunked"?
Frances H. Rauscher
Background:
The finding of a significant difference in spatial-temporal task scores for college students who listened
to ten minutes of a Mozart sonata compared to relaxation instructions or silence has been challenged.
Although the effect has been replicated ten times in seven autonomous laboratories, three recent
reports by Steele and his colleagues have generated media accounts claiming that the "Mozart effect"
is "debunked."
Aims:
The goal is to review the literature on the Mozart effect, and to place the work in context.
Main contributions:
The Mozart effect has been investigated by neuroscientists, psychologists, and educators. The original
studies were motivated by a neural network model of higher brain function developed by Gordon
Shaw and his colleagues. Shaw's model predicted that specific music might excite cortical firing
patterns used for spatial-temporal task performance. Some psychologists attempting to replicate the
work, however, were largely unfamiliar with Shaw's model and its predictions, and mistakenly
employed a battery of inappropriate dependent measures. Relevant experimental considerations (i.e.,
practice effects, task difficulty, experimenter effects, task-to-task priming, and attention) were also
largely overlooked. Researchers using appropriate dependent measures and experimental designs have
largely succeeded in replicating the effect. The Mozart effect is further supported by research on the
effects of early music instruction on spatial-temporal task performance, by studies with Alzheimer's patients,
rats, and epileptics, and by studies using EEG and fMRI.
Implications:
The Mozart effect is not "debunked." It is based on sound neuroscientific principles and is supported
by psychological and educational research. Researchers should examine the converging evidence
before attempting replications.
Back to index
Proceedings paper
The Influence of Parental Attitudes and Support on Children's Engagement in Instrumental Music
Katherine J. Ryan and Susan A. O'Neill
Department of Psychology, Keele University
Introduction
Although only limited research investigating participation and motivation has been carried out in the music domain,
this area has received growing interest across other domains. During the last fifty years a substantial amount of
research has been performed examining the role of specific parenting practices and beliefs on children's motivation and
performance outcomes (for an overview see Eccles, Wigfield & Schiefele, 1998). The finding that parents' educational
expectations and beliefs affect their children's educational aspirations has been replicated on a variety of age groups,
nationalities, and races (Seginer, 1983). Researchers who have assessed parents' beliefs about their children's abilities have
found that parents are reasonably accurate at estimating their children's general abilities (Galper, Wigfield & Seefeldt
1997). Furthermore, parents' beliefs about their children's abilities have been found to influence children's performance in
school. Hess, Holloway, Dickson, and Price (1984) found that mothers' expectations for their children's academic
performance predicted their children's reading readiness scores. Other research has established a positive relationship
between parents' expectations for their children's achievement behaviours and children's actual behaviours (e.g. Crandall
1969; Winterbottom, 1958).
Eccles suggests that parents have the biggest impact as conveyors of expectancies regarding their children's abilities. Eccles
[Parsons], Adler & Kaczala (1982) found that parents' beliefs about their children's maths ability had a stronger influence
on children's own maths ability beliefs than either parents' role modelling of different activities or the children's own grades
in school. These findings were repeated in a more recent study (Frome & Eccles, 1998) confirming earlier hypotheses that
parents act as expectancy socialisers for their children and that children's self-perceptions reflect children's and parents'
interpretation of reality in addition to reality itself. These findings highlight how influential parents' beliefs and expectations
are to their children's own beliefs. It also suggests that in the case of a voluntary activity, such as instrumental music, these
beliefs may have a considerable impact on the level of support provided.
Much of the related research in the music domain has focused on understanding and predicting the development of musical
achievement and performance skills. Research suggests that parents provide an important source of motivation, by
generating and sustaining children's interest and commitment to music lessons, both initially and in the long-term (e.g.
Davidson, Howe, Moore & Sloboda, 1996; O'Neill, 1994; Sloboda & Howe, 1991; Sosniak, 1985, 1990). The active
support provided by parents, such as attending lessons, obtaining feedback from teachers and supervising daily practice has
been identified as a major contributory factor in the development of musical performance skills in successful musicians
(Davidson et al, 1996). Many high-achieving musicians reported that without this level of active parental intervention they
would not have spent so many hours practising. Indeed, in a study investigating children's motivation for instrumental
music, Yoon (1997) suggests that the parents' level of involvement in their children's musical activities can influence
children's perceptions of their parents' valuing of music, and their motivation for participating in the activity. The majority
of past research in this area has tended to focus on key behaviours (such as supervision of practice) rather than the influence
parental attitudes and expectations may have on young people's involvement in music. Yet according to Csikszentmihalyi,
Rathunde, and Whalen (1993), for young people to transform their potential into actuality, 'the support of family, in both
material and psychological terms, is essential'. They found that one of the key factors which separated 'talented' from
'average' young people in a variety of domains, including music, was that talented individuals tended to come from families
which provided a stable environment where individuals felt a sense of support, whilst at the same time members were
encouraged to develop their individuality by seeking out new challenges and opportunities. These findings are similar to
other research regarding the importance of parental influence in children's development across a variety of domains, such as
sport and maths (e.g. Eccles et al, 1998).
There is no doubt that parents play a significant role in children's motivation and participation in activities. Previous
research has demonstrated that the motivation to take up and persist with playing an instrument is inextricably linked to the
social and educational environment. However, very little empirical work has been done to investigate the combined influence of
parental attitudes and support on children's engagement in instrumental music. The present study aims to examine these
influences and the types of support most likely to lead to successful outcomes and high levels of engagement.
Method
Participants
The present study is linked to a longitudinal project investigating the social and motivational factors influencing young
people's participation and achievement in music. Parents of Year 6 children (aged 10-11 years) were recruited through the
33 primary schools in North Staffordshire taking part in the larger project.
Procedure
The parents completed a questionnaire designed to assess the influence of their child-specific beliefs and their support on
children's engagement in instrumental music. All items were answered using 7-point Likert scales, except where categorical
responses were required, such as "Does your child play a musical instrument?" (yes or no). The questionnaires were taken
home and later returned to school by the children. Five hundred and six questionnaires were completed and returned for
collection.
Measures
Musical Engagement
The children's level of musical engagement was measured by two factors. Firstly, the child's level of participation for which
parents were asked whether their child currently played, or had ever given up, a musical instrument. Secondly, parents of
children who currently played an instrument were asked to indicate the number of hours their child spent playing an
instrument in an average week, e.g. 1 hour or less, 2-3 hours, etc. The responses to these questions were categorical. (See
Appendix A for full list of items).
Child-Specific Beliefs
The parents' child-specific beliefs about music were measured by asking parents to rate their child's 'competence' at, and
'value' of, instrumental music, along with perceived 'effort' and their own 'expectation' for the child over the next year. For
example, 'How good do you think your child is at instrumental music?' with scale anchors of (1) not at all good to (7) very
good. All were single items, except for 'value' which had two items (liking and importance). These two items were
collapsed into a mean composite score for all analyses. (See Appendix B for scale and full anchor details.)
Parent-Specific Beliefs
The parents' beliefs about the support they offered were measured by asking parents to rate how much they felt their support
had improved their child's performance during the last year, and how confident they felt about their future support. For
example, 'How much do you think you have improved your child's performance in instrumental music during the last year?'
with scale anchors of (1) very little to (7) a great deal. (See Appendix C for scale and full anchor details.)
Level and Type of Support Provided
Parents of children who presently played an instrument were asked to rate the level of involvement and type of support they
provided for their child on a real-time scale. The 'Support' scale began with the phrase "How often do you...." with 7
real-time scale anchors from (1) never to (7) everyday (30 minutes or more). A principal component analysis with varimax
rotation confirmed that there were three factors, accounting for 66% of the variance. The three factors were interpreted as
encouragement (e.g. 'How often do you encourage your child to practise?'), negotiation (e.g. 'How often do you reward
your child for practising?') and active involvement (e.g. 'How often do you play an instrument with your child?'). Scale
scores were created from the mean composite scores for 'Encouragement' (6 items, α = .8775), 'Negotiation' (3 items, α
= .5739) and 'Active involvement' (2 items, α = .7231) and used in all further analyses. (See Appendix D for scale and full
anchor details.)
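The scale-construction procedure described above (checking internal consistency, then averaging items into a composite) can be illustrated with a minimal sketch. This is hypothetical: the ratings, the 3-item subscale, and the `cronbach_alpha` helper are invented for the example and are not the study's data or code.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 1-7 real-time ratings from five parents on a 3-item subscale
ratings = np.array([[7, 6, 7],
                    [2, 3, 2],
                    [5, 5, 6],
                    [1, 2, 1],
                    [6, 7, 6]])

alpha = cronbach_alpha(ratings)    # internal consistency of the subscale
composite = ratings.mean(axis=1)   # mean composite score per respondent
```

With items this strongly correlated, alpha approaches 1; composites like these are what enter the later ANOVAs and correlations.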
Results
The results are presented in four sections. First, descriptives of the children's level of engagement are presented. Secondly,
the influence of parents' child-specific beliefs on their child's level of engagement is described, followed by the influence
of parents' beliefs about their support. Finally, details of the level and type of support provided by parents and its
relation to children's level of engagement are presented.
Children's Level of Engagement
The children were assigned to a categorical cohort based on their parents' designation. The three cohorts are (1) Players
(children who presently play an instrument (n =267)), (2) Gave ups (children who had previously played an instrument but
given up (n=131)), and (3) Non-players (children who have never played an instrument (n=108)). These cohorts were used
in further analyses.
Parents' Child-specific Beliefs and Children's Level of Engagement
A series of one-way analyses of variance (ANOVA) identified significant differences between the parents' child-specific
beliefs for children in the different cohorts. A summary of the results, with means and standard deviations, is displayed
below in Table 1.
Table 1
Mean Scores and Standard Deviations for Parents' Child-Specific Beliefs and Cohort
Further post-hoc analyses of cohort identified that parents of children who currently played an instrument believed their
child to be significantly more competent than children who had given up or never played. Yet parents of children who had
given up still perceived their children to be more competent than children who have never played. Parents of children who
currently play also perceived their child as having a higher value for instrumental music, applying more effort and predicted
that they would do better in the following year than children who had given up or never played.
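A one-way ANOVA of the kind reported in these sections can be sketched as follows. The ratings and group sizes below are invented for illustration and do not reproduce the study's data.

```python
from scipy.stats import f_oneway

# Hypothetical 1-7 competence ratings grouped by the three cohorts
players = [6, 7, 5, 6, 7, 6]
gave_ups = [4, 5, 3, 4, 4, 5]
non_players = [2, 3, 2, 1, 3, 2]

# One-way ANOVA: do the cohort means differ more than chance would allow?
f_stat, p_value = f_oneway(players, gave_ups, non_players)
```

A significant F statistic only indicates that the cohorts differ somewhere; the paper's post-hoc comparisons are what locate which pairs of cohorts differ.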
Parent-Specific Beliefs about Their Own Support and Children's Musical Participation
One-way analyses of variance (ANOVA) identified significant differences between the parents' beliefs about their support
for children in different cohorts. A summary of the results, with means and standard deviations, is displayed below in
Table 2.
Table 2
Mean Scores and Standard Deviations for Parent-Specific Beliefs of Support and Cohort
                 Players        Gave ups       Non-players
                 M      SD      M      SD      M      SD       F        df    p
Current support  4.45   1.8     2.68   1.7     2.14   1.6      85.232   2     .0001
Future support   4.52   1.9     3.56   2.1     2.01   2.0      28.085   2     .0001
Further post-hoc analyses of cohort identified that parents of children who currently played an instrument believed they had
improved their child's performance at instrumental music significantly more than parents of children who had given up or
never played. They were also more confident in their ability to improve their child's performance in instrumental music
over the next year.
The Type and Level of Parental Support Provided and Children's Level of Engagement
To examine how the type of parental support provided influences children's level of engagement, correlations were
performed between the composite 'Support' scales and the number of hours children were reported to play during an
average week. 'Encouragement' support was found to be the most highly correlated to children's level of engagement in
playing an instrument (r = .408, p < .0001). Correlations between the number of hours played and the two other types of
support were very low, with only one reaching significance ('Active involvement' r = .140, p < .05; 'Negotiation' r = .051,
p = .432). This suggests that the 'Encouragement' type of support will be more likely to lead to successful outcomes and
higher levels of engagement.
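The correlational analysis can be sketched in the same way; the weekly hours and 'Encouragement' composite scores below are hypothetical values for illustration only.

```python
from scipy.stats import pearsonr

# Hypothetical weekly playing hours and 'Encouragement' composite scores
hours = [1, 1, 2, 3, 3, 4, 5, 6]
encouragement = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.5, 6.0]

# Pearson correlation between hours played and encouragement support
r, p = pearsonr(hours, encouragement)
```

As in the paper, a positive r indexes association only; it does not by itself establish whether support drives playing time or vice versa.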
Discussion
The significant effect of cohort identified in parents' child-specific beliefs confirms the important role of parental beliefs
and expectations on children's level of engagement. Parents of children who currently play instruments believe their child is
significantly more competent than parents of children who have given up or never played. It is to be expected that the
child's current, or previous, level of participation will influence the parents' beliefs about ability. This was demonstrated by
the parents of children who have given up also perceiving their children to be more competent than parents of children who
have never played. However, it could be argued that parents who have high beliefs about their child's abilities will convey a
high value for music and therefore indirectly, as well as actively, encourage the child in playing an instrument, resulting in
them applying more effort and becoming more competent. This is in line with previous research in other domains which
found that parents' beliefs and values have a strong influence on children's own beliefs and competence. It is also supported
here by the findings that parents of children who currently play also perceive their child to have a higher value for
instrumental music, apply more effort and predict them to do better in the following year than children who have given up
or never played.
Parents of children who currently play an instrument also believe they have improved their child's performance at
instrumental music during the last year significantly more than parents of children who have given up or never played. Due
to their children being currently involved it is not surprising that parents of players should hold this belief more strongly
than other parents. Parents who are actively involved in supporting their children are more likely to believe that their
support has improved their child's ability than parents who do not have the chance, or choose not to be involved. Parents of
children who currently play are also more confident in their ability to improve their child's performance in instrumental
music over the next year. As previous research has found that parents' level of involvement is perceived as an indication of
their valuing of music, which influences their child's motivation for participating, it is likely that children with parents who
are confident about their ability to provide support and assistance will continue playing longer than those whose parents are
not involved or are less confident in their ability to provide support.
Upon examination of the types of support parents provide, 'Encouragement' was found to be the most highly correlated to
children's level of engagement in playing an instrument. This suggests that support involving encouragement will be more
likely to lead to successful outcomes and higher levels of engagement. 'Encouragement' support included behaviours such
as providing praise, helping the child find time to practise, listening to practice, and attending performances. This finding
suggests that general support, such as encouraging practice, promotes the highest levels of engagement, and is in line with
previous research on higher achieving musicians. As this style of support does not require any musical skills, such as
reading music or playing an instrument, it can be provided by any parent, suggesting that it is possible for all parents to
support their children in a manner which can lead to successful outcomes. The findings contribute to the development of
theory by increasing our understanding of the ways in which parental attitudes and support might influence children's
motivation and engagement during the early stages of learning to play a musical instrument and indicate a direction for
future research in this area.
References
Crandall, V.C. (1969). Sex differences in expectancy of intellectual and academic reinforcement. In C.P. Smith (Ed.).
Achievement-related behaviours in children. New York: Russell Sage.
Csikszentmihalyi, M., Rathunde, K., and Whalen, S. (1993). Talented teenagers: The roots of success and failure.
Cambridge: Cambridge University Press.
Davidson, J.W., Howe, M.J.A., Moore, D.G, and Sloboda, J.A. (1996). The role of parental influences in the development
of musical ability. British Journal of Developmental Psychology, 14, 399-412.
Eccles Parsons, J.A., Adler, T.F., and Kaczala, C.M. (1982). Socialization of achievement attitudes and beliefs: Parental
influences. Child Development, 53, 310-321.
Eccles, J.A., Wigfield, A. and Schiefele, U. (1998). Motivation to succeed. In Damon, W. (Ed.). The Handbook of Child
Psychology, Vol. 3, 1017-1095.
Frome, P.M. and Eccles, J.A. (1998). Parents' influence on children's achievement-related perceptions. Journal of
Personality and Social Psychology, Vol. 74, 2, 435-452.
Galper, A., Wigfield, A., and Seefeldt, C. (1997). Head start parents' beliefs about their children's abilities, task values, and
performances on different activities. Child Development, Vol. 68, 5, 897-907.
Hess, R.D., Holloway, S.D., Dickson, W.P., and Price, G.L. (1984). Maternal variables as predictors of children's school
reading and later achievement in vocabulary and mathematics in sixth grade. Child Development, 59, 259-285.
O'Neill, S.A.(1994). Musical development: Aural. In A. Kemp (Ed.). Principles and processes of music teaching. Reading:
International Centre for Research in Music Education. pp. 1043.
Seginer, R. (1983). Parents' educational expectations and children's academic achievements: A literature review.
Merrill-Palmer Quarterly, Vol. 29, 1,1-23.
Sloboda, J.A. and Howe, M.J.A. (1991). Biographical precursors of musical excellence: An interview study. Psychology of
Music, 19, 3-21.
Sosniak, L.A.(1985). Learning to be a concert pianist. In B.S. Bloom (Ed.). Developing Talent in Young People. New York:
Ballantine.
Sosniak, L.A.(1990). The tortoise, the hare, and the development of talent. In M.J.A. Howe (Ed.). Encouraging the
Development of Exceptional Abilities and Talents. Leicester: The British Psychological Society.
Yoon, K.S. (1997). Exploring children's motivation for instrumental music. Paper presented at the biennial meeting of the
Society for Research in Child Development, Washington.
Winterbottom, M.R. (1958). The relation of need for achievement to learning experiences in independence and mastery. In
J.W. Atkinson (Ed.). Motives in fantasy, action, and society. Princeton, N.J.: Van Nostrand.
Back to index
Appendix A
Musical Engagement Items
Does your child play a musical instrument now? (please circle) YES NO
On average, how many hours each week does your child spend playing an instrument?
Appendix B
Child-Specific Beliefs
Appendix C
Parent-Specific Beliefs
Very little A great deal
Appendix D
Level and Type of Support Provided
How often do you........ Never A few A few Weekly A few Everyday Everyday
times a times a times a (less than (30 mins
year month week 30 mins) or more)
Proceedings paper
The social cost of expertise: personality differences in string players and their implications for the audition
process and musical training
Daina Stepanauskas
Background
Not only the number of applicants for places in an orchestra (Rinderspacher, 2000), but also their standard of
competence is rising from year to year. This study will examine personality differences in string players and the
consequences for the audition process and musical training which arise from these. The violin professor H.
Schneeberger (Noltensmeier, 1997) comments: "Twenty to twenty-five years ago, we could quite happily advise
violin players to apply for positions as concertmasters, but today we should really sometimes be telling them that
they should be glad to find a place in one of the last seats of a good orchestra". As those musicians who excel in
solo performance tend to be selected for the orchestra, good preparation for solo audition playing is the most
important part of musical training (Griffing, 1994).
However, orchestra members criticise the fact that auditionees neglect to prepare the excerpts (extracts from
orchestral works, employed in the audition process) thoroughly. In the final round of auditioning, an alarming
discrepancy emerges: musicians with highly-trained abilities in solo playing - indeed, the very best players, who
make it through to the final round of auditioning - are startlingly unprepared for the practical demands of
orchestral playing. Such orchestral work may at first glance appear less demanding than solo pieces; in
reality, however, the body of orchestral literature contains works of as great a difficulty, demanding just as
much skill (albeit of a somewhat different kind) as solo pieces.
The majority of successful auditionees fill tutti positions within the orchestra and are hence no longer required to
perform solos. Musicians experience this transition from an individualistic to a group role in very different ways.
Some are relieved at having successfully mastered the stressful audition period, whereas others have difficulties
integrating into their section and miss the challenge of solos. The award of a place in an orchestra is succeeded
by a one-year trial period, during which failure is more often due to such problems of integration than to
insufficient ability. Such cases demonstrate the importance of adaptation to the orchestral group situation for
successful and permanent integration into the orchestra.
Aims of the study
Popular belief among orchestra musicians has it that problems of integration and adaptation, and indeed greater individualism, occur more evidently and frequently among violinists than among other string players. I supposed
that such differences, if they exist, would surely be most evident between violinists and double bassists. The
reason for this supposition lies in the different roles these two instruments tend to take in orchestral music. The
parts double-bassists are required to play in art music are by no means solos, but rather supporting parts. The
instrument has a primarily accompanying function. A double-bassist will not generally have ambitions to be a
soloist. He or she is required to acclimatise him/herself to a supporting role within the group from the very
beginning. Are violinists, then, different from double-bassists? Or are particularly good musicians, regardless of
which instrument they play, more individualistic than their colleagues? Are there thus no differences between
players of each instrument? Or is it possibly the case that these differences are not to be found between either
instrument groups or different levels of musical competence, but that they rather occur equally distributed across
all instruments and competence levels? Each of these three possibilities has various arguments speaking for it.
But which one is in fact accurate? This question is the subject of my TICOM study (Test of Individualism and
Collectivism of Orchestra Musicians). The study examines personality traits which are closely connected to
individualism and collectivism. I here define individualism as a person's viewing liberty of self-development and
variety of life choices as a basic requirement for personal happiness and satisfaction. Collectivism is defined as
the sense of belonging to a group and requiring membership of this community as the basis for personal
happiness and satisfaction (Triandis, 1995; Triandis & Gelfand 1998). Individualism, then, would imply a greater
reluctance to work simply as a member of the group, here: the orchestra, and a pursuit of one's own personal
development.
Method
The final sample of the TICOM study, 121 music students from 12 German music academies, took part in the study in the winter semester 1998/99 and the summer semester 1999 by completing a questionnaire. The students were divided into groups according to whether their main instrument
was violin or double-bass, and according to level of competence into two further groups, the "best" and the
"good" group. The criterion for the "best" group was that a student had in the last three years, with their present
main instrument, taken part in at least one national or international competition, in which s/he had successfully
reached at least the second round. This resulted in four groups: best violinists, good violinists, best
double-bassists, and good double-bassists. The questionnaire consisted of particular scales based on the German
version of the Personality Research Form (PRF), one of the most commonly used personality questionnaires;
scales from the ITAM (International Test of Achievement Motivation), a newly developed personality
questionnaire; and scales based on Instrument 1 (I1) (Triandis, 1995). These scales measure particular personality traits (see Results). Each of these scales contained various items, i.e. statements the participant was required to rate on a seven-point Likert scale of agreement, running from 1 (strongly disagree) to 7 (strongly agree). The final analysis examined data from four categories: best violinists, good violinists, best double-bassists and good double-bassists. Subjects were grouped by two factors, each with two levels: "instrument" (violin/double-bass) and "competence" (best/good). For each of the scales (such as affiliation, dominance, etc.), a separate two-way AOV was carried out. The analysis of variance posed the following questions for each scale (dependent variable):
1. Are there significant differences in the mean scores for the personality traits between violinists and
double-bassists? (effect of factor "instrument")
2. Are there significant differences in the mean scores in the personality traits between the good and the best
students? (effect of factor "competence")
3. Finally, is there an interaction between the two main effects of instrument and competence?
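This 2 x 2 design can be sketched as follows. The code below runs a balanced two-way analysis of variance on synthetic seven-point ratings; the cell means, standard deviations, and group sizes are invented for illustration and are not the TICOM data:

```python
import numpy as np
from scipy import stats

# Illustrative two-way ANOVA on synthetic Likert-type scores (NOT the
# TICOM data): factor A = instrument (violin/double-bass),
# factor B = competence (best/good), n subjects per cell.
rng = np.random.default_rng(0)
n = 30
cells = {
    ("violin", "best"): rng.normal(5.0, 1.0, n),  # e.g. dominance scores
    ("violin", "good"): rng.normal(4.0, 1.0, n),
    ("bass",   "best"): rng.normal(4.2, 1.0, n),
    ("bass",   "good"): rng.normal(4.1, 1.0, n),
}

data = np.stack(list(cells.values())).reshape(2, 2, n)  # A x B x n
grand = data.mean()
a_means = data.mean(axis=(1, 2))        # instrument marginal means
b_means = data.mean(axis=(0, 2))        # competence marginal means
cell_means = data.mean(axis=2)

# Sums of squares for the two main effects, the interaction, and error
ss_a = 2 * n * np.sum((a_means - grand) ** 2)
ss_b = 2 * n * np.sum((b_means - grand) ** 2)
ss_ab = n * np.sum((cell_means - a_means[:, None] - b_means[None, :] + grand) ** 2)
ss_err = np.sum((data - cell_means[:, :, None]) ** 2)

df_err = 2 * 2 * (n - 1)
for name, ss in [("instrument", ss_a), ("competence", ss_b), ("interaction", ss_ab)]:
    F = (ss / 1) / (ss_err / df_err)      # each effect has 1 df in a 2x2 design
    p = stats.f.sf(F, 1, df_err)
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

Each of the three printed tests answers one of the questions above for a single dependent scale.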
Results
An effect of competence appeared in every trait where significant differences were discernible (see below). The "best" students demonstrated higher mean scores in every trait except affiliation, in which they had lower mean scores. Affiliation bears an inverse relation to the other traits studied; it implies a seeking out of the company of
other people and an ability to co-operate well with them, while the other traits measured imply higher
individualism and the assumption or acceptance of a competitive or hierarchical relation.
Traits in which significant differences were discernible:
Affiliation (PRF): high scores in this trait indicate a tendency to seek the company of other people.
Dominance (PRF): high scores indicate the tendency to aim to reach the top of a given hierarchy or occupy a leading
position in the group.
Exhibition (PRF): high scores indicate that the person enjoys being in the lime-light, receiving favourable attention,
etc.
Status orientation (ITAM): a consciousness of status and a consequent orientation of oneself according to it.
Competitiveness (ITAM): a need to compare oneself with, and the desire to achieve more than, others.
Vertical individualism (I1): emphasis on free choices and liberty of development, differing from horizontal individualism in
that vertical individualism implies an acceptance of hierarchical relations between individuals.
The main effect of competence was significant for affiliation (p<0.05), dominance (p<0.01), exhibition (p<0.01), status orientation (p<0.01), competitiveness (p<0.05) and vertical individualism (p<0.05). A main effect of instrument, however, occurred in only two traits: dominance (p<0.05) and exhibition (p<0.05). The third effect examined,
the effect of interaction between the two main effects of competence and instrument, likewise appeared in only
two traits, namely dominance (p<0.05) and status orientation (p<0.05). In interpreting the results, one should be aware that there is no one-to-one correspondence between statistical findings and substantive effects. To illustrate this, take the mean values for dominance as an example. The F-test yields significant values for instrument, competence, and the interaction term. Examination of the mean values, however, gives a different impression (see figure 1). Further analysis by the Newman-Keuls procedure suggests a contrast between the best violinists and the remaining three groups, which appear to be homogeneous.
The associated contrast vector is c=(3,-1,-1,-1), where the co-ordinates correspond to the groups in the order
given in the bar chart. It is plausible that this contrast is reflected by effects in all subtests of the two-way AOV
since their corresponding contrast vectors are (1,1,-1,-1) for instrument, (1,-1,1,-1) for competence, and (1,-1,-1,1) for the interaction, and each correlates positively with the vector c. At first glance, the significant result in the
instrument contrast seems to support the general belief among music students that violinists are less co-operative
than others. Obviously, this observation is made only in the best group, and subsequently attributed to all violin
players. But the group of good violinists has the lowest mean score, so the instrument contrast is not interpretable as a main effect. The question arises as to the origin of the difference. If it were due to the competence level, the same effect should be observed in the double-bassists, but there the difference is insignificant. Male and female violinists show similar patterns in this scale, so the different proportions of the genders in the two instrument groups cannot account for the finding. My interpretative approach to this question was to
examine the age at which the musicians began to play their instrument. The violinists began at an average age of
6 years with a variance of 1.3 years. The earliest starting age in my sample was 3, the latest was 9. The
double-bassists began at an average age of 10 years with a variance of 4.2, much larger than that for the violinists. The earliest starting age for the double-bassists in my sample was 3, the latest
20. It is important to note that these figures do not refer to the double-bass itself, but to the first instrument they
played. This first instrument could be anything at all. The double-bassists in my sample had played instruments ranging from drums and trombone to flute, violin and saxophone. Many of the girls had played the harp. The average
age at which they started to play the double-bass itself was 16 years. The variance of 2.5 for the figure "starting
the double-bass" is small in comparison to that for "starting to play an instrument", but still much larger than that
for the violinists. The minimum and maximum age exhibited quite a large gap: the youngest was 11 years old, the
oldest 30. Ericsson et al. (1991, 1993) have established that the "best" violinists differ from the "good" violinists in that they do a much larger amount of practice during early adolescence. The role of teachers, parents and peers in encouraging them in these activities is a very significant one (Davidson et al., 1996; Sloboda, 1997; Sloboda &
Howe, 1991). If progression in expertise brings praise from such people, it in turn increases self-confidence and
possibly also the level of dedication, leading to a form of "virtuous circle". During this long period such
development likewise goes hand in hand with increased levels of dominance, tendency to exhibitionism,
consciousness of status, competitiveness, etc. The violinists had already been undergoing this development for a
substantial period in early adolescence, a very sensitive phase of personality development, before the
double-bassists had even started to play their "main instrument".
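The geometry behind the contrast argument above can be checked directly: a single best-violinists contrast, c = (3,-1,-1,-1), correlates positively with all three ANOVA contrast vectors, so one genuine group contrast can surface as three nominally significant effects. A minimal sketch:

```python
import numpy as np

# Group order: (best violin, good violin, best bass, good bass).
c = np.array([3, -1, -1, -1])          # best violinists vs. the rest
contrasts = {
    "instrument":  np.array([1,  1, -1, -1]),
    "competence":  np.array([1, -1,  1, -1]),
    "interaction": np.array([1, -1, -1,  1]),
}
for name, v in contrasts.items():
    r = np.corrcoef(c, v)[0, 1]        # correlation with the c contrast
    print(f"{name}: r = {r:.3f}")      # positive in every case
```

Because every correlation is positive (and in fact identical by symmetry), a mean pattern driven solely by the best-violinists group loads on all three tests at once.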
Conclusion
The TICOM study established that the best violin students (those who had the best chances of gaining a place in
an orchestra later on) exhibited differences from "good" violin students in personality traits which are closely
connected to greater degrees of individualism. These differences, however, are at odds with the requirements of
the profession of orchestra musician.
This expertise effect on personality, along with the heavy emphasis on solo work at the music academy, can lead
to many an orchestra violinist coming down to earth with a bump once successfully past the audition stage and
hence past the point at which success is measured almost exclusively by solo ability. Carl Flesch (1929), the
doyen of modern violin pedagogy, comments as follows on the rude awakening of the orchestra violinist: "Nearly
every orchestra violinist, once upon a time, has dreamed of becoming a celebrated soloist. [...] An orchestra
violinist of this type, therefore, will and must always be discontented with his lot."
The dilemma for the training of professional musicians, then, is: how to devote adequate attention, within violin
lessons, both to perfecting musical / technical skills on the instrument and to the demands of the students' later
working life? Studying orchestral playing as a subsidiary subject within the academic music course cannot
replace the systematic promotion of difficult orchestral works by the violin professor in a one-to-one teaching
situation. There exists the need to arouse interest among music students for orchestral as well as solo repertoire,
as it is orchestral literature with which most of them will primarily occupy themselves in the course of their work
in the orchestra. A further problem facing musical training is the common belief or prejudice in music academies
that only solo works, such as violin concertos, are suited to provide the musical and technical training
professional musicians require. In fact, orchestral repertoire contains numerous works more than suitable for the
training of such technical and musical skills; one example of such a suitable work would be Strauss'
"Rosenkavalier". We observe within the training of musicians, then, a form of "two-tier system", the two
"classes" being soloists and orchestra musicians. The tendency is to view orchestra musicians as "the less
competent soloists", whereas in fact these two areas represent two different tasks which require somewhat
different skills. A potent illustration of this prejudice is provided by Flesch's defensive acknowledgement of these
difficulties: "Although the frequent incidence of teachers neglecting to disabuse the student of his illusions is
from an ethical point of view not altogether above criticism, we cannot, at least not from a practical standpoint,
reproach teachers with this fact, as they are, after all, principally charged with the task of providing our orchestras
with suitable new talent." We see, then, that Flesch, while he is aware of the ethical problems involved in the
training by solo pieces of the musician almost certainly destined for orchestral work, is himself of the opinion
that the only "proper" way to train violinists is by the use of solo works. This conservative mode of thought
regarding musical training is certainly a difficulty to be overcome if these problems are to be solved. However, it
would constitute an injustice to place all the blame at the door of the music academies; the orchestras themselves
appear to continue to serve this prejudice, basing their auditions on the playing of solo pieces. In this sense,
music academies and other musical training institutions are simply carrying out their task - preparing students for
the demands of the application and audition process, i.e. enabling them to have a realistic chance of a place in the
orchestra - in precisely the correct manner. The onus, then, would be on the orchestras to change their practices.
The training could only then represent a better preparation for the actual profession of orchestra musician, in
which most students who go through it will eventually earn their living (as opposed to being soloists), if the
orchestras' requirements and practices at audition were not so at odds with the reality of working life for most
musicians who enter this profession.
The study has demonstrated that the practice of searching for the conventionally "best" violinists, relying on
audition methods which concentrate on solo playing to the exclusion of all other skills, naturally increases
the likelihood that those players eventually selected for the orchestra will, similar to the "best violinists" group in
our study, be those who tend to have difficulties in adapting to a group situation and be, to refer again to Flesch,
"discontented with their lot". Nevertheless, the study works with average values. This means that within this
"best violinists" group there will be violinists who have more and less difficulty in adapting to a group, i.e. more
"individualist" and more "collectivist" types: that is to say, there do exist violinists who combine very high levels
of musical and technical competence with skills of social competence in group conditions. Thus it would make
sense to introduce a systematic element into the audition process that is capable of selecting those with both the
highest level of competence and the greatest ability to adapt to groups. My suggestion, therefore, is to include in
the audition process a test of ability to integrate speedily into a group. One possible way of testing such ability,
which would be straightforward to arrange, would be to assemble a quartet from the orchestra, remove the second violinist, replace him/her with the applicant, and then present the applicant with a piece of
music with which s/he is unfamiliar. This procedure would test both sight-reading ability and competence in
adaptation to the group, which are precisely the abilities required in the everyday working life of an orchestra
member. Old habits die hard: conservative modes of thought which suggest the solo work is the only adequate
means of training a future orchestra member, or testing the orchestra applicant on his/her competence at audition,
are clearly still very powerful within the musical world in Europe. The Metropolitan Opera in New York, on the other hand, has already effected radical changes in the audition process. Aware that it is not necessarily "soloist" types who are most fitted for orchestral work, it now leaves solo works out of the audition procedure completely,
instead preferring to test applicants on orchestral excerpts at each stage of the audition. Should a musician not
show sufficient interest in such excerpts to prepare them properly for audition, s/he finds him/herself failing the
audition in the first round.
If even the majority of orchestras were to take up a similar practice, musical training would be freed from the
necessity of preparing students for an audition solely concentrating on solo works, and would consequently be
able to provide the students with a training more closely resembling the reality of their future orchestra career.
Likewise, the procedure I am suggesting would bring the requirements of the audition closer to the demands of
the real work situation, which would in turn have a positive effect on musical training. However, music
academies cannot be merely passive recipients of "knock-on" effects from changes in the orchestras. They have
the potential and the duty to arouse students' interest in orchestral works, which they could achieve by widening
the training repertoire beyond solo concertos. A concluding remark on musical training's paradoxical and
somewhat bizarre attitude towards orchestra playing, which demonstrates beyond doubt how much change in
attitudes continues to be necessary: Particularly good students are excused from having to take part in the
orchestra - presumably to allow them more time to perfect their solo skills.
__________________________________________________________________________
Davidson, J. W., Howe, M. J. A., Moore, D. G., & Sloboda, J. A. (1996) The role of parental influences in the development of
musical ability. British Journal of Developmental Psychology, 14, 399-412.
Ericsson, K. A., Krampe, R., & Tesch-Römer, C. (1991). The role of deliberate practice in the acquisition of expert performance (Technical Report 91-06). Boulder, CO: University of Colorado, Institute of Cognitive Science.
Ericsson, K. A., Krampe, R., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100, 363-406.
Flesch, C. (1929) Die Kunst des Violinspiels. Berlin: Ries & Erler.
Griffing, J. (1994) Audition procedures and advice from concertmasters of American orchestras. Ohio State University, UMI ON:
9427653
Noltensmeier, R. (1997) Grosse Geigenpädagogen im Interview. Kiel: Götzelmann.
Triandis, H. C. & Gelfand, M. J. (1998) Converging measurement of horizontal and vertical individualism and collectivism. Journal of Personality and Social Psychology, 74, 118-128.
There are two theories about the way we perceive and represent rhythm in music. One theory identifies an "internal clock", activated at
the onset of musical sounds. Grouping is created when note events map onto a pulse-grid. (Clynes & Walker, 1982; Drake & Gerard,
1989; Handel & Lawson, 1983; Pouthais, 1996; Pouthais, Provasi, & Droit, 1996; Povel, 1981; Povel & Essens, 1985). A second theory
states that we organize durations into figural groups, based on the structural aspects of the durations themselves, possibly unrelated to an
underlying pulse. Boundaries between groups are formed by elongated note events, changes in loudness, texture, melodic direction, or
by the repetition of a pattern (Bamberger, 1978, 1991; Deutsch, 1980, 1982; Lerdahl & Jackendoff, 1983; Upitis, 1987).
Studies of pre-natal and infant children point toward a natural tendency to organize sounds around an internal clock mechanism.
Pouthais (Pouthais et al., 1996) explored biobehavioral rhythms in infants and found several behaviors that are performed on a steady
pulse. Infants, furthermore, could manipulate these behaviors, altering the tempo of performance without losing consistency, in order to
control their environments. Other studies record rhythmic behavior and young children's to move in synchronicity to music for short
periods of time. (For a discussion of such studies, see Pouthais, 1996)
However, young children have not demonstrated an ability to sustain synchronized movement with music, or to perform a steady beat
for any length of time, until they are between the ages of three and five years (Davidson, McKernon, & Gardner, 1977; Dowling & Harwood, 1986; Frega, 1979; Jones, 1976; Moog, 1976; Perney, 1976; Rainbow & Owen, 1979; Serafine, 1979; Upitis, 1987). It is
possible that this inability could be attributed to undeveloped gross motor coordination, or it may be possible that there is a
developmental trend in the induction of an internal clock in the performance of musical rhythm.
Bamberger (1978, 1991) and Upitis (1987) suggested that there is a developmental trend in young children, from grouping by rhythmic figure to figural grouping plus internal-clock (formal) grouping. Evidence of discrimination between different rhythmic figures
is evident in many studies, even when tempo and pitch are altered (Chang & Trehub, 1977; Clifton, 1974; Donohue & Berg, 1991; Stamps, 1977). Bamberger's explorations of graphic representation (1978, 1991) and Upitis's examinations of rhythmic development
(1987) indicate the presence of grouping by structural boundaries in younger children, moving to grouping by metric characteristics in
older children. Their findings invite investigation into other musical behaviors.
The primary goal of this pilot study was to develop a tool to examine the vocal performances of young children to determine whether
Bamberger and Upitis's findings are supported through performance behaviors. The secondary goal of the research was to begin to
explore the characteristics of the performances for indications of other tendencies that may help to categorize the representation of
rhythm by young children. Three research questions guided the investigation:
1) Is there any evidence of a steady pulse in pre-school children's vocal performances?
2) Is there evidence of organization into rhythmic phrases?
3) Can evidence for either type of rhythmic behavior be categorized for further investigation?
Method
A pilot test in vocal performance was developed for three- to five-year-old children. Vocal performance was chosen because preschool children seem to be able to sing before they are capable of performance tasks which require gross motor coordination (Drake, Dowling, & Palmer, 1991; Hargreaves, 1986; McDonald & Simons, 1989; Webster & Schlendrich, 1982).
A stimulus tune was constructed to allow for grouping by figure or by internal clock. A two-phrase tune with the longest duration between similar phrases was judged to allow for examination of timing in performance. Note values within sub-phrases had a 2:1 proportion (quarter note = 563 ms, eighth note = 281 ms), the ends of sub-phrases a 3:1 ratio (818 ms), and the boundary between phrases a 4:1 ratio (1094 ms). If it were true that young children organize rhythmic information figurally, then they would accurately recreate the ratios within the phrases (Fraisse, 1982) but would produce variable results between phrases.
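The stated ratios can be checked against the millisecond values, which come out slightly under the nominal 3:1 and 4:1 (a short sketch, using only the durations reported above):

```python
# Durations (ms) of the stimulus tune, as reported in the text.
quarter, eighth = 563, 281
sub_phrase_end, phrase_end = 818, 1094

print(round(quarter / eighth, 2))         # within sub-phrases: ~2:1
print(round(sub_phrase_end / eighth, 2))  # sub-phrase boundary: nominal 3:1
print(round(phrase_end / eighth, 2))      # phrase boundary: nominal 4:1
```

The within-phrase ratio is almost exactly 2:1, while the boundary durations work out to roughly 2.9:1 and 3.9:1 against the eighth note, i.e. the nominal 3:1 and 4:1 categories.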
As expected, the timing charts indicate that performances were more internally inconsistent than the tempo graphs suggested. Timing of the sub-phrases was clearly more accurate than phrase endings, and it also became clear that the eighth notes in each phrase became progressively longer toward the end of each phrase. Individual performance timing charts revealed a tendency to overcompensate for
Proceedings paper
Broadening the Concept of Music Aptitude: New Approaches to Theory and Practice
Peter Webster
School of Music
Northwestern University
I am quite interested in the study of children's musical thinking, especially the kind of thinking that is generative (Sloboda,
1988). I believe that, as researchers, we should study both the natural occurrence of such thinking and its development. Because
of this, I am naturally interested in music aptitude as a focus of study.
In the first portion of the paper, I argue for a broadening of the traditional view of music aptitude and its assessment. Next, I offer
examples of research that show promise for such expansion; this work is experimental and awaits substantial, sustained study. I
place special emphasis on creative thinking in music as an important avenue for extending our view of music aptitude. Finally,
the paper concludes with some speculation about music technology's role in expanding our conception of how aptitude might be
assessed procedurally.
When music aptitude is considered an attribute that people possess in varying degrees, it has strong overtones
of being more "mentalistic" than "behavioral," a characteristic that music ability tends to suggest. Perhaps
this is because most music aptitude tests primarily involve discrimination tasks, which rely on perception and
cognition of differences between pairs of musical stimuli. (Boyle, 1992, p. 249)
Perhaps the answer lies in a refinement of our understanding of music aptitude. For me, music aptitude represents a set of
constructs that relate not only to the ability to "audiate" tonal and rhythmic patterns and to make simple preference choices but also to the ability to think with and to manipulate larger musical wholes. I am also convinced that such a construct set must
account for the ability of the individual to manipulate expressive elements of music. At the core of these notions is that success in
music can best be predicted if we design musical tasks that tap a deeper level of "thinking in sound" than the present
discrimination and preference tasks. The full story of what music aptitude is cannot be fully understood until we know more
about both the traditional convergent tasks as well as the more divergent, problem-solving tasks that require more holistic mental
processing. This combination of skills is what our art is all about.
I believe that Seashore knew this. In his writings about the factors that comprised music aptitude, he indicates this clearly, along
with other comments about music memory, imagination, emotional reaction and expressivity. (Seashore, 1919, pp. 211-235) His
Seashore Measures of Musical Talents (Seashore, 1960), first developed in 1919, did not reflect these beliefs in part because of
the dominance of sensation as a major approach to psychological measurement at the time (Humphreys, 1993). He was taken to
task by Mursell (1937) within the context of the famous "omnibus" vs. "theory of specifics" debate, but this was less about the
nature of the exact constructs and more about (1) whether there was a general factor of music ability versus a constellation of
factors and (2) if it was wise to isolate factors from musical context. Mursell did not suggest an alternative to music aptitude tests
of specific skills, but his criticisms did raise interesting questions about how we should think about identifying potential. In the
years that followed this debate, we have had refinements of Seashore's work with more context-sensitive test items, but no real
fundamental change in the basic approach of discrimination and preference of relatively simple tasks.
Figure 1
The repeated theme forms a primitive hierarchic structure: the whole is formed by grouping of groups. This is thought to
be a general and basic musical property. Only structural changes are made in the 'answer' part of an item, there may be
changes in the sequence or amount of tones, but the same tones are used. If, for instance, there were new, different pitches
in the answer part it would be easy to notice the change by mere recognition, without conceiving the structure. (Karma,
1984, p.28)
This assessment approach celebrates a somewhat higher level of mental processing than comparing two separate patterns. What
is interesting about this work is that it represents not only a higher level of musical thinking, it also allows for the inclusion of
other musical elements such as dynamics (accents). Further extension of this approach might include experimentation with tone
color and articulation.
Analogical Thinking
Nelson, Barresi and Barrett (1992) were interested in testing children's ability to solve musical problems in an analogical setting.
Citing the importance of analogical reasoning in general intelligence, this group reasoned that the same approach might be
applied to auditory processing. The research questions in this study centered on the developmental trends in musical and spatial
analogical thinking tasks and how these results compared with Gordon's measures.
What is fascinating about this study is the construction of the analogical tasks in music. Figure 2 represents one kind of spatial
task that uses analogical thinking. Here, the child must encode, compare, and evaluate identical features of shape. Size and color
were used in more advanced tasks. Figure 3 represents the same kind of mental operations but applied to musical material.
Figure 2
Figure 3
As with Karma's work, the researchers here tax a more holistic process than simple discrimination of patterns. The musical tasks
demand that the individual encode, compare and evaluate one to four perceptual changes in the auditory examples. Melodic
patterns are used in Figure 3, but one can imagine tasks like this that include rhythm, tone color, harmony, dynamics and other
kinds of musical material.
Sensitivity to Expression
The importance of expression as an important part of musicality was noted by Seashore in his text on the psychology of music
(Seashore, 1938). Gordon devoted a section of his Musical Aptitude Profile (Gordon, 1965) to a group of three subtests that
evaluated discriminations of phrasing, balance, and style. Each task asks an individual to choose between a pair of short musical
examples.
Until recently, little work had been done to expand the study of expression as part of aptitude. Rodriguez (1995) reviewed the few studies that do exist, particularly in adults, and designed a series of interesting experiments with young children. Using
MIDI-based technology, Rodriguez (1998) placed expressive and mechanical performances of tunes side by side for children to
compare. He used a "2+1 oddity paradigm" as a presentation strategy, in which either one mechanical and two expressive versions of the same tune were used, or two mechanical and one expressive. The children interacted with a touch-sensitive computer screen to audit the three versions, then selected the one that was different in the set of three. Twenty sets of tunes were given to 60 children (20 Kindergarten, 20 Grade 2, and 20 Grade 4). Each age group showed evidence of being able to do the task (above chance), with the ability rising by age level.
Rodriguez also experimented with production tasks by having the children guide the performance of well known tunes by
controlling the tempo and loudness using a simple MIDI trigger that played successive musical events of the song. These
performances were saved and judged by experts for their expressiveness. He also engaged the children in a discussion of what
they did.
Work in the study of creative thinking ability in school-aged children is a relatively new venture. I have summarized much of the
empirical work by dividing studies into those that address content (product and process) and those that follow the psychometric
tradition (Webster, 1992). Since 1992, the work has continued with greater emphasis on compositional and improvisational
thinking. Not all of this work is aimed at broadening the assessment of music aptitude per se, but much of it has underscored
the importance of understanding creative thinking as a vital part of music teaching and learning.
How does this imaginative thinking relate to the big picture? I have found the model pictured in Figure 4 to be useful in my
thinking about creative thinking in music and attempts to measure it as a function of music aptitude (Webster, 1987a). Such
attempts at conceptual modeling are useful for teachers and researchers. They suggest relationships between variables that
imply possible teaching strategies and give direction to research. They can also generate a platform for debate in the profession
which is always a healthy sign. This model is designed to be representative of both child and adult creative thinking, although
certain aspects of the model might be qualitatively different at various stages of development.
I have presented a description of the model elsewhere (Webster, 1987a). Important in this paper is the notion that enabling skills
include aptitudes that are both convergent and divergent in nature. To operationalize these ideas, I have created a measure that
uses a microphone for amplifying the voice, a round sponge ball with a piano, and a set of temple blocks to engage children in
musical imagery (Webster, 1987b) (see Figure 5). Called Measures of Creative Thinking in Music (MCTM), the activities begin
very simply and progress to higher levels of difficulty in terms of divergent thinking. There are no right or wrong answers
expected.
Figure 4
Figure 5
The first section is designed to help the children become familiar with the instruments used and how they are arranged. The
musical parameters of "high/low", "fast/slow", and "loud/soft" are explored in this section, as well as throughout the measure.
The way the children manipulate these parameters is, in turn, used as one of the bases for scoring. Tasks involve images of rain
in a water bucket, magical elevators, and the sounds of trucks.
The middle section asks the children to do more challenging activities with the instruments, focusing on the creation of music
with each instrument singly. Children enter into a kind of musical question/answer dialogue with the mallet and temple
blocks, and create songs with the round ball on the piano and with the voice and the microphone. Images used include the
concept of "frog" music (the ball hopping and rolling on the piano) and of a robot singing in the shower (microphone and voice).
In the final section, the children are encouraged to use multiple instruments in tasks whose settings are less structured. A space
story is told in sounds, using drawings as a visual aid. The final task asks the children to create a composition that uses all the
instruments and that has a beginning, a middle, and an end.
This measure, and others like it, yield scores for such factors as musical originality, extensiveness, and flexibility, as well as
musical syntax. Measurement strategies are based on the careful analysis of video tapes of children actually engaged in the
activities. Objective criteria as well as rating scales are used.
Results based on over three hundred children have been encouraging. MCTM reliability and validity data seem to suggest
consistent patterns of response and appropriate task content. The tasks are not measuring the same skills as traditional musical
aptitude tests (tonal and rhythmic imagery) nor are they significantly related to general intelligence. There seem to be no
differences in scores in terms of gender, race, or socioeconomic background.
Perhaps the most important point surrounding this work is that what was once thought to be unapproachable and mysterious is
now being studied. We are beginning to have the tools to evaluate the development of creative thinking as a musical aptitude, as
well as aptitudes that are based on convergent perception skills.
Technology can help us provide better assessment systems. In the work of Rodriguez, we saw the use of a computer to provide children with expressive
and non-expressive musical examples. The computer allowed the child to hear and re-hear these examples and to make a
judgment about the content in a non-timed environment. Today's music technology provides rapid access to CD-quality examples
that can be heard again and again. It does not take much imagination to sense how the work of Karma and Nelson et al.
could be enhanced with the use of such an approach.
Computers provide a variety of possibilities for assessing aptitude by providing many different kinds of musical problems to
solve. My own work with the MCTM could be designed as a set of tasks delivered by computer with MIDI equipment. Hickey
(1995), Daignault (1997), and Younker (1997) have all used MIDI equipment and computer software to study children's
composition in terms of product and process. Using the computer, these researchers were able to record children's responses to
musical tasks in the form of MIDI data, which could be studied carefully for both product and process variables. Although their
work was not focused on aptitude as a construct, the techniques employed could easily be adapted for this purpose.
Commercial software already exists that holds enormous potential for the study of children's music aptitude with
performance-based tasks. Music Mouse (Spiegel, 1990) and Morton Subotnick's Making Music (Subotnick, 1995) are two
programs that offer children a chance to make music without any previous experience. Such tools allow children to demonstrate
a great deal of natural ability to "think in sound" as they rely on their inner imaginations.
I can imagine an Internet-based assessment tool that would use these techniques and others to provide children with interesting
music problems to solve while recording their gestures and responses to questions. If done well, with a reliable and valid design,
such a tool could provide invaluable help for music teachers and researchers. Its existence would be a logical extension of the
psychometric tradition, findings in cognitive science, current philosophies of learning, and technological resources.
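One way such a tool might record responses is as timestamped trial records that can be serialised for later scoring. The sketch below is a hypothetical design only; the function and field names are invented, not taken from any existing system.

```python
# Hypothetical response log for a web-delivered assessment task: one
# timestamped record per trial, serialisable (e.g. as JSON) for later scoring.
import json
import time

def record_response(log, child_id, task, response):
    """Append one timestamped trial response to an in-memory log."""
    log.append({
        "child": child_id,
        "task": task,
        "response": response,
        "timestamp": time.time(),  # seconds since epoch, for latency analysis
    })

log = []
record_response(log, "C014", "oddity_trial_3", "version_B")
print(json.dumps(log[0]["task"]))
```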
Summary
The psychology of music, indeed all of psychology, has grown enormously in method and substance since Seashore's death. I
think, if he were alive today, Seashore would be fascinated with the tools and methods that we now have to study musicality, but
would be impatient with our slowness to explore these possibilities. I believe he would be urging us to push forward with the
study of the "higher-level" music abilities such as those connected with creative thinking.
The present interest in cognitive science, in studying mental representations and mental structures by presenting problems that use
more complex abilities than those required for Seashore's early measures, would have intrigued Seashore the scholar. He
certainly would be swept away by the role of technology: video tape, digital sound, the speed and affordability of personal
computers, MIDI, simulations, AI programming, multimedia, and the Internet. I believe that Seashore would return to his
measures and write far different ones than those of his time.
References Cited
Boyle, D. (1992). Evaluation of music ability. In: R. Colwell (Ed.), Handbook of research on music teaching and
learning. (pp. 247-265). New York: Schirmer Books.
Colwell, R. (1970). The evaluation of music teaching and learning. Englewood Cliffs: Prentice-Hall.
Daignault, L. (1997). Children's creative musical thinking within the context of a computer-supported
improvisational approach to composition. (Doctoral dissertation, Northwestern University, 1997). Dissertation
Abstracts International, 57, 4681.
George, W. (1980). Measurement and evaluation of musical behavior. In D. A. Hodges (Ed.), Handbook of music
psychology, (pp.291-340). Lawrence, Kansas: National Association for Music Therapy.
Hedden, S. (1982). Prediction of music achievement in the elementary school. Journal of Research in Music
Education, 30, 61-68.
Hickey, M. (1995). Qualitative and quantitative relationships between children's creative musical thinking processes
and products. (Doctoral dissertation, Northwestern University, 1995). Dissertation Abstracts International, 57, 145.
Humphreys, J. (1993). Precursors of musical aptitude testing: From the Greeks through the work of Francis Galton.
Journal of Research in Music Education, 41, 315-327.
Karma, K. (1973). The ability to structure acoustic material as a measure of musical aptitude. 1. Background theory
and pilot studies. Research Bulletin, 38, Institute of Education, University of Helsinki.
_________ (1982). Validating tests of musical aptitude. Psychology of Music, 10 (1), 33-36.
_________ (1983). Selecting students to music instruction. Bulletin of the Council for Research in Music Education.
75, 23-32.
_________ (1984). Musical aptitude as the ability to structure acoustic material. International Journal of Music
Education, 3, 27-30.
Lehman, P. (1968). Tests and measurements in music. Englewood Cliffs: Prentice-Hall.
Mursell, J. (1937). What about music tests? Music Educators Journal, 24 (2), 16-18.
Nelson (1992). Musical cognition within an analogical setting: Toward a cognitive component of musical aptitude
in children. Psychology of Music, 20.
Rainbow, E. (1965). A pilot study to investigate the constructs of musical aptitude. Journal of Research in Music
Education. 13, 3-14.
Rodriguez, C. (1995). Children's perception, production, and description of musical expression (Doctoral
dissertation, Northwestern University, 1995). Dissertation Abstracts International, 56, 2602.
_____________ (1998). Children's perception, production, and description of musical expression. Journal of
Research in Music Education. 46, 48-61.
Seashore, C. (1919). The psychology of musical talent. Boston: Silver, Burdett and Company.
__________ (1938). Psychology of Music. New York: McGraw-Hill Book Company.
Shuter-Dyson, R. & Gabriel, C. (1981). The psychology of musical ability (2nd ed.) London: Methuen.
Sloboda, J. (Ed.) (1988). Generative processes in music. Oxford: Clarendon Press.
Spiegel, L. (1990). Music mouse [computer software]. Personal Distribution.
Subotnick, M. (1995). Morton Subotnick's making music [computer software]. New York: Learn Technologies
Interactive (formerly published by Voyager).
Webster, P. (1987a). Conceptual bases for creative thinking in music. In Peery, J., Peery, I. & Draper, T. (Editors).
Music and child development, (pp. 158-174). New York: Springer-Verlag.
_________ (1987b). Refinement of a measure of creative thinking in music. In C. Madsen and C. Prickett (Eds.)
Applications of research in music behavior. (pp. 257-271). Tuscaloosa: University of Alabama Press.
_________ (1988). New perspectives on music aptitude and achievement. Psychomusicology, 7, 177-194.
_________ (1992). Research on creative thinking in music: The assessment literature. In: R. Colwell (Ed.),
Handbook of research on music teaching and learning. (pp. 266-280). New York: Schirmer Books.
Younker, B. (1998). Thought processes and strategies of eight, eleven, and fourteen-year old students while engaged
in music composition. (Doctoral dissertation, Northwestern University, 1998). Dissertation Abstracts International,
58, 4217.
Proceedings abstract
Warsaw, Poland
psyche@chopin.edu.pl
Background.
Past research has shown that to apply an effective form of management of
music performance anxiety, especially psychotherapy or pharmacotherapy,
a differential diagnosis of the phenomenon is needed in relation to
the temperamental and personality characteristics of the musician as well as
to his or her musical background.
Aims.
The main goal of the study was to find out the nature and the dimensional
structure of performance anxiety.
Method.
The participants were 145 students from the Chopin Academy of Music in Warsaw, all
of them with 12-14 years of intensive training in special music schools. All
students were administered The Check List of
Results.
Proceedings abstract
defonso@ipfw.edu
Background:
Previous research (DeFonso & Kelley, 1994) found that aesthetic (like-dislike)
ratings as well as semantic differential-type judgments of singing voices were
related to the degree and rate of vibrato and the distribution of harmonics
within the vocal range. These judgments were made for samples from different
singers who were given no specific instructions on how they should produce the
excerpts.
Aims:
For the present study, acoustic factors contributing to the listeners'
judgments are more closely examined. The variability in the between-singer
samples is more effectively controlled by using just four singers, one for each
voice range (soprano, alto, tenor, bass), and then varying the vocal quality
within each voice.
Method:
The participants were 120 Introductory Psychology students with limited musical
background (as determined by questionnaire). Listeners made aesthetic and
descriptive judgments of 24 excerpts of America the Beautiful. Each of the four
singers produced six versions of the tune using two pitch levels, three types
of vibrato, an "edge" or not, and full-voice or not.
Results:
The acoustic properties of the voices were analyzed using Hypersignal software.
A discriminant analysis will be used to determine which descriptor properties
and acoustic properties contribute to judgments of liking/not liking. Based on
the results of the previous study, it is anticipated that aesthetic judgments
of different voicing styles within and between singers will be reflected in
differences in acoustic properties.
Conclusions:
Proceedings abstract
Mr Tobias Egner
t.egner@ic.ac.uk
Proceedings paper
Abstract
This study investigated the effects of melodic and harmonic coherence on sight-singing ability. Twenty-four experienced singers performed an interval singing task, and then sang at sight four novel pieces of music twice each, containing either easy or
hard melody and easy or hard harmony. Both harder melody and harder harmony increased errors. Error rate correlated with interval singing performance, indicating the importance of both pattern-recognition and harmonic prediction in sight-singing.
Singers made fewer errors on the second reading, indicating the importance of familiarity. A significant correlation between hesitation and overall error rate suggests an increasing role for internal auditory representations with increasing expertise.
Finally, less skilled sight-singers were significantly more affected by a disruption in harmony than better sight-singers. Auditory representations seem more important for sight-singing than for most instrumental sight-reading. The findings are discussed
in terms of a cognitive framework for pitch determination in sight-singing.
Introduction
The volume of research into sight-reading novel music has increased markedly over the last 15 years (see Sloboda 1984, 1985, Gabrielsson 1999 for reviews). Although sight-reading ability does not necessarily correlate with musical expertise (Wolf
1976, Waters, Townsend & Underwood 1998), it is nevertheless an important, even essential, skill for professional musicians to acquire (Sloboda 1978), particularly orchestral players, accompanists (Ericsson & Lehmann 1994) and choral singers.
Most sight-reading research has been carried out on pianists, few studies investigating other instruments, examples including the flute (Thompson 1987) and strings (Salzberg & Wang 1989). With the exception of Goolsby's research on eye movements
(Goolsby 1994a, 1994b), the handful of papers on sight-singers (e.g. Sheldon 1998, Demorest 1998) have generally investigated the outcomes of sight-singing rather than the processes involved.
Sight-singing and piano sight-reading differ, however, in a number of important ways. One main difference is the extent to which pitch production is internalised. Despite findings that internal auditory representations are involved in piano sight-reading
(Waters et al 1998, Townsend 1997), pianists can still translate the visual stimulus (score) into a motor response (pressing the correct piano key) without knowing the note's pitch internally. However, singers must know the sound of any note before its
production, and this presumably involves working out its pitch internally. Indeed, singers need a starting note when performing (in the absence of absolute pitch), whereas pianists do not. Another important difference is that, unlike pianists, singers
rarely perform by themselves, but more often with other singers, orchestral players or a piano. These other parts therefore potentially provide cues to the pitches to be sung. Certainly, there are no studies in the sight-reading literature where the pianist
hears other lines of music simultaneously. Personal experience and anecdotal evidence from singers suggests that these other parts are of great importance in determining the notes to be sung. The present paper is concerned specifically with the
influence of other parts on sight-singing ability.
Music sight-reading is normally thought of as a transcription task (cf. Shaffer 1978, 1982), such as copy-typing or reading aloud. Like reading text, it can be fractionated into cognitive sub-tasks and operations: perceptual processes (pattern-recognition,
expectation); translation processes from visual / auditory to motor responses; and the formation of auditory representations (e.g. Waters et al 1998). Pattern-recognition ability has been shown to be important in sight-reading. There is a strong
correlation between sight-reading skill and the ability to report groups of briefly presented notes (Bean 1938, Salis 1980, Sloboda 1976, 1978), perhaps akin to chess masters' ability to recall board positions (Chase & Simon 1973). This seems due to the
use of a more efficient and possibly qualitatively different mechanism (Sloboda 1984). Waters et al (1998) demonstrated correlations between pianists' sight-reading skill and single-note recognition speed, recall of briefly presented music, and
pattern-matching ability for music (Waters, Underwood & Findlay 1997). Pianists' eye-hand span has also been shown to be related to sight-reading ability (Furneaux & Land 1999, Sloboda 1974, 1977), as has flautists' eye-performance span
(Thompson 1987). Expert sight-readers' eye-hand span tends not just to be larger but also expands or contracts to coincide with phrase boundaries, suggesting the processing of higher-order structures (Sloboda 1984). Further evidence from
eye-movement research (e.g. Waters et al 1997, Waters & Underwood 1998, Goolsby 1994a, 1994b, Rayner & Pollatsek 1997, Truitt, Clifton, Pollatsek & Rayner 1997, Kinsler & Carpenter 1995) indicates that highly skilled readers scan larger units,
sometimes with briefer fixations, and that their fixation pattern is more likely to depend on the type of music being read.
Prediction and expectation of the subsequent note(s) to be performed are important cues. As musicians become more knowledgeable about the conventions of harmony and musical structure, they are more likely to employ prediction. In so-called
proof-reading errors, skilled pianists play what they expect rather than printed errors (Wolf 1976, Sloboda 1976). Priming studies have shown that one chord can influence the processing of subsequent chords (Waters et al 1998, Bharucha 1987,
Bharucha & Stoekig 1986). Furthermore, the formation of internal auditory representations is likely to be important in sight-reading, as shown by correlations between sight-reading ability and ability to memorise music from notation (Eaton 1978, Nuki
1984) and improvisation ability (McPherson 1995). Waters et al (1998) suggest firstly that auditory imaging may allow performers to monitor their reading and secondly that this auditory representation is needed for the priming and predictive ability
already mentioned. Ward & Burns (1978) have provided some evidence for the first of these suggestions by showing that auditory feedback is used by singers to keep themselves in tune, although removing auditory feedback only impairs pianists'
expressive aspects of performance (Repp 1999) and not their sight-reading ability (Banton 1995). It seems, then, that pattern-recognition, predictive ability and internal auditory representations are all integral to sight-reading ability.
Table 1
Percentage errors on sight-singing tasks as a function of harmony, melody and attempt.
[Table values not reproduced. Columns cross easy vs. hard harmony with easy vs. hard melody and attempt, reporting EN and EI error rates.]
Arcsin transformations were carried out (Hays 1993) on the proportion error data prior to running two 3-way (harmony x melody x attempt) repeated-measures ANOVAs, one for EI and one for EN. Harmony was highly significant for both EI (F(1,23)
= 42.91, p < 0.001) and EN (F(1,23) = 51.66, p < 0.001), less coherent harmony leading to more errors. Melody was also highly significant for both EI (F(1,23) = 62.71, p < 0.001) and EN (F(1,23) = 53.28, p < 0.001), with harder melody leading to
more errors. Significantly fewer errors were made on the second attempt than the first for EI (F(1,23) = 5.93, p < 0.05) and EN (F(1,23) = 5.46, p < 0.05). There were no significant interactions.
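The variance-stabilising step described above maps each error proportion p to arcsin(sqrt(p)) before the ANOVA is run. A minimal sketch with invented error proportions (the repeated-measures ANOVA itself is omitted):

```python
# Arcsin-square-root transform applied to proportion data before ANOVA
# (cf. Hays, 1993). The error proportions below are invented for illustration.
import math

def arcsine_transform(proportions):
    """Map each error proportion p to arcsin(sqrt(p)), stabilising variance."""
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical per-condition error proportions for one subject:
error_rates = [0.05, 0.12, 0.30, 0.55]
print([round(v, 3) for v in arcsine_transform(error_rates)])
```

The transform compresses the scale near 0 and 1, where binomial variance shrinks, so that the ANOVA's homogeneity-of-variance assumption is better met.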
On the basis of overall error scores, the subjects were divided into two groups for sight-singing ability. Results are shown in Table 2. Two four-way (harmony x melody x attempt x group) mixed ANOVAs were now carried out, again on arcsin
transformed data, for EI and EN. Group was highly significant in EI (F(1,22) = 42.81, p < 0.001) and EN (F(1,22) = 40.24, p < 0.001), indicating a meaningful group division. Group by harmony was significant for EI (F(1,22) = 7.77, p < 0.05) and EN
(F(1,22) = 4.65, p < 0.05), less skilled readers being more affected by hard harmony than better readers. No other interactions approached significance.
Table 2
Percentage errors on sight-singing tasks, splitting subjects into two groups according to sight-singing ability.
[Table values not reproduced. Rows split subjects into more and less skilled readers; columns cross harmony, melody and attempt, reporting EN and EI error rates.]
Two Spearman correlations compared the number of hesitations with overall sight-singing performance, for EI and EN. Unfortunately the hesitation data for two subjects were lost, so this analysis used data from 22 subjects. This correlation proved to
be significant for both EI (rs = 0.4363, p < 0.05) and EN (rs = 0.5129, p < 0.02). Less skilled sight-singers made more hesitations.
Two final sets of Spearman correlations were carried out, comparing interval test scores, overall sight-singing error scores, the number of years' singing experience, the number of years playing the piano and the total number of instruments played. The
results are shown in Table 3. More skilled sight-singers tended to perform better on the interval test (EI rs = .5820, p < 0.01; EN rs = .5644, p < 0.01) and have more singing experience (EI rs = .4466, p < 0.02; EN rs = .4486, p < 0.02). The number of
years playing the piano correlated with the total number of instruments played (rs = .4723, p < 0.02).
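A Spearman correlation of the kind used in these analyses can be computed from average ranks, as in the following minimal sketch; the per-subject counts below are invented, not the study's data.

```python
# Minimal Spearman rank correlation with average ranks for ties.
# Data are illustrative only, not the study's.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank for a tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

hesitations = [2, 5, 1, 8, 4, 7]       # hypothetical per-subject counts
errors      = [10, 22, 8, 35, 18, 30]  # hypothetical error percentages
print(round(spearman(hesitations, errors), 4))
```

Rank-based correlation is the natural choice here because hesitation counts and error percentages need not be linearly related, only monotonically.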
Table 3
Spearman correlation coefficients rs between test performance and musical training.
Discussion
This study investigated sight-singing performance as a function of melodic and harmonic difficulty. Our results indicate that interval singing performance correlates with sight-singing performance, and disrupting the sung line (i.e. harder melody)
impairs sight-singing performance. This melodic disruption tended to increase the size and variability of the presented intervals, interfering with more extended pattern recognition, such as scales or repeating motifs, and hence decreasing sight-singing
performance. In agreement with previous findings (Waters et al 1998, Sloboda 1977, 1984) that more skilled sight-readers have better pattern matching abilities, the findings suggest either that certain intervals are easier to read than others or that
prediction improves with increasing sight-singing skill. Disrupting the line even without altering the harmonic constraints of the piece impairs these predictive abilities (e.g. Waters et al 1998).
Disrupting harmonic coherence also impaired sight-singing ability. Modern, atonal music is much harder to sing, especially at sight, than tonal music from the classical or earlier periods, largely because tonal idioms give the singer the feeling of
"knowing where the piece is going". This feeling arises from composers' use of harmonic cadences and structures together with readers' predictive ability based on priming (Waters et al 1998). Prediction becomes harder in atonal harmony, and there is a greater likelihood of singing
incorrect notes. Furthermore, auditory feedback cannot be used to correct the note if the singer is either unaware of the error or does not have enough time available. Indeed, singers skilled at interval (pattern) recognition can transpose an extended
section of a piece (an EI error), sometimes without realising. This can only be corrected when the music reaches a point at which prediction makes the difference between the sung pitch and the required pitch clear and explicit. With the exception of
three subjects, all errors of this type were made in hard harmony conditions.
The role of auditory feedback can be seen both in the increased hesitation of less skilled sight-singers and in the "swoops" that less skilled sight-singers sometimes produce before eventually landing on the right note, suggesting that the formation of the auditory
representation develops with increasing experience and skill. Singers become less accurate in their tuning, more so for less experienced singers, when deprived of feedback from their voices by white noise delivered to the ears (Ward & Burns 1978).
The individual processes, such as interval recognition and the formation of auditory representations, develop with experience. More skilled sight-singers tend to have better interval singing (i.e. pattern perception), are better at multimodal integration
(combining the auditory cues and the visual score), and have better predictive abilities than less skilled sight-singers, although they are also more able to ignore harmony when it is atonal and therefore not helpful or even disruptive to pitch
determination. More skilled sight-singers also tend to produce fewer swoops and hesitations, suggesting a more developed internal auditory representation, together with a more advanced proprioceptive "muscle memory" in the vocal cords. They
probably also make more use of auditory feedback, even in atonal music.
The present findings suggest that the formation of auditory representations is an important skill in the development of sight-singing. Falkner, in his book on voice training, notes "It is a good habit to hear a note before it is sung" and ear training and
Acknowledgement
We would like to thank Burton Rosner and Vincent Walsh for their valuable comments and assistance during the preparation of this paper.
Proceedings paper
Dr. M. Hariharan
Principal, Bharatiar Palkalaikoodam
Center for Performing & Fine Arts
Pondicherry, India
and
The unique feature of the South Indian music system is its inimitable capacity to bring together two or
more musicians on a concert platform. Though of totally different minds and styles, artistes perform
together in unison, and a musical blend results. The contemporary performance genre is hardly 200 years
old.
The term performance is widely used to mean a Katcheri, but this is a misnomer. A performance implies
an agreement, a union by mutual consent, within which a harmonious combination of voices or
instruments takes place. Sometimes there is disagreement within the agreement, in the sense that the
performers freelance to an extent.
The period between the 8th and 15th centuries saw the advent of several performance-oriented
composers. Later, the delta region of Thanjavur in southern India became the centre of a cultural
transformation that has shaped current trends and settings. The first
concert hall, known as Sangita Mahal, was built at Thanjavur to foster performances.
It is interesting to note that only select instruments, such as the Flute, Yazh and Vina, were used for
performances. With the advent of the British, however, the Western violin emerged as an integral part
of the performance genre in the South Indian system of music.
During the end of the 18th and beginning of the 19th centuries, ample opportunities were provided to
artistes, in closed performance settings at palaces and wealthy houses, to exhibit their individual talents.
String and percussion instruments emerged as integral parts of a performance.
It was a psychological phenomenon that the two major aspects, the determinate and the indeterminate
(the recitative and the interpretative), became the warp and woof of the texture of the modern
performance. The genre emerged as a well-thought-out arrangement of classical music in both these
aspects, with performances divided into segments such as pre-heavy pieces, heavy pieces and
post-heavy pieces. The first was a mixture of the recitative and interpretative; the second was the
high-water mark of the artiste's talent; and the final section was non-classical or semi-classical,
intended to appeal to the audience in a lighter vein.
The present-day performance, despite the changes that have taken place, has its roots in the post-Trinity era.
The enormous body of music compositions in different formats and structures has helped present-day
musicians shape their concerts to please the varying aesthetic levels of the audience.
Today's performance stages are packed to full houses because of the liveliness provided by
younger-generation artistes who strive for innovation and experimentation.
The Millennium Pattern:
The recent trend has been the exposition of a performance in the Jugalbandhi pattern, a
trendsetter in mixing two contrasting and even contradictory traditions in a single flow of presentation.
Artistes of the vocal or instrumental traditions join together to present their own
ideas and imagination, catching the imagination of two different types of audience with
varying mind-sets. Here the musical dialogue between the South and the North meets and melts.
Yet another innovative presentation has been the Jazz Yatra, wherein the East meets the West or the
North meets the South. Of the two styles, one will be alien to the other, totally
different in cultural as well as aesthetic content. The two presentations, knit and bound together,
remain open for presentation in their own styles while at the same time keeping a
path common to both systems. In Jazz Yatra concerts, however, percussion takes the lead.
The Southern system has seen, above all, a cross-cultural exchange in its performance genres. The
Western system of music, for example, found its way into the South Indian musical system and became one
with it when presented together. The technique of presentation was such that the two systems crossed
their cultural limits and displayed the impact of each other. This experimentation was pioneered by the
internationally acclaimed European violinist Dr. Yehudi Menuhin together with M. S. Gopalakrishnan in the
early 1950s; later, several others such as Pandit Ravi Shankar and L. Subramaniam followed suit.
The limited attendance drawn by heavily billed artistes gave way to the emergence of an alternative
performance pattern: the more closeted chamber music concert. Here the audiences as well as the
artistes were very carefully chosen so that like-minded listeners and performers could interact. The
interaction between artistes and audiences was well known and had a long-lasting impact and
influence on both. In other words, chamber music concerts became very successful, with greater
emphasis on curricular learning activity.
Some time later, private commercially oriented organizations known as Sabhas organized a variety of
performances aimed at innovation and improvement, which brought to fruition both fusion and
ethnic-exchange performances. In the former, musical content and theme were fused within the same
texture or type of musical system. In the latter, people of different ethnic origins joined together to
perform their own indigenous cultural repertoires, but outside the same musical system.
Before the end of the millennium, fusion concerts had become very popular both in India and abroad.
They paved the way for innovative research into performance standards, in both appearance and
presentation.
But it was the musical festive seasons, or melas, that gave impetus to the wide acceptance of all types
of performances, in both quality and quantity. In the guise of variation, all manner of admixtures were
tried and effected in the performance patterns. In other words, interdisciplinary ideas crept into the
performance pattern, overlooking the tradition and history of the musical system. Audiences voted for a
change from the existing system, whether they wanted it or not, and several of today's leading
performers earned name and fame during this experimental phase. To cite an example, in the city of
Chennai alone more than 100 cultural organizations come to life during the music festivals and
otherwise lie dormant for the rest of the year. So too the artistes: a single artist may give more than a
dozen performances during the music season and, like the organizations, remain dormant for the rest of
the year. Such an overdose of performances, in both content and style, concentrated in a single season
should change, because the performance genre has lost its intrinsic and cultural values. It has become a
non-serious affair, with a non-serious audience and non-serious conduct.
Thanks to curriculum initiatives, many universities and institutions have created performance-oriented
courses and performance training areas to instil a serious approach to both preparation and
presentation. But there are several goals still to achieve in this training area. Modifications are needed
so that trainees are given ample freedom to choose their Gurus/Teachers on the path to becoming
accredited performers. Organizations and audiences should accept and encourage only those performers
who have been seriously and systematically trained through a proper method by a Guru or teacher.
Self-trained and self-appointed performers must be ignored and discouraged from performing.
The celluloid scene is drawing most of the audience to its fold, even where enthusiastic listeners are
unwilling to be wooed. The glamour and glory of colour and presentation in the film world have diluted
the intrinsic interest of audiences, and it has become difficult to win them back to the aesthetic circle. It
is high time to preserve the purity of performances along a traditional path, while at the same time
riding a new wave of emotional and cultural content. Organizations, cultural institutions, spiritual
leaders, elders, serious teachers, genuine students and the bureaucracy should all strive to standardize
and uphold the intrinsic and innate values of performance in the new millennium.
Several senior performers have ventured to start their own schools of performance, training serious
students and protégés as performers who reflect their own school of thought in an international
perspective. It is saddening to note, however, that in this area of institutionalization the teachers from
the North have gained more success than those from the South. The reasons for this rather
uncomfortable inadequacy are not encouraging.
Some of the leading advocates of innovative performance genres are notably middle-aged, and
sometimes very young, performers who have gained international acclaim. To name a few: Mandolin
Srinivas (who performed at the Barcelona Olympics and is a star performer at most international
festivals), Veena Gayatri (a child prodigy who gained immense popularity as a child artist at the age of
five), Master Sasank (a leading child artiste on the flute), Ravikiran (a child prodigy who entered the
performing stage at the tender age of three), vocalists such as Bombay Jaysree, Sowmya, Nithyasree of
celluloid fame, K.J. Yesudas (who has attained fame in both the traditional and celluloid fields), Sanjay
Subramaniam, Vijaya Siva and others.
Back to index
Proceedings paper
No repetition 16 10 3
Instant repetition 10 4 12
Separated Repetition 4 6 15
For the "no repetition" melodies, the frequency count was highest for "pitch salience" variations (i.e., new pitch, long durations, highest/lowest pitch). For the "instant repetition" melodies, the frequency
counts were highest for "repetition and grouping" variations (i.e., larger grouping, repeat melodic, repeat motivic). For the "separated repetition" melodies, the frequency counts also were highest for
"repetition and grouping" variations. It appears that the melodies with some form of repetition had the highest counts for variations guided by structural characteristics (i.e., larger grouping) of the
melodies, while the melodies with no repetition had the highest count for variations guided by specific musical characteristics of the melodies (i.e., introduction of new pitch).
What emerges from these variations in the decline of recall accuracy is a distinct overall profile for each of the three types of structural melodies. For the "no repetition" melodies, recall decreased from
the beginning to end of the passage. There were periodic variations that modulated the overall descent, but in general, the pattern overall descended. For the "instant repetition" melodies, a scalloping effect
(elevation at the beginning and end points) for the beginning and end motivic groupings of the first melodic idea and immediate repetition occurred, followed by a decline in recall to the end of the melody
(i.e., during the new melodic idea) with peaks occurring at major phrase boundary endings. An example of this type of recall may be seen in Figure 1. Finally, for the "separated repetition" melodies there
was high recall for the initial melodic idea followed by a decline for the secondary melodic idea, and then a sharp rise in recall when the initial melodic idea repeated.
These data illustrate that melodic recall is not simply sequential in nature, but rather, the structural characteristics of a particular melody (e.g., repetition, long duration) guide the consolidation and
subsequent performance of melodic material.
Back to index
Proceedings paper
Over the last several years, we have been examining the way in which trombone players move their right arm and
consequently the trombone slide to change the pitch of their instrument. In that work, we have found that
professional trombone players use less muscular activity than other adult performers to move the slide (Lammers,
1983), use their wrist more than nonprofessionals (Lammers & Kruger, 1991), and move the slide faster than
nonprofessionals (Kruger, Lammers, Stoner, Allyn, & Fuller, 1996). Professional and student performers were
also found to differ in the distance they moved the slide to reach each of the seven positions on the trombone
slide. Differences were most notable in the longest positions (Kruger, Lammers, Fuller, Allyn, & Stoner, 1997).
Professionals move the slide further and more accurately when reaching for the sixth and seventh positions. Both
professionals and students move the slide further between first and second position than is recommended by any
of the method texts we've examined.
We have been studying trombone performance with several goals in mind. One goal is to develop a better
understanding of expertise in skilled movements. Trombone performance is an excellent example of a natural
task with wide variability in performance. It is also interesting because accurate motion in itself does not
necessarily lead to musical performance. Consequently, it is interesting to see whether or not experts are able to
attend to musical demands more than other performers simply because they have automated the process of
moving the trombone slide to a greater degree. We have also been interested in using careful descriptions of what
skilled performers do in order to challenge or confirm the folk wisdom developed by teachers of trombone
performance. For example, we've found that performers with longer arms do not as a consequence perform better
than those with shorter arms. We hope to be able to develop a clear set of recommendations that will be
instructive to applied music teachers.
With this in mind the current study focuses on the performance of beginning trombone players. Their
performance will be compared to the performance of professional performers and college level musicians that
we've reported in our earlier work. We seek to compare the speed of their slide motion and their use of their
elbow and wrist with those of more skilled performers. The data we have collected are intended to allow us to explore
the extent to which differences between beginners and other performers are increased when additional demands are
placed on performance, e.g., by increasing tempo.
METHOD
Subjects
The present study was conducted as part of a larger program of research that examined expertise in trombone
performance. For that study, forty-two trombonists were recruited to participate in a study of arm motion during
performance. Of these, seven performers were beginners ranging in age from 10 to 14. These performers had
played the trombone between one and four years. All subjects signed informed consent forms and were paid for
their participation. Information about the age, experience, and arm length for all performers can be seen in Table
1 below.
Procedure
A special sleeve fitted with two Penny and Giles electrogoniometers was used to record movement in the elbow
and wrist. A 30.5 cm circular target made of poster board was attached to the end of each performer's trombone
slide. The motion detector was attached to a small stand placed in front of the performer and aimed at the modal
position of the target on the end of the slide. Each individual performed three study exercises and two musical
excerpts. For the present study, only performance data from the first exercise was utilized. This musical exercise
consisted of the B flat scale played ascending and descending at three different tempos: with the quarter note
equal to sixty beats per minute, with the half note equal to sixty beats per minute, and with the half note equal to
120 beats per minute. Additional information on the subjects and procedures can be obtained from Lammers et
al. (1996).
RESULTS
Analyses of Variance were run to examine the effect of the distance the slide was moved, the tempo of the
exercise, and the direction of motion on the duration of the slide movement. Level of group (professional,
college student, beginner) was included as a between subject variable. All other variables are within subject
variables. It should be noted that significance tests in all of the analyses reported below can only be taken as
suggestive because the sample sizes were very small. This was especially true for the beginners' group, in which
there were a significant number of trials in which data was missing. Some of the performers in this group rotated
their torso and the trombone slide from side to side to the extent that the transducer failed to provide accurate
readings.
The tempo of the exercise, the distance moved, and the direction of motion (away from the face versus toward
the face) each produced a significant difference in the duration of slide movement. Surprisingly, there was no
overall effect of level of expertise, even though the means clearly show
that beginners move the slide more slowly. Examination of the standard deviations suggests this is most likely
due to high variability within the beginners group and the small sample size. Figure 1 illustrates the differences in
mean motion time found as a function of level of expertise and the direction (away from the performer versus
toward the performer) in which the slide was moved. In this figure the movement time is averaged across
motions of differing lengths (1st to 2nd, 1st to 4th, 1st to 6th, 6th to 1st, 4th to 1st, and 2nd to 1st). A significant
interaction (p < .05) was found between level of expertise and direction of slide movement. Examining Figure 1
suggests that the difference in movement duration between inward and outward motions decreases as expertise
increases.
A similar Analysis of Variance with one between factor (level of expertise) and three within variables (distance,
tempo, and direction of motion) was run on the relative change in elbow angle. Changes in elbow angle were
recorded as changes in milli-volts (mV) by the Penny & Giles instrument used in this study. Figure 2 shows that
the variability in elbow angle was greatest for the long motion for all three groups. Similarity among the groups
decreased with increasing length of the motion. However, no significant effects of group, or interactions with
the group variable, were found in these analyses.
Figure 3 illustrates what appears to be a notable difference in elbow angle between more experienced and
beginning players when they move from first to sixth position. However, analyses of variance focusing only on
the motions from first to sixth fail to find significant differences as a function of expertise, tempo, or the
interaction between them. Examination of the distance of motion data suggests that this difference is largely
because the beginning performers simply never get their slides all of the way to sixth position.
DISCUSSION
Perhaps the most striking finding to be reported here is that the variability of beginning performers was
sufficiently large as to overwhelm the transducers we used to measure speed and distance of slide movement. A
strategy for improving this measurement system will be described below. An unexpected finding in the analyses
reported above is that performers move the slide faster when making an outward motion than when making an
inward motion. In this case, they are likely avoiding forcing the mouthpiece of the instrument into their
embouchure. It is worth noting that first position is the only position where the slide comes to rest against a
physical barrier. Consequently, it should be the position where uncertainty about where to place the slide is
lowest.
As reported in our earlier work, tempo and distance of motion produced the largest effects on the speed of slide
motion. This suggests that each performer moves the slide as fast as they need to depending on the requirements
of the motion. This finding is inconsistent with the proposal that performers develop a single automated and
crystalized motion from one position to the next that they call upon to position the slide.
The trends found in the data above suggest strongly that further study of beginning performers is needed. It
appears that the primary differences between beginners and other performers occur in the longer motions.
However, it is difficult to draw conclusions from the data we have gathered because our younger performers
simply moved their bodies too much from side to side. While this observation is interesting because it suggests
that students may need to gain better control over their instruments, it should also be possible to examine slide
motion independent of this side to side motion. We are currently working on a strategy that mounts our ultrasonic
transducer directly onto the end of the slide so that it constantly measures the distance between the slide end and
the bell of the instrument. We believe this should allow us to differentiate variability in actual distance of slide
motion from variability in overall body motion, resulting in more accurate estimates of slide motion for all
performers.
REFERENCES
Kruger, M., Lammers, M., Stoner, L. J., Allyn, D., & Fuller, R. (1996). Musical expertise: The dynamic
movement of the trombone slide. In S. J. Haake (Ed.), The Engineering of Sport. Amsterdam, The
Netherlands: A. A. Balkema.
Kruger, M., Lammers, M., Fuller, R., Allyn, D., & Stoner, L. J. (1997). Biomechanics of music
performance: Moving the trombone slide. National Association of College Wind and Percussion
Back to index
Proceedings paper
Communication of emotions is important both in everyday life and in the performing arts. Music is widely
acknowledged as an effective means of emotional communication. Studies on music performance, though, have
historically dealt almost exclusively with various structural aspects of performance and largely ignored questions about
emotional expression (for reviews; see Gabrielsson, 1999; Palmer, 1997). More recently, a number of studies have
shown that performers are able to communicate specific emotions to listeners via their expressive performances (e.g.,
Balkwill & Thompson, 1999; Behrens & Green, 1993; Gabrielsson, 1995; Gabrielsson & Lindström, 1995; Gabrielsson
& Juslin, 1996; Juslin 1997a; Juslin 1997b; Juslin & Madison, 1999; Ohgushi & Hattori, 1996).
The previous studies on emotional communication in music performance suggest that the performer's expressive
intention influences practically all factors in the performance. Emotional expression in music thus involves a whole set
of cues that are used by both performers and listeners (e.g., Juslin, 1997a). Different instruments provide different
means to express various emotional characters. Therefore it is of importance to use a wide range of instrumental
settings in studies of emotional expression in music. Until now, there has been no study that has used purely percussive,
non-melodic instruments. Behrens and Green (1993) included timpani in their study, but did not report any
measurements of acoustic characteristics.
The musical material of the present study consists of short rhythm patterns played on a set of electronic drums. This
design offers a more limited range of expressive cues to the performer when compared with earlier studies. Also, the
possible influence of melody on the expression is excluded. The general aim of this study was to investigate if
communication of emotions to listeners is possible under these circumstances, and more specifically to study the
listeners' recognition of the intended emotions, as well as the performers' use of the acoustic cues.
Methods
Two professional drummers were instructed to play three simple rhythm patterns (swing, beat, and waltz) on a set of
electronic drums so as to communicate specific emotional characters (angry, fearful, happy, sad, solemn, tender, and
expressionless) to listeners. The performances were recorded on tape and stored in computer memory.
Thirteen university students listened to all performances and judged them with regard to happiness, sadness, anger,
fear, tenderness, solemnity, and expressiveness. The judgements were made on scales from 10 to 0, where 10
designated maximum and 0 minimum of the respective attribute.
Analyses of the performances were conducted with regard to their acoustic characteristics, e.g. tempo, dynamics, and
timing. All measurements were made with the Soundswell software (Ternström, 1996). The mean tempo was obtained
by dividing the total duration of the performance, until its final note, by the number of beats, and then calculating the
number of beats per minute. The tempo variability was obtained by calculating the tempo for each consecutive beat
(quarter note), and then calculating the standard deviation for the tempo distribution. A measure of the sound level was
obtained by measuring the loudness equivalent level, as provided in Soundswell. The onset times for each note of the
performances were measured, and these values were used to calculate the deviation from mechanical performance for
each note. A mechanical performance is a performance with absolutely constant tempo and strict adherence to the
nominal ratios between different note values.
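The tempo and timing measures described above can be sketched in code. The following is a minimal illustration under the assumptions stated in the text (beat-level onset times in seconds; a mechanical reference performance with constant tempo and nominal note-value ratios); the function names and the onset data are invented for demonstration, and this is not the authors' actual analysis pipeline, which used the Soundswell software.

```python
from statistics import stdev

def tempo_measures(beat_onsets):
    """Mean tempo (BPM) and tempo variability (SD of per-beat tempo)
    from a list of beat (quarter-note) onset times in seconds."""
    # Inter-beat intervals between consecutive beats.
    ibis = [b - a for a, b in zip(beat_onsets, beat_onsets[1:])]
    # Instantaneous tempo for each consecutive beat, in beats per minute.
    beat_tempos = [60.0 / ibi for ibi in ibis]
    # Mean tempo: total duration (up to the final onset) divided by the
    # number of beats, converted to beats per minute.
    total_duration = beat_onsets[-1] - beat_onsets[0]
    mean_tempo = 60.0 * len(ibis) / total_duration
    return mean_tempo, stdev(beat_tempos)

def deviations_from_mechanical(onsets, nominal_onsets):
    """Percentage deviation of each played inter-onset interval from the
    corresponding interval of a mechanical performance (constant tempo,
    strict adherence to the nominal ratios between note values)."""
    played = [b - a for a, b in zip(onsets, onsets[1:])]
    nominal = [b - a for a, b in zip(nominal_onsets, nominal_onsets[1:])]
    return [100.0 * abs(p - n) / n for p, n in zip(played, nominal)]

# Example: four beats played at a strict 120 BPM.
print(tempo_measures([0.0, 0.5, 1.0, 1.5, 2.0]))  # (120.0, 0.0)
```

For a strictly mechanical performance the per-beat tempi are identical, so the variability term is zero; expressive performances yield a positive standard deviation.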
Results
Listening tests
Separate repeated measures ANOVAs in each emotion scale showed that the listeners' ratings differentiated in
accordance with the performers' intentions (F (6,72) = 4.83-32.91, p < .001). Multiple comparisons (Tukey's HSD)
showed significant differences (p < .05) for all pairwise comparisons between the intended emotion and the other
emotions within each emotion scale (except no expression). These comparisons show that the listeners in general
perceived the intended expressions.
Performance analyses
The two most successfully decoded (highest ranking of the adjective in question and low ranking of the
non-corresponding adjectives) performances of each of the intended emotional characters were chosen for analysis. The
results of the acoustic analyses are shown in Table 1.
Tempo
The happy versions were played the fastest and the sad versions the slowest. The fearful versions showed so much
variation of tempo that the mean tempo is less meaningful.
Dynamics
The angry versions were clearly the loudest followed by the solemn versions, while the sad and tender performances
had the softest sound level.
Timing
The performances were compared with regard to the amount of deviation from the note durations corresponding to the
nominal values as given in the notations. An overall measure of the amount of deviation was obtained by calculating
for each performance (a) the number of notes whose deviation was less than 5 per cent, (b) the number of notes whose
deviation was less than 10 per cent, and (c) the number of notes with more than 20 per cent deviation.
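Given the per-note deviations, the three band counts can be tallied directly. A brief sketch (the function name is hypothetical; following the text and Table 1, the first two bands are cumulative, so the under-10% count includes the under-5% notes):

```python
def deviation_bands(deviations):
    """Count notes by deviation band: (a) less than 5 per cent,
    (b) less than 10 per cent (cumulative, so it includes band a),
    and (c) more than 20 per cent."""
    under_5 = sum(1 for d in deviations if d < 5)
    under_10 = sum(1 for d in deviations if d < 10)
    over_20 = sum(1 for d in deviations if d > 20)
    return under_5, under_10, over_20

print(deviation_bands([2.0, 7.0, 25.0, 4.0]))  # (2, 3, 1)
```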
The fearful versions had by far the largest deviations, followed by the happy and the solemn versions. Except for the
fearful versions, deviations from mechanical tempo were rather small; most deviations were less than 10 per cent or
even less than 5 per cent. The no expression versions showed the smallest deviations.
Further, the ratios of dotted rhythm patterns were compared. The performance of dotted rhythm patterns is known to
vary depending on, for instance, tempo, musical style, and the performer's expressive intentions. For both the swing
and the waltz rhythms, the "long - short" ratio was higher in the happy versions than in the sad versions.
Table 1
Mean tempo (beats per minute); tempo variability (standard deviation); relative sound level [dB Leq; decibels relative
to the reference level (0 dB) set for the softest performance]; percentages of notes deviating less than 5 per cent, less
than 10 per cent, and more than 20 per cent from the nominal values; and the means of the dotted note ratios for the
two most accurately decoded performances of each intended emotional expression.
Expression  Performer/Rhythm  Tempo  Tempo variability  Sound level  Deviations: <5%  <10%  >20%  Ratio
Happy       A / Swing         192    5.80               7            48               61    12    2.71
Discussion
The results show that listeners generally perceived the intended emotions. The different expressive intentions
influenced the variables available to the performer (tempo, dynamics, and timing) in ways characteristic for each
intended emotion. The results of this study agree well with earlier studies of emotion expression in music as regards
both the listeners' decoding accuracy and the performers' cue utilisation.
All of the cues measured in this study are also available in non-verbal vocal expression. The patterns of cue utilisation
for tempo and dynamics correspond closely with the results for speech rate and intensity reported in the literature on
non-verbal communication of emotions in speech (e.g., Pittam & Scherer, 1993; Scherer, 1986) as regards happiness,
sadness, anger, and fear. The fact that the patterns of cue utilisation are stable across modalities could point toward a
common origin and/or similar mechanisms in communication of emotions in music and speech (cf. Juslin, 1997a). It
must, however, be pointed out that not all cues and emotions used in this study have been thoroughly investigated in
both modalities.
Acknowledgements
This research was supported by the Bank of Sweden Tercentenary Foundation through a grant to Alf Gabrielsson.
References
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music:
Psychophysical and cultural cues. Music Perception,17, 43-64.
Behrens, G. A., & Green, S. B. (1993). The ability to identify emotional content of solo improvisations performed
vocally and on three different instruments. Psychology of Music, 21, 20-33.
Gabrielsson, A. (1995). Expressive intention and performance. In R. Steinberg (Ed.), Music and the mind machine. (pp.
35-47). New York: Springer.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention
and the listener's experience. Psychology of Music, 24, 68-91.
Gabrielsson, A., & Lindström, E. (1995). Emotional expression in synthesizer and sentograph performance.
Psychomusicology, 14, 94-116.
Juslin, P. N. (1997a). Emotional communication in music performance: A functionalist perspective and some data.
Music Perception, 14, 383-418.
Juslin, P. N. (1997b). Perceived emotional expression in synthesized performances of a short melody: Capturing the
listener's judgment policy. Musicae Scientiae, 1, 225-256.
Juslin, P. N., & Madison, G. (1999). The role of timing patterns in recognition of emotional expression from musical
performance. Music Perception, 17, 197-221.
Ohgushi, K., & Hattori, M. (1996). Emotional communication in performance of vocal music. In: B. Pennycook, & E.
Costa-Giomi (Eds.), Proceedings of the Fourth International Conference on Music Perception and Cognition. (pp.
269-274).
Palmer, C. (1997). Music performance. Annual Review of Psychology, 48, 115-138.
Pittam, J., & Scherer, K. R. (1993). Vocal expression and communication of emotion. In: M. Lewis, & J. M. Haviland
(Eds.), Handbook of emotions. (pp. 185-197). New York: Guildford Press.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99,
143-165.
Ternström, S. (1996). Soundswell Signal Workstation v 3.4. Computer software, Soundswell Music Acoustics HB:
Stockholm, Sweden.
Back to index
Proceedings abstract
IT'S A PART OF ME: SHAPING MUSICAL UNDERSTANDING AND KINESTHETIC
SELF-IDENTITY THROUGH MUSICAL PERFORMANCE
Alyssa Lightbourn, Department of Ethnomusicology, University of California, Los Angeles
Background. In many studies of the psychology of music, the philosophical premises of research
designs privilege the intellectual understanding of the Western harmonic canon as the optimal
doorway to musical appreciation. While widespread familiarity with this theoretical canon informs
certain aspects of performers' musical experiences and facilitates seemingly objective, verbally
efficient accounts of musical understanding, too often, much about musical experiences is overlooked.
Indeed, musicians have the capacity to powerfully experience music according to various physical,
emotional, and psychological sources of knowledge that lie beyond the comprehension of music
theory. Furthermore, these forms of knowledge shape not only experiences of music but also the
development of musicians' personal identities. They contribute to the intimate connection between the
musician and the music.
Aims. As one part of a larger investigation, this study explores the kinesthetic aspect of musical
performance as it relates to the connection between musical experience and personal
kinesthetic-identity.
Method. Six students learning to play traditional North Indian tabla (drum) music, which involves
rhythmic, dynamic, timbral, and pitch-related subtleties, and their common tabla instructor served as
informants for case studies. The informants had varying degrees of both skill, from beginning-level to
professional-level, and familiarity with tabla music, from almost none prior to the initial tabla lesson
to having had tabla music included in their life-long cultural heritage. Also, all but one of the
informants were musicians before starting their tabla studies. In videotaped interviews, participants
provided extensive, detailed verbal accounts of their kinesthetic experience of playing tabla, pertaining to
(1) their perceived connection with the music or lack thereof and (2) their current and past kinesthetic
self-identity. Where they desired, interviewees supplemented their descriptions with performance.
Also, they contextualized their responses according to current and past emotional and intellectual,
musical and non-musical experiences.
Results. Amateurs' and professionals' kinesthetic experience of the musical performances shaped their
sense of connectedness with the music, their awareness of their bodies, and their approaches to
listening to music, albeit in different ways. All informants expected a stronger bond with both the
tabla and the tabla music with continued musical instruction.
Conclusions. The case studies demonstrate individual complexes of physical, emotional, and
intellectual understandings of music. Also, the studies show that performing tabla music contributes in
various ways to performers' senses of identity.
Back to index
Proceedings abstract
parncutt@kfunigraz.ac.at
Background:
Aims:
Piano performance majors at the University for Music and Performing Arts
attended a series of lecture-demonstrations on selected relevant recent
scientific research. Implications for piano teaching, practising, and
performance were discussed. Arguments were presented for and against strategies
arising from the research, drawing on both scientific evidence and authors' and
participants' practical experience. Two months after the third lecture,
participants were interviewed to explore if, and how, the presented information
had been of practical use. Their teachers were similarly interviewed.
Results:
A pilot study suggested that pianists vary in their willingness to accept and
use suggestions beyond those of their teacher. Of those who are open to
outside influences, pianists with a scientific background (e.g., those who
studied sciences at high school) and those with a relatively analytic approach
to teaching and learning (responding well to detailed instructions rather than
imitation or intuition) more often took advantage of scientific knowledge. No
consistent preference was observed for physical, physiological or psychological
aspects. Results of the major study will be presented at the conference.
Conclusions:
Back to index
Proceedings paper
Introduction
Studies of music performance provide evidence of the performer's ability to communicate attributes of the musical piece to
the listener, such as meter (Sloboda, 1983), phrasing (Repp, 1992; Todd, 1985), the dynamics of rhythmic gestures (Gabrielsson, 1987),
and texture hierarchy (Palmer, 1996b, 1989), among others. Musical structure thus becomes apparent in performance, which is
therefore the most illustrative medium through which to represent music (Palmer, 1996a). Even if a real performance of a musical
piece does not exist, it at least exists as a virtual one in the minds of both the analyst and the performer.
Performance microstructure (the set of modulations of tempo, dynamics, articulation, pedalling, vibrato, tuning, etc. that the
performer applies beyond the score's instructions) characterises the "personality" of a performance. It is well known that this
microstructure is highly controlled during an expert performance. Possibly, the structural representation that monitors the
performance is based on structural attributes of higher organisational order, such as tonality. Little is known, however, about
the relation between tonal structure and performance. Thompson & Cuddy (1997) found that performance microstructure is
very important in the cognition of psychological distances between different tonalities, and that the microstructure of the
different voices plays a particular role in this process. Thus, expressive components of the voice leading within complex textures
seem to be related to the understanding of the tonal relationships of the musical piece.
At the same time, many studies of the listener's representation of musical structure have been carried out. An important body of
research has analysed listeners' representations in terms of concepts derived from theoretical models, and some principles of
different theories of musical structure have already begun to be explored (Deliège, 1987; Dibben, 1994; Krumhansl, 1995). Within
this research tradition, Serafine, Glassman & Overbeeke (1989) provided evidence of listeners' ability to match a melody with a
rendering of its underlying structure, pointing to the cognitive reality of hierarchic structure. Notably, although there is great
interest in investigating how structure is communicated during performance, this issue has not been addressed in studies of
listeners' representations of musical structure.
The aim of the present work is to re-examine the results of that study (Serafine et al., 1989), taking into account the influence of
certain peculiarities of the performance. To this end, different performances are used as independent variables.
The analysis of the underlying voice leading would seem to be important for rendering a coherent performance (Cook, 1990;
Rothstein, 1995). However, there is no clear evidence of objective indicators of a hierarchic representation in actual performances.
Cook (1987) studied timing in relation to structure in a Bach prelude, providing some evidence of timing related to long-term
prolongational structure; at the small scale, however, he presents only partial information, interpreted in an ambiguous way.
This paper focuses on timing as an attribute of the microstructure. During performance, timing reveals both a low-level mode of
organisation related to psychoacoustical phenomena and a high-level mode involving structural organisation (Penel & Drake, 1998;
Repp, 1998). A second aim is to analyse the underlying voice leading as a potential source of timing.
Study of Performance
Two restrictions limited the selection of the stimuli used by Serafine et al. (1989):
1. Given the interpretative nature of the reductions, and in order to adopt the interpretation provided in that study, only the
four analysed examples reported there could be selected.
2. Given the importance of vertical timing (chord asynchrony) for analysing the problem of the underlying voice leading
(Palmer, 1989, 1996b), the study was restricted to monophonic pieces, since the stimuli were not obtained via MIDI, at
present the only procedure that has proved appropriate for the study of vertical timing. This reduced the possible
selections to a single musical piece.
Although this restriction constrains the scope of the present study, the results may nevertheless provide useful evidence for future
work on performance.
Method
The Performances
Six expert performances of the Bourrée I from the Suite No. 3 in C Major for solo cello by J. S. Bach (measures 1 to 4, with the
upbeat) were selected (Figure 1A). The melody consists of a sequence of six rhythmic groups of 2, 2, 4, 2, 2, and 4 beats, in which
the last duration is always the longest. The performers were Pablo Casals (PC), Pierre Fournier (PF), Maurice Gendron (MG),
Mstislav Rostropovich (MR), Paul Tortelier (PT) and Yo-Yo Ma (YM).
Thus the collection of versions, although small, comprises famous interpreters whose styles are both distinct and widely
recognised, representing a wide range of qualified interpretations of the piece.
Figure 1. a) Bourrée I from the C Major Cello Suite by J. S. Bach, mm. 1-4. b) The foreground reduction, after Serafine et al.
(1989).
Procedure of Measurement
A standard sound-editing program (Sound Forge 4.5), which displays waveforms (amplitude envelopes), was used for the analysis.
Measuring onsets from cello signals is difficult, mainly because of the noise produced by the bow before the pitch is clearly
defined. Since the performer intuitively uses that interval of time to regulate the onset, timing was taken to be determined by the
moment at which the sound is perceived as a melodic tone. The predominantly non-legato articulation eased the task by providing
clear wave decays for each note. However, the low register, the arpeggi, and the condition of the original recordings added a great
deal of noise and ambiguity to the onsets of certain notes. An aural procedure was therefore followed, in which increasingly small
segments of the wave were subjected to aural testing. Both researchers analysed the performances separately. Differences between
the two measurements of each onset did not exceed 10 ms; where they did, agreement was reached through inter-rater sessions.
Thus both visual and aural cues were considered and, where necessary, further analyses of the fundamental frequency and
spectrogram were run.
In this way, 22 inter-onset intervals (IOIs) were determined. Onsets 2-1 (measure 2, first beat) and 4-1 correspond to a chord,
which, given the possibilities of the cello, is performed as an arpeggio. In such cases the onset considered was that of the highest
pitch, since this is the note represented in the reductions. Each IOI, measured in milliseconds, was divided by the nominal value of
the note at the tempo of the performance. The tempo is obtained by dividing 15,000 (the 60,000 milliseconds of a minute divided
by 4, the number of quavers contained in a single temporal unit, the half note) by the actual average duration of the minimal unit.
This yields the proportion by which the actual performance deviates from the nominal value. These values were plotted as
expressive timing profiles, in which the horizontal axis represents time and the vertical axis the proportion of deviation.
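As a minimal sketch of the computation just described, the following uses invented IOI values (the measurements from the six recordings are not reproduced in the text) to derive a basic tempo and a timing profile:

```python
# Hypothetical inter-onset intervals (ms) and nominal note values in quavers;
# these are illustrative figures, not the paper's actual measurements.
iois_ms = [210, 205, 430, 200, 215, 440, 205, 210]
nominal_quavers = [1, 1, 2, 1, 1, 2, 1, 1]

# Average duration of the minimal unit (the quaver) across the fragment.
avg_quaver_ms = sum(ioi / n for ioi, n in zip(iois_ms, nominal_quavers)) / len(iois_ms)

# Basic tempo in half notes per minute: 15,000 = 60,000 ms per minute / 4 quavers per half note.
tempo = 15000 / avg_quaver_ms

# Timing profile: each IOI divided by its nominal value at this tempo.
# Values above 1 indicate lengthening, below 1 shortening.
profile = [ioi / (n * avg_quaver_ms) for ioi, n in zip(iois_ms, nominal_quavers)]
```

By construction the profile averages 1.0, so the plotted curve shows deviations around the nominal grid.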
Results
Overall characteristics of timing
Basic Tempo (BT). Determining the tempo is important because (i) it is the basis for calculating the timing profiles, and (ii)
different tempi probably lead to different ways of encoding the timing, given that there is an optimal zone for the detection of
temporal variations (Drake & Botte, 1993). The more varied the timing modulation, the less applicable the procedure explained
above for obtaining the basic tempo of a performance, because sharp variations produce strong deviations from the nominal
proportion (Repp, 1998). The selected fragment is not expected to show this problem, since the dance style and tempo call for
relatively tight timing. However, for reasons of performance technique, the two chords (mm. 2-1 and 4-1) might substantially alter
the average IOI; they were therefore removed from the calculation of the basic tempo, which considered only 20 IOIs.
The tempi used by the cellists ranged from half note = 59 (MR) to half note = 88 (YM) [PC = 83; PF = 71; MG = 85; PT = 74].
Relative Modulation Depth (RMD). This is the coefficient of variation of the IOIs (SD/mean), which allows the amount of temporal
variation to be compared across the examples even though they are performed at different tempi. The degree to which temporal
variation can be detected also depends on the level of dispersion of the sequence (Drake & Botte, 1993). The RMD ranged from
0.24 (MG) to 0.71 (MR) [PC = 0.45; PF = 0.30; PT = 0.42; YM = 0.33]. The cellists' tempi tended to correlate negatively with
RMD, with marginal significance given the small number of degrees of freedom, r(4) = -.753, p < .084. This suggests an
association between slow tempi and greater variability.
Figure 2. A) Means of the timing profiles. B) Factors I and II. C) Comparison of Factor I with the performances of Rostropovich
and Ma. D) Comparison of Factor II with the performance of Gendron.
Group 1. Although the components appear very similar, factor I shows a higher relative duration of the first note (E).
Group 4. While factor I (FI) lengthens the passing tone E, factor II (FII) shortens it. Thus, given that the rhythm emphasises F
because it is a quarter note, FI reinforces the interval E-F and FII the interval D-F (hereafter Property A).
Group 3. A great lengthening of the C characterises FI; conversely, FII lengthens the B. The latter is largely emphasised by the
noticeable shortening of the A. Thus FI shows a relative emphasis of the A, while FII conceals it (Property B).
Group 6. In the parallel motive the emphasis is almost reversed on the first note: now it is FII that emphasises the G preceding the
arpeggio. At this point a conflict emerges between the metric position of the tones and their membership of the C Major chord. In
both factors, timing is used in search of equilibrium. In m. 4-1, FI privileges the metric component and FII the tonal one
(emphasising the E). Conversely, in m. 4-2 the pattern is reversed: FII emphasises the metric component while FI emphasises the
tonal one. Therefore the final emphasis of FI is E-C and that of FII is D-C (Property C).
Discussion
The analysis of expert performances of the first phrase of the Bourrée I from the Suite No. 3 by Bach revealed that renowned
artists use a variety of timing strategies. Nevertheless, some commonalities can be found: for example, the acceleration of the
lower voice reveals that timing is used to differentiate polyphonic principles within this monophonic texture. Timing strategy thus
operates to clarify particular features of the voice leading, at least in an overall sense. Another commonality is the lengthening of
the IOIs involving arpeggi.
Differences in timing strategy range from global aspects (e.g. the adopted tempo) to local effects (e.g. the performance of
non-chord tones). Despite the rhythmic features, no association could be found between a specific tempo and a particular timing
strategy; on the contrary, the two extreme tempi (Ma and Rostropovich) were identified with the same factor.
Noticeably, differences in timing strategy appear to be connected with prolongations at the voice-leading level. Thus, looking at
the reduction in Figure 1b, FI and FII may be understood as different strategies for the performance of the passing notes and
neighbour notes.
Expressive deviations marking phrasing or texture would not be expected, because the fragment is short and monophonic. Tonal
and rhythmic-metrical components would therefore be the most relevant to the microstructural organisation of the example. Some
of the temporal variations could be explained as the result of bottom-up processes involved in rhythmic and melodic perception
(Drake, 1993). But the lack of similar timing patterns matching similar structural patterns reveals the existence of other sources
related to the deeper musical structure.
Very different strategies could be observed not only between artists but also within a single musician. For example, Rostropovich
uses two different strategies to address a similar structural problem: in m. 2-2 he privileges the metric structure and lengthens the
note in the strongest metric position; in m. 4-2, on the contrary, he shortens the note in the strongest metric position and
emphasises the tonally structural note. It seems, then, that neither the metric structure nor the underlying voice leading can
separately explain all the temporal alterations.
Study of Listening
In order to verify the influence of the performance on the representation of the hierarchic structure, Experiment 1 of Serafine et
al. (1989) was followed. In that experiment subjects listened to a model melody and two reductions (a true reduction and a foil, in
Schenkerian terms) and matched one of them to the melody. In our experiment the different versions were used as independent
variables, under the assumption that subjects would tend to choose the foil if it displays notes which, although not structural, are
emphasised by the performer. For example, if the foil displays an E in m. 2-4 (Property B) instead of a D (Figures 1a and 1b;
compare Figure 3b), subjects should prefer it more when matching it with Rostropovich's performance (which emphasises the E)
than when listening to Gendron's version (which shortens the E).
To minimise the effect of repetition of the piece on the learning of its structure, the number of versions was reduced to those
representing the most interesting interpretations to be tested: Gendron's version (representing FII) and Ma's and Rostropovich's
(representing FI, with different tendencies in the Properties and extreme tempi).
Method
Subjects
N = 40 (60% with moderate musical experience, mean = 4.8, and 40% without musical instruction). Mean age = 21.4 years (range
18-36).
Stimuli
Stimuli consisted of three of the performances studied: those of Gendron, Rostropovich and Ma. In addition, four reductions were
synthesised for each of them. The first, the Original (Figure 1b), is the one proposed by Serafine et al. (1989). The other three,
Foils A, B and C, each differ from the Original by only one note (Figure 3). The change in FA refers to Property A. Note that the
B was not replaced by the A but added to it, because of the remarkable emphasis on the B produced by the arpeggio; eliminating it
would have distorted the task. For this reason FA is "less reduced", that is, it corresponds to a more superficial level. FB presents a
change referring to Property B (D replaced by E). FC presents a change referring to Property C (E replaced by D).
Figure 3. Foil reductions used as lures. FA: B replaced by B-A in m. 2 (Property A). FB: D replaced by E in m. 2
(Property B). FC: E replaced by D in m. 4 (Property C).
The reductions kept both the tempo and the timing profile of the corresponding version: for each note of the reductions, the onsets
in milliseconds were determined according to the way the artist played them in the original version. However, since the reductions
did not include the arpeggi, keeping the whole preceding IOI sometimes sounded unmusical, with a break that interrupted
continuity; for that reason a proportional shortening of this lengthening was applied. Dynamics and timbre (cello) were kept
constant across the four reductions.
Procedure
In each trial, subjects listened to the melody (Model) and two reductions (1 and 2) in the following sequence: Model, 1, 2, Model,
2, Model, 1, Model, 2, 1. One of the two reductions was always the Original; the other was a Foil. After listening to the sequence,
subjects 1) chose "the best reduction of the model", and 2) indicated how sure they were of their answer (not sure, more or less
sure, or very sure). There were two warm-up examples taken from another fragment. A 15-second fragment from other pieces for
cello by Bach was included to separate the trials, spacing out the repeated hearings of the same fragment. The whole session lasted
approximately 20 minutes.
Design
There were three sequences for each performer, comparing Original/Foil A (Property A test), Original/Foil B (Property B test),
and Original/Foil C (Property C test). Thus the whole test consisted of 9 trials, presented in different orderings according to 1) the
order within the pair, 2) the Foil belonging to the pair, and 3) the performer.
Predictions:
1. Property A test: listening to the pair Original/FA, subjects will prefer the Original for Gendron and Ma, and FA for
Rostropovich;
2. Property B test: listening to the pair Original/FB, subjects will prefer the Original for Gendron, FB for Rostropovich, and an
intermediate value for Ma;
3. Property C test: listening to the pair Original/FC, subjects will prefer the Original for Rostropovich and (to a lesser extent)
Ma, and FC for Gendron.
Figure 4. Means of the ratings for the different foils (A, B and C) against the Reduction in the versions by MG, MR
and YM. The lower the rating, the greater the preference for the Foil.
General Discussion
The analysis of the performances showed different timing strategies. Many of the differences concern the relative lengthening of
the structural notes compared with the more superficial ones. Possibly the variety of timings manifests diverse ways of conceiving
musical structure in reductional terms.
The main aim of the present study was to verify performance effects on the representation of the tonal structure. Although a priori
the fragment employed does not seem to require noticeable timing variations in order to be performed expressively, the RMD
(Relative Modulation Depth) values revealed different modes of microstructural control.
The performer's ability to emphasise notes in weak metric positions through subtle lengthenings shows that aspects of timing,
combined with conditions of melodic-tonal coherence, may strongly influence the mental configuration of the underlying
structures. Although the findings are not strong, mainly because of the limitations of a highly constrained design, it is possible to
state that musical performance affects the task of matching a melody with a rendering of its structure. On the one hand, timing
strategies may reveal the kind of structural representation held by the performer; on the other, they may convey that structure to
the listener.
From a structural point of view, different microstructures may facilitate or interfere with the processes of tonal tension and
relaxation in the listener's representation. This is congruent with the findings of Thompson & Cuddy (1997). It may, however,
appear to contradict important concepts of a theory that does not take durational aspects into account. Notably, the reductions
presented here lie very near the surface, a context in which rhythm is taken into account by the theoretical framework. Relatedly,
it is important to note that listeners tended to match much more strongly the option presenting a more superficial (even subtly so)
level. Although this aspect requires further investigation, it may indicate that the reductional process is not automatic but requires
activation. If the listener cannot activate this process, he will remain at a more superficial level. If this holds, it would highlight
the role of performance as an activator of the process of reduction.
Although the results are preliminary, they speak to the need to consider the findings of listening studies alongside research on the
microstructural components of performance. This is particularly relevant to studies that aim to create ecologically valid contexts.
Although only one aspect of the microstructure has been investigated, other attributes may well have as much influence as timing
on hierarchic listening, or more. It is assumed that dynamics and the control of tuning and vibrato, in instruments that allow it,
may be powerful attributes providing important cues to the listener during the abstraction of the tonal hierarchies.
References
Cook, N. (1987). Structure and Performance in Bach's C Major Prelude (WTC I): An Empirical Study. Music
Analysis, 6 (3), 257-272.
Cook, N. (1990). Music, Imagination, & Culture. Oxford: Oxford University Press.
Deliège, I. (1987). Grouping Conditions in Listening to Music: An Approach to Lerdahl & Jackendoff's
Grouping Preference Rules. Music Perception, 4 (4), 325-360.
Dibben, N. (1994). The Cognitive Reality of Hierarchic Structure in Tonal and Atonal Music. Music Perception,
12 (1), 1-25.
Drake, C. & Botte, M. C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model.
Perception & Psychophysics, 54 (3), 277-286.
Drake, C. (1993). Perceptual and performed accents in musical sequences. Bulletin of the Psychonomic Society, 31,
107-110.
Gabrielsson, A. (1987). Once Again: The Theme from Mozart's Piano Sonata in A Major (K. 331). In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music (pp. 81-103). Publications issued by the Royal Swedish Academy of
Music No. 55.
Krumhansl, C. (1995). Music Psychology and Music Theory: Problems and Prospects. Music Theory Spectrum,
17 (1), 53-80.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human
Perception & Performance, 15, 331-346.
Palmer, C. (1996a). Anatomy of a Performance: Sources of Musical Expression. Music Perception, 13 (3),
Recording References
Bourrée I from Suite No. 3 in C Major for Solo Cello
(Artists. Company. Number)
Casals, Pablo. EMI. CDH - 7 61028 2
Fournier, Pierre. Archiv Produktion. Stereo 449 711-2 gior 2
Gendron, Maurice. Philips. 442 239-2
Ma, Yo Yo. CBS Masterworks. M2K 37867
Rostropovich, Mstislav. EMI. 7243 5 55365 2 5
Tortelier, Paul. EMI. 7243 5 73526 2 8
Demonstration Papers
These run concurrently with the poster sessions on Sunday and Wednesday
First Author   Poster title                                                                  Day        Room
Dalgarno, G.   A vibroacoustic system for experiencing music                                 Sunday     Fraisse
Furno, S.      Concepts et catégorisation dans le champ du son musical: Le TAS               Sunday     Helmholtz
Gholson, S.    A study of the nature of components of disciplinary structure in applied
               violin performance in expert teaching practice (ABSTRACT)                     Sunday     Wing
Webster, P.    Music composing software for people from ages 6 to 60                         Wednesday  Seashore
Proceedings paper
Introduction
Vibrotactile and vibroacoustic chairs/couches have been used in two areas: for therapeutic purposes, and for helping
hearing-impaired people to perceive music. Pioneers of the therapeutic use have been Olav Skille [1] and Tony Wigram [2]. Our
own work was for many years solely in the music-perception area [6], but in the last three years we have moved very much into the
therapeutic area.
Systems have been produced commercially; in the UK these include those made by The Sound Beam Project, and overseas the
Somatron Corporation in the USA and a number of systems from Scandinavia.
These fall into two classes: (a) those in which the music is played through headphones and the vibration in the chair or couch is
unrelated to the music (e.g. those designed or employed by Skille and Wigram); and (b) those in which the vibration is driven by
the music itself.
Type (b) is of course the only type of interest for the perception of music by hearing-impaired people. We believe it is also to be
preferred for therapeutic use, other than for specific physical treatments designed, for example, to improve joint angles in people
with cerebral palsy. (The latter topic is not covered in this paper.)
Why, then, is it not more widely used? We believe this is because of the several technological problems that have to be solved to
make it work in a truly satisfactory way. The solutions to these will be described. Incidentally, there is no conflict between the
requirements of designing a chair/couch for therapeutic use and for music perception; only the way of using it differs.
Before one can discuss the upper limit of pitch response one has to define the conditions. There are several mechanisms in the body
through which mechanical vibrations may be transduced into nerve impulses that the brain can receive. Some of these mechanisms
operate best at low frequencies, e.g. around 30 Hz; the one best capable of transducing the higher vibrational frequencies, up to a
maximum of about 950 Hz, is the Pacinian corpuscles [4], [5]. However, the energy required to obtain a response increases very
rapidly with frequency. Hence if extremely high powers are used at the frequencies in question, the by-product is a volume of
sound that most people would consider unacceptable. Indeed, almost any design could be pressed into giving a higher frequency
response if the resultant volume of sound were not a problem; moreover, as well as being excessively loud, the sound would be
distorted and unpleasant to the ear irrespective of its volume.
Hence we define the response as that obtained using no more than 10 watts of electrical power supplied to the transducer at any
frequency in the range claimed. Most of the vibroacoustic chairs/couches on the market are then capable of responding up to about
the note A3 (in the notation where C4 is middle C).
This is the fundamental reason why, at acceptable levels and quality of the sound produced as a by-product of the vibration, many
systems give a sensation of little more than the rhythm and the bass.
A major part of our work has been the design of a suitable acoustic coupling system. Our most recent design enables us to go an
octave higher, to A4.
The details of this design, which also helps in obtaining the desirable type of sensation described in 1., must remain confidential
until patenting is complete.
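The two limits just mentioned (A3 for most commercial systems, A4 for our design) can be checked against the roughly 950 Hz Pacinian ceiling with a standard equal-tempered conversion. This sketch is purely illustrative and not part of the system described:

```python
# Equal-tempered note frequencies with A4 = 440 Hz, in the notation where C4
# is middle C (as in the text). Illustrative only.
def note_freq(semitones_from_a4):
    return 440.0 * 2.0 ** (semitones_from_a4 / 12)

a3 = note_freq(-12)  # upper limit of most commercial chairs/couches (220 Hz)
a4 = note_freq(0)    # upper limit of the coupling design described (440 Hz)

# Both sit below the ~950 Hz ceiling of Pacinian-corpuscle transduction, but
# the energy needed to obtain a response rises steeply towards that ceiling.
```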
This means that the tune, rather than just the bass, of a small number of original pieces can now be perceived. By using
arrangements (e.g. pieces written for, say, the violin but arranged for the cello or euphonium) the repertoire can be much enlarged.
Further, with a modest amount of downward pitch shifting, a reasonable number of original pieces are also brought within range.
It should be remembered that large amounts of pitch shifting, or transposing by more than an interval of a fourth, usually give a
result that is musically unacceptable, for reasons well understood in musical acoustics. However, this of course applies to listening
through normal hearing; hearing through the ears is dealt with separately, as will be described later.
3. To obtain an even response to different pitches over the usable range.
Electronic equalisation can be applied, but for this to be successful the acoustic design should be such that there are no
sharp peaks and troughs in the frequency response of the transducers combined with their coupling system to the body. (An
exactly similar requirement is found in the design of loudspeaker drivers and loudspeaker enclosures for hi-fi use.)
Once this is achieved, equalisation can be applied to the signals sent to the power amplifiers so that the sensation of
vibration at the different frequencies in the range is judged subjectively equal, or rather at a level corresponding to
approximate equality when listening to sounds of the same pitch.
Because the degree of equalisation required changes quite rapidly, albeit smoothly, with frequency, it is indispensable to
use 1/3-octave equalisation. This can be achieved conveniently using a commercially available 30-band graphic
equaliser, such as the Alesis model M-EQ230. The table below shows a typical equaliser setting for our system, this
being a combination of the relative sensitivity of the body and the particular design of the acoustic coupling and of the
transducer.
Frequency (Hz) 25 31 40 50 62 80 100 125 160 200 250 320 400 500 640 800 Onwards
Gain/Atten. (dB) 0 -6 -8 -9 -10 -11 -12 -12 -11 -10 -4 4 10 12 -2 -12 -12
Why not apply even more equalisation, e.g. by using two such graphic equalisers in series? Because, as discussed in 2., the
volume of sound produced as a by-product, together with its unpleasantly distorted nature, prohibits it.
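The dB figures in the table convert to linear amplitude factors by the usual 20·log10 rule. A small sketch of that arithmetic (our own illustration, not the equaliser's internals):

```python
# 1/3-octave gains from the table above (band centre frequency in Hz -> dB).
gains_db = {25: 0, 31: -6, 40: -8, 50: -9, 62: -10, 80: -11, 100: -12,
            125: -12, 160: -11, 200: -10, 250: -4, 320: 4, 400: 10,
            500: 12, 640: -2, 800: -12}

def db_to_amplitude(db):
    """Linear amplitude multiplier for a dB gain (20 dB per factor of 10)."""
    return 10.0 ** (db / 20.0)

# The +12 dB boost at 500 Hz multiplies amplitude by roughly 4;
# the -12 dB cuts divide it by about the same factor.
boost_500 = db_to_amplitude(gains_db[500])
```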
4. To carry out amplitude compression.
The volume of symphonic music, live at a classical concert, might range from 35 dB HL to 115 dB HL. While the ear can cope
with such a very large range, the tactile sense certainly cannot, and very considerable compression is necessary, so that the range
of volume variation is within 10 dB, and ideally 6 dB.
In "pop" music the range is much smaller, and the material is sometimes already amplitude-compressed, reducing further the
already small range; sometimes little or no additional compression is necessary.
There is no technical difficulty in carrying out the compression; a commercial dynamic range compressor such as the Behringer
model MDX 1200 or the Alesis model 3630 is suitable for the purpose. If very great compression is required it may be necessary
to use two compressors in series with appropriate settings on each.
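Collapsing the 80 dB concert range (35-115 dB HL) to within 10 dB implies a ratio of about 8:1 above the threshold. A minimal static compression curve illustrating the arithmetic (the threshold and ratio here are our illustrative choices, not published settings):

```python
# Static compression curve: input levels above the threshold grow at 1/ratio.
# Threshold and ratio are illustrative, not the settings of any named unit.
def compress_level(level_db, threshold_db=35.0, ratio=8.0):
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# An 80 dB input range (35..115 dB HL) collapses to a 10 dB output range.
out_range = compress_level(115.0) - compress_level(35.0)
```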
A note on 3. & 4.
An alternative to using a 1/3-octave equaliser and a dynamic range compressor (on each separate channel, of which there may be
up to 4) is to produce a special recording on tape with this processing already done.
5. To have an adequate means of conveying pitch.
When perception via vibration, rather than a pleasing accompaniment to hearing, is desired, just 2 or 3 monophonic parts are
ideal, particularly to start with. Later, as the person builds up experience of vibratory perception, they can perhaps go on to as
many as 4 parts, but we think not beyond that number. Music restricted in this way is strongly recommended for profoundly deaf
people.
For therapeutic use this requirement is unnecessary, which is fortunate, as it would be too restrictive of the repertoire of music for
this purpose. But some reduction of the original music is often useful where the music is complex, and this may be done by
selecting suitable arrangements.
8. Separation of parts.
The ear/brain separates the sounds of instruments wonderfully well: the sound of a prominent oboe part in a symphony orchestra
can be picked out against a background of perhaps 80 instruments playing at the same time, a task that even a room full of
computers is unable to accomplish.
The body, through the sense of touch, certainly cannot do this. Hence we believe it is very important to simplify the body's
perceptual task, and we do so by splitting up the parts and sending them to different parts of the body. For example, suppose we
have a trio arranged for baritone saxophone, trombone and bass guitar. We might place the vibration generated by the saxophone on
We do not have the skills or knowledge to achieve the above ourselves; accordingly we begin with what we believe to be the best
of the existing commercial products, the Somatron range of vibroacoustic chairs/couches/beds, which as well as being good
vibroacoustically meets all the above criteria very well. We then either replace the vibratory drive units with units of our own
design, or modify the Somatron drive units to our own design.
References
[1] Skille, O. (1992) Vibro Acoustic Therapy : Manual & Reports. Levanger, Norway. ISVA
[2] Wigram, A.L. (1996) The Effects of Vibro Acoustic Therapy on Clinical and Non-Clinical populations. PhD Thesis.
www.members.tripod.com/~quadrillo/VAT/tonyphd.htm
[3] Summers, I.R. (Ed.) (1992) Tactile Aids for the Hearing impaired. London: Whurr Publishers Ltd
[4] Bolanowski, S.R. et al. (1988) Four channels mediate the mechanical aspects of touch. J. Acoust. Soc. Am. 84: 1680-1694.
[5] Verrillo, R.T. et al. (2000) Some basics of tactile sensation: temporal and spatial considerations. ISAC'00, Univ. of Exeter.
[6] Dalgarno, G. (1989) A computer based system for music for hearing impaired people". Proceedings of the Second National Conference
Acknowledgements
To Professor John A Sloboda for much advice, support and encouragement.
To the musicians who created much of the "doubled part" music described in section 8: Karen Twitchett, Professor John A.
Sloboda and Dr Steve Roberts.
To the support staff, particularly Chris Woods, in the Dept. of Psychology at Keele for much practical help.
To Somatron Inc for help with the supply of equipment and for substantial technical advice.
The work has been possible through grants from the following:-
The Orpheus Trust, The Norman Collinson Trust, The Arts Council of England, The Sport and Art Foundation, The National
Lottery Charities Board, whose financial support is gratefully acknowledged
Back to index
Proceedings paper
CONCEPTS AND CATEGORISATION IN THE FIELD OF MUSICAL SOUND: THE TAS (TEST D'ATTRIBUTS DU SON)
1. BACKGROUND
The ability to distinguish sound events in the environment is one of the perceptual tasks proper to a living being's adaptation to its milieu. The variety and
richness of the sound repertoires that members of different species use to communicate show auditory discrimination abilities of
varying degrees of subtlety.
However, structuring sounds into categorial systems, or constructing discourses that can be understood in musical terms and shared from an
aesthetic perspective, is a task of far greater scope, reserved in principle to human beings.
Just as humans construct classificatory schemes of all kinds that allow them to order facts and phenomena so as to understand the world better, knowing
sounds in musical contexts requires principles and categories that allow one to understand music better and to find pleasure in
listening to it.
The literature on concept formation refers to processes of abstraction, to criteria for grouping by shared properties, and to systems of
classification and categorisation.
The construction of concepts occupies a prominent place in psychological research into cognitive processes. These questions are linked to
problems belonging to memory (recognition), attention and representation. As regards concept formation, various models and
hypotheses have been proposed, resting on well-established principles from associationist theories, schemas, prototypes, and "organising models" (Moreno Marimón,
1998). A considerable number of resources and tests of different kinds are thus available for exploring concept formation. It is far harder to find
comparable tests for studying concept formation in the field of music.
This study presents an instrument specially designed to explore the formation and development of concepts in the field of musical sound, named the Test
d'Attributs du Son (Sound Attributes Test, TAS). As far as we know, no specific instruments or tests for this purpose exist in the field (Madsen, C., 1999).
The design and production of the TAS are being developed within the "Programa de Incentivo al Docente-Investigador" of the Universidad Nacional de La Plata, Buenos Aires,
República Argentina.
2. AIMS
The purpose of this test is the analysis of certain processes engaged in the formation of musical concepts. To understand better the mechanisms involved in the process of
conceptualisation, the following actions are considered representative:
a. exploring similarities and differences among sounds;
b. abstracting the common features of the perceived sounds;
c. finding a principle that can link the abstracted features;
d. deploying heuristic strategies/procedures to solve problems of matching and relating sounds;
e. transcoding the features of the perceived sound into words (naming the attributes, describing similarities or differences, using metaphors, etc.);
f. activating other modes of representation to communicate characteristics of the perceived attribute when it cannot be expressed in words
(that is, making faces or gestures, using vocal imitation, etc.).
A further aim of the study is to verify whether these abilities differ between musician and non-musician listeners.
● colour or instrumental timbre (a piano, a guitar, etc.; sounds produced by a wooden or a metal object; characteristics such as brightness or dullness, etc.);
The following criteria were considered in establishing this selection:
a. the need to respect the combinatorics of the four attributes; and
b. the use of musical sounds likely to be familiar to musician and non-musician listeners of different ages.
As a first check, the degree of differentiation of the sound attributes to be included in the test was submitted to experts. On the basis of their judgements the
original selection was refined.
● four quadrants in which the sounds can be classified, each with a different colour.
The spheres are identical, so as to centre the subject's attention on the auditory stimuli. Through this visual presentation the sounds are assumed to acquire a body. They
thus become "manipulable": the subject operates with the mouse on the spheres, which
● sound, when pointed at with the hand cursor and clicked with the right mouse button;
● change place, when dragged with the hand cursor while holding down the left button and moving the mouse.
The spheres can be placed anywhere in the quadrants, or returned to the starting point by dragging and releasing them on the spiral of the central circle. It is
thus possible to group the sounds, separate them, listen to them in a different order, and so on.
The trigrams representing the sound categories are hidden. They become visible when the examiner, from the keyboard, activates certain commands to allow
feedback.
Once the subject is judged to have understood the nature of the task, the examiner provides the initial aid: on a key press, a sphere moves to the yellow
quadrant, where it remains fixed with its trigram visible. The subject cannot move this sphere, only listen to it.
From this point on the subject operates on the sounds without any guidance: he has all the time needed to try different ways of grouping the sounds and to
solve the task. He listens and compares the sounds with one another and/or with the sound of the initial aid. Then, after making all the trials he needs, he places them in the category to
which, in his judgement, they belong.
When the subject has completed a first attempt at grouping, the examiner asks him to explain it. If the grouping was wrong, the examiner presses a key to
show the label of a sound that has been miscategorised; then, with another key and the right mouse button, the examiner makes the trigram of
that sound visible. The subject then has two aids which he can use to modify the search criterion (change of hypothesis).
These actions are repeated until the subject:
a. finds the solution (the criterion by which the 22 sounds can be classified into four groups),
b. insists on treating the answer he has found as correct (despite the aids received), or
c. abandons the task.
At the end of the test, the examiner uses a key to make all the trigrams visible and thereby confirm the correctness of the answer.
Finally, the examiner presses the "Sortir" (Exit) button to end the task.
● comparisons between two sounds can only be made successively, which requires retaining the first sound in order to compare it with the second;
● sustained alertness is essential while the stimuli are being presented; if the subject forgets the sounds, he must listen to them again;
● the time of contact with the stimuli is predetermined and cannot be adjusted by the subject.
● two university students, non-musicians, producing a concurrent verbal report (thinking aloud)
4. RESULTS
During the administration of the TAS the subject's complete behaviour is recorded (that is, an attempt is made to capture "everything he does and everything he says"). The
data obtained therefore come from three sources: the automated information stored by the software, the verbal protocol recorded on tape, and the observations
collected by the examiner.
A first scheme for categorising the responses was drawn up, based on degree of correctness and degree of efficiency.
By degree of correctness, responses can be categorised as:
A.- Totally correct responses. These present
i. a selection of sounds sharing two critical attributes;
ii. a justification of the selection (description of the two attributes).
B.- Partially correct responses. These present
iii. a selection of sounds sharing one attribute;
iv. a justification of the selection (description of the shared attribute).
C.- Incorrect responses. These present
v. a selection of sounds with no shared attributes.
The degree of efficiency of the response, in all categories, can be estimated from
❍ the number of trials needed (amount of aid); and
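The correctness scheme above can be encoded compactly. This is only an illustrative sketch; the authors score recorded protocols by hand, and the function name and the rule for unjustified selections are our assumptions:

```python
# Illustrative encoding of the A/B/C correctness categories. Treating an
# unjustified selection as incorrect ("C") is our assumption, not the authors'.
def classify_response(shared_attributes, justified):
    """Categorise a grouping response by how many critical attributes the
    selected sounds share and whether the subject justified the selection."""
    if shared_attributes >= 2 and justified:
        return "A"  # totally correct
    if shared_attributes == 1 and justified:
        return "B"  # partially correct
    return "C"      # incorrect
```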
● The non-musician adults attended first of all to instrumental timbre, relating the sounds to familiar sound sources.
● The additional aids provided enabled subjects to consider the relationship between two attributes.
● Limited availability of a specific vocabulary became evident when the subjects described the attributes. Are these difficulties perceptual,
discriminative, or of both kinds?
● The musicians resorted to a detailed analysis which made grouping into broader categories difficult.
● The 6- and 9-year-old children reached partial solutions: they considered only one attribute, presumably the most salient one.
● The 9-year-old considered only instrumental timbre (in the same way as the non-musician adults).
● From a preliminary study of the protocols, it appears that, in children as in adults, prior musical practice is a variable of great
importance.
Moreover, certain observed response tendencies suggest actions that could be associated with the use of strategies, for example:
● taking advantage of information found by chance, a reaction that could be associated with insight;
● taking one sound as a reference (a prototype?), apparently holding it in short-term memory, and using it for comparison with the others;
● re-listening to the sounds, apparently in order to monitor one's own action;
● proceeding in some cases by trial and error and in others through an ordered scheme; that is, actions illustrating different styles of
approach to the problem.
As for the two forms of the TAS, preliminary experimental data show that responses to form B were congruent with those obtained a month
earlier with form A. Performance was the same as regards degree of correctness of the responses, the amount of aid needed and the comments made.
5. DISCUSSION
The information gathered so far will be presented, organised around:
1. the usefulness of the software for presenting and recording the experimental data;
2. thinking aloud (the verbal protocol);
3. the examiner's observations;
4. the protocol.
● subjects, regardless of age, tend to perceive the task as a game rather than as a test situation. The use of aids to orient the
response stimulates curiosity and predisposes to play. The playful presentation seems to increase the task's appeal and to reduce the effects of fatigue. Some
playful behaviours were observed, such as leaving the spheres on the spiral, using them to build figures, or grouping them and making them sound in turn while listening to the
resulting sound relations. This playful character obliges the examiner to determine in each case whether the subject is actually solving the task or merely
playing;
● centring attention on the auditory stimulus is favoured by the possibility of isolating the sounds of traditional instruments, stripping them of their visual appearance.
That is, the technology makes it possible
❍ to emulate instrumental sounds with a high degree of fidelity; thus the digitised flute sound recalls the sound of a real flute;
❍ to reduce the association with the visual stimulus, since all the sounds are presented with an identical visual appearance; the sound of the flute, for example, has not the
characteristic shape of that instrument but that of a sphere, and likewise the trumpet, the mandolin and the other instrumental timbres;
● the risk of guessing the answer (Cronbach, L. J., 1998, p. 95) can be greatly reduced compared with the MDS, because the software keeps the trigrams hidden except
when the examiner activates the corresponding commands to make them visible.
❍ locating each sound throughout the test and knowing its exact place in the quadrants;
● Likewise, graphs can be obtained representing the position of the sounds in the quadrants.
● impatience and nervousness due to the difficulty of describing matters relating to the sounds;
Despite the additional work involved in retrieving and recording the verbal protocol (Das, Kar & Parrila, 1998) and the complementary data in the protocol,
this information is considered to be of great value, and hard to replace with automated data.
6. REFERENCES
Bregman, A. (1999) Auditory Scene Analysis. The Perceptual Organization of Sound. Massachusetts : The MIT Press.
Carlsen J. C. (1996) Las representaciones mentales en la música. Eufonía, 5, pp 67-79.
Cronbach, L. J. (1998) Fundamentos de los Test Psicológicos. Madrid : Biblioteca Nueva.
Chion M. (1983) Guide des Objets Sonores. Paris : Bouchet / Chastel.
Chion M. (1993) La Audiovisión. Introducción a un análisis conjunto de la imagen y el sonido. Buenos Aires : Paidós
Crowder, R. (1994) La mémoire auditive. In McAdams, S. & Bigand, E. (Eds.), Penser les sons. Psychologie cognitive de l'audition. Paris: Presses Universitaires de France, pp.
123-156.
Das J. P., Kar B. C., Parrila R. K. (1998) Planificación Cognitiva. Bases Psicológicas de la Conducta Inteligente. Buenos Aires : Paidós
Deutsch, D. (1982) The Psychology of Music. London: Academic Press.
Dowling, W. J. & Harwood, D. L. (1986) Music Cognition. Orlando, FL: Academic Press.
Ericsson, K. and Simon, H. (1993) Protocol Analysis. Cambridge, MA: The MIT Press.
Francès, R. (1958) La perception de la musique. Paris: VRIN.
Hargreaves D. J. (1986-1998) Música y desarrollo psicológico. Barcelona: Graò.
Howell, P., Cross I. and West, R. (1985) Musical Structure and Cognition, London: Academic Press.
Imberty, M. (1969) L'acquisition des structures tonales chez l'enfant. Paris: Klincksieck.
Kahneman, D. (1997) Atención y Esfuerzo. Madrid: Biblioteca Nueva. Psicología Universidad
Leal A. (1998) Los Cambios en el lenguaje. En Moreno M., Sastre G., Bovet M., Leal A. Conocimiento y cambio. Los modelos Organizadores en la construcción del
conocimiento. Buenos Aires : Paidós. pp 143-184
Madsen, C. (1999) Personal communication with the author.
Moreno M. (1998) La psicología cognitiva y los modelos mentales. En Moreno M., Sastre G., Bovet M., Leal A. Conocimiento y cambio. Los modelos Organizadores en la
Back to index
Proceedings abstract
sag2@is.nyu.edu
Background:
This project builds on a prior study of the general nature of renowned violin
teacher Dorothy DeLay's pedagogical practice. The present investigation
attempts a deeper exploration of DeLay's knowledge base in order to describe
the ways in which basic performance technique and interpretive processes are
understood and externally represented, taught, and integrated into performance
discipline.
Aims:
This qualitative investigation follows the design of a single case study which
includes on-site observation; unstructured interviews; fieldnote, audiotape,
and videotape data collection; systematic data categorization and analysis; and
interpretation and theory development.
Results:
Conclusions:
Back to index
Proceedings abstract
Janet Underhill
junderhill@latinschool.org
Background:
Harmonic structure is the most difficult and least accessible element of music
for the listener to European Art Music, though, like the foundation of a
building, it underpins and structures the music. The appreciation of the great
composer's special skill in manipulating harmony to expressive effect is part
of the richness of the listening experience.
Aims:
This presentation uses the mathematics of set theory, and an animated sequence
constructed in Macromedia Director, to clarify the web of harmonic structure in
the first movement of a piano sonata by Franz Joseph Haydn.
Main contributions:
Implications:
The basic tenets of set theory are straightforward and accessible to the
non-mathematician. A knowledge of these basic principles provides a framework for
the experience of harmony. The visual element provided by the animated sequence
of intersecting sets offers insight into the special skill and genius of the
composer, and enhances the listening experience.
Back to index
Proceedings paper
Introduction
There are clearly two components in musical performance: (a) the skill of being able to play the
instrument, and (b) the mental process of deciding upon a particular musical interpretation. Which of
these two an audience will most appreciate depends on several factors, but tends towards the first
when the piece is technically demanding, so that playing the correct notes with the appropriate
timing is itself an achievement, and towards the second when most pianists of modest ability (e.g.
Associated Board Grade 6 level) could play the piece correctly from a purely technical point of view,
but not highly expressively.
It is argued that these two skills can be separated, and that through the use of suitable computer
software and appropriate additional hardware interfaces, a person who is unable to use their hands
well, or even at all, can nevertheless create their own expressive performance as a recording. Once it
is recorded, it is a (stored) performance like any other.
There is no reason to believe that such disabled people are any more or less musical than anyone else,
and such a system would enable their musical ability to be creatively used and expressed, which
would otherwise be trapped within them.
Vistamusic is a system of software and ancillary hardware interfaces which we have developed for
this purpose. Currently it is for piano music only, and it is capable of successfully tackling most types
of piano music. Extending it to all piano music and to some other instruments is a huge task which
can only be tackled if major funding is made available.
In addition, the system can be used equally by non-disabled people who cannot play the piano to
enable them to nevertheless create their own expressive performances of piano music as a recording.
In this case no additional hardware interfaces are necessary. While some may "sniff" at this, we
believe that this is perfectly valid creative activity in music. Just as with those who cannot use their
hands, provided the piece played is not technically demanding, the interest in the performance will be
in the interpretation, not in being able to play the notes. Clearly it would be inappropriate for either
type of user to play technically demanding pieces (except perhaps for their own private enjoyment).
For either type of user it should be stressed that the credit for the expressive performance lies entirely with the
person creating it. The computer system supplies no musicality. (This is by intent; although it would
be perfectly feasible to build some in, we have chosen not to do so.) It is on the same basis as a word
processor: one would not give the word processor the credit for the writing. Hence "knocking"
comments such as "it's only pressing buttons" or "it's only the computer that is doing it" are invalid, and
deny people, most cruelly disabled users, their due acknowledgement of their musical creativity.
Design criteria
Our goal is for the result to be independent of manual abilities, depending only on the inherent
musicality of the person.
Accordingly our criterion for usability is:-
The system has to be equally usable by a person who can only press one key at a time
and who cannot press a key at a specified time or hold it down for a specified length of
time once it has been depressed.
It is believed that this has been achieved in a way which in no way compromises usability for people
who are able-bodied. In other words it is certainly not "a system for the disabled". By designing
software with such people in mind, rather than by attempting to add some facility afterwards, it has
been made equally accessible to all. This is a philosophy which the author would advocate in the
design of any computer system.
Further, the system should be designed to be as natural to use and as effort saving as possible - with
the minimum of key presses or other physical movement required to achieve a given musical result.
Surely this is something which benefits everyone, not just those with limited manual abilities.
not like to contemplate use for the purpose. That is even assuming that the hardware interfaces could
be optimised. (The latter is by no means always possible unless the owners of the
commercial software were willing to make the changes themselves, or willing to provide the source code
to enable others to make changes.)
For these reasons we went ahead with the development.
a section to operate on more than a single note or chord. With this one can put the cursor
anywhere in the piece and operate on the Duration Unit while leaving the Marked Section
unchanged.
3. Operations on a single note or chord.
One would wish to be able to change the characteristics of how any single note is played, or
any single chord, or particular notes in a chord, e.g. "all except the bass" or "all except the top
note".
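An operation like "all except the bass" can be sketched in a few lines. The representation (a chord as a list of pitch/velocity pairs) and the function name are our illustrative assumptions, not Vistamusic's internal format:

```python
# Hypothetical sketch of one operation on a single chord: scale the loudness
# (MIDI velocity, 0-127) of every note except the bass note.
def scale_chord_velocity(chord, factor, skip_bass=True):
    """chord: list of (pitch, velocity) pairs, sorted ascending by pitch."""
    out = []
    for i, (pitch, vel) in enumerate(chord):
        if skip_bass and i == 0:
            out.append((pitch, vel))  # leave the bass note untouched
        else:
            out.append((pitch, min(127, round(vel * factor))))
    return out
```

For example, applied with a factor of 1.5 to a C major chord at uniform velocity 60, the bass stays at 60 while the upper notes rise to 90.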
4. Auditioning
Flexible and easy to use features for auditioning what had been done are of the greatest
importance.
The following are examples of some of the keys (or equivalent) which can be pressed to give
the specified result for auditioning purposes:-
• Play from the beginning until a "stop" key is pressed, otherwise to the end.
• Play the current Marked Section repeatedly until the "stop" key is pressed.
• First play the previous phrase, following on with the current phrase, then return the cursor to
its previous position.
• Play from the beginning of the current phrase until "stop" is pressed and position the cursor at
the point where "stop" was pressed.
5. Summarising
The ability to hear what has been done and to make modifications accordingly, in a quick and easy
way, is essential. It is vital to listen, make adjustments, and listen again. It is believed that it
would not be possible to produce a good performance of a piece of music without doing this -
and surely this fits with the essence of musicianship.
References
[1] Anderson, T.M. (1990) E-Scape: an extended sonic composition & performance
environment. Proc. of ICMC, Glasgow 1990.
[2] Anderson, T.M. & Smith, C. (1996) 'Composability': Widening participation in music
making for people with disabilities via music software & controller solutions. Proc. Of
ASSETS 96 (ACM/SIGCAPH)
[3] Hunt, A.D. & Kirk, P.R. (1994) MIDIGRID - a computer based musical instrument. J.of
Musical Instrument Technology, June 1994
[4] Dalgarno, G. (1991) A computer based system to enable people who cannot use their hands
well, or at all, to produce music with their own individual expression. Dalgarno, G. Proceedings
of the Institute of Acoustics, November 1991, pp 275-283
[5] Dalgarno, G. (1997) Creating an expressive performance without being able to play a
musical instrument. Brit. J. of Music Ed. (1997) 14:2, 116-171.
[6] Repp, B.H. (1992) Diversity and commonality in music performance: An analysis of timing
and microstructure in Schumann's 'Träumerei'. J. Acoust. Soc. Am. 92 (5), Nov. 1992
Acknowledgements
To Professor John A Sloboda for much advice, support and encouragement.
To the support staff, particularly Chris Woods, in the Dept. of Psychology at Keele for practical help.
To Dr Richard Parncutt (formerly of the Dept of Psychology at Keele) for many helpful discussions.
To Andy Hunt, of the Dept of Electronics, University of York, for many stimulating discussion and
for technical advice on MIDI systems.
To Professor John Paynter, Dr David Kershaw, Mr Richard Orton and Mr Bruce Cole of the Dept. of
Music, University of York, for their help and encouragement.
The work has been possible through grants from the following:-
The Orpheus Trust, The Calouste Gulbenkian Foundation, The Paul Hamlyn Foundation, The
Radcliffe Trust, The Arts Council of England, The Sport and Art Foundation, The National Lottery
Charities Board, whose financial support is gratefully acknowledged.
_______________________
Back to index
Proceedings paper
Jörg Langner
Humboldt University of Berlin
Musikwissenschaftliches Seminar
Unter den Linden 6, D-10099 Berlin, Germany
Phone: +49-(0)30-20932065
Fax: +49-(0)30-20932183
E-mail: jllangner@aol.com
Reinhard Kopiez
(Music Conservatoire Hannover)
Christian Stoffel
Martin Wilz
(University of Cologne)
Introduction
Compared to research on timing, musical dynamics is a neglected parameter in performance research. For example, despite a
focus on musical rhythm, timing and performance, the latest edition of Deutsch's (1999) survey of the whole discipline, The Psychology
of Music, does not even contain a sub-chapter on dynamics, and the remaining research literature is widely scattered. We cannot say whether this
situation is due to a lack of interest, but would rather assume that it is due to a lack of adequate research methods, which prevents a deeper
understanding of the nature of dynamics. To sum up, we can formulate some important research topics:
● Although the history of performance practice shows the increasingly important role of dynamic shaping for conveying expression
in music, we know only very little about the relationship between musical form and musical dynamics. Based on musical
experience we can say that e.g. a Bruckner-Symphony is unimaginable without the form-generating force of dynamics.
● The relationship between timing and dynamics and its importance for musical perception is unclear: they either exist in a
hierarchical relationship (e.g. with a dominance of timing over dynamics), or are of equal importance. In the first case we could
assume that dynamics have only a small effect on a global level and a greater effect on a more local level - this contradicts musical
experience; in the second case the problem of redundancy is evoked: why should we take care of a second expressive parameter
(dynamics), if expression is already mediated by the domain of timing?
● Which methods are available for analysing dynamics and for presenting the results perspicuously? This question concerns the field of
performance analysis as well as that of educational application. Only a highly intuitive and easily manageable analysis of dynamics
will be accepted by the majority of instrumental teachers. This implies a special need for realtime methods of analysis and
presentation.
Some answers to the above mentioned questions can be found in the literature: as one of the founding authors, Riemann (1884) published
a treatise on musical phrasing which concentrated exclusively on the role of dynamics and rubato. His simple assumption was that
dynamics and rubato are coupled, and that the development of an eight bar long musical phrase is shaped simultaneously by a crescendo
and an accelerando until the climax of the phrase. This more global perspective of dynamics seems to be more plausible. Huron's (1991;
1992) perception theory of "ramp archetypes" fits well into this perspective. Huron (1991) calculated a mean length of 4.3 bars for
crescendi and of 5.8 bars for decrescendi using a sample of 537 works or movements of 14 composers with a total of 85476 bars. The
same idea of a simple coupling of the two parameters can be found in Todd's (1990) model of musical expression: "the faster the louder,
the slower the softer" (p. 3540). We do not believe in such a simple, rule-based relationship between the parameters, and assume that this
perspective captures only part of musical reality and allows only a very limited view. However, as Friberg (1991) tried to show, it is
possible to generate decent synthesized performances by use of such a rule-based system.
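The coupling rule "the faster the louder, the slower the softer" can be illustrated with a one-line mapping. The linear form and the constant k here are our own illustrative assumptions, not Todd's published model:

```python
# Sketch of a simple tempo-loudness coupling: a local tempo deviation is mapped
# onto a proportional loudness deviation. The linear form and k are assumptions.
def coupled_loudness(local_tempo, mean_tempo, mean_loudness, k=0.5):
    """Return a loudness value (e.g. in Sone) that rises with local tempo."""
    return mean_loudness * (1.0 + k * (local_tempo - mean_tempo) / mean_tempo)
```

Under such a rule a 10% acceleration always produces the same fixed loudness increase, which is exactly the rigidity the multi-level approach below is meant to avoid.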
We would like to try a different approach: referring to the "Theory of oscillating systems" (TOS) by Langner (1999) we hypothesize that
the timing and dynamics of a performance are shaped on multiple levels, including local and global layers. Local layers concern the
dynamic shaping for example from note to note or from measure to measure; global layers on the other hand are connected to the
relationship of dynamics between larger sections or subsections of a piece of music. (Such multi-level structure can also be found in
other domains of performance; for the analysis of timing see Langner & Kopiez 1995). An adequate method of performance analysis
should preserve the full information contained in the performance data (without any reduction), and as Langner (1999, pp. 153-155;
Method
The procedure will now be described, based on the assumption that the piece of music to be analysed is available as a complete audio file.
An outline then follows of the modifications made when the analysis 'works through' the music step by step in realtime.
(a) Non-realtime procedure
The starting point for the procedure is the digitized audio signal. From this (step 1) the loudness curve of the piece is calculated; this
means that at regular intervals in the piece a loudness value is allocated in Sone units. For this purpose a dedicated computer programme
was used, developed by Bernhard Feiten & Markus Spitzer (Technical University of Berlin) on commission from the
Hochschule für Musik und Theater Hannover (see also Langner, Kopiez & Feiten 1998, pp. 18-20). This programme is based on
Zwicker's model of loudness (Zwicker & Fastl 1990, pp. 197-214), which guarantees close proximity to the perceived loudness and
produces superior quality to the simple use of decibel values.
This loudness curve (step 2) was then subjected to multiple 'smoothing out' processes of varied strength. This 'smoothing out' was
achieved through the inclusion, when measuring at a particular point of time, of not only the loudness value at exactly this point but also
the surrounding values, thus creating a mean measurement. This 'surround' of the point is also termed the 'window' for calculation. The
wider the calculation window, the stronger the smoothing effect. If, in an extreme case, one were to take the length of the entire piece of music as the window, there would be only one mean value for the whole piece, and the smoothing out would therefore be at its maximum. In contrast, a very narrow window would produce a curve very similar to the original loudness curve. There are
many interim steps between these extremes. Concrete graphic examples are to be found in Langner (1997). A strongly smoothed-out loudness curve shows differences (that is, deviations from a horizontal line) only where correspondingly wide-ranging dynamic shaping exists; it is precisely these smoothings of varied strength that permit the multi-layer analysis mentioned in the introduction. Our
procedure uses a wide spectrum of various window sizes. The exact range can be selected within the programme. A frequently applied
setting contains 37 different 'windows', sized between 0.25 and 128 seconds (in logarithmic steps).
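Steps 1-2 can be sketched as follows. This is a minimal illustration, assuming the loudness curve is already given as a list of sone values sampled at regular intervals; the actual programme by Feiten & Spitzer is not described in detail, so the centred moving average and the function names here are assumptions:

```python
def smooth(loudness, window):
    """Centred moving average over `window` samples: each output value is
    the mean of the loudness values in the surrounding 'window'."""
    half = window // 2
    out = []
    for i in range(len(loudness)):
        lo = max(0, i - half)
        hi = min(len(loudness), i + half + 1)
        out.append(sum(loudness[lo:hi]) / (hi - lo))
    return out

def window_sizes(n=37, smallest=0.25, largest=128.0):
    """The n window lengths in seconds, spaced in logarithmic steps,
    matching the frequently applied setting of 37 windows from 0.25 to 128 s."""
    ratio = (largest / smallest) ** (1 / (n - 1))
    return [smallest * ratio ** i for i in range(n)]
```

Applying `smooth` once per window size yields the family of curves, from nearly identical to the original (narrow windows) to nearly flat (wide windows).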
Finally (step 3) the gradient of every smoothed-out curve is calculated at each point in time. The procedure is similar to taking the first derivative in mathematics and physics; here, though, the gradients were calculated separately for each smoothing-out curve, according to its window size. The effect is that the strongly smoothed-out curves (which generally show much weaker fluctuation) can attain gradients just as steep as the weakly smoothed-out ones; the contrasting smoothing levels thus receive 'equal treatment'. With this change to gradients, the analytical perspective shifts from loudness to loudness changes - that is, from loud/soft to crescendo/decrescendo. (This change in perspective has proved itself in previous analyses; the final decision as to whether loudness or loudness changes are actually represented has not yet been reached.)
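Step 3 can be sketched as follows. The text does not give the exact normalisation that grants strongly smoothed curves 'equal treatment', so scaling the average slope by the half-window is an assumption made for illustration:

```python
def gradients(curve, window):
    """Average slope at each point of a smoothed curve, measured across a
    span tied to the window size and rescaled by the half-window, so that
    slow changes in strongly smoothed curves can register as steeply as
    fast changes in weakly smoothed ones. Positive values correspond to
    crescendo (red in the Dynagram), negative to decrescendo (green)."""
    half = max(1, window // 2)
    out = []
    for i in range(len(curve)):
        lo = max(0, i - half)
        hi = min(len(curve) - 1, i + half)
        out.append((curve[hi] - curve[lo]) / (hi - lo) * half if hi > lo else 0.0)
    return out
```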
The output (step 4) is produced in graph form showing the gradients referred to, in a so-called "Dynagram" (see fig. 1 and fig. 2). Time runs along the horizontal axis; the window size is represented on the vertical axis. Red colouring signifies crescendo (the more
intense the red the stronger the crescendo); green colouring then shows decrescendo (the more intense the green, the stronger the
decrescendo).
(b) Realtime procedure
In the realtime version, the audio signal is recorded through a microphone linked to the computer. The procedure described above is
carried out in the same way. In creating the Dynagram, however, it must be remembered that calculating the smoothing out always requires a certain surrounding area of each point in time. One must, as it were, "look into the future" - to a lesser extent for the weaker smoothings, to a greater extent for the stronger ones. (Such "looking into the future" can be compared with the retrospective re-interpretation of what has already been heard, and is from this point of view plausible.) As far as the procedure is concerned, the following is relevant:
the Dynagram can only be calculated retrospectively - with negligible delay for the small window sizes, but with considerable delay for the large ones. The data points of a realtime Dynagram thus appear on the screen not as a vertical line, but approximately as a diagonal one.
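The delay follows directly from the centred window: a value for window size w is only available about w/2 seconds after the fact. A trivial illustration (assuming a symmetric window, as in the sketch above):

```python
def display_delay(window_seconds):
    """Seconds before the Dynagram value for this window size can appear:
    half of the (symmetric) smoothing window must lie in the 'future'."""
    return window_seconds / 2.0

# Delays across the commonly used range of window sizes:
for w in (0.25, 8.0, 128.0):
    print(f"window {w:>6} s -> delay {display_delay(w):>6} s")
```

The spread of delays, from a fraction of a second up to about a minute, is what draws the diagonal front on the screen.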
Results
Figures 1 and 2 show the Dynagram of both a professional and a non-professional performance of Erik Satie's Gymnopédie No.1. The
generally more intense loudness pattern from the professional pianist is noticeable. Particularly remarkable is also the greater intensity in
the larger window area; the shading in this area reveals a correspondence between loudness organization and formal structure. The composition consists of two identical parts, each of which in turn consists of two almost equal-sized sections. (To demonstrate these structural segments, the start of each section is marked in the upper horizontal frame of the Dynagram. The starting points of the two main parts are black; the weaker formal divisions are, in contrast, coloured grey.) It is clear from the Dynagram of the professional player's version that each of the four formal sections is covered by a red-green pair in the window-size area from 8 to 16 s, as are the two halves in the vicinity of 32 s. Clearly the professional pianist is capable of pointedly marking the structure of the composition through the control of dynamics.
Fig. 1: Dynagram of a professional performance of Erik Satie's Gymnopédie No.1. The different colours have the following meaning:
intense red = strong crescendo, pale red = weak crescendo, white = constant loudness, pale green = weak decrescendo, intense green =
strong decrescendo. The dynamic shaping reflects clearly the formal structure of the composition (the formal breaks are marked in the
upper horizontal frame).
Fig. 2: Dynagram of a non-professional performance of Erik Satie's Gymnopédie No.1. The dynamic shaping is not as strong as in the
professional performance and reflects the formal structure of the composition less clearly.
Further applications of the procedure showed Dynagrams to be a way of making visible, in particular, the more extensive loudness shaping of a performance. The analysis of a recording of a movement of a Bruckner symphony (conducted by Günter Wand), for instance, revealed a build-up spanning some 20 minutes from the start to the final climax of the piece.
References
Deutsch, D. (Ed.) (1999). The psychology of music. 2nd edition. New York: Academic Press.
Friberg, A. (1991). Generative rules for music performance: A formal description of a rule system. Computer Music Journal, 15(2),
56-71.
Huron, D. (1991). The ramp archetype: A score-based study on 14 piano composers. Psychology of Music, 19, 33-45.
Huron, D. (1992). The ramp archetype and the maintenance of passive auditory attention. Music Perception, 10(1), 83-92.
Langner, J. (1997). Multidimensional dynamic shaping. In A. Gabrielsson (Ed.), Proceedings of the third triennial ESCOM conference,
Uppsala, Sweden, 7-12 June, 713-718.
Langner, J. (1999). Musikalischer Rhythmus und Oszillation. Eine theoretische und empirische Erkundung. [Musical rhythm and
oscillation. A theoretical and empirical investigation]. Dissertation, Hochschule für Musik und Theater Hannover. (A printed version of
this dissertation, including a comprehensive abstract in English, will be published by Peter Lang Verlag, Frankfurt/Main in 2000 or
2001).
Langner, J. & Kopiez, R. (1995). Oscillations triggered by Schumann's "Träumerei": Towards a new method of performance analysis
based on a "Theory of oscillating systems" (TOS). In A. Friberg & J. Sundberg (Eds.), Proceedings of the KTH Symposium on Grammars
for music performance, Stockholm, May 27, 45-58.
Langner, J., Kopiez, R. & Feiten, B. (1998). Perception and representation of multiple tempo hierarchies in musical performance and
composition. In R. Kopiez & W. Auhagen (Eds.), Controlling creative processes in music (pp. 13-35). Frankfurt a.M.: P.Lang.
Riemann, H. (1884). Musikalische Dynamik und Agogik. [Musical dynamics and agogics]. Hamburg: Rahter.
Todd, N.P. McAngus (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America,
91(6), 3540-3550.
Zwicker, E. & Fastl, H. (1990). Psychoacoustics. Berlin: Springer.
Back to index
Designed to be used in the accompaniment of improvisation, BIAB has several resources useful in
teaching composition. Students can experiment with harmonic progressions, enter melodies over
harmonies (with MIDI setup), or analyze a wide variety of styles. Teachers can also prepare cassette
tapes for students to practice composing melodies over. It is most useful with a MIDI keyboard, but
serviceable with Quick Time Instruments. Demonstration tracks of a variety of styles are included.
The program's learning curve suggests it is best suited to middle and high school students.
MicroLogic AV MIDI/Digital Audio Emagic $89
http://www.emagic.de/english/products/logicline/mlav.html
MicroLogic AV is the inexpensive, multifunctional entry level program of the Logic series. With up
to 16 audio tracks, easy real-time effects, the integrated stereo sample editor and virtual General MIDI
mixing consoles, MicroLogic AV gets the user familiar with desktop studio technology. Innovative
details such as the interactive real-time windows allow for easy use.
Print Music MIDI Coda $100 http://www.codamusic.com/
A subset of the professional-level music notation program, Finale 2000. This program offers power and sophistication similar to Coda's Finale, but with fewer staves and fewer options. Still offers the
same user interface and many of the options needed for most score preparation. Excellent value and
allows a student to "move up" to the professional-level Finale 2000 without having to learn a whole
new system.
Note: Other notation programs include: Overture 2 from Cakewalk, Finale 2000 from Coda; Encore
and MusicTime from Gvox, and the newest program: Sibelius
Back to index
Proceedings abstract
malloch@ozemail.com.au
Background:
Aims:
Implications
The implications for the understanding of Music Therapy are clear. Here we see
that musicality is at the very heart of human companionship and human sympathy
- thus, we argue that Music Therapy works by engaging the very foundations of
human sympathetic emotional exchange.
Back to index
Proceedings abstract
Background.
Aims.
To measure the vibrato rate and extent in real performances and examine the
results in the context of dynamics, timing and phrase structure among others.
Method.
Results.
Both rate and extent varied in a systematic way and the range of variation was
larger than the perceptual limits. Preliminary observations indicate that
vibrato extent is correlated with sound level and that the rate is higher for
shorter tones and usually increases at the end of tones (cf. the previously
reported "vibrato tails").
Conclusions.
The systematic variation of rate and extent confirms that the vibrato is used
for expressive purposes. The results indicate that vibrato is related to other
performance variables as mentioned above.
Back to index
Proceedings paper
1. Introduction
One significant component of musical understanding is the ability of listeners to cluster together musical materials into categories such as motives,
themes and so on. Salient musical features enable listeners to make similarity judgements between various musical materials and to organise these
materials in meaningful groups. It is maintained, in this study, that feature salience, similarity judgements and categorisation processes are
inextricably bound together in a way that each of these can be defined only in relation to the rest.
Based on a number of definitions for the above notions the Unscramble clustering algorithm has been developed. Given a segmentation of a
melodic surface and an initial representation of each segment in terms of a number of attributes (these reflect melodic and rhythmic aspects of the
segment at the surface and at various abstract levels), the Unscramble algorithm organises these segments into 'meaningful' categories. The
proposed clustering algorithm automatically determines an appropriate number of clusters and also the characteristic (or defining) attributes of
each category. There have been a limited number of attempts to use clustering techniques for organising melodic segments into motivic categories.
A brief survey and comparison of some existing formal models is presented in (Hoetheker et al., 2000).
A number of psychological studies have attempted to examine the notions of melodic similarity and cue abstraction using real melodic material
(e.g. Pollard-Gott, 1983; Carterette et al., 1986; Lamont and Dibben, 1997). The most extended studies however have been performed by I.
Deliège - see overviews in (Deliège, 1997; Deliège and Mélen, 1997) - wherein issues of feature salience (cue abstraction), musical similarity and
prototypical description of categories (imprint formation) in musical listening are empirically examined.
It is interesting to compare the performance of a computational model against the results given in empirical studies. A computational approach
requires explicit representations of the musical materials and detailed formal descriptions of similarity and categorisation processes. The various
processes can thus be traced and analysed step-by-step in a way that usually is not possible in empirical studies.
This study attempts to replicate, by means of computational modeling, two psychological experiments on cue abstraction and categorisation
performed on a monophonic piece by J.S.Bach (Deliège,1996; 1997). The results of the computational approach are compared to the empirical
results, and convergences and deviations are reported. The clusters produced by the algorithm correspond closely to the categories provided in the
empirical study. The application of the algorithm confirms most of the suggestions presented in the psychological studies regarding which cues
play a most significant role in categorisation tasks.
In the following sections, initially the concepts of similarity and categorisation will be discussed. Then, the Unscramble algorithm will be
described. Finally, results of the application of the algorithm on motivic segments of J.S.Bach's Allegro Assai, Finale of the Sonata for Solo Violin
in C major BWV 1005 will be presented and various interesting aspects of the computational experiment will be discussed.
2 Similarity and Categorisation
A commonly encountered hypothesis on which many categorisation models are grounded is that categorisation is strongly associated with the
notion of similarity, i.e. similar entities tend to be grouped together into categories.
However, there are different views on the relation between similarity and categorisation (Goldstone et al., 1994; Medin et al., 1993). On the one
hand, similarity is considered to be too flexible and unwieldy to form a basis for categorisation, i.e. any two entities may be viewed as being
similar in some respect (e.g. a car and a canary are similar in that both weigh less than 10 tons, but these objects are not normally considered to be
members of the same category!). On the other hand, similarity is regarded to be too narrow and restricting to account for the variety of human
categories (e.g. a whale is more similar to fish than to most mammals, yet we still consider it to be a mammal). Goodman (1972) doesn't hesitate to call similarity 'a
pretender, an impostor, a quack' (p.437). Rips (1989) claims that "there are factors that affect categorisation but not similarity and other factors that
affect similarity but not categorisation. ...there is a 'double dissociation' between categorisation and similarity, proving that one cannot be reduced
to the other" (p.23).
The above debate is directly linked to a further issue; that is how entities and their properties are represented. If objects are described in terms of
mainly perceptual (e.g. visual or auditory) properties, then, obviously similarity is insufficient for many categorisation tasks, whereas, if any sort
of properties - perceptual or abstract or relational - are considered then similarity becomes too flexible.
It seems that the notions of categorisation, similarity and the representation of entities/properties are strongly inter-related. It is not simply the case
that one starts with an accurate description of entities and properties, then finds pairwise similarities between them and, finally, groups the most similar together into categories. For a set of properties P, a distance function d and a threshold h, the similarity relation may be defined as: sh(x,y) iff d(x,y) ≤ h (I). In other words, two entities are similar if the distance between them is smaller than a given threshold and dissimilar if the distance is larger than
this threshold.
The above definition of similarity is brought into a close relation with a notion of category. That is, within a given set of entities T, for a set of
properties P and a distance threshold h, a category Ck is a maximal set Ck ⊆ T such that sh(x,y) holds for all x, y ∈ Ck (II).
In other words, a category Ck consists of a maximal set of entities that are pairwise similar to each other for a given threshold h.
A category, thus, is inextricably bound to the notion of similarity; all the members of a category are necessarily similar and a maximal set of
similar entities defines a category. According to definition I, similarity is not merely the inverse of distance, but additionally requires a threshold
that can be determined in relation to a specific categorisation description for a given context.
As the similarity function sh is not transitive, the resulting categories need not be disjoint (i.e. they need not form equivalence classes); in other words, overlapping categories are possible. The distance d(x,y) between two entities x and y, with property values pi and qi respectively, is defined as:
d(x,y) = Σ(i=1..n) w_pi · w_qi · δ(pi, qi),   where δ(pi, qi) = 0 if pi = qi and δ(pi, qi) = 1 if pi ≠ qi   (III)
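Definition (III) amounts to a weighted count of differing attribute values. A minimal sketch, assuming a single salience weight per attribute (so that w_pi and w_qi coincide; the paper keeps them as two separate factors, and the function names here are illustrative):

```python
def delta(p, q):
    """delta(pi, qi) from definition (III): 0 for identical values, 1 otherwise."""
    return 0 if p == q else 1

def distance(x, y, weights):
    """Weighted distance between two segments, each given as a list of
    attribute values; `weights` holds one salience weight per attribute."""
    return sum(w * w * delta(p, q) for w, p, q in zip(weights, x, y))
```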
Step 2. For each of these thresholds, all the similar objects are computed according to definitions (I) and (III), and an undirected graph for each
threshold is created where edges connect similar objects.
Step 3. All the maximal cliques (II) are computed for each of these graphs, resulting in l different clusterings.
Step 4. For each of the l clusterings a 'goodness' value is calculated according to a 'goodness' function (section 2.3.2.1)
Step 5. The clustering that rates highest according to the 'goodness' function is selected and new weights are calculated according to function (IV).
Step 6. The algorithm is repeated from step 1 for the new weights.
Step 7. The algorithm terminates when the newly calculated 'goodness' value is less or equal to the value that resulted during the immediately
preceding run.
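Steps 2-3 can be sketched as follows. This is a simplified reconstruction, not the authors' code: a similarity graph is built for one threshold, and its maximal cliques (definition II) are enumerated with the classic Bron-Kerbosch recursion, one standard way of computing such cliques.

```python
from itertools import combinations

def similar_pairs(objects, weights, h):
    """Step 2: edges between objects whose weighted attribute distance
    (definition III, with one weight per attribute assumed) is <= h."""
    def dist(x, y):
        return sum(w * (a != b) for w, a, b in zip(weights, x, y))
    return {(i, j) for i, j in combinations(range(len(objects)), 2)
            if dist(objects[i], objects[j]) <= h}

def maximal_cliques(n, edges):
    """Step 3: maximal cliques of the similarity graph, i.e. the candidate
    categories of definition (II), via Bron-Kerbosch."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    cliques = []

    def bk(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: a maximal clique
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    bk(set(), set(range(n)), set())
    return cliques
```

Running this over the range of thresholds, scoring each resulting clustering (step 4) and re-weighting the attributes (step 5) would complete the loop of steps 1-7.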
3.2.3 Additional fundamentals
The following definitions are also necessary for the algorithm:
3.2.3.1 'Goodness' of clustering
As the Unscramble algorithm generates a large number of clusterings (one for each possible similarity threshold) it is necessary to define some
measure of 'goodness' for each clustering so as to select the best. Two such measures have been considered:
a) Overlap Function. This simple function provides a measure of the degree to which clusters overlap; the less overlap between clusters, the better.
b) Category Utility. This function favours categorisations with high uniformity (in terms of properties) within individual clusters ('intra-class
similarity') and strong differences between clusters ('inter-class dissimilarity'). Another way of interpreting this is that category utility measures the
prediction potential of a categorisation: it favours clusterings where it is easy to predict the properties of an entity, given that one knows which
cluster it belongs to, and vice versa (Gluck and Corter, 1985).
In the experiments reported in section 4, category utility has been used. The main advantages of this measure are its firm grounding in statistics, its
intuitive semantics, and the fact that it does not depend on any parameters. These measures are discussed in more detail in (Cambouropoulos et al.,
1999).
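A compact sketch of the category utility measure (after Gluck and Corter, 1985), assuming each segment is represented as a tuple of nominal attribute values; the representation of objects as value tuples is an assumption for illustration:

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a clustering. Rewards high intra-cluster
    uniformity of attribute values relative to their base rates over all
    objects, normalised by the number of clusters."""
    objects = [o for c in clusters for o in c]
    n_total = len(objects)
    n_attrs = len(objects[0])

    def sq_prob_sum(objs):
        # sum over attributes of the squared value-probabilities
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(o[i] for o in objs)
            total += sum((c / len(objs)) ** 2 for c in counts.values())
        return total

    base = sq_prob_sum(objects)
    weighted = sum(len(c) / n_total * (sq_prob_sum(c) - base) for c in clusters)
    return weighted / len(clusters)
```

A clustering that perfectly predicts every attribute value from cluster membership scores highest; one whose clusters mirror the overall value distribution scores zero.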
3.2.3.2 Weighting function.
When a clustering is selected, then the initial weights of properties can be altered in relation to their 'diagnosticity', i.e. properties that are unique to
members of one category are given higher weights whereas properties that are shared by members of one category and its complement are
attenuated. A function that calculates the weight of a single property p could be:
w = m/n - m'/(N - n)   (IV), where:
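The 'where' clause for (IV) is missing from the text above, so the following sketch labels its symbols by assumption: m as the number of category members sharing the property, n as the category size, m' as the number of non-members sharing it, and N as the total number of entities.

```python
def diagnosticity_weight(m, n, m_prime, N):
    """w = m/n - m'/(N - n): high when a property is frequent inside the
    category and rare in its complement, low when it is shared by both.
    Symbol readings are assumptions, since the source's 'where' clause
    is missing."""
    return m / n - m_prime / (N - n)
```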
R1: diatonic pitch, contour and duration patterns for full-length motive (3 attributes)
R2: diatonic pitch, contour and duration patterns for full-length motive, for cell a and cell b (9 attributes)
R3: same as R2 only cells of motive 3 are reversed
R4: same as R3 plus 3 extra statistical attributes: leap, rep, cdir that are meant to reflect melodic properties such as 'smoothness', 'repetitiveness'
Even though experiment II is more difficult to replicate and the above computational experiment is a gross simplification, it is still very interesting
to see that Unscramble is capable of inducing descriptions of the emerging categories that can be used successfully both to classify correctly
UNHEARD instances and to exclude MODIFIED motives that don't fit in these categories for the particular stylistic context (mainly because of
the rhythmic differences).
5 Conclusions
In this paper a computational model for melodic clustering and membership prediction has been presented. An attempt was made to apply this
model on melodic data used in two psychological experiments. Despite the simplifications for the needs of the computational experimentation, it is
still clear that the results obtained by the application of the proposed algorithm support the underlying hypotheses of the empirical studies on cue
abstraction, imprint formation and categorisation - e.g. from a variety of different attributes Unscramble abstracted a number of cues that were
appropriate for the specific categorisation tasks, organised the given melodic segments into plausible categories and successfully categorised new
melodic material into the previously determined motivic groups.
It would be very interesting to attempt a more sophisticated computational replication of the aforementioned experiments. Ideally, the system
should be able to break down automatically the musical surface into meaningful segments, then construct sophisticated representations for each
segment and finally organise these into motivic categories (preliminary such attempts have been made by the author on other melodic data - a
robust model however has not as yet been achieved). Such computational experiments are interesting as the various stages of the analytic process
are transparent to the researcher and the initial hypotheses can be systematically studied.
Acknowledgements
This research is part of the project Y99-INF, sponsored by the Austrian Federal Ministry of Education, Science, and Culture in the form of a
START Research Prize. The Austrian Research Institute for Artificial Intelligence is supported by the Austrian Federal Ministry of Education,
Science, and Culture.
References
Cambouropoulos, E. (1998) Towards a General Computational Theory of Musical Structure. Ph.D. Thesis. University of Edinburgh, U.K.
Cambouropoulos, E., Smaill, A. and Widmer, G. (1999) A Clustering Algorithm for Melodic Analysis. In Proceedings of the Diderot'99
Forum on Mathematics and Music, Vienna, Austria.
Cambouropoulos, E. and Widmer, G. (2000a) Melodic Clustering: Motivic Analysis of Schumann's Träumerei. In Proceedings of the III
Journées d' Informatique Musicale, Bordeaux, France.
Cambouropoulos, E., Crawford, T. and Iliopoulos, C.S. (2000b) Pattern Processing in Melodic Sequences: Challenges, Caveats and
Prospects. Computers and the Humanities, 34:4 (forthcoming).
Carterette, E.C., Hohl, D.V. and Pitt, M.A. (1986) Similarities Among Transformed Melodies: The Abstraction of Invariants. Music
Perception, 3(4):393-410.
Deliège, I. (1997) Similarity in Processes of Categorisation: Imprint Formation as a Prototype Effect in Music Listening. In Proceedings of
the Interdisciplinary Workshop on Similarity and Categorisation, University of Edinburgh, U.K.
Deliège, I. (1996) Cue Abstraction as a Component of Categorisation Processes in Music Listening. Psychology of Music, 24:131-156.
Deliège, I. and Mélen, M. (1997) Cue Abstraction in the Representation of Musical Form. In Perception and Cognition of Music. I. Deliège
and J. Sloboda (eds), Psychology Press Ltd, Hove, U.K.
Gluck, M.A., and Corter, J.E. (1985) Information, Uncertainty, and the Utility of Categories. In Proceedings of the Seventh Annual
Conference of the Cognitive Science Society, Lawrence Erlbaum Associates, Irvine (Ca).
Goldstone, R.L., Medin, D.L. and Gentner, D. (1991) Relational Similarity and the Non-independence of Features in Similarity Judgements.
Cognitive Psychology, 23:222-262.
Goodman, N. (1972) Seven Strictures on Similarity. In Problems and Projects, by N. Goodman, The Bobbs-Merrill Company, Inc., New
York.
Hoetheker K., Hoernel D. and Anagnostopoulou C. (2000) Investigating the Influence of Representations and Algorithms in Music
Classification. Computers and the Humanities, 34:4 (forthcoming).
Lamont, A. and Dibben, N. (1997) Perceived Similarity of Musical Motifs: An Exploratory Study. In Proceedings of the Interdisciplinary
Workshop on Similarity and Categorisation, University of Edinburgh, Edinburgh.
Medin, D.L., Goldstone, R.L. and Gentner, D. (1993) Respects for Similarity. Psychological Review, 100(2):254-278.
Pollard-Gott, L. (1983) Emergence of Thematic Concepts in Repeated Listening to Music. Cognitive Psychology, 15:66-94.
Rips, L.J. (1989) Similarity, Typicality and Categorisation. In Similarity and Analogical Reasoning. S. Vosniadou and A. Ortony (eds),
Cambridge University Press, Cambridge.
Tversky, A. (1977) Features of Similarity. Psychological Review, 84(4):327-352.
Back to index
Proceedings paper
Irène Deliège
University of Liège
URPM - CRFMW
Department of Arts and Sciences of Music
In the introductory theoretical talk of this symposium, the different aspects of the
SIMILARITY-DIFFERENCE model were sketched briefly. Other approaches then addressed certain
particular aspects and investigated empirically the rôle of cue abstraction in the model in the
processes of segmentation, the organisation of memory and the categorisation of musical events
during listening. The present research is also concerned with the rôle of cue abstraction, but, more
precisely, with the effect of prototypicality-which I have called the formation of imprints-that is, the
result of insistent use by the composer of the same cue in more or less varied forms.
Recently, during a meeting that Pierre Boulez devoted to the analysis of his latest work Sur Incises
(filmed presentation, November 1999), it occurred to me that the two fundamental elements of the
model - the principles of SIMILARITY and DIFFERENCE and the mechanism of cue
abstraction-expressed certain processes developed by the composer in the production of his work.
Indeed, Pierre Boulez stressed the primary rôle played in his compositional technique by very short basic figures - he labelled them according to their audible effect: the seed, the slap in the face, etc. - and
showed how these cells were later exploited, developed and varied while still remaining recognizable.
He emphasized that it was necessary for the listener to pick out these figures if he or she was to be
able to keep track of the intended meaning of the composition.
The experimental results that have been presented have demonstrated the different effects of
SIMILARITY and DIFFERENCE, together with the effects of abstracted cues in the development of
the mental schema of a work. For all that, we cannot assume that the listener is in a position to
memorise exactly the various operations that have been performed on the "primary cells" - the
cues-during the process of composition. In fact-and it is worth emphasising this point-memory
simplifies the global information and effectively finds "statistical means"-the imprints-that retain the
essential information about a collection of more or less similar presentations of a given cue.
In this way, the most important feature of the SIMILARITY-DIFFERENCE model is the idea of
similarity as a principal axis in the real-time processes of categorisation that occur during listening
and its organisation around some central exemplar of a given cue - the prototype. Intuitively, this
argument seems plausible in the case of music perception. It formed the basis of a number of
theoretical and empirical studies in the field of categorisation of the environment carried out during
the seventies including the work of Rosch which we have referred to above in connection with our
own research.
At around the same time, the so-called "exemplar models" (see in particular Medin and Schaffer
1978), also involved the idea of similarity but gave less emphasis to the idea of a central prototypical
tendency and recognized the important influence of specific traits of the different "exemplars" of a
given category which were supposed to be saved in memory. These approaches seem at first sight to
be close to each other and moreover seem not to contradict each other. According to Barsalou (1990),
it is even hard to distinguish between them on an empirical level. On the other hand, certain schools of
thought, perhaps too quickly influenced by a statement by the philosopher Nelson Goodman, cited
many times in this kind of context, which presented the concept of similarity as "a pretender, an
impostor, a quack" (1972, p.437) on the grounds that it was too vague and elastic, have developed
other theoretical arguments, demolishing the principal rôle that had been accorded up to that point to
similarity in categorisation and, consequently, the rôle of the prototype.
To put it briefly, we are talking here about the ad hoc models (Barsalou, 1983) that suggest that other
categorical forms could be developed on quite different principles, in particular, on the basis of
particular circumstances-for example, the collection of things that one gathers together to be taken on
a journey, or to make a meal, etc. To that extent, these models seem to be closely linked with Schank
and Abelson's (1977) idea of scripts and Minsky's (1975) idea of frames. About half a century before
that, Vygotsky and Luria (in Wertsch, 1985) distinguished between contextualised and
decontextualised categorisation behaviour that was strongly influenced by the educational background
of the individual. They noted, in particular, that the illiterate peasants of Uzbekistan grouped objects
only according to use or activities performed in the context of their daily life. These authors
emphasised that this type of behaviour becomes blurred as soon as literacy intervenes and disappears
completely after one or two years of education, giving way to a process more guided by abstract principles.
We would therefore recognise the existence of a plurality of categorisation strategies, adapted to the
circumstances and requirements of the subject which initiate grouping processes that are either
essentially functional where the context and thematic aspects are more important than relationships of
similarity; or taxonomic where the notion of similarity is predominant. We have proposed that the
processes of categorisation that occur in real-time during listening belong to the latter category.
Moreover, it would seem that this mode of categorisation could be considered relatively "universal," it
being found in a variety of cultural environments. It is also this mode of categorisation that is chosen
naturally by a child of about 5 years of age: it has been noticed that at this age, classifications operate
on the principle of "global similarity" rather than being a function of particular attributes (Smith,
1981; Keil, 1987). These various reports have given rise in recent times to a resurgence of interest in
the idea of similarity on the part of a number of authors who are today, once again, pleading in favour
of this point of view (Goldstone 1995; Hampton 1997). Not so long ago, James Hampton expressed
this idea in unequivocal terms as follows:
for everyday purposes we are content to continue putting together things that are
(superficially or deeply) similar. After all, such a system serves us perfectly well for most
daily purposes. (1997, p. 109)
see Cambouropoulos, this symposium) and to approach the possible effect of new experimental
variables on the imprint formation process. Therefore new conditions were introduced, allowing us to
investigate not only the role of musical training (T), but also the effect of (i) familiarisation (F) with
the musical context, (ii) length of the experimental sequences, and (iii) musical parameter concerned,
i.e. rhythmic vs pitch errors, introduced in the modified sequences. These aspects will be analyzed
under the factor EXPERIMENT (E).
As already explained, an imprint is considered to be an average value of the characteristics of a
category. Thus sequences whose features are too distant from this central value should induce hesitant responses. Moreover, questions already raised in the preliminary experiment
concerning the effect of the amount of information processed - one or two-bar items -, will receive
here a more extended analysis. In addition, it is here hypothesized that the number of familiarization
listenings to the piece (factor F) should improve subjects' performances. Finally, it seems reasonable
to expect a more significant effect for the rhythmic than for pitch modifications: a change of pitch
might even not be noticed whereas a rhythmic modification seems to damage more effectively the
integrity of the memorized musical sequence. The effects of the above factors should also have a clear
influence on the degree of certainty the subjects will be given for their responses.
GENERAL METHODS
1. Materials
The same piece by J. S. Bach - Allegro Assai of the Sonata for violin solo in C major BWV 1005 - was
employed. The first part of the piece (bars 1-42) and the sequences requested for the four different
experiments were played by Mira Glodeanu on a baroque violin and recorded on a DAT. For playback
during the tasks, a cassette player and 2 Yamaha loudspeakers powered by a Denon amplifier were
employed.
Four different series of items were prepared, one for each experiment. They are based on the two
parent-motifs of the piece (motifs A and B, see Cambouropoulos, figure 2, this symposium) and some
of their variations.
Two series presented two-bar items, and the other two employed one-bar items. The latter were made up by dividing the two-bar items into two one-bar sequences. Consequently, an equal number of first-bar and second-bar items appear in the one-bar series.
As in the preliminary experiment, each series was built up in three parts: HEARD (H), i.e. items taken from the first part of the piece, already heard in the acquisition phase; UNHEARD (UH), i.e. items borrowed from the second part of the piece, which the subjects did not hear; MODIFIED (M), i.e. items containing a small rhythmic or pitch modification.
Criteria chosen to set up the modified items
The effect of modifications introduced in the original text was studied in relation to rhythm modifications in Experiments I and III and pitch modifications in Experiments II and IV. All these modifications were introduced into items of the HEARD part.
a) Rhythm modifications: As in the preliminary experiment, one complete beat of the item was modified, but no changes were introduced in the first beat, because some of the items have only three semi-quavers in that beat and because changes located at the very beginning of the item would have made responses too easy. The modified beat was replaced by:
2 quavers
1 quaver + 2 semi-quavers
1 dotted quaver + 1 semi-quaver
or 1 crotchet
The original pitches were always preserved.
In the one-bar items, modifications occurred an equal number of times on the second and the third beat, using each of the planned types of modification an equal number of times.
In the two-bar items, (i) one complete set of items presented a single modification bearing on the first beat of the second bar; (ii) another set was designed by the "addition" of bars 1 and 2 of the sets of one-bar items. Modifications alternated between the second and third beats.
b) Pitch modifications: These altered a single note, chosen so that the first, second, third and fourth semi-quavers of each bar of the original item were affected an equal number of times. However, the first three semi-quavers of an item were never changed, to avoid making responses too easy. Modifications appeared an equal number of times and were located as follows:
1st beat modified on the 4th semi-quaver
2nd beat modified on the 1st or 3rd semi-quaver
3rd beat modified on 2nd semi-quaver
In the one-bar items, the whole set of items of the HEARD part was modified according to this plan.
In the two-bar items, three different sets were built by "addition" of the previous one-bar items: (i) one set with changes in the second bar only; (ii) another set with the contrary, i.e. changes in the first bar only; (iii) a final set with changes in both bars of the item.
2. Participants
600 adult subjects - 300 musicians and 300 non-musicians - took part in the four experiments. The
musicians were students of Royal music conservatories in Belgium and the non-musicians were
students of post secondary schools. They were between 18 and 25 years of age (average = 22).
3. Procedure
Subjects were tested in groups. The instructions were given on a form, stating explicitly the tonality of the piece. Subjects had 10 minutes to read and ask questions. It was explained that they would be asked to listen to the first section of the Bach piece - twice, six or ten times, depending on their condition - followed by a set of short items taken from that section, from the non-heard section, or slightly modified, and that their task was to respond, for each item, whether they had already heard it or not. In addition, they were asked to indicate their degree of certainty in their responses. They received a response form to be completed. The forms displayed a number of items in accordance with the experiment: 48 for Experiments I and II; 30 for Experiment III; 40 for Experiment IV. The degree of certainty for each item was symbolised by a horizontal line of 10 cm. Subjects were invited to draw a small vertical line at the point corresponding to their feeling of certainty.
• Musicians performed significantly better except for UH items: H = 74 vs 65% [F(1,148)=14, p <.001]; UH = 46 vs 44% [F(1,148)=.6, p =.43]; M = 76 vs 66% [F(1,148)=13, p <.001].
• F did not significantly influence the results, except for H items [F(1,147)=6, p <.005].
(ii) The main effects (T and F) tested by a two-way ANOVA recorded a significant effect of T [F(1,144)=24, p <.001], but not of F.
For the H items, the main effects were significant: Musicians performed better [F(1,144)=16, p <.001], and a greater number of previous listenings produced higher results [F(2,144)=6, p =.002].
No effect was reported for UH, and for M items, only the effect of T was significant [F(1,144)=14, p
<.001].
(iii) The Friedman test recorded a strong influence of the type of item. The mean ranks are respectively 2.3 (H), 1.3 (UH), and 2.5 (M): χ2 = 124, p<.0001. Considering the data from the Musicians and the Non-Musicians separately, a similar effect is observed: respectively mean ranks 2.2 (H), 1.3 (UH), and 2.6 (M): χ2 = 69, p<.0001; and 2.3 (H), 1.3 (UH), and 2.4 (M): χ2 = 57, p<.0001. The best results were thus recorded for the M items, followed by the H items; as expected from the prototype effect, UH items were the least successful.
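The Friedman statistic reported throughout these results can be reproduced from per-subject scores. A minimal Python sketch, using invented illustrative data (not the study's), computes the chi-square from the same mean ranks quoted in the text:

```python
import numpy as np

def friedman_chi2(scores):
    """Friedman chi-square for an (n_subjects x k_conditions) array.

    Ranks each subject's scores across the k conditions, then applies
    chi2 = 12n/(k(k+1)) * sum_j (Rbar_j - (k+1)/2)**2 on the mean
    ranks Rbar_j. No tie correction (adequate for continuous scores).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    order = scores.argsort(axis=1)            # columns sorted per subject
    ranks = np.empty_like(scores)
    ranks[np.arange(n)[:, None], order] = np.arange(1, k + 1)
    mean_ranks = ranks.mean(axis=0)
    chi2 = 12 * n / (k * (k + 1)) * ((mean_ranks - (k + 1) / 2) ** 2).sum()
    return chi2, mean_ranks

# Illustrative per-subject percent-correct scores shaped like the
# reported pattern (M slightly above H, UH clearly lowest).
rng = np.random.default_rng(0)
n = 150
scores = np.column_stack([
    rng.normal(70, 10, n),   # H  (heard)
    rng.normal(46, 10, n),   # UH (unheard)
    rng.normal(71, 10, n),   # M  (modified)
])
chi2, mean_ranks = friedman_chi2(scores)
print(f"chi2 = {chi2:.1f}, mean ranks (H, UH, M) = {np.round(mean_ranks, 2)}")
```

With data of this shape the statistic comes out large, comparable in magnitude to the values reported above.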
3. Results of Experiment II (n = 150)
Globally, 57% correct responses (sd = 10%) were observed, i.e. 65% for H items (sd = 14%), 50% for UH (sd = 18%), and 57% for M items (sd = 18%). Already at this point of the analysis, it is observed that the pitch modifications were less well perceived than the rhythmic ones.
(i) A one-way ANOVA on T and F showed that:
• Musicians performed significantly better than non-musicians except for HEARD items: H = 64 vs 65% [F(1,148)=.2, p =.64]; UH = 56 vs 44% [F(1,148)=18, p <.001]; M = 65 vs 49% [F(1,148)=35, p <.001].
• F had no influence on the results, except for M items [F(1,147)=11, p <.001].
(ii) The main effects (T and F) tested by a two-way ANOVA were strongly significant: T [F(1,144)=37, p <.001]; F [F(2,144)=9, p <.001]. Nevertheless, the mean percentages for 10, 6, and 2 previous listenings are respectively 68, 59, 58% for the Musicians and 55, 55, 49% for the Non-Musicians. Thus the progress is located between 6 and 10 listenings for the Musicians and between 2 and 6 for the Non-Musicians [F(2,144)=5, p <.006].
Considering the types of items, it is observed that for the H items the main factors did not show any effect. On the contrary, they were strongly significant for the UH and the M items. The results are, respectively,
T [F(1,144)=19, p <.001]; F [F(2,144)=2.5, p =.009] for the UH items, and
T [F(1,144)=42, p <.001]; F [F(2,144)=14, p <.001] for the M items.
(iii) Again, the Friedman test indicated a strong influence of the type of item: mean ranks respectively 2.4 (H), 1.6 (UH), and 2.0 (M): χ2 = 47, p<.0001. For the Musicians' and Non-Musicians' data considered separately, a similar effect was observed: respectively mean ranks 2.2 (H), 1.6 (UH), and 2.2 (M): χ2 = 16, p=.0003; and 2.6 (H), 1.6 (UH), and 1.9 (M): χ2 = 40, p<.0001. In comparison with Experiment I, and in line with the prototype effect, UH items were again the least successful. In addition, these results show an influence of the modified musical parameter: as expected, rhythmic errors were better perceived than pitch errors.
4. Results of Experiment III (n = 150)
This experiment employed two-bar items including two different types of rhythm modification: M1 = one modification in the second bar; M2 = two modifications, i.e. one in each bar.
Globally, 66% correct responses (sd = 13%) were observed, i.e. 76% for H items (sd = 19%), 57% for UH (sd = 19%), 57% for M1 (sd = 23%) and 76% for M2 (sd = 21%).
(i) A one-way ANOVA on T and F showed that:
• Musicians performed significantly better, although for H items the difference is not statistically significant: H = 78 vs 73% [F(1,148)=3.4, p =.07]; UH = 60 vs 53% [F(1,148)=6, p =.02]; M1 = 63 vs 51% [F(1,148)=12, p =.001]; M2 = 82 vs 69% [F(1,148)=15, p <.001].
• F influenced the results in all respects: H = 82, 77 vs 68% [F(1,147)=7, p =.001]; UH = 64, 54 vs 53% [F(1,147)=5, p =.007]; M1 = 68, 52 vs 52% [F(1,147)=8, p <.001]; M2 = 82, 75 vs 70% [F(1,147)=4.7, p =.01].
(ii) The main effects (T and F) were strongly significant, as shown by the two-way ANOVA: T [F(1,144)=28, p <.001]; F [F(2,144)=19, p <.001]. F was more effective for the Musicians: the mean percentages recorded for 10, 6, and 2 previous listenings were respectively 82, 69, 62% for the Musicians and 66, 60, 59% for the Non-Musicians [F(2,144)=4.4, p <.01].
For the different types of items considered separately, the main factors (T and F) remain significant. A stronger effect of F was observed in Musicians for the M2 items [F(2,144)=3.2, p =.04], which might indicate an influence of two-bar items modified rhythmically twice.
(iii) As for Experiments I and II, the Friedman test indicated a strong influence of the type of item: mean ranks respectively 3.0 (H), 2.0 (UH), 2.0 (M1) and 3.1 (M2): χ2 = 99, p<.0001. The best results were recorded for the H items and the M2 items.
When items were modified only once, in bar 2, they were rejected less often: the absence of changes at the head of the motif had a strong effect on the responses. For the Musicians' and Non-Musicians' data considered separately, similar effects were observed: respectively mean ranks 2.9 (H), 1.9 (UH), 2.0 (M1) and 3.2 (M2): χ2 = 56, p<.0001; and 3.0 (H), 2.0 (UH), 2.0 (M1) and 3.0 (M2): χ2 = 46, p<.0001. This analysis corroborates the observation made above that the H and the M2 items received the best responses in both groups of subjects.
5. Results of Experiment IV (n = 150)
This experiment also employed two-bar items; three different types of pitch modification were included: M1 = one modification in the 2nd bar; M2 = one in the 1st bar; M3 = two modifications, one in each bar.
Globally, 61% correct responses (sd = 14%) were observed, i.e. 72% for H items (sd = 19%), 66% for UH (sd = 18%), 44% for M1 (sd = 21%), 59% for M2 (sd = 25%) and 65% for M3 (sd = 25%).
(i) The one-way ANOVA on T and F showed that:
• Musicians performed significantly better, although for M1 items the difference is not statistically significant: H = 78 vs 67% [F(1,148)=20, p <.001]; UH = 70 vs 62% [F(1,148)=7.5, p =.007]; M1 = 45 vs 43% [F(1,148)=0.2, p =.62]; M2 = 71 vs 48% [F(1,148)=37, p <.001]; M3 = 76 vs 53% [F(1,148)=57, p <.001].
• F influenced the results except for UH and M1 items: H = 78, 72 vs 68% [F(1,147)=5, p <.001]; UH = 70, 65 vs 62% [F(1,147)=2.7, p =.07]; M1 = 49, 42 vs 41% [F(1,147)=2.0, p =.13]; M2 = 71, 55 vs 52% [F(1,147)=10, p <.001]; M3 = 71, 65 vs 58% [F(1,147)=4.4, p =.01].
(ii) The two-way ANOVA showed that T and F are strongly significant, except for the M1 items. On the global percentages we observe T [F(1,142)=60, p <.001]; F [F(2,142)=16, p <.001].
(iii) The Friedman test on the type of items recorded the worst results for the M1 items: mean ranks 3.7 (H), 3.2 (UH), 1.9 (M1), 2.8 (M2) and 3.3 (M3): χ2 = 121, p<.0001. Similar effects were observed for the Musicians' and the Non-Musicians' data considered separately.
6. Degrees of certainty
(i) A one-way ANOVA on each main factor (T, F and E) showed that:
• Musicians generally gave higher degrees than Non-Musicians, both for correct (C) and incorrect (I) responses { C = 7.2 vs 6.3 [F(1,598)=49, p <.001]; I = 6.4 vs 6.0 [F(1,598)=9, p =.004] }.
• For C responses, but not for I responses, the degrees increased with the number of opportunities to listen to the piece { C = 7.1, 6.8 vs 6.3 [F(1,597)=13, p <.001] }.
(ii) With a three-way ANOVA (2 x 3 x 4) on the three main factors (T, F, E), similar significant results were observed { T: [F(1,582)=49, p <.001]; F: [F(2,582)=12, p <.001]; E: [F(3,582)=12, p <.001] }.
Musicians gave higher degrees the more previous listenings they had heard: mean degrees for 10, 6, and 2 listenings were respectively 7.6, 7.0, 6.4 for the Musicians and 6.2, 6.3, 6.1 for the Non-Musicians [F(2,582)=9, p <.001]. Thus subjects' confidence in their responses was related to the length of familiarization.
In addition, it appeared that the Familiarization listenings induced more progress for Experiments I and II; more stable degrees were reported for III and IV [F(6,582)=4.4, p <.001].
CONCLUSIONS
As already noticed in the preliminary study (Deliège, 1997), the results reveal an Imprint effect operational for all subjects. The Friedman tests established that the weakest performances were recorded for the UNHEARD items, which were accepted as having already been heard during the Familiarization listenings. Concerning the factor TRAINING, musicians gave better responses, a result which corroborates previous remarks: "Non-musicians had less capacity to counteract the effect of imprint by calling on an understanding of explicitly analytic and syntactic associations. But associations of this kind were not always sufficient for the musicians either; the effect of imprint supplanted acquired knowledge in many instances." (ibid. p. 63). An additional Training effect was also observed for the two-bar items: Musicians performed better with longer items (Exp. III & IV), showing that an extended amount of information favored only the trained subjects.
Familiarization was not considered in the previous study. Broadly speaking, performance improves when the number of familiarization listenings is increased. However, some specific results showed that this effect was stronger for Musicians: again, trained subjects had a clear advantage on this point, particularly for the longer items, which reinforces the remark made above about those items.
The two-bar items, in relation to the Modified items, led to interesting new observations. Indeed, the M1 items of Experiments III and IV did not elicit very good responses. Those items were modified only in the second bar, which might signify an unexpected effect of the absence of modification at the head (beginning) of the sequences. Pattern recognition was made on the basis of the very beginning, and subjects did not pay attention to what happened afterwards. This particular impact of the "heads" of the units was already reported by Anderson in his ACT* model (1983, pp. 52-53), in relation to a study by Horowitz (1968).
Another aspect investigated in this study concerned the degrees of certainty given by the subjects for their responses. All three factors (T, F, E) had a significant effect. TRAINING favored confidence in the responses for Correct as well as Incorrect responses, and FAMILIARIZATION led to higher degrees, especially for one-bar items.
Considering finally the style of the piece, which should influence the responses recorded for the MODIFIED items, it was again observed that the imprint, incorporating an average value of the main stylistic characteristics, is formed rather rapidly after the cue abstraction. In general, the M items were rejected as not having been heard previously. But, when comparing rhythm and pitch modifications (i.e. results of Experiments I vs II, and III vs IV), it was observed that rhythm modifications were much more effective in inducing rejection. These results clearly support Schoenberg's already-cited statement: "The preservation of the rhythm allows extensive changes in the melodic contour" (1967, p. 30).
REFERENCES
Anderson, J.R. (1983) Architecture of Cognition, Cambridge, Mass., Harvard University Press.
Barsalou, L.W. (1983) Ad hoc categories. Memory and Cognition, 11, 211-227.
Barsalou, L.W. (1990) On the indistinguishability of exemplar memory and abstraction in category
representation. In T.K. Srull & R.S. Wyer Jr (eds), Advances in social cognition, vol. 3, Hillsdale, NJ,
Lawrence Erlbaum, p. 61-88.
Deliège, I. (1997) Similarity in processes of categorization. In Proceedings of SimCat 1997,
Edinburgh University, 59-65.
Goldstone, R.L. (1995) Mainstream and avant-garde similarity. Psychologica Belgica, 35, 145-165.
Goodman, N. (1972) Seven strictures on similarity. In N. Goodman, Problems and Projects. New York, Bobbs-Merrill, p. 437-447.
Hampton, J.A. (1997) Similarity and Categorization. In Proceedings of SimCat 1997, Edinburgh
University, 103-109.
Horowitz, L.M., White, W.A., & Atwood, D.W. (1968) Word fragments as aids to recall: the organization of a word. Journal of Experimental Psychology, 76, 219-226.
Keil, F.C. (1987) Conceptual development and category structure. In U. Neisser (ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization. Cambridge, Cambridge University Press, pp. 175-201.
Medin, D.L. & Schaffer, M.M. (1978) Context theory of classification learning. Psychological
Review, 85, 207-238.
Minsky, M. (1975) A framework for representing knowledge. In P.H. Winston (ed.) The psychology of computer vision. New York, McGraw-Hill.
Schank, R.C. & Abelson, R.P. (1977) Scripts, plans, goals and understanding. Hillsdale, N.J.,
Lawrence Erlbaum.
Proceedings paper
Numerous studies have established the importance of phrase cues in melodic perception and recognition. We know that listeners use
phrase cues to help make sense of musical structure (Gregory, 1978; Lerdahl & Jackendoff, 1983; Sloboda & Gregory, 1980), to
learn new songs (Sloboda, 1977, Sloboda & Parker, 1985), and to recognize previously learned songs (Tan, Aiello & Bever, 1981;
Chiappe & Schmuckler, 1997). We know less about when such strategies develop. Since not all of the world's music has the same
phrase structure, we can assume that a certain amount of our response to musical phrase information is learned at some point, but
when?
A recent study of children's melodic memory found that children as young as 7 years old use phrase cues as a grouping strategy in
their memory for melodies even when they are taught the song as an entire unit (Demorest, 1999). Children from grades K-1 and grade 4 were asked to reconstruct two previously learned songs by putting four melody blocks in the right order. In one condition the melody blocks matched the phrases of the melody; in the other, the blocks were of similar length but divided against the phrase break. All of the children reconstructed a regular and an irregular set of blocks. The hypothesis was that if phrase cues were important in children's melodic representations, then it should be harder to reconstruct melody blocks that did not match their internal representations. There was a significant difference in the number of times subjects had to listen to the blocks and in the number of moves, with the irregular condition requiring more listenings and more moves to complete the task. This was true for both the younger and the older children, though older children required fewer operations overall to complete the task in either condition.
In this earlier study the children were taught two common (though unfamiliar) children's melodies in English to maximize the
ecological validity of the song-learning task. A number of studies have found evidence for the integration of text and melody in the
musical memory of adults (Crowder, Serafine, & Repp, 1990; Serafine, Crowder & Repp, 1984; Serafine, Davidson, Crowder &
Repp,1986) and children (Chen-Hafteck, 1999; Feierabend, Saunders, Holahan, & Getnick,1998; Morrongiello & Roes, 1990). The
relationship between text and melody seems to be complex, and may be affected by subject experience, the nature of the memory
task, and the subjects' culture. For children, text seems to be a crucial dimension in melodic recognition, and plays a significant role
in melodic memory (Feierabend, Saunders, Holahan, & Getnick,1998; Morrongiello & Roes, 1990). It is possible that the children in
my earlier study responded more to text phrase cues than musical phrase cues to help them memorize the songs, and then used that
information when reconstructing the melody. This study investigates the role of musical phrase cues in children's melodic memory in
the absence of meaningful text phrase cues.
Method
The participants were 39 children aged 6-11 years from the Northwestern United States. They were taken from kindergarten (n=9),
grade 3 (n=15), and grade 5 (n=15) of a local elementary school where they received general music instruction twice a week for 30
minutes. All of the children were taught two standard children's songs by their music teacher using the whole song or immersion
method of rote teaching (i.e. not phrase by phrase) as a regular part of their music class. The two songs were Appalachian folk
melodies, but both songs were set to an unfamiliar language (Maori) to remove text cues. This was considered the best solution to the
issue of text cues since learning songs without text (e.g. on "la") would be quite unnatural for children this age, whereas learning
songs in another language is not uncommon.
Phrase memory was tested using a reconstruction paradigm in which subjects are asked to demonstrate their memory for a melody by reconstructing a familiar four-phrase melody on a computer from four pieces or "blocks" that were placed out of order (Figure 1).
Children clicked on a block to hear the melody fragment it contained and then placed the blocks in a left-to-right sequence to
reconstruct the song. The reconstruction approach is more involved than simple recognition, but does not rely on children's
performance skills in testing their melodic recall.
Each subject performed the reconstruction task on the two melodies they had been taught in music class. In the regular phrase
condition, the four melody blocks represented the four phrases of the melody while in the irregular phrase condition, the blocks were
broken either before or after the natural phrase break. Figure 2 shows the regular and irregular divisions for Melody 1. The number of
notes per block is very similar, but the difference is in the placement of the division. Thus if the memory task were simply a matter of
remembering notes, there should be no difference in task difficulty between the two conditions. If however, children rely on
structural information such as phrases as a memory aid, then the irregular condition would not match their internal melodic
representation, and should be more difficult to reconstruct. All children reconstructed one melody in each condition. As with the earlier study, the hypothesis was that if phrase groupings were important in melodic memory, it should be more difficult to reconstruct the irregular phrase groupings than the regular. This difficulty would be reflected statistically as a within-subject difference between the two phrase conditions.
Figure 1. The reconstruction task using Impromptu (Bamberger & Hernandez, 1992-2000).
Figure 2. One of the test melodies in regular and irregular phrase groupings shown with the Maori text.
Results.
The primary question of the study was the influence of melodic grouping on children's reconstruction performance in the absence of
text cues. Repeated measures analysis of variance revealed a significant difference in performance between the regular and irregular
phrase conditions. Subjects had to listen an average of 2.67 more times to reconstruct the irregular melody blocks than the regular
melody blocks [F (1, 33) = 12.77, p=.001]. This was true regardless of the age of the subject. Figure 3 shows the mean scores by age
group for the regular and irregular phrase conditions. There were no significant between-subject differences due to age [F (2, 33) = 0.69, p=.507] or musical training [F (1, 33) = 0.03, p=.864], but there was a significant age by training interaction [F (2, 33) = 4.74, p<.05]. This interaction can be seen in the graph in Figure 4, which demonstrates that while students with private training in grades three and five needed fewer hearings overall to complete their reconstructions, kindergartners with private training performed worse.
Figure 3. Mean number of hearings required by each age group in the two phrasing conditions.
Figure 4. Mean number of hearings required for subjects with and without private training.
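Since the phrase-condition factor has only two within-subject levels (regular vs. irregular), the repeated-measures F for that factor equals the square of a paired t statistic. A minimal sketch with invented illustrative counts (not the study's data):

```python
import math

def paired_t(x, y):
    """Paired t statistic for two within-subject conditions.

    With only two levels, the repeated-measures F for the condition
    factor is t**2 (same p value).
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical hearings needed per child for each block condition.
irregular = [9, 11, 8, 12, 10, 9, 13, 10]
regular = [7, 8, 6, 9, 8, 7, 10, 8]
t = paired_t(irregular, regular)
print(f"t = {t:.2f}, F = {t * t:.2f} on (1, {len(regular) - 1}) df")
```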
Discussion
Reconstructing melodies whose segments do not correspond to melodic phrase breaks is a more difficult task for children even without the benefit of meaningful text cues to aid in segmenting the melody. This suggests that children as young as 6 years old recognize and employ purely musical phrase cues in song acquisition and memorization. In fact, when the performance of these students was compared with that of the earlier study, they actually required fewer moves overall to reconstruct the same melody in either condition. This difference may be due to a number of factors, including the quality of music instruction at the different schools and other aspects of musical background, but it does suggest that the lack of text did not impair students' performance. Indeed, some research has suggested that simultaneous presentation of text and music can actually hamper song-learning and song performance (Goetze, 1986; Levinowitz, 1989; Welch, Sergeant, & White, 1995/1996). Future research should directly compare students' reconstruction performance with and without text cues in song-learning.
The lack of an age-related difference in overall performance contradicts the findings of the earlier study (Demorest, 1999) and is not consistent with overall developmental improvements in memory. Perhaps the lack of a meaningful text reduced the memory advantage for older students in the study, forcing them to rely on their musical memory alone. It would be interesting to see in future studies whether age-related differences in melodic memory depend on text conditions.
The interaction between age and private study is not too surprising given the relatively short time that the kindergartners had been
studying privately. It is unlikely that any instructional benefits would be present after such a short time, and it is unusual for most
children to be receiving lessons at that young age regardless of their ability. The improvement in performance for older children may
indicate the benefits of private training, or simply reflect the natural tendency of students with musical interests or aptitude to seek
extra instruction.
Tan, N., Aiello, R., & Bever, T. G. (1981). Harmonic structure as a determinant of melodic organization. Memory and Cognition, 9,
533-539.
Welch, G. F., Sergeant, D., & White, P. J. (1995-1996). The singing competencies of five-year-old developing singers. Bulletin of the Council for Research in Music Education, 127, 155-162.
Proceedings paper
As with the Chinese, the meaning of songs for Africans also differs from that of the West. In African culture, songs mean much more than songs in the Western sense, which implies a piece of music with fixed melodies and words to be sung by the human voice. Traditional African songs do not have fixed melodies, and one can always improvise and make variations on the melodies. Thus, very often, a traditional African song will not be sung in exactly the same way by two people, or by the same person at different times. Furthermore, music and dance are integrated in African songs, and one cannot sing an African song authentically without the accompanying movement and dance. All these cultural differences should be noted when comparing African and Western songs and singing.
The two examples above show that we need to be cautious when carrying out cross-cultural
comparison of children's song-learning and singing. As we can see, the meanings of songs and
singing are not universal. Obviously, this can exert effects on children's singing behaviour.
When a Chinese child is asked to sing, he or she sings with the musical quality of the language, like reciting a poem. When an African child is asked to sing, he or she sings with creativity in the music and combines the singing with natural body movement. Therefore, it is important to take into consideration the meanings of songs and singing for children from different cultures before making any comparison.
Among the various measurements of singing used in past research to compare children's singing collected from different cultures, two basic categories emerge: human ratings of pitch accuracy and computer analysis of the sound recorded from children's singing (Welch, 1994). The
human-based analysis offers us more information on the musical aspects of singing, based on human perception, but there is a risk of obtaining data which are biased by the cultural background of the raters. Thus, in cross-cultural research, it helps if the raters are selected from the different cultures under study. On the other hand, the computer-based analysis provides objective data free of human bias. It gives precisely all the minute details of different properties of sound, such as frequency, amplitude, and spectrum, which cannot be observed by the human ear. Yet it cannot attribute musical meaning to the sound, a quality which is only possible through the human ear and mind. Therefore, where possible, there should be a balance between the two kinds of analysis. Human assessment is important to inform us of the phenomena in singing which are significant to the human ear. At the same time, scientific measurements can provide detailed and reliable data to support and verify human judgement. Thus, each method can help compensate for the inadequacies of the other.
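To illustrate what the computer-based analysis involves, here is a minimal sketch of a fundamental-frequency estimator based on autocorrelation; a real analysis of children's singing would add framing, windowing and voicing detection, and the parameter values below are assumptions for the example, not anything prescribed by the text:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=100.0, fmax=1000.0):
    """Crude fundamental-frequency (F0) estimate via autocorrelation.

    Finds the autocorrelation peak within the lag range corresponding
    to the [fmin, fmax] frequency band and converts it back to Hz.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    # Keep only the non-negative lags of the full autocorrelation.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

# Sanity check on a synthetic 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(0, 0.2, 1 / sr)
tone = np.sin(2 * np.pi * 440 * t)
print(round(estimate_f0(tone, sr), 1))
```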
In addition, it is important to include qualitative analysis of some data on the cultural background of
the children under examination. Such descriptive qualitative data can inform us more about the
possible cultural factors that can affect the findings, and thus, be used to supplement our knowledge of
the issue studied in context. If this aspect of analysis is not considered, the findings will be out of context and thus will lose much of their significance.
The study of cultural differences in music is an exciting area of research. However, it must be conducted with caution, considering the various issues in context. If such research is designed and conducted in an appropriate manner, it can make an important contribution to our knowledge of music and musical behaviour.
References:
Chen-Hafteck, L. (1998) Pitch abilities in music and language of
Cantonese-speaking children, International Journal of Music Education, 31,
14-24.
Chen-Hafteck, L. (1999a) Tonal languages and singing in young children. In S. W.
Yi (Ed) Music, Mind, and Science, pp. 479-494. Seoul, Korea: Seoul National
University Press.
Chen-Hafteck, L. (1999b) Singing Cantonese children's songs: Significance of the
pitch relationship between text and melody, Music Education Research, 1, 1,
93-108.
Chen-Hafteck, L. (1999c) Discussing text-melody relationship in children's
song-learning and singing: a Cantonese-speaking perspective, Psychology of
Music, 27, 1, 55-70.
Rutkowski, J. & Chen-Hafteck, L. (2000) The singing voice within every child: a
cross-cultural comparison of first graders' use of singing voice. Paper to be
presented at the Ninth ISME Early Childhood Seminar, Kingston, Canada, and the
ISME World Conference, Edmonton, Canada.
Proceedings paper
instant need to be taken into account. If the music is performed and thus contains expressive timing, it is necessary to use
adaptive oscillators (Large & Kolen, 1994; Toiviainen, 1998).
The perceptual salience of each pulse sensation is modeled with the resonance value of the respective oscillator. The
contribution of each tone to the resonance of each oscillator depends on the degree of synchrony between the tone onset and
the oscillator's pulse, the inter-onset interval following the onset, and the pitch of the tone. To study the effect of these
different factors, three different models were used.
Model 1. Model 1 relies solely on the temporal structure of the music. The resonance dynamics are modeled with a damped
system driven by an external force. More specifically, the resonance value r_n of oscillator n is determined by
τ² · d²r_n/dt² + b·τ · dr_n/dt + r_n(t) = F_n(t), (1)
where F_n is the driving force, b is the damping constant, and τ is the time constant. The first-order time derivative is
included in order to smooth the resonance function. The parameter τ
models the length of the temporal integration window: in the absence of any external force, the resonance value decays
approximately by a factor of e⁻¹ during an interval of τ.
The driving force has the form
F_n(t) = o_n(t_i) · exp(−(t − t_i)/τ), (2)
where o_n(t_i) is the output of oscillator n at the most recent tone onset t_i. According to Equations 1 and 2, the oscillators that are
at the peak of their output start to increase their resonance up to the next note onset. Due to the exponential decay of the
driving force, the increase of the resonance is proportional to the perceived durational accent of the respective tone (Parncutt,
1994). The resonance value of each oscillator is weighted according to its oscillation period: the closer the period is to the
period of the most salient pulse sensations, the higher the weighting.
At each instant, the oscillator with the highest resonance represents the perceived pulse. This oscillator is referred to as the
winner. To model the stability in maintaining the tapping mode observed in tapping studies, the winner w is changed only when
the highest resonance value exceeds that of the winner by a switching threshold θ_s. In other words, a switch in the tapping
mode occurs when
max_n r_n(t) > (1 + θ_s) · r_w(t). (3)
The model produces a tap whenever the winner oscillator has zero phase and its resonance exceeds the tapping threshold
θ_t.
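The resonance-and-winner scheme just described can be sketched in discrete time. This is a simplified first-order approximation; the function names, the exact update rule, and the parameter values are illustrative rather than the authors' implementation:

```python
def update_resonances(resonances, forces, dt, tau=4.0):
    """One integration step: each resonance decays with time constant tau
    (a first-order simplification of the damped dynamics) and is driven by
    the force derived from the most recent tone onset."""
    return [r + dt * (-r / tau + f) for r, f in zip(resonances, forces)]

def pick_winner(resonances, winner, switch_threshold=0.2):
    """Change the winner only when the maximum resonance exceeds the
    current winner's resonance by the switching threshold (here 20 percent)."""
    best = max(range(len(resonances)), key=resonances.__getitem__)
    if resonances[best] > (1.0 + switch_threshold) * resonances[winner]:
        return best
    return winner
```

A tap would then be produced whenever the winner oscillator reaches zero phase with a resonance above the tapping threshold.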
Model 2. Model 2 is similar to Model 1, with the addition that it takes pitch height into account. It does so by passing the
tone information through a bank of Gaussian filters that are equidistantly spaced on the pitch dimension. This filter bank
divides the input into several pitch channels, to each of which the resonance dynamics scheme is applied separately.
Therefore, the model segregates the input into a set of streams depending on pitch height. For each pulse mode, the
resonance value is then obtained by summing the resonance values across all the channels. Each pitch channel has an
individual weight that depends on the center pitch of the channel according to
w(p) = 2^(−ρ(p − 64)/12), (4)
where p is the center pitch, with 64 corresponding to C4, and ρ is the pitch weighting parameter. When ρ = 0, all channels
receive an equal weighting; when ρ > 0, low pitches receive a higher weighting than high pitches.
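One way to make the pitch-channel scheme concrete is sketched below, pairing a Gaussian filter response with an exponential channel weight; the functional forms, the reference pitch 64, the bandwidth, and the value ρ = 0.5 are illustrative assumptions, not taken from the paper:

```python
import math

def channel_weight(center_pitch, rho=0.5, ref_pitch=64):
    """Weight of a pitch channel: equal weighting when rho == 0, and
    progressively heavier weighting of low channels when rho > 0
    (assumed exponential form)."""
    return 2.0 ** (-rho * (center_pitch - ref_pitch) / 12.0)

def gaussian_response(pitch, center, sigma=2.0):
    """Response of one Gaussian pitch filter to a tone at `pitch`
    (sigma, in semitones, is an assumed bandwidth)."""
    return math.exp(-0.5 * ((pitch - center) / sigma) ** 2)
```

With ρ = 0.5, the weight grows by a factor of 2^0.5 ≈ 1.4 per descending octave, matching the optimal value reported below.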
Model 3. Model 3 is similar to Model 2, with the addition that it weights the notes according to their tonal significance. It
assumes that tonally significant tones increase the salience of the pulses with which they co-occur more than less
significant tones do. The model uses the key-finding algorithm by Krumhansl (1990), with the modification that it uses an
exponential time window for integrating the pitch information. For each tone, the driving force of Equation 2 is weighted by
the value of the respective component of the probe-tone profile (Krumhansl & Kessler, 1982) of the current key.
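The key-finding step can be illustrated as follows: an exponentially decaying pitch-class distribution is correlated with the Krumhansl & Kessler (1982) major-key profile. Only major keys are sketched, and the exact windowing of the original model is an assumption; the correlation-based matching follows Krumhansl's (1990) algorithm in spirit.

```python
import math

# Krumhansl & Kessler (1982) probe-tone ratings for a major key,
# listed from the tonic (C major ordering: C, C#, D, ...).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def accumulate(pc_weights, pitch_class, dt, tau=4.0):
    """Exponential time window: decay the running pitch-class weights by
    the elapsed time dt, then add the new tone."""
    out = [w * math.exp(-dt / tau) for w in pc_weights]
    out[pitch_class] += 1.0
    return out

def key_correlation(pc_weights, tonic):
    """Pearson correlation between the weighted pitch-class distribution
    and the major profile rotated so that `tonic` is the key's first degree."""
    profile = MAJOR_PROFILE[-tonic:] + MAJOR_PROFILE[:-tonic]
    mx = sum(pc_weights) / 12.0
    my = sum(profile) / 12.0
    cov = sum((x - mx) * (y - my) for x, y in zip(pc_weights, profile))
    sx = math.sqrt(sum((x - mx) ** 2 for x in pc_weights))
    sy = math.sqrt(sum((y - my) ** 2 for y in profile))
    return cov / (sx * sy)
```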
Each of the three models had approximately equal optimal parameter values: a temporal integration window of τ = 4 seconds; a
pitch weighting such that the weighting increases by a factor of approximately 1.4 for each descending octave; and a switching
threshold of θ_s = 0.2, accepting switches only when the maximum resonance exceeds that of the winner by at least 20 percent.
The meaning of the optimal tapping threshold value is more difficult to interpret.
Comparison between human and model data
RMS errors. The total RMS error values for the optimized models were 1.78, 1.82, and 1.19 for Models 1, 2, and 3,
respectively. In terms of the total RMS error, Model 3 thus performed best, followed by Model 1 and Model 2, in that order.
Table 1 shows the root-mean-square (RMS) errors for each performance measure and model separately. For the performance
measures BST, down, up, and switches, the lowest RMS error was obtained with Model 3. The lowest RMS errors for the
performance measures neither and aper were obtained with Models 2 and 1, respectively.
TABLE 1. RMS errors between human and model data
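The two comparison statistics used here, RMS error and (below) Pearson correlation between human and model values of a performance measure, can be computed as in this sketch (variable names are illustrative):

```python
import math

def rms_error(human, model):
    """Root-mean-square error between human and model values of one
    performance measure across stimuli."""
    n = len(human)
    return math.sqrt(sum((h - m) ** 2 for h, m in zip(human, model)) / n)

def pearson(human, model):
    """Pearson correlation of the same paired values."""
    n = len(human)
    mh, mm = sum(human) / n, sum(model) / n
    cov = sum((h - mh) * (m - mm) for h, m in zip(human, model))
    sh = math.sqrt(sum((h - mh) ** 2 for h in human))
    sm = math.sqrt(sum((m - mm) ** 2 for m in model))
    return cov / (sh * sm)
```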
Correlations. Table 2 shows the correlations between the human and model data for each performance measure and model.
The average correlations, taken across the six performance measures, are 0.417, 0.463, and 0.553 for Models 1, 2, and 3,
respectively. As can be seen, the highest correlation for all performance measures except switches was obtained with Model
3. For Model 3, all the correlations except that for BST are significant at the p < 0.05 level. Figure 1 shows the performance
measures obtained from the subjects and from Model 3.
Figure 1. Scatter plots of the six performance measures taken from subjects and Model 3. Each point represents one stimulus;
its abscissa and ordinate correspond to human and model data, respectively.
Conclusion
References
Boltz, M., & Jones, M.R. (1986). Does rule recursion make melodies easier to reproduce? If not, what does?
Cognitive Psychology, 18, 389-431.
Brown, J. C. (1993). Determination of meter of musical scores by autocorrelation. Journal of the Acoustical
Society of America, 94(4), 1953-1957.
Clarke E. F. (1999). Rhythm and timing in music. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp.
473-500). New York: Academic Press.
Dannenberg, R. B. & Mont-Reynaud, B. (1987). Following a jazz improvisation in real time. In Proceedings of
the 1987 International Computer Music Conference. San Francisco: International Computer Music Association,
241-248.
Dawe, L. A., Platt, J. R., & Racine, R. J. (1994). Inference of metrical structure from perception of iterative
pulses within time spans defined by chord changes. Music Perception, 12(1), 57-76.
Desain, P. & Honing, H. (1989). The quantization of musical time: a connectionist approach. Computer Music
Journal, 13(3), 56-66.
Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception &
Psychophysics, 28, 381-389.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New
York: Academic Press.
Gasser, M., Eck, D., & Port, R. (1999). Meter as mechanism: a neural network model that learns musical
patterns. Connection Science, 11, 187-215.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Perception & Psychophysics, 32, 211-218.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220,
671-680.
Krumhansl, C. L. & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a
spatial representation of musical keys. Psychological Review, 89, 334-368.
Large, E. W. & Kolen, J. F. (1994). Resonance and the perception of musical meter. Connection Science, 6(2-3),
177-208.
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Longuet-Higgins, H. C. & Lee, C. S. (1982). Perception of musical rhythms. Perception, 11, 115-128.
McAuley, J. D., & Kidd, G.R. (1998). Effect of deviations from temporal expectations on tempo discrimination
of isochronous tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 24,
1786-1800.
Monahan, C. B., Kendall, R. A., & Carterette, E. C. (1987). The effect of melodic and temporal contour on
recognition memory for pitch change. Perception & Psychophysics, 41, 576-600.
Palmer, C. & Krumhansl, C. (1990). Mental representations of musical meter. Journal of Experimental
Psychology: Human Perception and Performance, 16, 728-741.
Proceedings paper
MUSICALITY AND MUTUALITY : THE DEVELOPMENT OF MELODY
Dr Gudrun Aldridge
gudruna@uni-wh.de
Background:
Aims:
Main contributions:
Implications
Music therapy assists an individual to find his own expressive style that
relates to another person. Thus adult human communication can be seen as having
its foundations in an inherent musicality.
Proceedings abstract
Tokyo,
Background.
Singers may control the rate and magnitude of vibrato to express emotions
effectively in the singing voice. The effects of controlling vibrato on the
emotional content of the singing voice have not yet been sufficiently clarified.
Aims.
Method.
Ten professional opera singers sang /a/ with four intended emotions: happiness,
anger, sadness, and fear. The emotional profile of each recorded sample
was evaluated by 25 listening subjects using seven-point rating scales. An
object-oriented acoustic analysis system was used to evaluate vibrato
characteristics, including its rate and magnitude.
Results.
Although the vibrato rate varied only within the narrow range of 5.1 to 5.6 cycles
per second, the rate was found to have a significant effect on the perceived
emotions, with the fastest rate for fear and the slowest for sadness. The vibrato
magnitude, which was smallest for fear and largest for anger, also showed
a significant effect on the perceived emotions. The rise time of vibrato at
the initial part of the sung vowel also varied depending on the emotion, with the
fastest onset for anger and the slowest for sadness.
Conclusions.
These results suggest that singers adjust fine characteristics of vibrato
so as to express emotions, and that these emotions are reliably transmitted to the
audience even in short vowel segments.
Proceedings abstract
Background:
Aims:
Method:
Participants were presented with two pieces. Both are built on two basic
structures (A and B), exposed at the beginning of the piece; the rest of the
piece consists of variations of both structures. After listening to the whole
piece, children performed three tasks. First, they listened to both parent
structures and had to judge which of them was more frequent in the piece. In
the second task, they listened to the derivatives of both structures and had to
categorise them as A-derivatives or B-derivatives. In the third task, they were
asked to rate the degree of similarity between the parent structures and their
derivatives.
Results:
Both musicians and non-musicians should be equally able to perform the first
task. In the second task, although both groups should perform above chance, an
advantage should appear for musicians. Musicians should perform much better
than non-musicians in the third task. Age should interact with musical tuition,
in the sense that these tendencies should be accentuated in older children.
Conclusions:
Proceedings paper
Introduction
Reductional music theories propose a kind of musical listening that involves the mental processing of a series of events belonging to
the musical surface, establishing a hierarchy in which some of them receive more structural importance and, in that way, have their
"existence" prolonged. Harmony defines the tonal space in which prolongation develops, interacting with the melody, which impels
the movement (Salzer & Schachter, 1969). Both the notes that are prolonged as time passes and the relationships of
tension and relaxation between events are important features of this interaction. This musical organisation is substantially governed
by voice-leading principles derived mainly from strict counterpoint. Beyond the musical surface an underlying organisation
takes place, whose perceptual reality this study intends to investigate. The process of abstracting the main events from the complete musical
piece, called reductional representation (McAdams, 1989), favours the attribution of tonal coherence to a musical piece (Salzer,
1962).
This phenomenon has been studied from a psychological perspective (Serafine, 1988; Serafine, Glassman &
Overbeeke, 1989; Bigand, 1990, 1994; Dibben, 1994; Martínez & Shifres, 1999a, 1999b; Shifres & Martínez, 1999) using
different experimental paradigms: goodness of fit between a melody and the rendered reduction of its underlying voice leading
(henceforth UVL), similarity judgement between melodies with the same or different UVL, and family categorisation according to
the resemblance of melodies.
Nevertheless, the study of reductional representation faces methodological difficulties in applying an experimental paradigm:
given that the events belonging to the underlying hierarchic levels also belong to the musical surface, when a structural event is
modified to create appropriate experimental conditions, the surface is simultaneously modified. Thus, it is difficult to specify the level
on which the listener's response is based.
The study of the cognitive reality of reductional representation should begin with the analysis of the reciprocal influences between
the surface and UVL attributes. Our own research (Martínez & Shifres, 1999a, 1999b; Shifres & Martínez, 1999) was devoted to
monitoring this relationship using a similarity-judgement paradigm. Results revealed that neither the contour hypothesis (according to
which two melodies are judged as more similar the higher the association between their contours) nor the structure hypothesis
(which predicts that two melodies will be judged as more similar if they have the same UVL) could by itself explain the perceptual
similarity. Thus, a perceptual rivalry hypothesis was formulated.
This work follows a previous experiment which aimed to test that hypothesis, analysing further melodic components in order to
verify previous interpretations of the results and to differentiate surface components from those of the UVL. It is an epistemological
exercise which asks what kind of melodic knowledge the listener uses while judging the similarity of melodies. It is assumed that
the explanations provided by alternative models, more formalised and accepted than the one previously employed, may be useful for
the analysis of both components. This is expected to lend further support to the exploration of the psychological reality of the
assumptions of the reductional representation of the tonal hierarchic structure.
The Baseline Experiment section summarises relevant aspects of the cited investigation in order to clarify the actions which followed
it. The next section, The Models, synthesises the advantages of four models for analysing melodic attributes. Two of them (the
Combinatorial Model and the Oscillations Model) focus on surface components, studying note-to-note relationships and considering only the
particular melodic information of the musical examples. The other two (Tonal Weights and Melodic Anchoring) emphasise structural
aspects of tonal melodies, because they are based on the study of invariants of the tonal system. The Method section describes the
analysis of the melodies used in the former experiment in terms of the four models mentioned. In the Results section, empirical data from
the experiment are interpreted from the point of view of these new analyses. Finally, in the last section, the contributions of these models
and the different paradigms are discussed.
Figure 1. Procedure followed in the composition of the stimuli. Example No. 8: Chopin, Study Op. 25 No. 5. a) selection
of the fragment (Melody A); b) analysis of the underlying voice leading; c) reduction of the underlying voice leading
(R1); d) transformation of R1 into R2; e) reconstruction of a melody (Melody C) from R2; f) reconstruction of a melody
(Melody B) from R1, homologating the changes between B and A to the changes between C and A.
146 adults with different levels of musical experience took part in the experiment. The experimental task consisted of listening to the
sequence AB - AC (or AC - AB) and judging which of the two comparison melodies (B or C) was the most similar to A. In addition,
subjects had to estimate their level of certainty in the answer using a three-point scale (very sure, not so sure, not sure). It was
hypothesised that listeners would judge melody B as the most similar, given that it has the same UVL, although different degrees of
association between surfaces would cause confusions in the responses. Thus, responses for melodies belonging to the AC Group would
be less certain than responses for BC Group melodies. Data for B/C responses and certainty ratings were translated into a single score
ranging from 1 (very certain C) to 6 (very certain B), where 3 and below represent "C" and 4 and above represent "B". Thus, the test
value was 3.5.
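The translation of a B/C choice plus a certainty rating into the single 1-6 score can be sketched as follows (the function and label names are illustrative):

```python
def response_score(choice, certainty):
    """Map a B/C choice and a three-level certainty rating onto the 1..6
    scale: 1 = very certain C, 6 = very certain B; 3.5 is the test value."""
    level = {"not sure": 1, "not so sure": 2, "very sure": 3}[certainty]
    return 3 + level if choice == "B" else 4 - level
```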
Results confirmed the prediction: (a) subjects always tended to judge melody B (same UVL) as the most similar; and (b) different levels
of scores showed that structural and superficial attributes compete, causing different levels of perceptual rivalry.
When the UVL is modified, the surface level is inevitably modified as well. For that reason, the experimental control of the
theoretical rivalry was based on the identification and exhaustive control of the surface attributes, which, in spite of all the precautions
taken, change anyway.
The purpose of the present work is then to test the pertinence of alternative theoretical models for describing those melodic attributes
which were modified while modifying the UVL. It is expected that, if the models are useful in accounting for different melodic
information, they may help in finding a more precise estimation of the real incidence of the UVL in the similarity judgements. The
stimuli used in the previous experiments were therefore analysed according to the following models: the Oscillations Model, the Combinatorial Model,
Tonal Weights, and Melodic Anchoring.
The Models
Oscillations Model (Schmuckler, 1999)
Proposed by Schmuckler (1999), this is probably the simplest idea for describing the melodic contour, considering the most superficial
level of note-to-note relationships. In order to differentiate this measure from others used in former studies, and at the same time to
capture the melodic information while avoiding any type of structural component, the simplest version of the model was used, which
consists of counting the number of ascents and descents of the melodic contour.
Thus, both models (a) capture the information of ascents and descents without considering the interval dimension, and (b) represent
two edges in relation to the temporal focus required: from the note-to-note level (Oscillations Model) to the level which considers
the long-term linear connections that may be present in this type of short melodies (Combinatorial Model).
Method
Oscillations Analysis
The number of reversals in the direction of the melody was counted. This provided a measure of the tendency of movement. Although
the original version of the model does not include repetitions, since in this test three melodies with the same rhythm were heard, it was
assumed that a repetition would be clearly noticed as a change of direction (reversal is thus understood as a change of direction).
Two measurements were obtained:
1. Differences of oscillations (*) of melodies B and C compared to melody A:
RSIM = rsimAB - rsimAC
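Counting reversals as described, with a repeated pitch treated as a change of direction, can be sketched as:

```python
def oscillations(pitches):
    """Count changes of direction in a melodic contour. Each successive
    interval is classified as up (+1), down (-1) or repetition (0); any
    change of class counts as a reversal, so repetitions register as
    direction changes, as assumed in the text."""
    steps = [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]
    return sum(1 for s1, s2 in zip(steps, steps[1:]) if s1 != s2)
```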
Figure 2. Procedure followed to compare the theoretical similarity of the melodies according to the four models analysed.
Combinatorial Analysis
Each melody was represented with a matrix. The matrices were compared in pairs, counting the number of identical entries and
dividing it by the total number of entries of the matrix (excluding the diagonal; see Figure 2). These proportions (csim) generated two
measures:
1. Difference of proportions (*):
CSIM = csimAB - csimAC
This value ranges from 1 to -1. If positive, A and B have the highest theoretical similarity; if negative, this relationship holds
between A and C.
2. Classification of trios (**) according to whether the highest theoretical similarity corresponds to AB (AB Group) or to AC (AC Group)
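The matrix comparison can be sketched as below, using a contour matrix in which entry (i, j) records whether note j is higher than, equal to, or lower than note i; the representation is assumed from contour theory, and csim is the proportion of matching off-diagonal entries:

```python
def contour_matrix(pitches):
    """Entry (i, j) is +1, 0 or -1 according to whether note j lies
    above, at, or below note i."""
    return [[(q > p) - (q < p) for q in pitches] for p in pitches]

def csim(m1, m2):
    """Proportion of identical entries between two equally sized contour
    matrices, excluding the main diagonal."""
    n = len(m1)
    same = sum(m1[i][j] == m2[i][j]
               for i in range(n) for j in range(n) if i != j)
    return same / (n * (n - 1))
```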
∆TW = |∑TWP[A] − ∑TWP[B]|
If the tonal weights are similar, ∆TW will be close to 0, and the perceptual similarity between both melodies will be higher.
5. The same procedure was followed with AC. The following measures were then obtained:
1. Proportion of differences between tonal weights (*). Given that subjects listened to the sequence AB - AC, it is possible
that the relative value of ∆TW in a pair within the total ∆TWs of the whole sequence may influence the
answer. Thus,
∆TW% = ∆TWAB / (∆TWAB + ∆TWAC)
The more similar the amounts of tonal weights between B and C, the closer the proportion tends to 0.5; in this case tonal
weights will not influence the subject's decision for either of the two melodies. The closer this proportion is to 1,
indicating that the difference in tonal weight of AB is higher than that of AC, the more the similarity judgements will
tend toward C. Thus, it is possible to estimate a correlation between ∆TW% and the means of perceptual similarity
observed in the experiment.
2. Classification of trios (**) according to their ∆TW. Trios were classified according to the lowest ∆TW found
between the pairs AB and AC:
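The proportion measure can be sketched directly (names are illustrative):

```python
def delta_tw_proportion(dtw_ab, dtw_ac):
    """Relative tonal-weight difference within an AB - AC sequence:
    0.5 means the tonal weights favour neither comparison melody, while
    values approaching 1 indicate a larger AB difference, predicted to
    push similarity judgements toward C."""
    return dtw_ab / (dtw_ab + dtw_ac)
```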
Results
The continuous measurements obtained were used to run a linear regression analysis. It showed that none of the models could predict
the differences in the similarity judgements.
Predictions emerging from the classification of the models taken in pairs were compared: Anchoring vs. Oscillations; Anchoring vs.
Combinatorial Model; Anchoring vs. Tonal Weights; Tonal Weights vs. Oscillations; Tonal Weights vs. Combinatorial Model; and
Oscillations vs. Combinatorial Model. According to this comparison, the melodies could represent different possibilities of
agreement or disagreement between models: Agreement B (when the trio belongs to the AB Group in both models and they agree in
predicting the highest similarity of B); Agreement C (when the trio belongs to the AC Group in both models and they agree in predicting
the highest similarity of C); and Rivalry (when one model predicts the highest similarity of B and the other predicts the highest
similarity of C).
It was predicted that the ratings of perceptual similarity for the melodies classified as Rivalry in the comparative analysis of the
models would be intermediate between examples in Agreement B and examples in Agreement C (Figure 3).
Figure 3. Values of predicted perceptual similarity for the cases of agreement and disagreement about the theoretical
similarity predicted by a pair of models.
A 3 (Agreement/Rivalry) × 5 (Comparisons) repeated-measures ANOVA showed significant results for the factor Agreement/Rivalry
(F[2, 143] = 91.224; p < .001). Thus, Agreement B represented higher perceptual similarity of B, Agreement C represented a higher
relative tendency toward C, and an intermediate value represented those cases in which the comparison of the two models resulted in
alternative predictions (Rivalry) (Figure 4).
However, none of these means is lower than 3.5, showing that subjects always judged B as the most similar. This implies that the
models do not capture all the information being used in the similarity judgements.
The factor Comparisons was also significant (F[4, 141] = 24.616; p < .001), indicating that the combined effect of the two models within
a pair differed. This means that, as observed in the figure, each pair of compared models shows different values of
perceptual similarity in the agreement for B or C. For example, the perceptual similarity of B is better predicted by the agreement
between the Combinatorial Model and Melodic Anchoring or Tonal Weights than by the agreement between the Oscillations
Model and any of the other models.
The most curious result is that the interaction between both factors was also significant (F[12, 134] = 11.906; p < .001), indicating that if
a pair of models represents the best combination for predicting the perceptual similarity of B, this is not the case for the predicted
similarity of C, and vice versa. Moreover, it can be observed in the figure that although the tendency of highest similarity can be
predicted by different combinations of two models, the rivalry between them is only well defined by the combination of the
Oscillations Model and the Anchoring Model.
Figure 4. Values of observed perceptual similarity for the cases of agreement and disagreement in the theoretical similarity predicted by
5 different pairs of models.
Discussion
This study tested different models of melodic analysis in order to investigate whether listeners capture the attributes they
describe while judging similarities. The aim was to contrast the conclusions of a previous study, according to which, under certain
constraints, the Underlying Voice Leading was the attribute listeners took into account when making similarity judgements,
even though a rivalry between the melodic contour and the UVL could be inferred. The results of the contrast with the alternative
models show that:
a. A rivalry also exists between the attributes captured by the models and the UVL;
b. In spite of that rivalry, choices always favoured the melody with the same UVL, the mean always being higher than 3.5;
c. The UVL is an attribute which listeners seem to capture when they judge similarities, and the different models do not explain these
responses;
d. Therefore, according to the tendencies observed in these studies, the UVL is considered to have been isolated as an
experimental variable.
One of the main methodological difficulties that the experimental study of the UVL presents, as a plausible explanation of the cognition
of musical structure, is that the principles governing its relationships are described neither in parametric terms nor according to
categories defined in absolute terms. In psychology, the analysis of similarities and differences among data is considered a valid
methodological tool for capturing the underlying (unmeasured) structure of the objects under observation.
Thus, the application of the similarity-judgement paradigm in the Baseline Experiment was pertinent to the purpose of the former
study.
Another methodological challenge has to do with the fact that, although the theory is expressed in prescriptive terms, the derivation of these
principles is an interpretative matter. The current study appraised models which analyse the processing of melodic information in
parametric ways. Thus, contrasting the explanations provided by such approaches with the "heuristics" of the UVL was intended to
isolate the latter from the rest of the components. The measurements accounted for an important range of features characterised by
the models, from the note-to-note level of the musical surface to the invariants of the musical structure.
Although the Melodic Anchoring principle characterises some aspects of the melody by invoking voice-leading principles, there are
differences between this theoretical framework and the one provided by Schenkerian theory. Thus:
a. Melodic anchoring is prospective, while melodic diminutions may be either prospective or retrospective, i.e. prefix or suffix
(Forte & Gilbert, 1982).
References
Bharucha, J. J. (1984a). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16, 485-518.
Bharucha, J. J. (1996). Melodic anchoring. Music Perception, 13(3), 383-400.
Bigand, E. (1990). Abstraction of two forms of underlying structure in a tonal melody. Psychology of Music, 18, 45-60.
Bigand, E. (1994). Contributions de la musique aux recherches sur la cognition auditive humaine. In S. McAdams & E.
Bigand (Eds.), Penser les sons. Psychologie cognitive de l'audition (pp. 249-298). Paris: Presses Universitaires de France.
Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music Perception, 12(1), 1-25.
Forte, A. & Gilbert, S. ([1982] 1992). Introducción al Análisis Schenkeriano [trans. of Introduction to Schenkerian
Analysis by Pedro Purroy Chicot]. Barcelona: Labor.
Krumhansl, C. (1995). Music psychology and music theory: Problems and prospects. Music Theory Spectrum, 17(1), 53-80.
Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. New York: Oxford University Press.
Martínez, I. C. & Shifres, F. (1999a). Music education and the development of structure hearing: A study with
children. In M. Barrett, G. McPherson & R. Smith (Eds.), Children and Music: Developmental Perspectives. Launceston:
University of Tasmania.
Martínez, I. C. & Shifres, F. (1999b). The rivalry between structure and surface while judging the similarity of melodies.
Paper presented at SMPC99, Evanston, Illinois, USA.
Marvin, E. W. & Laprade, P. (1987). Relating musical contours: Extensions of a theory for contour. Journal of Music
Theory, 31, 225-267.
McAdams, S. (1989). Psychological constraints on form-bearing dimensions in music. Contemporary Music Review, 4, 1-7.
Quinn, I. (1999). The Combinatorial Model of pitch contour. Music Perception, 16(4), 439-456.
Salzer, F. & Schachter, C. (1969). Counterpoint in Composition. New York: Columbia University Press.
Salzer, F. ([1962] 1990). Audición estructural. Coherencia tonal en la música [trans. of Structural Hearing: Tonal
Coherence in Music by Pedro Purroy Chicot]. Barcelona: Labor.
Proceedings paper
Figure 1 presents the tonogramme of the text of the song "Come All You Fair and Tender Ladies", compared with
its melody line. A tonogramme is a grid in which dashes and dots represent stressed and unstressed syllables.
Falling and rising speech intonation at the end of a phrase is shown by a downward or upward curve, or by a
dash with a dot in case the last syllable in the phrase is unstressed.
Of course, compared with speech intonation, music intonation has a wider range of tones (several octaves),
with deeper differentiation and even redundancy. (While speaking, we usually use a range of four tones of the
octave to convey the content of an utterance; emotionally coloured speech can cover six tones.) However, there seem to be
many concurrences in the general motion of music and speech intonation within one and the same phrase; that is, the
rising and falling tendencies of speech intonation reveal themselves in the modulations of the melody.
A similar analysis of modern popular songs, which are often used in English language courses, showed that
this concurrence occurs far from always. See, for example, "Tom's Diner" (Figure 2). The melody phrase ending
with the word "actor" has a falling intonation, whereas the speech intonation in this phrase should rise before the
subordinate clause "who had died while he was drinking". The subordinate clause concludes the affirmative
utterance, so its speech intonation falls, although the melody line in this place rises.
So the question arises: what influence does the use of songs with concurrent or non-concurrent melody and
speech intonation have on the formation of correct speech intonation when learning a foreign
language?
To test our assumption that the melody of a song influences correct or incorrect memorization of speech
intonation, we conducted an experiment. It was held at Kharkiv National University, Ukraine, and lasted four
months. Twenty-four first-year students of the Foreign Languages Department were given nine songs with
specially designed activities aimed at practicing particular grammar structures and vocabulary. During the
lessons the students listened to each song three to five times.
In six of the selected songs the melody line did not deviate from the speech intonation of the lyrics, but in three
songs it did, in particular in "Tom's Diner" (fig. 2). At the final stage of the experiment the participants were
asked to reproduce some phrases from the lyrics in another context. The students' performance showed that
where melody and speech intonation deviated, the phrases stuck in the students' memory with the wrong
intonation imposed by the melody.
The students themselves noted that the song melody they had memorized "wronged" their speech intonation.
A year later a similar experiment was conducted at Kharkiv State Pedagogical University with another 22
first-year students of the Department of Foreign Philology. The results were the same. They allow us to conclude
that there really is a certain influence of song melody on the memorization of speech intonation during foreign
language learning: quick memorization of the melody imposes its intonation patterns, thus hindering the use of
the correct ones.
We believe that this question requires further investigation both by music psychologists and psycholinguists.
However, even now we can suggest that song material for teaching and developing different language skills
should be carefully selected, so that the melody fully coincides with the speech intonation patterns of the lyrics.
We also believe that a similar process may be observed in other languages, so further development of the
problem under consideration could help not only teachers of English but also teachers of other languages.
I would like to thank Dr. T. Merkulova for fruitful discussion of the topic.
References
Asafiev, B. V. (1965). Speech Intonation. Moscow-Leningrad. (in Russian)
Ginsborg, J. (1999). Unpublished PhD thesis, University of Keele, UK.
Grenough, M. (1995). Sing it! Learn English through Songs. McGraw-Hill.
Griffee, D. T. (1992). Songs in Action. Prentice Hall.
Krashen, S. D. (1985). The Input Hypothesis. London: Longman.
Murphey, T. (1992). Music and Songs. Oxford University Press.
Ruchyevskaya, E. (1973). On methods of speech intonation realization and expressiveness of its meaning. In
Poetry and Music. Moscow: Music. (in Russian)
Silver, J. (1996). English through Songs. London: Silver Songs.
Back to index
Proceedings abstract
A Sensory motor theory III: Human vs Machine performance in a beat
induction task.
N. P. McAngus Todd and S. Kohon.
Department of Psychology, University of Manchester, Manchester, M13
9PL.
1. Background
Over the last few years we have been developing a general theory of
rhythm and timing which is consistent with the idea that the
perception and production of rhythmic sequences in music and speech
is an elaboration of a more general sensory-motor sequencing faculty
of the left hemisphere. The theory is implemented as a computational
model in which an attempt is made to simulate the principal brain
structures involved in sequencing (Todd et al 1998, 1999). The model
takes sound samples as input and synchronises a simple dynamic system
to simulate beat induction.
2. Aims
The aim is to evaluate the model by comparison with the performance
of human subjects in a beat tracking task.
3. Method
In the first experiment, two human subjects and the model were compared
in a beat-tapping task using 160 samples of music (Todd and Kohon,
submitted) with a variety of rhythmic structures and tempi. The
second experiment focused in more detail, with a larger number of
subjects, on a subset of the samples: a performance of the 48 fugue
subjects from the Well-Tempered Clavier by J. S. Bach. In all cases
subjects were instructed to tap the beat in synchrony with the
music. The tapping responses were evaluated by a number of
parameters, including the main tapping period, the time to reach a
steady rate, and the variability of the tapping compared to the ideal
beat.
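The three evaluation parameters can be sketched as follows (a minimal re-implementation under our own assumptions about the criteria, e.g. the 5% steadiness tolerance; it is not the authors' code):

```python
import numpy as np

# Illustrative sketch: given tap onset times (seconds) and the ideal beat
# period, estimate (1) the main tapping period, (2) the time taken to reach
# a steady rate, and (3) the variability of tapping relative to the ideal
# beat. The steadiness criterion (`tol`) is an assumption, not the paper's.

def evaluate_tapping(tap_times, ideal_period, tol=0.05):
    taps = np.asarray(tap_times, dtype=float)
    itis = np.diff(taps)                     # inter-tap intervals
    main_period = float(np.median(itis))     # robust estimate of tapping rate
    # Time to steady rate: first tap after which every remaining ITI stays
    # within `tol` (as a proportion) of the main period.
    rel_err = np.abs(itis - main_period) / main_period
    steady_idx = len(itis)                   # default: never fully steady
    for i in range(len(itis)):
        if np.all(rel_err[i:] <= tol):
            steady_idx = i
            break
    time_to_steady = float(taps[steady_idx] - taps[0])
    # Variability: spread of ITIs as a proportion of the ideal beat period.
    variability = float(np.std(itis) / ideal_period)
    return main_period, time_to_steady, variability
```

For a perfectly regular tapper the variability is zero and the steady rate is reached at the first tap.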
4. Results
In the first experiment the model produced viable responses
(responses with a clear beat rate) in 70% of cases. The human
subjects produced viable responses in between 90 and 95% of cases.
Of the viable responses, the beat rates of the model corresponded
significantly to the beat rates of the human subjects. In the second
experiment the model was indistinguishable from the humans in its beat
rates. The model also compared well on other measures, including the
time to reach a stable beat and the position of the downbeat.
5. Conclusions
The results provide support for the model but also raise several
issues for model improvement, not least that of scene analysis and
the role of top-down processing in the extraction of a beat.
References
Todd, N. P. McAngus, & Kohon, S. (submitted). Testing a sensory-motor
theory of rhythm perception: Human vs machine performance in a beat
induction task.
Todd, N. P. McAngus, Lee, C. S., & O'Boyle, D. J. (1998). A
sensory-motor theory of rhythm and timing in music and speech.
Proceedings of the International Conference on Neural Information
Processing (ICONIP'98), Japan, October 1998.
Todd, N. P. McAngus, Lee, C. S., & O'Boyle, D. J. (1999). A
sensory-motor theory of rhythm, time perception and beat induction.
Journal of New Music Research, 23(1), 25-70.
Back to index
Proceedings abstract
c.trevarthen@ed.ac.uk
Background:
Aims:
This presentation will seek a unified theory of the Intrinsic Motive Pulse by
which, it is proposed, human movements are generated and integrated, and relate
this to movements of the body in bipedal locomotion, communicative gesture and
voluntary acts of all kinds.
Main contributions:
Implications:
Back to index
Proceedings paper
INTRODUCTION
Vibrato is the periodic fluctuation in pitch, amplitude and/or timbre of a musical tone. It is used by singers, string players, and, in some cases, by wind instrumentalists to ornament or
color a tone. Vibrato research has focused on the general characteristics of vibrato, such as its form (generally found to be sinusoidal, but mostly trapezoidal according to Horii,
1989b), its perceived central pitch (mean or median of pitch fluctuation, see, e.g., Shonle & Horan, 1980, Sundberg, 1978), and its mean rate and extent in musical performances (rate
between 5.5-8 Hz, extent between 0.6-2 semitones for singers and between 0.2-0.35 semitones for string players, see Seashore, 1938; Sundberg, 1987; Meyer, 1992). The modeling of
vibrato characteristics has suggested that pitch vibrato is the primary acoustic characteristic of vocal and string vibrato from which amplitude and timbre vibrato result (Horii, 1989a;
Meyer, 1992). The (un)conscious control of vibrato characteristics by professional musicians has been a point of debate. While string players are generally found to be able to
control vibrato rate and extent, singers are said to have very limited or no control, or to have only some control over vibrato extent (Seashore, 1932; Sundberg, 1987; King & Horii,
1993). Dependencies of vibrato on other performance aspects that have been analyzed include (among others): short notes only contain an upper arch (Castellengo, Richard & d'Alessandro, 1989);
notes generally start with a rising pitch (Horii, 1989b) and end in the direction of the transition (Sundberg, 1979). A similar debate concerns the dependency of vocal vibrato on
pitch height, which is confirmed by Horii (1989b) and rejected by Shipp, Doherty & Haglund (1990).
In this paper, we will focus on vibrato as an expressive means within musical performances. In this respect, we assume that vibrato may be used by musicians to stress notes or to
convey a certain musical interpretation. It is an area of research that has recently gained interest and is still at an explorative stage (see the contribution of Gleiser & Friberg to these
proceedings). We turn to musicians' hypotheses concerning the expressive function of vibrato and compare these to observations on the relation between music structural
characteristics and vibrato rate and extent in actual performances. The analyses of the performance data are based on the predictions of expressive vibrato behavior (Sundberg,
Friberg & Frydén, 1991) and on predictions stemming from piano performance research that attributes expressive behavior to the pianist's interpretation of musical structure (e.g.,
Clarke, 1988). The comparison aims to show that scientific inquiry can be inspired by hypotheses from musicians and experts who devote their lives to refining their
control of musical parameters for expressive means, and to teaching this to students. Vice versa, the scientific results can gain musical meaningfulness and value, also for
musicians and teachers.
METHOD
Subjects
Five professional musicians participated in the study: a cellist, an oboist, a tenor, a thereminist, and a violinist. All are known for their performances in
orchestras, chamber ensembles and/or as soloists. Each participant was paid for participation.
Material
The study used the notation of the first phrase of 'Le Cygne' by Saint-Saëns (1835-1921) for the musicians to play from. 'Le Cygne' ('the swan') was originally written for solo cello with
orchestral accompaniment. A piano reduction of the orchestral accompaniment is, however, very common, as is performance of the solo part by melodic instruments other than the cello.
'Le Cygne' is in G major and in 6/4 meter. The first phrase is the theme of the piece; it is four measures long and consists of two sub-phrases of two bars. It is preceded by an
introduction from the accompaniment of one bar (see figure 1). The melody of the first sub-phrase starts with a descending movement in quarter notes (measure 2) and ends with a
counter-movement in longer notes. The melody of the second sub-phrase consists of one long ascending movement in eighth notes that starts and ends on a dotted half note (m3,
figure 1). The accompaniment consists of broken chords in sixteenth notes. The harmony of the broken chords is: the tonic chord in root position (m1-2), the ii sub-dominant chord
with a pedal on G in the bass (m3), a progression from ii to dominant-7 chord with a pedal on G in the bass (m4) and a return to the tonic in root position (m5).
The questions of the interview concerned the production of vibrato on the musician's instrument, the general use and function of vibrato, the specific expressive treatment of the first
phrase of 'Le Cygne', and the differences in this treatment between repetitions. Expressive treatment includes variations in amplitude, vibrato (general), and vibrato rate & extent.
Figure 1 Score of the first phrase of 'Le Cygne' with annotation of metrical structure (dots) and sub-phrase structure (bold lines).
Procedure
The musicians performed the first phrase of 'Le Cygne' by Saint-Saëns along with a metronomic accompaniment, which they heard over headphones. They performed the phrase six times.
Data processing
A spectral analysis was run on each file, the fundamental frequency was extracted, and half-cycles were detected between subsequent local maxima and minima. The half-cycles were
interpreted as vibrato when their rate was within the range of 2-10 Hz and their extent was larger than 0.1 semitone. Note on- and offsets were detected on the basis of a dynamic
amplitude threshold (less than -40 dB relative to the maximum amplitude) in combination with a dynamic pitch threshold (less than 0.3 semitone deviation from the mean pitch).
The resulting data to be analyzed consisted of collected note features: mean amplitude, mean vibrato rate of pitch vibrato, and mean vibrato extent of pitch vibrato per note.
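The half-cycle criterion described above can be sketched roughly as follows, assuming the fundamental-frequency contour has already been extracted and converted to semitones (the function name and sampling details are our own assumptions, not the authors' implementation):

```python
import numpy as np

# Sketch of the half-cycle vibrato criterion described above (an assumed
# re-implementation, not the authors' code). We take an f0 contour already
# converted to semitones, locate successive local extrema, and keep
# half-cycles whose rate falls within 2-10 Hz and whose extent exceeds
# 0.1 semitone.

def vibrato_half_cycles(f0_semitones, sample_rate):
    x = np.asarray(f0_semitones, dtype=float)
    d = np.diff(x)
    # indices where the slope changes sign -> local maxima/minima
    extrema = np.where(np.sign(d[1:]) != np.sign(d[:-1]))[0] + 1
    half_cycles = []
    for a, b in zip(extrema[:-1], extrema[1:]):
        duration = (b - a) / sample_rate      # seconds per half-cycle
        rate = 1.0 / (2.0 * duration)         # full-cycle rate in Hz
        extent = abs(x[b] - x[a])             # semitones, peak to trough
        if 2.0 <= rate <= 10.0 and extent > 0.1:
            half_cycles.append((a, b, rate, extent))
    return half_cycles
```

A 6 Hz sinusoidal contour with an amplitude of half a semitone, for example, yields half-cycles with rates near 6 Hz and extents near one semitone peak-to-trough.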
RESULTS INTERVIEW
Cello
According to JI, vibrato is made on a cello by moving the left hand in a periodic and symmetric way up and down the neck of the instrument around a central pitch. The impulse is
rather large and comes from the arm. According to JI, vibrato is quite natural and easily learned. He saw the function of vibrato in 'Le Cygne' as an aid in the production of a legato
performance and of a warm and lyrical sound. Vibrato was used as part of the phrasing of the music. He used a kind of vibrato that is not too fast and not too exuberant. Some notes
of the phrase were stressed by giving them a fuller sound, which means that he performed those notes with a more expressive and faster vibrato, and with more "meat" of the fingers.
The ends of phrases "died away", which was accompanied by a smaller vibrato. In general, no note was performed the same or with equal vibrato.
Oboe
According to HR, vibrato on an oboe can be made in several ways: by using the throat, the diaphragm, or even the lips or jaw. HR used throat vibrato in 'Le Cygne', because it is quite
fast and expressive. Throat vibrato is produced by a rapid repetition of a short 'a' sound on the oboe. She teaches her pupils to perform a rhythmical and fast vibrato through
synchronization with a metronome. The result is a periodic fluctuation in pitch around a stable pitch center.
HR used small vibrato in her performance of 'Le Cygne' because of its soft and subtle character. She gave the first and fourth notes more vibrato than the other notes of the
first bar. In the second bar, she played the a' intensely and relaxed towards the end of the sub-phrase. The next e' got considerable vibrato; the following eighth notes did not get vibrato,
but "bellies", and the last note got extra vibrato. Dependencies between vibrato and other performance aspects are, according to HR: vibrato rate and extent increase with the
resistance of a tone, and vibrato rate increases with the loudness of a tone and is influenced by the rhythm of the accompaniment.
Tenor
According to AO, a singer will naturally sing with vibrato if he or she breathes correctly and the air flows fluently. So AO does not produce vibrato; instead he lets it come naturally as
a result of natural singing. In the recording session, he had to sing 'Le Cygne' on a single vowel, without text, which he found a bit unnatural. AO sang the entire piece with the same
vibrato; the only differentiation he made was to stop or start the vibrato. The first measure he performed legato, which naturally included vibrato. The long a' of the
Theremin
A theremin is an electronic instrument controlled by moving both hands towards two antennas. The left hand determines the loudness and the right hand controls the pitch of the
electronically generated tone. According to LK, finger positions include all positions between a closed hand (finger position 0, relative low tone) and an open hand (relative high
tone). She makes vibrato by moving her hand to the left and to the right, which constitutes one vibrato cycle. The start is at the minimum pitch, which equals, according to LK, the
perceived pitch of the note. The general vibrato principle is to let the vibrato and volume change together. This means that a note starts soft and without vibrato and then builds up in
volume and vibrato. In 'Le Cygne', LK used lyrical vibrato, which is fast and wide and differs from melancholic, expressive, or nervous vibrato. The shorter notes in the piece did not
need much vibrato; longer notes did. The function of vibrato was expression. Special treatment of notes was, however, achieved by playing without vibrato. For example, LK gave the
long a' a long start without vibrato.
Violin
As RK told us, the vibrato on a violin is made by rotating the fingers of the left hand up and down the neck of the violin. This movement is a regular movement around a central pitch
and is controlled by the fingers, the hand or the arm, or a combination of the three. RK himself uses arm vibrato. RK performed 'Le Cygne' with relatively large vibrato, more like a cello than
like a violin. The function of vibrato was to color the tone. The first two measures were in his opinion quiet, while the second two measures were more intense. He wanted
to reflect this in his performance: first measure, a calm and fluent movement; second measure, leaning on the long a' and relaxation towards the end; third and fourth measures,
generally faster vibrato, leaning on the e', and the scale with equal intensity.
Table 1
Coding of the notes within the analysis along three structural descriptions (metrical stress, phrase position and melodic charge).

Note   Metrical stress   Phrase position   Melodic charge
g''           0                 0                 0
f#''          2                 1                 5
b'            2                 1                 4
e''           1                 1                 3
d''           2                 1                 1
g'            2                 1                 0
a'            0                 1                 2
b'            3                 1                 4
f#'           2                 1                 5
g'            3                 1                 0
a'            1                 1                 2
b'            3                 1                 4
d''           3                 1                 1
e''           2                 1                 3
f#''          3                 1                 5
b''           0                 2                 4
The metrical stress is related to the metrical hierarchy of 'Le Cygne' which is indicated in figure 1 and follows the metrical indication at the start of the piece and a hierarchical model,
such as described by Lerdahl & Jackendoff (1983). Metrical stress increases with metrical hierarchy (for the coding of individual notes see table 1) and the prediction would be that
vibrato rate and extent increase with metrical stress. For the phrase positions, we separated the start (first note), middle and end (last note) of each sub-phrase (see table 1). This is in
line with descriptions of rhythmic structure as groupings starting and ending with a structural downbeat (e.g., Cone, 1968) and with a common finding in performance literature that
performers tend to mark phrase boundaries (Palmer, 1989). The melodic charge of notes is coded according to Sundberg et al. (1991). Melodic charge increases with increasing
distance between the melody note and the tonic note of the prevalent key (G major). The prediction is that the vibrato rate and extent, as well as the loudness of notes increase with
increasing melodic charge. Each note is given a relative level of melodic charge (see table 1).
Below, we report the results of three different ANCOVAs. In each analysis, the independent variables are metrical stress (nominal), phrase position (nominal) and melodic charge
(continuous). The dependent variable is mean vibrato rate per note in the first ANCOVA, mean vibrato extent per note in the second, and mean amplitude per note in the third.
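The design of these analyses can be illustrated with a small sketch on synthetic data (not the authors' analysis): the two nominal factors are dummy-coded and combined with the continuous covariate in an ordinary least-squares fit, which is the linear-model core of an ANCOVA.

```python
import numpy as np

# Illustrative sketch on synthetic data (not the authors' analysis):
# an ANCOVA-style linear model with two dummy-coded nominal factors
# (metrical stress, phrase position) and one continuous covariate
# (melodic charge) predicting a per-note measure such as mean vibrato rate.

rng = np.random.default_rng(0)
n = 96  # e.g. 16 notes x 6 repetitions

stress = rng.integers(0, 3, n)   # nominal: 3 levels of metrical stress
phrase = rng.integers(0, 3, n)   # nominal: start / middle / end
charge = rng.uniform(0, 5, n)    # continuous: melodic charge

# Dummy-code a nominal factor, taking level 0 as the reference category.
def dummies(codes, levels):
    return np.column_stack([(codes == k).astype(float) for k in range(1, levels)])

# Design matrix: intercept, factor dummies, and the covariate.
X = np.column_stack([np.ones(n), dummies(stress, 3), dummies(phrase, 3), charge])

# Simulate a dependent variable with a known effect of melodic charge.
y = 6.0 + 0.2 * charge + rng.normal(0, 0.05, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[-1])   # estimated melodic-charge slope, near the true 0.2
```

In a full ANCOVA one would additionally compute an F statistic for each factor by comparing this model with the nested model that omits the factor.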
The individual effect of phrase is significant for all instruments: cello (F(2) = 3.12, p < 0.05), oboe (F(2) = 6.6, p < 0.002), tenor (F(2) = 8.3, p < 0.001), theremin (F(2) = 26.0, p <
0.0001), and violin (F(2) = 3.6, p < 0.05). For the cello, the notes at the start of phrases are on average loudest, those in the middle of a phrase are on average intermediately loud, and
endnotes are generally softest. For the oboe and the theremin, notes at the end of a phrase are generally softer than notes at other positions. For the tenor and the violin, the opposite is the
case: endnotes are on average louder than other notes.
The individual effect of melodic charge is significant only for the oboe (F (1) = 39.8, p < 0.0001). For the oboe, the amplitude of notes rises with the melodic charge of notes.
DISCUSSION
The most articulate answers of the musicians in the interviews concerned the production of vibrato on the instrument, the general characteristics of vibrato, such as its form and pitch,
and the general function of vibrato, such as production of a warm sound, expression or legato performance. When asked, the musicians also indicated a way to use special vibrato to
accentuate certain notes. Surprisingly, this special treatment often consisted of starting notes without vibrato. The musicians were explicit about their expressive intentions, such as
phrasing, contrasting first and second half, and tension and relaxation of the music. They suggested related variations in vibrato.
The strongest results from the analysis of the vibrato data concerned the general considerably strong effect of musical structure on amplitude, vibrato rate and extent, the general
consistency of vibrato characteristics over repetitions that is implied by this strong effect, and the limited relatedness between amplitude, vibrato rate and extent. Interestingly, all
instruments had a significant relation between metrical stress and vibrato rate, while phrase position was for all instruments significantly related to amplitude. The specific direction
of the effects differed between expressive aspects (e.g., amplitude, vibrato rate, and vibrato extent) and between instruments. The suggestion is that different expressive means were
used for different purposes.
In general, it is clear that only a few aspects mentioned by the musicians return in the analysis and, vice versa, only a few clear results from the analysis are mentioned by the
musicians. This is not entirely surprising, since only part of expert behavior is conducted consciously and is therefore available for verbal report (see Ericsson & Simon, 1980).
The performances are instead a result of both automated and consciously directed processes. Nevertheless, there are two inconsistencies between analysis and interview results that
are of direct importance for the study of expressive behavior. First, the musicians talk about expressive aspects of the performance in a sequential way, while the analysis tests for
similar expressive treatment of notes with similar structural descriptions. In some cases, the sequential viewpoint is easily translated into a structural one. In other cases, this is less
easily done and may only lead to confusion; in those cases a sequential viewpoint may be more beneficial. Second, the difference in viewpoint is especially strong where special
treatment of vibrato is concerned. While expressive behavior is theoretically most often related to an intensification of vibrato rate or extent (see, e.g., Sundberg et al., 1991), the
musicians actually mention playing without vibrato to mark a special note.
Acknowledgements
This research has been made possible by the Netherlands Organization for Scientific Research (NWO) as part of the "Music, Mind, Machine" project. We would like to thank
Henkjan Honing for his helpful comments and Huub van Thienen and Rinus Aarts for their help in the data collection and data processing at an earlier stage of the project.
References
Castellengo, M., Richard, G., & d'Alessandro, C. (1989). Study of vocal pitch vibrato perception using synthesis. Proceedings of the 13th International Congress on
Acoustics. Yugoslavia.
Clarke, E. F. (1988). Generative principles in music performance. In J.A. Sloboda (Ed.), Generative processes in music. The psychology of performance, improvisation
and composition (pp. 1-26). Oxford: Science Publications.
Cone, E. T. (1968). Musical Form and Musical Performance. New York: Norton.
Ericsson, K. A., & Simon, H. A. (1980). Verbal Reports as Data. Psychological Review 87 (3), 215-251.
Gleiser, J., & Friberg, A. (in press). Vibrato rate and extent in violin performance. Proceedings of the 6th ICMPC.
Horii, Y. (1989a). Acoustic analysis of vocal vibrato: A theoretical interpretation of data. Journal of Voice, 3, (1), 36-43.
Horii, Y. (1989b). Frequency modulation characteristics of sustained /a/ sung in vocal vibrato. Journal of Speech and Hearing Research, 32, 829-836.
King, J. B., & Horii, Y. (1993). Vocal matching of frequency modulation in synthesized vowels. Journal of Voice, 7, 151-159.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music (pp. 69-104). Cambridge, MA: MIT Press.
Meyer, J. (1992). On the Tonal Effect of String Vibrato. Acustica : journal international d'acoustique, 76 (6), 283-291.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-46.
Shonle, J. I., & Horan, E. (1980). The pitch of vibrato tones. Journal of the Acoustical Society of America, 67, 246-252.
Seashore, C. E. (1932). The Vibrato. Iowa City, Iowa : University of Iowa.
Seashore, C. E. (1938). Psychology of Music. NY and London: McGraw-Hill Book Company, Inc.
Shipp, T., Doherty, T., & Haglund, S. (1990). Physiologic factors in vocal vibrato production. Journal of Voice 4, 300-304.
Sundberg, J. (1978). Effects of the vibrato and the 'singing formant' on pitch. Journal of Research in Singing, 5 (2), 5-17.
Sundberg, J. (1979). Maximum speed of pitch changes in singers and untrained subjects, Journal of Phonetics, 7, 71-79.
Sundberg, J. (1987). The Science of the Singing Voice. Illinois: Northern Illinois University Press.
Sundberg, J., Friberg, A., & Frydén, L. (1991). Common Secrets of Musicians and Listeners: An analysis-by-synthesis Study of Musical Performance. In P. Howell, R.
West, & I. Cross (Eds.), Representing Musical Structure (pp. 161-197). London: Academic Press.
Vennard, W. (1967). Singing, the mechanism and the technic. New York: Carl Fischer, Inc.
Back to index
Proceedings abstract
Takahiro Aoyagi
hirotaoyagi@hotmail.com
Background:
Aims:
The main purpose of the present study is to identify the tonal hierarchy and the
defining elements of a non-Western mode. The differences in the cognitive schemas of
native and non-native subjects are examined.
This paper proposes a new method of measuring tonal hierarchy. The method
allows less subjective interpretation and is robust against
misinterpretations of the experimental task.
Method:
Two experiments with distinct methods were conducted, in which both Arab and
non-Arab musicians participated.
Results:
The lower scale degrees were generally perceived as more salient. The
relationships among them appear to be more fundamental and to contribute to the
sense of modality. The response patterns of the non-Arab (Western) musicians
conform to the diatonic scale structure.
Conclusions:
Back to index
Proceedings paper
Introduction
Two questions provided the impetus for the present study of expert singers' memorisation strategies
and their recall for the words and music of songs. The first question relates to how expert singers, as
opposed to instrumental musicians, practise and memorise songs. The second question relates to the
interaction of words and music in memory.
Although there is a wealth of anecdotal and pedagogical literature on singers' practising and
memorising strategies, dating from the 18th century, there has been very little empirical research
focusing on what singers actually do when they practise and memorise. This is in contrast to the
literature on instrumental musicians: their practising activities have been explored by music
psychologists and educationalists in a series of case studies (e.g. Miklaszewski, 1989; Chaffin and
Imreh, e.g. 1994, 1996a, 1996b; Nielsen, 1997; Lehmann & Ericsson, 1998); effective memorising strategies
for pianists and other instrumentalists have also been investigated (e.g. Rubin-Rabson, 1937; Ross,
1964; Nuki, 1984; Hallam, 1997). How, then, are singers' strategies similar to those of
instrumentalists, and how are they different, given that singers perform words as well as music from
memory? Expert musicians practise more, for example, than novice musicians (Ericsson,
Tesch-Romer and Krampe, 1990; Ericsson, Krampe & Tesch-Romer, 1993; Sloboda, Davidson, Howe
and Moore, 1996). However, expert musicians' practice strategies differ from those of novice
musicians, and therefore the practice strategies of musicians must change with the development of
expertise (Gruson, 1988; Hallam, 1994, 1997).
It has been shown that the words and music of songs are integrated in memory (e.g. Serafine, Crowder
& Repp, 1984) and that music enhances recall for text (e.g. Rubin, 1977; Hyman & Rubin, 1991;
Wallace, 1994). However the methods used by these researchers required largely
non-musically-trained listeners to make familiarity judgments or to write down the words of texts
including songs with music. Expert singers are accustomed to memorising and performing songs from
memory on a daily basis: what can their strategies tell us about the extent to which they integrate
words and music at different stages of the memorising process? What can they tell us about the extent
to which music influences recall for words, or vice versa?
The research programme took the form of a pilot interview study and three main studies, one
observational and two experimental, which will be outlined in turn.
Interview study
Semi-structured pilot interviews were carried out with five professional singers. They reported
learning the words and music of songs separately before combining them. Their memorising strategies
were primarily for the words rather than the music. These included studying the meaning of the text,
translating it into English if necessary, as well as memorising the words phonetically, by rote. Overall,
the singers described a three-stage process: 1) initial study, 2) learning and 3) deliberate, rather than
implicit, memorisation. These findings echo those of an interview study carried out by Wicinski
(1950, cited by Miklaszewski, 1989b, p.96) who interviewed ten eminent pianists of the day on the
topic of preparing pieces for performance from memory. Seven reported that initial study of the whole
piece formed a first stage; this was followed by a second stage in which they worked on technical
difficulties, while the third and final stage consisted of rehearsals of the whole piece in order to
perfect the final 'interpretation'.
Observational study
The aims of the observational study were, first, to determine the practising and memorising strategies
available to singers and to compare them with those used by instrumental musicians; second, to
compare the strategies used by singers of different levels of expertise; third, to try to distinguish more
effective from less effective strategies.
According to the literature on pianists of varying levels of expertise practising and learning music,
their activities include 'analytic pre-study' and 'mental rehearsal' (Rubin-Rabson, 1937), playing with
hands together and separately (Rubin-Rabson, 1937; Gruson, 1988; Miklaszewski, 1989), playing at
different speeds (Miklaszewski, 1989) and repeating single notes, bars and sections of music (Gruson,
1988; Miklaszewski, 1989). Singers, too, might well carry out 'analytic pre-study' and 'mental
rehearsal'; indeed some of the activities reported by the interview respondents could be defined as
such. Singers, too, might well sing at different speeds and repeat different portions of the music to be
learned. However, playing with hands together and separately is clearly not an option for singers.
Nevertheless, a comparison might be made, for example, between the pianist learning to play the
music for the right hand and the music for the left hand separately, and the singer learning words and
music as independent components of a song.
To what extent would the strategies used by singers of different levels of expertise at different times
during the learning and memorising process be different, and to what extent would they be similar?
We know that musicians' practising strategies change with the development of expertise. Would this
be the case also for singers?
For example, in an observational study of expert, intermediate and novice pianists who learned three
new pieces over the course of ten practice sessions, Gruson (1988) found that in the first session
experts were more likely than the other groups to repeat sections of a piece rather than single notes or
bars, to play with left and right hands separately, to spend more time on each piece and to use
'self-guiding' speech. That is, they began by dividing the task into discrete units and working
methodically and systematically on them. As the sessions progressed, experts spent more time playing
uninterruptedly, repeated fewer notes and slowed down less. While the novice and intermediate
pianists increased their speed of playing steadily throughout the sessions, the experts decided on their
final tempo at an early stage - which tended to be faster than the tempi used by the other groups - and
achieved it by the seventh session. This suggests that they had clearer goals and had a better idea of
how to meet them than the less expert pianists. In other words they had better metacognitive
awareness, which seems to have been confirmed in the course of interviews during which they were
able to describe the varied and complex nature of their strategies.
Hallam's (1997) phenomenographic study of expert and novice instrumental musicians explicitly
investigated their memorising strategies. Like the expert pianists in Gruson's study, the expert
musicians in Hallam's study showed more metacognitive awareness than novice musicians insofar as
the experts were more likely to report using analysis in the course of memorisation. This involved, for
example, noting features of the material to be remembered such as key changes, harmonic structure,
the length of rests and difficult 'exit points'. However they sometimes also used the technique of
memorising largely without conscious awareness, and then linking short memorised sections together
to form longer sections until the whole piece was memorised. Most of the expert musicians reported
combining the two approaches as appropriate to the demands of the particular music to be memorised.
To expand on the aims of the present study as outlined above, then, the first was to find out what
activities would be undertaken by singers that might be directly or indirectly comparable to those of
instrumentalists, identified by Gruson. The second was to compare the activities carried out by singers
of different levels of expertise over the course of the learning and memorising process. For example,
would they work on different lengths of 'practice unit', as Gruson found that pianists repeated longer
or shorter sections according to their level of expertise? Third, would expert singers be as aware of
their goals for practice and memorisation and how to meet them, as the expert instrumentalists in
Gruson's and Hallam's studies?
13 singers (students, amateurs and a group of experienced professional singers who had not taken part
in the pilot interviews) were asked to learn and memorise the same new song, over the course of six
15-minute practice sessions, and to provide a concurrent verbal commentary. The practice sessions
were audio-taped, and the tapes were transcribed and analysed.
The strategies used by the singers were defined initially as 'modes of attempt'. They included: singing the words and music together, either reading from the score or singing from memory; speaking the words without the music; and playing or singing the music without the words, by playing or vocalising the melody, playing the accompaniment, or counting beats aloud. As suggested earlier, these modes of
attempt could not be compared directly with the strategies of instrumental musicians identified in
Gruson's and Hallam's studies. However, some similarities were found between the singers in the
present study and Gruson's expert pianists in that the singers - whether experts, amateurs or students -
chose to work on practice units that gradually increased in length and corresponded to compositional
units such as phrases and verses. As shown in Figure 1, the expert singers were differentiated from the
other groups in that they made more attempts using different modes of attempt and were more likely
than the other groups to speak the words, count aloud and sing from memory.
The expert singers appeared to be more goal-oriented, in that, as reported by the interview
respondents, they memorised deliberately and from the beginning of the practice sessions. This is
illustrated in Figure 2.
Another strategy that distinguished them from the other groups involved focusing on the words
separately from the music. However, although the interview respondents had suggested that studying
the words and music separately was a strategy for the earliest stages of familiarisation with a song, the
expert singers in the present study were more likely to speak the words aloud, a way of studying the
words separately from the music, later in the memorising process.
The aim of distinguishing more effective from less effective higher-level strategies was met initially
by defining the 'best' memoriser and the 'worst' memoriser. The 'best' was the first of the 13 to sing the
whole song entirely accurately from memory. The 'worst' was the singer who took longest to
memorise and made the most errors when she sang from memory. The verbal commentaries provided
by these two singers were then analysed along with their practice and error data.
The 'best' memoriser sang the words and music of the song together rather than separately. She started
memorising early and tested her memory throughout the practice sessions. She worked on a variety of
lengths of practice unit. She made plans and implemented them, monitored and corrected her errors,
and explicitly evaluated her practice. Her approach to practising and memorising thus resembled the
approaches made by the expert pianists described by Gruson (1988): her strategies were varied and
complex; her verbal commentary was detailed and 'self-guiding'. All in all, she appeared to possess a
high degree of meta-cognitive awareness. In contrast, the 'worst' memoriser implemented plans,
monitored errors and evaluated her practice to a much lesser extent; she preferred to sing the music
only, started to memorise comparatively late and consistently repeated the whole song rather than
smaller sections.
The hypothesis that the strategies of the 'best' memoriser were indeed more effective than those of the
'worst' memoriser, in that they would consistently produce better performance outcomes, remains to
be tested. One way to do this would be to undertake an intervention study in which participants of
equivalent levels of expertise would memorise new songs using both types of strategies identified.
Experiment 1
The results of the two analyses of data gathered in the observation study included two apparently
contradictory findings. The professional singers who took part in the pilot interviews reported learning
the words and music of songs separately, and the expert singers in the observation study made more
attempts on the words separately from the music than the other groups did. In contrast, the 'best'
memoriser in the second analysis preferred to sing the words and music together rather than
separately. An experiment was therefore carried out in order to find out if there would be any
advantage, in terms of accuracy and confidence in performance from memory, of memorising words
and music separately, prior to memorising them together, or memorising them together throughout the
whole memorising process.
A new unaccompanied song was constructed and 60 singers were asked to memorise it. The melody
was a folk song, 'tweaked' slightly to remove direct repetitions within the melody, and the text was the
second verse of a not-very-well-known poem. Half the participants were expert memorisers and half
were novice memorisers of songs. They were randomly divided between three conditions. In one
condition they memorised the words first, then the music, and then the words and music together. In
the second condition they memorised first the music, then the words, and then both. In the third
condition they memorised the words and music together all the time. At the end of the memorising
phase, which lasted 20 minutes, the participants were asked to sing the song from memory. They were
then interviewed about their musical education and experiences for ten minutes. At the end of the
interview they were asked to sing the song again. Their word errors, music errors and hesitations in
both performances were scored and analysed. Hesitations represented pauses either because the singer
had forgotten what came next, or to correct errors.
There proved to be no statistically significant differences between the expert and novice memorisers
either in terms of accuracy or confidence. Nor were there any differences in accuracy between the
participants who had memorised the words and music of the song separately and those who had
memorised them together. However
the participants who memorised the words and music together made significantly fewer hesitations
than the participants who memorised the words and music separately, as shown in Figure 4.
If I were asked to offer practical advice to singers on the basis of this finding, I would suggest that,
when time is short, singers are better advised to memorise words and music together than to memorise
them separately if their aim is to 'keep going'.
Why were no significant differences found between participants with more and less experience of
memorising songs? It may be that singers do not become expert memorisers simply by memorising
many songs; expertise is acquired as a result of deliberate practice and memorising, per se, is rarely
the focus of most singers' practising or memorising activities. On the other hand, many of the
participants in this study who were deemed 'novices', on the basis that they were choral singers who
rarely memorised vocal music with words, were also instrumental musicians with experience of
memorising music.
In order to test the hypothesis that the ability to memorise songs accurately is related to the ability to
learn songs accurately, which in turn is a skill acquired through the development of musical expertise,
the participants hitherto deemed expert and novice memorisers were therefore re-grouped on the basis
of the levels of musical education they had attained. 35 participants had 'high', and 25 participants had
'low' levels of musical expertise. Although there seemed to be no effects of experience of memorising
on accurate memory for the song there turned out to be significant effects of musical expertise. That
is, the more musical training a singer had, and therefore the more musically expert he or she was, the
easier it was to learn the song accurately and therefore to recall it accurately.
What was more interesting, from a theoretical as well as a practical point of view, was that, as well as the
significant effect of expertise on accurate memory for the music, there was also a significant effect of
musical expertise on accurate memory for the words: participants who made fewest music errors also
made fewest word errors. Furthermore, memorising the words and the music of a song together
proved a more effective strategy for these comparatively more expert musicians, in terms of accuracy,
than memorising them separately, as shown in Figures 5 and 6.
These findings complement the evidence I have already presented to suggest that singers who
memorise words and music together are more confident when they perform from memory. They also
support previous findings by Rubin (1977), Hyman and Rubin (1990) and Wallace (1994) that recall
for words is enhanced by music.
Experiment 2
The final experiment, undertaken with the help and encouragement of Anders Ericsson and Andreas
Lehmann, investigated three questions. First, are the words of songs recalled primarily in terms of
their semantic meaning, as suggested by the professional singers who took part in the pilot interviews?
Or are they recalled in terms of their 'structural' qualities (defined by Wallace, 1994, as alliteration,
assonance, prosody, rhythm and rhyme) which are emphasised by their musical setting? In other
words, how important is it to understand the meaning of the words of a song for the purposes of
memorising them? This question does not refer to the 'interpretation' of a song in performance, for
which understanding the meaning of the words of the song is clearly paramount. Rather, is it possible
to explain the ability of singers to sing from memory, in languages they do not understand or speak, in
terms of the relationship of the words to the music to which they are set?
We addressed this question by asking expert singers to memorise songs with semantically-meaningful
words and non-semantically-meaningful words, in this case digit strings, and to perform them from
memory within a variety of constraints. If the words of songs are memorised and recalled in terms of
their semantic meaning, songs with non-semantically-meaningful words should be harder to memorise
and recalled much less easily and less accurately than songs with semantically-meaningful words.
Second, how separable are the words and music of newly-memorised songs when they are recalled?
Serafine, Crowder and Repp (1984) played folksongs with interchangeable words and melodies to
non-musically-trained listeners and asked them to rate the words and melodies for familiarity when
they were played a second time with either the same or different melodies and words. They found
what they called an 'integration effect' for the words and music of songs. That is, listeners
remembered the songs better when they heard both the words and the music for a second time, than
when they heard familiar words set to a different melody or a familiar melody with different words.
We wanted to know if there was an integration effect when singers are asked to recall the words and
music of songs as well as when listeners are asked to recognise the words and music of songs. We
addressed this question by comparing recall for the texts of the songs with and without melody, and
the extent to which text and melody 'triggered' recall for each other.
Third, what is the relationship between speed of acquisition for songs and effective memorisation?
Lehmann and Ericsson (1995) propose that musicians hold abstract mental representations of the
music they perform that underlie both memorisation and performance skills. This proposal is
supported by their finding that pianists who memorise quickly are able to carry out complex tasks
from memory, such as transposition. We used Lehmann and Ericsson's memorising paradigm to
explore the relationship between speed of acquisition and the ability to 'manipulate' the texts and
melodies of the songs once they had been memorised. This involved showing each participant the
musical score of the song to be memorised and simultaneously playing a recording of the melody and
accompaniment. The score was then removed and the participant was asked to sing as much of the
song as he or she could remember to the recorded accompaniment. These pairs of trials, singing first
with the score and then without, were repeated until the participant was able to sing two consecutive
accurate performances of the whole song from memory. Speed of acquisition, then, was measured by
the number of pairs of trials preceding memorisation to criterion: the fewer pairs of trials the
participant needed, the faster the song was memorised.
20 singers with high levels of musical expertise, most of whom had participated in the first
experiment, took part in this study. Each carried out two sessions in which they memorised an
unfamiliar song, one with a word-text and one with a digit-text, to different but matched melodies.
Once each song was memorised to criterion they then performed a series of 15 tasks designed to
assess the extent to which they could retrieve the text and melody independently, modify the text and
melody, and respond to different types of cues. The experimental sessions were recorded on
audio-tape and the participant's performance on each task was transcribed and scored. Measures
included accuracy, latency and task duration.
The first question was whether the words of newly-memorised songs are memorised and recalled
primarily in terms of their semantic meaning or as a component of the melody. We predicted that
participants would take more time to memorise songs with digits instead of words than songs with
semantically-meaningful words in English, and that if understanding the meaning of the words of the
song were crucial for recall then the songs with word-texts would be recalled more easily and more
accurately than the songs with digit-texts. Eight post-memorisation tasks measured recall for text.
Figure 7 shows that, as predicted, songs with digit-texts took longer to memorise.
However it was not the case that participants recalled the songs with word-texts consistently more
accurately and faster than digit-texts. In fact, as shown in Figure 8, digit-texts were recalled more
accurately than word-texts in one task, and word-texts were not recalled any more accurately than
digit-texts in the other tasks.
On the other hand, as shown in Figure 9, recall was slower for digit-texts when participants were
asked to recall the text of the whole song at speed both with and without the melody, and in two other
tasks.
So understanding the meaning of the song clearly does play an important part in recall, though
perhaps not as much as is sometimes thought.
The second question was how separable the words and music of newly-memorised songs are when
they are recalled. Again, the answers were equivocal. We found that, as shown in Figures 10 and 11,
participants recalled digit-texts but not word-texts both faster and more accurately with than without
the music.
The results of the cueing tasks, however, showed that both types of text and melody are more likely to
be encoded and retrieved as integrated than independent components. Although participants were not
able to retrieve the appropriate text when cued with a fragment of melody any faster than they were
able to retrieve the appropriate melody when cued with a fragment of text, they found it much harder
to sing the appropriate melody without also singing the text than they did to speak the words without
also singing the melody.
The final question concerned the relationship between speed and effectiveness of memorisation. We
predicted significant correlations between speed of acquisition and performance, such that the faster
participants memorised the quicker and more accurately they would perform on the 15 tasks devised
to show different aspects of memory for the song. We found significant correlations between speed of
acquisition and performance on seven tasks, and also between speed of acquisition and memorising
ability as measured, in terms of accuracy and confidence, in the previous experiment. Six of these
tasks, however, measured speed rather than accuracy of recall: for example, speaking the words of the
whole song at speed; retrieving fragments of the text and fragments of the music 'reversed';
responding to cues, both 'forward' and 'reverse'; and singing the phrases of the song in reverse order.
The seventh task involved singing the
pitches of the melody of the song only, without rhythm, to the regular beat of a metronome.
Lehmann and Ericsson (1995) argue that the ability to form rapid mental representations of a piece of
music, measured as speed of acquisition, underlies the ability to produce performances from memory
that are both stable and flexible, as exemplified by their transposition task. It may well be, then, that
this same ability also underlies the ability to perform from memory at speed and to carry out certain
tasks involving 'manipulation' of the memorised song. On the other hand, it may be that performance
on the tasks that did not correlate with speed of acquisition is better explained in terms of the
automatisation of performance resulting from the rote memorisation of text and melody together.
These tasks included the accurate performance of the song with accompaniment and at the same
tempo as that at which the song was originally memorised, as required in circumstances more usual
than those of this study.
Conclusion
How are singers' strategies similar to those of instrumentalists, and how are they different? Because
singers have words to memorise as well as music, they have some options for practice that are
unavailable to instrumental musicians. On the other hand, like instrumental musicians, singers choose
to practise 'chunks' that correspond to units of the compositional structure of the song they are
learning. There is also evidence that singers with higher levels of expertise, like instrumental
musicians with higher levels of expertise, show more metacognitive awareness, in that they use a
wider variety of strategies than less expert singers and are more likely to memorise deliberately rather
than implicitly.
Although the small sample of experienced professional singers who took part in the pilot interview
study reported learning words and music separately in the initial stages of memorisation, the evidence
from the observational study suggests that singers are in fact more likely to practise singing the words
and music together. The expert singers spoke the words of the song aloud more than less expert
singers did, suggesting that they were focusing on the semantic meaning of the words of the song, but
they did so much later during the memorising process than might have been predicted from the
interview data.
In fact closer inspection of the observational data from the most and least 'effective' memorisers
showed that singing the words and music together rather than studying them separately was associated
with earlier memorising and more accurate performance from memory. This finding was borne out by
the results of the first experiment: singers of varying levels of musical expertise were significantly
more confident in their performances from memory, and singers with high levels of musical expertise
were also significantly more accurate, both in their recall for words and music, when they had
explicitly memorised words and music together rather than separately.
These studies, taken together, have practical implications for singers. Although they might hold
implicit theories about what constitutes efficient practice and memorisation, singers either fail to
practise according to their theories, or their theories are wrong. In other words experienced singers,
even those who can also be defined as expert musicians, do not necessarily practise and memorise as
efficiently as they might: memorising words and music together is clearly a more effective strategy
than memorising them separately.
What does this tell us about the extent to which music influences recall for words, or vice versa? The
second experiment provides evidence to support an 'integration effect' for recall as well as recognition
memory (e.g. Serafine et al., 1984). Further, while I would not wish to discount entirely the effect of
semantic memory for the meaning of the words of songs, it is worth noting that, when memorised and
performed from memory with music, recall for semantically-meaningless texts was in many ways no
different from recall for semantically-meaningful words. This supports the notion that music
structures and therefore enhances recall for words (e.g. Wallace, 1994). These findings show also that
studies involving the participation of expert singers can be a useful way of exploring the interaction of
words and music in memory, and provide a basis for further research.
References
Chaffin, R. and Imreh, G. (1994). 'Memorising for piano performance: a case study of expert
memory'. Paper presented at the 3rd Practical Aspects of Memory Conference, University of
Maryland, Washington, DC, July/August 1994.
Chaffin, R. and Imreh, G. (1996a). 'Effects of difficulty on expert practice: a case study of a concert
pianist'. Poster presented at the 4th International Conference on Music Perception and Cognition,
McGill University, Montreal, Canada, August 11-15, 1996.
Chaffin, R. and Imreh, G. (1996b). 'Effects of musical complexity on expert practice: a case study of a
concert pianist'. Poster presented at the meeting of the Psychonomic Society, Chicago, November 3,
1996.
Gruson, L. M. (1988). 'Rehearsal skill and musical competence: does practice make perfect?' In J. A.
Sloboda (ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and
Composition. London: Oxford University Press.
Hallam, S. (1994). 'Novice musicians' approaches to practice and performance: learning new music'.
Newsletter of the European Society for the Cognitive Sciences of Music, 6, 2-10.
Hallam, S. (1997). 'The development of memorisation strategies in musicians: implications for
education'. British Journal of Music Education, 14 (1), 87-97.
Hyman, I. E. and Rubin, D. C. (1990). 'Memorabeatlia: a naturalistic study of long-term memory'.
Memory and Cognition, 18 (2), 205-214.
Lehmann, A. C. and Ericsson, K. A. (1995). 'Expert pianists' mental representation of memorised
music'. Poster presented at the 36th meeting of the Psychonomic Society, Los Angeles, California,
November 10-12, 1995.
Lehmann, A. C. and Ericsson, K. A. (1998). 'Preparation of a public piano performance: the relation
between practice and performance'. Musicae Scientiae, 2 (1), 67-94.
Miklaszewski, K. (1989). 'A case study of a pianist preparing a musical performance'. Psychology of
Music, 17, 95-109.
MODULATED RHYTHMS
-A New Model of Rhythmic Performance-
Carl Haakon Waadeland
Trondheim Conservatory of Music, Faculty of Arts,
Norwegian University of Science and Technology,
7491 Trondheim, Norway
carl.haakon.waadeland@hf.ntnu.no
1. Introduction:
performer are fundamentally implemented. A basic idea in this respect is to view performed rhythm as a result of mutual
interactions of different movements (oscillations), and to construct a theoretical model describing rhythmic activity by
means of frequency modulated rhythms, where mathematical, trigonometric functions are representatives of "atomic"
movements. The construction of this model is done in a stepwise manner, providing solutions to the following problems:
A. Present a model of rhythmic structure, where information of note values is represented as continuous movements
through attack points.
B. Construct a model of expressive timing, where performed rhythm is viewed as a result of continuous interactions
of movements: one movement modulating another movement.
This somewhat complementary relationship, between rhythmic movements characterizing rhythmic performance of
music and modulated rhythms used as a new technique of rhythm synthesis, constitutes an important axis in the
development of our concepts and models. An illustration of the basic idea we are trying to pursue is given in the figure
below:
Figure 1. Illustration of the relationship between rhythmic movements and modulated rhythms.
The left hand side of Figure 1 illustrates expressive timing, understood as a process where structural properties of
rhythm are transformed into live performances of rhythm, whereas the right hand side indicates the construction of
our models (A) and (B). The elements of model (B) are various non-metronomical movement curves, naturally
interpretable as movements associated with syntheses of rhythm performances. Thus, our idea is that rhythmic
movements typical of live, rhythmic performances are approximated by modulated rhythms.
Within our theoretical model the various syntheses of live rhythmic performances that we are able to create exist
on a purely formal and rather axiomatic level, where mathematical functions and graphic illustrations are
interpreted as different representations of rhythmic performances of music. The musical performance itself,
however, "exists" in an interaction with a temporal unfolding of sound. In collaboration with Sigurd Saue a
MIDI-based computer program has been constructed which converts the mathematical model into audible
syntheses of rhythm. The computer implementation of rhythmic frequency modulation is outlined in non-technical
terms by Waadeland & Saue (1999), Waadeland (2000, Chapter 6), and is given a more technical description by
Saue (2000). At the end of this paper we briefly present some examples of rhythmic performances that can be
simulated by means of modulated rhythms.
Figure 2. Graphic illustration of a possible movement curve when tapping the hand against a table in
perfect synchronization with a metronome. Time is displayed along the horizontal axis, and the hand's
distance from the table is measured along the vertical axis.
Observe that the points where the curve touches the horizontal axis correspond to the instants when the hand
touches the table, producing the audible tap. To be quite explicit, the curve above is given by the mathematical
function:
(*) pf(t) = A[1 - cos(ft)],
where t is time, f is the frequency, i.e. a measure of the speed by which the tapping is performed, and 2A is a
measure of the hand's maximum distance from the table. (In the figure above f = 1 and A = 1.) This curve is, of
course, not the only possible movement curve, but certainly a plausible one. It is, for instance, interesting to note
that on the basis of empirical investigations Viviani (1990) states that sinewaves are easy to approximate by
human movements, and are among the simplest predictable motions. In the following we call pf a pulse, and the
minimal points of pf (i.e. the points where the curve touches the horizontal axis) will be denoted pulse beats. With
reference to Figure 2 we choose the following numbering of the pulse beats of pf :
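For readers who wish to experiment with the model, the pulse (*) and its pulse beats are easy to compute. The following Python sketch (the function names are ours, not part of the model) evaluates pf and lists the first few pulse beats, which occur at t = 2πk/f:

```python
import math

def pulse(t, f=1.0, A=1.0):
    """The movement curve of equation (*): p_f(t) = A[1 - cos(ft)]."""
    return A * (1.0 - math.cos(f * t))

def pulse_beats(f=1.0, n=4):
    """The first n pulse beats of p_f: its minimal points, at t = 2*pi*k/f."""
    return [2.0 * math.pi * k / f for k in range(n)]

# The curve touches zero exactly at the pulse beats (the audible taps),
# and reaches its maximum 2A halfway between them:
assert all(abs(pulse(t)) < 1e-12 for t in pulse_beats())
assert abs(pulse(math.pi) - 2.0) < 1e-12
```

Doubling f halves the spacing of the pulse beats, which is exactly the subdivision operation defined below.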
Figure 3. An illustration showing the relation between different movement curves associated with
metronomic performances of different note values.
Observe that, in accordance with ordinary musical terminology, p2f represents a subdivision of pf in 2, and p3f is
naturally interpreted as a subdivision of pf in 3. We now make the following definition:
Definition of subdivision in k :
Let pf(t) = A[1 - cos(ft)]. The subdivision of pf in k is the pulse pkf(t) = A[1 - cos(kft)], k = 1, 2, 3, ...
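As a computational sketch, subdivision amounts to raising the pulse frequency, assuming the form pkf(t) = A[1 - cos(kft)] suggested by the examples p2f and p3f; the function names below are ours:

```python
import math

def pulse(t, f=1.0, A=1.0):
    """The pulse p_f of equation (*)."""
    return A * (1.0 - math.cos(f * t))

def subdivide(f, k):
    """Subdivision of p_f in k, assuming the form p_kf(t) = A[1 - cos(kft)]:
    the same pulse shape at k times the frequency."""
    return lambda t, A=1.0: pulse(t, f=k * f, A=A)

# p_2f has a pulse beat at t = pi, halfway between two beats of p_f (f = 1),
# where p_f itself is at its maximum:
p2 = subdivide(1.0, 2)
assert abs(p2(math.pi)) < 1e-12
assert abs(pulse(math.pi) - 2.0) < 1e-12
```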
In order to represent complex sequences of note values by continuous movement curves we also need to define an
operation making ties of pulse beats. Note that whereas a subdivision of a pulse is given in a unique way, the
operation of making ties of pulse beats is multivalued, being dependent on the pulse beat on which the tie is to
start. If, for instance, we wish to make ties of n pulse beats, n = 1, 2, 3, ..., we have the following n possibilities:
The tie may start on the first, the second, the third, ..., or the n-th beat. Since the pulse pf has frequency f, a tie of
n pulse beats of pf should be a new pulse with frequency f/n. Thus, we make the following definition:
Definition of n-tie:
It is straightforward to verify that τl is an n-tie with minimal points on the pulse beats of pf with numbers:
l, l+n, l+2n, l+3n, ...
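A possible computational reading of the n-tie is sketched below. The concrete formula is our reconstruction, chosen so that the minimal points of the tie fall on pulse beats l, l+n, l+2n, ... of pf, with beat 1 taken at t = 0:

```python
import math

def tie(t, f=1.0, n=2, l=1, A=1.0):
    """A hypothesised n-tie starting on pulse beat l of p_f: a pulse of
    frequency f/n, phase-shifted so that its minimal points fall on beats
    l, l+n, l+2n, ... of p_f (numbering beat 1 at t = 0). The concrete
    formula is our reconstruction, not quoted from the paper."""
    t_l = 2.0 * math.pi * (l - 1) / f        # time of pulse beat l
    return A * (1.0 - math.cos((f / n) * (t - t_l)))

# A 2-tie starting on beat 1 has minima on beats 1, 3, 5, ... of p_f:
for k in (1, 3, 5):
    t_k = 2.0 * math.pi * (k - 1)            # beat k of p_f with f = 1
    assert abs(tie(t_k, f=1.0, n=2, l=1)) < 1e-9
```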
We are now ready to define a model of metronomic performance of rhythm, MPR:
MPR = Combinations of pulses, pf , subject to the operations of making subdivisions and ties.
Since every note value, as defined in traditional musical notation, is given as a result of subdividing or making ties
of some chosen reference value, it follows that every sequence of note values may be represented by continuous
movement curves using elements of MPR. Therefore, we claim to have obtained a solution of problem (A)
formulated in the introduction. An illustration of how note values, pulses, and movement curves are related, is
given in Figure 4 below. In this figure a quarter note is represented by the pulse: p1(t) = 1 - cos(t) (and
phase-translations thereof). The pulses representing the other note values are calculated using the definitions of
subdivisions and ties.
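As an illustration of how elements of MPR combine, the following sketch (our own hypothetical implementation, not the computer program mentioned above) builds a piecewise movement curve for a sequence of note values, one pulse arc per note, so that the minimal points fall exactly on the note onsets:

```python
import math

def movement_curve(note_values, t):
    """Evaluate a piecewise movement curve in the spirit of MPR (our sketch):
    each note value d, given in multiples of the reference quarter note
    (e.g. 0.5 for an eighth, 2 for a tied half), contributes one arc of the
    pulse of frequency 1/d, so minimal points fall on the note onsets.
    The reference quarter note is p_1(t) = 1 - cos(t), as in Figure 4."""
    onset = 0.0
    for d in note_values:
        length = 2.0 * math.pi * d           # one cycle of the pulse p_{1/d}
        if t < onset + length:
            return 1.0 - math.cos((t - onset) / d)
        onset += length
    return 0.0                               # after the last note: at rest

# Onsets of [quarter, eighth, eighth, half] fall at curve minima:
rhythm = [1.0, 0.5, 0.5, 2.0]
for t0 in [0.0, 2 * math.pi, 3 * math.pi, 4 * math.pi]:
    assert abs(movement_curve(rhythm, t0)) < 1e-9
```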
Figure 4. An illustration showing the connections between a sequence of notes, a movement curve, and a
mathematical representation of pulses as continuous functions, all being related to a robot-like rhythmic
performance executed in perfect synchronization with a metronome. The horizontal axis displays time, t,
where the first beat occurs at time t = 0. Observe how the choice of phase-translations make the minimal
points of the different components of the movement curve coincide at the different beats of the metronomic
performance.
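As a concrete sketch of how MPR represents note values (a minimal illustration assuming, as in Figure 4, the pulse form p_f(t) = 1 - cos(ft), with a subdivision into m parts multiplying the frequency by m and a tie of n beats dividing it by n):

```python
import math

def pulse(f, phase=0.0):
    """Pulse of frequency f: minimal points (beats) where f*t - phase = 2*pi*k."""
    return lambda t: 1 - math.cos(f * t - phase)

quarter = pulse(1.0)   # reference value: p1(t) = 1 - cos(t), beats at t = 2*pi*k
eighth = pulse(2.0)    # subdivision into 2 parts: frequency doubled
half = pulse(0.5)      # tie of 2 quarter-note beats: frequency halved

# All three movement-curve components reach a minimal point together at the
# shared metronome beats, e.g. t = 0 and t = 4*pi.
beat = 4 * math.pi
values = [quarter(beat), eighth(beat), half(beat)]
```

The half-note pulse, by contrast, has no minimal point at t = 2π, which is why only the quarter and eighth components mark that beat. This is the phase alignment shown in Figure 4.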
Figure 5. Flowchart for basic rhythmic frequency modulation. (Compare this figure with the flowchart for
basic FM-instrument, Chowning, 1973.)
In accordance with the figure above we now propose the following definition:
If d = 0, there is no rhythmic modulation (i.e., as we see it, no interaction of movements) and the output of the
operation illustrated in the figure above is simply pf. To give a brief illustration of what might happen when d ≠ 0,
we look at a concrete example:
Example:
Looking at the graph of this frequency modulated rhythm, we immediately observe that the distance between the
modulated pulse beats is no longer the same for every pair of successive beats. If we, for the following discussion, let ∆_j
denote the distance between the pulse beats j and j+1 (modulated or not), we find that ∆_1 is approximately 0.42T, ∆_2 is
close to 0.32T, whereas ∆_3 is approximately 0.26T. Moreover, it can be shown that r is a periodic function with period T
= 2π. Thus, the distances between successive pulse beats make the pattern: L(long) - I(intermediate) - S(short);
L-I-S;...etc. Based on the interpretation of the unmodulated pulse as a drummer playing a sequence of quarter notes in
perfect synchronization with a metronome, it is now tempting to interpret this modulated pulse as a movement curve
associated with a new, non-metronomic performance, theoretically constructed by applying an "interaction" of one
movement, p_3, with another movement, q_{1,π/2}. Since the modulated pulse is periodic with three beats in each cycle, it
seems reasonable to interpret r as a performance of quarter notes in 3/4 meter, where the first beat is performed
lengthened, and the third shortened compared to a strict metronomic performance. Observe, however, that the length of
the measure in this non-metronomic performance is T (=2π), which is equal to the length of the measure in the
metronomic performance interpreted as a performance in 3/4 meter. It should also be noted that the modulated first pulse
beat in the graph above occurs before t = 0. Hence, a performance in accordance with the movement curve of this
example would perform the first beat of every measure a bit early compared to a strict metronomic performance where
the first beat is at t = 0.
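The worked example can be checked numerically. Assuming the modulated pulse has the FM form r(t) = 1 - cos(3t + d·sin(t + π/2)), i.e. carrier p_3 and modulator q_{1,π/2}, its beats are the solutions of 3t + d·cos(t) = 2πk; the choice d = 1 (an assumption on our part) reproduces the beat distances quoted above:

```python
import math

D = 1.0  # assumed peak deviation; this value reproduces the 0.42T / 0.32T / 0.26T pattern

def beat(k, d=D, lo=-1.0, hi=7.0, tol=1e-12):
    """k-th beat of r(t) = 1 - cos(3t + d*sin(t + pi/2)): solve 3t + d*cos(t) = 2*pi*k.
    The left-hand side is strictly increasing for d < 3, so bisection is safe."""
    g = lambda t: 3*t + d*math.cos(t) - 2*math.pi*k
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5*(lo + hi)

T = 2*math.pi
beats = [beat(k) for k in range(4)]
deltas = [(b2 - b1)/T for b1, b2 in zip(beats, beats[1:])]  # fractions of the measure T
```

With d = 1 this yields ∆_1 ≈ 0.42T, ∆_2 ≈ 0.32T, ∆_3 ≈ 0.26T, and the first beat falls slightly before t = 0, exactly as described in the text.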
The example above illustrates some important features of modulated pulses. However, in order to be able to create
syntheses of live performances of more complex rhythms, we need to investigate rhythmic frequency modulation of
subdivisions and ties as well. The manner by which subdivisions and ties of modulated pulses are constructed, is
expressed in the following algorithm:
Algorithm for constructing modulated subdivisions and ties:
The condition 0 ≤ d ≤ f/f´ ensures that the modulated pulse has the same number of beats as the unmodulated pulse
over any interval of length 2π. If d = 0, (α) reduces to the definition of subdivisions and ties in the unmodulated
case. It now makes sense to propose the following definition of a model of live performance of rhythm, LPR:
As illustrated in the example above, the technique of rhythmic modulation applied to pulses creates different
movement curves representing rhythmic performances characterized by various deviations from metronomic
regularity. Carried over to more complex rhythms, we are now able to construct syntheses of a wide variety of
non-metronomic performances of rhythm utilizing repeated applications of the algorithm (α). However, whether
such syntheses of non-metronomic rhythmic performances also yield relevant approximations to live performances
of rhythm is a question which can be answered only on the basis of empirical investigations. In the next section we
give some examples indicating that RFM might create some interesting simulations of live rhythmic performances.
Through our stepwise constructions of the models MPR and LPR, we have now arrived at a situation as illustrated
by the following figure:
Figure 6. Illustration of interrelations between rhythmic structure and the models MPR and LPR.
Observe that if m ∈ MPR, then m is a movement curve associated with a metronomic performance of rhythm, and
δ_i(m) is a movement curve associated with a non-metronomic performance of rhythm, which in some cases also
represents a relevant simulation of live performances of rhythm. Moreover, it should be noted that on the basis of
the defined relations between rhythmic structure and movement curves illustrated in Figure 6, δ_1, δ_2, δ_3, ..., δ_k
represent transformations of structural properties of rhythm into (approximations of) live rhythmic performances,
and may thus be seen as representations of expressive timing, as Clarke defines this notion (Clarke, 1999, p.490).
Hence, we now claim to have obtained a solution of problem (B) formulated in the introduction. A theoretical
interpretation offered by our model construction is to describe representations of expressive timing as non-linear,
continuous transformations of rhythmic structure; or, to put it another way, to view expressive timing as a result of
rhythmic structure being "stretched" and "compressed" by actions of movements. It is at this point interesting to
note that Beek, Peper and van Wieringen (1992), modeling coordinated rhythmic movements such as breathing
and walking, cascade juggling, and polyrhythmic tapping conclude that: "Constrained movement involving more
than one limb segment often leads to modulation." (ibid., p.604). This conclusion of Beek et al. seems to support
our choice of frequency modulation as a mathematical expression of interaction of movements.
the beat level in the accompaniment; the first beat is shortened and the second beat is lengthened,
whereas the third beat is close to one third of the measure length. In the example of Section 3 we
created a movement curve making a cyclic pattern: L-I-S of beat distances, where L = 42%, I = 32%,
S = 26% of the total measure. Hence, if a Vienna waltz accompaniment starts on the third beat of this
movement curve, the following cyclic permutation of L-I-S occurs: S-L-I, where, as before, S = 26%,
L = 42%, I = 32%. According to Bengtsson & Gabrielsson (ibid.) this represents a typical distribution
of beat durations in a Vienna waltz accompaniment. Observe that if d (the peak deviation) is allowed
to vary between d = 0 (no modulation) and the maximum value d = 3 (given by the condition d ≤ f/f´
in the algorithm (α), see section 3), the values of S-L-I are changed between the limits:
S = 33%, L = 33%, I = 33% and S = 17%, L = 61%, I = 22%
It should be noted that the values of the beat durations above refer to Dii ("duration in-in"), i.e. the
duration from the onset of a tone to the onset of the following tone. As strongly emphasized by
Bengtsson & Gabrielsson (ibid.), the values of Dio ("duration in-out"), i.e. the duration from the onset
of a tone to the end of the same tone, are also of crucial importance in rhythmic performance of
music. In the computer implementation of RFM, Dio corresponds to the distance between the MIDI
messages: Note on - Note off. To make a simulation of a Vienna waltz accompaniment sound right,
Dio should be "long" for the first beat and "short" for the second and third beat of every measure (cf.
Bengtsson & Gabrielsson, ibid.).
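The effect of varying the peak deviation can be sketched numerically. Assuming the same modulated-pulse form as in the example of Section 3, r(t) = 1 - cos(3t + d·sin(t + π/2)), the beat-duration profile moves between the two quoted limits as d goes from 0 to 3:

```python
import math

def beat(k, d, lo=-1.0, hi=7.0, tol=1e-12):
    """k-th beat of r(t) = 1 - cos(3t + d*sin(t + pi/2)): solve 3t + d*cos(t) = 2*pi*k."""
    g = lambda t: 3*t + d*math.cos(t) - 2*math.pi*k
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5*(lo + hi)

def profile(d):
    """Durations of the three beats of one measure, as percentages of T = 2*pi."""
    b = [beat(k, d) for k in range(4)]
    return [round(100*(b2 - b1)/(2*math.pi), 1) for b1, b2 in zip(b, b[1:])]

no_mod = profile(0.0)   # the metronomic limit: three equal beats
maximal = profile(3.0)  # beats in L-I-S order; starting the waltz accompaniment
                        # on the third beat gives the S-L-I reading of the text
```

At d = 0 the profile is 33%-33%-33%; at the maximal deviation d = 3 it approaches the quoted limit of roughly L = 61%, I = 22%, S = 17%.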
Example (b): Simulating Non-Synchronization Between Musicians Playing Together:
In live performances involving multiple voices and different musicians playing together, perfect
synchronization between the voices seldom, if ever, occurs. By applying different parameters of
modulation to the different voices of a synthesized, polyphonic performance, various such
occurrences of non-synchronization, called participatory discrepancies (PDs) by Keil (1987, 1995),
may be simulated by means of RFM. In Waadeland (2000) several such examples are given,
including simulations of non-synchronization between the bass player and the drummer in a jazz
rhythm section, and syntheses of rhythmic "phasing" as applied in many compositions by Steve Reich
(e.g. Steve Reich: "Drumming", composed in 1971).
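A minimal sketch of this idea (an assumption-level illustration, not the parameter settings used in Waadeland, 2000): give two voices the same carrier pulse but different peak deviations, and measure the onset asynchrony at each nominal beat:

```python
import math

def beats(d, n, f=3.0, fm=1.0, phi=math.pi/2, tol=1e-12):
    """First n beats of r(t) = 1 - cos(f*t + d*sin(fm*t + phi)), found by bisection."""
    out = []
    for k in range(n):
        g = lambda t: f*t + d*math.sin(fm*t + phi) - 2*math.pi*k
        lo, hi = -2.0, 2.0 + 2*math.pi*k/f
        while hi - lo > tol:
            mid = 0.5*(lo + hi)
            lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
        out.append(0.5*(lo + hi))
    return out

drums = beats(0.0, 6)   # unmodulated voice: strict metronomic beats
bass = beats(0.4, 6)    # lightly modulated voice (hypothetical deviation d = 0.4)
pds = [b - a for a, b in zip(drums, bass)]  # participatory discrepancy at each beat
```

Every onset of the modulated voice falls slightly before or after the corresponding metronomic onset, with the sign and size of the discrepancy varying cyclically over the measure.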
Example (c): Syntheses of Accelerando and Ritardando:
In example (a) above rhythmic modulation was applied to create deviations on the beat level. By
applying some more complex algorithms of modulation, e.g. "modulation of modulation" (serial
modulation), various deviations on the measure level can also be constructed (cf. Waadeland, 2000).
Moreover, by choosing the ratio between "carrier frequency" and "modulating frequency", f/f´, "large
enough" various syntheses of accelerando and ritardando may be created (ibid., Section 7.2).
5. Conclusions:
A main concern of this project has been to give a new description of rhythmic performance of music where
fundamental aspects of movements are incorporated. In doing so we have focused on gestural rhythm understood
as a continuous unfolding through attack points, rather than attack-point rhythm where a finite number of discrete
registrations of rhythmic performance are studied. Some interesting achievements of our investigations are:
1. A model of live performance of rhythm, LPR, is presented where expressive timing is simulated by applying
rhythmic frequency modulation as a new technique of rhythm synthesis. The model MPR is naturally
embedded in LPR. From our model construction it follows:
■ Structure of attack-point rhythm is transformed into structure of gestural rhythm. Thereby we
suggest a shift from a discrete to a continuous representation of rhythmic performance.
■ Written representations of music are transformed into representations of live performances of
music.
■ The technique of RFM is shown to provide a very simple temporal control over movement
curves associated with quite complex rhythms. In other words: by manipulating a small number
of parameters we are able to create a large variety of different simulations of live performances
of rhythm.
2. Apart from approximating real performances, an interesting application of RFM is also to create various
unreal (or, "pathological") unfoldings of rhythm. The construction of such pathological performances is
interesting for several reasons. On the one hand, an understanding of parameters determining a pathological
performance may give valuable insight into what kind of adjustments should be made to make the
pathological performance non-pathological. Moreover, if we were able to correlate these model-constructed
adjustments to physical movements of the performing musician's body, this knowledge would indeed be
quite significant to music education. On the other hand, by appreciating the pathological performance as
valuable in its own right a new standard, or maybe put more appropriately, a new esthetics of rhythmic
performance is suggested.
3. The possibilities of creating various unreal rhythmic performances indicate that RFM synthesis may be
applied as a new compositional tool of electro-acoustic music. This is also illustrated in our simulation of
the phasing technique of Steve Reich.
A full understanding and comprehensive application of the RFM technique is still only in its beginning. Various
theoretical developments of our model can certainly be made and new interpretations and applications to a larger
class of rhythmic unfoldings than here presented may find their support in empirical investigations. Some such
ideas are presented in Waadeland (2000).
Acknowledgments:
The author is very grateful to Ola Kai Ledang, Department of Musicology, for encouraging support and critical
comments during these studies, and would also like to express his sincere gratitude to Sigurd Saue, Department of
Telecommunications, Acoustics, for fruitful cooperation in the development of the computer implementation of
rhythmic frequency modulation. Financial support for this project was provided by the Norwegian University of
Science and Technology.
Notes:
1. This paper presents some basic results from the author's doctoral dissertation: "Rhythmic Movements and Moveable
Rhythms" (Waadeland, 2000). Some of these ideas have also been presented in complementary manners in Waadeland (1999).
References:
Alén, O. (1995). Rhythm as Duration of Sounds in Tumba Francesa. Ethnomusicology, 39 (1), 55-71.
Beek, P.J., Peper, C.E. & van Wieringen, P.C.W. (1992). Frequency locking, frequency modulation,
and bifurcations in dynamic movement systems. In Stelmach & Requin (Eds.), Tutorials in Motor
Behavior II (599-622). North-Holland.
Bengtsson, I. (1974). Empirische Rhythmusforschung in Uppsala. Hamburger Jahrbuch für
Musikwissenschaft, 1, 195-219.
Bengtsson, I. & Gabrielsson, A. (1977). Rhythm research in Uppsala. In Music, room, acoustics
(19-56). Stockholm: Publications issued by the Royal Swedish Academy of Music, No.17.
Bengtsson, I. & Gabrielsson, A. (1983). Analysis and synthesis of musical rhythm. In J. Sundberg
(Ed.), Studies of music performance (27-59). Stockholm: Publications issued by the Royal Swedish
Academy of Music, No.39.
Bengtsson, I., Gabrielsson, A. & Thorsén, S.M. (1969). Empirisk rytmforskning. Swedish Journal of
Musicology, 51, 49-118.
Chowning, J.M. (1973). The Synthesis of Complex Audio Spectra by Means of Frequency
Modulation. Journal of the Audio Engineering Society, 21 (7), 526-534.
Clarke, E.F. (1999). Rhythm and Timing in Music. In D. Deutsch (Ed.), The Psychology of Music,
Second Edition (473-500). Academic Press.
Gabrielsson, A. (1999). The Performance of Music. In D. Deutsch (Ed.), The Psychology of Music,
Second Edition (501-602). Academic Press.
Keil, C. (1987). Participatory Discrepancies and the Power of Music. Cultural Anthropology 2 (3),
275-283.
Keil, C. (1995). The Theory of Participatory Discrepancies: a Progress Report. Ethnomusicology, 39
(1), 1-19.
Palmer, C. (1997). Music Performance. Annual Review of Psychology, 48. 115-138.
Prögler, J.A. (1995). Searching for Swing: Participatory Discrepancies in the Jazz Rhythm Section.
Ethnomusicology, 39 (1), 21-54.
Saue, S. (2000). Implementing Rhythmic Frequency Modulation. In C.H. Waadeland, Rhythmic
Movements and Moveable Rhythms, Appendix II (252-276). Trondheim: Department of Musicology,
Norwegian University of Science and Technology.
Seashore, H.G. (1937). An objective analysis of artistic singing, In C.E. Seashore (Ed.), University of
Iowa studies in the psychology of music: Vol.IV. Objective analysis of musical performance
(12-157). Iowa City: University of Iowa.
Viviani, P. (1990). Common Factors in the Control of Free and Constrained Movements. In M.
Jeannerod (Ed.), Attention and Performance XIII (345-373). Lawrence Erlbaum Associates, Publ.
Waadeland, C.H. (1999). Rhythmic Frequency Modulation - A New Synthesis of Rhythmic
Expression in Music. In Feichtinger & Dörfler (Eds.), DIDEROT FORUM on Mathematics and
Music. Computational and Mathematical Methods in Music (335-350). Vienna: Österreichische
Computer Gesellschaft.
Waadeland, C.H. (2000). Rhythmic Movements and Moveable Rhythms - Syntheses of Expressive
Timing by Means of Rhythmic Frequency Modulation. Dissertation. Trondheim: Department of
Musicology, Norwegian University of Science and Technology.
Waadeland, C.H. & Saue, S. (1999). Computer Implementation of Rhythmic Frequency Modulation
in Music. In J. Tro & M. Larsson (Eds.), Proceedings 99 Digital Audio Effects Workshop,
Trondheim, December 9-11, 1999 (185). Trondheim: Department of Telecommunications, Acoustic
Group, Norwegian University of Science and Technology.
Back to index
Proceedings abstract
Background:
Theories of frequency modulation in vocal vibrato must address the controllability of the rate and extent of
fundamental frequency. It is reasonable to assume that one would not develop motor control unless
the results of the control are perceptible. The literature is equivocal on the issue of perceptibility and
controllability of rate and extent of vocal vibrato among professional singers. In particular, detailed
psychoacoustic data regarding auditory discrimination of vibrato rates and extents for singers are
needed.
Aims:
This paper examines perceptual discrimination of vocal vibrato rate and extent using synthesized
signals, and singers' control of the rate and extent.
Method:
Auditory stimuli were (1) synthesized /a/ with 3 rates (3, 5, 7 Hz) and 3 extents (0.5, 1, 1.5 semitones)
and (2) synthesized /a/ with 2 rates (5 and 7 Hz) and 12 extents (2% to 7.5% in 0.5% increments),
generated by a VAX computer using the program SPEAK at the Recording and Research
Center, the Denver Center for the Performing Arts. Twelve singers attempted to match the synthesized
stimuli (1) above. Recorded voices were analyzed for accuracy of the match. Another group of ten
singers listened to randomly ordered pairs of the synthesized stimuli (2) above, and made
"same/different" decisions for each pair.
Results:
Results of the singers attempting to match given rate and extent of vibrato strongly indicated that the
rate of vibrato is under voluntary control (within 10% accuracy) for the range examined while the
extent is not (within 60%). Results for extent discrimination indicated that the listeners needed
approximately a 2.5% difference in extent in order to detect a difference between two vibrato extents. Acoustic
analyses also revealed that frequency modulation is a potential source of amplitude modulation in vocal
vibrato.
Conclusions:
In spite of some claims that vibrato extent is easier to control than rate, the present study revealed the
opposite: the rate is easier to control than the extent. In addition, acoustic analyses of the singers'
production of vibrato indicated underlying relationships of frequency modulation and amplitude
modulation during vibrato.
Back to index
Proceedings paper
1. Introduction
Numerous developmental studies have shown that at least some of children's knowledge is organized as schemata for familiar events, objects, people or places
and have focused on the role of schemata organisation in children's memory (see for a review Davidson, 1996). The present study is concerned with the ability of
musician and non-musician children (from 10 to 12 years of age) to form musical schemata during listening to a piece. As stressed by Neisser (1976, p.54), "A
schema ... is internal to the perceiver, modifiable by experience and somehow specific to what is being perceived. The schema accepts information as it becomes
available at sensory surfaces and is changed by that information." Thus, the term schemata refers to mental structures which organise information received from our
senses and are continuously altered by that information. Regarding music information derived from listening to a piece, the elaboration of a schema, as pointed out by
Deliège (1997), should be understood as a reduction of the musical piece, rather than the reconstitution of its score.
The main problem in understanding the cognitive processes underlying real-time listening is related to memory for events evolving in time. In the late eighties,
Deliège (1987 for a first sketch) suggested that listening should be considered as a schematisation process built on cues picked up from the musical surface by an
abstraction mechanism (referred to as cue extraction in Deliège 1987; 1989; Deliège & El Ahmadi 1990). The role of this mechanism is to provide landmarks of the
temporal flow of the musical piece and to generate the segmentation and categorization of the musical structures (for a more general view of the model, see Deliège a
& b, and Mélen & Wachsman, this symposium). More closely in relation with the mental processes involved in the present experiments, i.e. the elaboration of a
mental line of a piece of music, are the concepts of "cognitive maps" and of "carte mentale" put forth, respectively, by Tolman (1948) and Pailhous (1970), as cited by
Deliège (1991; 1998), which suggest that the animal or the individual builds up some kind of map to summarize a larger amount of information.
Indeed, this idea is likely to be rather close to the processes involved in the elaboration of a mental line during listening to music.
The validity of this last proposal was tested, with adult musicians and non-musicians, using pieces from the contemporary repertoire (Deliège 1989) and the cor
anglais solo from Tristan und Isolde by Wagner (Deliège 1998). However, the development of this process has not been studied in detail yet. Mélen (1999; Mélen &
3.1. Method
Participants: The children who had participated in the segmentation task were employed here.
Experimental materials and equipment: The same pieces by Schubert and Diabelli were each cut into 8 segments of different lengths at the ends of musical
phrases. The segments were transmitted via a MIDI interface and the MAX software to 8 keys of a device named ScaleGame (for details see Deliège, Delges, Oter &
Sullon, 1998), which permitted real-time listening to the musical information assigned to each key.
Procedure: All participants were tested individually. They had been informed that they would be invited to reconstruct the piece, after 4 or 6 listenings, by using 8
keys of the device, arranged in a different random order for each subject. The duration of the task was not limited. The MAX software (version 2.5) collected the data.
Results and comments:
a) Schubert: 3 children out of the total of 41 correctly rebuilt the piece (7.56%). They were all members of the musicians group (1 from the M1 group and 2 from the M3
group). Within the 38 wrong reconstructions an additional analysis was performed to observe whether the children had been sensitive to the deep structure of the piece, i.e. the
alternation of variations of motifs A and B. These results are interesting but not significant. 10 of the remaining 38 children (28.9%: 5 M and 5 NM) were sensitive
to the deep structure of the piece. However, as in similar research with adults (Deliège 1998), an effect of primacy and recency was found. 14 children (36.8%: 9 M
and 5 NM) correctly chose the first key for the beginning and 18 children (47.3%; 9 M and 9 NM) chose the last key to finish the rebuilding of the piece. Table 1
shows the distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants. Table 2, on the other hand, refers to
the possible locations chosen by the participants for each key.
Table 1
Distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants in the Schubert piece.
M1 M3 NM1 NM3
Segm  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.
file:///g|/Sun/Koniari.htm (5 of 10) [18/07/2000 00:31:43]
MUSICAL SCHEMATA IN CHILDREN FROM 10 TO 12 YEARS OF AGE:
Table 2
Localisation attributed by participants to the eight segments
In relation with the eight possible locations (bold characters), the numbers in the columns indicate which segments have been localised in that place, and the number
of participants (in parentheses) who have attributed this location to this segment.
Segm 1 2 3 4 5 6 7 8
1(17) 1(1) 1(2) 1(5) 1(3) 1(6) 1(7) 1(1)
b) Diabelli: Results are slightly better for both groups, and the dominance of the musician group is evident. 3 musicians (M3) and 1 non-musician (NM1) out of 41
children correctly reconstructed the piece (9.75%). Within the 37 wrong reconstructions, 8 respected the deep structure (21.6%; 7 M and 1 NM). As for the primacy
and recency effect, 23 children (62.1%; 10 M and 13 NM) correctly placed the first key and 33 children (89.1%; 16 M and 17 NM) correctly ended with the last key.
Table 3 shows the distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants. Table 4 shows the possible
locations chosen by participants for each key.
Table 3
Distribution parameters -primary mode, mean, and mean distance- for the key positions chosen by all the participants in the Diabelli piece.
M1 M3 NM1 NM3
Segm  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.  Mode  Mean  M.Dist.
1 5 3 2 1 2 1 1 3.2 2.3 1 1.5 0.5
2 2 3.8 2 2 3 1.2 5 4 2.2 5 4.7 2.5
3 3 3.3 1.5 3 3.4 1 3 3.7 1.4 3 4.1 0.9
4 4 4.5 1 4 4.3 0.7 2 3.3 1.5 4 4 1
5* 6 4.1 1.5 5 4.1 1.1 5 4.6 1.3 5 4.5 0.9
6 6 5.2 1.4 6 4.8 1.6 7 4.9 1.8 6 4.7 1.7
7* 3 3.5 3 7 6.6 0.4 4 3.7 3.3 7 4.5 2.1
8 8 7.6 1 8 7.6 0.4 8 7.6 0.4 8 7.6 0.4
Table 4
Localisation attributed by participants to the eight segments
In relation with the eight possible locations (bold characters), the numbers in the columns indicate which segments have been localised in that place, and the number
of participants (in parentheses) who have attributed this location to this segment.
Segm 1 2 3 4 5 6 7 8
1(23) 1(2) 1(5) 1(1) 1(3) 1(3) 1(4) -
* * * * * 7(10)* 7(18)* -
* Segments 5 and 7 are identical. In order to calculate the distribution parameters, n° 7 is considered as n° 5 when it is located in positions 1-5, and n° 5 as n° 7 when it is located in positions 6-8.
c) General remarks
The absolute values of the difference between the right key location and the location chosen by each child were calculated and analysed with a 2(training) x
2(familiarisation) x 2(composer) x 8(segment's position) ANOVA. Results revealed a main effect of the segment's position (F(1,37)=6.258, p=0.0001), corroborating
the primacy and recency effect already mentioned above, and of composer (F(1,37)=22.670, p=0.0001) indicating better results for the Diabelli piece. The difference
between musicians and non-musicians was not significant (F(1,37)=1.696, p=0.2009). Perhaps musician children's training, in terms of years and hours of tuition, was
not yet sufficient to produce significantly different results. Familiarisation was not significant either (F(1,37)=1.540, p=0.2224). However, the performance of the M3
group was better than the non-musicians', as shown especially in Tables 1 and 3, where it can be observed that for both pieces this group of participants produced a
completely correct mode for the rebuilding of the piece.
4. General Discussion
The aim of the present study was to investigate the role of early music practice in the schematisation process of children during real-time listening. The musician children who
participated in our experiments were at the first steps of their music practice, with a mean of 2.5 years of music training. The data showed that their performances differed
only slightly from those of the non-musician children. In similar studies with adult musicians and non-musicians, a greater differentiation between the performance of the
two groups was observed (Deliège 1989; 1998). However, it is worth noting that the data analysis revealed the same pattern of results as that observed with adults in all
previous experiments, for both the segmentation and the reconstruction task (Deliège 1989; 1998). Music practice, even in its early stages, seems to influence the
processes involved in real-time listening. Additionally, the familiarisation factor had a more prominent role in musician children's memory, revealing better skills in
grasping the musical features.
Analytically, data from the segmentation task showed no effect of the music training factor in the process of grouping formation, i.e. the segmentation, during listening.
Similar groupings were observed in both groups of musician and non-musician children and they were in accordance with the main articulations of the piece. However,
differences in the coherence of the performance between the first and the second segmentation by musicians and non-musicians provide evidence that the processes
engaged during a re-representation of a musical piece might be influenced by musical experience. Musician children, even in the first stages of their music practice, are
more stable in their groupings. This provides evidence that, although the segmentation process has been suggested to be a rather automatic psychological behaviour
(Deliège 1998), music practice has an effect on the stability of its results in the representation of knowledge.
Performance in the reconstruction task was slightly better for musicians. In addition, they seemed more sensitive to the deep structure of the piece than non-musicians. An
effect of familiarisation was also found, mainly in the piece by Diabelli. Children in the NM3 and M3 groups performed better than children from the NM1 and M1
groups. In the Schubert piece, the comparison between the results of the two familiarisation groups showed a slightly better performance for the groups which received three
previous listenings. Children's ability to remember musical schemata formed during listening seems highly related to familiarisation, music practice and the particularities of
the attended piece. As in general memory research, an effect of primacy and recency also appears in memory for musical structures. Children obtained better results in the
reconstruction task for the beginning and the end of the attended pieces.
Participants' performance also differed according to the musical piece that had been listened to. Results for the Diabelli piece were clearly different from those collected
for the Schubert piece. This differentiation might highlight the fact that, even in musical pieces of the same style and period, particularities of the surface have a
different impact on listeners' processing. Cue abstraction, the schematisation process and memory for the resulting schema of a musical piece are influenced by its characteristics,
for instance the flow of the temporal rate.
In general, processes exhibited by both categories of children listeners, musicians and non-musicians, in the formation of musical schemata are not different. They are
simply used more efficiently (and perhaps result in more explicit representations, from a cognitive point of view) by children with music training than by children without
music training. Similar observations have been reported in previous studies with adults (see Deliège & Mélen, 1997). These remarks, together with the fact that
experiments with infants (Mélen, 1999; Mélen & Wachsman, this symposium) have shown evidence of the presence of the cue abstraction mechanism already in the initial
state of human cognition, suggest that this mechanism might be a predisposition of the human mind/brain, which during development and experience becomes modularized, in the
sense of Karmiloff-Smith's theory (1992).
Karmiloff-Smith developed the hypothesis that the human mind/brain starts out with cognitive predispositions that already exist at an early age. As development proceeds,
these predispositions are modulated by external influences and result in specific brain circuits that are activated in response to domain-specific inputs and, in certain cases,
in the formation of relatively encapsulated modules. This modularization process of the human mind/brain sustains the structure of its behaviour and is responsible for its particular
way of acquiring knowledge. During modularization, on the one hand, representations of the information already stored (both innate and acquired) are continuously altered
via a process of redescription or, more precisely, an iterative re-representation of knowledge in different representational formats; on the other hand, implicit
information from these procedural representations is rendered, via a process of "explicitation", into a more explicit form.
Thus, in the light of Karmiloff-Smith's theory, the cue abstraction mechanism might be an innate predisposition of the human mind which during development is influenced
and modulated by environmental constraints such as experience, training or culture. However, additional research is needed to validate this hypothesis and to provide
more evidence for the role of the cue abstraction mechanism in the schematisation process of children of other ages and different levels of musical experience.
Bibliography
Davidson, D. (1996). The role of schemata in children's memory. Advances in Child Development and Behavior, 26, 35-58.
Deliège, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Perception, 4(4),
325-360.
Deliège, I. (1989). A perceptual approach to contemporary musical forms. In S. McAdams & I. Deliège (Eds.), Music and Cognitive Sciences. Contemporary
Music Review, 4, 213-230. French translation: Approche perceptive de formes musicales contemporaines. In S. McAdams & I. Deliège (Eds.), La Musique et les Sciences
Cognitives. Bruxelles: Pierre Mardaga, pp. 305-326.
Deliège, I. (1991). L'organisation psychologique de l'écoute de la musique. Des marques de sédimentation - indices et empreinte - dans la représentation
mentale de l'œuvre. Doctoral dissertation, Université de Liège, unpublished.
Deliège, I. (1997). Similarity in processes of categorisation : Imprint formation as a prototype effect in music listening. A preliminary experiment. In M.
Ramscar, U. Hahn, E. Cambouropoulos & H. Pain (eds), Proceedings of SimCat 1997: An Interdisciplinary Workshop on Similarity and Categorisation.
November 28-30, Edinburgh University, Edinburgh, pp. 59-65.
Deliège, I. (1998). Wagner "Alter Weise": Une approche perceptive. Musicae Scientiae, Special Issue, 63-90.
Deliège, I. (this symposium, a). Prototype effect in music listening. An empirical approach on the notion of imprint.
Deliège, I. (this symposium, b). Perception of Similarity and Related Theories.
Deliège, I., Delges, P., Oter, J-C. & Sullon J-M. (1998). Annexe: Le ScaleGame, un outil MIDI multi-fonctionnel. Musicae Scientiae, Special Issue, 117-121.
Deliège, I. & Dupont, M. (1994). Extraction d'indices et catégorisation dans l'écoute de la musique chez l'enfant. In I. Deliège (Ed). Proceedings of the Third
International Conference on Music Perception and Cognition/Actes de la Troisième Conférence Internationale sur les Sciences Cognitives de la Musique.
Brussels: ESCOM Publications, pp. 287-288.
Deliège, I. & Mélen, M. (1997). Cue abstraction on the representation of musical form. In I. Deliège & J. Sloboda (eds). Perception and Cognition of Music.
East Sussex: Psychology Press, pp. 387-412.
Deliège, I., Mélen, M. & Bertrand D. (1997). Development of Grouping Process in Listening to Music : An Integrative View. Polish Quarterly of
Developmental Psychology, 3(1), 21-42.
Dowling, J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The Psychology of Music (2nd ed.). San Diego: Academic Press, pp.
603-625.
Karmiloff-Smith, A. (1992). Beyond Modularity: A Developmental Perspective on Cognitive Science. Cambridge, MA: MIT Press.
Mélen, M. (1999). Les principes organisateurs du groupement rythmique chez le nourrisson. Musicae Scientiae, III(2), 161-191.
Mélen, M. & Deliège, I. (1995). Extraction of Cues or Underlying Harmonic Structure : Which Guides Recognition of Familiar Melodies? European Journal of
Cognitive Psychology, 7(1), 81-106.
Mélen, M. & Wachsman, J. (this symposium). Categorisation of musical structures in 6- to 10- month-old infants.
Neisser, U. (1976). Cognition and Reality. New York: WH Freeman & Co.
Back to index
Proceedings paper
Figure 1: The standardized key profile, taken from Krumhansl and Kessler (1982).
Given that the perceived stability of a pitch depends on the tonal context in which it occurs, it is important for musical processing that listeners apprehend the tonal structure of a
musical passage. One approach to how listeners establish a sense of key (but see also Butler, 1989) stems from the recognition that, within a tonal context, those pitches that are
perceived as most psychologically stable are also those pitches that are played most frequently, and for the greatest total durations; similarly, psychologically unstable pitches
occur infrequently, and for the shortest durations. This observation has led to models of tonality perception, such as the Krumhansl-Schmuckler key-finding algorithm (described
in Krumhansl, 1990), that propose that listeners are sensitive to distributional information in music, and identify the key of a musical context based on the degree to which it
matches with such acquired representations of tonal structures. In this vein, a number of studies (Coady, 1994; Laden, 1994; Oram & Cuddy, 1995) have demonstrated listeners'
sensitivity to distributional information.
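The distributional matching idea behind such models can be sketched in a few lines: correlate the pitch-class duration profile of a passage with the standardized major profile transposed to each of the 12 possible tonics, and pick the best match. This is only a minimal sketch, not the full Krumhansl-Schmuckler algorithm (which also uses minor-key profiles); the function name and the toy input are our own.

```python
import numpy as np

# Krumhansl & Kessler (1982) major-key probe-tone profile, C major,
# indexed by pitch class 0..11 (C, C#, D, ...).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def estimate_key(durations):
    """Correlate a 12-element pitch-class duration profile with the major
    profile transposed to all 12 tonics; return the best-matching tonic."""
    scores = []
    for tonic in range(12):
        profile = np.roll(MAJOR_PROFILE, tonic)   # transpose to this tonic
        r = np.corrcoef(durations, profile)[0, 1]
        scores.append(r)
    return int(np.argmax(scores)), max(scores)

# A toy C-major-like passage: long total durations on C, E, G.
durs = np.array([4.0, 0.2, 1.0, 0.2, 3.0, 1.0, 0.2, 3.5, 0.2, 1.0, 0.2, 0.5])
tonic, r = estimate_key(durs)
```

Because the profile correlation is invariant to the overall scale of the durations, the sketch illustrates why relative prominence, captured through duration and frequency of occurrence, is what such models exploit.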
The Processes of Tonality Perception
What psychological properties underlie listeners' sensitivity to distributional information? One possibility is that sensitivity to this type of structure reflects two complementary
processes: differentiation and organization. By differentiation we mean the distinguishing of pitches from one another in terms of some relevant dimension, such as their total
duration or frequency of occurrence. In contrast, organization refers to a sensitivity to relations between differentiated pitches and the form in which the differentiated pitches are
represented. In a series of experiments we examined these two processes by manipulating the distributional properties of random orderings of the chromatic scale.
Figure 2: Duration profiles for hierarchical, nonhierarchical, and binary stimuli, at tonal magnitudes of 0.5 and 3.5
Second, as the tonal magnitude approaches zero each value progresses toward the mean of all values, resulting in a profile that becomes flat, with the pitches less differentiated.
In contrast, as the tonal magnitude increases the pitches become increasingly differentiated, although at very high tonal magnitudes the value for the tonic pitch so far exceeds those of the remaining pitches that the tonic dominates the profile, effectively becoming the only differentiated pitch.
Organization
Organization was examined by either preserving or destroying the hierarchical structure present in the distributional information. In the hierarchical condition, the
duration/frequency-of-occurrence values mirrored those of the standardized key profile, such that the longest (0th scale degree) and second longest (7th scale degree) pitches were
a perfect fifth apart, and so forth. In contrast, in the nonhierarchical condition the duration/frequency-of-occurrence values were randomly assigned to pitches, thereby destroying
the typical hierarchical relations between the pitches, while preserving the degree of differentiation among the pitches. If the perception of tonality reflects simple memory for
longer or more frequently occurring pitches, then ratings of tonal stability should be similar in the hierarchical and nonhierarchical conditions. If, however, tonality perception
requires hierarchical organization of distributional information, then ratings of the tonal stability of pitches will correspond to the duration/frequency of occurrence of pitches
in the hierarchical condition alone.
A third condition examined the possibility that listeners could extrapolate a hierarchical structure of tonality onto a set of pitches based solely on differentiating the tonic from all
of the remaining pitches. In this binary condition the value for the tonic was identical to the corresponding tonic value in the hierarchical and nonhierarchical
conditions; in contrast, the remaining nontonic pitches all had the same value. If listeners can indeed extrapolate tonal structure by differentiating only the tonic, then ratings of
tonal stability should demonstrate some (presumably hierarchical) differentiation among the nontonic pitches, despite these pitches being undifferentiated in their durational
properties. Sample hierarchical, nonhierarchical and binary duration profiles are shown in Figure 2.
Experiment 1
Experiment 1 examined whether listeners' perceptions of tonality were influenced by (a) the degree of differentiation (i.e., tonal magnitude) of the duration profiles, and (b) the
presence or absence of hierarchical structure in these profiles.
Method
Participants.
Forty students at the University of Toronto at Scarborough, all meeting a 3-year minimum musical training requirement, participated in this experiment in exchange for payment
or course credit. These listeners were assigned to one of two conditions: hierarchical and nonhierarchical. The mean years of musical training for listeners in each condition were
9.9 and 8.2 respectively; this difference in training was not statistically significant, t(19) = 1.38, p > .05.
Materials.
The stimuli consisted of a series of algorithmically composed melodies that were all 10 seconds in length and contained 24 notes, with each pitch of the chromatic scale occurring
twice. The duration of each pitch was determined in the following way. For the hierarchical condition each value in the standard major key profile (Krumhansl & Kessler, 1982)
was raised to one of ten tonal magnitude exponents (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5). These transformed values were then expressed as a percentage of the sum for all
12 values, multiplied by 10000 (the duration of the melody in milliseconds) and divided by two (the number of times each pitch occurs in the melody). Melodies were then created
by randomly ordering the 24 notes, with the onset of a note immediately following the offset of the previous note. For the nonhierarchical condition, a randomized version of the
standardized key profile was created by randomly assigning duration values to the different pitches; all other aspects of stimulus generation were the same as in the hierarchical
condition. Thus, hierarchical and nonhierarchical melodies contained the same number of long and short pitches, but differed in how these pitches were organized.
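The duration computation just described can be sketched as follows. The function name and random seed are our own; the arithmetic follows the text: raise the standard profile to the tonal magnitude exponent, normalize, and divide the 10 000 ms total across two occurrences of each pitch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Krumhansl & Kessler (1982) major-key profile (tonic = index 0).
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def pitch_durations(magnitude, hierarchical=True):
    """Duration (ms) of each chromatic pitch in one 24-note, 10-second
    melody in which every pitch occurs exactly twice."""
    values = KK_MAJOR ** magnitude           # apply the tonal magnitude exponent
    if not hierarchical:
        values = rng.permutation(values)     # destroy the hierarchical structure
    share = values / values.sum()            # proportion of the total duration
    return share * 10000 / 2                 # 10 000 ms total, each pitch twice
```

Note that a magnitude of 0.0 raises every profile value to the zeroth power, producing the flat, undifferentiated profile described earlier, while the permutation step preserves the degree of differentiation but not its organization.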
Figure 3: Correlations between hierarchical and nonhierarchical probe-tone ratings and the standardized key profile, as a function of tonal magnitude.
In summary, Experiment 1 found that perceptions of tonality were affected by both the differentiation and organization of pitches. In terms of differentiation, the fact that a
minimum level of tonal magnitude was required demonstrates that absolute, rather than relative, differences in pitch durations are important for perceiving tonality. In terms of
organization, the finding that listeners only perceive tonality when the differentiated pitches are organized in a hierarchical fashion shows that differentiation alone is not
sufficient, but that the relations between differentiated pitches are important.
Because listeners perceived tonality in melodies in which the pitches were differentiated by their total duration, but completely undifferentiated in terms of their frequency of
occurrence (each pitch occurred twice in each melody), the present findings demonstrate that duration differences can be a crucial distributional property of music for tonality
perception. It is an open question, however, whether or not frequency-of-occurrence information can also play as significant a role in perceiving tonality; this issue is addressed in
Experiment 2.
Experiment 2
Finally, the degree of organization was examined by correlating mean probe-tone profiles for the various conditions with the standardized key profile of Krumhansl and Kessler
(1982); the results of these correlations are shown in Figure 4. For all conditions the strength of the correlation increased as a function of tonal magnitude. However, the
correlation only reached statistical significance in the hierarchical uncontrolled duration condition, suggesting that the perception of tonal structure is especially dependent on
hierarchically organized absolute differences in duration.
Figure 4: Correlations between hierarchical and nonhierarchical probe-tone ratings and the standardized key profile, as a function of tonal magnitude and duration type.
In summary, Experiment 2 confirmed the finding that listeners' perceptions of tonality were affected by both the degree to which pitches were differentiated and the way in which
the differentiated pitches were organized. Adding to these findings is the result that perceptions of tonal stability were primarily dependent upon increasing duration information,
and were not driven by frequency of occurrence per se. When duration information was held constant (i.e., the controlled duration condition), listeners showed some sensitivity to
pitch differentiation through the general increase in correlation with the standardized key profile, although not enough sensitivity to recover the typical sense of tonal
organization.
Experiment 3
Although the previous studies suggest that it is the hierarchical organization of the complete set of differentiated pitches that leads to a percept of tonality, it might be that less
severe forms of hierarchical structure can similarly drive tonal perception. Is it, for example, necessary that there be multiple levels of differentiation and organization among
pitches, or can the percept of tonality be instantiated with a minimum amount of such organization? This question can be examined by creating a new type of melody whose
duration profile has only two possible levels of differentiation. In these binary profiles the duration value for the tonic is preserved, with the durations of the remaining pitches set
to a different level; thus, the only differentiation is between tonic and nontonic pitches. With these binary profiles, the tonal magnitude manipulation thus varies the ratio between
Proceedings paper
Introduction
Periodicity (the recurrence of events at regular time intervals) is generally viewed as a fundamental aspect of rhythm. There is also general
agreement that in music, variously strong periodicities of different sizes can be superimposed. Examples of this are the metric structures of
the various time signatures; a three-four meter, for example, can be seen as a superposition of a shorter periodicity (from beat to beat) with a
three-times-longer one (from downbeat to downbeat).
A series of music psychology studies have dealt with the modelling of listeners' perception of periodicity (e.g. Povel & Essens 1985,
Rosenthal 1992, Miller, Scarborough & Jones 1992, Brown 1993, Parncutt 1994, Large 1994, Todd & Brown 1996, Toiviainen 1997 and
Todd, Lee & O'Boyle 1999). The models available mostly display one or more limitations in their applicability or efficiency. For example,
they may not allow the treatment of the finer fluctuations of tempi present in actual performances (this is the case with Povel & Essens 1985,
Miller, Scarborough & Jones 1992 and Parncutt 1994), or they may react too slowly to such fluctuations (as is the case with Large 1994 and
Toiviainen 1997). A thorough discussion of these properties of the models can be found in Langner (1999, pp. 11-19). More remarkable still
are the differences between the rhythm-theoretical backgrounds of the studies named. Most of them concentrate on the metrical structure of a
piece of music. Underlying this is the notion that there is one correct solution (often the time signature given by the composer), and the
model is deemed successful only when it comes to the same conclusion. However, the studies by Povel & Essens (1985) and Parncutt
(1994) go one step further. The authors argue, or at least suggest, that actual musical meaning possibly depends on the simultaneous
appearance of very different periodicities, which are in fact incompatible with one particular time signature. This idea is supported by the
studies of Yeston (1976), whose analyses of musical works consistently demonstrate the often very rich and highly complex network of
diverse periodicities, both simultaneous and successive: the analyses clearly view this as an important characteristic of artistic quality.
Seen from this perspective, it becomes obvious that models which can deal with more than the perception of meter are essential. If the aim is a
musically relevant analysis of rhythmic structures, it is necessary to detect the complex network of all the periodicities perceived by listeners.
The central question of this study is thus:
What kinds of musically relevant periodicities are present in the rhythmic structure of a piece of music at any particular moment?
Method
In recent music-psychological research into rhythm, oscillation models play an important role. The aforementioned studies by Miller,
Scarborough & Jones (1992), Large (1994) and Toiviainen (1997) for example are based on such models. In these cases, it is presumed that
so-called oscillators can be activated by the periodicity of certain periodically occurring events present in the music; these oscillators
consequently function as periodicity detectors. Oscillators can be regarded either as abstract, mathematically describable objects, or as
concrete populations of nerve cells (such neural oscillators are known from neuroscience; see e.g. Dudel, Menzel & Schmidt,
1996, pp. 367 and 519-537). In both cases, however, the activation of oscillators is simulated by computer, i.e. abstractly.
An oscillation model of this type is also applied in the present study. Its basis is a set of 4080 oscillators, each with a fixed frequency and
phase. The frequency spectrum comprises 85 frequencies, which stretch logarithmically over the range from MM = 7.5 to MM = 960; for each
frequency, the phase spectrum is formed from 48 phases, which cover the region from 0° to 360° at a constant interval of 7.5°, so that any kind of
temporally shifted periodicities can also be detected. (The total number of oscillators results from 85 x 48 = 4080.)
Each oscillator contains what is known as an activation window, which opens and closes periodically, in exact correspondence with the
frequency and phase concerned. The oscillator is only sensitive to input when the window is open: in other words, it can only be activated at
these times. If a musical event occurs while the window is open, it activates the oscillator. In this manner, the dynamics of the music also
come into play, since louder events produce stronger activation. The calculation of activation proceeds step by step through the course of a
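The oscillator bank and activation-window mechanism just described might be sketched as follows. The window width (10% of a period) and the variable names are our assumptions; the actual model is specified in Langner (1999).

```python
import numpy as np

# 85 frequencies spaced logarithmically from MM = 7.5 to MM = 960,
# and 48 phases at 7.5-degree steps: 85 x 48 = 4080 oscillators.
freqs_bpm = np.logspace(np.log10(7.5), np.log10(960), 85)
phases = np.arange(48) * 7.5 / 360.0          # phase as a fraction of a period

def window_open(t, freq_bpm, phase, width=0.1):
    """True if this oscillator's activation window is open at time t (s).
    The window spans `width` of each period, centred on the phase offset."""
    period = 60.0 / freq_bpm
    pos = (t / period - phase) % 1.0          # position within the current cycle
    return pos < width or pos > 1.0 - width

def activate(onsets, loudness):
    """Each event adds its loudness to every oscillator whose window is open
    at the event's onset time, so louder events produce stronger activation."""
    act = np.zeros((len(freqs_bpm), len(phases)))
    for t, amp in zip(onsets, loudness):
        for i, f in enumerate(freqs_bpm):
            for j, p in enumerate(phases):
                if window_open(t, f, p):
                    act[i, j] += amp
    return act
```

Driving this bank with a steady pulse activates most strongly the oscillators whose frequency and phase match the pulse, which is exactly the periodicity-detector behaviour the model relies on.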
Results
Fig. 1 presents the oscillogram for a conga performance of the "Bonanza" rhythm by a percussion student (Example No. 15 in Langner 1999).
The darkest colours appear at those frequencies which correspond to quarter-, eighth- and sixteenth-notes. Thus the strongest activations are
found among the note values present in the piece. (NB: the dark quarter-note band lies somewhat beneath MM = 120; this corresponds to the
recommended tempo of MM = 108.) If we regard the oscillators as periodicity detectors, then this is a plausible result: the very noticeable
darkening on the quarter-note level is particularly revealing, as the rhythm essentially arises from units with the length of a quarter-note
(eighth-note + sixteenth-note + sixteenth-note = quarter-note). Please note when reading the oscillogram that all oscillations emerge with a
certain delay compared with the events which stimulate them. For example, no dark shading at all appears before the second tone has
been played; before this point, the system cannot "know" which frequency belongs to the first two tones.
Fig. 1: Oscillogram for a conga performance of the "Bonanza" rhythm. The activation of the oscillations is indicated by shading at the
corresponding frequency (the more intense the oscillation, the darker the shading). The note values belonging to a particular frequency are
named at the right edge of the graph. The strongest activation can be found at frequencies corresponding to the note values of units present in
the rhythm. The oscillogram reveals the fine tempo deviations of the performer, especially at the level of the sixteenth notes.
On the level of the sixteenth-note, numerous tempo fluctuations made by the player can be observed: there is no consistent, horizontal
oscillation band. However, these fluctuations occur merely on the level of microtiming: even the quarter note band is consistent and almost
completely level.
Some of the shadings may be surprising, such as the light grey band representing the dotted eighth-notes. At first glance, there seems to be no
periodicity at this frequency. A careful inspection of the score reveals however that there are many tones with such a distance between their
onsets, for example the first, third and fifth as well as the fourth, sixth and eighth note of the rhythm. (A dotted eighth-note has the same value
as three sixteenth-notes.) Thus we can state that the procedure is sensitive to such hidden periodicities. It remains to be seen whether this
feature is relevant from a psychomusical point of view.
Discussion I
The example in Fig. 1, along with many more rhythmic analyses (see Langner 1999), demonstrates that the procedure enables
reliable periodicity detection - reliable in the sense that it can indicate all the periodicities clearly present in the rhythm under analysis. In
particular, oscillograms allow the visualisation of many details of performance, such as the fine tempo fluctuations of the player, and also
whether these fluctuations only occur on the microlevel or impact on larger units. It is exactly this possibility - of presenting the
tempo-figuration of a piece of music on more than one level - which seems a particularly beneficial application of the procedure (more
examples can be found in Langner, Kopiez & Feiten 1998 and Langner 1999).
It remains to be seen if all of the many details presented in the oscillograms are actually relevant for musical perception. In particular it must
be ascertained (1) whether the intensity levels indicated by darkening are correct in this form, and (2) whether the aforementioned hidden
periodicities actually play a musical role.
Concerning (2), we can refer to the studies by Yeston (1976), Povel & Essens (1985) and Parncutt (1994) mentioned in the introduction,
which hint at the musical significance of the simultaneous occurrence of a great variety of different periodicities. With regard to (1), it has to be
noted that the oscillation intensities increase when their periodicities occur several times in succession (at present up to a limit of 5
repetitions; this limit is adjustable in the model). This behaviour of the model appears reasonable: it is plausible that the perception of a
periodicity becomes more intense as the repetition rate increases. Despite these positive indications, the question of verifying the model
remains. In the course of applying this method to various examples, a possible means of doing this was discovered quite unexpectedly. A
definite trend became apparent when various performances of the same rhythm were compared: those rhythmic performances which were
musically convincing demonstrated on the whole higher oscillation intensities and a more varied oscillation pattern than the badly-played
versions. This observation led to the idea that two values could be derived from each oscillogram: the overall intensity of oscillation and the
overall intensity of change (the latter being a measure of the amount of variation within an oscillogram); from these two values, an explanation
of the musical quality of a performance could be attempted. An examination of the model would thus be possible in an indirect way: a series
of performances could be evaluated by listeners, and the usefulness of the model for explaining these evaluations could then be
determined by regression analysis.
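Treating an oscillogram as a matrix of activation intensities, the two global values could be computed along the following lines. The precise definitions here (global mean, and mean absolute frame-to-frame difference) are our assumptions; Langner (1999) gives the actual measures.

```python
import numpy as np

def oscillogram_measures(osc):
    """osc: 2-D array (oscillators x time steps) of activation intensities.
    Returns (overall intensity of oscillation, overall intensity of change)."""
    intensity = osc.mean()                          # global level of activation
    change = np.abs(np.diff(osc, axis=1)).mean()    # mean step-to-step variation
    return intensity, change
```

A completely static oscillogram thus yields zero intensity of change, whichever definition of "change" is chosen, while richly varied performances score high on both values.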
Experiment
Within the confines of this paper, only an overview of the method and its most important results can be offered. An in-depth description can
be found in Langner (1999), which also includes a CD with the sound examples used.
The stimuli of the experiment were 62 different conga performances of 10 different rhythms. These were relatively simple rhythms, such as
the "Bonanza" from Fig. 1 or the famous "Bolero" drum rhythm. Most of these performances were played by students of the Musikhochschule
Hannover, but deadpan and average versions were also included. The various versions of a rhythm differed only in their dynamic
shaping and their timing; the average tempo, as well as the timbral and articulatory parameters, was uniform across all versions. To
achieve this, the performances had been transferred to a drum computer. 47 of the performances were evaluated by 24 expert subjects
(people who study or have studied music, ages ranging from 16 to 53 years), and 40 of the performances were evaluated by 127 school pupils
(from a comprehensive school; around half had musical training; age range from 15 to 19 years). The subjects heard the different performances of a
rhythm several times and were then asked to evaluate how well (in the musical sense) the rhythm concerned had been played. A scale of one
to six was used for this purpose. In addition, the subjects were requested to give verbal commentaries on the versions and to explain the
reasons for their evaluations, where possible.
The average ratings were first examined by means of analysis of variance. The factor "version" proved statistically
significant at the p < 0.01 level in all of the rhythms analysed; in the majority of cases, the subsequent multiple comparisons of means also yielded
significant differences (p < 0.01). In a second stage, a multiple quadratic regression analysis was calculated. The dependent variable
was the average rating; the predictor variables were the above-mentioned values for the overall intensity of oscillation and the overall
intensity of change. The regression analysis was carried out separately for the 47 versions evaluated by the experts and the 40 evaluated by the
school pupils. The r²-value, which signifies how well the regression fits the data, is 0.661 for the experts and 0.727 for the pupil subjects. This
means that around 66% and 73% respectively of the variance can be explained by the two predictors. (The effect of the overall intensity of
oscillation was statistically significant for both groups of subjects; the overall intensity of change was significant only for the school pupils.)
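A multiple quadratic regression of this kind, with the two oscillogram values as predictors, can be sketched as follows; the exact model terms used in the study (e.g. whether an interaction term was included) are not stated here, so this is one plausible form.

```python
import numpy as np

def quadratic_r2(intensity, change, rating):
    """Fit rating ~ b0 + b1*I + b2*C + b3*I^2 + b4*C^2 by least squares
    and return R^2, the share of rating variance explained."""
    X = np.column_stack([np.ones_like(intensity), intensity, change,
                         intensity ** 2, change ** 2])
    beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
    resid = rating - X @ beta
    return 1.0 - (resid ** 2).sum() / ((rating - rating.mean()) ** 2).sum()
```

With real rating data, the returned value corresponds directly to the r² figures reported above (0.661 and 0.727).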
Discussion II
The ability of this model to explain 66% and 73% respectively of the variance in the evaluation experiments can be regarded as a very
strong result, given the commonly accepted fact that listeners, including musical experts, are generally not in a position to justify their
evaluations in any objective way. This inability also manifested itself in the course of the experiment, when the verbal explanations were
evaluated (cf. Langner 1999, pp. 103-104). The oscillation model thus allows a large measure of objectification in an area which was
heretofore hardly possible to objectify. However, the model does not imply that there is only one good performance: on the contrary, there
may be very different oscillograms for any given rhythm, all resulting in equally high values for both the intensity of oscillation and the
intensity of change.
The musical relevance of the detected periodicities and therefore of the model as a whole can thus be seen to be confirmed. It is worth bearing
in mind that the question of whether or not a rhythm is well played is of great import: playing rhythms well is the daily goal of thousands of
musicians all around the world.
In addition, one specific aspect of the model has been confirmed. The hidden periodicities detected by the model, which were
mentioned above, perform an important function in the explanation of the evaluations: if they are not included, the explained variance drops
drastically. From this we can conclude that these periodicities are indeed relevant for perception, in agreement with the ideas of Yeston (1976)
and Parncutt (1994) cited above.
In other calculations not discussed here, the positive influence of further mechanisms of the model can be demonstrated, particularly the
intensification of contrasts. When these mechanisms were removed, the explained variance always deteriorated drastically (cf. Langner
1999, pp. 116-117).
The relationship between the oscillations and aesthetic evaluation had already been demonstrated in earlier experiments made with piano
performances (Langner & Kopiez 1995, Langner & Kopiez 1996). The question of whether the explanatory value of the model is merely
culture-specific (the culture in question being that of the West), or if it can be applied more broadly, cannot be answered definitively at the
present time. Initial experiments in comparative evaluation, with the same performances of the same simple rhythms and African subjects,
have been carried out (Kopiez, Langner & Steinhagen 1999). The first regression analyses with the results gathered so far indicate that the
model must be adjusted to explain the African evaluations. It appears that the same model can be applied, but with different values for some
of the parameters. From this, it would be possible to conclude that the same basic perception mechanisms are in effect in the case of African
listeners, but that these work in a slightly different way due to different cultural conditioning.
A comparison of the present study with the other studies referred to above reveals that the most important difference is the goal set for the
research. Whereas the other studies are geared towards the explanation of phenomena of perception, the present study aims ultimately for the
explanation of musical effects. Its focus is thus the treatment of another level of hearing.
The procedure demonstrates none of the limitations to application mentioned in the introduction. It deals with real performances, and reacts as
quickly as possible to deviations in tempo. Moreover, it is online-compatible: in other words, it calculates the oscillations parallel to the
continuing progress of the music. These characteristics would appear to make it suitable for providing the musician (for example, a
percussionist) with instant feedback regarding the performance. For this reason, the development of a real-time version is planned - a PC
software package which would allow the generation of an on-screen oscillogram while the musician plays. The fine deviations of tempo for
example could then be viewed, while the display of additional values for the overall intensity of oscillation and the overall intensity of change
give further information on the musical qualities. Such a software package could be utilised by musicians as an aid to self-regulation, and
would of course also be applicable in instrumental tuition.
An example of such a real-time analysis program, which has already been realised, will be presented in the course of the conference: cf. in
this regard the paper "Real-Time Analysis of Dynamic Shaping" presented by Langner, Kopiez, Stoffel & Wilz.
References
Brown, J.C. (1993). Determination of the meter of musical scores by autocorrelation. Journal of the Acoustical Society of America, 94 (4),
1953-1957.
Dudel, J., Menzel, R., & Schmidt, R.F. (Eds.). (1996). Neurowissenschaft. Berlin: Springer.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music. New York: Academic Press.
Goldstein, E.B. (1999). Sensation & Perception. Pacific Grove: Brooks/Cole.
Kopiez, R. & Langner, J. (1998). The irresistible force of rhythm: Evidence for multiple oscillation maxima in the "spontaneous" generation
of tempo and reactions to trigger-impulses. In Suk Won Yi (Ed.), Proceedings of the 5th International Conference on Music Perception and
Cognition, Seoul, August 26-30 (pp. 91-94).
Detecting intonation errors in familiar melodies
Self-regulated use of learning strategies in instrumental practice
Categorising folk melodies using similarity ratings
Engagement and experience: a model for the study of children's musical cognition
How two voices make a whole: contrapuntal competition for attention in human and machine pulse-finding
Recent discoveries in the psychophysiology of absolute pitch
Symposium introduction
Rationale.
Over the past ten years there has been increasing interest in the strategies
that musicians adopt when they practice and prepare for performance. This
research has developed within several different paradigms. Some has been
conceived within the 'expertise' paradigm, some within a phenomenographic
framework exploring musicians' experiences of practice and performance, and
some within cognitive educational psychology, related to studies of
metacognition and learning. Despite the different origins of the research
there is much commonality in the findings.
This symposium brings together researchers with different perspectives, who
have adopted different methodologies with a view to increasing our
understanding of the issues relating to practice and performance in musicians
and outlining the directions that future research might take within or between
alternative paradigms.
Aims.
Speakers.
Robert H. Cantwell, Neryl Jeanneret, Yvette Sullivan and Ian Irvine, Faculty of
Education, University of Newcastle, Australia
Discussant.
Proceedings paper
Introduction
Professional singers and singing instructors often use figurative expressions that are difficult for a layman to understand in order to describe voice
quality and singing technique, for example, "supported voice", "directed to the mask" or "made to fly". The persistent use of such expressions seems
to indicate that they carry a more-or-less clearly specified meaning and that language use here is of a clearly metaphoric nature.
The present paper concentrates on the study of a pair of concepts: the "forward" or "backward" placement of the singing voice. Clearly, the
words do not indicate the location of the voice source from the point of view of the listener, but certain timbral qualities of the singer's voice, which
can be modified by the appropriate application of specific articulatory mechanisms. Informal observations from the classrooms of the Estonian
Academy of Music (EAM) show that instructors often advise their students to "direct their voice forward", "avoid moving the voice back",
"concentrate the whole sound stream behind the two upper teeth" and so on. At the same time it is clear that the communicative precision of such
metaphorical expressions cannot be great. Yet the success of teaching depends on the unambiguous meaning of the terms used in the study process:
the student has to understand, as clearly as possible, in what direction the instructor wants to develop his or her voice.
Vocal pedagogy uses the concept of "placing the voice" quite frequently. Miller (1977, 1996) defines "directing" the voice as a subjective term that
refers to the vibratory sensations during singing. "Directing the voice" should describe or provoke a certain feeling in the resonance cavities of the
vocal tract. Singers do not usually employ the term "resonance" in its scientific meaning, but rather to describe the vocal timbre freely. According to
Miller, most users of the concept of "directing" the voice do not believe that the "directing" takes place literally. The physical processes
necessary for the production of a specific vocal tone may, however, be coordinated to an extent through a concept of "directing" the voice.
Rulnick et al. (1997: 711) state that two axes can be differentiated in the concept of vocal "placement": up/down and front/back. Sounds whose point
of articulation is close to the alveolar ridge, for example /t/, /d/, /n/, /l/, /s/, /Z/, /S/, /tS/, /dZ/, /T/, /f/, /v/, /p/, /b/ and /m/, may be considered
"front" sounds; /r/ and /g/ are placed in the "middle". Once a front consonant has been established, it may be assumed that the vowel
pronounced together with the consonant will also be carried "forward". Harpster (1984) asks the student to sense the voice in "front", in the area of
the bridge of the nose. He associates the five basic vowels (/i/, /e/, /a/, /o/, /u/) with perceptions on a scale from brighter (/i/ and /e/) to darker (/o/ and
file:///g|/Mon/Vurma.htm (1 of 9) [18/07/2000 00:31:57]
Vurma
/u/), while the vowel /a/ remains in the middle of the scale of perception. He claims that the brighter vowels are felt to be "forward", in the mask area
(in the area of cheekbones, bridge of the nose and eyes) and the darker vowels "back", in the mouth. In his opinion, the balance of the vocal timbre
depends on the right balance of the tone of the "front" and "back" vowels.
In the literature of the field the term pair "front" and "back" can be found in descriptions of vowel quality. Thus, in phonetics (e.g., Wiik 1991,
International Phonetic Association 1999) different vowels are classified by the position where the tongue is arched in the formation of
the specific vowel. The position of the tongue is in turn related to the lower formant frequencies in the vowel spectrum. The dimension of vowel
height depends on whether the tongue is arched high or low in relation to the palate: /æ/ is an example of a low, /i/ of a high vowel, with the
height of the vowel linked to the frequency of the first formant (F1) in the spectrum. Besides the dimension of vowel height, a parallel scale of
closed-open (connected to the openness of the mouth in pronouncing the vowel) is used, where the closed vowel corresponds to the high and the open
vowel to the low vowel. The second dimension in describing vowel space shows whether the tongue is arched in the front, near the teeth, or in the back,
near the velum. This gives the classification of vowels into front (e.g., /i/ and /e/) and back vowels (e.g., /u/ and /o/), which are linked to the
frequency of the second formant (F2) in the sound spectrum: it is high for the front vowels and low for the back vowels.
Titze (1994: 167) writes that singers often sense specific "positions" for vowels. The author assumes that such perceptions of location may be caused
by resonance features in the vocal tract. At certain frequencies standing waves are created by the reflection of the vibrations from the walls of the
vocal tract. The sensation of where the vowel is localized or vibration occurs is related to the localization of maximal pressure of the standing waves
in the vocal tract. Benninger (1994) finds that it is often useful first to recognize the sensations created by the right way of singing and to memorize
them in order to re-create the process. Even though descriptions of such sensations need not be scientifically accurate, they are of great use in
learning singing technique.
The possibility of directing the voice has also been taken literally. For example, Vennard (1967: 81) refers to the acoustic theory of Scripture (1906)
on this topic, based on an earlier treatment by Willis from 1829. The theory of speech production recognized today (Fant 1960), which describes the
functioning of the human vocal tract, does not, however, offer an easy answer to the question of directing the voice. Since Helmholtz (1875), two
major components have been distinguished in the voice production process: the source of the voice and the system of resonators of the vocal tract
that functions as a filter. The spectrum of the so-called glottal sound, produced by the vibration of the vocal cords, is modified under the influence
of the resonators of the vocal tract, so that some partials are enhanced and some are attenuated, enabling the speaker or the singer to produce sounds
of very different timbre by changing the shape of the vocal tract. The singer's ability to change the timbre of the voice is limited to giving this
or that shape to the resonators in the vocal tract and, to an extent, to regulating the qualities of the glottal sound, while the spreading of the voice in
the body is not controlled by the singer's will.
There have been attempts to find a connection between the terminology describing the sound of the voice and objectively measurable acoustic
parameters. Bartholomew (1934) associated the brightness of the voice of a good opera singer with strong partials in the voice spectrum in the
range around 2800 Hz. Cleveland (1977) describes a connection between the type of voice and the frequencies of vowel formants in the singing
voice: in the eight male singers he studied, the vowel formant frequencies were lower if the singer was a bass and higher if the singer
was a tenor, with the formant frequencies of the baritones lying in between. The qualities of vocal timbre connected to
different vocal techniques (e.g., covered, open, throaty, pressed, free) have been described, on the basis of spectrograms, by Berg and Vennard
(1959). Sundberg (1970, 1973) studied bass singers of dark and light vocal timbre and discovered that in the case of darker timbre, the formant
Methods
The present study addresses the possible differences between the voice placed "forward" and "backward" in three different ways. First, a group of
singing teachers, both Estonian and foreign, were asked, in the form of interviews or written expert opinions, three questions: (1) Does the
terminological opposition - voice placed "forward"/"backward" - have a clearly defined meaning? (2) If yes, what qualities are
associated with the voice placed "forward" and "backward"? (3) What vocal techniques should be employed to achieve a voice placement "forward"
and "backward", respectively? These questions were posed to eleven instructors at the Estonian Academy of Music (EAM) and to vocalists through
the Internet mailing list at the address http://www.vocalist.org/.
As the second step, short triads performed by students of classical singing at the EAM were recorded. The students were requested to sing so that
the voice placed "forward" and "backward" could be distinguished: in the first series they were to sing with the voice "placed forward", in the second
with the voice "placed backward". The triads were sung on the five so-called basic vowels, /a/, /e/, /i/, /o/ and /u/. The students were recommended
to sing in D major but could change the key, choosing the tessitura most suitable to their voice. The meaning of the words "voice placed
forward/backward" was not explained. Of the twenty students, eleven were female and nine male, and their period of study in singing ranged from
two to ten years, with an average duration of five years. The recordings were made with a Pioneer D-500 DAT tape recorder in a
low-reverberation studio, with an AKG C414B microphone placed 20 cm from the singer's mouth. The recordings were digitized at a sampling
frequency of 22.05 kHz.
The recordings were then subjected to acoustic analysis with the Voice Analysis software from Tiger Electronics in order to identify the differences
between the voice "placed forward" and "backward". Both LPC and FFT algorithms were used to estimate the second (F2) and third (F3)
formant frequencies of each triad as well as the relative strength of the so-called singer's formant, i.e., the frequency maximum within the range of
2-4 kHz. Since it is difficult to determine the first formant (F1) frequency in the singing voice (especially in the case of a high fundamental), and
the division of vowels into front and back vowels is phonetically linked to the F2 frequency, F1 was not determined in the analyzed material.
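The LPC approach mentioned here can be illustrated in outline. The following Python sketch is a hypothetical minimal implementation, not the Tiger Electronics software: it fits an all-pole model by the autocorrelation method (Levinson-Durbin recursion) and reads candidate formant frequencies off the angles of poles lying near the unit circle.

```python
import numpy as np

def lpc_coeffs(x, order):
    """All-pole (LPC) coefficients via the autocorrelation method
    and the Levinson-Durbin recursion."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)                # remaining prediction error
    return np.array(a)

def formants(x, fs, order=10):
    """Candidate formant frequencies (Hz): angles of LPC poles
    close to the unit circle, sorted ascending."""
    poles = np.roots(lpc_coeffs(x, order))
    poles = poles[(poles.imag > 1e-6) & (np.abs(poles) > 0.7)]
    return np.sort(np.angle(poles) * fs / (2.0 * np.pi))
```

Applied to a signal with a single resonance, the pole angle recovers the resonance frequency; on real sung vowels a higher model order and some pole pruning would be needed.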
In the third stage, a listening test was conducted with a group of singing teachers and students (sixteen people in all), in which the listeners had to
decide whether the recorded triads were sung with the voice "placed forward" or "backward". The test consisted of pairs of triads, both of which were
performed by the same singer in the same key and using the same vowel. The only difference between the members of a pair was that in one case
the singer tried to keep his or her voice "placed forward", in the other "backward". The order of the members in the pairs was random.
There were 87 pairs of triads, 78 of which were chosen from among the material recorded by the students. In order to test the consistency of the
expert opinions, four of the 78 pairs were used twice. Five of the 87 pairs had been computer-synthesized with the Voice
Synthesis software by Tiger Electronics. The members of each synthetic pair differed from each other in the value of one specific acoustic
parameter, which could be either the F1, F2, F3 or F4 frequency, or the relative strength of the spectrum peak (actually F3) within the range of 2-4 kHz
that provisionally corresponds to the singer's formant. The sounds resembled the Estonian vowel /a/. The selection of the formant frequencies of the
vowel was based on the results of the acoustic analysis of the vowel /a/ sung by a baritone who participated in the recording.
The sound volume of all the recordings used in the listening test was levelled. The duration of the test was approximately 16 minutes. If a listener
found it hard to decide whether the voice was placed "forward" or "backward", he or she was allowed to leave the question unanswered. The total
number of answers was 1392.
Results
On the basis of the measurements, it can be posited that the F2 and F3 frequencies as well as the level of the so-called singer's formant tend to
increase rather than decrease in "forward placement" in comparison with "backward placement". "Forward placement" causes the F2 frequency to
rise in 63 per cent and to fall in 17 per cent of the pairs. For F3 the percentages are 55 and 23, respectively. The level of the singer's formant
increases in 70 per cent and decreases in 23 per cent of the cases. The average increase of F2 is 5.2 per cent and that of F3 1.7 per cent. The
level of the singer's formant in the spectrum (measured in decibels) is, on average, 2.9 dB higher for "forward placement" than for "backward
placement".
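The percentages reported here are straightforward to derive from paired "forward"/"backward" measurements. A minimal sketch (the numbers in the usage note are invented, not the study's data):

```python
import numpy as np

def shift_stats(fwd, bwd, tol=1.0):
    """Summarise paired 'forward'/'backward' formant measurements (Hz):
    share of pairs rising/falling by more than `tol` per cent,
    and the mean percent change."""
    fwd = np.asarray(fwd, dtype=float)
    bwd = np.asarray(bwd, dtype=float)
    pct = 100.0 * (fwd - bwd) / bwd          # per-pair percent change
    up = np.mean(pct > tol) * 100.0          # % of pairs rising
    down = np.mean(pct < -tol) * 100.0       # % of pairs falling
    return up, down, pct.mean()
```

For example, `shift_stats([110, 105, 99, 120], [100, 100, 100, 100])` reports 75 per cent of pairs rising, none falling, and a mean change of 8.5 per cent.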
Analyzing the results vowel by vowel reveals that the front/back opposition works most consistently for the vowel /e/, for which
F2 increases in 90 per cent of the cases and decreases in 5 per cent. For /o/, on the other hand, the probabilities of F2 and F3 increasing or
decreasing are nearly equal (36 and 32 per cent, respectively), although the singer's formant is strengthened rather systematically (in 63 per cent of
the cases). The level of the singer's formant rises most consistently for /e/, /a/ and /i/ (in 79, 78 and 72 per cent of the cases, respectively) and
least consistently for /o/ (56 per cent).
Looking at the average values of the changes vowel by vowel (see Table), the F2 of /e/ undergoes the most conspicuous change
(10.7 per cent when "placed forward"). The average change of F2 for /i/ and /u/ is also relatively high (6.5 and 6.8 per cent, respectively). For /o/,
however, the average value of F2 decreases by 1.9 per cent when the voice is "placed forward". The average change in F3 is less
salient; still, the change is positive for all vowels except /o/, where the average value of F3 decreased by 0.8 per cent. On average, the singer's
formant is stronger for "forward placement" in all vowels, the difference being most marked for the vowels /e/ and /i/ (4.5 and 4.0 dB, respectively).
The fact that the F2 frequency changes considerably for the vowels /e/ and /i/ and little for /o/ corresponds to some extent with the
International Phonetic Alphabet: the IPA vowel diagram (International Phonetic Association 1999) differentiates between the front vowels /i/ and /e/
and their back variants /ɪ/ and /ɛ/, but /ɔ/ and /o/ occupy similar positions on the back-front axis.
The data at our disposal do not allow us to draw definitive conclusions about which articulatory mechanisms could produce the acoustic changes
described above. We can only speculate that the increase in the frequency of F2 is caused by the arching of the tongue forward in the mouth
(Sundberg 1987). The increase of formant frequencies (both F2 and F3) could be the result of shortening of the vocal tract, for example through
raising the larynx or retracting the lips and corners of the mouth (Sundberg 1987). The increase in the level of the singer's formant with the voice
"placed forward" may also be caused, in addition to alteration of the vocal tract shape, by changing the operation of the vocal cords, i.e. the
phonation type.
Table. Results of the acoustical analysis of the "forward/backward" voice placement. Data on three variables are presented separately for five
vowels, /a/, /e/, /i/, /o/ and /u/. The three variables are the frequencies of the second and third formants, and the relative strength of the singer's
formant. For each variable, its value for the "forward" placement, averaged for all singers (F2f and F3f, in Hz, columns 2 and 5, or sformf, in dB,
column 8), the difference between the "forward" (f) and "backward" (b) placements (in per cent, columns 3 and 6, or in dB, column 9), and the
statistical significance level of this difference (columns 4, 7 and 10) are given.
Vowel  F2f (Hz)  f-b (%)  p      F3f (Hz)  f-b (%)  p     sformf (dB)  f-b (dB)  p
/a/     967       3.8     0.003   2905      2.4     0.03   -18.1        1.5      0.74
/e/    1659      10.7     0.26    2664      1.6     0.52   -16.2        4.5      0.43
/i/    1923       6.5     0.006   2767      4.9     0.31   -16.3        4.0      0.19
/o/     879      -1.9     0.006   2898      0.3     0.03   -19.3        2.5      0.64
/u/     824       6.8     0.025   2868     -0.8     0.13   -27.7        1.6      0.07
It seems that the greater the difference between the measured formant frequencies in the spectra for "forward" and "backward" placements, the more
easily the listeners perceive the singer's intention, i.e., classify the voice as being placed "forward" or "backward". The Pearson product-moment
correlation between the expected gradient of the formants (i.e., higher frequencies for "forward" placement and lower ones for "backward"
placement) and the correct guesses of the singer's intention by the listeners is higher for F2 (R=0.45, p<0.001) than for F3 (R=0.28, p=0.01). An
analogous correlation can be observed between the inverse formant gradient (where higher frequencies are measured for "backward" placement and
lower ones for "forward" placement) and the number of "wrong" answers, i.e., those considering the singer's intention to have been the opposite of
what it actually was (R=0.44, p<0.001 for F2 and R=0.37, p<0.001 for F3). If the level of the singer's formant is higher for "forward" placement, the
listeners are more accurate in identifying the singer's intentions (R=0.29, p<0.01); if the level of the singer's formant is lower for "forward"
placement than for "backward" placement, the listeners tend to interpret the timbre of the pair contrary to the singer's intention (R=0.25, p<0.05).
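The correlations reported here are ordinary Pearson product-moment coefficients. As a sketch of the computation (with only schematically named variables, since the per-pair data are not reproduced here):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two score vectors,
    e.g. per-pair formant gradients vs. counts of correct answers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()        # centre both vectors
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))
```

Perfectly aligned gradients and response counts would give R=1, perfectly opposed ones R=-1; the study's intermediate values (e.g. R=0.45 for F2) reflect partial agreement.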
In addition to the sung triads, the listening test also contained five synthesized triad pairs. The members of these pairs differed in the F1, F2, F3 and
F4 frequencies and in the level of F3. The majority of experts considered it strange to define the synthesized voice in terms of "forward" and
"backward" placement and found it impossible to perform the task. For those experts who were able to differentiate between the two synthesised
vowel triads, the "forward" quality of the voice was connected both to the increase in the frequency of F1, F2 and F3 and to the strengthening of the
singer's formant (in reality F3).
Conclusions
The terminological opposition of "forward" and "backward" placement is rather widely used in vocal pedagogy. It is based on the century-old idea
that a good quality of singing voice cannot be achieved without the singer "directing" his or her voice into certain body resonators such as various
parts of the cranium, thorax, etc. The concept of "directing" the voice is also connected to the vibratory sensations that arise in the process of
singing, and to the place where different sounds are created in the vocal tract. The idea that the voice can literally be directed into different parts
of the human body has little to do with objective knowledge about the production and propagation of the human voice. However, the results of the
present study demonstrate that the "forward/backward" placement opposition still seems to have a meaning that - within certain limits - can be
objectively defined, and that is understood similarly by most students and professors. "Forward placement" seems to be a desirable aim to be
achieved during training lessons with the help of the teacher. Acoustically, a voice placed "forward" differs from one placed "backward" by a higher
level of the so-called singer's formant. An increase in the F2 and F3 frequencies also seems to be associated with the "movement of the voice
forward from the back", for both the singers and the listeners. As regards the changes in the frequency of F2, the opposition "forward/backward"
used in vocal pedagogy is analogous to the division, generally accepted in phonetics, of vowels into front and back ones, which are distinguished by
their F2 frequencies. On the basis of the changes in the acoustic parameters, one could posit that the terms "voice placed forward/backward" may be
synonymous with the terms "light/dark timbre". Sundberg's study (1970) identifies similar differences in the acoustic features of the voice for
light/dark timbre as we found for the voice placed "forward/backward" in the present study. If such a relationship indeed exists between these terms,
it would be necessary to clarify whether the vocal quality of "forward/backward" placement can always be associated with the concepts of light/dark
timbre. This task, however, remains beyond the scope of the present project.
The results of this study also demonstrate that there is considerable indeterminacy in the use of "forward/backward placement" as terms in vocal
pedagogy. On the basis of the listening test results, this ambivalence may reach quite a high level in some cases (professors either could not
distinguish the placement or gave an opinion different from the one intended by the singer in 64 per cent of the cases). Consequently, the acoustic
(and articulatory) changes when the voice is placed either "forward" or "backward" need not be universal. This situation may result either from
differing (sometimes even opposite) interpretations of the terms "forward/backward placement" by the singers or from their still insufficient
command of their own vocal mechanism, which hinders them from inducing the changes desired by the teacher in their own voice.
It may be said that, despite a certain objectively definable content of the terms "voice placed forward/backward", every teacher seems to use the
concepts idiosyncratically. The quality of being placed "forward" may mean accordance with a certain subjectively defined ideal standard, and the
voices placed "forward" that are considered ideal by different experts need not be the same. In other cases, the term "voice placed forward" may
indicate the direction of certain changes in vocal quality, without being connected to a clearly fixed standard. In working with beginners it should be
remembered that, in order to ensure effective communication, the vocabulary used between the teacher and the student should be developed
gradually, and that the terms used by different singing teachers need not be exact equivalents.
References
Bartholomew, W. T. (1934). A physical description of "good" voice quality in the male voice. Journal of the Acoustical Society of America, 6, 25-33.
Benninger, M. S., Jacobson, B. H., & Johnson, A. F. (Eds.) (1994). Vocal arts medicine: the care and prevention of professional voice disorders.
New York: Thieme Medical Publishers.
Berg, J., & Vennard, W. (1959). Towards an objective vocabulary for voice pedagogy. NATS Bulletin, 15, 10-15.
Bloothooft, G., & Plomp, R. (1986). Spectral analysis of sung vowels III. Journal of the Acoustical Society of America, 79, 852-864.
Bloothooft, G., & Plomp, R. (1988). The timbre of sung vowels. Journal of the Acoustical Society of America, 84, 847-860.
Cleveland, T. F. (1977). Acoustic properties of voice timbre types and their influence on voice classification. Journal of the Acoustical Society of
America, 61, 1622-1629.
Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.
Harpster, R. W. (1984). Technique in singing. New York: Schirmer.
Proceedings paper
Introduction
There is considerable evidence that practice plays a crucial role in the acquisition of expertise on a musical
instrument (Ericsson et al., 1993; Sloboda et al., 1996), although there is debate regarding the degree of its
importance (Hallam, 1998; Sloboda and Howe, 1991). One factor mediating the amount of
practice required to learn a piece to performance standard may be the effectiveness of the practice
undertaken.
Much of the research relating to effective studying has been carried out with students in higher
education. Early work was reviewed by Ford (1981). Since then a number of multi-dimensional models
have been developed which attempt to account for the many factors that contribute to learning
outcomes in learners of all ages (e.g. Entwistle et al., 1992; Biggs, 1993). There have also been attempts to
develop students' capacity to "learn to learn" (e.g. Dansereau et al., 1978; Howe, 1991), which particularly
stress the importance of metacognitive activity. This is mirrored by developments in professional training,
which increasingly make use of the notion of reflective learning (Kolb, 1984) and conceptualise the
professional as a "reflective practitioner" (Schon, 1987). These perspectives share the view that
educators should be concerned with enabling students to learn to learn. Within a musical context there has
been little research, although Jorgensen (1997) advocates the view that practice should be seen as a
"self-teaching" activity, with training in conservatoires designed to develop reflective learning.
Research suggests that experts have extensive domain knowledge, which helps them perceive
meaningful patterns in that domain quickly and improves their analysis of a problem, which they
represent at a deeper level. They also have improved short- and long-term memory skills and strong
self-monitoring skills (Glaser & Chi, 1988). Holyoake (1991) suggests that the strategies adopted are
dependent on the context. He cites Dorner & Scholkopf (1991), who suggest that successful problem solvers
have to continually adjust the processes of planning, gathering information, forming hypotheses, making
choices and reconsidering decisions: they know how to do the right thing at the right time. There is no single
"expert" way to perform all tasks. Effective musical practice might therefore be seen as "that which
achieves the desired end-product, in as short a time as possible, without interfering negatively with
longer-term goals" (Hallam, 1997b). This assumes that effective practice might take many forms
depending on the nature of the task to be undertaken; the context within which the task is to be learned; the
level of expertise already acquired; and individual differences. It also suggests that the musician requires
considerable metacognitive skills in order to be able to recognise the nature and requirements of the task;
identify particular difficulties; have knowledge of a range of strategies for dealing with these problems;
know which strategy is appropriate for tackling each task; monitor progress towards the goal; if progress is
unsatisfactory, acknowledge this and draw on alternative strategies; and evaluate learning outcomes in
performance contexts, taking action as necessary to improve performance in the future. The musician
must also have well-developed metacognitive skills for supporting practice, e.g. managing time
appropriately to be able to meet deadlines, maintaining concentration, maintaining motivation, and
understanding what preparations are needed to ensure high performance standards. The aim of this
study is to explore the nature of metacognition and planning in musicians and how these may change as
expertise develops.
The study
A semi-structured interview technique was adopted to enable an in-depth analysis of the musicians'
approaches to practising, interpretation, memorisation and performance. In the early stages of the
research, to validate the content of the interviews, each musician was shown a piece of music and asked to
describe the activities he or she would undertake during the initial stages of learning that work.
For ethical reasons, and because of the difficulties inherent in classifying professionals in terms of levels of
expertise, all of the musicians interviewed were chosen on the basis of peer evaluations of their high levels of
technical competence and their sensitivity in performance. Only those musicians whose performances were
consistently referred to as being of a high standard, both technically and musically, were included in the
study.
Twenty-two professional musicians were interviewed, 11 female and 11 male, with an age range of 22 to
60. They were selected to represent differing lengths of time in the music profession, differing
instruments, and a broad range of musical experience. All were practising freelance professionals
working within a range of musical environments.
The novice sample consisted of 55 string players aged 6-18, with standards ranging from beginner to music
college entrant. They were recorded for a period of ten minutes practising a short piece of appropriate
standard, which they then performed. The task was part of the normal examination procedure for the students.
The taped performance was assessed by two independent judges, marks being awarded out of ten for
overall impression, rhythmical accuracy, steadiness of pulse, notational accuracy, intonation, sense of
tonality and observation of marks of expression. Inter-rater reliability ranged from .82 to .96 (p=.0001).
The students were also interviewed using the same schedule as that used for the professionals.
Each interview was transcribed in full. The content of the tapes from the recordings of the novices was
also transcribed to give a detailed account of their activities while they were practising. This included
information about errors, their correction, stops, starts, poor intonation, inaccurate rhythm, faltering,
repetitions, etc.
Objectivity was established by insisting on agreement between three independent judges on the
categorisation of statements. Where there was disagreement about the categorisation of statements, they
were discussed. Only where complete consensus was reached that a statement supported a particular
categorisation was it included in the analysis.
Despite the fact that all the professional musicians interviewed exhibited great sensitivity in performance
and had considerable technical skills, it became evident that there were indeed clear differences in the way
that practising was undertaken. Initial analysis of the data from the interviews and tapes of the novices also
indicated qualitative changes in the nature of expertise as it developed. This was particularly marked at
advanced levels, i.e. Grade 8 and above. The data from these students were therefore examined separately.
Findings: Professional Musicians
What emerged clearly from the data was the extensive metacognitive skills of the professional musicians.
They demonstrated acute self-awareness of their own strengths and weaknesses, extensive knowledge
regarding the nature of different tasks and what would be required to complete them satisfactorily and had
a range of strategies which could be adopted in response to their needs. This not only encompassed
technical matters, interpretation and performance but also questions relating to learning itself, e.g.
concentration, planning, monitoring and evaluation. Although there were similarities in some aspects of
their practice there was also considerable variation because of individual need. This was well illustrated by
statements from two musicians relating to their teaching.
"My pupils are very different from each other. Some are incapable of playing with any kind of
freedom at all, they are so rigid....their fingers go down like machines and so I encourage
them to get away from that. Others are incapable of playing a simple melody with the right
note values, they distort everything. There are two extremes. My pupils sometimes ask me
whether they can come and sit in on other lessons I give. I say, you are most welcome, there's
no secret in what I'm trying to do but I don't think you'll gain because I am only trying to help
that particular pupil at that moment".
"I think we all have our little idiosyncrasies in fingering because of the shapes and sizes of
our hands and the way we approach it. When I'm teaching I find out what suits them."
This acknowledgement of individual needs in relation to practice appeared consistently throughout the
analysis. It demonstrates metacognitive activity as central in determining the nature of the practice
undertaken. Differences were found in the regularity of practice, its content, the extent to which it was felt
necessary to warm up and the type of technical work undertaken (Hallam, 1995a). All these depended on
what the musician felt was necessary to maintain their standards of performance. There were wide
differences in the ways in which musicians prepared for performance. Some adopted an intuitive, serial
approach to developing interpretation, which evolved as they learnt the piece; others planned in advance,
listening extensively to recordings to develop their ideas; and some were prepared to make spontaneous
changes in performance if they felt these were musically appropriate, even to the extent of creating
technical difficulties (Hallam, 1995b). Metacognitive skills were also demonstrated in relation to
memorisation (for details see Hallam, 1997a).
Learning new music
When learning new music all but one of the musicians initially acquired an overview of it, either by
playing it through or by careful examination of the score. Getting an overview of the work served technical
and musical purposes. It enabled the identification of difficulties, an assessment of tempo, which had
musical and technical implications, and a consideration of the structure of the work and the thematically
important material (Hallam, 1995a, b).
Difficulties identified
The difficulties identified by the musicians varied. This variability was, in part, due to their own individual
strengths and weaknesses.
"I would be looking for areas which I know to be my own weaknesses and therefore areas that
I have got to look out for particularly carefully".
"For me personally, semiquavers, fast passages, low notes are never any problem. If it's got
high notes in, it means I have to put in extra practice to build up strength to play it".
Some general trends did emerge, although there were exceptions. Passages requiring performance at the
extremes of the instrument (high and low for wind players, high for string players) were often seen as
problematic. Particular technical tasks relating to specific instruments, e.g. double stopping, triple
tonguing, position changing, and particular hand shapes for pianists, were frequently mentioned. Generally,
fast technical passages were seen as requiring practice, although for some, once learned, they posed no
difficulties, and one reported:
"I don't use the metronome for speeding things up, if anything I've had to use it for slowing
down".
What was clear was that all the musicians knew what for them was difficult and would in their initial
examination of a piece be looking to identify those passages for practice. They also had a range of
strategies available for that purpose.
Strategies
After the identification of technical difficulties, practice was undertaken to overcome problems. The
musicians had a repertoire of strategies which they were able to utilise as necessary to master differing
technical passages. To some extent these depended on the nature of the instrument itself. However some
general trends emerged. All of the musicians emphasised the importance of either cognitive analysis or
slow meticulous playing in the early stages of learning a new work. After this initial stage one of two main
strategies was adopted: repetitious or analytic (Hallam, 1995a). Practice was goal-oriented, but in some
cases the goal was not learning a particular piece but rather ensuring that technique was of a sufficiently
high standard to deal with difficulties as they arose. Where this strategy was adopted it tended to reflect
the limited nature of the repertoire of the instrument.
Marking the part
The musicians varied in the extent to which they marked things on the music. This depended on perceived
need. Some were reluctant to write anything.
"I try and remember.....I tend not to write much on the music".
"I don't like writing....I find if it is covered in marks I'm looking at the marks instead of the
rhythm and the notes. I find that very upsetting".
"I never mark in accidentals, I never mark in semitones, I don't go in for that at all. It doesn't
mean anything to me".
In contrast some relied heavily on marking parts.
"I write things in to help, very much so."
"I write a lot of things on the music. I have a memory like a sieve."
Others sometimes wrote things in the part.
"Eventually, the day before the show I eventually get round to scribbling a few things,
maybe".
Eleven of the musicians reported extensively marking information on their music, 2 reported making
moderate use of marking, 7 reported using it very little and 2 said that they did not mark things on the part
at all.
Organisation of practice
As with the other aspects of practice there was considerable variability in the level of organisation
reported. Some musicians reported being very well organised.
"I don't have time to waste sitting for hours hammering away ineffectively. If I know I've got
to do something I will do it as fast and as efficiently as possible. If I haven't got anything to
work for I will obviously be a lot more selective in what I'm motivated to practice. Whatever I
practise will be done efficiently and really properly".
"I can achieve a lot in a comparatively short time."
"I try and take a passage. I try and be systematic about it so that I don't always start in the
same place. I decide that today I'm going to take this chunk and work at it".
In contrast some musicians felt a lack of "natural" organisation.
"I don't think I've ever been a very organised practiser.....I wasn't very efficient".
"If I don't have a routine it's just a waste of time for me.....I fritter the time away".
Some musicians appeared to be aware of their lack of organisation and had taken steps to adopt strategies
to help.
"Well in the past what I would do is just, sort of toy with this bit and that bit and do the same
thing every day in the hope that eventually everything would gradually get better but I've
realised that that is not a very good way to do it. That you've really got to decide from the
word go which of the bits are really going to be the ones that need all the work and get down
to those straight away and that's what I try to do now".
"I would start working on it a month in advance and two weeks before the concert I would
learn it from memory.....which never works out because it tends to be a week before....I tend
to do most of my practice when I'm learning it from memory".
"I'm not terribly self-disciplined, that's why I have these little schemes of time schedules and
building up towards the point of performance because in fact I'm very unself-disciplined. A
person who practised easily and more naturally wouldn't need this kind of organisation. So I
do find that my practice isn't always as I would desire it to be. I would like to start every
practise session with slow scales. And in fact I used to start when I did the Dvorak concerto
some three or four years ago, I practised for an hour to three quarters of an hour of slow scales
a day in four different keys which I found very satisfying, very pleasurable, but it's difficult to
get into a routine like that".
It seems as if these musicians are trying to compensate for some degree of natural disorganisation by
imposing schedules on themselves. Others reported difficulties with concentration while they were
practising.
"I get the metronome out. I'm a great believer in the metronome. Well it's a discipline.
When...if you're not feeling like practising......the metronome concentrates your mind in a way
that nothing else seems able to do, because you've got to concentrate on it.....you can't be
thinking wouldn't it be nice to be in the garden, or what am I going to do with my life."
Of the 22 musicians, 7 appeared to be low in "natural" organisation, 10 were moderately organised in their
practice and 5 considered themselves to be highly organised and efficient.
When concentration was considered, 14 reported no problems or good concentration, 3 reported some
difficulties and 5 reported considerable difficulties. When the relationships between reported
organisation and concentration in practice were examined, the musicians who were well organised (5)
also exhibited high levels of concentration, while those who reported low levels of organisation (7) had
either low or moderate levels of concentration. The moderate planners (10) tended to have high levels of
concentration (9 of the 10). It seems that there is a relationship between level of organisation in
practice and level of concentration.
Preparing for performance
There was considerable individual diversity in relation to the level of preparation perceived as necessary
for performance itself beyond mastery of the work. Some musicians experienced considerable stage fright
but had developed strategies to cope:
"I'm not a natural performer. I never was any good at it. Partly because I was pushed far too
young. I know that if I haven't done enough practice I am going to be scared out of my wits so
I try to make sure nowadays that I prepare for it properly so that I can have a reasonable hope
of getting through it. If it is a very big event I take a beta blocker".
"I used to think it would be worse for having psyched myself up but having read a book on
tension and these issues where they suggest that you actually should think of it and also being
aware that when the adrenalin comes on suddenly you're doing yourself more harm than if
you've got it gently.... it's been going for a couple of days or whatever".
"I regard that first playing through as practice for the occasion. Because on the occasion
you've got to play it through from cold".
"I have found that if I practise immediately before it that helps. I try and breathe deeply, you
know the usual things to try and calm nerves, the ordinary straight forward things."
In contrast others felt that they needed an audience to "psych" themselves up, improve concentration and
give the performance a "spark". Nervousness can have positive as well as negative effects.
"I don't do any particular preparation for performance. I tend to feel that some kind of
automatic response comes into operation. I enjoy playing to an audience in public and this
engenders its own enthusiasm, its own spark of creativity or whatever and there is no need to
"psych" oneself up in any way. On the occasion of the performance it is simply a matter of
being prepared, practice, technique and then the music will come by itself providing you have
done the groundwork. I find I don't need any special preparation for the performance
situation".
Others were relatively unconcerned about performing to an audience.
"I'm too worried about actually playing the right notes. That's what I'm worried about, the
rhythm.........the audience is the last thing to worry about".
"I think the important thing if you are performing is to make your audience happy."
One musician described her relationship with performance as love-hate: she is frequently physically ill
before performances but says that there is nothing she likes doing more.
The interviews also revealed that stage fright can be transient and unpredictable. One musician reported:
"For a time I was afraid of being afraid".
Others reinforced this unpredictability indicating that even when playing the same programme in a series
where everything had previously been successful, one could be overcome by nerves. Of the twenty-two
musicians, 4 reported low arousal levels, feeling the need for an audience to enable them to perform
better; twelve experienced moderate levels; and six experienced high levels of arousal, which created
problems for them in performance. Arousal levels appeared to be related to levels of organisation and
concentration in
practice. Those who reported high levels of arousal in performance also reported high levels of
concentration in practice and high or moderate levels of organisation. Those reporting low arousal levels in
performance consistently reported low levels of concentration and organisation in practice. To explore
these relationships in greater depth more sensitive measures need to be developed.
Findings: Novice musicians
Analysis of the data from the novices revealed six advanced students (Grade 8 standard) whose practice
was qualitatively similar to that of the professionals, although they adopted a rather "taken for granted"
conception of performance: none raised the issue of spontaneity versus planning in performance, and there
was little evidence of specific "performance" preparation. The novices ranged in standard from beginners
to Grade 7. The development of expertise and strategy use was explored elsewhere (for details see Hallam,
1994; 1995a, 1997a). This paper focuses on issues relating to metacognition and planning.
Planning
There was an increase in the practice undertaken by 92% of the advanced students and novices as
examinations approached, with greater organisation of practice and more concentration on technical
aspects, e.g. scales. At other times the amount of time spent practising tended to depend on task
requirements. Not even those students contemplating a career in music felt that daily practice was
necessary to maintain standards. The number of days spent practising correlated non-significantly with
grade (r = .12) and age (r = -.02). Total time spent practising did increase with expertise, however,
correlating with age (r = .56, p = .0001) and grade (r = .51, p = .0001); this was because the length of
practice sessions increased.
Criteria were set out to distinguish high, moderate and low levels of planning in the prepared practice and
in normal practice. High levels of planning in the prepared practice were identified by evidence of the
completion of task requirements; speedy identification of difficulties; concentration of effort on difficult
sections; and integration of the sections practised into the whole for performance. Moderate levels of
planning were identified by completion of task requirements; evidence of on task behaviour but repetition
of large sections of the work rather than a focus on difficulties; and no integration specifically towards
performance. Low levels of planning were ascribed when the task was not completed; the first part of the
music was practised but not the remainder; and considerable amounts of time were spent off task. All of
the advanced students exhibited high levels of planning in their recorded practice, while 5 (12.5%) of the
novices did so. 28 novices (70%) showed moderate levels and 7 (17.5%) low levels.
In relation to daily practice, high planning was characterised by reports of specified aims of practice; a
consistent order of practice; self-imposed organisation of when practice was undertaken and a tendency to
mark things on the part. 4 novices (10%) and 2 advanced students (33%) were classified as having high
levels of planning in their daily practice. Moderate planning was categorised on the basis of some
organisation of when practice was undertaken; a planned order of practice when taking examinations and
evidence of some time organisation. 26 novices (65%) and 4 advanced students (66%) fell into this
category. Those categorised as having low planning skills reported practising when they had time;
constantly having to be reminded to practice, wasting time practising unnecessary material and being
disorganised in their work. 10 novices (25%) fell into this category. The advanced students demonstrated
considerable task planning in their prepared sight reading regardless of their normal planning of practice.
This level of planning may therefore be a feature of increased expertise and is perhaps a characteristic
necessary for becoming expert at playing a musical instrument.
Table 1 sets out the relationships between planning in recorded practice and organisation of daily practice.
The novices exhibited different levels of each kind of planning. As was demonstrated in the professional
group, there may be a need for conscious cognitive planning to override what are perceived as more
"natural" planning mechanisms or levels of expertise.
Table 1: Novice and advanced students' approaches to planning
Performance
The advanced students exhibited a similar range of behaviours to the professional sample. Some were
excited at the prospect of performance, others realised that nervousness marred their performance. Unlike
the professionals these advanced students had generally not developed successful coping strategies.
Amongst the novices, 90% reported being nervous on the day of the examination but a minority (38%) of
these reported nervousness occurring for several days in advance, some experiencing extreme headaches.
Others (10%) reported no nerves at the prospect of performance; some were excited.
69% of the novices adopted some kind of strategy (or more than one) to overcome nerves. The most used
strategy was to play to someone else prior to the examination (21 students). The second most popular
strategy was practising itself (8) followed by doing a mock examination (7) or arranging to be tested (6).
These strategies were part of performance preparations. Other strategies were utilised during performance.
These included treating the examination as if it were a lesson (3), avoiding thinking about it (3) and
actively concentrating on the music (1).
For the novices examinations were considered more important than public performances. An advanced
student suggested that this might be due to their concrete outcome, i.e. a mark. Although strategies were
adopted in relation to stage fright, they tended to focus on reducing the fear rather than on positively
alleviating any detrimental effects on performance. This had clearly not developed the same
significance as for the professional group.
Lack of concentration in practice was not reported by the novices or the advanced students. Perhaps young
people are generally less aware of their own internal states except in the case of nervousness, which
because of its severe physical symptoms, is difficult to ignore. Perhaps lack of concentration in practice is
perceived as boredom, a reason for terminating practice rather than a study problem. For the professionals
with performance deadlines to meet and standards to maintain, this is not an option.
Conclusion
The findings indicate that 'expert' musicians have extensive metacognitive skills which enable them to
optimise their learning and performance taking account of their own strengths and weaknesses and task
and performance requirements. They adopt extensive support strategies to promote concentration in and
organisation of practice and to optimise arousal levels for performance. The strategies are adopted
consciously to compensate for what are perceived as naturally occurring deficiencies. It was possible to
identify strategy use in the advanced students and novices, but their strategies were less well developed
and did not have a well defined focus on optimising performance. Taking the evidence together, planning
mechanisms seem to operate on three levels: first, planning related to task completion, which depends in
part on the level of expertise acquired; second, automated planning and organisation, which may be a
relatively consistent characteristic of the individual; and third, conscious, strategic planning, which
may compensate for deficiencies in the other planning mechanisms. Research utilising interval-level
measures is required to explore these relationships further.
Proceedings paper
Procedure
During the session the infant was seated on the assistant's lap. The experimenter was seated in another room and monitored the experimental
session through a one-way mirror. The infant was presented repeatedly with a standard sequence (S), the parent motif A, separated by
500-msec silence intervals. When the infant was quiet and facing directly ahead to the flashing green lamp, the experimenter initiated a
training trial, at which time a contrasting sequence was presented once at an intensity 6 dB higher than the previously heard standard
sequence. During the training phase, the contrasting sequence consisted of the parent motif B. If the infant turned his/her head at least 45°
toward the loudspeaker during the presentation of the training sequence (T) or the 500-ms silent interval that followed (total response time: 5
seconds), the experimenter recorded the turn and a cartoon was displayed (6 sec) after the end of the contrasting sequence. The video monitor
was under the loudspeaker, located 1 meter from the infant at an angle of 90°. Head-turns made at other times were not reinforced, nor were
head-turns of less than 45°. If the infant responded correctly on two consecutive trials, the intensity of the contrast stimulus was made
equivalent to that of the standard sequence. Subsequently, the infant was required to respond correctly on four consecutive trials in order to
meet the training criterion. Infants who failed to turn to the change on the first two trials were presented with the contrasting sequence 12 dB
higher than the standard on subsequent trials until they responded correctly on two consecutive trials, at which time the intensity of the
contrast stimulus was lowered 6 dB, and so on. The session was abandoned if the training criterion was not met within 20 trials. During the
testing phase, the standard sequence remained the same as in the training phase. The infant was presented with change and no-change trials,
the latter of which provided a measure of spontaneous rotation in the direction of the loudspeaker during the test session. The test session
comprised 36 trials, 12 no-change trials [Standard vs Standard, (S)] and 24 change trials [S vs Comparison, (C)], in random order. Among the
change trials, 3 consisted of variations of parent A and were considered not-to-be-responded-to change trials (C-, i.e. C- 1, C- 2, C- 3) and
3 of variations of parent B, which were considered to-be-responded-to change trials (C+, i.e. C+ 1, C+ 2, C+ 3), each repeated 4 times. The
infant was expected to turn the head more frequently for the change trials that involved variations of parent motif B. The infants were divided
into two groups in order to counterbalance the categories (in the first condition: parent A, A1, A2, A3 were, respectively, S, C- 1, C- 2, C- 3,
and parent B, B1, B2, B3 were, T, C+ 1, C+ 2, C+ 3; in the second condition: parent B, B1, B2, B3 were, respectively, S, C- 1, C- 2, C- 3, and
parent A, A1, A2, A3 were respectively, T, C+ 1, C+ 2, C+ 3).
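The training-phase intensity schedule described above can be read as a small state machine. A rough sketch under stated assumptions: the function name and the boolean-response encoding are illustrative, and details such as the response-time window are abstracted away.

```python
def run_training(responses, max_trials=20):
    """Rough sketch of the training-phase intensity schedule.

    responses: booleans, True if the infant turned on that trial.
    Returns (met_criterion, intensities), where intensities are in dB
    above the standard sequence.
    """
    intensity = 6       # contrast starts 6 dB above the standard
    consecutive = 0     # consecutive correct responses so far
    intensities = []
    for trial, turned in enumerate(responses[:max_trials]):
        intensities.append(intensity)
        consecutive = consecutive + 1 if turned else 0
        if trial == 1 and not any(responses[:2]):
            intensity = 12            # no turn on the first two trials
            consecutive = 0
        elif intensity > 0 and consecutive >= 2:
            intensity -= 6            # step the contrast down 6 dB
            consecutive = 0
        elif intensity == 0 and consecutive >= 4:
            return True, intensities  # criterion: 4 correct at equal level
    return False, intensities         # session abandoned after 20 trials
```

For example, an infant who turns on every trial passes through intensities [6, 6, 0, 0, 0, 0] and meets the criterion on the sixth trial.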
The pattern of results presented so far is far from ideal. One would have expected a non-significant d' for the not-to-be-responded-to trials in
the first analysis and a significant result in the second analysis, showing better discrimination for the to-be-responded-to trials. Nevertheless,
the fact that the different variations A and B were discriminated from their parents indicates that the infants might distinguish the members of
both categories. In fact, the second condition seems to be closer to the ideal profile, since according to a paired t test (2-tailed, t(9) = -1.987),
the d' in this condition tended (p = .0782) to be larger for the to-be-responded-to sequences (d' = 1.937) than for the not-to-be-responded-to
sequences (d' = .999).
Consequently, the first experiment does not allow us to conclude that a categorization process exists, since the infants responded virtually
equally to both types of change trials. Moreover, the sequence A3 seems to have a special status, since it tended to elicit more responses in the
first condition and did provoke more responses in the second one. This may be attributed to the high register of this passage, which might have
attracted the infants' attention and made this sequence more distinctive than the others. Mélen (1997, 1999b) showed that register changes were
powerful cues for 6- to 10-month-old infants, and these results give further support to that suggestion. One could also say that the categories
were not easy to discriminate, since the features which distinguish them did not lead to higher proportions of responses for the
to-be-responded-to sequences. Even if the d' for the to-be-responded-to sequences had been significant in the second condition, it could have
been attributed to one particular sequence rather than to categorical features as such.
Two questions arise at the end of this experiment: 1) is A3 really dissimilar to the other sequences? 2) Would categorization be manifest if
the difference between categories were increased? Experiments 2 and 3, respectively, were run to address these questions.
Experiment 2
Methods
Participants
Twenty female and 20 male young adults (mean age = 23 years, range = 15-29 years) took part in the experiment. None had ever received any
musical training.
Apparatus
The experiment was controlled by a computer (Macintosh Classic 4/20), which monitored the electronic equipment through a program designed with
the Max software. The stimuli, composed with the Performer software, were generated on line by a synthesizer (Korg 05R/W) with a built-in MIDI
interface and presented via two self-amplified loudspeakers (Altec Lansing ACS31).
Musical material
The Ländler n° 10 D145, op. 11 by F. Schubert served as stimulus.
Procedure
The participants were tested individually and presented with 45 pairs of sequences: all the possible pairings of sequences Parent A, A1, A2,
and A3, on the one hand, and all the possible pairings of sequences Parent B, B1, B2, and B3, on the other. Three control pairs were added,
i.e. A vs A, B vs B, and A vs B. Thus, there was a total of 15 distinct pairs, each presented three times. The task was to evaluate the degree
of similarity between the elements of a pair on a Likert-type scale ranging from 1 (totally dissimilar) to 7 (identical), mapped onto seven
keys of the keyboard. Participants responded by pressing the corresponding key. No time pressure was imposed. Each pair was replaced by the
next as soon as the participant had given his/her response, and participants were allowed to respond only after the second element of the
pair. The experiment, followed by a debriefing, lasted about 10 minutes.
Results and discussion
The three control pairs showed that the participants performed consistently: the mean rating was 6.8 (SD = .34) for the pair Parent A -
Parent A and 6.56 (SD = .51) for Parent B - Parent B, whereas it was 1.67 (SD = .80) for the pair Parent A - Parent B. These ratings are very
similar to those obtained by non-musician subjects for a similar task in other experiments (Deliège, 1996).
Two one-way ANOVAs, with pair (excluding the control pairs) as a within-subjects factor, were run on the mean similarity ratings and yielded
highly significant effects (for the A motifs: F(5,195) = 16.66, p < .0001; for the B motifs: F(5,195) = 27.51, p < .0001). A Newman-Keuls post
hoc analysis revealed that the participants considered all the pairs containing the sequence A3 to be dissimilar from the other pairs (p <
.05). The sequences B and B1 were judged quite similar: the participants gave this pair an average rating of 6.1, higher than any other rating
(Newman-Keuls significant at p < .05). The similarity ratings for the other pairs ranged from 4.2 to 4.66 (see figure 1B).
This experiment confirms that sequence A3 appeared dissimilar to the other sequences derived from the parent A. The results of the present
experiment should also be compared to those of the experiment by Mélen, Praedazzer and Deliège (this symposium), which showed that 7-9 year-old
children did not consider this part of the Ländler to be a member of the category derived from the parent A. The high register of this
sequence may make it so distinctive that young listeners were misled and isolated it from its category. The infants in the first experiment
could have experienced the same misleading impression.
Experiment 3
Methods
Participants
The participants were 11 healthy, full-term infants ranging from 8 to 10 months of age. One infant was excluded from the sample, because
she failed to meet a predetermined training criterion. The final sample comprised 4 males and 6 females, with a mean age of 7 months, 6
days.
Apparatus
See experiment 1.
Musical material
The parent motif A from the Ländler n° 10 D145, op. 11 by F. Schubert used in the first experiment and the Finale Rondo of Diabelli's Sonatine
n°2 served as stimuli. The latter is described in Koniari, Mélen and Deliège (this symposium). It was chosen because it presents the same
structure as the piece by Schubert. The parent motif B and its derivatives (B1, B2, B3) were extracted from this piece and opposed to the
parent motif A and its derivatives from Schubert's piece. We thought this could enhance the distinctiveness of the categories, since the
sequences of the Rondo are clearly different from the sequences of the Ländler except in terms of length.
Procedure
The same procedure as in experiment 1 was followed in the third experiment.
Results and discussion
As in experiment 1, the raw data consisted of the number of head-rotations on change and no-change trials. Globally, the infants responded to
21.62 % of no-change trials, to 25.83 % of change trials involving variations of parent A (hereafter, C- trials) and to 61.67 % of change
trials involving variations of parent B (hereafter, C+ trials).
A d' was calculated for each type of change trial. Each actual d' was compared to the expected value in the case of no discrimination, i.e.
zero. A one-sample Student t test (1-tailed) was run and led to the following results: for the C- trials, d' = -.021,
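The d' values quoted throughout can be recovered from hit and false-alarm proportions under the conventional signal-detection definition d' = z(H) - z(F). A minimal sketch, using the Experiment 3 group proportions reported above (61.67% responses on C+ trials, 21.62% on no-change trials) purely as an illustration; the authors' per-infant computation may differ:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Group-level proportions from Experiment 3: C+ responses vs no-change responses
print(round(d_prime(0.6167, 0.2162), 2))
```

A d' near zero, as reported for the C- trials, means the response rate to those changes was essentially the same as the spontaneous head-turn rate on no-change trials.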
In conclusion, the infants appeared to be more efficient in this experiment than in the first one, since they responded mainly to the C+
trials, whereas the C- trials provoked fewer responses, even when they comprised the A3 sequence from Schubert's piece.
The results of this experiment are more readily interpretable in terms of categorization than those of the first experiment. Indeed, the
infants neglected the variations of the standard sequence and responded to the variations of the comparison sequence. It could be concluded
that they detected the feature common to the sequences - the cue - and responded to the sequences that shared the same cue when it was
reinforced by the cartoon.
One could argue against the categorization hypothesis from the non-significant d' for the C- trials: it could mean that the infants simply
could not discriminate between the parent motif and its variations because they could not perceive the differences between items of the same
category. The answer to this argument is a matter of empirical evidence, and that experiment remains to be done. In the meantime, our results provide
several pieces of indirect support for the categorization hypothesis. In the first experiment, a d' significantly different from zero was observed for the
C- trials of the first condition. Yet in this condition the C- trials were made up of variations of parent A of Schubert's piece. This revealed, therefore, that
infants did actually discriminate the variations of the Standard sequence from the Standard sequence itself. In the third experiment, when
Schubert served as the Standard category, infants categorized each sequence appropriately. However, for the second condition using
Diabelli's sequences as Standard category, it cannot be concluded that the infants categorised as well, i.e. that they perceived each member of the
Standard category as a distinct exemplar of that category. Nevertheless, we can suppose this was really the case, since there was no difference
between the two conditions for the d' associated with the C- trials. If the infants categorized Schubert's sequences, there is no reason to conclude
they did not categorize Diabelli's sequences.
General discussion
References
Behl-Chadha, G. (1996). Basic-level and superordinate-like categorical representations in early infancy. Cognition, 60, 105-141.
Clarkson, M. G., & Clifton, R. K. (1985). Infant pitch perception: Evidence for responding to pitch categories and the missing fundamental.
Journal of the Acoustical Society of America, 77, 1521-1528.
Deliège, I. (1987). Le parallélisme, support d'une analyse auditive de la musique : vers un modèle des parcours cognitifs de
l'information musicale. Analyse Musicale, 6, 73-79.
Deliège, I. (1996). Cue abstraction as a component of categorization processes in music listening. Psychology of Music, 24, 131-156.
Deliège, I., & Mélen, M. (1997). Cue abstraction in the representation of musical form. In I. Deliège & J. Sloboda (Eds.), Perception
and Cognition of Music (pp. 387-412). Hove: Psychology Press.
Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Pictures and names: Making the connection. Cognitive Psychology, 16, 243-275.
Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories,
monkeys do not. Perception & Psychophysics, 50, 93-107.
Lécuyer, R., & Poirier, C. (1994). Categorization in the five-month-old infants. Cahiers de Psychologie Cognitive, 13(1), 193-509.
Mandler, J. M., Bauer, P., & McDonough, L. (1993). Separating the sheep from the goats: Differentiating global categories. Cognitive
Psychology, 23, 263-298.
Mélen, M. (1997). Les principes du même et du différent comme organisateurs du groupement rythmique chez les nourrissons de six à
dix mois. Thèse de doctorat en psychologie non publiée, Université de Liège, Liège.
Mélen, M. (1999a). Les principes du même et du différent comme organisateurs du groupement rythmique chez le nourrisson:
Arguments théoriques. Musicae Scientiae, 3(1), 41-66.
Mélen, M. (1999b). Les principes du même et du différent comme organisateurs du groupement rythmique chez le nourrisson: Une
étude empirique. Musicae Scientiae, 3(2), 161-191.
Mervis, C. B. (1987). Child-basic object categories and early development. In U. Neisser (Ed.), Concept and conceptual development
(pp. 201-233). Cambridge (UK): Cambridge University Press.
Quinn, P. C. (1987). The categorical representation of visual pattern information by young infants. Cognition, 27, 145-179.
Quinn, P. C. (1998). Object and spatial categorization in young infants: "what" and "where" in early visual perception. In A. Slater
(Ed.), Perceptual development: Visual, auditory, and speech perception in infancy (pp. 131-165). Hove: Psychology Press.
Proceedings paper
Music and Spatial-Temporal Reasoning: The most recent and robust evidence suggests that musical training seems to
have the greatest effects on other cognitive abilities, specifically children's spatial-temporal skills, with no effects on
general intellectual abilities or on spatial abilities. Longitudinal studies have shown that training is most effective if on
a keyboard instrument, for a period of more than 2 years, and begun before the age of 7 (Rauscher & Zupan, in press;
Rauscher, 2000). The cortical model proposed by Leng & Shaw (1991) and adopted by Rauscher and colleagues
ascribes these transfer effects to early brain plasticity. This implies that children older than 7 cannot be subject to any
further improvements in spatial-temporal reasoning as a result of musical training.
Yet studies with children aged 10-11 seem to result in similar enhancements in spatial-temporal reasoning
(Costa-Giomi, 1997). Research with adults also indicates that differences in spatial-temporal abilities can be explained
by later aspects of musical engagement. These include current engagement in musical activities (Plumb & Cross, 2000)
or having studied music theory alongside practical musical training (Lamont, 2000). These studies also indicate that
studying science subjects either at age 16-18 or at university can be related to spatial-temporal improvements for
university students. These findings highlight the need to undertake research in a more contextually-grounded manner,
spanning a wide age-range, and to include data on a wide range of non-musical abilities and experiences, in order to
further explore the assumed relationship between musical training and specific cognitive abilities.
Music and Achievement. Folk beliefs about the effects of music on academic achievement are prevalent (cf. Sloboda &
Davidson, 1996). Music instruction during childhood does not itself seem to result in straightforward improvements in
achievement in school-leaving examinations; university students with more years of music training tend to have higher
levels of academic achievement, yet gender and parental educational achievement are more powerful predictors of
differences in students' own levels of academic achievement (Lamont, 2000). Nonetheless, it is typically those children
with higher levels of academic achievement who are selected for music instruction, and thus this issue is carefully
examined here as an important component in the network of real-world relationships involving music and achievement.
Musical Representations: Studies indicate that whilst music instruction is important for developing higher-order
musical representations, musical experiences alone can also lead to sophisticated abstract representations of music
(Lamont, 1998a, 1998c). The current project extends these ideas to questions outside the musical field. Intervention
studies involving music instruction do not account for children's motivations to engage in musical activities (cf. O'Neill
& Sloboda, 1996), and it is likely that within the experimental groups studied there is also a diversity of levels of
understanding music. It is suggested that measuring musical representations, rather than measuring or providing music
instruction, may give a more accurate picture of the extent to which music has an impact on children's musical thinking
in a real-world context. It is proposed that whilst music instruction may lead to changes in cognition in other domains,
this may be mediated by an intervening variable of musical representation or understanding (that may develop without
formal music instruction), which therefore must be included in any systematic study of the effects of music on
cognition.
Musical Opportunities: Whilst all children between 5 and 14 years participate in compulsory school music education in
the UK, the opportunities for music instruction or participation in musical activities vary greatly, and often depend on
financial support from parents (ABRSM, 1994; Sharp, 1991). National surveys also illustrate a diversity in the kinds of
musical opportunities provided by the different local education authorities in the UK (Cleave & Dust, 1989).
Small-scale studies show that children's involvement with music instruction is strongly influenced by parental
encouragement and the qualities of music teachers (Sloboda & Davidson, 1996) in addition to children's own
motivations (O'Neill & Sloboda, 1996). Finally, university students from higher socio-economic categories who do
embark on musical training during childhood are more likely to continue with their training for longer periods of time
(Lamont, 2000). These diverse findings point to the need to include a broad range of variables relating to the children's
own and their families' engagement with music and the effects of school culture upon these opportunities, alongside a
consideration of socio-economic status, in any rigorous study of children's musical and cognitive development.
Significance: This research proposes a more complex theoretical model of the network of relationships between
children's musical and non-musical abilities and achievements (Figure 1 below), and provides a pilot test of the model
to assess the impact that music can and does have on children's thinking, in a highly ecologically valid context.
It is argued that the simple relationship between musical training and spatial-temporal abilities found in previous
research may be complicated by a number of hidden mediating variables. Improvements in spatial-temporal abilities
may be due to musical training, musical experiences, or academic achievement in science. Further, parental
socio-economic status and parental educational achievement may also play a large role in influencing many of the
variables shown below (particularly in terms of take-up of music instruction and children's academic achievements).
Finally, children's real musical experiences and understandings are carefully recorded and assessed (rather than
randomly assigning children to 'music' and 'no-music' conditions), and the variables outlined in Figure 1 are
considered as interrelated and integrated aspects of children's thinking.
Figure 1
(Single arrows indicate unidirectional effects. Double-headed arrows indicate hypothesised bi-directional relationships.
Question marks indicate hypotheses that are not yet proven; the absence of question marks indicates accepted causal
relationships.)
Aims
This research studies children's musical experiences in an ecologically valid context, incorporating real-world
preferences and opportunities into the design, in order to explore the relationships between children's musical and
non-musical abilities and achievements. The project has two main research questions:
1) What is the relationship between children's musical experiences (formal music instruction, informal
musical experiences, and musical representations) and their academic achievements (English, maths and
science)?
2) What is the relationship between children's musical experiences (formal music instruction, informal
musical experiences, and musical representations) and their non-musical cognitive abilities (general,
spatial, and spatial-temporal)?
These two questions make no assumptions about simple or directional relationships, instead admitting the possibility
that improvements in either area may result in improvements in the other. For example, children with high levels of
academic achievement may be more likely to choose to and be allowed to participate in musical activities (particularly
music instruction), which are often seen as 'optional extras' that might be distracting for academically less able
children. Similarly, children who excel at specific academic subjects (notably mathematics and science) may also be
those who are able to or choose to achieve high levels of musical expertise. Both questions are contextualised by a
broad range of background biographical data concerning children's actual take-up of musical opportunities, aspects of
their home and school lives in relation to music, family involvement with music, parental educational achievement and
socio-economic status.
Method
Children from every year-group aged between 5 and 18 were drawn from different schools in the Midlands of England.
Data was collected in two stages.
Stage 1: This part of the study focused on the relationship between musical experiences and academic achievement
(Question 1 above). Data was gathered via questionnaires from both children and their parents regarding their
experience of formal musical training, informal musical activities, music in the home, and individual and institutional
preferences, opportunities and constraints on children's participation in music. Data was also collected on parents'
occupations and parental levels of education attained. Children's academic achievement was measured from school
records, Statutory Assessment Tests (taken by UK school children at age 7, 11, and 14 in English, mathematics and
science), and public examinations (taken at age 16 and 18) as appropriate. Class teachers, music teachers, and heads of
schools were also interviewed to explore the structure of musical opportunities afforded to the children and to shed
light on any unusual circumstances (either musical or academic).
Stage 2: This part of the study focused on the relationship between musical experiences and non-musical cognitive
abilities (Question 2). It comprised a series of cognitive tests, including group measures of musical representation (as
developed by Lamont, 1998c), measures of spatial-temporal and spatial reasoning (following Rauscher et al., 1997),
and measures of general analytic intelligence (Raven's Standard Progressive Matrices). Children's responses in Stage 2
were contextualised by the data gathered in Stage 1.
Results
Analysis of results is currently being undertaken and will be presented at the conference. The strategy for analysis will:
• explore the relationship between children's musical and non-musical abilities and achievements for each year
group. This will enable the effects of musical experiences and achievements to be more carefully assessed on a
range of cognitive abilities and achievements, and comparisons in both fields to be made on the basis of
children's age, amount and type of musical experiences.
• evaluate the contribution that thus-far neglected variables such as socio-economic status and musical
representations make to the profile of abilities and achievements. This will provide a preliminary test of the
model outlined above in terms of mediating factors between musical training and enhanced spatial-temporal
reasoning.
• focus on comparisons across different schools to assess the impact that music provision has on children's
musical and non-musical abilities. This will enable a consideration of how far children's socio-cultural context
affects their musical development in terms of preferences, attitudes, and achievements.
The results will be important in providing a preliminary empirical test of the model proposed above. This will set the
groundwork for a broader longitudinal study involving a larger sample of children which will trace the development of
children's musical experiences and non-musical cognitive abilities over time.
References
ABRSM (1994). Making Music: The Associated Board Review of the Teaching, Learning and Playing of
Musical Instruments in the United Kingdom. London: Associated Board of the Royal Schools of Music.
Cleave, S. & Dust, K. (1989). A Sound Start: The Schools Instrumental Music Service. Windsor: NFER-Nelson.
Costa-Giomi, E. (1997). The McGill Piano Project: Effects of piano instruction on children's cognitive abilities.
Proceedings of the Third Triennial ESCOM Conference, Uppsala University, Sweden, 446-450.
Hallam, S. & Price, J. (1998). Can the use of background music improve the behaviour and academic
performance of children with emotional and behavioural difficulties? British Journal of Special Education,
25(2), 88-91.
Lamont, A. (1998a). Music, Education, and the Development of Pitch Perception: The Role of Context, Age,
and Musical Experience. Psychology of Music, 26, 7-25.
Lamont, A. (1998b). Response to Katie Overy's Paper, "Can Music Really Improve the Mind?" Psychology of
Music, 26 (2), 201-204.
Lamont, A. (1998c). The Development of Cognitive Representations of Musical Pitch. Unpublished PhD
Dissertation, University of Cambridge.
Lamont, A. (2000). University Students' Musical Experiences, Musical Representations, and Cognitive
Abilities. Paper presented to the SPRMME conference on The Effects of Music, University of Leicester, April.
Leng, X. & Shaw, G.L. (1991). Toward a Neural Theory of Higher Brain Function Using Music as a Window.
Concepts in Neuroscience, 2(2), 229-258.
O'Neill, S.A. & Sloboda, J.A. (1997). The Effects of Failure on Children's Ability to Perform a Musical Test.
Psychology of Music, 25, 18-34.
Overy, K. (1998). Can Music Really Improve the Mind? Psychology of Music, 26(1), 97-99.
Plumb, J. & Cross, I. (2000). A generalised effect of music education. Paper presented to the SPRMME
conference on The Effects of Music, University of Leicester, April.
Rauscher, F.H. (2000). Musical Influences on Spatial Reasoning: Experimental Evidence for the "Mozart
Effect". Paper presented to the SPRMME conference on The Effects of Music, University of Leicester, April.
Rauscher, F.H. & Zupan, M.A. (in press). Classroom Keyboard Instruction Improves Kindergarten Children's
Spatial-temporal Performance: A Field Experiment. Early Childhood Research Quarterly.
Rauscher, F.H., Shaw, G.L. & Ky, K.N. (1993). Music and spatial task performance. Nature, 365, 611.
Rauscher, F.H., Shaw, G.L., Levine, L.J., Wright, E.L., Dennis, W.R. & Newcomb, R.L. (1997). Music training
causes long-term enhancement of preschool children's spatial-temporal reasoning. Neurological Research, 19,
2-8.
Rauscher, F., Spychiger, M., Lamont, A., Mills, J., Waters, A., & Gruhn, W. (1998). Responses to Katie
Overy's paper, 'Can music really 'improve' the mind?' Psychology of Music, 26(2), 197-210.
Sloboda, J.A. & Davidson, J.W. (1996). The young performing musician. In: I. Deliège & J. Sloboda (Eds.),
Musical Beginnings: Origins and Development of Musical Competence, Oxford: Oxford University Press,
171-190.
Sharp, C. (1991). When every note counts: The schools' instrumental music service in the 1990s. Slough:
National Foundation for Educational Research.
Sloboda, J.A., Davidson, J.W., Howe, M.J.A & Moore, D.G. (1996). The role of practice in the development of
expert musical performance. British Journal of Psychology, 87, 287-309.
Weinberger, N.M. (1999). Can Music Really Improve the Mind? The Question of Transfer Effects. MuSICA Research Notes.
Weinberger, N.M. (2000). "The Mozart Effect": A Small Part of the Big Picture. MuSICA Research Notes, VII,
I, Winter.
Proceedings abstract
Sofia Dahl
sofia@speech.kth.se
Background:
Much attention has been paid to expressive timing in music performance but
relatively little of this research work has been devoted to percussionists.
Like all musicians, a percussionist will interpret the score, giving more
emphasis to certain notes. In percussion playing, the principal ways of giving a note
more emphasis are changes in duration and dynamic level. It seems reasonable to
assume that these two performance parameters are used more distinctively in
percussion playing than for most other instruments. This fact ought to be
clearly reflected even in the performance of a simple drumming task.
Aims:
This study investigates the timing in three subjects during the performance of
a simple drumming task.
Method:
Three professional drummers were recorded playing repeated single strokes with
interleaved accents every fourth note in three different tempi and at three
different dynamic levels. The separation of the strokes in time, the inter-onset intervals (IOIs), was
analysed, and the players' timing performances were compared across the different conditions.
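The IOI measure can be made concrete with a short sketch. The onset times below are hypothetical, not the recorded data; the coefficient of variation is one common (assumed, not stated in the abstract) index of timing stability.

```python
import numpy as np

def inter_onset_intervals(onsets_ms):
    """IOIs are the differences between successive stroke onset times."""
    return np.diff(onsets_ms)

# Hypothetical onset times (ms) for strokes played at a nominal 250 ms IOI
onsets = np.array([0.0, 252.0, 498.0, 751.0, 1000.0])
iois = inter_onset_intervals(onsets)
mean_ioi = iois.mean()
cv = iois.std() / mean_ioi  # coefficient of variation: timing variability
```

Comparing such mean IOIs and variability measures across tempo and dynamic-level conditions is the kind of analysis the abstract describes.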
Results:
Conclusions:
Proceedings paper
Background
Absolute pitch (AP; sometimes also "perfect pitch" or "positive pitch") is the ability to identify or produce tonal pitches
without reference to an external standard (Takeuchi & Hulse, 1993). This auditory long-term memory for pitches seems
to be based on the perceptual coding of tone chromata, is reported to have a low prevalence in humans (less than
1 in 10,000, at least in its "perfect" form), and has been subject to nature/nurture debates for more than a century (cf.
Stumpf, 1883; Meyer, 1899; Bachem, 1937, 1940; Miyazaki, 1988; Takeuchi & Hulse, 1993; Chin, 1997; for a recent
synopsis see Ward, 1999). Considering neurophysiological evidence, however, the principle of tonotopic organization
throughout the auditory projection pathway suggests that "absolute" information about pitches should be available in
every human's primary auditory cortex (see, e.g., Romani, Williamson, Kaufman & Brenner, 1982; Pantev, Bertrand,
Eulitz, Verkindt, Hampson, Schuierer & Elbert, 1995; Pantev, Oostenveld, Engelien, Ross, Roberts & Hoke, 1998).
Accordingly, it has been proposed that non-possessors of AP may also have access to "latent" long-term pitch
representations in auditory memory (e.g., for the pitches of well-known tunes): Long-term memory for musical keys in
spontaneous pitch production (Halpern, 1989; Levitin, 1994) or pitch recognition (Terhardt & Ward, 1982; Terhardt &
Seewann, 1983), i.e., active and passive "absolute tonality," can be understood as weakened forms of AP.
Aims
Using the 12 major key preludes from Johann Sebastian Bach's Well-Tempered Clavier, Terhardt & Ward (1982) and
Terhardt & Seewann (1983) showed that musically literate subjects without AP performed above chance in
discriminating original keys even from one-semitone transpositions. However, these experiments did not reliably exclude
short-term memory judgments based on tone intervals (i.e., relative pitch cues). We aimed to scrutinize the finding of
musical key recognition in non-AP possessors by using a 24-hour inter-stimulus interval for rigorous short-term
memory interference (Hall, 1982) and updated technical means ("identical replication" by digital transposition).
Study Design
We presented 52 students without manifest AP with the first prelude in C major from the Well-Tempered Clavier, either
in the nominal key or digitally transposed to C-sharp, and tested their ability to discriminate between these two keys.
Each condition was presented seven times in a random sequence of 14 trials, one trial per day. As the two versions were
identical except for their pitch difference of one semitone, subjects without AP were expected not to achieve a
discrimination rate above chance level (7 correct judgments or 50%).
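The chance level for this design can be made concrete with a binomial sketch (an illustration, not part of the study's analysis): with 14 independent trials and a guessing probability of .5, one can compute how often a given score would arise by guessing alone.

```python
from scipy.stats import binom

N_TRIALS, P_CHANCE = 14, 0.5

def p_at_least(k):
    """Probability of k or more correct judgments out of 14 by guessing alone."""
    return binom.sf(k - 1, N_TRIALS, P_CHANCE)
```

For example, `p_at_least(8)` falls below .5, so scores of 8 or more already exceed the expected chance performance of 7 correct judgments.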
Testing Material and Procedure
A digital recording of the C major prelude (BWV 846) was duplicated in C-sharp on an electronic piano (Yamaha
Clavinova, CLP-840); both versions were recorded with a Digital Audio Tape Deck (Pioneer D-05). N = 52
non-AP-possessors (mostly students aged 17-18), who declared themselves familiar with the piece, were tested in single
sessions, one trial per day only, on 14 subsequent days. Full-length recordings (2'13") were presented in random order, 7
times each, via stereo headphones. Unlike in the Terhardt studies, no written musical material (score of the prelude)
was provided. While or after listening to the piece, participants had to judge which version they had actually heard and to
mark their dichotomous decision on a response sheet. No feedback was given.
Results
Even with our rigorous testing mode, participants clearly outperformed chance (see Figure 1, left panel). With a mean hit
rate of 8.2 (59%, SD = 1.8 or 13%), the close-to-normal score distribution is significantly shifted to the right compared to
a chance distribution (one-sample t-test, p < .001, effect size δ = 0.7). Performance was still slightly better (M = 8.7 or
62% hits, SD = 1.5 or 11%; between-groups p = .017) in participants with piano playing experience (Figure 1, right
panel). Except for this familiarity / musical expertise effect, we found no other moderating effects, such as training
effects (trials 1-7 vs. 8-14, p = .542), or systematic choice preferences (e.g., for the "white" key; Takeuchi & Hulse,
1991).
Conclusion
The small, but stable effect that we found points to the existence of a rudimentary ability for absolute pitch recognition.
There is increasing evidence about "latent" forms of AP being more widespread, at least among individuals with some
musical pre-experience, than traditionally assumed. Thus, it seems more adequate to adopt a continuum view of AP
instead of maintaining a discrete distinction between "possessors" vs. "non-possessors."
References
Bachem, A. (1937). Various types of absolute pitch. Journal of the Acoustical Society of America, 9, 146-151.
Bachem, A. (1940). The genesis of absolute pitch. Journal of the Acoustical Society of America, 11, 434-439.
Chin, C. S. (1997). The development of absolute pitch. In A. Gabrielsson (ed.), Proceedings of the Third Triennial
ESCOM Conference (pp. 105-110). Uppsala, Sweden: Uppsala University, Dept. of Psychology.
Hall, D. E. (1982). "Practically perfect pitch:" Some comments. Journal of the Acoustical Society of America, 71,
754-755.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572-581.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies.
Perception & Psychophysics, 56, 414-423.
Meyer, M. (1899). Is the memory of absolute pitch capable of development by training? Psychological Review, 6,
514-516.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Perception & Psychophysics, 44,
501-512.
Pantev, C., Bertrand, O., Eulitz, C., Verkindt, C., Hampson, S., Schuierer, G., & Elbert, T. (1995). Specific
tonotopic organization of different areas of the human auditory cortex revealed by simultaneous magnetic and
electric recordings. Electroencephalography and clinical Neurophysiology, 94, 26-40.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical
representation in musicians. Nature, 392, 811-814.
Romani, G. L., Williamson, S. J., Kaufman, L., & Brenner, D. (1982). Characterization of the human auditory
cortex by the neuromagnetic method. Experimental Brain Research, 47, 381-393.
Stumpf, C. (1883). Tonpsychologie (Vol. 1). Leipzig: Hirzel.
Takeuchi, A. H., & Hulse, S. H. (1991). Absolute pitch judgements of black- and white-key pitches. Music
Perception, 9, 27-46.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin, 113, 345-361.
Terhardt, E., & Seewann, M. (1983). Aural key identification and its relationship to absolute pitch. Music
Perception, 1, 63-83.
Terhardt, E., & Ward, W. D. (1982). Recognition of musical key: Exploratory study. Journal of the Acoustical
Society of America, 72, 26-33.
Ward, W. D. (1999). Absolute pitch. In D. Deutsch (ed.), The Psychology of Music (2nd ed., pp. 265-298). San
Diego: Academic Press.
Proceedings paper
Introduction
Since vocal pedagogues cannot directly view the vocal mechanism, they rely on perceptual cues to
help them determine an individual's voice classification. Traditionally, voice classification has been
based on three perceptual parameters: range, timbre, and tessitura (Vennard, 1967); however, these
parameters are poorly defined and the interrelations between them are unknown.
To date, no research has been conducted that examines the interrelationship of pitch, tessitura, and
timbre as predictors of voice classification. Most research studies have focused on the acoustic
correlates of one parameter, timbre.
The accepted definition of timbre is as follows: two tones are of different timbre if they are judged to
be dissimilar and yet have the same loudness and pitch (ANSI, 1973). To define timbre for the vocal
instrument, an additional restriction is required. Not only must the two sounds be of the same pitch
and loudness, they must also be of the same vowel quality. Using such a definition, a singer would
have an individual timbre for each pitch-vowel combination. Yet this is not how vocal timbre has
been treated traditionally. Cleveland (1977) states that an individual singer has a characteristic timbre
that is a function of the laryngeal source and vocal tract resonances. Singers with similar timbres,
then, constitute members of the same voice timbre type or voice category. It is possible, however, that
any two voices may be perceived as having similar timbre on one pitch-vowel combination and
dissimilar timbre on another. In this case, each voice possesses a set of timbres. It may not be possible
to devise one simple acoustic measure that can accurately classify voice timbre types.
Research has shown a correlation between timbre type classification and average formant frequency,
with basses having lower formant frequencies than tenors (Cleveland, 1977) and sopranos having the
highest formant frequencies (Dmitriev & Kiselev, 1979). Since vocal tract length is directly related to
formant frequency, it is believed that this physical attribute contributes to voice quality. Yet when the
data provided by Dmitriev and Kiselev are examined closely, there is some evidence that vocal tract
length may not be related to voice classification in females. They observed distinctly different and
increasingly shorter vocal tract lengths for the voice categories of bass, baritone, and tenor,
respectively. However, vocal tract lengths for the voice categories of mezzo-soprano and soprano
were nearly identical. Likewise, the center frequency of the fourth formant decreased and showed
little overlap for basses, baritones, and tenors, respectively. Yet a great deal of overlap in fourth
formant frequency was observed for mezzo-sopranos and sopranos. These data suggest that the
acoustic correlate of vocal tract length, formant frequency, may be a perceptual cue used to assess
voice classification in males, but does not appear sufficient to differentiate the traditional voice
categories in females.
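The inverse relation between vocal tract length and formant frequency discussed above follows from the standard uniform closed-open tube approximation, F_n = (2n - 1)c / 4L. The sketch below uses hypothetical tract lengths purely to show the direction of the effect; real vocal tracts are, of course, not uniform tubes.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air

def tube_formants(tract_length_cm, n_formants=4):
    """Resonances of a uniform tube closed at one end: F_n = (2n-1)c/4L."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * tract_length_cm)
            for n in range(1, n_formants + 1)]

longer = tube_formants(18.0)   # hypothetical longer (bass-like) tract
shorter = tube_formants(16.0)  # hypothetical shorter (tenor-like) tract
```

Every formant of the longer tract lies below the corresponding formant of the shorter one, which is the pattern Cleveland and Dmitriev & Kiselev report for male voice categories.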
Acoustic and temporal cues have been shown to be important in the perception of speech and musical
instrument timbre. Every vibrating body has natural resonating frequencies. The natural resonating
frequencies of the vocal mechanism are known as formants and are known to influence timbre
perception. Additionally, four temporal parameters may be of importance in the perception of vocal
timbre: onset (e.g., Darwin, 1981; Grey & Gordon, 1978), vibrato rate and extent (McAdams &
Rodet, 1988), and spectral variation. In cases where steady-state spectral information is not sufficient
for timbre perception, such as instances where fundamental frequency is high and harmonics are
therefore widely spaced, these cues may provide additional acoustic information.
This paper attempts to investigate the perceptual validity of singing voice classification systems as
they relate to two parameters, pitch and timbre. Previous studies (e.g., Cleveland, 1977) have
examined the perception of voice classification using forced-choice paradigms based on the traditional
voice classification system of bass, baritone, tenor, alto, mezzo-soprano, and soprano. Such perceptual
experiments provide information as to how listeners place stimuli when provided with arbitrary
classification categories. They do not provide information on the perceptual validity of the categories.
The question that begs answering is this: if provided with no classification system a priori, how do
listeners tend to group vocal stimuli? Do they group them in a manner that supports current
classification systems? When grouping vocal stimuli, is the perception of timbre truly independent of
pitch?
This study employed two research paradigms in order to examine the perceptual dimensions of
classical voice classification. First, multidimensional scaling procedures were used to discover the
dimensions underlying vocal timbre. Second, an "oddball" paradigm was used to assess whether
timbre, independent of pitch, can be used as a perceptual cue to group vocal stimuli into traditional
voice categories.
Method
Stimuli
Master's level singers from the Department of Music at the University of Tennessee, Knoxville
provided stimuli for the experiment. These subjects met the following criteria:
1. Bilateral hearing within normal limits as determined by a 20 dB hearing screening;
2. Voice study at the Master degree level or higher;
3. No voice problems at the time of taping as determined by a certified speech-language
pathologist.
Two singers from each voice classification, mezzo-soprano and soprano, were recorded singing the
vowel /Y/ on six different pitches, A3, C4, G4, B4, F5, and A5, at a constant loudness level. The
resulting 24 stimuli were used in two perceptual experiments.
Recordings were made in a single-walled sound booth (Acoustic Systems RE-144-S). Subjects were
recorded while producing a sustained /Y/ for 5 seconds using a digital audio tape recorder (Sony
PCMR500) and a Sennheiser MD 441-U microphone. Subjects stood in the center of the booth. Lip to
microphone distance was 12 inches. A keyboard was used to present pitches. Prior to taping, subjects
were allowed to vocalize freely and become comfortable with the recording environment.
Listeners
All listeners in this study were experienced vocal professionals. Listeners were recruited from the
Knoxville Choral Society and the Knoxville Opera Company. Twelve listeners were recruited who met
the following criteria:
1. Bilateral hearing within normal limits as determined by a 20 dB hearing screening;
2. Bachelor's degree or higher in a vocal arts related discipline (e.g., pedagogy, performance, or
choral conducting) or 5 years' experience in a vocal arts discipline.
Procedure
Two listening experiments were conducted. Both experiments took place in a single-walled sound
booth (Acoustic Systems RE-144-S). Stimuli were presented binaurally through Sennheiser
headphones. Listeners entered responses using a computer monitor and a mouse.
Experiment 1
Experiment 1 utilized a dissimilarity paradigm. From the 24 vocal stimuli, all possible combinations
of two stimuli (A and B) were constructed, resulting in 276 paired trials. Within each trial, stimuli
were randomly assigned as A or B. Trials were presented in randomized order in an ABBA format to
12 experienced listeners.
Subjects were asked to rate each pair on a dissimilarity scale (0-10). This was accomplished through
use of a horizontal scroll bar presented via a computer monitor. The left side of the scroll bar
indicated 0 while the right side of the scroll bar indicated 10. Subjects were instructed that they should
rate the two stimuli as 0 if they were identical and 10 if they were very different. Subjects were told
that they could use any measure between 0 and 10 to indicate the degree of dissimilarity. Subjects
were told not to use pitch as a factor in their ratings. All subjects were presented with test stimuli so
that they could become familiar with the computer interface and the task.
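The trial construction described above amounts to enumerating every unordered pair of the 24 stimuli. A minimal sketch in Python (the singer and pitch labels are illustrative placeholders, not the study's actual recordings):

```python
import itertools
import random

# 24 stimuli: 4 singers (2 mezzo-sopranos, 2 sopranos) x 6 pitches
singers = ["mezzo1", "mezzo2", "sop1", "sop2"]
pitches = ["A3", "C4", "G4", "B4", "F5", "A5"]
stimuli = [(s, p) for s in singers for p in pitches]
assert len(stimuli) == 24

# All unordered pairs: C(24, 2) = 276 paired trials
pairs = list(itertools.combinations(stimuli, 2))
assert len(pairs) == 276

# Within each trial, randomly assign the two stimuli as A or B,
# then present the trials in randomized order
rng = random.Random(0)
trials = [rng.sample(pair, 2) for pair in pairs]
rng.shuffle(trials)
```

With 24 stimuli the pair count follows directly from 24 × 23 / 2 = 276, matching the number of trials reported.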
Experiment 2
For Experiment 2, trials of three stimuli were constructed, using an oddball paradigm. In this version
of the oddball paradigm, two of the three stimuli in each trial were produced by the same singer at two
different pitches (X1 and X2), while the third stimulus was produced by a different singer (Y). For
each singer, three same-singer conditions were constructed: one pairing the pitches G4 and B4 (XG4
and XB4), a second pairing the pitches C4 and F5 (XC4 and XF5), and a third pairing the pitches A3
and A5 (XA3 and XA5). For each singer and each condition, the "odd" stimulus (Y) was varied across
the three remaining singers and across the pitches A3, C4, G4, B4, F5, and A5. This design created
216 trials based on 4 singers times 3 conditions times 3 singer-pairs times 6 pitches. For each trial,
stimulus order was randomized. The resulting 216 trials were presented in random order to 12
experienced listeners.
Prior to stimulus presentation, listeners were told that two of the three stimuli in each trial were
produced by the same person and that they were to choose the stimulus produced by the different
person. Listeners were allowed to replay each trial as many times as they needed. Listener judgments
were recorded via a computer interface.
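The factorial structure of the oddball design can be checked by direct enumeration. The sketch below uses hypothetical singer labels and reproduces only the design counts, not the actual stimuli:

```python
import itertools
import random

singers = ["mezzo1", "mezzo2", "sop1", "sop2"]
pitches = ["A3", "C4", "G4", "B4", "F5", "A5"]
# The three same-singer pitch pairings (X1, X2)
conditions = [("G4", "B4"), ("C4", "F5"), ("A3", "A5")]

trials = []
for x in singers:                 # singer providing both X1 and X2
    for x1, x2 in conditions:     # same-singer pitch pairing
        for y in singers:         # "odd" singer Y, varied across the rest
            if y == x:
                continue
            for yp in pitches:    # pitch of the oddball stimulus
                trials.append([(x, x1), (x, x2), (y, yp)])

# 4 singers x 3 conditions x 3 singer-pairs x 6 pitches = 216 trials
assert len(trials) == 216

# Stimulus order within each trial is randomized, as is trial order
rng = random.Random(0)
trials = [rng.sample(t, 3) for t in trials]
rng.shuffle(trials)
```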
Results
Experiment 1
Distance measures obtained from Experiment 1 were subjected to multi-dimensional scaling analysis
(MDS). The optimal MDS solution was found in 3 dimensions. Fit measures for the solution were as
follows: Stress = .18 and R2 = .80.
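The scaling step can be illustrated with classical (Torgerson) MDS, a simpler metric relative of the scaling procedure reported here; the dissimilarity matrix below is random placeholder data, not the listeners' actual ratings:

```python
import numpy as np

def classical_mds(d, k=3):
    """Classical (Torgerson) MDS: embed an n x n symmetric
    dissimilarity matrix d into k dimensions via double centering."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)           # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]         # take the top k
    vals = np.clip(vals[idx], 0, None)       # guard against negatives
    return vecs[:, idx] * np.sqrt(vals)      # n x k coordinates

# Placeholder: a symmetric 24 x 24 dissimilarity matrix, zero diagonal
rng = np.random.default_rng(0)
r = rng.uniform(0, 10, (24, 24))
d = np.triu(r, 1) + np.triu(r, 1).T
coords = classical_mds(d, k=3)
assert coords.shape == (24, 3)
```

In practice the averaged 24 x 24 dissimilarity ratings would take the place of the random matrix, and fit would be assessed with a stress measure as in the solution reported above.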
Only the first two dimensions will be discussed in this paper. Dimension 1 was highly correlated with
pitch (R = .83). Dimension 2 appears related to voice category and was moderately correlated with F2
through F4 frequency (R = .73). Dimensions 1 and 2 for all four singers are presented in Figure 1.
Figure 2 displays mean values for dimensions 1 and 2 calculated for each voice category, mezzo
soprano and soprano.
Figure 1. MDS dimensions for Experiment 1
In general, experienced listeners rated same-pitch stimulus pairs from the same voice category as
more similar than they did same-pitch stimulus pairs from different voice categories. However, at the
pitches F5 and A5, listeners rated all stimulus pairs as very similar, regardless of voice category.
Pitch was a primary factor in all judgments. Within each singer, stimulus pairs with larger pitch
differences generally were perceived as being more dissimilar than those with smaller pitch
differences.
Experiment 2
For each X1X2 pair (XG4XB4, XC4XF5, and XA3XA5), the percent correct identification of the oddball
stimulus (Y) was calculated as a function of pitch for two comparisons: Y in the same voice category
as X1X2 and Y in a different voice category than X1X2.
Results for each X1X2 pair are presented in Figure 3. The plot labeled "Y in Same Voice Category as
X1X2" provides a graphic representation of the ability to discriminate differences within the same
voice category. Conversely, the plot labeled "Y in Different Voice Category than X1X2" provides a
graphic representation of the ability to discriminate differences between voice categories.
Figure 3. Percent correct identification of Y stimulus as a function of pitch for all three X1X2
conditions.
XG4XB4 Condition
In the condition XG4XB4, experienced listeners were able to select the "oddball" stimulus with a high
degree of accuracy. Y stimuli from a different voice category than X1X2 were accurately identified
regardless of pitch. However, when Y stimuli were from the same voice category as X1X2, accuracy
of Y stimulus identification decreased as the distance between Y stimulus pitch and XB4 decreased,
dropping from 100% accuracy to approximately 70% accuracy.
XC4XF5 Condition
For the condition XC4XF5, experienced listeners were less able to select the "oddball" stimulus than
they were in the XG4XB4 condition. Y stimuli from a different voice category than X1X2 were
accurately identified more often than those from the same voice category as X1X2. Accuracy levels
for this comparison ranged between 40% and 55%. Y stimuli from the same voice category as X1X2
generally were identified at or below chance levels. Unlike in the XG4XB4 condition, correct
identification of the Y stimulus did not decrease as the distance between Y stimulus pitch and either
X1 or X2 decreased. In fact, for Y stimuli in the same voice category as X1X2, peak percent correct
scores were achieved when Y pitch equaled either XC4 or XF5.
XA3XA5 Condition
Listeners identified the "oddball" stimulus least accurately in the XA3XA5 condition. While Y was
more accurately identified when in a different voice category than X1X2 than when in the same
category as X1X2, accuracy levels generally were at chance or less. For both comparisons, the greatest
accuracy was achieved when Y pitch equaled XA3, when Y pitch equaled XA5, or when Y pitch was
nearly midway between XA3 and XA5. In all other conditions, accuracy levels were far less than
chance.
Discussion
Results of Experiment 1 suggest that the following three factors affect the perception of dissimilarity:
voice category, pitch difference, and high pitch. In general, listeners found voices in
different categories to be less similar than those in the same category. Listeners also rated
stimulus pairs as more similar when they were closer in pitch. Finally, listeners generally found all
stimuli to be highly similar at high pitches. Given these findings, several predictions concerning
Experiment 2 can be made:
1. Listeners should be more able to accurately identify Y stimuli when they are in a different voice
category than X1X2 than when Y is in the same category as X1X2;
2. Listeners' accuracy in Y stimulus identification should increase as the distance between Y
stimulus pitch and both X1 and X2 pitch increases.
3. Listeners should be less able to accurately identify Y stimuli when both Y and X2 occur at
pitches above F5 than when Y is in close proximity to X1 or X2 in a pitch range less than F5.
Summary
This research suggests that listeners use multiple cues when determining similarity of vocal stimuli.
Acoustic cues associated with voice categories, pitch difference, and high pitch all are associated with
perception of dissimilarity. However, experienced listeners also appear to use other strategies for
making similarity judgments of vocal stimuli. Further study is needed. Specifically, similarity
strategies for experienced and inexperienced listeners should be compared through examination of
degree of accuracy and error patterns for various listening conditions.
References
American National Standards Institute. (1973). Psychoacoustical terminology. S3.20.
New York: American National Standards Institute.
Cleveland, T. F. (1977). Acoustic properties of voice timbre types and their influence on
voice classification. Journal of the Acoustical Society of America, 61, 1622-1629.
Darwin, C. J. (1981). Perceptual grouping of speech components differing in
fundamental frequency and onset-time. Quarterly Journal of Experimental Psychology.
A. Human Experimental Psychology, 33A, 185-207.
Dmitriev, L., & Kiselev, A. (1979). Relationship between the formant structure of
different types of singing voices and the dimensions of the supraglottic cavities. Folia
Phoniatrica (Basel), 31, 238-241.
Grey, J., & Gordon, J. (1978). Perceptual effects of spectral modifications on musical
timbres. Journal of the Acoustical Society of America, 63, 1493-1500.
McAdams, S., & Rodet, X. (1988). The role of FM-induced AM in dynamic spectral
profile analysis. In H. Duifhuis, J. Horst, & H. Wit (Eds.), Basic issues in hearing (pp.
359-369). London: Academic Press.
Vennard, W. (1967). Singing, the mechanism and the technique. New York: Fisher.
Proceedings abstract
Background.
Aims.
The goal was to give a detailed description of a concert pianist learning a new
piece for performance. Studies of expert performance in other domains provided
hypotheses about characteristics to look for: early identification of
underlying structure, shaping of early decisions by long-term goals, extended
retrieval practice.
Method.
A concert pianist, the second author, recorded herself as she learned the third
movement (Presto) of the Italian Concerto by J.S. Bach. The pianist commented
on her goals as she practiced, and provided detailed reports about the formal
structure, about decisions made during practice (e.g., about fingering and
phrasing), and about retrieval cues attended to during performance. Effects on
practice were identified using multiple regression.
Results.
Several features of the pianist's practice were identified that may contribute
to eminent performance: Early use of the formal structure to organize practice;
early effects of interpretive and expressive features; extended practice of
retrieval cues; division of practice into intensive work and longer runs;
clustering of sessions separated by months without practice.
Conclusions.
Proceedings abstract
Australia
Background. Analysis and investigation of music from a psychological perspective has flourished during the past two
decades. Research has concentrated on examination of the perceptual and cognitive processes that mediate music
recognition, cue abstraction, and the identification of perceptual components of music. Recently, however, attention
has been given to the emotional aspects of music, such as the role of expressive timing and dynamics in performance,
the effects of emotion on performance and preference, and studies that utilise continuous emotional response measures.
However, the relationship between the cognitive organisation and segmentation of music and emotional response has
been neglected.
Aims. To test the hypothesis that emotional and physiological changes relate strongly to the perceived unfolding
musical structure, and that this perceived segmentation, in turn, relates to the music-theoretic and performance
structures.
Method. Both musicians and non-musicians were asked to listen to contrasting excerpts from pieces for string quartet.
Experiment 1 asked the participants to identify significant cues, and Experiment 2 identified the strength of temporal
placement of these perceptual segments within the musical framework. Using a 2-dimensional emotional space,
Experiment 3 measured both the emotion the participants believed the music was trying to convey, and the emotion
that participants experienced. Physiological changes in the participants were measured in Experiment 4. The results of
these experiments were compared with the structure of the pieces, both performed structure (measured through acoustic
analysis) and theoretic.
Results. Both musicians and non-musicians showed similar responses. Significant cues were found to correlate both
with significant emotional and physiological responses, as well as with important moments of theoretic and
performance segmentation.
Conclusions. Theoretic and performance musical segmentation has reality in perceived musical structure and in
emotional response.
Procedure
Participants were tested in intact music classroom groups. Every attempt was made to control for problems with group
testing, by arranging classroom chairs in rows facing front with at least 2 feet between each chair. Participants were asked to
listen quietly to the musical excerpts without moving or making faces which would indicate their response to the music and
were led through the four test items for each stimulus presented. Selections were ordered by page color to maintain group
cohesion to the process.
Results
Data were converted from the self-report forms to numerical equivalents and analyzed via the Statistical Package for the
Social Sciences (SPSS). A five-point scale was used to compute means and standard deviations for grade and gender groups,
with negative anchor statements weighted as one point and positive statements weighted as five points.
Reliability analyses were completed to determine whether the four items measuring preference responses to the music were
correlated with each other. Alpha coefficients reflecting the degree of correlation between the items were .7798 for
Pop/Stimulative, .8445 for Pop/Sedative, .8829 for Rap/Stimulative, .7709 for Rap/Sedative, .8839 for Classical/Stimulative,
.8749 for Classical/Sedative, .9140 for Alternative/Stimulative, .8435 for Alternative/Sedative, .8879 for Jazz/Stimulative,
and .8842 for Jazz/Sedative. Items were highly correlated with each other so the responses to the four items were totaled into
a single composite score for each musical stimulus, with a possible range from 4-20.
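The alpha coefficients reported above are Cronbach's alpha for the four preference items. A minimal computation on invented Likert-type data (the scores below are for demonstration only, not the study's responses):

```python
import numpy as np

def cronbach_alpha(items):
    """items: n_respondents x k_items matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of composite score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: four 1-5 Likert items for 8 respondents,
# constructed so the items are highly correlated
scores = np.array([
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
])
alpha = cronbach_alpha(scores)
assert 0.9 < alpha <= 1.0  # highly correlated items yield a high alpha
```

High alpha values like those reported (.77-.91) justify summing the four items into a single composite score per stimulus.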
Examination of means and standard deviations in Tables 2 and 3 indicates that for all participants Popular/Stimulative,
Rap/Stimulative, Alternative/Stimulative and Rap/Sedative were generally preferred the most, while Classical/Sedative,
Jazz/Sedative, and Classical/Stimulative were preferred the least. These findings support those of prior studies by May
(1985), Greer et al. (1974), and LeBlanc (1979), who found that rock and popular music forms were most preferred by
children.
Table 2
Means, Standard Deviations, and Univariate Analysis of Variance of Grade Differences in Music Selection Preferences
Grade
Note: Style and activity levels are abbreviated; standard deviations are in parentheses.
*p < .05
** p < .01
*** p < .001
Table 3
Gender
Note: Style and activity level are abbreviated; standard deviations are in parentheses.
*p < .001
The results of this study suggest that Rap and Alternative music may also be added to the list of popular musical styles
preferred by the children of today. In addition, for every style grouping, the stimulative selection received a higher score, thus
indicating that "upbeat" rhythmic selections were more preferred than sedative, "flowing" selections.
The variable of grade is also of interest. First graders rated 7 of the 10 musical selections more highly than did second and
third graders, suggesting that first graders had broader, less differentiated preferences across varieties of musical style and
activity level. In contrast, third grade scores were consistently lower than those of first and second graders for 7 of the 10
selections, indicating a narrowing of preference responses with advancement in grade level. The main
effects of gender and grade on preference scores were analyzed through a multivariate analysis of
variance (MANOVA). Preliminary diagnostic testing did not reveal any serious deviations from
normality, and tests for homogeneity of variances did not show significant differences. The test assumptions for MANOVA
were met except that of independent observations, but since many precautions were taken in the procedure to address this
Order Tempo RhyI RhyII Pitch Timbre Dynamics Key Text Perf Affect
*1 5 7 2 4 4 5 4 5 5 5
2 4 4 6 4 4 4 5 3 2 3
3 6 4 6 6 6 5 7 6 6 7
*4 5 4 3 4 5 5 6 4 4 4
*5 4 5 6 4 3 5 4 4 4 2
6 4 4 6 3 4 3 5 3 3 3
*7 4 7 3 3 4 4 4 4 3 3
*8 5 6 5 5 5 5 6 4 4 6
9 2 2 6 7 6 4 6 5 5 4
*10 6 7 6 4 4 7 4 6 3 2
Note: Some musical characteristics are abbreviated. Numbering represents order of presentation to participants. Asterisks
indicate highest preferred selections.
Preliminary data analysis (in progress) of the scores included the creation of dot plots for each individual characteristic by
preference group. For all participants, regardless of grade level and gender, rhythm 1 (smooth versus percussive rhythm)
appeared to be a discriminative variable. Songs which had moderate to high levels of percussiveness were associated with
high preference scores while songs which were smooth in rhythmic texture were associated with low preference scores.
Faster tempos also appeared to be associated with high preferences, although songs which were not preferred represented a
wide range of tempos. Regularity of rhythm also appeared to be a discriminative factor for low preference scores.
Scatter plots revealed some possible interactions between variable combinations. For all subjects, the combination of
moderate to fast tempo and percussivity appeared to be related to high preference. Faster tempos and simple performances
were also related to higher preference scores. In addition, loud dynamics and simple performances also appeared to create
high preference decisions.
In contrast, the only combination of variables which appeared to be related to low preference was that of regularity and
"smoothness" of rhythm.
General Discussion
(Sample semantic-differential scales from the rating instrument: Smooth 1-7 Percussive; Irregular Rhythm 1-7 Regular Rhythm; Soft 1-7 Loud.)
References
Alpert, J. (1982). The effects of disc jockey, peer and music teacher approval of music on music selection and preference.
Journal of Research in Music Education, 30, 173-186.
Baker, D. S. (1980). The effect of appropriate and inappropriate in-class song performance models on performance
preference of third- and fourth- grade students. Journal of Research in Music Education, 28, 3-17.
Boyle, J., Hosterman, G., & Ramsey, D. (1981). Factors influencing pop music preferences of young people. Journal of
Research in Music Education, 29, 47-55.
Christensen, P., & Peterson, J. (1988). Genre and gender in the structure of music preferences. Communication Research,
15, 282-301.
Finnas, L. (1989). A comparison of young people's privately and publicly expressed music preferences. Psychology of Music,
17 (2), 132-145.
Flowers, P. J. (1981). Relationship between two measures of music preference. Contributions to Music Education, 8, 47-54.
Fung, C. V. (1996). Musicians' and nonmusicians' preferences for world musics: Relation to musical characteristics and
familiarity. Journal of Research in Music Education, 44, 60-83.
Geringer, J. M. (1982). Verbal and operant discrimination - preference for tone quality and intonation. Psychology of Music,
Special Iss., 26-30.
Giomo, C. J. (1993). An experimental study of children's sensitivity to mood in music. Psychology of Music, 21 (2), 141-162.
Greer, R., Dorrow, L., & Hanser, S. (1973). Music discrimination training and the music selection behavior of nursery and
primary level children. Bulletin of the Council for Research in Music Education, 35, 30-43.
Proceedings paper
Introduction
Feelings of swing, characteristic of some jazz performances, are both perceptually salient and definitionally
elusive. Put less formally, you recognize a swing feel when you hear it, but it's not at all clear just what "it"
is. One of the initial problems in studying swing is that the word is used to describe a wide variety of
phenomena. It can describe a style of music popular in the 1930s and 1940s, a feel of performance (e.g. as an
alternative to funk or Latin), or an ineffable quality which a performance can possess to varying degree.
(Swing as a quality is ineffable in part because, though most centrally describing music in the swing feel,
some jazz musicians feel that it is also found in music in other feels. This paper's hypotheses about
hierarchical impulse coordination may offer a partial explanation of this.) For the purposes of this paper,
'swing' will be understood as characterizing a competent performance in the swing feel; such a performance
should be identifiably in the swing feel and should possess a basic minimum of the quality of swing. Thus to
ask of a performance in the swing feel, "Does it swing?" will be taken as the equivalent of asking of an
utterance, "Was this utterance produced by a native English speaker?" To answer 'yes' to either question is to
identify the object as being acceptable and competently produced; it indicates membership in a category and
says nothing about location within that category.
The analogy with native English speakers can be pushed further. Just as an individual speaker will have both
a regional accent and unique individual characteristics, so swing feels vary both according to stylistic
currents within jazz and with the personal stamp of individual musicians. Both domains also offer research
problems of comparable complexity. When we recognize a specific person's voice, we do this on the basis of
so many individual but mutually-interacting parameters that a complete understanding of our mental
processing is a practical impossibility (Handel 1989). Accent recognition (here native vs. non-native
speaker) requires an abstraction out of those already complex factors, and thus poses an even greater
research challenge. The problem of characterizing swing requires a comparable abstraction out of already
complex objects, so we should not expect to find more than very incomplete conditions for swing, conditions
which are certainly not sufficient and often not even necessary.
Discussions of the sources of swing often focus on phenomenal accentuation of weak beats at the tactus level
and delay of weak beats at the first subtactus level, with particular attention given to the latter condition. It is
well known among musicians who do computer synthesis of jazz, however, that these conditions do not
suffice to produce a feeling of swing. Merely delaying weak eighths will not make music swing. (In notating
jazz, the quarter-note level is almost always the tactus level. This practice will be assumed in much of this
paper, and references to specific note values will always assume a quarter-note tactus. This will simplify the
discussion of different metric levels.) Furthermore, there is no archetypal rhythmic proportion of the swung
eighth to the quarter-note span in which it is embedded (hereafter "swing ratio"). This fact is a commonplace
among jazz musicians, and empirical verification of this was the subject of Experiment 1.
Experiment 1
In Experiment 1 recordings by eminent jazz drummers were studied for the relative durations of the swung
eighths played on the ride cymbals. Ride cymbals were chosen because they are extremely salient on a
sonogram, extending well above the frequency ceiling of the other instruments. Furthermore, the typical
swing pattern on the ride cymbals (all quarters plus offbeat eighths of beats two and four) provided plenty of
swung eighths to study. Drummers Art Blakey and Max Roach were chosen because they are particularly
flexible rhythmically. Because they are both ranked among the greatest jazz drummers, two important
assumptions may safely be made: that this flexibility is the product of intent and not of lack of control; and
that their performances fall within the bounds of acceptable practice.
Multiple portions of four tracks from two albums were imported as .aiff files and subjected to sonogram
analysis using the program AudioSculpt. In many cases markers corresponding to ride-cymbal hits were
placed algorithmically by AudioSculpt; in some cases they needed to be placed by hand. For each sample the
markers were exported as a list of timings. This list was analyzed by a PatchWork patch which calculated the
swing ratios, their average, minimum, and maximum, and their standard deviation. The results are tabulated
as Appendix 1. Swing ratios were found to range from 14% to 48%; this strongly confirmed the belief that
there is no archetypal swing ratio, and thus that rhythmic alteration is not a sufficient condition for swing.
Similar methods were used by Friberg and Sundström (1999), and similar results were obtained.
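The swing-ratio measure used in Experiment 1 reduces to simple arithmetic on onset timings. The sketch below assumes hand-extracted (beat, offbeat, next-beat) onset triples rather than AudioSculpt marker files, and the timing values are illustrative, not measurements from the recordings studied:

```python
import statistics

def swing_ratios(triples):
    """Each triple is (t_beat, t_offbeat, t_next_beat) in seconds.
    The swing ratio is the duration of the swung eighth (offbeat to
    next beat) as a fraction of the quarter-note span it is embedded in."""
    return [(nb - off) / (nb - b) for b, off, nb in triples]

# Illustrative timings at roughly 130 bpm (beat span ~0.46 s):
# a triplet-like feel (~1/3), a snappier one (~1/4), a straighter one (~0.45)
triples = [
    (0.00, 0.31, 0.46),
    (0.92, 1.27, 1.38),
    (1.84, 2.09, 2.30),
]
ratios = swing_ratios(triples)
summary = {
    "mean": statistics.mean(ratios),
    "min": min(ratios),
    "max": max(ratios),
    "stdev": statistics.stdev(ratios),
}
assert all(0 < r < 0.5 for r in ratios)
```

On this measure a straight eighth would give a ratio of .50 and a triplet eighth .33, so the observed range of .14 to .48 spans nearly the whole space from very snappy to nearly straight.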
The Role of the Performer's Body
While the results of Experiment 1 are important for demonstrating the non-existence of an archetypal swing
ratio, it should be stressed that such an archetype would not sufficiently explain the phenomenon of swing
even if it did exist. For example, melodic lines that contain no eighth notes can be perceived to swing. In the
shout chorus of a swing-feel big-band chart the band is often in rhythmic unison, and there may be no
consecutive eighth notes for lengthy stretches; in a good performance these passages will nonetheless be
perceived as swinging.
In looking for sources of swing that go beyond the rhythmic structure of consecutive eighths, a number of
researchers have focused on the role of structured asynchronicity among performers. Another issue
commonly discussed among jazz musicians, it has been most prominently discussed in the academic
literature by Keil (1987), and has been studied analytically by Prögler (1995), synthetically by Friberg
(1999), and using both approaches by Iyer (1999). While the importance of structured asynchronicity has
been demonstrated by the success of these studies, there are plenty of other factors which contribute to
swing, and some of them do not involve synchronization. For example, it is possible for a solo line without
accompaniment to swing, in which case there are of course no issues of synchronization.
As one factor behind the phenomenon of swing which is more general than structured asynchronicity, I
propose that there are characters of movement of the performer's body which leave sonic traces in the
articulations and timings of the notes, and that these characters of movement are important in generating
feelings of swing. The music swings because the performers' bodies swing. This fits well with Gunther
Schuller's (1989) frequently quoted definition of swing: "in its simplest physical manifestation swing has
occurred when, for example, a listener inadvertently starts tapping his foot, snapping his fingers, moving his
body or head to the beat of the music." (p. 223) Iyer (1999) also roots his methodology in the physicality of
performance; his account of structured asynchronicity is based on physical aspects of some African and
African-American cultural practices and values.
My proposal differs from previous work in its emphasis on the role of articulation. I believe that amount and
uses of muscular tension in the performer's body make a difference in the quality of the sound, that a note
played as a rebound sounds different from a note played as a main impulse. If our bodies tend to respond
physically to music, it seems plausible that they do so because of recognizing the character of motion of the
bodies that made the music. (The possibility that physical participation may affect perception will be
discussed briefly below.)
A substantive description of swinging characters of movement is well beyond the scope of this study. While
some characterization will be offered, for the moment it will suffice to argue for the plausibility of the concept.
First of all, it must be possible for subtle differences in muscular control to make a qualitative difference in
the sound produced. Iyer, for example, states that in playing certain drums "the only two elements at one's
disposal are intensity and timing." (1998, p. 105) In general I am skeptical of claims that any instrument
in which the performer's body is not entirely mechanically distanced from the means of sound production
(i.e., any instrument unlike the pipe organ) fails to respond to subtle differences in the way the performer
moves when playing. To stay with the ride cymbal example, this may seem initially to be too simple an
instrument to produce such shadings of sound; it might be argued that the location on the plate of contact
with the stick and the forcefulness of that contact are the only variables. But this neglects the fact that the
stick and the plate have a complex interaction. They are in physical contact for a finite period of time, and
during this time they will behave to some extent as coupled oscillators. The firmness with which the stick is
held, the angle with which it contacts the plate, and the degree to which the stick is controlled or allowed to
bounce freely will all affect the interaction of stick and plate, and thus affect the vibration of the plate.
Secondly it must be established that the listener can in fact reconstruct information about the performer's
body from the sonic traces thereof. This is intuitively plausible based on the extremely detailed knowledge
inferred from environmental sounds; more direct verification was the purpose of Experiment 2.
Experiment 2
In Experiment 2, computer-altered versions of ride cymbal patterns recorded by a jazz drummer were played
for subjects. In each of two example groups, the swung eighth note was moved in time but had its spectral
envelope preserved; the two groups were based on originals with differing swing ratios (.285 and .196). The
initial hypothesis was that when asked which version swung the best, subjects would choose the original
timing of the swung eighth based on articulation cues.
Experimental Method
Examples were based on recordings made of Dave Gluck, a nationally touring jazz drummer who is based in
New York. Dave played a 20" Zildjian Custom Dark Ride cymbal, and was recorded on DAT with sampling
rate 44.1 kHz using a stereo microphone placed between three and six inches from the cymbal. He was asked
to play typical ride patterns (all quarter-note beats plus offbeat eighths of beats two and four) with three
different feels: one in which the swung eighth was approximately equal to a triplet eighth, a snappier feel
which was more like a sixteenth, and a straighter one which was closer to a straight eighth. The feels were
recorded in that order, and before each a metronome was played at 130 beats per minute in order to have the
different feels have approximately equal tempo. Within each feel, after recording several measures I had him
play the same feel again twice, this time interrupting him by knocking his stick away in the middle of the
pattern. This was done after beat 2 and after the eighth that follows beat 2. The idea was that when moving
the swung eighth, either the off-beat eighth or the on-beat eighth that preceded it would have to be made
longer, and that it would be helpful to be able to copy the decay that would have occurred had the next note
not been played. The stick was knocked away, rather than simply asking Dave to stop the pattern, in order
that the physical motions which produced the note in question would be exactly those that would have been
used if the next note were going to be played.
The recordings made were imported as .aiff files and processed using BIAS Peak v. 2.0. Within each feel,
one half-measure excerpt was selected which seemed to swing particularly well. When it came to processing
the examples, however, problems arose due to my having underestimated the complexity of the sound of the
ride cymbal. Although I was explicitly testing its capacity to carry subtle articulation cues, I had not
observed its low-frequency behavior. In many musical settings the lower frequency components are not
particularly salient relative to other sounds in the same spectral region, and the envelope of these lower
components evolves relatively slowly. I had therefore not realized that they would present problems. What I
found was that hits with the same metrical position within the same feel were perceived as differing in
overall pitch; it was therefore not possible (with one exception) to mix the decay of one hit convincingly
with the attack of another. It was then necessary to find another way of extending the decay of some of the
hits. After trying a few different methods, I ended up simply copying the last few milliseconds of the hit and
pasting them in at the end. This usually involved only a few milliseconds (at most 23.5 ms pasted four times,
but that was extreme). I did not perceive this as causing any distortion of the sound. The only other method
used was in lengthening the swung eighth of the version with swing ratio .196. This note was problematic
because for some examples it needed to be lengthened considerably, and because it was interrupted by the
next hit very soon into its decay, at a point at which the envelope was evolving extremely rapidly. For this
set I ended up using as my model the last complete pattern from the take in which I interrupted the drummer
after the swung eighth. The swung eighth which ended that take was thus played almost immediately after
the swung eighth in the pattern used as a model, and they had the same overall pitch. I was therefore able to
splice the decay of the final eighth onto the swung eighth from the basis pattern.
Three groups of seven examples each were produced. The first group was based on a half bar from the
approximately triplet feel and had swing ratio .285. The second was based on the snappier feel that was more
similar to a dotted rhythm and had swing ratio .196. The third group was based on the straighter feel and had
swing ratio .35. In each group the overall duration of the pattern was kept constant, and the swung eighth
placed in seven different positions, ranging from ratio .42 to ratio .18 in increments of .04. In each case the
basis pattern lasted from the attack of beat one until just before the attack of beat four. The span from the
attack of beat two until just before the attack of beat four was then copied and pasted in repeatedly until there
were three full bars, concluding with the first beat of the fourth bar (i.e. ending just before the attack point of
the second beat of bar four). Once these groups had been generated, the entire third group was discarded as
sounding awkward; something about the feel of the original half bar made it not conducive to exact
repetition. The first and final examples of each group were also discarded as obviously not swinging, leaving
five examples in each group ranging in ratio from .38 to .22.
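The placement scheme described above can be sketched computationally. This assumes (as the source does not state explicitly) that the swing ratio denotes the duration of the off-beat eighth as a fraction of the quarter-note beat, so that a triplet feel corresponds to roughly .33 and a straight eighth to .5; the helper names and printed output are illustrative, not from the original processing.

```python
# Sketch of the swung-eighth placements used to build each example group.
# Assumption: swing ratio = (duration of off-beat eighth) / (quarter-note beat),
# so the swung eighth's onset falls (1 - ratio) * beat_ms after the beat.

TEMPO_BPM = 130  # the metronome tempo used in the recording session
beat_ms = 60_000 / TEMPO_BPM  # ~461.5 ms per quarter note

# Seven positions from ratio .42 down to .18 in increments of .04
ratios = [round(0.42 - 0.04 * i, 2) for i in range(7)]

for r in ratios:
    onset_ms = (1 - r) * beat_ms
    print(f"ratio {r:.2f}: swung eighth onset {onset_ms:.1f} ms after the beat")
```

On this reading, discarding the first and final examples leaves the five ratios from .38 to .22 that were actually used.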
There were several problems intrinsic to this method of constructing examples. In order to isolate the
relationship between swing ratio and articulation, it was necessary to repeat the same pattern exactly several
times. (It might have been possible to use examples only a half bar long, but it would have been very
difficult to judge such a short example as either swinging or not swinging.) The exact repetition, however,
created a situation which is basically never heard in real jazz; even the most consistent drummers do not use
such absolutely exactly repeating patterns. The second problem was that the perceived lower frequency pitch
of the cymbal changed over the course of the half bar. Pasting on a repetition thus led to an abrupt
change in the perceived main lower frequency; such abrupt changes are never heard in real cymbal playing.
The result was a strange oscillating melody of sorts in the lower frequency range. (Readers wishing to hear
some of the examples should visit my website, "www.columbia.edu/~jpi9".)
Once the examples had been generated, I realized that the coding of bodily motion via articulation cues did
not create such strong restrictions as I had supposed. I was correct in predicting that the same swung eighth,
moved in time from its original position, would not only sound rhythmically different but also sound like it
had been produced with a different character of motion. It was true that the same eighth, at first relaxed and
natural, would gradually become stiff and awkward. But this did not happen as quickly as I had expected.
Rather, there was a considerable range across which the character changed, but in which the examples still
sounded musically viable.
In an attempt to verify the importance of articulation cues, I created a number of other examples which were
identical to existing ones in temporal and peak intensity profiles but which were assembled out of hits
which were not originally proximate (and which should therefore be
expected to have little relation to each other in terms of structure of articulations). In making various
examples I used a single on-the-beat hit, a variety of on-the-beat hits, or hits which retained their original
metrical position but which came from different feels. To my ear those examples ranged in the order
presented from stiff and unmusical to fairly reasonable. At the least, this indicates to me that on-the-beat hits
and off-the-beat hits are, in fact, qualitatively different.
Examples were put into pairs and triplets for comparison, with sets falling in three categories. Within each of
the two groups derived from one of the original feels, the five examples were divided into two overlapping
triplets, with swing ratios ranging from .38 to .30 and from .30 to .22. These four sets comprised the first
category. There were also five pairs, in which the examples having the same swing ratio from each of the
two groups were compared. Those were the second category. Finally, the third category consisted of six
pairs and one triplet in which the more synthetically generated examples were compared with the originals
which they were imitating. All sets were then randomly ordered, and within each set order of presentation of
the examples was also randomized. Once in sequence, the 37 examples comprising the 16 sets were burned
as separate tracks onto CD-R; they were distributed over two CD-R's for ease of track control.
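The set and track counts given above can be checked with a quick tally; the category labels in the comments are mine, following the three categories just described.

```python
# Tally of the comparison sets and total tracks described above.
# Category 1: two overlapping triplets per example group, for two groups.
# Category 2: five same-ratio pairs across the two groups.
# Category 3: six pairs and one triplet comparing synthetic examples with originals.

sets = (
    [3, 3] * 2        # category 1: four triplets
    + [2] * 5         # category 2: five pairs
    + [2] * 6 + [3]   # category 3: six pairs and one triplet
)

num_sets = len(sets)    # 16 sets
num_tracks = sum(sets)  # 37 tracks in total

print(num_sets, num_tracks)
```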
Jazz musicians have extremely divergent attitudes about research into the inner workings of the music.
While some are eager to think about and discuss such issues, others find the attempt to explain that which is
ultimately unexplainable at best futile and at worst offensive. Although I do in fact agree that the workings
of swing will never be completely explained, I decided that it was safest and easiest not to indicate to the
subjects that swing was the object of my investigations. They were therefore informed that the experiment
looked at differences between jazz and classical musicians in how they perceive and characterize swing
rhythms. Subjects were asked to rate the examples within each set numerically, and were also asked to
write a few words about how they perceived the differences in character. Subjects were allowed to use the
same number more than once if they felt that there was no difference in their preference of some of the
examples. Subjects were given a booklet in which to record responses; the booklet listed the tracks grouped
into sets and had room for comments after each set. The subjects were allowed to control the CD player
themselves and listened through headphones.
Subjects were eight members of the Manhattan Jazz Orchestra and headlining vocalist Johanna Grüssner. All
are professional jazz musicians based in New York. Subjects were given a small honorarium in appreciation
for their help. Most of them completed the task in approximately fifteen minutes. Due to a technical problem,
five of the subjects had sound coming from only one side of the headphones; the responses of these subjects are
not statistically differentiable from those of the others, however.
Results
From the standpoint of verifying my hypothesis, the experiment was an almost complete failure. It was,
however, instructive in other ways.
The only definitive results came in the sets which compared triplets on the basis of swing ratio. It was
assumed that the effects in question would be strong, and that therefore any real difference should be
virtually unanimously agreed upon. No statistically meaningful conclusions can be drawn from anything less
than virtually complete agreement with such a small number of subjects. For the first feel, comparing ratios
between .38 and .30, .30 was preferred by all eight subjects who expressed any preference. .34 was preferred
over .38, but too weakly to be statistically meaningful (5-2). Comparing ratios between .30 and .22, .30 and
.26 were tied (5-4) for first preference; all nine agreed that .22 was the worst. For the second feel, .30 was
preferred over .34, which was considered better than .38; two expressed no opinion, and one subject in each
case could not decide between the adjacent pairs, but no responses contradicted that ordering. Comparing ratios between
.30 and .22, results were much more spread out: each example received one of the three rankings four
times, another three times, and the remaining one twice. .30 and .26 were equally preferred, both receiving four number one rankings. .22
seemed least preferred, receiving four number three rankings, three twos, and two ones. From a statistical
point of view, however, these differences are not meaningful.
It may support my hypothesis that with the group which was based on an original swing ratio of .285, a ratio
of .22 was unanimously felt to be worse than either .26 or .30, whereas when the original ratio was .196, all
three ratios were approximately equally preferred. For this to strongly support the theory, though, it would
be expected that when the two examples with ratio .22 were compared directly, the one with original ratio of
.196 would be strongly preferred over the one with original ratio .285. This was not the case; the preference
for the one with .196 original ratio was far too weak to be meaningful. I don't think that this experiment
succeeded in demonstrating anything at all about the role of articulation cues. I would also caution against
any reductive conclusions about swing ratios based on the results of the timing comparisons. The
preferences do seem quite clear, but they are in the absence of a musical context, and at one specific tempo.
While, at a given tempo, some ratios may be more normative than others, in the right musical context ratios
much more extreme than those tested here can swing.
Of the five pairings of examples with the same swing ratio from the two example groups, none produced
significant preferences. Similarly, of the sets which compared more synthetic examples with the originals
which they imitated, only one pair produced significant results, and in that pair the synthetic example had an
extremely noticeable pitch mismatch. Most surprisingly, the synthetic example which to my ear was the
most awkward and unmusical was actually preferred over its original by 5 subjects. Results are tabulated
more fully in Appendix 2.
The real value in the results of Experiment 2 lies in the focus it puts on the subjectivity of swing. In the
introduction I stressed the difficulties in characterizing swing which result from the variety of music which
is felt to swing. Swing is also difficult to characterize because it intersects with individual values which vary
from person to person. It was not uncommon to have two subjects identify the same parameter as marking
the most salient difference between examples, but to evaluate the examples in the opposite way based on the
same parameter. For example, they might agree that one example has more accentuation on beats two and
four than the other, but while one will hear this as good, the other will hear it as exaggerated. Or, in the case
of the synthetic example which had the extreme pitch mismatch, one subject liked the synthetic example,
saying that it "has an interesting two tone thing going on." Another subject, though, complained that "He's
fucking around with where he/she puts the stick on the ride in [the synthetic example]."
Another way in which responses were subjective was in the choice of parameters to give attention to. Even
such simple examples offered a variety of different potential foci of attention. One subject, for example,
made almost no comments other than 'too stiff;' it seems likely that he was bothered by the absolutely
inflexible repetition of the patterns. Another subject (the same one quoted at the end of the previous
paragraph) felt that almost all of the samples were terrible and expressed very few preferences; his one
moment of enthusiasm came when he heard the most synthetic example, the one made with repetitions of
just one hit. He said that that example was "the best so far because the drummer is not fooling around with
stick placement and volume." He made exactly the same comment when he encountered that example in a
subsequent set; it seems likely that he was disturbed by the pitch mismatch. In general, subjects frequently
called attention to different parameters in their written responses.
Finally, responses to swing are subjective because listeners do not differ only in their evaluation of and
attention to their perceptions; they are also able to affect their perceptions by their behavior. I first noticed this in the
course of Experiment 1, listening to Max Roach's ride cymbal playing on Move. I found that when I listened
hearing the half note as the tactus (an incorrect hearing to a jazz musician) I didn't hear any swing in the ride
pattern at all. I noticed no delay of the off-beat eighth notes, and no accentuation of beats two and four.
When I listened hearing a quarter-note tactus, however, both the very subtle delay of the swung eighths and
the emphasis on two and four became audible; the line started to swing. I had a similar experience when
preparing the examples for Experiment 2. I found that when I was disposed to hear an example as swinging, when I
moved my body in response to it, even the worst examples had at least some swing to them. But if I sat
motionless as a detached and uninvolved listener, some of the better examples sounded sterile and
completely unswinging. Iyer frequently refers to listeners as coperformers (1998); from these experiences it
seems to me that swing is not simply a property of a sound signal, or even of a perception which may vary
from listener to listener but which is fixed for each signal-listener pairing. Rather, swing is, at least in part,
something which the listener participates in, and which the listener has the power either to encourage or to
resist.
Future Directions
In considering future versions of this experiment, there are some problems which could be readily solved,
but many others which would remain. The most obvious area of improvement is the example materials. As
manipulating sound files presents so much difficulty, it would seem desirable to precisely manipulate the
performer instead; the obvious solution is to use a computer-controlled piano which can both record touch
and timing data and play on the basis of touch and timing input from the computer. This would eliminate
problems such as pitch mismatch while still allowing a note to be moved in time yet remain unchanged in
character.
The problems related to subjectivity of response, however, would remain. This kind of investigation into
swing presents an experimental catch-22. In order to isolate individual parameters, it is necessary to remove
much of the context, as changing one parameter also changes the relationship between that parameter and the
others present. But removing context only opens the door for the listener's personal responses and
assumptions about context to play an even greater role. Furthermore stripped-down examples which are
perceived as raw materials for music rather than as music itself invite the listener's participation much less
than real, complex musical examples. It's not clear that it would be profitable to repeat this experiment with
the relatively cosmetic (though necessary) change of example source; future experiments bear substantive
rethinking.
Toward a Characterization of a Swinging Performer
As noted above, this paper will not offer a thorough account of swinging motions in performance. There are,
however, some observations about impulse coordination and hierarchy which can serve as the beginning of
such an account.
Unless notes are significantly isolated in time, a performer will not usually execute all notes with equal
impulses. Rather, there will be dominant impulses coinciding with hypermetrical beats; all other notes will
group together gesturally with the notes receiving dominant impulses, joining either the dominant note that
follows or the one that precedes. Each of those notes will receive a subordinate impulse. These subordinate
impulses will be less active, and have the character of a rebound.
An analogy with sawing wood may help convey the intended concept. Imagine first sawing through a piece
of dense particle board. The hardness of the wood, combined with its internally conflicting grain, makes it
difficult to saw. Each stroke is very active and must be independently initiated. The saw comes to rest at the
end of each push stroke, and the next pull must be entirely the product of a new impulse. In this case there is
no hierarchy of impulses. Contrast this with sawing balsa wood. It is so easy, and the saw moves through the
wood with so little resistance, that the pull feels like an easy rebound from the push. One main push may
even produce the next three as subordinate rebound strokes. In this case the rebound strokes receive impulses
which are subordinate to that of the main stroke. Note that these strokes are not literally rebounds; without
the intention of the person sawing, one stroke would come to an end and stop, even with balsa wood.
Although these strokes result from their own impulses, those impulses have less intentionality than the main
impulse; it feels almost as if they were happening of their own accord. It is in this sense that impulses are
hierarchized.
To return to a musical example, consider a string player or pianist playing a passage from Mozart featuring
running eighths in a quick tempo. Dominant impulses would fall either every measure or every half measure,
and the subordinate impulses would be relatively undifferentiated. This is necessary for the smoothness
which is stylistically typical; more frequent dominant impulses would make the passage sound beaty and
unmusical; they would also increase the physical tension of the performer. Note that this has the effect of
deëmphasizing the quarter-note level: dominant impulses occur less frequently than quarter notes, and note
changes occur more frequently. (Dominant impulses are usually audible, though not necessarily because of
being louder.)
Contrast this now with a jazz musician playing running eighths in a swing feel. While there will still be a
larger organization of four or eight notes, of which one will receive a dominant impulse, there will now be a
hierarchically intermediate impulse given to each note falling on a quarter-note beat. There is a gestural,
impulse-hierarchized pairing of notes. (It is not universal that the strong impulses fall on the beat; at slow
and moderate tempi the intermediate dominant impulses often fall on the weak eighths. At extremely fast
tempi this kind of syncopation is rarely heard as it becomes extremely difficult to execute.) This will
emphasize the quarter-note level, and the presence of more relatively superordinate impulses will also
increase the intensity. This hypothesis may account for those jazz musicians who swing even though they
use little or no rhythmic inflection. They are perceived as swinging because of the audible results of the
added middle level of impulse with its resultant strong-weak duple impulse pairs. The hypothesis may also
explain how pieces of music which are not in the swing feel may nonetheless seem to swing. Again, it is
important to state that the claim here is not simply that notes on the beat are louder. The claim is that the
difference in impulse control makes the action different; the difference in action makes the sound different.
It is a qualitative difference in sound that is being claimed as the crucial percept here.
This claim about differences in impulse hierarchy seems difficult to verify short of wiring performers to take
data about muscle use. Syllable choice in singing instrumental music, however, provides a way into these
issues. It is commonly observed that jazz and classical musicians sing instrumental music very differently;
classical musicians tend to vary the vowels they use less than jazz musicians. As the vowel chosen
determines the position of the vocal apparatus, and as these various positions involve known relative
amounts of muscular tension, analysis of syllable choice can yield patterns of relative tension and relaxation.
At first glance, the expected patterns conform very well to the hypothesis of impulse hierarchy. A classical
musician singing running eighths will often stay on one syllable, or change syllable infrequently, reflecting
locally undifferentiated levels of tension. A jazz musician, on the other hand, will often employ a pattern of
regularly alternating vowels, corresponding to a regular alternation in the level of muscular tension. Putting
this oversimplified analysis on an empirical footing was the purpose of Experiment 3.
Experiment 3
In Experiment 3 the same nine subjects used in Experiment 2 were recorded singing five instrumental lines
taken from well-known jazz compositions. They were told that the experiment looked at differences between
jazz and classical musicians in the ways that they communicate with other musicians by singing. They were
told not to approach the task like a singer giving a performance, but rather like a musician communicating
with another musician about how the music should go.
The examples chosen were as follows: the first alto part from the first sixteen bars of the first chorus (mm.
13-28) of "Basie - Straight Ahead" by Sammy Nestico, found in Wright (1982); the first alto part from the
first 8 bars of the fourth chorus (mm. 109-116) from the same chart; the flügelhorn part from the first 16 bars
of "Three and One" by Thad Jones, also found in Wright (1982); the first alto part from the first sixteen bars
of the second chorus, with the measure-long lead-in, (mm. 31-48) also from "Three and One"; and the
beginning of a lead sheet of Charlie Parker's "Ornithology," up to the downbeat of measure 12. The
examples each featured passages of running eighth notes in addition to other rhythms. The examples were
chosen for their familiarity, in the hopes both that the singing would be as expressive as possible and that
musicians would not need to concentrate on reading the music while singing; this was not intended to be a
sight-singing task.
The recordings have not yet been fully transcribed; they may well contain enough interest to be the subject
of a separate study. Initial processing reveals the following generalizations, however. Off-beat eighths are
typically either schwa sounds or not sung with vowels at all, but rather with the nasal voiced consonant /n/.
The schwa is produced with a very neutral, relaxed position of the vocal apparatus. On-beat eighths (and
some off-beat eighths which are tied into the next beat, functioning as anticipations) are typically more
active vowels, especially /i/ and /u/. /i/ is produced with the tongue high and in the front of the mouth, /u/
with the tongue high in the back of the mouth and with lips somewhat constricted. (Handel 1989, pp.
135-147) Not only are these both more active than the schwa, they are two of the three 'cardinal vowels', so
named because they are produced in the physically most extreme positions, thus bounding the space of all
vowels. (Handel 1989, p. 143) The most typical vowel patterns were based on regular alternations between
two vowels, with either /i/ or /u/ on the beat and the schwa off the beat. This fits perfectly with the
hypothesis of impulse hierarchy; the alternation between vowels that are maximally active and maximally
passive makes sense as an alternation between dominant impulses and rebounds.
One of the subjects in Experiment 3 was remarkable for his musical knowledge and musical memory. With
each example he glanced at the sheet of music to see what was wanted and then put down the music and
sang from memory. He frequently sang more than had been requested, and in the case of the opening of
"Three and One" he sang some of what the rest of the band does during the rests in the flügelhorn melody.
His singing was therefore purely the expression of an already-formed musical conception, unencumbered by
the parallel task of forming a conception based on reading music. His rendition of "Ornithology," transcribed
as Example 1, demonstrates strategic musical use of the physicality of vowel sounds. From here the paper
will follow the informal vowel notation used in Example 1.
Rather than using mainly one of the more active vowels, he used both, exploiting the space between them.
He moved from the initial doo to dee on the third beat of measure one, using dih as an intermediate point for
a smooth transition. He then created an intensification into the goal of the phrase, the down beat of measure
2, by a direct alternation of the distant doo and dee. In measure three he left off the alternation in the third
through fifth eighths in order to create a sense of headlong drive into the two syncopated quarter notes. In his
rendition the b-flat in measure 4 is heard as a surprise, a sudden turn in the melody which receives a very
different kind of accent from those that the syncopated quarters got. This was achieved through use of
syllables, as the dee is brighter than the dih of the quarters; it stands out particularly saliently because of
being preceded by a doo. The rest of the phrase is a relaxation, with less in the way of extreme contrasts. A
nice touch is the variety of upbeat accents, the first produced by a strong plosive d in the duh at the end of
measure 4, the second, on the second eighth of the second beat of measure 5, produced by use of a doo. A
similar effect was achieved in the next phrase, when he created a weak accent on the second eighth of beat
three by omitting the consonant on the preceding eighth. The anticipation of measure 9, which is heard as an
intensified sequence of the preceding phrase, is brought out by use of the bright bee, though somewhat
softened by the consonant choice. The accent on buh in measure 9 is very strong, set off by a complete stop
of breath just before it. It shows that different kinds of accent don't have to correlate, as both vowel and
consonant choice would make buh quite weak if not for the accent by physical intensification. The end of the
phrase is punched out with the direct dee doo alternation, both on quarter-note beats. The rest of the example
relaxes from that point of greater tension with more normative syllable alternations which do not juxtapose
cardinal vowels. In addition to providing very strong support for the hypothesis about impulse hierarchy, this
example shows that this kind of syllabic analysis could provide new analytic insights into scat singing.
Convergence of the Visceral and the Cognitive: Fast Hard Bop
So far this paper has concentrated on more visceral aspects of swing; there are of course also cognitive
aspects, and in the case of fast hard bop these can combine to begin to account for important aspects of style.
Issues of beat finding are implicit in the above discussion, as the presence of hierarchically intermediate
impulses coinciding with the tactus can aid the listener in finding the beat. These are not, however, the only
cues. Consider a metrical grid of the sort used in Lerdahl and Jackendoff (1983). It is perfectly symmetrical
between levels; any level could serve as the tactus as well as any other. Now apply this to a swing feel jazz
tune, and suspend the assumption that the quarter note gets the tactus. It would be typical to find phenomenal
accentuation on every weak quarter-note beat. As this would happen throughout much of the tune, it would
make it unlikely for the tactus to be at the half-note level or above, as that would mean that the entire piece
was syncopated. It would also be typical for each weak eighth-note beat to come late. As the tactus is
generally regular, this would make it unlikely for the tactus to be at the eighth-note level or below. (A
similar conclusion, though on different grounds, is found in Iyer 1998, pp. 117-118). This leaves the
quarter-note level as the only preferred level for the tactus.
Now consider fast hard bop tunes. These often have tempi above 200 quarter-note beats per minute. If a
classical piece had such a fast tempo, it is extremely unlikely that the quarter note would be heard as the
tactus. An experienced listener would likely hear a half note tactus for a classical piece with quarter-note
tempo 200 or above. A fast hard bop tune can also be heard with a slower half-note tactus, but experienced
listeners, aided by both the hierarchically intermediate quarter-note impulses and the cognitive cues
discussed above, will cling to the quarter-note tactus. (Jazz musicians will simply know that that's where the
tactus is, based on elementary grammatical cues.) Thus the visceral and the cognitive join forces to make
perceivable a tactus which lies considerably outside of normal limits. The performance is likely to be
intense, because of the tension involved in generating the tactus-level intermediate impulses. This is
compounded by the listener, who often pulses along with the tactus physically. Musicians playing fast hard
bop create intensity in many ways, but not least by this exploitation of cognitive and perceptual tendencies.
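The timing argument above can be made concrete with a small computation. The "comfortable" tactus band used below (inter-beat intervals of roughly 400 to 900 ms) is an assumption drawn from common rules of thumb about tactus perception, not a figure from this paper, and the 220 bpm tempo is an illustrative fast hard bop value.

```python
# Inter-onset intervals of candidate tactus levels at a fast hard bop tempo.
# Assumption: listeners comfortably entrain to inter-beat intervals of
# roughly 400-900 ms; values outside that band are flagged as such.

def ioi_ms(bpm: float) -> float:
    """Inter-onset interval in milliseconds for a beat rate given in bpm."""
    return 60_000 / bpm

tempo = 220  # illustrative fast hard bop tempo, in quarter-note bpm

levels = {
    "eighth note": tempo * 2,   # 440 bpm
    "quarter note": tempo,      # 220 bpm
    "half note": tempo / 2,     # 110 bpm
}

for name, bpm in levels.items():
    interval = ioi_ms(bpm)
    band = "within" if 400 <= interval <= 900 else "outside"
    print(f"{name}: {interval:.0f} ms ({band} the assumed comfort band)")
```

On these assumptions only the half-note level falls inside the comfortable band, which is why sustaining a quarter-note tactus at such tempi requires the visceral and cognitive reinforcement described above.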
Conclusions
Swing is a phenomenon of immense complexity which arises out of the interactions of many parameters and
which is in many ways subjectively perceived. This paper offers a very partial account of the causes of
swing in its proposal that there are modes of bodily coordination which are in some way swinging, and
which leave audible traces in articulation, timbre, rhythm, and intensity. We understand music to swing in
part because we decode the traces of swinging performers. One aspect of swinging bodily coordination, at
least in the case of consecutive eighth notes, is an alternation of dominant and subordinate impulses. This
theory of alternating impulses was strongly supported by analysis of spontaneous scat singing. Providing
along the way a new paradigm for the analysis of scat singing, this method offers a window into the
simultaneous manipulation of multiple parameters toward musically expressive ends. The explanatory value
of impulse hierarchy was further demonstrated when issues of impulse hierarchy and of cognition were
coordinated in the account of aspects of fast hard bop. Experiment 2, which failed to demonstrate that the
mode of motion of the performer actually has audible effects, was valuable for highlighting the subjectivity
of swing. In doing so it provided support for the main hypothesis. If it is true that we perceive swing more
strongly when our bodies start to swing, then it is plausible that the invitation to the dance might come through
sonic recognition of potential partners already out on the floor.
Acknowledgments
This research began in classes taught by Fred Lerdahl and Thanassis Rikakis, and they have both continued
their support and help throughout. Dave Gluck was extremely kind in allowing me to record his playing, and
in discussing the issues with which this paper is concerned; he even gave me a first lesson in playing the ride
cymbal. Johanna Grüssner and the members of the Manhattan Jazz Orchestra were also generous in putting
their musicianship and expert ears at my disposal, as well as their time and patience. Vijay Iyer helped me
through a fruitful conversation which included a contribution to the experimental design. Doug Abrams, the
music director of the Manhattan Jazz Orchestra, contributed to this research indirectly through many
conversations about music over the course of the last decade or so. Finally, Bret Horton spent a long day
assisting me in carrying out experiments. He ran Experiment 3 while I ran Experiment 2 concurrently;
without his help this research could not have been completed on time.
Bibliography
Friberg, A.K., & Sundström, A. (1999). Jazz drummers' swing ratio and its relation to the soloist. Presented
at the 1999 conference of the Society for Music Perception and Cognition.
Handel, S. (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA, MIT
Press.
Iyer, V. S. (1999). Microstructures of Feel, Macrostructures of Sound: Embodied Cognition in West African
and African-American Musics. Ph.D. dissertation, University of California at Berkeley.
Keil, C. (1987). Participatory Discrepancies and the Power of Music. Cultural Anthropology, 2, 275-283.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA, MIT Press.
Prögler, J.A. (1995). Searching for Swing: Participatory Discrepancies in the Jazz Rhythm Section.
Ethnomusicology, 39, 21-54.
Schuller, G. (1989). The Swing Era: 1930-1945. Oxford, Oxford University Press.
Wright, R. (1982). Inside the Score. Delevan, New York, Kendor Music, Inc.
Back to index
Proceedings paper
A Stroop-like effect in hearing: Cognitive interference between pitch and word for absolute pitch
possessors
Kengo Ohgushi
Faculty of Music, Kyoto City University of Arts
13-6 Kutsukake, Ohe, Nishikyo-ku, Kyoto 610-1197 Japan
ohgushi@kcua.ac.jp
1. Introduction
The Stroop effect refers to cognitive interference between a color and a word that arises when a subject is asked to
name the ink color in which a color word is printed (Stroop, 1935). The response time in the incongruent condition
(e.g. word: red, color: green) is longer than in the congruent condition (e.g. word: red, color: red). The effect
appears to reflect the cognitive conflict produced when visual and semantic information contradict one another. This
kind of effect has not yet been reported in hearing. In our university, we have many music students with absolute
pitch. They can identify and vocalize the note name (= pitch name) of any musical tone they hear as /do/, /re/, /mi/
and so on. An interesting idea occurred to me: if a listener with absolute pitch hears a vocal tone uttered as one of
the seven note names (Do, Re, Mi, ..., Si) at a pitch different from the uttered note name, the listener may be
confused by cognitive interference due to the difference between the note name and the uttered word. If the note
name and the uttered word are the same, the cognitive interference should not occur. The aim of this study was to
discover whether such cognitive conflict occurs between auditory (pitch) and semantic (word) information. If it
does, then a Stroop-like effect occurs in the auditory system as well as in the visual system.
2. Method
2.1 Stimuli
A male singer vocalized each of seven words: /do/,/re/,/mi/,/fa/,/sol/,/la/,/si/, each with seven pitches: Do, Re,
Mi, Fa, Sol, La, Si, where the fundamental frequency of La tones was 442 Hz. These 49 vocal tones included 42
tones in which the vocalized word and the note name of the tone were different (incongruent condition) and 7 tones
in which the vocalized word and the note name were identical (congruent condition). These sung tones were
recorded on a digital audio tape (DAT) recorder.
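As a rough illustration of this 7 × 7 design, the stimulus set can be enumerated as follows. This is only a sketch of the design; the actual stimuli were recorded sung tones, not generated by code.

```python
# Enumerate the 7 x 7 stimulus design described above. A tone is
# "congruent" when the vocalized word matches the note name of its
# sung pitch, and "incongruent" otherwise.
NOTE_NAMES = ["do", "re", "mi", "fa", "sol", "la", "si"]

stimuli = [
    (word, pitch, "congruent" if word == pitch else "incongruent")
    for word in NOTE_NAMES       # the vocalized word
    for pitch in NOTE_NAMES      # the note name of the sung pitch
]

n_congruent = sum(1 for _, _, cond in stimuli if cond == "congruent")
print(len(stimuli), n_congruent, len(stimuli) - n_congruent)  # 49 7 42
```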
2.2 Subjects
Five students with absolute pitch served as subjects. They were graduate students majoring in music at Kyoto
City University of Arts and all of them were able to identify perfectly 60 notes from C3 to B7 played on the
piano.
2.3 Experiment
The stimuli were presented to subjects in a random order through headphones, and the subjects were asked to
respond by uttering the note name of each of the 49 tones as quickly as possible ignoring the word pronounced.
The stimuli and the response voices were recorded on the left and right channels of a DAT, respectively.
The response time (RT) in the congruent conditions (e.g. word:/do/, note name:Do) and the incongruent
conditions (e.g.word:/do/, note name:Re) were measured. Next, the subjects were asked to respond by repeating
the word of each of the 49 stimuli as quickly as possible regardless of the note name. Subjects practised 20 trials
before this listening experiment.
3. Results
3.1 Note-name response
Fig.1 shows the response time for each note-name judgment averaged over five subjects. The abscissa gives the
note name of each tone stimulus, and the ordinate represents the average response time in ms. Averaged data
over all notes, illustrated as the rightmost bars, indicated that the averaged response time for the congruent
condition RTc (=684 ms) was shorter than the average response time for the incongruent condition RTi (=807
ms). This difference is significant according to the t test (p< .05). But in the case of Sol and Si, RTc was slightly
longer than RTi. This seems to be due to the fricative /s/. This problem will be discussed later.
Fig.2 also shows the response time for each note-name judgment averaged over five subjects. It shows that RTc
is significantly shorter than RTd (p< .01), except for the /la/ voice. RTd for /sol/ and /si/ is markedly longer than
for the others. Examining the time envelopes of the /sol/ and /si/ tones revealed that the voiceless portion of the tones
continued for 200~300 ms. This means that the time from the onset of the fricative /s/ to the onset of the vowel
was comparatively longer than that for the other tones.
Fig.4 also shows the results of word response. For all words, RTc were significantly shorter than RTi (p< .01).
Fig.5 shows the individual differences in note-name responses. For all subjects, RTc was significantly shorter
than RTi (p< .01).
Fig.6 shows the individual differences in word response. For subjects AU and KT, RTc was almost the same as
RTi. RTc was not necessarily larger than RTi. The average RTc was not significantly shorter than the average
RTi.
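The reported comparison of mean response times can be sketched as a paired t test over subjects. The per-subject means below are invented for illustration; the paper reports only the grand means (RTc = 684 ms, RTi = 807 ms), not individual data.

```python
# Paired t test on hypothetical per-subject mean response times (ms)
# for the congruent (RTc) and incongruent (RTi) note-name responses.
from statistics import mean, stdev
from math import sqrt

rtc = [650, 700, 690, 660, 720]   # invented congruent means (ms)
rti = [780, 820, 810, 790, 835]   # invented incongruent means (ms)

diffs = [i - c for c, i in zip(rtc, rti)]  # per-subject RTi - RTc
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic, df = n - 1
print(round(t, 2))  # 41.0
```

With real data, the resulting t would be compared against the critical value for n - 1 degrees of freedom.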
4. Conclusion
It was shown that cognitive interference between sensory information (pitch) and semantic information (word)
occurs in audition as well as in vision; here it is called a Stroop-like effect. Further, a Stroop-like effect in
reverse, i.e. the effect in which RTc becomes shorter than RTi in a word shadowing task, was also observed.
This suggests that sensory information has a greater influence on semantic information in audition than in vision.
5. Acknowledgment
This study was supported by a Grant-in-Aid from the Ministry of Education of Japan, for Science Research
09871019 (1997~1999).
References
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology,
18, 643-662.
Back to index
Proceedings paper
Experiments on musical pitch perception have shown that intervals with equal frequency ratios are not always
perceived as the same by musicians, but are affected by the tonal context, or key, in which the intervals are heard
(Krumhansl, 1979; Krumhansl & Keil, 1982; Krumhansl & Shepard, 1979). Memory for the pitch of single tones
has also been shown to be affected by whether a series of tones intervening between the standard and comparison
tones is tonal or atonal (Dewar, Cuddy, & Mewhort,1977; Krumhansl, 1979). These findings are thought to result
from the use of an internal frame of reference (the scale) in the encoding of pitch. The present study is a test of the
hypothesis that nonmusicians' recognition memory for pitch, in the form of intonation judgments concerning notes
in familiar melodies will be affected by the tonal context in which the tones are heard. Specifically, it is expected
that differences in the tonal functions, or tonal stabilities, of the tones in their respective tonalities will contribute to
differences in the ability of listeners to perceive small changes in frequency when the notes are out-of-tune. Since
the more consonant steps in the scale are more easily remembered, it is predicted that they will also be more
discriminable from close neighboring tones.
Experiment One
In the first study, participants made intonation judgments concerning notes which were the second and fifth degree
of the scale in the key of the melody in which they were heard. On half of the trials, the tone being judged had an
absolute frequency of 256 Hz (middle C). On the other half, the frequency was 384 Hz (the G above).
Method
Participants: 27 musically-untrained listeners participated as part of one of the options
for fulfilling the experimental methodology activity requirement for an introductory
psychology course at Indiana University Purdue University Fort Wayne.
Materials: Eight melodies which were familiar to the listeners were synthesized and
presented using the Hypersignal software by Hyperception, Inc. The following four
melodies contained target tones that were the fifth degree of the scale: America; Jingle
Bells; Row, Row, Row Your Boat; and Doe, A Deer. The next four melodies contained
target tones that were the second degree of the scale: The Alphabet Song; When the
Saints Come Marching In; Happy Birthday; and Here Comes the Bride.
Procedure: Each participant was asked to judge whether a particular note, the "target" in
each of eight melodies, was in tune or not. The melodies were heard free field and
judgments were entered on an answer sheet. Six judgments were available which
reflected the listener's judgment about whether the note was "right" (in-tune) or "wrong"
(out-of-tune; sharp or flat) and the degree of confidence: definitely right, right, maybe
right, maybe wrong, wrong, definitely wrong. Each listener heard nine versions of each
melody in random order for a total of 72 trials. Each version differed in (1) whether or
not the target was in tune and, if not, (2) the degree to which the target was out of tune.
The nine possible targets were separated by eighth-tone steps. Version one was a
half-tone flat, version two was 3/8th-tone flat, and so on up to version nine, which was a
half-tone sharp. Participants were informed of which note was the target by using the
words of the songs. The words were presented in written form with those words
corresponding to non-target tones in black, lowercase type and those for target tones in
red, uppercase type. For example, while listening to "Row, Row, Row Your Boat",
participants would see the following:
Row, row, row your boat
Gently down the STREAM
Merrily, merrily, ...
They then made their judgment concerning the note corresponding to "stream". These
judgments were made without feedback.
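Assuming equal temperament, the nine detuning levels translate into frequencies as follows: an eighth-tone is 25 cents, so the versions span -100 to +100 cents around the 256 Hz target. The synthesis details are not given in the text, so this is only an illustrative reconstruction.

```python
# Frequencies of the nine versions of a 256 Hz target, detuned in
# eighth-tone (25-cent) steps from a half-tone flat (version 1) to a
# half-tone sharp (version 9); version 5 is in tune.
TARGET_HZ = 256.0

versions = {v: TARGET_HZ * 2 ** (25 * (v - 5) / 1200) for v in range(1, 10)}

print(round(versions[1], 2))  # half-tone flat
print(round(versions[5], 2))  # in tune: 256.0
print(round(versions[9], 2))  # half-tone sharp
```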
Results
An initial analysis showed that responses did not differ according to the absolute frequency of the target.
Therefore, the data were collapsed across both frequencies. Figure 1 shows the total number of "right" responses of
any sort as a function of the number of 1/8 tones the target note was out-of-tune. The bell shape of the curves is
typical of frequency discrimination data, with two exceptions. (1) The point at which the target was most likely to
be heard as in-tune for the P5 condition was not centered on the objectively in-tune frequency, but was slightly
higher in pitch. (2) The curves are steeper (better discrimination) when the target was flat than when the target was
sharp for both P5 and M2.
The signal detection analysis shows a similar pattern. For both P5 and M2, two separate analyses were performed for
when the targets were sharp and when they were flat. Hit rates were the percentage of correctly responding "right"
when the targets were in-tune. The two False Alarm rates were the percentages of incorrectly responding "right"
when the target was either a quarter-tone flat or a quarter-tone sharp. While Hit rates were basically the same in all
four cases, False Alarms were greater when the out-of-tune targets were on the sharp side.
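The sharp/flat asymmetry can be expressed with the standard signal detection sensitivity index, d' = z(hit rate) - z(false-alarm rate). A minimal sketch follows; the hit and false-alarm rates are invented for illustration, since the paper does not report the numeric values.

```python
# d' = z(hit rate) - z(false-alarm rate): the usual signal detection
# sensitivity index, computed separately for flat and sharp mistunings.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate: float, fa_rate: float) -> float:
    return z(hit_rate) - z(fa_rate)


# Equal hit rates but more false alarms on the sharp side give a lower
# d' (poorer discrimination) for sharp mistunings, as reported.
flat = d_prime(0.80, 0.30)   # quarter-tone-flat targets (invented rates)
sharp = d_prime(0.80, 0.50)  # quarter-tone-sharp targets (invented rates)
print(round(flat, 2), round(sharp, 2))
```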
Experiment Two
In the second study, participants made intonation judgments concerning notes which were the sixth and eighth
degree of the scale in the key of the melody in which they were heard.
Method
Participants: 18 musically-untrained listeners participated as part of one of the options
for fulfilling the experimental methodology activity requirement for an introductory
psychology course at Indiana University Purdue University Fort Wayne.
Materials: Eight melodies which were familiar to the listeners were synthesized and
presented using the Hypersignal software by Hyperception, Inc. The following four
melodies contained target tones that were the eighth degree of the scale: This Old Man;
Santa Claus is Coming to Town; All I Want for Christmas; and Brahms' Lullaby. The next
four melodies contained target tones that were the sixth degree of the scale: Highlands;
Oh, Shenandoah; Bicycle Built for Two; and Amazing Grace.
Procedure: The same procedure was used as in the first experiment.
Results
Once again, an initial analysis showed that responses did not differ according to the absolute frequency of the
target and the data were collapsed across both frequencies. Figure 2 shows the total number of "right" responses of
any sort as a function of the number of 1/8 tones the target note was out-of-tune. The peaks of both curves are at
the point where the targets are in tune. The overall steepness of the curve for P8 is greater than for M6, showing the
P8 to be more easily discriminated from near neighbors. There is an asymmetry in the curves for both P8 and M6
with a steeper fall-off on the flat side indicating better discrimination.
The signal detection analysis shows overall better discrimination for P8. There is a conservative bias to respond
"wrong" when the target is P8 and a bias to respond "right" when the target is M6.
Discussion
First of all, musically-untrained listeners are able to hear relatively small changes in intonation. For all the P8, P5,
and M2 scale degrees, they are able to detect a quarter-tone change from the point of subjective equality (which is
slightly sharp in the case of P5). For the M6, a change of a 3/8 tone is required for reliable discrimination.
Secondly, there are clear differences in the discrimination functions for the different scale degrees. Therefore, it is
evident that musically-untrained listeners' pitch intonation judgments are sensitive to the tonal functions of the
tones. It is not so evident what mechanism(s) is(are) responsible for these differences. It is not a simple case of
more consonant tones being more discriminable than less consonant tones. While P8 targets were most
discriminable, P5 was no better than M2. The differences in response bias for the different tonal functions suggest
that the accuracy of their memory representations may influence the criterion used in accepting a tone as
in-tune. In other words, if one has a good idea of what a note such as P8 should sound like, one will have
a stricter criterion for what one calls in-tune. Of course, that begs the question of why certain tonal functions are
better represented than others. Another possibility is that a perceptual quality associated with the different tonal
functions changes at a different rate in the tones surrounding the target tones used in this study. The differences in
the discrimination functions may reflect the degree of change in this subjective quality at 1/8, 1/4, 3/8, etc. tone
differences from the in-tune note. The greater the local change, the better the discrimination.
References
Dewar, K.M., Cuddy, L.L. & Mewhort, D.J.K. (1977). Recognition memory for single tones with and without
context. Journal of Experimental Psychology: Human Learning and Memory, 3,60-67.
Krumhansl, C.L. (1979). The psychological representation of musical pitch in a tonal context. Cognitive
Psychology, 11, 346-374.
Krumhansl, C.L. & Keil, F.C. (1982). Acquisition of the hierarchy of tonal functions in music. Memory & Cognition.
Back to index
Proceedings paper
Introduction
In this paper I describe the self-regulated use of learning strategies as part of musicians' learning process as they master
musical works for performance. Recent research in areas other than music suggests that self-regulated learning is
crucial for making learning effective (Schunk & Zimmerman, 1998). The construct of self-regulation refers to the degree
that individuals are metacognitively, motivationally, and behaviourally active participants in their own learning
(Zimmerman, 1994). The centrepiece of self-regulated learning is strategy selection, monitoring and revision (Borkowski &
Muthukrishna, 1992). Strategy use becomes self-regulated when students use learning strategies in conjunction with their
own characteristics (e.g. their skills and knowledge), the nature of the task (e.g. its organisation and modality), and other
situational factors (e.g. the purpose of the problem solving activity) (Brown, Bransford, Ferrara, & Campione, 1983).
Previous research in instrumental learning offers only limited information about musicians' self-regulated use of learning
strategies (e.g. Chaffin & Imreh, 1997; Gruson, 1981; Hallam, 1992, 1995; Miklaszewski, 1989).
Considering learning strategies as learning activities aimed at achieving a particular goal (Weinstein & Mayer, 1986), the
present study investigated two organ students' use of learning strategies in the face of different task demands within practice
sessions. It looked at the initial stage of preparing a complex piece for concert performance, and at practice sessions of a
later learning period. It also explored the similarities and differences in self-regulated use of strategies that can be found
between these periods.
Based on the criticism of cognitive strategy research by phenomenographic research on conceptions (Laurillard, 1986;
Marton & Säljö, 1986; Uljens, 1989), the focus was on each student's conception of task demands. The students may
conceptualise task demands differently, and thus different task demands may shape their use of strategies.
Method
The subjects were two third-year organ students at the Norwegian State Academy of Music in Oslo. Their teacher described
them as gifted, possessing a high level of technical skill. The works practised were the Prélude from "Prélude et fugue" in B
major (opus 7) by Marcel Dupré, and the Salve Regina movement from the second Symphony (opus 13) by Charles-Marie
Widor. Both pieces are among the most important works in the organ repertoire of the French Romantic period.
Before recording the initial practice sessions, no special auditory or analytic prestudy work had taken place. The pieces were
part of the students' preparations for their final examinations at the Academy. The students and their teacher selected the
pieces as exemplars of moderate difficulty. This was important because awareness of how we think typically occurs
spontaneously only when our otherwise smooth and well-formed activities do not lead to the desired results or goals. Such
situations arise when the problem being solved is of moderate difficulty (Flavell, 1987).
The results presented are based on data gathered during the first practice session and during and immediately after the
second practice in the first and second learning period (each practice session lasting one hour). The students practised on a
familiar instrument in one of their usual practice rooms. The students gave a concert performance of the pieces a few weeks
after the last recorded session.
The information was gathered through the use of observation of practice behaviour, concurrent verbal reports of
problem-solving activities during a session, and retrospective debriefing reports of problem-solving activities given after
practice (Nielsen, 1997, 1998). During pilot studies the verbal reporting techniques were adjusted to fit the purpose of the
study and the natural practice situation. These procedures were performed according to the guidelines offered by Ericsson
and Simon (1993) and Taylor and Dionne (1994), and they included the conducting of a training session and prompting. The
complementary use of concurrent verbal protocols and retrospective debriefing reports provided frequent opportunities to
verify the data reported by problem solvers and to enhance validity in the interpretation of the data collected.
Considering this, the data for this study consist of a detailed listing of the students' behavioural and verbal activities made
«Otherwise,...musically it is much the same...simple in a way. I suppose, I will practice it [the piece] much in the
same way as I already have been doing.» (Student Dupre)
«I work more with the music, but at the same time so many notes were «uncertain» [errors]. It really ended out
much the same. I had planned to focus on the music, but it is always important to me that technical problems
are sorted out. At the same time, I could imagine how it [the piece] should sound and tried to attain this.»
(Student Widor)
Student Dupre's conception of task demands seems mainly to have been to focus on the more technical problems, and to a
lesser degree on the expressive qualities. Student Widor had planned for and wanted to work on the expressive qualities of the
piece, but as she experienced problems with the reliability of emerging technical plans and their execution, she changed her
focus. However, when she had improved her execution of the parts, she changed back to focus on the expressive qualities.
In their response to technical problems, both students used strategies which reduced the amount of information to be
processed simultaneously (e.g. as in the first learning period). In response to interpretative problems, they played through
larger sections of the piece or took pauses where the score was studied. Both students also stressed playing through a large
part of the piece in a tempo close to the final tempo before they focused on another part. If they identified problems within
the part practised, they continued practising this part. Both emphasised the use of strategies aimed at achieving both rapidity
and accuracy in their performances in this learning period.
The two learning periods compared
The results show the following similarities and differences in self-regulated use of strategies between these periods:
a. the students varied their use of strategies in the face of different task demands in the two learning periods.
b. one student used learning strategies especially in the face of transitions in patterns and to a lesser degree in response to
more complex patterns.
c. the other student used strategies especially in response to more complex patterns, in response to problems with
reading the score caused by an unfamiliar clef, and in the face of interpretative problems.
d. both students used strategies in response to more complex patterns in the second learning period, although complexity
was not the most important task demand for use of strategies for one of the students.
e. some problems were temporary, while others existed throughout each practice session.
f. both students focused on the technical execution in the initial learning period.
g. one student kept a technical focus throughout the whole period, while the other gradually focused more on the
expressive problems of the piece.
h. the students' use of strategies was to some extent not predetermined, but seemed to depend on the success of the
ongoing strategy use and on the students' available and accessible knowledge about their own learning process and the
task.
i. in their response to technical problems, both students used strategies which reduced the amount of information
processed simultaneously.
j. with their decisions to focus on the opposing alternatives «rapidity» and «accuracy» the students formed different
profiles in their use of strategies across the learning periods.
Discussion: A preliminary model
Based on the results from this study and from earlier research on learning activities in instrumental practice (e.g. Chaffin &
Imreh, 1997; Miklaszewski, 1989) and theory of self-regulated learning, a preliminary model of self-regulated use of
learning strategies during instrumental practice is presented below. It demonstrates the complexity and
diversity of the students' process of self-regulated use of strategies in practice, and consists of the different factors that may
influence the student's use of strategies, and their interrelations.
According to the theory of self-regulated learning, important factors are the students' own characteristics (e.g. their skills and
knowledge), the nature of the task (e.g. its organisation and modality), and other situational factors (e.g. the purpose of the
problem solving activity). In the following model, «the students' own characteristics» were conceptualised as their
metacognitive competence and self-efficacy beliefs, «the nature of the task» as their problem beliefs, and «the purpose of the
problem solving activity» as their self-evaluation of their performances during practice.
As such, the core of this model consists of the student's «problem belief», «strategy use» and «self-evaluation», and their
interrelations. Their contents change as the musical work is mastered. In the course of mastery, problem beliefs
may be revised (e.g. technical or expressive problems), and the student's self-evaluation relies on criteria that may be
revised (e.g. rapidity or accuracy criteria).
Results from this study and from earlier research on learning activities in instrumental practice (e.g. Chaffin & Imreh, 1997;
Miklaszewski, 1989), also suggest that the students' problem beliefs are influenced by patterns in the musical material and
that these beliefs may be revised due to the students' evaluation of their performance of the music. Further, the problem
belief may influence the strategy use during practice. Based on theory of self-regulated learning the metacognitive
competence of the students and their self-efficacy beliefs may also influence the strategy use. Their use of strategies may
also be independent of this control.
This implies that a model of the process of self-regulated use of strategies during practice must indicate a relationship
between problem belief and patterns in the musical material, where the musical material forms the basis of the problem
belief. It must also indicate a relationship between problem belief and the student's self-evaluation of the performance, where
the problem belief follows as a consequence of the self-evaluation. Further, it must indicate a relationship between problem
belief and use of strategies, where the use of strategies follows as a consequence of the problem belief. Based on the theory of
self-regulated learning, it must also indicate a relationship between the self-evaluation of the performance and self-efficacy
beliefs, where self-efficacy beliefs follow as a consequence of the self-evaluation. Further, a relationship between
self-efficacy beliefs and a continued use of strategies must be indicated. Finally, it must indicate the possibility of
relationships between the student's metacognitive competence and use of strategies, where the use of strategies follows as a
consequence of the student's metacognitive knowledge and control. These factors and their interrelations are summed up in
the following preliminary model (see Figure 1):
FIGURE 1
This model illustrates the process of self-regulated use of strategies during practice. Based on a problem to be solved, the
student's strategy use, performance of the piece, and self-evaluation of the performance (the black line in the model), this
model indicates four problem solving alternatives to follow:
a. The student evaluates the performance as successful, and focuses on a new problem (the lilac line in the model).
b. The student evaluates the performance as unsuccessful, but has confidence in the chosen strategy to solve the
problem. The student increases the effort by continued use of the same strategy to solve the problem (the red line
in the model).
c. The student evaluates the performance as unsuccessful, and has no confidence in the chosen strategy to solve
the problem. The student increases the effort by revising the strategy use in the continued problem solving (the
green line in the model).
d. The student evaluates the performance as unsuccessful, but the performance gives reason to revise the problem
belief. The student increases the effort by revising the problem belief and then the use of strategy in the
continued problem solving (the blue line in the model).
However, this model is preliminary, and must be considered as forming conjectures of this process and not as a
generalisation of the present study's results.
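The four alternatives above can be read as a simple decision rule. The function below is a hypothetical rendering of the model for illustration only; the labels come from the model, but the function itself is not part of the study.

```python
# Decision sketch of the model's four problem-solving alternatives:
# (a) success -> focus on a new problem; (d) failure that warrants a
# revised problem belief; (b) failure with confidence in the strategy;
# (c) failure without confidence in the strategy.
def next_step(successful: bool, revise_belief: bool, trusts_strategy: bool) -> str:
    if successful:
        return "focus on a new problem"                        # (a) lilac line
    if revise_belief:
        return "revise the problem belief, then the strategy"  # (d) blue line
    if trusts_strategy:
        return "continue with the same strategy"               # (b) red line
    return "revise the strategy"                               # (c) green line


print(next_step(True, False, False))   # -> focus on a new problem
print(next_step(False, False, True))   # -> continue with the same strategy
```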
Conclusions
To conclude, this study shows that able students self-regulate their use of learning strategies in the face of different task
demands. As such it confirms some of the presumed self-regulated use of strategies that earlier research has suggested (e.g.
Chaffin & Imreh, 1997; Miklaszewski, 1989), in particular the relationships between use of strategies and patterns in the
musical work practised, based on the students' task conceptions. On the other hand, some of the hypotheses from earlier
research were not «verified». The present study shows that complexity may be of importance to the students' use of
strategies not only in the initial learning period, but also in later learning periods. It also shows that the students'
self-evaluations rely on criteria that may be revised in the course of mastery.
The results can be seen as demonstrating the student's need to monitor his or her use of strategies during practice as a
prerequisite for being able to self-regulate that use in the face of different task demands in practice.
References
Borkowski, J. G., & Muthukrishna, N. (1992). Moving metacognition into the classroom: "Working models"
and effective strategy teaching. In M. Pressley, K. R. Harris and J. T. Guthrie (Eds.). Promoting academic
competence and literacy in school. San Diego: Academic Press. pp 477-501.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering, and
understanding. In P. H. Mussen (Ed.). Handbook of child psychology: Cognitive development. New York:
Wiley. 4th ed. pp 77-166.
Chaffin, R., & Imreh, G. (1997). «Pulling teeth and torture»: Musical memory and problem solving. Thinking
and Reasoning, 3, 315-336.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Flavell, J. H. (1987). Speculations about the nature and development of metacognition. In F. E. Weinert and R.
H. Kluwe (Eds.). Metacognition, motivation, and understanding. New Jersey: Erlbaum. pp 21-29.
Gruson, L. M. (1981). What distinguishes competence: An investigation of piano practising. Unpublished Ph.D.
thesis, University of Waterloo, Canada.
Hallam, S. (1992). Approaches to learning and performance of expert and novice musicians. Unpublished
Ph.D. thesis, University of London, London.
Hallam, S. (1995, August). Qualitative changes in practice and learning as musical expertise develops. Paper
presented at the VIIth International Conference on Developmental Psychology, Krakow.
Laurillard, D. (1986). Att lära genom problemlösning [Learning through problem solving]. In F. Marton, D.
Hounsell and N. Entwistle (Eds.). Hur vi lär [How we learn]. Stockholm: Rabén & Sjögren. pp 171-197.
Marton, F., & Säljö, R. (1986). Kognitiv inriktning vid inlärning [Cognitive orientation in learning]. In F.
Marton, D. Hounsell and N. Entwistle (Eds.). Hur vi lär [How we learn]. Stockholm: Rabén & Sjögren. pp 56-80.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of Music, 17,
95-109.
Nielsen, S. G. (1997). Self-regulation of learning strategies during practice: A case study of a church organ
student preparing a musical work for performance. In H. Jorgensen and A. C. Lehmann (Eds.). Does practice
make perfect? Current theory and research on instrumental music practice. NMH-publications 1997:1. Oslo:
The Norwegian State Academy of Music. pp 109-122.
Nielsen, S. G. (1998). Selvregulering av læringstrategier under øving: En studie av to utøvende
musikkstudenter på høyt nivå. [Self-regulation of learning strategies during practice: A study of two church
organ students]. NMH-publications 1998:3. Oslo: The Norwegian State Academy of Music.
Schunk, D. H., & Zimmerman, B. J. (Eds.) (1998). Self-regulated learning: From teaching to self-reflective
practice. New York: The Guilford Press.
Taylor, K. L., & Dionne, J. P. (1994, April). Accessing problem solving strategy knowledge: The
complementary use of concurrent verbal reports and retrospective debriefing. Paper presented at The American
Educational Research Association Annual Meeting, New Orleans.
Uljens, M. (1989). Fenomenografi - forskning om uppfattningar [Phenomenography - research on conceptions].
Lund: Studentlitteratur.
Weinstein, C. E., & Mayer, R. E. (1986). The teaching of learning strategies. In M. C. Wittrock (Ed.).
Handbook of research on teaching. New York: Macmillan. pp 317-327.
Zimmerman, B. J. (1994). Dimensions of academic self-regulation: A conceptual framework for education. In
D. H. Schunk and B. J. Zimmerman (Eds.). Self-regulation of learning and performance: Issues and educational
implications. New Jersey: Erlbaum. pp 3-21.
Proceedings paper
1 Introduction
1.1 Categorisation and similarity
The ability to classify musical styles is an important and intriguing task from the perspective of music cognition. This process, which
listeners usually perform effortlessly, involves integrating a number of perceptual processes. Recent summaries divide categorisation
into two kinds of process: (1) rule application and (2) similarity computations (Smith, Patalano, & Jonides, 1998; Hahn & Chater, 1998). This paper
considers the latter using the statistical frequencies of events, which have been shown to be influential in the learning and perception of
language and sound patterns (e.g. Saffran, 1999). We also limit our inquiry to melodic similarity, since this allows us to test and
develop the frequency-based measures of melodic similarity that aim to tackle some of the categorisation and classification challenges
music history holds for us.
1.2 Melodic similarity
There has been a moderate amount of research into melodic similarity and a number of experiments have shed light on the parameters
that give rise to this phenomenon. Findings by Dowling (1971, 1978) indicate that one of the main factors of similarity is contour
information, which is essential in short-term comparisons (Dowling & Bartlett, 1981) and in shorter melodies (Edworthy, 1985; Cuddy
et al, 1981). Some studies have concentrated on melodic archetypes (Rosner & Meyer, 1982, 1986), hierarchical structure (Serafine,
Glassman & Overbeeke, 1989), themes (Pollard-Gott, 1983), motifs (Lamont & Dibben, 1997), whether melodies use scalar or
non-scalar tones (Bartlett & Dowling, 1980; Dowling & Bartlett, 1981, 1988), and more recently, on transposed melodies (van Egmond
et al, 1996), the effects of pitch direction, contour and pitch information (Dewitt & Crowder, 1986; Eiting, 1984; Freedman, 1999;
Hofmann-Engl & Parncutt, 1998), and pitch range and key distance (van Egmond & Povel, 1994). Commonly, rhythm has been
considered as a separate entity (Palmer & Krumhansl, 1990; Simpson & Huron 1993; Gabrielsson, 1973) except by Monahan &
Carterette (1985), who studied both rhythm and tonal dimensions as constituents of similarity.
Theoretical models of melodic similarity include Cambouropoulos' (1995, 1997) formal definition of similarity based on the number of
coinciding attributes of melodies. Smith, McNab & Witten (1998; also Orpen & Huron, 1992) have defined similarity as the
complexity of the transformation process involved in mapping one object onto the other. Models that deal with contour and interval
information of the melodies include work by Deutsch & Feroe (1981), Ó Maidín (1998), and Hofmann-Engl & Parncutt (1998). The
wide range of research foci and models can be attributed to the multidimensional nature of melodic similarity. The
approach used in this paper differs from previous approaches in that both rhythm and pitch are treated as statistical
entities that are hypothesised to provide perceptually salient cues for similarity.
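To make the frequency-based approach concrete, the sketch below compares two melodies by the distributions of their interval transitions. The encoding and the city-block similarity measure are illustrative assumptions, not the exact measures used in this study.

```python
from collections import Counter

def transition_distribution(events):
    """Relative frequencies of successive pairs (e.g. of intervals or durations)."""
    pairs = Counter(zip(events, events[1:]))
    total = sum(pairs.values())
    return {p: n / total for p, n in pairs.items()}

def distribution_similarity(da, db):
    """1 minus half the city-block distance: 1.0 = identical distributions."""
    keys = set(da) | set(db)
    return 1 - 0.5 * sum(abs(da.get(k, 0) - db.get(k, 0)) for k in keys)

def intervals(melody):
    """Successive pitch intervals, making the comparison transposition-invariant."""
    return [b - a for a, b in zip(melody, melody[1:])]

# Two fragments as MIDI pitches (illustrative data); same shape, a fifth apart
melody_a = [60, 62, 64, 62, 60]
melody_b = [67, 69, 71, 69, 67]
sim = distribution_similarity(transition_distribution(intervals(melody_a)),
                              transition_distribution(intervals(melody_b)))
print(sim)   # -> 1.0
```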
1.3 Similarity and statistical properties of the melodies
Classifying melodies and musical styles according to the statistical distribution of different intervals, rhythmic patterns, or pitches has a
long history in ethnomusicology (Freeman & Merriam, 1956; Lomax, 1968). Research on music cognition and learning has
demonstrated the effect of statistical information on learning and perception in both cross-cultural studies (Castellano et al, 1984;
Kessler et al, 1984; Krumhansl et al, 1999) and studies using melodies in which the statistical properties of the music have been
intentionally manipulated (Oram & Cuddy, 1995). The results show that listeners are sensitive to pitch-distributional information.
Evidence from different modalities has also shown the importance of frequency information in cognitive processes (e.g., Saffran et al,
1999). In light of this evidence, it seems that statistical properties of melodies could provide a means for classification of musical styles
in terms of their perceptual similarity. Indeed, studies using this approach have been successful, for example, Järvinen, Toiviainen &
Louhivuori (1999) classified ten different musical styles based on the distributions of tones and tone transitions. The results, which
Figure 1. Two-dimensional classical multidimensional scaling solution (R2 = .81, N = 15). Y=Yoiks,
H=Hymns, G=German, I=Irish, C=Greek.
3.2 Association between the statistical properties of melodies and listeners' similarity ratings
A comparison between the similarities derived from the statistical properties of the melodies and listeners' similarity judgements was
done by multiple regression analysis. The similarity measures were regressed upon similarity ratings of listeners for all pairs of
melodies. The overall prediction rate was fairly low (R2 = .41, F(3, 101) = 22.67, p < .001): the distribution of
duration transitions explained 20%, note transitions 13%, and normalised interval difference (of the first 12 and last 12 intervals) 6% of the
variance in listeners' similarity ratings. In other words, melodies that possessed similar rhythms and had similar note transitions and
interval differences were judged to be more similar by the listeners. In a previous study (Hofmann-Engl & Parncutt, 1998), normalised
interval difference accounted for 76% of the similarity judgements, but this was not the case here. One reason could be the
difference in the number of tones in the melodies, which makes it difficult to apply this measure properly. As the connection between
the statistical properties and the perceived similarities of the melodies was only moderate, the salient dimensions of listeners' ratings were
studied in more detail.
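The analysis logic can be illustrated with a toy regression. The study used multiple regression over three predictors; for brevity this sketch fits a single predictor by ordinary least squares and reports R-squared. The data are invented.

```python
def r_squared(x, y):
    """Variance in y explained by a least-squares line on x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Invented data: a statistical similarity measure vs. mean listener rating
measure = [0.9, 0.7, 0.4, 0.2, 0.8, 0.5]
rating = [6.1, 5.0, 3.2, 2.4, 5.6, 3.9]
print(round(r_squared(measure, rating), 2))
```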
3.3 Salient dimensions of the melodies
The dimensions of the scaling solution were correlated with the descriptive variables of the melodies. Dimension 1 in the two
dimensional solution correlated with mean pitch, predictability, registral return, registral direction, and rhythmic activity (r= .926, r=
.804, r= .549, r= .594, r= .629, respectively, p< .05 and df= 13 in all cases). Dimension 2 correlated with rhythmic variability (r=.662,
df=13, p<.01). There is no single obvious interpretation of dimension 1, but it could be read as the predictability of the melodies,
reflecting the regularity of large intervals and mean pitch height. Dimension 2, in turn, can be interpreted as a rhythmic dimension. In
the three dimensional solution, the results displayed the same pattern of correlations and the third dimension correlated with the
Proceedings paper
Engagement and Experience: A Model for the Study of Children's Musical Cognition
Lori A. Custodero
Teachers College, Columbia University
lac66@columbia.edu
Background
Engagement in authentic musical experiences provides a critical vantage point from which to study musical thinking.
Observations of children's active engagement with the environment suggest that their creative solutions to presented intellectual
problems provide insight into their cognitive processes (Bloom, 1993; Duckworth, 1996). Although investigations of children's
musical problem solving in clinical contexts (e.g. Bamberger, 1999) have led to important theories regarding musical thinking,
these studies typically involve older children. The current research model, based on the paradigm of flow experience
(Csikszentmihalyi, 1975, 1988, 1993, 1997), was developed to systematically address the complexities of musical problem
solving (and problem finding) behaviors in young children's naturalistic environments.
Flow experience refers to a state of optimal enjoyment, when participants are thoroughly involved in intrinsically rewarding
activity. Defined by an individual's perceived match between high challenge and high skill levels, it creates an ideal learning
situation, since in order to sustain flow, skills must improve to meet challenges. This dynamic interaction between skills and
challenges, also known as emergent motivation (Csikszentmihalyi, 1982), is self-perpetuating: As an individual's skill level
improves through practice, challenges must become increasingly complex. Observing children's attempts to solve the problem
of sustaining flow by adjusting their own challenge levels in an intrinsically rewarding activity such as music making
will conceivably provide information regarding cognitive processes.
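One common way to operationalize this challenge-skill dynamic is the quadrant reading of Csikszentmihalyi's model, in which flow requires both challenge and skill above the individual's own average levels. The sketch below is a generic illustration of that idea, not the coding scheme used in this study; the numeric values are hypothetical.

```python
def experience_state(challenge: float, skill: float,
                     mean_challenge: float, mean_skill: float) -> str:
    """Quadrant reading of the flow model: both dimensions are judged
    relative to the person's own average levels (hypothetical sketch)."""
    high_challenge = challenge > mean_challenge
    high_skill = skill > mean_skill
    if high_challenge and high_skill:
        return "flow"
    if high_challenge:
        return "anxiety"   # challenge outstrips skill
    if high_skill:
        return "boredom"   # skill outstrips challenge
    return "apathy"

print(experience_state(8, 8, 5, 5))   # -> flow
```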
Although flow experience has traditionally been studied in adults and adolescents, it may be hypothesized that young children
are in constant pursuit of flow-type experiences. Maintaining the flow state may represent an inherent means of engaging
with the world, with their need to remain sufficiently challenged reflecting their expanding skill base. Several models from
developmental psychology speak to the intrinsic nature of challenge-seeking behavior. Studies of mastery motivation focus on
children's dispositions for engagement in appropriately challenging activities (Barrett, Morgan, & Maslin-Cole, 1993; McCall,
1995). Children's inherent drive to learn about the world through transforming phenomena in their environment is posited by
Feldman (1994) as a rationale for creativity. One might interpret this "transformational imperative" as a flow sustaining
strategy, especially in the context of creative musical experience.
As children strive to keep skills and challenges balanced through mastering or transforming activity, they are employing both
problem solving and problem finding behaviors. In his studies on children's mathematical thinking, Siegler (1996) has found
variability within and between children as they demonstrate choices of cognitive strategies, supporting the idea that children's
solutions to problems are not universal or simply categorized.
Drawing upon the evidence above, the observational flow model is grounded in the following assumptions: (a) flow experience
provides a window into cognition by virtue of its challenge-skill dynamic; (b) young children are compelled to seek out
appropriately challenging activity to maintain the flow state; and (c) the strategies young children use to monitor their flow
states during musical experiences may provide cues to their musical thinking.
Since conventional flow research relies on self-reports of adolescents and adults, operationalizing children's flow-monitoring
behaviors was crucial to adapting the original methodology for younger participants. A valid and reliable observational protocol
was developed for use with young children ages 4-5 in a music instructional setting utilizing the researcher-developed Flow
Indicators in Musical Activities (FIMA) coding form (Custodero, 1998). This measurement tool included nine affective and
eight behavioral indicators, as well as a global measure of flow used as the dependent variable in multiple regression tests. Both
groups of indicators were submitted to factor analysis resulting in four affective and three behavioral factors.
Affective factors found to predict flow included Potency (alert, involved, active) and Self-concept (satisfied, successful),
replicating findings in conventional flow research (e.g., Csikszentmihalyi & Larsen, 1987). Behavioral factors found to predict
flow were more provocative. Skill loaded with anticipation, expansion, and extension -- consistently observed transformations
of the teacher-delivered material. Challenge, defined by self-assignment and self-correction behaviors as well as a deliberate,
controlled quality of gesture, loaded with adult awareness. Peer awareness loaded with imitation intensity, which did not predict
Discussion
Results indicate that flow-related behaviors are observable, supporting findings in the original study (Custodero, 1998).
Self-assignment, self-correction, deliberate gesture, anticipation, expansion, and extension were all positively associated with
skill, focus, and involvement. Similar pervasive influences of social interaction were observed: Children looked to adults to
help define challenge in tasks; they looked to peers for imitation when the challenge level was perceived as inappropriate. Data
in this exploratory study revealed issues of development, environment, and individual temperament.
McCall's (1995) cautionary note that mastery motivation cannot occur before the infant distinguishes between cause and effect
was supported in the observations of infants' musical self-assignment and self-correction - infants over 12 months showed more
of these behaviors. Toddlers exhibited an abundance of self-assignment behaviors; self-correction, however, was superseded by
the compelling nature of anticipating the musical cues, especially those associated with movement. The within-session delayed
participation observed in the infant group was not apparent in the toddler group, which speaks to the perception of structured
activity in these older children.
Self-initiated activity in the presence of adults was less observable in Groups 3-4. It is believed that dispositions toward
appropriate behavior in learning environments had been established by the schooling experience of these children, resulting in
the waning of flow in the context of formal instruction suggested by Csikszentmihalyi (1993).
A deliberate quality of gesture was the most universally applicable indicator of flow by virtue of its representation of focused
involvement. Older infants and toddlers were very deliberate in their rhythmic movement responses to live and recorded music;
older children may have internalized this seemingly inherent response. The manipulation of musical instruments provided an
important means for observing quality of gesture for all ages; this is supported by the original study. Especially noteworthy was
the similarity between the use of hand gesture by the infants and toddlers as a means to express song repertoire and the elevated
singing skill experienced by the older children in the Dalcroze group who used deliberate hand gestures to assist in pitch
accuracy. Goldin-Meadow (2000) hypothesizes that gesture "has the potential to be involved in innately driven as well as
non-innately driven learning - that is, to be a general mechanism of cognitive growth" (p. 237).
The transformational behaviors of anticipation, expansion, and extension were highly influenced by both developmental and
environmental factors. Anticipation was not consistently apparent in the infants; there were notable increases in its frequency in
toddlers, and even more in the preschoolers from the original study. Since this behavior relies upon the immediate response to
instructional presentation, it is more dependent upon the quality of interaction between teacher and student.
Expansion was evident in the younger groups in spontaneous movement responses; Group 4 also exhibited expansion in their
References
Bamberger, J. (1999). Learning from the children we teach. Bulletin of the Council for Research in Music Education, 142,
48-74.
Barrett, K. C., Morgan, G. A., & Maslin-Cole, C. (1993). Three studies on the development of mastery motivation in infancy
and toddlerhood. In D. Messer (Ed.), Mastery motivation in early childhood: development, measurement, and social processes
Proceedings paper
While listening to music, people often form a simplified representation of the temporal structure in the music. This representation often takes the
form of an approximately isochronous series of pulses with inter-pulse intervals near 600 ms (Clarke, 1999; Fraisse, 1982; Parncutt, 1994; Snyder &
Krumhansl, 1999; van Noorden & Moelants, 1999). These pulses not only provide the basis of our perceptual representations of rhythm and meter, but may also
underlie our ability to produce rhythmically appropriate actions (Schubotz, Friederici, & Cramon, 2000) such as playing music in an ensemble, dancing in time
with music, or simply tapping our feet or fingers to the perceived pulse.
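Given a notated tempo, the metrical level whose inter-pulse interval lies nearest this preferred period can be found mechanically. A minimal sketch; the level names and the 600 ms default are assumptions drawn from the literature cited above, not part of this study's method.

```python
def preferred_pulse_level(quarter_bpm: float, preferred_ms: float = 600.0) -> str:
    """Choose the metrical level whose inter-pulse interval is closest to
    the preferred pulse period (~600 ms in the literature cited above)."""
    quarter_ms = 60000.0 / quarter_bpm
    levels = {"eighth": quarter_ms / 2, "quarter": quarter_ms,
              "half": quarter_ms * 2, "whole": quarter_ms * 4}
    return min(levels, key=lambda name: abs(levels[name] - preferred_ms))

print(preferred_pulse_level(100))   # quarter note = 600 ms -> quarter
```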
While the output of the human pulse-finding system is relatively simple (i.e., a quasi-isochronous series of impulses), the computational ability to reduce the
temporal complexity of music to a pattern of pulses is extremely impressive. In this sense, pulse-finding is analogous to the equally impressive ability to reduce
improvised melodies to a simple theme (Large, Palmer, & Pollack, 1995). Whereas in melodic reduction, subjects identify which notes are most essential, in
pulse-finding subjects must find the primary durational unit, the period, and its onset position with respect to musical events, the phase. Furthermore, the
representation of pulse is dynamic in that the period and phase can change incrementally to compensate for small-scale timing perturbations (Semjen, Vorberg,
& Schulze, 1998; Thaut, Miller, & Schauer, 1998), and categorically from one global state to another (e.g., from a quarter-note period to an eighth-note period,
or from a down-beat phase to an up-beat phase).
In many polyphonic musical styles, certain instruments are designated to provide rhythmic and harmonic structure supporting the melody. Examples include the
rhythm section in jazz, parts of the orchestra during a symphony or concerto, and the left-hand or thumb in many styles of piano and guitar music, respectively.
Therefore, in a recent study on pulse-finding in piano ragtime (Snyder & Krumhansl, 1999), it was not surprising that removing the left-hand part led to
degraded performance in pulse-finding. However, the situation is potentially different in contrapuntal music in which no one voice constantly provides the
rhythmic structure. This democratic quality of contrapuntal music gives rise to a potential problem in identifying how the perception of pulse arises. By simply
examining scores of such music, it is not always apparent what information is available for pulse-finding. It seems likely that pulse-finding in contrapuntal
music is generally more difficult than in music with a rhythm section because of the attentional demands associated with searching for a voice carrying pulse cues.
Contrapuntal music raises other interesting questions for students of pulse-finding, such as the extent to which one voice will dominate attention at any given
point in time, what musical features determine the voice to which attention is directed, and whether the perception of pulse can be influenced by multiple voices
at once.
In addition to behavioral studies of pulse-finding, many researchers have proposed computational models to account for this ability. Before describing these
models, it is useful to keep in mind what properties we desire in a model of pulse-finding. Firstly, a model of pulse-finding must exhibit the ability to find the
pulse that humans hear for a given excerpt of music, and in a similar amount of time as humans. Secondly, the model must exhibit robust performance in the
presence of the normal timing deviations found in human musical performance (Palmer, 1997). Thirdly, the model should show instabilities of period and phase
when humans do. And lastly, before making any strong psychological or biological claims about the model, one must show that the model relies on similar
mechanisms as people. While this last test of a model is crucial from the standpoint of experimental psychology, we will provide preliminary data establishing
the mathematical validity of a model of pulse-finding, using contrapuntal music.
Currently, the predominant class of models used for musical pulse-finding relies on oscillatory units that entrain in a 1:1 fashion to periodic components in
musical stimuli (Gasser, Eck, & Port, 1999; Large, 1994; Large & Jones, 1999; Scheirer, 1998; Toiviainen, 1998; for a detailed examination of rule-based
models though, see Desain & Honing (1999)). Oscillatory modelers have proposed adaptive oscillation as a mechanism for dynamically tracking the pulse in
real musical performances. In other words, the oscillatory units in these models are able to adjust the period and phase to compensate for temporal deviations
from isochrony in the music. In addition, they give a simple quasi-periodic response to complex musical patterns, corresponding well to the behavioral output of
human pulse-finding. This is attractive because the proposed oscillatory mechanism could not only give rise to the sensation of pulse but also could directly
drive rhythmic sensory-motor output such as tapping. Oscillatory models bear important similarities to an influential hypothesis of rhythmic cognition,
Dynamic Attending Theory (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). This theory assumes that internal attentional rhythms underlie our ability
to track and find structure in time-varying patterns. Such a mechanism may be used to represent time intervals, perceive metrical structure in speech and music,
and produce rhythmic motor patterns.
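The core adaptive mechanism can be sketched as an error-correcting oscillator: each predicted beat is compared with the corresponding onset, and fractions of the timing error feed back into the phase and the period. This generic sketch is in the spirit of the models cited above, not a reimplementation of any one of them; the coupling constants are arbitrary.

```python
def track_pulse(onsets, period, alpha=0.25, beta=0.1):
    """Entrain a single oscillator 1:1 to a list of onset times.

    alpha: fraction of each timing error used to correct the phase
    beta:  fraction used to adapt the period itself
    Returns the predicted beat times (generic sketch, arbitrary constants).
    """
    beats = [onsets[0]]                 # assume the first onset is on the beat
    expected = onsets[0] + period
    for onset in onsets[1:]:
        error = onset - expected        # deviation from the prediction
        expected += alpha * error       # phase correction
        period += beta * error          # period adaptation
        beats.append(expected)
        expected += period
    return beats

# Isochronous onsets (in ms) are tracked exactly; a late onset pulls the
# next predicted beat part of the way toward it.
print(track_pulse([0, 600, 1200, 1800], period=600))
```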
Past work shows that adaptive oscillator models can dynamically track musical and non-musical patterns (Gasser, Eck, & Port, 1999; Large, 1994; Large &
Jones, 1999; Scheirer, 1998; Toiviainen, 1998). However, experimentalists have yet to thoroughly compare the models to human performance on musical
pulse-finding. Therefore, the first goal of this study is to collect new behavioral data on human pulse-finding and to determine whether an oscillator model of
pulse performs similarly to humans on this task. The particular model we test uses oscillating units similar to the one described by Toiviainen (1998; see
Toiviainen & Snyder, 2000 in these proceedings for new developments).
For the behavioral experiment and for testing the model, we chose a single contrapuntal organ duet composed by J.S. Bach (1685-1750), BWV 805. It is the
final work in a series of four organ duets for a single performer. More specifically, the left-hand of the organist plays the bottom voice, while the right-hand
plays the top voice. We refer to the two imitative voices as the right-hand part (RH), the left-hand part (LH), and to the two parts together as both. In the
experiment we present the RH, the LH, and both versions of short excerpts to subjects who tap the perceived pulse of the music. We then attempt to model
human performance with the adaptive oscillator system described above, and analyze musical features that possibly influence performance.
EXPERIMENT
For this experiment, we selected 8 eight-measure excerpts from a MIDI version of BWV 805. For each excerpt, LH, RH, and both versions were presented to
subjects on separate tapping trials. We will focus on determining the relative influence of the two voices on performance, whether starting position of the
excerpts influences tapping performance, and whether spontaneous tapping tempo predicts the period of musical tapping.
Method
Subjects
Figure 1. Three schematic measures depicted at the eighth-note level with 2/2 meter. Taps are indicated by bold, underlined metrical positions. Fourteen
possible periodic modes of tapping are shown, each characterized by a period (2, 4 or 8 eighth-notes) and a phase. Aperiodic tapping is tapping with a period
other than 2, 4, or 8 eighth-notes.
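Under this scheme, a mode label such as 4_1 can be assigned mechanically from the tap positions. The sketch below assumes taps already quantized to 1-based eighth-note positions in a continuous count; this encoding is our assumption for illustration, not the authors' scoring procedure.

```python
def tapping_mode(tap_positions):
    """Label a run of taps as 'period_phase' (e.g. '4_1'), or 'aperiodic'.

    tap_positions: 1-based eighth-note positions in a continuous count.
    Periodic tapping must use a constant inter-tap interval of 2, 4, or 8
    eighth-notes, as in Figure 1 (simplified sketch of that scheme).
    """
    intervals = {b - a for a, b in zip(tap_positions, tap_positions[1:])}
    if len(intervals) != 1:
        return "aperiodic"
    period = intervals.pop()
    if period not in (2, 4, 8):
        return "aperiodic"
    phase = (tap_positions[0] - 1) % period + 1   # position within one cycle
    return f"{period}_{phase}"

print(tapping_mode([1, 5, 9, 13]))   # taps on the downbeats -> 4_1
```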
The top half of the Appendix displays the average value on each variable, for each stimulus, across all subjects. The three periodic modes in which subjects
mostly tapped are the first three columns in the Appendix. These modes are 2_1, 4_1, and 4_3. For each stimulus, subjects mostly tapped in mode 4_1.
However, for these data, it is important to note differences between subjects. Seven of the subjects tapped over 85% of the time with a mode of 4_1, while three
subjects tapped over 85% of the time with a mode of 2_1. The remaining three subjects tapped around 50% of the time with a mode of 4_1 and between 20-50%
of the time with a mode of 4_3. In other words, for all subjects, the tapping period was extremely consistent over the course of the experiment. Most subjects
consistently tapped with a period of four eighth-notes, while the remaining subjects tapped with a period of 2 eighth-notes. In general, subjects tapped with
similar modes across the three excerpt versions. None of the main effects of version were significant for tapping to the three predominant modes or for tapping
aperiodically. This indicates that both the RH and LH have sufficient cues to pulse, and that neither seems to play a consistently dominant role in pulse-finding.
HOW TWO VOICES MAKE A WHOLE:
Additionally, there was a non-significant tendency for subjects to tap more with a mode of 4_1 for excerpts that started on the first beat of the measure, and vice
versa for 4_3. For excerpts that began on the first beat of the measure, subjects tapped 68% of the time in 4_1 and 5% of the time in 4_3, whereas for excerpts
that began on the third beat of the measure, subjects tapped 57% of the time in 4_1 and 16% of the time in 4_3.
Puzzlingly, subjects' mean spontaneous tapping period did not correlate with the mean proportion of taps with a period of four eighth-notes for each stimulus,
r(334) = -.03, or with any of the specific tapping mode proportions. The mean spontaneous tapping period across subjects ranged from 554-991 ms inter-tap
interval (ITI) with a mean of 754 ms, notably higher than the conventional 600 ms value cited in the literature for preferred tempo (Clarke, 1999; Fraisse, 1982;
Parncutt, 1994; van Noorden & Moelants, 1999).
Subjects switched period .15 times per trial and switched phase .30 times per trial. Slightly fewer switches in period occurred for the both versions though the
effect of version was not significant, p = .14. However, the effect of version for switches in phase was significant, with fewer switches occurring for the both
versions, F(14,182) = 5.29, p < .025. For beats to start tapping (BST), subjects started tapping periodically after around 8 eighth-notes, or one measure. This shows extremely fast
synchronization, considering that one measure is only two tapping cycles for the subjects who tapped with a period of four eighth-notes. BST differed between
versions, F(2,26) = 9.02, p < .005, with a mean of 7.8 for both versions and over 9.0 for the LH and RH versions.
To summarize the findings, subjects showed generally better performance when both voices were present, although the effects were generally not large.
Importantly, the mode of tapping was not strongly influenced by whether the LH, RH, or both voices were present. This is in contrast to previous findings in
ragtime, in which the LH part seemed to play a predominant role in pulse-finding (Snyder & Krumhansl, 1999). In the present case however, the findings do not
mean that for each excerpt, both voices had an equal role in pulse-finding. Instead, it is possible that the voice that most influenced performance simply differed
between the different excerpts. This is a point for future analyses. The most popular mode of tapping for subjects was 4_1, indicating a period of four
eighth-notes (tapping twice per measure), with a phase that includes the first and fifth eighth-note positions in the measure. Thus, a clear preferred mode of
tapping emerged which corresponded very well to the notated meter. Subjects tended to tap more in mode 4_1 for excerpts that started on the first rather than
the third eighth-note beat. Lastly, there was no relationship between spontaneous tapping rate and the period of musical tapping.
MODEL
To test the general performance of the model against the human data, we tested the model on each of the twenty-four stimuli from BWV 805. We only give a
brief description of the model here. Toiviainen (1998) describes the basic oscillatory unit, but for a detailed explanation of the current model, see Toiviainen &
Snyder (in these proceedings). The present model consists of a bank of oscillators that become active after the beginning of the music. Each oscillator
phase-locks to the music with a specific mode, characterized by its period and phase with respect to the music. Each oscillator is also characterized by its
resonance strength, or its activity value. The oscillator with the highest resonance value, the winner, drives the output of the model, which is a series of discrete
pulses, analogous to human taps. Meanwhile, the resonance values of the other active oscillators continue to evolve over the entire musical excerpt. In this
manner, a challenging oscillator can overtake the current winner. However, this only occurs if the challenger's resonance value exceeds the winner's resonance
value by a certain percent. The model is most influenced by the presence of notes with relatively long durations. In other words, if a particular metrical position
contains long notes, the model is likely to tap with a period and phase defined by this metrical position. Therefore, the model is not strongly influenced by ornamentation such as trills or appoggiaturas.
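The winner-takes-the-output rule with a switching threshold can be sketched minimally as follows. This is only an illustration of the selection logic described above, not the oscillator dynamics themselves (for those, see Toiviainen & Snyder); the oscillator labels, resonance values, and the 10% margin are assumed for the example:

```python
# Minimal sketch of the model's winner-selection rule: the current winner
# drives the output until a challenger's resonance exceeds it by a fixed
# percentage (the 10% threshold here is an illustrative assumption).
def track_winner(resonance_series, threshold=0.10):
    """resonance_series: list of dicts mapping oscillator mode -> resonance."""
    winner = None
    winners = []
    for resonances in resonance_series:
        challenger = max(resonances, key=resonances.get)
        if winner is None:
            winner = challenger
        # Hysteresis: switch only on a clear margin over the current winner.
        elif resonances[challenger] > resonances[winner] * (1 + threshold):
            winner = challenger
        winners.append(winner)
    return winners
```

With this rule, a challenger that is only marginally stronger (e.g., 1.05 vs. 1.0) does not displace the winner, which keeps the model's tapping stable.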
Because the model is deterministic, we only tested the model once on each of the twenty-four stimuli. The behavior of the model, for each excerpt, is shown in
the bottom half of the Appendix for all performance variables. Below this are correlation coefficients between the human and model performance on each
variable across the twenty-four stimuli. Currently, the pattern of performance for the model across the stimuli does not match the human performance well at
all. None of the correlations are close to significance. For mode of tapping, this is clearly because the model tends to tap with a mode of 4_3, while the human
subjects more often tap with a mode of 4_1.
file:///g|/Mon/Snyder.htm (5 of 10) [18/07/2000 00:32:43]
HOW TWO VOICES MAKE A WHOLE:
In terms of absolute performance levels on the other measures, the model compares reasonably well to humans. The model taps slightly more aperiodically than
humans, with 8% of taps for the model versus 2% of taps for humans. In contrast, humans switch tapping period and phase more than the model, with values of
.15 and .30 switches in period and phase for humans, and values of .08 and .02 for the model. However, these values all indicate stable tapping performance.
For beats to start tapping (BST), mean performance is very close between humans and the model, with 8.82 BST for humans, and 8.25 for the model.
MUSIC ANALYSIS
To determine musical cues to pulse that were available to both humans and the model in the stimuli, we analyzed the music on a number of dimensions on the
eighth-note level, thus ignoring ornamentation. Our approach was to focus on building a measure of accent based on two dimensions, the number of note onsets
and their duration (inter-note interval within a voice). Following Parncutt's (1994) model, we define durational accent, Ad, at time t as follows:

Ad(t) = (1 - e^(-d/τ))^i

where d is the inter-note interval between a note at time t and the next note at t+1 in a particular voice. The time constant, τ, is 500 ms and corresponds to the saturation duration; the accent index, i, is 2, corresponding to the minimum discriminable IOI difference. These parameter values are those used by Parncutt (1994). We add the durational accent, Ad(t), to onset presence, O(t):

A(t) = Ad(t) + O(t)

to obtain the total accent value, A(t). O(t) equals 0 for absence and 1 for presence of a note onset at each eighth-note position. A(t) was calculated at each t, for
each voice separately. A(t) can range from 0 to 2 for each voice independently because both Ad(t) and O(t) range between 0 and 1. To obtain a measure for both
voices together, we took the mean of the values for the two voices at each t. Because this particular calculation of accent is not experimentally verified, we use
it simply as an indication of note events and their duration across two different musical dimensions and across the two contrapuntal voices. We apply two types
of analysis to this measure of accent: auto-correlations to determine periodicity information in the stimuli, and average value per eighth-note metrical position to
determine phase information in the stimuli.
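The accent computation described above can be sketched as follows. The durational-accent form (1 - e^(-d/τ))^i follows Parncutt (1994); the list-based data layout (inter-note intervals in ms at each eighth-note position, None where a voice has no onset) is an assumption made for the example:

```python
import math

TAU = 500.0       # saturation duration (ms), per Parncutt (1994)
ACCENT_INDEX = 2  # accent index i, per Parncutt (1994)

def durational_accent(d):
    # Ad(t): grows with inter-note interval d (ms) and saturates near 1.
    return (1 - math.exp(-d / TAU)) ** ACCENT_INDEX

def total_accent(voice):
    # A(t) = Ad(t) + O(t): an onset contributes O(t) = 1, absence gives 0.
    return [durational_accent(d) + 1 if d is not None else 0.0 for d in voice]

def combined_accent(voice1, voice2):
    # Mean of the two voices' accent values at each eighth-note position.
    return [(a + b) / 2 for a, b in zip(total_accent(voice1), total_accent(voice2))]
```

Since Ad(t) and O(t) each lie between 0 and 1, each voice's A(t) lies between 0 and 2, as stated above.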
Auto-correlation, displayed in an auto-correlogram, refers to correlating a series of numbers with itself at different relative phase lags (or phase shifts). The size
of the correlation coefficient at a particular phase lag indicates the strength of periodicities in the series corresponding to that phase lag. Based on the 2/2 meter
and on subjects' performance, we would expect to find strong correlations at lags of 2, 4, and 8 eighth-notes, corresponding to quarter-note, half-note, and
whole-note periods, respectively.
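The lag-domain analysis described above can be sketched in a few lines; the normalization by total variance is one common convention, assumed here:

```python
# Sketch of an auto-correlogram: correlate the accent series with itself
# at eighth-note lags; large coefficients mark strong periodicities.
def autocorrelogram(accents, max_lag):
    n = len(accents)
    mean = sum(accents) / n
    dev = [a - mean for a in accents]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
            for lag in range(1, max_lag + 1)]
```

For an accent series with an 8-eighth-note cycle, the coefficient at lag 8 would dominate, mirroring the measure-length peak reported below.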
As shown on the left of Figure 2, across all excerpts, the strongest peak in the auto-correlogram is at a period of 8 eighth-notes, or the measure length. The next
highest peak is at a period of 2 eighth-notes, or the quarter note duration. Interestingly, these peaks are both larger than the peak at 4 eighth-notes, which is the
period with which most subjects tapped. One explanation for this is that subjects are unlikely to tap at a whole-note period because this is 1600 ms, near the
upper-bound of synchronization. Similarly, tapping at the quarter-note period, 400 ms, is close to the lower-bound of synchronization. Therefore, tapping with
an 800 ms period, the half-note, may simply be the most comfortable tapping period that fits with the metrical structure.
For the average accent per metrical position (shown on the right of Figure 2), there are clear peaks at odd metrical positions, corresponding to relatively strong
positions. In particular, the strongest peaks across excerpts occur at positions 5 and 3. Strong peaks also appear on positions 1 and 7. Given this analysis, it is
perhaps not surprising that the model tapped predominantly at 4_3 because both positions 3 and 7 had a large number of note onsets and also had relatively long
duration notes. In addition, there were many trills on position 5, leading to lower weighting of this position by the model. In contrast to the model, subjects
predominantly tapped with a mode of 4_1, or at metrical positions 1 and 5.
Figure 2. On the left, the auto-correlogram of accent per eighth-note lag across the eight excerpts for the versions with both voices. On the right, the average
accent per eighth-note metrical position across the eight excerpts for the versions with both voices.
One possibly important difference between the model and humans is that the model does not take melodic information into account in determining which notes
are salient. Therefore, we calculated the number of melodic direction changes and melodic intervals per metrical position. We focus on phase cues here because
the model and humans did not differ as much in tapping period. The top of Table 1 shows the mean number of melodic direction changes per metrical position,
while the bottom shows the mean number of each melodic interval size per metrical position. We calculated both measures across all excerpts for the versions with both voices. Clearly, more melodic direction changes occur at the first metrical position than at other metrical positions. For the melodic intervals, there are more
semi-tone intervals on the first and fifth eighth-note metrical positions than at other metrical positions. This pattern does not appear for other interval sizes.
Thus, these melodic cues could be influencing subjects' tapping phase. On the other hand, these cues do not influence the model's behavior because it does not
take melodic information into account in determining the importance of note events.
Note. Mean values were calculated across the original (both-voices) eight excerpts from BWV 805, consisting of 64 total measures, and indicate the proportion of the time that a particular event occurred at each metrical position across the two voices.
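Counts of the kind reported in Table 1 could be computed along these lines. The MIDI-pitch grid representation and the convention of crediting a direction change to the turning note's position are assumptions made for the sketch:

```python
# Tally melodic direction changes and semitone intervals at each metrical
# position, assuming a voice given as MIDI pitches on the eighth-note grid.
def phase_cues(pitches, measure_len=8):
    changes = [0] * measure_len
    semitones = [0] * measure_len
    for t in range(1, len(pitches)):
        pos = t % measure_len  # 0-based metrical position of pitches[t]
        if abs(pitches[t] - pitches[t - 1]) == 1:
            semitones[pos] += 1  # semitone interval arriving at this position
        if t < len(pitches) - 1:
            a, b, c = pitches[t - 1], pitches[t], pitches[t + 1]
            if (b - a) * (c - b) < 0:  # up-then-down or down-then-up
                changes[pos] += 1      # direction change at the turning note
    return changes, semitones
```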
DISCUSSION
In summary, we tested human subjects and an oscillatory model of pulse on stimuli derived from a two-voice contrapuntal work by Bach, BWV 805. Subjects
tapped with a similar phase and period, usually with a mode of 4_1, regardless of whether only one of the voices or both voices were present. This suggests that
period and phase information is present in both voices throughout the excerpts we used. The model likely taps in mode 4_3, in anti-phase to humans, because these positions often contain long notes, and because position 5 often contains trills, which the model does not weight highly because they consist of short-duration notes.
Proceedings paper
their working memory. These representations then allow them to name the pitch by judging its
distance from the reference.
It is this aspect of the memory model that led Klein, Coles and Donchin (1984) to predict that the
behavior of an Event Related Brain Potential (ERP) known as the P300 would differ in AP and RP
subjects. Their prediction was derived from a theory of the P300, proposed by Donchin (1981, see
also Donchin and Coles, 1988). This theory views the P300 as a manifestation of the activity of
processing modules invoked if and when deviant stimuli indicate the need for a revision in the scheme
of the operating context that governs the subject's information processing.
Klein et al. (1984) reported a successful confirmation of the prediction. However, as will be seen
below, different investigators obtained conflicting results -- some succeeding in replicating Klein et
al. (1984), and others not. In this paper we describe a study that proposes a resolution to the conflict
by demonstrating that the strategy used for pitch judgment is a crucial element of the task, and by
noting that there is a systematic difference between subjects with AP who do, and do not, possess the
ability to make relative pitch judgments.
The ERP represents electrical activity in the brain, recorded between a pair of scalp electrodes, that
has a specific pattern of positive and negative voltages. These voltages have a consistent time course for each subject, electrode pair, and "eliciting event" (the term "event" is used here instead of "stimulus" because ERPs have been shown to occur even in the absence of a stimulus, such as when an expected stimulus is omitted during a sequence of stimuli; e.g., Sutton et al., 1967). This
temporal consistency allows for the extraction of the ERP from the ongoing EEG activity by means of
a process known as "signal averaging." This technique extracts all activity that is time-locked to a
specific eliciting event and effectively eliminates activity that has no consistent temporal relationship
to the event.
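Signal averaging, as described above, can be sketched as follows; the sample-index representation of events and the epoch window length are assumptions made for the example:

```python
# Sketch of "signal averaging": average EEG epochs time-locked to the
# eliciting event, so event-locked activity (the ERP) survives while
# activity with no consistent temporal relationship averages toward zero.
def average_erp(eeg, event_samples, pre=0, post=300):
    """eeg: list of voltage samples; event_samples: sample indices of events."""
    epochs = [eeg[e - pre:e + post] for e in event_samples
              if e - pre >= 0 and e + post <= len(eeg)]
    n = len(epochs)
    return [sum(epoch[t] for epoch in epochs) / n for t in range(pre + post)]
```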
The ERP is generally viewed as a sequence of "components" (see Donchin, Ritter and McCallum,
1978, for a discussion of the concept of ERP component). Each component is conceptualized as a
manifestation of the activity of a voltage dipole, or an element of neural circuitry activated at a
specific interval following the triggering event. In other words, the components manifest specific
processing needs invoked by the interaction between the occurrence of the eliciting event, the
subject's task and strategy, and the circumstances of the presentation. Thus, ERP components make it
possible to monitor, in addition to the subject's overt responses to events, "covert" processes such as
the updating of memory, and the changing of future strategies.
The ERP component of interest in the present report is the P300, a positive-going voltage change
(hence "P") with a latency of about 300 msec following the triggering event. This component was
discovered by Sutton and his colleagues (1965) and has since been intensively studied (see reviews by
Fabiani et al., 1987, Donchin, 1981, Donchin and Coles, 1988). It is most often elicited in the
so-called "Oddball Paradigm," in which a subject is presented with a sequence of events that can be
classified into two categories. The subject is then assigned a task that cannot be performed without
deciding which events belong to which category. If the probability of occurrence of events in one
category is low (the "rare" events), a P300 will be elicited by these events. The P300 has been
successfully employed in the study of different aspects of pitch processing (e.g., Besson and Faïta, 1995; Besson, Faïta and Requin, 1994; Besson and Macar, 1987; Cohen and Erez, 1991; Cohen,
Granot, Pratt and Barneah, 1993; Crummer, Hantz, Chuang, Walton, and Frisina, 1988; Ford, Roth,
and Kopell, 1976; Granot and Donchin, 1996; Hantz, Crummer, Wayman, Walton, and Frisina, 1992;
Hantz, Kreilick, Braveman and Swartz, 1995; Hantz, Kreilick, Kananen and Swartz, 1997; Klein,
Coles and Donchin, 1984; Paller, McCarthy and Wood, 1992; Verleger, 1990; Wayman, Frisina,
Walton, Hantz and Crummer, 1992).
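The structure of an Oddball sequence, as described above, can be sketched as follows. The event labels and the 20% rare-event probability are illustrative assumptions, not the parameters of any of the cited studies:

```python
import random

# Illustrative Oddball sequence: events from two categories, with the
# "rare" category occurring at low probability.
def oddball_sequence(n, p_rare=0.2, rare="nondiatonic", frequent="diatonic", seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    return [rare if rng.random() < p_rare else frequent for _ in range(n)]

seq = oddball_sequence(200)
```

In an ERP experiment, the subject's task forces a category decision on each event, and the P300 is measured in the average response to the rare events.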
Donchin and his colleagues (Donchin, 1981; Donchin and Coles, 1988) proposed the "Context
Updating" theory to account for the functional significance of the P300. The assumption is that the
P300 is elicited whenever the circumstances require a revision in the model of the operating context
that governs the subject's information processing. For instance, in pitch memory tasks, musicians
without AP must continually compare incoming pitch information with their internal framework of
the scale, or with a representation of a tone they have just heard. Hence, their model of the "auditory
environment" is continually modified to take into account the discrepancies between the actual and the
modeled context.
Conversely, if subjects with AP maintain permanent representations of pitches, they do not update
their mental scheme during auditory judgments, and therefore the rare events in an auditory Oddball
Paradigm will not elicit a P300. This was the prediction that Klein et al. (1984) tested. They presented
both visual and auditory oddball sequences to AP and to RP subjects. Klein et al. (1984) reported that
the amplitude of the auditory P300 was reduced, sometimes almost to zero, in the AP subjects, but not
in the RP subjects. Both groups displayed a normal P300 in response to the rare events in a visual
oddball sequence. Hence, the absence of the P300 in the AP subjects was specific to the auditory
sequences. Furthermore, Klein et al. (1984) reported a negative correlation between the amplitude of
the P300 and the subjects' performance on a pitch-screening test that was administered to each subject
prior to the oddball task. The better the performance on the pitch screening test, or the stronger the
subject's AP skill, the smaller the amplitude of the P300 elicited in that subject by the rare auditory
stimuli. This small P300 amplitude elicited by AP subjects during auditory tasks has commonly been
referred to in the literature as the "AP effect."
Several investigators have reported successful replications of Klein et al. (1984). For instance,
collaborative research efforts between psychologists and music theorists at the University of
Rochester show that musicians with AP also exhibit a reduced P300 in oddballs involving sine tones,
piano tones, musical intervals, and various instrumental timbres. Hantz et al. (1992) investigated
differences in pitch processing in three different groups of subjects including musicians with AP,
musicians without AP, and nonmusicians. The authors presented subjects with two different oddballs: a contour discrimination task consisting of ascending and descending intervals, and an interval discrimination task consisting of minor and major thirds. In both tasks, the AP musicians exhibited smaller P300 amplitudes than either of the other two groups, and significantly smaller amplitudes than the non-AP musicians. This work was corroborated by Wayman et al. (1992), who assessed the effects of
musical training and AP ability on event-related brain activity in response to sine tones. Again, three
different subject groups were used. The task was a simple oddball consisting of 500 and 1000 Hz
tones. The results show that AP musicians exhibited a significantly smaller P300 amplitude than
either the musicians or nonmusicians. Finally, Crummer et al. (1994) examined the perceptual and
neurophysiological differences in subjects with varying musical training during timbre discrimination
tasks. Three stimulus series were used comprising three different types of timbre discrimination
varying in difficulty. These included string instruments in the same family (cello and viola), flutes
made of different material (silver and wood), and like instruments of slightly different size (B-flat
versus F Tubas). Results show that the AP group had smaller P300 amplitudes and shorter latencies
for all three series than the other groups. These results are interesting in that AP subjects were now
showing a reduced P300 in timbre discrimination tasks in addition to pitch discrimination tasks. The
authors suggest that the pitch differences within the harmonics of the note played may account for the
observed waveform differences.
Although these studies concur with the main finding of Klein et al. (1984), they diverge from it in that
they do not find a correlation between AP subject test scores on the pitch screening measure and P300
amplitude. It is important to note, however, that while these authors employed the same pitch
screening measure as the one used by Klein et al. (1984), there are some differences in the exact
implementation of this test. Overall, their screening procedure is more difficult in that the test tones
and inter-stimulus intervals are shorter. Furthermore, these authors did not give the subjects a practice session prior to the test, as Klein et al. (1984) did, and they gave only half credit for semitone errors, as opposed to the full credit given in the Klein et al. (1984) study.
In other instances, attempts to replicate the Klein et al. (1984) experiment yielded contradictory
results. Johnston (1994) found no correlation between the P300 and AP ability. While Johnston (1994)
attempted to replicate the ERP procedure of Klein et al. (1984) as closely as possible, he did employ a
more difficult pitch screening procedure and he added a third subject group consisting of
non-musicians. In both the visual and the auditory oddballs, the non-musician group showed the
largest P300's, followed by the AP group, while the P300's of the non-AP musician group appeared
the smallest. In his attempt to explain the discrepancies between the results of his study and the Klein
et al. (1984) results, Johnston (1994) concluded that AP musicians might show a greater relative
change in P300 over the course of an experimental run. In particular, this might occur if AP ability
makes these subjects more prone to habituate or become more accustomed to the stimuli in the
pitch-discrimination condition. In other words, he suggests that Klein et al.'s (1984) AP subjects may
have habituated to the stimuli faster than those in his study. Generally, this habituation process has
been correlated in the literature with a smaller P300.
Another replication of the Klein et al. (1984) study was performed by Bischoff et al. (1995 --
unpublished data). Bischoff et al. (1995) used the same pitch screening and ERP procedures as Klein
et al. (1984). However, contrary to the data obtained by Klein et al. (1984), Bischoff et al. (1995)
found that the ERPs elicited in the AP musicians were similar in both visual and auditory modalities.
In addition, both Johnston (1994) and Bischoff et al. (1995) found no correlation between the
amplitude or area of the P300 and the percentage of correctly identified tones in the pitch screening
tests. In fact, in Bischoff et al.'s study, some of the AP musicians who scored the highest on the pitch
screening measure also displayed the largest P300's.
Finally, Hantz et al. (1995) presented AP musicians, non-AP musicians, and non-musicians with a
"pitch memory series" which consisted of a target pitch (G4) interspersed throughout a series of
random pitches. They found that all three groups produced some late positive (P300) activity for the
pitch memory series. Hantz et al. (1997) likewise report that AP subjects produce robust P300's during
a melodic and harmonic closure task.
It seems evident from the above survey that there is considerable variance, across subject groups and
laboratories, in the degree to which the P300 is indeed absent in AP subjects. Such variance may be accounted for by subtle differences among the tasks used in the different studies and by individual differences among AP subjects that have not been considered in the design of the studies. Our most
recent AP study examines these possibilities. Specifically, we examine the possibility that the
contradictory results may be due to variability in the degree to which different studies used tasks that
placed a heavy demand upon pitch memory. It is conceivable that the subjects in some of these studies
were not consistently using their absolute pitch capabilities. Hantz et al. (1995) even claim that the
task used in their study was not sufficiently difficult to warrant the use of an absolute pitch strategy.
Hence, in order to observe the differences in processing in AP and RP musicians, we have designed a
more challenging pitch memory task in which the use of the AP ability could be beneficial. At the
same time, the task is not beyond the capabilities of musicians who possess good relative pitch
abilities.
In our most recent AP experiment, subjects (who were screened for both AP and RP ability) were
presented with two auditory oddball tasks, and one visual oddball task that served as a control. In the
first auditory task (pitch memory task) subjects were asked to differentiate between diatonic and
nondiatonic tones within a tonal framework. We assumed that AP subjects would use their AP ability
in this task because of its difficulty level. In the second task (the contour task) subjects were asked to
differentiate between tones moving upwards or downwards. This task would require both AP and RP
subjects to hold each tone in working memory in order to successfully determine the direction of
motion from one tone to the next. Hence, with this task, which is "relative" in nature, we hypothesized
that the RP subjects would perform better, in terms of accuracy and RT, than the AP subjects.
Our research (reported at ICMPC5 and expanded upon since) suggests that the relationship between
the amplitude of the P300 component of the ERP and the AP ability depends on the degree to which
subjects employ their AP ability during a musical task. This, in turn, may be greatly influenced by the
task as well as by the level of the AP ability of the individual subject. For example, those AP subjects
who also possess strong relative pitch skills and who reportedly used a relative pitch strategy when
performing the pitch memory task, elicited a large P300 to the rare, nondiatonic tones. Conversely,
those AP subjects with weaker relative pitch skills showed a smaller P300 to the rare nondiatonic
tones. Thus, the P300 amplitude may serve as a marker of the type of strategy (absolute or relative)
that subjects employ in a given situation.
Second, this research lends further support to the "Context Updating" theory of P300 that assumes that
the P300 is elicited whenever the circumstances require subjects to modify their model of the
operating context. That is, the P300 reflects the maintenance of a model of the environment that is
continually adjusted to take into account the discrepancies between the actual and the modeled
context. Several accounts of AP in the literature claim that AP subjects have a permanent template, or
a set of internal standards, in their memory that contains representations of pitches. Hence, because of
this template, they are able to "fetch the name of a tone without comparing the representation of the
tone they have just heard with a recently fetched representation of a standard." (Klein et al., p. 1306).
RP subjects, on the other hand, are assumed to have a more movable conceptual template, so they
must continually compare incoming pitch information with their representation of the scale. Our
results are consistent with the predictions derived from this model. Substantial support for this model
has also been reported in a recent study by Zatorre et al. (1998). Through the use of positron emission
tomography and cerebral blood flow studies, these authors found that, when listening to tones, AP
subjects show activation of the left posterior dorsolateral frontal cortex while control subjects do not.
On the other hand, they observed activity within the right inferior frontal cortex in controls but not in
AP subjects during an interval task. The authors claim that, "subjects without AP use tonal working
memory in both tasks, but AP possessors may not need access to this mechanism for interval
classification because they are able to classify each note within the interval by name" (p. 3177). They
further claim that, "this conclusion is concordant with the reported absence of the P300-evoked
electrical component..." (p. 3177).
Third, this research has demonstrated that there are indeed varying levels of the AP ability; some AP
subjects have, in addition, strong relative pitch abilities and others do not. These differences are
reflected quite consistently by the behavior of the P300 component. For instance, those AP subjects
who are the slowest and least accurate during the contour task generally show the smallest P300
amplitude to the pitch memory task and perform the least well on the relative pitch test. These
findings corroborate the work of Albert Bachem who presented a detailed classification of the levels
of AP after an extensive study of the ability between the years of 1937 and 1955. They also
corroborate a hypothesis proposed by music psychologist Jay Dowling in 1986:
It seems unlikely that the ability for absolute pitch is bimodally distributed in the population, that is,
that some have it and others do not. It seems more consonant with our experience that people possess
the ability in varying degrees and that whether the ability shows up depends on the particular task
demands the person faces (p. 122).
Furthermore, they lend further support to the call for a multidimensional measure of AP which could
be assessed on the basis of standardized tests. Such a need was first expressed by Takeuchi and Hulse
in 1993. These tests would provide detailed information about subjects' degrees of AP. In addition,
they would reflect performance on both identification and production tasks, as well as sensitivity to
timbre, pitch register, and pitch class. Moreover, results from this study show that researchers need to
exercise caution when selecting subjects for AP experiments. Subjects must be thoroughly screened
and various aspects of their pitch ability need to be assessed.
Finally, this research strengthens the idea that the memory system for pitch and interval distances is
distinct from the memory system for contour. According to Dowling (1978), when melodies are
processed, both interval information and contour information are extracted. However, whereas the
interval information is stored in long-term memory only after a key has been established, the contour
information is extracted immediately, regardless of the key information, but it rapidly decays as the
melody progresses. That is, these two dimensions of melody function independently. This was nicely
demonstrated in the current study. All of the RP and AP subjects exhibited a reduced or even
nonexistent P300 to the rare stimuli in the contour task (all P300's were less than 5 microvolts). If the
RP subjects were actually focusing on specific pitch information, they would have been involved with
relating each pitch to the rest of the musical context and probably would have responded more
profoundly to the rare, nondiatonic tones as the majority of them did in the pitch memory task. In the
case of the AP subjects, when they were asked to focus on specific pitch information in the pitch
memory task, their waveforms were generally larger than those shown to the contour task. This
demonstrates that they, too, are able to ignore specific pitch information when they change their
listening strategies and focus specifically on contour shifts.
Lastly, one trend in the data was that most of the subjects showing a strong AP/reduced
P300 relationship report having relatives who are either musicians or who also possess the AP ability.
These findings are interesting in view of recent reports (described above) by geneticists of a strong
genetic component in the AP ability. The data from the current study show a strong correlation
between the P300 amplitude of the AP subjects and the number of their relatives who are either
musicians or have the AP ability. The subjects with the greatest number of relatives who are
musicians tend to be the ones with the smallest P300 amplitudes to the pitch memory task. Likewise,
the subjects with the greatest number of relatives with the AP ability tend to be the ones with the
smallest P300 amplitudes to the pitch memory task. In the future, it would be interesting to explore
this genetic component further.
References:
Bachem, A. (1955). Absolute Pitch. Journal of the Acoustical Society of America, 27, 1180.
Baharloo, S., Johnston, P.A., Service, S. K., Gitschier, J, and Freimer, N.B. (1998). Absolute Pitch:
An Approach for Identification of Genetic and Nongenetic Components. American Journal of Human
Genetics, 62, 224-231.
Besson, M., and Faïta, F. (1994). Electrophysiological Studies of Musical Incongruities: Comparison
Between Musicians and Non-Musicians. In, Proceedings of the Third International Conference on
Music Perception and Cognition (pp. 41-43), Liège, Belgium: ICMPC.
Besson, M., Faïta, F., and Requin, J. (1994). Brain Waves Associated with Musical Incongruities
Differ for Musicians and Non-Musicians. Neuroscience Letters, 168, 101-105.
Besson, M., and Macar, F. (1987). An Event-Related Potential Analysis of Incongruity in Music and
Other Non-Linguistic Contexts. Psychophysiology, 24, 14-25.
Chouard, C.H., and Sposetti, R. (1991). Environmental and Electrophysiological Study of Absolute
Pitch. Acta Otolaryngol., 111, 225-230.
Cohen, D., and Erez, A. (1991). Event-Related Potential Measurements of Cognitive Components in
Response to Pitch Patterns. Music Perception, 8, 405-430.
Cohen, D., Granot, R., Pratt, H., & Barneah, A. (1993). Cognitive Meanings of Musical Elements as
Disclosed by Event-Related Potential (ERP) and Verbal Experiments. Music Perception, 11/2,
153-184.
Cook, E.W., & Miller, G. (1992). Digital Filtering: Background and Tutorial for Psychophysiologists.
Psychophysiology, 29(3), 350-367.
Crummer, G.C., Walton, J.P., Wayman, J., Hantz, E.C., & Frisina, R.D. (1994). Neural Processing of
Musical Timbre by Musicians, Nonmusicians, and Musicians Possessing Absolute Pitch. Journal of
the Acoustical Society of America, 95/5, 2720-2727.
Donchin, E. (1981). Surprise!... Surprise? Psychophysiology, 18/5, 493-513.
Donchin, E., and Coles, M. G. H. (1988). Precommentary: Is the P300 Component a Manifestation of
Context Updating? Behavioral and Brain Sciences, 11/3, 357-374.
Dowling, W. J. (1978). Scale and Contour: Two Components of a Theory of Memory for Melodies.
Psychological Review, 85, 341-354.
Dowling, W. J., and Harwood, D. L. (1986). Music Cognition, Orlando: Academic Press, Inc.
Ford, J., Roth, W., and Kopell, B. (1976). Auditory Evoked Potentials to Unpredictable Shifts in
Pitch. Psychophysiology, 13/1, 32-39.
Granot, R., and Donchin, E. (1996). An ERP note on Musical Scales: The First Scale Tone is
Processed Differently. (In preparation).
Gratton, G., Coles, M.G.H., & Donchin, E. (1983). A New Method for off-line Removal of Ocular
Artifact. Electroencephalography and Clinical Neurophysiology, 55, 468-484.
Gregerson, P., and Kumar, S. (1996). The Genetics of Perfect Pitch. American Journal of Human
Genetics Supplement, 59, A179.
Hall, D. E. (1982). Practically Perfect Pitch: Some Comments. Journal of the Acoustical Society of
America, 71, 754-755.
Hantz, E.C., Crummer, G.C., Wayman, J.W., Walton, J.P., & Frisina, R.D. (1992). Effects of Musical
Training and Absolute Pitch on the Neural Processing of Melodic Intervals: A P3 Event-Related
Potential Study. Music Perception, 10/1, 25-42.
Hantz, E.C., Kreilick, K.G., Braveman, A.L., & Swartz, Kenneth P. (1995). Effects of Musical
Training and Absolute Pitch on a Pitch Memory Task: An Event-Related Potential Study.
Pyschomusicology, 14, 53-76.
Hantz, E.C., Kreilick, K.G., Kananen, W., and Swartz, K.P. (1997). Neural Responses to Melodic and
Harmonic Closure: An Event-Related-Potential Study. Music Perception, 15/1, 69-98.
Johnston, P. A. (1994). Brain Physiology and Music Cognition. Unpublished Doctoral Dissertation,
University of California, San Diego, California.
Klein, M., Coles, M.G.H., and Donchin, E. (1984). People with Absolute Pitch Process Tones without
Producing a P300. Science, 223, 1306-1309.
Krumhansl, C. L. (1979). The Psychological Representation of Musical Pitch in Tonal Context.
Cognitive Psychology, 11, 346-374.
Lockhead, G.R., and Byrd, R. (1981). Practically Perfect Pitch. Journal of the Acoustical Society of
America, 70, 381.
Paller, K.A., McCarthy, G., and Wood, C.C. (1992). Event-Related Potentials Elicited by Deviant
Endings to Melodies. Psychophysiology, 29/2, 202-206.
Rakowski, A. (1978). Investigations of Absolute Pitch. In E. P. Asmus, Jr. (Ed.), Proceedings of the
Research Symposium on the Psychology and Acoustics of Music (pp. 45-57). Lawrence: University of
Kansas Division of Continuing Education.
Rakowski, A., and Morawska-Bungeler, M. (1987). In Search for the Criteria of Absolute Pitch.
Archives of Acoustics, 12, 75-87.
Schlaug, G., Jancke, L., Huang, Y., & Steinmetz, H. (1995). In Vivo Evidence of Structural Brain
Asymmetry in Musicians. Science, 267, 699-701.
Siegel, J. (1974). Sensory and Verbal Coding Strategies in Subjects with Absolute Pitch. Journal of
Experimental Psychology, 103, 37.
Stumpf, C. (1883). Tonpsychologie I. Leipzig: Hirzel.
Sutton, S., Tueting, P., Zubin, J., & John, E.R. (1967). Information Delivery and the Sensory Evoked
Potential. Science, 155, 1436-1439.
Takeuchi, A.H., & Hulse, S.H. (1993). Absolute Pitch. Psychological Bulletin, 113/2, 345-361.
Verleger, R. (1990). P3-evoking Wrong Notes: Unexpected, Awaited, or Arousing? International
Journal of Neuroscience, 55, 171-179.
Ward, W.D., and Burns, E.M. (1982). Absolute Pitch. In D. Deutsch (Ed.), The Psychology of Music
(pp. 431-451). New York: Academic Press.
Wayman, J.W., Frisina, R.D., Walton, J.P., Hantz, E.C. and Crummer, G.C. (1992). Effects of
Musical Training and Absolute Pitch Ability on Event-Related Activity in Response to Sine Tones.
Journal of the Acoustical Society of America, 91/6, 3527-3531.
Zatorre, R.J., Perry, D.W., Beckett, C.A.,Westbury, C.F., and Evans, A.C. (1998). Functional
Anatomy of Musical Processing in Listeners with Absolute Pitch and Relative Pitch. Proceedings of
the National Academy of Science, 95, 3172-3177.
Proceedings abstract
RECONSTRUCTING MELODIES: HOW CHILDREN FROM INDIA AND THE U.S. PARSE CULTURALLY
FAMILIAR AND UNFAMILIAR MELODIES.
kvv@mediaone.net
Background:
Melodic pattern recognition has been studied in terms of contour similarity and
the principles of expectation enhanced by implied harmonic context. However,
the developmental aspects of our ability to recognize a simple tune are not
fully understood. Bamberger's work with children re-arranging tunes with
musical bells showed that training affects the degree to which a sense of motor
efficiency enters into performance for different age groups. Dowling's work on
memory for contour and the role of rhythm has also shed light on the age at
which the recognition of a tune becomes stable in a musically ambiguous
context.
Aims:
Method:
Two groups of elementary school children, one from the United States and one
from New Delhi, were given individual instruction in using a laptop computer to
perform tunes on musical blocks. The blocks were of a variety of shapes and
were assigned to segments of a sampled melody such that no identifiable pattern
of blocks would imply any melodic pattern. Familiar melodies were considered
"in-culture", such as "Twinkle, Twinkle Little Star." The unfamiliar,
"out-of-culture" melodies were taken from the simple repetitive working songs
of Kaluli women of Papua New Guinea. Blocks were randomly distributed on the
screen before each subject trial. Subjects were then allowed to click and listen,
drag and place, or select and throw away any tone blocks, such as duplicate
notes or sections. Unlimited time was allowed. The subject finally performed
the piece by clicking the mouse on each of the musical blocks in sequence.
Play-backs and revisions were allowed until the child was satisfied. Every
interaction event and all performances were stored for analysis.
Results:
Final layout of the musical segments fell into several categories of spatial
arrangement, ranging from circular to a staircase shape and, in the case of
"Twinkle, Twinkle", a contour which followed the melody. There was no significant
Conclusions:
Proceedings paper
Traditionally, accounts of musical learning and musical products have focussed both on a restricted conception of the
possibilities of musical knowledge and on the explication of what may be seen as lower-order cognitive
processes (J. McPherson, 2000; G. McPherson & Thompson, 1998; Swanwick, 1998). Consequently, descriptions of
musical products such as composition and performance have been restricted in their explanatory power to interpretations of
competence based on issues such as "technique", "craft" and "quality" that fail to provide depth and a convincing notion of
musical excellence (NSW Board of Studies, 1999; Williams, 1999).
The paper proposes a framework for analysing the construction and use of musical knowledge based upon a four-level
model of cognition (Cantwell, in preparation). This model specifies four interactive components underlying learner activity
in any domain (see Figure 1): an operative component descriptive of the real time cognitive operations utilised in the
process of learning; a regulative component descriptive of those processes used in planning, controlling and regulating the
learning processes; a construct component descriptive of the beliefs and understandings about learning that act to drive the
regulative activity; and an efficacy component descriptive of situationally induced competency judgements influencing the
quality of engagement and volitional behaviours.
Figure 1 : Four level model of cognition underlying the current conceptualisation of musical learning (Cantwell, in
preparation)
Conceptually, the regulative, construct and efficacy components represent different aspects of metacognitive knowledge,
while the operative component represents implementation at the cognitive level. It is contended in this model that what
occurs at the operative level is both driven by and reflective of decisions made at the metacognitive level of task analysis.
The process of metacognitive decision making is further presumed to include interactions between the three metacognitive
components. Efficacy judgements, for example, are likely to predict qualities of task engagement through the situationally
determined judgement of potential competence in addressing and completing the task. Such decisions are mediated by
construct level conceptions of task and task requirements. For example, individuals can be said to approach learning with an
array of understandings of learning and expectations about learning. Individual theories of knowledge and knowing (eg.
epistemological beliefs; beliefs about intelligence; self-regulatory knowledge; depth and breadth of domain knowledge), as
well as individual theories of self as learner (eg. motivational goals, attributional beliefs, efficacy beliefs) all contribute in a
situationally specific way to determine both the direction and form of task engagement, and through this, the quality of
regulative activity in controlling real time learning.
Musical learning, production, performance and assessment are argued in this paper to be explicable through consideration
of processes involved at the operational, regulative, construct and efficacy levels of human cognition. Whilst
acknowledging the uniqueness of the knowledge base of music, this model nonetheless situates musical excellence within a
common framework of learning in other domains. Empirical support for this proposition will be drawn from three areas:
from research into the planning processes of musicians (Cantwell & Millard, 1994, Sullivan & Cantwell, 1999); from
research into the composing processes of musicians (Irvine & Cantwell, 1999; Irvine, Cantwell & Jeanneret, 1999), and
from research into the assessment of musical thinking (Jeanneret, 1999).
Strategic processes in musical planning
In recent work, Cantwell and Sullivan (Cantwell & Millard, 1994; (Millard) Sullivan & Cantwell, 1999) have investigated
the planning processes of both novice and experienced musicians in learning new score. In both studies, evidence indicated
that factors emerging from the construct level in particular provided the strongest pointers to higher level strategy use
associated with a higher quality planning focus.
In their original study, Cantwell and Millard (1994) speculated that the level at which musicians form intentions in learning
new music may relate to factors beyond the ability to simply read or decode musical score. The processes involved in
learning new music were conceptualised by Cantwell and Millard to be comparable to those involved in the reading and
comprehension of text material. Thus, it was argued, the shift from decoding to comprehending in text learning may be
paralleled in the shift from notational decoding to meaning construction in learning new music. Understanding the quality
of planning in learning new music, then, necessarily involves acknowledging different potential levels
of meaning and, through this, the potential use of increasingly complex strategic repertoires.
We may conceptualise individual differences in musical learning, then, as reflective of an interaction between underlying
musical epistemologies, prior knowledge states and situationally induced motivational states. Musical epistemology refers
to the individual's conceptualisations of both the structure of musical knowledge and the possibilities of musical knowledge
(that is, both the limits of what I "know" about music and the limits of what it may be possible to "know" at some
indeterminate future point). Additionally, musical epistemology refers to conceptions of how such knowledge has been and
may be constructed. That is, conceptions of music incorporate a strategic as well as substantive component.
In Cantwell and Millard's (1994) work, reported approaches to learning (Biggs, 1987) were seen as indicative of underlying
motivational and epistemological dispositions. Typically, students will report a disposition towards a deep or surface
approach to learning. A deep approach represents a combination of intrinsically derived learning goals with a strategic bias
towards the construction of complex and highly personalised meanings. A surface approach, on the other hand, represents a
combination of extrinsically derived learning goals with a strategic bias towards the reproduction of conventional categories
of the target knowledge. Cantwell and Millard's (1994) data from 8th Grade music students revealed that the surface/deep
distinction discriminated between individuals in terms of the structural complexity of the planning processes and the
strategic complexity associated with the production of such plans. Two examples from a deep oriented student and a surface
oriented student learning the same score may illustrate:
I could handle this ... so I would probably be sure that I am playing the correct notes and can pitch them
straight away ... and then build on that trying to make the piece musical. I'd do this by smoothing it all out,
playing the notes evenly, being expressive ... I mean following the dynamics or doing my own if it sounds too
boring - and yeah, I'd learn it first and then add to it, although I could probably do both at once. (Deep
oriented student, Cantwell & Millard, 1994, p 58)
I'd just ... um ... keep playing it, and if I got any mistakes I'd just fix it up ... like I'd keep going over the same
spot if I kept getting it wrong. Just like that I s'pose ... 'til I got it right. (Surface oriented student, Cantwell &
Millard, 1994, p 56)
In these instances, the differences in planning focus appear quite glaring. For the deep oriented student, learning the score
involved overlaying the notational elements with musical meaning - " trying to make the piece musical". In both
understanding of the nature of music, and in variety of strategies called upon to put in place such meaning, the deep
oriented student indicated a much more complex musical epistemology than was evident in the reproductive focus of the
surface oriented student ("I'd keep going over the same spot if I kept getting it wrong").
In a second study, (Millard) Sullivan and Cantwell (1999) tested a causal model in which prior knowledge and approaches
to learning were predicted to influence the complexity of strategy use and through this, the quality of planning focus.
Fifty-three tertiary music students reported on the way they would learn two scores to the point of performance
competence. In order to control for the effect of pattern identification in conventional notation, one score was presented in
20th Century graphic notation. Measures of approach to learning were also taken. Verbal protocols provided by the
participants were analysed for the presence of low level, mid level and high level strategies (see Table 1), and were
categorised in relation to the level of musical meaning aspired to in planning (see Table 2). For both the traditionally
notated score and the graphically notated score, clear relationships between approach to learning, complexity and level of
strategy use and quality of focal planning were observed. Those musicians reporting (at the construct level) a deep approach
to musical learning utilised a greater array of both mid level and high level strategies in constructing musical meaning, and
focussed planning at higher levels of abstraction than was the case for musicians reporting a surface approach to music
learning.
Examination of protocols from the students in Sullivan and Cantwell's study revealed that the epistemological assumptions
underlying a deep approach for these more expert subjects involved more complex and more abstract levels of musical
understanding. Where in the Cantwell and Millard study higher quality responses found points of integration in the music,
planning was nonetheless constrained to the parameters of the score itself. While this more limited epistemology was also
evident in responses of some of the Sullivan and Cantwell subjects (eg. Levels 3 and 4 in Table 2), some students displayed
significant development of epistemological understanding, allowing for constructed meanings to move beyond the literal
interpretations imposed by the attributes of the scores themselves. The exchange between the interviewer and subject 35 in
Sullivan and Cantwell's data illustrates this:
So you would learn this one differently from the other one. You didn't start at the beginning and then the end
lines and work in really slowly as you said for the other one.
Not for this sort of performance, no. I don't think its ... probably in about fifty years time to be exact, this bit of
notation here is the way music continues to grow. We'll be sitting down and looking at the semi-breves, and
things we have now like in a classroom, with spiked hair and big, huge, green fingernails hanging off, and
some strange instruments no-one's ever seen and someone will say this is how music's done in the twentieth
century, and everyone will be totally bamboozled by the fact that we have two minims making up a semibreve,
and four quavers making up... yeh, the exactness of it all. Music evolves, music notation has evolved or is
continuing to evolve. (S35, graphic score)
In short, the research reported by (Millard) Sullivan and Cantwell provides significant evidence for an emphasis on factors
not conventionally considered in explaining musical competence. While all musicians in their studies had attained a high
degree of technical competence in terms of notational fluency, it was still possible to discriminate between musicians on the
basis of the structural complexity of their planning processes, which in turn were seen to reflect fundamental differences
both in the complexity of the strategic repertoire underlying the planning and, importantly, in the driving epistemological
assumptions underlying intention formation and implementation.
Table 1: Strategic behaviours evident in the planning behaviours of musicians engaging new score (from Sullivan &
Cantwell, 1999)
Low-level strategies:
- Association: linking of two or more musical elements without transforming the musical meanings.
- Rote learning: reproducing larger or smaller units of music with the aim of memorisation. Does not involve transformations of meaning.
- Trial and error: reasonably unsystematic strategy selection, which is persevered with until unsuccessful and then replaced.

Mid-level strategies:
- Speed alteration: approaching the learning of a piece at a tempo slower than that which is set or preferred.
- Chunking: sorting smaller, relatively unmeaningful units of musical information into larger, more meaningful units.
- Linking: referencing new musical information to prior knowledge (e.g. composer, genre or style, known terms and symbols).

High-level strategies:
- Interpretation: imposing meaning on small parts, sections, or the whole of a piece, thereby transforming the score into something original and meaningful.
- Patterning: searching for underlying themes, ideas, styles, variations and other less obvious structures so that a clearer understanding of the whole piece may be gained.
- Prioritising: sorting relevant and important musical information from less relevant and less important information into a hierarchy so that the goal may be achieved in an orderly fashion.
Table 2: Categorisation of focal planning levels of musicians engaging new scores (from Sullivan & Cantwell, 1999)
LEVEL 2: Focus on discrete elements of musical information (e.g. notes, dynamics, rhythms) without considering overall intent and design of the music.

Are you familiar with this type of music?
No.
How would you go about learning this one?
I don't know. I'd have to learn what all the symbols meant.

LEVEL 3: Still a focus on components, but there is evidence of linking elements, and of prioritising more important elements above less important (e.g. melody over note-learning).

How would you go about learning this piece for a performance?
I'd probably go through it once just to know where all the melodies and everything go and find the difficult passages, work on those bits and then try and put it all together, and with the difficult passages, like work on them slowly. (S04, trad. score)

LEVEL 4: Focus shifts to the full score, but is limited in perspective to the prescriptions of the score rather than its transformation into an original interpretation.

If you had to learn this for a big performance, what steps would you take to do that?
I'd look at the whole piece overall, just to see the main style of it and what common features are throughout the whole thing, ... look at the time signature, look for accidentals ... I'd probably be fingering part of it too while I was doing that .... Then I'd probably play it through once, then I'd go back and fix mistakes bar by bar, then putting it in the context of the bar, then the next bar, then put them together, keep going. Then the parts where I had more problems just work on them again, slower .. (S36, trad. score)

LEVEL 5: A transitional level in which the musician acknowledges the need to consider underlying features of the score (new interpretations, patterns, ideas, stylistic traits and so forth), but the focus of planning still remains on the prescriptive elements of the score.

So this is a different sort of score. How would you go about approaching this one?
To make this score, or as I was reading it, the different lines suggested different feelings or bursts of energy, or longer drawn out periods. As for the first line, relating to the pitch I suppose, it'd be a longer pitch and then suddenly down and long, and then it would be boom, boom, boom, three big bursts of energy. To me it would be more a process of feeling my way around the piece, getting a feel for each different line. (S13, graphic score)

LEVEL 6: The focus here is on the meaning of the music - what it is trying to say, what the composer intended, what the performer wishes to express. The learning of the components of the score now serves to scaffold more creative and interpretative emphases.

How would you go about planning to learn this piece for a performance?
When I first see it you just read it through, learn, have a look to see how you'd have to play it like whether it goes up or down and what kind of movement it is, arpeggios or scales, or how fast or slow, just general kinds of things. Then, when you've got an idea, I'd just play it through, master the notes, and then, depending upon what style of music it was, work out what kind of interpretation would suit it best, like which kind of expressive techniques you should use kind of thing. (S03, trad. score)

LEVEL 7: Here the musician develops the meaning of the music to incorporate other domains, to discuss the music in abstract and philosophical terms, to consider audience reactions, and to propose highly original ideas and interpretations.

So you would learn this one differently from the other one. You didn't start at the beginning and then the end lines and work in really slowly as you said for the other one.
Not for this sort of performance, no. I don't think its ... probably in about fifty years time to be exact, this bit of notation here is the way music continues to grow. We'll be sitting down and looking at the semi-breves, and things we have now like in a classroom, with spiked hair and big, huge, green fingernails hanging off, and some strange instruments no-one's ever seen and someone will say this is how music's done in the twentieth century, and everyone will be totally bamboozled by the fact that we have two minims making up a semibreve, and four quavers making up... yeh, the exactness of it all. Music evolves, music notation has evolved or is continuing to evolve. (S35, graphic score)
which the composer addresses the task, and the strategic variation implied by that process. Attention at the extreme
categories of the taxonomy (Categories 1 & 2 and Categories 6 & 7) may be associated with more explicitly metacognitive
activity. The central categories are more representative of operational level activity in "real time" composing.
Attentional shifts, then, between the extreme categories and the central categories may be seen as indicative of reference to
the more abstract construct level understandings driving decision-making at the regulative and process levels.
By way of illustration, the attentional foci of the three composers over the time of composing are shown in Figures 2, 3
and 4. The X-axis on these illustrations indicates the category of attentional focus at any given point in time; the Y-axis
indicates real time.
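The kind of tabulation underlying such figures can be sketched in a few lines. This is a hypothetical illustration only, not the authors' analysis code: the function name `time_per_category`, the category log, and the session timings are all invented, and the grouping of categories 1-2 and 6-7 as metacognitive follows the taxonomy described above.

```python
# Hypothetical sketch: tabulating attentional focus from a coded protocol log.
# Each log entry is (minutes_elapsed, category); a category is assumed to hold
# until the next entry. Categories 1-2 and 6-7 are the "extreme" (metacognitive)
# categories; 3-5 are the central (operational) categories.

def time_per_category(log, total_minutes):
    """Return minutes spent in each attentional category."""
    minutes = {}
    for i, (start, cat) in enumerate(log):
        end = log[i + 1][0] if i + 1 < len(log) else total_minutes
        minutes[cat] = minutes.get(cat, 0) + (end - start)
    return minutes

# Invented example log for a 30-minute composing session.
log = [(0, 1), (4, 4), (12, 3), (18, 4), (24, 6), (28, 7)]
minutes = time_per_category(log, 30)
metacognitive = sum(m for c, m in minutes.items() if c in (1, 2, 6, 7))
print(minutes)        # minutes per category
print(metacognitive)  # total time in the extreme (metacognitive) categories
```

A wider spread of visited categories, and a larger metacognitive share, would correspond to the more expert-like profiles described below.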
The most notable observation from these illustrations is the increasing spread of categories involved in the composition
process as one moves from the more novice-like to the more expert-like composer. Consistent with expectations, the novice
(Case 1) gave no explicit attention to the global aspects of either problem representation or evaluation. That is, the
composition activity was closely tied to the act of composing as a regulated activity, but one which, in all likelihood,
involved little reference to higher order elements of meaning and understanding. The limited amount of monitoring is more
indicative of reflection on the surface features of the composition than on the overall meaning and direction of the
composition. Case 5, as an example of intermediate level problem-solving, reflected many of the focal and strategic
attributes of the novice. However, the processes differed in the nature of initial planning and, to a lesser extent, in the use of
monitoring and evaluation. The composer explicitly reflected upon the nature of the problem as a starting point, and then
focussed on a trialing strategy for the majority of the time spent composing. The expert composer, on the other hand, while
also giving initial time to representing the musical problem, spent a far greater proportion of time compared to the more
novice-like composers, in deliberative planning and in monitoring and evaluating.
On the basis of these case studies, experts in composition appear to be distinguished from novices in the range of attentional
foci used, in the cyclic nature of the attentional focus, in the depth of processing and in the level and variety of strategic
planning brought to bear on music composition. These case studies lend support to the view that higher-order thinking
underpins the capacity to regulate highly complex musical thought.
Assessment of musical thinking
The previous sections have emphasised the complex nature of musical learning. In particular, the research into planning and
into composing has indicated a significant role for higher-order cognitions in explaining the form and quality of musical
outcomes. Our overall theoretical framework suggests that explanations of higher order competencies lie in the nature of the
understandings and beliefs constructed about learning (or music) that act to drive the process of learning itself. It seems,
therefore, a reasonable position to suggest that the assessment of musical outcomes ought also to reflect such a range of
conceptual complexity as has been illustrated in the processes musicians employ to create music.
Assessment processes and criteria for musical products currently in operation in the New South Wales Higher School
Certificate (HSC) have, until recently, been based on norm-referencing. Criteria drawn from syllabus outcomes exist for all
aspects of the testing procedure including composition, performance and sight reading, but they have been organised into
five bands of "ranking descriptors" that dictate a normal curve based on the comparison of students (Board of Studies,
1999). There has been a gradual move towards criterion referencing and the use of benchmarks for the training of
examiners, due for full implementation in 2001, when student achievement will be reported on a single set of
performance bands covering all aspects of the testing procedure including performance, composition, musicology and
aural. The development of criteria for the new system has brought to light certain inadequacies of the current system of
descriptors. This is not to call into question the fairness or credibility of past assessment. Rather, it is through the constant
development and refinement of criteria that the lack of a framework explaining musical products from a convincing
research base has become more apparent.
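The practical contrast between the two marking schemes can be illustrated with a toy computation. This is a sketch only: the marks, band quotas, and cutoffs are invented and do not reproduce the Board of Studies procedure.

```python
# Illustrative contrast between norm-referenced and criterion-referenced banding.
# All numbers are invented for the sketch.

def norm_referenced_bands(marks, quotas):
    """Assign bands by rank: the top quota of students gets the top band, and so
    on, forcing a fixed distribution regardless of the marks themselves."""
    ranked = sorted(marks, reverse=True)
    bands, i = {}, 0
    for band, quota in enumerate(quotas, start=1):
        for mark in ranked[i:i + quota]:
            bands.setdefault(mark, band)
        i += quota
    return [bands[m] for m in marks]

def criterion_referenced_bands(marks, cutoffs):
    """Assign bands against fixed achievement cutoffs: every student who meets
    a criterion earns that band, however many students that turns out to be."""
    result = []
    for mark in marks:
        band = len(cutoffs) + 1
        for b, cutoff in enumerate(cutoffs, start=1):
            if mark >= cutoff:
                band = b
                break
        result.append(band)
    return result

marks = [92, 88, 85, 70, 64]
print(norm_referenced_bands(marks, quotas=[1, 2, 2]))       # -> [1, 2, 2, 3, 3]
print(criterion_referenced_bands(marks, cutoffs=[85, 70]))  # -> [1, 1, 1, 2, 3]
```

Note that three students clear the top criterion (marks of 85 or above) under criterion referencing, whereas the ranking quotas admit only one of them to the top band: the shift to criterion referencing is precisely what makes explicit descriptors of musical achievement, rather than relative standing, necessary.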
The descriptors for composition, for example, are largely skill-based and struggle to acknowledge musical thought in words
such as "musically convincing", "sustained involvement in the composition process", and "convincing development of
ideas" (see NSW Board of Studies, 1999, p137). Such phrases can be subject to varying interpretations that give rise to
discrepant results. This lack of a framework for the analysis and assessment of music knowledge is not confined to the
assessment of high school students. Throughout the data reported by Williams (1999) in her interviews in the tertiary
sector, the same emphasis on skill exists, with a number of references to the "craft" of composition and "technical
proficiency" without further explanation and little acknowledgment of the part cognition might play in this development.
While the notion of "originality", implying high level thought, is recognised, it would seem that skill criteria are easier to
articulate in an explanation of musical excellence than the constituents of "originality". Similar issues exist in the area of
performance. The HSC criteria for "outstanding" include reference to "sophisticated self-expression" and "a musically
sensitive and personal interpretation" (NSW Board of Studies, 1999, p.136), while Williams (1999) refers to the assessment
of performance often coming down to judgements about "musical" versus "unmusical" playing (p.29). These are words
musicians seem to understand intuitively but are often at a loss to explain in more quantifiable terms without breaking a
work down into its (often skill-based) component parts and thereby losing the sense of "whole" that is fundamental to
understanding the effect of the work (Jeanneret, 1999).
Clearly, high levels of cognitive and metacognitive musical competencies are built into notions such as "sophisticated" and
"musical", but it seems to us that this is not necessarily reflected in the current language of musical assessment. Evidence
from both Cantwell and Sullivan's research and from Irvine's research would suggest that reference to the nature of
higher order cognitions is fundamental to discriminating between different qualities of musical outcome. We may propose,
for example, that a more sophisticated musical epistemology (as an example of a construct level belief) would acknowledge
the possibility of new expressive forms within existing musical genre (as a possible meaning of "sophisticated"), and, when
combined with an appropriate motivational state and strategic repertoire, would give rise to greater complexity of musical
thought demonstrated in the quality of the musical product (see for example, the student protocol in Level 7 of Table 2
above).
We would suggest that the four-level model of cognition provides a comprehensive theoretical base through which the
underlying cognitive processes, and the associated driving beliefs and understandings responsible for the production of
musical learning, may be more comprehensively and validly assessed.
Note:
1. The majority of Year 12 students in New South Wales sit for the Higher School Certificate (HSC), a public examination,
upon the results of which students depend for entrance into tertiary institutions and courses. The certificate students receive
documents both the results of the examination and an assessment mark provided by the school for each subject. Students
may elect one of two music courses to present as part of their HSC and are able to specialise in areas of performance,
composition and musicology over and above certain core requirements. All students must perform a minimum of two
pieces, and the submission of a composition is mandatory for students who elect Music 2.
References
Biggs, J. (1987). Student Approaches to Learning and Studying. Hawthorn: ACER.
Board of Studies, NSW (1999). 1998 HSC Examination Report: Music. Sydney: Board of Studies.
Cantwell, R. (in preparation). A framework for analysing student learning.
Cantwell, R. & Millard, Y. (1994). The relationship between approach to learning and learning strategies in
learning music. British Journal of Educational Psychology. 64, 47-65.
Cantwell, R. & Moore, P. (1996). The development of measures of individual differences in self-regulatory
control and their relationship to academic performance. Contemporary Educational Psychology. 21, 500-517.
Cholowski, K., & Chan, L.K.S. (1994). Knowledge-driven problem solving models in nurse education.
Journal of Nursing Education. 34 (4), 148-154.
Irvine, I. (1999). Musical composition, learning and assessment. Sounds Australian. 53, 31-33.
Jeanneret, N. (1999). Music assessment: What happens in the school sector? Sounds Australian. 53, 18-21.
Lawson, M. (1991). Managing problem-solving. In Biggs, J. (Ed.). Teaching for Learning. Hawthorn: ACER.
McPherson, J. (2000). Assessing musical performance: Current views and practice. Unpublished paper,
Faculty of Education, University of Newcastle.
McPherson, G. & Thompson, W.F. (1998). Assessing music performance: Issues and influences. Research
Studies in Music Education 10, 12-24.
Reitman, W. (1965). Cognition and Thought. New York, Wiley.
Salomon, G. & Globertson, T. (1987). Skill may not be enough: The role of mindfulness in learning and
transfer. International Journal of Educational Research. 11, 623-638.
Swanwick, K. (1998). The perils and possibilities of assessment. Research Studies in Music Education 10,
1-11.
Sullivan, Y. & Cantwell, R. (1999). The planning behaviours of musicians engaging traditional and
non-traditional scores. Psychology of Music. 27, 245-266.
Williams, C. (1999). Questions on assessment in the tertiary sector. Sounds Australian. 53, 25-31.
Proceedings abstract
Dr Ian Cross
ic108@cus.cam.ac.uk
Background:
Aims:
The aim of this paper is to employ the results of an empirical study conducted
in Northern Potosí, Bolivia (in collaboration with an ethnomusicologist, Henry
Stobart) to examine critically current theories of music cognition and
cognitive anthropology and to outline a framework within which a generalised
model of the cultural dynamics of music cognition might be developed.
Main contributions:
Implications:
Proceedings paper
Flute 2.92 (1.68) 2.70 (1.59) 2.71 (1.29) 2.77 (1.71) 3.04 (1.86) 3.04 (1.70) 2.61 (1.57) 2.76 (1.56) 2.36 (1.51)
Violin 2.96 (1.14) 3.60 (1.53) a 3.56 (1.52) 2.93 (1.33) 3.30 (1.55) 3.54 (1.55) 3.33 (1.56) * 3.23 (1.60) 2.85 (1.48) ** b
Piano 2.18 (1.48) 2.45 (1.41) 2.19 (1.42) 2.43 (1.52) 3.48 (1.91) aa 2.59 (1.76) bb 2.22 (1.30) 2.67 (1.52) 2.36 (1.38)
Drums 4.80 (1.41) 4.17 (1.75) a 4.43 (1.58) 4.82 (1.47) 3.33 (1.92) aa 4.17 (1.82) bb 4.27 (1.77) 4.13 (1.77) 4.46 (1.67) *
Guitar 3.49 (1.72) 3.43 (1.67) 3.27 (1.52) 3.71 (1.59) 3.70 (1.37) 3.54 (1.59) 4.15 (1.49) 3.97 (1.50) 4.09 (1.43)
Trumpet 4.65 (1.05) 4.66 (1.26) 4.78 (1.41) 4.36 (1.26) 4.15 (1.41) 4.06 (1.29) 4.24 (1.43) 4.42 (1.39) 4.31 (1.27)
Table 2: Mean rank (and standard deviation) of male participants' preferences for playing specific instruments at Time 1, Time 2 and Time 3 according to school clusters, and
results of comparisons
Flute 4.49 (1.29) 4.74 (1.37) 4.64 (1.24) 4.28 (1.65) 4.18 (1.55) 4.60 (1.40)bb 4.28 (1.64) 4.35 (1.69) 4.24 (1.54)
Violin 4.65 (1.30) 4.89 (1.20) 5.02 (1.17) 4.43 (1.43) 4.56 (1.44) 4.34 (1.45) 4.61 (1.32) 4.56 (1.35) 4.79 (1.37)
Piano 3.12 (1.54) 3.75 (1.44) 3.02 (1.61)bb 3.91 (1.59) 4.07 (1.59) 3.75 (1.48) 3.40 (1.79) 3.41 (1.68) 3.37 (1.62)
Drums 2.65 (1.39) 1.87 (1.06) a 2.43 (1.45) b 2.78 (1.72) 2.19 (1.59) aa 2.50 (1.63) 2.75 (1.56) 2.76 (1.71) 2.58 (1.59)
Guitar 2.16 (1.56) 2.38 (1.41) 2.09 (1.18) 2.33 (1.37) 2.86 (1.44) a 2.64 (1.63) 2.57 (1.56) 2.39 (1.24) 2.58 (1.49)
Trumpet 3.71 (1.56) 3.24 (1.38) 3.52 (1.33) 3.12 (1.35) 3.02 (1.41) 3.12 (1.57) 3.89 (1.39) 3.52 (1.47) 3.42 (1.45)
Discussion
We reported previously that there was an immediate effect of providing counter-stereotypical role-models on boys' and girls' preferences for gender-consistent and
gender-inconsistent instruments (see Harrison & O'Neill, in press, for discussion of results). The present study indicated some lasting influence of the intervention concerts on
the children's instrument preferences. Boys who saw a woman play guitar continued to like it less, and liked violin less regardless of whether they had seen a man or woman
play it. The increase in their liking for drums did not last. Girls still liked violin and drums more (after seeing a man or woman play the instruments), and flute less (after seeing
a man play it) seven months after the concerts. However, the decrease in girls' liking for piano (after seeing a man play it) did not last.
These results suggest that the decrease in liking for same-sex instruments did to some extent endure over seven months. Thus, we found more enduring effects of providing
role-models compared to other studies designed to challenge children's gender-typed beliefs. However, our study suggests that exposing children to counter-stereotypic
role-models is not necessarily effective in either the short- or long-term. For example, girls actually liked flute less after seeing a man play it, and this effect was still evident
seven months later. Other studies in the literature have reported negative effects of such strategies. For example, eighth grade boys expressed more stereotypical attitudes about
women's roles after watching counter-stereotypic advertisements compared to stereotypic commercials. Guttentag and Bray (1976) presented participants with films and
stories about women with counter-stereotypic roles. Whilst female participants' stereotypic attitudes decreased significantly, male participants in fifth and ninth grade showed
more stereotypical attitudes following presentation of the role-models.
Our data suggest that children were more likely to change their preferences for particular instruments than the stereotype itself. Thus, the findings indicate that presenting
counter-stereotypic role-models may be an ineffective strategy for challenging children's gender-stereotyped beliefs. More research is needed before we will have a better
understanding of the best way to present instruments to children if we are to assist them in overcoming the tendency to restrict instrument choice along grounds of gender. It
may be that role-models should be presented separately in single-sex teams to overcome the negative effect we observed in our study (see also Matteson, 1991). In the
mainstream gender literature, longer and more interactive intervention studies have been more successful (e.g., Koblinsky & Sugawara, 1978; Weeks & Porter, 1983). Despite
various methodological problems, Tarnowski's (1993) study suggested that gender-neutral instrument presentation workshops (over a period of eight weeks) may be beneficial
in encouraging children to hold gender-neutral beliefs about the gendered nature of musical instruments. We propose that it may not be enough to merely expose children
passively to musicians that either do or do not conform to their stereotypes, rather it may be necessary for children to engage actively in considering the role of gender in
playing instruments through special programmes designed to raise their awareness and develop more gender-neutral attitudes.
References
Abeles, H. F., & Porter, S. Y. (1978). The sex-stereotyping of musical instruments. Journal of Research in Music Education, 26, 65-75.
Bruce, R., & Kemp, A. (1993). Sex-stereotyping in children's preferences for musical instruments. British Journal of Music Education, 10, 213-217.
Carter, B., & Patterson, C. J. (1982). Sex roles as social conventions: the development of children's conceptions of sex-role stereotypes. Developmental Psychology, 18,
Proceedings paper
Geoff Luck,
Keele University.
Introduction
The conductor of a classical instrumental ensemble traditionally uses a small white baton to describe
certain patterns that represent the rhythm of the music being played by the ensemble. Each rhythm has a
different pattern, and these patterns can be modified to indicate expressive features of the music (Rudolf,
1980).
For the last 600 years or so, Western music has been largely polyphonic, resulting in a need for the
different members of an ensemble to accurately synchronise their individual performances both with each
other and with the conductor in order to produce a satisfactory "ensemble" performance (Rasch, 1979).
Thus, musicians playing under a conductor's direction must be proficient at discerning which features of
the conductor's gestures indicate where the beat is located. Likewise, the conductor must be proficient at
showing the precise location of the beat.
Gilden (1991), Stetson (1905), and Wood (1995) note that the visual impression of a rhythmic pulse or
"beat" may be produced by a conductor using a gesture in which the velocity structure of the movement is
varied along its path such that the point of maximum velocity is accompanied by a change of direction in
the trajectory of the movement. Clayton (1986) offers a similar description, but adds that in his studies the
beat point nearly always coincided with the baton positional minima, that is, the lowest point of the baton
trajectory.
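These descriptions suggest a simple operational definition: a beat lies either where baton speed peaks while the direction of travel reverses, or, on Clayton's criterion, at the positional minimum of the trajectory. The latter can be sketched in a few lines; the array layout, sampling rate, and toy trajectory below are illustrative assumptions, not data from any of the studies cited:

```python
import numpy as np

def beat_indices(y):
    """Indices where baton height y is a strict local minimum --
    Clayton's criterion that the beat coincides with the lowest
    point of the baton trajectory."""
    return np.flatnonzero((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])) + 1

def baton_speed(xy, fs):
    """Instantaneous baton speed (central differences) from an
    (N, 2) array of positions sampled at fs Hz -- one of Gilden's
    kinematic variables."""
    return np.linalg.norm(np.gradient(xy, 1.0 / fs, axis=0), axis=1)

# Toy trajectory sampled at 25 Hz: the baton dips to its lowest
# point once per second, so one interior beat is expected at t = 1 s.
fs = 25.0
t = np.arange(0.0, 2.0, 1.0 / fs)
xy = np.column_stack([0.1 * np.sin(2 * np.pi * t), np.abs(np.sin(np.pi * t))])
beats = beat_indices(xy[:, 1])
print(beats / fs)        # estimated beat times in seconds
```

On real movement data one would smooth the trajectory before locating minima; the velocity-maximum criterion can be checked against the same `baton_speed` output.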
With regard to the 'style' of gestures used by conductors, in a series of studies investigating two-beat
('up-down') gestures, Stetson (1905) concluded that the conductor's downward beat-stroke was a ballistic
movement, while the upstroke away from the beat was a more controlled movement. This view allows
modification of the gesture, in terms of velocity and trajectory, only in a 'temporal window' during the
movement away from the beat. Wood (1995), on the other hand, suggests that all gestures, including the
beat-stroke, should be controlled movements, thus allowing modification of the gesture during the
formulation of the beat itself. In contrast with Stetson's (1905) view, Wood's (1995) approach implies that
additional information may be conveyed in the conductor's arm movements during the evolution of each
beat, which may assist musicians in more accurately predicting the exact instant of each beat. The result of
this ought to be a higher level of synchronisation between musician and conductor.
Musicians' ability to synchronise with a conductor has been investigated by Clayton (1986), who, in a
series of studies, systematically controlled the presence or absence of the visual (conductor) stimulus, the
auditory (ensemble) stimulus, or the written music, in order to investigate the relative contribution of these
sources of timing information to an ensemble's ability to play "together". He found that, except under
certain circumstances, the conductor provided general, as opposed to specific, timing information, while
the ensemble exerted the most overall influence over the synchronicity of the players. Overall, participants
revealed a mean asynchrony of 26ms between their responses and the conductor's gestures.
Assuming, then, that musicians do look to the conductor for at least some timing information, can
sensitivity to such information be increased with practice? Runeson and colleagues (Runeson, 1984;
Runeson & Frykholm, 1981, 1983) suggest that sensitivity to the kinematics of human movement may be
heightened through experience. 'Kinematics' is here defined as, "the changes in the optic array that occur
when an object moves...Spatial position, velocity, acceleration, and all other derivatives of a motion path
may be regarded as kinematic variables." (Gilden, 1991, p.555). Furthermore, Rubel (1985) and Tees
(1994), for example, suggest that visual and auditory sensory thresholds, and intersensory responsiveness,
are affected by experience. This research, therefore, implies that performance on tasks involving
temporally based, multimodal events, such as synchronising a motor response to a visual event, is likely to
improve through long-term experience.
The research reviewed above prompted three questions, each with an associated prediction:
1. How accurately can people synchronise a motor response with a conductor's demarcation of the beat
when only information regarding the movement of the baton is available? It was predicted that
temporal synchronisation between participants' responses and the conductor's beat would be
somewhat poorer than Clayton's (1986) findings.
2. Does the addition of temporal information in the form of the kinematics of the conductor's elbow and
wrist movements to the trajectory of the baton affect the accuracy with which people can
synchronise a motor response with the conductor's beat? It was predicted that synchronisation under
these conditions would be at least as good as Clayton (1986) reported.
3. Does amount of experience of playing under a conductor's direction affect the accuracy with which
people are able to synchronise their response with the conductor's beat? It was predicted that those
with higher levels of experience would be able to achieve a higher level of synchrony with the
conductor's beat than those with lower levels of experience.
Method
The Conductor
A single, professional conductor, henceforth referred to as 'MD', was the conductor in all stimuli.
The Participants
Thirty-two participants (8 males, 24 females) took part in this study. All participants were considered to be
'musicians', although their experience under a conductor's direction varied from very little to rather
considerable.
The Stimuli
Participants watched a sequence of digital video clips, each of which showed a conductor either beating a
single upbeat and downbeat at one of three tempi (slow, medium, or fast), or beating an upbeat followed by
a series of beats at varying tempi. The conductor was filmed as if from the 'cello section' of an orchestra.
There were two versions of each clip: The full-cue version showed the full image of both the conductor's
movements and those of the baton; the baton-only version showed only the movements of the baton, the
conductor's limb movements having been digitally removed from the image. Each participant saw either
the baton-only or the full-cue stimulus, and each condition comprised a total of 22 different clips,
containing a total of 129 beats.
Apparatus and Materials
Computer Hardware. Participants viewed the stimuli on a Personal Computer, and responded to each beat
by pressing the left button of a computer mouse. The mouse was selected for its particularly loud
and positive 'click' so as to provide participants with explicit auditory feedback concerning the moment at
which they responded.
Questionnaire. Each participant completed a questionnaire as part of this study, the purpose of which was
to elicit background information regarding, for example, the total amount of experience they had under a
conductor's direction in general, and whether they had any experience under MD's direction.
Procedure
Participants were tested individually, and were required to press the left mouse button in time with the
conductor's demarcation of the beat, as if they were using the mouse to 'play a note' on each beat.
Participants were randomly selected to receive either the baton-only or full-cue stimulus, though
presentation order of all clips was the same for all participants. After responding to the visual stimuli,
participants were then asked to complete the post-test questionnaire. Once this was completed the
experiment came to an end, participants were informed of the purposes of the study, and thanked for their
time.
Results
Results are presented with reference to each of the three research questions in turn.
1. How accurately can people synchronise a motor response with a conductor's demarcation of the
beat when only information regarding the movement of the baton is available?
Table 1 shows the percentages of early, accurate, and late responses, and overall mean response, by
participants in each condition. As can be seen, only 33.3% of beats in the baton-only condition
received a response that might be considered accurate, while the majority of beats received a late
response. The average response by participants who received the baton-only stimulus was 82ms late.
In line with the prediction made for research question 1, this figure is somewhat poorer than that
reported by Clayton (1986).
Table 1. Percentages of early, accurate, and late responses, and overall mean response, by
participants in each condition.
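The classification underlying Table 1 amounts to scoring each response by its signed asynchrony (response time minus beat time) and treating a symmetric window around the beat as 'accurate'. A minimal sketch follows; the ±40 ms window is an illustrative assumption, since the criterion actually used is not stated here:

```python
import numpy as np

def summarise(response_ms, beat_ms, window_ms=40.0):
    """Percentages of early / accurate / late responses and the mean
    signed asynchrony (positive = late).  The +/- window_ms accuracy
    band is an assumption, not taken from the study."""
    asyn = np.asarray(response_ms, float) - np.asarray(beat_ms, float)
    early = 100.0 * np.mean(asyn < -window_ms)
    late = 100.0 * np.mean(asyn > window_ms)
    return {"early_%": early,
            "accurate_%": 100.0 - early - late,
            "late_%": late,
            "mean_asynchrony_ms": asyn.mean()}

# Toy data: three responses to beats at 0, 1000 and 2000 ms;
# one response (1120 ms) falls outside the accuracy window.
print(summarise([30.0, 1120.0, 2010.0], [0.0, 1000.0, 2000.0]))
```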
2. Does the addition of temporal information in the form of the kinematics of the conductor's elbow and
wrist movements to the trajectory of the baton affect the accuracy with which people can synchronise
a motor response with the conductor's beat?
Again with reference to table 1, it can be seen that participants in the full-cue condition responded
accurately to only 23.3% of beats, while all other beats received a late response. Overall, the average
response of full-cue participants was late by 84.67ms, a similar, though slightly higher, figure
compared to baton-only participants. Thus, the prediction made regarding this question was not
supported by the data. Full-cue participants in fact demonstrated slightly poorer levels of
synchronisation than baton-only participants.
3. Does amount of experience of playing under a conductor's direction affect the accuracy with which
people are able to synchronise their response with the conductor's beat?
Table 2 shows the percentage of beats that received an accurate response by each experience group,
between conditions. Overall, it was found that the total number of years' experience following a conductor was
negatively related to a person's ability to synchronise a motor response with the conductor's beat.
Table 2. Percentage of beats that received an accurate response by each experience group, between
conditions.
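One plausible way to quantify such a negative relationship is a rank correlation between experience and accuracy; both the statistic and the data below are illustrative assumptions, since the measure of association actually used is not reported here:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (assumes no tied values): the
    Pearson correlation of the two vectors' ranks."""
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical participants: years playing under a conductor
# versus percentage of beats receiving an accurate response.
years = np.array([1, 3, 5, 8, 12, 20])
accurate_pct = np.array([40.0, 36.0, 33.0, 30.0, 25.0, 22.0])
rho = spearman_rho(years, accurate_pct)   # strictly decreasing, so rho = -1
```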
In addition, as table 3 shows, experience under the specific direction of MD, the conductor used in the
stimuli, was also negatively related to levels of synchronisation.
Table 3. Percentage of beats that received an accurate mean response by each MD experience group,
between conditions.
MDYes MDNo
These results, then, do not support the prediction made for research question 3.
Discussion
Summary of results
To summarise, then: Participants demonstrated a general tendency to respond late; in addition, full-cue
participants demonstrated slightly lower levels of accuracy compared to baton-only participants; finally,
amount of experience under a conductor's direction in general, and under the specific direction of MD, was
negatively related to synchronisation ability. So, how might these findings be explained?
Possible strategies used by participants when attempting to synchronise with the conductor:
'Pick a point and stick to it'
Clayton (1986) suggests that people apply a two-stage strategy when attempting to coordinate a series of
motor responses with a series of visual events. The first stage involves selecting a somewhat arbitrary
asynchrony between stimulus and response, while the second stage involves replicating this offset under
limitations imposed by, for example, differences in ability to track visual versus auditory stimuli (e.g.,
Bartlett & Bartlett, 1959), and variability in motor control/response (e.g., Trew & Everett, 1997). Clayton
(1986), therefore, suggests that the level of synchronisation achieved by an individual is somewhat
arbitrary. Another approach is taken by advocates of the Evaluation Hypothesis
Evaluation Hypothesis
It has been observed that synchronisation tasks typically reveal a systematic negative asynchrony between
the stimulus and participants' responses − that is, responses tend to precede stimuli by a few tens of
milliseconds (Vos, Mates & van Kruysbergen, 1995). Vos & Helsper (1992) suggest that this effect arises
as a result of an asymmetric evaluation function such that participants would rather respond a little early
than risk responding late.
It might be suggested, however, that musicians favour a late response over an early one, resulting in
responses which 'drag' behind the stimulus. Such an evaluation might be learnt, for example, from playing
in ensembles, where it is discovered that 'playing late' is the safer option if you are not sure about the
placement, or quality, of your entry. If such an asymmetric evaluation was learnt through experience, one
might expect those with the most experience of playing in conducted ensembles to respond late more often
than those with less such experience. Indeed, this is exactly what was found in the present study.
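The asymmetric evaluation function can be made concrete with a toy decision model: if one sign of error is penalised more heavily, the cost-minimising aim point shifts away from the beat in the opposite direction. The linear penalties, jitter level, and grid below are illustrative assumptions, not the form used by Vos & Helsper:

```python
import numpy as np

def expected_cost(aim_ms, k_early=1.0, k_late=3.0, jitter_ms=30.0, n=200_001):
    """Monte-Carlo expected cost of aiming `aim_ms` relative to the beat,
    given Gaussian motor jitter, when late errors cost k_late per ms and
    early errors cost k_early per ms (illustrative parameters)."""
    err = aim_ms + np.random.default_rng(0).normal(0.0, jitter_ms, n)
    return np.where(err > 0, k_late * err, -k_early * err).mean()

aims = np.arange(-60, 61, 5)
# Lateness penalised more heavily -> the optimal aim is early (negative),
# reproducing the usual negative mean asynchrony.
best_early = aims[np.argmin([expected_cost(a) for a in aims])]
# Swap the penalties -> the optimal aim is late (positive), as the
# ensemble-learning account sketched above would predict for musicians.
best_late = aims[np.argmin([expected_cost(a, k_early=3.0, k_late=1.0)
                            for a in aims])]
print(best_early, best_late)
```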
Style of Beat
Overall, visual perception of a conductor's arm movements per se may not be as important as the 'style' of
the baton gesture that results from a given style of arm movement produced by particular combinations of
relative order and degree of joint mobilisation. In support of this idea, both Boult (1963) and McElheren
(1966) advocate the use of smooth and predictable gestures to help players more accurately predict the
position of the next beat, and also 'feel where they are' between beats.
It may be suggested, then, that as long as at least the baton is visible, the ability of an ensemble to
synchronise with the conductor may depend upon the conductor's ability to conduct in a particular, perhaps
somewhat smooth style, combined with experience-related attempts to respond after the beat, coupled with
experience-induced anticipatory tendencies.
Conclusion
In conclusion, it might be encouraging to consider the general lack of ability of participants in the present
study to synchronise a motor response with the conductor's demarcation of the beat in light of an
observation by Sternberg & Knoll (1994): "Regardless of their inability to be on target with the required
experimental tasks....expert musicians will nevertheless perform as skilled soloists and/or as a fine
ensemble in a musical setting. In other words, they will all come in slightly early or slightly late, but they
will all make their musical entrances together, as a group." (p.241).
References
Bartlett, N. R. & Bartlett, S. C. (1959). Synchronisation of a motor response with an anticipated sensory event. The
Psychological Review, 66(4), 203-218.
Boult, A. (1963). Thoughts on Conducting. London; Phoenix House.
Clayton, A. M. H. (1986). Coordination between players in musical performance. Unpublished PhD Thesis; Edinburgh
University.
Gilden, D. L. (1991). On the origins of dynamical awareness. Psychological Review, 98(4), 554-568.
McElheren, B. (1966). Conducting Technique for Beginners and Professionals. N.Y.; O.U.P.
Rudolf, M. (1980). The Grammar of Conducting: A Practical Guide to Baton Technique and Orchestral Interpretation
(Second Edition). London: Collier Macmillan Publishers.
Runeson, S. (1984). Perceiving people through their movements. In: Kirkaldy, B. (Ed.), Individual Differences in Movement.
Lancaster: M.T.P. Press.
Runeson, S. & Frykholm, G. (1981). Visual perception of lifted weight. Journal of Experimental Psychology: Human
Perception and Performance, 7, 733-740.
Runeson, S. & Frykholm, G. (1983). Kinematic specification of dynamics as an informational basis for person and action
perception: expectation, gender recognition, and deceptive intention. Journal of Experimental Psychology: General, 112,
585-615.
Sternberg, S. & Knoll, R. L. (1994). Perception, production, and imitation of time ratios by skilled musicians. In Aiello, R. &
Sloboda, J. A. (Eds) Musical Perceptions. Oxford U. P.; Oxford
Stetson, R. H. (1905). A motor theory of rhythm and discrete succession: Parts I and II. Psychological Review, 12, 250-270,
293-350.
Tees, R. C. (1994). Early stimulation history, the cortex, and intersensory functioning in infrahumans: Space and time. In: D.
J. Lewkowicz & R. Lickliter (Eds.), Development of Intersensory Perception: Comparative Perspectives. Hillsdale, NJ;
Erlbaum.
Trew, M. & Everett, T. (1997). Human Movement: An Introductory Text (3rd Ed.). Churchill Livingstone; London.
Vos, P. G. & Helsper, E. L. (1992). Tracking simple rhythms: On-beat versus off-beat performance. In: F. Macar, V.
Pouthas, & W. J. Friedman (Eds), Time, action and cognition. Dordrecht; Kluwer.
Vos, P. G., Mates, J. & van Kruysbergen, N. W. (1995). The perceptual centre of a stimulus as the cue for synchronisation to
a metronome: Evidence from asynchronies. The Quarterly Journal of Experimental Psychology, 45A(4), 1024-1040.
Wood, M. (1995). Notes on expressive gestures: Observing and quantifying the 'character' of movement. Unpublished
Research Manuscript.
Keynote abstract
M. Baroni
Meaning in Music
The paper is divided into three parts: in the first part the problem of the difference between verbal and
musical meanings is considered. Examples are given of allusions to human experiences that music can
convey: the main areas being speech, physical gestures, aspects of cultural habits and synaesthetic
sensations. In the second part, the idea of musical grammar is discussed from two particular points of
view: grammar is a set of rules that govern the organisation of musical structures; grammar is also a
set of conventions that allow the interpretation of musical structures as allusions to human experience.
The main aspect of such allusions is that of "emotional schemes". Musical language is able to mix and
blend musical structures in order to obtain meanings that did not exist before their specific musical
expression. In the third part of the paper the problem of verbal interpretation of musical meaning is
discussed. The conceptual resources of verbal language are not always adequate to describe the
substantial contents of a piece of music.
Introduction
To speak of meaning is one of the most complex and controversial issues in the field of the
philosophy of music. I do not see any point in going into the history of this concept because all the
participants in this Conference are undoubtedly very familiar with this area. Nor do I intend to speak
about the complex history of the discussions about the relationships between musical and verbal
languages. Some of the protagonists of this Conference such as John Sloboda (1985) and Eric Clarke
(1989) have offered well known contributions to such debate. To introduce the problem I will limit
myself to saying that the meaning of the word "meaning", when applied to music, can become
extremely ambiguous. Together with many other words derived from the linguistic area, it risks
becoming a mere metaphor, losing its properties of conceptual definition. The meaning of a word
does, in fact, have certain precise aims. First of all, to define categories of objects or events which
are part of ordinary human experience. Secondly, to distinguish one concept from other similar ones.
In other words, the main aims of a language are to reduce the possibilities of ambiguity, to codify the
relationships between the meaning and the phonetic form of the words, and to limit, whenever
possible, the changes in such codified relationships. For these reasons the words of a given language
are stable, are finite in number and have a lexical meaning given by the dictionary.
In music none of these conditions exists: its meanings, even though they can be interpreted, are not
finite in number, since they change from composition to composition, and for this reason music does
not have a lexicon; in addition, the meanings are not aimed at categorizing events or objects, but
rather at evoking living experiences also linked to emotional conditions. In other words, music does
not have real semantic properties, and does not have real meanings, if we are to use the two terms in
their proper linguistic sense. But in everyday language, and also in scientific conferences, we insist on
adopting these words, apparently without any particular difficulties. In some cases, and in other
languages, a sort of search for alternative words can be observed: signification is used in French, or
senso in Italian. In any case the same problem arises: that of understanding more precisely what we
are referring to when we speak of musical meaning and consequently what the difference is between
musical and linguistic meaning.
In order to discuss this problem adequately three conditions are necessary: firstly to define the
non-conceptual nature of musical meaning, secondly to describe its relationships with musical
structure and finally to indicate to what extent it is possible to use verbal language in order to describe
the meaning of a piece of music. My speech is therefore divided into three parts, each of them devoted
to one of these three conditions.
the words and gestures themselves. In the new situation they were conventionally assigned to the
corresponding structural events and this heritage was unconsciously passed on to the musical
competence of many generations of listeners and of composers. The intuitions of Roland Barthes
(1953) about the passage of stylistic schemes from one text to other texts, or the analyses of Kofi
Agawu in the more specific musical system of classical style (1991), that is, the so-called
"intertextual" theories also applicable to music, may help to give the above-mentioned stylistic
transformations a wider theoretical context. Observed from this point of view, the problem acquires
further sense: the meanings assigned to the structural features of instrumental music were able to
achieve significance because they passed from composer to composer and from composers to
listeners; in other words, because they became a social phenomenon and implied a wide circulation of
experiences. Although these socially accepted conventions may not always have been explicit or
conscious, they nevertheless had no difficulties in passing from mind to mind and from epoch to
epoch. A great number of cues, observable in the writings of musicians and musicologists of the XIX
century (Bent 1994), show that such gestural conventions were still present when the idea of absolute
music imposed its rights on European aesthetics. Scientific research devoted to the problem of the
relationships between physical gestures and music (Trevarthen 1999-2000; Krumhansl & Schenck 1997)
confirms the physiological and psychological depth of the phenomenon and, for this reason, can help
explain why such phenomena were able to remain an uninterrupted tradition for so many centuries.
Another particularly important source of musical meaning is to be found in the traditional
connotations assigned to different instruments of the orchestra, mainly, although not exclusively,
linked to their historical and anthropological origins. For example, the pastoral heritage of the sound
of the oboe, the military associations of the trumpet or those of hunting linked to the horn, have been
frequently used in order to convey particular meanings based on socially accepted conventions of
interpretation. Conventions like these derive from geographical conditions: a particular use of the
guitar evokes Spain and many instrumental timbres are related to specific cultures: Sardinia, Sicily,
India, Japan and so on. Such "geographical" evocations are not restricted to instrumental timbres, but
also to a number of other features such as an exotic scale or a particular form of cadence (the falling
fourth for Russian music). Gino Stefani (1987), moreover, observed that many social practices
(marching, dancing and other ritual behaviour linked to music, as well as to gestures, or to ideological
symbols) left their traces in instrumental music and in the conventions of interpretation which had
been passed on socially.
The instrumental timbres, however, have other sources of meaning, not determined by their origins,
but by their intrinsic qualities. The famous Traité d'orchestration by Hector Berlioz (1843) is a real
mine of information in this respect: the sound of the flute, for example, is ... (examples to be supplied). The
quoted examples and most of all specific scientific research devoted to the problem (for example
Bismarck 1974, Erickson 1975) show that in Western musical tradition the perception of sound is
synaesthetically linked to sensations of a visual (darkness, brightness) or tactual nature (delicate,
rough, hard), to physical efforts (heavy, light) or dimensions (tiny, huge). Analogous observations
have been made by linguistic research in the field of so-called "phonetic symbolism" (Dogana 1983)
but in the case of music the treatment of texture, well known in analytical practice, can multiply these
synaesthetic effects, for example by means of accumulations and rarefactions, densities and
transparencies, ascents and descents. Sensations of force or softness are obviously due to quite
analogous qualities of dynamics.
From this brief overview it can be seen that the nature of musical meaning can be linked to a
multiplicity of fields of experience. It is also apparent that musical meaning has neither the function of
dividing the events of the world into conceptual categories, nor the possibility of alluding to all
aspects of human experience, many of which seem to be outside of its domain. But it is important to
emphasize that the relationships between musical meanings and the musical structures that have the
task of conveying them, are not left to individual preferences, but are actually governed by socially
spread conventions, even though they are learnt simply through exposure to listening. The latter condition
explains, among other things, why young children are able to interpret some elementary aspects of
music, an opportunity that is exploited in music education (Tafuri 1987). More generally speaking, it
is plausible to deduce from this mass of data and cues, that music may be conceived as a sort of
language and may therefore have certain linguistic properties, above all that of being a semiotic
system. So, in semiotic terms, it might be said to possess "signifiers" (Saussure 1922) or a "level of
expression" (Hjelmslev 1961) which must be able to evoke, or must be linked to, the "signified" or a
level of "content". A specific study devoted to the psychological nature of such links has not yet been
made, but we can tentatively observe that they are characterised by aspects of similarity between the
structure of the sounds and that of the human experiences to which the sounds allude. This is one of
the reasons why the terms "allusion to" or "evocation of" seem to be more apt for music than the term
"meaning". In verbal language this relationship is normally defined as "arbitrary" because there is no
similarity between the signifier and signified. Only some kinds of words (e.g. onomatopoeias) are
linked to their meanings through relationships of similarity, that is, in a "motivated" form. In music
the majority of the relationships is to be conceived as "motivated". The next section will try to explain
what the structural rules of this musical "language" are and how they can produce forms of
"motivated" similarity through the aspects of meaning to which they are able to allude.
Musical grammar
The author of the present paper, together with two colleagues, has recently published (Baroni,
Dalmonte, Jacoboni 1999) the results of an investigation devoted to the systematic study of the rules
of a particular style of music. Since such results are pertinent to our discussion about musical
semantics, it is necessary to make a short digression at this point in order to give information about
the aims and the contents of the research in question. The repertoire chosen for the study consisted of arias
taken from the chamber cantatas of an Italian musician of the XVII century, Giovanni Legrenzi.
The analysis took its starting point from the observation that a number of musical events always
involved the same structural features, in the same order and in the same quantity: one example of such
an event is the dimension of the phrases, others are the forms of the cadences, the correspondence
between verbal and metrical accents, the nature of the harmonic sequences, and so on. All seemed to
happen as if the composer of the music had been following particular rules and respecting them while
composing. This is nothing new: the aim of analysis is always to find regularities, and a composer
always follows rules when composing. But what exactly is the nature of such rules? Are they
describable? A particular problem arose at this point: while the structural regularities are always
concretely observable in a text, the rules are not in the text, but in the mind of the composer. From an
analytical point of view they are mere hypotheses. The musical situation is quite similar to that of
linguistic grammar. In linguistic practice the only way to validate the hypothesis of the empirical,
psychological existence of the rules of the grammar is to demonstrate the possibility of their
producing phrases that speakers of the language can judge grammatically correct. In our case the
grammar ought, in theory, to produce music that a competent listener might judge as corresponding
reasonably to Legrenzi's arias. So we decided to construct a system of hypothetical rules able to
"describe" exhaustively the structures used in the repertoire, and to feed the rules into software
producing "Legrenzian" music.
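The generative mechanism just described, a fixed set of permitted rules from which the composing program chooses at random, can be sketched in a deliberately toy form. The rule set below is invented purely for illustration and bears no relation to the actual rules of the Legrenzi grammar:

```python
import random

# A toy generative grammar: each nonterminal maps to a list of permitted
# expansions, and the "composer" picks among them at random. This only
# illustrates the general mechanism described in the text, not the system
# of Baroni, Dalmonte and Jacoboni.
RULES = {
    "ARIA": [["PHRASE", "PHRASE", "CADENCE"]],
    "PHRASE": [["MOTIVE", "MOTIVE"], ["MOTIVE", "CADENCE"]],
    "MOTIVE": [["C4", "E4", "G4"], ["D4", "F4", "A4"], ["E4", "G4", "B4"]],
    "CADENCE": [["G4", "C4"], ["B3", "C4"]],
}

def expand(symbol, rng):
    """Recursively rewrite a symbol by randomly choosing a permitted rule."""
    if symbol not in RULES:          # terminal: a concrete pitch
        return [symbol]
    production = rng.choice(RULES[symbol])
    notes = []
    for s in production:
        notes.extend(expand(s, rng))
    return notes

rng = random.Random(0)
print(expand("ARIA", rng))
```

Each run yields a different but always rule-conforming sequence; in the same spirit, the "Legrenzian" outputs respected the grammar while the particular choices were left to chance.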
We expected that the resulting arias could be judged musically "correct". A judgement of
"grammatical correctness", however, soon appeared implausible: what is correctness in music? Only
the prohibition of parallel fifths and octaves, and other such prescriptions? The results produced by the computer
taught us that in music an "ordinary" or "daily" language does not exist as distinct from the "artistic"
one, and that all rules have exclusively stylistic aims. More precisely this means that another
fundamental difference exists between language and music: the former has a phonetic system aimed at
producing lexical items, linked to conceptual meanings, and has syntactical rules governing the
relationships between the lexical items of a phrase. The "grammatical correctness" depends on this
specific system of rules which are applied in ordinary language. Only outside this use can a language
convey other "meanings" aimed at evoking living experiences and not at producing conceptual
communication: meanings evoked for example by the rhythm of the discourse, by its phonetic
properties, by the use of connotations, by all the resources of the poetical uses of the language.
Grammatical correctness refers to the first of the two levels, that of daily language, and not to the
second "poetical" level. Music, however, possesses exclusively this second level. This does not mean,
however, that music does not have rules and does not have syntax. It means, though, that its rules and
syntax have different functions from that of ordinary language, and that the concept of "grammar" is
different in the two systems: when referred to music it does not have the same meaning. In any case
we decided to accept this transposition and to use the same term, in order to emphasize that musical
rules are largely based on discrete entities. Only well-definable features are, in fact, considered in
musical grammar: metrical accents, durations, pitches, intervals, chords, scales, degrees of the scales,
and so on. This renders its rules exactly measurable, just as the rules of linguistic grammar, though
functionally different, are exactly measurable.
The most interesting things we learned from the computer outputs were that the texts mechanically
produced by the machine not only had structures similar to those of the initial repertoire, but also
gradually tended to reproduce its expressive characters: the computer arias are not arbitrary
successions of notes, but are compositions that can make sense to a listener, and that in some cases
he/she might even interpret as being "true" arias of the XVII century. This singular phenomenon gives
a concrete demonstration of the distinction proposed by Umberto Eco (1979) between intentio
auctoris (the expressive intentions of the "empirical" author of a text) and intentio operis (the contents
a text can communicate, independently from the intentions of its author). In our case, it was quite
certain that the composer of the mechanical arias, the computer (which made its choices on the basis
of a series of random numbers), had no expressive intentions. Any such expression that the listener
may have felt was simply included in the rules (intentio operis). The musical "meaning" of our
mechanical arias (the interpretation allowing the listener to find plausible references to human
experiences) did not come from a lexicon but from structural rules, that is, from a system of links
between certain features, that were known and commonly used in Legrenzi's time. It is therefore
plausible to imagine that every epoch and every style has its specific rules, different from those of
other epochs and styles.
But other more specific aspects of musical meaning emerged from the Legrenzian research. The most
important is the distinction between different categories of rules. In our experiment we fed just one set
of rules into the software: those which were common to all the arias of the repertoire. The "expressive
meanings" of the outputs are exclusively the result of these "general" rules. We also identified, in
some detail, but without giving them to the computer, more "specific" rules which refer to
single fragments of an aria and modify some aspects of the "general" rules. Normally such
fragments were linked to the words of the lyrics. In one of them, for example, the poet spoke of a
snake in the bosom of the lover (Nutro il serpe nel mio seno) and many aspects of the structure
became unusually tortuous, by means of particular modifications of the ordinary rules; in another, the
words alluded to a heroic situation (Mia ragione all'armi all'armi) and here the music adopted
traditional "battle" models common at the time. In such examples particular rules momentarily took
the place of other more "common" ones. We called the former category "specific expression rules",
and the latter "diffuse expression rules". The difference between the two categories of rules is not just
a question of their more or less extensive application. It is above all a question of their nature: a
diffuse form of expression, the common expressive character of all the arias of Legrenzi, though
undeniably present, is difficult to describe in verbal terms and so its nature, far from appearing
immediately "semantic", seems to be nearer to what R. Jakobson (1970) or N. Ruwet (1972) called
"self-significant". A specific form of expression, on the contrary, is easier to define and has aspects
more similar to those of a conceptual meaning. The distinction corresponds to what, in terms of
semiotic theory, has been called internal and external semantics, respectively (Nattiez 1989) or
congeneric and extrageneric meaning (Coker 1972). Another important category of rules concerns
those which are applied not to the whole repertoire, nor to a small fragment of an aria, but to a single
composition. For example, there are particularly brilliant or particularly sombre arias where the
common rules are applied, but with different percentages of choices for certain rhythmic, melodic,
harmonic traits. In this case we are dealing with "meta-rules" which govern other rules without
changing them. The semantic results of meta-rules are not so well definable as those of "specific
expression", but not so vague as those of "diffuse expression". Their presence shows that the
difference between specific and diffuse expression might be considered a sort of "continuum" and not
a sharp distinction between two separate levels.
It is important to repeat that the choices made by the computer were not oriented by semantic inputs.
This means that the choices were always made randomly from among the permitted rules; in other
words, their only task was to respect the structural possibilities of the grammar and not to look for
particularly meaningful results. The arias composed by the software showed that such rules
automatically ensure interpretable links between different structural features and produce expressive
effects in all listeners who have stylistic competence. Obviously, the results obtained through the
random choices of the machine were by no means original. A listener could accept one or two arias.
But one hundred or one thousand arias (which the computer can easily compose and eventually did
compose) produce a sense of weariness due to the absence of creative interest. The main result of the
experiment, however, is that the simple application of structural rules is able to produce a particular
kind of musical meaning: the arias can be considered "expressive". The "analogical" relationships
with human experiences, in this particular case, are not easy to describe. They may be felt as allusions
to the gestural behaviour of the epoch, to a vague evocation of the laments of an unhappy lover, or to
his tendency to use the lament as an elegant form of seduction. But their verbal interpretation is not
strictly necessary in order to understand that the arias "make sense".
A musical grammar can be defined as a set of expressive possibilities that composers and listeners
have at their disposal: the former in order to compose, the latter in order to interpret music. Such
possibilities imply two different resources: a system of structural rules fixed by the culture of a given
epoch (e.g. more or less explicit and conscious rules of rhythm, melodic structure, counterpoint,
harmony, form, etc.) and, at the same time, a system of semantically accepted conventions (even less
explicit and conscious than the former), which allow the listener to interpret musical structures as
analogical allusions to human experiences. Both structural rules and semantic conventions apply to
specific and particular features (or systems of features) of the structure (rhythm, melodic profile,
harmonic sequences, and so on) and not to their global organisation as concretely perceived during a
performance. Their organisation is always left to the free invention of the composer, who has to
respect the structural rules and the semantic conventions but is not obliged to adopt pre-fixed
solutions. He/she must use rules and conventions in different mixtures and doses, in order
to obtain interpretable results which correspond to his/her expressive intentions.
Finally, it should be added that the rules of musical grammar must be well distinguished from the
psychological procedures that are to be applied in composing and listening activities. Musical
grammar (like verbal grammar) is a non-temporal structure, an abstract system of rules that we can
learn, we can list and we can describe. But when we use them in making music (or in speaking) we
have to solve new problems: those connected with the organisation of events in time (for example,
problems of perception and of memory) that do not coincide with the rules of the grammar. In
listening activities the system of cues studied by Irène Deliège (e.g. in Deliège-Mélen 1997)
presupposes the knowledge of grammatical rules and their presence in the long term memory, but the
research is devoted to quite different phenomena. The same must be said for the less studied, but no
less important, problem of composing activities (Baroni 1999). In my opinion the presence of the two
distinct domains and their specific relationships (grammar and its application) has not been considered
with due attention by research.
Musical hermeneutics
The fact that the composer is free to mix and dose the structural events and the semantic conventions
without being obliged to adopt pre-fixed solutions has important consequences on the concept of
meaning: in music we never have to deal with unequivocal meanings, but only with a set of different
cues that allow interpretation. Each of such cues is based on non-arbitrary semantic conventions, but
their overall effect can leave margins to partially different interpretations. Within the processes of the
interpretation, however, another problem arises: in many cases it is quite evident that the allusions to
human experiences evoked by music may be of very different kinds. In other words, the "motivated"
relations of similarity between musical form and musical meaning are not arbitrary, but neither are they
unequivocal. For example, two musical phrases organised as antecedent and consequent are thought
of (as implied by their names) as if they were two parts of a sort of logical thinking and their
interpretation can be based on this kind of "meaning". But in other cases (for example in Italian
terminology) they are named "proposal and response", in accordance with the XVIII century idea that
music was a sort of dialogue (Rosen 1971). In the same years, however, Giuseppe Carpani in his
Haydine (1823) also spoke of architectural symmetries that music could create in examples like these;
the idea of tension and distension is extremely common in musicological literature, whereas
mediaeval terminology, in similar situations, used the terms "open" and "closed". What, in short, does
the pairing of such terms mean? How can any coherent interpretation be reached? Are we to consider
dialogues, architectures, gestures, philosophical thinking, or perhaps doors? How is it possible to find
some sort of unity in this irrational conglomeration?
All listeners to music are intuitively well aware of the existence of a coherence, and scientific
explanations have been proposed to support this fact. For example Michel Imberty (1976) says that
musical listening always implies the use of what Piaget called "affective schemes": this means that a
listener immediately feels that an affective relationship is present among the different objects a piece of music
can evoke, that they are unified by a symbolic reference to a common emotion and are not necessarily
definable in conceptual terms, but perfectly understandable in an intuitive way. In this case the
listener does not use logical thinking (which would reveal the previously mentioned conceptual
inconsistencies) but what Piaget (1945) calls symbolic thinking. And as far as symbolic thinking is
concerned, tension and distension, proposal and response, opening and closure, and so on, can have
the same affective function. This is a way of understanding reality that children normally use when
they play their symbolic games, one that Mediaeval thinking adopted to interpret the signs of heavenly
presence in the world (Eco 1981), that many cultures (including those of industrialized countries)
normally use in a vast variety of different ritual occasions (Firth 1973), that all of us adopt in our daily
life when we use metaphors, and that all forms of art demand from their participants. A number of
similar situations are, in fact, well known in the field of psychology. There are theories describing the
"semantics" of emotions that show how the relationships between different affections are easier to
describe through topological schemes than through conceptual definitions (Galati 1993): this means
that an analogous emotional quality can pertain to different and even apparently opposite objects
(Imberty 1979). In other words, the domain of the conceptualization of reality and that of emotional
responses to reality are mutually independent. Thus, the various different interpretations of a piece of
music not only have margins of freedom due to the varying levels of importance attached to its
musical structure, but above all need to be made in a form that is not exclusively conceptual.
After this panoramic look at the nature of musical meaning and the musical rules able to convey
meaning, we can now return to the initial problem of so-called absolute music with some concluding
remarks. Once again the comparison with verbal language can prove useful in this discussion. When
we speak, we all know from the start what we want to say and then look for the right words. In music
it is not necessarily the same: the contents do not always pre-exist the music that manifests them.
Musical creativity can imply that a composer finds particular aggregations of musical features that
have the power of alluding to new significant emotional situations and on this basis composes
interesting new music. So it may often happen that the aesthetic quality of a piece of music is not a
question of finding the right way to convey already known affective schemes, but rather of creating
affective situations which were not previously existent. Such situations, of course, do not possess
words able to designate them: they can be expressed even though they are not yet conceptualized and
perhaps will be never conceptualized. There are many examples of objects and living experiences that
we know very well without being able to name them. It is by no means necessary to give a name to
everything that we live. According to D. Raffman (1993) there are experiences that do not have a
name, that are ineffable.
S. Davies (1994) used the term "musical emotions" to describe the particular category of musical
meanings that are present only and exclusively in music. When Mendelssohn, in a famous letter to
M.A. Souchay (Oct. 15th 1842), stated that his musical thoughts were much more specific than the
words that could describe them, he was referring exactly to these kinds of musical meanings. This
particular concept of "musical emotion" is the only way of giving sense to the ancient formalistic
theories: when Jakobson and Ruwet said that music signified itself, they made an apparently absurd
statement if "itself" is understood in terms of structures (a note signifies a note). But it is not at all
absurd if "itself" signifies "a content that only music can give", a statement that can, moreover, be
easily extended to all other artistic languages: the meaning of a face in a drawing by Picasso is
produced by those very lines and does not preexist those lines. Thus, more generally speaking, the
problem of explaining the nature of "absolute" music, and the controversies regarding the formalistic
tendencies in the aesthetics of music, are not musical, but verbal problems.
My intention, of course, is not only to explain "absolute music", and formalistic tendencies. The point
of view I intend to assume is more general and primarily based on the distinction between diffuse and
specific expression rules. This continuum of possibilities, this mixture of allusions to not always
"effable" affective schemes and to "effable" images of other better known objects of experience (as
the gestures, the physical sensations, the references to social and cultural practices described in the
first part of this paper) is the ordinary condition of the Western music tradition. Often it is not easy to
make clear distinctions within these subtle mixtures of grammatical rules and these delicate
dosings of semantic-analogical conventions. So, in many cases the interpretation can become
problematic, particularly when we have to use words in order to interpret musical meanings.
We might define this approach to meaning as a "hermeneutic" one. There is a long philosophical
tradition of hermeneutics as the art of interpreting texts by means of words: initially the term was
applied to the Bible, but during the XIX century it was extended to history and to other fields; a
famous and controversial book by Hermann Kretzschmar (1903) applied it to music. It would seem
more opportune to adopt this term instead of the more common "interpretation", since the latter is
normally reserved for musical performance. It can be observed, however, that there is no difference in
the nature of the two kinds of interpretation: both of them are efforts to recognise the deep nature of
musical meanings. The only difference concerns the way of manifesting them: the possibility of using
sounds themselves is much more subtle and adaptable than that of using words. This particular
condition, typical of Western music, is not present in musical traditions that do not make use of
notation, such as jazz or many ethnic cultures. In our music, performance is explicitly thought of as
a particular language, distinct from that of composition, and aimed at "interpreting" compositional
language. This is clearly shown, for example, in the rules of performance proposed by Johan
Sundberg and his collaborators (1989) which are conceived as totally dependent on the written
structure (the composition) they have to make perceivable. With words this process becomes more
difficult. Words are cumbersome; they were born for different purposes. When applied to music their
main problem is to escape from their conceptual nature, to transform themselves into metaphors, to
become a sort of poetry speaking of music. I am personally convinced that musicology must be and
must remain a scientific discipline, but I am also aware that one of its subfields, musical
hermeneutics, needs to use verbal language in a non scientific way. The important thing is to maintain
a clear theoretical distinction between the two functions (something that is not always done
nowadays); that is, to avoid arbitrary overlappings between the domains requiring metaphoric and
even ambiguous language and those necessitating the use of scientific clarity along with the
systematic refusal of ambiguities.
REFERENCES
Agawu K., Playing with signs: A semiotic interpretation of classic music, Princeton University Press,
Princeton 1991.
Baroni M., Dalmonte R., Jacoboni C., Le regole della musica. Indagine sui meccanismi della
comunicazione, EDT, Torino 1999.
Baroni M. "Musical grammar and the study of cognitive processes of composition", Musicae
Scientiae, 3/1 (1999), pp. 3-22.
Barthes R., Le degré zéro de l'écriture, Seuil, Paris 1953.
Bent I. (ed.), Music analysis in the 19th century (II: Hermeneutic approaches), Cambridge University
Press, Cambridge 1994.
Berlioz H., Grand traité d'instrumentation et d'orchestration, Lemoine, Paris 1843.
Bismarck G. von, "Timbre of steady sound: A factorial investigation of its verbal attributes", in
Acustica, 30 (1974).
Carpani G., Le Haydine, ovvero lettere sulla vita e le opere del celebre maestro Giuseppe Haydn,
Padova, Tipografia della Minerva 1823 (anast. reprod. Forni, Bologna 1969).
Clarke E., "Issues in language and music", Contemporary Music Review, 4 (1989), pp. 9-22.
Coker W., Music and meaning; A theoretical introduction to musical aesthetics, The Free Press, New
York 1972.
Proceedings

[Conference programme grid: parallel paper sessions, followed by afternoon sessions S1-S6 (Music as an aid to learning and performance; Categorical rhythm perception and quantisation; Cognitive processes in music performance; Music in popular culture and everyday life; Pitch perception; Music and evolution), with convenors, chairs, discussants and paper titles; the multi-column layout is not recoverable from this source.]
Symposium Introduction
In this frame, the first contribution, by N. Birbaumer and A. di Gangi, investigates the relation between
complexity of brain activity and complexity of music. M. Olivetti Belardinelli and collaborators
investigate the influence of salience and tonality on recognition memory. M. Imberty shows the
influence of subjects' cognitive style and spontaneous rhythm on the establishment of perceptual
hierarchies. C.X. Rodriguez assesses the age-related relationships between performance recognition
and fragment recognition in children, obtaining a richer profile of children's musical thinking.
References
Birbaumer, N., Lutzenberger, W., Rau, H., Braun, C., & Mayer-Kress, G. (1996). Perception of music
and dimensional complexity of brain activity. International Journal of Bifurcation and Chaos, (2),
267-278.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Fotheringhame, D., & Young, M.P. (1997). Neural coding schemes for sensory representations:
Theoretical proposals and empirical evidence. In M.D. Rugg (Ed.) Cognitive Neuroscience (pp.
47-76). Hove East Sussex: Psychology Press.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge: MIT Press.
Olivetti Belardinelli,M., & Rossi Arnaud, C. (1999). Recollection and familiarity in recognition
memory for musical themes. In Proceedings of the XI Conference of the European Society of
Cognitive Psychology, Gand: Academic Press, 1999.
Riemann, H. (1914-1915). Ideen zu einer "Lehre von den Tonvorstellungen". Jahrbuch der
Musikbibliothek Peters, Leipzig.
Symposium introduction
Symposium Rationale: Research into musical behaviour has tended to focus on individual factors such as motivation
and has usually relied on counting the occurrence of specific behaviours over time to determine their salience in either
the development or continuation of musical engagement. The current symposium offers an alternative view of musical
behaviour, considering it primarily within its social context and how interpersonal and musical interactions impact
upon the individual participant's sense of self.
Aims: The symposium aims to offer a fresh insight into musical phenomena. Working from a primarily social and
psychodynamic perspective, the current symposium aims to explore how an individual's musical identity is influenced
by different forms of social interaction. The first paper explores children's relationships with their parents in their
development of a musical identity. The second paper focuses specifically on the issue of adoption and how an adopted
child develops a musical identity in relation to parents and other family members. The third paper looks at how social
networks influence work on specific musical tasks - creating a composition. Finally, the fourth paper explores how
Music Therapy interactions are used by a client to monitor a changing self-concept as a degenerative illness advances.
Proceedings paper
Introduction
Music stimuli produce emotion-specific physiological changes that reflect their emotional contents
(Krumhansl CL, 1997). Emotions expressed by musical stimuli are associated with their valence and
arousing qualities (North AC et al, 1997). Visual stimuli presented together with musical stimuli may
amplify the existing positive emotions (Wallbott HG, 1989). It is also important to highlight that
musical creativity and performance are closely linked to emotional associations (Lund NL et al, 1994).
Neurophysiological measures such as EEG recording and analysis may provide more objective
measures of the neural changes occurring during exposure to musical stimuli than psychological ones.
EEG studies also showed differences between musicians and non-musicians in musical processing.
Musicians show higher amplitudes in auditory evoked potentials over the fronto-temporal lobe than
non-musicians (Hibler N et al, 1981). Mismatch Negativity (MMN), a component of evoked
potentials indexing the preattentive detection of change in stimulus patterns, is larger in musicians.
This indicates improved sensory memory function, suggesting that cognitive components of
musicality (defined as the ability to temporally structure auditory information) are based on neural
mechanisms already present at the preattentive level (Tervaniemi M et al, 1997). Musicians show
changes in both hemispheres, and changes in brain activity involve larger brain regions in musicians
than in non-musicians. In addition, musicians show changes in the frequency band between 18 and 24
Hz, non-musicians between 13 and 18 Hz; this may suggest different strategies for processing musical
information in musically trained and untrained subjects (Petsche H et al, 1985). Musicians are
also faster than non-musicians in detecting auditory incongruities, their brain waves differ from
non-musicians and as a function of their familiarity with the melodies and the type of incongruity
(Besson M et al, 1994).
Chaos theory and non-linear dynamics provide useful tools for analyzing EEG recordings. Such
tools have been used increasingly over the last 30 years to describe and explain the
dynamics of many biological phenomena. Non-linear dynamics has been applied successfully in
medicine (Chialvo et al, 1987; Garfinkel, 1992; Guevara et al, 1981; Holstein-Rathlou et al, 1994),
biology (Olsen et al, 1990), physics (Kowalik et al, 1988), and chemistry (Rössler et al, 1978); since 1985
there has been increasing interest in EEG analysis with non-linear algorithms (Rey et al, 1997).
According to Kaplan (Kaplan et al, 1995) chaotic behavior is defined as "aperiodic bounded dynamics
in a deterministic system, with sensitive dependence on initial conditions".
A set of finite differential equations can be used to describe the dynamics of biological
systems that exhibit chaotic behavior (Hodgkin et al, 1939, 1952): through time-series analysis the
system can be reconstructed. The method is particularly useful for non-stationary signals such as
the EEG (Elbert et al, 1994). In order to reconstruct long temporal series, specific algorithms are
used (Lutzenberger et al, 1992a, 1992b). The number of independent variables required to reconstruct
the whole time series is defined as the "dimensionality". An attractor is defined as a phase-space subset to
which the phase-space trajectory may converge. The correlation dimension allows one to analyze the
attractor's dimensionality; a frequently used measure for dimensionality is the correlation dimension D2.
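The D2 correlation dimension is commonly estimated with the Grassberger-Procaccia algorithm: embed the series in a delay space, compute the fraction of point pairs closer than a radius r, and take the slope of log C(r) against log r. A minimal numpy sketch follows; the embedding dimension, lag, and radius range are illustrative choices, not the parameters used in this study:

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag: i * lag + n] for i in range(dim)])

def correlation_dimension(x, dim=3, lag=4):
    """Estimate D2 as the slope of log C(r) versus log r (Grassberger-Procaccia)."""
    pts = delay_embed(np.asarray(x, float), dim, lag)
    diff = pts[:, None, :] - pts[None, :, :]            # O(n^2): small series only
    dists = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(pts), k=1)]
    radii = np.logspace(np.log10(np.percentile(dists, 5)),
                        np.log10(np.percentile(dists, 50)), 10)
    c = np.array([(dists < r).mean() for r in radii])   # correlation sum C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

# A sine wave traces a closed loop in delay space, so D2 should be close to 1.
t = np.linspace(0, 20 * np.pi, 1000)
d2 = correlation_dimension(np.sin(t))
```

In practice the scaling region must be chosen with care, and surrogate-data tests are needed before interpreting a finite D2 as evidence of deterministic chaos.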
Dimensionality can be a useful tool in understanding brain dynamics. Dimensionality may be defined
as the number of brain structures functioning at the same time. Changes in the complexity of brain activity
due to different conditions (such as sleep, epilepsy or Alzheimer's disease) or different tasks show up as different
brain dimensionality (Babloyantz A et al, 1986; Babloyantz A et al, 1986; Birbaumer N et al, 1995;
Fell J et al, 1993; Jeong J et al, 1998b). The higher the dimensionality, the more structures or
cell-assemblies are involved. Cell-assemblies are defined as groups of cells with plastic synapses,
distributed at any possible distance across the neocortex, with excitatory connections among each
other; the excitatory connections within a particular assembly are stronger than those of the background
assemblies responsible for other mental activity. Assemblies are formed through contiguity: the simultaneous
arrival of two impulses, or cascades of impulses, at plastic synapses strengthens their connection, so that on
the next occasion the input of only a few synapses is capable of firing the post-synaptic unit (Hebb,
1949).
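Hebb's contiguity principle can be made concrete with a toy weight-update rule; the network size, learning rate, and firing probabilities below are arbitrary illustrative values, not a model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, eta = 8, 0.1                 # illustrative network size and learning rate
w = np.zeros((n_units, n_units))      # plastic synaptic weights

for _ in range(200):
    x = (rng.random(n_units) < 0.1).astype(float)   # sparse background firing
    if rng.random() < 0.5:                          # units 0 and 1 repeatedly
        x[0] = x[1] = 1.0                           # fire together: an "assembly"
    w += eta * np.outer(x, x)         # Hebb: simultaneous firing strengthens links
np.fill_diagonal(w, 0.0)
# Within-assembly connections end up far stronger than background connections.
```

After training, the weight between the co-active units dominates the weight matrix, which is the sense in which an assembly's internal connections stand out from the background.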
How can non-linear dynamics shed light on neural processing of music? Recent studies show that
emotional responses to music are reflected in the electrical activity of the brain. Subjects who report
positive feelings toward musical stimuli show decreased chaotic electrophysiological behavior. In addition,
rhythmic variations contribute much more to the subjective emotional response to music than
melodic variations (Jeong J et al, 1998a).
The aim of this work was to determine whether particular features of musical stimuli are reflected in
changes in the dimensionality of EEG recordings. Our questions were whether complex musical stimuli
evoke more complex brain responses (i.e., activity of more, and more extended, cell-assemblies), whether
rhythm or melody has an influence, and whether education and musical preferences play a role in
brain complexity.
Experiment
Subjects
Eighteen healthy, right-handed males aged between eighteen and thirty-one years (mean age:
22.0 years) participated. All subjects were free of any medication. Prior to the experiment, subjects were informed
about all aspects of the experimental procedure and then asked to sign an informed consent form
in accordance with the Declaration of Helsinki.
Methods
The experiment lasted for about 50 minutes and the procedure was identical for each subject. Subjects
sat in a sound-proof chamber in a comfortable reclining chair. At the beginning they put on
headphones while the EEG electrodes were attached to their heads. The electrodes were placed
at the following sites: F3, Fz, F4, C3, Cz, C4, Pz, P4. In addition, two electrodes were placed
on the mastoids. All channels were amplified with a bandwidth from 0.016 Hz to 70 Hz and sampled
at a rate of 256 Hz. Horizontal eye movements were recorded and EEG was corrected for ocular
artefacts. The EEG was recorded using Ag/AgCl electrodes according to the international 10-20
system. The experiment consisted of three blocks. Each block contained 12 trials which lasted for 15
s. Trials were separated from each other by intertrial intervals randomly varying between 8 and 15 s.
The random variation of the intertrial interval was introduced in order to prevent systematic EEG
variation associated with expectancy and preparation. During a single trial the acoustic stimuli were
presented without any other stimulation; subjects had to attend to these stimuli. After each trial,
subjects had to perform two subjective ratings, regarding (a) the subjective interest elicited by and (b)
the subjective complexity of the stimuli on a 1 to 9 analogue scale, with 1 indicating low
interest/complexity and 9 indicating highest interest/complexity. After the experiment and after
removal of the electrodes, subjects completed a short questionnaire asking:
1. How they estimate their own musical capability.
2. How many hours a week they perform music
3. How many hours a week they listen to music
4. How much they like classical music
5. How much they like popular music
6. Which instruments they perform
7. How they estimate their rhythmic capability
8. How they like dancing
9. How they like Jazz
10. Which kind of musical education they had.
Each question had a scale range from 1= very low to 5=very high. The questionnaire evaluated
musical habits of each subject.
During block 1 (mode "melody"), only the pitch of the piano sounds was varied (melodic complexity)
with rhythm kept constant. During block 2 (mode "rhythm"), only the rhythm of the wood-drum-like
sounds was varied, with tone frequency kept constant. During both blocks, three different kinds
of trials (four trials of each complexity condition) were presented in pseudo-randomized sequence. Three
degrees of complexity were introduced: the first condition consisted of periodic, the second of chaotic,
and the third of stochastic (i.e. without any set of deterministic laws) tone sequences. The third block
(mode "melody and rhythm") also contained 12 trials which were separated into three conditions. In
this block, variation of melody and rhythm was combined. Condition 1 contained periodic melody and
periodic rhythm, condition 2 periodic rhythm but stochastic melody, condition 3 stochastic rhythm
and stochastic melody. The computer-synthesizer-generated sequences of stimuli were recorded on
analogue tape and replayed from a tape recorder. The sequence of stimuli and the intertrial intervals were
identical across subjects.
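The paper does not specify how Chaos.app generated its sequences, but the three stimulus classes can be sketched with standard choices: a repeating cycle for the periodic condition, the logistic map in its chaotic regime for the chaotic (deterministic, aperiodic) condition, and uniform random draws for the stochastic condition. All parameter values here are illustrative assumptions, not the study's settings:

```python
import numpy as np

def tone_sequence(kind, n=32, seed=0):
    """Return n values in [0, 1] to be mapped onto pitches or durations.
    'periodic'   - repeating cycle
    'chaotic'    - logistic map, chaotic regime (deterministic, aperiodic)
    'stochastic' - uniform random draws (no deterministic law)
    """
    rng = np.random.default_rng(seed)
    if kind == "periodic":
        cycle = np.array([0.2, 0.5, 0.8, 0.5])
        return np.tile(cycle, n // len(cycle) + 1)[:n]
    if kind == "chaotic":
        x, out = 0.3, []
        for _ in range(n):
            x = 4.0 * x * (1.0 - x)      # r = 4: fully chaotic regime
            out.append(x)
        return np.array(out)
    if kind == "stochastic":
        return rng.random(n)
    raise ValueError(kind)

# Map values onto an assumed MIDI pitch range, e.g. C3 (48) to C5 (72):
pitches = 48 + np.round(tone_sequence("chaotic") * 24).astype(int)
```

The same values could equally be mapped onto inter-onset intervals for the rhythm mode, with pitch held constant.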
Apparatus
Acoustic stimuli were created using a Yamaha synthesizer connected via an Opcode Studio III MIDI
interface to a NeXT computer. For the generation of the MIDI signals we used the software package
Chaos.app, originally written by R. Bidlack and modified by ED Erwin.
Data Analysis
For each sequence an interval of 16 s was selected for the analysis; thus the length of each EEG trace
was 2048 points. For every EEG trace the following measures were calculated:
(i) EEG alpha power, obtained from the average log power in the range from 8 to 12 Hz. The power
spectrum was calculated by averaging the Fourier transforms of 15 overlapping 2s segments (256
points), using Parzen windows on the 2s segments.
(ii) EEG beta power, calculated as the average log power in the range from 14 to 30 Hz.
(iii) The state-space dimension of the EEG (D2): the singular value decomposition was based on the
autocovariance function with time-lags ranging from 0 to 32 points, corresponding to 0.25 s.
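Measures (i) and (ii) amount to a Welch-style average of Parzen-windowed periodograms. Note that the stated parameters (2048 points over 16 s, 2-s segments of 256 points, 32 lags for 0.25 s) imply an effective sampling rate of 128 Hz after downsampling; the sketch below adopts that assumption:

```python
import numpy as np

def parzen(n):
    """Symmetric Parzen (de la Vallee Poussin) window of length n."""
    k = np.abs(np.arange(n) - (n - 1) / 2.0) / (n / 2.0)
    return np.where(k <= 0.5, 1 - 6 * k**2 * (1 - k), 2 * (1 - k) ** 3)

def band_log_power(trace, fs=128.0, seg_len=256, step=128, lo=8.0, hi=12.0):
    """Average log power in [lo, hi] Hz over overlapping Parzen-windowed segments.
    A 2048-point trace yields 15 overlapping 2-s segments, as in the text."""
    win = parzen(seg_len)
    freqs = np.fft.rfftfreq(seg_len, 1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    logs = []
    for start in range(0, len(trace) - seg_len + 1, step):
        seg = trace[start:start + seg_len]
        spec = np.abs(np.fft.rfft((seg - seg.mean()) * win)) ** 2
        logs.append(np.log(spec[band].mean()))
    return float(np.mean(logs))

# Sanity check: a 10 Hz oscillation carries far more alpha- than beta-band power.
t = np.arange(2048) / 128.0
alpha = band_log_power(np.sin(2 * np.pi * 10 * t))                     # 8-12 Hz
beta = band_log_power(np.sin(2 * np.pi * 10 * t), lo=14.0, hi=30.0)    # 14-30 Hz
```

Beta power (measure ii) is obtained simply by changing the band limits, as in the last line.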
ANOVAs (analyses of variance) were calculated with the factors electrode row (left, middle,
right), electrode column (frontal, central, parietal), sound complexity (periodic, chaotic, stochastic),
and type of modulation (rhythmic, melodic, and both).
Two approaches were used: the first analysis did not differentiate between the musical background of
the subjects and included all three types of modulation. A second analysis was based on the subjects’
preferred type of music: seven preferred classical music and nine subjects preferred popular music.
We expected a differential responsiveness of these groups to the type of modulation. For this analysis
we used only the two pure types of modulation (mode "melody" and mode "rhythm").
Results
The analysis of the EEG dimension showed significant effects of electrode row (F(2, 30)=9.8; ε
=0.89, p<0.001) and of electrode columns (F(2, 30)=4.5; ε =0.91, p<0.05) which demonstrate a
non-uniform distribution over the head. With respect to the experimental variation, we found a
significant interaction of the electrode rows and complexity (F(4, 60)=7.1; ε =0.66, p<0.002). The low
dimensional chaotic music induced a reduction of the dimension mainly in the frontal electrodes,
compared to the periodic and the stochastic music, which showed no significant differences. The
parietal electrodes showed no significant effects of complexity while the central electrodes showed
moderate effects similar to the frontal electrodes. These effects were confirmed by post-hoc t-tests. No
effects of the type of modulation were found.
The analysis with the factor "music preference" (classical vs popular music) and the restriction to the
two pure types of modulation (melody vs rhythm) confirmed the above interaction of electrode rows
and complexity (F(4,56)=5.7; ε =0.51, p<0.008). In addition, we found a significant interaction
between groups, complexity, and type of modulation (F(2,28)=5.1; ε =0.98, p<0.02). Subjects
preferring classical music responded with a reduction of the EEG dimension if the melody modulation
was chaotic while subjects preferring popular music showed this effect when the rhythm was
modulated. For the complexity rating which was performed immediately after each trial, the 2 (group:
classical versus popular music preferred) by 3 (complexity condition: periodic, chaotic, stochastic) by
2 (mode: melody vs rhythm) ANOVA showed a significant effect of complexity (F(2,28)=5.7).
Discussion
Low-dimensional chaotic sequences produce a significant reduction in dimensional complexity
compared to both periodic and stochastic sequences. This was documented particularly in the
prefrontal regions. The phenomenon occurs for the melodic sequences in both groups, whereas in the
"popular music" group it was observed only in the case of the rhythmic sequences. If chaos levels
reflect the number of active cell-assemblies, then they could depict the aesthetic experience. Subjective
interest may at least in part be determined by stimulus complexity, whose neurophysiological
equivalent may consist in EEG complexity. Subjective interest, particularly in the musically
sophisticated subjects, reflects the richness or diversity of associative connections evoked by a
particular piece of music. On the other hand, the majority of less educated listeners prefer those rhythmical
modulations which evidently "pull" their brain activity into a less complex, periodic oscillatory
response, shutting off all competing assemblies. The difference between the three types of music (highly
and weakly chaotic, and periodic) is confined to frontal brain regions. The same result was found for
variations in intelligence (Lutzenberger W et al, 1992b; Schupp HT 1994) and differences between
mental imagery and perception of diverse objects. More intelligent subjects showed increased dimensional
complexity in the prefrontal regions. Apart from the general explanation that the production of music
seems to be an exclusively human trait, appearing as late in phylogenetic and ontogenetic evolution
as the prefrontal cortex, mental processes such as listening to music, creative thinking, and imagery
involve delay of immediately reinforced behavior and active working memory. Both cognitive functions
are more or less exclusively frontally located. It is therefore not surprising that the realization of the
"highest" (evolutionarily latest) cognitive skills requires the participation of additional frontal
cell-assemblies, which is expressed in an increased frontal dimensional EEG complexity.
Supported by the Deutsche Forschungsgemeinschaft (DFG)
References
Babloyantz A, Destexehe A "Low-dimensional chaos in an instance of epilepsy" Proc. Natl. Acad.
Sci. USA, 83: 3513-3517, 1986
Babloyantz A, Salazar JM, Nicolis C "Evidence of chaotic dynamics of brain activity during the sleep
cycle" Physics Letters, 111A: 152-156, 1986
Besson M; Faita F; Requin J "Brain waves associated with musical incongruities differ for musicians
and non-musicians" Neurosci Lett 1994 Feb 28; 168(1-2): 101-5
Birbaumer N., Flor H., Lutzenberger W., Elbert T "Chaos and order in the human brain" Perspectives
of Event-Related Potentials Research (EEG Suppl. 44): 450-459, 1995
dimensions of the EEG and its variations with mental tasks" Brain Topography 5: 27-33, 1992a
Lutzenberger W, Birbaumer N, Flor H, Rockstroh B, Elbert T "Dimensional analysis of the human
EEG and intelligence" Neurosci Lett 143, 10-14 1992b
North AC, Hargreaves DJ "Liking, arousal potential, and the emotions expressed by music" Scand J
Psychol 1997; 38(1): 45-53
Olsen LF, Schaffer WM "Chaos versus noisy periodicity: alternative hypotheses for childhood
epidemics" Science, 249: 499-508, 1990
Robazza C, Macaluso C, D’Urso V "Emotional reactions to music by gender, age and expertise"
Percept. Mot. Skills 1994 Oct; 79(2): 939-44
Petsche H; Pockberger H; Rappelsberger P "Music perception, EEG and musical training" EEG EMG
Z Elektroenzephalogr Elektromyogr Verwandte Geb 1985 Dec; 16(4): 183-90
Rössler OE, Wegman K "Chaos in the Zhabotinskij reaction" Nature, 271: 89-90, 1978
Schupp H T, Lutzenberger W, Birbaumer N, Miltner W, Braun C "Neurophysiological differences
between perception and imagery" Cognitive Brain Research 1994 2,77-86
Tekman HG "A multidimensional study of preference judgements for excerpts of music" Psychol Rep
1998 June: 82 (3Pt1): 851-60
Tervaniemi M; Ilvonen T; Karma K; Alho K; Näätänen R "The musical brain: brain waves reveal the
neurophysiological basis of musicality in human subjects" Neurosci. Lett. 1997 Apr 18; 226(1): 1-4
Terwogt MM, Van Grinsven F "Recognition of emotions in music by children and adults" Percept.
Mot. Skills 1988 Dec, 67(3): 697-8
Wallbott HG "The ‘euphoria’ effect of music videos: a study of the reception of music with visual
images" Z. Exp. Angew. Psychol. 1989; 36(1): 138-61
Back to index
Background. Unlike previous research into influences on musical development, which has looked at
the child and usually only one parent's account of that child's musical life, this study examines in
detail the identities of all family members in order to produce an integrated picture of the underlying
dynamics surrounding music within the home.
Aims. This study examines generational influence, claiming that a child's musical identity and
'success' are directly shaped by the parents' social backgrounds.
Method. The study gleans best practice from clinical family therapy where it has been recognised that
to fully understand a person, the other sides of the story must be acknowledged. This entails each
member of the family telling his/her own story so that all constructions are assessed within a
qualitative quasi-anthropological framework. Using semi-structured interview techniques, this study
examines the multiple layers of interaction within twelve families where all have at least one child
learning an instrument.
Results. The concept of script patterning can be used to explain the way in which a child's musical
identity is influenced by his/her parents. Although peer pressure is undoubtedly significant, it is
parents' attitudes to music, themselves determined by the parents' own social makeup, that are found to
be fundamental in fashioning their children's musical success.
Conclusions. Although it cannot be disputed that significant others outside the home context play a
key role in developing a child's musical interest and skills, the social identities of the parents
themselves, from their respective Families of Origin, seem directly instrumental in fashioning the
musical outcome of their children.
Back to index
Proceedings paper
From my hand to your ear: The faces of meter in performance and perception.
Caroline Palmer and Peter Q. Pfordresher, Ohio State University
Background.
Although much theoretical and experimental study in music cognition has examined the role of meter
in perception, less work has examined the role of meter in music performance. On the one hand, many
studies have documented that perception of meter may arise from a variety of acoustic cues, and
metrically regular patterns allow more accurate perception of and memory for music (e.g. Jones &
Pfordresher, 1997; Povel, 1981). Such evidence has inspired theories to posit that the perception of
meter arises from attention to surface-level periodicities in a sequence that generate expectancies by
driving internal rhythmic oscillations (Large & Jones, 1999). Other work has suggested that meter
may serve as a well-learned abstract schema that guides listeners' interpretation of strong and weak
beats even in the absence of surface cues (Palmer & Krumhansl, 1990); although meter stems from
the musical surface, it is not entirely dependent on surface structure.
In contrast to the perceiver's task, performers do not have to derive the meter; they know it
beforehand. Furthermore, they often choose not to emphasize the meter in terms of the acoustic cues
found useful for listeners. This may be because expressive nuances in performance are for the most
part subtle, and metrical accents interact with many other accents in terms of performers' expressive
nuances (Drake & Palmer, 1993). A performance that emphasized meter might even be considered
exaggerated and unmusical. Yet evidence from many performance situations suggests that meter is far
from irrelevant in performance. Pitch errors in experienced pianists' performances of well-learned
music reflect the tactus or metrical level considered most important (Meyer & Palmer, submitted).
The precision of performance timing, as measured by deviations in interonset intervals, suggests that
some metrical levels are more directly timed than others (e.g. Shaffer, 1981). Production of event
sequences that match a metrical framework is often more accurate than production of sequences that
do not (Povel, 1981). Finally, performance of the complex meters present in polyrhythms
demonstrates that metrical complexity is an important dimension of performance (Handel, 1989).
Music-theoretic approaches to meter suggest at least two alternatives for the role of meter in musical
structure: as a time-based metric, in which metrical beats are separated by equal time-units, or as an
accent-based metric, in which metrical beats are distinguished by accents. Both approaches point to
some regularity in the pattern of events. That regularity can be defined in terms of accent strength;
meter can be described as an alternation of strong and weak accents, usually in binary or ternary
alternation between strong beats (e.g. Cooper & Meyer, 1960). The regularity can also be defined in
terms of time spans; a beat is defined in terms of a point in time and the time elapsed between one
beat and the next offers a source of regularity (e.g. Lerdahl & Jackendoff, 1983). The accent-based
approach assumes only an ordinal scale, which means that the downbeat is stronger than the second
beat and so on. Ordinal scales for meter make the assumption that strong beats are separated by weak
beats but do not rely on assumptions about the ratios of timespans between such beats. The timespan
approach defines meter as a periodic alternation of strong and weak beats and incorporates an
assumption of a ratio scale of events, which means that one timespan has twice the duration of
another, and so on. The assumption of an ordinal versus ratio scale for meter has important
implications: conclusions such as relational invariance of timing across tempo in performance follow
from the ratio-scale metric, but not from the ordinal-scale metric.
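The relational-invariance point can be made concrete: under a ratio-scale metric, a tempo change multiplies every timespan by the same factor, so the ratios between successive intervals are preserved; an ordinal metric only preserves the strong/weak ordering. A hypothetical sketch with assumed interval values:

```python
# Interonset intervals (in seconds) for one measure at an assumed base tempo.
iois = [1.0, 0.5, 0.5, 1.0]

def change_tempo(iois, factor):
    """Ratio-scale tempo change: every timespan is scaled by the same factor."""
    return [t * factor for t in iois]

slower = change_tempo(iois, 1.5)

# Relational invariance: ratios between successive intervals are unchanged.
ratios = [b / a for a, b in zip(iois, iois[1:])]
ratios_slow = [b / a for a, b in zip(slower, slower[1:])]
```

Empirical performance timing deviates from such perfect rescaling, which is one reason the paper adopts the weaker ordinal assumption.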
Theories of perception and performance tend to differ in the scale assumptions they make. Both
music-theoretic (Hasty, 1997) and psychological approaches (Jones, 1976; Large & Jones, 1999)
propose that interactions between rhythms of the preceding event structure and rhythmic
predispositions of the listener generate expectancies such as meter. These theories assume that the
perception of rhythm incorporates an underlying ratio scale and can explain findings such as listeners'
detection of timing deviations and categorization of ratio-based time intervals (Jones & Yee, 1997;
Large & Jones, 1999).
However, there is less evidence that performance can be explained by invoking similar ratio-based
models. For instance, the ratio-scale assumptions conflict with evidence that performers do not use
ratio-based time units - timing in performance is always fluctuating. Also, some work suggests that
listeners can use temporal cues other than simple ratio intervals to perceive meter (Large & Palmer, in
preparation). In addition, ratio-based theories do not explain perceived similarity among the musical
events that make up a performance; therefore, it is difficult for such an approach to explain memory
confusions that arise in performance errors, such as the common error of substituting the correct event
with one intended for a nearby location in the same musical sequence: a serial ordering error. Because
of these problems, and the simplicity of the ordinal time scale inherent in the accent-unit approaches
to meter, we rely in this paper on an accent-based (ordinal) approach to meter in performance.
Another distinction to consider between perception and performance of music is the
role of memory. Both perception and performance require integration of musical events over time in
memory. Related work in psychology of memory suggests that behaviors as diverse as speech,
categorization, and decision-making reflect temporal constraints on short-term memory.
Developmental work suggests that older children show increased temporal persistence of auditory
sensory memory relative to younger children, as well as increased storage of phonological (verbal)
information (Gathercole, 1999). A related finding suggests that children make relatively more serial
order errors than adults (Brown et al, in press). One explanation offered for these findings is that
younger children's slower mental rehearsal leads to faster decay of information over time. If memory
demands in general play a larger role in performance than in perception, then temporal constraints on
memory may be more apparent in music performance, especially at faster tempi.
Two problems arise in the performer's memory for sequences of events: knowing what to do next (the
serial order problem) and knowing when to do it (the relative timing problem). Early work in memory
for sequences of words, tones, and other lists showed that when we are required to remember a
sequence of items, we often remember the items but not the order in which they occurred (Gathercole,
1999). This result suggested that there is an important difference between remembering the items in a
sequence, and remembering the order in which those items occurred. However, these two dimensions
may not be separate in memory for hierarchically organized sequences such as music. For example,
both speech errors and music performance errors tend to reflect sequence events intended for
elsewhere in the sequence that arose from the same phrase rather than from a different phrase,
suggesting that mistakes in serial order are not random but instead reflect hierarchical constraints on
memory for the sequence (Garcia-Albea et al, 1989; Palmer & van de Sande, 1995). The question we
address here is: is this scope constraint on how much of a musical sequence is accessible in memory
based on ordinal or ratio-scale properties? That is, are elements within a sequence related in memory
in terms of their ordinal properties (such as same or different phrase), their ratio properties (such as
twice the duration or half the duration), or both?
greatest number of accent levels in the grid. Thus, the number of total event locations in one cycle of
the grid, n, is determined here by the number of divisions from the lowest level to the highest level.
The metrical accent strength of each event, m, is represented by the length of each vector, which is
equal to the number of metrical levels in the grid with which that event coincides. Note that the
accents are not temporally defined; the grid can stretch or shrink to fit the tempo of the sequence.
Time is defined in terms of a serial (proximal) component of the model, described later.
The first component of the model, metrical similarity (Mx), defines the similarity in metrical accent
strength between sequence events. The absolute difference in metrical strength between an event at
position i and another event at distance x (position i+x) is computed and divided by the sum of the
metrical accents for the two events. That difference is subtracted from 1 to form a similarity metric, as
follows:
Equation 1: Mx(i) = 1 - |m(i) - m(i+x)| / (m(i) + m(i+x))
The right-hand side of Equation 1 reflects the fact that this function for metrical similarity is a form of
Weber's law. This is psychologically appealing because it captures the perceptual analogue that listeners
are more sensitive to a given accent difference between a pair of events when it is presented in a context of
low-intensity accents than in a context of high-intensity accents.
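This similarity computation can be sketched directly from the definitions above, using an assumed 4-tier accent grid for one 4/4 measure in eighth notes (4 = downbeat, 3 = mid-measure beat, 2 = other beats, 1 = offbeats); the grid values are illustrative, not taken from the paper:

```python
import numpy as np

# One cycle of a 4-tier metrical grid (4/4 in eighth notes): m_i = number of
# grid levels each position coincides with.
m = np.array([4, 1, 2, 1, 3, 1, 2, 1])

def metrical_similarity(m, x):
    """Equation 1 at every position i: 1 - |m_i - m_{i+x}| / (m_i + m_{i+x}),
    treating the grid as cyclic."""
    shifted = np.roll(m, -x)
    return 1.0 - np.abs(m - shifted) / (m + shifted)

def M(m, x):
    """Equation 2: metrical similarity averaged over all n grid positions."""
    return metrical_similarity(m, x).mean()
```

On this grid, events a half-measure apart are metrically more similar on average than adjacent events, which is what lets the model predict errors migrating between metrically parallel positions.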
Figure 1: Frequency histogram of the number of serial order (pitch) errors by event distance, in
performances of the Bach Prelude in D Major (from Meyer & Palmer, in preparation).
Figure 2: Circular representation of metrical accent strength for a 4-tier metrical grid. Metrical strength
of each event is represented by the length of each vector; concentric circles represent each level in the
grid.
Similar contrast functions have been used in vision to model the detection of luminance differences
(Michaelson, 1906). This metrical similarity measure is summed across all positions in a sequence and
divided by the total number of positions n, to generate the vector of metrical similarity values across
distances from the current event, Mx, as follows:
Equation 2: Mx = (1/n) Σi [1 - |m(i) - m(i+x)| / (m(i) + m(i+x))]
The second component of the model, serial proximity (Sx), captures the fact that memory for sequence
events is less accessible the farther away they are from the performer's present position in the
sequence. Event strength (Sx) is assumed to be maximal at the current position and equal to 1; event
strength for other sequence events decreases both with increasing absolute event distance (x) from the
current event and with decreasing event duration (t, defined here as seconds per event), in the following
nonlinear relationship:
Equation 3: Sx = a^(x/t)
This function takes an initial activation a, a value between 0 (no activation) and 1 (total activation)
that represents temporal constraints on short-term memory. The exponent (x / t) refers to number of
events per unit time (similar to beats per minute in musical terms) and leads to two predictions. First,
the larger x is (the farther away an event from the current event), the weaker the event strength.
Second, the smaller t is (as tempo gets faster), the weaker the event strength. Thus, the serial
component represents a proximity-based combination of decay (over elapsed time) and interference
(over intervening events). Sequence events from the future and the past will decay faster as the
number of intervening events increases and as the rate increases.
The model makes a basic assumption common to many formal models of memory, that sequence
elements can be represented as vectors of relations among elements. The metrical similarity and event
strength of each sequence element at each distance from the current event are represented in M and S
vectors, respectively. Position within the vector represents comparisons among sequence events at
different positions and distances; the vector size is equivalent to one metrical cycle.
Finally, the two components of the model are combined in a multiplicative fashion to predict relative
event strength or activation for any event x at time t as the product of metrical similarity and serial
activation (Sx ⋅ Mx) function. The relative activations of sequence elements at each distance x from
the current event are then normalized to determine relative error probabilities for each sequence event.
The error probabilities for each event distance from the present event reflect the fact that sequence
events from greater distances have greater event strength in some cases than sequence events from
smaller distances. This is psychologically appealing because it reflects Garrett's (1980) caveat that
although speech errors often reflect access to sequence events from some distance from the error
location, it does not follow that a speaker has access to all intervening events. This model is the first
to make specific predictions for which elements are more or less accessible from various sequence
distances.
The model makes a further prediction for the absolute mean distance between any serial order error
and its target pitch. The mean range is computed as the weighted sum of the error probabilities at each
sequence distance multiplied by each sequence distance. This is shown in Equation 4.
Equation 4: mean range = Σx x · p(x), where p(x) is the normalized error probability at sequence distance x
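Putting the pieces together, the product model and the mean-range prediction can be sketched as follows; the accent grid, initial activation a, and distance range are illustrative assumptions rather than fitted values:

```python
import numpy as np

m = np.array([4, 1, 2, 1, 3, 1, 2, 1])   # assumed accent grid (4/4, one cycle)

def error_probabilities(t, a=0.5, max_dist=8):
    """Normalized S_x * M_x over distances x = 1..max_dist (future side only;
    the model treats past and future symmetrically)."""
    xs = np.arange(1, max_dist + 1)
    S = a ** (xs / t)                     # Equation 3: serial proximity
    Mx = np.array([(1 - np.abs(m - np.roll(m, -x)) / (m + np.roll(m, -x))).mean()
                   for x in xs])          # Equations 1-2: metrical similarity
    act = S * Mx                          # multiplicative combination
    return xs, act / act.sum()            # relative error probabilities

def mean_range(t):
    """Equation 4: error probabilities weighted by their sequence distance."""
    xs, p = error_probabilities(t)
    return float((xs * p).sum())

# Prediction: a slower tempo (larger t, seconds per event) widens the range of
# planning, i.e. errors come from more distant sequence positions on average.
```

Running `mean_range` at a fast and a slow tempo reproduces the qualitative prediction stated next: the mean absolute range is smaller at the faster tempo.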
A final prediction of the model is that for any two tempos t1 and t2, such that t1 is less than (faster
than) t2, the mean absolute range for t1 will be smaller than the mean absolute range for t2. This
follows from the predictions of the product model (combination of metrical and serial components):
the serial component decreases the activation of sequence elements from farther distances faster for t1
than for t2, in essence damping the effect of metrical similarity for events from larger distances. This
fact, combined with the facts that sequence events at closer distances are more accessible at fast than at
slow tempi and that sequence events at farther distances are more accessible at slow than at fast tempi,
accounts for the general prediction that events will be accessible from greater sequence distances on
average at slower performance tempi than at faster tempi.
Palmer, Pfordresher and Brink (in preparation) tested the model's predictions in two experiments.
Pianists performed simple musical excerpts during which both practice and production rate (tempo)
were varied to test the predictions of the model. Increased practice and slower rate both led to fewer
errors. Performances at slower tempi generated a larger range of planning, with sequence elements
arising in errors from greater distances. Furthermore, more errors reflected nearby elements when the
music was performed at a faster tempo, and more errors reflected distant elements when performed at
a slower tempo. In a second experiment, novice child pianists performed the same task. The
performances showed relatively more serial order errors, consistent with psychological theories of
short-term memory processes that predict faster decay of information for children than adults (Brown
et al., in press). Because the initial activation parameter of the serial component predicts faster decay
of information in memory for children than for adults, the novice performances showed less
contribution of metrical frames to range of planning than the expert performances (these findings are
described further in Palmer, Pfordresher and Brink, in preparation).
Implications:
We have presented a model of a metrical framework based on accent similarity that guides the
retrieval and organization of musical events during performance; the metrical component of this
model highlights some features that may be unique to performance. First, the metrical similarity
component of the model predicts a symmetrical influence of past and future events relative to the
present. This feature may be specific to performance because memory for past and future events may
be simultaneously available, and the weighting of memory may be symmetrical. Perceptual tasks, in
contrast, may reflect a lower memory load, but the burden of generating expectations for upcoming
events from past events may force asymmetrical influences of past and future on the
processing of current events. Second, the reliance on metrical grids requires only an ordinal-scale
assumption about metrical similarity, which is sufficient to generate predictions about memory
retrieval in music performance.
The proximity-based decay component of the model specifies how temporal constraints of short-term
memory can influence serial order of events in performance. The metrical and serial components
interact to moderate the influence of meter on a performer's scope of planning at different tempi.
These psychological consequences of time in memory for musical sequences may also extend to other
aspects of music performance, such as those related to tempo effects on musical interpretation and
relational invariance of motor programs.
Are principles such as metrical similarity and temporal proximity common across music perception
and performance? Strangely enough, similarity and proximity principles are more commonly found in
perceptual theories but rarer in performance theories. The model of metrical similarity described here
is simple because it has relatively few parameters: production rate, metrical grid size, and initial
activation strength. The first is established by the experimental conditions; the second is schematic
(general and abstract) and acquired through exposure, and the third reflects temporal constraints on
short-term memory and is posited to increase with age. In principle, none of these parameters need be
specific to music: even the metrical schemas, which might be most specific to musical styles and
periods, resemble in fundamental ways those proposed for language (Hayes, 1984), and perception of
their component periodicities has been modeled with dynamical systems (Large & Jones, 1999).
The model's simplicity also gives rise to its limitations. Perhaps most important is its adherence to
only one dimension of similarity among musical events: that of meter. Research in music performance
has documented other musical dimensions that influence similarity judgments or confusion errors,
including melodic contour, tonality, harmony, rhythm, and timbre (cf. Palmer & van de Sande, 1993,
1995). Another limitation is the model's inability to explain how the metrical grid is learned.
Statistical analyses of frequency distributions of note events across metrical positions document the
common compositional technique of establishing a meter by putting more notes in positions of
metrical strength (Palmer 1996; Palmer & Krumhansl, 1990), but how these are acquired in memory
is unsolved. More recently, dynamical systems models have been posited that track events over time
and generate expectancies for when events will occur, based on prior sequence structure (Large &
Jones, 1999; Large & Palmer, in preparation). Oscillators with adjustable period and phase
components may respond to periodicities represented at each level in a metrical grid, offering an
explanation of how metrical frameworks are acquired.
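As an illustration of this last point, here is a minimal discrete sketch (not the Large & Jones model itself, which uses continuous-time oscillators) of an oscillator with adjustable period and phase locking onto one periodicity of a metrical grid; the correction gains alpha and beta are arbitrary:

```python
def track(onsets, period, alpha=0.8, beta=0.2):
    """Predict each onset from the current period; correct phase (alpha)
    and period (beta) by fractions of the prediction error."""
    t_pred = onsets[0]           # first expectancy coincides with first onset
    errors = []
    for onset in onsets:
        err = onset - t_pred     # asynchrony between event and expectancy
        errors.append(err)
        period += beta * err             # period (tempo) adaptation
        t_pred += period + alpha * err   # phase-corrected next expectancy
    return period, errors

# A beat that is slower (0.6 s) than the oscillator's assumed period (0.5 s):
period, errors = track([0.0, 0.6, 1.2, 1.8, 2.4, 3.0], period=0.5)
print(round(period, 2))  # the adapted period approaches 0.6
```

The shrinking asynchronies show the oscillator entraining to the event periodicity, which is the sense in which such systems could acquire the component levels of a metrical framework.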
Musical meter provides an important testing ground for comparing the role of temporal sequence
structure in perceptual tasks (such as beat tracking) and motor tasks (such as music performance).
Comparisons between these domains may lead to the identification of different constraints on
attention and memory processes, as well as some similarities. For example, the psychological
constraints in the metrical similarity model described here reflect general principles that can be tested
in perceptual tasks, a necessary step in bridging the gap from the pianists' hand to the listeners' ear.
Acknowledgements
This research was sponsored in part by NIMH grant R01-45764 to the first author. Reprint requests
should be addressed to Caroline Palmer, Psychology Dept., Ohio State University, 1885 Neil Ave.,
Columbus Ohio 43210, USA, or to palmer.1@osu.edu.
References
Brown, G.D.A., Vousden, J.I., McCormack, T., & Hulme, C. (in press). The development of memory
for serial order: A temporal-contextual distinctiveness model. International Journal of Psychology.
Cooper, G., & Meyer, L.B. (1960). The rhythmic structure of music. Chicago: University of Chicago
Press.
Drake, C. & Palmer, C. (1993) Accent structures in music performance. Music Perception, 10,
343-378.
Garcia-Albea, J.E., del Viso, S., & Igoa, J.M. (1989). Movement errors and levels of processing in
sentence production. Journal of Psycholinguistic Research, 18, 145-161.
Garrett, M.F. (1980). Levels of processing in sentence production. In B. Butterworth (Ed.), Language
production: Speech and talk (pp. 177-220). London: Academic Press.
Gathercole, S.E. (1999). Cognitive approaches to the development of short-term memory. Trends in
Cognitive Sciences, 3, 410-418.
Handel, S. (1989). Listening: an introduction to the perception of auditory events. Cambridge: MIT
Press.
Hasty, C.F. (1997) Meter as rhythm. New York : Oxford University Press.
Hayes, B. (1984). The phonology of rhythm in English. Linguistic Inquiry, 15, 33-74.
Huron, D., & Royal, M. (1996). What is melodic accent? Converging evidence from musical practice.
Music Perception, 13, 489-516.
Jones, M.R. & Pfordresher, P.Q. (1997). Tracking musical patterns using joint accent structure.
Canadian Journal of Experimental Psychology, 51, 271-290.
Jones, M.R. & Yee, W. (1997) Sensitivity to time change: the role of context and skill. Journal of
Experimental Psychology: Human Perception & Performance, 23, 693-709.
Large, E.W., & Jones, M.R. (1999) The dynamics of attending: How people track time-varying
events. Psychological Review, 106, 119-159.
Large, E.W., & Palmer, C. (in preparation). Temporal response to music performance: Perceiving
structure in temporal fluctuations.
Lerdahl, F., & Jackendoff, R. (1983) A generative theory of tonal music. Cambridge: MIT Press.
Liberman, M.Y., & Prince, A. (1977) On stress and linguistic rhythm. Linguistic Inquiry, 8, 249-336.
Meyer, R.K. & Palmer, C. (submitted). Temporal control and planning in music performance.
Palmer, C. (1996) Anatomy of a performance: Sources of musical expression. Music Perception, 13,
433-454.
Palmer, C., & Krumhansl, C.L. (1990). Mental representations of musical meter. Journal of
Experimental Psychology: Human Perception and Performance, 16, 728-741.
Palmer, C., Pfordresher, P.Q., & Brink, D. (in preparation). Music errors, speech errors, and cognitive
constraints on sequence production.
Palmer, C., & van de Sande, C. (1993). Units of knowledge in music performance. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 19, 457-470.
Palmer, C., & van de Sande, C. (1995). Range of planning in music performance. Journal of
Experimental Psychology: Human Perception and Performance, 21, 947-962.
Povel, D.J. (1981) Internal representation of simple temporal patterns. Journal of Experimental
Psychology: Human Perception and Performance, 7, 3-18.
Shaffer, L.H. (1981) Performances of Chopin, Bach, and Bartok: Studies in motor programming.
Cognitive Psychology, 13, 326-376.
Sloboda, J.A. (1983). The communication of musical metre in piano performance. Quarterly Journal
of Experimental Psychology, 35, 377-396.
Thomassen, J.M. (1982). Melodic accent: Experiments and a tentative model. Journal of the
Acoustical Society of America, 71, 1596-1605.
Back to index
Proceedings paper
Reinhard Kopiez
Hochschule für Musik und Theater
Emmichplatz 1, 30175 Hannover, Germany
Phone: +49-511-3100 608, Fax: +49-511-3100 600
e-mail: kopiez@hmt-hannover.de
INTRODUCTION
Since Seashore's (1938/1967, p. 218) ground-breaking studies of violin intonation, it has been well
known that we have to differentiate between tuning (an idealized system of pitch relations, such as
just or meantone tuning) and intonation (what the performer really does). As he was able to
show, the intonation of the same interval in a piece of music can be described by a distribution of
deviations, which depends on the musical context. We also know that the intonation of a performance is
influenced by expressive deviations, due to the melodic structure of a piece, the performer's expertise,
and the instrument's imperfections. To date, little attention has been paid to the performer's
role. For example, expertise theory would predict that experts can adapt to different task
constraints (e.g. different tuning systems) to a high degree. But as Fischer (1996) reports in his
meta-analysis, there are no studies using controlled, varied conditions, although we can find numerous
studies of ensemble intonation (e.g. in a string quartet). A recent extensive study by Fyk (1995) of
violin intonation concentrates on solo performance and confirms the dynamic, context-sensitive
character of intonation. As she was able to demonstrate, in melodic (horizontal) intonation players
tend to increase the size of large intervals and to compress intervals smaller than a fifth (see
also Rakowski, 1990).
Due to this lack of research, this study addresses the following questions:
● How can a musician cope with an instrument's technical obstacles and adapt to a given tuning
system?
● How reliable is intonation over different renditions?
● Are there any evident effects of tonal gravitation (Fyk, 1995), causing "islands of intonational
stability", or can we observe overall stability, independent of the interval category?
● How important is the degree of expertise for successful task adaptation?
METHOD
Material
A short piece of music was composed, which had to fulfill the following demands:
● No expressive melodic movement, to avoid an overlap of harmonic intonation and expressive
melodic intention
● A slow tempo, to enable the player to listen to and adjust his intonation without too many time
constraints. Additionally a slow tempo would deliver signal durations which would leave a
quasi-stationary part of the note after removal of the attack and decay.
● Technical simplicity, to render the player free from technical obstacles
● A four-part structure, to simulate an ensemble timbre over which the subject could play the
upper part (embedded interval paradigm)
● An A-B-A form with a modulation in the B-part, to test subject's adaptation to harmonic
changes
● Chordal progression in root position only, to make identification of the harmonic context easier
● Concentration on only a few test intervals: prime (unison), minor third, major third, fifth, and
minor seventh.
Example 1 shows the score of the short composition.
Two 3-part tuning versions were generated from the original MIDI file, using the software
RealTimeTuner for Macintosh (V 1.2) by William Cooper
(http://socrates.berkeley.edu/~wcooper/realtimetuner.html) for the generation of the just version in
5-limit tuning. The option "automatic chord following" was used, which corresponds to an adaptive just
tuning (for intonation details see Blackwood, 1985). Next, sound files for CD recording were
generated from the MIDI files, using a Yamaha sampler (TG 77) with the sound "French horn".
Subjects
Two trumpet players took part in the experiment. Player "H" (24 years) was a trumpet student at a
music conservatoire. He had been playing for 15 years. His additional monthly ensemble activities
added up to 6-10 hours, depending on seasonal activities. Subject "K" (39 years) was a professional
orchestral player. He had been playing for 28 years, 19 of those professionally. His additional monthly
ensemble activities added up to 15-20 hours, mostly in an ensemble for avant-garde music. Subject "K"
was recommended by a conductor for his outstanding intonational skills.
Procedure
Subjects received a practice CD 10 days before the recording session. The CD contained the
test composition in equal temperament (ET) and in the just intonation (JI) rendition described above,
in a 3-part version, plus a repeated sample tone (Eb) to tune up to. The score and solo voice were
added as a printout. A short written introduction to the subject's task was given, and subjects were
asked to note down the time spent practising during the 10-day preparation phase. Their main task was
to produce a "best fit" to the indicated tunings on the CD.
The recording procedure took place in each subject's home. Subjects listened to the 3-part
accompaniment (first in ET, then in JI) through headphones and played the upper voice. The
recording was made with a DAT recorder, using a microphone (Sennheiser E 608) attached
directly to the instrument's bell. Five renditions in each tuning system were recorded. The recording
session lasted about one hour. As a control variable, and to assess the subjects' perceptual skills, a
"surprise" informal aural test was constructed, consisting of a cadence in three tuning systems: (a)
Pythagorean, (b) equal tempered, and (c) just. After a short trial section explaining the features of
each tuning system, subjects listened to the cadences in the sequence a-b-c-c-b-a. Subjects
recognized the sequence faultlessly.
RESULTS
Solo voice recordings were sampled onto hard disk (sample rate = 11.025 kHz) and a pitch analysis of
each of the 21 notes was calculated, using the software PRAAT for Macintosh (V 3.8.16) by Paul
Boersma. The first and last 200 ms of each note were removed, so that only the quasi-stationary
part of each note, with a duration of 1.6 s, was analyzed (see Fyk, 1995, p. 65). The module
"Periodicity" was chosen, with an FFT size of 16,384. This results in a frequency resolution of 0.67
Hz, corresponding to a smallest difference of 1.4 c in the vicinity of 622 Hz (Eb5). Intonational
deviations were calculated using EXCEL and SPSS (V 9.0). Although figure 1 represents statistical
means of repeated renditions, a natural range of intonational deviations could be observed.
As a first step, data were analyzed with a repeated-measures general linear model using the factors
tuning (2) * version (5) * interval (5) * player (2). The first factor, "tuning", showed a significant
effect (Hotelling's T: F(1,32) = 8.5, p = 0.006). ET had a mean overall deviation of 4.9 c (standard
error 0.4) and JI of 7.5 c (standard error 0.5) (see figure 1). As we are interested in the
null hypothesis (which states that there is no difference between the adaptation to different tuning
systems), we would have to reach at least an alpha error level of > 0.20. Our findings show that the
adaptation to ET is significantly better than to JI (see figure 1).
As a next step we analyzed the degree of adaptation to single intervals in the two tuning systems. The
analysis of variance showed a significant tuning * interval interaction (F(4,32) = 38.7, p < 0.001).
This means that the intonational deviations differed between the 5 interval categories. The
factor "player" showed no significant effect, which allows us to concentrate on the explanation of the
observed interval effect. Let us try a simple explanation: we can hypothesize that the players used the
same intonation (ET) for both tasks. Although this explanation is somewhat provocative, it is supported
by the fact that the high minor third from ET (+9.2 c) fits extremely well when used in this particular
JI rendition (where it should be +15.6 c). But if the major third from ET, played only 2 c higher than
its ideal pitch, is transferred to JI, it is much higher (+16 c) than expected in JI (-13.7 c). The minor
seventh is too high in ET (+1.7 c), and if transferred to JI, where the seventh would be expected to be
3.9 c lower than in ET, this error results in a total deviation of +7 c for the minor seventh.
To sum up some other results: the tuning * interval-interaction produced a significant effect. Major
and minor thirds showed the smallest deviations in ET, and minor thirds the best performance in JI.
The interval-performance did not improve over the five renditions and players did not differ in their
performance in the two tuning tasks.
DISCUSSION
We can confirm the predictions from expertise theory, namely that expertise is always
domain-specific (see Ericsson, 1996): there is no evidence of successful task-adaptation if there has
not been enough time for skill acquisition. The standard tuning system for a trumpet player in an
orchestra is the equal tempered system. This seems to be already internalized in the early stage of
higher music education, as with a conservatoire student. The student player "H" already shows a
remarkable instinct for his major thirds and minor sevenths. From this point we can say that subject
"H" was not a novice, and his near-perfect adaptation to ET was therefore not unexpected. There is no
evidence for
an automatic adaptation to a so-called "natural" tuning system like JI. Both performers had far less
expertise in ensemble playing (without piano) and had had little chance to acquire intonational skills
for JI to the same extent as for ET. We can assume that the player's adaptation to JI would be much
better for instance after one week of intensive rehearsals in a brass ensemble. On the basis of
Sundberg's (1987, p. 178) studies on intonation in barbershop singing which showed that these singers
can adapt to beat-free just intonation with a mean deviation of less than 3 c, we can hypothesize that
expert musicians are capable of perfect task adaptation. These results open perspectives for music
education. The surprisingly successful adaptation to the "unnatural" ET system shows that only
deliberate practice is required to adapt to a given task. So we cannot support Vogel's assumption (see
quotation in the header) of the unattainability of ET for brass instruments. From our point of view the
role of the human factor, the professional musician and his skill to compensate for a wide range of
imperfections has been underestimated. In other words: not the trumpet, but the trumpeter makes the
music.
Acknowledgement
A full documentation of the experiment including the sound examples can be obtained from the URL
http://musicweb.hmt-hannover.de/intonation
REFERENCES
Blackwood, E. (1985). The structure of recognizable diatonic tunings. Princeton: Princeton University
Press.
Ericsson, K.A. (Ed.) (1996). The road to excellence. The acquisition of expert performance in the arts
and sciences, sports and games. New Jersey: Erlbaum.
Fyk, J. (1995). Melodic intonation, psychoacoustics, and the violin. Zielona Góra: Organon.
Fischer, M. (1996). Der Intonationstest. Seine Anfänge, seine Ziele, seine Methodik. [The intonation
test. Its history, its aims, and its method]. Frankfurt: Lang.
Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical contexts.
Psychology of Music, 18, 60-72.
Seashore, C.E. (1938/1967). Psychology of music. Reprint, New York: Dover Publications.
Sundberg, J. (1987). The science of the singing voice. Illinois: Northern Illinois University Press.
Vogel, M. (1961). Die Intonation der Blechbläser. [Intonation on brass instruments]. Bonn: Orpheus.
Back to index
Proceedings paper
I suggest that while neither intervallic, tonal, voice-leading, or exact rhythmic configurations--the supposed constituents of musical motives--are retained in these variants,
a different kind of motive emerges: a complex of simple, broadly-defined relationships in various parameters. Here, this complex consists of a wide fall in pitch,
articulated legato, from an accented yet shorter note to an unaccented, yet longer one.
Beethoven defines this complex as a musical idea through a careful, gradual process, in which intervallic and rhythmic figures are altered, while the above broad features
are retained (Example 1b). First (starting at x3), the original leap of a perfect fifth is stretched; then, while the continued process of intervallic stretching and a held upper
pitch provide for association between successive motives, the truncated and rhythmically diminished x4-x6 are presented, embedded within a variant of y. Once
Beethoven's "rupture" gesture has two contrasting directions. In one (Examples 2a, 2c, 2d), rupture is generated through a sudden decline in intensity; in the other
(Example 2b), through an extreme and sudden intensification. Though both versions are most frequently generated through dynamics, dynamic changes are often
accompanied or replaced by analogous actions in other parameters. Thus, sudden decrease in intensity is delineated through a subito piano, sometimes following a
crescendo (Example 2c), but also by a sudden slowdown of rhythmic activity (often through the elimination of a dense figuration) or a drop to a lower register (Example
2d). Abrupt intensification may involve a subito ff, but also increased attack rate and textural density, as well as a steep registral expansion and a pitch rise (Example 2b).
Furthermore, while initially performed by "brute" agents like dynamics and attack rate, in later variants the rupture is enacted by harmonic syntax itself, as the disrupting
function is rendered by replacing an expected cadential resolution with a dissonant, chromatically altered, substitute (see Example 2e, the dramatic open ending of the
sonata's 2nd movement). Harmonic syntax thus emulates gestures established by the supposedly "secondary" parameter of dynamics--a surprising discrepancy with our
cherished hierarchy of musical parameters in tonal music.
What makes the figures in Example 2 variants of a single gestural "motive" is, then, not their shared musical features but the act they all perform: severing an anticipated
termination from its "body"--the figure or phrase it is supposed to close and resolve--through an abrupt and extreme change. The musical parameters and procedures
enacting this change, its "direction" (increase or decrease in intensity), and its specific melodic, rhythmic, and harmonic constituents may all be different in each case. It is
the act itself that is motivic here, not any specific musical rendering of this act.
How may a listener hear such gestures and respond to them, consciously or subconsciously? Obviously, an ability to perceive immediate and short-range structural
relationships is a prerequisite. For instance, in Example 2c continuity from m. 40 to 41 is suggested by local harmonic and voice-leading implications (m. 41 resolves a
dissonant V34 harmony and its upper-voice leading-tone, and serves as a goal of a passing I-V34-I6 progression) and by rhythmic and melodic conformance with preceding
figures and phrases (the rupture in m. 41 interferes with a repetition of the mm. 35-39 phrase and severs an omnipresent rhythmic figure). The impact of dynamic rupture
in m. 41, which clashes with these implied continuities, cannot be perceived, then, without sensitivity to the immediate implications of harmonic syntax, voice-leading,
and grouping structure (though not necessarily to their larger, long-range aspects). Yet the various figures in Example 2, enacted by very different parameters and
procedures (e.g. a subito pp and a chromatic deceptive cadence), cannot be perceived as expressing a single expressive act, a unifying gestural motive, unless one
transcends the structural and the syntactic. What associates these gestures more than any structural similarity is their shared expressive allusion: a violent interference with
an implied course of events, frustrating an expected close and resolution; and it is shared expression that enables them to serve as agents of expressive coherence--the aim
of "musical hearing."
My final example illustrates gestural coherence on a larger scale. If analogues of expressive action indeed play an important role in musical experience, one may look for
such analogues also on a dimension larger than that exhibited by the small-scale examples above. The principal phrase-groups in the Appassionata's opening movement,
most of which can be conceived as variants of a single gesture (which I call, for reasons soon to be clarified, the Sisyphean gesture), provide an example of such wide
gestural utterance.
The overall melodic contours of most themes and phrase-groups in this movement outline vast, asymmetrical curves, circuitously rising several octaves, only to fall back
again, swiftly and directly. These arches all delineate a process in which musical activity intensifies and musical patterning gradually disintegrates as the curve rises,
finally breaking down into a repetitious, shapeless fall. Structurally, this process may be described as a four-phase progression:
(1) A low register group, symmetrically and hierarchically articulated, presents distinct rhythmic and melodic figures. Despite the group's neat symmetry, some facets of
its organization, such as a metrical clash between parts (2nd theme), or a steep initiating ascent (opening theme) charge the music with tension and instability.
(2) A repetition of this group, higher in pitch.
(3) A further rise, characterized by simultaneous processes of intensification and disintegration, conveyed through a host of alternative means. Intensification may be
conveyed by a progressive shortening of phrase length, a steeper rise, an acceleration of attack rate, harmonic progression (tonic harmony is usually situated at the registral
bottom, while dominants or their substitutes are at the top), or dynamics. Disintegration is conveyed by thematic liquidation (a progressive elimination of distinct melodic
and rhythmic patterns), and by gradually breaking symmetrical and hierarchical phrase structure into a chain of short, uniform figures. This phase may lead to a climactic
registral highpoint, marked by a shiver-like trill or arpeggiation over a dissonant harmony, or break directly into phase 4.
(4) A rapid fall back through several octaves to the initial register, composed of a repetitious and uniform melodic and rhythmic pattern over an unchanging harmony
(prolonging the peak harmony, to be resolved only at the registral bottom). Thus, falls present a paradoxically active inertia, displaying the fastest rate of motion, yet the
slowest rate of change. If very fast, the fall's "momentum" may generate, near its end, a brief bounce upward (e.g., mm. 15-16).
Table 1 marks these phases in the movement's three main subjects and in the climactic retransition ending its development section.
                                     Phase 1      Phase 2      Phase 3      Phase 4
Principal subject (mm. 1-16)         mm. 1-4      mm. 5-8      mm. 9-13     mm. 13-16
Secondary subject (mm. 35-51)*       mm. 35-39    mm. 39-41    mm. 41-46    mm. 47-51
Closing subject (mm. 51-65)          mm. 51-54    mm. 55-58    mm. 59-60    mm. 61-65
Climax & retransition (mm. 109-135)  mm. 109-113  mm. 113-117  mm. 117-126  mm. 126-135
In the above discussion and musical analyses I presented a hypothesis concerning the musical objects and processes ordinary "musical listening" may attend to, and the
facilities that may enable it. I suggested that sensitivity to the expressive, kinesthetic, and inter-modal associations of simple proto-musical factors, as encountered in
extra-musical situations (combined with sensitivity to some aspects of surface musical syntax) may take our "musical listener" a long way toward a non-propositional
grasp of music as an expressive, dynamic whole. Not less important, I suggested (and tried to exemplify) that such "brute" elements can also be used as sophisticated
compositional tools, creating intricate thematic and motivic connections and processes, and thus enhancing structural coherence in music.
Here, perhaps, is a possible bridge between Cook's two separate worlds--"musical" listening and "musicological," that is analytical, discourse. It is, though, a bridge that
has yet to be erected.
References:
Berry, W. (1976). Structural Functions in Music. Englewood Cliffs, N.J: Prentice-Hall.
Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge, Mass.: MIT Press.
Clynes, M., & Nettheim, N. (1983). The living quality of music. In M. Clynes (Ed.), Music, Mind, and Brain: The Neurobiology of Music. New York: Plenum, 47-82.
Cohen, D. (1971). Palestrina counterpoint: a musical expression of unexcited speech. Journal of Music Theory 15/1, 99-111.
Cook, N. (1987). The perception of large-scale tonal closure. Music Perception 5 (2), 197-205.
--------. (1990). Music, Imagination, and Culture. Oxford: Clarendon.
Cumming, N. (1997). The subjectivities of 'Erbarme Dich'. Music Analysis 16 (2), 5-44.
Eitan, Z. (2001). Thematic gestures: Theoretical preliminaries and an analysis. Orbis Musicæ 13.
Francès, R. (1988). The Perception of Music. Translated by W.J. Dowling. Hillsdale: Erlbaum.
Friberg, A., and J. Sundberg (1997). "Comparing Runners' Decelerations and Final Ritards." In A. Gabrielsson (Ed.), Third Annual ESCOM Conference: Proceedings.
Uppsala, 582-586.
Gotlieb, H., & Konecni, V.J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg Variations by Johann Sebastian Bach. Music Perception 3
(1), 87-102.
Hatten, R. S. (1997-99). Musical gesture: on-line lectures. Cyber Semiotic Institute, University of Toronto. URL: http://www.chase.utoronto.ca/epc/srb/cyber/hatout.html
Konecni, V.J. (1984). Elusive effects of artists' "messages." In W. R. Crozier and A. J. Chapman (Eds.), Cognitive Processes in the Perception of Art. Amsterdam: North
Holland, 71-93.
Lidov, D. (1987). Mind and body in music. Semiotica 66, 69-97.
Millar, J.K. (1984). The aural perception of pitch-class set relations: A computer-assisted investigation. Ph.D. dissertation, North Texas State University.
Papousek, M. (1996). Intuitive parenting: a hidden source of musical stimulation in infancy. In I. Deliege and J. Sloboda (Eds.), Musical Beginnings: Origins and
Development of Musical Competence. Oxford, New York, and Tokyo: Oxford University Press, 88-112.
Sloboda, J. (1998). Does music mean anything? Musica Scientiæ 2/1, 21-32.
Stern, D. N. (1984). Affect atunement. In J. D. Call, E. Galenson, and R. L. Tyson (Eds.), Frontiers of Infant Psychology, Vol. 2. New York, Basic, 3-14.
Sullivan, J. W. and Horowitz, F. D. (1983). Infant intermodal perception and maternal multimodal stimulation: implications for language development. In L. P. Lipsitt and
C. K. Rovee-Collier (Eds.), Advances in Infancy Research, Vol. 2. Norwood, N. J.: Ablex, 183-239.
Back to index
Proceedings abstract
Mr Matthew M Lavy
mml1000@jesus.cam.ac.uk
Background:
Method:
Results:
Conclusions:
Back to index
Proceedings paper
Aims
In this research we adopt Tulving’s paradigm in order to investigate what can help memory
while listening to music, and thereby to determine which perceived characteristics contribute to
building up the subjective temporal schema implicit in every instance of temporal processing.
In particular, we investigate whether salience can prevail over tonality in determining
how well a musical theme is remembered. A positive answer would help to overcome the problem of the
cognitive processing of atonal music.
As the concept of salience is not univocal, a subsidiary goal is also considered: determining
possible differences or correspondences between objective salience, as defined by musicology, and
perceived salience, as assessed by psychology.
Method
We used a set of 48 new melodic stimuli composed for the purpose by F. Cifariello Ciardi. The stimuli
are grouped into 4 categories (12 stimuli each) according to musical genre: Tonal Salient (TS);
Tonal NonSalient (TNS); NonTonal NonSalient (NTNS); NonTonal Salient (NTS). An operational
definition of salience was given to the composer on the basis of a predefined grid of both tone
intervals and temporal parameters.
The stimuli were divided into two lists of 24 stimuli each (6 stimuli per category). In the study phase,
subjects listened to one of the two lists 1, 2, or 3 times. Afterwards, in the test phase, subjects listened
to all 48 stimuli and had to recognize those they had heard earlier by means of R (remember) or K
(know) responses.
296 subjects of varying age and musical experience were examined.
Besides the statistical analysis of the collected data, a musicological analysis of the stimuli was
performed by E. Pozzi.
Results
In summary, our results show that: additional study trials improved recognition in adults but not in
younger children; gestalt accentuation enhanced precise recollection, especially for Salient stimuli;
and more R responses were given for Salient stimuli and, when salience was absent, for Tonal ones.
By means of the Theory of Signal Detection we found that 6 stimuli (3 of them NTS) were
recollected better than the others in their category, and 7 stimuli (3 of them TS) were remembered
worse than the others in their category. A good correspondence was found with the results of the
musicological analysis.
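The sensitivity index underlying the signal-detection analysis can be sketched as follows; the hit and false-alarm counts below are hypothetical, for illustration only, and are not the study’s data.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' from recognition counts.

    A log-linear correction (add 0.5 to each cell) avoids infinite
    z-scores when an observed rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one stimulus category (24 old, 24 new items):
print(round(d_prime(hits=20, misses=4, false_alarms=6, correct_rejections=18), 2))
```

A category whose stimuli yield a higher d′ than the rest of its category is the kind of outlier identified above.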
The cluster analysis shows 2 main clusters corresponding to all Salient stimuli (TS + NTS) on one
side and to all Non Salient stimuli on the other (TNS + NTNS).
The ANOVA shows: significantly more R responses for all Salient stimuli; more K responses for TS;
fewer K responses for NTS; more wrong R responses for NTNS; more wrong K responses for TS; and
more wrong recognitions for all NonSalient stimuli (the highest level for NTNS). These results confirm
that, in the development of the relative hierarchy of incoming musical events, Salience prevails over
Tonality in affording a better anchorage for musical memory.
References
Butler, D. (1990). A study of event hierarchies in tonal and post-tonal music. Psychology of Music, 18
(1), 4-17.
Cifariello Ciardi, F. (1999). Natura e funzione della discontinuità nell’ ascolto musicale. Draft for the
ECONA Symposia.
Conway, M.A., Gardiner, J.M., & Perfect, T.J. (1997). Changes in memory awareness during
learning: The acquisition of knowledge by psychology undergraduates. Journal of Experimental
Psychology: General, 126 (4), 393-413.
Gardiner, J.M., Kaminska, Z., Dixon, M., & Java, R.I. (1996). Repetition of previously novel
melodies sometimes increases both remember and know responses in recognition memory.
Psychonomic Bulletin and Review, 3 (3), 366-371.
Imberty, M. (1999). Continuité et discontinuité de la matière sonore dans la musique du XX siècle.
General Psychology, 3-4, 49-69.
Java, R.I., Kaminska, Z., & Gardiner, J.M. (1995). Recognition memory and awareness for famous
and obscure musical themes. European Journal of Cognitive Psychology, 7 (1), 41-53.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.
Olivetti Belardinelli, M., & Rossi Arnaud, C. (1999). Recollection and familiarity in recognition
memory for musical themes. In Proceedings of the XI Conference of the European Society of
Cognitive Psychology. Academic Press, Gand, Sept. 1-4, 1999, p.193.
Rajaram, S. (1996) Perceptual effects on remembering: Recollective processes in picture recognition
memory. Journal of Experimental Psychology: Learning, Memory & Cognition, 22, 365-377.
Tulving, E. (1985). How many memory systems are there? American Psychologist, 40(4), 385-398.
Proceedings paper
Abstract
This paper examines recent theories of children’s motivation and strategies in music
learning, within a case study of an adoptive family in which the middle child takes on a
teaching role in his younger sister’s practice. Questions of heredity and environment are
raised by the adoptive relationships, so that this research offers a new perspective on the
ongoing nature/nurture debate in music research. Video evidence and in-depth interviews
with each family member are used to explore their perceptions of each other’s skills and
roles in relation to music. The sense of freedom expressed by the parents allows the
individuality of the children to emerge strongly, as the family seek to provide narratives that
explain the children’s skills and interests in relation to their family and culture of origin, as
well as their existing environment.
Introduction
This paper brings together two seemingly diverse areas in which psychological research has
advanced understanding in recent years: the dynamics of adoptive families, and children’s
motivation and strategies in music learning. Dealing with a single case study family, we are
able to look at questions of heredity and environment in the development of musical skills.
The focus of the investigation is the Eccles family, who live in a wealthy suburb of a major
city:
intelligence and physical characteristics, genetic inheritance plays a significant part’ (1998:
8). A large scale US survey of ‘mental health of adopted adolescents’ (Benson et al, 1994)
showed that genetic roots were less of an everyday concern for the children, who tended to
explain their skills and behaviours in terms of patterns learnt within the adoptive family,
rather than their family of origin. Adoptive relationships are subject to the same strains as
any other family (Triseliotis et al, 1997), with sibling rivalry, age differences and sibling
position effects (Sulloway, 1996) occurring amongst the children. Trans-cultural adoptive
families face additional challenges, and the Eccles family falls into this pattern, the parents
being of white European heritage, and the children of South American origin, coming from
Chile and Colombia. The literature is divided in its analysis of trans-racial adoptions, with
Hoksbergen (1997) emphasising the greater responsibilities that are placed upon the
parents, whilst accounts reported by Austin (1990) suggest that the comments of other
people are most likely to cause distress to parents and children (cf. Bartholet, 1993).
The case study family
Perceptions of the children’s characteristics
Suzie and Bob Eccles take a robust approach to their parenting roles, and Suzie says ‘It’s
actually quite wonderful that they’re adopted, because when someone says, "Oh, isn’t Lucy
beautiful?", I can say with a totally clear conscience "Oh yes, she’s absolutely stunning, isn’t
she?"’. Their perception that the children’s characteristics, whether physical, intellectual or
emotional, are very much their own, seems to make for a healthy parent-child relationship,
where any dreams held for the children are not the result of frustrated parental ambition, as
they might be in a genetic family. As Suzie describes it, ‘The children are entirely different
and it’s sort of meeting those needs when the occasion arises’. Whilst the Eccles children
identify closely with their adoptive parents - James seeing his musical interests as having
come from his father, and Sean wanting to emulate Bob as an accountant - Bob and Suzie
see their role as one of exposure, giving the children access to diverse opportunities,
through which they can ‘find their passion’. Unlike biological parents, they have less of a
predetermined idea about what that passion might be, because they are not trying to identify
their own physical and mental characteristics in their children, but rather accepting them for
who they are.
Through the varied opportunities that their parents have given them, each of the Eccles
children has learnt two musical instruments: the piano and a band instrument. Despite this
equality of provision, it is James who has been labelled as ‘the musical one’ by the rest of
the family, whilst Sean is seen as more studious, and Lucy is the dizzy artist. The three
children are generous in their perceptions of each other’s skills and interests, and even
where Sean feels he has been overtaken in musical ability by James, he acknowledges that
‘people have talents’, and recognises his own skills in maths and science, which he identifies
with Bob, who is an accountant. James articulates his passion for music, stating that ‘at
school they know me for my music talent; I would hate not to be able to play, I’d hate to not
have that privilege.’ He too identifies with his father, saying that ‘he gets all emotional when
he listens to music’, an engagement that James can empathise with. Lucy is far more willing
to give up her musical ambitions, as dance and art are her personally acknowledged strengths.
She claims to have inherited her artistic interests from Suzie, but links dance with her
Colombian origins, showing an interesting resolution of her identity as an adopted child.
Clearly, there is a gender split in evidence, as the boys identify with their father’s interests,
whilst Lucy feels a closer connection with her mother. Support for their interests comes from
the family environment, yet there is an acknowledgement that their country and family of
origin are also influential, with some sense of confusion implicit in the children’s comments
about which set of parents has provided the critical genetic variable.
Teaching and learning roles
There is a substantive family research literature showing that eldest children often take on
the role of teacher, most typically with older girls teaching younger brothers (Dunn &
Kendrick, 1982). Within the Eccles family, James, the middle child, has taken on the role of
musical expert and so transcends the normal sibling boundaries, giving advice to his older
brother and acting as Lucy’s teacher, despite the small age difference between them. Lucy is
complicit in this arrangement, having chosen to play the same two instruments as James,
and preferring to be taught by him rather than an adult; ‘I don’t really need a teacher every
day, because I can be helped by my brother’. Video evidence shows James revelling in the
teacher persona - ‘Come on Luce, concentrate and try it one more time’ - whilst Lucy shows
a high level of responsiveness throughout long practice sessions (c. 45 mins.). Immediately
after teaching Lucy, James begins his own practice, demonstrating complete absorption and
intrinsic motivation. Despite his youth and lack of explicit teaching strategies, James shows
an awareness of different practising styles, including long term repetition and more detailed
work, in line with established theory (Hallam, 1998); as he says, ‘practising does not just
mean playing it once and flipping the page over’.
Two years into learning the clarinet, Lucy decided to give up, stopping her lessons with
James and withdrawing from the school band. She and her parents are adamant that this
was caused by her dislike of early morning band practices, and her growing enthusiasm for
dance, which was occupying increasing time and energy. The decision to give up was
Lucy’s, as Suzie says ‘I won’t keep badgering them, that’s why Lucy dropped the clarinet,
because I said I’m not going to be the one yelling and screaming about practice’. For the
highly self-motivated James, this lack of explicit direction provides enough of a supporting
environment, but Lucy’s waning interest suggests that a greater level of parental intervention
can sometimes be necessary. James, having held the role of extrinsic motivator for Lucy,
seems to feel some sense of guilt at her decision, blaming his teaching: ‘I was just trying to
teach her what she had to learn ... but sometimes I would go too far, I would go out of her
span. I could have done better if I was older, like me now and her when she was, because
I’d know what to do’.
Conclusions
This brief glimpse of the case study family has highlighted the complex interaction between
heredity and environmental influence in the development of musical skill. At a genetic level,
the three children are recognised as being very different, unrelated to their parents and to
each other. There are some connections of cultural identity, as they are all South American,
and family narratives acknowledge that some characteristics, such as Lucy’s love of
dancing, may have come through the genes. The immediate concern for the family,
however, is to create a secure and enriching environment, offering opportunities for
individual fulfilment.
Within this framework, it is not clear whether James’s identified talent for music is something
he has inherited from his family and culture of origin, or whether it has come from the
opportunities afforded to him within the adoptive environment. What is clear, however, is that
he is abundantly more self-motivated than the other two children, especially in music. So, is
it that James has a personality characteristic that predisposes him towards solitary,
systematic learning, for which music is only one possible outlet? His other interests, in
writing and journalism, support this. Or, is the balance more towards identifying with Bob’s
love of music, and the ideal of what being a musician represents? James is very pleased
with the lucrative opportunities offered by busking, for instance, and likes his status within
school and the family as a musician: ‘I just always liked music and I’ve kind of grown up with
it’. The third possibility is that James accesses something intrinsic within the music, but
deciding between these three explanations is virtually impossible, and it is likely that the third
is in itself a subtle consequence of the other two. Genetic and environmental factors are
interwoven in the children’s explanations, so that for them, music is a product both of
inheritance and the life they are now leading. The circularity of the nature/nurture debate
demands that both influences are equally acknowledged, without being shaped by parental
pressure or expectation. Sean, James and Lucy illustrate the importance of allowing
individuality to flourish, so that musical development can become a source of intrinsic and
growing satisfaction to the child.
References
Austin, J. (Ed) (1990) Adoption: The Inside Story. London: Barn Owl Books.
Bartholet, E. (1993) Family Bonds: Adoption and the Politics of Parenthood. Boston:
Houghton Mifflin Company.
Benson, P. L., Sharma, A. R. & Roehlkepartain, E. C. (1994) Growing up adopted: A portrait
of adolescents and their families. Minneapolis: Search Institute.
Capron, C. & Duyme, M. (1989) Assessment of effects of socio-economic status on IQ in a
full cross-fostering study, Nature, 340: 552-554.
Davidson, J. W., Howe, M. J. A., Moore, D. G. & Sloboda, J. A. (1996) The role of parental
influences in the development of musical performance, British Journal of Developmental
Psychology, 14: 399-412.
Davidson, J. W., Howe, M. J. A. & Sloboda, J. A. (1997) Environmental factors in the
development of musical performance skill in the first twenty years of life, in D. J. Hargreaves
& A.C. North (Eds.), The Social Psychology of Music (pp. 188- 206). Oxford: Oxford
University Press.
Dunn, J & Kendrick, C. (1982) Siblings: Love, envy, and understanding. Harvard University
Press, Cambridge, MA.
Hallam, S. (1997) Approaches to instrumental practice of experts and novices: Implications
for education, H. Jorgensen & A.C. Lehmann (Eds.), Does practice make perfect? Current
theory and research on instrumental music practice (pp. 89-107). Oslo: Norges
Musikkhogskole.
Hallam, S. (1998) Instrumental Teaching: A practical guide to better teaching and learning.
Oxford: Heinemann.
Hill, M. (1998) Concepts of parenthood and their application to adoption, M. Hill & T. Shaw
(Eds.) Signpost in Adoption: Policy, practice and research issues (pp. 30-44). London: British
Agencies for Adoption and Fostering.
Hoksbergen, R. A. C. (1997) Child Adoption: A Guidebook for Adoptive Parents and Their
Advisors. London: Jessica Kingsley.
Howe, D. (1998) Adoption outcome research and practical judgment, Adoption and
Fostering, 22 (2): 6-15.
Howe, M. J. A., Davidson, J. W. & Sloboda, J. A. (1998) Innate gifts and talents: Reality or
myth?, Behavioural and Brain Sciences, 21 (3): 432-442.
Kemp, A. E. (1996) The Musical Temperament: Psychology and Personality of Musicians.
Oxford: Oxford University Press.
Pitts, S. E., Davidson, J. W. & McPherson, G. E. (in press) Developing effective practice
strategies: case studies of three young instrumentalists, Music Education Research,
forthcoming issue.
Plomin, R. (1998) Genetic influence and cognitive abilities. Behavioural and Brain Sciences,
21(3): 420-421.
Sloboda, J.A., Davidson, J.W. & Howe, M.J.A. (1994) Is everyone musical? The
Psychologist. 7 (4) 349-354.
Sloboda, J. A. & Davidson, J. W. (1996) The young performing musician, in I. Deliege & J.A.
Sloboda (Eds), Musical Beginnings: Origins and Development of Musical Competence (pp.
171-190). Oxford: Oxford University Press.
Sloboda, J. A., Davidson, J. W., Howe, M. J. A. & Moore, D. G. (1996) The role of practice in
the development of performing musicians, British Journal of Psychology, 87: 287-309.
Sternberg, R.J. (1998) If the key’s not there, the light won’t help. Behavioural and Brain
Sciences, 21(3): 424-425.
Sulloway, F. J. (1996) Born to Rebel. New York: Little, Brown & Co.
Triseliotis, J., Shireman, J. & Hundleby, M. (1997) Adoption: Theory, Policy and Practice.
London: Cassell.
Proceedings paper
equal intensity. In the two-cues-cooperating comparison timing and intensity were combined to
strengthen the same rhythmic structure. Two sequences differed in terms of the presence of higher
intensity tones again but this time the higher intensity tones were preceded by longer time intervals
compared to their counterparts in the sequence with equal intensity. In the two-cues-conflicting
comparison differences in timing undermined the difference created by the higher intensity tones. The
higher intensity tones in one sequence were preceded by shorter time intervals compared to their
counterparts in the sequence with equal intensity. One key prediction from the identification
performance was that two-cues-cooperating performance should be better than one-cue performance
but two-cues-conflicting performance should be poorer than one-cue performance. The results of the
experiments verified the prediction in that addition of the second cue helped discrimination only if it
was in the cooperating direction.
Integrality of timing and intensity in tone sequences
Interaction of timing with other dimensions of sound has been demonstrated in other ways as well. One
of them is the independence/integrality approach. Two perceptual dimensions are considered to be
processed independently, if detection of variations in one of them (the relevant dimension) is not
affected by variations in the other (the irrelevant dimension). If totally uncorrelated variation on the
irrelevant dimension reduces discrimination performance on the relevant dimension the two
dimensions are not considered to be independent (Garner, 1974; Melara & Marks, 1990). Further
conclusions can be reached from the cases when variation on the irrelevant dimension is correlated
with variation on the relevant dimension. If both positive and negative correlations between the two
dimensions facilitate discrimination on the relevant dimension the interaction is considered to be at a
sensory level. In contrast, if only positive correlation facilitates discrimination but negative correlation
makes it worse the interaction is considered to be at a higher lexical level. That is, at a level where
stimuli are labeled.
I have investigated the independence and integrality of timing and intensity as two means of accenting
in tone sequences (Tekman, in preparation). In one experiment the relevant dimension was variability
of IOIs whereas in the second one it was variability of tone intensity. In the control conditions the
irrelevant dimension was held constant. In the uncorrelated condition presence of intensity and timing
accents varied independently of each other. In the positive correlation condition if a sequence
contained higher intensity tones they followed longer time intervals and if there were no higher
intensity tones then all IOIs were equal. Conversely, in the negative correlation condition if a
sequence contained higher intensity tones then all IOIs were equal and if a sequence had equal tone
intensities then some tones had longer IOIs.
For both timing and intensity as the relevant dimension uncorrelated variation reduced accuracy in
making discriminations on the relevant dimension. Correlation of the two dimensions helped only in
the case of positive correlation. Negative correlation did not have a significant effect on detection of
timing variations and had a negative effect on detection of intensity variations. Thus, timing and
intensity did not appear to be processed independently. Facilitation by only positively correlated
variations indicated that perception of tones as accented interfered with accurate perception of the
actual physical dimensions that created accenting.
Detection of variations in expected and unexpected directions
One important question about perception of timing is how it relates to expected variations in
performance. In two experiments in which I manipulated IOIs in the directions consistent with and
opposed to the direction that they were usually varied in performance I investigated this question
(Tekman, submitted).
In one experiment, intensity was manipulated in addition to timing. In the performance of intensity accents,
shorter time intervals typically precede the higher intensity tones (Billon & Semjen, 1995; Semjen &
Garcia-Colera, 1986). In this experiment, participants had to detect temporal variations in sequences
when either shorter or longer time intervals preceded higher intensity tones. It was found that the
higher intensity tones reduced sensitivity for shorter time intervals more compared to sensitivity for
longer time intervals. That is, the expected shorter IOIs were harder to detect.
In the second experiment pitch accents were created by having skips in pitch to coincide with the time
intervals that were manipulated. In musical performance larger changes in pitch are typically
combined with longer time intervals (Drake & Palmer, 1993). In contrast to the previous experiment,
introduction of the pitch skips reduced sensitivity to longer IOIs more than sensitivity for shorter IOIs.
Consistent with the results of the previous experiment, the expected longer IOIs that coincided
with the pitch skips were harder to detect.
Another variable that was manipulated in these experiments was whether the deviant intervals had
temporal regularity or not. This variable did not affect the way accenting by variation on a second
dimension changed how timing was perceived. Thus, the way variations in other dimensions of sound
affect the perception of timing in sound sequences was closely related to how the two dimensions would
be used in performance. Furthermore, this relationship did not appear to be sensitive to global
temporal structure.
Conclusion
Multiple methods converge on the observation that perception of timing in sound sequences depends
on variations in other dimensions of sound. This results in distortions in time perception that have a
close relationship to how these dimensions interact in expressive musical performance. Such effects
are observable with simple acoustic stimuli and are not affected by manipulations of structure such as
temporal regularity. All this supports the conclusion that expressive timing variations must be
determined in part by limitations in our perception of timing.
References
Billon, M. & Semjen, A. (1995). The timing effects of accent production in synchronization and
continuation tasks performed by musicians and nonmusicians. Psychological Research, 58,
206-217.
Drake, C. & Palmer, C. (1993). Accent structures in music performance. Music Perception, 10,
343-378.
Garner, W. R. (1974). The processing of information and structure. Potomac, Maryland:
Erlbaum.
Melara, R. D. & Marks, L. E. (1990). Interaction among auditory dimensions: Timbre, pitch,
and loudness. Perception & Psychophysics, 48, 169-178.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental
Psychology: Human Perception and Performance, 15, 331-346.
Penel, A. & Drake, C. (1998). Sources of timing variations in music performance: A
psychological segmentation model. Psychological Research, 61, 12-32.
Repp, B. H. (1990). Patterns of expressive timing in performances of a Beethoven minuet by
nineteen famous pianists. Journal of the Acoustical Society of America, 88, 622-641.
Repp, B. H. (1992). Probing the cognitive representation of musical time: Structural constraints
on the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A
partial connection between music perception and performance. Perception & Psychophysics,
57, 1217-1232.
Repp, B. H. (1996). The art of inaccuracy: Why pianists’ errors are difficult to hear. Music
Perception, 14, 161-184.
Repp, B. H. (1998). Variations on a theme by Chopin: Relations between perception and
production of timing in music. Journal of Experimental Psychology: Human Perception and
Performance, 24, 791-811.
Semjen, A. & Garcia-Colera, A. (1986). Planning and timing of finger-tapping sequences with a
stressed element. Journal of Motor Behavior, 18, 287-322.
Shaffer, L. H., Clarke, E. F., & Todd, N. (1985). Meter and rhythm in piano playing. Cognition,
20, 61-77.
Tekman, H. G. (1995). Cue trading in the perception of rhythmic structure. Music Perception,
13, 17-38.
Tekman, H. G. (in preparation). Perceptual integrality of timing and intensity variations as
means of creating accents. Music Perception.
Tekman, H. G. (submitted). Accenting and detection of timing variations in tone sequences:
Different kinds of accents have different effects. Perception & Psychophysics.
Todd, N. P. (1985). A model of expressive timing in tonal music. Music Perception, 3, 33-58.
Windsor, W. L. & Clarke, E. F. (1997). Expressive timing and dynamics in real and artificial
musical performances: Using an algorithm as an analytical tool. Music Perception, 15, 127-152.
Proceedings paper
Intonation and Interpretation in String Quartet Performance: the case of the flat leading note
Peter Johnson, Birmingham Conservatoire (UK)
EMail: peter.johnson@uce.ac.uk
In a famous paper from the 1930s, Carl Seashore observed a tendency among violinists to adopt the very wide major thirds and narrow semitones of Pythagorean tuning (Seashore 1938: 218-224). His data,
however, reveal extreme deviation in the size of these intervals, and this is inconsistent with any formal temperament. The minor seconds vary from 38 cents smaller than equal temperament to 18 cents larger, a
difference of more than half a semitone (p.221). Either he was using poor performances, or the variety of intonational practice must be regarded as musically significant. Studies of unaccompanied cello playing,
including recent and admired recordings by Mischa Maisky and Anner Bylsma, show a similar predilection for wide major thirds and narrow semitones together with a striking range of actual tunings (Johnson
1999b).
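The deviations quoted here are measured in cents, the standard logarithmic unit for comparing tunings (100 cents = one equal-tempered semitone). A minimal sketch of the computation; the ratios below are illustrative textbook values, not Seashore’s measurements:

```python
import math

def cents(f_upper: float, f_lower: float) -> float:
    """Interval between two frequencies (or frequency ratios) in cents."""
    return 1200 * math.log2(f_upper / f_lower)

# An equal-tempered semitone is exactly 100 cents:
semitone_et = cents(440 * 2 ** (1 / 12), 440.0)

# The gap between the Pythagorean major third (81/64) and the pure
# (just) major third (5/4) is the syntonic comma, about 21.5 cents:
comma = cents(81 / 64, 5 / 4)
```

On this scale, a minor second 38 cents narrower than equal temperament, as in Seashore’s data, is more than a third of a semitone flat.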
Casals has argued that good playing demands what he calls 'expressive intonation', where tones are sharpened or flattened in accordance with the direction of melodic movement (Blum 1987). This principle is
consistently applied in his own recording of Bach's C minor Sarabande (Johnson 1999b). Intonation thus serves as an indexical sign of voice-leading. But is this its only expressive function? It is widely assumed in
the profession that intonation has a more general expressive function, and it is well-known that we tend to interpret small deviations of tuning qualitatively rather than quantitatively (Makeig 1982). And what
happens when a performer reverses the tendency, by playing a falling leading-note sharper than a rising one in the same phrase? In the Notes that append this paper, I discuss an example from Beethoven's last slow
movement, the Lento assai from the Op.135 quartet (Example 1). In bars 3 and 5, there are conjunct falling and rising leading-notes in the first violin. Out of 25 recordings, sixteen violinists take the falling
leading-note in bar 3 sharper than the rising, five tune the notes almost identically, and only four conform to 'expressive tuning' (Johnson 1999b). Explanations can no doubt be found for this reversal of normal
practice in the special qualities of Beethoven's music, but these would confirm an expressive function for intonation transcending the strictly syntactical.
In fact, whatever principle we devise to justify normal practice, the anomalies remain problematic. From the study of leading-notes at the start of the Lento assai, one recording in particular stands out as highly
idiosyncratic. This is No.17, the Lindsays' 1987 recording, in which the leader consistently tunes all the leading-notes in bars 3 and 5 as pure major thirds against the dominant in the second violin. Against the
normative tunings of Equal Temperament, this gives very flat leading-notes and correspondingly wide semitones to and from the adjacent tonics.
What are we to make of such divergent practices? In particular, how do we 'read' and experience the idiosyncratic tunings in the Lindsays' recording? In this paper, I propose to address these questions by comparing
this recording with two others. One will be an example of very sharp tuning in the same bars of the Lento assai, the other an application of Just Intonation by the Lindsays in the very different context of the Heiliger
Dankgesang from Op.132. First, however, we need critically to review our methods.
Example 1
1. Method
An obvious problem in dealing with an idiosyncratic performance is how we assess its competence. How do we know that our analyses are not revealing simple errors or miscalculations in the execution of the
performance? By using commercially released recordings, we have a strong, although not foolproof, assurance of an error-free performance that has met the approval of its performers, the producer and engineer,
and eventually the wider community of experts and the listening public. On the other hand, recordings made in the laboratory carry no such assurances, neither are they normally available for external scrutiny or
further analysis.
How, then, do we accurately and reliably analyse intonation from a recording? Necessarily, we have to work from small samples, and these need to be representative or symptomatic of the larger musical context.
My examples have been supported by note-by-note analyses of adjacent events, which confirm the stylistic integrity of the performances. A preparatory survey exposes the moments of special interest, which,
perhaps not surprisingly, tend to involve the leading-notes.
We can define the frequency spectrum of any complex musical event quickly and efficiently using Spectrum Analysis. In ensemble performance, the chief question raised by this method is whether the fundamental
frequencies are accurate indicators of our perceptions of pitch and interval. In a paper on intonation in Barbershop singing, Hagerman and Sundberg support the contention that they do (1980), and empirical tests in
which single-beat extracts are compared with synthesised tones of known frequency, also confirm this in most cases (Johnson 1999b). Nevertheless, there are anomalies, and the Lindsays' recording of the Lento assai
provides an interesting example.
Figure 1 shows the spectra of two chords. In the upper plot, the chord is from the opening chorale of the Heiliger Dankgesang from Op.132, and represents the frequency-content of the C major chord in bar 4. The
leading-note is E4 in first violin, and C3, G3 and C4 complete the dominant harmony in the lower strings. The lower plot is from the Lento assai, and represents the last beat of bar 3, a more complex second
inversion dominant seventh, but also with the leading note, C4, in melodic prominence in first violin (see Example 1).
Figure 1
The accuracy of the frequency-data shown in Figure 1 is determined by the value of k, shown to the right of the title-line. The analysis resolves the source signal into discrete bands each of width k Hz, and shows
the strength of each band. By adjusting this value, more precise readings are available without overtaxing a standard personal computer, but there is a trade-off between accuracy and efficiency. Amplitude levels in
the figure are shown as decibel-difference calculated from the strongest peak in the signal. For a more technical description see the Notes appended to this paper.
In both cases, the analysis has generated a spectrum from k to 11025 Hz (the Nyquist frequency), but it is evident that in this quiet music, most of the relevant acoustical information is contained within the 50-3000Hz range. We can see a
marked difference between the two plots, the lower showing a tailing off of peaks in the spectrum above about 900Hz, whereas the upper plot suggests that we shall need to rescale the plot to gain access to all the
significant harmonics. To this extent, the analyses give a good visual analogue of the differences of tone-colour of the two chords. Although both are played quietly and in the same tessitura, the lower has a
noticeably darker quality of tone.
2. Just Intonation
The two extracts analysed in Figure 1 are chosen because they illustrate Just Intonation in the tuning between first and second violins. The relevant calculations are, from the upper plot:
C4 at 263.45 x 5/4 gives E4 at 329.3Hz
and from the lower plot:
Aflat3 at 209.44 x 5/4 gives C4 at 261.8Hz.
Both results are within 0.4 Hz of the actual peaks for the leading-notes, a barely perceptible difference of 2.2 cents. The margin of error is less than k x 5/4 = 0.21 Hz, or about 1 cent. For a brief explanation of the
relevant acoustic theory see the Notes at the end of this paper.
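The arithmetic above is easy to check. Here is a minimal sketch in Python, using the peak frequencies quoted in the text; the helper name is mine, not the paper's:

```python
import math

def cents(f1, f2):
    """Size of the interval from f1 up to f2, in cents (1200 per octave)."""
    return 1200 * math.log2(f2 / f1)

# Upper plot: C4 at 263.45 Hz, leading-note E4 as a Just major third (x 5/4)
predicted_e4 = 263.45 * 5 / 4        # 329.3125 Hz
# Lower plot: Aflat3 at 209.44 Hz, leading-note C4 as a Just major third
predicted_c4 = 209.44 * 5 / 4        # 261.8 Hz

# A 0.4 Hz discrepancy in this register is on the order of 2 cents:
discrepancy = cents(329.3, 329.3 + 0.4)
print(f"E4 predicted: {predicted_e4:.1f} Hz; 0.4 Hz off = {discrepancy:.1f} cents")
```

The cents function makes explicit that pitch discrimination is logarithmic in frequency, so the same Hz discrepancy means different interval sizes in different registers.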
Examination of the lower-order harmonics in the upper plot reveals close conformity to the exact integer multiples of acoustical theory. In the lower plot, that is the case for all the harmonics of the first violin (< 2
cents) but not for those of the second violin's Aflat3. The second harmonic at 416.03Hz is flatter than the theoretical second harmonic (209.44 x 2) by almost 12 cents. In other words, the interval between the
second harmonics of first and second violins is an equal tempered major third. In fact, the peaks at Aflat4 and C5 are close integer multiples of the cello's Aflat2. The fundamental of this tone is unclear in Figure 1,
but by replotting the figure with a higher sample-size and hence greater precision (see Notes), Aflat2 is shown to be 104.2Hz. The multiples of this frequency are within 4 cents of the peaks at Aflat4 and C5, < 4
cents being the level of precision of these calculations. The tuning of the other cello note, the Eflat, is very close to Just Intonation in relation to the two violin tones. One other detail to emerge from this spectrum is
that the viola's Gflat3 is tuned as an almost true 7th harmonic against the second violin's A flat (183.54 x 8/7 = 209.76). This Just dominant seventh is some 25 cents flatter than the Just major second from A flat.
3. Comparison and Interpretation
We have, then, two examples of applied Just Intonation, from the same ensemble and in a similar repertoire. But many other factors are different. In the case of the Heiliger Dankgesang, the intention to find pure
tuning is explicit throughout the opening statement of the Chorale, which is to say that we hear the passage in terms of the tones of Just Intonation. However, the execution does not always match this ideal, and in
the other chords there is a liveliness in the sonorities arising from intonation that is very nearly but not quite pure. And we have seen that in the Lento assai, there is a direct conflict between the second violin and
cello A flats, creating on the one hand a Just major third and on the other, an equal-tempered tenth. Both performances are therefore less than ideal. Should we therefore conclude that our sources are invalid as
exemplary or at least expert performances? The expert community has clearly judged otherwise, for these recordings have been in the CD catalogues already for some 13 years. Let us make up our own minds by
hearing the extracts. Here, first, is the start of the Heiliger Dankgesang.
[HERE PLAY
OP.132iii b.1-6 (Lindsays)]
What I find interesting here is the way we differentiate between intention and execution. We can handle the differences either evaluatively or interpretatively, but if we elect to listen evaluatively, we must bear in
mind that the apparent imperfections have not been edited away in the recording process, as they could have been. If instead we listen interpretatively, we assume that what we hear is intentional or acceptable to the
performers and, in some musical sense, construct meaning from what we hear. In the Lindsays' Chorale, for instance, there is no singing persona from the first violin, as there is in some other recordings. With no
hint of vibrato, there is a clear intention of creating a blended, single sonority for each chord. Yet the first violin is still the top line and the step-wise movement confirms that this is a chorale melody. There is
therefore a deictic tension between what we know the music could be, and what we hear. This tension, when appropriately used, is, I suggest, one of the most positive aspects of good performance. It explains why
we need more than one good performance of the same work, and why there is no single definitive interpretation.
If the peculiar tension in the Lindsays' playing of the chorale arises from the gap between intention and realisation, it is reinforced by the withdrawal of persona from the first violin, which plays the melody almost
as if it were not the principal line. This is signified by the lack of vibrato and by using Just Intonation, thus denying the normative sharpening towards the tonic. But Just Intonation, I suggest, has its own surcharge
of meaning, and this may even transcend the semiotic in its strictly natural origins. It is perhaps its very neutrality that has made Just Intonation unusual in mainstream classical performance over the last seventy
years. On the other hand, period instrument performance, in which Mean-tone tuning with its pure thirds is widely used, serves to remind us that any interpretation of performance practice is style-specific, and that
Additional Notes
1. Pitch-names, cents and intervals
An equal tempered semitone is 100 cents. At C4, a difference of 2 cents is about 0.3 Hz, and the equivalent in other registers can quickly be calculated by multiplying or dividing by a factor of 2 per octave. So at C2,
2 cents represents a difference of about 0.075 Hz. Note that in Seashore (1938), intervals are shown in fractions of a whole-tone, where 0.01 = 2 cents.
Just Intonation is defined by the convergence of the lowest common harmonics between any two tones. For two frequencies, f1 and f2, the Just major third is given as f1x 5 = f2 x 4, so that f2 = f1 x 5/4 (386 cents).
The Just perfect fifth is 3/2, and the major seventh is 15/8.
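These ratios convert to cents via 1200 x log2(ratio). A quick check in Python (note that the Just major seventh, 15/8, is the product of a Just fifth and a Just major third):

```python
import math

def ratio_to_cents(r):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200 * math.log2(r)

print(round(ratio_to_cents(5 / 4)))    # Just major third: 386 cents
print(round(ratio_to_cents(3 / 2)))    # Just perfect fifth: 702 cents
print(round(ratio_to_cents(15 / 8)))   # Just major seventh: 1088 cents

# Register scaling (Note 1): 2 cents at C4 (261.63 Hz) is about 0.3 Hz
print(round(261.63 * (2 ** (2 / 1200) - 1), 3))
```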
2. Spectrum Analysis
The accuracy of a spectrum analysis depends upon the ratio between the sample-rate of the source recording and the size of sample submitted to analysis. The resultant analysis is a plot of frequency against
amplitude, the readings along the frequency axis proceeding in steps of k Hertz, where
k = sample-rate/sample-size.
It is this ratio that determines precision in the reading of frequency. In Figure 1, k is set at 0.1682Hz, this being the ratio between a sample-rate of 22.05kHz and a sample-size of 2^17 (131072 samples). The latter in fact represents a
duration of almost six seconds, but shorter extracts may be used with the same sample-size by padding the source-file with zeros. This generates mathematically predictable anomalies in the form of smooth curves
connecting the peak readings, but does not affect the peaks. A Hanning Window is applied to reduce this effect. In Figure 1, the original sound-source has a duration of 1.5s, about a quarter of the full sample duration.
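The procedure just described (zero-pad a short extract to the analysis sample-size, apply a Hanning window, and read amplitudes as dB relative to the strongest peak) can be sketched as follows. This is my own reconstruction in Python/NumPy under the stated parameters, not the SPAN code itself:

```python
import numpy as np

SAMPLE_RATE = 22050        # Hz, as in Figure 1
SAMPLE_SIZE = 2 ** 17      # 131072 samples, so k = 22050/131072 = ~0.1682 Hz

def spectrum_db(signal, sample_rate=SAMPLE_RATE, sample_size=SAMPLE_SIZE):
    """Return (frequencies, amplitudes in dB relative to the strongest
    peak) for a short extract, zero-padded to sample_size."""
    # Hanning window reduces the artefacts introduced by zero-padding
    windowed = signal * np.hanning(len(signal))
    padded = np.zeros(sample_size)
    padded[:len(windowed)] = windowed
    magnitude = np.abs(np.fft.rfft(padded))
    freqs = np.fft.rfftfreq(sample_size, d=1.0 / sample_rate)
    db = 20 * np.log10(np.maximum(magnitude, 1e-12) / magnitude.max())
    return freqs, db

k = SAMPLE_RATE / SAMPLE_SIZE     # frequency step per bin, ~0.1682 Hz
```

With a 1.5 s extract, the zero-padding interpolates the spectrum smoothly between the true peaks, which is the "smooth curves" effect described above.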
The software I am using is SPAN, which is a purpose-specific implementation of the signal processing routines in Matlab (Johnson 1999c). For a more detailed implementation and discussion of spectrographic
analysis see Johnson 1999a. The somewhat complex mathematics is explained in Poularikas & Seely 1991: 259-260, or in any standard text on signal processing.
3. Comparison of Leading Notes in Op.135iii, bars 3 and 5
Figure 2 shows how 25 string quartets handle the tuning of the melodic leading-notes in bars 3 and 5 of Op.135iii. The ensembles are arranged in chronological order along the x-axis, from the Flonzaley Quartet's
recording of 1927 to the Leipziger Quartet's of 1999, and the sample includes a quite remarkable performance of an arrangement for the strings of the Vienna Philharmonic under Bernstein (No.22). This example
can be included because a well-trained string section plays with sufficient precision of intonation to generate very clear peaks in a spectrum analysis.
The upper plot relates to bar 3, and the lower to the equivalent tones in bar 5. The signs ∨ and ∧ indicate the tuning of the falling and rising leading notes respectively, calculated as intervals in cents from the
sustained Aflat3 in second violin. The zero line represents a Just major third (386 cents) between first and second violin. The equal tempered major third would be about 14 cents sharp, the Pythagorean, 22 cents
sharp.
Contrary to the principle of 'expressive tuning', the falling tone is on average the sharper, by about 5 cents in the upper plot and 6 cents in the lower. Indeed, 16 (13) out of the
twenty-five recordings tune the rising leading-note flatter than the falling one, whereas only 4 (8) tune the rising leading-note the sharper; 5 (4) are identical, within a safe margin of error of <3 cents
(figures for bar 5 in brackets). And we can note that all the entries below the zero line, i.e. flatter than Just, relate to the rising leading note in both plots. We can similarly compare the melodic semitone between
these leading notes and the preceding and following tonics. As we would expect, the sharp major thirds are reflected by narrow semitonal steps in the melody (Johnson 1999b).
Figure 2
Key:
1 Flonzaley (1927)
2 Busch (1934)
3 Loewenguth (?)
4 Budapest (1941)
5 Hungarian (1953)
8 Italian (1968)
9 Amadeus (1969)
10 Vegh (1973)
12 Talich (1977)
13 Amadeus (1982)
14 Melos (1984)
15 Vermeer (1985)
16 Guarneri (1987)
17 Lindsays (1987)
18 Emerson (1988)
21 Medici (1991)
22 Bernstein (VPO, 1991)
23 Juilliard (1996)
25 Leipziger (1998)
Bibliography
Adorno, T.W. (1997). Aesthetic Theory. (Translated by Hullot-Kentor from Ästhetische Theorie, Suhrkamp, 1970). London, Athlone Press.
Blum, D. (1977). Casals and the Art of Interpretation. Berkeley, Los Angeles, London, University of California Press.
Campbell, M. & Greated, C. (1987). The Musician's Guide to Acoustics. London and Melbourne, J.M. Dent.
Cook, N. (1999). Words about Music, or Analysis versus Performance. In Dejans (1999), pp.9-52.
Dejans, P. (Ed.,1999). Theory into Practice: Composition, Performance and the Listening Experience. Leuven, Leuven University Press.
Hagerman, B. & Sundberg, J. (1980). Fundamental Frequency Adjustment in Barbershop Singing. In Journal of Research in Singing, Vol.4 No.1, pp.3-17.
Johnson, P. (1997). Musical Works, Musical Performances. The Musical Times, August 1997.
Johnson, P. (1999a). Performance and the Listening Experience. In Dejans (1999), pp.55-101.
Johnson, P. (1999b). Intonation in Professional String Quartet Performance. ESCOM 1999 (proceedings forthcoming).
Johnson, P. (1999c). SPAN: spectrographic analysis of musical extracts in the Matlab environment. Birmingham Conservatoire.
Makeig, Scott (1982). Affective versus Analytical Perception of Musical Intervals. In Clynes, M., Music, Mind and Brain: The Neuropsychology of Music. New York & London, Plenum Press.
Poularikas, A.D. & Seely, S. (1991). Signals and Systems. Boston, PWS-Kent.
Seashore, C. (1938). Psychology of Music. McGraw-Hill, 1938, reprinted by Dover Books, New York, 1967.
Tarasti, E. (1994). A Theory of Musical Semiotics. Bloomington & Indianapolis, Indiana University Press.
Proceedings paper
Key phrases: Music perception, meaning, aesthetics, performance variability, factor analysis
Both the music structure and the music performance can in themselves be powerful means of communication.
Listeners can typically identify the emotional character of a composition in the absence of expressive devices, for
example through a deadpan (without variability) sequencer realisation (Thompson & Robitaille, 1992).
Performances of the same melody can also be made to communicate different emotional expressions (Gabrielsson &
Juslin, 1996). Although both levels of communication can thus be effective, at least as far as emotions are
concerned, the question remains how they interact in normal music performance. Given that (systematic)
performance variability (PV) is almost ubiquitous in music performed by people, what purpose does it serve that is not
satisfied by other aspects of the music?
Several different scenarios can be imagined. Generative theory (e.g. Clarke, 1986; Lerdahl & Jackendoff, 1983)
implies that the role of performance expression is to facilitate the communication of the music structure. According
to this view, the performer's creative space must be constrained by the structure, and acceptable performances
according to experienced listeners can only exist within narrow bounds. It might for example only be possible to
reinforce or attenuate the same pattern of perceived expression ratings, not alter it altogether. This can be called a
reinforcement hypothesis. A different view has been advocated by Repp (1998), who suggests that the structure
provides a frame in which an expressive landscape can be painted by the performer. For example, a trivial melody or
rhythm may call for the performer to endow it with more content, or a slow tempo or a low event density may call
for things to happen so that the listeners will not grow bored. As a result, we would expect greater difference in
ratings as a function of PV for melodies with simpler structure or lower event density, constituting a compensation
hypothesis. Finally, structural information and PV might convey different kinds of expressions. It has recently been
demonstrated that emotional intentions could to a certain extent be identified through PV alone (Juslin & Madison,
1999), which might for example raise the question whether other kinds of expression may capitalise more on
structural features. This could be called a dissociation hypothesis.
The present study was designed to explore the relations between music structure and performance variability, by
comparing ratings of expression for natural performances with performances from which all PV was removed. To
this end, generality and ecological validity were sought by sampling melodies that were unknown to both performers
and listeners from a body of publicly available Western music, in order to avoid possible extramusical and individual
associations. The performances should also be natural, and although this is left to the performers' discretion, highly
experienced professionals can be expected to share an understanding of what is musically appropriate. The piano
was considered the most suitable instrument due to its limited means for expression. Neither performers nor listeners
would expect or try to use any other cues than tempo/timing, loudness, and articulation, and the risk for inconsistent
use of these cues is small. Fewer cues are also easier to control experimentally.
The study incorporates the generation of performances of 25 different melodies, a manipulation of these so as to
eliminate the PV, and a listening experiment.
Method
Performance session
One female and two male professional keyboard musicians, aged 34, 45, and 31 years, participated. Each had played a
keyboard instrument since early childhood (for at least 25 years) and had worked professionally for at least 10 years.
All frequently performed on both piano and organ, and they were paid for their voluntary and anonymous
participation.
The performers played a Casio Celviano AP-20 digital piano and received feedback through Sony CD-250
headphones, while the MIDI signal was recorded and stored in files on a PC by means of a MIDI interface.
The melodies were selected to represent a wide range of structural features, such as harmonic mode, metre, interval
size, and rhythmic and melodic complexity. The performances were between 13 and 31 seconds in duration (M =
19.3).
Performance manipulation
A deadpan realisation of each performance was made by creating a new MIDI file based on means for the three
performance variables (tempo, loudness, and articulation) in the corresponding original performance. The 75 original
and the 75 manipulated MIDI files were played back through the Casio piano, and the sounds were recorded digitally
on disk.
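The deadpan manipulation, replacing tempo, loudness, and articulation with their per-performance means, can be sketched roughly as below. The note representation and helper names here are my own assumptions; the paper does not describe its MIDI processing at this level of detail.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Note:
    score_onset: float   # position in beats, from the score
    score_dur: float     # nominal duration in beats
    onset: float         # performed onset, seconds
    dur: float           # performed sounding duration, seconds
    velocity: int        # MIDI velocity (loudness)

def deadpan(notes):
    """Replace tempo, loudness and articulation with their means."""
    # Mean tempo: seconds per beat across the whole performance
    span = notes[-1].score_onset - notes[0].score_onset
    beat = (notes[-1].onset - notes[0].onset) / span
    # Mean articulation: sounding duration relative to nominal duration
    art = mean(n.dur / (n.score_dur * beat) for n in notes)
    vel = round(mean(n.velocity for n in notes))
    return [Note(n.score_onset, n.score_dur,
                 notes[0].onset + (n.score_onset - notes[0].score_onset) * beat,
                 n.score_dur * beat * art,
                 vel)
            for n in notes]
```

The result is a performance with strictly isochronous beats, a single velocity, and a single articulation ratio, i.e. one in which the coefficient of variation of each expressive variable is zero while its mean level is preserved.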
Listening experiment
Six male and four female musicians, 31 to 52 years of age (M = 43) participated. All had played an instrument for at
least 15 years and were currently active musicians.
The number of adjectives to rate was kept to a minimum so as not to exhaust the listeners. The words angry, calm,
complex, fearful, happy, longing, musical, sad, solemn, stable, tender, and tense were selected as being
representative, based on several studies of dimensionality for full performances (Campbell, 1942; Hevner, 1936;
Watson, 1942; Wedin, 1969, 1972a, 1972b). In addition, expressive and beautiful were included to test for the
presence of PV and the agreeableness of the performances.
The stimuli were presented in 6 melody × performance blocks, arranged such that each melody only occurred once
within each block, and the 6 performance conditions (3 performers × 2 PV) were rotated within the block. The
purpose of this design was to maximise the distance between presentations of the same melody, in order to avoid
contrast effects due to the comparison of different performances – while still retaining a representative balance
between the performance conditions within each block. Each listener individually attended four sessions and the
presentation order of blocks and performances within blocks was individually randomised. The first block in the
experiment contained 14 performances randomly sampled from the entire pool. Its purpose was to establish an
impression about the range of expressive features in the experiment, and these ratings were not included in the
analysis.
relative magnitude of dimensions that were actually used by the listeners. The inter-scale correlation matrix
indicated a cluster of "positive" adjectives, of which the highest correlation was for musical and expressive (.73). In
descending order we find correlations between tender and longing (.70), beautiful and musical (.63), beautiful and
expressive (.62), beautiful and longing (.61), and beautiful and tender (.60), followed by tender and calm (.56), angry
and tense (.55), and calm and tense (-.54). Exploratory FAs on a total of 72 cases (3 performers × 24 melodies)
favoured five-factor solutions for both performances with and without PV (based on the scree plot and
interpretability of factors) and the two sets of factors were similar, both in their respective explained variance (93.4%
and 86.4%) and factor loadings.
Factor I, with high loadings in expressive, musical, beautiful, and longing was interpreted as goodness or
pleasantness. Factor II, with high positive loadings for tense and angry, and high negative loadings for longing,
beautiful, tender, and calm was interpreted as tension versus calm, and factor III as fear and sadness versus
happiness. It is unusual that fear and sadness join in a factor, but one explanation may be that terms representing fear
and music expressing fear have rarely appeared together. Factors IV and V were identified as complexity versus
stability and solemnity. Most of the effect of PV on complexity–stability is probably trivially related to the stable
pulse which results from removal of PV. High factor loadings, with at least one loading > .80 for each factor,
indicate that the factors successfully compress the semantic scales into dimensions which can be given well-defined
semantic labels. These five dimensions are summarised in Table 1, and will in the following be called goodness,
tension–calm, fear–happiness, complexity–stability, and solemnity.
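The factor-retention step, inspecting a scree plot of eigenvalues, can be sketched as follows. This is a generic illustration of the method, not the analysis code used in the study:

```python
import numpy as np

def scree_eigenvalues(ratings):
    """ratings: cases x scales matrix (here 72 cases x 14 adjective
    scales). Returns the eigenvalues of the inter-scale correlation
    matrix in descending order; the 'elbow' of their plot suggests how
    many factors to retain (five in the study)."""
    corr = np.corrcoef(ratings, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]
```

The eigenvalues sum to the number of scales, so each one can be read directly as the share of the total standardised variance carried by that dimension.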
Table 1
Interpretation of the five factors, and the semantic scales with the highest loadings in each factor. Factor loadings for
adjectives in parentheses are smaller than 0.5.
     Interpretation                     + adjectives                     - adjectives
I    Goodness, expressivity            Expressive, Musical, Beautiful
II   Tension, anger–Tenderness, calm   Angry, Tense, (Complex)          Tender, Calm, Longing, Beautiful
III  Fear, sadness–Happiness           Fearful, Sad                     Happy
IV   Complexity–Stability              Complex                          Stable
V    Solemnity                         Solemn
Under the assumption that these dimensions provide a valid data reduction and an interesting level of analysis,
another FA was performed on the means across performers for both PV conditions, and the factor scores for all 144
(2 PV × 3 performers × 24 melodies) cases were submitted to a three-way ANOVA for each factor, with the highest
interaction as error term. The results, summarised in Table 2, show that there were significant main effects of PV for
goodness, tension–calm, and complexity-stability. There were also main effects of performers for all factors but
fear–happiness, and of melody for all factors. The effect sizes d’ related to performers were, however, smaller than
0.20 (typically < 0.10), and these effects are therefore unlikely to be of much importance.
Table 2
Summary of three-way repeated measures ANOVAs for the factor scores related to a five-factor solution, in terms of
p levels for all effects.
PV    Perf.    Melody    PV × Perf.    PV × Mel.    Perf. × Mel.    PV × Perf. × Mel.
— n.s., *p < .05, **p < .01, ***p < .005, ****p < .001, *****p < .0001. a Highest-order interaction was used as
error term.
The main effects of PV were to increase the melodies’ goodness and decrease their tension. It also increased the
scores for complexity-stability, which was however essentially an effect of the isochronous (stable) pulse in deadpan
performances, as indicated both by interviews and adjective ratings. Finally, PV had no main effect on the
dimensions characterised as solemnity and fear–happiness. The interaction between PV and melody on goodness
suggests that natural variability patterns might have quite different effects for different structures. The factors will,
under the term perceived expressive dimensions (PEDs), replace the 14 adjective scales in the following analyses.
Contribution of structure, performance levels, and performance variability to the PEDs
The purpose of this section is to explore the relative contributions to the PEDs among a set of potentially relevant
variables describing the performances. Although this approach does obviously miss much of the detail of the
phenomena, it might tap possible general relations across the range of different music structures.
Overall tempo, loudness, etc., have substantial effect on, for example, ratings of emotion words (Juslin, 1997). It is
therefore important to account also for the mean performance levels (MPLs) of the expressive
variables although they were not experimentally controlled. MPLs were measured in terms of the means for all notes
across each performance, and its index of dispersion – that is, performance variability – is here expressed as the
dimensionless coefficient of variation (SD / M). Tempo is expressed as the intervals between subjectively perceived
beats (the tactus) in the sounding music, for the sake of consistency with other measures of time. Thus, higher MPL
values indicate high loudness (vs. low), legato articulation (vs. staccato), and long tactus duration (TD), that is, slow
tempo (vs. short duration).
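The MPL and its dimensionless variability can be computed per variable as, for example (a minimal sketch; the function name is mine):

```python
from statistics import mean, stdev

def mpl_and_pv(values):
    """Mean performance level (MPL) and performance variability (PV)
    for one expressive variable across a performance; PV is the
    dimensionless coefficient of variation, SD / M."""
    m = mean(values)
    return m, stdev(values) / m

# e.g. inter-beat (tactus) intervals in seconds:
ibis = [0.50, 0.52, 0.48, 0.55, 0.45]
m, cv = mpl_and_pv(ibis)   # m = 0.50 s per beat
```

Dividing by the mean makes the variability comparable across variables measured in different units (seconds, MIDI velocity, articulation ratio) and across performances at different overall tempi and dynamic levels.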
The nine structure variables were based on transcriptions of the melodies (not shown here). Mode was coded as
major (1), minor (2), and mixed (3), and melodic complexity as conforming to functional harmony (1), chromatic or
modulated (2), and free-tonal (3), according to the author’s judgement. Pitch, interval, and duration levels refer to the
number of different values, regardless of their distance or frequency, and range is the number of semi-tones between
the highest and lowest value. For time, it seemed more appropriate to measure the range in terms of the ratio
between the longest and shortest duration.
The usefulness of these variables depends on the amount of variance in the listeners' responses that they explain.
Substantial multicollinearity can be expected within each group of variables (structure, MPL, and PV), but not
between the groups, and a hierarchical multiple regression analysis (MRA) was therefore performed for each PED
on its mean factor scores across the ten listeners (72 cases). The same kind of MRA was also made for the original
performances, so as to detect any differences in the way listeners used structure and MPL when PV was absent.
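A hierarchical MRA of this kind enters the predictor blocks cumulatively and reads off the increment in R² at each step. A generic NumPy sketch (variable names are mine, not the study's):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def hierarchical_r2(blocks, y):
    """Enter blocks (e.g. structure, then MPL, then PV) cumulatively;
    successive differences between the returned R^2 values are the
    increments attributable to each block."""
    r2, entered = [], []
    for block in blocks:
        entered.append(block)
        r2.append(r_squared(np.column_stack(entered), y))
    return r2
```

Because R² can only grow as predictors are added, entering the blocks in a fixed order (structure first, PV last) credits any shared variance to the earlier blocks, which is the conservative choice for assessing what PV adds.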
Figure 1 shows that all variables together could explain between 34 and 63 percent of the variance across all PEDs
and the two PV conditions. The variance explained by structure is highest for perceived complexity (49%), as might
be expected, but only for deadpan performances.
Figure 1. Cumulative proportion variance explained by hierarchical MRA for three blocks of variables; structure,
MPL, and PV. Each combination of a PED (1-5) and a PV condition (with or w/o PV) is represented by a column,
whose total height is equal to R2 for all predictor variables together in a simultaneous MRA. All differences are
statistically significant (p < .05) except the increments due to PV for fear-happiness.
It is notable that the presence of PV seems to increase the contribution from MPL for goodness, but decrease it for
tension–calm and solemnity. Also, PV decreases the contribution of structure for goodness and complexity–stability.
These results might indicate a dynamic effect of PV, in that it directs the listeners' attention to certain features. For
example, if one can hear that the loudness varies, then it can be assumed that loudness conveys some kind of
information, which makes both the local and overall level of loudness more interesting.
Interaction structure–performance variability
This section focuses on the three hypotheses – reinforcement, compensation, and dissociation. The principal variable
will be the difference dPV = original - deadpan, for which a positive value indicates that the factor scores, and hence
some of the positively loaded adjective scale ratings, were higher with PV. A correlational analysis between ratings
of original performances and dPV on the one hand, and the structure and performance variables on the other, was
made across the entire set of 72 performances.
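The quantity dPV and its correlations with the predictors can be computed directly; a minimal sketch (array names are my own):

```python
import numpy as np

def dpv_correlations(original, deadpan, predictors):
    """original, deadpan: factor scores for the 72 performances under
    the two PV conditions. predictors: 72 x p matrix of structure and
    performance variables. Returns one Pearson r per predictor,
    correlating it with dPV = original - deadpan."""
    dpv = np.asarray(original) - np.asarray(deadpan)
    P = np.asarray(predictors)
    return np.array([np.corrcoef(P[:, j], dpv)[0, 1]
                     for j in range(P.shape[1])])
```

Comparing the sign of each such correlation with the sign of the corresponding correlation for the original performances then distinguishes reinforcement (same sign) from compensation (opposite sign), as argued below.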
The compensation hypothesis suggested that not only complexity but also the time available should affect the PV.
There are at least two levels of time in music that might be relevant in this context, namely tempo (as measured by
TD) and event duration (ED), which is the time between onsets of note events. Whereas TD is nominally
isochronous, ED is also a function of the structure. There will be large differences between mean, minimum, and
maximum ED when there are many levels of note durations. As it was not obvious which measure is the most
important for PV, all were included in the analysis.
Table 3 shows the correlations between factor scores for original (left half) and dPV (right half), and the structure
and performance variables. If the corresponding correlations for original and dPV have the same sign, it means that
high factor scores are decreased more as a function of removing PV than smaller scores are, which is in line with the
reinforcement hypothesis. According to the same reasoning, opposite signs between the left and right half of Table 3
support the compensation hypothesis.
Table 3
Correlations between structure and performance variables (rows) and PEDs based on ratings (columns). Left and
right panels of the table compare original performances and the differences between original and deadpan in terms of
mean factor scores across 10 listeners (N = 72).
Original dPV = Original – Deadpan
FI FII FIII FIV FV FI FII FIII FIV FV
Mode .26 * .07 .37 * .30 * -.15 -.21 .17 .25 * -.23 * -.04
Melodic complexity .15 -.01 .32 * .30 * -.03 -.20 .01 .17 -.21 -.10
Pitch levels .13 -.12 .41 * .01 .04 -.09 -.07 .05 -.11 -.15
Pitch range .24 * -.23 .32 * -.06 .06 -.09 -.04 -.02 -.09 -.02
Interval levels -.06 -.26 * .05 -.02 .19 -.00 .05 -.08 .04 .02
Interval range .14 -.08 .14 .14 .18 .05 .07 -.28 * .04 .09
Interval size -.03 .16 -.00 .17 -.02 .08 .20 .19 -.28 * .03
Duration levels .21 .06 .30 * .28 * -.19 -.18 .05 .27 * -.14 -.18
Duration range .33 * -.09 .17 .05 -.14 -.31 * -.04 .19 .03 -.19
Loudness .05 .58 * .09 .11 .08 -.22 -.25 * -.15 -.12 .04
Articulation (legato) .09 -.10 .24 * -.06 .09 .07 .17 .38 * -.17 -.18
Tactus duration -.01 -.16 .04 -.04 .22 .29 * .09 .16 .00 .01
Event duration min. -.10 -.23 * -.09 -.14 .34 * .27 * .01 -.07 .03 .02
Event duration Md -.08 -.31 * .29 * -.10 .34 * .20 .12 .16 -.13 -.01
Event duration max. .20 -.08 -.03 .34 * -.09 .06 .02 .19 -.15 -.18
For goodness, the correlations with mode, melodic complexity, duration levels, and duration range all have opposite
signs for original and dPV. These four variables measure the structure complexity as inferred from the scores, and
are also positively correlated with the listeners’ ratings of complexity (R2 = .61 for all 9 structure variables). Thus,
various indicators of complexity apparently increase ratings of goodness, but also decrease the effect of PV on
goodness. The same phenomenon is found for complexity–stability, but this will be ignored because it is likely a
trivial effect of deadpan performances being rated as more stable. Opposite signs are also found for loudness and
event duration (ED) in tension-calm.
All in all, these results indicate that the ratings behind the goodness and tension-calm PEDs are subject to
compensation by means of the variability that the performers impose when attempting to make a natural
performance. One could say that structures which are perceived as less pleasant – predominantly the simpler ones –
are made more pleasant, whereas already pleasant structures do not need as much improvement (based on the fact
that the main effect of PV on goodness is positive). Likewise, the compensatory effect on tension-calm is that
structures which appear more tense in a deadpan performance are made more relaxed and tender by the performance
variability than those that appear less tense in their deadpan version.
The two remaining PEDs, fear–happiness and solemnity, do not follow the same pattern. The three correlations
including fear–happiness that are significant for both original and dPV are all positive, which speaks in favour of the
reinforcement hypothesis for this PED. That is, when mode, duration levels, and articulation affect the ratings along
the fearful/sad-happy dimension, PV increases those effects further, making a sad or fearful performance sound even sadder, and so on. The bipolarity of fear–happiness is probably the reason why it did not demonstrate any
significant main effect according to the ANOVA (Table 2). Finally, there are no significant correlations at all
including dPV for solemnity, which is in line with its lack of main effect of PV.
General discussion
Whereas it has generally been assumed that the purpose of PV is to communicate the structure more effectively, recent work seems to refute the perceptual reality of such a claim (Tillman & Bigand, 1996) and argues that PV might instead serve to communicate the performer's aesthetic expression (Repp, 1998). In other words, although variability
patterns may perfectly well reflect performers’ overlearned hierarchical representations of the structure, it is doubtful
whether the ordinary listener would notice. In any case, the short melodies used in this study do not comprise many
organisational levels to convey.
The question remains why there was nevertheless a substantial amount of variability, and why it affected the
listeners’ perception in certain directions. If the purpose were to improve the communication of structure, consistent
effects of PV would have been less likely in view of the wide range of different melodies. However, a few possible
sources of errors in this study must be considered.
First, it cannot be excluded that the crude elimination of PV might have caused some artefactual interaction with the other expressive devices, such as tempo. It seems unlikely, however, that such phenomena would be consistent enough to
cause significant effects across the relatively large number of very different structures.
Second, three performers and ten listeners are admittedly few. However, it was believed that these performers' long experience would ensure that they correctly anticipated listeners' ideals. Musicians were chosen as listeners because
they were believed to be more internally and inter-individually consistent, and less susceptible to superficial
differences between the melodies.
Third, the setting was constrained to monophonic performances and the few expressive variables available on the
piano. It is possible that this paucity precludes several expressive scenarios, and one should be careful when
attempting to generalise to other conditions. But it is also possible that the performers' high level of skill enabled them to approach what would have been achieved in a setting more favourable to expression.
Bearing these concerns in mind, a more general problem lies with the concept of performance expression,
specifically with its definition as systematic deviations from the "neutral" structure given in a score (Clarke, 1996).
This is obvious when the music is not notated, but one can also question the theoretical arguments for taking the
score "literally" in the sense that equal units of musical time should necessarily correspond to equal units of physical
time. However, extending the definition to include deviations from a norm (Desain & Honing, 1992, p. 175) could
raise even more severe problems, depending on what "norm" is meant to be. For example, Repp (1998) found that
one pattern of timing deviations accounted for 61.4 percent of the (timing) variance among 115 commercially
available recordings of Chopin's Etude in E major. As this pattern can justifiably be regarded as a norm, should we
consider the performances which were closest to this norm to be inexpressive? And if the norm or the most
pleasurable way to play a piece deviates in a specific way from the notation, why have we not developed a notation
with the appropriate level of resolution?
Generative theory suggests that rule-based transformations of canonical values constitute an economical way of rendering performances which convey the hierarchical structure of the music. In addition, performers' internal
representations may differ, and give rise to an aesthetic variety from which individual listeners can choose.
However, most music has a simple large-scale structure with repetitions of two or three sections, and is frequently
performed with uniform tempo and loudness, at least within consciously audible thresholds. The generative scenario can therefore only apply at the margins of the repertoire, where compositions are multi-level hierarchical and the performer is allowed to impose large-scale deviations. And even under these conditions, it appears the listeners
would have to know the piece very well in order to appreciate both the large-scale structure and to recognise
variability patterns that help to convey it (e.g. Cook, 1987; Karno & Konecni, 1992; Konecni, 1984; Gotlieb &
Konecni, 1985). For example, Tillman and Bigand (1996) arbitrarily shuffled musical chunks (~6 s duration) in
commercially available recordings of classical piano music, which means that any hierarchical performance
variability patterns must also have become scrambled. However, the listeners' appreciation of the music was not
reduced by the shuffling.
It has also been suggested that music elicits or communicates emotions. Approaches range from the speculative through the purely theoretical to the empirical; much of the empirical work has investigated how representations of emotions are expressed and decoded. Although the melodies in the present study were selected in part for their wide range of
emotional character, the small reinforcements found for fear, sadness, and happiness are not strong or consistent
enough to warrant a simple communicative interpretation. The results do rather suggest a general tendency to
decrease tension and increase goodness, whatever the latter may stand for in a wider psychological perspective.
Inconsistent as the present results may be, they may help to point in certain directions concerning the function of performance variability and the role of the performer. Further research along this path might prove effective in elucidating the performer's contribution to the qualities that music conveys.
References
Campbell, J. G. (1942). Basal emotional patterns expressible in music. American Journal of Psychology, 55, 1-17.
Clarke, E. (1986). Theory, analysis and the psychology of music: A critical evaluation of Lerdahl, F. and Jackendoff, R.: A Generative Theory of Tonal Music. Psychology of Music, 14, 3-16.
Clarke, E. (1996). Expression in performance: generativity, perception, and semiosis. In J. Rink (Ed.), The practice of performance (pp. 21-54). Cambridge: Cambridge University Press.
Cook, N. (1987). The perception of large-scale tonal closure. Music Perception, 5(2), 197-206.
Desain, P., & Honing, H. (1992). Music, Mind and Machine. Amsterdam: Thesis Publishers.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention and the listener's experience. Psychology of Music, 24, 68-91.
Gotlieb, H., & Konecni, V. J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg Variations by Johann Sebastian Bach. Music Perception, 3, 87-102.
Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48, 246-268.
Juslin, P. N. (1997). Perceived emotional expression in synthesized performances of a short melody: Capturing the listener's judgment policy. Musicae Scientiae, 1(2), 225-256.
Juslin, P. N., & Madison, G. (1999). The role of timing patterns in the decoding of emotional expressions in music performances. Music Perception, 17, 197-221.
Karno, M., & Konecni, V. J. (1992). The effects of structural interventions in the first movement of Mozart's symphony in G minor, K. 550, on aesthetic preference. Music Perception, 10, 63-72.
Konecni, V. J. (1984). Elusive effects of artists' "messages". In W. R. Crozier & A. J. Chapman (Eds.), Cognitive processes in the perception of art (pp. 71-96). Amsterdam: North-Holland.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, Massachusetts: MIT Press.
Repp, B. H. (1998). A microcosm of musical expression I: Quantitative analysis of pianists' timing in the initial measures of Chopin's Etude in E major. Journal of the Acoustical Society of America, 104(2), 1085-1100.
Thompson, W. F., & Robitaille, B. (1992). Can composers express emotions through music? Empirical Studies of the Arts, 10(1), 79-89.
Tillman, B., & Bigand, E. (1996). Does formal musical structure affect perception of musical expressiveness? Psychology of Music, 24, 3-17.
Watson, K. B. (1942). The nature and development of musical meanings. Psychological Monographs, 54(2).
Wedin, L. (1969). Dimension analysis of emotional expression in music. Swedish Journal of Musicology, 51, 119-140.
Wedin, L. (1972a). A multi-dimensional study of perceptual-emotional qualities in music. Scandinavian Journal of Psychology, 13, 1-17.
Wedin, L. (1972b). Multidimensional scaling of emotional expression in music. Swedish Journal of Musicology, 54, 1-17.
Proceedings abstract
Mr Erik Lindström
Erik.Lindstrom@psyk.uu.se
Background:
Aims:
The main purpose of this investigation was to elucidate the effects of factors
in music performance and melodic structure in achieving emotional expression.
How is the inherent expressiveness of a melodic structure realised in actual performance? How is a happy tune performed so as to sound happy? Is it possible to increase the expressed happiness of an inherently less happy tune by adequate performance? Are certain notes of more importance, and articulated more explicitly, in some expressions?
Method:
Two performers were asked to play these versions so as to clearly express the respective emotions. A listening test was conducted to evaluate their communication of emotion to listeners.
Results:
Conclusions:
Means for emotional expression may be found in (a) the musical structure per
se, (b) in adequate performance, as well as (c) in the interaction between
structure and performance.
Proceedings abstract
College of Education
UNIVERSITY OF IOWA
Background
Aims
Results
Results and discussion will focus on the implications for music teaching and
learning. The interpretative performance task has already been piloted by the
author in the United States, as has the memory task by Prof. Belardinelli in
Italy.
Conclusion
This study is the first attempt to use these measures together to obtain a richer profile of children's musical thinking. We believe the gathering of cross-cultural data will promote more generalizable conclusions about children's musical development.
Proceedings paper
important determinant of friendship (Aboud and Mendelson, 1996; Hallinan, 1980). Some research shows that teachers' attempts to use friendship cliques as a means of judging academic achievement have been largely unsuccessful (Waller, 1932; Brady-Smith, Newcomb, and Hartup, 1978, cited in
Foot et al, 1980), despite other evidence demonstrating that there is a positive relationship between
high cohesive groups and higher achievement (Seashore, 1954; Kruger and Tomasello, 1986).
One theory of why the collaborations of friends and non-friends may have different effects on
achievement is that conflict occurs more often in friendship groups. Using the Piagetian perspective,
psychologists believe that socio-cognitive conflict is the distinguishing factor of successful and
non-successful peer collaboration. When children experience confrontation whilst attempting to solve
a problem, the effect is 'disequilibrating'. The conflict forces the children to attempt an understanding
of the alternatives, and consequently they achieve a more in-depth and objective understanding of the
task (Kruger, 1993). However, social determinist psychologists believe that conflict does not lead to
productive group work. It is thought that it is cooperation and not conflict which will result in
cognitive change (Vygotsky, 1978). Additional reasons as to why friendship group collaborations may
be more successful than non-friendship groups are that individuals share more information with
friends than with non-friends (Newcomb and Brady, 1982) and that friends create a more desirable
working atmosphere (Foot, Chapman, and Smith, 1977).
Of course, friendship groups may not produce wholly desirable behaviours. Whilst they may
collaborate in a more positive and friendly way, there is a chance that this will provoke more off-task
behaviour and general chat or play. However, despite this, it appears that friendship groups have the
potential to create an effective working environment, with good interactions, which have the potential
to produce high quality results. In light of the positive and negative effects of friendship, it is
necessary to consider exactly what defines effective interaction.
Effective interaction is commonly understood to involve contribution from all group members. Examples of such behaviour within a group include participation, cooperation, listening, helping, negotiating, and developing ideas with others. If individual group members interact in this way, it is
suggested that the group is collaborating successfully (Plummer and Dudley, 1992; Bales, 1950a;
Kruger, 1992).
There has been little research directly related to friendship group interactions and musical
composition. Morgan, Hargreaves, and Joiner (1997-98) investigated how peers collaborate in
compositional tasks, and the influence this has on the resulting music. They conducted two studies
investigating group interactions and the effects on musical composition. Their research demonstrated
the importance of gender as a factor in group collaborations, highlighting the dominance of girls in the
groups, and their overall ability to cooperate more successfully. Mixed gender groups cooperated less
well than homogeneous groups, with all-girl groups producing the most effective compositions. Since
friendship groups are usually same-sex, these findings suggest that friendship groups may well have a
positive effect on collaboration.
It appears that the study of group collaboration is relatively under-researched, and that many questions remain unanswered. Evidence suggests that whilst group work is highly
advantageous, the specific conditions in which it should occur remain unclear. Comparisons between
variations of grouping such as friendship, ability, and gender, and their different influences on the
quality of interaction and outcome, have yet to be drawn. Furthermore, very little evidence considers
the impact of group work on musical activities.
EXPERIMENTAL STUDY
The current research aimed to investigate how various groupings of children, particularly friendship
groups, influence collaboration focusing on composition quality and the quality of interactions. Five
groupings were used for this purpose: friendship, non-friendship, matched-intelligence,
mixed-intelligence, and random. In addition, the study was conducted in two schools, one with a reputation for musical activities and one which offers its pupils few musical opportunities, in order to establish whether any differences existed between them.
Research highlights the benefits of friendship group collaborations, and so it was anticipated that friendship groups would produce the highest quality compositions. Ability has also been found to have an important influence in past studies, and so it was thought that mixed ability groups would
produce more effective results than matched ability groups. Gender has been shown to play an
important role in the selection of friends in younger children, which means that friendship groups
consist of same-sex group members. It was hypothesised that since gender is such an influencing
factor on friendship groups in this age group, homogeneous gender groups would produce higher
quality compositions than heterogeneous gender groups. An additional research aim considered the
effect of school and whether there is any difference between the quality of composition produced in
each. It was thought that the children in the ‘musical’ school would create higher quality compositions
than the children in the ‘non-musical’ school, due to greater musical experience.
Friendship groups are more likely than the other grouping conditions to spend a substantial amount of
time together, and therefore it was anticipated that friendship groups would produce the most effective
group interactions. Ability, however, has not been found by previous research to be an influence on
the quality of interaction, and so it was not expected to be an influencing factor in the present study. It
was anticipated that the quality of outcome should reflect the standard of interaction. If a group does
not collaborate effectively, then it could be considered that the outcome will not be as good as that
produced by a group which has a higher quality interaction. Therefore it was expected that the highest
quality composition would occur with the most effective interaction.
METHOD
Participants: The participants used in this study were from two different schools - one of which does
not have an active music education (school A), and the other which does (school B). The children
were in Year 6, and therefore aged either 10 or 11. There were 29 children from school A: 11 boys and 18 girls. There was a total of 30 children from school B: 17 boys and 13 girls.
Data was collected by the researcher and the class teachers, who observed the children's interactions.
The researcher and an additional, independent individual marked the compositions, and an extra
person operated the video camera.
Procedure: Initial planning. The first stage of the study involved interviewing the teachers in order to
discover information about the friendship groups within the class, normal working groups, and
amount of music experienced in class. Teachers additionally provided approximate levels of
attainment for each child. The children were asked to complete a pre-study questionnaire which asked
about their friends, usual working partners, musical experience, and whether there was anyone in the
class that they did not wish to work with.
Once the data had been collected, children were allocated to groups. In school A, 29 children were
organised into 5 groups; 4 with 6 members, and 1 with 5. In school B, 30 students were organised into
5 groups of 6 children. As mentioned previously, 5 different group variables were manipulated in this
research. For the control variable, children were randomly allocated into groups. The ability matched
groups were organised as far as possible to ensure that each group contained children of the same
grade. Where this was not possible, care was taken to ensure that similar grades were present in each.
For the mixed ability variable, one student from each ability grade (1-5) was placed in each group. For
the groups of 6, the procedure remained the same as for the groups of 5, with the extra children
randomly assigned to each group. The children usually had pre-established friendship groups of 5 or
6, and these remained the same in the friendship variable. In this variable it was more difficult to accommodate those children who were isolated or generally disruptive, and with whom nobody wanted to work. Where possible, these children were placed with children whom they had named as their friends, although the feeling may not have been reciprocated. The non-friendship groups involved
mixing those children together who had specified a preference not to work together. However, care
was taken to confirm the groups with the teacher, just in case there was a particularly volatile
arrangement.
The sessions: As far as possible, the research sessions proceeded in the same way for the two schools.
For school A, three visits were made. Each lasted approximately 2 hours. For school B, there were
two visits, each lasting approximately 2½ hours. A total of 10 compositional tasks were used in all.
They were taken either from existing education literature (Paynter, 1992; Mills, 1995; Morgan et al,
1997-98) or were specially constructed by the researcher.
The classroom was organised so as to place the tables as far away from each other as possible, and seats were arranged around each table so that all of the group members could see each other, thereby aiding participation and cooperation. On each table, 5 or 6 different percussion instruments (depending on the size of the group) were placed, in order to avoid arguments about which instruments the children wanted, and also to ensure that each group had different timbral sounds to explore.
The children were then told which groups they would be in for the first two tasks. A brief introduction
to the sessions was given by the researcher, explaining that they would be working with groups of
different children from within their class, to compose different pieces of music using the percussion
instruments. They were informed that the video camera was going to film the sessions, and also that
the teacher and the researcher would be observing their work. Additionally, it was explained that they
would be expected to perform their compositions at the end of each task. The first task was then
described. There was an opportunity for any questions or clarification. They were given a time limit of
15 minutes in which to compose a piece.
As the sessions took place, the two observers moved around the room, completing the questionnaires and noting information regarding the group leader, major contributors to the discussion, the type of interaction, the speed of progress, and whether they were operating as a group. The video camera
recorded those groups which were not being observed by other means, and the footage was later
analysed for signs of interaction quality by assessing aspects of non-verbal communication. The
researcher used the audio tape recorder on those tables she was observing, and this was later
transcribed and analysed to assess the quality of interaction. About 10 minutes into the session the
researcher informed the class that there were only five minutes remaining in which to organise their
final ideas, and prepare to perform their pieces. After the full 15 minutes, the groups proceeded in
numerical order to play their pieces whilst the video camera recorded them. Once all the pieces had
been performed, they were informed of their next task, and the procedure was repeated for the
remaining tasks.
After each group had completed the allocated two tasks, the children were asked to complete a
post-task questionnaire, concerning their perceptions of the group's interactions, whether or not they
enjoyed the activities, and who they did and did not like in the group. They were asked to complete
them on their own, without discussing them with the other group members.
Methods of assessment: data analysis. The performances of the compositions were marked using an adapted version of the "Teacher's constructs of music activities" criteria (Hargreaves, Galton, and Robinson, 1996), which operated on a scale of 1-7 between two polar points. There was, however, one change made to number 14 of these criteria. Whether the children were 'technically skilful or technically unskilful' was not deemed relevant to this study, and so it was replaced with 'task requirements fulfilled or unfulfilled', which was felt to be important given the precise nature of the composition tasks involved in this research. The compositions were marked by the researcher and a musician external to the study in an attempt to reduce bias.
The video footage, audio tape transcriptions, observer questionnaires, and participant questionnaires
were analysed for quality of interaction. A chart was created by the researcher which used criteria
suggested by past research (Kruger, 1992; Berkowitz, 1980) to constitute good, effective group
interaction. Included are equal contribution, participation and cooperation from all members, listening
to each other, and discussion. If all listed elements were present then the group received the maximum
score. Thus, the lower the score, the weaker the group interaction, and vice versa. A total score of 11
was possible.
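The additive scoring scheme described above can be sketched as a simple checklist function. The criteria names and point weights below are hypothetical: the text lists the criteria and the maximum total of 11, but not how the points were distributed among them, so this illustrates only the additive logic, not the actual instrument.

```python
# Illustrative sketch of the additive interaction-scoring scheme.
# The point weights are hypothetical: the study states only that the
# criteria sum to a maximum of 11, not how points were distributed.

CRITERIA_POINTS = {
    "equal_contribution": 3,  # hypothetical weight
    "participation": 2,       # hypothetical weight
    "cooperation": 2,         # hypothetical weight
    "listening": 2,           # hypothetical weight
    "discussion": 2,          # hypothetical weight
}  # totals 11, matching the maximum score reported


def interaction_score(observed):
    """Sum the points for every criterion observed in the group."""
    return sum(pts for name, pts in CRITERIA_POINTS.items() if name in observed)


# A group showing every listed behaviour receives the maximum score of 11.
print(interaction_score(set(CRITERIA_POINTS)))            # 11
print(interaction_score({"participation", "listening"}))  # 4
```

Under this scheme, a lower score directly reflects fewer observed collaborative behaviours, as in the original chart.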
RESULTS
Data was collected for both compositional and interactional quality, and the results for each were analysed independently using a three-factor Analysis of Variance (ANOVA), the factors being school, grouping, and composition.
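The F-ratio underlying such an ANOVA can be illustrated with a minimal one-factor computation; the study's actual three-factor analysis would normally be run in a statistics package, and the scores below are invented purely for illustration.

```python
# Minimal sketch of the F-ratio underlying an ANOVA, shown for a single
# factor only (the study used a full three-factor design). Data invented.

def one_way_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented composition-quality scores for three hypothetical groupings.
friendship = [6.0, 5.5, 6.5, 6.0]
random_grp = [4.0, 4.5, 5.0, 4.5]
mixed = [5.0, 4.5, 5.5, 5.0]
print(round(one_way_f([friendship, random_grp, mixed]), 2))  # 14.0
```

A large F indicates that the variance between grouping means is large relative to the variance within groupings; the p-values reported below come from comparing F against its distribution under the null hypothesis.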
A three-factor ANOVA on compositional quality found that the effect of grouping was not statistically significant. However, a significant effect of school was found (F(1,35) = 10.769, p = .0023), and there was a significant interaction between school and grouping (F(4,35) = 3.77, p = .0119). More detailed
analysis reveals that in school A, the higher attaining ability-matched groups appear to have created
better quality compositions than the lower attaining groups. This finding is interesting since school A
was the school which provided least musical instruction. The ANOVA also revealed that the
compositional tasks did not influence the standard of composition, which suggests that no task was
more difficult than another, a factor that could have potentially distorted the results.
Grouping, however, was found to have a highly significant influence on the quality of interaction (F(4,35) = 6.814, p = .0004). Friendship groups produced the highest interaction scores in both schools. The Fisher PLSD revealed that the friendship groupings produced significantly higher results than the other groupings (p < .05). The mean marks across the two schools also reveal that the non-friendship variable produced the lowest interaction scores.
Whilst no statistically significant difference was found between the quality of interaction in the two schools, there was some effect of school and grouping on the group interaction scores (F(4,20) = 3.2, p = .0243). The analysis also revealed that the different compositional tasks influenced the interactions (F(16,140) = 2.57, p = .0097).
A one-factor ANOVA testing the effect of ability on the quality of interaction revealed that the result for school B was significant (p = .0216), although the result for school A was not. Gender was also found by a one-factor ANOVA to be an important influence on the quality of interaction (p = .0059).
The Fisher PLSD reveals that both the homogeneous-boys and homogeneous-girls groups produced higher quality interactions than both the mixed gender groups (1.544) and those groups which consisted of one girl and four or five boys (1.947). This allows the conclusion that homogeneous gender groups in this age group collaborate more effectively than heterogeneous gender groups, supporting the findings of Morgan et al (1997-98).
A Spearman’s correlation revealed no relationship between the quality of interaction and the quality
of composition.
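Spearman's correlation, as used here, compares the rank orderings of the two sets of scores rather than their raw values. A minimal sketch (assuming no tied values, which would require mid-rank averaging; the group scores below are invented) is:

```python
# Minimal sketch of Spearman's rank correlation (rho), assuming no tied
# values. The interaction/composition scores below are invented.

def ranks(xs):
    """Rank of each value (1 = smallest), assuming all values are distinct."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank differences."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical interaction and composition scores for five groups.
interaction = [11, 7, 9, 5, 8]
composition = [6.5, 5.0, 4.0, 5.5, 6.0]
print(round(spearman_rho(interaction, composition), 2))  # 0.3
```

A rho near zero, as in the finding reported above, means the groups that interacted best were not systematically the groups that composed best.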
DISCUSSION
Contrary to expectation, grouping was not found to influence the quality of composition. There are
several possible explanations for this finding. One possible reason is that when friends work together,
there is always the danger that there will be some off-task discussion and general playing around. One
example of this occurred in school A. The boys in the fourth group were trying to decide which
emotion they wanted to depict musically. Having established that they would try and depict sadness,
the amount of on-task talk was very limited. It seems, therefore, that friendship groups may not
achieve higher results than any other grouping simply because the amount of on-task behaviour is
less. A second possible explanation is a methodological one: the ability levels provided by the
teachers were approximate, and the grades did not consider musical ability. The estimated grades do
not reflect aptitude for particular subject areas (for example, if a child is of average ability but
outstanding in Art, the grade does not represent the fact that s/he is better at creative subjects than
scientific subjects). Nor is music an assessed part of the National Curriculum at this level, and so the
children's estimated grades did not include their musical ability.
Another possible reason why grouping did not have the anticipated effect on compositional quality is
that the majority of previous research does not focus on musical activities. The fact that the findings
of the current research do not concur with existing evidence may be because past studies have focused
on settings other than music, and so the results cannot be generalised to this particular domain.
Furthermore, music is seen as a specialised field, and unlike other academic subjects such as science
and mathematics, it is given very little emphasis by the National Curriculum. Children, therefore, may
have a lack of experience with music, which may have an effect on the results.
However, although there was no effect of grouping on compositional quality, there was a significant effect on the quality of interaction. Friendship groups interacted most effectively, and there are several
possible explanations for this. Firstly, it is obvious that friends spend a lot of time together, both in the
classroom and at play. They will have established methods of interaction, and will have developed a
way to share ideas, compromise, and complete work on time. Secondly, in accordance with past
research, conflict situations occurred more frequently in the friendship groups than in any other group
situation (Berkowitz and Gibbs, 1983). Whilst not all friendship groups experienced conflict
situations, there were often instances when the children were critical about each other's ideas, which eventually resulted in the development of those ideas. Comments such as "no, we don't want no silly idea" and "no, don't do that, that's crap" led to more productive results. Sometimes the comments
would be substantiated with encouraging remarks such as "No, it sounds good the way it is....That
sounds best". The marks for the friendship interactions were considerably higher, probably because
these kinds of conversations led to extended discussions and the elaboration of ideas - important
criteria on the marking scheme. When critical comments, such as an idea being "silly" were made, no
negative feelings seemed to have been created, supporting suggestions from psychologists who
believe that criticism is received differently by friends than non-friends (Gottman and Parkhurst,
1980). In the groups of non-friends, negative comments became a lot more personal: "You're just
stupid then", "If you don't shut up, I'm going to put you down as being bossy" (on the questionnaire),
"Rosa, are you actually awake?", "You're not supposed to sing, you sound stupid". Remarks such as
these can hardly improve the quality of the composition, since they are negative criticisms regarding
personal characteristics of the individuals to whom they are directed. This obviously had an impact on
the group collaborations, since the non-friendship groups produced the lowest interaction scores.
Additional conflict situations in the groups of friends occurred due to leadership problems. If an
individual takes control, it may be the case that the other group members resent one person taking
over, or that the leader resents having to tell everyone what to do. This happened in school A, with
one group of girls. One of the girls was feeling terribly isolated, whilst one of the other group
members resented being expected to tell everyone what to do. Interestingly, the results achieved by
this group were the highest - some 4 marks above the other groups. This provides further support for
the findings of psychologists such as Kruger (1993), and Berkowitz and Gibbs (1983).
Gender was also found to have a significant influence on the quality of interaction. In the ability
groupings (which were mixed gender groupings), analysis of the interactions revealed that it was
usually the girls that took control of the activities, and assigned roles to each of the group members,
which is in direct contrast to the findings of previous research (Aries, 1976); they were more
conscientious, and would often make notes about their plans. In the mixed gender groups the children
would generally work within gender subgroups, and if they did collaborate with the whole group, they
would only do so after they had discussed their ideas separately. This supports the findings of Morgan
et al. (1997-98), who in a similar study found that mixed gender groups cooperated less well than
single sex groups.
CONCLUSION
The present study was designed to investigate the influence of friendship, gender, and ability
groupings on the standards of composition and interaction produced by children aged 10 and 11 from
two different schools. The results of the study suggest that grouping does not influence the quality of
composition produced, although it does have an effect on the interaction. Friendship groups were
found to produce the most effective group interactions. Gender was additionally found to be an
important influence on the interactions, with girls interacting more effectively than boys. No evidence
was found that higher quality compositions occur with higher quality interactions.
There has been little research directly related to the present study, although the findings
regarding gender influences on the group collaborations do agree with previous research by Morgan,
Hargreaves, and Joiner (1997-98) and friendship groups were found to produce more effective
interactions, supporting past research (Kruger and Tomasello, 1986). However, contrary to previous
research, ability was not found to be an important influence on compositional quality, although the
results suggest that matched ability groupings are not as effective as friendship groupings, thus
supporting the findings of Kulik and Kulik (1991).
There was a highly significant difference between the quality of compositions produced by the two
schools, in favour of the school which did not provide as many musical opportunities. This has
implications for teachers of music. School B's musical practices concentrated on theory and
instrumental teaching. This suggests that if a school chooses to
incorporate music into their lessons, then steps should be taken to ensure that the education is
well-balanced, and does not simply emphasise theory and performance.
Further implications for classroom teaching of music can be drawn from these findings. They suggest
that grouping can positively influence the work atmosphere by allowing individual group members to
share and develop ideas, which may consequently lead to personal learning, and possibly group
success. These findings suggest that this is indeed a topic deserving of further investigation.
REFERENCES
Aboud, F. E., & Mendelson, M. J. ‘Determinants of Friendship selection and quality: Developmental perspectives’. In
Bukowski, Newcomb, and Hartup (eds.) ‘The company they keep: Friendship in childhood and adolescence’.
Cambridge University Press, 1996.
Aries, E. ‘Interaction patterns and themes of male, female and mixed groups’ Small Group Behaviour 7: 7-18, 1976.
Bales, R. ‘Interaction Process Analysis: A Method for the Study of Small Groups’ Chicago, University of Chicago
Press, 1950.
Barnes, D., & Todd, F. ‘Communication and Learning in Small Groups’ London, Routledge and Kegan Paul, 1977.
Bennett, N., & Cass, A. ‘The Effects of group Composition on Group Interactive Processes and Pupil Understanding’
British Educational Research Journal, 15 (1), 19-32.
Galejs, I. ‘Social Interaction of preschool children’ Home Economics Research Journal, 2, 153-159, 1974.
Galton, M., and Williamson, J. ‘Group work in the Primary Classroom’ Routledge, 1992.
Hargreaves, D. J., Galton, M. J., & Robinson, S. ‘Teachers’ assessments of primary children’s classroom work in the
creative arts’. Educational Research, 1996, 199-211
Hallinan, M. T., ‘Patterns of Cliquing Among Youth’. In Foot, Chapman, and Smith ‘Friendship and Social Relations
in Children’, John Wiley and Sons Ltd., 1980.
Jones, M. G., & Gerig, T. M. ‘Ability Grouping in Classroom Interactions’ The journal of classroom interaction, 1994,
27-33.
Kruger, A. C. ‘The Effect of Peer and Adult-Child Transactive Discussions on Moral Reasoning’ Merrill-Palmer
Quarterly, pp. 191-211, 1992.
Kruger, A. C. ‘Peer Collaboration: conflict, cooperation, or both?’ Social Development, 1993, 165-180.
Kruger, A. C., & Tomasello, M. ‘Transactive discussions with peers and adults’ Developmental Psychology, 22,
681-685, 1986.
Mabry, E. A. ‘The Effects of Gender Composition and Task Structure on Small Group Interaction’ Small Group
Behaviour, 1985, 16 (1), 75-96.
Morgan, L. A., Hargreaves, D. J., & Joiner, R. W. ‘How do children Make Music? Composition in Small Groups’
Early Childhood Connections, 1997-98.
Plummer, A. D., & Dudley, P. (project leaders) ‘Assessing Children Learning Collaboratively’. Essex Development and
Advisory Service, 1993.
Seashore, S. E. ‘Group cohesiveness in the industrial work group’. University of Michigan Press, 1954.
Vygotsky, L. ‘Mind in society: The Development of Higher Psychological Processes’ Cambridge, Mass: Harvard
University Press, 1978.
Back to index
Proceedings abstract
A. Penel
penel@psycho.univ-paris5.fr
Background:
On the one hand, the process of sequential grouping, the use of temporal
regularity, and the possible incorporation of these into hierarchical
organizations have been demonstrated with simple auditory sequences. On the
other hand, Western tonal music presents two related structural components,
hierarchical grouping and hierarchical metrical structure, and the functioning
of these hierarchical organizations has also been demonstrated in the case of
music.
Aims:
Main contributions:
Implications:
Back to index
Proceedings paper
http://www.acad.carleton.edu/curricular/MUSC/faculty/jlondon/index.htm
1. Some preliminaries.
There is a growing body of work in the philosophy of music and musical aesthetics that has
considered the various ways that music can be meaningful: music as representational (that is, musical
depictions of persons, places, processes, or events); music as quasi-linguistic reference (as when a
musical figure underscores the presence of a character in a film or opera), and most especially, music
as emotionally expressive. Here I will focus on the last topic, for I believe it will be useful for
researchers in music perception and cognition to avail themselves of the distinctions that aestheticians
have worked out regarding the musical expression of emotion.
Now we often say that music is "expressive," or that a performer plays with great expression, but what
exactly do we mean? There are at least two things one may be saying. First, one may be praising a
performer for their musical sensitivity, that he or she has a keen sense of just how a passage is
supposed to be played. Such praise is often couched in terms of the performer's "musicality" (in
statements that border on the oxymoronic, as when one says that a performer plays the music very
musically). Such praise may also be couched in terms of expression--i.e., that a performer plays
"expressively." I have little to say about these attributions, save that they are often linked to the
second thing one often means when speaking of the music or a performance being expressive: an
expressive piece or performance is one that recognizably embodies a particular emotion, and indeed
may cause a sympathetic emotional response in the listener. Thus if one plays "expressively," this
means that the music's particular emotional qualities--its sadness, gaiety, exuberance, and so forth, are
amply conveyed by the performer.
Before going further, a number of other preliminary remarks are in order. When we speak of the
expressive properties of music, these are distinct from the expressive properties of sound. Sounds may
be loud, shrill, acoustically rough or smooth, and so forth. These acoustic qualities have expressive
correlates and may trigger emotional responses, and of course one cannot have music without sound.
But musical expression is more than this: it requires the attention to the music qua music, rather than
as mere sounds. The opening "O Fortuna" of Carmina Burana may shock (and indeed scare) the
listener due to its sudden loudness (especially when the bass drum starts whacking away), but this
shock isn't a musical effect--we get the same reaction when we hear a sudden "bang" at a fireworks
display or when a car backfires. By contrast, in hearing the opening of Mozart's 40th symphony as
having a quality of restless melancholy, one attends to both the musical syntax and its sonic
embodiment.
Another caveat: as Hanslick has noted, at times a musical work may arouse feelings in the listener
through ad-hoc associations. In other words, one must be on guard for the "they're playing our song"
phenomenon. These associative properties may be quite strong, and can operate in marked contrast to
the innate expressive qualities of a given piece, as in the paradigmatic case of a happy piece that
arouses sadness because it reminds the listener of a lost love or deceased friend. As will be noted in
some detail below, context plays a pivotal role, and here context can include not only genre, but
extra-musical information such as lyrics, the image track of a film score, and literary programs. I take
it, however, that a primary interest for researchers in the perception and cognition of musical
expression will be in the intrinsic expressive properties of the music itself.
Finally, in philosophical discussions of meaning and expression, there is usually what might be called
"the inter-subjective agreement requirement." Here is an example from visual art. If I show you a
picture of a man on a horse, and you and everyone else says "that's a man on a horse," this confirms
that the picture is a successful representation of a man and a horse. Moreover, I don't have to give you
any cues or hints regarding its representational subject. By the same token, in order for a piece of
music to be "an expression of emotion X" (or "expressive of X") there must be broad consensus
among listeners that the music expresses X, a consensus arrived at without any extra-musical
prompting. One problem for accounts of musical expression is that such inter-subjective agreement
often does not happen: one listener says a given piece is an expression of anger, while another says it
expresses hate, another jealousy, and yet another sinister passion. What emotion does this piece
express? While anger, hate, jealousy, and sinister passion are related emotions, the piece nonetheless
fails to individuate any one of them in particular. Musical expression is plastic enough so that the
same passage might be expressive of a wide variety of emotional states.
2. Simple Emotions, Higher Emotions, and Moods
In the late 19th century Eduard Hanslick famously denied that music had any ability to express
emotions, and many 20th century aestheticians (and composers, most notably Stravinsky) held this to
be true. Why would one take up such a counter-intuitive view? Well, philosophers often take up
counter-intuitive views, and if you are a philosopher, there are two problems to be surmounted if one
wants to claim that a piece of music expresses a particular emotion. The first is the "who" problem:
whose emotion is being expressed? Emotions are felt by living, sentient creatures, and as Malcolm
Budd has noted, "It cannot be literally true that [a piece of] music embodies emotion, for it is not a
living body" (Budd, Music and the Emotions, p. 37). One is thus tempted to claim that a piece of
music is an expression of its composer's emotion. But when one examines the compositional history
of most works this claim also falls apart, for composers often write sad music, for example, even
when they feel no particular sadness (as in the case of the Funeral March from Beethoven's "Eroica"
symphony). Nor are they in the throes of sadness during the entire course of composing a piece of
music, since the compositional process may last weeks, months, or even years (see, for example, the
Adagio Mesto movement of Brahms' trio for horn, violin, and piano, which is purported to be an elegy
to his mother). Thus if pieces of music are expressions of emotion, they are disembodied, and usually
disconnected from any particular "emotional cause" in the life of their composer.
The second problem is the "why" problem: emotions typically require what are referred to in
philosophical parlance as intentional objects, that is, particular people or events that play a causal role
in triggering an emotional state. Thus we are jealous of a particular person, frustrated at a particular state
of affairs, feel grief at the death of a particular friend or relative, and so forth. One does not, for
example, feel "jealous" in general (though one may have a disposition toward jealousy).
Not all emotions are like jealousy and frustration, as some do not always require intentional objects.
While one can be sad due to a particular event, one also can be generally sad, for example, and such
sadness is not dependent upon any particular person, state of affairs, and so forth. As Colin Radford
has pointed out, "not all emotions, or occasions of emotion are rational, i.e., they are not informed by,
explained and justified by appropriate beliefs [that is, intentional objects]" ("Muddy Waters," p. 249).
Radford also explicitly acknowledges that "we naturally call such feelings 'moods.'" ("Muddy
Waters," p. 250). Thus there is a distinction between higher emotions (which require an intentional
object) and simple emotions and moods, which may not or do not require one.
There is now general consensus that music can express moods and simple emotions, contra Hanslick.
Some aestheticians, most notably Jerrold Levinson, have claimed that in some musical contexts music
can do more, in that it is capable of mimicking the characteristic "look and feel" of at least some of the
higher emotions (see Music, Art, and Metaphysics, chapter 14).
3. How music expresses emotions I: Cognitivism
But just how does music express simple emotions? There are two main points of view on this
question. The first, developed (and much defended) by Peter Kivy, is known in philosophical circles
as "cognitivism" or "cognitivist" theories of musical expression. The second, one with a long
historical pedigree, can be termed "emotivist" or "arousal" theories of musical expression. Taking up
the cognitivist charge, Kivy has repeatedly denied that music really arouses what he has termed the
"garden varieties" or real-world instances of sadness, happiness, anger, and other simple emotions in
the listener (though music may move the listener through its sheer beauty). For even simple emotions,
when fully aroused, usually relate to an intentional object. Thus if we say that a piece of music makes
us sad or angry, what exactly are we sad or angry about--the music? ("that damn Symphonie
Pathetique!") or its composer? ("that damn Tchaikovsky!"). And as has already been noted, a piece that
seems expressive of happiness may actually trigger sadness due to extra-musical associations.
For the cognitivist, the expressive properties of music are properties intrinsic to the music, and not, to
quote Kivy, "dispositions to arouse emotions in [the] listener" ("Feeling the Musical Emotions," p. 1).
Kivy takes this position from O. K. Bouwsma, but he also acknowledges psychological antecedents
for this view, in particular Charles Hartshorne's The Philosophy and Psychology of Sensation (1934),
and Kivy cites Hartshorne's observation that 'Thus the "gaiety" of yellow (the peculiar highly specific
gaiety) is in the yellowness of the yellow' (see ibid., note 2, p. 1). In making this move, one allows
that music that is expressive of sadness need not make the listener sad.
How exactly does music then express emotions if not by arousing them in the listener? Here Kivy,
Levinson, and many others would agree with this explanation given by Malcolm Budd (who takes this
view in large part from the music psychologist Carroll Pratt): "music can be agitated, restless,
triumphant, or calm since it can possess the character of the bodily movements which are involved in
the moods and emotions that are given these names" (Music and the Emotions, p. 47). Likewise Kivy
develops a "physiognomy of musical expression" and thus claims that music is expressive of these
basic emotions by its resemblance to human utterance and behavior. Music thus distills certain aspects
of human expressive behavior, especially that of the voice, and renders those aspects into dynamic
musical shapes. Levinson's claim that music can express some higher emotions (such as hope) is
based on the claim that some higher emotions have characteristic physiognomies that can be musically
portrayed (see ibid.).
Note, however, on this view that in order for the contours of a musical phrase to express an emotion,
one must recognize that this "musical utterance and behavior" is akin to other, non-musical utterances
and behaviors. Thus musical expression is mediated through our understanding of social behavior in
general, and what might be termed a knowledge of "social musical behavior" in particular. It is for this
reason that one may mistake musical expressions in an alien musical culture, not because we do not
know the musical language, but perhaps primarily because we do not know the normative social
behaviors onto which the musical gestures may be mapped.
To sum up so far: the "cognitivist" theory of emotional expression in music says that a piece of music
expresses a particular emotion if a suitably grounded listener is able to recognize that emotion in the
musical structure by analogy to human social behavior, but she need not assume that this emotion was
felt by the composer (or is felt by the performer), nor does the listener have to experience that emotion
while listening.
4. How music expresses emotions II: Emotivism
For many other aestheticians, cognitivism is a necessary but insufficient account of emotional
expression in music. As Jenefer Robinson has noted, not only does music frequently express
emotional qualities, it also frequently affects us emotionally by evoking or arousing emotions in the
listener ("The Expression and Arousal of Emotion in Music," p. 13). But what kinds of feelings does
music arouse? Are they the same as our "ordinary" emotions, or are they special "musical versions" of
emotions? And what is their relationship to our understanding of musical expression?
A common tack taken by a number of philosophers has been to claim that music arouses our
emotions, but in a special way. For Kendall Walton, who approaches all kinds of aesthetic experience,
and not just music, as a special kind of imaginative activity, expressive music "evokes the imaginative
experience of the emotion expressed: more precisely, music expressive of sadness, say, induces the
listener to imagine herself experiencing sad feelings" (this cogent summary of Walton is from Jenefer
Robinson, op. cit., p. 18). In other words, for Walton our emotions aren't really aroused, but we
imagine they are. For Stephen Davies and Jerrold Levinson, expressive music really does arouse the
listener's emotions, but emotions of a greatly attenuated kind--"sadness lite", for example. As Kivy
has noted with respect to their theories, such emotional arousals "must be weakened, . . . because they
do not have the power to make us behave the way those emotions would do in ordinary
circumstances" ("Feeling the Musical Emotions," p. 11). For Kivy, champion of cognitivism, this is
inadequate. We do not have imaginary or stunted emotional responses when we listen to expressive
music, but real, full blown feelings--albeit feelings of a special kind. For Kivy, what moves us is sheer
musical beauty, and this beauty may be emotionally individuated: "Sad music emotionally moves me,
qua sad music, by its musically beautiful sadness, happy music moves me, qua happy music, by its
musically beautiful happiness, [and so on]" ("Feeling the Musical Emotions," p. 13). For all of these
philosophers, however, what we do not experience when we listen are ordinary feelings of sadness,
happiness, serenity, or so on. Musical emotions are always of a different order.
Jenefer Robinson takes a different approach, one that tries to avoid making musical expression a
special case. She considers most carefully what we really do feel when we hear expressive music, and
then what we make of those feelings: "As I listen to a piece which expresses serenity tinged with
doubt, [for example], I myself do not have to feel serenity tinged with doubt, but the feelings I do
experience, such as relaxation or reassurance, interspersed with uneasiness, alert me to the nature of
the overall emotional expressiveness in the piece of music as a whole" ("The Expression and Arousal
of Emotion in Music," p. 20). Robinson takes care to note that "the emotions aroused in me are not the
emotions expressed by the music" (p. 20), and so for her it is not simply that sad music arouses
sadness. Rather, our basic feelings--or perhaps "reactions" is a better term--of tension, relaxation,
surprise, and so forth, are combined with our awareness of the musical gesture and syntax, and
through this combination we gain a sense of what emotion(s) a piece may express.
5. Conclusion and implications for research
As is now clear, while philosophers of music generally agree that music can express at least some
emotions, there is much disagreement as to which particular emotions can be expressed, whether or
not such expression depends upon arousing an emotional response in the listener, and if so, what kind
of feelings exactly music does arouse. Nonetheless, philosophical discussions of musical expression
have a number of implications for research in music cognition and perception.
1. Musical expression always involves sonic properties, and to things like loudness and
roughness I would add the rhythmic properties of sounds (as indicative of coordinated
movement, spatial location, and so forth). Moreover, alterations to the "sonic" properties
of a musical passage may be made without changing its basic melodic or harmonic
structure--the same melody and accompaniment played high, fast, and loud may convey a
vastly different expressive character from its low, slow, soft version (the locus classicus
of such variations is the various presentations of the idée fixe in Berlioz's Symphonie
Fantastique).
2. If one uses "real world" musical stimuli, especially well-known repertoire, one will
often be faced with "associative interference," as one cannot control the contexts in
which subjects have first heard and come to know such repertoire. Therefore in many
cases newly composed or otherwise unfamiliar musical stimuli may be preferable, as they
circumvent such interference.
3. While music alone may only express a garden-variety emotion, such as anger, that
same music in a richer semantic context may be properly heard as an expression of
jealousy or hate. Different visual and/or linguistic cues will give different expressive
results. Moreover, a level of musical activity that is most apt for one particular emotion
may be inapt for another. For example, a passage that expresses "anxious anticipation"
very well will not be made more expressive by making it louder, faster, and so forth.
There isn't a simple linear relationship between musical parameters and the robustness of
an emotional expression.
4. Some perfectly good musical expressions of emotion may not arouse those emotions
(or much of anything, for that matter) in the listener. Yet it would be incorrect to call
such passages "inexpressive."
5. Any emotions that are aroused by listening to music, while perhaps similar to "real"
emotions that occur in non-musical contexts, nonetheless have important differences.
Even if context provides an intentional object for an emotion, transforming a yearning,
longing passage into an expression of hope (to take an example from Levinson), it is not
at all clear that the listener should feel hopeful, what she should be hopeful about, and so
forth. Moreover, such hope (and its emotional stimulation) is commingled with other
aesthetic properties--balance, beauty, intensity, coherence--and those properties may (and
most certainly will) also stimulate affective responses of their own.
Works Cited
Budd, M. (1985, 1992). Music and the Emotions: The Philosophical Theories. New York,
Routledge.
Davies, S. (1994). "Kivy on Auditors' Emotions." The Journal of Aesthetics and Art Criticism
52(2): 235-36.
Goldman, A. (1995). "Emotions in Music (A Postscript)." The Journal of Aesthetics and Art
Criticism 53(1): 59-69.
Graham, G. (1995). "The Value of Music." The Journal of Aesthetics and Art Criticism 53(2):
139-53.
Kivy, P. (1989). Sound Sentiment. Philadelphia, Temple University Press.
Kivy, P. (1993). "Auditor's Emotions: Contention, Concession and Compromise." The Journal
of Aesthetics and Art Criticism 51(1): 1-12.
Kivy, P. (1994). "Armistice, But No Surrender: Davies on Kivy." The Journal of Aesthetics and
Art Criticism 52(2): 236-37.
Kivy, P. (1999). "Feeling the Musical Emotions." British Journal of Aesthetics 39(1): 1-13.
Levinson, J. (1990). Music, Art, and Metaphysics. Ithaca, Cornell University Press.
Martin, R. L. (1995). "Musical 'Topics' and Expression in Music." The Journal of Aesthetics
and Art Criticism 53(4): 417-24.
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago, University of Chicago Press.
Radford, C. (1989). "Emotions and Music: A Reply to the Cognitivists." The Journal of
Aesthetics and Art Criticism 47(1): 69-76.
Radford, C. (1991). "Muddy Waters." The Journal of Aesthetics and Art Criticism 49(3):
247-52.
Robinson, J. (1994). "The Expression and Arousal of Emotion in Music." The Journal of
Aesthetics and Art Criticism 52(1): 13-22.
Back to index
Proceedings paper
In a recent book (Addis, 1999), I developed a theory of emotion in music that, in the spirit of Susanne
Langer’s work of midcentury (Langer, 1942), maintains that passages of music represent emotions to
the listener. On my version, this fact about music and humans rests on certain facts of our human
nature and on what I call the ontological affinity of consciousness and sound. In short, that music does
represent emotions and possibly other states of consciousness to us is grounded in our being the kind
of species we are and the basic natures of mind and music. The representation, therefore, is not purely
conventional like that of language, nor is it purely natural like that of thought itself. But because it is
more nearly natural than conventional, I describe it, somewhat infelicitously, as a quasi-natural
representation.
These two features--ontological affinity and quasi-natural representation--are, or may be, sufficient to
account for the power of music in human affairs. But I also follow Langer in holding that, as a further
aspect of its power, music is able to represent--or "express," as might be the more natural word in this
context--certain aspects of reality that language cannot express. More precisely put, we may say that
there are some nuances or subtleties of emotion that music can, but language cannot, capture. This is
sometimes called the ineffability thesis even though, strictly speaking, that is a theory only about what
language cannot do, not one about what music can do. Langer’s version of the ineffability thesis is
much more radical than mine in that she holds that there are forms of states of affairs in the world
itself that are in principle unrepresentable in language because they are not of the subject/predicate
form. My much milder version is only that, while all states of affairs including those involving
emotion are of the subject/predicate form, no language contains the predicates (or a means of
generating the predicates) that would be adequate to the expression or description of all the subtle
differences of mood and emotion that humans are capable of experiencing or conceiving. Nor is it that
music can, even in this domain, do everything that language cannot do, but only that it can do some of
what language cannot do.
A friendly reader and critic, Bruno Repp, while not necessarily disputing my conclusion, has
suggested that I have not yet made my case with respect to the limits of language as compared to
music. My failure, he says, derives from the fact that (and here I quote from a private communication,
with his permission) "whenever you refer to language in relation to music, with regard to ineffability
for example, you ... seem to have in mind the grammatical and semantic properties of language, even
when referring to poetry. There is no reference to the prosody of spoken language--its rhythm,
intonation, volume, timbre, tempo, and so on. It seems to me that there are close analogies between
language prosody and music, but you did not touch at all on these analogies in your arguments"
(Repp, 1999).
To a failure to consider matters I should have considered, I do plead guilty. But assuming that the
results I would have come to then had I considered the relevant analogies and the properties of
language that Repp mentions are the same as those I shall arrive at in this paper, my conclusions
would have been unchanged. At the same time, consideration of these matters will, if I am successful,
enrich our understanding of the multiple ways in which emotion "connects" (to use an intentionally
vague word) with both language and music. And I also want to use the occasion to speak of some
broader issues concerning what language is in its spoken and written forms, what analogies there may
be between reading and performing, and other matters. I shall eventually conclude that there are three
major ways in which emotion connects with both music and language, but that their relative
importance to each other is quite different between language and music.
Let us begin by fixing firmly in mind a distinction that is crucial to the understanding of both
language and music but which is frequently ignored and sometimes even denied, at least for music.
Consider the following passage that might appear in a probably very bad short story: "Upon reading
the editor’s letter of rejection, Ernest cursed, crushed the letter in his hand, threw it into the fire, and
burst into tears." The emotions represented in this passage are clearly disappointment and anger, but
the emotions you felt on hearing and understanding the passage were more likely mild amusement,
possibly slight pleasure, or maybe just indifference. The point is, of course, that the emotions
represented by the passage are not the same emotions as those felt by the listener. Exactly the same is
true of music. The emotions I feel when I hear and understand sad music may be--if, for example, my
daughter is the performer and doing it well--pride, happiness, and excitement. I recognize the music as
expressive of sadness, but I certainly do not feel sad in so doing. We have before us, then, the
fundamental distinction between the emotions that a given piece of music expresses or represents on
the one hand and the emotions that the music, as heard by a particular person on a particular occasion,
arouses in that person. My theory, like Langer’s, is primarily about how music expresses, and not how
it arouses, emotion. Still, we now have before us two of the three major ways in which music and
language connect with emotion: by expressing it and by arousing it. But before we come to the third
connection, let us consider the nature of language and its relation to emotion.
Language is at once one of the most familiar and one of the most recondite of phenomena for human
beings. Language is an intimate part of our lives, playing its role in most of our waking hours in either
spoken or written form, and is the primary mode of communication and perhaps of interaction
between and among the members of our species. What could be "closer" to us than language? Putting
aside the fascinating intricacies and mysteries of language learning and the role of language in our
evolutionary heritage, I want to attend momentarily to the seemingly mundane question of what it is
to be a word. The simple, tempting answer is that it is to be a sound in the case of spoken language
and a shape in the case of written language; and for many purposes that answer, false though it be, is
adequate. That it is a false answer, or at least an incomplete one, can be seen immediately by asking
ourselves why it is that certain sounds and shapes are, but others are not, words. The answer, of
course, is that the privileged ones are the ones that have meaning; and if we ask what it is for a sound
or a shape to have meaning, we see that in no case is it some intrinsic property of the sound or shape
but instead something in us, and essentially so. Thus a word is, in some way or other, a combination
of publicly observable sound or shape and some feature of human consciousness. Simply as sound or
shape, nothing is a word, that is, a part of language. What I want to stress here is that the notion of
language apart from consciousness is unintelligible; nothing is language or a part of language except
as including consciousness in some way.
Now let us turn to music again and consider briefly what a score is. Not all music, probably not even
most music that humans have ever produced, involves scores just as most instances of spoken
language have no prior written form. We needn’t ponder the many senses of the word ‘music’ to agree
that in probably its most important sense, music is sounds of a certain sort. In that sense, a score is not
music; and its creator, the composer, does not create music. Speaking in this way, we might better say
that the composer creates instructions for producing instances of a piece of music much as the person
who writes down a recipe for a new soup provides instructions for making instances of a soup
without, thereby, making soup. When the production of music does result from the prior construction
of a score--that is, a physical object that, according to certain conventions "tells" the performer to
produce certain sounds--then music is, as Nelson Goodman puts it, a "two-stage" art (Goodman,
1976); and we recognize the creators of both stages--composer and performer--as artists, each with a
creative task to be undertaken. We might note that what is achieved by the use of scores could in
principle be accomplished by written or even spoken language, for there is nothing in what a score
"tells" a performer that cannot be said in language. Keeping this fact in mind will diminish the
temptation to think of a score as music in any literal sense.
What I have just said about scores may be obvious and uncontroversial, but I want now to argue
something that is by no means uncontroversial--that what we call written language is not really
language at all in just the sense that a score is not really music. I have long harbored the view or
suspicion that not just drama but also poetry and even the novel in the literary arts are best conceived
as performing arts (maybe all arts are performing arts!) not because I believe with the
deconstructionists and other postmodernists, as I most emphatically do not, that there is no fact of the
matter with respect to meaning, but because a poem or a novel exists as such, that is, as something
with meaning, only when it is being read or recited. The book as physical object is not a novel--it is
just ink shapes on paper--but, instead, instructions for creating the novel which exists only when
someone is in the act of reading or reciting the novel. There are also the limiting cases of reading
silently and reciting in one’s mind which are analogous, respectively, to going through a score and
hearing the music in one’s mind, as we say, and running a piece through one’s mind. These activities
are extensions of speaking language and performing music, not of written language and score, and it is
only spoken language, including sign language, that is literal language just as sounds alone are literal
music.
If we do include reading to oneself, silently or not, and hearing music in one’s head as limiting cases
of speaking and making music, then the sonata and the poem may already have existed when the
composer and the poet were putting ink marks on paper. Be that as it may, we can affirm that nothing
is taken away from the significance of the novelist or poet, any more than from the composer, if we
insist that in a certain sense all they do, at least publicly, is to put certain shapes on paper or to cause
them to be so put. For in doing so, they make possible, through elaborate conventions, for certain art
objects, literary and musical, to come into existence, and that is achievement of the highest order for
our species.
No score is a piece of music; no book (as physical object) is a novel. Music and novels exist only as
inherently temporal objects, as sounds or the auditory images of sounds and the meanings that attach
to those sounds, either conventionally or naturally. Thus I come to the intermediate conclusion,
perhaps already vaguely known to us, that the issues of the connection of emotion to music and
language pertain only, or almost only, to music as sounds and to language as spoken, not to music as
score or language as written. And with that we must turn now to those issues directly.
Suppose, to modify our earlier example, that Ernest is not a character in a story but someone whom a
friend of yours likes very much but whom you, secretly, intensely dislike. Having just observed
Ernest, your friend says to you, in a tearful voice, that upon reading the letter from the publisher,
Ernest cursed, crushed the letter and threw it into the fire, and burst into tears. In this case, as before,
we can distinguish the emotions represented by the content of those words--the disappointment and
anger of Ernest--from the emotions aroused in the hearer of the words. The latter, because you dislike
Ernest and also consider him to be a terrible writer, we may imagine to be pleasure and satisfaction.
Thus again we have our distinction between emotion expressed or represented and emotion aroused.
But what about the emotion of the speaker? When I described your friend’s voice as a tearful
one--alluding thereby to its pace, volume, pitch, intonation and whatever other such properties make
for a tearful voice--I was indicating that the speaker was expressing some emotion--sadness, we may
gather--by the manner in which those words were uttered. Notice that I have used the language of
expression in two of the three aspects, and we must understand clearly that the sense in which the
words expressed Ernest’s anger is very different from the sense in which they expressed your friend’s
sadness. For your friend felt sad and that feeling was manifested in how she spoke. But words, either
as sounds or as sounds plus meaning, don’t feel anything. So let us drop the language of expression
and speak simply, as I already have, of emotion represented (Ernest’s anger), emotion aroused (your
pleasure), and emotion manifested (your friend’s sadness). And notice, before we turn again to music,
that those features of language to which Repp called attention and which I ignored in the book
contribute not to the emotion represented by the speaker’s words but to the manifestation of the
speaker’s own emotion. And of course it is extremely important in human affairs to be able to discern
another person’s emotional state from that person’s manner of speaking, the inability to do so being
one of the main symptoms of autism.
How, if at all, does this three-fold distinction apply to music? If it does apply, is it in a way similar or
analogous to that of language? When an orchestra performs the funeral march of Beethoven’s 3rd
Symphony, sadness is being represented. What emotions are aroused in the listeners--happiness, pride,
nostalgia, envy, maybe even also sadness--will depend on the particular listener’s circumstances. Is
there also emotion manifested, presumably by the performers? Does it matter, insofar as this is a
performance of music? The power and accuracy of the representation of emotion by the music will
vary from performance to performance as those performances themselves differ. And the relevant
differences in performance will, obviously, depend mostly on properties of the performer, although
also on the nature of the instruments, the temperature and humidity, the acoustical properties of the
place of performance, and other factors. Of the factors that are properties of the performer, some will
be that person’s emotional states. Thus, some differences in performances are due to differences in
emotional states of performers; and all performances, we may surmise, depend in part on performers’
emotional states, at least in the sense that if their emotional states were radically different from what
they are, the performances would be different.
But does the performance manifest the performer’s emotional state? That is, can a normal or typical
listener, whatever that might exactly be, make a reasonable conjecture as to the performer’s emotional
state from the character of the performance, much as you could discern your friend’s sadness from the
character of the utterance of her words to you? If you know the performer well, and especially if you
include visual as well as auditory properties as those of the performance, you may well be able to
discern the performer’s emotional state with some measure of reliability. But if we know nothing of
the performer and must rely only on what we see and hear in the performance, and especially if only
on what we hear (which is, arguably, all that really constitutes the performance), we can tell very
little; if we do make conjectures, we may learn later that they were often mistaken.
But now I ask again whether or not the manifestation of emotion or even its existence in the performer
is of any musical significance? It is one of the leftover myths of Romanticism that the important thing
about musical performance is that the performer is expressing him- or herself in performance. It is a
myth, not because it is false that the performer is expressing in performance, but because it is trivial
and irrelevant. It is trivial because almost everything a person does voluntarily is an expression,
intentional or not, of some aspect of oneself. It is irrelevant because the value and meaning of the
music and its performance lies in its observable characteristics and not in how it came about; it is only
a causal question of how the inner states of the performer were relevant to the observable
characteristics of the performance and only a biographical question as to what those inner states were.
Yet the myth is so powerful that most people, including most musicians, are unable even to conceive
any way to think of music except as expression by the performer whereas before the eighteenth
century almost no one would even have understood the idea of music as personal expression. That fits
well with--indeed, it partly constitutes--not only the ideals of Romanticism that continue to plague our
musical culture but also with the radical individualism of our ideological culture.
Be all that as it may, the lesson I wish to draw from these reflections is that while the phenomenon of
manifestation of emotion certainly occurs in the performance of music, it is essentially irrelevant to
grasping the character and meaning of the music and to the realm of music generally. Again, as a
matter of causal fact, what comes out of us depends on what is inside us including our emotional
states. But music must never be understood as about the emotions of the performer (except per
accidens) nor, except in the trivial sense, as an expression of those emotions. We rightly praise the
composer and the performers for their role in bringing about music; but what emotional states they
happened to be in while exercising their roles is only of biographical, never musical, interest.
If we now ask why there should be such an asymmetry between music and language with respect to
the manifestation of emotion, the easy and obvious answer is that in the typical use of spoken
language, the state of mind of the speaker is just what the listener is presumably interested in--the
speaker’s beliefs, intentions, values, and emotional states. The content of what is said goes mainly to
belief, the manner of speech mainly to emotional state, and intentions and values somewhere in
between, so to speak. But in music, what corresponds to manner of speaking--pace, intonation,
volume, pitch, and so on--are themselves part of the content and therefore of the meaning of the
music. This fact also explains why it is so difficult, in observing a performance, to distinguish the
properties of it that may be manifestations of the performer’s emotions from those that, intentionally
or not on the part of the performer, contribute to the emotion represented by the music and why,
except in extreme cases, we are ordinarily entitled to take the properties of the performance as
representing and not manifesting emotion.
At this point someone may object that the relevant analogue in language to the performance of music
is not everyday conversation of the sort of my example but instead what are also regarded as
performances such as public readings of poetry. My answer is that, to the extent to which in such
cases the manner of speech does contribute to the meaning of the poem being read, it is a musical
performance. The Greeks, if I may pretend to be learned for a moment, characterized any art in which
the Muses presided as mousikē technē, which is the origin of our very word ‘music’. But perhaps I
can soften the seemingly stipulative character of this reply by observing that, in this highly atypical
use of spoken language, it is very difficult, as in musical performance, to disentangle which aspects of
the reader’s utterances are to be taken as part of the content of the poem and which the manifestation
of the reader’s emotional states while, unlike the case of music, it becomes very important to do so.
But nothing important rests on where exactly, in what we now may take to be something of a
continuum between what we call language and what we call music, we place poetry reading, acting,
lying, ceremonial speaking, and all of the other many ways in which we use language. But we may
certainly say, and even stress, that when a manner of speech contributes more to the meaning of the
words than to the manifestation of the speaker’s emotional state, we have a special use of language
that, in a very important respect, is more like music than the typical use of language.
We return finally to the matter of ineffability and my claim that music can represent to us some
differences of emotion and mood that language cannot. Repp correctly pointed out that I had ignored
the prosody, in a broad use of that word, of language in making my claim and wondered, at least
implicitly, whether or not taking it into account would tell against the claim of ineffability. What we
now see, if I am correct, is that while those prosodic features do have a connection with emotion in
the use of language, they are typically that of manifestation rather than representation of emotion.
Because of this fact and because, as we also saw, the prosody of music does contribute primarily to its
content and therefore its meaning, the case for the power of language to represent anything music can
represent is not strengthened by taking those features of language into account. This, obviously, does
not prove that music actually can represent something that language cannot, but it removes one
apparent consideration to the contrary.
In fact, I have not here made any substantive argument that music really represents anything at all,
much less that what it represents is mood and emotion. Those arguments are in the book. But perhaps
enough has been said to conclude that in whatever the meaning of music is to be found, it is not in the
emotions of either the listeners to, or the performers of, music. This may sound odd, even
counterintuitive, given the way most people casually think and talk about music. But I am convinced,
and hope in the foregoing remarks partly to have demonstrated, that the understanding of music and
its power in human affairs requires that we look beyond the comparatively superficial facts of the felt
emotions of the participants in music to something that is represented to us by music.
REFERENCES
Addis, Laird (1999). Of Mind and Music. Ithaca: Cornell University Press.
Langer, Susanne (1942). Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art.
Cambridge: Harvard University Press.
Goodman, Nelson (1976). Languages of Art: An Approach to a Theory of Symbols. Indianapolis:
Hackett.
Repp, Bruno (1999). E-mail message to Laird Addis.
Proceedings paper
1. Introduction
Our ability to parse the acoustic array into its many meaningful sound objects relies on our capacity to process sensory information. Variables that influence the perception
of sonic entities in everyday life can be broadly organized into three sets. The first comprises subject-dependent factors such as attention, memory and knowledge, as well as
gestures and postures. The second comprises environment-dependent factors such as masking and the interference that can arise from interactions among different sonic events. The
third includes the cognitive correlates of the physical parameters that define each sonic event.
The possibility of interacting with these variables helps us respond to specific tasks in cognitive processing. We can modify our attention or posture, and we
can move our body so as to optimize hearing conditions. In the same way, we can change cognitive strategies in order to fulfil different needs and
improve our adaptability to different situations. Under certain circumstances we can improve perception by partially modifying the acoustic environment around the
sonic event in order to attenuate the interference from other events. Finally, we can directly alter the sonic event itself by changing some of its parameters.
By means of these actions we can affect the segmentation of the sonic surface into cognitive units useful for further cognitive processing, regardless of cultural
differences (Dowling & Harwood, 1986). Such units interact as structuring forces in the processes underlying the definition of the cognitive correlates of a
sonic event.
A typical example of partial modification of sound parameters can be found in musical practice, when musicians gradually modify the result of their sight-reading by
adding nuances and micro-variations to their performance. The role of such variations has been studied in several computer models of musical performance
(Friberg & Sundberg, 1995). Generally, these models yield more convincing results for tonal than for non-tonal music. The weaker results for several twentieth-century
compositions may be linked to the lack of coherence between the structuring rules defined by the composer and the strategies listeners apply during the
performance of the composer's work (Imberty, 1996).
To address this issue it may be useful to rely on concepts that are applicable in any acoustic context rather than on notions linked to specific musical
grammars (e.g. the link between harmony and tonal music).
Within this framework, the aim of this study is to investigate the concepts of continuity and discontinuity. The interest in these categories is twofold.
1. Continuity and discontinuity may be considered as perceptual qualities that derive from variations on many of the dimensions of a sonic event.
2. A Continuity/Discontinuity framework
Several studies stress the relation between variation on certain dimensions of a sonic event (frequency, amplitude, duration) and listeners' ability to organize the
acoustic surface into meaningful "chunks" or segments.
Segmentation of a sonic event has been linked to amplitude (Fraisse, 1974; Vos, 1977; Gabrielsson, 1973), frequency (Tekman, 1998), duration (Vos, 1977)
and timbre variations (Dowling, 1973).
Just as pitch and loudness are perceptual correlates of frequency and amplitude, so discontinuity and continuity may be the perceptual correlates of the
presence or absence of perceptible variations on these or other dimensions.
From this point of view, therefore, subjects' ability to extract cognitive units during listening may be related to their ability to detect
discontinuities and continuities on multiple acoustic dimensions.
As a consequence of their cognitive nature, continuities and discontinuities may also emerge in the absence of any notable acoustic variation. While listening to a regular
sequence of pulses, listeners often add discontinuities that lead to the perception of "strong" and "weak" pulses. Moreover, discontinuities on one dimension may
emerge from variation on a different dimension: frequency, amplitude and duration interact strongly in simple acoustic stimuli (Tekman, 1997).
The perception of continuities within a stream of sensory information leads to compression and, consequently, to a reduction of the information.
In this case the outcome of cognitive processing is influenced by the subject's skill in filtering out non-relevant information. An example of such selective listening is
Cherry's classic "cocktail party effect" (reviewed in Wood & Cowan, 1995). Selective listening is also crucial in auditory scene analysis, when we face the problem of
separating sounds that reach the ears simultaneously (Bregman, 1990). In other words, the less selective the information reduction is, the more continuity perception
leads to a generalized deterioration of the information. On the other hand, selective information reduction facilitates cognitive processing for those entities that exhibit
temporal coherence: the attenuation of meaningless variations permits the perception of the event as a "whole". Moreover, the unitary perception of one dimension may ease
the processing of other dimensions (e.g. continuity in pulse may facilitate melodic or harmonic processing). For instance, because of possible interference and distortions,
the amplitude variations of speaking voices are often compressed in radio transmission in order to guarantee better comprehension of the spoken text.
Subjects tend to preserve perceived continuities until a new variation in the information stream is detected. This behaviour is consistent with listeners' tendency to
form expectations influenced by past information (Dowling & Harwood, 1986).
On the other hand, the perception of discontinuity determines an increase of information. In this case the outcome of cognitive processing relies on the effectiveness of the
strategies the subject uses to organize information. For example, the more different pitches a composer uses, the stronger the correlations among pitches must be
for the listener to perceive them and process the acoustic information.
Correlation is more evident when discontinuities on two different dimensions take place at the same time (e.g. a simultaneous change in pitch and duration). When that
happens, the information is amplified by means of distinctiveness: stressed variations lead to the perception of the sonic object in terms of multiple, clearly marked units.
The second pulse (p2) considered in the model is the binary or ternary multiple of p1. In this case, an evaluation of the correlation between pulses and
previously defined discontinuities is invoked in order to assign the preference: the more coincidences there are between the pulse and the tones marked during the
discontinuity detection phase, the more probable that p2 pulse will be.
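The preference step just described can be sketched as follows. This is a hypothetical illustration, not the model's actual implementation: the names (`p1`, `marked_onsets`) and the representation of tone positions as integer time indices are assumptions made for the example.

```python
# Hypothetical sketch of the pulse-preference evaluation: score the binary
# and ternary multiples of p1 by how many of their pulse positions coincide
# with tones marked during the discontinuity detection phase.

def count_coincidences(period, start, end, marked_onsets):
    """Count the pulse positions that fall on discontinuity-marked tones."""
    pulse_positions = set(range(start, end, period))
    return len(pulse_positions & set(marked_onsets))

def prefer_p2(p1, start, end, marked_onsets):
    """Choose the binary or ternary multiple of p1 with more coincidences."""
    scores = {m: count_coincidences(p1 * m, start, end, marked_onsets)
              for m in (2, 3)}
    best = max(scores, key=scores.get)
    return p1 * best, scores
```

With marked tones on every second time unit, for instance, `prefer_p2(1, 0, 12, [0, 2, 4, 6, 8, 10])` assigns the preference to the binary multiple.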
Pulses are marked above the notes. Detected changes in pulse subdivision are marked with an 'x'; otherwise an 'o' is annotated. A change in pulse occurring only on
the last note of the excerpt, due to an evaluation of its length, has not been considered.
Segment definition
Early research (reviewed in Fraisse, 1982) has shown that the extreme temporal limit for the perception of a single group cannot exceed about 5 seconds. In accordance with these
results, our model allows continuity/discontinuity amplification only on three temporal layers. The longest (L3) coincides with the beginning and the end of the 7-9 sec.
melodic fragment and is not marked on the score. The second (L2) divides the excerpt into two or three segments and is marked. The shortest (L1) divides each L2
segment into 2-3 sub-segments. A further sub-level (L0) is not considered in the amplification phase but is used to generate segments that are subsequently grouped on
the L1 layer.
The segment production step is threefold. First, all possible L0 segments are produced by means of a small set of rules. Second, the produced segments are reduced by
means of constraints. Third, segments are grouped on the L1 and L2 layers through a further set of constraints.
The rules used for L0 segments are the following:
1) Each segment must contain at least two distinct sounds.
2) Each segment must begin on a marked tone and may end on a marked or unmarked tone.
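As a minimal sketch, the two rules above could be enumerated as follows. The encoding of a tone as a (pitch, marked) pair is an assumption made for illustration, not the model's actual data structure.

```python
# Hedged sketch of L0 segment generation under the two rules above.
# A tone is represented as a (pitch, marked) pair; this encoding is an
# assumption for the example.

def l0_segments(tones):
    """Enumerate candidate L0 segments as (start, end) index pairs."""
    segments = []
    for i, (_, marked) in enumerate(tones):
        if not marked:
            continue  # rule 2: a segment must begin on a marked tone
        for j in range(i + 1, len(tones)):
            pitches = {p for p, _ in tones[i:j + 1]}
            if len(pitches) >= 2:  # rule 1: at least two distinct sounds
                segments.append((i, j))  # rule 2: the end tone may be unmarked
    return segments
```

For the toy input `[(60, True), (60, False), (62, True), (64, False)]` this yields `[(0, 2), (0, 3), (2, 3)]`: every candidate starts on a marked tone and spans at least two distinct pitches.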
The constraints used to reduce the number of L0 segments may be biased either by discontinuity or by continuity. In our hypothesis, continuity and discontinuity behave as
powerful structuring forces in subjective auditory organization during music composition, performance and listening. The emphasis on one or the other depends on
individual tendencies but has its roots in historical and aesthetic traditions. For instance, today's performances of baroque music emphasize discontinuities, whereas
continuities were much more stressed in early twentieth-century interpretations (Harnoncourt, 1984).
Constraints based on discontinuity stress the contrast among inner differences and thus induce further subdivisions of the excerpt. On the other hand, more
continuity-oriented constraints often lead to the perception of a unified auditory image.
In its present version our model is more influenced by discontinuity constraints: the preferred segments to be grouped on level L1 are those that begin and end with
tones marked with a discontinuity.
Each group of two or three "well-formed" L0 segments produces a segment on the L1 layer. Segments of similar duration are preferred over others; that is,
segments of different lengths are more likely to be assigned to different layers than to be grouped on the same temporal span. L1 segments are combined into L2
segments according to the same principles.
Discontinuity amplification
In the discontinuity amplification phase, amplitude and duration discontinuities are added to each well-formed segment of the L1 and L2 layers. Discontinuity positioning
cannot be defined by strict rules but depends, once again, on a fairly variable combination of discontinuity and continuity constraints.
Nevertheless, once defined, the introduced discontinuities must cohere across the different layers. In other words, the position of a discontinuity on a higher-layer
segment must coincide with one of the discontinuities introduced on the subdivisions of that segment.
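This cross-layer coherence requirement amounts to a simple containment check. The sketch below is illustrative (positions as note indices is an assumption), not the model's implementation:

```python
# Minimal sketch of cross-layer coherence: every discontinuity placed on a
# higher layer must coincide with one already introduced on that segment's
# subdivisions. Positions are illustrative note indices.

def coherent(higher_layer_positions, lower_layer_positions):
    """True if each higher-layer discontinuity lies on a lower-layer one."""
    return set(higher_layer_positions) <= set(lower_layer_positions)
```

For example, placing L2 discontinuities at positions 4 and 8 is coherent with L1 discontinuities at 2, 4, 6 and 8, but a discontinuity at position 3 with no L1 counterpart would violate the constraint.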
file:///g|/Tue/Ciardi.htm (5 of 11) [18/07/2000 00:33:59]
L'amplificazione delle discontinuità in stimoli acustici
Discontinuity strength varies according to the layer of the segment: the higher the segmentation layer, the stronger the discontinuity.
In our model, in order to make subjects feel metrical continuity within each melodic excerpt, amplitude discontinuities are added on notes that are marked both with
discontinuities and with pulse continuities. Amplitude discontinuities are marked with an accent (>) on the output score. Duration discontinuities are added by shortening the
last note of each segment and are indicated with a staccato (.). The size of the sign varies according to the strength of the discontinuity.
3. Experimental results
An indirect validation of the model can be found in experimental results from research on recognition memory for unknown musical fragments (Olivetti
Belardinelli et al., 2000). The experimental materials for this research derive from 48 melodic fragments devised by the author for experiments on recognition memory
(Olivetti Belardinelli, Cifariello Ciardi & Rossi-Arnaud, 1998). The original MIDI files used to record the original set of stimuli were modified according to the
described model and recorded using a Digidesign SampleCell piano tone. The introduced amplitude and duration discontinuities were realized with a 70% increase in
velocity (i.e. a MIDI correlate of loudness) and a 40% decrease in duration on L2 segments. Discontinuities on L1 segments were realized with a 50% increase in
velocity and a 20% decrease in duration. The experimental data show that subjects are more likely to remember stimuli affected by
discontinuity amplification.
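Using the percentages stated above, the modification of a single MIDI note could be sketched as follows. The (velocity, duration) representation of a note is an assumption for illustration; only the percentages come from the text.

```python
# Sketch of the discontinuity amplification applied to the MIDI stimuli,
# using the layer-dependent percentages given in the text. The
# (velocity, duration_ms) note representation is an assumption.

GAINS = {"L2": (0.70, 0.40),  # +70% velocity, -40% duration
         "L1": (0.50, 0.20)}  # +50% velocity, -20% duration

def amplify(velocity, duration_ms, layer):
    """Apply the amplitude/duration discontinuity for the given layer."""
    gain, cut = GAINS[layer]
    new_velocity = min(127, round(velocity * (1 + gain)))  # MIDI velocity cap
    new_duration = duration_ms * (1 - cut)
    return new_velocity, new_duration
```

Note the clamp at 127: MIDI velocity is a 7-bit value, so a 70% increase on an already loud note saturates rather than overflowing.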
4. Conclusions
We are cautious about drawing conclusions from the above-mentioned experiments because they were not directly designed to test the model. These issues aside, the results
seem to suggest that discontinuity amplification may improve melodic processing regardless of musical language (e.g. tonal, atonal, serial). Moreover, since sequential
acoustic events are organized according to criteria of spectral continuity (McAdams, 1984), a continuity/discontinuity framework would be of interest because it provides
hints for a generalized analysis of sonic surfaces.
Clearly, future research must provide experimental evidence for many of the empirical constraints included in the model. Moreover, since the analysis phase is based on
relatively few sonic variations, further studies must determine to what extent discontinuities emerging from more complex parameter interactions (e.g. timbre and
dissonance variations, symmetry detection) affect listeners' auditory organization.
5. References
Bregman, A.S. (1990). Auditory Scene Analysis. Cambridge, MA: MIT Press.
Deutsch, D. (1982). Grouping mechanisms in music. In D. Deutsch (Ed.), The Psychology of Music. New York, NY: Academic Press.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. & Harwood, D.L. (1986). Music Cognition. New York, NY: Academic Press.
Fraisse, P. (1974). Psychologie du rythme. Paris: Presses Universitaires de France.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music. New York, NY: Academic Press.
Friberg, A. & Sundberg, J. (Eds.) (1995). Grammars for Music Performance. Proceedings of the KTH Symposium. Stockholm: KTH.
Harnoncourt, N. (1984). Der musikalische Dialog. Salzburg: Residenz Verlag.
Imberty, M. (1996). Ordine e disordine. Un punto di vista psico-cognitivo sulla creazione musicale. In R. Di Matteo (Ed.), Psicologia Cognitiva e Composizione Musicale. Roma: Edizioni Kappa.
Lerdahl, F. & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Olivetti Belardinelli, M., Cifariello Ciardi, F. & Rossi-Arnaud, C. (1998). Recognition memory for previously novel musical themes in children. In Proceedings of the XV Congress of the International Association of Empirical Aesthetics. Roma: Edizioni Universitarie Romane.
Olivetti Belardinelli, M., Rossi-Arnaud, C., Pitti, G. & Vecchio, S. (2000). Looking for the anchor points for musical memory. ICMPC2000 Proceedings.
Tekman, H.G. (1997). Interactions of perceived intensity, duration, and pitch in pure tone sequences. Music Perception, 14, 281-294.
Tekman, H.G. (1998). Effects of melodic accents on perception of intensity. Music Perception, 15, 391-401.
Wood, N.L. & Cowan, N. (1995). The cocktail party phenomenon revisited: Attention and memory in the classic selective listening procedure of Cherry (1953). Journal of Experimental Psychology: General, 124, 243-262.
Proceedings paper
Actions aimed at achieving a sense of control over life in spite of the changes brought by illness have
been entitled ‘accommodation’ (Corbin and Strauss, 1987: 250). After the loss of self and wholeness
which chronic illness causes, a new self concept can only be reconstructed with the possibility of
discovery of new actions, thereby transcending the body. However, in reality the experience of living
with chronic degenerative neurological illness involves any combination of the loss of physical,
sensory or cognitive abilities, and loss of control over one's present and future. Loss of control in
effect raises questions about whether ill people will live, or whether they want to (Charmaz, 1991).
The ‘social’ component also must remain a central issue in work with the chronically ill, as prolonged
immersion in illness takes its toll upon social relationships and self (Charmaz, 1991). Social isolation
translates directly into emotional isolation and loneliness.
Music therapy is the planned and intentional use of music to meet an individual's social,
emotional, physical, psychological and spiritual needs within an evolving therapeutic relationship. In
the therapy session, the therapist and client explore the client's world together, basing all interaction
on the client's musical utterances or musical preferences. This forms the basis for the therapeutic
relationship.
The growing body of music therapy literature on neuro-degenerative populations claims that music
therapy intervention with individuals with chronic neuro-disability can effect change in self-concepts,
for example ‘self-esteem’, ‘self-image’ or ‘self-worth’ (McMaster, 1991; Purdie & Baldwin, 1994;
Magee, 1999a, b & c). In a group study of music therapy with patients with Multiple Sclerosis, themes
which emerged in the therapy were disability, uncertainty, anxiety, depression, and loss of self-esteem
(Lengdobler and Kiessling, 1989). Challenging identity through music therapy to form a new aesthetic
identity, which transcends the physical, also has been discussed in the role of music therapy in chronic
illness and with the dying (Aldridge, 1995 & 1996). There remains, however, no empirical
explanation of how music therapy may impact upon identity constructs so as to promote positive
self-concepts.
Aims.
The case study presented here has been selected from a larger study which examined broader clinical
issues in music therapy with individuals with chronic neurological illness (Magee 1998). For the
purposes of this paper, the research focus was to examine how the experience of clinical
improvisation in music therapy changed self-concepts for an individual with complex
neuro-disabilities caused by Multiple Sclerosis.
Method.
A single case study has been selected from a larger group study which recruited six adults with
Multiple Sclerosis at a residential and day-care facility for complex neuro-disability. Research
participants were selected using purposive sampling procedures from multidisciplinary referrals and
self-referrals to a music therapy research study. An assessment determined whether music therapy was
a relevant intervention for individuals' particular needs. The participants gave written consent to
participate in this research, receiving individual music therapy as part of a wider clinical programme.
The music therapist was the primary researcher and so worked as a participant researcher for the
study.
The music therapy sessions took place on a weekly basis for approximately six months for each
participant. The session format included active participation in exploring instruments, joint clinical
improvisation with the therapist and singing songs of the participant's choice which held personal
meaning to the research participant. Discussion of the musical material or personal material relating to
Conclusions.
Jessie needed meaningful interaction on an emotional level, in which her emotional responses to
loss were heard and accepted; this was her core experience in the clinical improvisations. Through
her role within the improvisation, the therapist not only validated Jessie’s emotional
experience as expressed through music, but also served as a ‘performance validator’, essential in
redefining concepts of identity (Corbin & Strauss, 1987). For Jessie, the validation of her emotional
experience reflects the parallel made by Aldridge between the activity of mutual music making within
music therapy and the ‘affirmation of worth’ which validates the individual’s experience of hope
(Aldridge, 1995: 106).
The manner in which Jessie played out control in her life is also a significant feature in this case
study. Jessie gained control over her loss and dependence by actively refusing novel activities which
may have allowed her opportunities to develop new skills. Her ‘accommodation’, therefore, increased
her control in one sense but also served to reinforce her isolation and disabled identity. Clinical
improvisation within a therapeutic relationship provided her with the opportunity to experience
control and discovery of new skills through the interactive elements of spontaneous, non-verbal
music-making. Although her sense of identity was severely damaged, the seeds of the process
described as ‘identity reconstitution’ (Corbin & Strauss, 1987) were evident in Jessie’s heightened
emotional responses within improvisation, as she moved towards a greater sense of wholeness.
Jessie reflects the picture of the chronically ill individual who is ‘so immersed in illness that they
cannot readily claim other identities in the external world’ or ‘move on’ from their preoccupation with
loss (Charmaz, 1987). The external world did not exist for Jessie, either through visual images,
physical presence, or access. Therefore opportunities to claim a more able identity were very limited
and immensely difficult for her. Brooks and Matson (1987) further describe the process of isolation
for the chronically ill individual, which stems from a shift in self perception, strained social
relationships and changed relationships with intimates caused by the increasing dependence on others.
For Jessie, the interpersonal connections which took place within improvisation provided the support
through the musical relationship, and reassurance through the reflection and development of her
musical ideas. She was unable to gain reassurance in this way in her verbal interactions.
Jessie had little opportunity for change, development or progress in her life due to her disabilities and
coping mechanisms. Frequently within sessions she expressed the wish to die; however, her
experience of improvisation did give her hope and a sense of development. Charmaz (1987)
suggests that motivation for the chronically ill individual is a result of developing a personal identity
which encompasses future selves, reflecting hopes and aspirations. Dimensions for fostering hope
have been allied with the music therapy context, both in improvisation and the use of pre-composed
music (Aldridge, 1995&1996). Through her own creative process and the shift in identity she
achieved through improvisation, Jessie gained motivation and increased levels of arousal. This was
observable in all aspects of her behaviour during and at the end of sessions, contrasting greatly with
the depressed affective state and reduced energy levels with which she presented every week prior to
sessions when collected from the ward.
The results of this study show that the active music therapy process reduces isolation, thereby
enabling the individual to challenge his or her concept of self. Clinical improvisation centres
on an interactive relationship conducted on an equal basis through the physical act of playing. This
addresses fundamental issues concerning dependency, a crucial issue in forming
concepts of self and identity, as individuals with chronic illness express a greater fear of dependence,
debility and abandonment than of death itself (Charmaz, 1991). The experience of clinical
improvisation stimulated shifts in identity and is therefore an effective means of addressing the
‘spoiled identity’. This case study illustrates that the interactive nature of clinical improvisation as an
intervention with individuals with chronic disability and illness may provide validation of positive
feelings of self-esteem and identity. Clinical improvisation can facilitate the emergence of new
and undiscovered skills and develop a wholeness of self, thereby shifting identity towards a preferred
sense of self.
References.
Aldridge, D. (1995). Spirituality, Hope and Music Therapy in Palliative Care. The Arts in
Psychotherapy, 22(2), 103-109.
Aldridge, D. (1996). Music Therapy Research and Practice in Medicine. London: Jessica Kingsley
Publishers.
Brooks, N. & Matson, R. (1987). Managing Multiple Sclerosis. In J. Roth & P. Conrad (Eds.),
Research in the Sociology of Health Care: A Research Annual. The Experience and Management of
Chronic Illness, 6, 73-106. London: JAI Press Inc.
Corbin, J. & Strauss, A. (1987). Accompaniments of Chronic Illness: Changes in Body, Self,
Biography, and Biographical Time. In J. Roth & P. Conrad (Eds.), Research in the Sociology of
Health Care: A Research Annual. The Experience and Management of Chronic Illness, 6, 249-281.
London: JAI Press Inc.
Charmaz, K. (1987). Struggling for a self: Identity levels of the chronically ill. In J. Roth & P.
Conrad (Eds.), Research in the Sociology of Health Care: A Research Annual. The Experience and
Management of Chronic Illness, 6, 283-321. London: JAI Press Inc.
Charmaz, K. (1991). Good Days, Bad Days: The Self in Chronic Illness and Time. New Brunswick:
Rutgers University Press.
Conrad, P. (1987). The Experience of Illness: Recent and New Directions. In J. Roth & P. Conrad
(Eds.), Research in the Sociology of Health Care: A Research Annual. The Experience and
Management of Chronic Illness, 6, 1-31. London: JAI Press Inc.
Lengdobler, H., & Kiessling, W.R. (1989). ‘Gruppenmusiktherapie bei multipler Sklerose: Ein erster
Erfahrungsbericht’. Psychotherapie, Psychosomatik, Medizin und Psychologie, 39, 369-373.
Magee, W. (1998) ‘Singing my life, playing my self’. Investigating the use of familiar pre-composed
music and unfamiliar improvised music in clinical music therapy with individuals with chronic
neurological illness. Unpublished doctoral dissertation, University of Sheffield, UK, #9898.
Magee, W. (1999a) ‘Music Therapy in Chronic Degenerative Illness: Reflecting the Dynamic Sense
of Self’. In Ed. D. Aldridge, Music Therapy in Palliative Care, 82-94. London: Jessica Kingsley
Publishers.
Magee, W. (1999b) ‘Singing my life, playing my self’: Song Based and Improvisatory Methods of
Music Therapy with Individuals with Neurological Impairments. In T. Wigram & J. De Backer, (Eds.)
Clinical Applications of Music Therapy in Developmental Disability, Paediatrics and Neurology,
201-223. London: Jessica Kingsley Publishers.
Magee, W. (1999c) ‘Musiktherapie bei chronisch degenerativen Krankheiten: Eine Wiederspiegelung
des dynamischen Selbst’. In Aldridge, D. (Ed.) Kairos IV, Berne: Huber Verlag.
McMaster, N. (1991). Reclaiming A Positive Identity: Music Therapy In The Aftermath Of A Stroke.
In: Bruscia, K.E. (Ed.), Case Studies in Music Therapy, 547-560. Philadelphia: Barcelona Publishers.
Purdie, H. & Baldwin, S. (1994). Music Therapy: Challenging Low Self-Esteem in People With a
Stroke. British Journal of Music Therapy, 8(2), 19-24.
Robinson, I. (1988). Multiple Sclerosis. London: Routledge.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research. Grounded Theory Procedures and
Techniques. Newbury Park: Sage Publications, Inc.
Authors’ note.
Wendy L. Magee BMus PhD ARCM SRAsT(M) is Head of Music Therapy at the Royal Hospital for
Neuro-disability, London, holding a clinical post as a music therapist working with adults with
acquired and complex neuro-disability, and developing research projects with this population. This
research is part of doctoral research undertaken whilst registered at the Department of Music,
University of Sheffield. Jane W. Davidson BA PGCE MA PhD Cert. Counselling is Senior Lecturer
in Music at the Department of Music, University of Sheffield. She is editor of the international journal
Psychology of Music and has researched on a wide range of topics from self and identity in singers
through to expressive body movement and piano performance, having over 50 publications to her
name in international peer-reviewed journals. Besides researching, she teaches a wide range of
courses and is an active performer, artistic director and producer.
The authors would like to acknowledge the Living Again Trust, the John Ellerman Foundation, the
Juliette Alvin Trust and the Music Therapy Charity, who all contributed to funding this project. The
authors also would like to thank the research participants who took part in this study. The Royal
Hospital for Neuro-disability received a proportion of its funding to support this paper from the NHS
Executive. The views expressed in this publication are those of the authors and not necessarily those
of the NHS Executive.
Address for correspondence: Dr. Wendy L. Magee, Music Therapy Department, Royal Hospital for
Neuro-disability, West Hill, London SW15 3SW, UK
Proceedings paper
The ability to produce regularly timed rhythmic actions and the corresponding ability to perceive (deviations from)
regularity in a sequence of events presuppose a timekeeping mechanism in the brain that oscillates in a periodic
fashion and also can adapt to deviations from perceived isochrony by changing its period and/or relative phase.
Mental timekeepers or oscillators underlying rhythmic action have been discussed by many authors representing
diverse theoretical orientations, such as Michon (1967), Kugler and Turvey (1987), and Vorberg and Wing (1996).
For perception, a corresponding theory of attentional rhythms or oscillators has been proposed by Jones (1976) and
elaborated in many subsequent articles, most notably in Large and Jones (1999).
One important theoretical issue is whether perception and action are subserved by a single general timekeeper or
whether separate, perhaps even task-specific timing processes are involved. Working within a traditional
information-processing framework, Keele, Ivry, and colleagues have presented evidence suggesting a common
timing mechanism in perception and production of simple event sequences (Keele & Hawkins, 1982; Keele,
Pokorny, Corcos, & Ivry, 1985; Ivry & Keele, 1989; Ivry & Hazeltine, 1995). According to a dynamic systems
perspective, however, timing is an emergent property and thus may be specific to different activities (e.g.,
Robertson et al., 1999; Turvey, 1977; Wallace, 1996). By analogy, this view might also predict task-specific
perceptual processes with regard to timing. However, the dynamic systems view also posits a close relationship
between perception and action within the same task situation (see, e.g., Viviani & Stucchi, 1989, 1992a, 1992b).
The two experiments reported here are pertinent to the hypothesis of a common timekeeping mechanism for
perception and action, addressed here by investigating whether contextual timing variation has similar effects on
detection and synchronization accuracy in musical sequences containing small deviations from isochrony.
Experiment 1 was concerned with perception, Experiment 2 with action.
EXPERIMENT 1
Experiment 1 continued a long series of experiments on the detectability of small timing perturbations in
isochronous musical excerpts (Repp, 1992a, 1992b, 1995, 1998b, 1998c, 1998d, 1999b, 1999c). Two consistent
findings in that research were that the detectability of a hesitation (a single lengthened inter-onset interval,
IOI) varies greatly with its position in the musical structure, and that the resulting "detection accuracy profile"
(i.e., percent correct detection as a function of position) is negatively correlated with the typical timing pattern (i.e.,
IOI duration as a function of position) of artists' expressive performances of the musical excerpt. In one variation of
the basic paradigm (Repp, 1998b: Exp. 1), each presentation of the test excerpt was preceded by an expressively
timed performance of the same music. This precursor made the hesitations in the test excerpt more difficult to
detect than in an earlier study in which no precursor had been employed. Moreover, this interference effect did not
seem to decrease in the course of the test excerpt, which lasted close to 20 seconds.
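The negative relation described above, between the detection accuracy profile and the typical expressive timing pattern, can be sketched as follows. The arrays are invented toy data for illustration only, not results from these studies:

```python
import numpy as np

# Toy data (hypothetical): correct detections per position out of 20 trials,
# and mean expressive IOI durations (ms) from expert performances.
hits = np.array([18, 15, 9, 6, 14, 17, 8, 5])
mean_ioi = np.array([505, 512, 540, 560, 515, 503, 548, 565])

# Detection accuracy profile: percent correct detection as a function of position.
profile = 100.0 * hits / 20

# Positions that are typically lengthened in expressive performance tend to show
# poor detection, so the profile/timing correlation is negative.
r = np.corrcoef(profile, mean_ioi)[0, 1]
```

With data of this kind, positions carrying large expressive lengthening yield the poorest detection, producing a strongly negative correlation coefficient.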
The interference was believed to be due to the temporal variation in the precursor, not to the specific pattern of that
variation. Large and Jones (1999: Exp. 1) recently reported striking context effects of a similar nature in a
nonmusical paradigm (see also Jones & Yee, 1997; Yee, Holleran, & Jones, 1994). Large and Jones attributed these
effects to a widening of the temporal expectancy region of an attentional oscillator, and to a slow rate of adaptation
of this mechanism. The findings of Jones and collaborators suggest that the musical precursor effect observed by
Repp (1998b) was due to the same attentional mechanism. However, it seemed desirable to replicate it in a
within-participant design contrasting different precursor conditions. Moreover, if the effect is due to timing
variability as such, then it should not be necessary to employ a musical precursor; a simple tone or click sequence
would do. These considerations led to a design with four precursors: (1) isochronous music; (2) expressively timed
music; (3) isochronous clicks; and (4) "expressively timed" clicks (i.e., with the same timing pattern as the music).
The predictions were that the accuracy of detecting small hesitations in the following musical test excerpts would
be lower after temporally modulated than after isochronous precursors, and that there would be little difference
between the music and click precursor conditions. In addition, by comparing the detection accuracy profiles for the
test excerpts in the different precursor conditions, it could be determined whether or not the precursor effect
declined in the course of the test excerpt.
Method
The musical excerpt was the opening of Chopin's Etude in E major, op. 10, No. 3, the score of which is shown
below. With the exception of the initial eighth-note upbeat (which was excluded from all analyses of timing), all
inter-onset intervals (IOIs) in the excerpt are nominally sixteenth-note intervals, as long as distinctions among
voices are disregarded. In a completely deadpan version of the excerpt, which served as the basis for the
experimental materials, all IOIs were set to 500 ms, and all keypress velocities were set to the same arbitrary MIDI
value of 60.
The test sequences of the main experiment were not entirely isochronous but each contained four lengthened IOIs
("hesitations"). These IOIs were lengthened by the same amount ∆t and occurred at unpredictable locations,
separated by at least four unchanged IOIs. All tones sounding during a lengthened IOI were lengthened by ∆t,
too, so that legato articulation was maintained. In the course of a block of 9 trials, each of the 36 IOIs in the excerpt
was lengthened once by ∆t. Three different blocks with different randomizations of the lengthened IOIs were
created for each ∆t value. The ∆t values ranged from 80 ms (16%) to 20 ms (4%) in steps of 10 ms.
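The construction of a test block can be sketched as follows. This is a minimal sketch assuming one particular assignment scheme (each trial lengthens four positions spaced nine IOIs apart), which satisfies the stated constraints; the original randomization procedure is not specified beyond those constraints:

```python
import random

N_IOIS = 36       # sixteenth-note IOIs in the excerpt (the upbeat is excluded)
BASE_IOI = 500    # ms, deadpan tempo
DT_VALUES = list(range(80, 10, -10))  # 80 ms (16%) down to 20 ms (4%) in 10 ms steps

def make_trial(positions, dt):
    """IOI sequence (ms) for one trial, with the given positions lengthened by dt."""
    iois = [BASE_IOI] * N_IOIS
    for p in positions:
        iois[p] += dt
    return iois

def make_block(dt, rng=random):
    """One block of 9 trials in which each of the 36 positions is lengthened
    exactly once (4 hesitations per trial). Each trial lengthens positions
    o, o+9, o+18, o+27 for a distinct random offset o, so hesitations within a
    trial are always separated by 8 unchanged IOIs (satisfying the >= 4 rule)."""
    offsets = list(range(9))
    rng.shuffle(offsets)
    return [make_trial([o, o + 9, o + 18, o + 27], dt) for o in offsets]
```

Three such blocks per Δt value, with different randomizations, would reproduce the design's counterbalancing of hesitation positions.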
The expressively timed music precursor was a synthesized expressive performance of the same Chopin excerpt.
The (atypical) timing pattern was derived from a principal component analysis of a large sample of expert
performances (Repp, 1998a). The standard deviation of its IOI durations was 80 ms. The precursor also contained
typical expressive dynamic variation as well as small tone onset asynchronies and pedaling, to enhance its
naturalness. An isochronous music precursor was derived from the expressively timed one by setting all IOIs to
500 ms, leaving all other temporal details relatively invariant (see Repp, 2000b). The expressively timed click
precursor consisted of a sequence of 38 very high-pitched (C8, 4,186 Hz), rapidly decaying digital piano tones of
equal intensity, called here "clicks" for simplicity, which were timed in exactly the same way as the top-line tones
of the expressively timed music. The isochronous click precursor had constant IOIs of 500 ms. All materials were
generated on a Roland RD-250s digital piano under control of a Macintosh Quadra 660AV computer via a MIDI interface.
As in previous studies of a similar nature, the detectability of hesitations varied greatly across positions in the
music, F(35,385) = 14.3, p < .0001. However, none of the interactions of position with precursor type or timing
approached significance, indicating that the precursor effects did not decrease across positions in the test excerpt.
EXPERIMENT 2
The results of Experiment 1, in conjunction with those of Repp (1998b) and Large and Jones (1999), show that a
temporally modulated context reduces listeners' ability to detect small deviations from isochrony in a test sequence.
The underlying mechanism suggested by Large and Jones is an attentional oscillator whose temporal expectancy
region widens after exposure to temporal variability, making it more tolerant of small deviations from temporal
expectancies. The width of a temporal expectancy window is formally equivalent to the probability distribution of
specific temporal expectations, so that a wider window implies greater variability of the underlying timekeeper.
Therefore, if the adaptive attentional oscillator is identical with the timekeeping mechanism governing regular
motor activity, such as finger tapping in synchrony with an isochronous sequence, then one should expect
temporally modulated precursors to increase the variability of subsequent finger tapping. These precursors should
also reduce the timekeeper's sensitivity to small deviations from isochrony in a stimulus sequence, as reflected in
the speed of motor compensation (phase error correction) following such timing perturbations.
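The notion of phase error correction can be illustrated with a generic first-order linear correction model (cf. Vorberg & Wing, 1996); this is a sketch of the general mechanism, not the model fitted in these experiments:

```python
def simulate_taps(iois, alpha, period=500.0):
    """Simulate synchronized tapping with first-order linear phase correction:
    each tap is scheduled one timekeeper period after the previous one, minus
    a fraction alpha of the current tap-tone asynchrony. Returns the list of
    asynchronies (ms), one per tone onset."""
    tones = [0.0]
    for ioi in iois:
        tones.append(tones[-1] + ioi)
    tap, asynchronies = 0.0, []
    for tone in tones:
        a = tap - tone               # negative asynchrony: tap leads the (delayed) tone
        asynchronies.append(a)
        tap += period - alpha * a    # phase error correction
    return asynchronies
```

With a single IOI lengthened by 20 ms, the asynchrony jumps to -20 ms at the perturbation and then decays geometrically by a factor (1 - alpha) per tap, mimicking rapid motor compensation.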
Repp (2000a: Exp. 4) conducted an experiment with the Chopin Etude excerpt in which each test trial was preceded
by an expressively timed precursor. The test trials contained subliminal hesitations (lengthened IOIs). Motor
compensation for these perturbations was just as rapid following the expressively timed precursor as in a condition
without a precursor (Repp, 2000a: Exp. 3). The variability of the tap timing was likewise unaffected by the
presence of the precursor. These results suggested a possible dissociation between the timekeeping processes
involved in perception and in motor control. However, the precursors were merely listened to and thus did not
require any overt motor activity. Experiment 2 investigated whether the added requirement of tapping in synchrony
with the expressively timed precursors would result in increased variability and in slower phase correction in
subsequent synchronized tapping with isochronous musical test excerpts containing small hesitations.
Method
The materials were a subset of those of Experiment 1. Only musical precursors were used, either isochronous or
expressively timed, together with test excerpts containing hesitations of 20 ms. Three blocks with different
randomizations of the hesitations were used. Each block comprised 9 trials, each of which contained 4 hesitations.
The 12 participants were mostly musically trained undergraduates. They tapped with their preferred hand on a
white key of a Fatar Studio 37 MIDI controller (a silent three-octave piano keyboard) which they held on their lap.
The key depressions were recorded via a MIDI interface by a MAX patcher that also controlled presentation of the
musical excerpts. Otherwise, the equipment was the same as in Experiment 1.
Participants were given a few practice trials, followed by three blocks of test trials without precursors. Tapping
started with the first downbeat (the second tone) in each excerpt and continued in synchrony with the sixteenth
notes, for a total of 37 taps. Participants were not informed about the hesitations, which generally were near or
below their detection threshold. The main part of the experiment consisted of six blocks of test trials, with each trial
being preceded by a precursor. Precursors were constant within each block but alternated between blocks, with
some participants starting with the isochronous precursor and others with the expressively timed one. Participants
were requested to tap in synchrony with each precursor.
Results and discussion
The variability of the taps was assessed in four different ways: The average standard deviations of tap-tone
asynchronies and of inter-tap intervals (ITIs) were computed both within and between trials. In each case, the initial
three taps (during which participants adjusted to the sequence tempo) and the final tap in each trial were
disregarded. Within-trial standard deviations were also calculated for the asynchronies and ITIs generated in
synchronizing with the precursors themselves.
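These four measures can be computed as sketched below, assuming the asynchronies are stored as one row per trial; the same function applies to matrices of inter-tap intervals:

```python
import numpy as np

def variability(x):
    """x: (n_trials, n_taps) array of tap-tone asynchronies (or ITIs) in ms.
    Returns the average within-trial and between-trial standard deviations,
    discarding the first three taps (tuning in) and the final tap of each trial."""
    x = np.asarray(x, dtype=float)[:, 3:-1]
    within = x.std(axis=1, ddof=1).mean()   # SD across positions, averaged over trials
    between = x.std(axis=0, ddof=1).mean()  # SD across trials, averaged over positions
    return within, between
```

ITIs can be obtained from raw tap times with `np.diff` before applying the same function.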
The results are summarized in Table 2. The finding of main interest is the difference in tap variability following
expressively timed and isochronous precursors, shown in the last column of the table. As can be seen, this
difference was usually positive, suggesting higher variability following temporally modulated precursors, but it was
very small. Nevertheless, it reached significance for within-trial ITIs, F(1,11) = 6.5, p < .03, and for between-trial
ITIs, F(1,11) = 5.6, p < .04. The lower part of Table 2 also reveals that variability in tapping to the isochronous
precursors was comparable to that in tapping to the test excerpts, whereas variability of tapping to expressively
timed precursors was very high.
Table 2. (a) Average standard deviations (in ms) of asynchronies and ITIs within trials (ASY-w and ITI-w,
respectively) and between trials (ASY-b and ITI-b, respectively) for test excerpts in the three precursor conditions,
and the difference between the two precursor conditions, with standard errors in parentheses. (b) Average standard
deviations of within-trial asynchronies and ITIs for tapping in synchrony with the precursors themselves.
(a) Synchronization with test excerpts following precursors
The small differences in ITI variability between precursor conditions could have been due to either
random or systematic variation, or both. Previous investigations (Repp, 1999a, 1999b, 1999c, 2000a) have shown
that the asynchronies and ITIs of taps accompanying perfectly isochronous music exhibit a systematic pattern of
deviations from isochrony. Therefore, these patterns were determined and compared between the precursor
conditions. Before computing the average asynchronies and ITIs, however, the asynchronies of taps coinciding
with hesitations as well as the two subsequent asynchronies were removed from the data, following the procedure
of Repp (1999b: Exp. 2). The averages across the 9 trials in a block thus were based on 6 data points per position.
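The triplet-removal step can be sketched as follows, again assuming one row of asynchronies per trial (the function name and data layout are illustrative, not taken from the original analysis):

```python
import numpy as np

def average_asynchrony_profile(asyn, hesitations):
    """asyn: (n_trials, n_positions) array of asynchronies; hesitations: list
    (per trial) of perturbed positions. Masks each tap coinciding with a
    hesitation and the two subsequent taps, then averages across trials."""
    a = np.ma.masked_array(np.array(asyn, dtype=float))
    for trial, positions in enumerate(hesitations):
        for p in positions:
            a[trial, p:p + 3] = np.ma.masked
    return a.mean(axis=0)
```

Masking rather than deleting the affected taps keeps the position index intact, so each column's average is simply taken over the remaining trials.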
As expected, there was highly significant variation across positions of both asynchronies, F(33,363) = 8.3, p <
.0001, and ITIs, F(32,352) = 13.2, p < .0001. Most interestingly, there was a small but significant Condition x
Position interaction for both asynchronies, F(33,363) = 1.5, p < .04, and ITIs, F(32,352) = 1.8, p < .006: The
profiles were initially more strongly modulated after modulated precursors than after isochronous precursors, and
there was also a larger difference in absolute asynchronies at the beginning. That the interaction derived from the
initial portions of the profiles was confirmed in ANOVAs that omitted the initial 10 data points (not counting the
tuning-in portion) and in which the Condition x Position interaction was far from significance (p > .5).
These results suggested that the small difference in overall ITI variability between the precursor conditions (Table
2) was due to this initial difference in the amplitude of systematic ITI modulation. The standard deviations of the
within-trial and between-trial ITIs were therefore recalculated with the first 10 data points omitted. The resulting
average within-trials values were 22.7 and 22.8 ms for isochronous and modulated precursors, respectively, and the
corresponding between-trial values were 20.5 and 20.7 ms, respectively. Both differences were clearly
nonsignificant. Thus the differences reported in Table 2 were indeed due to the initial portion of the ITI profile
only.
Next, the speed of compensation for hesitations in the test excerpts was examined. The relevant data were the
triplets of asynchronies that had been extracted from the data before computing the average asynchrony and ITI
profiles. There were 36 such triplets in each condition, representing the 36 positions of the perturbation point in the
music. These triplets were further divided into two groups of 16, according to whether the hesitations were of high
or low detectability. (The first two and last two positions were excluded.) The high-low distinction was based on a
median bisection of the average detection accuracy profile obtained in Experiment 1. The results were expressed as
deviations from the average asynchrony profile. The average relative asynchronies were close to -20 ms at the
perturbation point (P) and quickly returned to the zero baseline in the following two positions.
These average "compensation functions" showed two unexpected differences. First, compensation was more rapid
following modulated precursors than following isochronous precursors, F(2,22) = 7.7, p < .003. If anything, the
opposite had been predicted. The second unexpected finding was that compensation was more rapid for
high-detectability than for low-detectability perturbations, F(2,22) = 7.7, p < .003. This result, although plausible,
contradicts an earlier negative result, obtained in a very similar comparison (Repp, 1999b: Exp. 2).
GENERAL DISCUSSION
The present results reveal a partial dissociation of perception and motor control with regard to timing. Experiment
1 replicated the precursor effect found serendipitously by Repp (1998b) and demonstrated its existence more
clearly in a within-participant design: Exposure to a variably timed auditory sequence reduced listeners' sensitivity
to deviations from temporal regularity in a subsequent sequence. Moreover, this effect occurred regardless of the
type of precursor (clicks or music), which suggests that it is not specific to music. The precursor effect is analogous
to similar context effects reported by Large and Jones (1999), who attributed it to the widening of the expectancy
window of a slowly adapting attentional oscillator. However, there was no evidence of a decrease in the precursor
effect in the course of largely isochronous test excerpts lasting close to 20 s. Thus the rate of adaptation seemed to
be very slow indeed.
Very different results were obtained in Experiment 2 with regard to sensorimotor synchronization with musical
stimuli very similar to the ones used in Experiment 1. Previously, Repp (2000a) had found no effects of a
modulated precursor on variability of tap timing or compensation for hesitations when participants merely listened
to the precursor. In Experiment 2, tapping variability was slightly higher following modulated precursors than
isochronous precursors, perhaps due to the requirement that participants tap in synchrony with the precursors, but
this difference disappeared within about 10 taps. Moreover, the difference was due to an increased amplitude of
systematic variability, not of random variability. This suggests a heightened sensitivity to structural musical factors,
such as meter, which was induced by the expressively timed precursor but wore off rapidly during exposure to the
nearly isochronous test excerpt. This process most likely reflects a modulation of the period of the motor
timekeeper, not a widened expectancy window. It cannot account for the precursor effect that lasted throughout the
test excerpt in Experiment 1.
As to compensation for hesitations in the music, which is perhaps more directly relevant to their detectability, it
was found to be more rapid after modulated than after isochronous precursors. This paradoxical result
remains unexplained; notably, it is opposite in direction to the precursor effect on detectability. Thus the negative effect of precursor
timing modulation seems to be largely specific to perception, suggesting that the attentional oscillator governing
time perception is distinct from the mechanism that controls the timing of finger taps.
The dissociation between perception and action would have been even more striking if Experiment 2 had replicated
the finding of Repp (1999b) that compensation for hesitations was independent of their detectability. In Experiment
2, however, compensation was somewhat more rapid in positions of higher detectability. Conscious detection of a
hesitation thus may have accelerated the phase correction process somewhat. This result must be viewed with some
scepticism, however, because it also contradicts previous findings obtained with simple tone sequences (Repp,
2000a: Exp. 5). In any case, the results are consistent with the general finding that phase correction does not
require conscious detection of a perturbation (Repp, 2000a).
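The phase correction discussed here is commonly modeled as a first-order linear process (in the spirit of Mates, 1994a, cited below): on each tap, a fraction alpha of the current asynchrony is subtracted from the next intertap interval. The following sketch is illustrative only; the 500-ms period, the gain of 0.5, and the 50-ms hesitation are assumed values, not parameters estimated from these experiments.

```python
# First-order linear phase correction in sensorimotor synchronization.
# A fraction alpha of the current asynchrony is corrected on the next tap.
# Parameter values below are illustrative assumptions.

def simulate_tapping(onsets, period, alpha):
    """Simulate tap times against a stimulus onset sequence.

    onsets : list of stimulus onset times (ms)
    period : the tapper's timekeeper period (ms)
    alpha  : phase correction gain (0 = no correction, 1 = full correction)
    Returns the asynchronies (tap - onset) at each sequence position.
    """
    tap = onsets[0]                 # assume the first tap is synchronous
    asynchronies = []
    for onset in onsets:
        asyn = tap - onset
        asynchronies.append(asyn)
        # next tap: timekeeper period minus a fraction of the asynchrony
        tap = tap + period - alpha * asyn
    return asynchronies

# An isochronous sequence with one lengthened interval (a "hesitation"):
period = 500.0                          # ms
onsets = [i * period for i in range(10)]
for i in range(5, 10):                  # 50-ms hesitation before onset 5
    onsets[i] += 50.0

asyns = simulate_tapping(onsets, period, alpha=0.5)
# The tap at the perturbation point arrives 50 ms early relative to the
# delayed onset; with alpha = 0.5 the error then halves on every tap,
# mirroring the shape of the "compensation functions" described above.
print([round(a, 1) for a in asyns])
```

With a larger alpha the return to the zero baseline is steeper, which is one way to express the finding that compensation was more rapid in some conditions than in others.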
In summary, the present results suggest that the perception of timing is governed in part or entirely by processes
that are separate from the timekeeper that controls the timing of action in sensorimotor synchronization. There are
close parallels in the ways perceptual and motor timing mechanisms have been conceptualized (e.g., Large &
Jones, 1999; Mates, 1994a, 1994b), but this functional similarity should not be taken to reflect physiological
identity. The most important difference between perception and action is that perceptual tasks generally require
conscious registration of temporal differences and explicit judgments, whereas motor tasks such as sensorimotor
synchronization often rely largely or entirely on subconscious, automatic regulatory processes. Although the
processes underlying perceptual judgments are just as subconscious as those underlying motor control, the
additional processing required for information to reach awareness and for a deliberate response to be made may
introduce random as well as systematic variation. There are many recent demonstrations of dissociations between
perception and action, especially in tasks based on visual information (e.g., Creem & Proffitt, 1998; Gentilucci et
al., 1996; Haffenden & Goodale, 1998; Klotz & Neumann, 1999; Rumiati & Humphreys, 1998), and the present
research adds to the rapidly mounting evidence that action is often based on information that is not fully processed
perceptually (see Neumann, 1990, for relevant discussion).
Acknowledgments
This research was supported by NIH grant MH-51230. I am grateful to Paul Buechler and Steve Garrett for
extensive assistance. Address correspondence to Bruno H. Repp, Haskins Laboratories, 270 Crown Street, New
Haven, CT 06511-6695 (e-mail: repp@haskins.yale.edu).
References
Creem, S. H., & Proffitt, D. R. (1998). Two memories for geographical slant: Separation and
interdependence of action and awareness. Psychonomic Bulletin & Review, 5, 22-36.
Fraisse, P. (1954). La structuration intensive des rythmes. L'Année Psychologique, 54, 35-52.
Gentilucci, M., Chieffi, S., Daprati, E., Saetti, M. C., & Toni, I. (1996). Visual illusion and action.
Neuropsychologia, 34, 369-376.
Haffenden, A., & Goodale, M. A. (1998). The effect of pictorial illusion on prehension and perception.
Journal of Cognitive Neuroscience, 10, 122-136.
Ivry, R. B., & Hazeltine, R. E. (1995). Perception and production of temporal intervals across a range
of durations: Evidence for a common timing mechanism. Journal of Experimental Psychology:
Human Perception and Performance, 21, 3-18.
Ivry, R. B., & Keele, S. W. (1989). Timing functions of the cerebellum. Journal of Cognitive
Neuroscience, 1, 136-152.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review, 83, 323-355.
Jones, M. R., & Yee, W. (1997). Sensitivity to time change: The role of context and skill.
Keele, S. W., & Hawkins, H. L. (1982). Explorations of individual differences relevant to high level
skill. Journal of Motor Behavior, 14, 3-23.
Keele, S. W., Ivry, R. B., & Pokorny, R. A. (1987). Force control and its relation to timing. Journal of
Motor Behavior, 19, 96-114.
Keele, S. W., Pokorny, R. A., Corcos, D. M., & Ivry, R. B. (1985). Do perception and motor
production share common timing mechanisms: A correlational analysis. Acta Psychologica, 60,
173-191.
Klotz, W., & Neumann, O. (1999). Motor activation without conscious discrimination in metacontrast
masking. Journal of Experimental Psychology: Human Perception and Performance, 25, 976-992.
Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic
movement. Hillsdale, NJ: Erlbaum.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How we track time-varying events.
Psychological Review, 106, 119-159.
Michon, J. A. (1967). Timing in temporal tracking. Assen, NL: van Gorcum.
Neumann, O. (1990). Direct parameter specification and the concept of perception. Psychological
Research, 52, 207-215.
Repp, B. H. (1992a). Detectability of rhythmic perturbations in musical contexts: Bottom-up versus
top-down factors. In C. Auxiette, C. Drake, & C. Gérard (eds.), Proceedings of the Fourth Rhythm
Workshop: Rhythm perception and production (pp. 111-116). Bourges, France: Imprimerie
Municipale.
Repp, B. H. (1992b). Probing the cognitive representation of musical time: Structural constraints on
the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B. H. (1995). Detectability of duration and intensity increments in melody tones: A partial
connection between music perception and performance. Perception & Psychophysics, 57, 1217-1232.
Proceedings paper
factors, two related to the dissonance of intervals and one related to the structure of the chords. The
presence or absence of semitone/whole-tone dissonance was the most significant factor influencing
the subjects’ responses (p<0.0001) and alone accounted for about half of the variance in the data
(R~0.5). In other words, when subjects heard semitone or whole-tone dissonance in a chord, their
evaluation of its harmoniousness plummeted. A second factor related to interval dissonance, i.e., the
total theoretical dissonance of the intervals contained in the chord (Sethares, 1993), was also found to
be a significant factor (p<0.001). It is noteworthy, however, that, when regression was done using
only the total dissonance of the intervals, this factor alone accounted for little of the variance in the
data (R<0.1). Most interestingly, the presence of chordal tension (as defined above) was also found to
play a significant role (p<0.001). This result indicates that when subjects heard three-tone chords
containing two intervals of the same magnitude, their evaluation of the chords' pleasantness decreased.
In order to determine more precisely what the relevant influences on the perception of harmony are,
we examined three distinct factors related to the dissonance of the intervals in the chords. (1) The first
factor was the total theoretical dissonance of the three intervals in each of the three-tone chords, using
the model of Sethares (1993). That model is based approximately on the empirical dissonance curve
obtained by Plomp and Levelt (1965), which indicates a trough of consonance at an interval of about
1-2 semitones. (2) The second factor was the theoretical dissonance of the intervals in the chords, but
including all of the intervals among the first six upper partials of each of the three tones, using a
model originally presented by Kameoka and Kuriyagawa (1969). It should be noted that the chordal
stimuli in our experiment consisted of three pure sine waves, without any upper partials. This
theoretical factor was nonetheless calculated for use in the multiple regression analysis because most
musical sounds contain upper partials and, as a consequence, normal listening experience may
produce associations between tones and their higher harmonics, even when the higher harmonics are
absent from the current auditory stimulus. (3) Finally, the empirical dissonance curve obtained in the
interval experiment was used to calculate the total dissonance in the chords. This calculation ensured
that factors inherent to our experimental equipment, which influence the perceived pleasantness of
intervals, were also reflected in the evaluation of the perceived harmoniousness of chords measured
with the same equipment. Small differences in the influence of the various interval
dissonance factors on chord perception were indeed found, but the main result was that: (1) the
(empirical or theoretical) dissonance of intervals alone does not explain the harmoniousness of chords,
and (2) an independent factor related to the structure of the three-tone chord ("chordal tension") had a
significant influence on the evaluation of harmoniousness.
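As an illustration of factor (1), the pairwise dissonance curve of Sethares (1993) can be summed over the three intervals of a chord. The parameter values below are the ones commonly quoted for that model (an assumption of this sketch, not taken from the present paper), and the equal-tempered chords built on 440 Hz are hypothetical examples, not the stimuli of the experiment.

```python
import math
from itertools import combinations

# Pairwise dissonance of two pure tones after Sethares (1993), whose curve
# approximates the Plomp & Levelt (1965) data. The constants below are the
# commonly quoted parameterization; treat them as assumptions.
def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    b1, b2 = 3.5, 5.75             # curve steepness constants
    dstar, s1, s2 = 0.24, 0.021, 19.0
    s = dstar / (s1 * min(f1, f2) + s2)   # scales the curve with register
    df = abs(f2 - f1)
    return a1 * a2 * (math.exp(-b1 * s * df) - math.exp(-b2 * s * df))

# Total theoretical dissonance of a three-tone chord of pure sine waves:
# the sum over its three interval pairs (factor 1 above).
def chord_dissonance(freqs):
    return sum(pair_dissonance(f1, f2) for f1, f2 in combinations(freqs, 2))

# Illustrative comparison: a major triad versus a cluster-like chord
# containing semitone/whole-tone intervals, built on A4 = 440 Hz.
semitone = 2 ** (1 / 12)
major = [440.0, 440.0 * semitone ** 4, 440.0 * semitone ** 7]
cluster = [440.0, 440.0 * semitone, 440.0 * semitone ** 2]
print(chord_dissonance(major) < chord_dissonance(cluster))  # expect True
```

Note that such a pairwise sum is blind to chordal tension as defined above: two chords with identical interval content receive the same score, which is why an independent structural factor was needed in the regression.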
5. Conclusions
We conclude that chordal tension is a feature of certain three-tone chords; the tension is perceived by
normal listeners in a manner similar to, but distinct from, the perception of the dissonance of two-tone
intervals. The results suggest that chordal tension is perceived as a psychological Gestalt inherent to
certain three-tone combinations and that the perception of such tension may explain the "instability"
or unresolved nature of augmented and diminished chords without consideration of higher harmonics.
The combination of tones at which resolution of harmonic tension is obtained (and indeed the tones at
which resolution of interval dissonance is obtained) depends heavily upon the scalar intervals used in
the given musical culture, but the need to resolve chordal tension may be a feature as common to all
forms of polyphonic music as is the need to resolve interval dissonance.
References
Kameoka, A. and Kuriyagawa, M. (1969) Consonance theory. Journal of the Acoustical
Society of America 45, 1452-1459; 45, 1460-1469.
Plomp, R. and Levelt, W.J.M. (1965) Tonal consonance and critical bandwidth. Journal
of the Acoustical Society of America 38, 548-560.
Sethares, W. A. (1993) Local consonance and the relationship between timbre and scale.
Journal of the Acoustical Society of America 94, 1218-1228.
Proceedings paper
Abstract
This study investigates the strategies recommended by concert pianists to memorize two pieces of the
piano literature. Semi-structured interviews were conducted with four classically trained concert
pianists who were asked to describe the recommendations they would give to a proficient piano
student to memorize Chopin Prelude in E minor, Op. 28, No.4, and J.S. Bach Prelude in C major from
Book 1 of The Well Tempered Clavier. For both compositions the pianists recommended a detailed
analysis of the music as the most important variable for acquiring a secure memory.
Their recommendations included:
1. study the overall structure of the piece and divide it into sections;
2. look for specific melodic and harmonic patterns and understand their function within the
composition;
3. block the chords so the piece can be played as a chord progression.
In addition, some suggestions based on the use of visual, auditory, and kinesthetic memory were also given.
Introduction
Learning piano pieces from memory is undertaken by students and professional pianists every day, yet
relatively little is known about this topic from both a pedagogical and a psychological perspective.
The books On memorizing and playing from memory and on the laws of practice generally written by
the pianist and pedagogue Tobias Matthay in 1926 (Matthay, 1926; see also Matthay, 1913) and Piano
Technique written by the pianist Walter Gieseking and his teacher Karl Leimer in 1972 remain among
the best pedagogical sources available (Gieseking & Leimer, 1972). While it is important for piano
students to explore different types of memory strategies on their own, and to create their own
mental image of a piece, they are sometimes not given specific directions on how to memorize
their repertoire (Aiello, 1999).
Since performing from memory is an important part of the training and the skill required of many
classical pianists, research in this area could be useful to piano students and piano teachers and could
provide valuable information to psychologists. This paper is part of a series of interviews held with
classically trained pianists to gain a better understanding of how they memorize their repertoire.
Method
Semi-structured interviews were conducted with four professional classical concert pianists who had
extensive piano teaching experience. The participants were asked to describe the recommendations
they would give to a proficient piano student to memorize two compositions of different style: Chopin
Prelude in E minor, Op. 28, No.4, and J.S. Bach Prelude in C major from Book 1 of The Well
Tempered Clavier. Specifically, the participants were asked:
1. to describe the strategies they considered valuable to obtain a secure memory of each piece, and
2. to illustrate their suggestions on the score.
All four pianists had performed these pieces from memory sometime during the last few years. By
asking the participants to describe what strategies they would suggest to a capable piano student
instead of what strategies they would use themselves to memorize the pieces, it was hoped that they
would use clear and simple descriptions; their recommendations could therefore be interpreted as
addressing memory strategies at a basic level. The interviews were held in classrooms or piano studios
and were audio taped.
Results
The data were analyzed according to the principal themes that emerged from the interviews. They are
reported qualitatively.
With reference to the Chopin Prelude in E minor, Op. 28, No.4 (see Figure 1) the pianists'
recommendations focused mainly on the analysis of the piece.
They addressed in particular:
1. the overall form of the piece (i.e., a period made up of two long phrases);
2. the repeated melodic line occurring in measures 1-4 and measures 13-15;
3. the step-wise motion of the right hand in measures 1-9;
4. the harmonic changes in the left hand that take place throughout the piece, and the rate at which
the chords change;
5. the embellishment, the left hand pattern, and the crescendo that occurs on measure 16;
6. the overall climax of measure 17 due to the pattern of the left hand, the right hand reaching its
highest note, and the forte that should be reached here;
7. the very last chord of the piece.
All participants illustrated on the score what they described. One pianist drew a quick sketch and
explained how making a drawing would help her memorize this piece. She illustrated how the lines of
her drawing outlined the contour of the two phrases, the frequent changes in the left hand, and the
crescendo and the climax occurring on measures 16 and 17. The other three participants
recommended focusing in particular on the sounds of the chromatic left hand chords played
throughout the piece, and on the rate at which these chords change. One of them suggested
remembering the position of the left hand as the chordal changes occur.
With reference to the J.S. Bach Prelude in C major from Book 1 of The Well Tempered Clavier (see
Figure 2) the main themes that emerged from the participants' recommendations addressed primarily
the structure of the piece. The pianists explained that the entire piece is based on chord
progressions and that all the chords are arpeggiated throughout this prelude. They suggested blocking
the chords to understand their harmonic function, and to hear the chord progressions clearly. They
pointed out that, except for measures 33-34, each chord is repeated twice in the same position
and fills a measure.
Three pianists emphasized the importance of hearing the texture of the piece, and two of them spoke
of feeling the motion in the music. All discussed the importance of the phrasing in this piece, and
how they would create their phrasing. The pianists explained how the phrasing would reflect the
harmonic tensions and resolutions. Three of them stressed the importance of the bass line, and the
fourth spoke of the relevance of the brief coda in measures 33-35. They illustrated on the score what
they described. No references were made to any particular use of visual memory for this piece.
Discussion
For both pieces the pianists' suggestions revealed that their memory strategies were based primarily on
a detailed analysis of the music. Their responses emphasized mostly a cognitive, analytic approach to
the music. The main recommendations they gave were:
1. divide the pieces into sections according to their formal structure;
2. look for specific melodic and harmonic patterns;
3. block the chords to play the pieces as chord progressions.
Comments such as "Start with the whole so that the parts can make sense"; "Memorize in terms of
sections"; "Focus on all the patterns. Focus on what is different and what is similar in them" are
representative of the recommendations that were made. No pianist suggested memorizing either piece
by rote. They all illustrated on the scores what they described.
The references that these pianists made to the use of visual, kinesthetic, and auditory memory related
to their keen understanding of the scores. For example, the pianist who suggested remembering the
feeling of the left hand in the Chopin Prelude related it to the frequency of the chord changes in this
piece. He explained: "Memory is the balance between mental power and physical dexterity". And the
pianist who drew a quick sketch of this same prelude captured in her simple drawing some of the most
salient musical elements inherent in the score. These concert pianists' reliance on analytic memory is
in agreement with the data reported by Roger Chaffin and Gabriella Imreh who documented how a
concert pianist (the second author) memorized the Presto from J.S. Bach Italian Concerto (Chaffin &
Imreh 1994, 1996a, 1996b, 1997).
Further agreement between the performers in this study and the data reported by Chaffin and Imreh can
also be seen in the emphasis on dividing the scores into sections and creating phrasing that highlights
the structure of the music. It is possible that memorizing atonal music or contemporary pieces would
require the performers to apply different memory strategies than the ones described above to
memorize baroque and romantic music (Aiello, 1999; Marcus,1979). Comments such as: "The process
of discovery in a piece is what helps me creates my memory"; "If you think musically, memory will
follow", and "Understanding music as process helps me remember" provide rich food for thought for
both music teachers and psychologists. It is hoped that future research will address in depth the mental
representations of music performance taking into account different types of music and performers at
different levels of musical skill.
References
Aiello, R. (1999). Strategies for memorizing piano music: Pedagogical implications. Poster
presentation work-in-progress Eastern Division of the Music Educators National Conference,
February 26-28, 1999, New York, New York.
Aiello, R. (2000). Playing the piano by heart: From behavior to cognition. Poster session presented at
the Biological Foundations of Music Conference. The Rockefeller University, New York, NY, May
20-22, 2000. To appear in the Annals of the New York Academy of Sciences.
Aiello, R. & Williamon, A. (2000). Memorization. In R. Parncutt, & G. McPherson (Eds.), Science
and psychology of music performance. New York, NY: Oxford University Press. Forthcoming.
Chaffin, R., & Imreh, G. (1994). Memorizing for performance: A case study of expert memory. Paper
presented at the Third Practical Aspects of Memory Conference. University of Maryland.
Chaffin, R., & Imreh, G. (1996a). Effects of difficulty on practice: A case study of a concert pianist.
Poster presented at the Fourth International Conference on Music Perception and Cognition. McGill
University: Montreal, Canada.
Chaffin, R., & Imreh, G. (1996b). Effects of musical complexity on expert practice: A case study of a
concert pianist. Poster presented at the Meeting of the Psychonomic Society. Chicago, Il.
Chaffin, R., & Imreh, G. (1997). Pulling teeth and torture: Musical memory and problem solving.
Thinking and Reasoning, 3, (4): 315-336.
Chase, W.G., & Simon, H.A. (1973). The mind's eye in chess. In W.G. Chase (Ed.), Visual
information processing. New York: Academic Press.
Clarke, E. F. (1988). Generative processes in performance. In J. A. Sloboda, (Ed.), Generative
processes in music: The psychology of performance, improvisation, and composition. (pp.1-26).
Oxford: Clarendon Press.
Davidson, J.W. (1993). Visual perception and performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-113.
Ericsson, K.A., Krampe, R.T. & Tesch-Romer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review, 100, 363-406.
Gabrielsson, A. (1999). Music performance. In D. Deutsch (Ed.), The psychology of music, second
edition (pp. 501-602). San Diego: Academic Press.
Gieseking, W., & Leimer, K. (1972). Piano technique. New York: Dover Publications, Inc.
Gruson, L.M. (1988). Rehearsal skill and musical competence: Does practice make perfect? In J.A.
Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and
composition, (pp.90-112). Oxford: Clarendon Press.
Hallam, S. (1995). Professional musicians' approaches to the learning and interpretation of music.
Psychology of Music, 23, 111-128.
Hallam, S. (1997). The development of memorization strategies in musicians: implications for
education. The British Journal of Music Education, 14, 87-97.
Lehmann, A. (1997). Acquired mental representations in music performance: Anecdotal and
preliminary empirical evidence. In H. Jørgensen, & A. Lehmann (Eds.), Does practice make perfect?
Current theory and research on instrumental music practice (pp. 141-164). Oslo, Norway: Norges
musikkhøgskole.
Marcus, A. (1979). Great pianists speak. Neptune, NJ: Paganiniana Publications, Inc.
Matthay, T. (1913). Musical interpretation: Its laws and principles, and their application in teaching
and performing. Boston, MA: Boston Music Company.
Matthay, T. (1926). On memorizing and playing from memory and on the laws of practice generally.
Oxford: Oxford University Press.
Miklaszewski, K. (1989). A case study of a pianist preparing a musical performance. Psychology of
Music, 17, 95-109.
Miklaszewski, K. (1995). Individual differences in preparing a musical composition for public
performance. In M. Manturzewska, K. Miklaszewski & A. Bialkowski (Eds.), Psychology of Music
Today: Proceedings of the International Seminar of Researchers and Lecturers in the Psychology of
Music (pp. 138-147). Warsaw: Fryderyk Chopin Academy of Music.
Noyle, L. (1987). Pianists on playing: Interviews with twelve concert pianists. Metuchen, N.J.: The
Proceedings paper
Notes: Letters a) to q) represent the following theories (a to n from Ortony & Turner 1990, p. 316, Table 1): a) Arnold (1960)*; b) Ekman, Friesen & Ellsworth (1982); c) Frijda (1986); d) Gray (1982); e) Izard
(1971); f) James (1884)*; g) McDougall (1926); h) Mowrer (1960); i) Oatley & Johnson-Laird (1987); j) Panksepp (1982); k) Plutchik (1980); l) Tomkins (1984); m) Watson (1930)*; n) Weiner & Graham
(1984); o) Clynes (1982)**; p) De Vries (1990); q) Juslin (1998).
* these authors list "love" instead of "joy"
** Clynes differentiates "joy" and "love"
In sum, the question of basic emotions seems a matter of ongoing debate. A pragmatic solution in the search for basic emotions in music is therefore to start out with a rather broad repertoire. It was decided
to compile this set from the general theories and the theories of emotion in music outlined above. The resulting list of thirty-two emotion categories was used in the two parts of the empirical study, namely
the semantic ratings (data set one) and the survey of a database storing lyrics of songs and Lieder (data set two).
Data set one: Semantic ratings
Subjects: Forty-nine participants (35 female) were recruited from the university student population. Most subjects were at least moderately musically trained; the majority had received instruction on one
or more musical instruments for at least two years.
Questionnaire and procedure: Each subject received a questionnaire consisting of five pages (DIN A4). On the first two pages, subjects had to fill in demographic data and data on musical experience
and some aspects of their music consumption. The third to fifth pages contained a list of thirty-two emotion words, with one question at the top of each page. The questions were:
Can music represent each of the emotions given in the list? Can music evoke each of the emotions given in the list? And finally: Can music influence each of the emotions given in the list? Each page contained the
same set of items, but in a different order. Each category was rated on a five-point Likert-type scale, where ratings to the left of the midpoint indicated disagreement and ratings toward the right indicated
agreement. It was further explained that judgements should be based on pure musical sound and not in relation to pictures or lyrics with the music. Subjects were tested as groups in a seminar room.
Some individual subjects filled out the questionnaire at home. In the seminar, filling out the sheets took about twenty minutes. In general, the procedure posed no problems to the subjects.
Results and discussion
Figure 1: Mean ratings of categories on the basis of whether music might express, evoke, or influence a given emotion.
For the averaged ratings, significant correlation coefficients were found (Table 2). However, toward the centre of Figure 1, a number of individual items show significant mean differences, more of them
than test theory would predict for multiple comparisons of means. It is plausible that interest, boredom, mercy, and tiredness are evoked rather than represented by music (p < .01), whereas items such as
pain, loneliness, fear, despair, and pride reveal the reverse tendency (p < .01). A special status must be attributed to surprise: ratings on this item are lower for influence than for the other two questions.
Items that average around the midpoint of the scale and show higher variance seem particularly interesting; apparently some subjects attribute to music a certain effectiveness in representing, evoking, or
influencing these feelings. However, a much larger sample would be required to address the individual differences involved here.
Table 2: Product moment correlation coefficients between the three rating scales.
Evoke Influence
Express 0.80 0.85
Evoke 0.93
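The coefficients in Table 2 are ordinary product-moment (Pearson) correlations computed over the thirty-two category means. A minimal sketch with synthetic rating profiles; the shared component and noise levels are assumptions chosen only to produce mutually correlated profiles, not a model of the actual data:

```python
import numpy as np

# Pearson product-moment correlations among three 32-item rating profiles
# (express, evoke, influence). The profiles below are synthetic: a shared
# component plus independent noise, so the three scales correlate highly.
rng = np.random.default_rng(1)
base = rng.uniform(1.0, 5.0, size=32)          # shared rating component
express = base + rng.normal(0.0, 0.4, size=32)
evoke = base + rng.normal(0.0, 0.4, size=32)
influence = base + rng.normal(0.0, 0.4, size=32)

profiles = np.vstack([express, evoke, influence])
corr = np.corrcoef(profiles)    # 3 x 3 correlation matrix, rows = scales
print(np.round(corr, 2))
```

The off-diagonal entries play the role of the three coefficients reported in Table 2.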
Figure 2: Frequencies of emotion words in German (N=2933) and English (N=1785) songs in a lyrics database
The final step of the analysis sought to establish whether the occurrence of emotion categories is quantitatively related to their subjective importance, as assessed in the first part of this study.
For this purpose, averaged rating profiles and frequencies of emotion categories in the database were correlated, and coefficients of determination (indicating variance explained) were calculated (Table 3). In
general, frequencies are best predicted as a curvilinear, parabolic function of averaged subjective ratings. Moreover, the coefficients are highest when only frequencies above one percent are considered, and
particularly when the evoke and influence ratings rather than the express ratings are used.
Table 3: Correlation (coefficients of determination) between averaged ratings and frequencies of emotion categories in a song lyrics database.
Express Evoke Influence
Total R2 (linear) 0.26 / 0.25 0.15 / 0.14 0.21 / 0.19
R2 (parab.) 0.42 / 0.36 0.26 / 0.26 0.32 / 0.32
N > 1% R2. (linear) 0.24 / 0.21 0.30 / 0.21 0.32 / 0.29
R2. (parab.) 0.47 / 0.33 0.69 / 0.54 0.64 / 0.56
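The linear and parabolic coefficients of determination in Table 3 correspond to the R² of first- and second-degree least-squares fits. The sketch below uses synthetic data with an inverted-U relation; the simulated ratings and frequencies are assumptions for illustration, not the study's data:

```python
import numpy as np

# R^2 of a linear and a parabolic (degree-2) least-squares fit, as used to
# relate averaged ratings to emotion-word frequencies in Table 3.
def r_squared(x, y, degree):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
ratings = rng.uniform(1.0, 5.0, size=32)     # 32 emotion categories
# Inverted-U relation: mid-scale ratings get the highest frequencies.
freqs = -(ratings - 3.0) ** 2 + 4.0 + rng.normal(0.0, 0.4, size=32)

r2_lin = r_squared(ratings, freqs, degree=1)
r2_par = r_squared(ratings, freqs, degree=2)
print(r2_par > r2_lin)   # the curvilinear fit explains more variance
```

Because the degree-2 model nests the degree-1 model, its R² on the same data can never be lower; the interesting empirical fact in Table 3 is how much larger the parabolic coefficients are.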
General Discussion
Studies in the psychology of music have stressed the importance of language as a referential system for the psychological reality of music (see Fricke, 1999; Kleinen, 1999). The present investigation used
linguistic categories to determine emotion stereotypes in music. The relationship between everyday and musical emotions was determined on the basis of two independent sets of data. The first set
consisted of subjective ratings of thirty-two emotion words. Even in the absence of sounding music, it became clear that subjects differentiate emotions on the basis of whether they can be expressed,
evoked, or influenced by music. There was a significant tendency for music to be thought of as expressing rather than evoking a given emotion (F[2,56]=8.3; p<.002). Only a few emotions indicating
motivation or attitude (interest, boredom, tiredness) ran against this trend. High ratings were received by those emotions which according to Mowrer (1960) require no learning (pain, joy) and
which are presumably acquired before a social identity is fully developed (in particular love, sadness, desire, loneliness, anger). Other highly rated emotions indicated degrees of activation (unrest, relaxation).
In contrast, most social emotions (shame, pride, jealousy, disgust) play little or no role in music. However, other categorical systems may aid in interpreting these findings. Mees (1985)
distinguishes relationship, empathy and target as major categories of emotion. The latter are differentiated by evaluation, expectation, attribution and moral emotions. Finally, each group of emotions is composed of
positive and negative opposites. Relationship (love), evaluation (joy/sadness) and expectation (desire) are easy to identify in this system. But there is no clear categorisation for pain, which might not be an emotion
in Mees' theory. Considering the four basic emotions used in Terwogt & Van Grinsven's (1991) study, fear and anger were less identifiable than joy and sadness in music selections. Perhaps
emotions such as joy and sadness, which are both positive in the sense that they are highly appreciated in musical contexts, are more expected in music than fear and anger, which seem less
appreciated. In light of the present findings, joy and sadness also drop off when rated as evoked or influenced rather than as expressed. It might be that the identification of an emotional state in
music depends to some extent on resonance in, or involvement of, the listener. To summarise so far, the results suggest that distinctions need to be made between everyday and musical emotions. The
particular kind of emotion and the developmental stage of its acquisition seem to be the key factors. Music addresses those emotions which might be considered instinctive or which have a
physical basis.
A search for emotion words within a database of about 4700 German and English lyrics generated a frequency profile which correlated significantly with the semantic ratings. In other words, the subjective ratings predict to some degree the frequency of an emotion category (as represented in word fields) in lyrics across several centuries. Poets and composers seem more devoted to those emotions which music might evoke or influence, and less interested in the full range of expressible emotions. There is an undeniable dominance of the themes of "love", "pain", "joy", "sadness", and "desire", which renders any other emotion marginal.
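The word-field search and its correlation with the ratings can be sketched roughly as follows. Note that the miniature lexicon, the sample lyrics, and the rating values below are invented for illustration only; the study's actual lexicon (see the appendix) and rating data are far larger.

```python
import math

# Hypothetical word fields (cf. the appendix) and invented mean ratings;
# neither is the actual data of the study.
word_fields = {
    "love":    {"love", "beloved", "affection", "fondness", "cherish"},
    "pain":    {"pain", "anguish", "suffer", "distress", "hurt"},
    "joy":     {"joy", "pleasure", "glad", "happy", "delight"},
    "sadness": {"sadness", "sad", "grief", "sorrow", "mourn"},
    "fear":    {"fear", "afraid", "dread", "anxiety"},
}
ratings = {"love": 6.1, "pain": 5.4, "joy": 5.9, "sadness": 5.6, "fear": 3.0}

def field_frequencies(lyrics, fields):
    """Count, per emotion category, how many lyric tokens fall in its word field."""
    counts = {name: 0 for name in fields}
    for text in lyrics:
        for token in text.lower().split():
            token = token.strip(".,;!?")
            for name, words in fields.items():
                if token in words:
                    counts[name] += 1
    return counts

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

lyrics = ["my love, my beloved joy", "sorrow and grief bring pain", "glad and happy delight"]
freqs = field_frequencies(lyrics, word_fields)
cats = sorted(word_fields)
r = pearson([freqs[c] for c in cats], [ratings[c] for c in cats])
```

With this toy data the frequency profile correlates positively with the ratings, mirroring the kind of relation reported in the paper.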
In particular, even though love is not considered an emotion by most authors (cf. Table 1), it certainly indicates a fundamental human need and an orientation of social being. Explaining this phenomenon by mere joy seems inadequate. Some sociologically oriented empirical studies support a close relationship between love, sexuality, and music (Gembris, 1995; Kreutz, 1997, 1999). There is historical evidence that love was the subject matter of much music production long before this stereotype came under the rule of commerce in the industrial age. It should be noted here that poets and composers prefer to address motives (love, pain) in their lyrics rather than the commonly associated affects (joy, sadness). This preference most easily explains the inverted-U shape found in the correlation of the two data sets.
Finally, some limitations and perspectives of this study should be addressed. No sounding music was used here. It remains uncertain how exactly subjects interpreted the three questions, whether ratings were based on music in the subjects' minds or on other associations, and whether familiarity with a given emotion term influenced the ratings. Still, the procedure seemed efficient enough to generate a plausible pattern of emotion stereotypes in music, which is corroborated by previous studies and, again, by the correlation between the two independent data sets of the present study.
There is no apparent solution to the problem that emotions in poetry and lyrics do not necessarily surface in explicit linguistic labels, but often emerge from the interpretation of symbols, rhetoric
Appendix
N = 1865 unique English texts

Love (634): beloved, affection, fondness, liking, passionate, Cupid, cherish, enamoured, amorous
Pain (291): anguish, painful, pang, harmful, suffer, distress, trouble, tears, sigh, bitter, hurt, ache, achiness, twinge
Joy (378): joyousness, pleasure, pleasant, glad, rejoice, gladden, happy, enjoyment, happiness, delight, cheerful, enjoy, fortune, fortunate, elated, elation, funnies, gratification, blitheness, maffick, please
Sadness (236): sad, unhappy, grieve, grief, sorrow, deplore, inept, saddish, lamentation, lamento, mourn
Gunter Kreutz
Fear (122): danger, afraid, alarm, fearful, dread, anxiety, reverence, risk
Contempt (1): disrespect
Shock (1): concernment
Back to index
Proceedings abstract
Michael Grossbach
Michael.Grossbach@gmx.de
Background:
Aims:
To identify and differentiate neuronal networks involved in a putative
supra-modal analyser for processing temporal structures in both visual and
auditory sequences, this study investigates brain activation patterns in
healthy musicians during a same-different task with stimuli of both modalities.
Method:
Results:
Conclusions:
Back to index
Proceedings abstract
kaisu.korosuo@helsinki.fi
Background:
Aims:
In the other experiment, subjects were instructed to press the button every
time they detected a deviant tone.
Results:
It is possible that some neural correlates of tonal hierarchies can be found already at the non-attended level. Mismatch negativity (MMN), which reflects automatic change detection, appears to be largest for the dominant tone, which had the highest status in the tonal hierarchy among the deviant tones. In the attentive condition, ERPs to the dominant tone also differed from those to the other tones.
Conclusions:
The results demonstrate that the neurocognitive basis of tonal hierarchies can be revealed with ERP recordings despite experimental manipulations of the subject's attentional focus.
Back to index
Proceedings abstract
Helen Kuck
helen_kuck@yahoo.com
Background:
In the visual modality it has been found that global features are processed within the right hemisphere whereas local features are processed within the left hemisphere. Transferring these results to the auditory modality, we hypothesised that the processing of rhythm (a local time structure in music) might also be located within the left hemisphere, whereas the processing of metre (a global time structure) might be lateralized to the right hemisphere. No systematic study has yet investigated a comparable hemispheric dissociation in healthy subjects. Previous studies with epilepsy patients showed ambiguous and partly contradictory results.
Aims:
Method:
Results:
Conclusions:
As rhythm and metre were found to be processed in the same networks, it remains unclear whether these networks are specific to musical time structures or belong to a supramodal time-processing unit. Moreover, the additional activation of parietal areas during processing of rhythm points to higher demands on temporo-spatial integration of sensory information as compared to the more global metre task.
Back to index
Proceedings paper
INSIGHTS INTO THE FUNCTIONAL ORGANIZATION OF MUSIC PROCESSING REVEALED USING CONTINUOUS
ACROSS-SUBJECT EVENT-RELATED POTENTIAL AVERAGING
Douglas D. Potter (1,2), Helen Sharpe (1), Deniz Basbinar (1), Susan Jory (1)
(1) Keele University, UK (2) Now at University of Dundee, UK
d.d.potter@dundee.ac.uk
http://www.dundee.ac.uk/psychology/ddpotter/
Introduction
Recent functional imaging studies have shown that pitch and rhythm processing utilizes resources primarily in the left hemisphere and that timbre processing utilizes resources primarily in the right hemisphere. We report here preliminary findings using single-trial across-subject averaging of electrical brain activity associated with passive listening to unfamiliar pieces of music. The advantage of this technique is that one can observe unique and transient changes in the operation of perception, memory and attention mechanisms occurring over a time scale of seconds and minutes, but with a resolution of milliseconds.
Method
In this study participants listened to 3 different and unfamiliar pieces of music while the ongoing electroencephalogram was recorded at 500 Hz from 19 standard 10/20 locations on the head. A continuous, single-trial across-subject topographic map of activation (using an average reference) was generated from these continuous samples.
Results / Discussion
Only brief samples of the continuous recording are presented here. These can be viewed in the attached Shockwave files y7s8, f8s4 and f9s4 by clicking on the links when using Internet Explorer 5 or another browser with Shockwave capability. In simple terms, the blue patches indicate areas of high negative potential and the red areas high positive potential. In general, negative potentials often indicate sustained enhanced activation and positive potentials indicate transient inhibitory processes, but such simple interpretations do not cover the full range of possibilities. In the present images, blue and red are best treated as crude indicators of regions in which more activity is occurring or has occurred. Representative frames from these movies are illustrated below in Figures 1-3. It is clear from the movies that musical pieces with different structures evoke quite different patterns of activation. However, there are common features in these patterns that would be expected given the specific structural features of these stimuli. Both y7s8 and f9s4 have an abrupt onset to the music, and in these movies the first prominent feature is a fronto-central P3a that is associated with the brain rapidly orienting attention to this new stimulus. In the case of f8s4 the piece starts slowly, so a P3a is not obvious in this recording. In all the movies a more posterior positive feature is observed following the P3a; this would typically be classed as a P3b. In the examples given here this positive feature is quite variable in distribution, probably as a result of the differing structure of the pieces. In standard experiments that evoke P3b deflections and involve averaging of several trials within subject, the distribution is relatively diffuse. The P3b is believed to be made up of a number of distributed sources in the cortex as well as the hippocampus. In the present results more evidence of multiple sources can be discerned, possibly as a result of the trial-unique nature of the response. The P3b deflection is generally regarded as marking the operation of certain long-term memory processes.
Figure 1. y7s8. Single-trial across-subject average (n=32) based on 19 standard 10-20 positions. Potential distribution is projected onto a 3-D model of the head viewed from the rear. Example frames of the main event-related potential features observed during the first 8 seconds of passive listening to the music stimulus. Main features are a central P3a at 440 msec, a parietal P3b at 640 msec, a right posterior occipito-parietal "P3R" at 800/1720 msec, and right ventral-occipital / left frontal activation in the 800-3560 msec examples. Voltage range: +/- 10 microvolts.
Figure 2. f8s4. Parameters are as stated in Figure 1, except that potential distribution projection is onto a convex surface that allows a view of the entire potential map. (n=20)
References
Auzou, P., Eustache, F., Etevenon, P., Platel, H., Rioux, P., Lambert, J., et al. (1995). Topographic EEG activations during timbre and pitch discrimination tasks using musical sounds. Neuropsychologia, 33, 25-37.
Cabeza, R., & Nyberg, L. (1997). Imaging Cognition: An Empirical Review of PET Studies with Normal Subjects. Journal of Cognitive Neuroscience, 9(1), 1-26.
Kelley, W. M., Miezin, F. M., McDermott, K. B., Buckner, R. L., Raichle, M. E., Cohen, N. J., Ollinger, J. M., Akbudak, E., Conturo, T. E., Snyder, A. Z., & Petersen, S. E.
(1998). Hemispheric asymmetry for verbal and nonverbal memory encoding in human dorsal frontal cortex. Journal of Cognitive Neuroscience, 46-46.
Mazziotta, J. C., Phelps, M. E., Carson, R. E., & Kuhl, D. E. (1982). Tomographic mapping of human cerebral metabolism: auditory stimulation. Neurology, 32, 921-937.
Mazzucchi, A., Marchini, C., Budai, R., & Parma, M. (1982). A case of receptive amusia with prominent timbre perception defect. Journal of Neurology, Neurosurgery and
Psychiatry, 45, 644-647.
Petersen, S. E., van Mier, H., Fiez, J. A., & Raichle, M. E. (1998). The effects of practice on the functional anatomy of task performance. Proceedings of the National
Academy of Sciences of the United States of America, 95(3), 853-860.
Phelps, M. E., & Mazziotta, J. C. (1985). Positron emission tomography: human brain function and neurochemistry. Science, 228, 799-809.
Platel, H., Price, C., Baron, J. C., Wise, R., Lambert, J., Frackowiak, R. S. J., Lechevalier, B., & Eustache, F. (1997). The structural components of music perception - A
functional anatomical study. Brain, 120, 229-243.
Snyder, A. Z., Abdullaev, Y. G., Posner, M. I., & Raichle, M. E. (1995). Scalp Electrical Potentials Reflect Regional Cerebral Blood- Flow Responses During Processing of
Written Words. Proceedings of the National Academy of Sciences of the United States of America, 92(5), 1689-1693.
Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural Mechanisms Underlying Melodic Perception and Memory for Pitch. The Journal of Neuroscience, 14(4), 1908-1919.
Zatorre, R. J., Evans, A. C., Meyer, E., & Gjedde, A. (1992). Lateralization of Phonetic and Pitch Discrimination in Speech Processing. Science, 256, 846-849.
Back to index
Proceedings abstract
aoyagi@usa.com
Background:
Aims:
This paper will analyze the reported and possible problems of the probe-tone
method, in particular the problems caused by cultural differences between the
experimenter and subjects. A new method of measuring a tonal hierarchy will
also be introduced. This new method is designed to be robust to these problems,
as it leaves less room for subjective interpretation.
Main contributions:
Until now, there has been only one method of quantifying the importance of
various pitches in music. The new method introduced in this paper may be
employed as an alternative to the probe-tone method when the latter is not a
viable means of data collection. It may also be used as a confirmatory method
for tonal ratings where the probe-tone method can be used.
Implications:
Many scholars have equated the results of probe-tone ratings with the tonal
hierarchy. This suggests that we have a tendency to resort to a single
methodology without examining its implications extensively. This paper suggests
the importance of studying music of other cultures, because it reveals that our
cultural bias is at work even in experimental studies.
Back to index
Proceedings abstract
LOCAL AND GLOBAL REPRESENTATIONS FOR MUSIC
wcooper@utdallas.edu
Background:
Tanaka and Farah (1993) found that parts of a face are better recognised when
presented in the context of a face, than the individual parts of the faces are
recognised when presented in isolation from the face.
Aims:
The current investigation seeks to find evidence indicating that the perception
of complex music, like the perception of faces, is reliant on a global
representation.
Method:
Using four-part hymn music, it will be determined whether accuracy in
identifying changes in an individual melodic line is higher when that line is
embedded in the context of three other accompanying melodic lines or when it is
presented alone.
Results:
Conclusions:
Back to index
Proceedings abstract
DETECTION OF UNEXPECTED PITCHES IN A MUSICAL CONTEXT
jdowling@utdallas.edu
Background:
Previous research has found that when listeners expect target tones in a
particular pitch region, targets falling outside those regions often go
undetected. That is, targets markedly different in pitch, typically by half an
octave, are more difficult to detect.
Aims:
This study aims to test whether, in addition, tones that are unexpected in
terms of musical structure would also be more difficult to detect. By
"unexpected" we mean not conforming to the musical scale structure of a cue
melody presented at the start of each trial. There are two levels of
expectation involved: pitches outside of the key but nevertheless part of the
"tonal material" in the culture (for example, a C# in the key of C), and
pitches outside of the tonal material (for example, the quarter step between C
and C#).
Method:
Listeners have their hearing thresholds assessed for this situation, and
perform a series of two-alternative forced-choice detection trials near their
individual thresholds. For each alternative the listener hears all but the
final note of a familiar cue melody normally ending on the tonic, followed (or
not) by a target tone. The listener has to say which alternative contained the
target tone. On most trials the target is in the expected pitch region, and on
those trials it is most often the expected tonic. On some trials the target is
approximately one-half octave higher or lower. One-third of the unexpected
targets are pitches of the musical scale in the key of the melody; one-third
are nonscalar semitones; and one-third are quarter steps.
Results:
Conclusions:
Musical structure in the form of the scale framework and the framework of tonal
material in a culture affects even the very early stages of pitch processing.
That is, it does not appear that a pitch is first perceived and then "encoded"
as a scale note or something else. Rather, even the detection of a tone is
affected by whether the pitch processing system expects something in that
category.
Back to index
Proceedings paper
Tone constellations: Stabile melodic intervals determine the best tonal fit
Erkki Huovinen, University of Turku, Finland
and "powerful" intervals was held by the perfect fifth and the perfect fourth. In this context,
Hindemith is an especially telling example, because he claimed that the most harmonically powerful
intervals stand out of the musical texture and determine the tonality; consequently, his methods of
musical analysis rely heavily on this conception. Despite these intuitions, there has been a lack of
studies on how the more "stabile" intervals might affect the formation of tonal centers, especially in
melodic contexts.
The Tonal Hierarchy Theory in its original form does not say much about what brings the feeling of
tonality about in the first place. In terms of intervals we are only told that certain intervallic
relationships to the pre-established tonal center are preferred over others. Information about the
tonality-forming powers of the intervals cannot therefore be gained by the usual probe-tone method of
the Tonal Hierarchy theorists - by asking experimental subjects how well certain tones fit into an
unambiguously given tonal center. A more appropriate method is the one adopted by Auhagen. In the
study mentioned above, he played short melodies to subjects, asking them in each case to produce a
suitable tonal center by operating a tone generator with twelve buttons, one for each pitch class. Auhagen
himself pays more attention to the effects of the temporal ordering of the tones, but his data can also
be used to test more structurally oriented intuitions about the importance of certain intervals to the
phenomenon of tonality. One part of the tone material used consists of 182 different five-tone pitch
strings. It turns out that for these strings, 211 (88%) of the 241 statistically significant tonal centers
reported by Auhagen form a fifth or fourth, that is, an interval of interval-class 5 (hereafter ic5), with
at least one of the other tones in the string. Some of the statistically significant tonal centers produced
by the subjects were tones outside the original five tones played for them, and so the total sum of pitch
classes to consider was 951. Of these, 602 (63%) formed an ic5 with one of the other tones in the
string. The 95% confidence interval for the difference between the proportion of ic5-forming tones
among the tonal centers and the proportion of ic5-forming tones among all tones used is therefore
0.25 ± 0.05. This means that forming an ic5 with some other tone in the string has clearly been a
desirable property for candidate tonal centers. This supports Hindemith's idea of fifths and fourths
between some (not necessarily successive) tones of the melody acting as guides to its tonal structure.
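The confidence interval quoted above can be reproduced with a standard normal approximation for the difference of two independent proportions; this is a sketch using the counts given in the text.

```python
import math

# Counts reported in the text: 211 of the 241 statistically significant tonal
# centers formed an ic5, versus 602 of all 951 candidate pitch classes.
p1, n1 = 211 / 241, 241   # ic5-forming tones among chosen tonal centers
p2, n2 = 602 / 951, 951   # ic5-forming tones among all tones considered

diff = p1 - p2                                                  # ~0.24
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)         # standard error
half_width = 1.96 * se                                          # ~0.05

print(f"95% CI for the difference: {diff:.3f} +/- {half_width:.3f}")
```

The computed difference (about 0.243 +/- 0.052) is consistent with the rounded 0.25 +/- 0.05 reported above.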
Results such as this point to the possibility of finding in single stabile intervals a positive structural
constraint for tonal center perception. It must be emphasized that stability is here understood as a
property assignable to intervals independent of context and not a property that is assigned to single
tones due to their intervallic relation to a given tonic. Of course, intervallic stability would be too
weak a criterion to work as a sufficient condition for locating the tonal center. This becomes
especially clear in a diatonic context if we regard the intervals of ic5 as stabile: each and every tone of
the diatonic set participates in an interval of ic5 with one or two other tones of the set. But then again,
this is no argument against the conception of ic5 as a stabile, tonality-promoting element. It has been
demonstrated (West & Fryer, 1990) that for the seven diatonic tones presented in random order,
musically trained listeners choose the mediant, subdominant or dominant as tonal center just as
readily as the major-mode tonic. This can be taken to show that when the effects of temporal ordering
are suppressed, there is not enough structural differentiation in terms of interval stability for the
tonality to be unambiguous. It can still be hypothesized that in a melodic context containing fewer
supposedly stabile intervals of ic5, listeners would orient their tonal decisions largely by these
intervals.
Tone constellations
The issue of choosing a tonal center seems often to be approached by asking "Given certain tones,
which one of them becomes accepted as the tonal center?" This implies a conception of tonality as
something that the producer of the music (such as a composer) builds into the music and that is
thereafter tracked down by the listener. If, however, we think of tonality primarily as something that
the listener puts into the music by way of focusing her attention on a certain pitch or pitch class, then
the question can be turned around. We can ask: "Which tones does the listener accept around a tonal
center?" As a tool for studying the tonal implications of particular interval classes, I would like to
introduce the concept of tone constellation, which can now be defined as the set of tones accepted
around a tonal center by the listener. Tone constellations can be best thought of as pitch-class sets
abstracted out of all tones present in the music and arranged around the chosen tonal center. Different
tone constellations are compared to each other by transposing them so that the tonal center always
corresponds to the same pitch-class, say, the class of all Cs. In this paper, pitch-classes (hereafter pc)
are referred to by the conventional number notation, whereby 0 = C, 1 = C#/Db, 2 = D etc. (see e.g.
Rahn, 1980).
In conjunction with the experimental method promoted by Auhagen, tone constellations allow us to
see whether the listener prefers certain intervallic relationships to the tonal center over others. To this
effect, the tonal material of the melodies used in the experiment is described as pitch-class sets
(hereafter pcsets), and for each subject, each pcset is transposed to move the chosen tonal center to
pc0. For example, if the melody consisted of the tones C, E, G and B, one subject might hear C as the
tonal center, whereas someone else might prefer E. The tone constellation for the first subject would
then be [C, E, G, B], or [0,4,7,11], and the constellation for the second subject would be [C, Eb, G,
Ab], or [0, 3, 7, 8]. In each case, pc0 represents the chosen tonal center. Note that both of these tonal
interpretations are perfectly explicable within a traditional tonal theory: the first subject has preferred
a "major" tonality, whereas the second one has opted for a "minor" tonality. By taking the tonal
intuitions of the listener seriously, this method thus allows for individual listening strategies to be
taken into account.
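The transposition step just described is simple to state in code; this minimal sketch reproduces the C-E-G-B example from the text.

```python
def tone_constellation(pcset, center):
    """Transpose a pitch-class set so that the chosen tonal center becomes pc0."""
    return sorted((pc - center) % 12 for pc in pcset)

melody = {0, 4, 7, 11}                      # C, E, G, B
print(tone_constellation(melody, 0))        # center C -> [0, 4, 7, 11], the "major" reading
print(tone_constellation(melody, 4))        # center E -> [0, 3, 7, 8], the "minor" reading
```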
The question concerning the stability of ic5 can now be easily formulated in terms of tone
constellations. We can ask, to what extent do the listeners use tone constellations with pc5 or pc7 in
them? In other words, is the listener likely to shift her tonal focus until a tonal center is found, which
stands in these favorable intervallic relationships to some other tones? It can be hypothesized, for
example, that if the heard melody consists of tones of a pcset that includes only one possibility of
forming an interval of a fourth or a fifth, then the listener will be likely to choose such a tone for the
tonal center that is a part of this stabile interval. This hypothesis was tested in an experiment that will
be described below.
Procedure
The tone material used in the experiment consisted of all five-tone pcsets whose interval vectors have
the number 1 as their second-to-last component. The interval vector (Forte, 1973), sometimes more
properly called the interval-class vector (Morris, 1987), is a description of the total intervallic content
of a pcset, enumerating the number of possible instances of each ic within the pcset. The interval
vector [254361], for example, describes the intervallic content of the diatonic set, where there are two
possible instances of ic1 (minor seconds/major sevenths), five instances of ic2 (major seconds/minor
sevenths), etc. The pcsets used in the experiment had the number 1 as the second-to-last component of
their interval vectors, indicating only one possible instance of ic5 (perfect fourths/perfect fifths). In
all, there are 20 such pcsets, and two of them always share a common interval vector. Below is a list
of the interval vectors, the corresponding pcsets used, and for reference also the customary "Forte
names" and the prime forms of the sets (see Forte, 1973). Note that every pcset used in the experiment
included pc0 and pc7, which together make up the only possible ic5.
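The interval-class vector described here is straightforward to compute by counting intervals over all pairs of pitch classes; this sketch reproduces the [254361] diatonic example above.

```python
from itertools import combinations

def interval_vector(pcset):
    """Count the instances of each interval class (ic1..ic6) within a pcset."""
    vec = [0] * 6
    for a, b in combinations(sorted(pcset), 2):
        ic = min((b - a) % 12, (a - b) % 12)   # interval class lies in 1..6
        vec[ic - 1] += 1
    return vec

print(interval_vector({0, 2, 4, 5, 7, 9, 11}))   # diatonic set -> [2, 5, 4, 3, 6, 1]
print(interval_vector({0, 1, 3, 7, 9}))          # a five-tone set with a single ic5
```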
Each pcset was used as material for one melody, which incorporated high speed and large intervals in
random-like succession to make functional cues (such as the temporal order of the tones)
insignificant in assessing the tonal center. First, the 20 pcsets were arranged in such an order that, by
transposition, there would be no common tones between successive trials where possible. The melodies
were then composed in 5/8 time and were all 18 measures in length, with every measure consisting of
all the pcs of the appropriate pcset. The registral space for each melody was principally G3-F#5, and
the registral position and order of the tones were changed from measure to measure so that no pc
could appear in the same registral position and/or the same part of the measure in more than two
consecutive measures. Three consecutive notes were not allowed to form any inversion of a major or
minor triad, and the same pitch class was not allowed to appear twice in succession at measure
boundaries. To avoid registral accents (see Huron & Royal, 1996), pcs 7-0, which would otherwise
have dominated the low register, were here and there moved up to G5-C6, and likewise pcs 1-6,
which would otherwise have dominated the high register, were moved down to C#3-F#3. This was
carried out according to the rule that if such a half-octave (pcs 1-6 or pcs 7-0) included two tones of
the pentachord, one of these would in turn be moved out of the principal registral space every four
measures, and if the half-octave included three of these tones, one would likewise be transposed every
three measures. (In some cases, the pentachords had to be transposed slightly to achieve this division
of the five tones into groups of two and three at the boundary of the two half-octaves.)
The melodies were then played using a Power Macintosh 7200/90 computer with Finale 3.5.1
software, with a flute-like sound chosen from the internal sound bank of the software program. The
duration of each tone was fixed at 150 ms, which made the duration of one measure 750 ms and the
total duration of each melody 13.5 s. The high speed of presentation, combined with the large interval
skips and unbiased ordering of the tones, was meant to hinder conscious choices of tonal center based
on familiar temporal and registral successions. Further, the activation of metrical accents and order
effects were minimized by providing the first two measures with a continuous crescendo from zero
amplitude and likewise the last two measures with a continuous diminuendo ending in zero amplitude.
The melodies were finally recorded on minidisc by a Sony MZ-R35 minidisc recorder and reproduced
in the experiment through a stereo system (amplifier Pioneer SA-510, loudspeakers Infinity
J814-200640) at a comfortable amplitude level.
The test subjects were 73 music professionals, students and amateurs aged between 17 and 53 years
(M = 27.43, SD = 9.13). All were tested individually. The subjects were asked to listen for a tonal
center in the melodies, where the concept of "tonal center" was explained to mean "the most stabile
tone in relation to the other tones". There were two practice trials before the 20 primary trials. During
each trial, the subject listened to the melody as many times as he/she wanted in order to be able to
sing, hum or whistle the tone felt to be most suitable as tonal center. The experimenter, who was out
of sight of the subject, checked all the answers on a keyboard with headphones before writing them
down.
These results already show, however, that there is much more individual variation in decisions
concerning tonality than one might expect. In the context determined by the pcsets described above,
ic5 was indeed found to act as a relatively strong criterion in determining tonality, but it still explains
only part of the results, as is clear from the figures given above. Here the concept of tone constellation
comes in handy. Tone constellations are arrived at by transposing the pcsets for each answer type so
that the chosen tonal center becomes pc0. For example, the set {0,1,3,7,9} received answers on nine
of the possible twelve pitch classes. Apart from the obvious pc0 (31 answers), there was a
considerable concentration of answers on pc3 (14 answers). By transposing the pcset a minor third
down we find the corresponding tone constellation {0,4,6,9,10}, which thus represents the auditory
impression of 19% of the participants. When all answer types have been treated in this way, the result
is a list of tone constellations with their respective frequencies of occurrence.
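The tallying procedure can be sketched as follows. The answer counts (31 for pc0, 14 for pc3) are those reported above for the set {0,1,3,7,9}; the answers on the remaining pitch classes are omitted here for brevity.

```python
from collections import Counter

def tone_constellation(pcset, center):
    """Transpose the pcset so that the chosen tonal center becomes pc0."""
    return tuple(sorted((pc - center) % 12 for pc in pcset))

pcset = {0, 1, 3, 7, 9}
answers = [0] * 31 + [3] * 14          # tonal-center answers for this pcset

# Map each answer to its constellation and count frequencies of occurrence
tally = Counter(tone_constellation(pcset, c) for c in answers)
print(tally[(0, 1, 3, 7, 9)])          # center pc0 -> 31
print(tally[(0, 4, 6, 9, 10)])         # center pc3 -> 14
```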
To simplify the analysis, subjects with divergent answers were left out. Originally it was hoped that
the relatively long duration (13.5 s) of the test melodies would constrain the subjects to find a tonal
center among the tones actually heard, rather than trying to "resolve" perhaps dissonant-sounding
melodies onto tones outside the pcset used. To a great extent, this is what happened: only 11 of the 73
participants gave an answer outside the pcset more than four times. As it was hard in some of these
cases to determine whether the subject was applying a consistent strategy or simply had "a bad ear",
these cases were left out of the final analysis. For the remaining 62 subjects, only 5.2% of the answers
fell outside the heard pitch classes.
When the tone constellations for the remaining subjects had been calculated, it was found that most
subjects had a tendency to include one or more particular pcs in their tone constellations. In other
words, certain intervallic relationships to the tonal center were often systematically favored by a
subject over others. This can be seen by listing those pcs that appear in, say, at least 10 of the 20
constellations of each subject. The four pcs to rise above this limit most often were pc7 and pc3 (both
in 35% of the 62 cases), pc9 (32%), and pc4 (29%). What is of interest here is the relatively small
overlap between these four groups of listeners. This is illustrated in Figure 1, which combines the
groups preferring pc3 and pc4. The three strategies considered are thus those in which the tone
constellations included (1) pcs [0,3] or [0,4], (2) pcs [0,7], and (3) pcs [0,9]. Together these three
strategies cover 89% of the cases, but the respective groups of subjects appear to be rather distinct.
That is to say, although the subjects often appear to use individually consistent tonal hierarchies,
these hierarchies can differ significantly from each other. What is more, the different strategies were
not found to correlate with differences in age or musical education. These results thus suggest that
there may be much more individual variation in listeners' mental representations of tonality than is
sometimes thought.
Figure 1. The three most favored intervallic strategies were to include in the tone constellation an
interval of major or minor third, fifth, or major ninth above the tonal center (or their complementary
intervals below it). The ellipses represent the groups of subjects who used one of these strategies in at
least half of the 20 tone constellations. The percentages show the relatively small overlap between
users of the different strategies.
These figures should still be regarded with caution, however, because no mention has been made of
the relative frequencies of occurrence for intervals in the pcsets used. The interval vectors shown
above reveal that some interval-classes are more common in the pcsets than others; as a consequence, it is no wonder that certain pcs turn up in the tone constellations more often than others, given that listeners mainly choose tonal centers from the tones in the pcset. For this reason, the relative frequencies of occurrence for intervals in each subject's constellations have to be normalized as if the
starting point had been equal for all intervals. This has been done here by first determining a mean
interval vector, which describes the average interval content of the pcsets used, and then dividing the
frequencies of occurrence for each interval by the appropriate number in the mean interval vector. For
the 20 sets used in the experiment, the mean interval vector is [2, 2, 2.2, 1.8, 1, 1]. The vector reveals,
among other things, that the popularity of pc3 and pc9 in the tone constellations was at least partly
due to their prominence in the tone material itself. If the frequencies of occurrence counted for pc3
and for pc9 are divided by 2.2 (the number for ic3 in the vector) and the similar operation is carried
out for other pcs, the results are seen in a different light. When all results are modified in this way, we
come up with measures for the relative desirability of pcs in tone constellations, which describe the
test subjects' strategies of tonal focusing in a context determined by the 20 pcsets, but normalized as if
the interval vector of the tone material was [111111]. Note that in case of the last component of the
interval vector the results must in addition be divided by 2, since a single ic6 in the pcset results in the
inclusion of pc6 in two (not one) of the five possible constellations built around the five pcs of the set.
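The normalization just described can be sketched as follows (a sketch under our assumptions, not the authors' code; the counts in the example are invented):

```python
# Divide each pc's frequency of occurrence by the corresponding
# component of the mean interval vector; pc6 is additionally halved,
# since a single ic6 in a pcset yields pc6 in two of the five
# constellations built around the set's pcs.
MEAN_IV = [2, 2, 2.2, 1.8, 1, 1]  # mean interval vector of the 20 pcsets

def desirability(counts):
    """counts: dict mapping pc (1..11) -> frequency of inclusion."""
    out = {}
    for pc, n in counts.items():
        ic = min(pc, 12 - pc)          # interval class of pc vs. pc0
        d = n / MEAN_IV[ic - 1]
        if pc == 6:                    # extra division for the tritone
            d /= 2
        out[pc] = d
    return out

print(desirability({3: 11, 9: 11, 7: 10, 6: 4}))
```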
The interval vector can easily be given a probabilistic interpretation (Lewin, 1977). On the assumption
that the subjects choose their answers solely from the five pcs heard, each pc (except pc6) has in the
normalized situation the probability 0.2 of being included in a tone constellation. Now it is also
reasonable to define a limit for statistical significance with respect to how often the subject has chosen
a certain pc in her tone constellations. If we choose 0.01 as the critical value, it turns out that, by the binomial distribution, the listener has to opt for a given pc nine or more times out of 20 (p = 0.00998)
in order for the pc to be a statistically significant inclusion in the constellations. It turns out that for 38
subjects of the 62 under consideration, the desirability of at least one pc rises above this limit.
However, after the normalization process these desirable pcs only include pc7 (for 52% of the
subjects) and pc5 (for 15% of the subjects). The overlap of the respective subject groups is still
relatively small: the two significantly desirable pcs coincided at the same subject in only 3 cases out
of 62.
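The binomial threshold above is easy to verify (a quick check, not part of the original analysis):

```python
# With p = 0.2 per constellation and 20 constellations, how many
# inclusions of a pc are needed for P(X >= k) to fall below 0.01?
from math import comb

def tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(tail(20, 9, 0.2), 5))  # 0.00998: nine inclusions suffice
print(round(tail(20, 8, 0.2), 5))  # 0.03214: eight are not enough
```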
Figure 2 illustrates the relative desirability of the pcs 1-11 as inclusions in the listeners' tone
constellations. For each pc, the broken line represents the factual average proportion of inclusion in
tone constellations (the results before normalization); this corresponds to the situation illustrated
earlier in Figure 1. The continuous line in turn represents the average desirability (the normalized
results). It is easy to see that the broken line somewhat misrepresents the situation as it gives the
impression that pc3 and pc4, for example, would have been especially favored and pc5 the least
wanted companion to the tonal center. However, the relative absence of pc5 in the actual tone
constellations is due to the structure of the pcsets used, all of which included only one possibility for
an interval of ic5 in them. By choosing pc7 in their constellations the listeners automatically cancelled
out pc5, and vice versa. This is also reflected in the relatively large dispersion of the desirability
measures of these two pcs, which only repeats what was said in the preceding paragraph: many
subjects had a strong tendency to choose mostly either pc7 or pc5 in their tone constellations. The
relatively high average desirability of pc5 shows that pc5 was not at all neglected in the process of
tonal focusing as the non-normalized results appear to imply. On the contrary, it was used as much as the
structure of the pcsets allowed it to be. The methodological lesson to be learned is that unless the
intervallic possibilities of the tone material are properly taken into account, the results of studies in the
perception of tonality will fail to reflect the nature of the decision-making process of tonal focusing.
What the listener hears is not always what she seeks to hear. What she strives for is not always what
she gets.
Figure 2. The average proportion of inclusion in tone constellations for pcs 1-11 is given by the
broken line, and the relative desirability by the continuous line with standard deviation error bars.
It is true that the measures of desirability outlined above do not reflect the actual auditory impressions
of the listeners, which are indeed more faithfully conveyed by the results before normalization. What
they do reflect is the listeners' attitudes towards particular intervallic relationships to the tonal center.
While pc3, pc4, pc6, pc8 and pc9 have been readily accepted by the subjects to appear alongside a
tonal center of pc0, there has still not been much attempt to force them into the tone constellations any
more than the tone material naturally gives occasion to. The intervals belonging to ic5, in turn, have
proved their supposed stability by exerting a considerable pull towards themselves in the listeners'
process of tonal focusing. For typical Western listeners, these stable intervals help to determine the
best tonal fit, the best way to make sense out of new and unfamiliar melodies. However, this is only
one side of the coin. Most of the rules for intervallic strategies of tonal hearing in melodies are no doubt functionally oriented, that is, concerned more with the temporal and registral relationships of
the tones. The present study nevertheless indicates a desirable structural condition for these
functionally organized melodies to fulfill: if possible, the tonal center must be part of a stable interval
belonging to ic5. This structural interval will then act as the basis on which particular intervallic
contexts can develop.
On the whole, tonal focusing is a complex phenomenon that may be subject to a substantial amount of
variability among listeners. Individuals may have different criteria for intervallic stability, that is,
different criteria as to which intervallic relationships to the tonal center are to be considered tonally
fitting. Approaching the problem in terms of tone constellations helps to pay attention to these
individual strategies of tonal hearing.
References
Auhagen, W. (1994). Experimentelle Untersuchungen zur auditiven Tonalitätsbestimmung in
Melodien. Teil 1: Text. Kassel: Gustav Bosse Verlag.
Balzano, G. J. & Liesch, B. W. (1982). The role of chroma and scalestep in the recognition of
musical intervals in and out of context. Psychomusicology, 2, 3-31.
Browne, R. (1981). Tonal implications of the diatonic set. In Theory Only, 5(6-7), 3-21.
Butler, D. (1989). Describing the perception of tonality in music: A critique of the tonal
hierarchy theory and a proposal for a theory of intervallic rivalry. Music Perception, 6,
219-242.
Forte, A. (1973). The structure of atonal music. New Haven, CT: Yale University Press.
Hindemith, P. (1937). Unterweisung im Tonsatz. I. Theoretischer Teil. Mainz: B. Schott's
Söhne.
Huron, D. & Royal, M. (1996). What is melodic accent? Converging evidence from musical
practice. Music Perception, 13, 489-516.
Jeffries, T. B. (1974). Relationship of interval frequency count to ratings of melodic intervals.
Journal of Experimental Psychology, 102, 903-905.
Killam, R. N., Lorton, P. V., Jr., & Schubert, E. D. (1975). Interval recognition: Identification of
Proceedings paper
Introduction
perception of accompanied and unaccompanied tone sequences. This has resulted in a dispute about whether
melody and harmony are mutually influencing dimensions or perceptually independent and additive components of
the musical stimuli (Povel & Van Egmond, 1993; Thompson, 1993). Thompson (1993) proposed the notion of a
partly hierarchical connection between key, harmony and melody.
Other studies have concentrated on the perception of specific groups of tones or chord changes in melodies. Povel
and Jansen (1998), for instance, showed that listeners are capable of recognizing arpeggiated chords in a series of
tones. Listeners who judged a number of tone sequences on their musical goodness generally gave higher ratings
to sequences that only contained chord tones, or sequences containing a non-chord tone that is linked (anchored) to
a closely following chord tone. From the study it was concluded that chord recognition and anchoring are important
mechanisms in the perception of melodic sequences.
Platt and Racine (1994) showed that listeners are capable of detecting a chord change occurring in a sequence of
arpeggiated tones from a single triad. Other researchers have shown that listeners perform less well with tasks in
which they have to detect melodic alterations when these alterations conform to the implied harmony than when
they violate it (Trainor & Trehub, 1994; Holleran, Jones & Butler, 1995). In addition, the ability to detect violations
of implied harmony has been reported to develop at a later age than the ability to detect violations against the key
(Trainor & Trehub, 1994), suggesting the involvement of a skill acquired through exposure, rather than being an
inherent characteristic of key or diatonic structure.
From a somewhat different perspective, Schmuckler (1989) investigated melodic and harmonic expectations by
collecting listeners' responses for a number of probe tones and arpeggiated probe chords at several sequential
positions in a musical phrase. The results of the harmonic probe chords were shown to be in accordance with
Piston's "Table of usual root progressions" (Piston, 1973). This table denotes, for the triads on each of the diatonic
scale degrees, which other chords follow a) most often, b) less often, or c) seldom.
In sum, several studies have pointed out the relevance of chords and chord changes for the perception of melodic
sequences. However, these studies have not addressed one possible implication for music perception: If harmony
underlying melody conveys perceptually important information, listeners should demonstrate the (explicit or
implicit) evaluation of chord progressions in processing single tone sequences. If this can be shown, it follows that
listeners are capable of recognizing a number of tones as a chord, then keeping a representation of this chord in
short term memory, while the next few tones are being recognized as another chord, enabling the final step of
evaluation of the chord progression.
Present approach
The present approach focuses on the question whether listeners evaluate the quality of a chord change. The aim of
this study is to investigate the role of implied harmony in the perception of a melodic line by using a paradigm in
which listeners rate the melodic goodness of tone sequences. Two contrasting hypotheses were formulated:
The first hypothesis states that listeners do not mentally represent the relation between chords that are induced when
listening to a tone sequence. This implies that after having heard the sequence only the last chord is perceptually
relevant. This hypothesis is based on the outcome of studies that have shown that music perception is a relatively local process (e.g. Povel & Jansen, 1998; Bigand & Parncutt, 1999). The second hypothesis states that listeners do
represent and evaluate the relation between a succession of implied chords.
These hypotheses were tested in an experiment using a number of tone sequences only containing sequential
intervals larger than a major second which may be conjoined into chords. Four categories of tone sequences were
formed, corresponding to four different harmonic progressions employing the I, IV, and V triads of the major
diatonic scale. Listeners rated the melodic goodness of the sequences.
The predictions of the two contrasting hypotheses are:
I. On the hypothesis that listeners base their goodness rating on the last chord independently of the first chord, it is
predicted that sequences ending on the dominant triad (V) are rated higher than sequences ending on the
subdominant triad (IV).
II. On the hypothesis that listeners base their goodness rating on the relation between the two chords in the
sequence, it is predicted that there is a context effect of the first chord on the final chord, parallel to the usualness of
the chord relation. These predictions are based on the Table of usual root progressions by Piston (1973).
The effect of contour structure is also investigated. It is hypothesized that a simple contour structure increases the
perceptual goodness of tone sequences. Complexity of contour structure is therefore defined in terms of the number
of contour changes within segmented subgroups in the sequences (as explained below). Thus, it is predicted that
simpler contours are rated higher than complex ones.
Experiment
Method
Participants
Twenty-five listeners, students and staff of the Psychology Department of the University of Nijmegen, with various
degrees of musical experience participated in the experiment. Most of the participants practiced or had practiced a
musical instrument (ranging from 3 to 25 yrs), in most cases the violin, closely followed by piano and guitar.
Listeners were from different musical backgrounds, mainly tonal classical and mainstream pop music. Age ranged
between 18 and 38 yrs, with a median of 25.
Stimuli
Thirty-two 6-tone sequences only containing intervals larger than a major second were constructed by crossing 4
harmonic progressions with 8 contours, grouped into 4 contour categories, as described below. The stimuli are
shown in Table 1. The design was a 2-factor within-subjects design: Progression (4 levels) x Contour (4 levels).
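The 4 x 8 crossing can be sketched as follows (labels are ours, purely illustrative):

```python
# Illustrative reconstruction of the stimulus design: 4 harmonic
# progressions crossed with 8 contours (4 contour groups x 2 variants)
# yields the 32 six-tone sequences.
from itertools import product

progressions = ["I-IV", "I-V", "IV-V", "V-IV"]  # V-IV: Piston Category 2
contours = [(group, variant) for group in (1, 2, 3, 4) for variant in "ab"]

stimuli = list(product(progressions, contours))
print(len(stimuli))  # 32
```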
1) Harmonic categories
Each 6-tone sequence consisted of an arpeggiated triad followed by another arpeggiated triad. These triads were the
Tonic (I), the Subdominant (IV), or the Dominant triad (V) within the same key. Pitch height differences between
the pcs present in the tone sequences were kept as small as possible.
The first triad was either a I, IV, or V chord. The choice of the second triad was based on the Table of usual root
progressions by Piston (1973). If the first chord was a I, the second chord was either a IV or a V, producing the
progressions I-IV and I-V, both quite usual (Category 1) progressions according to Piston's table. If the first chord
was a IV the second chord was a V, resulting in the chord progression IV-V, which is also a Category 1 progression.
If the first chord was a V, the second was a IV, producing a V-IV progression, which is less common (Category 2)
according to Piston's table.
2) Contour configuration
To investigate the effect of contour (here defined as the pattern of contour changes), the four harmonic categories
were crossed with 4 contour categories. These 4 groups of two sequences each differed with respect to the number
of directional changes in the entire sequence and on the relative position of the change(s) within the tones of the
first and the second triad. Group 1 contained 2 direction changes in total, but none within each of the two triads (in
fact, both triads were parallel upward or downward). In Group 2 the sequences had 3 changes, the first triad forming
a linear contour motion, and the final triad containing one change. In Group 3 this was reversed: 3 changes in the
entire sequence, the first triad containing a contour change, the final triad following a linear motion. In Group 4, a
total of four contour changes resulted in both triads containing a contour change (note that no linear motion was
present in this category).
Differences in melodic accents (defined here as peaks in the pitch contour) and pattern of intervals were minimized.
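The contour classification can be made concrete with a small sketch (our illustration; the pitch values are invented MIDI note numbers):

```python
# Count direction changes in a 6-tone sequence: the total number of
# changes, plus whether a change falls inside the first triad (between
# intervals 1 and 2) or inside the second triad (between intervals 4 and 5).
def contour_changes(pitches):
    dirs = [1 if b > a else -1 for a, b in zip(pitches, pitches[1:])]
    change_at = [i for i in range(1, len(dirs)) if dirs[i] != dirs[i - 1]]
    within_first = int(1 in change_at)   # change inside tones 1-3
    within_second = int(4 in change_at)  # change inside tones 4-6
    return len(change_at), within_first, within_second

# Group 1 example: two parallel upward triads, 2 changes in total but
# none within either triad.
print(contour_changes([60, 64, 67, 59, 62, 65]))  # (2, 0, 0)

# Group 4 example: a change within each triad, 4 changes in total.
print(contour_changes([60, 64, 59, 67, 62, 65]))  # (4, 1, 1)
```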
3) Timing and articulation
Presentation of the stimuli was manipulated to induce a segmentation in two groups of three tones (see also Jansen
& Povel, 1999), by stressing the first and the fourth tone. Timing and articulation of the tones was based on the
recorded key-press velocities, tone durations and IOIs of a triple meter pattern played by one of the authors, such
that tone IOIs were approximately 500 ms. The parameter values of the tones 1 to 6 are as follows: Velocities
(1-127) were 80, 66, 59, 80, 66, and 59; IOIs (in ms) were 514, 483, 496, 503, and 509; and durations (in ms) of the
tones were 514, 483, 315, 503, 509, and 297.
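The timing parameters above can be assembled into note events (a sketch; the event structure is ours, not the authors'):

```python
# Build onset times for the six tones from the five IOIs, then pair the
# onsets with the durations and velocities given in the text.
velocities = [80, 66, 59, 80, 66, 59]       # MIDI-style velocities, 1-127
iois = [514, 483, 496, 503, 509]            # ms between successive onsets
durations = [514, 483, 315, 503, 509, 297]  # ms

onsets = [0]
for ioi in iois:
    onsets.append(onsets[-1] + ioi)

events = list(zip(onsets, durations, velocities))
print(onsets)  # [0, 514, 997, 1493, 1996, 2505]
```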
Each tone sequence was preceded by a cadence V7-I to induce a major key. In order to strengthen the induction of
triple meter, the chord V7 was played three times and the chord I once using the same timing and articulation of the
first four stimulus tones. The cadence was followed by a 1041 ms silence before the tone sequence was sounded, in
order to comply with the induced meter.
Apparatus
The stimuli were generated by a Yamaha PSR-620 keyboard set to the sound Jazz Organ 1. The volume of the
keyboard's stereo internal speakers was set to a comfortable listening level. Both stimulus presentation and response
collection were controlled by a Macintosh 4400 Power PC computer, running a custom written computer program.
Procedure
Participants were seated at a desk facing the computer screen. The computer screen showed a horizontal array of 7
radio-buttons numbered 1 to 7 (left to right), and two buttons below. The numbered radio-buttons served as a
7-point scale ranging from "bad" (1) to "good" (7) and a response was given by clicking one of these buttons. By
pressing the bottom-left button, labeled "play" or "repeat" (depending on whether a stimulus had already been
listened to for the first time or not), a stimulus was presented. The bottom-right button, labeled "next" was pressed
to proceed to the next trial.
To start a trial the participant clicked the "play"-button to listen to a stimulus. Next, the listener answered the
question "How good is this tone sequence as a melody?" by clicking one of the 7 buttons representing the scale.
Participants could repeat a stimulus by pressing the play button again. The number of repetitions was not restricted,
nor was the time to provide a response. Finally, after having provided a response the participant clicked the
bottom-right button to continue with the next trial.
The experimental trials were preceded by a number of training trials, during and after which the participant was
allowed to ask questions regarding the procedure. The experiment proper followed, in which the test sequences
were presented in a different random order for each participant. The pitch height of the trials was quasi-randomly
varied between 2 semitones below and 3 semitones above C4, with consecutive trials never in the same
transposition. After the experiment, the participant was asked to comment on the response strategy followed or
anything that came to mind.
Results
The mean intersubject correlation was .323 (p<.0001). A MANOVA was performed to examine the effects of the
factors Progression and Contour (4 by 4 Repeated Measures). Both the main effect for Progression (F(3,22)=7.254,
p=.0015) and for Contour were significant (F(3,22)=33.764, p<.0001). A statistical test of the interaction between
Progression and Contour was also significant (F(9,16)=3.186, p=.0209).
The means for the 4 categories of Progression (I-IV, I-V, IV-V, and V-IV), were 4.57, 4.26, 4.53, and 3.84,
respectively (see Figure 1a). After Bonferroni correction for all possible pairwise comparisons, the difference
between I-IV and V-IV (F(1,24)=15.90; p=.0005), and the difference between IV-V and V-IV (F(1,24)=15.31;
p=.0007) were statistically significant. A planned comparison combining groups with the same final triad (I-IV and
V-IV, vs. I-V and IV-V) was not significant (F(1,24)=1.836; p=.1880), while a comparison between Category 1
(I-IV, I-V, and IV-V) and Category 2 (V-IV) progressions of Piston's table was indeed significant (F(1,24)=15.351;
p=.0006).
The means for the 4 Contour categories (groups 1-4) were 5.50, 3.76, 4.14, and 3.82, respectively (see Figure 1b).
Pairwise planned comparisons of the differences between these means showed that Contour group 1 differed
significantly from the other three groups: Group 2 (F(1,24)=84.871; p<.0001), Group 3 (F(1,24)=87.698; p<.0001),
and Group 4 (F(1,24)=71.483; p<.0001), again after Bonferroni correction.
The interaction of the factors (shown in Figure 2) was inspected further, both visually and by means of interaction
contrasts. For each Progression the effect of Contour group is approximately the same. The differences between
Contour group 1 vs. groups 2, 3, and 4 are statistically significant within each progression. Interaction contrasts
showed that for the progression V-IV, also the differences between Contour Groups 3 vs. 2 (F(1,24)=6.919;
p=.0147), and Groups 3 vs. 4 (F(1,24)=5.744; p=.024) are significant.
Discussion
The results show that both implied harmonic progression and contour structure played a role in the perception of the
6-tone sequences used in the experiment. Overall, apart from some large pairwise differences, the effects found are
subtle rather than pronounced. The interaction of progression with contour shows that both factors are engaged in an
interdependent relation. Interpretations of the effects in terms of the hypotheses are given below.
The ratings for the harmonic progressions show that listeners, at least in part, base their goodness judgment of a melodic sequence on its implied harmonic structure. In particular, listeners appear not to base their response solely on the recognition of the final triad in a sequence, as shown by the nonsignificant contrast between categories with the same final chord. Rather, their responses can only be explained by including the influence of the first chord, and thus by assuming the perception of chord changes, as concluded from the significant contrast between Category 1 and Category 2 progressions in terms of Piston's table. Therefore the first hypothesis stated in the introduction is
rejected in favour of the second hypothesis: listeners are capable of holding an implied chord in STM, while
recognizing the next chord, and they evaluate the perceptual quality of the harmonic transition.
The pairwise differences between progressions were unexpected in one respect: the generally accepted fundamental nature of I-V suggests that it would be rated highest of all progressions, yet it was rated lowest of the Category 1 progressions. Furthermore, the difference between I-V and V-IV did not reach significance. A tentative explanation for this finding may be that I-V is too direct a transition, whereas I-IV and IV-V are more subtle, and therefore perceptually preferred.
The results for the contour variable show that the sequences with the simplest contour structure (contour
category 1; parallel linear motion) are rated highest by far. The data show a tendency for a single linear motion in
the final triad to be judged slightly higher than when it appears in the initial triad, although this difference did not
reach statistical significance. Thus, the present results show that listeners are sensitive to contour information and
tend to rate the musical goodness of simpler contours higher than more complex contours.
The deviant effects for contour on the V-IV progression (as indicated by the statistical interaction between progression and contour) may be interpreted as follows. Music perception can be seen as a process aiming at constructing a suitable musical interpretation of the auditory signal, employing a number of perceptual mechanisms to analyse the content of the input for musical features. The strategy followed seems to be solutionist rather than perfectionist: the ultimate aim of the process is to produce an interpretation at all, and any mechanism available to interpret the elements will achieve that aim. In such a framework, mechanisms that fit well are most likely preferred over mechanisms that fit less well. In this light, the larger effects for contour within the lowest-rated progression V-IV can be explained by the supposition that when a harmonic analysis is less appropriate, contour analysis has a more pronounced influence on the responses.
The present results are in accordance with Schmuckler (1989), although his second experiment investigated
harmonic expectation rather than perception. Taking into account the fact that we used only 4 different progressions
instead of all possible chord changes, the absolute difference on a 7-point scale between Category 1 changes
(average of 4.45) and Category 2 changes (3.84; absolute difference of .61) is comparable to the difference found by
Schmuckler (Category 1: 4.92; Category 2: 4.25; difference of .67). In contrast to Schmuckler's study, ours does not explicitly require listeners to evaluate a chord change, as in a probe tone/chord study, but examines whether they
evaluate the progression spontaneously. Nevertheless, the agreement between our results and Schmuckler's suggests
that the perception of harmony in melodies follows the same rules as harmonic expectancy for melodies. It also
stresses the importance of the concept of expectation for harmonic factors in melody perception.
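The figures in the comparison above check out (values taken from the text):

```python
# Category 1 mean across the three usual progressions vs. the single
# Category 2 progression (V-IV), and Schmuckler's corresponding gap.
cat1 = [4.57, 4.26, 4.53]    # I-IV, I-V, IV-V
cat2 = 3.84                  # V-IV

avg1 = round(sum(cat1) / len(cat1), 2)
print(avg1)                  # 4.45
print(round(avg1 - cat2, 2))  # 0.61
print(round(4.92 - 4.25, 2))  # 0.67 (Schmuckler, 1989)
```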
The present study might be criticized for its use of artificial stimuli, as a consequence of which its results would not generalize to the process of perceiving real music. This would be a legitimate criticism if the present study had pretensions to generalize directly to music listening in everyday life. However, one should not confuse the object of a study with its method. Of course, the major goal of our research is to examine the process of music listening as such, but the experimental method is only exploited to its full potential if the rules of conducting experimental research are followed. In this case, that means constructing musical stimuli in which the variables of interest are manipulated categorically. Only such a method allows the systematic investigation of the hypothesized mechanisms of music
perception. Once the mechanism has been firmly established experimentally, predictions concerning the perception
of more realistic musical stimuli can be derived and tested.
In closing, the results of this study show that the construction of a mental representation of a melody is based on a
description of the sequential structure on the one hand, and a representation of the underlying harmony on the other.
This finding is in line with the notion that the perceptual analysis of musical information involves both general
auditory principles, which guide the process by grouping the elements and providing sequential structure
representations, as well as strictly musical principles which analyze harmonic structure in the process of listening to
music.
References
Bharucha, J.J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology,
16, 485-518.
Bigand, E. (1997). Perceiving musical stability: The effect of tonal structure, rhythm, and musical
expertise. Journal of Experimental Psychology: Human Perception and Performance, 23, 808-822.
Bigand, E., & Parncutt, R. (1999). Perceiving musical tension in long chord sequences. Psychological
Research, 62, 237-254.
Cuddy, L.L., Cohen, A.J., & Mewhort, D.J.K. (1981). Perception of structure in short melodic
sequences. Journal of Experimental Psychology: Human Perception and Performance, 7, 869-882.
Holleran, S., Jones, M.R., & Butler, D. (1995). Perceiving implied harmony: the influence of melodic
and harmonic context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21,
737-753.
Jansen, E.L., & Povel, D.J. (1999). Mechanisms in the perception of accented melodic sequences.
Proceedings of the 1999 Conference of the Society for Music Perception and Cognition. Evanston, Ill.,
p. 29.
Krumhansl, C.L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126, 159-179.
Lerdahl, F. (1988). Tonal pitch space. Music Perception, 5, 315-350.
Piston, W. (1973). Harmony (7th ed.). London: Victor Gollancz Ltd.
Platt, J.R., & Racine, R.J. (1994). Detection of implied harmony changes in triadic melodies. Music
Perception, 11, 243-264.
Povel, D.J. (1996). Exploring the elementary harmonic forces in the tonal system. Psychological
Research, 58, 274-283.
Povel, D.J., & Van Egmond, R. (1993). The function of accompanying chords in the recognition of
melodic fragments. Music Perception, 11, 101-115.
Povel, D.J., & Jansen, E.L. (2000). Towards an on-line model of music perception. These proceedings.
Povel, D.J., & Jansen, E.L. (1998). Perceptual mechanisms in music perception. Internal Report NICI.
Schmuckler, M. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music
Perception, 7, 109-149.
Thompson, W.F. (1993). Modeling perceived relationships between melody, harmony, and key.
Perception & Psychophysics, 53, 13-24.
Trainor, L.J., & Trehub, S.E. (1994). Key membership and implied harmony in Western tonal music:
Developmental perspectives. Perception and Psychophysics, 56, 125-132.
Proceedings paper
[Editor's note: This paper contains a number of symbols and characters from a font set that we were unable to print or read. In most cases these are mathematical,
sound pressure level or Hz symbols and the meaning should be clear. Additionally some aspects of the figures, particularly graph axis labels, were unstable. We have
reproduced the paper as well as we are able but advise checking with the original document which is stored as originals\posters2\kwak.doc]
I. Introduction
Most authors today seem to accept the statement by Terhardt (1974a) that "no roughness is produced by pure-tone pairs exceeding the critical band". He pointed out that the V-shaped curves presented by Plomp and Levelt (1965) and by Kameoka and Kuriyagawa (1969a, 1969b) exhibit no singular points corresponding to simple frequency ratios, and concluded that "frequency distance rather than frequency ratio is the decisive parameter of the consonance of pure-tone intervals". However, Terhardt's statement requires correction, for the following five reasons:
1. Past experimental studies on tonal consonance and dissonance were mostly limited to simple intervals;
2. Even for pure-tone pairs exceeding the critical bandwidth, roughness can be produced by aural harmonics (harmonic distortion) in mistuned consonances (1:2, 1:3);
3. The direct interaction model presented in Plomp's (1967) study of the beats of mistuned consonances needs to be corrected, because it does not distinguish between the threshold of pitch perception and that of loudness perception;
4. As an appropriate traveling-wave model for pure-tone pairs evoking roughness, a composite model, a compromise between Plomp's direct interaction model and Clack's aural-harmonics model, must be considered;
5. Aural harmonics are significant parameters in the perception of tonal dissonance.
The aspects of graphs b, c, and d appear slightly different from those of graphs a and e. What graphs b, c, and d have in common is that the monotonic V-shaped curve, which holds while the frequency difference of a pure-tone pair is within the critical bandwidth, breaks down once the frequency difference exceeds the critical bandwidth.
file:///g|/poster2/Kwak.htm (3 of 12) [18/07/2000 00:34:53]
(Effect of Aural Harmonics on Dissonance Perception)
In graph b, the first peak of consonance occurs at a frequency difference of about 53 Hz. At this point the higher tone is about 278 Hz and the lower tone 225 Hz (278/225 ≈ 5/4). Now consider the point at a frequency difference of 210 Hz, where the higher tone is about 376 Hz and the lower tone 166 Hz (376/166 ≈ 9/4). In terms of musical intervals, the former is a major 3rd and the latter a major 9th (or compound major 2nd). On the 7-point scale of the graphs (ordinate), the major 9th at the geometric mean frequency of 250 Hz is more dissonant than the major 3rd, even though it has the larger frequency difference. The same aspect appears in graph d.
In graph d, the first peak of consonance occurs at a frequency difference of about 165 Hz, at which the higher tone is 1086 Hz and the lower tone 921 Hz (1086/921 ≈ 20/17). This tone pair is a slightly narrower interval than a minor 3rd (6/5), and its consonance value in the graph is about 6.5. Now consider the frequency differences 330 Hz and 660 Hz. At 330 Hz the higher and lower tones are 1179 Hz and 849 Hz (1179/849 ≈ 7/5), and at 660 Hz they are 1383 Hz and 723 Hz (1383/723 ≈ 15/8). Approximately, the former is an augmented 4th and the latter a major 7th, with consonance values of about 5.55 and 5.75, respectively. Thus, at the geometric mean frequency of 1000 Hz, the consonance values for the frequency differences 165 Hz, 330 Hz, and 660 Hz are 6.5, 5.55, and 5.75, respectively, which shows that for tone pairs exceeding the critical band there is almost no relation between tonal consonance or dissonance and the frequency difference.
Graph c likewise shows no relationship between consonance or dissonance and the frequency difference beyond a frequency difference of 105 Hz. If more tone pairs exceeding the critical bandwidth had been used in the experiments, this aspect would have appeared even more clearly. In any case, graphs b, c, and d all support the conclusion that the V-shaped curve of Plomp and Levelt is valid only for tone pairs within the critical bandwidth.
① V-curve
As shown in Fig. 1, the degree of dissonance of pure-tone pairs depends decisively on the frequency difference. In particular, the maximal dissonance of two pure tones occurred at a frequency difference of about a quarter of the critical bandwidth. Beyond this point the consonance value increases smoothly and recovers almost completely at a frequency difference corresponding to the critical bandwidth. This aspect is also reported by Kameoka and Kuriyagawa (1969a, 1969b). However, there are some differences between Fig. 1 by Plomp and Levelt (1965) and the graphs by Kameoka and Kuriyagawa (1969a), and these differences are very important for understanding the discrepancy between the two sets of experiments. First, in comparison with the graphs by Plomp and Levelt (Fig. 1), the graphs by Kameoka and Kuriyagawa are more effective for investigating the consonance and dissonance of pure-tone pairs within the critical band. They adopted the frequency deviation rate as a parameter in order to obtain a perfect V-shaped curve and, in addition, applied a logarithmic scale based on simple intervals to their graphs. Thereby, it seems, they could systematically refute the 25% theory of maximal dissonance of Plomp and Levelt (1965). According to Kameoka and Kuriyagawa, the most dissonant frequency difference varies with the frequency range; it also varies with the sound pressure level of the two pure tones.
Above all, however, one of the most interesting differences between the two authors' graphs is found in the range of compound intervals, where the aspects of consonance and dissonance are apparently opposite to each other. In the graph by Kameoka and Kuriyagawa (1969a), the transition of consonance and dissonance is shown over three octaves. According to this graph, the maximal dissonance for f1 = 440 Hz is produced by the interval consisting of f1 = 440 Hz and f2 = 484 Hz, where the frequency deviation rate is 10%. This result agrees well with that of graph e in Fig. 1. However, compare the consonance and dissonance of compound intervals in Kameoka and Kuriyagawa's (1969a) graph with those in Fig. 1: in Kameoka and Kuriyagawa's graph the compound intervals form a V-shaped curve, whereas in Fig. 1 they form a Λ-shaped curve!
② The change of dissonance degree depending on the level difference between two pure tones
Through their extensive experimental study, Kameoka and Kuriyagawa showed that the degree of dissonance can be changed systematically by the level difference between two pure tones. When the level of the lower tone (f1) is L1 and that of the higher tone (f2) is L2, a tone pair with the spectrum form L1>L2 is more dissonant than one with the opposite spectrum form L1<L2. As to why L1>L2 is more dissonant than L1<L2, Kameoka and Kuriyagawa suggested that this asymmetrical property is explained by pure-tone masking effects and neural response patterns, but these cues seem somewhat obscure.
The magnitude of roughness produced between two pure tones f1 and f2 reaches its maximal value when the levels of f1 and f2 are the same (Terhardt, 1974b). In contrast, Kameoka and Kuriyagawa's conclusion is that L1>L2 is more dissonant than L1=L2, which conflicts with the existing experimental results on the relation between amplitude modulation and the magnitude of roughness. However, if one considers that the two pure tones within the critical band are not f1 and f2 but nf1 (an aural harmonic of f1) and f2, their conclusion is correct. The loudness difference between nf1 and f2 is smaller when L1>L2 than when L1=L2. If nf1 and f2 lie within the critical band, the pure tone f2 causes roughness through interference (or intermodulation) with nf1, not with f1. Therefore, consistent with the existing experimental results on the relationship between amplitude modulation and the magnitude of roughness, the roughness between nf1 and f2 is larger when L1>L2 than when L1=L2.
According to critical band theory, roughness as a cue of psychoacoustic dissonance should not be produced by pure-tone intervals exceeding the critical bandwidth. However, as shown in Fig. 1, a systematic aspect of incomplete dissonance persists even for compound intervals exceeding the critical band. This is indirect evidence that even when two pure tones exceed the critical band, roughness can be produced by intermodulation with aural harmonics. Terhardt's (1974a) assumption that "in the case of a pure-tone pair exceeding the critical bandwidth, no roughness is produced" may be right or wrong depending on its premises. If the assumption adequately recognized the systematic existence of aural harmonics, which are physiological products, and allowed that an aural harmonic can itself act as a pure tone, it is not wrong. However, if it overlooked the existence of aural harmonics by focusing only on the maximal dissonance phenomena, it ends up as an incorrect assumption that unfairly excludes the systematic aspect of "incomplete dissonance" found, for example, in pure-tone compound intervals.
Ⅲ. Experiments
1. Purpose
This study investigates the aspects of incomplete dissonance of pure-tone pairs, especially pure-tone compound intervals (CIs) exceeding the critical bandwidth. Because past research has mostly been limited to simple intervals (SIs), an experimental approach to pure-tone CIs is very suggestive. To verify that aural harmonics affect the perception of incomplete tonal dissonance in pure-tone CIs, it is essential to examine the experimental results under both equal- and unequal-loudness conditions for several intervals: augmented 4th, minor 6th, major 7th, minor 9th (compound 2nd), augmented 11th (compound 4th), and minor 13th (compound 6th).
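For concreteness, the equal-tempered frequencies of these six intervals above the 262 Hz lower tone used in the experiments can be computed directly. The semitone counts are standard music theory; the loop and rounding are only an illustrative sketch, not part of the paper's procedure.

```python
f1 = 262.0   # invariable lower tone of the experiments (Hz)

# Equal-tempered sizes of the studied intervals, in semitones above f1
intervals = {"augmented 4th": 6, "minor 6th": 8, "major 7th": 11,
             "minor 9th": 13, "augmented 11th": 18, "minor 13th": 20}

for name, n in sorted(intervals.items(), key=lambda kv: kv[1]):
    f2 = f1 * 2 ** (n / 12)   # equal temperament: one semitone = 2**(1/12)
    print(f"{name:>14}: f2 = {f2:6.1f} Hz, f2 - f1 = {f2 - f1:6.1f} Hz")
```

The three compound intervals all place f2 more than an octave above f1, so all of them exceed the critical bandwidth around their geometric mean frequency.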
2. Method
In experiment 1 (L1=55 dB, L2=55 dB) and experiment 2 (L1=55 dB, L2=40 dB), a total of 3978 intervals were randomly presented to 13 subjects (6 males, 7 females, average age 26.5), who were chosen regardless of degree of musical training. The frequency of the invariable lower tone f1 was 262 Hz, and the variable higher tone f2 took 24 equal-tempered values within two octaves above the 262 Hz lower tone. On a 7-point scale, each subject judged the degree of tonal consonance or dissonance of each interval.
3. Results
① In the condition L1=55 dB, L2=40 dB, the augmented 11th (compound tritone) was perceived as more dissonant than the tritone (F(1,12)=14.74, p<.01).
② In L1=55 dB, L2=40 dB, the minor 13th (compound minor 6th) was perceived as more dissonant than the minor 6th (F(1,12)=8.14, p<.025).
③ The augmented 11th in L1=55 dB, L2=40 dB was perceived as more dissonant than in L1=55 dB, L2=55 dB (F(1,12)=9.90, p<.01).
④ The minor 13th in L1=55 dB, L2=40 dB was perceived as more dissonant than in L1=55 dB, L2=55 dB (F(1,12)=12.28, p<.01).
The above results are interesting in the historical context of the debate over whether aural harmonics exist. Incomplete dissonance in CIs can be discussed in parallel with the problem of beat and roughness sensation in mistuned consonances. Paradoxically, past studies of the beats of mistuned consonances (Tonndorf, 1959; Plomp, 1967) suggest that beats can be produced by pure-tone pairs exceeding the critical band, depending on the frequency ratio between the two pure tones. Because beats are physically closely related to roughness, a full discussion of results ①, ②, ③, and ④ must relate them to the following three subjects: the origin of the beats of mistuned consonances, Clack's experiments on aural harmonics, and the traveling-wave patterns on the basilar membrane produced by pure-tone pairs exceeding the critical band.
Ⅳ. Discussion
Evidence (b): Subjects who could not identify aural harmonics in the aural harmonics audibility test could nevertheless hear the beats of mistuned consonances. For example, in the audibility test for 125 Hz (SL 65 dB), subjects who could not identify the aural harmonics 250, 375, 500 Hz, etc., could hear the beat produced between 125 Hz (SPL 100 dB) and 251 Hz (SPL 90 dB).
Evidence (c): Beats are produced even in the mistuned consonances 5:9 and 4:9. Traditionally, beats of mistuned consonances of the types 1:2, 1:3, 1:4, ... 1:n have been explained by aural harmonics. Beats in 5:9 and 4:9, on the other hand, are difficult to explain by aural harmonics.
On the basis of this evidence, Plomp concluded that the origin of the beats of mistuned consonances is not aural harmonics but a waveform variation (phase interference) caused by direct interaction between the two primary tones (Plomp, 1967, 1976). On the direct interaction theory, it is presumable that roughness could be produced by a phase effect depending on the frequency ratio even when the two primary tones exceed the critical band. This contradicts Plomp and Levelt's (1965) critical band theory and Terhardt's (1974a) assumption about the production of roughness, because the generation of the phase effect and roughness in pure-tone pairs depends on the frequency ratio of the two primary tones. Kameoka and Kuriyagawa's assumption that "for a given level difference, a dyad with a spectrum form (L1>L2) is more dissonant than that with its opposite form (L1<L2)" likewise neglects roughness produced by the phase effect depending on the frequency ratio. Had they considered that the beats of mistuned consonances originate in a two-tone interaction depending on the frequency ratio, their assumption would have been more complicated. In any case, the studies of tonal dissonance sensation by all the authors above overlook the influence of the phase effect, or of roughness produced by aural harmonics, in pure-tone pairs exceeding the critical band.
Apart from all the arguments above, however, one of the purposes of this paper is not to support the direct interaction theory, according to which the phase effect resulting directly from the two primary tones is an
The four graphs in Fig. 2 show the dissonance aspects of SIs and CIs in experiments 1 and 2. First, observe the difference between the dissonance aspects for L1=L2=55 dB (graph a) and for L1=55 dB, L2=40 dB (graph b). Noticeable shifts are found at the major 7th, minor 9th, augmented 11th, and minor 13th. In addition, the augmented 11th and minor 13th are more dissonant than the augmented 4th and minor 6th in both graphs a and b. In particular, the differences between the augmented 4th and 11th, and between the minor 6th and 13th, are significant in graphs b and d.
Evidence that these differences between the augmented 4th and 11th, and between the minor 6th and 13th, can be explained by both the direct interaction and aural harmonics theories is shown in graph c. The
Fig. 3. Illustrations of the vibration patterns along the basilar membrane produced by a pure-tone minor 9th. Plomp's direct interaction model is shown in a, and an aural harmonics model in b. Plomp's direct interaction model contradicts his own critical band model in the case of a more intense lower tone and in the low frequency range below 1000 Hz. Model d is a compromise between the direct interaction and aural harmonics models.
Now we need to consider a problem related to the critical band model presented by Plomp and Levelt (1965). The V-shaped aspect is evident only at relatively low intensities and must be limited to the range of a single critical band. Paradoxically, applying Plomp's critical band model at higher intensities of the lower tone, or to intervals exceeding the critical band, is invalidated by his own direct interaction model. His direct interaction model essentially predicts not that "consonance and dissonance of pure-tone pairs depend on frequency difference rather than frequency ratio" (Terhardt, 1974a), but that consonance and dissonance of pure-tone pairs depend on both the frequency difference and the frequency ratio.
Ⅴ. Conclusion
In this study, the aspects of incomplete dissonance of pure-tone pairs, especially pure-tone compound intervals (CIs) exceeding the critical bandwidth, were investigated. The experimental results provide several pieces of evidence that the V-shaped curve, a basic psychoacoustic concept explaining dissonance mechanisms in pure-tone pairs, must be applied only within a single critical band. Furthermore, given that roughness can be produced even when both primary tones of a pure-tone pair exceed the critical band, this study argues that aural harmonics can be crucial cues in the formation of the dissonance sensation.
References
Allen, J. B. (1989). Is the basilar membrane tuning the same as neural tuning? Cochlear mechanisms, ed. by J. P. Wilson and D. T. Kemp, New York: Plenum Press, 453-460.
Allen, J. B., and Fahey, P. F. (1993). A second cochlear-frequency map that correlates distortion product and neural tuning measurements, J. Acoust. Soc. Am. 94, 809-816.
Bekesy, G. (1963). Hearing theory and complex sounds, J. Acoust. Soc. Am. 35, 588-601.
Bekesy, G. (1963). Three experiments concerned with pitch perception, J. Acoust. Soc. Am. 35, 602-606.
Bekesy, G. (1972). The missing fundamental and periodicity detection in hearing, J. Acoust. Soc. Am. 51, 631-637.
Clack, T. D. (1967a). Aural harmonics: The masking of a 2000 Hz tone by a sufficient 1000 Hz fundamental, J. Acoust. Soc. Am. 42, 751-758.
Clack, T. D. (1967b). Aural harmonics: Preliminary time-intensity relationships using the tone-on-tone masking technique, J. Acoust. Soc. Am. 43, 283-288.
Clack, T. D. (1968). Aural harmonics: Tone-on-tone masking at lower frequencies of a fundamental, J. Acoust. Soc. Am. 44, 384.
Clack, T. D., and Erdreich, J. (1971). Aural harmonics: A possible relation of loudness, J. Acoust. Soc. Am. 51, 113.
Clack, T. D., Erdreich, J., and Knighton, R. W. (1972). Aural harmonics: The monaural phase effects at 1500 Hz, 2000 Hz, and 2500 Hz observed in tone-on-tone masking when f1 = 1000 Hz, J. Acoust. Soc. Am. 52, 536-541.
Clack, T. D. (1975). Some influences of subjective tones in monaural tone-on-tone masking, J. Acoust. Soc. Am. 57, 172-180.
Clack, T. D. (1977). Growth of the second and third aural harmonics of 500 Hz, J. Acoust. Soc. Am. 62, 1060-1061.
Erdreich, J., and Clack, T. D. (1971). Aural harmonics: A comparison of two models for the tone-on-tone paradigm, J. Acoust. Soc. Am. 51, 113.
Fastl, H., Jaroszewski, A., Schorer, E., and Zwicker, E. (1990). Equal loudness contours between 100 and 1000 Hz for 30, 50, and 70 phon, Acustica 70, 197-201.
Giguere, C., Smoorenburg, G. F., and Kunov, H. (1997). The generation of psychoacoustic combination tones in relation to two-tone suppression effects in a computational model, J. Acoust. Soc. Am. 102, 2821-2830.
Kameoka, A., and Kuriyagawa, M. (1969a). Consonance theory part I: Consonance of dyads, J. Acoust. Soc. Am. 45, 1451-1459.
Kim, D. O., Molnar, C. E., and Pfeiffer, R. R. (1973). A system of nonlinear differential equations modeling basilar-membrane motion, J. Acoust. Soc. Am. 54, 1517-1529.
Kwak, S. Y. (1998). A study of combination tone and dissonance sensation, Journal of Music Theory, Vol. 3, Seoul National University, Western Music Research Institute.
Letowski, T. (1975). Difference limen for nonlinear distortion in sine signals and musical sounds, Acustica 34, 106-110.
McAdams, S. (1982). Spectral fusion and the creation of auditory images, Music, mind, and brain, ed. by M. Clynes, New York: Plenum Press, 279-298.
Meddis, R., and O'Mard, L. (1997). A unitary model of pitch perception, J. Acoust. Soc. Am. 102, 1811-1820.
Noorden, L. (1982). Two channel pitch perception, Music, mind, and brain, ed. by M. Clynes, New York: Plenum Press, 251-269.
Plomp, R., and Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth, J. Acoust. Soc. Am. 38, 548-560.
Plomp, R. (1967). Beats of mistuned consonances, J. Acoust. Soc. Am. 42, 462-474.
Rakowski, A., Miskiewicz, A., and Rosciszewska, T. (1998). Roughness of two-tone complexes determined by absolute magnitude estimation, Proceedings of the 5th International Conference on Music Perception and Cognition, Seoul National University, Western Music Research Institute, 95-100.
Schubert, E. D. (1969). On estimating aural harmonics, J. Acoust. Soc. Am. 45, 790-791.
Stevens, S. S., and Davis, H. (1938). Hearing: Its psychology and physiology, New York: American Institute of Physics.
Terhardt, E. (1974a). Pitch, consonance, and harmony, J. Acoust. Soc. Am. 55, 1061-1069.
Terhardt, E. (1974b). On the perception of periodic sound fluctuations (roughness), Acustica 30, 201-213.
Terhardt, E. (1976). Ein psychoakustisch begründetes Konzept der musikalischen Konsonanz [A psychoacoustically based concept of musical consonance], Acustica 36, 121-137.
Tonndorf, J. (1959). Beats in cochlear models, J. Acoust. Soc. Am. 31, 608-619.
Proceedings paper
Tonal organization is an essential process by which a listener perceives a mere pitch string as a coherent melody. The process, which is constrained and guided by the tonal schema the listener has acquired, organizes an input pitch sequence into a system of tonality woven around a "tonal center." For an input pitch sequence to be organized, the tonal center of the input melody must be fixed in the mind, and vice versa. As a result of this process, the listener can perceive tonality (or atonality) in a melody, and can perceive a melody to be in a certain key. Previous studies have shown that key perception is determined by structural characteristics of the pitch content of a set (Abe & Hoshino, 1990; Cuddy, 1997; Krumhansl, 1990). In other words, they argued that the perceptual interpretation of the key of a melody is strongly affected by the pitch content of the set of constituent notes in the melody. Here we would like to point out the possibility that key perception is also affected by the time-dependent characteristics of the pitches' ordering within the melody; that is, we assume that key perception can be affected by the temporal ordering of pitches. Consider, for example, the following two tone sequences, both composed of the same set of five notes [C4, D4, E4, F4, and G4]. The only difference between the two sequences is the temporal ordering of the five notes; Sequence 1: G4-E4-F4-D4-C4, Sequence 2: G4-C4-D4-E4-F4. When listening to the two sequences, listeners might perceive the first as being in C major and the second as being in F major.
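This point can be made concrete: any key-finding account that considers only pitch content and ignores order is blind to the difference between the two sequences. The sketch below is our illustration (the variable names are ours), not part of the experiment.

```python
from collections import Counter

seq1 = ["G4", "E4", "F4", "D4", "C4"]   # Sequence 1 (heard as C major, per the text)
seq2 = ["G4", "C4", "D4", "E4", "F4"]   # Sequence 2 (heard as F major, per the text)

# Strip the octave digit to compare pitch content alone:
content1 = Counter(note[:-1] for note in seq1)
content2 = Counter(note[:-1] for note in seq2)

# Identical pitch content: a purely distributional account must predict the
# same key for both, so only temporal ordering can explain a difference.
print(content1 == content2)   # True
```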
The purpose of this experiment is to investigate whether the perceptual interpretation of the key of an input melody can be influenced by the temporal ordering of pitches, and if so, what kinds of cues influence these key interpretations.
Method
Participants
Thirty-one undergraduate and graduate students of Hokkaido University (average age: 22.8 years; age range: 18-39 years) participated in this experiment. None of the participants was a professional or serious amateur musician, although some had taken music lessons.
Materials
file:///g|/poster2/Matsunag.htm (1 of 7) [18/07/2000 00:34:57]
THE ROLE OF THE TEMPORAL ORDERING OF PITCHES IN TONAL ORGANIZATION
Eighteen tone sequences were prepared as stimulus materials. Every sequence was composed of the same set of six tones (pitch set). The absolute pitches of the six tones were C#4, D4, E4, F#4, G4, and A4. All tones of the pitch set could only be interpreted as scale tones of D major or B minor, in accordance with the traditional music theory of the Western diatonic scale. Only the ordering of the same pitch set was experimentally manipulated; that is, the difference among the 18 sequences was the order of the constituent notes. Some pitch positions of the constituent notes in each sequence were assigned to either a higher or a lower octave from the original pitch, but the pitch range of each sequence was restricted to one octave. In other words, the highest and lowest pitches of each sequence fell within one octave, though not necessarily the same octave (see Table 1). The duration of each tone was 1.0 s, so the total duration of a sequence was 6.0 s.
Task
We adopted the "final-tone extrapolation" task (Abe & Hoshino, 1990; Hoshino & Abe, 1984) as the participants' task in this experiment. The task is based on the assumption that the selected final tones are the tonic or nuclear pitches of a certain key. This assumption derives from the characteristics of tonality, defined as "the predominance of a certain tone, the tonic, over others in a piece of music" (Abe & Hoshino, 1990).
Procedure
Each participant sat in front of a keyboard and was presented with the sequences through a loudspeaker placed 1.0 m away. In each trial, each sequence was presented three times, with a 6.0 s interval between presentations. The participants were told that they would hear a series of tone sequences, and that for each one their task was to select the pitch that would be the best ending of the sequence. During and after the presentation of each sequence, the participants were allowed to play tones on the keyboard to help them make their final-tone selection. They were required to select the final tone from a printed list of the twelve pitches in the octave (C, C#, D, D#, E, F, F#, G, Ab, A, Bb, B). They were then asked to rate their confidence in the selected final tone on a 5-point scale (5 = full confidence, 1 = poor confidence). When the participants felt that more than one pitch could be the best final tone, they were allowed to select as many pitches as they liked, but had to rate their confidence in each selected pitch as the best final tone. The timbre and dynamics of the keyboard sound were the same as those of the sequences, and the tones played by the participants on the keyboard were sounded through the same loudspeaker. After 2 practice trials, each participant performed the 18 experimental trials in a random order.
Results and Discussion
In this experiment, the participants were allowed to select more than one pitch as the final tone when they so desired. In fact, the average number of final-tone responses was 1.06 per sequence. Twenty-five of the 31 participants always selected only one pitch as the final tone throughout the trials. Although the remaining 6 participants sometimes selected more than one pitch, they did not do so for every sequence. The largest number of responses for one sequence was 3, seen in only one participant's response to one sequence.
Key interpretation
The number of final-tone responses in each pitch category was counted. We adopted the coefficient of concentration of selection (CCS; Iwahara, 1964) as a measure of the degree to which the final-tone responses for each sequence concentrate within a few specific pitch categories. The CCS is calculated by an equation (see Iwahara, 1964) in which K is the number of response categories, that is, the number of pitches in the octave (= 12), and N is the total number of final-tone responses, which may vary from sequence to sequence.
Table 1 shows the results of the final-tone responses. The first column lists the 18 stimulus tone sequences, ordered according to their CCS values. The second column gives the temporal ordering of the six tones of each sequence, in absolute pitch notation and in interval notation. The third column gives the number of final-tone responses in each of the 12 pitch categories. The fourth column
As seen in the third column of Table 1, pitch D was the most commonly selected final tone over all the sequences. In particular, pitch D was chosen most often for S01, S02, S03, S04, S05, and S06, all of which also had high CCS values. For S07, S08, and S09, the number of pitch D responses was more or less equivalent to the number of pitch F# responses. The final-tone responses were dispersed over several pitches for the remaining sequences, all of which had relatively low CCS values. None of the participants selected pitch B or pitch D# as the final tone for any sequence. A chi-square analysis of the distribution of the final-tone responses revealed that the distributions differed significantly among the 18 sequences (χ² = 114.58, df = 187, p < .01). These results suggest that the participants' key interpretation varied with each sequence, even though all sequences consisted of the same set of pitches and differed only in their temporal ordering. This implies that key perception depends not only on the pitch content of a set, but also on the temporal ordering of pitches.
For the confidence rating scores (Table 1, fifth column), a one-way analysis of variance indicated that the difference between the confidence ratings for the sequences was not significant (F(17, 510) = 1.02, ns). The mean confidence rating was 3.47. The correlation between the confidence ratings and the CCS values was significant (r = 0.67, p < .005). These results imply that the participants felt a degree of tonality somewhat above the middle of the scale for all the sequences, with only small differences between sequences.
What kind of cues influence key interpretations?
As mentioned above, the results showed that the final-tone response could vary across melodies that consisted of the same pitch set but differed in the temporal ordering of pitches. Nevertheless, the participants selected pitch D most often as the final tone across several sequences (S01 to S06). These results suggest that temporal ordering itself is not the sole cue guiding key interpretation; more specific local characteristics of pitch ordering within a melody may also have an effect. For example, if a set of sequences differ in pitch order but have one or more interval relationships between pitches in common, listeners may identify them as belonging to the same key.
It is known that tonal organization is not derived from absolute pitch information in a melody, but from relative pitch relationships in a melody (Abe, 1987;
Bartlett & Dowling, 1980; Dowling, 1986). Based on this view, we will focus on interval relationships in melodies, and examine whether the existence of
specific intervals in melodies might lead listeners to make specific key interpretations.
In this paper, we denote intervals of the tone sequences by positive integers for ascending intervals and by negative integers for descending intervals (one unit
= a half-step interval). For example, the ascending minor second is denoted as (+1), the descending perfect fourth as (-5), and so on. As seen in Table 1, for
example, S01 is denoted in absolute pitch notation: D4 - F#4 - A4 - G4 - E4 - C#4, and in interval notation: (+4) - (+3) - (-2) - (-3) - (-3).
Twelve intervals were used in the 18 tone sequences for this experiment (+1, -1, +2, -2, +3, -3, +4, -4, +5, -5, +6, and -6). No other intervals were included.
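The paper's signed-semitone notation can be produced mechanically from the note names. The helper below is our sketch (the function names are ours); it reproduces, for example, the Table 1 intervals for S01 quoted above.

```python
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def to_semitone(note):
    """'F#4' -> absolute semitone number (C4 = 60, MIDI convention)."""
    name, octave = note[:-1], int(note[-1])
    return 12 * (octave + 1) + PITCH_CLASS[name]

def interval_notation(sequence):
    """Signed half-step intervals: ascending positive, descending negative."""
    semis = [to_semitone(n) for n in sequence]
    return [b - a for a, b in zip(semis, semis[1:])]

s01 = ["D4", "F#4", "A4", "G4", "E4", "C#4"]
print(interval_notation(s01))   # [4, 3, -2, -3, -3], i.e. (+4)-(+3)-(-2)-(-3)-(-3)
```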
We examined the frequency of occurrence of similar interval relations in sequences for which the participants' responses were also similar (see Table 2).
First, we examined the 6 sequences (S01, S02, S03, S04, S05, and S06) for which pitch D was the most common response. There were no intervals common to all 6 sequences. However, intervals (-5) and (+3) occurred in 5 of the 6 sequences, and intervals (+4), (-3), and (-2) occurred in 4 of the 6. Intervals (±6) and (±1), on the other hand, did not occur in any of these 6 sequences.
We then examined the frequency of each interval relationship across all 18 sequences. Interval (-5) occurred in 11 of the 18 sequences, including those for which the responses were not pitch D (those with low CCS values). Therefore, it is unlikely that interval (-5) is a strong cue influencing key interpretation in these sequences. Similarly, intervals (+3), (-3), and (-2) are unlikely candidates, as they were found in many sequences with low CCS values.
The remaining interval, (+4), was found in 5 of the 18 sequences. Four of these (S01, S04, S05, and S06) had the highest CCS values for pitch D. In the remaining sequence (S11), the number of pitch D responses was more or less equivalent to those of pitches E and A, and accordingly it had a low CCS value. In the former 4 sequences (S01, S04, S05, and S06), the interval (+4) consisted of pitch D4 and pitch F#4 (D4-F#4); in S11, on the other hand, it consisted of pitch A4 and pitch C#4 (A4-C#4). Thus, this major-third interval relationship could have led listeners to recognize the lower (and first) pitch of the interval as the tonic or key center of these sequences.
Interval (+4), i.e., the major third, is considered an important interval for perceiving the tonality of a melody (Krumhansl, 1979, 1990). The results of this experiment suggest that the presence of interval (+4) in a melody strongly inclined listeners to identify the lower pitch of the interval as the tonic of the key. Thus interval (+4) seems to be one of the strong cues guiding key interpretation.
Were there, then, any common intervals in the sequences with low CCS values? There were no common interval relationships that occurred in the 9
sequences with the lowest CCS values (see Table 2). In 5 out of these 9 sequences, intervals (-5) and (-2) occurred and in 4 out of these 9 sequences, intervals
(+5), (-4), (-3), and (-1) occurred.
There were four intervals that were found in those 9 sequences but did not occur in the above-mentioned 6 sequences with the highest CCS values: (±6)
and (±1). Interval (+6) occurred in only one sequence (S13). Similarly, interval (-6) only occurred in S11 and S18. Interval (+1) occurred in S08, S10, S16,
and S17. Interval (-1) occurred in S09, S11, S14, S15, and S18. These results suggest that intervals (±6) and (±1) can be negative cues for guiding key
interpretation. Butler and Brown (1984, 1994) argued, based on Browne (1981), that these intervals are heard less frequently than other intervals in Western
music, and that precisely because of their rarity they function as strong cues for guiding key interpretation.
In this experiment, the sequence materials did not contain intervals (±7), (±8), and so on. Moreover, a uniform occurrence of each interval was not
manipulated in the material sequences. Further research including a wider range of intervals is required to obtain a more detailed picture of the influence of
interval relationships on key interpretation.
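The interval-counting analysis described above can be illustrated with a minimal sketch. The sequences below are invented MIDI note numbers, not the study's stimuli, and the helper name is ours; a signed interval of +4 denotes an ascending major third.

```python
# Tallying signed interval occurrences across tone sequences, in the spirit
# of the analysis above. Pitches are MIDI note numbers (invented examples);
# an interval of +4 semitones is an ascending major third.

def interval_counts(sequences):
    """Map each signed interval (in semitones) to the number of
    sequences in which it occurs at least once."""
    counts = {}
    for seq in sequences:
        intervals = {b - a for a, b in zip(seq, seq[1:])}
        for iv in intervals:
            counts[iv] = counts.get(iv, 0) + 1
    return counts

sequences = [
    [62, 66, 64, 59],   # contains +4 (D4 up to F#4), -2, and -5
    [69, 73, 71, 69],   # contains +4 (a major third above A4) and -2
]
counts = interval_counts(sequences)
```

Counting each interval once per sequence (rather than per occurrence) matches the "in n out of 18 sequences" tallies reported in the text.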
References
Abe, J. (1987). Senritsu wa ikani shori sareruka [How is a melody processed?]. In G. Hatano (Ed.), Ongaku to ninchi (pp. 41-68). Tokyo: Tokyo
University Press.
Abe, J., & Hoshino, E. (1990). Schema driven properties in melody cognition: Experiments on final-tone extrapolation by music experts.
Psychomusicology, 9, 161-172.
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance effect in developmental perspective. Journal of
Experimental Psychology: Human Perception & Performance, 6, 501-515.
Back to index
Proceedings abstract
Mark G. Orr
morr@uic.edu
Background:
Aims:
This study investigated whether musical training would facilitate memory when
encoding is implicit.
Method:
Results:
Conclusions:
Even when the encoding is implicit, musical training affords better encoding of
relevant stimuli as evidenced by the experts' superior recognition. This has
implications for chunking as a mechanism of expert performance in general.
Back to index
Proceedings paper
Ever since that epochal discovery made some 25 centuries ago by a man from Samos, Pythagoras, musical
sound has been an interdisciplinary study in that its essence has embraced music, aesthetics, physics,
psychology, physiology, neurology and architecture. A proliferation of books and treatises on the physics
of music has provided humanity a substantive and objective base for musical sounds. These writings have
covered a wide range of topics such as: vibrating columns of air and accompanying idiosyncrasies of the
wind instruments; vibrating strings and the factors that govern resonance/tone quality; profuse
explications of the human auditory system, both subjectively and objectively; vibrating characteristics of
bars, plates, and membranes; from pure tones to electric synthesizers; and, from the phenomena of the
behavior of sound waves to the characteristics of perceived acoustics.
Introduction
From the many facets of psychoacoustics mentioned above, this study will investigate one aspect of
human auditory musical functioning that is most germane to musicality, viz., discernment of
pitch differences. One component of this desired musical attribute, difference limen (DL) acuity, is
defined as 'an amount by which a stimulus must change in the appropriate physical property in order for an
observer to detect a difference in sensation a certain criterion percentage of the time.' (Radocy & Boyle,
1997, p. 76) Perception of discrete differences in the frequencies of two separately generated tones is
one of the important characteristics of musical behavior. Many forms of musical behavior that are
fundamental to successful performance or perception require the individual musician to possess a
relatively high level of pitch discrimination ability (Sergeant and Boyle, 1980).
This investigation was concerned with whether pitch acuity abilities (discernment of frequency
differences--difference limens), though not actually a requirement of pianists' performances, are a concomitant of
their development, as they must be for saxophonists, trombonists, violinists, and vocalists. In comparison
to a group of vocalists, each of whom has to make physical adjustments as a prime requisite of their
performances--and to violinists, trombonists, and saxophonists, to successively lesser degrees--
do pianists learn this musical attribute simply through maturational experiences?
Why such an investigation? Two reasons are immediately apparent. First, we need to know if the development
of pitch discernment ability must be separately emphasized in the pedagogical processes. Second, a
relatively high pitch discrimination ability, and its development, exemplifies many forms of musical
behavior that are critical to successful cognition or performance of western music. This ability to
differentiate between tones of different frequencies continues to be acknowledged as prime in that most
researchers have included a form of measure of pitch judgment among their tests and various facets of
research.
Related Literature
As with numerous research reports (Bruner, 1984, and Sloboda, 1985), this study is involved, as in most
naturally occurring instances, with the subject's perceived pitch of the fundamental frequency of each of
the stimuli. Whether pure tones or complex tones, it is 'normal hearing' for the human's basilar membrane
and the higher nerve centers, responding in concert, to identify the fundamental pitch of each stimulus
(Bekesy, 1960). Further, '. . . training has a great effect on what is perceived. . . . the experience of
every teacher of musicianship--has demonstrated a marked difference between the responses of trained
musicians and those of other listeners.' (Bruner, 1984, p. 39) Discerning differences in fundamental
pitches, and the sharpening of this ability then, was the salient motivation of this study.
Two-frequency dyads can be discerned as a musical stimulus, or the listener can hear two tones with pitches
corresponding to the individual frequencies of the dyad if the difference between the frequencies is not
too small. (Smoorenburg, 1970)
Roederer (1975) says this is due to the ability of the cochlea to extricate the frequency components from a
complex vibration pattern. A single vibration pattern at the oval window gives rise to two resonance
regions of the basilar membrane. If the frequency difference between the two component tones is large
enough, the corresponding resonance regions are sufficiently separated from each other. Each one
oscillates with a frequency corresponding to the component tone. If the frequency difference is smaller
than a certain amount (the DL), the resonance regions overlap and only one tone of intermediate pitch with
modulated or 'beating' loudness is heard. It is to be remembered that the difference limen (DL) of a
hearing sensation is the difference in frequencies which will give rise to a perception of two different
pitches in one half of the total number of trials. In this connection the pitch sub-test of the well-known
Seashore battery includes intervals down to six cents.
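The 50%-criterion definition of the DL given above can be sketched in code. The response proportions below are invented for illustration, not data from any study, and linear interpolation is only one conventional way to locate the crossing point.

```python
# Estimating a difference limen (DL): the frequency difference at which a
# listener reports two distinct pitches on 50% of trials. Illustrative
# sketch only; the response proportions below are invented.

def estimate_dl(cents, p_two_pitches, criterion=0.5):
    """Linearly interpolate the cents separation at which the proportion
    of 'two pitches' responses first reaches the criterion (default 50%)."""
    pairs = list(zip(cents, p_two_pitches))
    for (c0, p0), (c1, p1) in zip(pairs, pairs[1:]):
        if p0 < criterion <= p1:
            # linear interpolation between the bracketing points
            return c0 + (criterion - p0) * (c1 - c0) / (p1 - p0)
    return None  # criterion never crossed within the tested range

cents = [10, 20, 30, 40, 50]          # separation of the dyad, in cents
p_two = [0.0, 0.2, 0.4, 0.8, 1.0]     # proportion of 'two pitches' answers

dl = estimate_dl(cents, p_two)        # falls between 30 and 40 cents
```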
Leipp (1977) reported that 50 per cent of the students in the Conservatoire de Paris were able to
discriminate intervals of four cents and Rakowski (1977) observed some students at the Academy of Music in
Warsaw who could discriminate intervals with two cents differential. Meyer (1978) reported similar results
but cautioned that each musician's discrimination range varied according to timbres. In the foregoing,
discernment of frequency differences (of the fundamentals) was better than in studies such as the present
one due to the fact that musical sounds present a timbre identification element not present when the
stimuli are audio generated.
Encompassing the frequency ranges utilized in orchestral music, when a number of pitch judgments are
averaged, the smallest DL is generally 30 cents for most trained musicians. According to several authors
(Thurlow and Bernstein, 1957, & Plomp, 1964), the auditory separation of two simultaneously sounding
tones in most musical frequency ranges may be accomplished only when the interval between the
simultaneously sounding frequencies is not smaller than a semitone. Lundin (1985) reports that the average
person has a DL of plus or minus three cycles when the reference frequency is 435 Hz. There are many
reports which emphasize that individuals differ in their ability to discriminate differences in
frequencies--that these limits vary considerably from individual to individual, dependent on the occasion
and the frequency range (Roederer, 1975, and Radocy & Boyle, 1997). In a previous study of this nature,
Parker (1983) found no significant differences in the competencies of a group of pianists and a group of
trombonists to discern difference limens.
In the early years of a person's musical training, dependent on the musical medium, accuracy of pitch
discernment competency attainment is either paramount, or not. Because saxophonists, trombonists,
violinists and vocalists must ultimately pay constant attention to pitch and attend to very small degrees
of variances in pitch, it would seem that their acuity for frequency discernment within their instruments'
(the human voice is considered an instrument) range will be more discriminative than the acuity of
pianists, who are not required to pay constant attention to pitch. Pianists obviously are concerned, in
performance on a given piano, indirectly with the pitch variances as 'built-in' to the equal-tempered
scale. Consequently, although the other three musical elements (loudness, timbre, and time) are of
constant musical concern to all musicians, this study will investigate only the music element of pitch and
its discernment competency-level in respective groups of pianists, saxophonists, trombonists, violinists,
and vocalists.
Hypotheses
Ho -- There is no difference in the reports of difference limens (DL's) of respective groups of pianists,
saxophonists, trombonists, violinists, and vocalists.
H1 -- There are differences in the reports of difference limens (DL's) of respective groups of pianists,
saxophonists, trombonists, violinists, and vocalists.
Experimental Design
Subjects
The 75 subjects were university students--15 each in groups of pianists, saxophonists, trombonists,
violinists, and vocalists, all with ontologically normal hearing. The criteria for the pianists were that
each pianist did not play any other instrument in an instrumental performing organization and that they be
registered for applied piano lessons. The criterion for each of the subjects in the other groups was that
they be registered for applied music lessons on their designated instruments.
The Johnson Intonation Trainer was used exclusively to provide the frequency signals for the experiment.
This is a tunable keyboard with two tunable sections, each ranging from C3 (130.81 Hz) to C6 (1046.48 Hz).
Each note on the keyboard is capable of being tuned up or down approximately a major third from its usual
equal-tempered scale pitch. A Stroboconn Model 6T5 was used to set and check each frequency chosen.
Amplification of the signals was provided by a Technics SU-V7 Amplifier and a pair of JBL 4301B Studio
Monitors. Loudness preferences were set by each subject.
Procedure
The timbre on the Johnson Intonation Trainer chosen for this study was the 'flute' setting. Because of the
limited range of this instrument, the fact that tuning any note tunes the octaves as well, and in order to
get the maximum number of pairs of notes, it was necessary to use only the notes A, C, and E as the basic
pitches. This provided for A's at 220 Hz, 440 Hz, and 880 Hz; C's at 130.82 Hz, 261.26 Hz, 659.24 Hz, and
1046.48 Hz; and E's at 164.8 Hz and 329.62 Hz. A total of 60 dyads were available. Each stimulus was a
combination of the chosen fixed frequency and a higher frequency designated randomly in the dyads, ranging
from the fixed frequency to a frequency 100 cents higher, in increments of 10 cents. (See appendix B) It
should be noted here that the variances used in constructing the dyads were listed in random order so
that the subjects could not predict the next stimulus. Also, the two
frequencies making up each dyad stimulus were sounded simultaneously.
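The cents-based construction of the dyads follows the standard relation f2 = f1 * 2^(cents/1200). A minimal sketch; the fixed frequency A4 = 440 Hz and the 10-cent steps come from the text, while the helper name is ours.

```python
# Converting cents deviations (as in Appendix B) into dyad frequencies,
# using the standard relation f2 = f1 * 2**(cents / 1200).

def detune(f_hz, cents):
    """Frequency obtained by raising f_hz by the given number of cents."""
    return f_hz * 2 ** (cents / 1200)

fixed = 440.0  # A4, one of the study's fixed frequencies
# One dyad per deviation: 10, 20, ..., 100 cents above the fixed tone.
dyads = [(fixed, detune(fixed, c)) for c in range(10, 101, 10)]

upper = detune(440.0, 100)   # a 100-cent (semitone) offset, about 466.16 Hz
```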
Prior to the presentation of the 60 stimuli, the experiment was explained to each subject individually.
This included a discussion of frequencies, two instruments playing together, tuning and the production of
beats. This was illustrated aurally by the use of the Johnson Intonation Trainer and visually by the
Stroboconn. When each subject indicated that each facet was understood, the subject was asked to listen to
a perfect unison using two notes on the keyboard. One of these notes was then detuned gradually sharp until
the subject indicated that a second pitch was audible rather than a unison with beats. The
Stroboconn was used to check how far sharp the upper note had been tuned. Next, two notes were tuned 100
cents apart; i.e., a half step. These two tones were played together and the upper one detuned flat until
the subject indicated that a unison with beats was heard rather than two distinct pitches. Again the
Stroboconn was used to measure this difference.
When the subject had indicated that the procedure was understood, the 60 stimuli were performed by the
administrator at the keyboard of the Johnson Intonation Trainer. The duration of each stimulus was
normally two seconds with an equal amount of silence between. The subject was asked to write '1' or '2' in
the appropriate blank indicating whether the stimulus was heard as one pitch or two pitches. The time was
lengthened if the subject needed more time on a stimulus. Repeats were permitted since the time
factor was not crucial.
Processing of the raw data was done with two procedures to both obtain and check x2 (chi-square) values.
First, the formula
x2 = Σ (O - E)2 / E
was utilized, computing on the basis of the experiment being a one-sample test.
x2 was found to be 2.21. The tabled value of 3.84 > 2.21 (P > .05) indicated failure to reject Ho.
Second, the chi-square formula for two independent samples was utilized (the
pianists being one sample and all other subjects being the other sample). x2 was found to be 1.04. The
tabled value of 3.84 > 1.04 (P > .05) indicated failure to reject Ho. (Madsen and Moore, 1997)
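The one-sample chi-square computation described above can be sketched as follows. The observed and expected counts here are hypothetical, since the paper does not reproduce its raw response tallies; 3.84 is the tabled critical value for df = 1 at p = .05.

```python
# One-sample chi-square statistic: chi2 = sum((O - E)**2 / E) over
# categories, compared against the critical value 3.84 (df = 1, p = .05).
# Observed and expected counts below are hypothetical illustrations.

def chi_square(observed, expected):
    """Pearson chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [160, 140]   # hypothetical response counts in two categories
expected = [150, 150]   # equal split under the null hypothesis

chi2 = chi_square(observed, expected)
reject_h0 = chi2 > 3.84   # False: fail to reject, as in the study
```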
Though the statistical treatment of the data failed to indicate differences in the groups' difference
limens, the raw data do reveal some cognitive differences in the groups, albeit without significance at
the .05 confidence level. They are (groups in the order pianists, saxophonists, trombonists, violinists, vocalists):
X (mean)  28 38 34 29 31
R (range) 17 22 18 17 18
No significant differences were found, statistically, in the difference limens of groups of pianists,
saxophonists, trombonists, violinists, and vocalists respectively. The comparison of other data, as shown
above, does seem to indicate that pianists and violinists, as groups, perceive two tones in each stimulus
dyad at a smaller DL than do the saxophonists, trombonists, and vocalists, as groups. A saxophonist made
the most errors in pitch discernment and a pianist scored the most accuracies (i.e., a DL); taken as groups,
the violinists and saxophonists were the most accurate, the trombonists rather 'in the middle,' and the
vocalists and pianists, as groups, scored lowest in accuracies of responses. In comparative
psychoacoustical terminology, in cents:
DL* 39 28 31 22 34
*Difference Limens
Due to the nature of their musical mediums, saxophonists, trombonists, violinists, and vocalists have to
give constant attention to adjustment for pitch accuracies, whereas pianists do not have this concern. It
would seem that there would be significant differences in the groups' frequency discernment abilities.
Since there is not (at least in this investigation), at least three questions are elicited. One, is it
that the pianists, through listening as an adjunct to their acquisition of psychomotor skills, learn pitch
acuities indirectly? Two, is it because they, probably as a group, have begun their music studies at an
earlier age than is customary for the other groups--somewhat validating the theories regarding critical
stages of learning? Three, is it likely that pianists, practicing and performing simultaneously sounding
tones (whereas the other four groups' subjects perform tones sequentially), are learning from the
contextual musical occurrences?
Recommendations for future research would include a similar type of study with the stimuli being complex
tones produced by actual orchestral instruments. Heterogeneous and homogeneous mixtures of timbres would
be an added dimension. This could be a pseudo replication of a study reported by Geringer (1989), in which
he reported '. . . non-majors tended to have . . . more correct discriminations to quality than to
frequency stimuli.' (p. 35) Another investigation should be made as to the importance of the maturational
influence by having groupings according to sex, age, intelligence, or scores' classifications obtained from
the subjects taking a standardized music test.
Finally, this author does not wish to leave the impression that music education has not been effective.
Lehman (1985) has said that 'the level of music teaching in the schools has never been higher . . .
(college/conservatory) freshmen play better and know more about music than the graduate students did 20
years ago . . . due simply to the magnificently successful efforts of the (educational psychologists) and
music educators.' (p. 12)
References
Bekesy, G. (E. G. Weaver, Ed. and Trans.) (1960). Auditory thresholds. Experiments in hearing. New York:
McGraw-Hill Book Company.
Bruner, C. L. (1984). The perception of contemporary pitch structures. Music Perception, 2(1), 25-40.
Frances, R. (1988). Psychological origins and development of the sense of tonality. In The perception of
music (W. J. Dowling, Trans.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Geringer, J., & Madsen, C. K. (1989). Pitch and tone quality discrimination. Canadian Music Educator,
Research Edition (Special Supplement), 29-38.
Hall, E. E. (1980). Musical acoustics: An introduction. Belmont, CA: Wadsworth Publishing Company.
Hodges, D. A. (1996). Neuromusical research: A review of the literature. In D. A. Hodges (Ed.), Handbook
of music psychology (2nd ed., pp. 197-284). San Antonio, TX: IMR Press.
Lehman, P. R. (1985). What's right with music education. Georgia Music News, 45(3), 10-12.
Madsen, C. K., & Moore, R. A. (1997). Experimental research in music: Workbook in design and statistical
tests (2nd ed.). Raleigh, NC: Contemporary Press.
Meyer, J. (1978) The dependence of pitch on harmonic sound spectra. Psychology of Music, 6(1), 3-12.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the Acoustical
Society of America, 54, 610.
Rakowski, A. (1977). Memory for absolute and relative pitch. Paris: Symp. Psychoacoustique Musicale.
Roederer, J. G. (1975). Introduction to the physics and psychophysics of music. New York:
Springer-Verlag.
Serafine, M. L. (1983). Cognitive process in music: Discoveries vs. definitions. Bulletin of the Council
for Research in Music Education, 73, 1-14.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability (2nd ed.). New York: Methuen &
Co.
Sloboda, J. A. (1985). Categorical perception of frequency. The musical mind. Oxford: Clarendon Press.
Smoorenburg, G. F. (1970). Pitch perception of two-frequency stimuli. Journal of the Acoustical Society of
America, 48, 924.
Thurlow, W. R., & Bernstein, S. (1957). Simultaneous two-tone pitch discrimination. Journal of the
Acoustical Society of America, 29, 515-519.
Appendix A
Practice exercises: 'You will hear two notes played together as a perfect unison. Then one note will be
raised in frequency, gradually. When you hear two distinct pitches (that is, no longer just 'beats'),
raise your hand.' (The difference is then read and recorded.)
Practice exercise 1: ____________ Practice exercise 2: ______________
Now you are ready to take the test. You will hear, at spaced intervals of four seconds (two seconds of
sound followed by two seconds of silence) a sound to which you are to respond by writing 1 if you hear one
pitch, and 2 if you hear two pitches simultaneously. These occur relatively fast, so mark each answer
quickly and be ready for the next stimulus. Mark your answers in the columns indicated below. At the end
of each set, there will be a four-second interval. You should, during that time, get ready to start the
next column. The test administrator will call out the beginning of each set.
1. ___ 11. ___ 21. ___ 31. ___ 41. ___ 51. ___
2. ___ 12. ___ 22. ___ 32. ___ 42. ___ 52. ___
3. ___ 13. ___ 23. ___ 33. ___ 43. ___ 53. ___
4. ___ 14. ___ 24. ___ 34. ___ 44. ___ 54. ___
5. ___ 15. ___ 25. ___ 35. ___ 45. ___ 55. ___
6. ___ 16. ___ 26. ___ 36. ___ 46. ___ 56. ___
7. ___ 17. ___ 27. ___ 37. ___ 47. ___ 57. ___
8. ___ 18. ___ 28. ___ 38. ___ 48. ___ 58. ___
9. ___ 19. ___ 29. ___ 39. ___ 49. ___ 59. ___
10. ___ 20. ___ 30. ___ 40. ___ 50. ___ 60. ___
Appendix B
F R E Q U E N C I E S
(Deviations in cents)
X E3/A3 C5 E5 C6 / C3 E4 /A4 C4
1. 40 100 80 10 50 70
2. 90 50 50 80 10 40
3. 100 10 70 70 100 60
4. 50 60 40 20 20 30
5. 70 70 60 90 80 50
6. 20 20 30 60 30 90
7. 80 80 90 30 70 20
8. 10 40 20 100 40 100
9. 30 90 100 50 90 80
10. 60 30 10 40 60 10
Back to index
Proceedings abstract
Rebecca A. Pittenger
Rebecca.A.Pittenger@Dartmouth.EDU
Background:
Listeners expect unstable tones to be followed by stable tones that are close
in pitch. This is familiarly known as 'resolution'. According to our model, an
unstable tone, being unexpected, attracts frequency selective attention to
itself. A stable tone within the attention band (roughly a minor third)
captures the focus of attention, leading to a directional expectation toward
the stable anchor. Mistunings of unstable tones should thus generally be
preferred if they are in the direction of the nearest anchor (e.g., a leading
tone is preferred if mistuned sharp rather than flat). Paradoxically, because
of the band of attention around the anchor, mistunings should sound more
pronounced (more distant or dissimilar from the original) if they are in the
direction of the anchor (e.g., a mistuned leading tone should sound less
similar to a well tuned leading tone if mistuned sharp than flat).
Aims:
The aim of this study was to compare the perception of anchored and nonanchored
mistunings of nonchord tones by eliciting preference and similarity judgments.
Method:
Results:
Conclusions:
Back to index
Proceedings paper
A longitudinal study of the acquisition process of absolute pitch: an effect of subject's age on the process.
Ayako Sakakibara
JSPS Fellowships for Japanese Junior Scientists
3-19-2 Nagasaki
Toshima-ku Tokyo 171-0051 Japan
HQM01603@nifty.ne.jp
Introduction
Absolute pitch (AP) is the ability to identify or produce a musical pitch without the use of external reference tones. In this study,
some aspects of the acquisition processes of AP are investigated.
It has been suggested that everyone initially has the potential to acquire AP. However, almost no one who exceeds a certain age
can acquire AP. This "early-learning theory" states that only musical experiences during a limited early period are
effective in developing AP. According to this theory, we can expect that the ability to develop AP decreases as a person
grows older. But little attention has been given to the question of what kinds of changes reduce the possibility of acquiring AP as
one gets older.
I suppose that the acquisition processes of AP differ according to the age of the subjects. The purpose of this study is to examine the
effect of a subject's age on the acquisition processes of AP.
I trained 6 young children (non-AP possessors) to develop AP by the chord identification training method, and investigated their
acquisition processes of AP longitudinally.
In this training method, children generally start the training at 3 or 4 years old. This age group can be called the "general age". In
this study, I trained subjects who started the training when they were younger or older than general age. Subjects of this study were
3 younger subjects starting the training at 2 years old and 3 older subjects starting at 5 or 6 years old.
I would like to clarify the different characteristics of the acquisition processes of AP between the subjects of differing ages. This
attempt would offer a key to understanding the changes which decrease the possibility to acquire AP as one grows older.
Method
(1) AP training method.
This study used the chord identification training method, which is the most successful method to acquire AP (Eguchi, 1991). The
method consists of tasks for identifying some chords.
Training by the use of chords is considered well suited to acquiring AP. According to the notion that the attributes of tones have two
components, "tone height" and "tone chroma", the acquisition of AP can be regarded as developing the reference frame of "tone
chroma". The use of chords can prompt attention to "tone chroma". In the case of identifying single tones, subjects would tend to
pay attention to "tone height". In contrast, the use of chords makes it possible to construct stimuli whose "tone heights" are similar but whose "tone
chromas" are quite different. Identifying among chords can therefore lead subjects to identify chords depending on their "chroma".
Subjects did chord identification tasks every day. One session consisted of twenty to thirty trials (it took about three minutes).
Subjects had to do four or five sessions per day, thus totaling about one hundred and twenty trials per day. If subjects could identify
the chords, the number of chords was increased. When nine kinds of chords are identified perfectly, AP for every
white-key note has been acquired (Oura et al., 1981).
The training generally took about one year for the acquisition of white-key notes. The analysis of this paper deals with AP of
white-key notes (the identification of nine kinds of chords) only. The nine chords were, CEG, CFA, HDG, ACF, DGH, EGC, FAC,
HCD and GCE.
(2) Subjects.
In the chord identification training method, children generally start the training at 3 or 4 years old. This can be called the
"general case" (Sakakibara,1998,1999). The subjects of this study were children who started the training at 2 years old (younger
In the case of the younger case group, errors depending on "chroma" already appeared in Stage 1. Stage 2 of the younger case
group was the period when the percentage of errors depending on "chroma" was high.
In contrast, Stage 2 of the older case group had very few errors depending on "chroma". In the older case group, errors depending
on "height" were consistently dominant. The result suggested that older subjects tended to identify the chords mainly based on
"height".
[Stage 3]
In Stage 3 of the general case, a lot of errors were observed. Especially the percentage of errors depending on "chroma" was high.
The Stage 3 characteristics of the older case group were the same as those of the general case with respect to the low percentage of correct
answers. But the details of the errors were quite different. Errors observed in the older case group were almost all dependent on "height".
In the case of the younger case group, few errors were observed. Younger subjects came to identify the chords correctly in this
stage. Younger subjects always had a few errors depending on "height" throughout the process.
[Final Stage]
In the Final Stage, there was no difference among cases. Errors of Stage 3 decreased gradually and the percentage of correct
answers became 100% in every case.
Back to index
Proceedings paper
Introduction
Music is sequential and more or less continuous. But we hear 'chunks' of sound that pass before us; they begin and end. Models of segmentation thought to be responsible for chunking define musical
descriptors through which the degree of change in a sequence of events can be measured. Segmentation occurs when change in a parameter(s) exceeds some bound of coherence, causing a 'break'. Lerdahl and
Jackendoff proposed one model of this kind - as part of a broader theory - the Generative Theory of Tonal Music (GTTM), (Lerdahl and Jackendoff, 1983).
Figure 1: Diagrammatic view of the differences in attending to segmentation by musicians and non-musicians reported by Deliège (1987).
Models of melodic organisation that use the score as their input depart from trying to model perceptual processes because they use a representation that is an informal and partial model of WTM's organisational principles
(Baker, 1989; Friberg, Bresin, Frydén, & Sundberg, 1998; Stammen & Pennycook, 1994). Assuming that music and its notated representation are equivalent confuses aspects of human auditory mechanisms
with a symbolic communication code, and runs the risk of confusing the organisational principles of WTM with more general principles. To investigate this, a number of exploratory studies have been
undertaken.
Lerdahl and Jackendoff's GPRs have been implemented as a program, and applied to sets of melodies of different cultures transcribed via traditional western notation into symbolic codes. The data sets used in
this transcription set included German children's songs, Irish traditional melodies, Ojibwa songs and Chinese traditional melodies (table 1). The first two sets were encoded in the Essen Associative Code
(EsAC) (Kindly provided by Ewa Dahlig from the archive established by Helmut Schaffrath) and the third and fourth sets (Ojibwa and Chinese melodies) in the Kern code (von Hippel, 1998). We will use the
term Chippewa music instead of Ojibwa, as is the case in the original report (Densmore, 1909).
German 78 39 16
Chinese 30 67 31
Irish 50 77 33
Chippewa 41 56 25
The encoded melodies were presented to the 'rule program' and a number of statistics were collected, e.g. the number of times that each rule was fired during a melody. To balance the effect of uneven melodic
length this raw count was normalized by dividing the number of firings by the number of notes in the melody. The mean percentage of that rule's normalized firings was calculated for each set of melodies
(figure 2).
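The normalization just described (raw firing counts divided by the number of notes in the melody, then averaged over a data set) can be sketched as follows; the melody records and counts are invented for illustration, and the rule names follow the text (R.1 Rest, R.2 Attack/Point, R.3 Register).

```python
# Normalizing grouping-rule firing counts: divide each rule's raw firing
# count by the melody's note count, then average across the data set.
# The melody records below are invented illustrations.

from collections import defaultdict

melodies = [
    {"notes": 40, "firings": {"R.1": 2, "R.2": 8, "R.3": 10}},
    {"notes": 20, "firings": {"R.1": 0, "R.2": 5, "R.3": 6}},
]

def mean_normalized_firings(melodies):
    """Mean percentage of normalized firings per rule across melodies."""
    totals = defaultdict(float)
    for m in melodies:
        for rule, count in m["firings"].items():
            totals[rule] += count / m["notes"]   # firings per note
    return {rule: 100 * s / len(melodies) for rule, s in totals.items()}

percentages = mean_normalized_firings(melodies)
```

Dividing by note count balances melodies of uneven length, as the text notes, so that longer melodies do not dominate the per-set averages.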
Figure 2: Normalized numbers of rule usage for each transcribed data set.
When making comparisons with the results of Deliège (1987), we must bear in mind that these results reflect all situations where a rule could fire (no salience scale was implemented). The rule usage measured is
consistent with the results of Deliège's second experiment (Deliège, 1987) for three of the four data sets. The R.3 (Register rule) is the most widely used rule, followed by the R.2 (Attack/Point) rule (as it is
reported in Deliège (1987), taking into account the absence of the R.7 (Timbre rule) from this assessment). The least differentiated rule usage is that of the R.6 (Length rule). The low count for the R.1 (Rest
rule) comes somewhat as a surprise, considering that it is often considered to be the chunking cue (Narmour, 1990; Deliège, 1987):
"The sensitivity to a sensation of a gap in music perception may be considered, by the way, as a key element in the grouping behaviour." (Deliège 1987, p.343)
The following observations about segmentation processes in these different musical cultures seem valid:
● In all four sets, the importance of the Length rule (R.6) is similar.
● The Rest rule (R.1) is largely absent from all but the European melodies.
● In the Chippewa set, there is a preference for using the Attack/Point rule (R.2) over the Register rule (R.3). The situation is reversed for the other three sets.
However, it is questionable whether even such a general interpretation of the measures can be made with any confidence, given the uncertainties associated with the encoded information. The main problems
arising from pitch and duration transcription apply mostly to the data sets from the musical cultures most remote from the WTM system, namely Irish traditional and Chippewa music (Densmore,
1909; Mulheir, 1991; Henebry, 1928); these are briefly described below. For research on similarities and historical relationships between Western and Chinese scale systems, see Kuttner (1975) and Dowling &
Harwood (1986). The following problems apply, to varying degrees, to the transcription of any non-Western music.
Pitch transcription
What Densmore calls "incorrect tones" are tones that are at a categorical granularity that is either finer than or out of step with the chromatic pitch system of WTM. These are transcribed with symbols
representing a pitch "slightly less than a semitone higher/lower than the proper pitch" (Densmore, 1909, Vol. II, p. XIX; Abraham & von Hornbostel, 1909-1910). Given that the human 'just
noticeable difference' (JND) for pitch is approximately 1/12th of a semitone (Howard & Angus, 1996), 'slightly less' carries very little information.
file:///g|/poster2/Serman.htm (3 of 11) [18/07/2000 00:35:11]
Brief description of L&J rules and applications -ref Deliege
The vibrato (wavering tone) prevalent in both Chippewa and Irish singing styles can be indicated by a symbol (absent from the above-mentioned data sets), but even then the influence of the
vibrato-induced pitch change on perceptual chunking is lost.
Interval transcription
When looking at Densmore's extensive statistics, one is struck by the variety of scale types and intervals used by Chippewa singers that do not conform to the WTM system definitions and classifications
(Burns, 1999). Without deeper analysis of this variety, the appropriateness of the Register rule definition (GPR 3a) as given in (Lerdahl & Jackendoff, 1983) for Chippewa melodies is questionable.
Metrical transcription
Both Irish and Chippewa singers' structuring of time often completely evades the rules of WTM transcription. In many songs, Chippewa singers introduce time changes almost every measure (Densmore, 1909).
One of the problems with the Irish tune transcriptions comes from the dominance of bar lines over note importance, which often obscures the accents present in the performance of the melody. Moreover,
airs (both slow and quick) have no definite time structure, at least not in the WTM sense, and their transcription is therefore particularly "untrustworthy" (Henebry, 1928).
Durational transcription
Leaving aside prolongations of notes, which all three of the aforementioned authors report to be extremely difficult to transcribe, the virtual non-existence of rests in three of the four transcribed data sets
is highly unlikely. Despite Densmore's observation that rests occurred in only 4% of the transcribed songs, from the perceptual point of view rests do not have to fit into any metrical system in order to be
perceived as such. The absence of rest indicators may therefore be a serious omission, and it brings into question the Rest rule (GPR 2a) results (figure 2).
When reviewing the counts above and reading the reports of the transcription of Chippewa and Irish melodies, two conclusions seem inescapable:
1. The number of features of the real sound that the ethnomusicologist has to omit in order to make the transcription at all is unknown, but probably significant.
2. The notation is incomplete and hence inappropriate as a representation of perceptual properties. We have no way of measuring the potential role of absent descriptors.
These problems lead us to ask the following question: do the variations in organisation reflected in the rule frequencies imply that rules are used differently in different musical cultures, are they
meaningless, or do they simply reflect properties of the WTM notation system? We will not know unless we analyse perception independently of notation. We must start with sound rather than
with an incomplete, theory-laden abstraction.
The MusicTracker
The software tool described here is called the MusicTracker. It extracts information directly from the digitally recorded music signal. The present version is suitable for application to monophonic music and its
development is still in progress. It has been developed in MATLAB and is provided with a graphical user interface.
The MusicTracker reads a 'wav-file', i.e. an auditory signal expressed in an array of numbers representing changes in the acoustic air pressure over time. The signal array is divided into consecutive 'frames' of
equal duration - about 20 ms, and for each frame the values of pitch, perceptual dynamics and timbre indicator are computed.
Due to the fixed frame duration, the string of consecutive values of each indicator forms a discrete function of time representing the change in that indicator over the course of the melody. These functions can be
represented graphically or stored for further manipulation, either in raw form (as computed) or smoothed by filters with selectable bandwidths.
In order to eliminate the influence of the pre-set physical dynamic range depending on the sound record quality, the values of perceptual dynamics indicator of all frames of the melody are normalized. All
values are divided by the highest value of the perceptual dynamics indicator found in the entire sequence. Thus, dynamics are bounded within the range from 0 to 1.
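The frame scheme can be illustrated with a short Python sketch. This is not the MATLAB implementation itself: RMS energy per frame is used here as a crude stand-in for the perceptual dynamics indicator (whose actual definition is not given above), and the test signal is synthetic.

```python
import numpy as np

def dynamics_indicator(signal, sample_rate, frame_ms=20):
    """Cut the signal into consecutive 20 ms frames, compute a crude
    dynamics indicator (RMS energy) per frame, and normalize by the
    maximum so values lie in [0, 1], as described above."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    n_frames = len(signal) // frame_len              # drop the remainder
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))        # one value per frame
    return rms / rms.max()                           # bounded in [0, 1]

# Synthetic 1 s sine tone whose amplitude doubles halfway through
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) * np.where(t < 0.5, 0.5, 1.0)
dyn = dynamics_indicator(sig, sr)
print(len(dyn), dyn.max())   # 50 frames; maximum is exactly 1.0
```

The normalization makes recordings of different absolute level comparable, which is the point of dividing by the sequence maximum.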
Figure 3: Transcribed WTM notation of the first few seconds of the shakuhachi piece.
Figure 4: Applying GPRs to the notated shakuhachi excerpt results in no rules firing.
However, when we listen to the fragment it is clear that the dominant perceptual changes are occurring not in the development of the pitch indicator but in the loudness and timbre indicators. This can be seen
in the graphic representation of the MusicTracker indicators (figure 5). This is perhaps not surprising when we consider that shakuhachi players can use 6-8 different tone qualities on the same tone (Dai Shihan
Further development
The initial analysis described above indicates some of the advantages of working with real sound. It has made it possible to examine issues in modelling melodic segmentation at a level
closer to a real performance, and to compare the results with those obtained from notational transcriptions. However, the MusicTracker is still under development, and a number of improvements are
planned. The indicators of perceptual dynamics and timbre are currently defined by simplifications of the perceptual descriptors, derived from existing knowledge in psycho-acoustics and related
disciplines; this is especially true of the timbre indicator.
The application of the MusicTracker to many monophonic melodies performed by various sound sources (including the human voice, oriental instruments such as the shakuhachi, and computer-generated
sounds) has shown its ability to indicate significant changes in pitch, perceptual dynamics and timbre. This makes it a useful tool, albeit a first approximation, for the computational investigation of
melodic segmentation across cultures.
In order to investigate perceptual processes and build a model of melodic segmentation two problems have to be addressed. Firstly, there is the problem of modelling in a more realistic way the sensory and
perceptual information available to the listener when listening to melodies. Secondly, we need to take into account the fact that this perceptual information is conditioned heavily by the music culture that the
listener belongs to, i.e. that the organisation and balance of the contribution of descriptors within a musical culture is to some extent learned. While the descriptors themselves may be considered to be
hard-wired gestalt processes, how they are used together has to be learned. It is this aspect of melodic organisation that the developing MusicTracker will be used to model. Thus we hope it will help clarify the
validity or otherwise of models such as GTTM's GPRs and thus contribute to an increasingly scientific approach to the investigation of melodic processes.
Figure 6: Applying GPRs to the MusicTracker results for pitch from the shakuhachi excerpt. The numbers at the top represent the number of the rule that fired; in this case, the
Attack/Point rule (GPR 2b) fired twice.
Acknowledgments
We would like to thank Mr Tomizu Inzan, Master of the Tozan School of Shakuhachi, Mr Kevin Hayes, Ms George Mulheir, Dr Ewa Dahlig, Paul von Hippel and Dr Donncha O'Maidin for their comments
and suggestions as well as for the recordings, transcriptions and encoded data sets.
References
Abraham, O., & von Hornbostel, E. M. (1909-1910). Vorschlaege fuer die Transkription exotischer Melodien. Sammelbaende der Internationalen Musikgesellschaft, 1-25.
Baker, M. (1989) An artificial intelligence approach to musical grouping analysis. Contemporary Music Review, 3, 43-68.
Burns, E. M. (1999). Intervals, Scales and Tuning. In Deutsch, D. (Ed.), The Psychology of Music San Diego: Academic Press, 215-264.
Crossley-Holland, P. (Ed.). (1974). Selected Reports in Ethnomusicology, 2(1)
Densmore, F. (1910). Chippewa Music. Washington: Government Printing Office.
Deliège, I. (1987). Grouping conditions in Listening to Music: An Approach to Lerdahl and Jackendoff's Grouping Preference Rules. Music Perception, 4(4), 325-360.
Dowling, W. J., & Harwood, D. L. (1986). Music Cognition. San Diego: Academic Press.
Erdely, S. & Chipman, R. (1972). Strip-chart recording of narrow band frequency analysis in aid of ethno musicological data. 1972 Yearbook of the International Folk Music Council, 120-136.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5, 82-108.
Friberg, A., Bresin, R., Frydén, L., & Sundberg, J. (1998). Musical Punctuation on the Microlevel: Automatic Identification and Performance of Small Melodic Units. Journal of New Music
Research, 27(3), 271-292.
Fujie L. (1992) East Asia/Japan. In Todd Titon J. (Ed.), Worlds of Music: an introduction to the music of the world's peoples. New York: Schirmer Books, 318-375.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America 61(5), 1270-1277.
Hajda, J. M., Kendall, R. A., Carterette, E.C., & Harshberger, M. L. (1997). Methodological issues in timbre research. In Deliège I., & Sloboda J. (Eds.), Perception and Cognition of Music.
Psychology Press, 253-305.
Handel, S. (1995). Timbre perception and auditory object identification. In B. C. J. Moore (Ed.), Hearing, New York: Academic Press, 425-461.
Henebry, R. (1928). A handbook of Irish music. London: Cork University Press
Hippel, P. T. von. (Ed.). (1998). 42 Ojibwa songs in the Humdrum **kern representation: Electronic transcriptions from the Densmore collections [computer database].
Houtsma, A. J. M. (1997). Pitch and Timbre: Definition, Meaning and Use. Journal of New Music Research, 2, 104-115.
Howard, D.M., & Angus, J. (1996). Acoustics and psychoacoustics, Oxford: Focal Press.
West, R., Howell, P., & Cross, I. (1991). Musical structure and knowledge representation. In Howell P., West, R., & Cross, I.(Eds.), Representing Musical Structure. London, Academic Press.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America. 94(5), 2595-2603.
Jones, M. R., & Boltz, M. (1989). Dynamic Attending and Responses to Time. Psychological Review, 96(3), 459-491
Krumhansl, C. L., Iverson, P. (1992). Perceptual interactions between musical pitch and timbre. Journal of Experimental Psychology: Human Perception and Performance, 18(3), 739-751.
Kuttner, F. A. (1975). Prince Chu Tsai-Yü's life and work: a re-evaluation of his contribution to equal temperament theory. Ethnomusicology, 19(2), 163-204
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Mulheir, G. (1991). The structure and function of the 'sean-nós' tradition in Connemara, Ireland and the effects of outside influence on the Irish oral tradition (vols. I & II). Unpublished
manuscript, University of Sheffield.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures. Chicago:University of Chicago Press.
Rossing, T. D. (1990). The Science of Sound, Addison-Wesley.
Seeger, C. (1958). Prescriptive and Descriptive Music-Writing. The Musical Quarterly, 44(2), 184-195.
Selfridge-Field, E. (Ed.). (1997). Beyond MIDI - The handbook of Musical Codes. Cambridge, MA: MIT Press.
Serafine, M. L. (1988). Music as Cognition, New York: Columbia University Press.
Stammen, D. R., & Pennycook, B. (1994). Real-time segmentation of music using an adaptation of Lerdahl and Jackendoff's Grouping Principles. In Proceedings of the 3rd International
Conference on Music Perception & Cognition. Liège, Belgium: European Society for the Cognitive Sciences of Music, 268-270.
Proceedings paper
Introduction
The process of "tonal organization" is an indispensable part of melody perception. The cognitive
organization of tones within a melody shapes the "Gestalt" of pitch in the tone sequence. The function
of tonal organization (which is greatly dependent on "tonal schema") is to organize input tone
sequences into a system of tonality which is centered around "tonal center" (Abe & Hoshino, 1990).
In the process of tonal organization, each tone of a pitch sequence is related to the tonic according to
the relative distance (interval) and assigned tonal function in hierarchical relation to other pitches
(Dowling, 1994; Krumhansl, 1990). When tonal organization is well-formed, the tonal center is fixed
in the mind. That is, as a result of this process, we can perceive tonality.
Well-formed tonal organization is not easily achieved for all pitch sequences. Usually, listeners have
difficulty finding the tonal center of atonal pitch sequences. Some pitch sequences may have several
possible tonal organizations, leading to ambiguous key perception. Generally, when a stimulus object
has several possible and plausible patterns of organization, we will be just as likely to follow one
pathway as another. However, if the object is preceded by a context which eliminates all but one
possible pathway, we can perceive the object in one stable organization as determined by this context.
In the case of a pitch sequence with an ambiguous key, if the sequence is preceded by a context which
establishes only one key of several possible keys, the tonal organization of the sequence will proceed
according to the context driven key. For example, if the pitch sequence "F4 G4 F4 Bb4 D4 C4" (key =
Bb major or F major) is preceded by the Bb major scale, it will be tonally organized in the key of Bb
major.
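The ambiguity of the example melody can be illustrated with a small sketch. This is our own illustration, not the study's key-assignment procedure: bare diatonic membership alone would also admit Eb major, so the sketch adds a crude plausibility filter of our own, namely that the tonic itself must occur in the sequence.

```python
# Which major keys fit a pitch sequence? Pitches are given as pitch-class
# names (Bb written as A#); octave numbers are ignored for key membership.

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # major-scale intervals in semitones

def major_scale(tonic):
    root = NOTES.index(tonic)
    return {(root + s) % 12 for s in MAJOR_STEPS}

def possible_major_keys(pitches):
    pcs = {NOTES.index(p) for p in pitches}
    # diatonic membership, plus the crude filter that the tonic occurs
    return [t for t in NOTES
            if pcs <= major_scale(t) and NOTES.index(t) in pcs]

# "F4 G4 F4 Bb4 D4 C4": ambiguous between F major and Bb major
print(possible_major_keys(['F', 'G', 'F', 'A#', 'D', 'C']))   # ['F', 'A#']
```

A preceding Bb major scale would then be expected to resolve this ambiguity toward Bb, as the text describes.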
One may question whether or not the melody "F4 G4 F4 Bb4 D4 C4" organized in Bb major is
perceived differently from the same melody organized in F major. This is not a difficult question for
ambiguous visual stimuli such as "figure-ground reversible figures (vase/profile reversible figure)",
"depth reversible figures", and "ambiguous figures (My wife and mother-in-law)". It is evident that
what is perceived in one interpretation (a vase) is different from what is perceived in another
interpretation (two faces in profile). However, the question of interpretation is more difficult for aural
stimuli, such as the aforementioned ambiguous melody. A simple method of examining this question is to prepare pitch sequences that pair the ambiguous melody with a context establishing
each possible key, and then to compare listeners' interpretations.
Dowling (1986), using a method similar to the one above, examined listeners' recognition of melodies
and which of two types of pitch information, intervals or scale steps, listeners use in encoding
melodies. He asked listeners (inexperienced, moderately experienced, and professional listeners) to
perform a long-term transposition recognition task of melodies which were preceded by a chordal
context that established the key of each melody. The experiments demonstrated that inexperienced
listeners (as well as professionals) performed the task regardless of context, and that moderately
experienced listeners performed differently as a function of context. Further, professional musicians
performed most accurately. He proposed that inexperienced listeners use pitch-interval
representations, moderately experienced listeners use scale-step representations, and that professional
musicians use more sophisticated representations, in their memory for melodies.
From Dowling's (1986) results, it may be concluded that inexperienced and professional listeners are able to
recognize that two identical melodies presented with different contexts are the same, whereas
moderately experienced listeners perceive the same stimuli as different. The question of interest to
this study is what factors account for the difference in tonal organization between these groups of
listeners. It is possible that the tonal representation strategies of experienced listeners may be
differentiated by pitch perception strategies (the use of absolute pitch or relative pitch), rather than
their degree of musical training. That is, it seems that the way that AP experienced listeners perceive
and encode melodies may be somewhat different from non-AP experienced listeners. Second,
Dowling's long-term recognition task (which is rather similar to the short-term situation) may focus on
one of several aspects in the melody recognition process. As Dowling has demonstrated, melody
recognition strategy varies depending on the delay time between the standard and comparison
melodies (Dowling, Kwak and Andrews, 1995). Therefore, the effect of context should be examined
in other types of melody recognition tasks. For example, simple immediate recognition and
recognition after a period of learning could be explored. Until these factors have been taken into
account, the question of how ambiguous melodies are perceived has not yet been resolved.
This paper examines the effect of "context key" on recognition of melodies with two possible
keys, using short-term (Experiment 1) and long-term (Experiment 2) recognition tasks. Secondly, it
assesses the difference in melody representation strategies between AP possessors and others
(experienced listeners without AP, and inexperienced listeners). The main interest of this study is the
recognition of ambiguous melodies preceded by different context stimuli. In order to explore this
question as directly as possible, it was decided that a simple non-transposed recognition task would
serve the design of the present study better than a transposed recognition task. It was also important to be
sure that pitch sequences were reliably key-ambiguous. Therefore, a set of the pitch sequences from
Yoshino and Abe (1996) and Yoshino (1998b) which were verified as having two possible and
plausible key interpretations were used in the present study.
Experiment 1
Experiment 1 examines the effect of "context key" on melody recognition in short-term memory,
using a simple discrimination recognition task with a short retention interval.
Method
Subjects
45 subjects, graduate and undergraduate students at Hokkaido University, participated in the
experiment. They were grouped into 3 groups (15 in each). The inexperienced group (IE group)
consisted of 15 subjects who had studied music for less than 4 years or not at all. The remaining 30
subjects had each studied music for more than 10 years. Of these 30 experienced listeners 15
possessed absolute pitch (AP group) and 15 did not (EN group).
Stimuli
32 standard stimuli consisted of six tones and were presented at a rate of two tones per second. Each
standard melody had two possible keys on the diatonic scale. A preliminary experiment confirmed
that listeners found the key of the melodies sufficiently ambiguous. That is, of the two possible keys
that could be identified for each melody, one was not chosen significantly more often than the other.
32 comparison stimuli also consisted of six tones and were presented at the same rate as the standard
melodies. Half of them were the same as the standard stimuli, and the other half differed from the
standard stimuli by the alteration of only one tone by one diatonic step, in a way that altered neither the
contour nor the original key (the altered tone was always in the 2nd to 5th position, never the 1st or 6th).
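The constraint on comparison stimuli can be sketched as follows. This is a hypothetical illustration: melodies are represented as diatonic scale-step indices within the context key, and the example melody is invented, since the paper does not list its stimuli.

```python
# Single-tone alterations at positions 2-5 (0-based indices 1..4) of a
# six-tone melody, by one diatonic step, keeping the melodic contour
# (the up/down/same pattern of successive intervals).

def contour(melody):
    def sign(x):
        return (x > 0) - (x < 0)
    return [sign(b - a) for a, b in zip(melody, melody[1:])]

def valid_alterations(melody):
    """All one-step, one-tone changes at inner positions that keep contour."""
    out = []
    for i in range(1, len(melody) - 1):     # never the 1st or last tone
        for step in (-1, +1):
            altered = list(melody)
            altered[i] += step
            if contour(altered) == contour(melody):
                out.append(altered)
    return out

melody = [0, 2, 1, 4, 6, 5]                 # hypothetical scale steps
print(len(valid_alterations(melody)))       # -> 5
```

Not every one-step change survives the contour check (e.g. lowering the second tone of the example creates a repeated pitch, changing the contour), which is presumably why the altered position had to be chosen with care.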
Each melody (standard and comparison) was preceded by a scale context. The scale context of a
standard melody is called "first context scale", and that of a comparison is called "second context
scale". The scale context stimuli consisted of upper octave scales from one of the two possible keys
for each melody, presented at a rate of two tones per second. The key of the scale context was selected
from the two possible keys for the standard or comparison melody. In order to strengthen the sense of
key, the tonic tone of the context scale was presented simultaneously with the melodies as an
additional context cue. The tonic tone was played one octave below the original pitch for 3 seconds.
In half of the trials, the key of first context scale and second context scale was the same (same context
condition). In the other half of the trials (different context condition), the key of second context scale
was different from that of first context scale (the other possible key). For example, in the different
context condition, the melody composed of "C4 F4 G4 F4 E4 A4" has two possible keys: "C major"
and "F major". If the key of the first context scale (preceding the standard melody) was set to C major,
the key of the second context scale (preceding the comparison melody) was set to F major.
All stimuli were presented on a Yamaha tone generator MU50 using MIDI controlled by the MIDI
sequencer software "Performer". In order to distinguish the standard and comparison melodies from
the context stimuli, melodies were produced using a piano timbre (Grand Piano) and the context
scales and tones using an organ timbre (Church Organ). The stimuli were presented at a comfortable
listening level that could be adjusted by the subjects.
Procedure
A trial consisted of a first context scale (3s), a standard melody with a context tone (3s), a second
context scale (3s) and a comparison melody with a context tone (3s). Subjects' task was to indicate
whether the standard and the comparison melodies were same or different within the inter-trial
interval of 5 seconds. They were asked to ignore the context stimuli. After 4 practice trials, each
subject performed 32 trials presented in random order.
Results
The variables of interest in this experiment are the hit rate (HIT), the false alarm rate (FA) and the
recognition probability. A hit is defined as a SAME response for stimuli in which standard and
comparison melodies are identical, and a false alarm as a SAME response for stimuli in which the
melodies differ. A recognition probability is defined as the HIT rate minus the FA rate.
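These measures can be computed directly; the sketch below follows the definitions above, with invented trial data for illustration.

```python
# HIT: proportion of SAME responses to identical pairs.
# FA:  proportion of SAME responses to different pairs.
# Recognition probability: HIT - FA.

def recognition_probability(responses):
    """responses: list of (is_same_pair, said_same) booleans, one per trial."""
    same = [said for is_same, said in responses if is_same]
    diff = [said for is_same, said in responses if not is_same]
    hit = sum(same) / len(same)
    fa = sum(diff) / len(diff)
    return hit - fa

# 4 identical pairs (3 judged SAME) and 4 different pairs (1 judged SAME)
trials = [(True, True), (True, True), (True, True), (True, False),
          (False, True), (False, False), (False, False), (False, False)]
print(recognition_probability(trials))   # 0.75 - 0.25 = 0.5
```

Subtracting FA corrects the hit rate for a bias toward answering SAME regardless of the stimuli.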
Figure 1 shows the mean recognition probability for the three subject groups and the two types of
context condition. These recognition probability data were analyzed in a two-way analysis of variance
[3 (Subject Group) x 2 (Context Conditions)], which yielded only a main effect of subject group, F (2,
42) = 68.18, p < .0001. AP subjects (M = .80) performed significantly better than EN subjects (M =
.48), and EN subjects performed significantly better than IE subjects (M = .13), in Tukey's Test (HSD
= .14, p < .05). The main effect of context was not significant, F (1, 42) = .26, p > .10.
Discussion
In the short-term recognition task, there were no differences between same and different context
conditions for all three listener groups. Listeners performing this task did not appear to be influenced
by the context stimuli. It was unexpected that the performance of EN subjects was not influenced by
context. They seemed to attend to pitch change, a perceptual feature, rather than to whole-melody
difference, a cognitive feature. It appears that the second context scale in this experiment was not strong
enough to affect tonal organization. This result converges with previous findings that information
based on pitch-interval pattern doesn't contribute to immediate recognition of novel melodies
(Dowling, 1978, 1991; Dowling, Kwak & Andrews, 1995).
As expected AP listeners performed best, followed by EN listeners, and the IE group performed most
poorly. It appeared that AP possessors could recognize melodies correctly by linguistically encoding
the pitches of standard melodies to absolute pitch names and maintaining them in memory. Based on
the questionnaire data, it seemed that EN listeners could sometimes encode the pitches of standard
melodies to scale names (movable do), and sometimes they could not. For IE listeners, the main
difficulty was maintaining the pitch information of the standard melody while listening to the second
context scale.
Experiment 2
Experiment 2 examines the effect of "context key" on melody recognition in long-term memory. This
experiment adopts a simple long-term recognition task consisting of a learning session and a
recognition session. In the learning session, subjects listened to standard melodies preceded by a
context scale five times and were asked to remember them. Then, in the recognition session, they
were asked to discriminate standard melodies from distractor melodies preceded by a context scale.
Method
Subjects
39 subjects, graduate and undergraduate students at Hokkaido University, participated in this
experiment. There were 13 inexperienced listeners (IE group), 13 experienced listeners without AP
(EN group), and 13 experienced listeners with AP (AP group). EN and AP subjects had studied music
for more than ten years. None of these subjects participated in Experiment 1.
Stimuli
48 melodies were used in this experiment, including the 36 melodies used in Experiment 1. As before,
each melody could be assigned either of two possible keys in the diatonic scale. All of the melodies
consisted of six tones and were presented at a rate of two tones per second, as in Experiment 1. Half
of the 48 melodies were used as the standard melodies to be remembered, and were presented in both
the learning session and the recognition session. The other half were used as distractors in the
recognition session. In the recognition session, standard melodies were presented without any pitch
changes.
The context stimuli were presented before the standard melodies and the distractors. In each case, the
context consisted of a scale and a tonic tone, in one of the two possible keys for the melody. In the
learning session, each standard melody was presented five times, with the same preceding key
context. In the recognition session, half of the melodies were preceded by the same context key as in
the learning session (same context condition); the other half were preceded by the alternate context
key (different context condition).
The context key of each distractor melody was arbitrarily selected from the two possible keys for the
distractor melody, independent of any condition in this experiment. The rate of presentation for
context and the duration of the tonic tone were identical to Experiment 1.
All stimuli were generated and stored as standard MIDI file by the MIDI sequencer software
"Performer" on Macintosh computer, and presented on a Yamaha tone generator MU50. A customized
program generated with PsyScope software was used to control stimulus presentation and to record
responses. Stimulus timbres were the same as in Experiment 1: standard melodies and distractors in
piano timbre (Grand Piano), and context scales and tones in organ timbre (Church Organ). The stimuli
were presented at a comfortable listening level that could be adjusted by the subjects. Subjects were
tested individually in a soundproof room.
Procedure
Learning Session. Each pitch sequence with its context was presented five times in succession. As noted
above, the same key context preceded a standard melody on each of its five presentations. Subjects were instructed to
listen to the melodies and to remember them for a subsequent memory task.
Recognition Session. After a 10-minute retention interval, recognition memory was tested in a
forced-choice task. Each trial began with a scale context, followed by the standard or distractor
melody with a tonic tone. Subjects were asked to indicate whether the melody was "old" or "new" by
pressing the "1" and "3" keys on a keyboard as quickly as possible. In other words, they listened to a
melody and indicated whether or not they had heard the melody in the preceding learning session.
They were also asked to ignore the context. All subjects performed the 48 trials in random order.
Results
The main variables in this experiment were the hit rate (HIT), the false alarm rate (FA) and the
recognition probability, as in Experiment 1. However, in this case recognition probability in each
context condition was calculated using the mean FA, because FA could not be calculated for each
context condition.
Figure 2 shows the mean recognition probability for the three subject groups and the two types of
context condition. These recognition probability data were analyzed in a two-way analysis of variance
[3 (Subject Group) x 2 (Context Conditions)], in which only a main effect of context condition was
obtained, F (1, 36) = 32.14, p < .0001. Subjects performed significantly better in the same key
condition (M = .52) than in the different key condition (M = .34). The main effect of subject group
was not significant, F (2, 36) = .92, p > .90. There was no significant difference in recognition
probability between the AP (M = .44), EN (M = .42), and IE (M = .42) groups. There was a significant
interaction effect, F (2, 36) = 3.49, p < .05. This interaction reveals that the difference between the AP
group's recognition performance in the same and different context conditions was much larger than that of
EN and IE groups. The simple main effect of context condition for each subject group was also
significant, HSD = .08, p < .05. However, the simple main effect of subject group was not significant
for any of the context conditions.
Discussion
In the long-term recognition task, there were significant differences between same and different
context conditions for all three listener groups. Not only EN listeners' performance, but also IE and
AP listeners' performance were influenced by context key. Hit rates in the different key context were
near chance level (.50 for AP listeners, .58 for EN listeners, and .61 for IE listeners). That is, they
didn't seem to be able to recognize a melody preceded by a different key context even though it was
the same melody they had heard in the learning session. Because the FA rate was not high (about .22
for all subject groups), the difficulty of this experiment task could not account for such low hit rates. It
is probable that listeners perceived the same melody differently as a function of differing key contexts
which led to different pathways of tonal organization for the same melody.
Although a main effect of subject group was obtained in Experiment 1, no such effect occurred in
Experiment 2, a difference that warrants further examination. In short-term recognition,
experienced listeners including EN and AP groups had the advantage of their musical ability in
remembering tones by linguistically labeling them. In Experiment 2, however, the retention
interval was much longer (about 25 minutes), and even experienced listeners could not maintain
linguistic labels in memory for that duration. Listeners in both of these groups seemed to depend on
similar memory traces, based on the tonal organization of the standard melodies. It has been shown that
recognition of pitch patterns based on familiar melodies is high not only for experienced listeners
but also for inexperienced listeners (Attneave & Olson, 1971; Dowling & Fujitani, 1971;
Smith et al., 1994). The recognition condition in this experiment may be similar to that of hearing
familiar melodies, because subjects had previously heard the standard melodies repetitively. The
difference between subject groups was evident only in the interaction between subject group and
context. The differences between recognition probabilities in each context condition were about .11
for EN and IE listeners, and .28 for AP listeners. This result is difficult to interpret, as it suggests that
AP listeners did not depend on their absolute pitch ability, but rather used relative pitch information
more than non-AP listeners did.
General Discussion
The present study demonstrated that recognition of ambiguous melodies was influenced by context
key in a long-term task, but not in a short-term task. These findings suggest that an ambiguous melody
contextualized in one key can be perceived differently from the same melody contextualized in
another key, when there is a long retention interval between them. On the other hand, it is unsurprising that
there was no context effect in the immediate recognition task as subjects were simply detecting any
change of pitch. This difference in the effect of context between short-term and long-term memory
seems to be analogous to the difference in the contribution of contour and pitch interval information
between short-term and long-term memory. However, we do not yet have positive evidence that
context has no effect on ambiguous melodies in short-term memory. Thus, it is necessary to examine
the effect of context on a short-term recognition task that is not dependent on subjects' detection of
pitch change.
The effect of context key shown in Experiment 2 could be considered part of the general context
effect in "state-dependent memory". However, we should note that this effect occurred in the special
case of melodies that could be perceived in two possible keys. If this effect could be explained by
simple state-dependent memory, recognition of "unambiguous melody" would be influenced by
context key. However, it is questionable whether a melody with only one possible key is recognized
as a different melody when it is preceded by a different key context.
The strategies used by experienced listeners to perceive and recognize pitch sequences are not yet fully
understood. In Experiment 1, there was an expected difference in recognition performance between
AP listeners and EN listeners. In this type of immediate recognition task, the strategy AP listeners use
should be clearly different from that of non-AP experienced listeners. However, this difference in
pitch-perception strategy was not found in Experiment 2. On the contrary, a result opposite to our
expectations was observed: recognition performance was similar for all three subject groups.
Furthermore, AP listeners showed a tendency to depend more on relative pitch information or
information based on tonal organization (rather than absolute pitch information) than EN listeners. It
seems likely that AP listeners can best use their AP ability while rehearsing pitch names in
short-term tasks, or once they have stored the pitch names in long-term memory. In Experiment
2, AP listeners seemed unable to remember enough pitch names for each melody to use them in the
recognition session. Presumably, if AP listeners were given enough opportunity and time to store the
pitch names for each melody, their recognition performance would improve.
Inexperienced listeners recognized fewer melodies in Experiment 1. This result converges with
previous findings that recognition tasks requiring accurate interval recognition are difficult for
inexperienced listeners (Bartlett & Dowling, 1980; Cuddy & Cohen, 1976; Trainor &
Trehub, 1992). It may not be surprising that inexperienced listeners performed as well as experienced
listeners in Experiment 2, and that their performance was influenced by context key. The nature of the
task in Experiment 2 does not require special musical ability. Every listener seemed to depend on
auditory memory traces which are not related to musical training. Yoshino (1998a) suggested that
inexperienced listeners can also carry out tonal organization as experienced listeners do, and that they
can interpret the key of a melody, at least, implicitly. Inexperienced listeners seemed to tonally
organize a melody according to the context key, and consequently perceived two identical melodies
presented with different contexts as different.
References
Abe, J., & Hoshino, E. (1990). Schema driven properties in melody cognition: Experiments on final
tone extrapolation by music experts. Psychomusicology, 9, 161-172.
Attneave, F., & Olson, R. K. (1971). Pitch as medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147-166.
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance
effect in developmental perspective. Journal of Experimental Psychology: Human Perception &
Performance, 6, 501-515.
Cuddy, L. L., & Cohen, A. J. (1976). Recognition of transposed melodic sequences. Quarterly Journal
of Experimental Psychology, 28, 255-270.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies.
Psychological Review, 85, 341-354.
Dowling, W. J. (1986). Context effects on melody recognition: Scale-step versus interval
representations. Music Perception, 3, 281-296.
Dowling, W. J. (1991). Tonal strength and melody recognition after long and short delays. Perception
& Psychophysics, 50, 305-313.
Dowling, W. J. (1994). Melodic contour in hearing and remembering melodies. In R. Aiello (Ed.),
Musical perception. Oxford: Oxford University Press.
Dowling, W. J., & Fujitani, D. S. (1971). Contour, interval, and pitch recognition in memory for
melodies. Journal of the Acoustical Society of America, 49, 524-531.
Dowling, W. J., Kwak, S., & Andrews, M. W. (1995). The time course of recognition of novel
melodies. Perception & Psychophysics, 57, 136-149.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford: Oxford University Press.
Smith, J. D., Kemler Nelson, D. G., Grohskopf, L. A., & Appleton, T. (1994). What child is this?
What interval was that? Familiar tunes and music perception in novice listeners. Cognition, 52, 23-54.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity to Western
musical structure. Journal of Experimental Psychology: Human Perception & Performance, 18,
394-402.
Yoshino, I., & Abe, J. (1996). Cognitive modeling of the process of tonal organization in melody
perception. International Journal of Psychology, 31, 51.
Yoshino, I. (1998a). Can non-musicians interpret the key of a melody? Proceedings of the fifth
international conference on music perception and cognition, 225-229.
Yoshino, I. (1998b). Key interpretations of the melodies composed in various periods. Journal of
Music Perception and Cognition, 4, 81-99. (in Japanese)
Proceedings abstract
MUSIC IN EVERYDAY LIFE
G. Bertling
Rolf.G.Bertling@ruhr-uni-bochum.de
Background:
It has long been known that musicians perceive and deal with music differently
from nonmusicians. Whether these differences also extend to everyday activities
is not yet known.
Aims:
Method:
Results:
After matching for age and sex, the mean age was 28 years. Mean weekly
listening time was about 16 hours in both groups. While nonmusicians
listened to music significantly more for "entertainment", as "background", and
in "social situations" (dancing, discos, etc.), musicians dealt with music more
for educational and professional purposes. No differences were found in the
number of sound recording and storage media owned.
The musical preferences of musicians reflected the styles they were mostly
occupied with professionally (classical, jazz). In contrast, nonmusicians
preferred pop, rock, and modern pop styles.
Conclusions:
The results imply that for musicians there is no clear distinction between
professional and private engagement with music, while for nonmusicians
music predominantly means "fun".
Proceedings abstract
MUSIC IN EVERYDAY LIFE OF PATIENTS WITH MENTAL DISEASE
Rolf.G.Bertling@ruhr-uni-bochum.de
Background:
Clinical observation shows that music plays an important role in the everyday
life of patients with mental illness, but there are few empirical data on how
psychiatric patients deal with music in their daily routine.
Aims:
This study investigates the musical education, musical skills and use of music
in daily life of patients with mental disease in comparison with a sample of
healthy people.
Method:
After matching for age and sex, 131 psychiatric in-patients from different
diagnostic subgroups and 86 healthy people (mean age 43 years) completed a
self-constructed questionnaire.
Results:
Conclusions:
The patients' musical education, musical skills, and active music-making were
comparable to those of the normal population. Concerning the receptive use of
music, however, the psychiatric patients showed different patterns from the
normal population. This might indicate that psychiatric patients use music in a
specific way.
Proceedings paper
INTRODUCTION
It has recently been suggested that, "…music can be perceived by listeners as if it were equivalent to a person making a disclosure."
(Watt & Ash, 1998:37). The suggestion is formulated by analogy with spoken communication. For example, if I were to stand before
my peers and deliver this paper, the information that I would be giving would comprise not only documented facts and ideas but
would also involve communicating something of myself - my nature or personality. Thus, although my principal intention may be to
maintain the listeners' interest through a cogent and logical oration, I would also be revealing some uniquely personal qualities as well
as some more general qualities that are capable of identifying me as belonging to particular categories. My voice is capable of
revealing my sex, my approximate age and my ethnic origin, for example. The qualities of my utterances, even in the absence of visual
clues, may reveal something of my present health or state of mind, as well as something of my personal character. It is assumed that
an audience can detect these qualities during the communication even if they are not specifically attending to such. The presumption is
that by the time I have finished speaking my audience will have not only heard the message I intended to convey but will also have
come to some understanding, or have formed some opinion, concerning ‘who I am'. In other words, apart from what I said, how did
I come across? Was I confident and honest, or was I obviously concealing something? Have I doctored my experimental results or am I
just nervous? Do I sound ‘intelligent' or ‘interesting'? Do I like or care about the people I am addressing? And on a more personal
level, am I attractive, and if so, to which sex? Do I sound trustworthy; would you buy a used car from me or let me go out with your
son or daughter? (Hendrix, 1997; Harris & Busby, 1998; Robinson, Obler, Boone, Shane, Adamjee & Anderson, 1998).
Watt & Ash (1998) report that, when forced to make choices between binary opposite adjectives to describe auditioned music, levels
of agreement between respondents appear significantly higher for adjectives assumed to be associated with descriptions of people than
those not normally associated with people. A comparable study using food stimuli rather than musical stimuli did not reveal the same
high levels of inter-respondent agreement. Thus, it was hypothesised that music has an action upon listeners similar to the actions that
the ‘qualities of a person' have upon listeners during spoken communication. In other words, music conveys an impression of its
‘self' or ‘identity' in addition to communicating whatever is ‘meant' by the actual compositional text. This represents a most
interesting departure from previous empirical work because the meaningful natures of these ‘disclosed' representations are
considered akin to psychological aspects of people rather than descriptions of the affective value of sound expressed using emotional
descriptors (e.g. Hevner, 1936; Wedin, 1972). Different musics, therefore, may be said to have different ‘personalities' that are
somehow distinct from the ‘message' actually conveyed by the musical text. The ‘message', to use the terminology of Meyer
(1956), is informed by embodied meanings (sonic stability, tension and resolution) and by designative meanings (suggestions of
symbolic references to extra-musical phenomena). The notion of disclosure meaning, however, rests less with the ‘message' and
more with the ‘musical informant'.
Adopting a forced-choice paradigm requires one to consider the degree to which binary opposite adjectives are both relationally
appropriate and commonly understood. Gross, Fischer & Miller (1989) note that, "…the basic organisational relation between
adjectives has generally been assumed to be antonymy." (p.92). The authors demonstrate experimental support for the relative strength
and perceptual salience of adjectives inferred from the amount of time taken to judge opposites, i.e. to identify antonyms. More
importantly, they provide evidence regarding the different levels of antonymic classification and suggest that there exists a set of
‘direct antonyms' which are most easily and readily identifiable as having relational opposition. In addition, it seems that these
strongest antonymic dimensions (direct antonyms) are also words that are used more frequently in spoken and written language.
In a preliminary study, 6 members of staff in a university department specialising in popular music were asked to rate the plausibility
of describing either people or music using the same 40 descriptive dimensions (binary opposite adjectives). These included the
adjective-pairs originally used by Watt & Ash (1998) and the direct antonyms identified in the study by Gross et al (1989). Two
questions were asked: 1) ‘In general, how often could MUSIC be plausibly described using the following dimensions?'; and, 2)
‘In general, how often could PEOPLE be plausibly described using the following dimensions?' Participants were required to circle
one response on a five-point Likert-type scale comprising the responses ‘very often', ‘often', ‘occasionally',
‘not often', and ‘hardly, if ever' for each of the forty dimensions in respect of both questions (music/people descriptions).
The least plausible dimensions to describe either people or music were: ‘inside/outside'; ‘clear/cloudy'; ‘dry/wet';
‘prickly/smooth'; ‘sweet/sour'; and ‘near/far'. All these dimensions were considered plausible descriptors less than
‘occasionally'. Repeated-measures t-tests revealed no statistically significant differences between respondents' plausibility ratings
when applying these dimensions to descriptions of people versus music. A further six dimensions were found to be highly plausible to
apply to descriptions of people but much less plausible to apply to descriptions of music. These dimensions were: ‘rich/poor';
‘friendly/unfriendly'; ‘male/female'; ‘patient/impatient'; ‘shy/outgoing'; and ‘honest/deceitful'. Responses indicated that
such dimensions were considered likely to be used to describe people more than ‘occasionally', but less than ‘occasionally' to
describe music.
Individual participants gave between 4 and 42 responses when asked to name the musical styles auditioned.
14 musical styles were immediately discarded on the basis that eleven or more respondents (i.e. at least 30%) had provided no answer
and that there was no consensus of opinion regarding stylistic labelling by the participants who did provide a response. Responses to
the remaining styles were put into four categories, arbitrarily entitled ‘correct' and ‘similar' (the "positive pole"), or ‘incorrect' and
‘no answer' (the "negative pole"). The experimenter determined the categorisation of responses with reference to the auditioned
stimuli themselves, rather than the manufacturer's labels (which were renamed according to responses given, if necessary). The
following were considered prototypical exemplars of particular popular music styles based on responses demonstrating a bias in
favour of the "positive pole": hillbilly; country; heavy metal; rock; rock ‘n' roll; blues-rock; blues; boogie-blues; swing band;
modern jazz; bossa nova; samba; latin pop; rap/hip-hop; funk; soul-funk; disco; and techno pop.
EXPERIMENTAL HYPOTHESES
1) When forced to choose between adjectival antonyms to describe musical stimuli, respondents will demonstrate significantly
higher levels of agreement in their preferences for person-like adjectives than non-person-like adjectives.
2) The adjectival dimensions associated with person-type attributions that demonstrate significant levels of participant
agreement will reveal clusters of stimuli that group popular music by developmental/stylistic similarity.
In the absence of adjectival dimensions considered particularly appropriate to descriptions of music, it is proposed that respondents
will suggest that there is some content to the musical information presented that can be better explained by reference to attributions
more commonly associated with descriptions of people. In addition, it is an a priori assumption that the significant person-type
attributions will group stimuli by stylistic traits.
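Hypothesis 1 turns on whether agreement on one pole of a binary choice exceeds chance. The paper does not specify which test was used; one standard option is an exact binomial test against chance p = .5, sketched here (the function name is ours):

```python
from math import comb

def binom_two_sided_p(k, n):
    """Two-sided exact binomial p-value for k out of n respondents
    choosing the same adjective when the chance probability is .5.
    (For p = .5 the distribution is symmetric, so one tail is doubled.)"""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

# e.g. 26 of 32 respondents agreeing is very unlikely under chance,
# whereas a 16/16 split is perfectly compatible with guessing.
p = binom_two_sided_p(26, 32)
```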
Participants:
Participants (male = 14, female = 18) were volunteers from a variety of backgrounds between the ages of 20 and 62 (M = 39.22, SD =
11.98). No participants had studied music or psychology at advanced levels (i.e. in higher education) and none reported having
hearing difficulties that affected their enjoyment of music. All but one (a postgraduate student studying at a British university)
reported English to be their first/native language. Eight participants considered themselves to be musically active (i.e. played music or
sang in their leisure time). Four of these participants had received musical instruction in the past but none had studied music in an
academic setting since leaving school. Participants reported very broad ranges of personal tastes in music preference.
Fig. 1: The results of a cluster analysis conducted on the five dimensions assumed to lie within the person-type
attribution category.
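Fig. 1 was produced by a hierarchical cluster analysis over person-type attribution dimensions. The stimulus data are not reproduced here, but the general agglomerative procedure can be sketched as follows (single linkage on toy data; the authors do not state which linkage method they used):

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with singleton clusters and
    repeatedly merge the pair with the smallest minimum distance."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(pts[p] - pts[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]           # merge cluster j into i
        del clusters[j]
    return [sorted(c) for c in clusters]
```

In the experiment each point would be a musical stimulus described by its attribution scores on the person-type dimensions, and the resulting clusters are then inspected for stylistic coherence.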
Discussion:
The results of the adjectival attribution experiment indicate some support for experimental hypothesis 1; however, anomalies remain
that require the null hypothesis to be retained at this stage. In the person-type category, only one dimension proved to
confound experimental hypothesis 1 in terms of the overall results. The dimension ‘rich/poor' was included in this category based
upon the results obtained in the preliminary adjective survey. In retrospect, it is conceded that there is a serious possibility that
respondents may have interpreted this dimension in differing manners and that this might explain why substantially less agreement
was demonstrated in terms of preferred descriptor. There is semantic confusion evident within this dimension that is
context-dependent and which was overlooked in preparation of the final experimental design. In terms of person-type attributions, the
dimension ‘rich/poor' refers to relative affluence. However, there is clearly a strong alternative attribution that may be made to the
dimension and which could perhaps be situated within the context of ‘value-judgement'. Thus, as a dependent variable, this
dimension is probably invalid since it cannot be guaranteed to be measuring the attribution intended. Indeed, a few participants who
expressed criticisms of the stimuli in terms of ‘musical pleasure' suggested that the synthetic nature of the sounds was ‘poor' and
that this was reflected in their attributions within the dimension ‘rich/poor' specifically. As regards the non-person-type dimensions,
at least one (smooth/prickly) and possibly two (sweet/sour) dimensions may have confounded the experimental design. In the case of
the latter, it is possible that ‘sweet' may actually be considered an appropriate descriptor of music by musically unsophisticated
respondents, thus incorporating a natural bias toward the adjective. It is possible that the experimental stimuli may have influenced
this result, given that each stimulus was in the major mode and harmonically consonant. This may perhaps be evidenced by the fact that
80% of the significant deviations from chance attribution were in favour of ‘sweet' as opposed to ‘sour' (cf. ‘smooth/prickly' for
which significant deviations from chance were split equally). In the case of ‘smooth/prickly', the feedback from one participant was
particularly notable. After suggesting a tendency to think about the musical stimuli as if they represented different types of people, this
participant then stated that it seemed more straightforward to choose an adjective from some dimensions because it was natural to talk
about people using certain descriptors. An example cited in respect of this natural tendency was that of ‘smooth/prickly'!
The cluster analysis performed on the five dimensions assumed to be associated with person-type attributions reveals partial support
for hypothesis 2. Due to the reservations expressed above concerning validity, the dimension ‘rich/poor' was omitted from the
cluster analysis. The grouping of stimuli, illustrated in figure 1, revealed three main clusters. With reference to the
GENERAL DISCUSSION
This investigation has been termed ‘exploratory' in deference to contemporary criticisms of the traditional positivistic paradigm
(e.g. Persson & Robson, 1995). The empirical design has been reported with consideration given not only to the interpretation of
quantitative data but also to the qualitative insights provided by participants. Because of this, it is possible to make strong
recommendations for improving experimental designs aimed at investigating the concept of disclosure meaning in music. The present
research process has revealed that further scrutiny of the dependent variables is warranted. As noted in the previous section, one cannot
confidently assume that adjectival dimensions will fit neatly into categories suggestive of either person-like attributions or
non-person-like attributions. For example, the dimension ‘rich/poor' may refer to a person's wealth but may equally refer to some
‘aesthetic value' associated with the description of music. The justification for empirically determining the classification of
adjectival dimensions arose originally because of suspicions that it is all too easy to infer that little or no musically plausible
attribution will be considered concomitant with certain descriptive terms where, in fact, there may exist an arbitrary, yet conventional,
discourse that incorporates such terms. The first preliminary study supported criticisms of some of the dimensions used in the original
study by Watt & Ash (1998). However, the results of the present attribution experiment and the participant feedback discussed in the
preceding section imply that it is not sufficient to survey only ‘musical experts' when attempting to classify adjectival dimensions
into categories intended to delineate particular attributions. If musically unsophisticated individuals represent the population under
investigation, then there is clearly a requirement to assess the extent to which descriptive attributions are considered plausible
indicators of the phenomena under investigation by musically unsophisticated respondents themselves. As previously noted, some
participants do use words like ‘smooth' and ‘prickly' to describe people, and possibly ‘sweet' to describe music: the quantitative
data are consistent with this.
One of the main emphases of the present investigation has been to explore the possibility that disclosed meanings in music are capable
of delineating the stylistic similarities of different musics. There does seem to be at least partial support for the assertion that
attributions of person-like qualities to music do group music by style. Further, it is suggested that the level of support demonstrated for
this assertion rests upon the basis of musical perceptions that are clearly derived from the general typicality of the stimuli heard as
opposed to the specific characteristics of musical extracts from individual works. Thus, in the absence of contextual information, such
as the idiosyncrasies of expression that may give rise to ‘extra-musical' information about a piece of music, it is suggested that the
information disclosed is textual, i.e. contained within the musical construction. In other words, the agreement demonstrated by
participants in their attributions of person-like qualities to music, generally speaking, is based upon perception of the text, i.e. the
heard characteristics of the musical stimuli. Thus, the attributions made are not dependent on any preconceptions informed by the
REFERENCES
Berz, W.L. & Kelly, A.E. (1998) Research note: Perceptions of more complete musical compositions: An exploratory
study. Psychology of Music 23, 39-47.
Gross, D., Fischer, U. & Miller, G.A. (1989) The organization of adjectival meanings. Journal of Memory and Language
28, 92-106.
Harris, S.M. & Busby, D.M. (1998) Therapist physical attractiveness: An unexplored influence on client disclosure.
Journal of Marital and Family Therapy 24.2, 251-257.
Hendrix, K.G. (1997) Student perceptions of verbal and nonverbal cues leading to images of Black and White professor
credibility. Howard Journal of Communications 8.3, 251-273.
Hevner, K. (1936) Experimental studies of the elements of expression in music. American Journal of Psychology 48,
246-268.
Juslin, P.N. (1997) Perceived emotional expression in synthesized performances of a short melody: Capturing the
listener's judgement policy. Musicae Scientiae 1.2, 225-256.
Meyer, L.B. (1956) Emotion and Meaning in Music. Chicago: Chicago University Press.
Persson, R.S. & Robson, C. (1995) The limits of experimentation: On researching music and musical settings.
Psychology of Music 23, 39-47.
Robinson, K.A., Obler, L.K., Boone, R.T., Shane, H., Adamjee, R. & Anderson, J. (1998) Gender and truthfulness in
daily life situations. Sex Roles 38 (9-10), 821-831.
Stuessy, J. (1994) Rock and Roll: Its History and Stylistic Development [2nd edn.]. Englewood Cliffs (NJ): Prentice Hall.
Umemoto, T. (1990) The psychological structure of music. Music Perception 8, 115-128.
Watt, R.J. & Ash, R.L. (1998) A psychological investigation of meaning in music. Musicae Scientiae 2.1, 33-53.
Wedin, L. (1972) A multidimensional study of perceptual-emotional qualities in music. Scandinavian Journal of
Psychology 13, 241-257.
Proceedings abstract
Anna-Karin Gullberg
anna-karin.gullberg@mh.luth.se
Background:
The fact that rock music sounds different inside and outside the University
College of Music has been known and discussed for years among teachers,
musicians, and researchers in the field of music, as well as in popular culture
and youth studies. Methods of learning and practising appear to strongly
influence expression, aesthetics, and musical taste, yet few researchers have
studied this question empirically, and the body of knowledge remains limited.
Aims:
This study investigates how two ensembles differ in learning strategies,
expression, and musical taste when creating and performing a rock song. An
important part of the project is to pinpoint crucial turning points during the
learning process and to compare the groups' ways of working.
Method:
Two groups of musicians, one from a University College of Music and the other
a formally untrained rock group, were asked to create a rock song from a
melody with lyrics composed especially for this project. Each group had one day
in a recording studio to record and mix their versions of the song. Information
concerning thinking and acting before and during the recordings was collected
by interviews and the two music making situations were observed and
video-taped.
Results:
The music students' version was a jazzy pop tune, while the rock group made a
hardcore tune. Data from interviews and observations showed distinct attitudes
to music-making and to learning strategies when playing in an ensemble. The
music students' approach was distinguished by being thoroughly democratic and
polite. The rock group learned from its singer, who did all the arrangements.
Conclusions:
This suggests that the way one performs music, and which genre one favours, is
shaped by the way one has acquired knowledge about music. With this in mind,
one may wonder whether higher music education gives students the opportunity
to develop the qualifications needed to become open-minded, professional
teachers and performers of music.
Proceedings paper
with an unclear melodic line, so it may cause a stronger impairment of mental tasks performance.
COGNITIVE PROCESS IN READING COMPREHENSION
Learning by studying texts is a common way to acquire new information. As an important part of our
daily cognitive activity it should be done as effectively as possible. In the process of text
comprehension new information is referred to the conceptual system of representations stored in
memory. Acquisition and storage of declarative knowledge is the domain of semantic memory which is
responsible for attaching meaning to the perceived information and for the process of coding it. Craik
and Lockhart (1972) state that the extent and level of information processing influences the
effectiveness of storage and retrieval. The storage of information in semantic memory follows the
analysis and transformation of cognitive structures. These processes demand considerable attention but
yield the best study results. Text comprehension proceeds through the successive construction of a mental
representation of the text. Studying a text requires sustained attention and is therefore easily
disturbed by a musical background that partly occupies the attention span.
At a given level of task difficulty, more intelligent people perform mental operations faster than less
intelligent people, and they can also resist distraction more easily. Their attention span is larger because
of the greater capacity of their information-processing system, and their mental operations are more
efficient. Even though part of their attention goes to the distracting stimuli, they can deal with mental
tasks better than less intelligent people.
MUSIC "PERCEPTION"
Music is an important factor in young people's identity development, and listening to music meets their
need for stimulation. When it serves as the auditory background to a primary task, music is perceived
unintentionally, as passive reception (Jordan-Szymańska, 1991); this is hearing the music rather than
listening to it. Hearing music brings about auditory impressions experienced without any involvement.
However, when auditory stimuli are attractive and easy to attend to, they distract attention from
performing the main task. In such cases mental resources which could be put into dealing with the task
are consumed by processing irrelevant information. Nevertheless, unintentional perception - reception -
doesn't involve mental resources to such a degree as conscious efforts do. Musical elements (e.g.
melodic theme) that strongly attract attention are processed focally and thus most thoroughly in working
memory (Sloboda, 1985). This fact is of great importance in terms of text comprehension with musical
background. Relations within the melodic line can be perceived by means of attention. That's how the
melody can be recognised and compared with the previous or accompanying tone sequences within the
piece of music. Other melodic lines form the harmonic background that is not processed focally. A
listener "drifts" with the melody that attracts his attention and follows the appearance of sound relations
(Sloboda, 1985). In improvisational music the element of predictability is only marginal or even
unnoticeable.
Some authors claim that music with a clear melodic line is easier to remember than an incoherent
progression of tones. The perception of a melody is optimal when the golden mean between its
coherence and variability is found (Jordan-Szymańska, 1990).
Musical sequences must be redundant to some extent for the perceptual organisation of sounds to be
possible: the melodic contour can be slightly modified, but its general shape remains the same.
Melodic contour is an important element of musical structure, as it integrates the piece and enables
the holistic perception of music (Patel and Peretz, 1997). An unclear melodic line disturbs the
perception of the piece and makes it more difficult to remember. In melodic music, a piece based on
clear melorhythmic motives is gradually perceived as a whole structure (Farnsworth, 1958, cited in
Wierszyłowski, 1979): the melodic theme becomes a figure, and all the other musical elements make
up its background.
In improvisational music the melodic structure is unclear: to the listener, the relations between
sounds seem more accidental. The possibility of anticipating the following tones is only marginal, so
the holistic perception and cognitive organisation of the melody require more effort. For those who
are not experienced in listening to music that includes improvisation, its perception is much harder
than that of melodic music, and fixing attention on it demands much greater mental effort. How
music influences the effectiveness of mental operations thus depends on both the type of music and
the task.
THE INFLUENCE OF MUSIC ON TASK PERFORMANCE
Research to date indicates that the performance of complex mental tasks with a musical background
is impaired compared with the same tasks performed in silence, although the results on this problem
diverge. Furnham and Bradley (1997) state that in introverts a musical accompaniment to reading
affects the later recall of the acquired information. According to the study by Freebourne and
Fleischer (1952, cited in Furnham and Bradley, 1997), under such conditions there is no difference in
performance between people of varying intelligence. This finding seems incompatible with
Kahneman's (1973) theory of limited mental resources and with the assumption that intelligence
determines the size of those resources. Peaceful, quiet baroque music is widely believed to improve
foreign vocabulary learning, but the vast majority of young people today do not listen to such music
at all. Eysenck's theory holds that the nervous system of introverts is sensitive to stimulus overload,
whereas extraverts need an environment rich in stimulation. Past research shows that extraverts react
more weakly to distracting stimuli than introverts do: because of their low activation level, it is easier
for them to resist distraction. According to Eysenck's theory, auditory stimulation in the form of
background music should therefore disturb introverts' work significantly, while raising extraverts'
performance by helping them reach the activation level optimal for complex mental tasks. On the
other hand, Konečni (1982, cited in Furnham and Bradley) assumes that every kind of music takes up
mental resources and can be detrimental to all subjects regardless of their extraversion level.
Music with a clear and distinct melody is easier to remember than an incoherent progression of
sounds, whose perception requires certain knowledge and listening habits. A melody is more than a
sequence of sounds, just as the sense of a text comes from more than the semantic content of its
particular words. In a melody there is a hidden order and harmony that enables its holistic
perception.
HYPOTHESES
In this research it was assumed that a musical background impairs performance on a reading
comprehension test in comparison with silence. Melodic music was expected to disturb reading
comprehension to a greater extent than improvisational music. As stated before, intelligent students
were expected to show a higher level of task performance than less intelligent ones; this hypothesis
concerns general as well as verbal intelligence. A positive correlation was anticipated between the
level of intelligence and the results of reading comprehension. Finally, in both musical conditions,
extraverts were expected to score higher than introverts on the reading comprehension test with
musical accompaniment.
SUBJECTS
The Ss were 111 18-year-old secondary school students in the last (fourth) grade of classes with an
intensive foreign language programme. They were assigned by lot to the following experimental
conditions:
1. melodic music - with a distinct melodic line (42 students)
2. improvisational music - with an unclear melodic line (35 students)
3. no music (34 students)
VARIABLES
The main independent variable investigated in the study is the type of experimental condition:
musical background (melodic or improvisational) or silence. General intelligence, which determines
the amount of mental resources, and extraversion, as a factor regulating the optimum intensity of
stimulation, are treated as variables that may determine differences in the level of task performance.
Another variable that cannot be left out of the analysis is verbal intelligence, as a set of specific
abilities concerning the use of language.
The dependent variable is performance on the reading comprehension test, defined as the accuracy score.
EXPERIMENTAL PLAN
At the first stage of the research, the subjects performed the experimental task, which took the form
of a 40-minute reading comprehension test. The test assessed comprehension of the article
"Emotional mimics" by Paul Ekman, as an example of verbal reasoning. The experiment was carried
out in a school classroom under the conditions described above. Afterwards the subjects completed a
post-test questionnaire on their listening habits and their attitude towards the music they had been
exposed to.
The second stage of the research took place three weeks later and comprised three tests in the
following order:
Table 1. Results of the reading comprehension test in the subgroups distinguished by level of extraversion and general
intelligence. (+) indicates the higher level of both variables, (-) the lower level of both extraversion and general
intelligence.
The combined effect of extraversion and general intelligence on the test results appeared only as a
tendency (F = 3.24; p < 0.07). The main effect of the type of musical background on task
performance was significant (F = 4.72; p < 0.01), and the no music group scored the highest on the
test (p < 0.05) (Table 2).
CONDITIONS              TEST SCORE
melodic music           25.45
improvisational music   25.97
no music                30.47
Table 2. Results of the reading comprehension test performance under particular experimental conditions.
The results indicate that the level of reading comprehension correlates significantly with verbal
intelligence in the whole sample (r = 0.32; p < 0.05) and in each experimental group considered
separately (r = 0.27; p < 0.05). The correlation between the experimental conditions and the results
of reading comprehension also proved significant (r = 0.26; p < 0.05). However, performance did not
differ between the melodic and improvisational music conditions. The subjects with higher verbal
intelligence performed best under the improvisational and no music conditions. In the presence of
music, the results of verbal reasoning were significantly worse than in silence, regardless of how
distinct the melodic line of the background music was (melodic vs. no music: p < 0.01;
improvisational vs. no music: p < 0.04).
There was a significant interaction between verbal intelligence and the level of task performance
(F = 12.16; p < 0.0007). The subjects with high verbal intelligence under the no music and
improvisational music conditions scored the highest on the test (Figure 1).
Figure 1. Interaction between verbal intelligence, experimental conditions and the reading comprehension test
performance
Their performance differed significantly from that of the whole melodic group and from that of the
less intelligent subjects in the improvisational group. The mean results of the two groups working
with background music of either type did not differ significantly. Nevertheless, music with a distinct
melody appeared to impair the mental efficiency of students with high verbal intelligence: in that
condition, their reading comprehension results did not differ statistically from those of subjects with
lower verbal intelligence.
DISCUSSION
The study indicated a significant difference between the experimental groups that completed the task
under different conditions, while the results of the two "musical" groups were similar. Verbal
intelligence turned out to be an important factor in the assessment of reasoning on verbal material
with a musical background. Past research did not distinguish between background music with
different types of melodic line.
The interaction of verbal intelligence with performance on the experimental task, together with the
absence of a significant correlation between the effectiveness of verbal reasoning and general
intelligence, indicates that in text comprehension verbal skills are far more decisive than general
intelligence. General intelligence, here measured by the skilfulness of reasoning on graphic material,
embodies cognitive processes that are part of reading comprehension, yet it does not seem to
significantly influence the effectiveness of acquiring verbal information.
The fact that the test results did not vary with the level of extraversion suggests two interpretations.
Extraverts may have predominated in the sample, so that the subjects whose results were compared
did not differ enough in extraversion. Another possible explanation is that the music was so
distracting, and the task so difficult, that stimulation exceeded its optimal level even for extraverts.
Under the melodic music condition, no significant difference between people of varying verbal
intelligence was observed. This suggests that in the presence of melodic music even the peripheral
perception of the melody occupies mental resources to such an extent that too little remains for the
primary task to allow a high level of performance.
It can be concluded that in the presence of melodic music verbal intelligence had no influence on the
level of task performance. Presumably, the significant role of this variable was weakened by the
distinct melody: even in more intelligent students, the melody took up mental resources to such an
extent that they had no chance to perform better than less intelligent students. Improvisational music,
by contrast, produced a significant difference in the reading comprehension results of students with
higher and lower verbal intelligence, which indicates that, as unstructured stimulation, improvisational
music does not absorb mental resources - particularly attention - to the same extent as music with a
clear melodic line.
The study shows that a melodic background equalises the efficiency of people with different levels of
verbal intelligence. Background music hampers the effectiveness of the mental processes that make
up the wider process of text comprehension. Presumably, melodic music engages mental resources to
such a large extent that intelligence has no influence on the efficiency of task performance, whereas
improvisational music does not attract attention as strongly, so intelligent people do better than those
with lower mental resources. The study suggests that music absorbs working memory and occupies
part of the attention span, limiting the mental resources available for processing the relevant verbal
stimuli.
The main hypothesis, that the presence of background music lowers the quantity and quality of work
on the reading comprehension test, has been supported. Individual differences in extraversion and
general intelligence did not appear to affect verbal reasoning. The study implies that, as an
unstructured background, improvisational music does not take up mental resources - particularly
attention - to the same extent as music with a distinct melody, although this effect seems insignificant
in people with comparatively lower intelligence. In the research presented above, listening habits and
the subjects' attitude towards the music they were exposed to had no effect on task performance.
To keep the experimental plan clear, it was necessary to omit such variables as musical preferences,
musical sensitivity, and temperament factors concerning emotional reactivity. It would be interesting
to examine the influence of background music on people involved in intensive musical activities
versus people whose contact with music is passive and rather accidental. The study could not
establish which particular cognitive processes are affected by background music. However, it
provides eager young listeners with information on the disadvantages of making studying compete
with listening to music. The frequent disapproval of parents who observe their children's study habits
does not seem to be empty preaching.
REFERENCES
Anderson, J.R. (1976). Language, memory and thought. Hillsdale: Erlbaum.
Baddeley, A.D. (1976). The psychology of memory. Oxford: Clarendon Press.
Barrett, P.T. & Eysenck, H.J. (1994). The relationship between evoked potential component
Proceedings paper
Introduction
Investigators interested in music preference studies often face the problem of choosing a method of
collecting data. The field of music is complicated by nature, and no perfect tools for measuring
preference are available. The variety of kinds of music, the impossibility of defining them exactly
and unambiguously, and the permanent evolution and change within (and between) genres are only a
few of the difficulties challenging a researcher at the beginning of a survey. Such disadvantages make
every thorough investigator start her/his study with a new, or at least freshly updated, method of
assessing preference.
At present, there are two main approaches to examining music preference (for a brief review see
Rawlings & Ciancarelli, 1997). The first seems the more obvious: if one is to investigate music,
musical material should be used, so recorded excerpts of music are rated by subjects. This approach
is criticised mostly for the rather arbitrary choice of examples by researchers. Going further, however,
one can find other flaws in this way of measuring preference. Preparing musical examples has to
satisfy several criteria that are very difficult to control simultaneously: the level of familiarity of
every piece should be similar among subjects (and between the examples), each chosen piece should
be as representative of the genre in question as possible, and so on. Preference profiles measured
with this method may therefore depend greatly on the material used.
The other way of assessing preference is to use paper questionnaires. The Musical Preference Scale
devised by Litle and Zuckerman (1986) is probably the best-known representative of this approach.
Using such tools is undoubtedly simpler and less time-consuming, hence more appropriate for
studying the whole field of music; moreover, the method is believed to be more objective
(Christenson & Peterson, 1988; Dollinger, 1993; Rawlings & Ciancarelli, 1997; Rawlings et al.,
1998). However, investigators who choose to work with genre labels seem unaware of (or to neglect)
a serious disadvantage: a high likelihood of collecting declarative responses rather than information
about real preferences. Besides, the decision about which genres or categories to include in a list, or
how "broad" they should be, is usually also arbitrary.
There are other difficulties connected with the preparation of testing material that are not specific to
either of the above methods, e.g. the number of excerpts or genres constituting the tool. Christenson
& Peterson (1988) note examples of studies examining music preference by dividing popular music
into only five categories.
It therefore seems interesting to compare the two methods of measuring music preference directly.
Such a comparison could confirm the finding of Müller (1998) that verbal preferences are generally
higher. It could also test a few modifications and thoroughly investigate some flaws of both tools.
Many studies have investigated relationships between music preferences and personality dispositions
(e.g. Litle and Zuckerman, 1986; Dollinger, 1993). Personality is also an element of the interactive
theory of music preference by LeBlanc (1982), where it is considered not only to influence music
interests as such, but also to modify (or even protect from) external, environmental influences.
It is worth examining, then, whether any personality dimensions are predictors of genre labelling
knowledge. For example, openness to experience was found to correlate with a general preference for
a wide range of music types (Rawlings & Ciancarelli, 1997; Rawlings et al., 1998). This factor may
also be responsible for a better acquaintance with genre labels. Additionally, the study allows looking
for a relationship between personality and the similarity of preference profiles assessed with "paper"
and "music" tools.
Method
Subjects
Subjects were 17- to 18-year-old students from a secondary school. Because of the two-stage research
design, only the results of those who completed all tasks (82 students: 58 females, 24 males) were
included in the analyses.
Measures
Genre labels list. A Polish questionnaire of music preference based on the updated version of the
Musical Preference Scale (Litle & Zuckerman, 1986; Rawlings & Ciancarelli, 1997) was developed.
The MPS was independently modified by music store employees to best fit the Polish market. Many
genres were eliminated as absent from Polish culture (e.g. the various kinds of country music were
merged into one country item). Many items were added instead, not only as a result of cultural
differences (such as dividing metal music into several genres), but also because of constant change
and evolution within "youth" music. The examples given as an aid to correct recognition of the
particular genres were also updated.
Moreover, questions about preference for "general" kinds of music (such as rock, classical and so on)
were left out as unsuitable for the research. The only general category (with several subcategories)
was techno music.
The final version of the questionnaire consisted of 72 genres and 7 techno subgenres (see Appendix
1).
Music excerpts. Music examples corresponding to the items of the questionnaire were chosen
according to several principles of selection. First of all, every selected piece had to be the best
representative of its specific genre; this was the precondition for comparing the methods. It was also
important to keep in mind that not everything produced by a band associated with a specific area of
music is characteristic of that area (let Metallica be the example here: would anybody recognise the
band as the leading thrash metal group after listening to its recent concert with a symphony
orchestra?). The compositions also had to be unknown to the listeners.
Technical reasons made 3 examples unavailable, so 75 pieces were recorded on a cassette tape (see
Appendix 2).
Excerpts were about 30 seconds long, fading out at the end, with 2 seconds of silence after each.
Their order was randomised in such a way that examples of similar music did not appear together.
The whole "sound test" lasted about 40 minutes.
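The ordering constraint described above can be sketched as a simple rejection shuffle: reshuffle until no two neighbouring excerpts belong to the same genre family. The excerpt names and genre families below are hypothetical placeholders, not the study's actual tape list.

```python
import random

def shuffle_no_adjacent(items, key, seed=None, max_tries=10_000):
    """Shuffle `items` until no two neighbours share the same `key` value."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        # Accept the ordering only if every adjacent pair differs in genre.
        if all(key(a) != key(b) for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no admissible ordering found")

# Hypothetical (excerpt, genre-family) pairs for illustration.
excerpts = [("e1", "metal"), ("e2", "metal"), ("e3", "jazz"),
            ("e4", "jazz"), ("e5", "techno"), ("e6", "techno")]
order = shuffle_no_adjacent(excerpts, key=lambda x: x[1], seed=1)
print(order)
```

Rejection sampling is adequate for a list of this size; for many excerpts with few genre families, a constructive interleaving of the largest families would be more reliable.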
NEO-FFI. The Polish adaptation of Costa and McCrae's NEO-FFI (1992) is the only questionnaire of
the five-factor model of personality translated and normalised in Poland (Zawadzki et al., 1998).
Procedure
Subjects were tested in groups of 25-30 persons. During the first meeting they completed both the
NEO-FFI and the music preference questionnaire. They were instructed to rate their preference for
every genre (not for any particular band or composer) on a 7-point Likert scale, ranging from like
very much to dislike very much, with indifferent as the mid-point. Additionally, an unknown answer
was possible.
Some weeks later, the subjects rated the recorded musical excerpts on the same Likert scale.
Results
The data were analysed with the ANOVA/MANOVA and Nonparametric Statistics modules of
Statistica for Windows (version 5.5). Because the comparison of the two measuring methods was the
main goal of the experiment, direct information about preference profiles is not included here.
The techno subgenres were not differentiated well by subjects: in almost every diagnostic case these
items were rated at the same, case-specific level. They were therefore excluded from the analyses,
and all results concern 70 genres.
Analysis of variance showed significant differences between "paper" and "sound" ratings for 39
genres: 23 of them differed at the p<0.001 level (F ranging from 74.56 to 13.6), 7 items at p<0.01
(10.93>=F>=7.52), and the remaining 9 at p<0.05 (7.52>=F>=4.03). A number of other genres
showed similar relationships, though not statistically significant ones. The significance of the
differences in all but 2 of the above genres was confirmed by the Wilcoxon matched-pairs test.
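The paired "paper" vs. "sound" comparison confirmed above can be sketched with a Wilcoxon signed-rank test on one genre's ratings. The 7-point-scale ratings below are hypothetical values constructed so that declarative ("paper") responses run higher, mirroring the reported tendency; they are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical paired ratings for one genre on the 7-point scale:
# each subject rates the genre label ("paper") and the excerpt ("sound").
paper = rng.integers(4, 8, size=40)                      # values 4..7
sound = np.clip(paper - rng.integers(0, 3, size=40), 1, 7)

# Wilcoxon signed-rank test on the paired differences (zero diffs dropped).
stat, p = stats.wilcoxon(paper, sound, zero_method="wilcox")
print(f"W = {stat}, p = {p:.4f}")
```

Because it ranks paired differences rather than assuming normality, the Wilcoxon test is the natural nonparametric companion to the per-genre ANOVA on ordinal Likert data.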
For 31 of the 39 items in question, declarative responses were higher than the corresponding opinions
about the musical material. Moreover, most of the remaining 31 genres showed a tendency of the
same kind.
The almost magic "31" appeared once again as the number of genres rated positively in the
questionnaire; 13 of them were also liked in the real music test (as were another 4 excerpts).
Allowing the subjects not to rate unknown items in the questionnaire made it possible to investigate
the level of genre labelling knowledge among youngsters. For a few genres (gothic rock/cold wave,
industrial rock, and all the jazz categories), fewer than half of the respondents indicated their attitude.
Individual differences in this respect were hypothesised to correlate with personality dispositions;
unfortunately, no relationship was found between any of the NEO-FFI factors and the number of
known genres.
Personality dimensions, as assessed with the NEO-FFI, were not related to the stability of answers
between the two methods of measuring music preference. Only in very few cases did high and low
scorers on a personality subscale change their ratings in different ways.
Discussion
The main conclusion drawn from the results is consistent with the observation by Müller (1998):
declarative preferences are generally higher than preferences assessed with musical material. Some
speculative considerations are worth putting forward to reinforce this finding. It seems reasonable to
say that some label names, or the category they belong to, caused them to be rated more severely.
The opinions about heavy metal, American metal (two "softer" subcategories of metal music), hard
rock (commonly associated with "heavier" playing), and world music (belonging to the disliked
folk/ethnic category) were most probably lowered just because of the labels: the ratings of the 4
corresponding musical examples were significantly higher. It may be supposed, then, that declarative
responses sometimes concern concepts that are not well defined.
A very similar inference (although the results contrast with the above) can be drawn from the
musicals and movie music items. The differences between "paper" and "sound" answers were the
highest here, very likely because the terms are too broad, and thus neither clear nor precise;
preference for these kinds of music probably reflects attitudes towards main themes or leitmotifs, so
the "plain" music itself is not liked as much.
On the other hand, dividing music into ever narrower genres makes their precise identification very
difficult. That might be why most subjects in the present study did not know any of the jazz
subcategories: commonly, jazz is... jazz, and no deeper knowledge of this area is acquired.
What is better, then: obtaining almost-surely-false (or not-surely-true) results by creating broad
categories, or accepting that some information will be missing altogether? In any case, the necessity
of rating a general heavy metal genre, for example (in fact an incorrect label for metal music,
covering such miscellaneous subgenres as thrash, death, gothic or American metal), may cause a
certain uneasiness or displeasure in people familiar with that type of music.
The possibility of not rating unknown genres seems to be a valuable solution. Thanks to this option,
subjects were not forced to choose an answer, and probably far fewer indifferent ratings were used.
The second modification of the MPS, namely broadening the Likert scale at its negative end, allowed
subjects to grade their negative opinions: they did not have to choose only between indifferent and
don't like it, and negative attitudes are surely as gradable as positive ones.
Hopefully these two modifications resulted in much more reliable data.
Additional information about another 2 genres used in the study confirms earlier findings on more
positive attitudes towards known music (e.g. North & Hargreaves, 1995). The subjects undoubtedly
knew the disco and hip-hop excerpts, and this familiarity produced higher ratings. This observation
underlines the importance of equalising the familiarity of all the pieces used.
Unexpectedly, no relationship between personality and the range of known genre labels was found.
Although openness to experience correlates with a higher number of preferred styles of music,
broader knowledge of labels is not related to this factor, so preferences for a wider range of music
types are independent of the size of the pool of known genres.
To summarise, preference profiles measured with a genre label list are usually higher than those
assessed with music excerpts.
In the light of this comparison, paper questionnaires do not seem to be as superior a measuring
method as their advocates have claimed. In particular, trouble with the breadth of the terms used and
the impossibility of precisely defining genres versus subgenres make the tool far from ideal.
Of course, it is also almost impossible to create an ideal music excerpt test: too many variables
would have to be controlled at once to secure a similar level of familiarity across examples, their
representativeness, and so on.
It therefore seems very important not to choose a measuring tool only on the basis of the
investigator's preference for a given method.
References
Christenson, P.G. & Peterson, J.B. (1988). Genre and Gender in the Structure of Music Preferences.
Communication Research, 15, 3, 282-301.
Dollinger, S.J. (1993). Research Note: Personality and Music Preference: Extraversion and
Excitement Seeking or Openness to Experience? Psychology of Music, 21, 73-77.
LeBlanc, A. (1982). An Interactive Theory of Music Preference. Journal of Music Therapy, 19, 1,
28-45.
Litle, P. & Zuckerman, M. (1986). Sensation Seeking and Music Preference. Personality and
Individual Differences, 7, 4, 575-578.
Müller, R. (1998). Young People's Distinction Between Verbal and Sounding Preferences - An
Indicator of Musical Literacy. International Annual Meeting of the German Society for Music
Psychology. The Musicians Personality. Proceedings: Schedule and Abstracts. Universität Dortmund.
North, A.C. & Hargreaves, D.J. (1995). Subjective Complexity, Familiarity, and Liking for Popular
Music. Psychomusicology, 14, 77-93.
Rawlings, D. & Ciancarelli, V. (1997). Music Preference and the Five-Factor Model of the NEO
Personality Inventory. Psychology of Music, 25, 120-132.
Rawlings, D., Twomey, F., Burns, E. & Morris, S. (1998). Personality, Creativity, and Aesthetic
Preference: Comparing Psychoticism, Sensation Seeking, Schizotypy, and Openness to Experience.
Empirical Studies of the Arts, 16, 2, 153-178.
Zawadzki, B., Strelau, J., Szczepaniak, P. & Sliwińska, M. (1998). Inwentarz osobowosci NEO-FFI
Costy i McCrae. Adaptacja polska. Podręcznik. [Costa and McCrae's NEO-FFI personality
inventory. Polish adaptation. A manual]. Warszawa: Pracownia Testów Psychologicznych PTP.
Appendix I
The list of genres used in the music preference questionnaire (translated into English)
Rock
1. Rock and roll/classic rock (Beatles, Rolling Stones, Doors)
2. Acid/psychedelic rock (Jimi Hendrix, Grateful Dead, Jefferson Airplane)
3. Jazz-rock (Pat Metheny, Mahavishnu Orchestra, SBB)
4. Progressive/symphonic rock (Pink Floyd, King Crimson, Yes)
5. Electronic rock (Tangerine Dream, Kraftwerk, Klaus Schulze)
6. Pop rock (Queen, Madonna, Kylie Minogue)
7. New wave (Stranglers, Depeche Mode)
Proceedings paper
The factor scores of each musical stimulus were calculated and are shown in Figures 1-1 to 1-3. In Figure 1-1, Factor 1 (Metallic) is represented on the horizontal axis and Factor 2 (Pleasant) on
the vertical axis; the succeeding figures plot Factor 1 against Factor 3, and Factor 2 against Factor 3, respectively. "MM" is the symbol for "melodious music" and "S" for "sound-logo"; the other
dots show the sounds classified as "neutral".
As shown in Figure 1, the 6 melodious musics (MM) were perceived as powerful and metallic, and some subjects regarded this departure music as not very pleasant. On the other hand, the 8
sound-logos gave a slightly unsatisfactory impression but a more pleasant feeling than the melodious music, except for S1. The neutral music was perceived as sound without any particular character.
Figures 2-1 and 2-2 show the profiles of the "melodious music" and the "sound-logo", respectively, based on the average judgments of the 20 subjects.
Figure 2-1 Profile of the "melodious music"
Figure 2 shows the difference between "melodious music" and "sound-logo" clearly. The melodious musics are perceived as "noisy" (M=2.5, SD=0.22), "shrill" (M=5.5, SD=0.25), "unstable"
(M=4.93, SD=0.43), "hurried" (M=2.18, SD=0.34), and "loose" (M=4.82, SD=0.4). The sound-logos, meanwhile, give various impressions: some are perceived as "unsatisfactory", "calm" and
"grave", while others are not. No characteristic common to all the sound-logos can be found in these results.
Multiple regression analysis.
The factor loadings above were then analysed in relation to the musical features of the departure music using multiple regression analysis. Eight musical elements were extracted from each stimulus: the type of modulation, the progression of the cadence, the rate of harmony transition, the tone density, the lowest tone height, the highest tone height, and the mean and variance of pitch. The regressions for all three factors were significant or marginal: Factor 1 (F=5.53, p=0.001), Factor 2 (F=2.28, p=0.06), and Factor 3 (F=3.08, p=0.01). The regression coefficients of the explanatory variables indicated that the lowest tone height was most closely related to the metallic factor (p=0.009), the type of modulation and the progression of the cadence to the pleasant factor (p=0.003, p=0.08), and the tone density to the powerful factor (p=0.005).
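The regression step described above can be sketched as follows. This is a minimal illustration with synthetic data: the study's 30 stimuli, their 8 extracted musical features, and the factor scores are not reproduced here, so every number below is invented.

```python
# Sketch of regressing a factor score onto musical features via
# ordinary least squares. Data are synthetic stand-ins for the
# study's 30 stimuli and 8 musical elements.
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_features = 30, 8              # 30 departure-music excerpts, 8 elements
X = rng.normal(size=(n_stimuli, n_features))   # e.g. modulation, cadence, tone density, ...
beta_true = np.array([0.0, 0.0, 0.0, 1.2, -0.8, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n_stimuli)   # a factor score, e.g. "powerful"

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(n_stimuli), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Proportion of variance explained by the fitted model.
y_hat = X1 @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 2))
```

In the paper's analysis, the per-predictor p-values (e.g. for the lowest tone height) would come from t-tests on these coefficients; a statistics package such as statsmodels reports them directly.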
Based upon the data given by the 20 subjects, the following results were found:
1) Subjects' judgements of thirty styles of departure music from the Tokyo and Osaka areas divide into 3 main factors: metallic, pleasant and powerful.
2) "Melodious music", as categorized by music experts, is judged "noisy", "shrill", "unstable", "hurried" and "loose".
3) Some "sound-logo" music makes a more pleasant, though slightly unsatisfactory, impression compared to the "melodious music".
4) The "metallic factor" has some relation to the lowest frequency.
5) The "pleasant factor" has a relation to the way of modulation, and the progression of cadence.
6) The "powerful factor" has some relation to the tone density.
4. Discussion
As metropolitan areas continue to grow and the number of people commuting to the central city increases, public transportation serves ever more passengers. Ten years ago, the Japan Railway Company reformed its departure-signal system, replacing the electric bells with music; it is said that some users had complained about the tone quality of the bells. Nowadays, more people find themselves daily in crowded, noise-packed trains. While getting on and off a train, we hear, every 20 seconds, repeated announcements, ringing departure music, shrill whistles and warning bells.
What causes these noise problems? Three reasons can be pointed out. First, Japanese public corporations offer us surplus services. Second, some departure music does not work as a departure signal. Third, many people are unconcerned about environmental sound issues. For the moment, let us look closely at the effects of music.
There are, of course, many studies of the effects of music in social environments. According to many researchers, music can influence the extent to which consumers interact with commercial environments. Dube, Chebat and Morin (1995) found interactive effects of musically induced pleasure and arousal on consumers' desire to affiliate with bank employees. North and Hargreaves (1996, 1998) found that liking for the music played in a student cafeteria was positively related to diners' willingness to return. Areni and Kim (1993) found that classical music led customers in a wine cellar to buy more expensive wine. Other studies indicate that music mediates affect toward the store or the store image (for instance Bawa et al., 1989; Milliman, 1986; Golden & Zimmer, 1986). These studies focus on relationships between music listening and its commercial or social context, and their shared conclusion is that some music evokes positive affect in its context. On a station platform, however, there is some doubt whether we can enjoy listening to musical fragments while boarding a crowded train. Does the music have positive effects on irritated passengers? Does the departure music fulfil its function well, or does it further aggravate the situation? This needs further consideration. Judging from the above, it is no exaggeration to say that the Japan Railway Company's decision simply to change the bells into music was a short circuit.
Our results indicate that certain musical features of departure music are deeply connected to passengers' emotions. Subjects' judgements could be divided into 3 main factors: metallic, pleasant and powerful. The metallic factor was related to the lowest tone height: if the music is played in a higher pitch range, the metallic impression is felt strongly. The pleasant factor was related to the progression of the cadence and to modulation: it is likely that if the music moves through an unexpected chord or key change, we are displeased. The powerful factor was strongly connected to fast tempo and tone density: it seems reasonable to suppose that music played at a fast tempo or with a quick rhythm is felt as strongly energetic.
References
Areni, C.S. & Kim, D. (1993). The influence of background music on shopping behavior: Classical versus top-forty music in a wine store. Advances in Consumer Research, 20, 336-340.
Bawa, K., Landwehr, J.T., & Krishna, A. (1989). Consumer response to retailers' marketing environments: An analysis of coffee purchase data. Journal of Retailing, 65, 471-495.
Dube, L., Chebat, J.C. & Morin, S. (1995). The effects of background music on consumers' desire to affiliate in buyer-seller interactions. Psychology and Marketing, 12, 305-319.
Golden, L.L., & Zimmer, M.R. (1986). Relationships between affect, patronage frequency and amount of money spent with a comment on affect scaling and measurement. Advances in Consumer
Research, 13, 53-57.
Igarashi, J. (1993). Development of community noise control in Japan. The Journal of the Acoustical Society of Japan, Special Issue on the creation of comfortable sound environment, 14(3), 177-180.
Milliman, R.E. (1986). The influence of background music on the behavior of restaurant patrons. Journal of Consumer Research, 13, 286-289.
Namba, S. & Kuwano, S. (1993). Global environmental problems and noise. The Journal of the Acoustical Society of Japan, Special Issue on the creation of comfortable sound environment, 14(3), 123-126.
North, A.C., & Hargreaves, D.J. (1996). The effects of music on responses to a dining area. Journal of Environmental Psychology, 16, 55-64.
North, A.C., & Hargreaves, D.J. (1998). The effect of music on atmosphere and purchase intentions in a cafeteria. Journal of Applied Social Psychology, 28(24), 2254-2273.
Ogawa, Y., Mizunami, T., Yamasaki, T., & Kuwano, S. (1999). The preference of signal music at railway stations in Tokyo. Children and Music: Developmental Perspectives, 233-240.
Sasaki, M. (1993). The preference of the various sounds in environment and the discussion about the concept of the sound-scape design. The Journal of the Acoustical Society of Japan (E), 14, 189-195.
Schafer, R. M. (1992). Music, non-music and the soundscape. In J. Paynter, T. Howell, R. Orton & P. Seymour (Eds.), Companion to Contemporary Musical Thought (pp. 34-45). Routledge.
Sterne, J. (1997). Sounds like the Mall of America: Programmed music and the architectonics of commercial space. Ethnomusicology, 41(1), 22-50.
Proceedings paper
Background:
During the last ten years, Techno has developed into an independent style of popular music. Compared to other kinds of popular music, such as Pop or Rock, Techno is a very distinctive style. It has neither lyrics nor the musical structure or form of Pop songs, but a strong, repetitious rhythm (Keller, 1995). It offers no groups or singers to identify with or fall in love with: the typical Techno "band" consists of computers and the DJs who program them to create the music. The first Techno pieces were produced not to be sold in shops, but to be performed directly in a club as a unique performance (Jerrentrup, 1995).
The present study addresses the psychological context and function of Techno music, or simply: who likes Techno, and why do they like it?
Everybody prefers some musical styles to others: a person might like Pop songs but completely reject Reggae. Differences in musical taste can be observed in the music people listen to, in the recordings they buy, or in the live performances they attend. The preferred music in these three cases need not be one and the same style; a "real" fan, however, prefers his or her style to others in all three. The measurement of musical taste is not straightforward. Different studies include different musical styles, described with more or less specificity. Many studies distinguish between Pop, Rock, Reggae, Hip Hop and Heavy Metal, all of which could be classified as "popular music" (Lewis 1996, Hakanen and Wells 1990), but rarely between genres of "classical music" such as Baroque or Romantic (Kemp 1997). Studies of musical preferences are also hard to compare because of differing methodology: one set of researchers may look at verbally expressed preferences, another at the music participants actually listen to, and a third at the purchase of recordings.
Studies of musical preferences often rest on data sheets in which participants rated their preferences for named musical styles. It is unclear whether participants know all the characteristics of a given style, or the differences between, say, Pop and New Age. An unknown term could easily be rated 'disliked' where an audio example would have been rated 'liked'. Another critical point about some studies is the use of questions like 'Do you like or dislike X?'. Rating a style as 'liked' does not necessarily mean that people have ever bought a recording of that style or would go to a concert, and it gives no information about how integrated that music is in people's everyday lives.
Behne (1997) suggested that different methodologies can influence the results of preference research. He investigated musical preferences in a cross-sectional design, comparing preferences for played musical examples (without telling participants the name of the composition or the composer) with answers to questions like 'Do you like Jazz?'. The two sets of preferences, Behne reports, "are far from being identical". Even though Behne's main interest was a specific mode of musical experience ('Musikerleben'), the data support the need for a methodology in which both sides (participants and researchers) agree on the specific characteristics of the musical styles under investigation.
To some extent, Lewis (1995) took such a double check of musical preferences into account. Participants picked their favourite style from a list of ten major musical types and also answered a question on their favourite recording. This methodology shows more accurately whether the favourite recording actually represents one of the styles picked from the list. Asking for a larger number of favourite CDs or recordings could give even more information about the strength of a preference, be it for an artist, a group or a style.
The present study therefore uses a methodology that combines different ways of specifying musical preferences: verbally expressed preferences, preferences for played examples, and the integration of the music into everyday life.
Being a fan of a certain musical style almost always includes being part of a specific subculture. According to Russell (1997), subcultures set themselves apart from mainstream culture in various ways. The defining variable can be being a fan of Manchester United, of Bruce Willis, or of a musical style like Country, Heavy Metal or Hip Hop. To each defining variable further variables are added, e.g. specifically coloured T-shirts for football fans, or particular types of clothing for different musical styles. Sometimes ideological or political components are important as well.
Some added variables for Techno fans have already been investigated. Rose (1995) analysed typical Techno fashion, which is based on the character of the music itself. One basic element of the music is sampled sound; the fashion likewise uses samples, taking many ideas from the sixties and seventies and combining them with "pop-art" sportswear and label logos. The materials of the clothing and other fashion products reflect the technical, artificial production of the music: plastic and shiny materials are very popular in Techno fashion.
Other, less visible elements of the culture, such as political involvement, have not yet been investigated.
Various studies deal with the functionality of music and with personal choice in certain situations. Research can look at participants' use of different musical styles from two perspectives:
a. the actual music they listen to, and
b. participants' reflections on the music they would choose in specific situations.
Rosenbaum and Prinsky (1987) used the first approach. Participants listed three of their favourite songs, followed by a personal description of each song and a reason why they liked it. Participants did not write their own explanations but chose one from a list of seven, as the researchers were additionally interested in the importance of lyrics in choosing a favourite song. Some participants paid no attention to the lyrics and explained their choice with "It's good to dance to", which suggests that music can serve different functions.
Sloboda (1999) applied a comparable methodology. Participants answered a general question, "Could you please tell us all about you and music", with cues such as "Do you use music in different ways?" concerning music of their own choice, and cues like "Do you enjoy music in pubs or supermarkets?" concerning music in public places. Participants described their private use and preferences, as well as their liking of music in public places, in a personal letter to the researcher. The study suggests that people choose their music carefully to achieve a psychological function: music can be used as a 'reminder of valuable past events' or to change mood. The most popular activities while listening to music were doing housework, driving, running and cycling. The specific functions can be labelled, but no one style of music can be ascribed to one particular activity. Concentrating on one group of fans can therefore show whether there are preferred or excluded activities while listening to that particular style.
Another way of looking at the use of music was introduced by Behne (1997). Based on the 'uses and gratifications approach' of Katz et al. (1974), it presents situations with different emotional connotations. Participants rated the kind of music they would like to hear in the given situations on a seven-point Likert scale with eight pairs of opposite adjectives. The activity Behne focused on was a specific type of listening, with joy and appreciation. These data show a similar tendency to Sloboda's: participants know what kind of music they want to listen to in a specific situation.
North and Hargreaves (1996) employed a more elaborate methodology, built on the idea that musical preferences are associated with the listening environment. In contrast to Behne, North and Hargreaves did not focus on one specific situation but gave 17 different situations. To specify the music participants would like to hear in each situation, 27 musical descriptors were rated. The descriptors were a mixture of adjectives such as 'familiar', 'sad' and 'beautiful', styles like 'Jazz', 'Pop' and 'Classical', and activity-related descriptors like 'can dance vigorously to it'. The importance of each descriptor for a situation was rated on an 11-point Likert scale. Two steps led first to situations with similar emotional connotations, and then to information on the music preferred in each situation. Two factor analyses were carried out. The first investigated the relationships among the musical descriptors and yielded six factors. The first factor is the most important for the present study: descriptors loading positively on Factor I were loud, strong rhythm, invigorating, can dance vigorously to it, attention-grabbing, exciting/festive, and pop music; loading negatively were quiet, relaxing/peaceful, classical music, beautiful, and lilting. The factor was interpreted as arousal. Techno might be an example of this factor, leaving the descriptor 'pop music' aside.
The second factor analysis investigated the relationships among the 17 proposed situations and yielded five factors. The situations jogging, with your Walkman on, at a nightclub, at an end-of-term party with friends, doing the washing-up, ironing some clothes, driving on the motorway, and on Christmas Day with your family loaded positively on Factor I, which was interpreted as activity.
Product-moment correlations were then calculated between the similarity of situations' emotional connotations and the similarity of the music reportedly preferred in them. The result was small but significant (r = .21): the more similar the emotional connotations of two situations, the more similar the music participants reported preferring in them. For example, situations arousing in nature (like dancing and jogging) are associated with musical descriptors that likewise imply arousal; the musical selection supports the atmosphere of the situation. These data show the character music should have for use in specific situations, but they give no information on one particular style or on the activities actually preferred.
Hypotheses
1st Hypothesis: There will be a positive relationship between knowing Techno and liking it.
Techno is a musical style with specific characteristics quite different from other popular styles such as Rock: it has no lyrics, verses, clear musical form or 'song melody'. On the other hand, Techno is similar to styles such as House and Drum'n'Bass, e.g. in its repetitive rhythm. Since these styles are not broadcast on general radio or TV programmes, one must be quite involved in the culture that uses them in order to understand the differences between them. Such people are likely to use the music and likely to like it. This does not mean that all other participants must dislike it, but they need not know its correct name, as shown by Behne (1997).
2nd Hypothesis: Because of Techno's musical characteristics, dancing is the most preferred activity when fans listen to Techno.
This hypothesis rests on two sources: suggestions from the study by North and Hargreaves (1996), and two pilot interviews with Techno fans. The pilot interviews suggested a strong preference for Techno as dance music; one reason a participant's friend did not like the music was that "she couldn't dance to it". The repetitive rhythm and flowing shapelessness make it possible to dance for a long time, even though the music could be tiring just to listen to.
3rd Hypothesis: The more comfortable somebody feels dancing to Techno, the more he or she likes to listen to this music in other situations.
In North and Hargreaves' study, music described by the Factor I descriptors is appreciated in more situations than just dancing. Techno, as a possible representative of Factor I, might likewise be preferred in other situations. However, feeling comfortable dancing to Techno is the prerequisite for liking to listen to Techno elsewhere: whoever does not feel comfortable dancing to it does not like to listen to it. Listening to Techno in other situations could remind you of the pleasure you get in a club, which makes you feel better.
The most important research questions for the interviews are not formulated as hypotheses. A question such as 'Why do people like Techno?' can lead to answers that can hardly be predicted, since various topics concerning Techno have not yet been investigated and its uniqueness and special qualities are unknown.
The study was conducted in two steps: a quantitative part, mainly to find participants for the interviews and to answer the question 'Who likes Techno?', and a qualitative part, to understand why they like it.
Step one
The main purpose of the quantitative part was to find real Techno fans. The literature discussed earlier suggests three steps for finding fans of a certain style: first, participants listen to musical extracts, name the style and rate how much they like the music; second, they answer general questions on their preferred musical styles; third, they list their favourite recordings.
The purpose of the questionnaire was to divide participants into a) students who really know what Techno is (connoisseurs) and b) students who do not (non-connoisseurs), and further into a group of Techno fans and a group of non-fans. Further questions asked, e.g., how much they liked each extract, how likely they would be to listen to this kind of music in particular environments, and how likely they would be to dance to it or to buy recordings of it.
Method:
153 students from a college and a university in middle England and from a German Gymnasium and universities in Berlin (aged 16 to 30) participated in the first part.
Five one-minute extracts were played to the participants, in groups of up to 20 students, on an audio tape recorder. After each extract, participants filled in a questionnaire; their answers to the question 'What would you call the style of this extract?' divided them into connoisseurs and non-connoisseurs. The extracts represented the styles Drum'n'Bass ("drum in a grip" by Logical Progression), Techno ("Deltroid" by Hardline2), House ("Blow Ya shistle" by J. Dubs), Pop (instrumental part of "Fantasy" by G. Michael) and Trance ("Ritual of Life" by Sven Väth).
Step two
The second part consisted of interviews with Techno fans, to answer the question of why people like it: in which situations is Techno the preferred music, and which situations are excluded? What is so special about Techno and its culture in the eyes of Techno-philes?
Method
11 students took part in the interviews. They were chosen because they were able to distinguish the Techno extract from the Drum'n'Bass, House, Trance and Pop extracts. As long as they identified the 'Techno' extract, they did not have to name the other styles correctly, but they must not have labelled any of them Techno. In addition, Techno had to be one of their favourite styles, and specific Techno pieces had to be among their preferred recordings.
The interviews comprised 15 open-ended questions and were taped and transcribed afterwards. They included questions on the participants' personal associations with Techno, social influences, how long they had known the music, the circumstances of their first contact, and the perception, functionality and use of Techno.
Quantitative Results
The quantitative part served two purposes: first, to find participants who are 'real' Techno fans, based on knowing and liking the played extracts, questions on preferred styles, a list of favourite recordings, and questions on the everyday use of the styles represented by the extracts; second, to use parts of the questionnaire to test the Hypotheses.
1st Hypothesis
Looking at the relationships between liking the musical extracts, knowing the style and preferring the style, the 1st Hypothesis can be supported.
A one-way ANOVA with knowledge of the style as the independent variable and liking of the extract as the dependent variable showed a significant effect for the Techno extract (F(2, 138) = 11.78, p < 0.001). Tukey post-hoc tests located the difference between the group that correctly identified the Techno example (mean liking = 3.48, where 1 = like a lot and 5 = not at all) and all other participants (mean liking = 4.26).
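An analysis of this shape can be sketched in a few lines. The ratings below are invented for illustration (the study's raw responses are not available here); the grouping mirrors the design: liking ratings (1 = like a lot, 5 = not at all) grouped by whether participants identified the Techno extract.

```python
# One-way ANOVA sketch: does liking differ by style-identification group?
# All ratings are invented stand-ins, not the study's data.
from scipy.stats import f_oneway

correct    = [3, 3, 4, 3, 2, 4, 3, 3]   # identified the extract as Techno
wrong_name = [4, 5, 4, 4, 5, 4, 5, 4]   # named a different style
no_answer  = [5, 4, 4, 5, 4, 5, 4, 5]   # gave no style name

f_stat, p_value = f_oneway(correct, wrong_name, no_answer)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

The Tukey post-hoc comparisons reported in the paper would be a separate step (e.g. `scipy.stats.tukey_hsd` in newer SciPy versions), pinpointing which pairs of groups differ.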
A one-way ANOVA with Techno as favourite style as the independent variable and liking of the extracts as the dependent variable suggests a significant difference between participants whose favourite style is Techno and all others with regard to liking the Techno extract (F(1, 147) = 40.39, p < 0.0001) and the extracts similar to Techno, but not the Pop extract (F(1, 143) = 2.656, p = 0.151).
Both results support the prediction of the 1st Hypothesis: participants who knew the name of the style liked the Techno extract significantly more than participants who did not, and the connoisseur group was the one that generally liked Techno.
2nd Hypothesis
Spearman's rho between liking the Techno extract and liking to dance to Techno shows a positive correlation between liking the extract and the likelihood of wanting to dance to Techno (Spearman's rho = 0.661, n = 151, p < 0.0001).
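A rank correlation of this kind can be sketched as follows, again with invented ratings on the study's 1-to-5 scales (1 = very much, 5 = not at all); the actual responses are not reproduced here.

```python
# Spearman rank correlation sketch: liking the Techno extract vs.
# likelihood of dancing to Techno. Ratings are invented stand-ins.
from scipy.stats import spearmanr

liking_extract = [1, 2, 2, 3, 4, 5, 3, 1, 4, 5]   # liking of the Techno extract
likely_dance   = [1, 1, 2, 3, 5, 5, 4, 2, 4, 4]   # likelihood of dancing to it

rho, p = spearmanr(liking_extract, likely_dance)
print(f"rho = {rho:.3f}, p = {p:.4f}")
```

Because both variables are ordinal Likert ratings, a rank-based coefficient such as Spearman's rho is the natural choice over Pearson's r.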
A one-way ANOVA with liking a certain style (Techno, Classical, Rock, Pop, Jazz, Indie, or Heavy Metal) as the independent variable and the likelihood of dancing to the Techno extract as the dependent variable suggests that the disposition to dance to Techno depends on the style somebody prefers. Techno fans were more likely to dance to Techno than non-fans (F(1, 153) = 21.24, p < 0.0001; mean fans = 3.33, mean non-fans = 4.24), while participants who listed other styles as their favourite were not very likely to dance to it. Fans of Classical music (mean fans = 4.35, mean others = 3.78, F(1, 153) = 4.714, p = 0.031) and of Heavy Metal (mean fans = 4.35, mean others = 3.75, F(1, 153) = 5.836, p = 0.017) were significantly less likely to dance to Techno than non-fans of these styles.
Both analyses support the 2nd Hypothesis: the more participants liked the Techno extract, the more they would like to dance to Techno. Only participants whose favourite style is Techno would like to dance to it; fans of other styles were not very likely to do so.
3rd Hypothesis
Spearman's rho correlations between the likelihood of dancing to Techno and the likelihood of listening to it in the other situations given in the questionnaire showed that the more somebody likes to dance to Techno, the more he or she would like to listen to it in other situations and environments. The situations were: at home (Spearman's rho (Sr) = 0.824; n = 157 and p < 0.0001 throughout), in a bar (Sr = 0.451), with friends (Sr = 0.678), to relax (Sr = 0.438), to energize (Sr = 0.717), being likely to go to a concert (Sr = 0.825), how often they attend (Sr = 0.705), and being likely to buy recordings (Sr = 0.724).
A one-way ANOVA with being a fan of a certain style other than Techno as the independent variable and the likelihood of listening to Techno in different situations as the dependent variable was applied to investigate whether fans of other styles are also likely to listen to Techno in those situations. None of the results shows that fans of another style like listening to Techno in other situations; some even suggest the opposite tendency. Fans of Classical music or Heavy Metal, for example, are less likely than non-fans of these styles to listen to Techno to arouse and energize themselves, or to listen to it with friends (arouse/energize for Heavy Metal fans: mean fans = 4.45, mean others = 3.88, F(1, 153) = 5.68, p = 0.018; for Classical fans: mean fans = 4.5, mean others = 3.91, F(1, 153) = 4.738, p = 0.03; 'with friends' for Heavy Metal fans: mean fans = 4.18, mean others = 3.62, F(1, 153) = 5.551, p = 0.02; for Classical fans: mean fans = 4.32, mean others = 3.63, F(1, 153) = 6.739, p = 0.01; with 1 = very often and 5 = not at all).
All these results support the 3rd Hypothesis: participants who like to dance to Techno also like to listen to it in other situations, while participants who prefer other styles are not likely to listen to Techno in any other situation, sometimes even less likely than non-fans of their own style. Techno fans also reported that they would be likely to listen to Techno in order to relax. These data seem to contradict North and Hargreaves' idea that the musical selection 'augments' a situation's connotations, on which relaxing should not be one of fans' favourite activities while listening to Techno.
Qualitative Results
The three-step methodology of the quantitative part divided participants into groups: first, connoisseurs and non-connoisseurs; second, committed Techno fans and fans of other musical styles; third, those who count Techno recordings among their favourites and those who do not.
Only participants who gave 'positive' answers on all three counts took part in the interviews: they knew the right name for the Techno extract and were not confused by similar extracts, Techno was one of their favourite musical styles, and their favourite recordings included Techno.
The interviews a) gave more specific information supporting the Hypotheses, and b) highlighted a uniqueness of Techno that does not show up in questionnaires.
The 2nd and 3rd Hypotheses are supported by statements given in the interviews. Above all, the interviews underline the importance of dancing as the favourite activity: compared with all other activities mentioned, 'dance' was named more often than all the others together (mean 'dance' = 16.45, mean all other activities = 6.0, M = 11.1, SD = 6.51, t(10) = 2.11, df = 9, p = .063). The other activities included relaxing/chilling, working and driving a car.
Even though 'dance' was an important issue in all interviews, the context in which the word was used differed, and some situations were described without the keyword 'dance'. Participants' first associations with Techno underline its importance as dance music, e.g. the first association of Chris (a 30-year-old English student):
Ahm, dancing, you know, yea dancing ... in Clubs.
Reasons to prefer Techno as music to dance to are strongly related to musical elements. One possible explanation of why people do not like Techno was:
Why? I think a lot of people don't like it because it is very repetitious, but I think that's a good thing about it, because it is a regular rhythmic beat, ah, you tend to focus on that, and your mind can take you elsewhere, whereas in lots of like guitar music or folk music you tend to concentrate on the lyrics, or the performance, but in dance music it is more the rhythm.
Besides backing the formulated Hypotheses, the interviews answered the research question of why people like Techno. The reasons given fall into two groups: one related to dancing to Techno, the other to the subculture as perceived by participants.
What makes dancing in Techno clubs special for its fans is a more relaxed atmosphere. In contrast to other clubs, 'nobody is there to record your dancing movements' (Clair, a 24-year-old English student); in other words, Techno is not focused on dancing with a partner and does not involve a 'come-on game'.
According to Thomas, a 25 year old German student, Techno creates a 'community' where people
have a similar way of thinking. This statement seems to be supported by other participants, who even
specified cultural characteristics: they perceive Techno culture as open minded and able to break
down cultural and sexual barriers.
Techno fans are able to specify situations when they listen to Techno and situations when they prefer
different music. For them, situations like doing housework, driving a car, preparing to go out at night,
getting into a good mood or even 'producing a creative mood' (Thomas) are strongly connected with
Techno music. The choice of specific styles depends on personal preferences: Chris reported that he
does not listen to 'aggressive Techno' at home, and Clair likewise prefers soft Techno for driving or
chatting with friends.
Almost all participants mentioned that they do not listen to Techno in order to relax, because it is not
slow or calm enough. Participants were never asked directly whether or not they relax to Techno, but
most of them mentioned it during the interview, like Beate (a 17-year-old German student):
I've got such a CD with various things on it, nice things, when it should be calm, for relaxation and
so. Well, Techno is nothing to relax to [...].
Discussion
The analyses of the quantitative part suggest a strong relationship between knowing and liking a
musical style. Knowing includes not only the ability to say what Techno is, but also the ability to
distinguish Techno from different musical styles such as Pop, as well as from styles close to Techno,
such as House or Trance. This ability was found among Techno fans and did not appear among
non-fans. The most preferred activity of Techno fans seems to be dancing. The preference for Techno
as dance music can be understood as the key reason for liking to listen to Techno generally: those who
do not like to dance to Techno tend not to listen to it in other situations.
Data from the questionnaires did not clearly show the front-runner position of dancing. However,
comments in the interviews established that dancing is fans' most preferred activity while listening to
Techno.
Fans perceive dancing to Techno as different from dancing to, e.g., Pop music. The uniqueness of
dancing to Techno stems from its special musical elements, which differ from those of structured
popular music. The combination of a strong, dominant rhythm with dancing creates a feeling that
could be described as hypnotic; attention is not drawn to the lyrics or the performing band.
Not only is the dance situation perceived as different compared to Pop or Rock concerts and clubs,
but so is the culture itself. Fans regard their culture as open-minded and less prejudiced against
minorities than other musical cultures, and compared to mainstream clubs the atmosphere is described
as more friendly. This could be an effect of different drug use and deserves further investigation.
Which type of Techno fans choose, differing in tempo or sound combinations, depends on whether
they are driving a car, talking with friends or dancing. One situation in which even fans do not like to
listen to Techno is trying to relax: the music Techno fans choose to chill out to tends to be calmer and
slower than Techno. This supports North and Hargreaves' explanation that musical selection supports
the atmosphere of a situation.
The two-part methodology made it possible to use the first part to select suitable participants for the
interviews. This important first step was structured to find participants who, firstly, were able to
distinguish between similar types of Techno and styles close to it, an ability demonstrated by
identifying the styles of the musical extracts played. Secondly, the participants included in the
interviews appreciated the Techno extract, were likely to listen to it in different environments and
were likely to dance to it. Thirdly, Techno was one of their favourite styles. This information was
gathered among the personal questions and double-checked against participants' favourite recordings,
which included, e.g., Techno CDs, to make sure that a research expectation did not influence the
chosen style; after four musical extracts all similar to Techno, it was not difficult to conclude that the
study was about Techno music. This three-step methodology enabled me to find participants who are
really involved in this style and represent a good sample of Techno fans.
Besides using the quantitative part to identify fans, the combination of quantitative and qualitative
approaches showed that the qualitative approach can in some respects clarify quantitative outcomes.
Qualitative data were essential not only to support the second hypothesis, that dancing might be the
most important activity of Techno fans, but also to address the seemingly contradictory result that
people might listen to Techno in order to relax. North and Hargreaves' research would suggest that
Techno is not relaxing music, yet the differences between the means of Techno fans and non-fans
suggest that fans are significantly more likely to relax to Techno than other participants. On the other
hand, interpretation of the interviews makes it obvious that even fans do not count Techno among the
music they relax to. The difference between the quantitative and qualitative outcomes could be caused
by the nature of the questions. The quantitative part posed a more hypothetical question: participants
were asked whether they would be likely to listen to Techno for relaxation. The interviews, by
contrast, gathered information on personal use; the answers are therefore not hypothetical, but show a
picture of real attitudes.
The decisive point is that the methodology applied identified the real fans among all participants, and
questioning only those fans led to a more detailed view of Techno fans and their cultural elements.
References
Behne, K.-E. (1997a). The development of "Musikerleben" in adolescence: How and why young
people listen to music. In I. Deliège & J. A. Sloboda (Eds.), Perception and cognition of music.
Hove: Psychology Press.
Hakanen, E. A., & Wells, A. (1990). Adolescent music marginals: Who likes metal, jazz, country,
and classical. Popular Music and Society, 14(4), 57-66.
Jerrentrup, A. (1995). Techno Musik. In M. Henger & M. Prell (Eds.), Popmusic - yesterday -
today - tomorrow (pp. 107-121). Regensburg.
Katz, E., Blumler, J. G., & Gurevitch, M. (1974). Utilization of mass communication by the
individual. In J. G. Blumler & E. Katz (Eds.), The uses of mass communication. London.
Proceedings paper
Mr Mark Tarrant
mt37@leicester.ac.uk
Background:
Little research has considered the contribution that music makes to social
identity in adolescence. The research that has addressed this has indicated
that adolescents may use music as a means of distinguishing their own peer
group from other groups. More specifically, adolescents have been shown to
associate the ingroup with positively stereotyped music to a greater extent,
and negatively stereotyped music to a lesser extent, than they associate an
outgroup (see Tarrant, North, and Hargreaves, 1999). Such findings are
consistent with social identity theory (Tajfel, 1982). However, to date
research has not investigated the importance of music relative to other
interests in this process. This is the purpose of the present study.
Aims:
The study will demonstrate the extent to which music is considered a valued
dimension in the intergroup behaviour of adolescents.
Method:
175 male adolescents aged 14-15 years took part in the study. They were
recruited from a school in the West Midlands region of the UK. A questionnaire
presented participants with 27 statements concerning adolescents'
attitudes/leisure interests, and participants were required to estimate how
much each statement described members of the ingroup and members of the
outgroup. The statements covered a wide variety of musical and non-musical
interests (e.g. "they enjoy listening to classical music"; "they enjoy
listening to indie music"; "they enjoy watching current affairs programmes").
The participants then rated the 27 items for how desirable or undesirable the
ingroup believed each one to be. The final section contained six items which
assessed level of ingroup identification.
Results:
The results are currently being collated. It is hypothesised that these will
demonstrate support for social identity theory, and will confirm music's status
as a valued dimension in young people's intergroup behaviour.
Conclusions:
Proceedings paper
Perception Of Musical Styles, People Listening To Them, And Reasons For Listening
Hasan Gürkan Tekman & Nuran Hortaçsu, Middle East Technical University
Listeners may have a variety of reactions to different musical styles. Hargreaves and North (1999)
have listed these reactions as stylistic sensitivity, discrimination, knowledge, liking, tolerance, and
competence. Stylistic knowledge can be defined as knowledge of verbal labels associated with
different musical styles. We took stylistic knowledge to be not only about labels that are used to refer
to musical styles but also about the relationships between different styles, characteristics of different
styles, reasons for listening to different styles, and characteristics of people who listen to different
styles. Each of these four issues was investigated with a method that involved directing first
open-ended, then more structured questions to participants. The procedures that were followed and results
that were obtained relating to these four issues are described in separate sections below.
Musical styles and how they are related
The initial task in this research project was to identify a small number of distinct musical styles
familiar to the college student population that responded to our questionnaires. For this purpose, first,
participants were asked to list as many names of musical styles as they could. Then, the names of the
styles that were listed most frequently were given to another group of respondents and they were
asked to group them so that similar styles would be in the same group. The data thus obtained were
used in a hierarchical cluster analysis, in which six distinct clusters were identified.
1. Pop music: This cluster contained the labels pop, foreign pop, and Turkish pop. "Pop" was
selected to represent this cluster in the following stages of the research.
2. Rock and metal: This cluster contained rock, metal, and heavy metal. "Rock" was selected to
represent this cluster.
3. Western art music: This cluster contained classical, jazz, and blues. "Classical" was selected to
represent this cluster.
4. Contemporary dance music: This cluster contained rap, techno, and underground. "Rap" was
selected to represent this cluster.
5. Turkish musical styles: This cluster contained Turkish folk music, Turkish art music, and
Özgün music, which is a more recent development in Turkish music. It shows influences of
both folk and art music and its lyrics contain political comment. "Turkish folk" was selected to
represent this cluster.
6. Arabesk: This is another style that is indigenous to Turkey. It combines aspects of folk music
and traditional art music of Turkey with the musical styles of Egyptian and Indian movie
musicals, which were popular in Turkey during the 1950s. Lyrics in Arabesk typically describe
the sufferings of a protagonist of lower socioeconomic origins whose aspirations are frustrated
by an unjust fate.
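The sorting-and-clustering step above can be approximated by agglomerative clustering on co-grouping counts, i.e. how often respondents put two labels into the same group. A pure-Python single-linkage sketch; the labels, counts and threshold below are invented for illustration, and the study itself used a standard hierarchical cluster analysis:

```python
def cluster(labels, co_counts, threshold):
    """Greedy single-linkage clustering: keep merging two clusters while
    any cross-cluster pair was co-grouped at least `threshold` times."""
    def count(a, b):
        return co_counts.get((a, b), co_counts.get((b, a), 0))
    clusters = [[lab] for lab in labels]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                best = max(count(a, b) for a in clusters[i] for b in clusters[j])
                if best >= threshold:
                    clusters[i] += clusters.pop(j)   # merge cluster j into i
                    merged = True
                    break
            if merged:
                break
    return clusters

# invented co-grouping counts out of 20 respondents
co_counts = {("pop", "Turkish pop"): 18, ("rock", "metal"): 17,
             ("pop", "rock"): 3, ("pop", "metal"): 2,
             ("Turkish pop", "rock"): 2, ("Turkish pop", "metal"): 1}
groups = cluster(["pop", "Turkish pop", "rock", "metal"], co_counts, threshold=10)
```

With these invented counts the procedure recovers two clusters, a pop group and a rock/metal group, mirroring the kind of structure the analysis reported.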
How are musical styles described?
For the purpose of answering this question, first, the respondents who listed musical styles were asked
to list three adjectives that those styles brought to their minds. Second, the adjectives that were used
most frequently were presented to a different group of respondents together with the names of the six
musical styles that were selected for further investigation. Respondents had to rate the appropriateness
of each adjective to each musical style on a five-point scale. The data collected from these scales were
submitted to factor analyses for the six musical styles separately. Then, scale reliabilities were
calculated for the groups of adjectives that were consistently put together in these analyses. As a
result, three main dimensions with satisfactory scale reliabilities emerged.
1. Evaluation: This dimension brought together the adjectives meaningful, pleasant, high quality,
(not) boring, (not) irritating, (not) simple, lasting, and (not) monotonous.
2. Activity: This dimension brought together the adjectives lively, exuberant, dynamic, exciting,
entertaining, and rhythmic.
3. Peacefulness: This dimension brought together the adjectives harmonious, sentimental, restful,
peaceful, and soothing.
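Scale reliability for adjective groups of this kind is conventionally reported as Cronbach's alpha. A minimal sketch with invented five-point ratings (three adjectives rated by four respondents), not the study's data:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha. `items` is a list of per-item rating lists,
    one rating per respondent, all lists of equal length."""
    k = len(items)
    item_var = sum(variance(it) for it in items)      # sum of item variances
    totals = [sum(vals) for vals in zip(*items)]      # each respondent's total
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# invented five-point ratings of three adjectives by four respondents
alpha = cronbach_alpha([[5, 4, 2, 1], [5, 5, 2, 1], [4, 4, 3, 1]])
```

When items move together across respondents, as here, alpha approaches 1; a "satisfactory" reliability is commonly taken to mean alpha around 0.7 or above.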
The six musical styles clearly differed in terms of the appropriateness of these three dimensions for
them. In terms of the evaluative dimension classical and Turkish folk were rated most positively while
rap and Arabesk were rated least positively. On the activity dimension rock, pop, and rap were rated
highest while Arabesk was rated lowest. Arabesk and rap were rated as the least peaceful as well,
while classical was rated highest on this dimension.
Why do people listen to different styles?
For the purpose of finding out what our respondents thought about this question, first, respondents
were given the names of the six musical styles selected and asked to list the reasons people may have
for listening to them. The most frequently mentioned reasons were presented to a different group of
respondents as five-point scales in the next step and respondents were asked to rate how appropriate
each reason was for listening to each style. The reasons for listening were grouped in factor and scale
reliability analyses as described in the previous section. Four dimensions emerged as main reasons for
listening to musical styles.
1. Listening in the background: This dimension brought together listening for background
accompaniment, relaxation, feeling good, diverting one's mind from troubles, and passing time.
2. Listening for movement: This dimension brought together listening for dancing, movement, and
catharsis.
3. Listening for appreciation: This dimension brought together listening for thinking and
appreciating art.
4. Listening for identity: This dimension brought together listening for reviving identity, having a
sense of community, nostalgia, and finding expression of one's feelings.
The reasons for listening were well differentiated for different styles of music. Classical and pop were
best suited to listening in the background and Arabesk was least suited to this purpose. Pop, rap, and
rock were the best candidates for listening for movement and Arabesk was the least preferred for this
purpose. Arabesk was also least suited for listening for appreciation while classical was the choice for
this purpose. Turkish folk was rated as most suitable for listening for identity and pop and rap were
least suited to this purpose.
Who listens to different styles?
For the purpose of answering this question, first, respondents were given the names of the six selected
styles and they were asked to describe what kind of person would listen to each one of them. The
adjectives that were used most frequently were presented to a different group of respondents as
five-point scales in the next step. Respondents were asked to rate the listeners of the six musical styles
in terms of these adjectives. They were also asked to rate how much these qualities fit themselves and
how much they would desire to have these qualities. Then, the desirability ratings of the adjectives
were submitted to a factor analysis. Scale reliabilities were calculated on the ratings of the six musical
styles on the adjectives that were grouped together in the factor analysis. Three dimensions describing
the listeners of music emerged with consistently high scale reliabilities.
1. The loser: This dimension brought together the adjectives pessimistic, aggrieved, disturbed,
poor, and defeated.
2. The sprightly: This dimension brought together the adjectives dance-loving, fun-loving, wild,
and vigorous.
3. The sophisticated: This dimension brought together the adjectives (not) unenlightened, young,
educated, refined, and mature.
The six styles differed in terms of how strongly they were associated with each listener type. The loser
type was most closely associated with Arabesk and least with classical. The sprightly type of listener
was most appropriate for pop, rock, and rap, and least appropriate for Arabesk. The sophisticated
listener was most appropriate for classical and least appropriate for Arabesk. In addition, although
respondents who liked and disliked a musical style did not describe themselves differently, they
disagreed when asked to rate listeners of that style on these three dimensions: respondents
who liked a musical style tended to describe its listeners more favorably than did those
who disliked it.
Conclusion
We observed remarkable consensus on how musical styles and people who listen to them are
perceived. One can say that stylistic knowledge involves a multifaceted representation of musical
styles, why people listen to them, and what kind of people would listen to them. Possible reasons our
respondents reported for listening to different musical styles are consistent with proposals that music
serves the functions of emotional and intellectual stimulation (Berlyne, 1971; Meyer, 1956), mood
manipulation (Konecni, 1982), and creating and consolidating social identity (Crozier, 1997; Sloboda,
1985). The differences between how respondents who liked and those who disliked a musical style
described fans of that style point to the importance of music as a way of expressing social identity
and group membership (Tajfel, 1981).
References
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Crozier, W. R. (1997). Music and social influence. In D. J. Hargreaves & A. C. North (Eds.)
The social psychology of music. Oxford: Oxford University Press.
Hargreaves, D. J. & North, A. C. (1999). Developing concepts of musical style. Musicae
Scientiae, 3, 193-216.
Konecni, V. J. (1982). Social interaction and musical preference. In D. Deutsch (Ed.), The
psychology of music. New York: Academic Press.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Sloboda, J. A. (1985). The musical mind: The cognitive psychology of music. Oxford:
Clarendon Press.
Tajfel, H. (1981). Human groups and social categories. Cambridge: Cambridge University
Press.
Proceedings abstract
kominek@astercity.net
Background:
Aims:
This research aimed at (i) identifying the criteria which determine the
process of recognising folk songs which have a range of local and individual
variants, and (ii) examining the extent of tolerance towards alterations of the
parameters which are crucial to the tune identification process.
Method:
10 experienced female folk singers from south-eastern Poland were chosen for
the research. They were asked to assess a dozen or so variants of songs
selected from the region's popular folk repertoire. The songs, recorded over
the last 50 years, are rather diverse in terms of melodic-rhythmic structure
and performance style. The research method combined an informal interview and a
"same-or-different" test.
Results:
In progress.
Conclusions:
Symposium introduction
Rationale
Recent attempts at developing automated music transcription systems re-emphasise the need for
detailed knowledge about categorisation in rhythm perception, since straightforward grid quantisation
often results in overly complex notations. Knowledge about how human musicians perceive and
interpret rhythms is not only of theoretical significance, but is also likely to improve automated
transcription systems.
Aims
The symposium aims to gain insight into human categorisation in the perception and interpretation of
rhythmic patterns. First, it attempts to reveal the relationship between quantisation and categorical
rhythm perception. Second, it tries to make explicit the role of context. Third, it aims to discover how
empirical insights can be incorporated into models that may also be part of automated music
transcription systems.
Speakers
Eric Clarke will give an overview of the field, and provide the theoretical platform for the discussion
by making a distinction between continuous (non-categorical) and discrete (categorical) aspects of
rhythm perception. Peter Desain will present experiments which aim to describe in detail the shape of
the rhythm categories and how they are affected by tempo and metrical context. George Papadelis will
show how these categories form and change with increasing musical experience of the listener. In
addition to presenting empirical work, Ed Large will propose (an outline of) a dynamical model which should
be able to bring together these different aspects of categorisation in rhythm perception.
Discussant
Bruno Repp, who has contributed substantially to the study of categorical perception, will chair the
symposium.
Proceedings abstract
Background:
Research suggests that preschool children provided with piano instruction score higher on the Object
Assembly (OA) task of the Wechsler Preschool and Primary Scale of Intelligence.
Aims:
The present study's goal was to understand the nature of this enhancement. The OA task involves (a)
sequential problem-solving, (b) mental imagery formation, and (c) mental image transformation. We
administered a large battery of other tasks which draw upon these to determine the specific cognitive
abilities that are enhanced by music training. Because the failure to develop spatial/abstract reasoning
represents the most glaring deficiency of deprived children, children enrolled in a federal at-risk
intervention program served as subjects.
Method:
Eighty-eight at-risk three-year-old children were pre-tested using several visuospatial cognitive,
perceptual, and standardized tests. Instructors visited the children's preschools to provide private
15-min weekly instruction in either the piano or the computer for 24 weeks (14 weeks/year). A third
group of children received no special training. All children were then post-tested. Gordon's Primary
Measures of Music Audiation was also administered.
Results:
Children in the music group scored significantly higher on several tests measuring mental imagery
formation. The magnitude of the effect was similar to that found in previous studies with
middle-income children. Sequential problem solving and mental image transformation were not
affected by music instruction. Scores of the children who received computer lessons or no training did
not differ significantly on most measures. Musical aptitude scores did not significantly correlate with
spatial task scores.
Conclusions:
These findings suggest that music training significantly improves mental imagery formation in young
children. This research will help researchers understand the links between intellectual abilities and
how development in one sphere might influence the development of related processes in another
sphere.
Proceedings paper
Abstract
There has been a limited quantity of work on the categorical perception of rhythm, and
the work that exists suggests that categorisation is closely linked to the perception of
metre in rhythmic sequences. A possibly misleading consequence of the existence of the
term ‘categorical perception' is the implication that categorical perception is somehow
‘special' and that it only applies in certain circumstances.
The aim of this paper is briefly to review previous work on categorical perception of
rhythm and to argue that perception always has a categorical (and a non-categorical)
component. There is in this sense nothing ‘special' about the categorical perception of
rhythm. The theoretical framework for this perspective is event perception, and the paper
will argue that rhythm perception is categorical because it is events (which have a
discrete character about them) which are perceived in music. The nature of these musical
events will be briefly discussed. Perception is, however, not only categorical, and it is
important to consider those aspects of perceptual experience that are continuous rather
than discrete. For rhythm, the non-categorical component is experienced as an expressive
or characterising modifier, from which a listener can pick up such things as the
competence or nervousness of the performer, and the quality of movement (real or
imagined) that has given rise to the music.
The implications of this view are on the one hand to make categorical perception less of
an issue in its own right, and on the other hand to show that it is an endemic and intrinsic
feature of perception and therefore of more general significance than its rather technical
name might suggest.
1. Introduction
The term ‘categorical perception', originating in work on the perception of speech and colour, has
frequently been used to suggest that some perceptual domains are ‘special' and demonstrate
categorical perception, while others do not manifest this feature and (presumably) demonstrate
continuous perception. This outlook has implied or directly suggested that categorical perception
confers advantages on the perceptual domains to which it applies, such as speed of processing and
distinctness, and (according to some authors) may be the result of ‘hardwiring', or special
sensitivity. For example, it has been argued that categorical perception in speech is a consequence of
the adaptive significance of language, and has been proposed as one of the components of a putative
language module. By contrast, the approach I will suggest here is that categorical perception is
nothing special at all, and is the inevitable consequence of the sensitivity to events that characterises
an ecological understanding of perception.
2. Categorical Perception
Harnad (1987) provides an overview of the literature on categorical perception in its various
manifestations. Operationally defined, categorical perception is characterised by two effects: i) in an
identification task, as the stimulus material is gradually transformed from one point on a stimulus
continuum to another, subjects show a relatively sharp discontinuity between the probability of
making an 'A'-type judgement to any one of a number of continuously variable stimuli and the
probability of making a 'B'-type judgement; ii) in a discrimination task, subjects show better
performance when a pair of stimuli separated by a given amount δ on the stimulus continuum is
taken from across the putative category boundary than when a pair with the same objective separation
δ is taken from within either of the putative perceptual categories. It has sometimes been
claimed (and for some particular phenomena) that categorical perception is innate, and at other times
that it is learned (see Livingstone, Andrews & Harnad, 1998).
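The two operational effects can be illustrated with a toy logistic identification function: a sharp identification boundary, and a discrimination advantage for pairs that straddle it, derived here simply from the difference in identification probabilities (an illustrative sketch, not a model from the literature; boundary and slope values are arbitrary):

```python
import math

BOUNDARY, SLOPE = 0.5, 20.0   # illustrative values on a 0-1 stimulus continuum

def p_a(x):
    """Probability of an 'A'-type identification at continuum position x;
    drops steeply from near 1 to near 0 around the category boundary."""
    return 1.0 / (1.0 + math.exp(SLOPE * (x - BOUNDARY)))

def discriminability(x, delta=0.1):
    """Predicted discriminability of the pair (x, x + delta): largest
    when the pair straddles the category boundary, small within a
    category, mirroring the classic discrimination peak."""
    return abs(p_a(x) - p_a(x + delta))
```

Equally separated pairs taken well inside either category yield almost no identification difference, while a pair spanning the boundary yields a large one, reproducing effect (ii) from effect (i).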
3. Categorical Rhythm Perception
Some 50 years ago Fraisse, in his work on rhythm, proposed a categorical distinction between two
classes of rhythmic duration, which he termed temps longs and temps courts. The distinction between
the two categories was not framed within the classic operational definition given above (Fraisse did
not carry out identification and discrimination experiments), but it nonetheless demonstrates some of
the same principles and properties. Fraisse proposed that events with a duration of less than about 400
msec (temps courts) are qualitatively distinct from events with durations of more than this value
(temps longs) in that events in the former category are not perceived as having duration at all, but
rather what he calls ‘collection', while events in the latter category are perceived as having true
duration. Fraisse claims that listeners perceive temps courts as event collections which spontaneously
group together into rhythmic Gestalts in which there is no awareness of the duration between attack
points (event onsets), while temps longs demonstrate no such spontaneous aggregation into groups,
and lead to a definite sense of duration between attack points. He summarises the idea as follows:
"Rhythmic structures ... consist of the interplay of two types of value of temporal
interval, clearly distinct from one another (in a mean ratio of 1:2). Within each type the
durations are perceptually equal to one another. The collection of shortest intervals
appears ... to consist of durations less than 400 msec." (Fraisse, 1956, p. 30. Author's
translation)
Fraisse's monograph of 1956 reported further data which can be seen as the precursors of more recent
interest in categorical rhythm perception. He measured the inter-onset durations of subjects'
spontaneous tapping, which when plotted on a histogram showed a strongly multimodal distribution,
with peaks in the distribution at integer ratios between durations (primarily 1:1 and 1:2, with a less
well-defined peak at 1:3). In other studies also reported in the monograph, Fraisse noted the tendency
for subjects to transform their rhythmic productions in the direction of integer ratios when trying to
reproduce non-integer stimulus rhythms, which he explained according to a principle of assimilation
and distinction. The idea (which has a clear Piagetian legacy) is that ratios between adjacent durations
that are non-integer values ‘migrate' towards nearby integers in reproduction by a process in which
the durations become either more similar to one another (assimilation) or more different (distinction).
For example, two durations of 300 and 400 msec (in a ratio of 1:1.33) will tend to drift towards 350 +
350 (= 1:1) by a process of assimilation; while two durations of 300 and 500 (a ratio of 1:1.66) will
tend to drift towards 270 + 540 (= 1:2) by a process of distinction. While the relationship between the
temps longs/temps courts distinction and the idea of integer ratios is not clearcut, taken together
Fraisse's work suggested the possibility that rhythm might be perceived in terms of categories which
were both qualitatively distinct, and integer related.
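Fraisse's assimilation/distinction can be sketched as snapping the ratio of two adjacent durations to the nearest integer. The sketch below preserves the pair's total duration, which is my own simplifying assumption; the worked figures above (300 + 500 migrating to 270 + 540) do not keep the sum exactly:

```python
def migrate(d1, d2, ratios=(1, 2, 3)):
    """Snap a pair of adjacent durations (msec) to the nearest integer
    ratio, redistributing their total duration between the two values."""
    lo, hi = sorted((d1, d2))
    target = min(ratios, key=lambda r: abs(hi / lo - r))  # nearest integer ratio
    total = d1 + d2
    new_lo = total / (1 + target)
    new_hi = total - new_lo
    return (new_lo, new_hi) if d1 <= d2 else (new_hi, new_lo)

assimilated = migrate(300, 400)   # 1:1.33 drifts to 1:1 (assimilation): (350.0, 350.0)
distinct = migrate(300, 500)      # 1:1.67 drifts to 1:2 (distinction)
```

The 300 + 400 pair becomes more similar (assimilation, matching the 350 + 350 example in the text), while the 300 + 500 pair becomes more different, ending at a 1:2 ratio (distinction).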
In a paper published in 1981, Povel took up Fraisse's suggestion of the ‘migration towards integers'
and in a reproduction task investigated these migrations in a more systematic and thorough fashion,
incorporating the effect into an explanatory framework that recognised more explicitly the central role
of metre. As with Fraisse, Povel's work cannot be considered to embody the standard demonstration
of categorical perception (he never uses the term in the paper, and uses a reproduction task rather than
the standard identification and discrimination methods), but the ‘migration' effects in his empirical
results are suggestive of the within-category instability and between-category distinctiveness that is
fundamental to the whole notion of categorical perception.
In a paper published in 1987 (Clarke, 1987) I took up the implicit indication that rhythm might be
perceived categorically and tackled the question with the standard methods of identification and
discrimination. The data reported there showed the characteristic features of categorical perception (a
disjunction in the identification function as subjects switch from one perceptual category to another,
coupled with a peak of discriminability when pairs of stimuli are taken from either side of the
category boundary), together with a metrical effect causing the category boundary to shift so as to
make a larger proportion of the stimulus continuum consistent with the prevailing metre. In simple
terms, the effect of the metrical context is to cause subjects to perceive potentially ambiguous rhythms
in a fashion which supports and confirms the prevailing metre.
Towards the end of the paper, I considered the perception of so-called rhythmic ‘microvariations' -
those continuous variations in local tempo that have been widely researched in music performance,
and which correspond to small departures from integer proportions at the level of adjacent durations.
The implication of categorical perception, narrowly and rigidly defined, is that these microvariations
should be barely detectable if they occur within categories, and yet they have been widely regarded as
a critical component of expression in musical performance - from both a production and perception
point of view. Furthermore, other studies (e.g. Clarke, 1989; Repp, 1992) have demonstrated that
listeners are sensitive to much finer rhythmic distinctions than the coarse-grained categories would
suggest. The answer to this apparent paradox (as suggested in the 1987 paper) is to recognise that the
microvariations constitute a different kind of rhythmic sensitivity, experienced as expression rather
than rhythm:
"After the temporal information for rhythm has been categorised, any ‘remainder' (i.e.
any deviations from a perfect categorical fit) is considered to be expressive information,
or perhaps accidental inaccuracy. This expressive information is perceived as
continuously variable, and is used by performers to indicate structural markers in the
music and thus particular interpretations ... It is perceived by listeners as qualitatively
different from the temporal information that specifies rhythmic structure ... The
separation of temporal information into a domain of structure and a domain of expression
resolves the apparent paradox that small whole number ratios are the simplest to perceive
and reproduce, but that real human performances do not conform to these integer
perception. The absence of metre is, after all, a perfectly legitimate and quite perceptually striking
rhythmic effect. The discrimination function showed a distinct peak between this central area and both
of the adjacent metrical categories, and the result indicates that categories unanticipated by the
experimenter, and not designed to be in the stimulus materials, may nevertheless make themselves
felt.
The relationship between the microvariations of expressive timing and rhythmic categories is
paralleled in Windsor's study by the relationship between metrical accents and syncopations (or other
kinds of expressive accents). Windsor's results suggest that his subjects were unable to perceive small
differences in accentuation (expressive differences) outside a metrical framework:
"Syncopation and expressive dynamics exist in opposition to metrical accents, giving no
information for metre. ... The important point here is that information redundant in one
domain may be important in another. Just as expressive information can only be
perceived in relation to the non-expressive rhythmic structure, syncopation is
inconceivable outside a metrical framework." (Windsor, 1993, p. 138)
4. Quantisation
Closely linked to the notion of categorical rhythm perception, and using the language and principles
of automated rhythm processing as employed in commercial sequencer programs, is the idea of
quantisation. Quantisation is the process whereby the continuously variable event durations of a real
performance are rationalised into discrete rhythmic values as represented in standard music notation.
For sequencers and notation programs, this stage of processing is required so as to give rise to
practical and musically ‘sensible' representations that might then be read by subsequent performers,
or to allow one sequencer track to be coordinated with other tracks. However, the process has also
been regarded as analogous to (or even identical with) the perceptual process by which a listener or
co-performer parses the rhythmic structure of a performed event sequence, the quantisation process
filtering out expressive microvariations so as to reveal the underlying rhythmic structure. This
perceptual interpretation of the term (as opposed to its purely operational use) goes back at least as far
as the work of Longuet-Higgins (e.g. Longuet-Higgins and Steedman, 1971) whose pioneering
artificial intelligence work on music took as its starting point the proposition that recovering a
conventional notational representation from a performed sequence should be regarded as modelling a
perceptual process. Standard music notation, it is asserted, is not just a set of conventions, but
embodies musical understanding in a formalised manner. If a program could be written that rendered
performed music into the same notation that a skilled human transcriber would produce, then that
program should be regarded as manifesting musical intelligence, and would be (implicitly or
explicitly) a model of musical cognition.
Longuet-Higgins' early attempts at this modelling made use of a very simple approach to the
quantisation problem (a term that he actually never used): having set the tempo of a performance
either by means of a series of introductory beats, or by using the length of the first note (or few notes)
as a standard, the program looked for successive beats, and multiples and divisions of beats, with a
window of tolerance within which the next note had to fall. If the next note fell within that tolerance
window, the appropriate beat length was confirmed; if it preceded or followed the boundary of the
tolerance window it triggered a shorter or longer beat unit. Longuet-Higgins has never provided a
definitive statement on the value that the tolerance window should take, though in one paper
(Longuet-Higgins, 1979) he proposes that it might take an absolute value of around 100 msec.
Intuition suggests, however, that a window that was at least partially proportional to the current beat
duration might be more plausible, since the tolerances around the timing of semibreves and quavers,
for example, are unlikely to be the same. The principle that Longuet-Higgins proposed, however, is
essentially that adopted by most sequencer manufacturers: the quantisation function in most
commercial sequencers simply sets up a metrical grid, and if a note falls within a certain tolerance
range of one of the ‘grid lines' its onset is moved forwards or backwards in time so as to
synchronise with the grid.
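The grid-based approach can be sketched in a few lines. This is an illustrative reconstruction, not code from any actual sequencer: the function name, the grid size and the tolerance value are all assumptions.

```python
def grid_quantise(onsets, grid=0.25, tolerance=0.06):
    """Snap note onsets (in beats) to the nearest line of a fixed metrical
    grid, in the manner of a commercial sequencer's quantise function.

    Any onset lying within `tolerance` of a grid line is moved onto it;
    onsets outside the tolerance window are left where they were played.
    """
    quantised = []
    for t in onsets:
        nearest = round(t / grid) * grid
        if abs(t - nearest) <= tolerance:
            quantised.append(nearest)
        else:
            quantised.append(t)  # outside the window: leave unquantised
    return quantised

# A slightly 'expressive' semiquaver run snaps back onto the 0.25-beat grid:
played = [0.02, 0.27, 0.49, 0.76]
print(grid_quantise(played))  # -> [0.0, 0.25, 0.5, 0.75]
```

Because each onset is treated independently against a fixed grid, the sketch shares the weaknesses discussed in the next paragraph: it cannot track tempo drift, and an onset falling outside the window is simply left unprocessed.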
Though computationally simple and reasonably effective with performance data that are already quite
metronomic, there are problems with this approach. In its basic form it cannot cope in a sensible way
with tempo drift (i.e. continuous changes in the underlying tempo); and it is very sensitive to, and
easily disrupted by, local irregularities - a single misrepresentation resulting in a cascade of errors.
Although commercial products have tried to get around some of these problems with a variety of
quick fixes of one sort or another, the underlying problem is that quantisation is tackled on an
event-by-event basis. Desain and Honing have approached the problem using rather different methods
(e.g. Desain & Honing, 1989), based on the principle that a flexible and robust solution must consider
the mutual adjustment of adjacent time intervals in a sequence, together with the relationships between
time intervals both at the same level and with the superordinate time intervals formed out of
aggregations of basic-level units. They use a connectionist network to
achieve this goal, in which the immediate inter-onset intervals (level 1 IOIs) of a rhythmic sequence
are held in a buffer, and over a number of iterations are steered towards integer ratios - the integer
ratios applying not only between adjacent level 1 IOIs but also between level 1 IOIs and composite
durations formed out of combinations of level 1 IOIs. This embodies the principle that in a quantised
sequence, individual IOIs should not only be in integer relationship with their neighbours, but also
with the larger beat units to which they contribute. A quantised sequence of semiquavers, for example,
exhibits 1:1 ratios at the lowest level, 1:2 ratios between each semiquaver and the quaver unit of
which it is a part, and 1:4 ratios between each individual semiquaver and the crotchet unit formed out
of a group of four.
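Desain and Honing's actual network is considerably more elaborate than this (in particular, it also steers the composite durations formed out of combinations of level 1 IOIs). The sketch below illustrates only the core idea under a stated simplification: adjacent inter-onset intervals are repeatedly nudged toward the nearest integer ratio while the total duration is preserved. The function names, the relaxation rate and the iteration count are all illustrative assumptions.

```python
def relax_pair(a, b, rate=0.2):
    """Nudge a pair of inter-onset intervals (a, b) so that their ratio
    moves toward the nearest integer, keeping the sum a + b constant."""
    swap = a < b
    if swap:
        a, b = b, a
    target = max(1, round(a / b))      # nearest integer ratio >= 1
    total = a + b
    ideal_b = total / (target + 1)     # the shorter interval, were the
    b += rate * (ideal_b - b)          # ratio exactly `target`; move
    a = total - b                      # part of the way there
    return (b, a) if swap else (a, b)

def quantise(iois, iterations=100):
    """Iteratively relax all adjacent IOI pairs toward integer ratios."""
    iois = list(iois)
    for _ in range(iterations):
        for i in range(len(iois) - 1):
            iois[i], iois[i + 1] = relax_pair(iois[i], iois[i + 1])
    return iois

# A slightly uneven long-short-short pattern settles onto 2:1:1 proportions:
print(quantise([0.48, 0.27, 0.25]))
```

Even this crude version shows the 'migration' behaviour: non-integer durations slide toward nearby integer-ratio configurations, the valleys of the rhythm space described below.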
Desain & Honing's connectionist quantiser (which exists in both ‘static' and ‘process' versions) is
able to quantise with impressive flexibility and robustness, producing correct representations of
precisely those kinds of continuously varying sequences which conventional quantising methods
cannot handle. One extension of the connectionist model (Desain, 1992a) demonstrates emergent
metrical behaviour, and another (Desain, 1992b) shows how simple short sequences of continuously
variable durations migrate in the quantiser towards a small number of rhythmic ‘nodes' - providing
a highly contoured ‘rhythm space' in which non-integer rhythms slide down the gradients of the
space towards integer valleys. In this work in particular, the relationship between categorical
perception and quantisation can be seen (as the authors themselves note): the valleys of the rhythm
space are rhythmic categories, and the gradients represent the instabilities and migration paths of
within-category variation.
A fundamental question, however, is whether people are doing anything remotely like this when they
listen to music. The ‘filtering' approach of most commercial quantisers certainly seems to be
nothing like human perception (and behaves quite differently), but even the approach adopted by
Desain and Honing may bear little resemblance to human perception despite the more attractive
principles on which it is based and the more interesting behaviour that it displays. Their approach
results in a more flexible and effective transcription process, but does it embody principles which
have psychological reality - which can explain perceptual processes? Is transcription into standard
music notation really identical with, or even analogous to, the perception of musical rhythm?
5. Perception of rhythm, Perception of events
What is the perception of rhythm? In attempting to address this question, a fundamental difficulty is
the diversity of definitions and characterisations of rhythm, as Parncutt (1994) has also noted - and in
particular the relationship between rhythm and metre. If discussion is confined for the moment to a
consideration of metrical rhythms, it is clear that a very significant component of rhythm perception is
simply the perception of metre in rhythmic sequences. It is extremely unsettling for a listener not to be
able to identify the metre of a sequence that s/he believes to be metrical, and many aural training
exercises are essentially aimed at helping listeners to become quicker and more adept at identifying
the metre of a sequence. Parncutt (1994) expresses a similar view, though he focuses on the less
differentiated notion of pulse rather than metre. Indeed, after a discussion of the difficulties of
defining rhythm, his definition is: "A musical rhythm is an acoustic sequence evoking a sensation of
pulse" (p. 453). One approach to the perception of rhythm is thus to regard rhythm not as an object but
as a medium - or (to use the language of event perception) as information for events.
What are the events that are specified in rhythm? One such ‘event' is metre itself: metrical rhythmic
sequences specify a particular metre, and this metre is the extended event that we perceive in the
rhythm. The enormous variety of individual durations that may make up a metrical rhythmic sequence
will (if the sequence can be perceived as having a stable metre) possess an invariant property that
specifies the metre. This is the standard ecological situation of ‘the detection of invariants under
transformation', and from a different theoretical perspective has been investigated by empirical studies
of metre perception (e.g. Lee, 1991; Parncutt 1994). Another kind of event that is specified in rhythm
is a figure or group: in the same way that the rapid collection of impacts at the start of a game of pool
specifies the ‘break' (an event), so too a particular pattern of durations in a musical sequence may
specify a particular figure or group - even if we do not have a specific name for it. These figures or
groups may themselves have a strong ‘real world' component, too - in that they may be heard to
have particular motional origins (limping rhythms, stately rhythms, rushing rhythms) whether these
are real or virtual.
How does this perspective relate to the issues of quantisation and categorical perception? In a paper
aptly entitled "Events are perceivable but time is not", Gibson (1975) points out that event perception
"asserts that when an event has been perceived there are two kinds of concurrent awareness, one of
variation and one of non-variation. That is to say the observer perceives both what is altered and what
remains unaltered in the environment." (p. 298, Gibson's emphasis). Categorical perception and
quantisation can be seen in exactly the same light: when listening to music, we are concurrently aware
both of fluctuations of tempo (which may specify expression, the hesitations of incompetence,
anxiety, or lack of control) and the rhythmic events (figures, groups and metre). Quantisation presents
this as the separation of microvariations from underlying canonical values: "Quantization is the
process by which the time intervals in the score are recovered from the durations in a performed
temporal sequence; or to put it another way, it is the process by which performed time intervals are
factorized into abstract integer durations representing the notes in the score and local tempo factors."
(Desain, 1992b, p. 240). The conventional approach to categorical perception, similarly, presents the
problem as one of ‘separating the wheat from the chaff', of the elimination of ‘noise' so as to
recover stable underlying values. But this is to overlook the obvious perceptual value of the
‘non-categorical' component of perception (as discussed above), and the different kinds of events
and transformations that this conveys. Expression and event structure in rhythm collaborate in a highly
integrated fashion, as when expression acts to clarify (or, equally, intentionally to call into question)
structure.
6. Conclusion
In this paper I have argued that categorical perception is not a ‘special feature' of some sensory
systems, or of some kinds of materials: it is the inevitable consequence of our sensory systems'
fundamental orientation towards event perception. The events of the environment (in all its natural
and cultural diversity) are what we are sensitive and attuned to, and it is events that we perceive. The
categories of categorical perception are no more and no less than those events, and through perceptual
learning different individuals under different circumstances will demonstrate differential sensitivity to
those events. Since metre is a fundamentally important component of (metrical) rhythm, it is not
surprising to find evidence of what has been called ‘categorical perception' of rhythm - closely
linked to the distinctions between different metres. By analogy, categorical pitch perception (see
Burns, 1999 for an overview) is the inevitable consequence of the importance of fixed pitches (and
discrete intervals) in the Western system. Metre, however, is not the only kind of event in rhythm
(figures and gestures are some others), and the so-called non-categorical components of rhythm
(expressive features and other aspects of continuous temporal variation) convey a host of other kinds
of events and their transformations. In short, categorical perception as a concept seems to offer little in
the way of explanatory value.
References
Burns, E. M. (1999): Intervals, scales, and tuning. In D. Deutsch (Ed.) The Psychology of Music.
Second Edition. New York: Academic Press, p. 215-264.
Clarke, E. F. (1987): Categorical rhythm perception: an ecological perspective. In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music. Stockholm: Royal Swedish Academy of Music, p.
19-34.
Clarke, E. F. (1989): The perception of expressive timing in music. Psychological Research, 51, 2-9.
Desain, P. (1992a): A (de)composable theory of rhythm perception. Music Perception, 9, 439-454.
Desain, P. (1992b): A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic
models of rhythm perception. Contemporary Music Review, 9, 239-254.
Desain, P. & Honing, H. (1989): The quantization of musical time: a connectionist approach.
Computer Music Journal, 13 (3), 56-66.
Fraisse, P. (1956): Les Structures Rythmiques. Louvain, Paris: Publications Universitaires de
Louvain.
Gibson, J. J. (1975): Events are perceivable but time is not. In J. T. Fraser and N. Lawrence (Eds.)
The Study of Time II. Berlin: Springer Verlag, p. 295-301.
Harnad, S. (Ed.) (1987): Categorical Perception: The Groundwork of Cognition. Cambridge: Cambridge University Press.
Lee, C. S. (1991): The perception of metrical structure: Experimental evidence and a model. In P.
Howell, R. West and I. Cross (Eds.) Representing Musical Structure. London: Academic Press, p.
59-127.
Livingstone, K. R., Andrews, J. K. & Harnad, S. (1998): Categorical perception effects induced by
category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24,
732-753.
Longuet-Higgins, H. C. (1979): The perception of music. Proceedings of the Royal Society of
London, B 205, 307-322.
Longuet-Higgins, H. C. & Steedman, M. (1971): On interpreting Bach. In B. Meltzer & D. Michie
Proceedings paper
Vera Milankovic
Gordana Acic
Performance in Education: a Field to Explore
Introduction
The Serbian music education system embraces a wide range of children whose abilities in acquiring
musical skills vary. A major instrument and solfege together provide a rather extensive curriculum. Major-instrument
instruction takes the form of private lessons, while solfege is taught as a class ensemble.
Performing outcomes differ owing to differences in vocal and instrumental abilities, teaching methods,
practice hours, etc. A detailed investigation of their respective roles is necessary.
Given the busy schedules of all participants in the educational process, such a subtle investigation is not easy to
carry out. The teacher's role can hardly be overestimated, since teachers convey key messages that are built into
educational theory and philosophy and implemented in technical and didactic practice.
In music education, the analysis of performance should articulate the main features of students' performance in a
context meaningful to both teachers and researchers. Teachers can thus evaluate the effects of their teaching
methods, and scholars their conclusions about musical ability and performance.
Theory
Music communicates aestheticised emotions through a cognitive framework; creative music education
should therefore establish and sustain an emotional and aesthetic communion between teacher and student. At first, music
is used as a medium to establish this communion; later, as it develops into an aestheticised emotional relationship,
music becomes its goal.
Music, played and listened to, is an interplay of emotional and aesthetic relationships within the minds of
performer and listener. While playing, the performer balances the emotional and aesthetic aspects of the performance. This
process should run in parallel in the imagination of the listener, provided the reception is creative: while
listening, the listener should experience a reinterpretation of the same music.
A creative teacher should search for the ideal, a model of perfection, in order to engage the student "body and
soul" in the joint effort to reach it. The ideal is experienced as a model of perfection only as a whole;
only a holistic approach to the ideal reveals its metaphoric meaning.
The teacher's role is to introduce, develop and engage the student's imagination, so that the student can use it
when inspired to reach the ideal.
The teacher articulates the body motions and feelings which accompany the realisation of the ideal, enabling a long-term
kinaesthetic impression of the quality of performance.
The teacher initiates the development of the student's control over motion and feeling by enabling him or her to
become his or her own audience, on the feedback principle, which works in both directions.
Music is a performing art, therefore music pedagogy is a performing art par excellence.
The aims of solfege are to:
1. develop the cognitive processes necessary for reading and understanding music, an understanding which is
profound only if it includes emotional involvement;
2. develop vocal skills enabling an articulate understanding of music, however uncomfortable for the human voice.
The main tool of solfege is solmisation, which determines tonal interrelationships (latent harmony etc.).
Our study should be viewed within the model of creative and effective music education proposed below.
[Model (diagram in original): the ideal, as a cognitive-emotional aim (tone quality, style connotations, technical
idiom, form), linking the teacher, teaching procedures, solfege, the student as performer, the audience, and the
evaluation of the interpretation.]
The students were expected to sing and play each song in succession. Their performances were tape-recorded.
Before and after the performance, each student was asked to complete a short questionnaire regarding his or her
approach to preparing the performance, a comparison between his or her vocal and instrumental performance,
and a self-estimation of the quality of aural control:
1. successive (on-going) and
2. retrospective.
The recordings were made during the examination period at the end of the first term. No special instructions about the
performance or the aim of the study were given beforehand.
The audio tapes of the performances were given to five secondary-school teachers (one piano teacher, three solfege
teachers and one music history teacher) for evaluation.
1. The teachers evaluated the following parameters:
intonation,
rhythm,
tempo,
phrasing,
emotional1 and
aesthetic expression.
These parameters were rated in singing and playing respectively. Since pianists predominated, we omitted intonation
as a parameter in playing.
1 Emotional expression means perceived emotional involvement in the performance.
Results:
1. A significant positive correlation was found between performance ratings in singing and playing (Spearman
coefficient = 0.86, p = .001).
2. A significant difference (Mann-Whitney U test) in mean ranks for singing and playing performance was found
between the groups with self-estimated good and poor on-going aural control in singing.
The quality of on-going aural control in singing therefore turned out to be an important correlate of success in
singing, as well as in playing.
However, the quality of on-going aural control in playing made no significant difference in mean ranks for either
playing or singing performance, and no correlation was found.
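For readers who wish to reproduce this kind of analysis, Spearman's coefficient is simply the Pearson correlation computed on rank-transformed scores (with tied scores sharing the mean of their ranks). The sketch below uses hypothetical ratings for six students, not the study's data.

```python
def ranks(xs):
    """Average 1-based ranks, with ties sharing the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical singing vs. playing ratings for six students:
singing = [4.5, 3.0, 5.0, 2.5, 4.0, 3.5]
playing = [4.0, 3.0, 4.5, 2.0, 4.5, 3.0]
print(round(spearman(singing, playing), 2))
```

A high positive rho, as in the study, indicates that students ranked highly in singing also tend to be ranked highly in playing.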
In order to test students retrospectively (post factum aural control), we asked them to compare their own
performance outcomes in singing and playing. The students turned out to be quite aware of
their success in performance, and in the great majority of cases their estimates were correct.
3. There is an evident difference between the overall performance ratings of the Flint Stones tune and those of the
other three. Flint Stones was the only scored tune that all participants recognised before the performance simply by
looking at the score.
4. In an attempt to shed light on the potential structuring of the teachers' ratings, we submitted all ratings to
hierarchical cluster analysis, seeking the optimal number of latent dimensions to cover them all. Two
dimensions proved optimal.
Reliability among teachers was fairly strong, with coefficients for parameter totals ranging from .80 for
emotional expression in singing to .96 for intonation.
As shown above, the best performed and best evaluated parameters in playing were rhythm, tempo and phrasing, while
intonation and the aesthetic-emotional aspects were rated lower (below 3.5 on average).
Intonation was the most varied parameter in performance but, at the same time, the most consistently judged by the teachers.
Conclusions
We support
1. the introduction of the IDEAL in all forms of music education;
2. two forms of instruction, private lessons and master classes, because they encourage the emotional and aesthetic
aspects of musical communication;
3. mutual responsibility between teacher and student (in private lessons) and/or between students (in
master-class sessions), stimulating a particular motivation related to performing music;
4. performance analysis and self-evaluation as useful tools for testing educational technology, investigating
teachers' ratings as a reception-evaluation process, and the respective roles of all the parameters involved.
We plan research on the performing techniques engaged in achieving the IDEAL that are common to different kinds of
musicianship.
Proceedings paper
The main aim of this paper is to describe a means of collecting data about everyday music use that is both comprehensive and contemporaneous. It provides a fine-grained map of the musical
world of individuals without relying on their reminiscences and generalizations. We believe such a method is necessary to answer satisfactorily some important questions about music use for
which other methods are not particularly well suited.
Our primary focus is on the situations in which musically untrained listeners (who constitute the vast majority of Western populations) experience music as they go about their everyday lives.
When and where do they engage with music; what are they doing while they engage with it; who are they with; how intentional and purposive is that engagement; and what psychological
outcomes follow from that engagement? Such an approach aims to capture the richness and diversity of everyday musical experience whilst taking into account the social context in which music
listening occurs.
Methodological considerations
Much of the music psychological literature starts with a particular musical work, or fragment of that work, and investigates the psychological response to that work in a sample of listeners who
are exposed to it, usually in a situation devised and controlled by the experimenter. In the vast majority of such studies, the experimental situation closely resembles the paradigm of the Western
classical concert (Frith, 1996; Cook, 1998) where listeners sit in still and silent concentration, attending to musical materials determined and prepared by others, with external distractions
reduced to a minimum. For some purposes, such as investigating the maximal capacity of perceptual or cognitive mechanisms, this may be a satisfactory, even desirable, simplification. For other
purposes, connected with investigation of the social and emotional functions of music, it is clearly inadequate. This is primarily because the experimental paradigm severely curtails the choice
and autonomy of the listener. Many relevant psychological mechanisms are simply, thereby, removed from the field of potential investigation.
The dominant response within music psychology to the deficiencies of the experimental method has been to tap the generalized knowledge of listeners about their own music engagement through
retrospective verbal data. These data can be in the form of categorical responses to investigator-designed questionnaires (e.g., Likert scales) that are subject to multivariate statistical analysis
(e.g. Behne, 1997), or in more open-ended forms of discourse that may be subject to qualitative analysis (e.g. Sloboda 1992, 1999). Such data provide a rich source of themes and hypotheses, but
they suffer from the general deficiencies of retrospective investigations. Participants may recall events on the basis of their centrality to core personal themes and preoccupations, or because their
atypicality makes them more memorable, rather than because they represent predominant or habitual modes of listening. Generalized questionnaires can both limit enquiry to the concepts and
areas specified by the investigator, and also encourage respondents to take positions on issues which they have never really thought about before.
A less utilized but promising line of investigation involves experimental interventions within a wider range of music engagement contexts than the "concert hall simulation" of the laboratory. For
instance North and Hargreaves (in press, 1997) report a series of studies in which musical parameters have been manipulated in everyday situations such as shops and supermarkets, restaurants
and canteens, aerobics and meditation classes. This allows the identification of musical features that are deemed by participants to be more or less appropriate to the activity concerned. It also
allows for the experimenter to identify direct influences of music on behaviour, which may or may not be available for conscious report by participants.
Although there is no form of data gathering which can entirely escape the problems of investigator effects, this paper outlines a method which comes as close to direct observation of daily
musical life without intervention as is practically and ethically permissible. For obvious reasons, surveillance, via video and audio recording, is not a meaningful option, and in any case such
surveillance would have to be accompanied by verbal questioning to elicit information about inner responses.
The method chosen here is an adaptation of the Experience Sampling Method (ESM), as described by Csikszentmihalyi and Lefevre (1989), in which participants are signalled via electronic
pagers at random intervals during the day. At each paging they are required to complete a brief response form relating to current or immediately preceding experience. In this way, we are able to
examine individuals' subjective experiences during 'real' evolving musical episodes in the context of everyday life situations.
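As a rough illustration of the scheduling side of experience sampling, the sketch below draws random paging times across one waking day. The signal count, day boundaries and minimum spacing are hypothetical assumptions for the example; they are not the parameters of Csikszentmihalyi and Lefevre's procedure or of the present study.

```python
import random

def paging_schedule(n_signals=8, day_start=8 * 60, day_end=22 * 60,
                    min_gap=30):
    """Draw random paging times (minutes since midnight) for one day of
    experience sampling, keeping successive signals at least `min_gap`
    minutes apart. Uses simple rejection sampling: redraw the whole day
    whenever any pair of signals falls too close together."""
    while True:
        times = sorted(random.randint(day_start, day_end)
                       for _ in range(n_signals))
        gaps = [b - a for a, b in zip(times, times[1:])]
        if all(g >= min_gap for g in gaps):
            return times

schedule = paging_schedule()
print([f"{t // 60:02d}:{t % 60:02d}" for t in schedule])
```

Enforcing a minimum gap keeps signals unpredictable to the participant while preventing two response forms from landing in the same episode.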
Music episodes        81   32   0    19   17   3    2    2
Non-music episodes    96   3    73   5    1    6    13   1
Total episodes        177  35   73   24   18   9    15   3
Participants were also able to freely respond to the question "What was the MAIN thing you were doing?". Reported activities were coded post-hoc according to the classification shown in Table
2. There were three main categories, personal, leisure and work. Personal activities cover those everyday activities which are a necessary consequence of living, and are further divided into states
of being, maintenance activities and travel. Leisure activities were divided into three sub-categories, including music-related activities in virtue of the focus of this study. Work activities were
categorized according to whether they were primarily solitary (self) or primarily group-based (other). Table 3 shows the distribution of episodes over activity category.
Table 2
Categorisation of activities
Personal - being states of being (e.g. sleeping, waking up, being ill, suffering from hangover)
Personal - maintenance washing, getting dressed, cooking, eating at home, housework, shopping
Leisure - music listening to music (n.b. no examples of performing music were found)
Leisure - passive watching TV/film, putting on radio, relaxing, reading for pleasure
Leisure - active games, sport, socializing, eating out, chatting with friends
Listening to music as a main activity accounted for a small minority of all episodes (2%). Episodes were divided roughly equally between personal, leisure, and work. There was a significant
difference in frequency of musical episode as a function of activity. A one-way ANOVA (F (8,347) = 18.58, p < 0.0001) with Tukey tests showed that personal-maintenance, personal-travel, and
leisure-active were not significantly different from each other, but showed a significantly greater proportion of music episodes than all other categories, which were not significantly different
from each other (7 out of 8 participants showed this ordering). It is particularly notable that few episodes of "music while working" were reported in this predominantly academic group of
participants.
Table 3
Activity of episode as a function of music being present
Music episodes: 1 4 46 35 6 20 28 16 0
Non-music episodes: 7 23 25 10 0 40 9 68 18
Total episodes: 8 27 71 45 6 60 37 84 8
Reasons for the activity were roughly equally divided between "I had to do it", and "I wanted to do it", with the other two categories (it was a basic part of my routine, and I had nothing else to
do) accounting together for less than 15% of the total episodes. A one-way ANOVA (F (3, 290) = 8.47, p < 0.0001) with Tukey tests showed a significant effect of reason on distribution of music
episodes. Only one third of "I had to do it" episodes involved music, whereas two-thirds of the other types of episode involved music (effect found in 5 out of 8 participants).
Outcomes of music listening and autonomy
The third question relates to choice and autonomy. If, as is suggested above, this is a potent variable, then the degree of choice over the music being heard should affect psychological outcomes.
The study, therefore, assesses both the degree of choice exercised over the music, and a range of outcome measures, including mood-change, and estimates of the degree to which the music
contributed to valued outcomes. It is predicted that there will be a positive relationship between level of autonomy and valued outcomes.
For each music episode participants rated their mood on ten 7-point bipolar scales both before and after the music. The ten "before music" mood scores were subject to principal component factor
analysis with varimax rotation. A three-factor solution was obtained. Factor 1 accounted for 36% of the variance. Factor 2 for 14% and Factor 3 for 12%. Factor 1 was designated POSITIVITY,
loading highly on distressed - comforted (0.62), happy - sad (-0.58), irritable - generous (0.64), secure - insecure (-0.59) and tense - relaxed (0.86). Factor 2 was designated PRESENT
MINDEDNESS, loading highly on interested - bored (0.69), involved - detached (0.76), lonely - connected (-0.64), and nostalgic - in-the-present (-0.75). Factor 3 was designated AROUSAL,
loading highly on alert - drowsy (0.82), and energetic - tired (0.82).
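The factor solution described above can be sketched directly with NumPy: principal-component loadings from the correlation matrix, followed by a varimax rotation. This is an illustrative reconstruction under stated assumptions (ten scales, three retained factors, synthetic data), not the authors' actual analysis code:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation of a factor-loading matrix (standard Kaiser algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # gradient of the varimax criterion with respect to the rotation
        G = loadings.T @ (L ** 3 - L * (np.sum(L ** 2, axis=0) / p))
        U, S, Vt = np.linalg.svd(G)
        R = U @ Vt
        new_var = S.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

def pca_loadings(X, n_factors=3):
    """Principal-component loadings from the correlation matrix of X
    (rows = episodes, columns = bipolar mood scales)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # loading = eigenvector scaled by sqrt(eigenvalue)
    return eigvecs[:, order] * np.sqrt(eigvals[order])
```

Because varimax is an orthogonal rotation, it leaves each scale's communality (the row sum of squared loadings) unchanged, which provides a useful sanity check on the implementation.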
Mood change effects were assessed by a series of analyses examining the difference between mood before and after music exposure. The distribution of mood change was examined by
subtracting mood factor value before music from that after it. In each case the mean change over all episodes was positive. Music tended to increase arousal, present-mindedness and positivity.
However, there were instances of no change and negative change.
Table 4
Direction of mood change over music episodes (number of episodes: increase / no change / decrease)
Arousal 63 82 5
Present-mindedness 90 9 48
Positivity 71 71 11
High instances of no-change were found on the arousal and positivity factors (see Table 4). Present-mindedness tended to be more changeable in response to music. There were only nine
instances of "no change" in the whole set. However, correlations between factor change scores showed that all three factors tended to change together and in the same direction (p<0.001 in all
cases).
We were interested to see whether there were any cases in which mood change factors were dissociated from one another (i.e. cases where one mood increased simultaneously with another mood
decreasing). Inspection of cases showed that there were 25 such episodes in the entire sample (16% of all music episodes). These cases are particularly important in beginning to identify distinct
functional niches for music engagement. The largest group of such episodes (10) involved increases in positivity along with decreases in present-mindedness.
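Flagging such dissociated episodes is a simple vector operation: an episode qualifies when at least one factor's change score is positive while another's is negative. A minimal sketch, assuming a hypothetical data layout of one row per episode and one column per mood factor:

```python
import numpy as np

def dissociated_episodes(before, after, eps=0.0):
    """Return indices of episodes in which one mood factor rose while
    another fell.

    before, after: (n_episodes, n_factors) arrays of factor scores
    measured before and after the music."""
    change = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    rose = change > eps    # factors that increased
    fell = change < -eps   # factors that decreased
    # an episode is dissociated if at least one factor rose AND one fell
    return np.where(rose.any(axis=1) & fell.any(axis=1))[0]
```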
One example of this comes from a male participant reporting being at home relaxing with a group of friends and acquaintances. The activity was being done out of choice. There was ambient
music playing on a CD, although the participant had not chosen it. The participant commented that "the music was very tranquil and relaxing", that others present were "discussing work
boringly" and that he was "very, very tired". This episode was also associated with a decrease in arousal during the music. It would be reasonable to assume that the participant was using the
music as a means of relaxing and disengaging from the surrounding conversation.
A second example from the same category is provided by a female participant who reported being at home, tidying a bedroom as part of the normal basic routine. The participant had chosen to
listen to a piece of pop/chart music on a tape. The participant commented that the music was chosen to "enhance the wonderful experience of cleaning" and was "very lively". This episode was
associated with an increase in arousal during the music. It seems as if the purpose of this music was to allow the participant to focus her attention on the music, and away from the uninteresting
domestic chore, and this focused attention was used to increase energy levels.
It was less easy to find examples of episodes where positivity decreased, but one clear episode involved a female participant at home, alone, doing the washing up as part of the basic routine. She
had chosen to listen to rock music on the radio and commented that the track was "a favourite song I had not heard for some time... It brought back certain memories". The music increased this
participant's nostalgia, sadness, and loneliness, at the same time as making her more alert. It seems clear that this episode reminded the participant of a significant past event that brought on
nostalgia. At the same time it appears as if she had chosen the music to engage and arouse during an uninteresting routine task.
In order to investigate systematically the influence of autonomy or choice on response to music, participants were asked to rate the music in each episode on an 11-point scale according to the
degree of personal choice exercised in hearing the music (from 0 = none at all, to 10 = completely own choice). The distribution of scores over the scale was not uniform, with responses
clustering on 0 and 10. In order to increase cell size, this variable was recoded as a three-point scale where 0-3 = low choice, 4-7 = medium choice, and 8-10 = high choice.
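The recoding described above is straightforward; a minimal sketch (the cut-points 0-3, 4-7, and 8-10 are taken from the text):

```python
def recode_choice(score):
    """Collapse the 0-10 personal-choice rating into the three levels
    used in the analyses: low (0-3), medium (4-7), high (8-10)."""
    if not 0 <= score <= 10:
        raise ValueError("choice rating must lie between 0 and 10")
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"
```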
One-way analyses of variance were carried out for each mood factor, where the dependent variable was the amount of mood change from before to after the music, and the independent variable
was degree of choice. There was a significant effect of degree of choice on each mood factor. For each mood factor this showed a similar effect - the greater the choice the greater the mood
change. Table 5 shows the mean change scores for each cell, and F-ratios associated with each analysis. For positivity and arousal these effects did not differentiate between participants, each of
whom showed the same effect. For present-mindedness there were significant individual differences (interaction effect F (9,13) = 1.89, p < 0.05), with only 6 out of 8 participants showing the
main effect of choice on mood change.
Table 5
Mood change as a function of degree of choice over music experienced
file:///g|/Tue/Sloboda.htm (7 of 21) [18/07/2000 00:35:44]
There exists a particular attitude to music listening which implictly dominates much psychological... response to music. This is what I have charicatured (Sloboda, 1989) as the "pharmaceutical" model
One-way analyses of choice as a function of place, activity, reason for activity, and companion, showed that some situations are associated with significantly more choice over the music than
others. High choice situations occur when the person is alone, travelling or working, at home or in a vehicle, undertaking activities for duty. Low choice situations occur when with others, during
active leisure or personal maintenance activities, in shops, gyms, and entertainment venues, and when doing activities because one wants to.
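The one-way analyses reported here reduce to the standard F ratio of between-group to within-group variance. A self-contained sketch; the grouping into low/medium/high choice and the change-score values are illustrative assumptions, not the authors' data:

```python
import numpy as np

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of 1-D samples,
    e.g. one sample of mood-change scores per choice level."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = sum(len(g) for g in groups)        # total observations
    k = len(groups)                        # number of groups
    grand = np.concatenate(groups).mean()  # grand mean
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A follow-up Tukey HSD, as used in the paper, would additionally compare each pair of group means against a studentized-range criterion.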
Discussion
Our findings indicate that the Experience Sampling Method is a robust method for examining individuals' subjective experience during 'real' evolving musical episodes in everyday life.
Participants provided responses on over 90% of calls, and we found that 44% of all responses involved music episodes. While generalizations to the population at large are not possible with such
a small sample, it is already clear that music listening is not randomly distributed over contexts. Even on this small sample there were highly significant effects on almost every variable
measured. For instance, music occurred very frequently while participants were travelling, or in public places (such as shops), moderately frequently at home, and less frequently in other
locations. It tended to accompany active leisure (e.g. going out with friends) and maintenance activity (e.g. housework, washing, shopping) more than deskwork or passive leisure (e.g. reading,
watching TV), and tended to accompany activities undertaken by choice rather than for duty. Although most of these findings are not surprising, the link between choice and activities undertaken
for duties is not intuitively obvious. It may be that choosing music to accompany duties is a way of bringing some autonomy and personalization back to them. DeNora (1999) suggests that the
music associated with duties is used as a catalyst to shift individuals out of their reluctance to adopt what they perceive as 'necessary' modes of agency, and into modes of agency 'demanded by
particular circumstances'.
Our findings are consistent with those reported by respondents in the Sussex Mass Observation survey (Sloboda, 1999) where the most frequently mentioned activities involving music were
housework and travel. Although functions mentioned by participants in the survey were varied, they had a predominantly affective character, with many participants (particularly women)
explicitly mentioning music as a mood-changer or enhancer. The most frequently mentioned function was essentially nostalgic. As in the Mass Observation survey, almost no episodes in our
present study had music as the primary focus. Participants were doing something else, with music as accompaniment to that activity. Similar findings were reported by DeNora, who also found
that it was primarily among her respondents over the age of 70, and those who were trained musicians, that conceiving of music as 'background' to anything tended to be considered antithetical. This has
major theoretical implications for how we conceptualize music use. The focussed attentive and "respectful" listening of the "music lover" figures hardly at all in our present sample of
non-musicians. Furthermore, the study has allowed significant progress to be made on typologies for non-musical activities and locations based on the open-ended questions. The prospects for
using the ESM approach to develop further typologies for specific functional niches are promising given our results.
Mood was measured by 10 bipolar scales, both before and after music. Factor analysis yielded a clear and familiar grouping based on valence (positive-negative), arousal (alert-drowsy), and
attention (loading on interested-bored, involved-detached, connected-lonely, and nostalgic - in-the-present). In the great majority of cases there was mood change as a result of music, and such
change was generally in the direction of greater positivity, arousal, and attention. This is, we believe, the first demonstration of robust mood effects of music outside the laboratory.
Notwithstanding the predominance of generalized mood "improvement", not all episodes showed the same pattern. The major result of this part of the study is a strong moderating effect of
choice. Reported mood change is significantly greater for episodes where participants exercise high choice over the music they hear than for episodes where there is little or no choice. The largest
effects are for self-chosen music. About 20% of episodes involved more complex patterns of mood change (such as increases in positivity together with decreases in arousal). There are clear
suggestions that these different patterns arise from the specific niches in which they occur (location, activity, and purpose of music). For instance, some music is chosen to make one feel better
about, and distract one from the boredom of, some routine task, such as cleaning. In this case one would expect increases in positivity with decreases in attention to task. This is exactly what we
found in some episodes.
DeNora (1999) also found examples of self-regulation involving a number of musical strategies described by participants as 'revving up' or 'calming down', 'getting in the mood' (e.g., for a
particular social event), 'getting out of a mood' (e.g., to improve a 'bad' mood or 'de-stress'), 'venting' strong emotions. For the most part, as in the present study, these were predominantly
described at the 'personal' or intrapersonal level as a means of creating, enhancing, sustaining and changing subjective, cognitive, bodily, and self-conceptual states. According to DeNora, the
women in her study 'drew upon elaborate repertories of musical programming practice, and a sharp awareness of how to mobilize music to arrive at, enhance and alter aspects of themselves and
their self-concepts. This practical knowledge should be viewed as part of the (often tacit) practices in and through which respondents produced themselves as coherent social and socially
disciplined beings' (p.35).
Music, when viewed as a cultural resource, provides numerous ways in which musical materials and practices can be used as a means for self-interpretation, self-presentation, and for the
expression of emotional states associated with the self. According to DeNora, a sense of self is locatable in music, in that 'musical materials provide terms and templates for elaborating
self-identity' (p.50). Although viewed as essentially 'private' experiences involving a great deal of autonomy or agency, everyday musical experiences are deeply embedded in a social context
which exerts a powerful influence (albeit often implicitly) on our music listening. Thus, our engagement with music is enmeshed in a social and cultural world where we can 'forget' or become
unaware of the grounds on which our feelings and behaviours are based. This 'forgetting' is the product of years of training, socialization, and the institutionalisation of music. Not only have our
musical practices become routine and invisible, but as musicians and psychologists we are limited in our ability to describe musical materials in a way that is free of the assumptions and biases
associated with our own experiences and training. Further research is needed which seeks to identify and further our understanding of the role of music listening as part of the rules and
conventions of a particular social context and the unfolding episodes in which they occur. The ESM provides a useful approach for capturing the complexity of everyday evolving musical
situations in a way that makes it possible to retrieve some of these 'forgotten' or 'hidden' practices, thus furthering our understanding of the meanings associated with our evaluative judgements of
the functionality of music in everyday experience.
Correspondence concerning this article should be addressed to John A. Sloboda, Department of Psychology, Keele University, Keele, Staffordshire, ST5 5BG, UK. Email address:
j.a.sloboda@psy.keele.ac.uk
References
Behne, K. E. (1997). The development of "Musikerleben" in adolescence: how and why young people listen to music. In I. Deliege and J. A. Sloboda (Eds.), Perception and Cognition of Music
(pp. 143-160). Hove: Psychology Press.
Cook, N. (1998). Music: A very short introduction. Oxford: Oxford University Press.
Csikszentmihalyi, M. and Lefevre, J. (1989). Optimal experience in work and leisure. Journal of Personality and Social Psychology, 56, 815-822.
DeNora, T. (1999). Music as a technology of the self. Poetics, 27, 31-56.
Frith, S. (1996). Performing Rites: On the Value of Popular Music. Oxford: Oxford University Press.
Gabrielsson, A. and Lindstrom, S. (1996). Can strong experiences of music have therapeutic implications? In R. Steinberg (Ed.), Music and the Mind Machine: the Psychophysiology and
Psychopathology of the Sense of Music (pp. 195-202). Berlin: Springer.
Merriam, A. (1964). The Anthropology of Music. Evanston, IL: Northwestern University Press.
North, A. C. and Hargreaves, D. J. (1997). Experimental aesthetics and everyday music listening. In D. J. Hargreaves, and A. C. North, (Eds.), The Social Psychology of Music (pp. 84-103).
Oxford: Oxford University Press.
North, A. C. and Hargreaves, D. J. (in press). Responses to music in aerobic exercise and yogic relaxation classes. British Journal of Psychology.
Sheridan, D. (in press). Mass-Observation revived: the Thatcher years and after. In Sheridan, D., Street, B., and Bloome, D. (Eds.), Writing ourselves: Mass-Observation and literacy practices.
Cresskill, NJ: Hampton Press.
Appendix A
Date: _____________ Time Beeped: ____________ am/pm Time Filled Out: ___________am/pm
SECTION A
What were the principal reasons why you were doing this particular activity? (tick as many as apply)
__ It was a basic part of my routine (e.g. something that is so habitual that I don't really think about my
SECTION B
Who were you with when you were hearing music? (tick as many as are applicable)
__ Alone __ Partner __ person/people you live with __ Family member(s) __ Friend(s) __ Strangers
__ Acquaintance(s) __ person/people you work with __ professional(s) (e.g. dentist)
If with one other person, was that person male ___ or female ___ ?
Here are some ways in which moods or states can be described, paired as opposites.
For each pair, tick the category that most closely describes the way you felt immediately
before the music started. If the way you felt changed while you were hearing music, write
a C in the space that most closely describes where you changed to. If the
way you felt did not change during the music put two ticks in the same place.
[The ten bipolar mood-scale pairs appeared here, e.g. alert - drowsy, energetic - tired.]
Are there any important ways you felt during the music which are not covered by the above list?
If so, please state them. ________________________________________________________________________________
How much personal choice did you have in hearing the music?
If you chose to listen to music, what was your MAIN reason? Please state in your own words.
Was there anything in the music that you found particularly important or noticeable?
_________________________
Since you were last beeped has anything happened or have you done anything that could have
affected the way you feel? ____________
____________________________________________________________
Please write below any additional information or comments about what was happening and/or how
____________________________________________________________
____________________________________________________________
Appendix B
Interview protocol
Prompts:
1. Was this a typical week for you? If not, why?
2. Looking back on the week, what stands out as most significant to you and your experience of the study?
Have you learnt/discovered anything new about yourself or music in general?
Prompts:
Aware of?
Negative/positive feelings?
3. I am going to pick five of the sheets at random and would like you to talk about them in more
detail; in particular: (use diary as guide)
Prompts:
What were you thinking about in particular?
4. Can you imagine your life without music? If not, why not?
Prompts:
What would be missing?
Proceedings abstract
Background:
Aims:
Method:
Results:
Non-musicians discriminated the tones better in the music context than in the
hum context, and better in the hum context than in the speech context. The musicians were better on all three
tasks compared with the non-musicians; and moreover were better in the music
than the speech or hum context. Perfect pitch musicians were equally good in
all three contexts, and better overall than the musicians and non-musicians.
Conclusions:
Speech perception and music perception are not independent. Musical training
and perfect pitch ability "leak" through to speech perception ability. The
specific course of speech perception development is not only influenced by the
surrounding language, but also by other events, e.g. musical training and
ability.
Proceedings abstract
Background:
The literature on the evolution of music is quite sparse, and the
topic is often mentioned only in passing as part of larger proposals
concerning the origin of language and the emergence of modern human
cognitive abilities. Although music, no less than language, is a
uniquely human behavior, most evolutionary scenarios either do not
mention music at all, or make ethnocentric assumptions concerning
the nature of music and its relationship to language, assumptions
that are at odds with findings in ethnomusicology concerning the
social embeddedness and mutual interdependence of music and language
across a wide range of socio-cultural contexts.
Aims:
This paper attempts to articulate an evolutionarily plausible and
socially grounded theory of musical meaning in light of recent
proposals concerning the origins of human cognitive abilities.
Main contributions:
Expanding upon Donald's (1991) suggestion that the capacity for
representation evolved prior to the development of language, this
paper proposes that music is grounded in a capacity for "mimesis",
or motor modeling, and has a social ontology rooted in gesture and
preverbal spatio-temporal concepts. Although both music and language
evolved from a mimetic capacity, musical meaning retains a distinct
link to vocal mimesis through sonic representations of bodily
movement and emotional states.
Implications:
This work challenges both structural and cultural accounts of
musical meaning by suggesting that music's power is not derived
solely from syntactical or semantic referents, arousals and
expectancies, or from its indexical relationships to a particular
cultural context, but rather through its immediacy as a performance
of socio-emotional essence and embodied gesture.
Proceedings paper
Background
The integration of perceptual modes underlying various art forms is a fundamental expression of the
inter-relatedness of artistic thought. Historical concepts such as Johannes Kepler's Harmonia mundi, or
Athanasius Kircher's Musurgia universalis, have asserted a unity of art and science, and a felt unity
of the senses (de la Motte, 1996, p. 311). Throughout history, evidence of continuously evolving
combinations of artistic disciplines suggests that we are capable of "making sense" of art objects in
one medium through the co-influence of another. Recent experiments demonstrate that the structural
and expressive qualities of performance art have strong parallels across media (Krumhansl &
Schenck, 1997), and that meaning can be conveyed across sensory media as well (Smets & Overbeek,
1995). This capacity of the human mind to communicate ideas and understandings cross-modally is
a central aspect of artistic communication and, more specifically, is characteristic of higher-order
musical thinking. We believe there is sufficient evidence in previous research and writings, as well as
in our personal experiences, that when music invokes profound reactions, or when it is felt to be
understood fully, it is in part because music verifies and clarifies knowledge and understandings
acquired through the other senses.
As one step toward a more integrated view of this phenomenon, we introduce the concept of
transactional cognition in music. We call transactional those processes which involve the mutual
influence of otherwise independent forms of sensory or artistic experience. We call cognition that
human behavior which demonstrates an awareness of these processes and makes judgements based
upon that awareness. Transactional cognition is thus related to aesthetic perception and reaction,
creative reasoning, and interdisciplinary thinking in the arts. Our primary interest was to test the
strength of the relationship between alternative domains of sensory information. We designed an
experiment to explore cross-modal correspondences between two types of sensory experiences—those
of hearing (henceforth referred to as "aural") and touch (henceforth referred to as "tactile"). Our
assumption is that such correspondences might arise from a process of encoding aural information
into tactile information, and decoding tactile information back into aural information. Our principal
goal was to determine the success of this encoding/decoding process when we employed contrasting
groups of subjects, and to consider the psychological implications of this communicative process by
means of both quantitative and qualitative information. The lack of research on this topic obliges us to
review studies in cross-modality, synesthesia, and tactile response as those most closely related to our
study.
Cross-Modal, Synesthetic, and Tactile Response Studies
Qualitative research based on introspective accounts of music (Kleinen, 1994) and brain imaging data
(Petsche, Richter & Filz, 1995) suggest that the cognitive processing of music evokes a wider range of
sensory information than is available from the musical sounds themselves. These studies suggest that
the interactions of aural experience and other types of perceptual experience are active in music
cognition. Cross-modal stimulation through music listening, which we view as closely related to
transactional cognition, has become an important aspect of research in music therapy (James, 1984;
Fisher & Parker, 1994).
Research on cross-modal effects in music perception, referred to as synesthesia (Cuddy, 1994), and
cross-modal analogy (Behne, 1992), has identified strong interrelations between the senses. While
synesthesia is characterized as a stable yet idiosyncratic phenomenon which occurs in relatively few
individuals (Behne, 1992), cross-modal analogy is considered more universal (Behne, 1992). For
example, the concept of 'brightness' in pitch perception might be based on a common gradient (i.e.,
salient perceptual dimension) of the visual and auditory domains (Osgood, 1960), making it an
example of cross-modal analogy rather than synesthesia. Such analogy appears to be based on
semantic coding (Osgood, 1960; Martino & Marks, 1999) or common stimulus dimensions.
Tactile perceptions of music, in contrast to visual perceptions or "photisms," have been only marginally
addressed in the literature on music synesthesia (Behne, 1992; Cuddy, 1994). Vibrotactile cues in
singing or instrumental performance have been hypothesized as feedback signals that mediate
performance processes (Verillo, 1992). Despite the prominence of physical responses to music
experiences as reflected in numerous studies of spontaneous peripheral-physiological reactions to
music (Bartlett, 1996), tactile responses in music have curiously escaped systematic investigation.
Tactile and other bodily sensations have been reported in verbal accounts of music experiences
(Kleinen, 1999; Sloboda, 1991), and in studies of autonomic processes of the human nervous system
in response to music listening (Goldstein, 1980; Panksepp 1995). It appears likely that tactile
metaphors in music descriptions are not based on mere linguistic convention, but rather appear to
reflect psycho-physiological mechanisms. Spontaneous pilomotoric reflexes to music, e.g., "thrills",
or "chills" (Goldstein, 1980; Panksepp, 1995) appear related to the release of hormones that modulate
stress (McKinney et al., 1997). Tactile responses to music such as "chill" sensations occur across a
very large population irrespective of musical training, and thus provide a tangible object for the study of
the aesthetic experience of music.
Method
Stimulus materials
Eight graduate art students specializing in ceramics from a large university in the midwestern United
States participated in the first phase of this experiment. Three instrumental musical excerpts of
differing musical styles were selected for presentation to these ceramicists as follows:
Excerpt 1— "Le Repos de la Sainte Famille" from: Hector Berlioz, L'ENFANCE DU CHRIST,
English Chamber Orchestra, Philip Ledger, conductor, Thames DCD 452.
Excerpt 2— "River of Orchids" from: XTC, "Apple Venus, Volume 1," TVT Records 3250-2.
Excerpt 3— "Kismet" from: Anokha, "Soundz of the Asian Underground," Quango
314-524-341-2.
These excerpts were selected for their diversity of styles and the predominance of various musical
elements. To briefly describe the character of these excerpts: "Le Repos de la Sainte Famille" is
orchestral program music from the early nineteenth century, dominated by strings and a slow tempo; "River
of Orchids" is progressive rock music from England, featuring string and brass figures organized in a
cyclical composition; "Kismet" is underground dance music, composed mostly of electronic timbres,
Indian-derived tabla accompaniment, and a moderately fast tempo. Each excerpt was digitized using
sound synthesis software, and edited to a length of exactly 1:35, including a five-second digital
fade. The digital excerpts were written to compact disk and presented on a portable Aiwa stereo
system fitted with Sony stereo headphones.
The ceramicists were administered this treatment in two groups of four. They were seated in their
regular studio and asked to listen to one of the three excerpts and interpret the music on the surface of
a 12 x 12-inch square tile of wet clay. They were provided with the following directions:
In this study we are investigating the relationship between tactile and aural experiences. We would like
you to create a ceramic tile which represents your interpretation of a short musical excerpt. We would
also like to record your impressions of the task. Your identity in this study will be kept confidential.
Please listen carefully to the music excerpt. The first listening is just for you to become acquainted with
the music. We will repeat the music excerpt again, during which you may start to work on the clay
which you have in front of you. You may choose to hear the music excerpt repeated at any time.
However, after ten minutes we will play the excerpt again to remind you that you are at the midway
point, and for a final time after twenty minutes, or when you are finished, as a means of confirmation.
You have a maximum time limit of twenty minutes to create your ceramic tile.
We ask you to try to express your own perception of the music or aspects of the music which appear
prominent to you by means of the material in front of you. For this purpose your eyes will be
blindfolded until you are finished with your work. There are no restrictions in the way in which you
work the clay, but we ask you to leave it in just one piece and to preserve, if possible, the overall shape
of it. Any questions? You may begin now.
The single question that arose from the ceramicists was our policy on their use of ceramic "tools,"
which we permitted for those who requested them. The tools were limited to those available in the
ceramics studio—small wooden implements and sponges.
Procedure
Forty elementary education majors (33 female, 7 male) from the same university participated in the second phase
of this experiment. The subjects were tested in groups of four. Each subject was seated at an Apple
Macintosh computer equipped with web browser software and stereo headphones. The three music
fragments were presented on the browser screen as "streamed" audio files sampled at 32-bits per
second. In front of each subject we placed a randomly-assigned tile (produced earlier by the
ceramicists) which was concealed in a box on the workspace in front of them. The subjects were read
the following directions:
In this study we are investigating the relationship between tactile and aural experiences. Please listen
carefully to all three musical excerpts on the web page in front of you. Listen to each one again, this
time with your hands on the ceramic tile inside the box, and after each excerpt, answer the questions on
your response sheet by placing a vertical dash on the darkened scale. If possible, also describe the
experience in your own words on the lines provided. You have ten minutes in all to complete this task. I
will notify you when you have two minutes left. Thank you for participating in this study.
The subjects were asked to answer the question "How well does this music relate to the sensation of the
ceramic tile?" for each of the three musical excerpts. They indicated their response on a horizontal
scale ranging from "very well" to "very poorly." The subjects were also encouraged to use their own
words to describe the relationship between tactile and aural sensations for each of the three excerpts.
Since one of the three excerpts had been used to create the selected tile by one of the ceramicists, it
was noted as the "target," while the other two excerpts were coded as "distractors."
The subjects’ ratings of the excerpts were converted to scores and used as the dependent variable in
all subsequent analyses. Target vs. distractor, presentation order, and gender served as independent
measures. An analysis of variance (ANOVA) using the three independent variables was conducted.
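As a rough illustration of the target-vs-distractor comparison, the core of such an analysis can be sketched as a one-way ANOVA on invented rating data (the ratings, scale endpoints, and use of SciPy are assumptions for illustration only; the authors' actual design also crossed presentation order and gender):

```python
# Minimal sketch of a target-vs-distractor comparison; the data are invented.
from scipy import stats

# Hypothetical ratings in mm on a 0 ("very poorly") to 100 ("very well") scale.
target_ratings = [72, 65, 80, 58, 69, 74, 61, 77]
distractor_ratings = [41, 55, 38, 47, 52, 44, 49, 36]

# One-way ANOVA comparing the two groups of ratings.
f_stat, p_value = stats.f_oneway(target_ratings, distractor_ratings)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

With only two groups, this one-way ANOVA is equivalent to an independent-samples t-test; the full three-factor design reported in the paper would require a factorial model.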
Results
There was a highly significant effect of target vs. distractors [F(1,38)=10.632; p<.002] (cf. Fig. 1).
However, there was no significant effect of presentation order [F(1,118)=.535; p=.587], nor of gender
[F(2,76)=1.489; p=.232]. A multivariate analysis revealed no significant interactions among these
variables.
Figure 1: Ratings of the subjective relationship between aural and tactile impressions (score in mm;
N= 40). Means, standard deviations, and range are displayed. Mean differences for target vs. distractor
items are significant (p<.002).
Discussion
In this two-phase aural/tactile music experiment we found what appear to be meaningful transactional
cognitive processes between these two domains of perception. Our results suggest that the ceramicists
were able to transform the aural impression meaningfully into a tactile representation of relevant
aspects of the musical events. Further, the novice subjects were in turn able to derive sufficient
information from the tactile representations to successfully identify which music excerpt had been
transformed. In the context of our experiment, there appears to be a connection between tactile and
aural perception and reasoning. These results raise a number of issues.
The specific nature of the connection noted above is not evident, nor is the extent to which common
gradients overlap in aural and tactile experiences. Further, we have yet to identify the specific
psychological factors and task constraints that might empower or inhibit transactional cognition in
music. We will, however, provide some observational data here that bear on these questions.
Visual inspection of the ceramic tiles created in the first phase of the experiment revealed a
heterogeneous mixture of mainly abstract works, which to different degrees also resembled concrete
visual objects. We further observed that in some cases the surface appearance of the clay related to
synchronous movement of the hands to the music, while in other cases there was no direct relation of
musical rhythms to surface appearance. Although it is impossible to give thoroughly comprehensive
descriptions of the tiles, it seems worth noting that task constraints were not considered as a major
negative influence by the artists.
The ceramicists had little or no experience with the required task. Yet, the communication of this task
to the subjects was apparently not problematic. Implicitly, it seemed clear to them what was required,
although it was entirely left open how to fulfill the requirement under the constraints of the situation
and the available materials. The imaginative transfer of information from aural to tactile form by
ceramicists might not be equated with the invention or the idiosyncratic construction of a meaningful
relationship between these perceptual domains by the artists. Rather, according to semantic coding
theory (Osgood, 1960; Martino & Marks, 1999), the creative process might be guided to some extent
by semantic gradients. Although the artists lacked extensive formal musical knowledge, they verbally
expressed sensitivity to particular structural features of their aural images. Music cognition research
(e.g. Krumhansl, 1990; Bharucha, 1995) suggests that ordinary listeners acquire some implicit formal
knowledge of music structures by mere exposure to the musical artifacts in their socio-cultural
environments. Such implicit structural knowledge of music might have influenced the process of
matching aural and tactile perceptions. However, since the subjects in the second phase of the
experiment were elementary education majors, the effect of formal music training in their
decision-making cannot be readily determined. In summary, informal observation of (and interaction
with) both groups of subjects revealed that the combined tasks were regarded as highly appropriate and
enjoyable by the majority of the subjects. Results of the questionnaires that were administered to both
groups of subjects will be analyzed and presented in a later study, at which time we will address the
psychological factors which mediate the systematic processing and transfer of sensory data across
domains.
Implications for Musical Development and Education
Traditional music instruction has focused on the development of performance and listening skills
while the antecedents of creative activities, those which invoke expressive and interpretative thinking
and gestures, have been comparatively de-emphasized. Given that music responses are often
strengthened if several perceptual domains are involved, it seems important to consider how these
research findings might help improve music instruction.
Sensitivity to musical expression in performance and listening, while arguably the most important
component of musicality, is not typically an organic part of the learning process, owing to its highly
personal and
intuitive nature. However, expressive and interpretative musical thinking are considered highly-prized
outcomes of competencies in other areas such as reading notation, motor skill proficiency, idiomatic
knowledge, and so forth, which raises the question of whether traditional forms of music instruction
are internally consistent. Music instruction can be made more meaningful and satisfying if students
are encouraged to make interpretations in sound, that is, to explore connections between music and
other types of sensory experiences, for therein lie the creative possibilities of the medium. Guidelines
for these explorations should take into account the strength of relationships between aural, tactile,
visual, and kinesthetic sensations, as well as the myriad extramusical experiences that are invoked in
musical experience.
When we regard someone as "musical" we refer to their skill in communicating musical ideas through
a variety of related behaviors that include performing, listening, creating, notating, and so forth. While
traditional music instruction generally supports the development of these behaviors, ultimately the
ability to successfully communicate something meaningful requires that one make connections
between musical experiences and other kinds of experiences. We intend for this study to serve as an
example of how these connections might be approached didactically, thus allowing developing
musicians to think more carefully about musical expression and musical meaning. We propose that
future research be conceived with greater attention to the conditions and needs of music pedagogy.
Such research must emphasize the formulation and use of creative teaching strategies that encourage
students to explore the relationships between aural and other types of experience.
References
Bartlett, Dale (1996). Physiological responses to music and sound stimuli. In: D. A. Hodges
(ed.) Handbook of music psychology, San Antonio: IMR-Press, 343-385.
Behne, Klaus-Ernst (1992). Am Rande der Musik: Synästhesien, Bilder, Farben, ... Jahrbuch
Musikpsychologie 8, 94-120.
Bharucha, Jamshed (1995). Neural Nets and Music Cognition. In: R.Steinberg (ed.). Music and
the mind machine. The psychophysiology and psychopathology of the sense of music. Berlin:
Springer, 199-204.
Cuddy, Lola L. (1994). Synästhesie. In H. Bruhn, R. Oerter and H. Rösing (eds.)
Musikpsychologie. Ein Handbuch. Reinbek: Rowohlt, 499-505.
de la Motte-Haber, Helga (1996). Handbuch Musikpsychologie, 2.Auflage. Laaber:
Laaber-Verlag.
Fisher, Kimberly V. and Barbara J. Parker (1994). A multisensory system for the development
of sound awareness and speech production. Journal of the Academy of Rehabilitative Audiology
27, 13-24.
Goldstein, Avram (1980). Thrills in response to music and other stimuli. Physiological
Psychology 8(1), 126-129.
James, Mark R. (1984). Sensory integration: A theory for therapy and research. Journal of
Music Therapy 21(2), 79-88.
Kleinen, Günter (1994). Die psychologische Wirklichkeit der Musik. Wahrnehmung und
Deutung im Alltag. Kassel: Bosse.
Kleinen, Günter (1999). Die Leistung der Sprache für das Verständnis musikalischer
Wahrnehmungsprozesse. Musikpsychologie 14, 52-68.
Krumhansl, Carol (1990). The cognitive foundations of musical pitch. Oxford: Oxford
University Press.
Krumhansl, C. & Schenck, D.L. (1997). Can dance reflect the structural and expressive
qualities of music? A perceptual experiment on Balanchine’s choreography of Mozart’s
Divertimento no. 15. Musicae Scientiae. 1, (1), 63-85.
Martino, Gail and Lawrence E. Marks (1999). Perceptual and linguistic interactions in speeded
classification: Tests of the semantic coding hypothesis. Perception 28(7), 903-924.
McKinney, Cathy H., Frederick C. Tims, Adarsh M. Kumar, and Mahendra Kumar (1997). The
effect of selected classical music and spontaneous imagery on plasma β-endorphin. Journal of
Behavioral Medicine 20(1), 85-99.
Osgood, Charles E. (1960). The cross-cultural generality of visual-verbal synesthetic
tendencies. Behavioral Science 5, 146-169.
Panksepp, Jaak (1995). The emotional sources of "chills" induced by music. Music Perception
13(2), 171-207.
Petsche, Hellmuth, Peter Richter and Oliver Filz (1995). EEG in music psychological studies.
In: R.Steinberg (ed.). Music and the mind machine. The psychophysiology and psychopathology
of the sense of music. Berlin: Springer, 205-214.
Sloboda, John A. (1991). Music structure and emotional response. Psychology of Music 19,
110-120.
Smets, G.J.F. and Overbeek, C.J. (1995). Expressing tastes in packages. Design Studies. 16,
349-365.
Verrillo, Ronald T. (1992). Vibration sensation in humans. Music Perception 9(3), 281-302.
Proceedings paper
Extended abstract
In previous work we elaborated a way to systematically investigate the perceptual division of a
continuous space of temporal patterns into discrete rhythmic categories (Aarts, Desain & Jansen, in
preparation; Aarts & Jansen, 1999). We studied the perception of short rhythmical patterns by
sampling the space of all possible patterns of three interonset-intervals (IOI), with a total duration of
one second. For this, musicians were asked to transcribe these short patterns of three IOIs (i.e. four
onsets) into music notation. It was shown that subjects were able to identify rhythmic categories as
regions of the rhythm space. Besides providing insight into the perception of rhythmic categories, the
results are essential to the design and construction of automatic music transcription systems (see Cemgil, Desain
& Kappen, in press).
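The systematic sampling described above can be sketched as a grid over triples of IOIs summing to one second (the 0.1 s grid spacing and minimum IOI used here are illustrative assumptions; the actual stimulus set is defined in the cited papers):

```python
def sample_rhythm_space(total=1.0, step=0.1, minimum=0.1):
    """Enumerate triples of inter-onset intervals (IOIs) summing to `total`,
    on a grid with the given step, each IOI at least `minimum`.
    The grid spacing and minimum duration are illustrative assumptions."""
    patterns = []
    n = round(total / step)     # number of grid steps in the total duration
    k = round(minimum / step)   # minimum number of steps per IOI
    for i in range(k, n - 2 * k + 1):
        for j in range(k, n - i - k + 1):
            # third IOI is whatever remains of the total duration
            patterns.append((i * step, j * step, (n - i - j) * step))
    return patterns

grid = sample_rhythm_space()
```

With these assumed parameters the grid contains 36 patterns; a finer grid would yield more stimuli, closer to the 66 shown in Figure 1.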
The current study was conducted to clarify how the shape of the rhythmic categories depends on
metric context and global tempo. Previous studies (e.g., Clarke, 1987; Schulze, 1989) already
addressed the effect of tempo and meter. However, these results were not unequivocal. Also,
generalization was limited due to a relatively small set of stimuli and a predefined set of responses
(arguably steering the subject to the available categories).
In this study we elaborated on the paradigm used in earlier studies of systematically sampling the
space of temporal patterns using open responses. To investigate the context effects of tempo and
meter we conducted two experiments in which a total of seventeen conservatory-trained musicians
transcribed aurally presented rhythmic patterns by means of a simple computer interface for music
notation.
In the first experiment the rhythms were presented at three tempi (40, 60, 90 BPM). The total stimulus
consisted of nine beats, marking eight measures. In the 3rd, 5th and 7th measure the rhythmical
pattern was presented. The length of a measure was, respectively, 1.5, 1.0 and 0.67 seconds in the three
tempo conditions. The beat and pattern were marked with different sounds (a high and a low conga, respectively).
In the second experiment, the same patterns were presented in the context of a duple meter and a triple
meter, at 60 BPM. The stimulus consisted of nine beats, marking eight measures. Each beat was
divided in two (for duple meter) or three (for triple meter) intervals marked with a softer sub-beat. In
the 3rd, 5th and 7th measure the rhythmical pattern was presented.
The results of the first experiment show how the rhythmical regions change shape in the different
tempi, depending on their complexity. In the second experiment, different rhythmic categories were
given as response depending on the metrical context, again in accordance with the relative rhythmic
complexity of the patterns in these meters. Metrical context determines what rhythmical categories
occur (see Figure 1), and tempo affects the size and shape of these categories. Notwithstanding these
clear results, the variance of the data between subjects reflects the complexity of decision making by
musicians in quantizing rhythmical patterns. The consequences of these findings for existing models
of rhythmic quantization are the topic of further research.
Figure 1: Ternary plot of the responses (drawn as a line from stimulus to response; line-width
indicates the number of equal responses) for 66 stimuli (open circles) consisting of three IOIs
presented in a duple meter context (left) and in a triple meter context (right).
The direction of a tick mark indicates to which grid line it belongs. For example, the stimulus in
the middle of the rhythm space is 1/3-1/3-1/3, i.e. 1:1:1.
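The placement of an IOI triple in such a ternary plot can be sketched with a standard barycentric-to-Cartesian conversion (the axis orientation used in the actual figure is an assumption here):

```python
import math

def ternary_xy(ioi):
    """Map a triple of inter-onset intervals (with any total duration)
    to Cartesian coordinates in a unit-side ternary plot.
    Standard barycentric-to-Cartesian conversion; the axis orientation
    of the published figure may differ."""
    a, b, c = ioi
    total = a + b + c
    a, b, c = a / total, b / total, c / total  # normalise to proportions
    x = b + c / 2.0             # horizontal position
    y = c * math.sqrt(3) / 2.0  # vertical position
    return x, y

print(ternary_xy((1/3, 1/3, 1/3)))  # the centroid of the triangle
```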
These results show that metric context has a strong effect on perceived rhythmic categories.
For example, the rhythmic category 2:1:1 is only present in duple context, 1:3:2 only in the
triple condition.
Acknowledgement
This research has been made possible by the Netherlands Organization for Scientific Research (NWO)
as part of the "Music, Mind, Machine" project. More information can be found at
http://www.nici.kun.nl/mmm.
References
Aarts, R. and Jansen, C. (1999) Categorical perception of short rhythms. In Proceedings of the
1999 SMPC, p. 57. Evanston.
Aarts, R., Desain, P., and Jansen, C. (In Preparation) The quantization of short rhythmical
patterns.
Cemgil, T., Desain, P., and Kappen, B. (In Press) Rhythm quantization for transcription.
Computer Music Journal.
Clarke, E. F. (1987). Categorical rhythm perception: an ecological perspective. In A. Gabrielsson
(Ed.), Action and Perception in Rhythm and Music. Royal Swedish Academy of Music, 55.
Schulze, H. H. (1989). Categorical perception of rhythmic patterns. Psychological Research,
51, 2-9.
Proceedings paper
Aaron Williamon
Royal College of Music, Prince Consort Road, London, SW7 2BS, UK
Elizabeth Valentine
Department of Psychology, Royal Holloway, University of London,
Egham, Surrey, TW20 0EX, UK
Exceptional memory is a hallmark of expertise. The demands placed on memory during musical
performance, for example, are remarkable, sometimes requiring the reproduction of over 1000 notes a
minute for periods of up to 50 minutes. Unsurprisingly, performers often accrue hours of extra practice on
a composition, developing multiple retrieval systems that will permit a performance to continue come what
may (Chaffin & Imreh, 1997). Chase and Ericsson’s (1982) Skilled Memory Theory has commonly been
accepted as accounting for expert memory (Schneider & Detweiler, 1987; Carpenter & Just, 1989;
Anderson, 1990; Baddeley, 1990; Newell, 1990; Ericsson & Kintsch, 1995). The theory proposes that
outstanding memory abilities arise from the creation and efficient use of "retrieval structures". These
structures can only be attained under restricted circumstances. First, individuals must be able to store
information in LTM rapidly. This requires a large body of relevant knowledge and patterns for the specific
type of information involved. Secondly, the activity must be familiar, so that individuals can anticipate
future demands for the retrieval of relevant information. Finally, individuals must associate the encoded
information with appropriate retrieval cues. This association adheres to hierarchical and serial principles
and permits the activation of a particular retrieval cue at a later time, thus partially reinstating the
conditions of encoding so that the desired information can be retrieved from LTM (for discussions of
hierarchical and serial principles see Collins & Quillian, 1969, 1970; Smith et al., 1974; Rosch et al., 1976;
Kosslyn, 1980, 1981, 1987, 1994; Pylyshyn, 1981, 1984; Hinton et al., 1986; Sellen & Norman, 1992;
Shaffer, 1976; Sternberg et al., 1978; Sternberg et al., 1988; Palmer & van de Sande, 1995).
Only after a set of retrieval cues is organised into a stable structure is a retrieval structure formed, thereby
enabling individuals to "retrieve stored information efficiently without lengthy search" (Ericsson &
Staszewski, 1989, p. 239). Several researchers have voiced doubts about the theory’s generalisability to
working memory (see Baddeley, 1990; Schneider & Detweiler, 1987). Consequently, Ericsson and Kintsch
(1995) have extended the theory into the Long-Term Working Memory Theory, asserting that "storage in
working memory can be increased and is one of many skills individuals attain during the acquisition of
skilled performance" (p. 220). Nevertheless, their description of the cognitive mechanisms that permit
information to be extracted for memorised performances (i.e. retrieval structures) remains unaltered.
Chaffin and Imreh (1994, 1996a, 1996b, 1997) systematically observed the practice of a concert pianist to
determine whether she used the kind of highly practised, hierarchical retrieval structure described by Chase
and Ericsson (1982) to memorise and perform the Presto from Bach’s Italian Concerto. Practice for this
piece was divided into 58 sessions, aggregated into three learning periods and spread over ten months.
Sessions were video-taped, and cumulative records were created showing the pianist’s starting and
stopping points in the music. They also examined the pianist’s concurrent and retrospective commentary
on her practice. Chaffin and Imreh confirmed the prediction that the concert pianist would use a
hierarchically ordered retrieval structure to recall encoded information. Moreover, they found that she
organised her practice and subsequent retrieval of the Presto according to its formal structure. The number
of practice segments that started and stopped at boundaries in the formal structure during practice was
significantly higher than the number that started and stopped at other locations.
In a follow-up study, the pianist was asked to write out the first page (i.e. 32 bars) of the score from
memory two years after the project. She was not informed before the project that she would be asked to
perform this task. During the interval between the performance of the piece and recall, the pianist did not
practise or perform the Presto. The researchers found that recall was significantly better for the bars
beginning each section than for bars at other locations, confirming, once again, that the hierarchical
components of the music’s formal structure formed an enduring foundation for the pianist’s retrieval
structure.
Chaffin and Imreh’s (1994, 1996a, 1996b, 1997) work is the first to demonstrate that the principles of
expert memory apply to concert soloists. Subsequent investigations, however, should examine this issue
across more than just one performer and across several levels of skill. Such research would test the
generalisability of their findings and indicate how the implementation of hierarchical retrieval schemes in
practice develops as a function of expertise. This paper details such an investigation, in which the practice
of 22 pianists at four levels of skill was examined as they prepared an assigned composition for a
memorised performance.
Method
The Musicians
Six piano teachers from southeast England were asked to recommend students capable of learning and
performing a selected piece of music suited to their level of ability from memory. Thirty-seven pianists
were recruited for the study. Of those 37, a complete set of data was collected and analysed for 22
participants. Participation was strictly voluntary but encouraged by the piano teachers because the
conditions of participation were seen to contribute to students’ overall musicianship by providing
invaluable and challenging performance experience.
The participating pianists were classified into four levels of ability based on the grading system set forth by
the Associated Board of the Royal Schools of Music (see Harvey, 1994). This system contains eight
grades, with Grade 1 representing the lowest level of skill and Grade 8 representing the highest. Musicians
at Grade 8 are usually considered to possess high performance standards, though falling short of expertise.
The four levels span all eight grades and were stratified as follows: pianists of Grade 1 & 2 standard were
placed in Level 1 (2 male, 3 female); Grade 3 & 4 in Level 2 (3 male, 3 female); Grade 5 & 6 in Level 3 (2
male, 4 female); and Grade 7 & 8 in Level 4 (5 female). (Williamon & Valentine, 2000, found that the
musicians within each of these levels were adequately comparable in terms of overall musical competence
and training so as to satisfy the requirements of the normal distribution. They also found that age
significantly differed between levels; therefore, age will be entered as a covariate in all between-level
comparisons in this paper).
The Music
The pianists were assigned one piece of music appropriate to their level of ability. All selected pieces were
composed by J.S. Bach. The compositions for Levels 1 to 4 were, respectively, the Polonaise in G Minor
from the Anna Magdalena Notebook (BWV Anh. 119), the Two Part Invention in C Major (BWV 772), the
Three Part Invention in B Minor (BWV 801), and the Fugue in D Minor from the Well-Tempered Clavier I
(BWV 851; Level 4 pianists also prepared the Prelude in D Minor, but the values reported in this paper
were obtained for the Fugue only).
Procedure
The pianists were asked to record all practice for their assigned piece on cassette tape. The participants
were invited to comment, either on tape or in writing, on any relevant aspect of the learning process (these
comments were subsequently transcribed by the first author). In addition, pianists were asked to note and
describe all practice carried out away from the piano, including singing the music and analysing the score.
Participants were informed at the outset of the study that they would be required to perform the assigned
piece from memory in a recital setting, attended by their teachers, parents and fellow music students. The
time and location of each recital were arranged by the respective music teacher as part of students’ regular
curriculum. No restrictions on the amount of time or the number of practice sessions were placed upon the
pianists, except for those normally imposed by themselves or their music teachers. Following each
performance, the pianists were interviewed about the practice and memorisation process and asked to
comment on the project itself, including its design and implementation.
Cumulative Records
The recorded practice sessions were transcribed into cumulative records for each pianist. These records
contained both quantitative and qualitative information on the learning process as a whole and on each
individual practice session (a practice session was defined as a discrete period of time, of variable length,
in which musicians practised the assigned composition either at or away from the piano). The records
documented characteristics of practice such as the total time spent practising, the number of days
encompassed within the learning process (from the first practice session up to the final performance), the
number of practice sessions in the learning process, the number of practice sessions per day and the time
spent in each practice session. In addition, graphs were plotted for each practice session showing starting
and stopping points for the segments of music played by each pianist. The graphs − with the x-axis
representing bars of the music and the y-axis depicting the cumulative number of practice segments −
represent the sequence of segments of the music executed by a particular pianist during a given practice
session. Such graphs were originally introduced by Chaffin and Imreh (1994, 1996a, 1996b, 1997). The
pianists often corrected one or two notes whilst continuing to play through the music. This type of practice
is analogous to a stutter in speech. Such stutters were not included in the cumulative records. All graphs
were transcribed from the cassette tapes by the first author.
Results
Segmenting the Assigned Compositions
Following each performance, the musicians were interviewed. One set of interview questions required that
the pianists indicate whether they had thought of their assigned composition as having component sections
during both practice and performance, and if so, why and how they partitioned it. In another set of
questions, they were asked to identify the bars in which difficult passages occurred in the music and
explain why they were difficult. These were open ended questions, not intended to lead the pianists into
particular answers, such as the identification of the music’s formal structure or the cataloguing of
difficulties into specific types. In sum, the findings reveal that participants segmented their assigned
composition into various hierarchical organisations, not always coincident with the formal structure.
Moreover, the interview data indicate that the higher skilled musicians demonstrated an extended use of
hierarchy when reporting information about local detail in their composition (i.e. difficult bars).
The Role of Segmentation in Practice
To determine the extent to which segmentation played a role in guiding the practice of the musicians, bars
of the assigned compositions were categorised as "structural," "difficult" or "other". Unlike Chaffin and
Imreh’s (1994, 1996a, 1996b, 1997) study, the formal structure of the music was not used as the basis of
this categorisation because only three pianists (one in Level 3 and two in Level 4) reported that the formal
structure influenced their segmentation of the assigned piece. Instead, the categorisation system was based
on the pianists’ individual-specific segmentation of the music and identification of difficult bars. Bars were
classified as "structural" if they were the first bar in each of the identified sections and subsections. They
were labelled as "difficult" if they had been previously named as such by the pianists. No differentiation
was made between types of difficulty. All remaining bars were placed into the "other" group. In four cases,
two pianists in Level 2, one in Level 3 and one in Level 4 labelled one bar in their composition as both
structural and difficult. In these instances, the bars were omitted entirely from subsequent analyses. Also,
the frequency of the first bar from each assigned composition was excluded from all analyses. Obviously,
the first bar of a piece may play a guiding role in any hierarchical retrieval scheme; however, it was
excluded because of the multitude of reasons as to why musicians may decide to start their practice at the
beginning of a piece. Two of these are that musical information is organised linearly and that any attempt
to simulate a complete performance is likely to begin on the first bar.
Using this classification system, the frequency with which pianists started their practice on structural,
difficult and other bars was obtained for each practice session. Initially, these frequencies were to be
compared both within and between ability levels, but closer inspection of the values revealed that such
comparisons were not valid for three reasons. In terms of within-level comparisons, the number of
structural, difficult and other bars identified by each pianist varied considerably. Consequently, the
resulting frequencies may have increased or decreased based on the number of each type of bar. As for
between-level comparisons, the findings of Williamon and Valentine (2000) reveal that pianists in the
sample at higher ability levels spent more time practising in each practice session. As a result, they may
have started practice on structural, difficult and other bars more often than pianists at lower levels of ability
in this extra time. Also, the number of bars in each assigned piece was different. Therefore, in the
hypothetical situation that all bars were equally important in terms of encoding and retrieving musical
information, the probability of the pianists starting their practice on any one bar would decrease with an
increase in the length of the piece.
To account for these within- and between-level inconsistencies, a measure was calculated reflecting the
deviation between (1) the observed frequencies of starts on structural, difficult and other bars and (2) the
expected frequencies based on the number of each type of bar identified and the number of bars in the
assigned piece. The equations by which these calculations were performed were derived from that used to
calculate expected frequencies in the Chi-squared test (see Goodman, 1957; Kendall & Stuart, 1963). The
calculated values (referred to from here on as δ s for structural bars, δ d for difficult bars and δ o for other
bars) give an equivalent of "z" scores, by which positive integers indicate more starts on a specific bar type
than would be expected and negative integers indicate fewer starts on a specific bar type than would be
expected. The equations and calculations used to obtain δ s, δ d and δ o for each pianist in each practice
session are as follows:
The Equations
Measure of Deviation of Observed "Structural" Starts from Expected "Structural" Starts
δ s = (fsi − esi) / √esi
Measure of Deviation of Observed "Difficult" Starts from Expected "Difficult" Starts
δ d = (fdi − edi) / √edi
Measure of Deviation of Observed "Other" Starts from Expected "Other" Starts
δ o = (foi − eoi) / √eoi
The Calculations
Step 1: The proportion of "structural," "difficult" and "other" bars to the total number of bars
❍ nsi = number of structural bars identified by pianist "i"; ndi and noi are defined analogously; Ni = total number of bars in the assigned piece
❍ psi = nsi / Ni (proportion structural)
❍ pdi = ndi / Ni (proportion difficult)
❍ poi = noi / Ni (proportion other)
Step 2: The number of actual starts on "structural," "difficult" and "other" bars
❍ fsi = number of observed starts on structural bars for pianist "i"; fdi and foi are the observed starts on difficult and other bars; Mi = fsi + fdi + foi (total starts in the session)
Step 3: The number of expected starts on "structural," "difficult" and "other" bars
❍ esi = psi x Mi (number of expected structural starts)
❍ edi = pdi x Mi (number of expected difficult starts)
❍ eoi = poi x Mi (number of expected other starts)
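Assuming the deviation measure is the standard chi-squared-style standardised residual (observed minus expected starts, divided by the square root of the expected count), consistent with the text's description of the values as "z"-score equivalents, the per-session computation can be sketched as follows (the bar and start counts are invented):

```python
import math

def deltas(n_counts, total_bars, f_counts):
    """Deviation of observed from expected starts per bar type.
    Assumes a chi-squared-style standardised residual, (f - e) / sqrt(e);
    positive values mean more starts than expected, negative fewer."""
    total_starts = sum(f_counts)  # Mi: all starts in the session
    out = []
    for n, f in zip(n_counts, f_counts):
        p = n / total_bars         # proportion of this bar type in the piece
        e = p * total_starts       # expected starts on this bar type
        out.append((f - e) / math.sqrt(e))
    return out

# Invented session: 4 structural, 6 difficult, 22 other bars (32 in total),
# with 10, 8 and 6 observed starts respectively.
ds, dd, do = deltas([4, 6, 22], 32, [10, 8, 6])
```

In this invented session the structural starts far exceed expectation (positive δ s) while the "other" starts fall below it (negative δ o), the pattern the Results section reports for the actual data.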
The means for δ s, δ d and δ o across all practice sessions for each level of ability are listed in Table 1.
These values for the deviation of the observed from expected frequencies for each bar type were compared
using a two-factor mixed analysis of covariance (ANCOVA) with deviation as the dependent variable, bar
type (i.e. structural, difficult and other) as the within-subjects independent variable, level as the
between-subjects independent variable, and age as the covariate. The analyses revealed significant main
effects of bar type [F (2,34)=262.25, p<0.001] and level [F (3,17)=58.88, p<0.001] and a significant
interaction between bar type and level [F (6,34)=265.22, p<0.001]. Subsequent polynomial contrasts (i.e.
comparisons of δs versus δd and δs & δd combined versus δo) further defined these findings in that
structural starts were more frequent than difficult starts [t (17)=225.16, p<0.001] and structural and
difficult starts combined were more frequent than other starts [t (17)=373.91, p<0.001]. These contrasts
also revealed that the predominance of structural starts over difficult starts was greater for pianists at higher
levels of ability [t (17)=263.33, p<0.001].
To explore the extent to which structural and difficult bars guided the pianists’ practice throughout the
learning process, the deviations of the observed from expected frequency of structural and difficult starts
were examined at three discrete stages of practice. Stage 1 included values for each pianist’s first three
practice sessions; Stage 2 included values for the middle three practice sessions; Stage 3 included values
for the last three practice sessions. The mean values for δs and δd at the three stages for each ability level
are also displayed in Table 1.
Table 1. Means for δs, δd and δo for each level of ability at each Stage of practice.
The values were analysed by two two-factor mixed ANCOVAs with the deviation of the observed from
expected starts on structural and difficult bars as the respective dependent variables, stage as the
within-subjects independent variable, level as the between-subjects independent variable and age as the
covariate. For δs, the analyses revealed significant main effects of stage [F (2,34)=346.70, p<0.001] and
level [F (3,17)=529.87, p<0.001] and a significant interaction between stage and level [F (6,34)=114.90,
p<0.001]. Further polynomial contrasts (i.e. comparisons of Stage 1 versus Stage 3 and Stages 1 & 3
combined versus Stage 2) revealed that these deviations increased linearly across the practice process
[t (17)=912.49, p<0.001] and that they increased most for pianists at higher levels of ability [t (17)=263.34,
p<0.001]. For δd, the analyses revealed significant main effects of stage [F (2,34)=64.79, p<0.001] and level [F
(3,17)=62.44, p<0.001] and a significant interaction between stage and level [F (6,34)=78.21, p<0.001].
Further polynomial contrasts (i.e. comparisons of Stage 1 versus Stage 3 and Stages 1 & 3 combined
versus Stage 2) revealed that these deviations were greater in Stage 1 than in Stage 3 [t (17)=96.22,
p<0.001] and that this decrease occurred most for pianists at higher levels of ability [t (17)=59.61,
p<0.001].
Discussion
In sum, the data indicate that (1) pianists at all ability levels started their practice on "structural" bars more
frequently than on "difficult" and "other" bars, (2) the overall use of structural bars in starting practice
segments increased significantly with stage of practice and ability level and (3) pianists started their
practice on difficult bars less frequently from Stage 1 to 3. The results suggest, therefore, that the
identification and implementation of structure in guiding practice is a salient characteristic of musical skill
and becomes even more so as a function of expertise. Moreover, they demonstrate that the influence of
difficult bars in directing practice was increasingly replaced by the use of structural bars to guide rehearsal
(i.e. "difficult" starts significantly decreased across the three stages; "structural" starts significantly
increased across the three stages).
The findings give some insight into the use of "structure" in guiding performance. Existing research
suggests that if individuals use a retrieval scheme during performance, then they must use the same scheme
to encode the information (Tulving & Pearlstone, 1967; Baddeley, 1990) and must practise using it to
guide retrieval (Ericsson & Kintsch, 1995). Considering (1) that the identification of structural bars was
based on the pianists’ reports of sections in the music that were important in both practice and performance
and (2) that these bars were increasingly exploited across the practice process, one could argue that this
exploitation was not only important for the encoding of musical information but also for its retrieval.
In general, these findings support the arguments of Chase and Ericsson (1982) and Ericsson and Kintsch
(1995) in that musical performers appear to implement hierarchical retrieval structures in practice so that
they may use them to guide retrieval in performance. This study goes beyond existing research by
examining retrieval structures across more than just one performer and across several levels of skill,
demonstrating the generalisability of Chaffin and Imreh’s (1994, 1996a, 1996b, 1997) findings to other
skilled musicians and documenting how the exploitation of hierarchical retrieval schemes in practice and
performance develops as a function of expertise.
References
Allard, F., Graham, S., & Paarsalu, M. E. (1980). Perception in sport: Basketball. Journal of
Sport Psychology, 2, 14-21.
Anderson, J. R. (1990). Cognitive Psychology and Its Implications. San Francisco: Freeman.
Baddeley, A. D. (1990). Human Memory: Theory and Practice. Boston: Allyn & Bacon.
Carpenter, P. A., & Just, M. A. (1989). The role of working memory in language
comprehension. In D. Klahr & K. Kotovsky (Eds.), Complex Information Processing: The
Impact of Herbert A. Simon. Hillsdale, NJ: Lawrence Erlbaum Associates.
Chaffin, R., & Imreh, G. (1994). Memorizing for performance: a case study of expert
memory. Paper presented at the Third Practical Aspects of Memory Conference. University of
Maryland.
Chaffin, R., & Imreh, G. (1996a). Effects of difficulty on practice: a case study of a concert
pianist. Poster presented at the Fourth International Conference on Music Perception and
Cognition. McGill University: Montreal, Canada.
Chaffin, R., & Imreh, G. (1996b). Effects of musical complexity on expert practice: a case
study of a concert pianist. Poster presented at the Meeting of the Psychonomic Society.
Chicago, IL.
Chaffin, R., & Imreh, G. (1997). "Pulling Teeth and Torture": Musical Memory and Problem
Solving. Thinking and Reasoning, 3, 315-336.
Chase, W. G., & Ericsson, K. A. (1982). Skill and working memory. In G. H. Bower (Ed.),
The Psychology of Learning and Motivation (Vol. 16). New York: Academic Press.
Chase, W. G., & Simon, H. A. (1973a). Perception in chess. Cognitive Psychology, 4, 55-81.
Chase, W. G., & Simon, H. A. (1973b). The mind’s eye in chess. In W. G. Chase (Ed.), Visual
Information Processing. New York: Academic Press.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of
Verbal Learning and Verbal Behavior, 8, 240-248.
Collins, A. M., & Quillian, M. R. (1970). Does category size affect categorisation time?
Journal of Verbal Learning and Verbal Behavior, 9, 432-438.
de Groot, A. (1946/1978). Thought and Choice in Chess. The Hague: Mouton. (Original work
published in 1946).
Deakin, J. M. (1987). Cognitive Components of Skill in Figure Skating. Unpublished PhD
thesis. University of Waterloo.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review,
102, 211-245.
Ericsson, K. A., Krampe, R. Th., & Tesch-Römer, C. (1993). The role of deliberate practice in
the acquisition of expert performance. Psychological Review, 100, 363-406.
Ericsson, K. A., & Staszewski, J. J. (1989). Skilled memory and expertise: Mechanisms of
exceptional performance. In D. Klahr & K. Kotovsky (Eds.), Complex Information
Processing: The Impact of Herbert A. Simon. Hillsdale, NJ: Lawrence Erlbaum Associates.
Goodman, R. (1957). Teach Yourself Statistics. London: English University Press.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In
D. E. Rumelhart, J. L McClelland & The PDP Research Group (Eds.), Parallel Distributed
Processing: Foundations (Vol. 1). Cambridge, MA: MIT Press.
Kendall, M. G., & Stuart, A. (1963). The Advanced Theory of Statistics (Vol. 1). London:
Charles Griffin and Company.
Kosslyn, S. M. (1980). Image and Mind. Cambridge, MA: Harvard University Press.
Kosslyn, S. M. (1981). The medium and the message in mental imagery: A theory.
Psychological Review, 88, 44-66.
Proceedings paper
Introduction
Modern technological advances mean that now, more than at any other time in history, music is
pervasive and functions not only as a pleasurable art form, but surrounds many activities of daily life
(Mertz, 1998). Indeed, a recognition of the ubiquitous presence of music has been one of the factors
motivating significant research interest in the effects of music listening on a range of psychological
and physiological variables. Consequently, there has been significant debate within the psychology of
music literature, with a wide range of effects of music listening suggested by researchers (Overy, 1999).
These include effects on cognitive skills (Rauscher, 2000), on heart rate and blood pressure (Aldridge,
1996), on cortisol and norepinephrine levels (Vander Ark & Mostardi, 2000), on emotional responses
(Radocy & Boyle, 1997) and on consumer behaviour (North and Hargreaves, 1997).
Recent advances in the psychology of music have begun to highlight the importance of researching
the impact of factors beyond the nature of the music itself on music perception. Issues such as social,
environmental and cultural influences have been investigated, but there is a need for more research
which focuses on the impact these factors might have (Hodges and Haack, 1996; Miell and
MacDonald, in press). Social and cultural influences such as peer groups, the family and the listening
context have been highlighted as key areas of research interest (Hargreaves and North, 1997) and it is
important to further develop our knowledge of how these and other extra-musical factors impact upon
our responses to musical stimuli.
When examining the role of these extra-musical influences it is important to consider the impact of
popular music culture. Through, for example, television, radio, the film industry, magazines etc
popular music culture plays an influential role in everyday life (Frith, 1987). Such is the impact of
popular music culture that it could be seen as constituting an 'informal learning environment' - that is, a
means through which we all learn and develop our preferences for music (Folkestad, 1998). One
example of the influence of popular music culture can be seen in how children develop their
musical skills, preferences and knowledge. Some authors have speculated on whether these informal
learning environments have more to do with a child's developing musicality than the conventional
classroom setting for music teaching (Folkestad, 1998). Folkestad suggests that previous research has
tended to view a child's musical development as intrinsically linked to the school or institutional
environment and he argues that, instead, the impact of popular music culture is such that we need to
move away from this narrow definition of musical development and include an analysis of cultural
influences such as pop music on the child's developing musicality. This has direct relevance for music
education research, as it is important to consider this wider social context in which children listen to,
play and learn about music (Miell and MacDonald, in press). These research priorities have relevance
not only for children but also for adults.
The points raised above have set a context highlighting the extra-musical aspects of music perception,
emphasising the role that popular culture has in this process. At a more specific level, it is also
important to note that popular music is the predominant genre in modern popular culture.
Its centrality in the personal identity of adolescents has been reported by Zillmann and Gan (1997)
and, as a pervasive art form, it is present while we perform many day-to-day activities, from shopping to
drinking in a bar (Hodges and Haack, 1996). However, as Hargreaves and North (1997) suggest, this
influence is not reflected within the research literature that investigates the psychology of music at a
general level or the effects of music listening at a more specific level. The aim of this paper is to
highlight, from a psychological perspective, the importance that popular music should have for
researchers interested in the psychological effects of music. I will do this by reviewing existing
literature and presenting two studies that have relevance to this topical issue.
Method
Overviews of two music listening studies are presented here. These two studies are chosen as they
involve investigations of the effects of listening specifically to popular music, extending similar
studies of listening which have been undertaken using classical music. The first study was a small
scale student project while the second study was part of a project funded by the Scottish Executive
investigating the effects of listening to music in a hospital setting for the purposes of pain and anxiety
reduction.
Study one
This study investigated the emotional changes taking place after listening to different types of music
and also compared the effect of listening to music in different locations. Previous research has
suggested that the listening context is a crucial variable in how we interpret music
(North and Hargreaves, 1997). We were interested in comparing emotional responses to popular versus
classical music. While there is a long history of research investigating emotional responses to music
(Meyer, 1956; Sloboda, 1985), highlighting issues such as the emotionality embedded in structural features
of music, it is suggested here that much of this research has neglected popular music.
Thirty participants listened to 4 pieces of popular music and 4 classical pieces. The four pieces in the
pop category were: The Verve, Bitter Sweet Symphony; Robbie Williams, Let Me Entertain You;
Massive Attack, Teardrop; and Skunk Anansie, Selling Jesus. The four pieces in the classical condition
were extracts from Tchaikovsky's Symphony No.6., Mozart's Magic Flute, Bellini's Norma and
Handel's Hallelujah Chorus. The experiment was counter-balanced to ensure that participants listened
to the pieces in different orders and half the participants listened to the music in their home setting
first whilst half listened to the pieces in a lab setting first.
Participants completed a Profile of Mood States scale (POMS; McNair, Lorr, & Droppleman, 1992) before
and after listening to each piece and an aggregate of mood change was generated. The results
demonstrated that, in general, listening to the popular music produced significantly greater changes in
mood in comparison to the classical music (t(29)=2.51, p<.05). The direction of the emotional change
was dependent upon the piece. For example, listening to the Robbie Williams track, 'Let me entertain
you' produced the biggest positive change within the 'vigour' category of the POMS. When comparing
the changes of mood in participants listening to music in the home compared to the lab, participants
listening to music in the home produced significantly greater changes in mood (t(29)=3.79, p<.01). In
cases where there was a significant difference, more extreme effects of the same emotion were
obtained when participants were in the home as opposed to in the lab.
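As a rough sketch of the kind of paired comparison reported here, the following computes a paired t statistic on hypothetical aggregate POMS change scores; the numbers are illustrative only, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for two matched sets of scores."""
    diffs = [a - b for a, b in zip(x, y)]
    # t = mean difference divided by its standard error
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical aggregate mood-change scores for six listeners,
# one value per listener for each genre (illustrative numbers).
pop_change = [8, 5, 7, 6, 9, 4]
classical_change = [3, 2, 4, 5, 3, 2]

t = paired_t(pop_change, classical_change)  # positive t: larger change for pop
```

With real data the statistic would be compared against the t distribution with n - 1 degrees of freedom, as in the t(29) values reported above.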
The participants (all undergraduate students) were, in general, more familiar with the pieces in the
popular category and expressed more liking for them than the classical compositions. One of the
purposes of the experiment was to present pre-selected music in order to make comparisons of
emotional responses at a general level and this result demonstrates that participants were more likely
to prefer and be more familiar with popular music. Familiarity with the music is obviously an
influential factor in determining an individual's emotional responses to a piece of music. However, in
many studies the popular genre is not presented at all, and issues of personal preference and familiarity
are often ignored. The implications of this for results of existing research will be considered further in
the presentation. In terms of the results of this study, participants were more familiar with popular
music, they appeared to like this style of music over the classical and it produced the biggest
emotional changes for the listeners. In addition, listening to the music in their home environment also
produced a bigger emotional change than listening to the music in the laboratory setting. Thus the
listening context is very important to take into consideration when investigating emotional responses
to music, highlighting the importance of environmental and extra-musical factors.
Study Two
In this study participants undergoing surgical operations were invited to select music from their
personal collection to listen to during the post-operative recovery period (approximately 24 hours).
Many authors, from a range of health care professions, advocate a multidisciplinary approach to pain
management that encourages the use of non-pharmacological and non-invasive strategies such as
relaxation, biofeedback and psychological coping techniques (Standley, 1986; 1995). It has been
suggested that music may play a key role in helping to reduce perceptions of pain for patients in
hospital settings (Chesky, Michel and Kondraske, 1996). These findings are in accord with
meta-analyses reported by Standley (1986, 1995). In addition, a recent survey of music therapists
(n=348) by the American National Association of Music Therapy (NAMT) reported that 45% of
respondents used music specifically for pain management with elderly individuals, individuals in a
hospital setting and physically disabled populations (Michel & Chesky, 1996). Gardner and Licklider
(1960) report an empirical study in which listening to music had a significant effect on pain
perception. In their experiment, 25% of the participants (n=1000) reported that, as a result of listening
to music, they did not require analgesia during dental treatment. Herth (1978) also reports reduced
analgesic requirements in hospitalized patients listening to music. Specific physiological effects of
listening to music in clinical settings have been investigated extensively by Spintge (1985), who
reports changes in patients listening to music, specifically, in a number of neuroendocrinological
measures, including blood pressure and plasma levels of noradrenaline. It is suggested that these
changes are indicative of an anxiolytic effect. Koch and Kain (1998) found that patients who listened
to music while undergoing urologic procedures required less intra-operative sedation and analgesia
than participants in a control group.
Although the number of empirical projects investigating the process and outcomes of this type of
intervention is growing, many authors have suggested that there is still a need for rigorous evaluation
studies (MacDonald, O'Donnell and Davies, 1999; Radhakishnan, 1991; Purdie, 1997; Standley,
1992). Such evaluations can help us understand in more detail the specific effects that music listening
can have within clinical and hospital settings. This study is focussed on evaluating the possible
therapeutic effects of music listening. In many of the experimental studies cited above the precise
nature of the music is not specified and music is pre-selected with an assumption that playing this
pre-selected music will be relaxing for all participants. It is important that we have a clearer
knowledge about the types of music chosen and the individual interpretations and evaluations made
by participants. For this study, the music was selected by the participants from their own personal
collections. Thus, we were interested in subjective responses to particular pieces of music and not
responses to pre-selected music chosen for its apparently general relaxing or pain-reducing
effects. With these issues in mind, this experiment was designed to investigate whether
music listening could have a beneficial effect for patients undergoing minor operations in terms of
reduced anxiety perceptions.
Participants
There were 17 participants in an experimental group (11 females & 6 males), and 23 participants in a
control group (14 females & 9 males).
Equipment
The Spielberger trait anxiety inventory was used to measure participants' levels of anxiety
(Spielberger, 1983). This assessment instrument is a commonly used and well validated measure of
anxiety (Anastasi, 1990).
Procedure
Patients wishing to take part in the study were asked to sign a consent form and randomly assigned to
either the experimental or the control group. Participants in the experimental group were asked to
select an audiocassette(s) from their personal collection to listen to via personal stereo after the
operation. On the day of the operation, baseline measurements were made on all assessment
instruments prior to the operation (Time 1). All participants then underwent minor foot surgery and,
shortly after the operation, measurements were once again taken (Time 2). Participants were again
assessed on the measurement instruments 4 hours after the operation (Time 3). Although patients were
not required to listen to their music for a set period of time, they were encouraged to listen to the
music as much as possible. All patients listened to music for at least 45 minutes during the 4-hour
postoperative assessment phase. Descriptive statistics for the experimental and control group are
presented in Table 1. A 3 (Time 1, Time 2, Time 3) x 2 (Control, Experimental) ANOVA
produced a significant interaction effect [F(2,76)=65.36, p<.01], highlighting the anxiolytic effects of
music listening.
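To illustrate the shape of such a Time x Group interaction without reproducing the full ANOVA, the sketch below computes cell means for a hypothetical 3 x 2 layout of anxiety scores in which the experimental (music) group's anxiety falls more sharply across the three assessment points. All numbers are invented for illustration.

```python
# Hypothetical state-anxiety scores (three patients per cell) at the three
# assessment points; illustrative only, not the study's data.
scores = {
    "control":      {"Time 1": [44, 46, 43], "Time 2": [41, 42, 40], "Time 3": [40, 41, 39]},
    "experimental": {"Time 1": [45, 44, 46], "Time 2": [36, 35, 37], "Time 3": [30, 31, 29]},
}

def cell_means(data):
    """Mean anxiety score for each group x time cell."""
    return {group: {t: sum(v) / len(v) for t, v in times.items()}
            for group, times in data.items()}

means = cell_means(scores)

# Pre-to-post drop in anxiety for each group; an interaction shows up as
# a larger drop in one group than the other.
drop = {g: means[g]["Time 1"] - means[g]["Time 3"] for g in means}
```

In this invented layout the experimental group's anxiety drops far more than the control group's, which is the crossover pattern a significant interaction term would reflect.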
Table 1. Descriptive statistics for the control and experimental groups.
associational meaning for the participants, in that the piece of music selected reminded the
participants of happy memories or positive, relaxed, calm feelings. Indeed, analysis of qualitative data
obtained by asking the participants why they selected their chosen music supports this suggestion.
Many participants said that they selected a particular piece of music because it would help them to
relax in the hospital, as it was a piece they particularly enjoyed listening to at home. The wider
implications of these findings for other studies of music listening will be discussed.
Conclusion
Research that investigates the cognitive and emotional effects of music should take into consideration
that popular music is pervasive and in many cases will be the preferred style of music for participants
to listen to. For example, experiment two highlighted that when given the opportunity to select music
for explicitly clinical purposes, i.e. pain and anxiety reduction, participants overwhelmingly chose
popular music, but that whatever music they chose (i.e. whichever genre they preferred) had beneficial
effects on their perceptions of anxiety. Asking participants about their musical preferences during
these types of study should therefore be an important feature of this type of work. Although studies do
demonstrate that one particular piece of music can have a very specific general effect, it is important to
take personal preference into account.
Reductionist interpretations of music cognition that view music as a unitary stimulus that will have
certain general effects at perceptual, cognitive and even neurological levels should take these issues
into consideration. The complex nature of music listening involves not only structural features of the
music but also social, cultural and environmental factors and, while the precise psychological
mechanisms involved in how individuals interpret music remain to be explored in more detail, these
results do emphasise the extra-musical (e.g. learned and associational) aspects of music listening as
opposed to the intra-musical (e.g. structural) features.
It is important for researchers to consider the impact of popular culture and popular music. As
previously suggested, we are all exposed to these influences, and they can be thought of as constituting
an informal learning environment. This exposure, as well as experiences with family and friends, plays
an important role in developing an individual's specific musical preferences and abilities.
Consequently, research can be enriched by including some analysis of these influences in undertaking
studies of music perception.
References
Aldridge, D. (1996). Music Therapy Research and Practice in Medicine: From Out of the Silence.
London: Jessica Kingsley.
Anastasi, A. (1990). Psychological Testing. New York: Maxwell Macmillan.
Chesky K. S., Michel, D. D., & Kondraske, G. V. (1996). Developing methods and techniques for
scientific and medical applications of music vibration. In R. R. Pratt & R. Spintge (Eds.),
MusicMedicine ,vol. 2 (pp. 227-241). Saint Louis: MMB
Folkestad, G. (1998). Musical learning as cultural practice as exemplified in computer-based creative
music making. In B. Sundin, G. E. McPherson & G. Folkestad (Eds.), Children Composing (pp.
97-135). Malmö: Lund University.
Frith, S. (1987). The industrialization of popular music. In J. Lull (Ed.), Popular Music and
Communication (pp. 53-77). CA: Sage.
Gardner, W. J., & Licklider, J. C. R. (1960). Auditory analgesia in dental operations. The Journal of
the American Dental Association, 59, 1144-1149.
Hargreaves, D. J., & North, A. C. (Eds.) (1997). The Social Psychology of Music. Oxford: Oxford University Press.
Herth, K. (1978). The therapeutic use of music. Supervisor Nurse, 9, 22-23.
Hodges, D. A., & Haack, P. (1996). The influence of music on human behaviour. In D. A.
Hodges (Ed.), Handbook of Music Psychology (pp. 467-557). Texas: IMR.
Koch, M. E., & Kain, Z. N. (1998). The sedative and analgesic sparing effect of music. Anesthesiology,
89, 300-306.
MacDonald, R. A. R., O'Donnell, P. J., & Davies, J. B. (1999). Structured music workshops for
individuals with learning difficulty: an empirical investigation. Journal of Applied Research in
Intellectual Disabilities, 12(3), 225-241.
McNair, D. M., Lorr, M., & Droppleman, L. F. (1992). Profile of Mood States Manual. San Diego:
Educational and Industrial Testing Service.
Mertz, M. (1998) Some thoughts on music education in a global culture. International Journal of
Music Education, 32, 72 -78.
Melzack, R. (1980). Psychological aspects of pain. Pain, 8, 143-145
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Michel, D. E., & Chesky, K. S. (1996). Music and music vibration for pain relief: standards in
research. In R. R. Pratt & R. Spintge (Eds.), MusicMedicine, vol. 2 (pp. 218-226). Saint Louis: MMB.
Miell, D. & MacDonald, R.A.R. (in press). Children's creative collaborations: The importance of
friendship when working together on a musical composition. Social Development.
North, A. C., & Hargreaves, D. J. (1997). Music and consumer behaviour. In D. J. Hargreaves &
A. C. North (Eds.), The Social Psychology of Music (pp. 268-290). Oxford: Oxford University Press.
Overy, K. (1999). Can music really improve the mind? Psychology of Music, 26, 97-103.
Purdie, H. (1997). Music therapy with adults who have traumatic brain injury and stroke. British
Journal of Music Therapy, 11, 45-50.
Radhakishnan, G. (1991). Music therapy - A review. Health Bulletin, 49(3), 195-199.
Rauscher, F. (2000). Musical influences on spatial reasoning: experimental evidence of the Mozart
Effect. Paper presented at the biannual conference of the Society for Research in Psychology of Music
and Music Education, Leicester, UK.
Radocy, R.E. and Boyle, J.D. (1997) Psychological Foundations of Musical Behaviour, 3rd edition,
Springfield, Illinois: C.C. Thomas
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon
Press.
Spielberger, C. D. (1983) State trait anxiety inventory. Palo Alto, CA: Consulting Psychologists Press.
Standley, J. M. (1995). Music as a therapeutic intervention in medical and dental treatment: Research
and clinical applications. In T. Wigram, B. Saperston, & R. West, (Eds.), The art and science of music
therapy (pp. 3-22). London: Harwood Academic Publishers.
Standley, J. M. (1992). Meta-analysis of research in music and medical treatments: Effect size as a
basis for comparison across multiple dependent and independent variables. In R. Spintge & R. Droh,
(Eds.), MusicMedicine (pp. 345-349). Saint Louis: MMB.
Standley, J. M. (1986). Music research in medical/dental treatment: Meta analysis and clinical
applications. Journal of Music Therapy, 23, 56-122.
Spintge, R. (1985). Some neuroendocrinological effects of so-called anxiolytic music. International
Journal of Neurology, 19, 186-196.
Vander Ark, S. D., & Mostardi, R. (2000). Physiological effects of music on the human organism. Paper
presented at the biannual conference of the Society for Research in Psychology of Music and Music
Education, Leicester, UK.
Zillmann, D., & Gan, S. (1997). Musical taste in adolescence. In D. J. Hargreaves & A. C. North (Eds.),
The Social Psychology of Music (pp. 161-188). Oxford: Oxford University Press.
Proceedings paper
Elvira Brattico1, Risto Näätänen1, Tony Verma2, Vesa Välimäki2, & Mari Tervaniemi1
1 Cognitive Brain Research Unit, Department of Psychology, University of Helsinki, Finland
2 Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo,
Finland
INTRODUCTION
In all musical cultures the human auditory system appears to be differentially sensitive to certain
musical intervals. This differential response to sounds is known to be a function of the frequency ratios
between partials or tones. In particular, since the Greek philosopher Pythagoras, the psychological
phenomenon of consonance has been associated with the simplicity of the frequency ratio between
tones. Helmholtz was the first to give a physiological explanation for this ancient observation. He
argued that two complex tones in a complex frequency ratio produce beats, thus provoking the
simultaneous activation of adjacent hair cells of the organ of Corti. According to Helmholtz, the
overload of the input sent to the brain caused the characteristic disturbance of interval sensation
(Helmholtz, 1954; see also Brattico, 1996; Brattico, 1998). More recently, these ideas, and especially
the V-curve of dissonance as a function of the frequency separation of two sine components, were
confirmed almost completely (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a; Kameoka &
Kuriyagawa, 1969b). In these recent studies, the maximal dissonance was found at 10% frequency
difference in the middle octave with equal SPL, and at 15% frequency difference in the octave below
the middle one for simple tone dyads (Kameoka & Kuriyagawa, 1969a). A different explanation of
the perceptual effect of consonance and dissonance was also offered by Helmholtz and, similarly but
from a phenomenological point of view, by Stumpf. The former attributed the greater pleasantness of
certain intervals or chords to their tonal affinity (Klangverwandtschaft), i.e. to the actual
coincidence between partials. On the other hand, according to Stumpf (for a review, see Schneider,
1997), the so-called tonal fusion (Verschmelzung) corresponded to the "psychological" identification
of two or more sounds as one (see also Dewitt & Crowder, 1987). A more recent interpretation of the
phenomenon has also been suggested, in which dissonance was identified with the psychological
complexity of certain chords as a function of their frequency ratio and of the listeners’
Twelve paid subjects, unselected with regard to their musical background, took part. One
subject was rejected because of noisy EEG. Of the remaining subjects, 4 were male (age range
18 to 27 years; mean age 22). Two subjects reported having had formal music lessons (one in piano for
ten years and the other in singing for five years).
Two conditions were used (see FIGURE 1). In the Tritone condition, three blocks were presented,
each containing the same standard but a different deviant. The first block consisted of a sequence of
standard fifth intervals (g3-d4; interval width: 7 semitones) (p=0.8) interrupted by the deviant
augmented fourth or "tritone" interval (g3-c#4; interval width: 6 semitones) (p=0.2). In the second
block the deviants (same probability) were perfect fourths (g3-c4; interval width: 5 semitones), and in
the third block major sixths (g3-e4; interval width: 9 semitones). In the Seventh condition, the
standard sounds were major sixths (g3-e4; interval width: 9 semitones) (same probabilities as before).
The deviants in the three blocks were, respectively, major sevenths (g3-f#4; interval width: 11
semitones), perfect fifths (g3-d4; interval width: 7 semitones), and perfect octaves (g3-g4; interval
width: 12 semitones). Each deviant interval was presented in a separate block in order not to create
any harmonic context. Moreover, standards of the same pitch were chosen in light of the observation
that consonance ratings change across frequency ranges (see Kameoka & Kuriyagawa, 1969b). In
both the Tritone and Seventh conditions, the interval-width difference between the consonant
standard and the dissonant deviant was one or two semitones smaller than that between the consonant
standard and the consonant deviants.
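For reference, the equal-tempered frequency ratio of each dyad used in the two conditions follows directly from its interval width in semitones (ratio = 2^(n/12)); the listing below simply tabulates the widths given above:

```python
# Equal-tempered frequency ratio for each dyad; ratio = 2 ** (semitones / 12).
dyads = {
    "Tritone condition": {
        "standard fifth (g3-d4)": 7,
        "tritone deviant (g3-c#4)": 6,
        "fourth deviant (g3-c4)": 5,
        "sixth deviant (g3-e4)": 9,
    },
    "Seventh condition": {
        "standard sixth (g3-e4)": 9,
        "seventh deviant (g3-f#4)": 11,
        "fifth deviant (g3-d4)": 7,
        "octave deviant (g3-g4)": 12,
    },
}
for condition, intervals in dyads.items():
    print(condition)
    for name, semitones in intervals.items():
        print(f"  {name}: {semitones} semitones, ratio {2 ** (semitones / 12):.4f}")
```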
In both conditions, piano test sounds from the McGill University Master Samples recordings were
employed (McGill University Master Samples, Volume 3, McGill University, 1988;
http://www.music.mcgill.ca/resources/mums/html/mRecTech.html) (see FIGURE 1). For each note,
the first 350 ms of the sound was extracted from the longer recorded signal. To avoid audible clicks
by ensuring a smooth release, the final 10 ms of the 350 ms signal was multiplied by an exponentially
decaying envelope. The test signals were then normalized to contain the same energy over their
350 ms duration. The silent inter-stimulus interval lasted 600 ms. Stimuli were presented to the
subjects at 50 dB above the individually determined hearing threshold. In all conditions, the stimuli
were presented with the Brain Stimulator software (designed at the Cognitive Brain Research Unit)
and delivered binaurally via headphones with equal phase. Each block consisted of 1000 trials. The
presentation order of the blocks was randomized across subjects, always keeping a distance between
blocks in which the same two intervals served as standard and deviant. This specific case arose
between the block of the Tritone condition in which the standard was a perfect fifth and the deviant a
major sixth, and the block of the Seventh condition in which the same intervals appeared with their
roles reversed. The subjects were instructed to ignore the stimuli and to concentrate on watching a
silent movie of their own choice.
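The preparation of the test sounds (truncation to 350 ms, a 10 ms exponential release, and energy normalization) can be outlined as follows; the sampling rate and the decay constant are assumptions, since neither is specified above:

```python
import numpy as np

def prepare_stimulus(x, fs=44100, dur_s=0.35, fade_s=0.01):
    """Truncate to 350 ms, apply a 10 ms exponential release, and
    normalise energy (fs and the decay rate are assumptions)."""
    n = int(round(dur_s * fs))
    n_fade = int(round(fade_s * fs))
    y = x[:n].astype(float)
    # An exponentially decaying gain over the final 10 ms ensures a
    # smooth release and so avoids an audible click.
    y[-n_fade:] *= np.exp(-np.linspace(0.0, 5.0, n_fade))
    # Equal energy over the 350 ms duration for every test signal.
    return y / np.sqrt(np.sum(y ** 2))

fs = 44100
tone = np.sin(2 * np.pi * 196.0 * np.arange(fs) / fs)  # g3 is roughly 196 Hz
stim = prepare_stimulus(tone, fs)
print(len(stim), float(np.sum(stim ** 2)))
```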
The electroencephalogram (EEG; bandpass 0.1-30 Hz) was recorded from the fronto-central
electrodes Fz, Cz, Pz, and from L1, L2, R1, and R2 (the one- and two-third sites on the arc connecting
Fz with the mastoids on the left and right sides). Vertical and horizontal eye movements were
monitored with electrodes placed lateral to and below the right eye. All electrodes were referenced to
an electrode placed on the tip of the nose. The impedance was kept below 10 kΩ (at 30 Hz)
throughout the experiment. The epoch considered for averaging ranged from 100 ms before to 400 ms
after stimulus onset. Epochs with EEG or EOG artefacts exceeding ±75 µV at any electrode were
rejected from averaging. To familiarize the subjects with the sounds and to avoid novelty effects on
the first trials, the first 15 stimulus trials of the first block and the first 10 trials of the other blocks
were also excluded from averaging. Moreover, the stimuli of the first block of the experiment were
used to measure the hearing threshold of each subject. ERPs were averaged separately for each
stimulus type and condition and then digitally filtered (passband 1-20 Hz; slope 24 dB/octave). The
baseline was taken from 100 ms before stimulus onset to the attack of the sound. For each subject, the
recordings for each condition contained no fewer than 110 accepted responses to the deviant stimuli.
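The epoching and artefact-rejection procedure described above (100 ms pre- to 400 ms post-stimulus epochs, rejection of epochs exceeding ±75 µV on any channel, baseline correction over the pre-stimulus interval) can be outlined as follows; the sampling rate and the synthetic data are assumptions for illustration:

```python
import numpy as np

def epoch_and_average(eeg, onsets, fs=500, pre=0.1, post=0.4, reject_uv=75.0):
    """Cut 100 ms pre- / 400 ms post-stimulus epochs, drop any epoch
    exceeding +/-75 uV on any channel, baseline-correct to the
    pre-stimulus interval, and average.  eeg is channels x samples in uV;
    fs is an assumption."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    kept = []
    for s in onsets:
        ep = eeg[:, s - n_pre : s + n_post]
        if np.abs(ep).max() > reject_uv:      # artefact rejection
            continue
        ep = ep - ep[:, :n_pre].mean(axis=1, keepdims=True)  # baseline
        kept.append(ep)
    return np.mean(kept, axis=0), len(kept)

rng = np.random.default_rng(0)
eeg = rng.normal(0, 5, size=(3, 5000))      # 3 channels of low-level noise
eeg[0, 2040] = 200.0                        # blink-like artefact in one epoch
erp, n_kept = epoch_and_average(eeg, onsets=[1000, 2000, 3000, 4000])
print(erp.shape, n_kept)
```

The epoch around the second onset contains the simulated blink and is rejected; the remaining three epochs are baseline-corrected and averaged.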
The MMN was calculated by subtracting the responses to the standards from the responses to the
deviants. In each condition, MMN peak latencies were determined from the 80-220 ms window in the
individual difference waveforms. The MMN mean amplitudes were measured from 40 ms windows
centered on the individual peaks. The statistical significance of the MMN was evaluated by
comparing the MMN amplitude at the Fz, Cz, L1, and R1 electrodes to zero with one-tailed t-tests. In
the same way, the mean amplitudes of the positivity reversal were individually determined at both
mastoid leads, and their significance was likewise evaluated.
In order to analyze MMNs with maximized amplitude, the ERPs were re-referenced to the average of
the left and right mastoids. MMN individual peak latencies for the re-referenced waves were also
measured from the 80-220 ms window, and the MMN mean amplitudes of the re-referenced
waveforms were then calculated from 40 ms windows centered on the individually determined peaks.
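The MMN quantification described above (deviant-minus-standard difference wave, peak latency within the 80-220 ms window, mean amplitude over a 40 ms window centred on the peak) can be outlined as follows; the sampling rate and the toy waveform are assumptions:

```python
import numpy as np

def mmn_measures(dev, std, fs=500, pre_ms=100):
    """Peak latency (80-220 ms) and mean amplitude (40 ms window round
    the peak) of the deviant-minus-standard difference wave.  fs and the
    epoch timing are assumptions consistent with the 100 ms pre- /
    400 ms post-stimulus epoch."""
    diff = dev - std
    t = np.arange(diff.size) * 1000.0 / fs - pre_ms   # time in ms, 0 = onset
    win = (t >= 80) & (t <= 220)
    # The MMN is a negativity, so its peak is the window's most negative point.
    peak_idx = np.flatnonzero(win)[np.argmin(diff[win])]
    peak_ms = t[peak_idx]
    around = (t >= peak_ms - 20) & (t <= peak_ms + 20)
    return float(peak_ms), float(diff[around].mean())

# Toy difference wave: a 2 uV negativity centred at 150 ms post-onset.
t = np.arange(250) * 2.0 - 100.0
dev = -2.0 * np.exp(-((t - 150.0) ** 2) / (2 * 20.0 ** 2))
std = np.zeros_like(dev)
lat, amp = mmn_measures(dev, std)
print(lat, round(amp, 2))
```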
The MMN amplitudes and the MMN peak latencies were compared between the different blocks of
each condition at the Fz, Cz, L1, and R1 electrode sites by two-way ANOVAs with repeated measures
(STATISTICA software). Moreover, in order to study laterality effects, a two-way ANOVA was also
performed for the MMN amplitudes at L1 and R1 electrodes only. The significance levels for F values
from ANOVAs were Greenhouse-Geisser corrected when appropriate. Post-hoc tests were conducted
by Newman-Keuls comparisons.
RESULTS
The frequency change elicited significant MMNs in all conditions. The MMN reached its maximal
amplitude over the frontal area, peaking on average at about 150 ms after stimulus onset. The
MMN amplitudes for all conditions differed significantly from zero at Fz, Cz, L1 and R1 (t = 4.2 -
13.4, p < 0.005; one-tailed t-tests). The reversed potential at the left and right mastoids was
significantly different from zero in all conditions (t = 4.6 - 12.6, p < 0.0005; one-tailed t-tests).
MMN amplitude
In the Tritone condition (see FIGURE 2, left panel), the MMN amplitude differences were analyzed
by a two-way ANOVA with Deviant (tritone, fourth, sixth) and Electrode (Fz, Cz, L1, R1) as factors.
Main effects of Electrode [F(3, 30) = 3.49; p < 0.03] and Deviant x Electrode interaction [F(6, 60) =
2.78; p < 0.02] were found. To study possible effects of laterality, as indicated by the significant
interaction, a two-way ANOVA was also performed with Deviant (the same as before) and Electrode
(L1 and R1) as factors. Also in this case, the interaction between the two factors was significant [F(2,
20) = 4.29; p < 0.03]. A post-hoc Newman-Keuls test showed that the tritone MMN was larger than
both fourth and sixth MMNs only at the R1 electrode (p < 0.003, p < 0.03, respectively), while the
three deviants did not differ at the L1 site. Moreover, the tritone MMN was significantly larger at the
right electrode (R1) than on the left one (p < 0.01), while the MMNs to the other deviants did not
show different amplitude between the right and left electrodes.
In the Seventh condition (see FIGURE 2, right panel), the MMN amplitude differences were also
analyzed by a two-way ANOVA with Deviant (seventh, fifth, octave) and Electrode (Fz, Cz, L1, R1)
as factors. Main effects of Deviant [F(2, 20) = 8.53; p < 0.002] and Electrode [F(3, 30) = 3.59; p <
0.03] were found. The Newman-Keuls post-hoc test applied to the main effect of Deviant showed that
the MMN amplitude for the seventh deviant was significantly larger than the MMN amplitudes for
the fifth and octave deviants (p < 0.01, p < 0.002, respectively). In addition, a two-way ANOVA
performed only for the L1 and R1 electrodes (with Deviant as the other factor), followed by a
Newman-Keuls post-hoc test, showed that the seventh MMN was larger than the fifth and octave
MMNs at both the L1 and R1 electrodes (p < 0.002 in all comparisons). Moreover, the MMN
amplitude at R1 was significantly larger than that at L1 for all deviants (p < 0.02-0.04).
MMN latency
In the Tritone condition (FIGURE 2, left panel, bottom), the MMN latencies at the Fz, Cz, L1, and R1
electrode sites were also analyzed by a two-way ANOVA with Deviant and Electrode as factors. A
main effect of Deviant [F(2, 20) = 6.52; p < 0.007] was found. A Newman-Keuls post-hoc test revealed that
the tritone MMN occurred significantly later than the fourth and sixth MMN (respectively, p < 0.02, p
< 0.009), while there was no significant difference between the MMN latency of the fourth and sixth
deviants. On the contrary, in the Seventh condition (FIGURE 2, right panel, bottom), no significant
latency differences were found.
DISCUSSION
The first objective of the present experiment was to investigate the pre-attentively encoded
similarity/dissimilarity between consonant and dissonant intervals. The second parallel objective was
to analyze the way those intervals are processed by the human auditory system. The present data
indicate that dissonant intervals are processed in a specific way when compared to consonant intervals
(the perfect fifth and the major sixth used as standards). In particular, against the sixth standard, the
seventh deviant evoked a larger MMN than the fifth and octave deviants. By contrast, against the fifth
standard, the tritone deviant did not elicit a larger MMN than the fourth and sixth deviants, but a later
one. The slower processing of that interval is probably
due to its frequency ratio complexity. The enhanced MMN amplitude in the case of the seventh
deviant is related to its larger dissimilarity with the standard, but could be also related to the presence
of beats.
Moreover, the present data suggest that the tritone deviant could be processed more in the right
cerebral hemisphere than the other consonant intervals of the same condition (the fourth and the
sixth), being consistent with the results regarding emotional processing by Blood, Zatorre, Bermudez,
& Evans (1999).
Our findings thus suggest that the different ERP responses to musical intervals cannot have been
generated by differences on a physical dimension alone. If that were so, one could not explain why
the MMN latency and laterality differed in response to intervals varying only in their pitch width and
their degree of consonance while all other parameters were kept constant. Future studies should
therefore focus on the localization of the processing of sound combinations differing in their
frequency ratios and in the way they fuse, keeping in mind that the processing of intervals, chords, or
phonemes does not correspond to the quantitative summation of their components but involves
qualitatively new mechanisms.
ACKNOWLEDGEMENTS
We thank Prof. M. Karjalainen, Ms. Kuusi, and Mr. C. Erkut for their support. This study was
supported by the Academy of Finland.
FIGURE LEGENDS
FIGURE 1. Schematic illustration of the stimulation used in this experiment. Left panel, top: Musical
representation of the stimuli used in the Tritone condition. The three bars, representing the three
experimental blocks, contain repeated fifth standard intervals and, respectively, the tritone, fourth, and
sixth deviants. The pause between the sounds represents the silent inter-stimulus interval (ISI) of 600
ms used in the experiment. Left panel, bottom: Physical representation of the stimuli in the Tritone
condition. The first envelope from the top shows the standard and the other envelopes the three
deviants. Right panel, top: Musical representation of the stimuli used in the Seventh condition. The
three bars, representing the three experimental blocks, contain repeated sixth standard intervals and,
respectively, the seventh, fifth, and octave deviants. As in the Tritone condition, the pause between
the sounds represents the silent inter-stimulus interval (ISI) of 600 ms of the experiment. Right panel,
bottom: Physical representation of the stimuli in the Seventh condition. The first envelope from the
top shows the standard and the other envelopes the three deviants.
FIGURE 2. Left panel, top: The MMN difference waves for the three deviants of the Tritone
condition at L1, Fz, and R1 electrodes. Left panel, bottom: MMN mean amplitudes (left) and latency
(right) for the three deviants of the Tritone condition averaged across Fz, Cz, L1, and R1 electrodes.
Right panel, top: MMN to the three deviants of the Seventh condition at L1, Fz, and R1. Right panel,
bottom: MMN mean amplitudes (left) and latency (right) for the three deviants of the Seventh
condition averaged across Fz, Cz, L1, and R1 electrodes.
Proceedings paper
Steven Jan
School of Academic Studies, Royal Northern College of Music, 124 Oxford Road, Manchester M13 9RD, United Kingdom.
E-mail: steven.jan@rncm.ac.uk
At its most radical, the principle of universal Darwinism maintains that some of the most remarkable and powerful things in the
universe are a class of entities Dawkins calls replicators, which he defines as "...anything in the universe of which copies are made.
Examples are a DNA molecule, and a sheet of paper that is xeroxed" (1983: 83). These, he contends, are the fundamental units of
selection in a universe-wide process of Darwinian evolution. A class of such entities is the meme, the subject of the present paper,
defined by Dawkins as a "...unit of cultural transmission...a unit of imitation" (1989: 192).
As a means of understanding the nature of culture, the memetic paradigm has received increasingly serious attention in recent years,
despite attempts from its inception to downgrade it to the status of a "meaningless metaphor" (Stephen Jay Gould, in Blackmore
1999: 17). From the ranks of its advocates, Derek Gatherer has warned of the danger of memetics' becoming "...merely a
meta-narrative having no more right to call itself scientific than dialectics..." (1997: 83). Despite such criticism, this attention has
culminated in three recent book-length studies (Brodie 1996; Lynch 1996; Blackmore 1999) and in the inception, in 1997, of a
dedicated online Journal of Memetics.
If one accepts the validity of the memetic paradigm-that human culture is an ecology of independent particulate entities which
optimize their chances of survival to the degree that they maximize their tendency to imitation-then it is reasonable to attempt to
apply it to music. After all, music is a stream of sound information which, in its generation and perception, is segmented into
discrete, particulate units. Moreover, a memetic perspective on music would draw heavily upon psychology, for our innate perceptual
and cognitive competencies are part of the long-term environment of the meme, and our neural structures their fundamental physical
incarnation.
While constraints of space prevent a detailed exposition of what a memetics of music might constitute, I hope to provide here an
outline of its main premises, with reference to some of the principal concerns of music psychology. In particular, I shall consider how
our innate perceptual and cognitive attributes affect the meme, and how hierarchical aspects of musical structure and perception
relate to the claims of memetics.
As an element of the act of cognition, we subject music to the operation of segmentation, dividing the stream of sound information into
discrete units in order to facilitate processing. Our perceptual and cognitive faculties are attuned to obvious points of articulation, such as
pauses, cadences, and changes in material. More fundamentally, however, our comprehension of patterning is controlled by attributes
identified by the Gestalt tradition of psychology. Deutsch observes that
...we group elements into configurations on the basis of various simple rules.... One is proximity: closer elements are grouped together in preference
to those that are spaced further apart.... Another is similarity.... A third, good continuation, states that elements that follow each other in a given
direction are perceptually linked together.... A fourth, common fate, states that elements that change in the same way are perceptually linked
together. As a fifth principle, we tend to form groupings so as to perceive configurations that are familiar to us.
(1999: 300)
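The proximity principle in Deutsch's list lends itself to a simple operational sketch: segment a stream of note onsets wherever the inter-onset interval exceeds a threshold. The threshold value below is an illustrative assumption:

```python
def group_by_proximity(onsets, gap=0.5):
    """Segment a sequence of note onsets (in seconds) wherever the
    inter-onset interval exceeds a threshold -- a crude model of the
    Gestalt proximity principle (the gap value is an assumption)."""
    groups, current = [], [onsets[0]]
    for prev, nxt in zip(onsets, onsets[1:]):
        if nxt - prev > gap:
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return groups

# Two three-note groups separated by a long pause:
print(group_by_proximity([0.0, 0.25, 0.5, 2.0, 2.25, 2.5]))
# -> [[0.0, 0.25, 0.5], [2.0, 2.25, 2.5]]
```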
Perhaps the most thorough application of such pattern-perception principles to musical analysis is Eugene Narmour's
implication-realization model (1977, 1989, 1990, 1992, 1999), which draws strongly on the Gestalt-inspired groundwork established by
Leonard Meyer (1956, 1973, 1989). The implication-realization model offers means of tracking the note-to-note implicative flux of a
melody and identifying its points of procession and closure at various hierarchical levels. Narmour notes that
...the separate registral and intervallic aspects of small intervals...are said to be implicatively governed from the bottom up by the Gestalt laws of
similarity, proximity, and common direction.... As perceptual-theoretical constants, what is important to notice about the invocation of such Gestalt
laws is (1) that they have been shown to be highly resistant to learning and thus may be innate...; (2) unlike the notoriously interpretive, holistically
supersummative, top-down Gestalt laws of "good" continuation, "good" figure, and "best" organization...the Gestalt laws of similarity, proximity,
and common direction are measurable, formalizable, and thus open to empirical testing....
(1989: 46-47)
The following example illustrates the relationship between grouping principles and memetic replication. The opening of Example 1 i, from
the Act I finale of Mozart's Die Zauberflöte, is replicated at the opening of Schubert's song "Heidenröslein," Example 1 ii. In particular,
Schubert imitates the pitch segment marked x, allowing us to regard it as a meme. According to Narmour's segmentation principles, the
minim a2 in the second bar of each melody is a point of articulation, for
...incisive points of melodic closure creating pitch groupings take place when in the parameter of duration a short note moves to a long note...; we
may mark such durationally cumulative places analytically with the symbol (d) over the closed melodic note, the "d" standing for durational
interference in the continuation of the melodic line....
(1989: 45)
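Narmour's durational-cumulation criterion can likewise be stated operationally: mark a closural point (d) wherever a short note moves to a longer one. A minimal sketch, with durations expressed in beats:

```python
def durational_closures(durations):
    # Indices at which a short note moves to a longer one: Narmour's
    # durationally cumulative points, marked analytically with (d).
    return [i + 1 for i, (a, b) in enumerate(zip(durations, durations[1:])) if b > a]

# Crotchet, crotchet, minim, crotchet, minim (durations in beats):
print(durational_closures([1, 1, 2, 1, 2]))
# -> [2, 4]
```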
Given that the segments are identical, it follows that the implication-realization closural structure of the meme is unchanged. It consists of
three melodic-implicative units: D (reiteration of a pitch), IP (realization of implied intervallic similarity but denial of implied registral
direction), and P (realization of both implied intervallic similarity and registral direction) (for a full explanation of the meaning of these
symbols, see Narmour 1989, 1990, 1992):
Assuming direct imitation by Schubert, it is unlikely that he perceived a segmentation of the first four bars such as
that shown in Example 1 iv, which conflicts strongly with the concept of durational interference; the segmentation given in Example 1 iii
would clearly have had more cognitive reality for him. It will be understood from this simple example that the gene-meme interface is
always significant. To say that a meme will prosper if it takes advantage of the perceptual and cognitive environment provided by genes
(the hierarchical level of laws, discussed in Section 4.1 below) is insufficient; a meme is in large part defined by the template of that
environment.
1. Coequality
As is seen in Example 1 i and ii above, the identification of constituent memes in music is based upon the principle of coequality-the
presence of an overlapping string of data which allows the initial and terminal pitches and medial content of the meme, in both contexts, to
be defined by reference to that segment which is copied. Without the presence of a coequal, a particle would not be a meme; it would be, in
Lynch's terminology, a mnemon-"[a]n item of brain-stored memory. When copied from one brain to another, it becomes a meme" (1998).
It is clear from this example that the longer an imitated passage, the greater the statistical probability that the later copy is a conscious
(self-)quotation. Conversely, many very short coequals, of three notes or more, may be so anonymous as to be hard to situate in any nexus
of imitation, in the absence of what Nattiez terms strong poietic evidence (1990: 10-16). These particles exist as the common currency of a
style, to which all practitioners of a given period and location had access. In western tonal music such patterns (style forms, in Narmour's
term, discussed in Section 4.1 below) include basic scale-degree progressions, such as the pattern 3̂-2̂-1̂, and simple harmonic
progressions, such as the cadential sequence ii6/3-V-I. This continuum of imitative relationships is represented below:
As with the gene, the longer and more complex the meme, the more susceptible it is to fragmentation and miscopying upon replication. It
follows from this that long memes have lower copying-fidelity than short, yet perhaps have higher psychological salience; by contrast,
memes which are too short, perhaps of fewer than three or four elements, lack the necessary prominence which ensures their fecundity
(Dawkins 1989: 18, 194).
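The identification of coequals can be sketched as a search for maximal shared substrings between two pitch sequences, with a minimum length reflecting the three-to-four-element threshold just mentioned. The two pitch strings below are hypothetical placeholders, not the actual Mozart and Schubert melodies of Example 1:

```python
def coequals(a, b, min_len=3):
    """Maximal substrings shared by two pitch sequences: candidate memes.
    Particles shorter than min_len are ignored, reflecting the claim that
    very short patterns lack the prominence needed for reliable imitation."""
    found = set()
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                found.add(tuple(a[i:i + k]))
    # Discard matches that are substrings of a longer match.
    def contained(m, n):
        return m != n and any(n[p:p + len(m)] == m for p in range(len(n) - len(m) + 1))
    return {m for m in found if not any(contained(m, n) for n in found)}

mozart = ["C5", "C5", "E5", "G5", "A5", "A5"]      # hypothetical pitch strings
schubert = ["G4", "C5", "C5", "E5", "G5", "A5"]
print(coequals(mozart, schubert))
```

Run on these placeholder strings, the search returns the single five-note shared segment; a lone pitch sequence with no coequal would, in Lynch's terms, remain a mnemon rather than a meme.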
1. Memetic Hierarchies
The meme exists as part of a rich complex of hierarchic levels which operate in two basic dimensions, cultural and structural. While
the first dimension is the province of the historian and style analyst, the second is the domain of the music theorist; indeed it is
perhaps true to say that over the last century the central preoccupation of music theory has been to model the internal hierarchical
Depending upon its intrinsic psychological salience, a nascent meme arising at the level of intraopus style may eventually
come to be propagated, via the level of idiom (the style of a composer), at the level of dialect-the meme becomes part of the
compositional repertoire of all the composers of a given chronological or geographical locus. The implication-realization
tradition defines the basic uniparametric patterns which populate the dialect as style forms; in their specific contexts-i.e., at the
intraopus level-style forms exist as syntactic, multiparametric style structures. If a meme is propagated in several dialects, it
will contribute to the structure of the system of rules ultimately mediating the replication of all memes at lower hierarchic
levels. Beyond this, however, the level of laws is governed by innate (i.e., genetically-determined) attributes of perception and
cognition, such as the Gestalt principles examined in Section 3.1 above.
The movement of a meme outwards from the centre of Figure 2 is described in epidemiological terms by those
commentators-starting with Juan Delius (1986)-who see memetics as the study of "thought contagion" (Lynch 1996) by
"viruses of the mind" (Brodie 1996). The infectivity-or cultural fitness (Cavalli-Sforza and Feldman 1981: 17)-of a meme is an
index of its intrinsic appeal to the environment of a brain, which is circumscribed both by innate perceptual and cognitive
attributes, and by the receptivity to incursion of the complement of memes already encoded therein.
2. Structural Hierarchies
Memes exist within a work at hierarchic levels other than the immediate foreground; they are generated at higher structural levels by
memes at lower levels. While perhaps the most obvious means of conceptualizing such intraopus hierarchies, the Schenkerian model
has been justly criticized for its axiomatic imbalance-the a priori generation of every structure recursively downwards from the
Ursatz-which is at odds with the interplay between the "bottom-up" and "top-down" operations which characterize musical
processing. By contrast, the implication-realization model discussed in Section 3.1 above is one system which, while capable of
accounting for hierarchical structures, attempts to reconcile top-down and bottom-up mechanisms by taking account of lower-level
procession and closure. Moreover, it lends itself well to mapping memetic replication at various strata within a work.
One fundamental unit in intraopus hierarchies is the style structure (see Section 4.1 above), which might be understood as
...a kind of "theme" that listeners implicatively map from the top down onto incoming foreground "variations." They hear different melodic variants on
lower levels as creating similar structural-tone "themes" on higher levels.
The following diagram represents in abstract fashion how a style structure consists of a constellation of pitch structures, each of
which is demarcated by Gestalt closural principles. These generate (bottom-up), and are perceived in terms of (top-down), a
middleground "theme" at a higher durational level:
Figure 3: Implication-Realization Hierarchies (after Narmour 1999: Figure 2)
Note: Dotted vertical lines represent transformation of initial and terminal pitches of lower-level groupings to higher levels. Brackets above units are
implication-realization spans, as in Example 1 i and ii above. The diagram simplifies the relationship between units for, as in Example 1, terminal pitches
of one unit may simultaneously function as the initial pitch of another.
Memes may assemble to form confederations termed coadapted meme-complexes, or memeplexes (Blackmore 1999: 19). If it is
accepted that the four foreground-level memes which generate the first structure at the intermediate durational level in Figure 3 might
theoretically occur in another context, then while existing as an intermediate-level meme (it is memetic at this level because the same
intermediate-level structure occurs in these two hypothetical contexts), it also exists as a replicated complex of patterns at the
foreground level. Each individual foreground pattern is a meme (it exists in this form in these two contexts, and may exist
independently in other contexts), but each complex is also memetic. Furthermore, applying the same logic recursively to higher
levels, if it is accepted that the two intermediate-level memes which generate the structure at the highest level in Figure 3 might
theoretically occur in another context, then while existing as a highest-level meme (it is memetic at this level because the same
highest-level structure occurs in these two hypothetical contexts), it also exists as a replicated complex of patterns at the intermediate level.
Clearly real units at the foreground level generate virtual configurations at higher structural levels. It may be the case that the same
pattern is replicated at more than one level in a work; or, in different works, the same pattern may be propagated not at their
foregrounds, but at higher levels. In this second case, on a strict definition, these are not units of direct imitation, but they are units of
consequential replication and are therefore memetic. The structure of such higher-level memes is potentially instructive for what it
can tell us about the conglomerative grammar of foreground orientated memes, which is ultimately a function of their initial and
terminal nodes, pitch content, and the way these elements interact with our cognitive attributes.
Example 2: The 1̂-7̂...4̂-3̂ Changing-Note Schema: after Haydn, Minuet from Divertimento in C major Hob. XIV: 10
(c. 1760), bb. 1-4
In perception, bottom-up processes at the opening of this phrase will first identify the component features of the initial event, but
without at this stage comprehending their broader context. At some point within the initial event, the cumulative evidence of the
From this one might infer that such cognitive schemata create selection pressure in favour of memetic conformance. In Example 2,
for instance, the bass pattern c1-g1-f1-e1-d1 in bb. 1-2 and the corresponding b-f1-e1-d1-c1 in bb. 3-4, together with their associated
rhythmic meme, while different memes (because of their dissimilar scale-degree orientation and contrasting internal intervallic
structure), may achieve comparable population sizes in the dialect because of their membership of a schema which favours
parallelism between its initial and terminal melodic events.
(1995: 343)
In memetic terms, the variation (mutation) of memes creates the required "abundance"; the replication of memes is, of course, one of
their defining attributes, furnishing the necessary "heredity"; and it is the case that novel (mutant) memes are more likely to be
imitated than those already established in the meme pool, imparting to them a higher "differential fitness."
If a meme is perceived as a variant of an existing form, this deviation from the antecedent configuration may aid its differential
fitness. This may be because the mutation has the effect of increasing its implicative energy, which might in turn have the effect of
raising its cultural fitness. For instance, meme x from Example 1 i and ii above might be mutated by changing the closing a2 of its
second bar to e2. The resulting terminal IP structure (replacing the original P) is less closed-Narmour regards the IP shape as an
example of "partial denial" of implications (1989: 48)-and this characteristic arguably makes the mutant meme more psychologically
salient, and therefore more fecund, than its antecedent:
Example 3: Memetic Mutation, after Mozart: Die Zauberflöte K. 620 (1791) no. 8, bb. 327-328
If it is the case that such changes affect the propensity of memes to imitation, then the meme is clearly an active replicator, defined
by Dawkins as
...any replicator whose nature has some influence over its probability of being copied. For example a DNA molecule, via protein synthesis, exerts
phenotypic effects which influence whether it is copied: this is what natural selection is all about. A passive replicator is a replicator whose nature has no
influence over its probability of being copied. A xeroxed sheet of paper at first sight seems to be an example, but some might argue that its nature does
influence whether it is copied, and therefore that it is active: humans are more likely to xerox some sheets of paper than others, because of what is written
on them, and these copies are, in their turn, relatively likely to be copied again.
(1983: 83)
Such changes at the level of the individual meme have a cumulative effect which eventually leads to changes at higher hierarchic
levels. The 1̂-7̂...4̂-3̂ schema discussed in Section 5 above, for instance, gradually declined in population density in the
early-nineteenth century because of the consequences of changes to its constituent features (see the population distribution curve in
Gjerdingen 1988: 263, which gives its apogee as c. 1773). As these components changed, so did the resultant middleground style
structure; although the distinction is fuzzy and ultimately subjective, beyond a certain point of alteration a style structure-indeed any
meme-is no longer the same pattern mutated, but a different pattern.
Ultimately, however, memetics even undermines the notion of a unitary conscious self, seeing this as merely a memeplex, albeit a
large and sophisticated one; Blackmore, the leading advocate of this argument, speaks provocatively of
...the most insidious and pervasive memeplex of all....the 'selfplex.' The selfplex permeates all our experience and all our thinking so that we are unable to
see it clearly for what it is-a bunch of memes.
(1999: 231)
This interpretation clearly challenges received conceptions of the creative process in music. Whereas the composer is traditionally
seen as in full conscious control of the creation of a work, many accounts given by composers themselves suggest the validity of a
memetic analysis. Despite the variable authenticity of some such remarks, sufficient consensus exists among them to suggest that the
composer is
...not so much conscious of his ideas as possessed by them. Very often he is unaware of his exact processes of thought till he is through with them;
extremely often the completed work is incomprehensible to him immediately after it is finished.
In the ontogeny of a musical work, memes rage within the composer's selfplex, gradually conglomerating memotypically to engender
the finished structure in its phemotypic incarnation. Adopting Dennett's computer analogy, the "greatness" of the product is a
function of the composer's memory capacity, neuropsychological processing power, and the richness of the environment (the
complexity of the software) to which he or she is exposed.
In conclusion, despite these challenges, and despite the reservations some might have about the controversial issues noted in Section 7
above, there is much to commend the memetic paradigm as relevant to the concerns of musicology and music psychology. Firstly,
given that musical analysis is "...the resolution of a musical structure into relatively simpler constituent elements, and the
investigation of the functions of those elements within that structure" (Bent and Drabkin 1987: 1), it is legitimate to attempt to cleave
the structure at perceptual/cognitive-imitative joints, for these articulations, as noted in Section 3.1 above, have strong psychological
reality for composers and listeners. Secondly, the memetic perspective is fully concordant with the synchronic view of music as a
multileveled hierarchic structure and the diachronic view of music as a timeline of imitative connection manifesting change over
time. Thirdly, there is much common ground between musicology, music psychology, and memetics; the area of overlap
between the three disciplines has the potential to be a place of fruitful interdisciplinary collaboration.
© Steven Jan, May 2000.
6. References
Bharucha, J.J. (1999) Neural nets, temporal composites, and tonality. In D. Deutsch (Ed.). The
Psychology of Music (Academic Press Series in Cognition and Perception), 2nd edn. San Diego,
Academic Press. pp 413-440.
Blackmore, S.J. (1999) The Meme Machine. Oxford, Oxford University Press.
Brodie, R. (1996) Virus of the Mind: The New Science of the Meme. Seattle, Integral Press.
Cavalli-Sforza, L.L. and Feldman, M.W. (1981) Cultural Transmission and Evolution: A
Quantitative Approach (Monographs in Population Biology, no. 16). Princeton, Princeton
University Press.
Dawkins, R. (1983) The Extended Phenotype: The Long Reach of the Gene. Oxford, Oxford
University Press.
Gatherer, D. (1997) The evolution of music: a comparison of Darwinian and dialectical methods.
Journal of Social and Evolutionary Systems, 20/1, 75-92.
Gjerdingen, R.O. (1988) A Classic Turn of Phrase: Music and the Psychology of Convention.
Philadelphia, University of Pennsylvania Press.
Hewlett, W.B. and Selfridge-Field, E. (Eds.) (1998) Melodic Similarity: Concepts, Procedures,
and Applications (Computing in Musicology, no. 11). Cambridge, MA, MIT Press.
Jan, S.B. (2000b) The selfish meme: particularity, replication, and evolution in musical style.
International Journal of Musicology 8 (in press).
Jan, S.B. (2002) The illusory Mozart: selfish memes in the priests' marches from Idomeneo and
Die Zauberflöte. International Journal of Musicology 10 (forthcoming).
Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA,
MIT Press.
Lynch, A. (1996) Thought Contagion: How Belief Spreads Through Society-The New Science of
Memes. New York, Basic Books.
Lynch, A. (1998) Mnemon 1998a: Y2K Memes (Issue 1).
<http://www.mcs.net/~aaron/Mnemon1998a.html>
Meyer, L.B. (1956) Emotion and Meaning in Music. Chicago, University of Chicago Press.
Meyer, L.B. (1973) Explaining Music: Essays and Explorations. Chicago, University of Chicago
Press.
Meyer, L.B. (1989) Style and Music: Theory, History, and Ideology. Philadelphia, University of
Pennsylvania Press.
Narmour, E. (1977) Beyond Schenkerism: The Need for Alternatives in Music Analysis. Chicago,
University of Chicago Press.
Narmour, E. (1989) The 'genetic code' of melody: cognitive structures generated by the
implication-realization model. In S. McAdams and I. Deliège (Eds.). Music and The Cognitive
Sciences. London, Harwood. pp 45-63.
Narmour, E. (1990) The Analysis and Cognition of Basic Melodic Structures: The
Implication-Realization Model. Chicago, University of Chicago Press.
Narmour, E. (1992) The Analysis and Cognition of Melodic Complexity: The
Implication-Realization Model. Chicago, University of Chicago Press.
Narmour, E. (1999) Hierarchical expectation and musical style. In D. Deutsch (Ed.). The
Psychology of Music (Academic Press Series in Cognition and Perception), 2nd edn. San Diego,
Academic Press. pp 441-472.
Nattiez, J.-J. (1990) Music and Discourse: Toward a Semiology of Music. Tr. C. Abbate.
Princeton, Princeton University Press.
Plotkin, H.C. (1995) Darwin Machines and the Nature of Knowledge: Concerning Adaptations,
Instinct and the Evolution of Intelligence. London, Penguin Books.
Sloboda, J.A. (1996) The Musical Mind: The Cognitive Psychology of Music (Oxford Psychology
Series, no. 5). Oxford, Clarendon Press.
Proceedings paper
3-second silence interval. Subjects were asked to identify the tone by pointing to the corresponding key on a
paper keyboard. The test administrator recorded their choice of keys and later calculated the number of correct
identifications and the number of incorrect identifications that were off by one semitone. The accuracy of these
calculations was corroborated by a second test administrator. Subjects were considered to possess absolute pitch
if they met two criteria: (1) they identified a minimum of 85% of the pitches accurately, and (2) their incorrect
identifications, if any, were off target by only a semitone in at least 85% of the errors. All the
subjects who met criterion 1 also met criterion 2 (n = 12). These subjects were classified as absolute pitch
possessors. Six subjects identified between 58% and 77% of the tones accurately. The incorrect identifications of
five of these subjects were off target by only a semitone in most of the errors (88%-100%), and those of the
remaining subject were off by a semitone in 54% of the errors. Although these six subjects identified pitch in
isolation better than did the relative pitch possessors tested by Zatorre et al. (1998), they did not meet the two
criteria fulfilled by absolute pitch possessors in the present study. These subjects were classified as pseudo
absolute pitch possessors and excluded from the comparisons of absolute and relative pitch possessors.
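The classification rule above can be made concrete. Below is a minimal stdlib-Python sketch, assuming each trial is recorded as a signed error in semitones (0 = correct); the thresholds for the pseudo-absolute group are read off the figures reported above, not criteria the authors state explicitly:

```python
def classify_pitch_ability(errors_semitones, min_correct=0.85, min_near=0.85):
    """Classify one subject from per-trial identification errors.

    errors_semitones: signed errors in semitones, one per trial (0 = correct).
    Returns 'absolute', 'pseudo-absolute', or 'other'. The data layout and
    the pseudo-absolute thresholds are assumptions for illustration.
    """
    n = len(errors_semitones)
    correct = sum(1 for e in errors_semitones if e == 0)
    wrong = [e for e in errors_semitones if e != 0]
    prop_correct = correct / n
    # Criterion 2: among the errors, the proportion off by exactly one semitone.
    prop_near = (sum(1 for e in wrong if abs(e) == 1) / len(wrong)) if wrong else 1.0
    if prop_correct >= min_correct and prop_near >= min_near:
        return "absolute"
    # The intermediate group described above: 58-77% correct, errors mostly
    # (54% or more) a semitone off target.
    if 0.58 <= prop_correct <= 0.77 and prop_near >= 0.54:
        return "pseudo-absolute"
    return "other"
```

For example, a subject with 90% correct responses whose errors are all one semitone off would be classified as an absolute pitch possessor under both criteria.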
The data from two subjects with hearing impairments and two subjects who did not follow test directions
properly were discarded. The final sample comprised 12 musicians with absolute pitch, 11 musicians with
relative pitch, six musicians with pseudo absolute pitch, and 12 nonmusicians. There were two left-handed
subjects in each of the groups and a total of 18 female and 23 male subjects in the sample.
Results
In order to establish whether there were differences in spatial abilities among absolute pitch possessors, relative
pitch possessors, and nonmusicians, I analyzed their scores in the three tests (Hidden Figure Test, the Spatial
Subtest, and the Object Assembly Subtest) through three analyses of variance. The results showed that Group
affected scores in the Hidden Figures Test, F (2,34) = 8.33, p = .001, but did not affect subjects' performance in
the Object Assembly and Spatial Subtests. Scheffé comparisons indicated that absolute pitch possessors obtained
significantly higher scores, p < .05, than did relative pitch possessors and nonmusicians in the Hidden Figures
Test (Table 1). There was no difference between the Hidden Figures Test scores of nonmusicians and musicians
with relative pitch.
Table 1
Mean test scores of nonmusicians, absolute pitch possessors, and relative pitch possessors
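For reference, the F ratio behind such a one-way analysis of variance can be computed directly from the group scores. The following is a stdlib-only sketch (an illustrative helper, not the study's data or software):

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists of scores, one inner list per group
    (e.g. nonmusicians, absolute pitch possessors, relative pitch possessors).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Example with made-up scores for two groups of three subjects:
F, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
```

The returned F is compared against the F distribution with (df_between, df_within) degrees of freedom to obtain the p value reported in the text.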
The results of the analyses do not support previous findings regarding the superior spatial abilities of musicians
as compared to nonmusicians because no differences in spatial performance could be established between
musicians with relative pitch and nonmusicians. In order to study the effect of music training on spatial abilities
further, I performed multiple regression analyses with years of music instruction and years of participation in
ensembles as the independent variables and the three test scores as the dependent variables. The data from the six
subjects classified as pseudo absolute pitch possessors were included in these analyses because the classification into
absolute pitch or relative pitch possessors was superfluous here. The model explained almost 20% of the variance in
the Hidden Figures Test scores (R = .49; adjusted R-squared = .19, p = .01) and showed that years of music
training, and not years of ensemble participation, affected subjects' performance in this test (years of music training:
These results suggest that the differences between the absolute pitch possessors and the other musicians and
nonmusicians could be attributed to the early musical training received by the former group. In order to explore
this idea further, I studied the data from the absolute and relative pitch possessors included in the original
analyses taking into consideration the age at which they began music instruction. ANOVAs with Group (absolute
pitch or relative pitch possessors) and Starting Age (3/4/5 or 6+ years) as the independent variables indicated that
Group affected the Hidden Figure Test scores F (1,19) = 4.53, p =.05. No other significant main effects or
interactions were found. It is important to consider the small number of subjects included in these exploratory
analyses: only four relative pitch possessors had begun music instruction by age 5, and four absolute pitch
possessors had done so after that age. The number of absolute pitch possessors with early musical training and
relative pitch possessors with late musical training was higher (n = 8 and 7 respectively).
Discussion
The results of the study show that a selected group of musicians, those who had absolute pitch, performed better
in the Hidden Figure Test than did nonmusicians and musicians with relative pitch. No other differences among
absolute pitch possessors, relative pitch possessors, and nonmusicians could be established.
Based on the results of previous studies about the superior spatial abilities of musicians as compared to
nonmusicians and the effects of music instruction on spatial development, I had expected to find clear differences
in the spatial scores of musicians and nonmusicians. In this study, musicians had extensive music training (M =
15.5 years) and nonmusicians had no formal instruction or ensemble experience. Although the mean scores of the
nonmusicians were lower than those of the musicians, in none of the three spatial tasks used in the study did
musicians with relative pitch outperform nonmusicians and only in the Hidden Figures Test did musicians with
absolute pitch obtain significantly higher scores than did nonmusicians.
Interestingly, absolute pitch possessors not only outperformed nonmusicians in the Hidden Figures Test, but
they also outperformed musicians with relative pitch. This difference between absolute pitch and relative pitch
possessors cannot be exclusively attributed to music training given that the type and length of the training
received by both groups were very similar. The only obvious difference between the two groups of musicians
was the age at which they began formal music instruction. Absolute pitch possessors began taking lessons at a
younger age (4.5 years) than did relative pitch possessors (7.4 years) suggesting that music training may improve
the performance in specific spatial tasks only if it occurs very early in life.
Other results of the study support the idea that the age at which music training is initiated affects the relationship
between music training and spatial development. When I disregarded the information about subjects' possession
of absolute pitch or relative pitch and analyzed the data according to the age at which they began formal music
instruction, I found that musicians who started music lessons during the first five years of life scored significantly
higher in the Hidden Figure Test and in the Object Assembly Subtest than did nonmusicians. Musicians who
began music instruction after their sixth birthday did not score significantly better than nonmusicians in any of
the spatial tests and scored significantly lower than the musicians with early musical training in the Spatial
Subtest. Apparently, extensive music instruction may be associated with enhanced performance in certain spatial
tasks only if provided from a very young age.
There is some evidence that early musical instruction may not only be associated with enhanced performance in
certain spatial tasks, but actually improves the development of spatial abilities. While research conducted with 3-,
4-, and 5-year-olds concluded that music instruction improved children's performance in the Object Assembly
Subtest of the Wechsler Intelligence Scale for Children (Gromko & Poorman, 1998; Rauscher et al., 1997),
studies developed with older children did not find such improvements or found only temporary improvements in
certain spatial tasks (Costa-Giomi, 1999; Hurwitz et al., 1975; Persellin, 2000). In the present study, there were
no differences between the Object Assembly Subtest scores of subjects with extensive musical training (i.e.,
musicians with relative and absolute pitch) and those with no formal musical training (i.e., nonmusicians).
However, musicians who began taking music lessons during their first five years of life outperformed
nonmusicians in this test. These findings corroborate that the age at which music training is initiated affects the
relationship between music instruction and spatial development.
Because starting age of musical instruction affected test scores, it could be assumed that the superior performance
of absolute pitch possessors in the Hidden Figure Test reported earlier was the result of their participation in
music instruction from a young age. While most absolute pitch possessors began taking lessons by age 5 (66%),
most relative pitch possessors did so after age 7 (64%). Despite these obvious differences between absolute and
relative pitch possessors, it is not clear that starting age of musical instruction is the cause for their performance
in the Hidden Figures Test. Exploratory analyses that looked at the effects of both starting age and absolute
pitch on partial data (nonmusicians were excluded from these analyses) indicated that absolute pitch, and not
starting age, affected the scores in this test. Additionally, no interaction between the two variables was found.
Because of the small sample size of these particular analyses, their results should be taken with caution.
However, the findings indicate that there are differences other than starting age of musical instruction between
absolute pitch possessors and relative pitch possessors that affect their performance in a nonmusical task. Future
Proceedings paper
The perception of rhythmic patterns exhibits certain features of categorical perception, including abrupt category boundaries and nonmonotonic
discrimination functions (Clarke, 1987; Schulze, 1989). Other attributes of rhythm categorization, however, such as good within-category
discrimination and strong dependence on context, have special implications for the perception of musical rhythm. It has been suggested that two
processes operate in rhythm perception: one assigns rhythms to categories depending on metrical context, while the other interprets deviations from those categories
as expressive (Clarke, 1987). This interpretation possesses a certain circularity, however. Perceived rhythmic patterns influence the perception of
metrical structure, while metrical structure influences the perception of rhythmic patterns. How does a temporal sequence give rise to a structural
interpretation? How does metrical structure subserve both categorization and discrimination? Which temporal fluctuations are expressive, and which
force structural reinterpretation? The current study aims to address these issues by investigating the role of rhythmic context in the categorization of
temporal patterns.
Background
In a pioneering study, Clarke (1987) demonstrated that the categorization of rhythmic patterns was sensitive to metrical context. Music students
listened to short musical sequences in which the durations of the final two time intervals were varied systematically to create an interval ratio between
1:1 and 2:1, inclusive. Musicians were asked to categorize the ratio of the final two intervals as either 1:1 or 2:1. The musical sequences were
presented in two blocks, one providing the context of duple meter, the other a context of triple meter. Clarke found that the position of the category
boundary shifted according to metrical context: Ambiguous ratios (between 1:1 and 2:1) were more likely to be categorized as 2:1 in the context of
triple meter, whereas these same ratios were more likely to be categorized as 1:1 in the context of duple meter. Moreover, in a discrimination task
Clarke discovered a nonmonotonic discrimination function with a single peak at the category boundary, providing evidence for categorical perception.
Schulze (1989) criticized Clarke's findings on two grounds. First, he argued, because listeners were forced to choose between two categories, the
category boundary shift might not be perceptual; it might simply reflect a shift in response criterion. Second, because tempo was held constant,
Rhythm Categorization in Context
listeners might not be performing a rhythm discrimination task at all, but rather a time discrimination task. Thus, the evidence for categorical perception
of rhythmic patterns might be suspect as well. To control for these factors Schulze (1989) asked two musicians to learn numerical category tags for
prototypical rhythmic patterns. Then, during a response phase, the tempo of the patterns was roved randomly from trial to trial, and listeners rated
rhythms according to the degree to which they were perceived as realizations of the prototypical patterns. When the ratings were used to derive a
measure of discriminability, Schulze found nonmonotonic discrimination functions. But these were not the single-peaked functions of classic
categorical perception (Liberman et al., 1957); these discrimination functions contained multiple peaks. These results suggest that rhythmic patterns,
heard out of context, are not perceived categorically.
The specific question of whether or not rhythmic patterns are perceived categorically may be beside the point, however. First, a large branch of
research into categorical perception has called the entire categorical/continuous distinction into question (Macmillan, 1987). Second, Clarke reported
within-category discrimination that was much better than is typical of other categorical judgements. In support of this observation, Jones & Yee (1993)
have reported that time discrimination is better in metrical than nonmetrical contexts. Finally, it is quite clear that musicians (at least) categorize
rhythmic patterns all the time; hence the ability to notate musical performances. Thus, the more relevant questions would seem to be: what is the role
of context in the categorization of rhythmic patterns, and can this phenomenon tell us something about how people perceive metrical structure?
If Clarke's (1987) interpretation is correct, that meter provides the categories available for rhythmic pattern classification, then his finding may indeed
inform us as to the nature of meter perception. In dynamical systems terms, Clarke's data provide evidence of hysteresis in meter perception, the
persistence of a percept (e.g. a duple meter) despite a change in the stimulus that favors an alternative pattern (e.g. a triple meter). For example,
understanding the influence of context in rhythm perception could help to address the basic issue of whether meter perception is best described as a
linear (e.g. Scheirer, 1998; Todd, et. al. 1999) or a nonlinear (e.g. Large & Kolen, 1994; Large, 2000) dynamical system, because nonlinear dynamical
systems exhibit hysteresis, whereas linear systems do not.
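The hysteresis argument can be illustrated with a toy bistable system (a generic sketch, not any of the models cited): sweeping a control parameter up and then down, the state switches at different parameter values depending on the direction of the sweep, something a linear system cannot do.

```python
def sweep(params, x0, dt=0.01, steps=5000):
    """Relax dx/dt = a + x - x**3 at each value of the control parameter a,
    carrying the state over between values, and record the sign of x
    (+1 or -1), standing in for two percept-like states."""
    x, states = x0, []
    for a in params:
        for _ in range(steps):
            x += dt * (a + x - x ** 3)  # Euler integration toward a fixed point
        states.append(1 if x > 0 else -1)
    return states

a_up = [i / 10 - 1.0 for i in range(21)]    # a from -1.0 up to 1.0 in steps of 0.1
up = sweep(a_up, x0=-1.0)                   # ascending sweep
down = sweep(list(reversed(a_up)), x0=1.0)  # descending sweep
# On the way up the state stays negative well past a = 0; on the way down it
# stays positive well past a = 0: the earlier "percept" persists.
```

The ascending sweep flips from -1 to +1 at a larger value of a than the point at which the descending sweep flips back, which is exactly the signature of hysteresis described above.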
A Categorization Experiment
The initial objective of this study was to establish baseline observations regarding the role of metrical context in rhythm perception for both musician
and non-musician listeners. An additional objective was to establish an experimental methodology for studying context effects within a framework that
supports interpretation from a dynamical systems point of view. The main requirement for assessing the existence of hysteresis is the systematic
variation of a single stimulus parameter. Thus, in the case of rhythm perception, the stimulus should be gradually changed from one rhythmic figure to
another. However, this raises the difficult problem of distinguishing between hysteresis in the listener's perception and hysteresis in the response.
Listeners responding to a gradually changing stimulus parameter may persevere in their responses even after their percept has changed. There are other
interpretive problems as well. For example, are observed hysteresis effects truly perceptual, or do listeners persist in an earlier decision while the
stimulus parameter passes through values for which they are uncertain about what they are hearing? Hock, Kelso, & Schöner (1993) developed a
methodology for studying perceptual hysteresis in apparent motion patterns. It allows the study of perceptual changes resulting from varying the value
of one stimulus parameter using a simple modification of the psychophysical method of limits. The modified method of limits procedure minimizes the
potential for confounding perceptual hysteresis with response hysteresis by requiring a single response only after an entire sequence has been heard.
Here, the modified method of limits procedure is applied to the categorization of rhythmic patterns in an attempt to deal with interpretive problems.
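As a sketch of the design, the modified method of limits presents a monotonic parameter sweep and collects a single judgement only after the whole sequence. The step size and range below are hypothetical:

```python
def method_of_limits_trials(start=1.0, stop=2.0, step=0.125):
    """Build ascending and descending parameter sequences for the modified
    method of limits: the final-interval ratio moves monotonically from one
    rhythmic prototype (1:1) to the other (2:1), and the listener responds
    once per sequence. Step size and endpoints are illustrative assumptions.
    """
    n = round((stop - start) / step) + 1
    ascending = [round(start + i * step, 6) for i in range(n)]
    descending = list(reversed(ascending))
    return ascending, descending

asc, desc = method_of_limits_trials()
# asc sweeps 1.0, 1.125, ..., 2.0; desc is the same sweep reversed.
```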
Methods
Figure 1
The dashed curves show the results of the second two sessions. These indicate where the musicians first heard a transition away from duple in the
ascending condition (dashed black line) and where they first heard the transition away from triple in the descending condition (dashed gray line).
These clearly indicate an ambiguous region. But how should it be interpreted? Compare the two gray lines (farthest left) for M2, 600ms. These
measure the same boundary between duple and non-duple, but in different contexts (ascending and descending). Likewise, the two black lines
(farthest right) measure the boundary between triple and non-triple in different contexts. This leads to two preliminary conclusions. First, there are (at
least) three categories: duple, triple, and neither. Second, category boundaries shift depending on context.
With these observations in hand, context effects can be interpreted. For M1, 600ms, there is a slight hysteresis in the boundary between duple and
non-duple (gray lines). The boundary between triple and non-triple (black lines), however, shows a strong enhanced contrast effect. Enhanced contrast
is the opposite of hysteresis: the switch to an alternative percept before the stimulus parameter reaches a value that favors the alternative percept
(Tuller et al., 1994). This is a nonlinear effect that is often observed in perceptual switching studies and is discussed in more detail below. M2
displayed enhanced contrast at every boundary. Finally, in the 300ms condition, M1 displayed hysteresis at every boundary. Although not reported in
detail here, both non-musicians also displayed greater hysteresis in the 300ms condition.
Finally both listeners were interviewed regarding their perception of the patterns. M1 found the 600ms patterns slightly ambiguous: "Sometimes the
Proceedings paper
are taken from Experiment 3, and show subjects' performances after the pulses were switched off.
The data in Figure 1 show that performance is much more variable at slow speeds than at high speeds.
This is confirmed by standard deviations, which show consistently more variability for slower speeds:
11-159 msec for 1 note/sec and 1-10 msec for 12 notes/sec. Taken together, the data vividly illustrate
that the problem with this task is the control of motor performance at very slow speeds, not very fast
speeds.
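The per-speed variability figures quoted above are simple summaries of inter-onset intervals. A stdlib sketch, assuming a hypothetical data format of recorded onset times in milliseconds:

```python
import statistics

def tapping_variability(onsets_ms):
    """Summarize the timing of one tapping performance.

    onsets_ms: note-onset times in milliseconds (a hypothetical recording
    format). Returns (mean inter-onset interval, standard deviation of the
    inter-onset intervals), the kind of per-speed summary reported above.
    """
    iois = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    return statistics.mean(iois), statistics.stdev(iois)
```

A performance at 1 note/sec would have a mean inter-onset interval near 1000 msec; the point above is that its standard deviation is far larger than that of a 12 notes/sec performance.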
Conclusions. The main result was that subjects played very well at high speeds, in terms of their
ability to keep up with the beat, shown by low variability both across and within subjects. By contrast,
at slow speeds, subjects' performances were much more variable.
This is a very surprising result, because it seems so intuitively obvious that performance should get
worse as speed increases. What is going on?
One explanation is that superior performance at high speeds could be due to the motor system shifting
into a modular state, placing it outside any interference from perceptual or cognitive processes.
However, Experiment 5 showed that perceptual monitoring is possible at very high speeds (the data
from Experiment 5 are very similar to the data from Experiment 3 shown in Figure 1).
Experiment 6 investigated which kind of perceptual monitoring was most useful by re-doing
Experiment 5 with participants wearing blindfolds. The results from Experiment 6 show that coupling
between the hands disappears and that at slow speeds, subjects slowed significantly.
Another experiment could be done which is the converse of Experiment 6, where the subjects can see
but not hear, i.e. suppress the sound from the piano keyboard. If the data were the same as
Experiments 2-5, we might be able to say that success at high speeds is due to a visuo-motor module
rather than a motor module by itself.
Evidence for a visuo-motor module has been found in the work of Milner & Goodale (1995), who
show that the processes for telling what an object is rely on different pathways in the visual
system from the processes that tell where an object is. The "what" processes involve the conceptual
representation of the object, while the "where" processes work in conjunction with the motor system.
These "where" processes have been shown by Milner & Goodale (1995) to be fast and automatic, and
can be thought of as a visuo-motor module. These findings are consistent with those of Experiment
6, which found that lack of visual information caused performance to suffer.
Key Words: polyrhythm, performance, expert, non-expert, modular, visuo-motor module.
References:
Fodor, J. A. (1983). Modularity of Mind. Cambridge MA: MIT.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral
mechanisms of behavior (pp. 112-131). New York: Wiley.
Proceedings abstract
Dr Adrian North
ACN5@LE.AC.UK
Background:
Many authors have considered the relationship between (particularly pop)
music and the broader zeitgeist in which it is experienced. However, to date
there has been little quantitative research on the subject.
Aims:
The study described in this paper aimed to investigate the relationship between
the content of pop music lyrics and various zeitgeist indicators, and also to
investigate trends in the evolution of pop music lyrics.
Method:
Lyrics were obtained for each song to have appeared in the British weekly Top 5
singles sales charts between March 1960 and December 1998. The lyrics were
scored by text analysis software to produce several lyrical variables (e.g.
optimism). Zeitgeist indicators (e.g. GDP, crime figures) were obtained from a
variety of sources (e.g. UK Government publications): some of these indicators
were available monthly, while others were available only quarterly or annually.
Results:
At the time of writing, analysis of the data has only just begun. The analysis will
describe whether pop music lyrics precede or follow changes in the zeitgeist
variables, and will describe how the lyrics have changed over the 38-year
period considered.
Proceedings paper
apatel@nsi.edu
ABSTRACT
Background. It is a commonplace notion that melodies are tone sequences which are neither too random nor too predictable
in their structure. Little is known, however, about patterns of brain response as a function of the structure of tone sequences.
Aims. This study sought to determine if differences in the statistical structure of tone sequences are reflected in measurable,
dynamic neural responses, and if sequences that are melody-like in their statistical properties have a distinct neural
signature.
Methods. Subjects listened to 1-minute long diatonic tone sequences while neural signals were recorded using 148-channel
whole-head magnetoencephalography (MEG). Sequences were random, deterministic (scalar), or one of two categories of
'fractal' sequences differing in their balance of predictability and unpredictability. (One of the fractal categories had
melody-like statistics). Amplitude-modulation of the tone sequences was used to generate an ongoing, identifiable neural
response whose amplitude and timing (phase) could be studied as a function of sequence structure.
Results. Ongoing timing patterns in the neural signal showed a strong dependency on the structure of the tone sequence. At
certain sensor locations, timing patterns covaried with the pitch contour of the tone sequences, with increasingly accurate
tracking as sequences became more predictable. In contrast, interactions between brain regions (as measured by temporal
synchronization), particularly between left posterior regions and the rest of the brain, were greatest for the tone sequences
with melody-like statistics. This may reflect the perceptual integration of local and global pitch patterns in melody-like
sequences.
Conclusions. Dynamic neural responses reveal a neural correlate of pitch contour in the human brain, and show that
interactions between brain regions are greatest when tone sequences have melody-like statistical properties.
1. Introduction
Melodies are a special subset of auditory sequences. Their acoustic raw materials can be extremely simple (e.g. a few dozen
pure tones), yet the arrangement of these materials in time can create structures that engage a host of interacting mental
processes, including chunking, melodic expectancy, and the perception of meter. Studying these processes is a principal goal
for the cognitive science of melody, and is being actively pursued by a number of research groups (e.g. Krumhansl et al.,
2000).
Discovering the neural correlates of melodic processing is a challenge for cognitive neuroscience. Progress in this area has
focused on average neural responses to individual events in sequences (e.g. via event-related potentials, or ERPs; Besson &
Faïta, 1995) or on the brain's average response to entire sequences (e.g. via positron emission tomography, or PET; Zatorre et
al., 1994). These techniques continue to provide valuable information, yet it is evident that to "tap into the
moment-to-moment history of mental involvement with the music" (Sloboda, 1985), techniques are needed that measure
patterns of neural activity as perception unfolds within individual sequences.
With these goals in mind, we set out to determine if aspects of tone sequence structure are reflected in dynamic neural
responses, and if melody-like sequences have a distinct neural signature. Full details of this study are given in Patel &
Balaban (2000). This paper emphasizes a qualitative understanding of our methods and results.
2. Methods
To explore brain responses, we used statistically-generated tone sequences. This allowed us to generate novel stimuli which
lay on a spectrum from random to deterministic in structure. We elected to use statistical tone sequences rather than
precomposed melodies so that the sequences would be unfamiliar to subjects, easily generated in quantity, and
mathematically well characterized. The latter two points were of particular importance because we were employing a novel
brain imaging technique and wanted to have good control over the stimuli.
All tone sequences were approximately 1 minute long, consisting of ~150 pure tones (415 msec each) with no temporal
gaps. Sequences were diatonic, and ranged between A3 (220 Hz) and A5 (880 Hz) in pitch. Four structural categories of
sequences were employed: random, deterministic (musical scales), and two intermediate 'fractal' categories of constrained
variation which differed in their balance of predictability and unpredictability (Schmuckler & Gilden, 1993). These
categories were given mathematical names in accordance with the technique used to generate them (see Note 1): 1/f ("one
over f") and 1/f2 ("one over f squared").
A qualitative understanding of these categories is possible without delving into the underlying mathematics. In random
sequences each successive pitch is chosen independently of the previous one, and there are no long term pitch trends.
Deterministic sequences represent the opposite case: they consist entirely of long-term pitch trends (predictable stair-like
patterns) with no short-term unpredictability. The fractal sequences are intermediate. 1/f sequences have a hint of long term
pitch trends but still have much unpredictable variation from one pitch to the next. 1/f2 sequences are strong in long term
pitch trends, but retain a small amount of unpredictability in the behavior of successive tones.
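Note 1 states that the fractal categories were produced by inverse Fourier transforms of power spectra with different slopes. As a rough illustration of that technique (not the authors' actual code; the diatonic mapping, the random seed, and the 15-degree two-octave range are assumptions for this sketch), a 1/f^beta contour can be generated in Python:

```python
import numpy as np

def fractal_contour(n_tones=150, beta=1.0, seed=0):
    """Generate a pitch contour whose power spectrum falls off as 1/f^beta,
    via the inverse Fourier transform of a sloped spectrum with random phases.
    beta=0 gives a random (white) contour; beta=2 gives strong long-term trends."""
    rng = np.random.default_rng(seed)
    freqs = np.arange(1, n_tones // 2 + 1)
    amplitudes = freqs ** (-beta / 2.0)           # amplitude ~ f^(-beta/2), power ~ 1/f^beta
    phases = rng.uniform(0, 2 * np.pi, len(freqs))
    spectrum = np.zeros(n_tones, dtype=complex)
    spectrum[1:len(freqs) + 1] = amplitudes * np.exp(1j * phases)
    contour = np.fft.ifft(spectrum).real
    # Map the continuous contour onto diatonic scale degrees (0-14 spans
    # two octaves, e.g. A3 to A5, as an illustrative assumption)
    degrees = np.interp(contour, (contour.min(), contour.max()), (0, 14))
    return np.round(degrees).astype(int)

contour = fractal_contour(beta=2.0)
```

Setting beta=0 would approximate the random category and beta=1 and beta=2 the two fractal categories; the deterministic scales would be written out directly rather than generated this way.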
Examples of pitch contours from the different sequences are shown in Figure 1, f-i (black/dark lines). Note the different
shapes of the pitch contours in the four conditions: the random sequence has no discernible long-term patterns. The 1/f pitch
contour has some evidence of long-term patterns (e.g. the general dip in the pitch contour in the middle of the sequence,
followed by a slow climb in average pitch), but retains a good deal of unpredictable jagged pitch movement. The 1/f2 pitch
contour has clearly discernible long-term pitch patterns with relatively little unpredictable jaggedness. The pitch contour of
the scales moves up and down in a completely predictable way. Sound examples of all tone sequence categories can be
heard at: www.nsi.edu/users/patel/tone_sequences.
Subjects (n=5 right handed males, 2 with musical training) were familiarized with the different stimulus categories in a
training session where examples of each category were presented along with an arbitrary category label (the numbers 1-4).
Subjects quickly learned to identify the different categories, and during the experiment, classified novel sequences by their
category with little difficulty. The experiment consisted of 28 such sequences, 7 per category. Stimuli in each category were
equally distributed among seven Western diatonic modes (ionian, dorian, phrygian, lydian, mixolydian, aeolian, and
locrian). Each subject heard a unique set of stimuli, with the exception of the scales, which were identical across subjects.
During stimulus presentation, neural data were recorded using 148-channel whole head magnetoencephalography (MEG).
MEG measures magnetic fields produced by electrical activity in the brain, providing a signal with similar time resolution to
electroencephalography (EEG) but with certain advantages relating to source localization and independence of signals
recorded from different parts of the sensor array (Lewine & Orrison, 1995).
We used a novel method to detect stimulus-related neural activity. Each sequence was given a constant rate of amplitude
modulation (41.5 Hz), as shown in Figure 1 a-c. Fig 1a shows frequencies from a 4-second portion of a tone sequence. Fig
1b shows the associated amplitude waveform. Figure 1c provides a detail of a small piece of the waveform, showing the
constant amplitude modulation frequency (41.5 Hz, blue/dark line) overlaid on the changing carrier frequency. This
amplitude modulation gave the tone sequences a slightly warbly quality, without disrupting their perceived pitch pattern:
listeners heard them as sequences of pitches at the underlying pure tone frequencies.
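The amplitude modulation described above can be sketched as follows. The 415 msec tone duration and 41.5 Hz modulation rate come from the text; the sample rate and modulation depth are illustrative assumptions:

```python
import numpy as np

def am_tone(carrier_hz, dur=0.415, sr=44100, mod_hz=41.5, depth=1.0):
    """Sinusoidally amplitude-modulate a pure tone at mod_hz.
    The listener hears the carrier pitch with a slight 'warble'."""
    t = np.arange(int(dur * sr)) / sr
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)  # modulator
    return envelope * np.sin(2 * np.pi * carrier_hz * t)     # carrier

tone = am_tone(440.0)  # A4 carrier with 41.5 Hz amplitude modulation
```

Concatenating such tones for each pitch in a contour would yield a gapless, continuously modulated sequence like the stimuli described here.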
It is known from auditory neuroscience that continuous amplitude modulation of pure tones results in a detectable brain
response at the amplitude modulation frequency (Galambos et al., 1981; Hari et al., 1989), known as the auditory
"steady-state response" (SSR). This response is visible in a power spectrum of the brain signal, which shows a peak at the
amplitude modulation frequency. Fig 1d shows a 4-second piece of brain signal, and Fig1e shows two corresponding power
spectra, based on two successive 2-second portions of the signal. A peak at 41.5 Hz is clearly visible.
Thus amplitude modulation results in detectable stimulus-related cortical activity. We studied properties of this activity
during individual sequences. In particular, for each sequence heard by a subject we studied the amplitude and timing
characteristics (phase) of this activity in contiguous two-second epochs from each channel. One amplitude and phase value
of the SSR was obtained from each successive 2-second epoch of the channel's brain signal via a Fourier transform, yielding
approximately 30 data points x 148 channels per sequence.
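This epoch-wise Fourier analysis can be sketched minimally as follows (the function name and parameters are illustrative, not the authors' implementation). Note that with 2-second epochs the frequency resolution is 0.5 Hz, so 41.5 Hz falls exactly on a Fourier bin:

```python
import numpy as np

def ssr_amplitude_phase(signal, sr, target_hz=41.5, epoch_s=2.0):
    """For each successive non-overlapping epoch of one channel's signal,
    return the amplitude and phase of the Fourier bin nearest target_hz."""
    n = int(epoch_s * sr)
    amps, phases = [], []
    for start in range(0, len(signal) - n + 1, n):
        spectrum = np.fft.rfft(signal[start:start + n])
        freqs = np.fft.rfftfreq(n, d=1.0 / sr)
        k = np.argmin(np.abs(freqs - target_hz))   # bin at 41.5 Hz
        amps.append(np.abs(spectrum[k]))
        phases.append(np.angle(spectrum[k]))
    return np.array(amps), np.array(phases)
```

Applied to a ~60-second sequence this yields roughly 30 amplitude and phase values per channel, matching the data dimensions described above.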
Since a good deal of our analysis concerns phase information, it is worth giving a brief conceptual explanation of phase. By
amplitude modulating our tone sequences at 41.5 Hz, we are introducing an oscillatory signal into the brain at that same
frequency. This causes an oscillatory response (the SSR) at that frequency in certain brain regions. The degree to which the
oscillatory brain response lagged the time-referenced input signal is measured by the phase of the brain response at 41.5 Hz.
We studied the amplitude and phase of the brain response over time during individual sequences heard by our subjects.
3. Results
Our first finding was that the phase of the measured brain signal varied with the pitch of the tone sequence. As pitch
increased, phase advanced (corresponding to a decreased lag between stimulus and brain response), and vice versa. This general
result is depicted in Fig 1e, which shows two spectra, one taken during a sequence of low pitches (Fig 1a, left half) and one
taken when pitches were higher (Fig 1a, right half). Fig 1e shows that the peak of the SSR remains steady at 41.5 Hz, but its
phase (inset arrow) advances as the average pitch of the tone sequence increases. This relationship between SSR phase and
carrier frequency was suggested by early work (Galambos et al., 1981), and has been independently confirmed by another
laboratory (John & Picton, 2000). It is likely to be due to the tonotopic layout of the basilar membrane in the human ear,
where higher frequencies are closer to the oval window and hence stimulated earlier than lower frequencies.
Our next finding was that the phase of the brain response tracked the pitch contour a subject was hearing, and that this
tracking improved as the sequences became more predictable in structure, with the best tracking for musical scales.
Examples of phase-time contours (red/light lines) overlaid on their corresponding pitch time contours (black/dark lines) are
shown in Fig 1 f-i, which illustrates how tracking improves across the stimulus conditions. Each subject showed a number of sensor
locations where this 'phase tracking' of pitch was observed. Across subjects, these locations tended to be in fronto-temporal
regions, with a right-hemisphere bias (Patel & Balaban, 2000, Fig 2). A similar set of locations was identified when we
looked for sensors where the amplitude of the SSR was strong. However, we found no evidence that the amplitude of the
SSR correlated with the heard pitch contour.
Knowing that the phase of the brain response contained information about stimulus properties, we then examined
patterns of phase coherence between different brain regions. Phase coherence does not measure the lag between an
oscillatory signal and brain response but rather the stability of the phase difference between oscillatory activity in different
brain areas. Thus phase coherence is a measure of temporal synchronization between brain regions. If two brain areas show
greater synchronization during a given condition, this is suggestive of a greater degree of functional coupling between those
areas (see Bressler, 1995 for a review).
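The stability of a phase difference can be quantified by averaging the unit vectors of the epoch-by-epoch phase differences and taking the magnitude of the result. The following is a standard phase-locking measure, offered as a sketch of the idea rather than the exact coherence statistic used in Patel & Balaban (2000):

```python
import numpy as np

def phase_coherence(phases_a, phases_b):
    """Stability of the phase difference between two channels' phase series:
    1.0 = perfectly constant difference, near 0 = random differences."""
    diff = np.asarray(phases_a) - np.asarray(phases_b)
    return np.abs(np.mean(np.exp(1j * diff)))   # magnitude of mean unit vector
```

Two channels whose 41.5 Hz responses keep a fixed relative timing across epochs score near 1; channels with unrelated timing score near 0.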
We found that across subjects, the different conditions were characterized by differing degrees of phase coherence. Random
sequences generated less phase coherence than all other categories, and among the structured categories, 1/f2 sequences
generated the greatest degree of phase coherence (Patel & Balaban 2000, Fig 3). Interestingly, statistical research on Western
music indicates that melodic tone sequences have approximately 1/f2 statistics (Nettheim, 1992; Boon & Decroly, 1995),
suggesting that music-like sequences generated more brain interactions than other sequences.
To better understand the nature of these interactions, we examined topographic patterns of phase coherence, subdividing the
brain into four quadrants (anterior and posterior x left and right). We found that the greater phase coherence of 1/f2
sequences was driven by interactions between the left posterior hemisphere and the rest of the brain, including the two right
hemisphere quadrants. This is of interest because neuropsychological studies of brain-damaged patients suggest that left
superior temporal regions are involved with the discrimination of precise interval sizes, while right fronto-temporal circuits
are involved with the perception of more global contour patterns (Liégeois-Chauvel et al., 1998; Patel et al., 1998). Thus the
observed pattern of coherence may reflect the dynamic integration of local and global pitch perception, and suggests that
this integration is greatest when tone sequences resemble musical melodies.
4. Discussion
This study has shown that it is possible to extract a signal from the human cerebral cortex which reflects the pitch contour an
individual is hearing. The accuracy with which this signal reflects the pitch contour improves as the pitch sequence becomes
more predictable. Thus top-down influences of musical expectancy may shape this brain signal.
The basis of this signal is temporal information in cortical activity. When the amount of activity was examined, no
relationship with pitch contour was observed. This suggests that dynamic imaging techniques have an important role to play
in the study of music perception, complementing techniques sensitive to the amount of neural activity but insensitive to the
fine temporal structure of that activity (e.g. functional magnetic resonance imaging, fMRI).
Dynamic imaging techniques also offer the opportunity to study how brain areas interact during perception. It is clear from
decades of neural research that the brain is divided into different regions, each of which has a special role to play in
perception and cognition. Yet it is also clear that these brain areas must interact to form coherent and unified percepts.
Complex patterns such as music and speech engage multiple brain regions, and sequences with different perceptual
properties may be distinguished by the pattern of brain interactions they engender rather than by the particular brain regions
which respond to them.
Using phase coherence, we examined brain interactions as a function of stimulus structure and found that sequences with
melody-like statistics engendered the greatest degree of neural interactions. In particular, we found evidence for strong
functional coupling between the left posterior hemisphere and right hemisphere regions during the perception of melody-like
sequences. This may reflect the perceptual integration of local and global pitch patterns, and suggests that one neural
signature of melody is the dynamic integration of brain areas which process structure at different time scales.
Future work will use this technique to examine brain interactions as a function of stimulus structure in real melodies. This
may provide one way to quantify the perceptual coherence of melodies in individuals who cannot easily give details of their
perception, such as non-musicians and infants.
Figure 1. (a-d): Example of stimulus and brain response over 4 seconds: (a) Tone frequencies; (b) Stimulus waveform; (c)
Waveform detail (150 msec), showing constant modulating frequency (41.5 Hz, blue/dark line) overlaid on changing carrier
frequency; (d) Neural signal from one sensor. (e) Successive 2-second spectra of neural signal. The brain signal shows an
energy peak at 41.5 Hz, whose phase (inset arrow) varies with carrier frequency. (f-i): Phase-tracking of individual tone
sequences. Pitch-time contours (black/dark lines) illustrate the four different stimulus categories. Associated neuromagnetic
phase-time series (red/light lines) from a single sensor during one trial in one subject were randomly drawn from the top
10% of sensor correlation values for each stimulus. The correlation between the resampled pitch-time series and the
neuromagnetic phase-time series is given in the inset to each graph.
Acknowledgements
This work was supported by the Neurosciences Research Foundation as part of its research program on Music and the Brain
at The Neurosciences Institute.
NOTES
1. Inverse Fourier transform of power spectra with different slopes (see Patel & Balaban 2000 for details).
References
Besson, M. & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. J. Exp. Psych: Human
Perception and Performance, 21, 1278-1296.
Boon, J.P. & Decroly, O. (1995). Dynamical systems theory for music dynamics. Chaos 5, 501-508.
Bressler, S. (1995). Large-scale cortical networks and cognition. Brain Research: Brain Research Reviews, 20(3), 288-304.
Galambos, R., Makeig, S. & Talmachoff, P.J. (1981). A 40-Hz auditory potential recorded from the human scalp. Proc. Natl. Acad. Sci. USA 78, 2643-2647.
Hari, R. Hämäläinen, M., & Joutsiniemi, S.-L. (1989). Neuromagnetic steady-state responses to auditory stimuli. J. Acous. Soc. Am. 86, 1033-1039.
John, M.S. & Picton, T.W. (2000). Human auditory steady-state responses to amplitude modulated tones: phase and latency measurements. Hearing Research, 141,
57-79.
Krumhansl, C., Louhivuori, J., Toiviainen, P., Järvinen, T. & Eerola, T. (2000). Melodic expectation in Finnish spiritual folk hymns: convergence of statistical,
behavioral, and computational approaches. Music Perception, 17(2), 151-196.
Lewine, J.D. & Orrison, W.W. (1995). Magnetoencephalography and magnetic source imaging. In: Functional Brain Imaging (W.W. Orrison et al., ed): 369-417. St. Louis: Mosby.
Liégeois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V. & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes to music processing.
Brain 121, 1853-1867.
Patel, A.D. & Balaban, E. (2000). Temporal patterns of human cortical activity reflect tone sequence structure. Nature, 404, 80-84.
Patel, A.D., Peretz, I., Tramo, M. & Labrecque, R. (1998). Processing prosodic and musical patterns: a neuropsychological investigation. Brain and Language 61,
123-144.
Schmuckler, M.A. & Gilden. D.L. (1993). Auditory perception of fractal contours. J. Exp. Psychol: Human Percep. & Perform. 19, 641-660.
Sloboda, J. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon Press.
Zatorre, R.J., Evans A.C. & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience, 14(4), 1908-1919.
Proceedings paper
"It [music] has no zoological utility; it corresponds to no object in the natural environment; it is a pure
incident of having a hearing organ...our higher aesthetic, moral, and intellectual life seems made up of
affections of this collateral and incidental sort, which have entered the mind by the back stairs, as it
were, or rather have not entered the mind at all, but got surreptitiously born in the house."
In recent years the historical origins of human cognitive abilities have become an intense topic of discussion. Many
have argued quite convincingly that evolutionary forces have shaped cognitive capacities to assist us in solving
environmental problems, thereby increasing reproductive fitness. Numerous cognitive abilities that are present in
humans have been considered in light of natural selection, and language in particular has received a large amount of
attention (e.g. Pinker and Bloom, 1990; Pinker, 1994; Jackendoff, 1994). More recently, the evolutionary history of
musical ability has been examined, largely from an adaptationist perspective as a trait shaped by natural selection
(Wallin et al., 2000). However, processes other than direct adaptation can also account for its origins and are consistent
with a modern evolutionary framework. Two such processes, cultural transmission and exaptation, seem especially
suited to an evolutionary theory of the origins of music.
Examining the evolutionary origins of music cognition or any other psychological phenomenon requires grappling with
some of the most central and difficult issues in cognitive psychology, cultural anthropology, and evolutionary biology.
Within the traditional framework of the social sciences, behavioristic psychology was given the limited role of
explaining the general learning mechanisms through which culturally transmitted knowledge is acquired. More
recently, the more domain-specific and modular cognitive psychology has formed a stronger allegiance with the
biological sciences, and has attempted to explain the structure of the adult mind in terms of evolutionarily-derived
mental mechanisms rather than acquired cultural traditions (see Tooby and Cosmides, 1992). An extreme adaptationist
version of psychology, however, is no improvement over an equally extreme behaviorism. By taking one perspective
over the other, the psychologist risks losing half of the field's potential for exploring the origins of human mental
processes.
We believe that arguments concerning the evolutionary history of a cognitive phenomenon such as music and its
corresponding neural structure must first deal with two issues. First, both evolutionary and cultural processes can
explain the origins of these cognitive structures in the adult mind. Second, both adaptationist and non-adaptationist
processes can account for the evolutionarily-derived structures. If one is going to argue that music or any other mental
process is a biological adaptation that has been shaped by natural selection, two essential cases must be made. First,
one must be prepared to argue that the process is based on domain-specific innate universals that are not specifically
devoted to other cognitive abilities. Otherwise its developmental origins may be explained more parsimoniously via
mechanisms of learning and cultural transmission, rather than through natural selection. Second, one must be prepared
to argue that the purpose of this mental process is the same function that facilitated its selection. Otherwise its
evolutionary origins may be explained more parsimoniously via mechanisms of exaptation rather than adaptation.
Effects of Maturation and Learning on Cognitive Structure. Cognition is the product of a nervous system as it interacts
with the environment. Conceptually, we identify two processes by which the structure, and therefore the behavior, of
the nervous system can be determined and changed. The first process includes relatively dramatic changes that occur
through the interaction of an organism's genetic makeup with the environment during development. We often call such
changes maturation. These changes represent the unfolding of the genetic blueprint of the organism. The second
process includes relatively subtle yet still permanent changes that occur in response to environmental stimuli. We often
call these changes learning. Although these processes vary in the magnitude of their effect on the organism, both are
dependent upon the interaction of an organism's genetic endowment and its environment. In the case of maturation, an
environmental context in which genetic programs can unfold is required. In the case of learning, genetically specified
learning mechanisms upon which environmental information can act are required. Because these two processes share a
similar outcome, which is the specification of the structure of the nervous system, it is difficult to distinguish where
one ends and the other begins. It should be clear that we are not speaking of the traditional dichotomy between
genetics and the environment where a greater amount of direct genetic specification means a lesser degree of
environmental influence. Genetics and the environment, broadly construed, produce the structure of an organism
through their concerted actions and these two paths in many cases cannot be considered independently of one another
(see Tooby and Cosmides, 1992).
Historically, however, maturation and learning have been considered two separate processes, and have been cast
throughout the history of philosophy and psychology as dichotomies between nativism and empiricism, innate and
acquired traits, nature and nurture, and genetics and the environment. Despite this deep historical division in the way we
think of an organism's unchanging biological endowment and that which is acquired from the environment, these
processes overlap and interact in interesting and potentially powerful ways. For instance, in organisms with an evolved
system of learning and memory, biologically relevant information can also take the form of shared group knowledge.
This information can be transmitted either through observing older members of the group or through encoding in a
representational system. Organisms that use a representational system must share the ability to use such a code, the
most common example of which is language. The sharing and intergenerational transfer of such information has been
called cultural evolution and cultural transmission (see Key and Aiello, 1999) as well as memetic transmission
(Dawkins, 1976). Cultural evolution in humans not only carries simple types of information, but can also embody very
complex constructions such as religion and social customs. According to Mead (1964), "the term cultural transmission
covers a series of activities, all essential to culture, which it is useful to subdivide into the capacity to learn, the
capacity to teach, and the capacity to embody knowledge in forms which make it transmissible at a distance in time or
space."
Differences between Biological and Cultural Transmission. Although heavily intertwined there are clear differences
between biological and cultural information transfer. Biological information, the information that is carried by an
individual's genetic makeup, is resistant to change and therefore has high fidelity across many generations. It cannot be
altered within the lifetime of an organism, but only between generations. In addition, a substantial number of changes
that do occur in the genetic code are eliminated, either because they are incompatible with life or because they decrease
the relative reproductive success of the organisms that possess them. Other changes to the genetic code are
neutral, conferring neither an immediate adaptive advantage nor a disadvantage (Patterson, 1999). Bits of cultural
information, or memes, are also capable of transmitting information that is relevant to reproductive success. Unlike
genetic transmission, memetic transmission has relatively low fidelity and can incorporate or delete changes multiple
times across a single generation (Dobzhansky and Boesiger, 1983). In addition, we are not burdened with an
unalterable set of memetic information. Humans can modify or disregard information that has become obsolete and is
no longer useful. The invention of written languages has greatly diminished the permanent loss of cultural information.
However, if one considers only the active set of commonly accepted cultural information, we are faced with an ever
changing entity where only the most critical pieces of information retain stability in the knowledge set. The cognitive
abilities, such as language and social group formation, that are required to use, reshape, and convey such a knowledge
set across generations are stable, and perhaps biologically-based cognitive adaptations. The temporal aspects and
fidelity of these two modes of transmission are quite different from each other but do seem to share the ability to be
acted upon by selection when the solutions they provide have utility for the organism.
It is easy to understand how such a system of culturally carried information that increases the reproductive fitness of
members of the social group could provide a powerful basis for reproductive success. So powerful is the potential of
such a system that some researchers have argued that it is of greater significance than biological evolution in
explaining our unique cognitive abilities.
"That adaptation to cultural change is more important to humans than adaptation by genetic change is
incontrovertible; changed genes are transmitted only to direct descendants of the individuals within
which the changes arise. Many generations of selection are needed in order to confer the benefits of the
changed genes on the whole species. Changed ideas, skills, or inventions can be transmitted, in
principle, to any number of persons within a single generation." (Dobzhansky and Boesiger, 1983)
Both an extreme nativist and an extreme empiricist argument can simultaneously incorporate memetic and genetic
modes of information transfer. For the empiricist, the genetics of an organism can specify a general learning
mechanism that is capable of acquiring musical ability. Memetics then provides the information that this general
system uses to produce music cognition. In this view it is the general learning mechanism that provides a reproductive
advantage and the construction of musical ability by this mechanism is largely irrelevant to its evolutionary history. For
the nativist, musical ability is a direct outcome of a specific neuronal architecture that has provided reproductive
success to the organism and has been open to natural selection. These arguments differ in the degree of specificity that
the biology of the nervous system carries to the situation and also differ in the emphasis of where selection has had its
effect, on the general learning mechanism or on more specific mechanisms.
Chomskian Learning Theories. Perhaps no one has argued for specific cognitive mechanisms as convincingly as Noam
Chomsky (1975), who has described development in terms of species- and domain-specific learning theories. A
learning theory is the set of innate mental mechanisms that all members of a particular species use to acquire
information about a particular domain from the environment. In some cases, learning mechanisms may seem to be
relatively general and rely on laws of association. In others, a high degree of innately-specified "knowledge" must be
present in order for the organism to draw the right conclusions. Chomsky's learning theory of interest was for language
acquisition, but this kind of knowledge can also be extended to other domains, including music. Lerdahl and
Jackendoff (1985) have provided one detailed hypothesis of the mental mechanisms we have for structuring musical
information.
Adults possess a great deal of knowledge about music that is represented in the structure of their brains. As we have
suggested, the first question that evolutionary musicology has to face is the respective roles of biological evolution and
cultural transmission in getting that structure in place. Although it does not make sense to expect a clear distinction
between knowledge that is innately specified and knowledge that is learned, it does make sense to distinguish between
(1) knowledge that is acquired via the unfolding of developmental programs interacting with environmental universals
and (2) knowledge that is acquired via the process of enculturation, having accumulated throughout thousands of
generations of human culture. In particular, the more recent cultural developments, such as the European
tonal-harmonic system, are likely not to have been involved in any co-evolutionary processes. Natural selection
certainly played a role in setting up the general learning mechanisms and linguistic abilities that allow cultural
transmission. However, a reliance upon biological adaptation to account for specific kinds of knowledge that could be
explained by cultural transmission may not be warranted.
The Example of Language. Another reason why learning theories are important to the discussion of music evolution is
the example of language itself. It has been argued that language is an adaptation shaped by natural selection (Pinker
and Bloom, 1990). The primary reason why this argument is compelling is not based on the current utility of language;
it is based on the fact that the entire field of linguistics provides us with a learning theory for the domain of language
that is unparalleled by research in any other cognitive domain. The kinds of innate constraints we have for learning
language are complex, specialized, and seem to be well-designed for the task they serve. This kind of
argument-from-design for language as an adaptation is not possible without first having the foundations of a learning
theory for language that was provided by Chomsky and his followers.
If one is going to talk about music and evolution, a learning theory for musical knowledge would help reveal the kinds
of knowledge that are genetically specified. This is the information that has been carried throughout human history and
is open to traditional forms of biological evolution. Certainly there is more to music than this biological system, but it
is the domain of cultural transmission and not biological evolution that must explain the elements that are not innate.
We will now set aside the issue of culturally transmitted contributions to the structure of the adult mind and turn the
discussion to these evolved cognitive mechanisms.
The Distinction between Function and Use in Evolutionary Biology. This distinction between complex mental structure
that originated because of selection for a behavior of interest (in this case musical behavior) as opposed to another
behavior is an echo of the distinction between function and use in the field of evolutionary biology (see Williams,
1966; Gould and Lewontin, 1979). A function of a given structure is the purpose for which the structure was selected,
while a use is a purpose that the given structure allows that is not the purpose for which it was selected. It has also been
argued that the term "adaptation" be reserved for structures whose modern functions of interest are the same purposes
for which they were selected; structures whose original functions and modern uses differ are sometimes referred to as
"exaptations" (Gould and Vrba, 1982). The issue of teasing apart original function from current use is no less of a
problem when we are dealing with cognitive structure than when we deal with macroscopic structure (Lewontin, 1990).
These distinctions are crucial if one is going to label a particular behavior an evolutionary adaptation. In order to make
such an argument, that behavior must be the function (and not merely a use) of the neural structures that specify it. In
the case of music, the question is whether, on the one hand, the innate abilities that lend themselves to musical
processing and behavior were selected precisely because they do so, making music the function of these innate
mechanisms as well as an adaptation, or whether, on the other hand, these neural structures exist because of
selection pressures in other domains, making music one of perhaps many uses, or exaptations, of these
mechanisms. Arguments have been made extensively on both sides of this issue as it pertains to language (see Pinker
and Bloom, 1990).
The Fallacy of Inferring Function from Use. The importance of this distinction between original function and current
use has been either underestimated (Huron, 1999) or explicitly denied (Miller, 2000; Brown, 2000) in recent treatments
of music evolution. Specifically, consider the following statement by Brown (2000).
Music making has all the hallmarks of a group adaptation and functions as a device for promoting group
identity, coordination, action, cognition, and emotional expression. Ethnomusicological research cannot
simply be brushed aside in making adaptationist models... Music making is done for the group, and the
contexts of musical performance, the contents of musical works, and the performance ensembles of
musical genres overwhelmingly reflect a role in group function. The straightforward evolutionary
implication is that human musical capacity evolved because groups of musical hominids outsurvived
groups of nonmusical hominids due to a host of factors related to group-level cooperation and
coordination (pp. 296-297).
Brown not only has inferred evolutionary function from current use, but has also implied that by disagreeing with him
we insult the field of ethnomusicology. We would disagree on both points. First, decades of evolutionary biological
theory exist that would take major issue with the "straightforward" conclusion derived above. Second, from the cultural
perspective disagreeing only implies that certain musical phenomena fall in the domain of cultural anthropology rather
than evolutionary biology. In fact, comparative ethnomusicological research may be an essential tool if we are to
determine which elements of music are the products of evolution and which are the products of culture.
Often closely following on the heels of the modern function argument is the claim that the evidence for a theory of
music evolution will be found in studies of modern evolutionary fitness (see Miller, 2000) and genetics (see Huron,
1999). In other words, we should examine the purposes of musical behavior in modern human society and observe if
such behavior leads to different levels of individual reproduction or is associated with differences in genetics. This
particular kind of empirical evidence is not only likely to be impractical (see Lewontin, 1990), but fundamentally
incapable of answering the critical question at hand (see Symons, 1992), as we explain next.
The Fallacy of Inferring Heritability from Reproductive Success. Consider the first question of whether musical ability
leads to increased reproduction. First, if musical ability had been selected as an adaptation, the once-essential
variability may by now have been significantly reduced in modern humans by the forces of selection. Second, even if
there were a significant correlation with reproductive rate, it would likely require an overwhelmingly large data set in
order to observe it. Third, assuming that there were a correlation and that we could measure it, we would then have to
prove somehow that the differences in musicality were based on differences in genetics; we would have to show that
the differences are heritable. This makes the strength of argument from this first question entirely dependent on a
particular answer for the second: that differences in musical ability are related to differences in genetics.
The Fallacy of Inferring Adaptation from Heritability. Are there genes for music? Symons (1992) discusses how there
are really three such questions, which we apply to music here. In the ontogenetic sense, the answer is a trivial yes to
any question about genes and behavior. Everything about being human is inextricably linked to numerous genetic
processes. In the heritability sense, the answer is yes if any population variance in musical ability is caused by genetic
variance. The answer to this question is probably yes, but this would not say anything about whether or not music is an
adaptation. One might find that good readers and poor readers have statistically different genetics, but that does not
change the fact that written languages are a cultural invention. Lastly, in the adaptationist sense, the answer is yes if
there is evidence from studies of phenotypic design. If experimenters are interested in the adaptationist form of the
question, but design experiments that test for heritability, they have not accomplished anything. Just as careful
consideration of the question of reproduction rates requires asking a question about genetics, careful consideration of
the question of genetics requires asking a question about phenotypic design. Why not begin with the most fundamental
issue of all, the issue that in the long run will require an answer if you start out by examining reproduction and
genetics? This is the issue of special design.
Special Design for Music? The Relationship of Music to Other Cognitive Domains
The primary reason for skepticism regarding a musically adaptive explanation of the origins of neurological
connectivity relating to music is the fact that music shares mechanisms with other domains. The mere existence of
other kinds of thinking and other kinds of behavior that use cognitive mechanisms similar to that of music decreases
the odds that music is the evolutionary function of such mechanisms. Furthermore, many of these other forms of
cognition have a more compelling argument from design. These shared mechanisms fall into two general categories,
which we will consider below. The first category includes general mechanisms of perceptual organization and
cognitive representation. The second category includes the more specific domains of language and emotion, whose
mechanisms are possibly more specific and specialized than those of the first group. Similar arguments could be
extended to the domains of timing, motor control, and other capabilities related not only to perception but performance
as well (Justus and Hutsler, in preparation).
Helmholtzian Perceptual Organization. If one were to ask what perception is for, in the evolutionary sense, a
reasonable answer might be that perception is designed to provide an organism with useful knowledge about its
environment. In the auditory realm, a fundamental problem is to segregate the large set of frequency information
entering the ear into categories corresponding to different environmental objects and events. Bregman (1990) has
termed this process "auditory scene analysis." In the natural world, most sound sources produce harmonic vibration. If
we assume that part of the evolutionary function of the auditory system is to categorize frequencies on the basis of
what environmental objects and events are producing them, then it should not be surprising that the system makes the
assumption that tones with similar harmonic spectra should be categorized together, as if they were part of the
harmonic vibrations emanating from a single source. Such heuristics recall Helmholtz's (1867/1925; 1877/1954) ideas
of unconscious inference in perception.
In many cases, fundamentals of our perception of music seem to be governed by such Helmholtzian principles of
perceptual organization. Candidate universal "musical" principles that fall into this category include octave
equivalence, consonance, perceptual fusion, and pitch. Here we have cases in which a universal constraint of musical
processing, and perhaps even an innate universal, seems to be either an implementation (in the case of fusion and pitch)
or a direct consequence (in the case of octave equivalence and consonance) of harmonicity-detection mechanisms that
more plausibly evolved as a perceptual heuristic. It would not make sense to propose a separate adaptive value for
these mechanisms as they apply to music making; they were already put in place by another system. In fact, there may
be no need to hypothesize special innate mechanisms for learning to perceive pitch and the relationships between
musical intervals at all. Given that our adult knowledge in this case is reflecting acoustic regularities in the
environment, only the general mechanisms of perceptual learning are required (Bharucha and Mencl, 1996). It is
reasonable to suppose that at least pitch perception based on harmonicity and octave equivalence are universal features
of human auditory processing, either innate or learned as an unavoidable consequence of experience with physical
universals, given that infants seem to categorize frequency complexes based on pitch and regard octave-spaced tones as
equivalent (see Clarkson and Clifton, 1985; Demany and Armand, 1984).
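The harmonicity heuristic described above can be illustrated with a toy computation: given a set of partial frequencies and a candidate fundamental, score how many partials fall near integer multiples of that fundamental. This is a minimal sketch, not a model of the auditory system; the tolerance value and the frequencies below are arbitrary assumptions for illustration.

```python
def harmonicity(freqs, f0, tol=0.03):
    """Fraction of partials lying within a relative tolerance of a multiple of f0."""
    hits = 0
    for f in freqs:
        n = max(1, round(f / f0))          # nearest harmonic number
        if abs(f - n * f0) <= tol * n * f0:  # within tolerance of that harmonic
            hits += 1
    return hits / len(freqs)

# Partials of a single 200 Hz source fit its harmonic series and would fuse;
# an unrelated 330 Hz partial does not fit and would be heard as separate.
mixed = [200, 330, 400, 600]
score = harmonicity(mixed, 200)  # 3 of the 4 partials fit the 200 Hz series
```

Grouping by goodness of harmonic fit in this way is one simple reading of the "tones with similar harmonic spectra should be categorized together" heuristic.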
Gestalt Perceptual Organization. These first examples are primarily concerned with the perception of simultaneous
auditory or musical events. The principles that govern melodic grouping are an example of the perceptual organization
of events unfolding over time. One class of perceptual mechanisms that seem to be employed extensively in melodic
grouping are those of Gestalt psychology, exemplified by the work of Wertheimer (1924/1950). The kinds of principles
that can govern auditory organization are the same as those described for vision: proximity, similarity, good
continuation, and common fate. The dimensions along which these principles operate include frequency, amplitude,
timbre, temporal position, and spatial location (see Bregman, 1990; Deutsch, 1999), just as in the visual realm they
might apply to different dimensions such as hue, brightness, saturation, spatial location, and temporal position. The
kinds of rules that govern what a melody can be seem to stem nicely from Gestalt principles. These principles may be
an example of a kind of innate knowledge that is applied to the perception of music. It hardly seems plausible that such
principles evolved so that they could contribute to the musicality of humankind. If anything, they evolved (probably
well before Homo sapiens did) in order to increase the fluency with which organisms interpret their environments.
Conceptual Representation. We now turn to the more cognitive or representational end of the perception-cognition
continuum. Gestalt psychology also serves well to explain cognitive principles used in music perception. Krumhansl
(1990) has described how tonality can be considered a holistic property. Individual tones within tonal contexts are not
perceived in an atomistic manner, but rather in terms of their relation to the whole. Three contextual principles
summarize the findings of such top-down processing on the relationship between tones: contextual identity, contextual
distance, and contextual asymmetry. These findings parallel more general principles of cognitive representation, and
could just as easily describe our knowledge about colors or artifacts (see Rosch, 1975; Tversky, 1977; Krumhansl,
1978). Again, here we have a case in which a major constraining principle in musical organization seems to be an
implementation of more general cognitive principles. Even the concept of musical key, which in the European
tonal-harmonic system requires quite an abstract representation, can be thought of as an extension of similar principles
of cognitive organization.
Another feature of music that illustrates cognitive principles is the existence of musical schemata, representations of
the musical regularities in one's culture. The most basic musical schemata contain information about the transition
probabilities between musical events. For example, American listeners show implicit knowledge of the chord transition
probabilities of tonal-harmonic music (Bharucha and Stoeckig, 1986). We have already discussed how it is to an
organism's advantage to make perceptual inferences from an ambiguous signal in order to interpret the environment
effectively. It is also to the organism's advantage to recognize patterns over time that it encounters frequently in order
to make predictions and process subsequent information more efficiently. Elements of many musical systems seem to
take advantage of the brain's ability to make predictions based on prior experience.
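As an illustration of such schematic knowledge, first-order transition probabilities can be estimated from a corpus of event sequences. The sketch below uses hypothetical Roman-numeral chord labels and an invented toy corpus; it is not drawn from the studies cited above.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for prev, c in counts.items()}

# Toy corpus of chord progressions (labels are illustrative only)
corpus = [["I", "IV", "V", "I"], ["I", "V", "I"], ["I", "IV", "I"]]
probs = transition_probabilities(corpus)
# After "V", this schema strongly predicts "I"; frequently confirmed
# transitions are exactly the ones processed most fluently.
```

A listener-like system equipped with such a table can grade incoming events by their conditional probability, which is one simple way to cash out "predictions based on prior experience."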
Language. As previously mentioned, the most extensive application of a "linguistic" approach to music is Lerdahl and
Jackendoff's (1983) Generative Theory of Tonal Music. Taking a linguistic approach to the study of music, however,
does not presuppose that the cognitive structure underlying music is the same as that of language. The parallel often
suggested between the syntax of language and the structure of music does not seem to hold. One of many fundamental
differences between the two is that while language is concerned with absolute grammaticality (either a sentence is
well-formed or it is not), music is concerned with preference (one particular interpretation is preferred over several
good interpretations).
There are some parallels, however. Patel (1998) has proposed a shared structural integration resource (SSIR)
hypothesis, which suggests that the cognitive operations performed within the domains of language and music are
distinct but that common systems are used in the process of structural integration. Such an organization explains one of
the most compelling similarities between music and language: a hierarchical structure that unfolds over time. One piece
of evidence to support this SSIR hypothesis is the fact that a particular event-related potential (ERP) known as a
correlate of syntactic processing in language (the P600) also is observed when listening to music that deviates from
expected patterns (Patel et al., 1998).
This ability to integrate patterns over time, to build expectations based on what was just perceived while at the same
time integrating each new piece of information, seems to be a good candidate for a universal feature of music. At the
same time, such an ability seems not to be unique but shared with language, a more complex domain with a more
compelling evolutionary story. In fact, one might argue that such processes in language began as more elementary
temporal pattern recognition processes. Recent brain imaging research has found evidence that the amount of
processing of a visual sequence within Wernicke's area, a posterior area of the superior temporal gyrus associated with
language comprehension, is correlated with the predictability of the sequence (Bischoff-Grethe et al., 2000).
Emotion. The link between music and emotion is also a compelling issue for any psychological theory of music. Again,
it seems to be the case that there is no need to postulate some sort of special mechanisms on the part of music that lead
to its emotional qualities. Consider some of the basic findings of research on cognition, physiology, and emotion.
William James (1890) believed that the physiological response of the body to an emotional event preceded the
cognitive interpretation of what the experienced emotion was. This idea was strengthened and modified by the work of
Stanley Schachter (e.g. Schachter and Singer, 1962), who found that an otherwise unexplainable physiological reaction
was necessary for an emotional experience, which would then be interpreted in different ways depending upon the
context surrounding those reactions.
One kind of experience that can lead to an emotional reaction is the disruption of a cognitive schema (Mandler, 1984).
As mentioned earlier, abstracting information about past experiences into general principles about regularities in the
world is an adaptive trait, allowing organisms to process information more quickly when it is predicted by the schema.
When events occur that violate the predictions of the schema, physiological arousal occurs to indicate that something in
the environment requires attention. This arousal may then be experienced as an emotion, such as fear.
The emotions associated with music may emerge from such a process of schema disruption and interpretation. In fact,
the very nature of music's aesthetic value may depend on its ability to fulfill and violate expectations based upon
previous experience to varying degrees (Meyer, 1956; Dowling and Harwood, 1986). The fact that our knowledge
underlying such expectations seems to be modular and therefore ineffable may contribute critically to this aesthetic
experience (see Fodor, 1983; Raffman, 1993; Justus and Bharucha, submitted).
Our point is not to devalue the emotional qualities of music, but rather to illustrate how they may be a natural
consequence of the ways in which prior experience, physiology, and cognition interact to create emotional experiences
in general. Again, if this quality of music seems to be an implementation of a broader psychological principle, there is
no need to postulate special musical mechanisms underlying the emotions it causes us to experience.
What would it take to argue for such a position? First, an innate constraint in musical processing that seems to be
unique or at least primary to music must be described precisely. Perhaps when the study of music from a cognitive
science perspective begins to catch up with linguistics, with the aid of a more integrated developmental psychology and
ethnomusicology, we will be in a position to evaluate whether or not any such unique biologically-specified
mechanisms exist. Second, a convincing argument must be made for why these particular mechanisms, used in the
context of musical activities, would have been well-designed to give Homo adaptive advantages. While it is tempting to
start a discussion of music's evolution with speculation of how music would have been adaptive to our ancestors, doing
so without discussing what exactly music cognition corresponds to developmentally fails to rule out any number of
alternative possibilities.
In our view, given the current state of knowledge about music and the brain, there is no compelling reason to label
music as an evolutionary adaptation to the exclusion of many other hypotheses. Such a conclusion is not inconsistent
with the belief that music is a universal and cherished human trait, as many universal human qualities may share
similar evolutionary pasts. Nor is it inconsistent with the belief that music is a product of the evolutionary process,
since as we have discussed even the most general processes of learning and cultural transmission are products of an
evolved brain. Instead, music may illustrate a unique combination of non-adaptationist evolutionary processes and
cultural transmission. Whether any part of music has entered the house of the human mind from the front stairs, to use
James's metaphor, remains an open question.
References
Bharucha, J.J. and Mencl, W.E. (1996). Two issues in auditory cognition: Self-organization of octave
categories and pitch-invariant pattern recognition. Psychological Science, 7, 142-149.
Bharucha, J.J. and Stoeckig, K. (1986). Response time and musical expectancy: Priming of chords.
Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bischoff-Grethe, A., Proper, S.M., Mao, H., Daniels, K.A., and Berns, G.S. (2000). Conscious and
unconscious processing of nonverbal predictability in Wernicke's area. Journal of Neuroscience, 20,
1975-1981.
Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge,
MA: MIT Press.
Brown, S. (2000). The "musilanguage" model of music evolution. In N.L. Wallin, B. Merker, and S.
Brown (Eds.), The Origins of Music (pp. 271-300). Cambridge, MA: MIT Press.
Buss, D.M., Haselton, M.G., Shackelford, T.K., Bleske, A.L., and Wakefield, J.C. (1998). Adaptations,
exaptations, and spandrels. American Psychologist, 53, 533-548.
Clarkson, M.G. and Clifton, R.K. (1985). Infant pitch perception: Evidence from responding to pitch
categories and the missing fundamental. Journal of the Acoustical Society of America, 77, 1521-1528.
Darwin, C. (1859/1964). On the Origin of Species. Cambridge, MA: Harvard University Press.
Demany, L. and Armand, P. (1984). The perceptual reality of tone chroma in early infancy. Journal of
the Acoustical Society of America, 76, 57-66.
Depew, D. J. and Weber, B. H. (1995). Darwinism Evolving: Systems Dynamics and the Genealogy of
Natural Selection. Cambridge, MA: MIT Press.
Deutsch, D. (1999). Grouping mechanisms in music. In D. Deutsch (Ed.), The Psychology of Music (2nd
ed., pp. 299-348). San Diego, CA: Academic Press.
Dobzhansky, T. and Boesiger, E. (1983). Human Culture: A Moment in Evolution. New York:
Columbia University Press.
Dowling, W.J. and Harwood, D.L. (1986). Music Cognition. San Diego, CA: Academic Press.
Fodor, J.A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Gould, S.J. and Lewontin, R.C. (1979). The spandrels of San Marco and the Panglossian paradigm: A
critique of the adaptationist programme. Proceedings of the Royal Society of London B, 205, 581-598.
Gould, S.J. and Vrba, E.S. (1982). Exaptation - a missing term in the science of form. Paleobiology, 8,
4-15.
Helmholtz, H.L.F. von. (1925). Treatise on Physiological Optics. New York: Dover. (Original work
published 1867)
Helmholtz, H.L.F. von. (1954). On the Sensation of Tone as a Physiological Basis for the Theory of
Music. New York: Dover. (Original work published 1877)
Huron, D. (1999). An instinct for music: Is music an evolutionary adaptation? The 1999 Ernest Bloch
Lectures, Department of Music, University of California, Berkeley.
Jackendoff, R. (1994). Patterns in the Mind: Language and Human Nature. New York: Basic Books.
Key, C. A. and Aiello, L. C. (1999). The evolution of social organization. In R. Dunbar, C. Knight, and
C. Power (Eds.), The Evolution of Culture: An Interdisciplinary View. Edinburgh: Edinburgh University
Press.
Krumhansl, C.L. (1978). Concerning the applicability of geometric models to similarity data: The
interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.
Krumhansl, C.L. (1990). Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.
Lerdahl, F. and Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT
Press.
Lewontin, R.C. (1990). The evolution of cognition. In D.N. Osherson and E.E. Smith (Eds.), Thinking:
An Invitation to Cognitive Science, Volume 3 (pp. 229-246). Cambridge, MA: MIT Press.
Mandler, G. (1984). Mind and Body: Psychology of Emotion and Stress. New York: Norton.
Mead, M. (1964). Continuities in Cultural Evolution. New Haven, CT: Yale University Press.
Meyer, L. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Miller, G. (2000). Evolution of human music through sexual selection. In N.L. Wallin, B. Merker, and
S. Brown (Eds.), The Origins of Music (pp. 329-360). Cambridge, MA: MIT Press.
Patel, A. (1998). Syntactic processing in language and music: Different cognitive operations, similar
neural resources? Music Perception, 16, 27-42.
Patel, A., Gibson, E., Ratner, J., Besson, M., and Holcomb, P. (1998). Processing syntactic relations in
language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10, 717-733.
Pinker, S. and Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences,
13, 707-754.
Raffman, D. (1993). Language, Music, and Mind. Cambridge, MA: MIT Press.
Schachter, S. and Singer, J.E. (1962). Cognitive, social, and physiological determinants of emotional
state. Psychological Review, 69, 379-399.
Symons, D. (1992). On the use and misuse of Darwinism in the study of human behavior. In J.H.
Barkow, L. Cosmides, and J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the
Generation of Culture (pp. 137-159). Oxford: Oxford University Press.
Tooby, J. and Cosmides, L. (1992). The psychological foundations of culture. In J.H. Barkow, L.
Cosmides, and J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of
Culture (pp. 19-136). Oxford: Oxford University Press.
Wallin, N.L., Merker, B., and Brown, S. (2000). The Origins of Music. Cambridge, MA: MIT Press.
Wertheimer, M. (1950). Gestalt theory. In W.D. Ellis (Ed.), A Sourcebook of Gestalt Psychology (pp.
1-11). New York: Humanities Press. (Original work published 1924)
Williams, G. (1966). Adaptation and Natural Selection. Princeton, NJ: Princeton University Press.
Proceedings abstract
George Papadelis
University of Thessaloniki
Greece
Background
Aims
Method
Results
Comparative analysis between the high and the low performance groups revealed
that subjects' efficiency in classifying the best exemplars within each
category reflects the accuracy that characterises the category's mental
representation throughout its whole range. Additionally, it was concluded
that extreme tempo values and a high degree of structural complexity in a
pattern mainly affect categorisation performance for those subjects who
exhibit a low degree of rhythmic skill.
Conclusions
final stage, where accurate and detailed representations are made and fine-grained distinctions are possible.
Proceedings paper
Introduction
Simultaneous notes in the printed score (chords) are not played strictly simultaneously by pianists. As reported in the literature, an emphasised voice is not only played
louder, but additionally precedes the other voices typically by around 30ms (melody lead; Hartmann, 1932; Vernon, 1937; Palmer, 1989, 1996; Repp, 1996). It is still
unclear whether this phenomenon is "a common expressive feature in music performance ... that aids listeners in identifying the melody in multivoiced music" (Palmer,
1996, 51). An alternative hypothesis is that it may be mostly due to the timing characteristics of the piano action (velocity artefact, Repp, 1996) and therefore a result of a
dynamic differentiation of different voices. Especially in chords played by the right hand, high correlations between velocity difference and melody lead (between melody
notes and accompaniment) seem to confirm this velocity artefact assumption (Repp, 1996).
The investigated data, derived mostly from computer-monitored pianos, represent the asynchronies at the hammer-string contact points. The present study focuses
on asynchrony patterns at the finger-key contact times as well. Finger-key profiles represent what pianists initially do when striking chords. In this paper, we show that the
melody lead phenomenon disappears at the finger-key level. That means that pianists tend to strike the keys almost simultaneously, and it is only the different dynamics
(velocities) that result in the typical hammer-string asynchronies (melody lead).
Background
In considering note onset asynchronies, one has to differentiate between asynchronies that are indicated in the score (arpeggios, appoggiaturas) and asynchronies that are
performed but not explicitly indicated in the score. For the latter, two typical types have been observed in the literature: (1) the melody precedes the other voices by about 30 ms
(melody lead), or (2) the melody lags in comparison to the other voices. Type 2 asynchronies occur mostly between two hands, e.g. a bass note is played clearly before the
melody (melody lag or bass lead), which is well known from old recordings of piano performances. Type 1 asynchronies are more common within one hand (especially
within the right hand, because melody often corresponds with the highest voice).
Method
Materials
The Etude op. 10/3 (first 21 measures) and the Ballade op. 38 (initial section, bar 1 to 45) by Frédéric Chopin were recorded on a Boesendorfer SE290
computer-monitored concert grand piano by 22 skilled pianists (9 female and 13 male). They were professional pianists, graduate students or professors from the Vienna
Music University, came well prepared to the recording sessions, but were nevertheless allowed to use the music scores during recording. Additionally the pianists were
asked to play the initial 9 bars of the Ballade in two versions: once particularly stressing the melody (first voice) and once stressing the third voice (the lowest voice of the
upper stave). The performance sessions were recorded onto digital audio tape (DAT) and the MIDI data from the Boesendorfer grand piano was stored on a PC's hard disc.
All performances were of a very high pianistic and musical level and contained very few errors (the overall error rate for the Etude and the Ballade was 0.34% and 0.66%,
respectively).
Apparatus
The Boesendorfer SE290 has one set of shutters at the hammers, which provides two trip points: one at the point where the hammer crown just starts to contact the string and the other 5
mm lower. These two trip points provide two instants in time as the hammer travels upward, and the time difference between these instants yields the final hammer
velocity (FHV, in meters per second), which can be transformed into MIDI velocity.
The instant at which the trip point at the strings occurs is taken as note onset time. The note onset times show a timing precision of 1.25 milliseconds, the FHV
measurement has a counter period of about 0.04 milliseconds.
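The velocity measurement described above amounts to dividing the 5 mm trip-point spacing by the measured time difference. A minimal sketch follows; the SE290's actual calibration and its mapping from FHV to MIDI velocity are not specified in the text, so none is attempted here, and the example numbers are illustrative only.

```python
TRIP_DISTANCE_M = 0.005  # 5 mm between the two shutter trip points (from the text)

def final_hammer_velocity(t_lower, t_upper):
    """Final hammer velocity (FHV, m/s) from the two trip-point times (s).

    t_lower: time at the trip point 5 mm below the strings
    t_upper: time at the trip point where the hammer crown reaches the string
    """
    return TRIP_DISTANCE_M / (t_upper - t_lower)

# Example: a hammer covering the last 5 mm in 2.5 ms travels at 2 m/s
v = final_hammer_velocity(0.000, 0.0025)
```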
Figure 1a, 1b. The timing characteristics of a grand piano action - the hammer-string contact times as functions of final hammer velocity (left side) and MIDI velocity (right side, solid lines). The
y-axes represent the time intervals from the finger-key strike times to the hammer-string contact times. The data in Figure 1a, provided by Askenfelt, are drawn in dotted lines with asterisks; the
horizontal lines indicate the key-bottom times for the three notes (piano, mezzo forte, forte), which are temporally displaced relative to the hammer-string contact times (see text).
Procedure
All note onsets and the velocity information were extracted from the performance data. This data was corrected and matched to a symbolic score in which each voice was
individually indexed. The error rate was very low (0.34% or 0.66%, see above). Wrong pitches were corrected and wrong or missing notes marked as missing. Timing
differences and velocity differences between the first voice (= melody) and each other voice were calculated for each event in the score. From that, asynchrony profiles and
correlations between timing and velocity differences were derived.
Results
Hammer-string asynchronies
The average chord profiles are shown in Fig. 2 & 3 (Etude, Ballade, solid lines with asterisks). All pianists play the first voice consistently louder than the accompaniment,
so it can undoubtedly be called the melody. As expected, the melody precedes the other voices by about 20-30 ms. In the Ballade the chord profiles are very similar to each other;
the melody lead increases slightly for the lower voices. The chord profiles of the Etude show more variability, especially in the left hand, where the bass voice (7) tends to
lead with some pianists. This corresponds to the type 2 asynchrony pattern (bass lead, esp. Pianists 6, 12, 13, 14, 16 and 21). A real outlier is Pianist 3, who plays the
melody up to 50 ms before the accompaniment. In this case, it is a deliberate idiosyncrasy that Pianist 3 uses to emphasise the melody (personal communication with Pianist 3).
In the exaggerated first-voice versions of the first 9 bars of the Ballade, the stressed voice is played louder than in the normal version (1.4 m/s versus 1.0 m/s on average),
while the accompaniment maintains its dynamic range. The melody lead increases to 40-50 ms (see Figure 4). The third-voice versions show the same tendency, but to
a smaller extent. The third voice is played loudest (at about 1.2 m/s on average), the melody is still quite prominent (about 0.8 m/s), and the other voices are as usual. The
third voice leads the first voice by about 20 ms, and the remaining voices lag by another 20 ms (Figure 4). Thus, when pianists are asked to emphasise one voice, they
play this voice louder and enlarge the dynamic distance to the accompaniment. In parallel, the timing difference increases correspondingly.
Figures 2 & 3. The average chord profiles for the average version (bold lines) and the 22 individual performances (grey lines). The profiles plot the averaged timing delays of the individual voices
relative to the melody (voice 1). The right tree of lines (solid lines, average version with asterisks) represents the hammer-string domain (h-st); the left tree (dotted lines, average version with circles)
indicates the finger-key domain (fg-k). Pianist 3 is outlined by the (coloured) lines with diamonds.
Figure 4. Average chord profiles of the 22 individual performances. The versions with the 1st & the 3rd voice stressed are indicated by asterisks or circles respectively (hammer-string domain, solid
lines). The finger-key domain is plotted in dotted lines.
Generally, the larger the dynamic differences, the greater the extent of the melody lead. This overall tendency can also be measured for each single event in the score; the per-event correlations are given in Table 1.
Table 1. Correlations between velocity differences and timing differences for 22 recordings and their average, two pieces and two hands. The significance level is indicated by asterisks (* p < .05; **
p < .01)
The within-right-hand coefficients are usually higher than the between-hand coefficients (l.h. in Table 1), more so for the Etude than for the Ballade. The lower left-hand
correlations indicate greater independence between the two hands. The often-anticipated bass note is one example of this tendency, as displayed by Pianists 5, 12, 13,
14, 16 and 20 in the Etude. The correlation coefficients for the special versions of the Ballade show a similar picture and are not reported here.
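The per-event relationship behind Table 1 can be illustrated with a toy computation. The function below is a standard Pearson correlation; the difference values are invented, shaped only to mimic the reported pattern in which notes that are relatively louder than the melody also arrive earlier (negative coefficients).

```python
import numpy as np

# Illustrative only: Pearson correlation between per-event velocity
# differences and timing differences, in the spirit of Table 1. The toy
# numbers are invented; a negative r means relatively louder notes lead.

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

vel_diff  = [-30, -25, -20, -15, -10, -5]   # accompaniment minus melody
time_diff = [ 32,  27,  24,  18,  12,  6]   # ms delay behind the melody
r = pearson_r(vel_diff, time_diff)          # strongly negative
```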
The correlations of timing and velocity differences measure a linear relation, whereas the timing correction curve (TCC) of the piano action follows an inverse power
relationship (Fig. 1). However, to get at least an impression of the slope of the TCC, the FHV-MIDI velocity data were approximated by a linear curve (Fig. 1b). Figure 5
shows the scatter plots of the timing and velocity differences of the average of the 22 recordings for both hands. The interpolated slope of the TCC (-.69) is slightly steeper
than the slopes of the scatter plots. The left-hand slope of the Etude (-.27) is less steep because of the asynchrony tendencies described above. The figure shows that
the expected and the observed directions of the velocity artefact (the slope of the TCC and the regression line of the average data, respectively) are quite similar.
Figure 5. MIDI velocity differences against timing differences of the average of the 22 recordings.
The dotted line indicates the linearly interpolated slope of the TCC function (slope = -.69).
Finger-key asynchronies
To provide an overview of the initial finger-key times, a special finger-key version of each recording was computed, in which the onset times were corrected by the TCC as a
function of FHV. The average chord profiles of the finger-key versions are plotted as dotted lines in Figures 2 & 3; the average finger-key chord profile of the average
versions is shown as solid lines with circles.
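A numerical sketch of this correction follows. Each hammer-string onset is shifted back by the key-to-string travel time, modelled here as an inverse power function of final hammer velocity (FHV); the coefficients a and b are invented for illustration, as the real TCC was measured on the Boesendorfer SE system.

```python
# Sketch (not the authors' code) of the finger-key correction: the travel
# time grows as FHV falls, so softer notes are shifted back further.

def travel_time_ms(fhv, a=55.0, b=0.6):
    """Invented TCC: slower hammers take longer to reach the string."""
    return a * fhv ** -b

def finger_key_onset(hs_onset_ms, fhv):
    """Reconstruct the finger-key onset from the hammer-string onset."""
    return hs_onset_ms - travel_time_ms(fhv)

# Loud melody note vs soft accompaniment note, 25 ms apart at the strings:
melody_fk = finger_key_onset(1000.0, fhv=3.0)   # short travel time
accomp_fk = finger_key_onset(1025.0, fhv=1.0)   # long travel time
fk_lead = accomp_fk - melody_fk                 # shrinks to roughly zero
```

With these (invented) coefficients, the 25 ms hammer-string melody lead nearly vanishes at the finger-key level, which is the qualitative pattern the corrected profiles show.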
The chord profiles of the finger-key versions are clearly much more synchronous than the hammer-string patterns. The melody lead for the right hand is
reduced to about zero, which indicates a strong effect. The differences between the hammer-string and the finger-key profiles are significant at p < .01 across all pianists,
pieces and voices (two-tailed t-test), whereas the delays of the other voices relative to the melody in the finger-key profiles are all statistically non-significant
(p > .05). This indicates that the residual deviations from zero in the finger-key profiles are not reliable.
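The two significance checks just described can be sketched as follows; the t statistic is computed directly, and the delay values are invented for illustration (the real tests were run over all pianists, pieces and voices).

```python
import numpy as np

# Hedged illustration: a paired comparison of hammer-string vs finger-key
# delays, and a one-sample test of the finger-key delays against zero.

def t_one_sample(x, mu=0.0):
    """One-sample t statistic for mean(x) against mu."""
    x = np.asarray(x, float)
    return float((x.mean() - mu) / (x.std(ddof=1) / np.sqrt(len(x))))

hs_delays = np.array([22.0, 28.0, 25.0, 31.0, 24.0])  # ms behind melody
fk_delays = np.array([-1.0,  2.0,  0.5, -2.0,  1.0])  # after TCC correction

t_paired = t_one_sample(hs_delays - fk_delays)  # clearly non-zero
t_fk     = t_one_sample(fk_delays)              # near zero
```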
The grey chord-profile cluster, which gives an impression of the individual chord profiles of the 22 pianists, is more homogeneous for the Ballade than for the Etude.
Discussion
The consistently high correlations between timing and dynamic differences show the strong overall dependency of melody lead on velocity. The more the melody is
separated dynamically from the accompaniment, the more it precedes it. Our method of calculation and our results coincide with Repp's study (Repp, 1996).
In addition to these findings, the finger-key versions show that most of the investigated melody lead disappears at this level. Pianists start to strike the keys
almost synchronously, which strongly supports the overall validity of the velocity-artefact assumption. They begin their acceleration essentially simultaneously, but the
different velocities cause the hammers to arrive at the strings at different points in time.
Nevertheless, pianists clearly play asynchronously in some cases as well. First, the left hand leads globally in the finger-key domain by about 10 ms, which is an
unexpected result. Is it possible that pianists compensate for the longer response time of softer notes, and therefore instinctively start to strike soft chords earlier? This is,
nota bene, an effect that occurs only between the hands, not within one hand. Second, as a special case of the first, the bass anticipations extend up to 150 ms in
some cases and to about 50 ms typically, when they occur. This distinct anticipation seems to be intentional, although probably often unconscious.
Another interesting finding is that pianists are evidently able to enlarge the melody lead deliberately beyond the usually observed 30 ms, as Pianist 3 shows. The
question is whether it is possible for pianists to differentiate voices in a chord dynamically without producing melody lead in the hammer-string domain. Common
sense suggests that this reversal is not possible, and there is no example in the data that suggests otherwise.
The findings of this study support the assumptions of Repp (1996) more than those of Palmer's studies (1989, 1996). Nevertheless, melody lead is of course a
phenomenon that helps a listener to identify the melody in a multi-voiced musical texture. Temporally offset elements tend to be perceived as belonging to separate
streams (stream segregation; Bregman & Pinker, 1978), and spectral masking effects are diminished by shifting one masking voice by several milliseconds (Rasch, 1978,
1979). But in the light of the present data, it does not seem that pianists produce melody lead primarily in order to separate voices temporally. The temporal shift of the melody
is rather a result of differentiating the voices dynamically. Melody lead is thus linked to dynamic differentiation; nevertheless, both phenomena have similar
perceptual effects, namely separating the melody from the accompaniment.
The TCC used to create the finger-key onset times is still a preliminary approximation of the grand piano's action characteristics.
Acknowledgements
This research was supported by the Austrian Federal Ministry of Education, Science and Culture in the framework of the START programme (grant no. Y99-INF). I especially
want to thank Wayne Stahnke, who generously gave insight into the functionality of the Boesendorfer SE system, and the Boesendorfer Company in Vienna, which
provided the SE290 Imperial grand piano for experimental use. I am grateful to Gerhard Widmer, Simon Dixon and Emilios Cambouropoulos for correcting earlier
versions of this paper.
References
Askenfelt, A. & Jansson, E. V. (1990). From touch to string vibrations. I: Timing in grand piano action. Journal of the Acoustical Society of America, 88, 52-63.
Askenfelt, A. & Jansson, E. V. (1991). From touch to string vibrations. II: The motion of the key and hammer. Journal of the Acoustical Society of America, 90,
2383-2393.
Askenfelt, A. (ed.) (1990). Five lectures on the acoustics of the piano. Stockholm (Publications issued by the Royal Swedish Academy of Music, 64).
Bregman, A. S. & Pinker, S. (1978). Auditory streaming and the building of timbre. Canadian Journal of Psychology, 32, 19-31.
Goebl, W. (1999). Analysis of piano performance: towards a common performance standard? Paper presented at the Society of Music Perception and Cognition
Conference (SMPC99), Evanston, USA, August 14-17, 1999.
Hartmann, A. (1932). Untersuchungen über das metrische Verhalten in musikalischen Interpretationsvarianten. Archiv für die gesamte Psychologie, 84, 103-192.
Huron, D. (1993). Note-onset asynchrony in J. S. Bach's two part inventions. Music Perception, 10, 435-444.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-346.
Palmer, C. (1996). On the assignment of structure in music performance. Music Perception, 14, 23-56.
Rasch, R. A. (1978). The perception of simultaneous notes such as in polyphonic music. Acustica, 40, 21-33.
Rasch, R. A. (1979). Synchronization in performed ensemble music. Acustica, 43, 121-131.
Repp, B. H. (1996). Patterns of note onset asynchronies in expressive piano performance. Journal of the Acoustical Society of America, 100, 3917-3932.
Stahnke, W. (2000). Developer of the Boesendorfer SE system. Personal communication.
Vernon, L. N. (1937). Synchronization of chords in artistic piano music. In C. E. Seashore (ed.), Objective analysis of musical performance (Studies in the Psychology of Music, IV, pp. 306-345). Iowa: University Press.
Proceedings paper
1 Background
According to Berlyne (1971), preference for stimuli is related to their complexity or unpredictability.
Although this claim has been supported by a large number of studies in the field of music (reviews by
Finnäs, 1989; and Fung, 1995; North & Hargreaves, 1995), adequate objective ways of measuring the
originality or complexity of music are in short supply. The existing objective measurements, such as
the information-theoretic and tone transition probability (Simonton, 1984) models possess
well-known limitations: First, research within this tradition has tended to employ limited stimuli,
which have typically been specially-composed pieces or excerpts of classical music. Secondly, these
models do not address the role of the listener's perceptual system in organising the structural
characteristics of music.
Therefore the aim of the present research was to (i) devise an objective computer model of those
perceptual processes which underlie human listeners' musical expectations and complexity
judgements; (ii) determine the extent to which this model could predict experimental participants'
complexity judgements in response to a range of musical stimuli; (iii) determine the effectiveness of
the model relative to that of the previous models; and (iv) provide an example of the model's
application to real music.
supported by Palmer & Krumhansl, 1990; Thompson, 1994) and duration of tones (e.g. Castellano,
Bharucha & Krumhansl, 1984; Monahan & Carterette, 1985). These modifications are made since
they emphasise tones that occur in more prominent locations or possess longer durations: These
factors both lead to the increased perceptual saliency of tones.
Intervallic factors consist of principles derived from Narmour's (1990) implication-realization model.
The principles are proximity, registral return, registral direction, closure, intervallic difference, and
consonance. These principles are hypothesised to be innate and are based on a variety of Gestalt laws
applied to tone-to-tone continuations. Here the principles are used to measure the extent to which
these implied patterns are violated. The coding of the model is derived from Krumhansl (1995).
Rhythmic factors include rhythmic variability, which accounts for changes in the duration of notes,
syncopation, which measures the amount of deviation from the regular beat pattern, and rhythmic
activity, which is simply the number of tones per second. All three rhythmic principles have been
found to increase the difficulty of perceiving or producing melodies (e.g. Clarke, 1985; Povel &
Essens, 1985; Conley, 1981, respectively).
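Two of the rhythmic factors can be made concrete with a short sketch. The exact formulas in the EBM may differ, so the definitions below (log-duration spread for variability, tones per second for activity) are plausible stand-ins, not the model's actual code.

```python
import math
import statistics

# Hedged illustration of two rhythmic factors, computed from a list of
# note durations in seconds. Formulas are assumptions for illustration.

def rhythmic_variability(durations):
    """Spread of note durations (stdev of log-durations; an assumed formula)."""
    return statistics.stdev(math.log(d) for d in durations)

def rhythmic_activity(durations):
    """Number of tones per second of music."""
    return len(durations) / sum(durations)

durations = [0.25, 0.25, 0.5, 0.25, 0.25, 0.5]  # an invented rhythm
activity = rhythmic_activity(durations)          # 6 tones in 2.0 s
```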
In short, melodies that create expectancies clearly structured in terms of their tonal,
intervallic, and rhythmic properties tend to be easier to reproduce and recognise, and are also judged
by listeners as less complex. The EBM processes MIDI melodies and produces a final melodic
complexity score by calculating a weighted sum of the principle scores. Full details of the EBM can be
found in Eerola and North (2000).
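The final step can be sketched as follows; the principle names come from the text, but the weights and scores below are invented placeholders (the real weights are fitted in Eerola & North, 2000).

```python
# Minimal sketch of the EBM's final step: melodic complexity as a weighted
# sum of the individual principle scores. All numbers below are invented.

def ebm_complexity(scores, weights):
    assert scores.keys() == weights.keys()
    return sum(weights[p] * scores[p] for p in scores)

principle_scores = {
    "tonality": 0.4, "intervals": 0.7, "rhythmic_variability": 0.5,
    "syncopation": 0.3, "rhythmic_activity": 0.6,
}
weights = {
    "tonality": 0.25, "intervals": 0.30, "rhythmic_variability": 0.15,
    "syncopation": 0.15, "rhythmic_activity": 0.15,
}
complexity = ebm_complexity(principle_scores, weights)
```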
musicological literature concerning them, and because the popularity of the songs can be easily
determined. Furthermore, an earlier study by West & Martindale (1996) showed that the arousal
potential of the lyrics of the Beatles increased across time and was not related to the popularity of the
songs, as measured by the record positions in the charts. The aim of the present research was to
employ the EBM in determining (a) whether the Beatles' melodies show an increase in arousal
potential over time and (b) whether the popularity of the melodies is linked with their complexity.
5.1 Analysis material
The material used in the study comprises all of the songs written by the Beatles for the
Beatles and published officially by the English record companies EMI and Apple in 1962-70. This
sampling deliberately excludes cover versions recorded by the Beatles and recordings of Beatles' songs
by other artists. This resulted in a set of 182 qualifying songs. The songs were arranged
chronologically by their original recording date (derived from Lewisohn, 1988). The
melodies of the songs were obtained from the most reliable notation of the Beatles' music available
(Fujita, Hagino, Kubo, & Sato, 1993), encoded as MIDI files, and transposed to a common key (C
major or minor) for the analyses. Grace notes and notated non-pitch information (e.g. shouting,
speaking) were removed when encoding the melodies for the computerised analysis.
5.2 Results of the archival study
The melodic complexity of the songs was first regressed onto recording dates, to see whether linear
time-trends existed across the Beatles' career. A highly significant increasing trend emerged (R2 =
.124, F = 25.42, df = 1,180, p < .001). More simply, the melodies of the Beatles' songs became more
complex over time. This is consistent with a previous study that has investigated the statistical
properties of the Beatles' lyrics (West & Martindale, 1996) by means of computerised content
analysis. Both results support Martindale's (1990) theory of aesthetic evolution, which proposes that
art works become increasingly arousing over time, but the trend could also be attributed to increasingly
sophisticated performance and compositional skills.
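The time-trend analysis can be sketched as an ordinary least-squares fit with R² computed from the residuals; the yearly values below are invented and merely mimic a significant increasing trend over the 1962-70 recordings.

```python
import numpy as np

# Illustration of regressing melodic complexity on recording date.
# The data points are invented for the sketch.

def linear_r2(x, y):
    """Least-squares slope and R^2 of a simple linear fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float(slope), 1.0 - ss_res / ss_tot

years      = [1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970]
complexity = [0.42, 0.45, 0.44, 0.50, 0.49, 0.55, 0.53, 0.58, 0.60]
slope, r2 = linear_r2(years, complexity)  # positive slope: rising complexity
```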
Next, various popularity indices were correlated with melodic complexity. These included the number
of weeks the songs spent on the chart, the chart positions, and an aggregate function of both, for
singles and albums in the UK and US top 40 (Gambaccini et al., 1996; Whitburn, 1996). The
popularity of singles, as measured by weeks on the UK chart, correlated negatively with the melodic
complexity of the songs (r(23) = -.567, p < .01): the simpler the song melodically,
the longer it spent on the charts. Also, the chart success of albums (measured either by chart
position, chart duration or both) correlated negatively with the mean melodic complexity of the
albums (r(11)= -.729, p< .05), suggesting that melodically simpler albums have fared better in the
popular music markets. Finally, different poll results (Reed, 1982), compilations (Bronson, 1995) and
expert ratings (Larkin, 1994) were compared with the melodic complexity of the songs, but no clear
trends emerged from these analyses. Interestingly, the other models were not able to predict any
trends in the popularity of the Beatles' songs. The transition probability model, however, demonstrated
the same, although weaker, increase in melodic complexity over time as the EBM (R2 = .037, F = 6.91,
df = 1,180, p < .01).
It should be noted that although relationships between chart performance and melodic complexity
were observed in this study, several extraneous social, cultural and commercial influences certainly
affect the popularity of songs. However, the present findings may serve as a starting point for further
inquiries into questions of this nature.
6 Discussion
In this paper, an expectancy-based model of melodic complexity was tested. The EBM provided the
most accurate prediction of human responses to the present stimuli. However, it is important to
establish the extent to which the EBM can predict humans' responses to a wider range of melodies.
There is a risk that the model may be excessively tailored to the specific set of melodies employed in
the present research: some principles might be more important in predicting responses to kinds of
music different to those considered here. Also, the EBM ignores several aspects of music by
considering only single melodic lines. For example, the richness of arrangement, harmony and tempo
are factors outside the scope of the EBM but which might influence listeners' sense of the complexity
of the music in question.
In conclusion, the EBM may have potential for the analysis of very large samples of 'real' musical
stimuli in terms of their melodic complexity. This was demonstrated by analysing all the songs by the
Beatles, which revealed modest trends between melodic complexity and chart success of the singles
and albums; and also an increase in melodic complexity over time. Studies using considerably larger
samples of music are currently underway. Research along these lines could have a considerable
impact on our understanding of the perception and appreciation of music, as this computerised
approach opens up new possibilities for studying the relationship between the properties of music and
listeners' preferences.
7 References
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Bronson, F. (1995). Billboard's hottest hot 100 hits. New York: Watson-Guptill.
Carlsen, J. C. (1981). Some factors which influence melodic expectancy. Psychomusicology, 1,
12-29.
Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music
of North India. Journal of Experimental Psychology: General, 113, 394-412.
Clarke, E. F. (1985). Some aspects of rhythm and expression in performances of Erik Satie's
"Gnossienne No. 5.", Music Perception, 2, 299-328.
Conley, J. K. (1981). Physical correlates of the judged complexity of music by subjects
differing in musical background. British Journal of Psychology, 72, 451-464.
Eerola, T., & North, A. C. (2000). Measuring melodic complexity: An expectancy-based model.
(Submitted for publication).
Finnäs, L. (1989). How can musical preferences be modified? Bulletin of the Council for
Research in Music Education, 102, 1-58.
Fujita, T., Hagino, Y., Kubo, H. & Sato, G. (Transcr.) (1993). The Beatles Complete Scores.
London: Wise Publications.
Fung, V. C. (1995). Music preference as a function of musical characteristics. The Quarterly
Journal of Music Teaching and Learning, 6, 30-45.
Gambaccini, P., Rice, T., & Rice, J. (1996). The Guinness book of top 40 charts. 2nd ed. UK:
Proceedings abstract
Mr Philippe Lacherez
lacher@psy.uq.edu.au
Background:
Recent models of pitch perception have suggested that pitch is not perceived
absolutely but involves comparisons between successive sounds. An important
paradigm in this research has been the use of specially synthesised "Shepard"
tones, which obscure absolute frequency information while retaining the
relationships between tones, thus permitting a direct analysis of this
comparison process. Recent work using such stimuli has led to the suggestion
that a motion-specific perceptual process may be involved in relative pitch
judgements.
Aims:
Method:
Results:
Conclusions:
Proceedings paper
domain-specific intelligences, with learning, memory and thinking machinery focused upon specific
areas of function" (1994, 189). This concept of modularity, with each domain somewhat autonomous
as it originated in response to specific needs, is perhaps most popularly exemplified in Gardner’s
(1983) list of multiple intelligences; we acquired linguistic, musical, logical-mathematical,
bodily-kinesthetic, spatial, intrapersonal, interpersonal, naturalist, and spiritual (or existential)
intelligences as a means of adaptation to changing circumstances in the environment.
Time — Rhythm— Hearing
Nature includes many periodicities, or rhythms. Many of these, such as day-night, lunar, and seasonal
cycles, impact animal behavior. Think of the effects of temperature changes on cold-blooded animals.
A snake, for example, is at the mercy of the environment as the temperature swings from chilly in the
morning to hot in the afternoon to cold at night. Other animals have internal regulators that help to
keep internal temperatures more constant. In humans, brain waves, hormonal outputs, and sleeping
patterns are examples of the more than 100 complex oscillations monitored by the brain (Farb, 1978).
Homeostasis (preservation of internal sameness), among higher animal forms and especially among
humans, represents freedom by making us more time-independent. The issue of time is not only
biological, but psychological as well, as it was just as important to find ways to create mental
homeostasis (a stable psychological world). "Inner sameness, whether biological or psychological (the
two cannot be separated in any clear-cut way), is an evolutionary invention peculiar to advanced
forms of life and necessary if living creatures are to avoid being the slave of time" (Campbell, 1986,
60).
Hearing is a primary sense through which we create a stable, inner world of time. Millions of years
ago when dinosaurs roamed the earth, mammals, then just small forest creatures, were forced to hunt
at night for safety's sake. Hunting at night requires a keen sense of hearing as sonic events occurring
over time must be ordered to become meaningful. A rustling of leaves may indicate a predator
approaching or prey retreating. Typically, organisms analyze sounds for patterns, patterns detected are
ascribed "meaning," and this meaning drives behavior (Mikiten, 1996). The more complex the sound
analyzer and pattern detector (i.e., the brain and its related sensory organs), the more complex the
patterns that can be detected and the more complex the resulting behaviors. Thus, evolution provided
human beings with a remarkable capacity to interpret sounds that are time-ordered. "To hear a
sequence of rustling noises in dry leaves as a connected pattern of movements in space is a very
primitive version of the ability to hear, say, Mozart's Jupiter Symphony as a piece of music, entire,
rather than as momentary sounds which come and then are no more ..." (Campbell, 1986, 263-264).
Biophony
The sonic world in which we evolved was filled with an incredible array of detectable patterns.
Modern living has detached us from the sounds of nature, but for our ancient ancestors their very
survival depended upon the ability to detect patterns in these sounds, derive meaning from them, and
adjust their behavior accordingly. Wind and water noises, bird calls, monkey screeches, and tiger
growls all had meaning. Beyond this, many (if not all) animal sounds were suffused with an
"emotional" content. They screamed in pain, roared a challenge, or offered enticements for mating.
Darwin contended that human musicality arose out of the emotional content of animal sound-making
when he said that "musical tones and rhythm were used by our half-human ancestors, during the
season of courtship, when animals of all kinds are excited not only by love, but by the strong passions
of jealousy, rivalry, and triumph" (1897/nd, 880).
Early humans would have heard these sounds not in isolation but holistically as a sound tapestry.
Krause’s (1987) niche hypothesis likens the sounds of nature (biophony) to a symphonic score. A
spectrogram of the sounds of the forest or around a pond shows that each species produces sounds that
occupy particular niches. If these sounds are important—mating calls, for example—they wouldn’t be
very effective if they were lost among all the other sounds. Thus, each animal has learned over the
millennia to create sounds that occupy a very particular stratum in the overall biophony, ensuring that
those for whom the sound is intended can pick it out.
Growing up in a particular sonic environment—growing up both in the sense of the individual and of
the generations over thousands of years—it is quite natural that we would make attempts to mimic the
sounds of nature. With our great brains we moved easily from mimicry to elaboration, extension,
synthesis, and eventually the creation of novel sounds. Thus, we occupy our own niche in the natural
order of sounds, but we are not content to remain in that niche. As a dramatic example, Krause (1987)
finds that it now takes 2,000 hours of field recording to acquire one hour of usable material; the
reason for this is that it is nearly impossible to find natural habitats that are not invaded by human
sounds.
Much of the earliest music would have been vocal (and other bodily sounds) and many of the earliest
instruments would have been biodegradable, having been made of reeds, wood, or skins, and thus lost
in the mists of time. Nevertheless, there are evidences of early music. Acoustical analyses of caves
show that those places where the acoustics are best are accompanied by many paintings; those places
where the acoustics are poor have few or no cave paintings. "Thus, the best places to view the artwork
of the cave appear to have been the best places to hear music or chants" (Allman, 1994, 216). Also
found in the caves are whistles, flutes, and mammoth bones that may have been used as drums or as
Ice Age xylophones.
Attema (2000) recently presented a 53,000-year-old bone flute made from an animal leg bone.
This is not a simple "toy" of the sort any child could make from a hollow tube. Rather, it is a fipple
flute (similar to a recorder), requiring a plug in the head joint with an air channel venting through an
air hole and tone holes properly spaced down the length of the tube. This is a startling demonstration
that even at that early stage in our development we had the brain power to figure out complex
problems. Moreover, this was obviously important enough to have invested a considerable amount of
time and energy to get it right. No doubt there were many unsuccessful attempts along the way. (See
Hodges and Haack, 1996, for a review of additional ancient musical artifacts.)
Mother/Infant Bonding
In consideration of the survival benefits music has to offer, the evolutionary advantage of the smile,
like music a universally-innate human trait, provides a useful analogy. From a cultural evolutionary
standpoint, the smile has taken on many diverse meanings. However, from a biological evolutionary
standpoint, the primary survival benefit may have been the bonding of mother and infant (Konner,
1987). Likewise, although music has many diverse cultural meanings today, at its roots it may also
have had survival benefits in connection with mother-infant bonding.
From Australopithecus africanus, nearly five million years ago, to modern humans, the brain has
nearly tripled in size (Cowan, 1979). If the human fetus were carried "full term" in terms of brain
development, the head would be too large to pass through the birth canal, and birth would be
impossible. The evolutionary solution to this problem is that we are now born with our brains
incompletely developed. It takes about six years for the brain to reach 90 percent of its eventual adult
size.
The result of this post-partum brain development is an increased period of dependency of infants on
their parents. Compared with other animal species, human infants are more helpless and for a
significantly longer period of time. The fact that human mothers most often give birth to single babies
rather than litters means that more time may be devoted to the individual child. While the baby is in
this stage, s/he is growing, developing, and learning at a tremendous rate. Nearly 75 percent of a
newborn's cerebral cortex is uncommitted to specific behaviors (Springer and Deutsch, 1989). This
uncommitted gray matter, called association areas, allows for the integration and synthesis of sensory
inputs in novel ways.
Mothers and newborns confer many important physiological and psychological benefits on each other
and chief among them are loving behaviors. Babies learn to love almost immediately and in turn are
nurtured by love. The importance of these loving interactions cannot be overstated.
Love and affection are communicated to a baby in a number of ways. Speaking, singing, and touching
(primarily in the form of rhythmic stroking, patting, and rocking) are three primary modes of
communicating with infants. Some psychologists have coined the term "motherese" in reference to the
particular kind of speech patterns mothers use with their infants (Birdsong, 1984). The musical
aspects of motherese are critically important, not only as an aid to language acquisition, but especially
in the communication of emotions. Long before youngsters begin to talk, they are adept at deciphering
the emotional content of speech, largely due to the musical characteristics of motherese. In motherese
speech, it is the pitch, timbral, dynamic, and rhythmic aspects to which the baby responds, certainly
not the verbal content. "You are an ugly baby" spoken in a soft, sing-song fashion will elicit a far
more positive response than "you are a beautiful baby" shouted in an angry tone.
Of course, the communication system is a two-way affair. Babies, too, are learning to give love as
well as receive it. Vocalizations are a primary way that babies express their feelings (Fridman, 1973;
Roberts, 1987). Even in the first few days of life, babies begin to establish a relationship with their
parents through their cries. In the first few months, they develop a wider range of crying styles that
form a particular kind of infant language. The development of variations in crying styles is important
to emotional development, in providing cues to parents regarding their state, and in practicing for the
eventual development of language. Babies learn to cry to gain attention and to express an increasing
range of feelings. Because their vocalizations are nonverbal, it is once again the manipulation of pitch,
timbre, rhythm, and dynamics (prosody) that forms the basis of their communications system.
Imagine a small tribe of people living many thousands of years ago. A mother sits cradling a newborn
baby in her arms. This baby will be totally dependent upon her for all the basic necessities of
life—food, clothing, shelter, protection—for nearly two years and somewhat dependent upon her for
many years after that. If the baby were not responsive to anything related to musical or pre-musical
behaviors, how would the mother communicate love? And if the mother could not communicate love,
how would the baby survive? And if the baby could not survive, how could the species survive?
Fortunately, the baby has an inborn capacity to respond to a wide range of pre-musical expressions. A
large part of this inborn response mechanism must deal with some notion of pleasure. Warmth,
assurance, security, contentedness, even nascent feelings of happiness, are all a part of what is being
shared with the baby. If these responses to pre-musical activities were wired into the brain, is it not
understandable that music still brings us deep pleasure long after cultural evolution has developed
these pre-musical behaviors into playing bagpipes or singing grand opera?
Music and Language
One of the outcomes of the mother/infant dyad discussed previously is that the baby becomes
motivated to recognize and respond to sound patterns that will later become necessary for speech
perception. When parents communicate with their infants, their "baby talk" quite naturally emphasizes
expressed through music. Members of one tribe must band together to fight off members of another
tribe. Music gives courage to those going off to battle and it gives comfort to those who must stay
behind. Much of the work of a tribal community requires the coordination of many laborers. Music
not only provides for synchrony of movement but also for relief from tedium. These are but a few of
the many ways music may have supplied a unifying force to early communities.
Memory is also of crucial importance to the survival of a society. Not only is memory of a
technological nature important—When best to plant? Where best to find game? How best to start a
fire?—but also the things that make the society unique and special. Who are we? Where did we come
from? What makes us better than our enemies who live on the other side of the river? Music is one of
the most effective mnemonic devices. It enables preliterate societies to retain information—not just
facts but the feelings that accompany the facts, as well. Poems, songs, and dances are primary vehicles
for the transmission of a heritage.
Much of musical thinking may be placed under a broader heading of "play," which may provide
significant evolutionary advantages. The importance of play is understood more clearly when seen in
the fullest sense of exploring, examining, and problem solving (Brown, 1994). Curiosity may have
killed the cat, but for human beings it has led to discoveries and inventions that have aided survival.
Playing with every aspect of the environment has led both to the invention of the bow and to the songs
and dances that accompany the hunt and the battle. Which is more important? Are not both necessary
for survival? In fact, some evidence suggests that the bow was initially as much a musical instrument
as it was a weapon (Mumford, 1966). Musical bows can be found among both Native American and
African tribes. There are indeed significant survival premiums in play, generally, and in musical play,
specifically. What human beings have learned about themselves and the world through music has
been of tremendous benefit.
Akin to the notion of play is Wilson's (1998) contention that the hand and brain co-evolved. That is,
developmental changes that took place in the arm and hand allowed for many more skills (e.g.,
grasping, throwing, pounding, creating and manipulating tools, etc.) and this spurred brain
development. Furthermore, hearing is important in tool use; for example, as it aids in the processes of
filing or hammering. The same combination of hearing, handedness in tool making, and brain would
have been involved in the creation of early bone flutes. Tinkering and experimenting with different
lengths of tubing, where to put the tone holes, how to direct the air through the air channel, and so on,
would have provided important mental problems to solve, with an emotional investment in the
outcome. (Wilson also devotes an entire chapter to the idea that music has an evolutionary basis as a
secondary heuristic.)
Perhaps the most important thing human beings have learned through music is how to deal with
feelings. Although certain emotional responses may be inborn as a protective mechanism,
by-and-large we have to learn to recognize and express feelings. One of the hallmarks of humanness
is a sensitivity to feelings that allows for many subtle nuances. Being fully human means to
experience the infinite shadings that exist between the polar ends of emotional states. Our experience
of these refined feelings is essentially nonverbal. Notice how limited our vocabulary is in this area and
how often we experience difficulty in telling another exactly how we feel.
Music may provide a means of conferring survival benefits through the socialization of emotions.
When group living is mandatory for survival, as it is for human beings, learning to react to others with
sensitivity has clear evolutionary advantages. Lions hunt in groups; however, after a kill has been
made each individual fights for his or her share. The biggest and strongest get the most to eat. This
display of aggression at feeding time necessitates a subsequent period of licking and
rubbing—"making up"—an activity necessary to keep the social bonds of the pride in place (Joubart,
1994). Among primates, grooming serves a similar purpose, while language (particularly gossip) may
do the same for humans (Wilson, 1998). Music, as previously suggested, also contributes.
Listening to the daily news is all one needs to do to realize that human beings still have to deal with
many aggressive behaviors in our societies. We need to find ways to separate actions from feelings.
How does one feel anger without acting on it? How does one avoid giving in to loneliness and
despair? It is important to learn how to feel deeply without always resorting to action. Music is one of
the most powerful outlets for expressing emotions. One can learn to cope with grief, frustration, and
anger or to express joy and love through musical experiences.
Each intelligence has developed because it provides a unique way of knowing about the world. Each
type of intelligence may be better suited for providing information about different aspects of the inner
and outer worlds of human beings. Music, no better and no worse than other types of intelligence,
provides its own type of information. Music is particularly useful in providing a medium for dealing
with the complex emotional responses that are primary attributes of humanity. Clearly, developing
means of controlling and refining emotions would have evolutionary advantages.
Concluding statement
Contrast "Minimal musical skills are not essential so far as we know …" (Brown, 1981, 233) with
"Art was as crucial a part of our ancient ancestor’s survival as finding food and shelter" (Allman,
1994, 209). Or compare "Why do we respond emotionally to music, when the messages therein seem
to be of no obvious survival value?" (Roederer, 1982, 38) with "It [music and art] represents activity
as basic for the survival of the human species as reproducing, getting food, or keeping predators at
bay" ("Pfeiffer, 1980, 74). Finally, consider
[Music] is a creation of the human brain that made use of structures it inherited from
evolution, and that were designed to serve biologically relevant functions, in order to
develop and sustain a domain of activity as yet unheard of and of no direct biologically
adaptive value. (Sergent, 1993, 20)
in relation to
Proper study of the organization of the brain shows that belief and creative art are
essential and universal features of all human life. They are not mere peripheral luxury
activities. They are literally the most important of all the functional features that ensure
human homeostasis. (Young, 1978, 231)
Which side has it right? Is it reasonable to agree that, like language, music is found in all human
groups; that, like language, it arises readily in children with some degree of rule-based structure; and
that, like language, it has identifiable neural structures devoted to it, yet that, unlike language, it
confers no survival benefits? It makes more sense that musicality was built into the human system over thousands of years
because, like language and all the other intelligences, it provides unique ways of knowing that
allowed our species to cope with the many uncertainties of life.
References
Allman, W. 1994. The stone age present. New York: Simon and Schuster.
Attema, J. 2000. Music from prehistoric times. Paper presented at Biomusic: The music of nature
and the nature of music. Washington, DC, February 19.
Birdsong, B. 1984. Motherese. In Science yearbook 1985: New illustrated encyclopedia, 56–61.
Lomax, A. 1968. Folk song style and culture. New Brunswick, NJ: Transaction Books.
Merriam, A. 1964. The anthropology of music. Evanston, IL: Northwestern University Press.
Mikiten, T. 1996. A method for research in music medicine. In MusicMedicine, Volume 2, ed.
R. Pratt and R. Spintge, 14–23.
Montagu, A., and F. Matson. 1979. The human connection. New York: McGraw-Hill.
Mumford, L. 1966. The myth of the machine. New York: Harcourt Brace Jovanovich.
Pfeiffer, J. 1980. Icons in the shadows. Science80 1, no. 4:72–79.
Plotkin, H. C. 1994. Darwin machines and the nature of knowledge. Cambridge, MA: Harvard
University Press.
Restak, R. 1979. The brain: The last frontier. New York: Warner Books.
Roberts, M. 1987. No language but a cry. Psychology Today 21, no. 5:41.
Roederer, J. 1982. Physical and neuropsychological foundations of music. In Music, mind, and
brain, ed. M. Clynes, 37–46. New York: Plenum Press.
Roederer, J. 1984. The search for a survival value of music. Music Perception 13:350–56.
Sergent, J. 1993. Mapping the musical brain. Human Brain Mapping 1, no. 1:20–38.
Springer, S., and G. Deutsch. 1989. Left brain, right brain, 3d ed. New York: Freeman.
Stern, D. 1982. Some interactive functions of rhythm changes between mother and infant. In
Interaction rhythms: Periodicity in communication behavior, ed. M. Davis, 101–17. New York:
Human Sciences Press.
Stiller, A. 1987. Toward a biology of music. Opus 35:12–15.
Wehr, T. 1982. Circadian rhythm disturbances in depression and mania. In Rhythmic aspects of
behavior, eds. F. Brown and R. Graeber, 399–428. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilson, F. 1986. Tone deaf and all thumbs? New York: Viking Penguin.
Wilson, F. 1998. The hand. New York: Vintage Books.
Young, J. 1978. Programs of the brain. Oxford: Oxford University Press.
ABSTRACT
This paper will consist of sociological reflections on the importance of music in everyday life. It will
be organised around two themes. First, the dialectic between music and noise. On the one hand, I'll
consider the ways in which music has become noise--the issue here isn't simply the ubiquitous
presence of music as the background sound of an increasing number of public, domestic, and private
activities, but also the circumstances under which such music becomes irritating, intrusive, a threat.
On the other hand, I'll consider the ways in which noise becomes music--my interest here is in the use
of volume and the effects of electronic sound production.
My second theme is listening. What does it mean to listen to music? What is involved in terms of
attention and engagement? I ask these questions not as a psychologist, wondering what's going on
inside people's heads, but as a sociologist, interested in both the social circumstances in which people
pay a special kind of attention to what they hear (what's involved here is both a timetable of
engagement and a sense of proper musical space) and in the discourse they use to explain what they
are doing.
While it is easy enough to describe the social and economic forces that have put music everywhere in
our lives, it is less easy to explain how, nonetheless, music remains a special experience. The question
I want to raise is what that 'specialness' now involves, with particular reference to the concept of
choice and key everyday musical institutions like the radio, the CD player and the club.
Proceedings

Paper titles, grouped by parallel session:

Session 1:
- Understanding the artist: exploring how singers are evaluated
- Voice, emotion and facial gesture in singing
- The voice in therapy: monitoring disease process in chronic degenerative illness (Abstract)

Session 2:
- Formal approaches to the evaluation of music compositions of children by external judges: rating scales, rubrics and other techniques (Abstract)
- The use of consensual assessment in the evaluation of children's music compositions (Abstract)
- Children's perception of their own music compositions (Abstract)

Session 3:
- Drift and timing variability in isochronous interval production with and without music imagery (Abstract)
- Functional imaging of rhythm perception
- On the persistence of metrical percepts (Abstract)

Session 4:
- Prosody, meaning and musical behaviour
- The birth of music in synchronous chorusing at the hominid-chimpanzee split
- Music in human evolution (Abstract)

Session 5:
- The nature of musical emotions: a perspective from psychology
- Is the emotional system isolable from the cognitive system in the brain?
- Musical emotions: a perspective from development

Session 6:
- Auditory roughness estimation of complex tones (Abstract)
- The perceptual organisation of complex tones in a free field
- Musical timbre beyond a single note: tessitura and context (Abstract)

Sessions S1 to S6:

S1. Symposium: Children's compositions: understanding the process and outcomes. Convenor: MacDonald, R. Chair: Miell, D. Discussant: Folkestad, G. (Abstract)
- An empirical investigation of the social and musical processes involved in children's collaborative compositions (Abstract)

S2. Thematic session: Perception of harmony and tonality. Chair: Ohgushi, K.
- More about the (weak) differences between musicians' and non-musicians' abilities to process harmonic structures (Abstract)

S3. Thematic session: Music and meaning. Chair: Deliège, I.
- Deriving meaning from sound: an experiment in ecological acoustics

S4. Thematic session: Computational models. Chair: Belardinelli, M.
- Artist: a connectionist model of musical acculturisation

S5. Symposium: Current trends in the study of music and emotion. Convenors: Juslin, P., Zentner, M. Chair: Zentner, M. Discussant: Scherer, K.
- Unresolved issues in continuous response methodology

S6. Thematic session: Music and movement. Chair: Desain, P.
- A sensory-motor theory IV: vestibular responses to music
Symposium introduction
Title of Symposium: The Power of the Voice for Singer and Listener
Symposium Rationale: Using the human voice as a means of musical production has long been recognised as a way of
permitting any individual to have immediate access to musical expression. It has also been anecdotally reported to be
the most powerful of all musical instruments for eliciting emotional responses in both performers and listeners. So, to
investigate the voice with all its potentialities is clearly important for the psychologist. However, little systematic study
has been undertaken to consider how voice production is achieved and received within a psychodynamic framework.
That is, how singers and listeners interpret the singing experience.
Aims: The current symposium offers a broad range of research perspectives on the voice as a means of musical
expression and communication. The focal point for all presentations is an exploration of the psychodynamics elicited as
individuals engage in performing and listening to the voice. In the first presentation, the impact of the singer and his or
her stage presence is explored, and criteria are described which encapsulate what makes a moving or beautiful
performance. In the second paper (highlighting the impact of stage behaviour and opera house convention) a detailed
study of how non-vocal gestures are used to achieve enhanced emotional expression in singing is considered. In the
third paper, the role of the voice to monitor degenerative illness is considered in a case study of Music Therapy with a
Multiple Sclerosis client. Finally, the fourth paper offers some insight into communication in singing within Christian
worship.
Back to index
Symposium Introduction
Assessment of Children's Music Compositions: In Looking Out and Out Looking In
Convenor: Webster, P
The aim of this series of presentations and group discussion is to explore what
we know about the approaches to evaluating music compositions of children.
Music composition in the schools has had a long tradition in the United
Kingdom, Australia and certain Western European countries, and is now
beginning in earnest in the United States as part of the Voluntary National
Standards in the Arts. Experiments in the effective assessment of these
compositions, as part of a long-range plan to develop well-rounded music
experiences for children, are relatively new.
This two-hour symposium will address both past research and practice on this
topic, as well as offer possibilities for future work. Four researchers from
the United States and the UK will offer perspectives. Two of the papers (Mellor
and Seddon) will deal with children's own self-assessment of compositions and
the other two papers (Hickey and Webster) will explore assessments of the
compositions by adults. Mellor will concentrate on self-assessment patterns
across age and gender boundaries, and Seddon will present data on how children view
their own computer-generated compositions in light of their past music tuition.
Webster will review the formal approaches to the assessment of compositions by
adult judges (rating scales, rubrics, written descriptions) and Hickey will
present evidence to support consensual assessment.
The convenor will begin the symposium with some introductory remarks and will
moderate the discussion. Contributions from the audience will be an important
part of the symposium, encouraging participants to share their experiences with
the topic. Each paper will be restricted to 20 minutes, allowing for about 30
minutes of discussion.
Symposium introduction
The study of music and emotion should be at the very heart of music psychology. Yet this is a topic that has been
seriously neglected during the last few decades. Contemporary volumes on music psychology rarely discuss musical
emotions at any length. However, there has recently been a resurgence of interest in research on music and emotion. The
intention of this symposium is to bring together researchers who have made theoretical and empirical contributions to
the field in order to show current trends in the study of music and emotion. Topics addressed by this symposium
include: the nature of musical emotions, the relationship between emotion and cognition, the development of musical
emotions, similarities between speech and music, continuous recording of emotional responses, parallels and contrasts
between recognition and induction of emotion. The symposium is organised into two parts. Each part concludes with a
discussion of important problems featuring an invited expert as discussant.
Proceedings paper
Performance observation
To date, there has been limited research interest in the topic. Wapnick et al. (1997) carried out a brief study
of university entrance auditions for singers and discovered that singers who were more animated,
smiled more often and established more eye contact with the assessors were rated as attractive. Thus it
seems important to consider the role of gestures in the assessment of performance quality. Indeed,
Davidson (1991, 1993, 1994), in a series of studies about the content of body movements discovered
that musical intentions were more clearly revealed in physical gestures than in musical sounds,
suggesting the critical perceptual role of the body in understanding and communicating a musical
work to the audience.
Furthermore, Sundberg (1982) in a study on speech, song and the emotions argues that there is a
connection between the psychological emotion to be transmitted, the performer's external body
movements and the acoustic consequences of the gesture of the speech organ - the internal body
movements of the vocal tract which result in varying the timbre of the singing tone.
In spite of the fact that some performers and teachers are aware of the importance of body movement
in conveying musical expression, it is not clear to what extent body movements are considered
important criteria in the assessment of musical performance.
It is noteworthy that in a study by Saunders and Holahan (1997) it was discovered that both teachers
and assessors discriminate readily between levels of technical and artistic attainment, and use these
two distinct categories to determine an area of 'performance error'. To date, however, no explanation has been
attempted of how the elements said to constitute musical performance quality are judged. There
is an acknowledgement that performances comprise both technical merit and aesthetic appeal, as
Saunders and Holahan's work describes, yet there are no indications of what stocks of knowledge
are being drawn upon when assessments are made. The current study is an attempt to explore what
these criteria may be.
The population under consideration in the current study is the staff and students of the Vocal Studies
Department at the Guildhall School of Music in London where second year mid-term assessments
were examined.
their own written reports, but they offered comments in order to construct a collaborative final report.
This report was eventually given to the students along with their grades. All performances were video
taped and all discussions were tape-recorded.
Results
In summary, using an interpretative phenomenological approach to analysing the data, the assessors
revealed that the following 'criteria' were being used:
- Technical Control
The assessors were very concerned about how the voice and body were controlled. That is, how the
technical aspects of singing were embodied. For instance:
'At the moment, what we hear is a good voice. Everything else is slightly lacking: not enough
connection, support, and engagement with body and brain. (...) At the moment, the small voice
and big physique don't match'.
Perhaps the most significant sub-theme to emerge from the technical issues was the concept of vocal
support. If a singer does not have a strong foundation of support, with the correct development of the
muscles in question, the voice cannot reach its full potential.
'Today there was little to help the sound. No teamwork between the muscles'.
'Posture- she collapses too much, needs to cool it physically'
'There is quite shocking neck tension and an overall lack of support. The tension begins to make
the voice wobble (...) whilst the potential is very impressive we feel as though it will be
compromised unless the physical tensions are removed'.
In summary, it seems that the assessors looked for an ideal connection in the body for the achievement
of technical control. Additionally, there was a concern that the mind and body were working in
synchrony for the performance:
'The voice seems very young- the inflexibility-vocally, mentally and physically this was
worrying'.
'The voice breaks because he does not know how to draw on his physical strength, and
this leads to not harnessing the passion that is in him'.
The single most striking and surprising criterion to emerge was the emphasis the assessors placed on
the physical appearance of the singers.
- The Body and Appeal
Comments of this type were very often of a personal nature. Here are several examples:
'Odd looking chap' (female assessor of a male singer)
'Visually: Odd make-up and ill-fitting cardigan' (male assessor of a female singer)
'Pretty girl. Stood like a dancer' (female assessor)
'A big guy. With a high lyrical voice' (female assessor)
'A rather puppet like physical appearance' (male assessor of a female singer)
'Very (oddly) splayed feet' (male assessor of a female singer)
'Bow ankles and sweater covered hands. You seem a bit motherly matronly in this outfit' (male
assessor of a female singer)
Physical appeal was very often the first thing the assessors noted when asked to write their
impressions of the performer and his or her performance.
- 'Bodily Communication'
The assessors also focused heavily upon bodily communication, and more specifically upon aspects
related to the use of facial expression and eye contact, as shown in comments such as:
'A self-possessed beam'
'A visual "performing" element missing. A problem of self-image: Does he need/want to
develop as a performer?'
'Very appealing visual/facial expression', or on the contrary, 'Eyes dead. Blank face'
'Body involvement needed'
'Lovely freedom of body movement'
'Eyes attempted audience involvement'
In summary, one could say that the body was regarded not only as the physical support of the singing
process, but also as a means of expression and so, a primary means of communication with the
audience. From these comments it is evident that the physicality of singing and how this interfaces
with the performer's inner mental processes - what we believe the assessors' label of 'artistic
communication' to mean - are key criteria in the assessment process.
Indeed, the next most strongly emergent criterion employed by all of the assessors was 'artistry'. Here,
we try to deconstruct what they meant by this term through the issues they raised in discussion around its use.
- The importance of artistry
We have identified three components of artistry: communication, performance personality and
presence.
Communication
1) Communicating meaning
The meaning of the song or aria's message was a central concern. The assessors suggest that ideally
the interpretation should be 'heart-felt', 'from the centre of the person', and therefore with
'self-possession'. The implication of these terms seems to be that an expressive performance
emerges out of the individual's personality and presentation on stage.
2) Interacting with the audience
The singers showed commitment in interacting with the audience in various degrees and in different
styles varying from physically approaching the audience to simply smiling, or introducing the
performance pieces to explain their performance intentions.
Connected to the concept of singer-audience interaction, there was much discussion of the singer's
'presence' - the assessors referred to it as the singer's projection of the 'self' on the stage.
Presence
The more focused the singer is on transmitting the musical intention with a strong projection of
personality, the more the assessors seem to be captivated and willing to interact, and therefore,
consider the singer as being 'appealing'. The underlying logic may be, as one of the assessors
commented: 'The singer is so focused that nothing interferes with our relationship, and so, not only the
composer's message is important, but the singer also acknowledges that I am here and I am important
too'.
If the singer is not sufficiently 'present', if he or she does not bring sufficient personality to the stage,
the assessors feel either that there is a lack of energy or interest, or that the singer is hidden behind the
song's message. Lack of energy or interest in acting is then considered as a deficiency in the
performance itself:
'A lovely sound, but rather disappointing as she doesn't get involved as an artist'
Singing Personality
Although the assessors are aware of a 'performing personality' which seems to be different from the
'inner self', on several occasions a process of identification between the performing and the inner
personality occurred. This was perhaps because the singer had identified with the song and so
internalised its meaning, or, conversely, because the singer had acted in a
rather convincing way:
'Charming girl-Charming voice.'
'A sweet and sunny personality. A sweet and sunny voice.'
'The Barber was transfixing. Lots of intelligence, self-possession and humility here.'
'I am just flooded with pictures of Sarah Black, his girlfriend-Why not? This is the reality of
this song.'
'This is an engaging performing personality showing great intensity. It is all engaged and
heartfelt.'
From the above it can be inferred that singers are expected to display their emotions in
overt behaviour. It seems that the assessors expect the overt behaviour to show internal states. How
much of this is 'acting' and what effect this 'acting' has on the singers' personality and identity is
clearly a fascinating emergent issue to which we have no further insight at the present time. We are all
aware of the cultural stereotype of the 'luvvy', 'loud' and 'extravagant' personality of the operatic Diva.
We would have to ask whether the job demands this kind of behaviour or if this kind of person is
attracted to the stage.
Discussion
From the analysis of the data, it is evident that in assessment the body is a critical factor for
consideration: how does the body look; how is it presented; how is the singing physically prepared
and executed? The interface of personality through both music and stage presence was critically
important. Also, the appropriateness of repertoire to the singer's level of achievement is of great
influence in assessing performance.
The results clearly show two different dimensions of criteria used by the assessors; those related to the
technique of sound production; and those related to the presentation of musical content such as
emotional expression and the personality of the interpreter. These two dimensions proved to be highly
interrelated, since it is evident that a correct technique not only enables but also integrates a greater
degree of artistry, which would be the main aim of each performance.
However, in the assessment procedure, technical proficiency and artistry seem to work as
'compensation laws'. That is, if a student has not acquired sufficient technical proficiency, the
assessors attend more to how involving and touching the performance is artistically; if everything is
technically correct but the performance lacks emotional expression and personality, the technical content
becomes the central focus. Hence the compensation of the technical for the artistic, and vice versa.
As far as technical proficiency is concerned, comments, critiques and even solutions were presented to
the singers in a far more objective way than the comments made about the artistry level of the
performance. This may be because technical proficiency is less subject to stylistic and social
influence, and relates to more concrete (bodily, facial and physical) aspects of the sound production.
Another emergent criterion of assessment was the importance of body movements as a means of
expression, on the one hand, of the emotional state of the singer and, on the other, as a means of
conveying structural and expressive features of the music to the audience. Therefore, it would be
relevant to explore the existence of a vocabulary of both body movements and phonation processes,
which would enable the singers to achieve better performances.
References
Balk, H.W. (1991). The Radiant Performer: The Spiral Path to Performing Power. University of
Minnesota Press, Minneapolis.
Davidson, J.W. (1991). The perception of expressive movement in music performance. Unpublished
doctoral dissertation. City University, London.
Davidson, J.W. (1993). Visual perception of performance manner in the movements of solo
musicians. Psychology of Music, 21, 103-113.
Davidson, J.W. (1994). What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Howard, V.A. (1982). Artistry: the work of artists. Hackett, Indianapolis.
Noy, P. (1993). How Music Conveys Emotion. In S. Feder, R.L. Karmel and G.H. Pollock (Eds.).
Psychoanalytic Explorations in Music. International Universities Press, Madison, Connecticut.
Radocy, R.E. (1989). A review of Singing and Self: The Psychology of Performing (by S.E.
Stedman). Council for Research in Music Education, 100, 23-26.
Saunders, T.C. & Holahan, J.M. (1997). Criteria-specific rating scales in the evaluation of high school
instrumental performance. Journal of Research in Music Education, 45, 259-270.
Sundberg, J. (1982). Speech, song and emotions. In M. Clynes (Ed.). Music, Mind and Brain: the
neuropsychology of music. Plenum, New York.
Wapnick, J., Darrow, A.A., Kovacs, J. & Dalrymple, L. (1997). Effects of physical attractiveness on
evaluation of vocal performance. Journal of Research in Music Education, 45(3), 470-479.
Proceedings Abstract
FORMAL APPROACHES TO THE EVALUATION OF MUSIC COMPOSITIONS OF CHILDREN BY
EXTERNAL JUDGES: RATING SCALES, RUBRICS AND OTHER TECHNIQUES
pwebster@nwu.edu
Background:
Aims:
The aim of this paper is to provide an overview of the major research-based and
conceptual work available on the evaluation of children's music compositions
using rating scales, rubrics, checklists and other psychometric techniques.
Open-ended items will also be considered in this analysis. Problems and
opportunities of these approaches will be summarized.
Work from the international literature will be reviewed including the studies
from the British Journal of Music Education and the Australian literature. Work
from various state committees in the United States will be included as will
evaluation efforts like Harvard's Project Zero and the National Assessment of
Educational Progress. The emphasis of the review will be to spotlight the more
aesthetic-based assessment efforts.
Main contributions:
see above
Implications
Data from this review will be useful in designing new assessment tools and for
evaluating their effectiveness more rigorously.
Proceedings paper
...language must have evolved out of some prior system, and yet there does
not seem to be any such system out of which it could have evolved.
(Bickerton, 1990, p. 8)
Bickerton proceeds to define the properties of human languages and to illuminate the means by which
speech is acquired, as well as to examine claims for linguistic abilities in other species such as the
chimpanzee.
Like Pinker (1994) and Jusczyk (1997), Bickerton focuses especially on the properties of generative
grammar which are universal across languages, as formulated in the work of Chomsky (1975). In this
linguistic tradition, tools for the analysis of the syntactical, semantic and lexical elements of language
have been developed which explore convincingly the cognitive scaffolding through which language is
acquired and the structures on which its employment depends. The biological basis of language in
adapted respiration (Deacon, 1997, pp. 247-252), and its acoustical components (Laitman, J., et al,
1990) and antecedents in animal communication (Scherer, 1992) have, by comparison, received less
attention within this tradition. An outcome of this divergence of methodologies is Pinker's (1997)
conclusion that "music...shows the clearest sign of not being (an adaptation)": a hypothesis quite at
variance with that supported within the fledgling field of biomusicology (Wallin, 1991; Bannan, 1997;
Vaneechoutte & Skoyles, 1998; Cross, 1999; Wallin, Merker & Brown, 2000) that, to quote Tomatis
(1991), 'music is the substrate of language'.
Bickerton (1990) argues that speech allows humans to exchange representations of the world which
language permits us to formulate: the feat of representation is as significant as communication. The
latter may be present, even elaborate, in a variety of species of monkey, bird and cetacean; but the
former, with its empowerment of self-consciousness, is exclusive to our species.
This paper seeks to question assumptions that representation is confined to syntactic components of
language, and to assert that, by contrast, meaning can be both represented and communicated by
features of language which draw on musical perception and production.
Whilst ritual and co-ordinated behaviour vary enormously between cultures, the capacity for
simultaneous action moderated by musical response would seem to be an inseparable aspect of this
biological inheritance (Merker, 2000). It has its parallels in the animal kingdom:
insists that 'evolution has favoured the rules themselves and not their consequences'. His analysis
provides criteria for rule-based, species-specific behaviour as genetically and culturally evolved which
define the properties which need to be tested to illuminate such claims. His reasons are as follows,
somewhat adapted to illustrate their applicability to human vocality:
1 efficiency
2 consistency
3 adaptability
4 dependence
• efficiency: the rules are simpler than the behaviours they generate
• consistency: protection against the consequences of 'rogue' mutation
• adaptability: small changes in the rules cause big changes in behaviour
• dependence: sensitivity to the group protects the individual
Bet
Bate
Bit [Birt/Burt]
Beet
Boot (Scottish) [Bute]
Failure to select and perform the correct vowel sound requires the listener to rely more on context to
correctly decipher what is being communicated. One can see here a separation of roles between the
meaning derived from the sound of words themselves and that derived from context dependent on
grammatical relationships.
Experiments with sound and meaning
Bickerton (1990) adopts instances from his studies of pidgin languages to illustrate the limitations of
meaning which arise where primitive syntactic structures fail to embed one phrase within another. His
purpose in doing so was to test structural assumptions in Premack's (1985) contribution to the
'continuity paradox' debate, arising from his studies of chimpanzee communication, wherein he
extrapolated a hypothetical 'inter-language' which might bridge the gap between 'animal'
proto-language and (human) 'true language'.
On paper, Bickerton's resolution of ambiguities through the use of conjunctions and reflexives
conveys clearly the dependence of meaning on the certainties which are provided by the evolved
capacities of advanced grammar, and we can marvel at the engineering achievement this view of
language represents. But what of the pidgin user, himself probably illiterate and oblivious of such
means of parsing the speech he is uttering?
Bickerton lays down a challenge with his statement 'without grammatical items it would be
impossible' to determine what is meant. Let's take the alternatives available; the two Bickerton cites
plus two other potential 'performances'. Can intonation do the job of 'grammatical items'? If so, how
can the phrase
John tell Bill boil milk
be inflected in spoken language to yield these meanings?:
1 John told Bill to boil the milk
2 John told Bill the milk was boiling
Experimentation with a listener illustrates immediately that phrases 1 and 3 could be unambiguously
conveyed through intonation. Phrases 2 and 4 are more problematic, until the word order is changed,
at which point
John tell Bill milk boil
could be inflected to convey both meanings clearly.
Bickerton anticipated this change of word order, citing it as a property of 'the mechanisms of true
language, even without grammatical items'. However, attempts to convey the meanings of phrases 2
and 4 without re-ordering leads to further distinction of meaning, through the emphasis placed on, say,
boil (as opposed to roast or whip) and milk (as opposed to oil or whisky). The possibility also exists
of forming various questions:
Did John tell Bill to boil the milk? etc.
through the convention of creating a contour for the phrase which rises in pitch.
Further experiments can be designed to yield meaning out of nonsense. In addition to two old
favourites of the teacher of English punctuation (and what is punctuation but the notation of
characteristics of timing and tone of voice?), it seems appropriate to borrow from Pinker (1994) an
example which provides the means to question his own subsequent position regarding the
evolutionary relationship of musical behaviour and language.
Intonation vs syntax
an experiment which the reader is invited to replicate
1 Smith where Jones had had had had had had had had had had had his teacher's approval
2 There's too much gap between George and and and and and Dragon
(traditional English punctuation-test examples)
Conclusions
In Wallin et al (2000), a range of researchers in animal communication, anthropology, linguistics,
music theory and neurology considered different aspects of The Origins of Music. Bickerton made his
own, cautious contribution, but not before exposure to the influence of others involved:
Until I heard the stunning presentation by François-Bernard Mâche, I would
probably have said, by analogy with language, that music was unlikely to be
in any sense a continuation of nonhuman song or any other form of
behaviour. After I heard Mâche's recordings of a vast range of different
traditions in human music, each one accompanied by an eerily similar effect
produced by an avian, mammalian, or even amphibian species, I was not so
sure. If anyone could produce such a performance with linguistic material, I
would be tempted to convert to continuism overnight.
(Bickerton, in Wallin et al, 2000, p. 161)
Could Bickerton himself be moving towards a position in which he embraces evolved, musical
vocalisation as a resolution of the 'continuity paradox' he defined?
Bibliography and References
Bannan, N. (1997) 'The consequences for singing teaching of an adaptationist approach to vocal
development', in Proceedings of the First International Conference on Music in Human Adaptation,
VirginiaTech/MMB Music Inc. (pp. 39-46)
Bickerton, D. (1990) Language and species Chicago: University of Chicago Press
Blacking, J (1987) 'A commonsense view of all music' Cambridge: Cambridge University Press
Campbell, P S (1991) Lessons from the world New York: Schirmer
Chomsky, N. (1975) Reflections on Language New York: Pantheon
Cross, I. (1999) 'Is music the most important thing we ever did? Music, development and evolution',
in Music, Mind and Science (Suk Won Yi), Ed., Seoul, Korea: Seoul National University Press
Dargie, D (1988) Xhosa music Cape Town: David Philip
Deacon, T. (1997) The symbolic species London: Allen Lane
Gould, S (1977) Ontogeny and Phylogeny Cambridge, Harvard University Press
Jürgens, U. (1992) 'On the neurobiology of vocal communication', in Papousek, H.,
Jürgens, U. and Papousek, M. Nonverbal vocal communication Cambridge: Cambridge University Press
Jusczyk, P. (1997) The discovery of spoken language Cambridge, Mass.: MIT Press
Laitman, J., Reidenberg, J., Gannon, P., Johanson, B., Landahl, K. & Lieberman, P. (1990) 'The Kebara
hyoid: what can it tell us about the evolution of the hominid vocal tract?' American Journal of
Physical Anthropology 18, 254
Locke, J (1993) The child's path to spoken language Cambridge, Harvard University Press
Mithen, S. (1996) The prehistory of the mind London: Thames & Hudson
Pinker, S. (1994) The Language Instinct London: Allen Lane
_________(1997) How the Mind Works New York: Norton
Premack, D (1985) ''Gavagai!' or the future history of the animal language controversy' Cognition 19:
207-96
Scherer, K (1992) 'Vocal affect expression as symptom, symbol and appeal' in Papousek, Jürgens and
Papousek Nonverbal vocal communication Cambridge University Press
Vaneechoutte, M. and Skoyles, J.R. (1998) 'The memetic origin of language: modern humans as
musical primates.' in Journal of Memetics - Evolutionary Models of Information Transmission, 2.
http://www.cpm.mmu.ac.uk/jom-emit/1998/vol2/vaneechoutte_m&skoyles_jr.html
Stewart, I. (1998) Life's other secret: The new mathematics of the living world London: Allen Lane
Tobias, P. (1987) 'The brain of Homo habilis: a new level of organisation in cerebral evolution'
Journal of Human Evolution 16, 741-61
Tomatis, A.A. (1991) The conscious ear Barrytown, NY: Station Hill Press
Tumat, G. (1992) Vocal solos in Tuva: Voices from the Land of the Eagles Leiden: Pan Records
Wallin, N. (1991) Biomusicology New York: Pendragon
Wallin, N., Merker, B. and Brown, S. (2000) The origins of music Cambridge, MA: MIT Press
Woodward, S. (1992) 'The transmission of music into the human uterus and the response to music of
the human fetus and neonate'. Unpublished doctoral dissertation, University of Cape Town, South
Africa.
Proceedings abstract
Marcel R. Zentner
Department of Psychology
University of Geneva
CH - 1205 Geneva
SWITZERLAND
Background:
1. Scholars tend to rely upon labels of every-day life emotions when studying emotional responses to music.
However, it is unclear whether such emotional labels are musically plausible.
2. While past research has systematically examined the efficacy of different film excerpts to induce emotion,
comparable research with music excerpts is still fragmentary.
Aims:
Two studies with participants from diverse populations and age groups were conducted to empirically examine the
nature, structure, and organisation of musical emotions and to identify music excerpts that are effective in eliciting
certain emotions. The aim of Study 1 was to provide an empirical basis for developing a lexicon of musically plausible
affect terms. 138 subjects rated 149 affect terms (derived from a pre-study) for the frequency with which these states
were both expressed and induced by their preferred style of music (classical, pop-rock, or techno). Rarely occurring
states were excluded, yielding a reduction of approximately 80 emotion labels. The aims of the second study were (a)
to gather comparative ratings on the emotional effects of a variety of music excerpts, (b) to examine the basic
dimensions of musical emotions, and (c) on this basis, to refine the lexicon of musically plausible affect terms.
Subjects (N=184) listened to 20 excerpts of either classical or rock music and rated them on the new set of emotion
words derived from Study 1.
Results:
Study 1: First, occurrence of emotions, be they expressed or induced, differs according to the musical style. Second,
across all emotions and musical styles, there is a considerable difference between expression and arousal of emotion.
Third, there are also significant interactions between emotion clusters, musical styles and emotion modality (expressed
vs. induced).
Study 2: Factor analyses were carried out to identify a number of basic "musical emotions". Furthermore, the
propensity of the music excerpts to arouse different emotions is described.
Implications:
Implications are discussed for the development of a taxonomy of "musical emotions", a scale to measure "musical
emotions", and for the choice of music excerpts to be used for emotion induction.
Proceedings paper
i Introduction
During the last forty years a number of models quantifying auditory roughness have been proposed
and employed in a series of studies, demonstrating a relatively low degree of predictive power.
Correct estimation of the degree of roughness of a pair of sines, or of an arbitrary spectrum, is
necessary before any claimed link between roughness and some acoustic, perceptual, or musical
variable can be tested; it is also an important step towards the difficult task of quantifying
inharmonicity.
Roughness is one of the perceptual attributes of amplitude fluctuation. Musical sounds are represented
by vibration signals whose characteristics, practically always, change with time. Amplitude
fluctuations (variations of a signal's amplitude around some reference value) constitute one such
change and can be placed in three broad perceptual categories depending on the amplitude fluctuation
rate. Slow amplitude fluctuations (up to about 20 per second) are perceived as loudness fluctuations,
referred to as beating. As the rate of amplitude fluctuation increases, the loudness appears to be
constant and the fluctuations are perceived as 'fluttering' or roughness. The roughness sensation
reaches a maximal strength and then gradually diminishes until it disappears (at about 75-150 amplitude fluctuations per
second, depending on the actual vibration frequency). These distinct perceptual categories do not
reflect any fundamental qualitative differences in the vibrational frame of reference and should be
approached as alternative manifestations of a single physical phenomenon.
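As a minimal numerical sketch of the point above (not taken from the paper; all names and parameter values are illustrative), the fluctuation rate of a two-sine signal can be read off its amplitude envelope:

```python
import numpy as np

def fluctuation_rate(f1, f2, a1=1.0, a2=1.0, fs=8000, dur=2.0):
    """Estimate the amplitude-fluctuation rate (fluctuations per second)
    of the sum of two sines, by counting minima of the amplitude
    envelope |a1*e^(i*2*pi*f1*t) + a2*e^(i*2*pi*f2*t)|."""
    t = np.arange(0.0, dur, 1.0 / fs)
    env = np.abs(a1 * np.exp(2j * np.pi * f1 * t) +
                 a2 * np.exp(2j * np.pi * f2 * t))
    # one envelope minimum per fluctuation cycle
    minima = (env[1:-1] < env[:-2]) & (env[1:-1] < env[2:])
    return minima.sum() / dur
```

For equal-amplitude sines the estimate recovers the familiar beat rate f2 - f1 (e.g. 440 Hz plus 444 Hz fluctuates about four times per second), placing the percept in the slow, 'beating' category described above.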
If we accept that the ear performs a frequency analysis on incoming signals, the above perceptual
categories can be related directly to the bandwidth of the analysis filters. For example, in the simplest
case of amplitude fluctuations resulting from the addition of two sine signals, the following statements
represent the general consensus: If the filter bandwidth is much larger than the fluctuation rate then a
single tone is perceived, either with fluctuating loudness (and sometimes pitch) or with roughness.
And if the filter bandwidth is much smaller than the fluctuation rate then a complex tone is perceived,
to which one or more pitches can be assigned but which, in general (see note 1), exhibits no loudness fluctuation
or roughness.
In the first case, the degree, rate, and shape (sine / complex) of amplitude fluctuations are parameters
that are manipulated by musicians of various cultures, exploring the beating and roughness sensations.
Manipulating the degree and rate of amplitude fluctuation helps create a shimmering (e.g. Indonesian
gamelan performances) or rattling (e.g. Bosnian ganga singing) sonic canvas that becomes the
backdrop for further musical elaboration. It permits the creation of timbral (e.g. Middle Eastern mijwiz
playing) or even rhythmic (e.g. ganga singing) contrasts through gradual or abrupt changes between
fluctuation rates and degrees (see note 2). Whether those contrasts are explicitly sought (as in ganga singing,
mijwiz playing, or even the use of 'modulation' wheels/pedals in modern popular music) or happen
more subtly and gradually (as may be the case in the typical chord progressions/modulations of
Western music), they form an important part of a musical tradition's expressive vocabulary.
Important clues regarding the ways various musical cultures approach roughness and other perceptual
attributes of amplitude fluctuation may be found through an examination of musical instrument
construction and performance practice. Additionally, the different choices among musical traditions
with regard to vertical sonorities (e.g. harmonic intervals, chords, etc.) can reveal a variety of
attitudes towards the sonic possibilities opened up by the manipulation of amplitude fluctuation in
general and the sensation of roughness in particular.
Similarly to the sensation of beats, the sensation of roughness has often been associated with the
concepts of consonance/dissonance, whether those have been understood as aesthetically loaded
(Rameau, Romieu, in Carlton, 1990; Kameoka & Kuriyagawa, 1969a; Terhardt, 1974a&b, 1984) or
not (Helmholtz, 1885; Hindemith, 1945; von Békésy, 1960; Plomp & Levelt, 1965.) Some of the
studies addressing the sensation of roughness have occasionally (e.g. Stumpf, 1890, in von Békésy,
1960: 348; Vogel, 1993) been too keen to find a definite and universally acceptable justification
of the 'natural inevitability' and 'aesthetic superiority' of Western music theory. This has prevented
them from seriously examining the physical and physiological correlates of the roughness sensation.
On the contrary, Helmholtz, the first researcher to tackle the issue theoretically and experimentally,
concluded that:
Whether one combination [of tones] is rougher or smoother than another depends solely
on the anatomical structure of the ear, and has nothing to do with psychological motives.
But what degree of roughness a hearer is inclined to ... as a means of musical expression
depends on taste and habit; hence the boundary between consonances and dissonances
has frequently changed ... and will still further change... (Helmholtz, 1885: 234-235.)
The present study adopts this position and treats the sensation of roughness simply as a perceptual
attribute that can be manipulated through controlling the degree and rate of amplitude fluctuation,
providing means of sonic variation and musical expression.
ii Existing roughness estimation models and their application
The two principal studies that have systematically examined the sensation of roughness (von Békésy,
1960: 344-354; Terhardt, 1974a) have, to a large extent, been ignored by the majority of models
quantifying roughness / smoothness. Numerous such models have been proposed (Plomp & Levelt,
1965; Kameoka & Kuriyagawa, 1969a&b; Hutchinson & Knopoff, 1978), and have been employed in
later studies (Bigand et al., 1996; Vos, 1986; Dibben, 1999), demonstrating a low degree of agreement
between predicted and experimental data. Dibben, for example, found no correlation between sensory
consonance (smoothness), as predicted by the Hutchinson & Knopoff model, and the completeness /
stability ratings of the final bars of selected musical pieces. She concluded that sensory consonance /
dissonance is not a good measure of musical stability / tension, or completeness / incompleteness,
interpreting her conclusion as supporting the need for an alternative model of consonance /
dissonance. Her study is a good example of an attempt to load the concept of consonance with
meanings that go far beyond the scope of the model employed. It basically demonstrates that the
degree of smoothness of a vertical sonority is not a good measure of its sense of stability or
completeness. This result could have been anticipated since the 'sense of stability' of any given event
may be highly related to the events that precede it, while roughness models calculate the roughness of
isolated vertical sonorities. The surprising fact is not the results of Dibben's study but the implied
expectations that: a) a measure of 'smoothness' could correlate with multidimensional and highly
temporal and context dependent notions such as stability or completeness, and b) any model of
consonance / dissonance should map to stability / tension responses. It appears that the concept of
consonance (even more than that of timbre - see Bregman, 1990: 93) has been a 'wastebasket' of all
kinds of aesthetic and evaluative judgments in music, as well as the box of treasures for justification
arguments regarding general stylistic trends or specific compositional decisions.
There are, however, many reasons (other than those posited by Dibben) for the revision of the existing
models quantifying roughness, some of which have already been pointed out by other researchers and
some of which will be addressed by the present study.
Vos (1986) pointed out a number of inconsistencies in the Plomp & Levelt and the Kameoka &
Kuriyagawa models (see note 3), with regard to the critical bandwidth model derived from loudness summation
experiments (Zwicker et al., 1957). In his study, Vos suggested some adjustments that would bring the
predictions of all three models to a better agreement. Hutchinson & Knopoff's model has been
criticized (Bigand et al. 1996) for its relatively crude representation of the nonlinear relationship
between the amplitude fluctuation rate corresponding to maximum roughness and the frequency of the
lower of the interfering sines.
A recent model (Sethares, 1998) has the advantage of being based on a large number of direct
smoothness / roughness experimental ratings of pairs of sines, fitting a function that accounts for the
above mentioned nonlinear relationship. Sethares' model offers the best theoretical fit to the observed
relationship between roughness, frequency separation of the two interfering sines, and frequency of
the lower sine. In this model, the experimentally derived roughness curves (i.e. graphs plotting the
perceived roughness of a pair of sines with equal amplitudes as a function of their frequency
separation) are essentially interpreted as positively skewed, Gaussian-like distributions:

R(x) = e^(-b1·x) - e^(-b2·x)    Eq. (1)
where x represents an arbitrary measure of the frequency separation (f2 - f1), while b1 & b2 are the
rates at which the function rises and falls. Using a gradient minimization of the squared error between
the experimental data (averaged over all frequencies) and the curve described by Eq. (1) gives: b1 =
3.5 and b2 = 5.75. For these values, the curve maximum occurs when x = x* = 0.24, a quantity
interpreted as representing the point of maximum roughness. To account for the non-linearity in the
relationship between the fluctuation rate corresponding to maximum roughness and the frequency of
the lower sine, Sethares introduced the following modification, which includes the actual frequency
spacing (f2 - f1) and the frequency of the lower component (f1) into the calculation of roughness
(R):

R = e^(-b1·s·(f2 - f1)) - e^(-b2·s·(f2 - f1)),  where s = x*/(s1·f1 + s2)    Eq. (2)
The parameters s1 and s2 allow the function to stretch / contract with changes in the frequency of the
lower component so that the point of maximum roughness always agrees with the experimental data.
A least square fit gave s1 = 0.0207 and s2 = 18.96.
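The stretched curve of Eq. (2) can be sketched in a few lines of Python (an illustrative re-implementation under the equal-amplitude assumption, using only the parameter values quoted above; the function name is ours):

```python
import math

# Parameter values quoted in the text for Sethares' (1998) model
B1, B2 = 3.5, 5.75       # rise and fall rates of the roughness curve
X_STAR = 0.24            # frequency separation at maximum roughness
S1, S2 = 0.0207, 18.96   # least-squares stretch parameters

def sethares_pair_roughness(f1, f2):
    """Roughness of two equal-amplitude sines, following the stretched
    curve of Eq. (2): the point of maximum roughness tracks the
    frequency of the lower component via s = x*/(s1*f_low + s2)."""
    f_low = min(f1, f2)
    x = abs(f2 - f1)
    s = X_STAR / (S1 * f_low + S2)
    return math.exp(-B1 * s * x) - math.exp(-B2 * s * x)
```

The curve is zero at zero separation, rises to a single maximum at a separation that grows with the frequency of the lower sine, and decays towards zero for wide separations, as the experimental roughness curves require.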
Eq. (3)
b) The contribution of SPL to the sensation of roughness of AM tones (Eq. (4)) is negligible,
especially when compared to the contribution of the degree of amplitude fluctuation (Eq. (3)).
c) The roughness of a beating tone pair (f1, f2; A1, A2) is related to the roughness of an AM
tone as follows:
Eq. (5)
Eqs. (3), (4) & (5) illustrate that all existing models calculating the roughness of pairs of sines (Plomp
& Levelt, 1965; Kameoka & Kuriyagawa, 1969a & 1969b; Hutchinson & Knopoff, 1978, Sethares,
1998), have largely underestimated the importance of amplitude fluctuation depth (see note 5) (i.e. relative
amplitude values), while overestimating the importance of SPL (i.e. absolute amplitude values).
Combining Eqs. (2), (3), (4), & (5) gives the new model for the calculation of the roughness, R, of
pairs of sines (with frequencies f1 & f2, amplitudes A1 & A2, and zero initial phases):
Eq. (6)
The roughness of complex spectra with more than two sine components will be calculated by adding
up the roughness of the individual sine-pairs. Although von Békésy (1960: 350-351) has suggested
that the total roughness can be less than the sum of the roughness contributions of the individual
sine-pairs, depending on the relative phase of the respective amplitude fluctuations, initial
experiments indicated otherwise, confirming previous experimental results (Terhardt, 1974a.)
appropriate roughness range. A second group of subjects was asked to rate the stimuli on a
'dissonance' scale, outlined by the adjectives 'dissonant' - 'not dissonant'. No comparison stimuli were
included in this case, since the goal was to get at the assumed cultural associations of the terms
'consonance' and 'dissonance.' Subjects were able to listen to the stimuli as many times as needed
before making their decision. Preliminary analysis of the results indicates that the roughness ratings
correlate with the roughness of the stimuli as estimated by the proposed model (Eq. (6)) and that the
responses of the first group of subjects correlate with those of the second group.
v Summary - conclusions
All existing models quantifying roughness have demonstrated limited predictive power due, for the
most part, to:
a. an underestimation of the contribution of the degree of amplitude fluctuation (i.e. relative
amplitude values of the interfering sines) to roughness and
b. an overestimation of the contribution of SPL (i.e. absolute amplitude values of the interfering
sines) to roughness.
With the roughness calculation model introduced by Sethares (1998) (see Eq. (2) above) as a starting
point, a new model has been proposed (Eq. (6)), which includes a term that accounts for the correct
contribution of the amplitudes of interfering sines to the roughness of the resulting complex tone. This
term is based on existing experimental results (Terhardt, 1974a, von Békésy, 1960) with an additional
adjustment that accounts for the important quantitative difference between amplitude modulation
depth and degree of amplitude fluctuation (Vassilakis, in preparation.) The roughness of complex
spectra with more than two sine components is calculated by adding up the roughness of the
individual sine-pairs.
The final model has been tested experimentally and has been applied to the testing of a hypothesis
linking the consonance hierarchy of harmonic intervals within the Western chromatic scale to
variations in roughness degrees. Analysis of the pilot data indicates that, for isolated harmonic
intervals, the proposed roughness estimation model agrees well with observation.
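The pairwise-summation idea behind this test can be sketched as follows. This is an illustrative sketch only, not the paper's Eq. (6): the min-amplitude weighting is a simplifying assumption borrowed from Sethares-style formulations, and all names and spectra are ours.

```python
import math
from itertools import combinations

# Sethares (1998) curve parameters as quoted earlier in the text
B1, B2, X_STAR, S1, S2 = 3.5, 5.75, 0.24, 0.0207, 18.96

def pair_roughness(f1, a1, f2, a2):
    """Sethares-style roughness of one sine pair, weighted here by the
    smaller amplitude (an illustrative choice, not the paper's Eq. (6))."""
    f_low, x = min(f1, f2), abs(f2 - f1)
    s = X_STAR / (S1 * f_low + S2)
    return min(a1, a2) * (math.exp(-B1 * s * x) - math.exp(-B2 * s * x))

def interval_roughness(f0, ratio, n_harmonics=6):
    """Total roughness of two harmonic complex tones (amplitudes 1/n)
    a given frequency ratio apart, summed over all sine pairs."""
    partials = [(k * f0, 1.0 / k) for k in range(1, n_harmonics + 1)]
    partials += [(k * f0 * ratio, 1.0 / k) for k in range(1, n_harmonics + 1)]
    return sum(pair_roughness(f1, a1, f2, a2)
               for (f1, a1), (f2, a2) in combinations(partials, 2))
```

Even on this crude sketch, the perfect fifth (ratio 3:2) and the octave come out less rough than the tritone, in line with the hypothesised link between the consonance hierarchy of harmonic intervals and roughness.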
End Notes
1) The beating and roughness sensations associated with certain complex tones are essentially
understood in terms of sine-component interaction within the same critical band. However, studies
(von Békésy, 1960: 577-590; Plomp, 1966) examining the beating and roughness of mistuned
consonances (i.e. sine-pairs with frequency ratio slightly removed from a small integer ratio) indicate
that these sensations arise even when the added sines are separated by frequencies much larger than
the critical bandwidth. The experimental results of von Békésy and Plomp challenge earlier
explanations of this phenomenon that were based on the nonlinear creation of combination tones
(Helmholtz, 1885: 197-211) or harmonics (Wegel & Lane, 1924, in Plomp, 1966: 463; Lane, 1925)
inside the ear. Although their final interpretations differ, both studies link the beating and roughness
sensations of mistuned consonances directly to the complex signal's amplitude-fluctuations.
2) Changes in the rate of amplitude fluctuation exploit the differences not only between the beating
and roughness sensations but also between various degrees of roughness. Depending on the rate of
fluctuation, three 'shades' of roughness have been distinguished (von Békésy, 1960: 354.)
Approximately 45 fluctuations per second give roughness of an intermediate character, lying between
that of slower rates ("R" character) and that of higher rates ("Z" character.)
3) The Plomp & Levelt (1965) model underestimates roughness because of its bias against a power
function for roughness, while the Kameoka & Kuriyagawa (1969a&b) model overestimates roughness
because of its bias for a power function for roughness. The fact that some sort of power function
(although not exactly the one relating amplitude to loudness) is called for is supported by the
relationship between the mechanisms associated with the sensations of roughness and loudness. (Von
Békésy 1960: 344-350.)
4) If two sines with different frequencies f1, f2 (f1 < f2) and amplitudes A1 and A2 (A1 ≥ A2)
are added together, the amplitude of the resulting signal will fluctuate between a maximum
(Amax = A1 + A2) and a minimum (Amin = A1 - A2) value. The degree of amplitude fluctuation (Daf) is
defined as the difference between the maximum and minimum amplitude values relative to the
References
von Békésy, G. (1960). Experiments in Hearing. New York: Acoustical Society of America Press
(1989.)
Bigand, E., Parncutt, R., and Lerdahl, F. (1996). Perception of musical tension in short chord
sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical
training. Perception & Psychophysics, Vol. 58: 125-141.
Carlton Maley,V. Jr. (1990). The Theory of Beats and Combination Tones, 1700-1863. [Harvard
Dissertations in the History of Science. O. Gingerich, editor.] New York: Garland Publishing Inc.
Dibben, N. (1999). The perception of structural stability in atonal music: The influence of salience,
stability, horizontal motion, pitch commonality, and dissonance. Music Perception, Vol. 16(3),
265-294.
Helmholtz, H. L. F. (1885). On the Sensations of Tone as a Physiological Basis for the Theory of
Music (2nd edition.) Trans. A. J. Ellis. New York: Dover Publications, Inc. (1954.)
Hindemith, P. (1945). The Craft of Musical Composition; Book 1 (4th edition). New York: Associated
Music Publishers Inc.
Hutchinson, W. and Knopoff, L. (1978). The acoustic component of Western consonance. Interface,
Vol. 7, 1-29.
Kameoka, A. and Kuriyagawa, M. (1969a). Consonance theory, part I: Consonance of dyads. JASA,
Vol. 45(6): 1451-1459.
_ (1969b). Consonance theory, part II: Consonance of complex tones and its calculation method.
JASA, Vol. 45(6): 1460-1469.
Plomp, R. and Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. JASA, Vol. 38:
548-560.
Terhardt, E. (1974a). On the perception of periodic sound fluctuations (roughness). Acustica, Vol.
30: 201-213.
_ (1974b). Pitch, consonance and harmony. JASA, Vol. 55(5): 1061-1069.
_ (1984). The concept of musical consonance: A link between music and psychoacoustics. Music
Perception, Vol. 1(3): 276-295.
Vogel, M. (1993). On the Relations of Tone. Bonn: Verlag für systematische Musikwissenschaft,
GmbH [Lehre von den Tonbeziehungen, 1975. Bonn, trans. V. J. Kisselbach.]
Vos, J. (1986). Purity ratings of tempered fifths and major thirds. Music Perception, Vol. 3(3):
221-258. <
Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). Critical band-width in loudness summation.
JASA, Vol. 29(5): 548-557.
Proceedings paper
5. Implications: Within the signifying process of singing, it is evident that facial behaviour
plays an important role in communication. A singer's awareness of his/her vocal expression,
and vocal self-perception more generally, can clearly be sharpened through awareness of
facial behaviour.
Paper
Intentions
The current study investigates empirically how voice, emotion and facial gesture might be
connected in Western classical singing. The need for such a study arose out of an
awareness of several issues. Firstly, facial expression in singing is often discussed
anecdotally, but has rarely been subjected to any empirical analysis. Secondly, singing
teachers often ask singers to 'sing with the eyes', 'make a smile' and so on, to achieve
technical ends in singing. Thus, it was considered important to know whether these different
facial expressions do in fact affect the quality of the produced vocalisation.
Thirdly, given the second point, it seemed important to know if there was an objective
correlation between the facial gesture and the sound made - in terms of its expressive
intention. From a perceptual perspective, for instance, does the audience understand more if
the singer looks as well as sounds 'sad', and what do these emotions objectively look and
sound like? Fourthly, it is well known that singers and actors often show empathy with an
emotional state, without entering into it completely or 'authentically'. In fact, singers 'act' out
emotions. It was a final intention of this work to explore the extent to which the emotional
expressions requested were perceived as being authentic by both the performer him/herself
and the audience. It was possible to match these data against 'norms' for emotional
expression in the face by comparing the profiles of the singers with measurements of facial
formation/musculature arrangement for real emotions recorded by Ekman and Friesen
(1969) from still photographs taken when people were subjected to specific emotion-eliciting
situations.
Background
According to Manen (1974), historically, Bel Canto vocal technique was a musical
exploration of the different vocal expressions for the different emotional states. So, in a
practical way there has been an exploration of vocal emotion in singing, but few systematic
empirical studies. Of the existing empirical work, key research has been undertaken by
Kotlyar and Morozov (1976) and Sundberg (1980) who have demonstrated that when asked
to sing with the emotional intentions of joy, sorrow, anger, fear, and no emotion, very
different spectrographic analyses of the vocal sounds emerge. In joy, for instance, there is
a much higher frequency than the other emotions, the tonal course of the pitches is
moderate, both up and down, the tonal colour comprises many overtones and the volume is
loud. In sadness, there is a much lower frequency produced, the tonal course of the pitches
is downward, the tonal colour is very restricted, with few overtones and the volume is soft.
Given the complete absence of research precedents for what happens to the face in singing,
it was hypothesised that the face would differ greatly according to the emotion, with
happiness involving very different gestures to sadness, as it was felt that there would be a
correlation between size of expression and loudness of sound produced. These hypotheses
were in part based on intuitions from everyday observations, but also emerged out of drawing
parallels with the work of Davidson (1994) who discovered that when a pianist was asked to
perform with different emotional intentions, the louder he played, the larger his movements
were in order to produce the sounds.
In the general emotional expression literature, key contributions to the field have been made
by Ekman and Friesen (1969) and their co-workers. Whilst they largely support Davidson's
findings, it is worth noting that in terms of musculature, fewer muscles are involved in
happiness than in the other emotions.
Linking these general research findings about musculature to singing technique, it is
important to note that in classical singing, the intention is generally to keep the larynx free, to
allow for optimum vibration. Additionally, the singer is taught to use the resonating cavities of
the face and the pharynx. To achieve this, vowel sounds are often modified from those used
in everyday speech, with the mouth opening rather more at the back than the front
(Helmsley, 1998). These factors may have an impact on how the face works when the highly
trained singer is asked to produce an emotional expression. In fact, there may even be some
source of conflict, with natural facial expression involving a muscle in one direction which
may need to function in another way for the sake of optimum vocalisation of the same
emotion when interfaced with the technique of singing.
Fonagy (1962) undertook some pioneering research examining glottal behaviour during
emotional speech. He found very different glottal behaviours for the different emotions,
with the ventricle folds being further apart in sadness or tender whispering, and the laryngeal
ventricle squeezed together for pressed phonation in anger, for instance. But, if, as
Helmsley (1998) argues, the emotions have to come first from the mind (thought of anger),
then through the eyes (visualise to realise the emotion) and eventually into the voice, it is
important to examine whether the facial musculature leads to the sound production or if the
sound production causes the facial expressions. The issues of learning, formation or innate
expression of emotion require careful consideration; by contrast, Fonagy (1962) refers to
the glottal profiles created in his study as 'pre-conscious expressive gestures'.
Singers are, of course, actors, often asked to characterise different people or emotional
states in their work. They are people who apparently learn to 'fake' behavioural moods.
Runeson and Frykholm (1983) believe that faked emotional states and expressions are
formed in a slightly different manner to genuine ones, and thus an expert viewer can
distinguish between the two. Of course, in singing, we accept that in an operatic
characterisation there is an element of dramatic play, and so perhaps even expect the
gestures and expressions used to be 'fake' or 'larger' than in real life. There is an expectation
which needs to be fulfilled. It seems critical for these reasons to compare data from singers
producing 'faked' emotional facial expression and sounds with those of genuine ones.
Alongside all of the research described above, it is important to note that recent
psychodynamic, phenomenological and therapeutic research has approached singing as a
creative musical experience occurring in both time and space, and existing precisely at the
threshold between 'self and the world' - a resonant field where self and other may experience
feelings of oneness and wholeness, a channel through which one expresses and
communicates something from the interior world. In other words, singing provides a bridge
between the inner world of mood, emotion, image, thought, experience and the outer world
of relationship, discourse and interaction (Salgado, 1999, Draffan, 2000). Thus, finally, to get
a deeper insight into the issue of emotion versus faked emotion, it is important to interview
performers to ask how they feel when producing these emotions, and to explore audience
reactions to the facial gestures used.
Methods
Recordings
Two male and one female singer (mean age 32 years) with an average professional
experience of 8 years singing in solo oratorio, opera and recital work were used as the
subjects of investigation. They were asked to prepare the musical phrase "Mein Vater, mein
Vater" from the Lied Erlkonig by Schubert for recording. This phrase was chosen since the
words could apply to almost any emotional state. The musical line itself both rises
and falls within a limited range of a third, so does not make particular technical demands on
the singer, and so again leaves interpretative possibilities open. Recordings were made in
five different conditions:
Neutral: this was based on Fonagy's (1962) procedure, which always used a neutral
emotional state against which to measure other emotional states. It acted as a base-line
measurement.
Happy.
Sad.
Fearful.
Angry.
These are the four fundamental emotions now widely reported as being the strongest and
most clearly recognisable in many contexts (Ekman and Friesen, 1969).
For the recording, it was necessary to video tape the singers in full-face close up.
Measurements of facial muscle activity were taken by digitising and tracking muscle activity
over time with a specially designed software package. To track the movements it was
necessary to mark the muscles to be mapped with 25 coloured circular stickers, 12 on each
side of the face and an anchor marker on the bridge of the nose. For the vocal recordings,
the betacam sound channel was used to input a spectrogram through software which allowed
for an immediate plot of the harmonic spectrum and the singer's formant. Additionally, an
electrolaryngograph was used to collect data about the opening and closing of the glottis.
This was recorded by placing two electrodes on the outside of the larynx. These were kept in
place using an elasticated neck strap.
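The digitising-and-tracking step can be illustrated with a minimal sketch. This is a hypothetical stand-in for the specially designed package, assuming each coloured sticker is located per frame by colour-thresholding and taking the centroid of the matching pixels:

```python
import numpy as np

def marker_centroid(frame, target_colour, tol=30):
    """Return the (row, col) centroid of pixels within `tol` of `target_colour`."""
    mask = np.all(np.abs(frame.astype(int) - target_colour) <= tol, axis=-1)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # marker occluded or out of frame
    return float(ys.mean()), float(xs.mean())

# Synthetic 100x100 RGB frame with one red sticker centred at row 40, col 60
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[38:43, 58:63] = (255, 0, 0)

print(marker_centroid(frame, np.array([255, 0, 0])))  # -> (40.0, 60.0)
```

Muscle activity over time would then be read off as the frame-to-frame displacement of each marker centroid relative to the anchor marker on the bridge of the nose.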
After being videotaped, the singers, along with three other viewers, collectively watched the
recordings to assess the success of the tasks. From between two and four different attempts
at each emotional state made by each singer, the viewers assessed which were perceptually
the most/least authentic, and these bi-polar pairs were used as sources of data for analysis.
The singers were also interviewed about their views on emotion and singing and these
qualitative comments were used to help interpret the data.
Results
Sound analysis
Emotion - Neutral. In this condition, the singer's formant was not dominant, and the voice
was weak in both amplitude and harmonics.
Emotion - Sadness. Relatively low harmonic content compared with the other emotions, with
singer B's voice being far weaker in both amplitude and harmonics than in his other
examples. In all cases, the singer's formant (the strong harmonics between about 2500 and
3500 Hz) is not particularly dominant.
Emotion - Fear. Singers A and C both show very low harmonic content, with only the first
two harmonics coming out strongly in singer A's case. Singer C is totally lacking the singer's
formant, and singer A's is very weak, reflecting a tendency to half-voice or whisper to create
the impression of fear. For singer B, however, there is quite strong harmonic content,
indicating a much fuller voiced interpretation.
Emotion - Happy. Singers C and A still show a weaker harmonic content, though the
2500-3500Hz area is stronger than in the fearful example. Singer B's plot, however, is not
greatly different from that for fear.
Emotion - Anger. This showed the most dramatic change: both amplitude and harmonic
content are considerably greater for singers A and C, and slightly greater for singer B. Singer
C's singer's formant still seems relatively weak, but the overall amplitude for all his plots is
considerably weaker than for either of the other singers (that is, singer C is singing rather
more quietly).
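The comparisons above can be illustrated with a rough index of singer's-formant prominence: the share of spectral energy falling between about 2500 and 3500 Hz. The sketch below runs on synthetic tones and is not the analysis software used in the study:

```python
import numpy as np

def formant_band_ratio(signal, sr, lo=2500, hi=3500):
    """Fraction of spectral energy in the singer's-formant band (lo..hi Hz)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return power[band].sum() / power.sum()

# Synthetic "voices": a fundamental at 220 Hz, with and without a strong 3 kHz partial
sr = 16000
t = np.arange(sr) / sr
plain = np.sin(2 * np.pi * 220 * t)
ringing = plain + 0.8 * np.sin(2 * np.pi * 3000 * t)

print(formant_band_ratio(plain, sr) < formant_band_ratio(ringing, sr))  # True
```

A voice with a prominent singer's formant scores a higher band ratio; a half-voiced or whispered production, with little energy in that region, scores near zero.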
Visual analysis
Emotion - Neutral. In this condition, the measurement of the movements show a very limited
range of muscle activity, with a high degree of correlation between the three singers' use of
their faces in this condition.
Emotion - Sadness. Corrugator muscles are used extensively in this condition, with singer B
showing the most movement activity here, singer A, a moderate range of activity, and singer
C the least activity. There is a correlation between individual's data for the bi-polar pairings
of sadness recordings, showing that whether authentic or inauthentic emotion is expressed,
the same muscles are involved.
Emotion - Fear. Levator labii and frontalis muscles are used here, but the degree of
involvement varies according to individual and bi-polar interpretation. For instance, in Singer
B, both lots of muscles are equally involved in both interpretations of fear. For singer C,
there is limited activity in both. For singer A, there is little frontalis activity, but more in her
more authentic interpretation of fear.
Emotion - Happy. Zygomaticus major and risorius muscles are involved. As in the neutral
condition, there is a high degree of correlation between singers and bi-polar pairs.
Emotion - Anger. Platysma and procerus are involved. Here, singers A and B use very
similar formations and degrees of activity in both renditions of the emotion. For singer C,
there is less overall facial activity.
Conclusions
In summary, the data analysis reveals that both larynx and face move and work very
differently according to emotional state. The particular profiles of each share some common
characteristics. For instance, when singing with a sad expression, the face contracts,
reducing its overall surface area, and so too does the vocal sound, producing a more
breathy, whispered tone, indicating that the ventricle folds are further apart than in the other
emotional conditions. Interestingly, a similar result was obtained for the communication of
fear for singers A and C, whereas singer B used a much fuller sound and the frontalis was
lifted, as is typical in surprise (Ekman and Friesen, 1969). This suggests that the
three singers had slightly different interpretations of what fear was and how it was created.
One possible reason for the differences is that in the case of singer B, he was thinking about
the surprise within the Erlkonig itself, where the child suddenly, in a fearful state, calls out for
his father. Singers A and C said that they focused more on a generic expression of fear, and
did not have the element of surprise in mind when singing.
Whilst there was a high correlation for all states, especially happiness, it is worth mentioning
that all the singers noticed discomfort in their throats when singing in the angry condition.
Upon analysis of the data, it seems that in this state the muscle platysma is involved. Since
the singers were wearing throat straps, the two electrodes were forced downwards as the
platysma was brought into action, causing the discomfort. The sung effect was to create a
rougher, less vibrant tone.
Interview data can be touched upon here to show that all the singers believed that their
emotions were 'authentic' in that they were constructed out of memories of those states.
However, they were all able to recognise that in their different interpretations some were
more 'successful' than others. The quantitative data so far does not show any statistical
differences in these more or less successful interpretations. Thus, it is only possible to begin
to theorise about what might allow for the differentiation between the interpretations to occur.
From the commentaries of the singers and the viewers, it appears that there are qualitative
differences in the intensity of how the muscles are used. That is to say that if the singer is
clearly working with a stronger inner intention the effect is more successful.
References
Davidson, J.W. (1994) What type of information is conveyed in the body movements of solo
musician performers? Journal of Human Movement Studies, 6, 279-301.
Draffan, K. (2000) Singing from the Soul. Unpublished MMus Dissertation, University of
Sheffield.
Ekman, P. and Friesen, W.V. (1969) The repertoire of nonverbal behaviour: Categories,
origins, usage, and coding. Semiotica, 1, 49-98.
Fonagy, I. (1962) Mimik auf glottaler Ebene. Phonetica, 8, 209-219.
Helmsley, T. (1998) Singing and Imagination. Oxford: Oxford University Press.
Kotlyar, G.M. and Morozov, V.P. (1976) Acoustical correlates of the emotional content of
vocalised speech. Soviet Physics - Acoustics, 22, 208-211.
Manén, L. (1974) The Art of Singing. London: Faber Music Ltd.
Runeson, S. and Frykholm, G. (1983) Kinematic specification of dynamics as an
informational basis for person-and-action perception: Expectation, gender recognition, and
deceptive intention. Journal of Experimental Psychology: General, 112, 585-615.
Salgado, A. (1999) Rethinking voice evaluation in singing. Conference Proceedings of the
European Society for the Cognitive Sciences of Music conference on Research Relevant to
Music Training Institutions, Lucerne, Switzerland, September.
Sundberg, J. (1980) Röstlära. Stockholm: Proprius Förlag.
Proceedings paper
Brinkman asked 32 high school instrumental music students to compose two melodies. Three judges
independently rated the melodies using a consensual assessment technique. That is, each judge was
asked to rate each melody on a 7-point scale ("low" marked on one end and "high" on the other) in the
categories of originality, craftsmanship and aesthetic value. The reliability of the three judges'
creativity ratings of the 64 melodies ranged from .77 to .96.
Reliability figures for three judges' ratings of 14 children's musical compositions ranged from .62 to .73
for creativity and .81 to .95 for craftsmanship in a study by Hickey (1995).
The reliability of "creativity" ratings for children's musical compositions was .93 in another study by
Hickey (1996).
Most recently, Hickey (in process) examined one of the assumptions of the consensual assessment
technique: that "experts" must be used as assessors of the creative products. In the domain of
music, just who are the "experts" when it comes to dealing with children's compositional products?
Amabile answers this question for visual art and problem solving studies which used the consensual
assessment technique by reporting: ". . . . we are now convinced that for most products in most
domains, judges need not be true experts in order to be considered appropriate" [for judging products]
(1996, p. 72). She qualifies this, however, by stating that in many domains, some form of training in the
field may be necessary for judges "to even understand the products they are assessing" (p. 72) and
specifically cites computer-programming tasks and judging portfolios of professional artists. Based on
analyses of several studies, Amabile concludes with a suggestion:
...the level of judge expertise becomes more important as the level of subjects' expertise in
the domain increases. In other words, the judges should be closely familiar with works in
the domain at least at the level of those being produced by the subjects. (1996, pp. 72-73)
The purpose of the present study is to report the findings from two recent experiments which use the
consensual assessment technique, in order to refine this technique in music and to find which group of
"experts" is best qualified to assess children's musical compositions. The studies and results are
reported next.
Study A
The purposes of this study were to: a) determine which group of judges (composers, theorists, music
teachers, or children) would make the most reliable creativity ratings of children's musical
compositions; and, b) determine the relationships of mean creativity scores between these groups of
judges.
Subjects
Five groups of judges' creativity assessments of children's musical compositions were compared. The
groups were: 17 music teachers, 3 composers, 4 music theorists, 14 seventh-grade children, and 25
second-grade children. The music teachers were broken down into the following groups for analysis: 10
"instrumental" music teachers (teachers who taught only junior or senior high school band/orchestra); 4
"mixed experience" teachers (teachers who taught a combination of instrumental and choral or
instrumental and general music); and 3 "general/choral" music teachers (elementary general music
teaching with some choral music). From the group of composers, two were college composition
professors, and the 3rd was a graduate student in composition. All had at least 15 years of experience
writing music in a wide variety of genres ranging from jazz to classical. The music theorists were
college theory professors. The two groups of children came from contained classrooms in a private
grade-school.
The 11 musical compositions which were rated by all of the judges were randomly selected from a pool
of 21 compositions generated by fourth- and fifth-grade subjects in a previous research study (Hickey,
1995). In the 1995 study, the subjects were given unlimited time to create an original composition using
a synthesizer connected via MIDI interface to a Macintosh computer. The final compositions were
captured in MIDI file format using a computer program that allowed the recording of up to three
simultaneous tracks of music. No compositional parameters were given. Students were encouraged to
re-record their compositions as often as necessary until they were satisfied with their finished product.
Procedure
Amabile (1983) recommends that in order to assure discriminant validity between other areas and
creativity, that dimensions such as craftsmanship and aesthetic appeal be included on the rating form.
The form used by the theorists and composers for this study was developed by combining and adapting
items from Amabile's Dimensions of Creative Judgment (1982) and Bangs' Dimension of Judgment
rating forms (1992). The final form was used and tested in two previous studies (Hickey, 1995; 1996).
The rating form contained 18 items which fell under one of three dimensions: creativity, craftsmanship,
and aesthetic appeal. The items consisted of 7-point Likert-type scales with anchors marked "low,"
"medium," and "high." The music teachers in this study used a 3-item form with 7-point rating scales
for creativity, craftsmanship and aesthetic appeal. The creativity item for the music teachers, theorists
and composers was worded: "Using your own subjective definition of creativity, rate the degree to
which the composition is creative."
The children rated the compositions first for "Liking," and on a second listening, for "Creativity," using
a separate form for each scale. The Creativity form asked the students to rate each composition on a
5-point scale with "Not Creative" and "Very Creative" marked on the low and high ends. The
second-grade children's form had icons (from plain to more elaborate/silly faces) at each point on the
scale to aid them in understanding the continuum.
The groups of children listened to the compositions together in their respective classrooms. Before
listening to and rating the compositions, the author engaged the children in discussion about "Liking"
music and/or thinking that music is "Creative." The children shared ideas about what "creative" meant
to them and the discussion was guided to help them focus on understanding this term for rating music.
They then rated each composition first for "Liking," and on a second listening, for "Creativity."
All of the judges were informed that the compositions were composed by fourth- and fifth-grade
children. And, following Amabile's suggestion for proper consensual assessment technique procedures
(1996), the judges rated the compositions independently and were instructed to rate them relative to one
another rather than against some "absolute" standard in the domain of music.
Results
The analyses in this report are based on the judges' ratings on only the Creativity item of the assessment
forms. Interjudge reliabilities were calculated using "Hoyt's analysis," an intra-class correlation
technique which reports a coefficient alpha (Nunnally & Bernstein, 1994). The statistics were computed
using GB-Stat (1994) software. Because each group had a different number of judges, reliability
coefficients were adjusted in order to compare the groups as if only 3 judges were used in each group.
The adjusted interjudge reliabilities for each group's creativity ratings on the musical compositions
were: composers, .40; all music teachers, .64; instrumental music teachers, .65; "mixed" teachers, .59;
general/choral teachers, .81; music theorists, .70; seventh-grade children, .61; and second-grade
children, .50.
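The reliability computation can be sketched as follows. Hoyt's analysis yields a coefficient alpha equivalent to Cronbach's alpha computed over the compositions-by-judges ratings matrix, and the adjustment to a common panel of 3 judges can be made with the Spearman-Brown formula. The ratings below are invented for illustration:

```python
import numpy as np

def coefficient_alpha(ratings):
    """Coefficient alpha (Hoyt's analysis / Cronbach's alpha) for a
    compositions x judges matrix of ratings."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                       # number of judges
    judge_vars = ratings.var(axis=0, ddof=1)   # each judge's rating variance
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - judge_vars.sum() / total_var)

def adjust_to_panel(alpha, n_judges, k=3):
    """Spearman-Brown: project reliability from n_judges to a panel of k judges."""
    r = k / n_judges
    return (r * alpha) / (1 + (r - 1) * alpha)

# Invented ratings: 11 compositions x 5 judges on a 7-point creativity scale
rng = np.random.default_rng(0)
quality = rng.integers(1, 8, size=11)
ratings = np.clip(quality[:, None] + rng.integers(-1, 2, size=(11, 5)), 1, 7)

alpha = coefficient_alpha(ratings)
print(round(adjust_to_panel(alpha, n_judges=5), 2))
```

With the Spearman-Brown adjustment, a group's reliability no longer depends on how many judges it happened to contain, so figures can be compared across groups of different sizes.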
The correlations of mean creativity ratings among the different groups of judges are presented in Table 1.
Due to the lack of agreement among the composers, each composer is represented separately rather
than using the group mean for correlation with the other groups. Significant correlations were found
between the three groups of music teachers, between the music teachers and music theorists, and
between the two groups of children. Though music teachers and music theorists agreed with each other,
and the groups of children had a high positive correlation with each other, the theorists and teachers
showed moderate to low correlations with the groups of children. There were no strong positive
correlations amongst the composers nor between the composers and the other groups.
Table 1
Correlations of Mean Creativity Ratings Between Groups of Judges

Judges                         1     2     3     4      5     6      7     8     9
1.  Composer A
2.  Composer B               -.02
3.  Composer C                .07  -.26
4.  Music Theorists           .16  -.02   .58
5.  All Music Teachers        .35   .01   .37   .90**
6.  Instrumental Teachers     .45  -.09   .39   .88**   -
7.  Mixed Teachers            .18   .11   .35   .86**   -    .78**
8.  General/Choral Teachers   .14   .17   .19   .63*    -    .68*   .72*
10. 2nd-grade Children        .19  -.03   .19   .38    .18   .11    .41  -.01  .83**

**p < .01, *p < .05
Study B
The purpose of this most recent study was to test the reliability of a one-item creativity rating form
using the consensual assessment technique for rating children's musical compositions and to test the
reliability of a small group of judges.
Subjects
The judges in this study were 6 music teachers who came from slightly varied teaching backgrounds.
Three of these teachers were actively teaching music composition to students in their general music
classes. One was a high school band and general music teacher, and the other two were middle school
general music teachers. Two judges were elementary general music teachers who had taught music
composition to their students on only a few occasions in the past (music composition was not a regular
part of their curricula). The final judge was a student teacher placed in elementary-level general
music.
The 53 compositions that were rated in this study were created by 28 third-grade children (8 and 9
years of age). The children were volunteers who came to the University over three, 2-hour Saturday
sessions to learn about music composition using Macintosh computers and synthesizers. The students
were shown how to use simple music-sequencing software with Korg X5D synthesizers to create
original music compositions. The compositions collected for this study were composed on the first and
third day of the sessions. The children's instructions were to simply create a composition that they
liked. They could use as many tracks as they wished, and any combination of the available 128 General
MIDI timbres. They were given as much time as needed (no child needed more than 45 minutes) and
could revise and re-record as much as needed until they were satisfied with their composition. Several
children recorded more than one composition during each of these sessions. They were asked to choose
their favorite composition for purposes of this study. Twenty-five of the children completed two
compositions for the project, while three children completed only the first session.
Procedure
The MIDI compositions were converted to audio files and saved onto a CD-ROM for judges to listen to.
Each judge received a CD with the 53 compositions in a different and random order. The judges then
independently listened to the compositions and rated each on creativity using a 7-point Likert-type
scale with anchors marked "low," "medium," and "high." The instructions for rating each composition
were: "Using your own subjective definition of creativity, rate the degree to which the composition is
creative."
Results
The average creativity score for all 53 compositions was 3.8, with a range from 1.34 to 6.17. Interjudge
reliabilities were calculated using "Hoyt's analysis," an intra-class correlation technique which reports a
coefficient alpha (Nunnally & Bernstein, 1994). The statistics were computed using GB-Stat (1994)
software. The reliability coefficient for all 6 judges was .61 (p < .01). To test the hypothesis formed
from the results found in Study "A" (that general/choral elementary teachers are the best experts
in judging children's compositions), I calculated reliability coefficients for a variety of combinations of
judges to see which produced the best reliability. The best reliability figure was .65 (p < .01) when
calculated without the high school band/general music teacher.
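The search over judge combinations can be sketched as an exhaustive scan of subsets, scored by coefficient alpha (the intra-class reliability used above). The data here are deliberately artificial, with judge 0 constructed to disagree with an otherwise unanimous panel:

```python
import numpy as np
from itertools import combinations

def coefficient_alpha(ratings):
    """Coefficient alpha for a compositions x judges ratings matrix."""
    k = ratings.shape[1]
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - ratings.var(axis=0, ddof=1).sum() / total_var)

def best_panel(ratings, size):
    """Judge subset of the given size with the highest interjudge reliability."""
    panels = combinations(range(ratings.shape[1]), size)
    return max(panels, key=lambda p: coefficient_alpha(ratings[:, p]))

# Toy data: 53 compositions x 6 judges; judges 1-5 agree, judge 0 rates inversely
quality = np.arange(53) % 7 + 1.0
ratings = np.tile(quality[:, None], (1, 6))
ratings[:, 0] = 8 - quality

print(best_panel(ratings, size=5))  # -> (1, 2, 3, 4, 5)
```

Dropping the dissenting judge recovers the most reliable panel, mirroring how excluding the high school band/general music teacher raised the coefficient in Study B.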
Discussion and Implications for Further Research
The main purposes of this paper were to describe the conceptual background of and technique for using
consensual assessment to rate the creativity of children's music compositions and to share results of
research in which this particular assessment technique is used. Study "A" sought to determine who
might be the best group of experts to judge the creativity of children's musical compositions when using
a consensual assessment technique. Based on the results of this study, it seems that the best "experts,"
or at least the most reliable judges, may be the very music teachers who teach the children: the
general/choral music teachers. Perhaps the extensive music training that music teachers have, along with
their experience in the classroom with children provides them with the tools necessary to make
consistent and valid judgments about the creative quality of children's original musical products. It is of
interest to note that the composers used in this study were the group least able to come to an agreement
on the creativity of the children's compositions. In music education in the United States, music
composition is sometimes viewed as "mysterious," and often, the only experts considered in this realm
are the professional composers. Perhaps music teachers should have reason to feel more confident in
their ability to accurately assess the relative creativity of their students' musical compositions.
Study "B" further tested the reliability of subjective creativity assessment of children's musical
compositions and also examined the differences in judges' ratings based upon their teaching
backgrounds. The best reliability figure was obtained without the high school band/general music
teacher. This corresponds with study "A" in that perhaps the high school teacher does not have the same
sense, hence criteria, of what young children are capable of creating in musical compositions and may
not be the best "judge" for rating creativity in children's musical compositions.
The reliability of .65 is significant, yet lower than figures obtained previously. One reason may be that
the rating form asked judges to rate the musical compositions on only one item for "creativity."
Amabile suggests that at least aesthetic appeal and craftsmanship (in addition to creativity) be used as
items to rate creative products in order to force judges to think more carefully about the "creative"
aspects of the product. Though rating on only one item is easier and quicker for judges, this method may
prove less reliable for consensual assessment than using at least three rating items.
Another way to make this procedure more reliable is to include a general creativity definition for the
judges. This definition would be that creative musical compositions are both original and "appropriate"
(this seems to be the most common definition in the literature [Amabile, 1996]). Amabile (1996) suggests
that a definition may be needed when judges are uncomfortable with the idea of rating the creativity of
products in the absence of a guiding definition.
A final hypothesis for the unsatisfactory reliability coefficient in Study "B" is that children at this age
(8 and 9 years) are not developmentally "ready" or able to create an original and musically satisfying
composition. The compositions that were rated very high or very low may have been rated so by chance
and not through any real intent or ability. We need more research in our field to understand the
developmental trend of creative musical thinking in children in order to test this hypothesis.
Why bother with this pursuit of consensual assessment for rating creativity in children's composition?
For one, as mentioned briefly above, it is to show that teachers indeed do know, and can reliably
assess, the creative quality of children's compositions without the need for clear-cut objective criteria.
Of course, criteria for assessing compositions should be made clear to children when the consequence
might be a grade, but these studies show that teachers naturally have a subjective sense of which
compositions are more or less creative when compared to others.
Using a subjective consensual assessment technique, one might collect and examine the compositions
from children which are consistently rated as highly "creative." What are the features of these
successful compositions? From these compositions we may be able to formulate sensible rubrics to aid
in assessing children's musical compositions in schools. Furthermore, compositions rated highly
"creative" could also be used as models for elementary music classrooms-models are desperately
needed for teachers who strive to do more musical composition activities in their classrooms.
The subjective consensual assessment of children's musical compositions, for the most part, has
worked. It may provide the most appropriate measure because of its subjectivity and because it does
not presume objective criteria for creativity. This line of research may prove fruitful for the pursuit of
understanding better the genesis and factors surrounding a creative musical "aptitude" in children. In
order for consensual assessment to serve as the procedure for this identification, however, the next step
is to identify children who repeatedly produce musical compositions that experts rate as creative, and
then to pursue answers to questions about these children: What social and external factors surround
these children's backgrounds? Is there a relationship between scores on
general as well as musical creativity tests and the creative musical production? Is there a relationship
between musical creativity (based on musical composition assessment) and musical "aptitude" (based
on a standardized musical aptitude test)?
Creative musical thinking in children is a complex phenomenon in need of further study. The use of the
consensual assessment technique for identifying creative musical compositions and their creators may
prove to be the most reliable measure to aid in this research endeavor.
References
Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment
technique. Journal of Personality and Social Psychology, 43, 997-1013.
Amabile, T. M. (1983). The social psychology of creativity. New York: Springer-Verlag.
Amabile, T. M. (1996). Creativity in Context. Update to The social psychology of
creativity. Boulder, CO: Westview Press.
Bangs, R. L. (1992). An application of Amabile's model of creativity to music instruction:
A comparison of motivational strategies. Unpublished doctoral dissertation, University of
Miami, Coral Gables, Florida.
Brinkman, D. (1994). The effect of problem finding and creativity style on the musical
compositions of high school students. Unpublished doctoral dissertation, University of
Nebraska, Lincoln.
Brown, R. T. (1989). Creativity. What are we to measure? In J. A. Glover, R. R. Ronning,
& C. R. Reynolds (Eds.), Handbook of creativity, (pp. 3-32). New York: Plenum Press.
Cattell, R. B. (1987). Intelligence: its structure, growth and action. Amsterdam: Elsevier
Science Publishers.
GB-Stat [Computer software]. (1994). Silver Spring, MD: Dynamic Microsystems, Inc.
Guilford, J. P. (1950). Creativity. American Psychologist, 5, 444-454.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Hickey, M. (1995). Qualitative and Quantitative Relationships Between Children's
Creative Musical Thinking Processes and Products. Unpublished doctoral dissertation,
Northwestern University, Evanston, IL.
Hickey, M. (1996). Consensual Assessment of Children's Musical Compositions.
Unpublished paper presented at the Research Poster Presentation, New York State School
Music Association Convention.
Hocevar, D., & Bachelor, P. (1989). A taxonomy and critique of measurements used in the
study of creativity. In J. A. Glover, R. R. Ronning, and C. R. Reynolds (eds.), Handbook
of creativity, pp. 53-76. New York: Plenum Press.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York:
McGraw-Hill.
Torrance, E. P. (1966). Torrance tests of creative thinking. Princeton, NJ: Personnel Press.
Torrance, E. P. (1974). The Torrance tests of creative thinking: Technical-norms manual.
Bensenville, IL: Scholastic Testing Services.
Torrance, E. P. (1981). Thinking creatively in action and movement: administration,
scoring, testing manual. Bensenville, IL: Scholastic Testing Service, Inc.
Webster, P. & Hickey, M. (1995, Winter). Rating scales and their use in assessing
children's compositions. The Quarterly Journal of Music Teaching and Learning, VI (4),
28-44.
Proceedings paper
Very few biological studies have investigated how we perceive musical rhythm. In the present research, we aimed to
find the cerebral bases of perceptual and cognitive processes of rhythm using Functional Magnetic Resonance Imaging
(fMRI). Many authors (e.g. Lerdahl and Jackendoff, 1983; Povel and Essens, 1985; Parncutt, 1994; Drake, 1998; see
Clarke, 1999 for a recent review) have proposed the existence of two types of temporal processes that appear
fundamental in the perception of simple rhythmic sequences: the segmentation of an ongoing sequence into groups of
events on the basis of their physical characteristics and the extraction of underlying temporal regularities.
The first process ("grouping") is based on gestalt principles and depends, among other physical characteristics, on the
relative proximity in time of sound events. The occurrence of a longer gap between two events creates a boundary
leading to the perception of two distinct perceptual units. A sound sequence is thus perceived as a succession of
rhythmic groups.
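This boundary principle can be illustrated with a small sketch (not the authors' model): a new group starts whenever an inter-onset interval is markedly longer than the one before it. The 1.5 threshold is an arbitrary illustrative choice.

```python
def segment_by_gaps(onsets, ratio=1.5):
    """Split a list of onset times (in seconds) into rhythmic groups.
    A boundary is placed where an inter-onset interval exceeds
    `ratio` times the preceding interval (illustrative threshold)."""
    groups = [[onsets[0]]]
    prev_ioi = None
    for a, b in zip(onsets, onsets[1:]):
        ioi = b - a
        if prev_ioi is not None and ioi > ratio * prev_ioi:
            groups.append([])      # longer gap: start a new group
        groups[-1].append(b)
        prev_ioi = ioi
    return groups

# Two groups of three events separated by a long gap
print(segment_by_gaps([0.0, 0.25, 0.5, 1.8, 2.05, 2.3]))
# → [[0.0, 0.25, 0.5], [1.8, 2.05, 2.3]]
```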
The second fundamental process ("meter") may occur in parallel. It corresponds to the extraction of temporal regularities
in the sequence in the form of an underlying pulse, often influenced by the alternation of strong and weak beats in the
musical sequence (that is, the metric structure stricto sensu).
Both grouping and metric processes seem to depend on the complexity and the hierarchical structure of the musical
sequences but seem to be functional in early life (see Drake, 1998 for a review). Neuropsychological studies in
brain-lesioned or "surgical" patients have shown that the processing of these two rhythmic organizations could be
impaired relatively independently from one another, suggesting that processing metric and grouping characteristics may
involve different mechanisms and networks in the human brain (Peretz, 1990; Liégeois-Chauvel et al., 1998). In order to
investigate this problem, we asked subjects to compare pairs of short rhythmic sequences differing in the position of one
event moved towards another group ("grouping" condition) or slightly displaced in order to disrupt the underlying pulse
("metric" condition). During these tasks, we measured their brain activity in an fMRI scanner.
Method:
Subjects
Nine subjects (3 males, 6 females; mean age 25; all right-handed) participated in the experiment. All reported normal hearing
and no neurological history.
Stimuli
Our stimuli consisted of pairs of short (3.8 sec) rhythmic sequences separated by a silence of 1.7 sec. Each rhythmic
event consisted of a complex percussive sound with a fundamental pitch corresponding to A4 (F0=440Hz). Each sound
lasted 50ms and had a level of 90 dB SPL. The sounds were played through headphones via a pneumatic system. The
average level of the noise produced by the scanner was reduced to 75dB SPL by means of sound-protecting
headphones. Depending on the sequences, the number of events varied from 8 to 12.
Grouping task: In this task, the sequences were irregular, with alternating Inter Onset Intervals (IOI) of 250,
700 and 1300 ms in order to prevent subjects from extracting a regular pulse or metric structure (ratio
= 1:2.8:5.2). In half of the trials, one event was displaced from one group to the next. An example for one trial
is shown in Figure 1.
Metric task: In this task, the sequences were composed of the alternation of 3 IOI sharing integer ratios
1:2:4 (250, 500 and 1000 ms). In half of the trials, one event was delayed or advanced by 70 ms, locally
disrupting metric expectancies while still remaining within the same group. An example for one trial is shown in
Figure 2.
Baseline condition: Both rhythmic tasks alternated with the same control task. In this task, subjects heard a
continuous sound with the same pitch and timbre, with a total duration of around 10 seconds. We used such a
sound in order to remove any activity involving non temporal auditory processing. We decided to use a
continuous sound rather than regular filler patterns as other authors did (e.g. Sakai et al, 1999) in order to
prevent the subject from performing any cognitive task involving processing of temporal information
during the control condition.
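The two stimulus families can be sketched from the IOI vocabularies above. The sequence lengths, seeding, and helper names here are illustrative, not the authors' stimulus-generation code, and the 3.8 s total-duration constraint is not enforced:

```python
import random

GROUPING_IOIS = [250, 700, 1300]   # ms; non-integer ratios, no pulse
METRIC_IOIS = [250, 500, 1000]     # ms; integer ratios 1:2:4

def make_sequence(iois, n_events, rng):
    """Onset times (ms) of a rhythmic sequence of n_events events."""
    onsets = [0]
    for _ in range(n_events - 1):
        onsets.append(onsets[-1] + rng.choice(iois))
    return onsets

def metric_violation(onsets, index, rng, shift_ms=70):
    """'Different' trial in the metric task: delay or advance one
    event by 70 ms, locally disrupting the underlying pulse."""
    changed = list(onsets)
    changed[index] += rng.choice([-1, 1]) * shift_ms
    return changed

rng = random.Random(1)             # fixed seed for reproducibility
seq = make_sequence(METRIC_IOIS, 10, rng)
odd = metric_violation(seq, 4, rng)
print(seq)
print(odd)
```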
Procedure
Before scanning, subjects were presented with practice trials in order to check their understanding of the tasks. During
scanning, series of several pairs of sequences were presented for both conditions. The subjects had to press the right
button of a mouse if both sequences of one pair were identical and the left button if they were different. During the
control condition, they pressed the right and left buttons alternately as soon as the sound stopped (subjects were informed
that no reaction time was measured). Specific instructions were given to the subjects: they were not allowed to
inner-sing or tap the rhythm and had to wait until the end of the second sequence to give their answer. They had to focus
on the regularity in the metric task and on the succession of events in the grouping task (we did not mention the presence
of distinct groups).
Data Acquisition and Analyses
A Bruker (Karlsruhe, Germany) 2.0 T system equipped with a 30mT.m-1 gradient coil set for echo planar imaging (EPI)
was used to perform all studies. Measures were averaged over four different repetitions for both conditions.
Each acquisition, or volume, consisted of 32 transaxial gradient-echo planar (GE-EPI) 64*64 isotropic
(4 mm) brain slices, repeated every 5500 ms (repetition time), resulting in 120 scans in total. Before
statistical analysis, pre-processing of the images was performed, namely realignment, normalisation and
smoothing according to the procedures proposed by Friston et al. and as implemented in the Statistical
Parametric Mapping (SPM99b) software (Wellcome Department of Cognitive Neurology; Friston et al., 1995).
Statistical analysis was then performed: the single-condition paradigm (two tasks) was modeled using a
delayed (5.5 s) boxcar hemodynamic response function in the context
of the general linear model (Friston et al., 1995), resulting in a t statistic for each and every voxel. These t statistics were
transformed to z statistics. Voxels that survived the statistical criteria of significance (p < 0.01 one tailed) corrected for
multiple comparisons constitute a statistical parametric map (SPM). Anatomical identification of activated areas was
performed individually by mapping areas onto the subject's own anatomical normalized (T1) images (T1 images on T1
template). Following individual anatomical identification of activated areas for each subject, the identified activated
areas from multiple subjects were mapped onto the best fitted area of the normalized template T1 image in the Talairach
(Talairach and Tournoux, 1988) reference coordinate system.
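The modeling step described here, a delayed boxcar regressor fitted voxel-wise in a general linear model to yield a t statistic, can be sketched on simulated data. The block layout, effect size, and noise level below are invented for illustration and do not reproduce the study's design matrix:

```python
import numpy as np

TR = 5.5                # repetition time (s), as in the paper
n_scans = 120
delay_scans = 1         # 5.5 s hemodynamic delay = one scan

# Hypothetical block design: alternating 10-scan task/rest blocks,
# circularly shifted by the hemodynamic delay
boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], n_scans // 20)
regressor = np.roll(boxcar, delay_scans)

X = np.column_stack([regressor, np.ones(n_scans)])  # design matrix

rng = np.random.default_rng(0)
y = 2.0 * regressor + rng.normal(0, 1, n_scans)     # simulated voxel

beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res[0] / (n_scans - X.shape[1])            # residual variance
c = np.array([1.0, 0.0])                            # contrast: task > rest
t = c @ beta / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
print(float(t))         # large positive t at this simulated "voxel"
```

In SPM the resulting t maps are converted to z statistics and thresholded with a multiple-comparison correction, as described above.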
Results:
Several brain regions revealed significant activation during grouping and metric processing relative to the baseline
(control non-rhythmical task). Table 1 presents the areas significantly activated for both conditions.
Figures 3 and 4 report the activations obtained in the grouping and the metric conditions (N=9),
respectively, on a standardised brain. We
will review in turn the cerebral regions activated and identify the brain regions active in only one of the conditions.
Figure 3: Statistical parametric analysis (SPM{z}) for 9 subjects in the grouping condition. The foci of significantly
increased activities (shown in red) were rendered on the surface template of a standard brain as implemented in SPM99b
(Wellcome Department of Cognitive Neurology, London, UK). The dimmer the color, the deeper the activation.
Figure 4: Statistical parametric analysis (SPM{z}) for 9 subjects in the metric condition (see legend of Figure 3).
Temporal lobes
The temporal lobes (especially Brodmann areas [BAs] 22 and 40) were bilaterally activated in both tasks. This confirms
the idea that musical rhythm is processed in associative auditory areas in the temporal lobes. However, there was a
spread of activation in the anterior part of the superior and medial left temporal gyri (BA 22 and 38) in the metric task.
The spread of activation in the temporal regions was more posterior (towards BA 40) in the grouping than in the metric
condition. These results partly confirm the anterior/posterior dissociation between metric and "rhythmic"
contrasting tasks recently observed by Liegeois-Chauvel et al. (1998).
Frontal and Prefrontal lobes
A broad bilateral activation in the SMA (supplementary motor area) and some activation in the PMC (premotor cortex,
BA 6) were obtained in both grouping and metric tasks. This activation was spread towards more anterior regions in the
grouping task. The activation in motor-associated areas may seem surprising since any motor activity (in our case
pressing a button to give the answer) should have been eliminated by the motor activity in the control condition.
However, activation in these areas has already been reported during the processing of visual and auditory temporal
patterns (Tracy et al., 1999; Sakai et al., 1999; Schubotz et al., 2000), but all these tasks were assessed by the
reproduction (tapping) of the rhythmic stimuli. To our knowledge, this is the first report of the activation of
motor-related areas in a purely perceptual rhythmic task. This is also strong evidence against previous assumptions
that motor areas would only be involved in the programming of the reproduction of rhythmic sequences.
The frontal opercular areas (including Broca's area and its equivalent in the right hemisphere) were activated bilaterally
in both tasks. This pattern had already been observed in nonverbal rhythmic tasks (Schubotz et al., 2000). Since part of
the motor system (SMA and PMC) was involved in our tasks, it was not possible to assess whether the activity
in this frontal area was related to inner singing, linked with the articulation of verbal sounds, or to a
broad nonverbal time processing system. However, single-subject analyses revealed that activation in these
regions did not perfectly overlap between the two tasks. A right/left asymmetry of frontal opercular
activity was observed for each subject.
However, there was no consistency between subjects regarding the preferential use of the right or left side depending on
the task.
Middle frontal areas (including BA 45, 46 and 9) were also activated in most subjects, with a larger spread of activation
on the right side in the grouping task. This could be related to the memorization of the sequences during the comparison
since these brain regions have classically been proposed to mediate attentional processes and working memory when
listeners perceive melodies (Zatorre et al., 1994). However, selective attention to the time intervals could be a better
explanation since our inter-sequence delay was too short for the participants to completely rehearse the first sequence of
each pair.
Cerebellum
We found cerebellar activation in both rhythmic tasks. This also supports the involvement of motor areas in the
time-based activities (Tracy et al., 1999; Sakai et al., 1999; Schubotz et al., 2000). However, the cerebellum
is also considered to play a role in perceptual timekeeping tasks. The bilateral cerebellar activity was more lateral in the grouping task and
more medial in the metric task (including the vermis). We did not find any evidence of a posterior/anterior opposition as
reported by Sakai et al. (1999) in the case of reproduction of metric and non metric sequences.
Other areas
The superior parietal gyrus (BA 7) was activated on the right side of the brain in both tasks, as had been observed in
other rhythmic tasks (Platel et al., 1997; Sakai et al., 1999). This associative multimodal cortex is supposed to take part
in timing mechanisms during both perceptual and motor tasks (Sakai et al., 1999). This region has been proposed to be a
component of a general time-keeping system (Maquet et al, 1996; Sakai et al, 1999), sometimes associated with the
attentional binding of sequential events across time (Posner and Dehaene, 1994).
Finally, in 6 out of 9 participants, left occipital regions (BA 19) were activated in the grouping task only. This could be
explained by the use of mental imagery as a strategy to visualise the rhythmic groups and the displacement of an event
from one group to another. In fact, such a strategy was reported by most subjects during debriefing. The activation of the
visual cortex had already been observed for pitch discrimination in musical sequences (Platel et al., 1997), but had not
previously been observed in any rhythmic task.
Conclusions:
Our results show that common areas are used by subjects in rhythmic tasks based on two distinct cognitive processes.
The brain networks involved in these tasks comprised auditory and motor associated areas. The latter have already been
shown to be part of a general time processing system. Our study is the first to report an activation of motor structures
during a perceptual task using musical stimuli. Most brain imaging studies of rhythm used motor reproduction to assess
subjects' performance. The common areas activated in the metric and grouping tasks included both right and left
superior and middle temporal cortices, prefrontal and right superior parietal areas.
We found a large overlap of the areas activated in both tasks for our 9 subjects. Little activation was specific to one
of the two tasks. Even if a double dissociation between metric and nonmetric rhythmic processes has clearly been
evidenced in neuropsychological studies (Mavlov, 1980; Peretz, 1990; Liégeois-Chauvel, 1998), very little research has
been carried out on the interaction between these processes. Hence, there are still discrepancies concerning
the exact nature of each of these processes and their independence from each other. Further research may
show, as already suggested by Liégeois-Chauvel et al. (1998), that these two processes are less independent
than previously observed, especially since they occur in parallel. This could explain why so much overlap
was found between the metric and grouping tasks in our
study. Since grouping and metric features in musical sequences depend mainly on their hierarchical structures, we
plan, in our next brain imaging study, to focus on the relation between basic and hierarchical levels of
rhythmical sequences.
However, a few areas were activated in only one task. In the metric condition, we observed a large spread of activity in
the anterior part of the left temporal lobe and in the left medial frontal gyrus (premotor cortex). In the grouping
condition, a larger anterior activation was found in the right prefrontal cortex. Moreover, cerebellar activity showed
distinct patterns between the two tasks. Thus, although our results confirm the existence of a general time processing
system involved in both of our rhythmic conditions, there are specific regions related to only one of the two distinct
types of cognitive processes. However, variability between subjects was high, and no clear distinctive
activation profiles were shared by all subjects. Indeed, preliminary analyses grouping subjects according
to their musical expertise suggest that the cerebral networks used by expert musicians to process rhythm
are spread over more cerebral structures than those of nonmusicians. Further analyses will be carried out
to confirm this view.
References:
Drake, C. (1998) Psychological processes involved in the temporal organisation of complex auditory
sequences: universal and acquired processes. Music Perception, 16, 11-26.
Clarke, E. F. (1999) Rhythm and Timing in Music. In The Psychology of Music, 2nd Edition. D. Deutsch,
Ed. New York: Academic Press, 473-500.
Lerdahl, F., & Jackendoff, R. A. (1983). A generative theory of tonal music. MIT Press. Cambridge, MA.
Liegeois-Chauvel C, Peretz I, Babai M, Laguitton V, Chauvel P (1998) Contribution of different cortical
areas in the temporal lobes to music processing. Brain, 121(10), 1853-1867.
Maquet P, Lejeune H, Pouthas V, et al. (1996) Brain activation induced by estimation of duration : a PET
study. NeuroImage, 3(2), 119-126.
Mavlov, L. (1980) Amusia due to rhythm agnosia in a musician with left hemisphere damage : a
non-auditory supramodal defect. Cortex, 16, 331-338.
Parncutt, R. (1994) A perceptual model of pulse salience and metrical accent in musical rhythms. Music
Perception, 11(4), 409-464.
Peretz, I. (1990) Processing of local and global musical information in unilateral brain-damaged patients.
Brain, 113, 1185-1205.
Platel H., Price C., Baron J-C., Wise R., Lambert J., Frackowiak R.S.J., Lechevalier B., Eustache F., (1997)
The structural components of music perception : A functional anatomic study. Brain, 120, 229-243.
Posner, M. I. and Dehaene, S. (1994) Attentional networks. Trends in Neuroscience, 17, 75-79.
Talairach, J. and Tournoux, P. (1988) Co-planar stereotaxic atlas of the human brain: 3-dimensional
proportional system: an approach to cerebral imaging. Stuttgart: Thieme.
Tracy, JI, Faro, SH, Mohamed, FB, Pinsk, M and Pinus, A. (2000) Functional localization of a
"time-keeper" function separated from attentional resources and task strategy. NeuroImage 11, 228-242.
Zatorre, RJ., Evans, CE. and Meyer, E. (1994) Neural mechanisms underlying cerebral melodic perception
and memory for pitch. Journal of Neuroscience, 14, 1908-1919.
Proceedings paper
hand, generally do not leave their natal territory but defend it by patrolling its borders against
neighboring groups (Wrangham 1975; Ghiglieri 1984). Stable groups of territorial males must, in
other words, attract females from other territorial groups, and are thus in competition with one another
regarding migrating females. Female exogamy is also present in humans, characterizing a majority of
hunter-gatherer societies (Ember 1978). A trait shared by humans and chimpanzees can be assumed to
have been present in their common ancestor as well, whose social system accordingly featured groups
of associated males competing for migrating females. This global pattern of sociality is strikingly
reminiscent of the pattern of sociality associated with the evolution of genuinely cooperative chorus
synchrony in insects (Morris et al. 1978; Greenfield 1994), and raises the question of whether
synchronous chorusing might have been a factor in the evolutionary history of the
hominid-chimpanzee clade.
There are two major points of divergence in this history which have left representatives surviving to
this day, one being the late Miocene split between hominids (who eventually gave rise to Homo) and
ancestral chimpanzees some five to six million years ago, the other being the split between ancestral
common chimpanzees and ancestral bonobos a few million years later. Synchronous chorusing
appears to have played a role in both speciation events, since in either case the present day
representatives of one branch of either split, namely human beings and bonobos, appear to possess the
capacity for synchronous chorusing, a capacity not possessed by the common chimpanzee. As already
noted, the capacity for entrainment to an isochronous pulse is a cross-cultural human universal, and it
has been reported that bonobos engage in a unique vocal behavior without homologue in common
chimpanzees - so called "staccato hooting" - in which multiple individuals synchronize their hooting
to a common steady beat (de Waal 1988).
The possibility that the behavioral adaptation of entrainment for vocal synchrony arose twice by
independent evolution from the common ancestor of humans and chimpanzees implies that there must
have been strong predisposing factors for it in ancestral behavior, making the step to cooperative
synchrony a short one. Beyond the global pattern of sociality
already referred to there is an additional feature of chimpanzee behavior which might provide the key
in this regard, namely the so called "carnival display" (Reynolds and Reynolds 1965; Sugiyama 1969,
1972; Wrangham 1975, 1979; Ghiglieri 1984). When a subgroup of foraging chimpanzee males
discovers a ripe fruiting tree they tend to launch a noisy display of combined vocal and locomotor
excitement which attracts other males and females to the site. The new arrivals join the group display
before eventually settling down to feed on the newly discovered resource. The display is cooperative
in that it attracts additional mouths to the resource, and it is an honest signal of resource discovery,
since false alarms are likely to provoke retaliation.
Over time, the frequency of carnival displays on a given territory would tend to reflect a combination
of male cooperativity and abundance of fruit trees on that territory. This signal would provide a source
of potentially important information not only for other members of the territorial group, but for those
inter-territorially migrating females who are in the process of choosing a territory on which to settle
permanently for the rearing of their young. However, the voice resources of common chimpanzees are
such that the chaotic noise of their carnival display is unlikely to span even the diameter of their home
territory - measuring some four kilometers across - let alone penetrate the circle of surrounding
territories. This constraint imposed by vocal limitations could, however, be overcome by cooperative
male synchronous chorusing, allowing the amplitude of individual voices to sum to the extent of the
precision of their entrainment to a common, isochronous pulse. Since sound attenuates according to the
square of the distance, the summed signal's reach grows with the square root of the number of synchronized
voices: 4 perfectly synchronized males would double the reach of their signal, while 16 synchronized males
would quadruple its reach, allowing the carnival display to penetrate beyond territorial boundaries to
reach the ears of migrating females.
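The arithmetic behind these figures: if N voices sum in acoustic power, intensity at the source grows with N, and with intensity falling off as the square of distance, the radius at which the signal drops to a fixed audibility threshold grows with √N. A tiny sketch; the single-display reach `r0_km` is a made-up placeholder, not a measured value:

```python
import math

def chorus_reach(n_voices, r0_km=2.0):
    """Audible radius (km) of n perfectly synchronized voices, assuming
    power summation and inverse-square attenuation; r0_km is the
    hypothetical reach of a single chaotic display."""
    return r0_km * math.sqrt(n_voices)

print(chorus_reach(4) / chorus_reach(1))    # → 2.0 (4 males: double reach)
print(chorus_reach(16) / chorus_reach(1))   # → 4.0 (16 males: quadruple)
```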
It only remains to suggest that humans are the direct descendants of a subpopulation of the
human-chimpanzee common ancestor in which the selection pressure of female choice of migratory
target territory led to the evolution of temporal synchrony in the vocal behavior of males performing
the carnival display. Since the extent of summation of individual voices, and thereby the geographic
reach of the summed signal, depends on the precision of synchrony, a premium attached to timing
precision. It is further suggested that just as today we assist the precision of our musical timing by
relying on repetitive locomotor rhythms like foot tapping, this early break-away subpopulation of the
common ancestor supported the precision of its vocal timing by repetitively rhythmic locomotor
movements performed in place, that is, by a form of dancing. Synchronous chorusing and dancing to
an isochronous pulse in a group setting would seem to qualify as a form of music by a wide range of
construals of that elusive term (see, for example, the Bantu term "ngoma" (Keil 1979), the Blackfoot
"saapup" (Nettl 2000) and the old greek "mousiké" (Merker 1999)).
If the above evolutionary scenario has any merit, the ultimate roots of human music extend back to the
parting of ways between pre-chimpanzees and hominids through a late Miocene "breakthrough
adaptation" (synchronous chorusing) which allowed a subpopulation of group-living ancestors to
broadcast the witness of their voices regarding their own cooperativity and the resource-richness of
their territory to the tuned ears of migrating females in search of a suitable territory on which to settle
to rear their young. The present-day universal human capacity to keep time (entrain to an isochronous
pulse) would accordingly be an adaptation retained from this time, informing the cross-cultural
ubiquity of measured music as well as other uses of the human capacity to entrain to an isochronous
pulse. This evolutionary scenario for the origins of human music harbors far-reaching consequences
for numerous issues pertaining to the subsequent trajectory of human evolution, including the issues
of brain expansion, the evolution of language, and the relationship between human music and
language (see Merker, 2000).
Acknowledgement
Work on this paper was supported by a grant from the Bank of Sweden Tercentenary Foundation to
Bjorn Merker.
References
Arom, S. (1991). African Polyphony and Polyrhythm (Book V). Cambridge: Cambridge University
Press.
Arom, S. (2000). Prolegomena to a Biomusicology. In N. L. Wallin, B. Merker and S. Brown (Eds.)
The Origins of Music. Cambridge, Mass.: The MIT Press, Ch. 2.
Backwell, P., Jennions, M., and Passmore, N. (1998). Synchronized courtship in fiddler crabs. Nature,
391, 31-32.
Brown, D. B. (1991). Human Universals. New York: McGraw Hill.
Buck, J. (1988). Synchronous rhythmic flashing in fireflies. II. Q. Rev. Biol., 63, 265-289.
Buck, J. and Buck, E. (1978). Toward a functional interpretation of synchronous flashing in fireflies.
Am. Nat., 112, 471-492.
Ember, C. R. (1978). Myths about hunter-gatherers. Ethnology, 17, 439-448.
Fraisse, P. (1982). Rhythm and tempo, In D. Deutsch (Ed.) The psychology of music (pp. 149-180).
New York: Academic Press.
Ghiglieri, M. P. (1984). The chimpanzees of Kibale Forest. New York: Columbia University Press.
Greenfield, M. D. (1994). Cooperation and conflict in the evolution of signal interactions. Annual
Review Ecol. Syst., 25, 97-126.
Greenfield, M. and Roizen, I. (1993). Katydid synchronous chorusing is an evolutionarily stable
outcome of female choice. Nature, 364, 618-620.
Keil, C. (1979). Tiv Song. Chicago: The University of Chicago Press, Ch. 2.
McNeill, W. H. (1995). Keeping together in time. Dance and drill in human history. Cambridge,
Mass.: Harvard University Press.
Merker, B. (1999). Synchronous chorusing and the origins of music. Musicae Scientiae. Special Issue:
Rhythm, Musical Narrative, and Origins of Human Communication, pp. 59-73.
Merker, B. (2000). Synchronous chorusing and human origins. In N. L. Wallin, B. Merker and S.
Brown (Eds.) The Origins of Music. Cambridge, Mass.: The MIT Press, Ch. 18.
Morris, G. K., Kerr, G. E., and Fullard, J. H. (1978). Phonotactic preferences of female meadow
katydids (Orthoptera: Tettigoniidae: Conocephalus nigropleurum). Can. J. Zool., 56, 1479-1487.
Nettl, B. (1983). The Study of Ethnomusicology: Twenty-nine Issues and Concepts. Urbana:
University of Illinois Press.
Nettl, B. (2000). An Ethnomusicologist Contemplates Universals in Musical Sound and Musical
Culture. In N. L. Wallin, B. Merker and S. Brown (Eds.) The Origins of Music. Cambridge, Mass.:
The MIT Press, Ch. 25.
Pusey, A. (1979). Inter-community transfer of chimpanzees in Gombe National Park. In D. A.
Hamburg and E. McCown (Eds.) The great apes (pp. 465-79). Menlo Park, Calif.:
Benjamin/Cummings.
Reynolds, V. and Reynolds R. (1965). Chimpanzees of the Budongo Forest. In I. DeVore (Ed.)
Primate Behavior: Field Studies of Monkeys and Apes. New York: Holt, Rinehart and Winston, pp.
368-424.
Sugiyama, Y. (1969). Social behavior of chimpanzees in the Budongo Forest, Uganda. Primates, 9,
225-258.
Sugiyama, Y. (1972). Social characteristics and socialization of wild chimpanzees. In F. E. Poirier
(Ed.) Primate Socialization. New York: Random House, pp. 145-163.
de Waal, F. B. M. (1988). The communicative repertoire of captive bonobos (Pan paniscus) compared
to that of chimpanzees. Behaviour, 106, 183-251.
Williams, L. (1967). The dancing chimpanzee: A study of primitive music in relation to the vocalizing
and rhythmic action of apes. New York: Norton.
Wrangham, R. W. (1975). The behavioural ecology of chimpanzees in Gombe National Park,
Tanzania. Ph.D. dissertation. Cambridge, England: University of Cambridge.
Wrangham, R. W. (1979). On the evolution of ape social systems. Soc. Sci. Int., 18, 335-368.
Proceedings abstract
IS THE EMOTIONAL SYSTEM ISOLABLE FROM THE COGNITIVE SYSTEM IN THE BRAIN?
Isabelle Peretz
Department of Psychology
University of Montreal
C.P. 6128
succ. Centre-ville
CANADA
Background: A central question in cognitive neuroscience concerns the functional and neuroanatomical autonomy
of the emotion-recognition system with respect to the perception and memory systems. Such a distinction is well
established for faces. For music, a similar dissociation has recently emerged in the literature.
Aims: The goal of the presentation will be to summarise these recent studies performed in neuropsychology, with a
brain-damaged patient and with brain-imaging techniques.
Main Contribution: It will be suggested that emotion and recognition share a common perceptual analysis system but
differ in the type of structural characteristics that are needed to achieve their respective goals. For instance, minor and
major mode and tempo are important perceptual determinants of the happy-sad distinction in music. In contrast, mode
and tempo are of little importance for discrimination and identity recognition. Similarly, I will show that emotional
judgement of dissonance in subcortical structures cannot take place without initial perceptual analysis in the auditory
cortex.
Implications: Emotions cannot be totally divorced from structural organisation of the musical input. Emotional
judgements provide a novel and indirect way to study implicit knowledge of musical structure.
Proceedings paper
This research was presented as part of a Ph.D. thesis submitted to the Psychology Department of McGill University. Funding was provided in
part, by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and in part by a team grant to A.S.
Bregman and R.J. Zatorre.
a) The first author, Martine Turgeon is currently affiliated with "Institut de Recherche en Coordination Acoustique/Musique" (IRCAM),
Perception et Cognition Musicales. Reprints are available from Martine Turgeon at IRCAM, 1 Place Igor-Stravinsky, 75004, Paris, France.
Introduction
Issues of interest
The perceptual organization of complex tones depends on the detection of biologically-relevant cues in the acoustic signal,
such as those providing evidence for a common spatial location of the components of a sound source, and those reflecting
spectro-temporal regularities typical of causally-related sounds such as simple harmonic ratios and temporal synchrony.
There is evidence that the auditory system perceptually groups sounds that share a common fundamental frequency
(McAdams and Smith, 1990), and a common spatial location of their sources (Kidd et al., 1998). Furthermore, Turgeon and
Bregman (1999) have shown that the fusion of noise bursts in a free field is promoted by temporal synchrony. Though the
contribution of many specific cues to auditory grouping has been established empirically (reviewed by Darwin and Carlyon,
1995), their interaction is poorly understood, especially in a free field. It is important to study grouping in the context of
many interacting cues, since in real-world situations, no grouping cue acts in isolation. The present study was conducted in a
free field with a semi-circular array of speakers to look at how the spatial separation of sound sources interacts with two of
the most robust cues for the grouping of concurrent tones: harmonicity and temporal synchrony.
Rationale of the Rhythmic Masking Release (RMR) paradigm
We used the RMR paradigm (Turgeon and Bregman, 1996) to study the relative contribution of onset asynchrony, deviations
from simple harmonic ratios, and the spatial separation of sources for the segregation of concurrent brief tones. In this RMR
study, a rhythm was perceptually masked by embedding identical tones irregularly among the regular tones. The rhythm is
camouflaged because no acoustic property distinguishes the regular subset of tones from the irregular one. We refer to the
irregular tones as "maskers"; though they do not mask the individual tones, they mask their rhythmic sequential organization.
"Captor" tones can be added in different critical bands simultaneously with the irregular maskers. These tones release the
rhythm from masking when they are completely simultaneous (Turgeon and Bregman, 1996); that is, temporal coincidence
fuses them perceptually. The newly formed masker-captor units have emergent properties, such as a different timbre and a
new pitch; this distinguishes the irregularly-spaced components from the regularly-spaced ones. The accurate perception of
the rhythm is thus contingent upon the fusion of the irregular maskers and captors. Measuring the listener's ability to identify
the embedded rhythm thus provides an estimate of the degree of perceptual fusion of the maskers and captors. We
manipulated the spatial, spectral and temporal relations between the maskers and captors to see how their fusion was affected
by these factors, using a two-alternative forced choice task, in which one of two rhythms was embedded in the sequence.
Objectives and hypotheses
Relative contribution of a common onset and offset, F0, and location of source on fusion. One of the objectives of the study
was to assess the relative importance of auditory-grouping cues by creating competition among them. For instance, suppose
that the masked rhythm sequence and the captors are presented in different speakers. While the common relation to a
fundamental frequency (F0) and the common speaker location of the masked-rhythm tones should promote their sequential
grouping, the temporal coincidence and common F0 between the maskers and captors should promote their simultaneous
grouping. If common spatial location and frequencies overcome the segregating effects of temporal synchrony, the rhythm
should remain perceptually masked; on the other hand, if temporal synchrony and a common F0 (among spatially and
spectrally distributed components) win the competition, the maskers and captors should fuse perceptually and the rhythm
should be heard clearly.
We expected simultaneity of onset and offset to make a much greater contribution to the fusion of complex tones than would
their harmonic relations or their separation in space. This expectation was based on the high ecological validity of temporal
coincidence for the perception of components as a single event as well as the empirical evidence showing its powerful effect
on the fusion of components (reviewed by Darwin and Carlyon, 1995; Turgeon, 1999). Despite the importance of simple
harmonic ratios for pitch perception (Hartmann, 1988), we did not expect them to have a strong effect on the fusion of our brief
tonal stimuli. This expectation was based on recent results showing that harmonicity only weakly affects the diotic and
dichotic fusion of the same stimuli over headphones (chapter 4 in Turgeon, 1999). The weakness of the harmonicity effect
was attributed to the short duration of the tones (i.e., 48 ms). The tones typically used to study the effect of harmonicity on
fusion range from one hundred to several hundred milliseconds in length.
Past results suggest that the perceptual organization of sounds is influenced by the spatial separation of sound sources (Kidd
et al., 1998). However, the results of a recent RMR experiment (Turgeon and Bregman, 1999), which presented noise burst
stimuli in a free-field setting, showed that presenting them in different speakers only weakly affected their fusion, compared
to when they came from the same speaker. Moreover, angular separations of the sources (∆θ) of up to 180 degrees were
not sufficient for the full segregation of synchronous or slightly asynchronous bursts, and the magnitude of ∆θ did not affect
the strength of the fusion. This weak effect can be contrasted with the strong effect of onset asynchronies (SOA) of 36 and 48
ms, which fully segregated the maskers and captors at all ∆θ's (from 0 to 180 degrees). The weak effect of ∆θ, compared to
SOA, might be related to temporal coincidence being a more robust cue than a common location in space for sound-source
determination. Unlike reflected light, sound travels around and through rigid surfaces (acoustically, such surfaces behave like transparent objects).
As a consequence, in estimating the point of origin of a sound (i.e., the spatial location of the vibrating source), echoes may
suggest more than one point of origin. We believe that echoes and reverberation present the auditory system with a degraded
signal, so that spatial information is often unreliable. Given these ecological considerations and the results of our earlier
free-field study with noise bursts (Turgeon and Bregman, 1999), we expected ∆θ to have only a weak effect on cross-spectral
fusion.
Temporal limits for event perception. Another objective was to evaluate the minimum temporal deviation from perfect
temporal synchrony which triggers the perception of concurrent tones as separate sound events. Onset asynchrony was
expected to have a powerful effect on cross-spectral grouping, because it is a highly reliable cue for the segregation of
sound-producing events. In a natural context, it is likely for sounds coming from different environmental sources to have
some degree of temporal overlap; however, it is unlikely that they happen to be perfectly coincident in time. Given the
adaptive value of detecting deviations from perfect coincidence, an empirical question of interest was to estimate the physical
range of tolerance for the perceived simultaneity of sound events. Past research in this laboratory addressed this issue by
estimating to what extent there could be a deviation from onset and offset synchrony before concurrent sounds were
perceived as separate events 75% of the time (Turgeon, 1999). Such an SOA threshold for perceiving separate events was
estimated to be between 28 and 35 ms, when brief complex tones were presented diotically and dichotically over headphones
(chapter 4 in Turgeon, 1999). In
that study, we estimated individual SOA thresholds, within each of four conditions: diotic and dichotic presentation of
maskers and captors, either harmonically related or not. When they were presented dichotically, which induced a difference
in perceived lateralization, the value of SOA required to segregate them was 12 ms lower than when they were presented
diotically. This was true for both harmonic tones (40 vs. 28 ms) and inharmonic ones (38 ms vs. 26 ms), a lower threshold
indicating less fusion of the maskers and captors. However, whether or not the tones shared a common F0 had little influence
on the SOA threshold (the SOA value required to segregate them). A difference in F0 caused only a 2-ms difference in mean
SOA thresholds, and the standard errors overlapped. Turgeon (1999) concluded that dichotic presentation, but not
harmonicity, influenced the temporal disparity between concurrent tones that was needed for their perception as separate
sounds.
The present experiment examined whether similar temporal limits hold for the presentation of the same stimuli in a free field.
We did not expect harmonicity to have a significant effect on SOA thresholds, though it might affect them weakly. Assuming
that the earlier observed effects of dichotic presentation had acted through differences in perceived lateralization, we
expected that larger angular separations of maskers from captors in a free field should diminish SOA thresholds.
Methods
Subjects. The listeners were 18 adults who were naive to the purpose of the experiment. All had normal hearing for the
250-8000 Hz frequency range, as assessed through a short air-conduction audiometric test.
Stimuli. Stimuli were synthesized and presented by a PC-compatible 486 computer, which controlled a Data Translation DT
2823 16-bit digital-to-analog converter. The rate of output was 20000 samples per second. Signals were low-pass filtered at
5000 Hz, using a flat amplitude (Butterworth) response with a roll-off of 48 dB/octave. Listeners sat at the center of a
semi-circular array of 13 speakers, each one meter away from the listener. The speaker array was situated in the sound-attenuated
chamber of Dr. Zatorre, at the Montreal Neurological Institute. The head of the listener was fixed so as to point in the
direction of the central speaker of the array. The RMS intensity level was the same for all the four-partial tones; it was
calibrated as equal to that of a 1000-Hz tone presented at 60 dB SPL at the central position of the listener's head, that is, at
the center of the array of speakers, one meter away from all of them. When temporally-overlapping tones were presented in
two different speakers (a four-harmonic tone was presented in each speaker) the RMS level was the same at each speaker.
Two rhythmic patterns were to be discriminated by the listeners. Each was repeated to form a sequence that had a total
duration of 9.5 seconds, was composed of 15 tones, and had a tempo of 1.7 tones per second. The two rhythms were different
temporal arrangements of a short 384-ms inter-stimulus interval (ISI) and a long 768-ms one. Rhythm 1 repeated an
alternation of short, long, short, long ISIs three and a half times. This gave rise to perceptual grouping of tones by pairs.
Rhythm 2 repeated a cycle of short, long, long, short ISIs three and a half times; this gave rise to perceptual grouping of
tones in which triplets alternated with a single tone. Both rhythms started and ended with an alternation of a short and a long
ISI. To perceptually camouflage each rhythm, irregular maskers were interspersed among the rhythmic tones. The rhythms
had a constant temporal density of one irregular masker for each 192-ms interval; there were thus two maskers in the short
384-ms ISI and four in the long 768-ms one. The variability in the distribution of irregular intervals was the same in all
conditions, including the no-captor controls. There was no overlap between the rhythmic and masking tones.
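The two rhythmic ISI patterns and the masker density described above can be sketched as follows. This is an illustrative reconstruction, not the original stimulus code; the function names are ours.

```python
# Sketch of the two rhythmic ISI patterns and the constant masker
# density described above (values in ms). Illustrative only.
SHORT, LONG = 384, 768  # short and long inter-stimulus intervals

def build_isis(cycle, n_isis=14):
    # Repeat a 4-ISI cycle three and a half times: 14 ISIs -> 15 tones.
    return [cycle[i % len(cycle)] for i in range(n_isis)]

rhythm1 = build_isis((SHORT, LONG, SHORT, LONG))  # tones grouped in pairs
rhythm2 = build_isis((SHORT, LONG, LONG, SHORT))  # triplets + single tone

def maskers_in_isi(isi):
    # One irregular masker per 192-ms interval: two maskers fit in a
    # short ISI and four in a long one.
    return isi // 192
```

Because both rhythms contain seven short and seven long ISIs, their total durations (and total masker counts) are identical, which is what makes them discriminable only by temporal arrangement.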
In any condition, the same spectrum was used for the rhythmic and masker tones: the same four harmonics of 300 or 333 Hz
of equal intensity. Together they formed the masked-rhythm sequence, which was presented in isolation for the no-captor
control conditions. In all the other conditions, some captor tones were added; they were composed of four harmonics either
of the same F0 as the maskers, or of a different F0. The four possible combinations of maskers and captors were: odd and
even harmonics of a 300-Hz F0; odd and even harmonics of a 333-Hz F0; odd harmonics of a 333-Hz F0 and even harmonics
of a 300-Hz F0; odd harmonics of a 300-Hz F0 and even harmonics of a 333-Hz F0. For each of these combinations, there
were two versions, one in which the maskers (and the rhythm) had the high pitch (even harmonics of 300 or 333 Hz), the
captors having the low pitch (odd harmonics of 300 or 333 Hz), and the other, which had them interchanged.
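A minimal synthesis sketch of one masker-captor spectrum combination follows. The linear ramp shape and the specific harmonic numbers (1-8) are assumptions for illustration; the paper specifies only the 48-ms duration, the 8-ms onset/offset, and the odd/even split of four equal-intensity harmonics.

```python
import math

SAMPLE_RATE = 20000  # samples per second, as in the paper

def make_tone(f0, harmonics, dur_ms=48.0, ramp_ms=8.0):
    # Four equal-intensity harmonics of f0 with onset/offset ramps.
    # Linear ramps are an assumption; the paper gives no ramp shape.
    n = int(SAMPLE_RATE * dur_ms / 1000)
    ramp = int(SAMPLE_RATE * ramp_ms / 1000)
    tone = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)
        gain = min(1.0, i / ramp, (n - 1 - i) / ramp)  # onset/offset ramps
        tone.append(gain * s / len(harmonics))
    return tone

# One of the four combinations: odd vs. even harmonics of the same
# 300-Hz F0 (harmonic numbers chosen hypothetically).
maskers = make_tone(300.0, (1, 3, 5, 7))
captors = make_tone(300.0, (2, 4, 6, 8))
```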
Each tone was 48-ms long, including an 8-ms onset and offset. The captors could be either simultaneous with the maskers or
delayed from them by 12, 24, 36 or 48 ms. The maskers and captors were of the same duration; hence, for each onset
asynchrony there was an offset asynchrony of the same duration. The amount of temporal overlap between the maskers and
captors varied from a full 48-ms overlap to no overlap. The asynchronous maskers and captors were aligned in phase during
their period of overlap so that the positive peaks of their waveforms were aligned at the period of their common F0. The
masked-rhythm sequence and the irregular captors were either presented in the same central speaker, or else in two different
speakers, equally distant from the central speaker. The speakers could be off center by 30, 60 or 90 degrees; these relative
positions of the sources of the maskers and captors yielded three angular separations (∆θ): 60, 120 and 180 degrees. For
each ∆θ, the presentation of the masked rhythm and captors on each side of the array was counterbalanced across trials.
Procedure. The subjects had to judge which one of the two rhythms was embedded in the sequence and how clearly it was
heard on a 5-point scale. After each trial, feedback about the accuracy of rhythm identification was provided. There was a
short training session. Listeners were told that they would hear a warning tone followed by one of the two rhythms that they
had previously heard in isolation. They were instructed to direct their attention to the location of the speaker that had sent the
warning tone and to tell which of two rhythms was played. The two isolated rhythms (without captors) were randomly played
at each of the 13 possible speakers until the listeners reached the criterion of 10 correct identifications in a block. This was
followed by a practice session which randomly presented each combination of SOA, ∆θ and harmonicity. This session
allowed the listeners to become familiar with the task and to hear the variations across the conditions, so as to better use the
full range of the rating scale. During the experiment proper, a 1000-Hz warning tone was played in
the speaker of the masked rhythm so that listeners could pay attention to the location of that rhythm. The listeners' heads
remained fixed despite their attention being directed to speakers in different locations.
Results
Computation of scores
Measure of rhythm sensitivity and response bias. Different accuracy measures were derived from listeners' responses: d'
scores, proportion-correct scores (PC) and weighted-accuracy (WA). WA weights the rated accuracy by the clarity of the
identified rhythm. For brevity, this short paper focuses on d' scores, occasionally reporting PC scores. The
d' scores and the response bias c were evaluated according to standard procedures (Macmillan and Creelman, 1991). The d'
scores measured sensitivity to Rhythm 1. In terms of Z (i.e., the inverse of the cumulative normal distribution function), d' is defined as
Z(H) - Z(F); where H is the proportion of Hits (i.e., Rhythm 1 is reported when it is physically present) and F is the
proportion of False Alarms (i.e., Rhythm 1 is reported when Rhythm 2 is physically present). In Z-score units, c is given by:
0.5 * [Z(H) + Z(F)]. A standard table of the normal distribution was used to convert H and F to Z-scores (Macmillan and
Creelman, 1991).
When listeners cannot discriminate at all between the two rhythms (i.e., chance-level performance), H=F and d'=0. On the
other hand, perfect accuracy implies an infinite d'. To avoid values of infinity in the computation of d', proportions of 1 and 0
were thus converted into 0.999 and 0.001 respectively. Proportions of 0.999 and 0.001 yield d' values of 6.18 and -6.18.
However, a lower value of d', namely, 4.65 is usually considered as the effective ceiling (Macmillan and Creelman, 1991);
this is obtained when H=0.99 and F=0.01. As for response bias, a positive c indicates a higher tendency to respond Rhythm
1, and a negative c a higher tendency to respond Rhythm 2. A mean bias parameter c close to the zero-bias point is thus
considered indicative of the absence of a systematic response bias for a given subject.
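The computation just described can be sketched in a few lines; `NormalDist().inv_cdf` plays the role of the normal-distribution table, and the clamping of proportions to 0.999 and 0.001 follows the paper. Note that the sign convention for c is the paper's (positive c means more Rhythm 1 responses).

```python
from statistics import NormalDist

def d_prime_and_bias(h, f):
    # h: proportion of Hits (Rhythm 1 reported when physically present)
    # f: proportion of False Alarms (Rhythm 1 reported for Rhythm 2)
    # Clamp 0 and 1 to 0.001 and 0.999 to avoid infinite Z-scores.
    h = min(max(h, 0.001), 0.999)
    f = min(max(f, 0.001), 0.999)
    z = NormalDist().inv_cdf  # inverse cumulative normal (Z-score)
    return z(h) - z(f), 0.5 * (z(h) + z(f))  # d', c
```

Perfect accuracy maps to a d' of about 6.18 after clamping, H=0.99 with F=0.01 gives the effective ceiling of about 4.65, and chance performance (H=F) gives d' = 0, matching the values quoted in the text.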
Estimates of asynchrony threshold for perceiving separate events. To obtain an estimate of the magnitude of stimulus onset
asynchrony (SOA) required for the perception of concurrent sounds as separate events, we determined the 75% SOA
threshold from psychometric "Weibull" functions for the individual listeners (Weibull, 1951). Separate SOA thresholds were
evaluated for the eight different spectro-spatial relations of this experiment (harmonic and inharmonic conditions for each
∆θ). For each of the eight ∆θ-by-harmonicity conditions, the mean goodness of fit (as measured by r, the Pearson correlation
coefficient) of the data was equal to or larger than 0.87.
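Under the two-alternative forced-choice form of the Weibull function (a standard parameterization; the paper does not spell out the one it used), the 75% point can be read off in closed form once the scale α and slope β have been fitted:

```python
import math

def weibull_pc(soa, alpha, beta):
    # 2AFC Weibull psychometric function: proportion correct rises from
    # chance (0.5) toward 1.0 as SOA grows. This parameterization is an
    # assumption; the paper does not give its exact form.
    return 1.0 - 0.5 * math.exp(-((soa / alpha) ** beta))

def soa_threshold_75(alpha, beta):
    # Solve weibull_pc(x) = 0.75:
    #   0.5 * exp(-(x/alpha)^beta) = 0.25  =>  (x/alpha)^beta = ln 2
    return alpha * math.log(2) ** (1.0 / beta)
```

In practice α and β would be fitted per listener and condition (e.g. by least squares or maximum likelihood) before the threshold is extracted.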
Description of the main trends in the results
No-captor controls and measures of biases. The no-captor controls yielded mean PC of 0.54 (SE=0.03) and mean d' of 0.26
(SE=0.12); this is close to the chance level PC performance of 0.5 and d' performance of 0. This verifies that the rhythm was
perceptually masked in the absence of captors. The results also verified that there was no bias for one rhythm over the other.
For the conditions with captors, the mean response bias parameter c for the 18 listeners was -0.03 (SE=0.05); for their
no-captor counterparts, the mean c across individuals was -0.009 (SE=0.09). Given the mean c value very close to zero and
the small standard errors (SE), we concluded that response bias did not diminish the power of our statistical comparisons.
Effect of stimulus onset asynchrony (SOA). For each of the eight ∆θ-by-harmonicity conditions with no temporal asynchrony,
the rhythm-identification performance was at the ceiling value, namely, a PC of 0.99 and a d' of 4.65 for each listener. It thus
seems that temporal coincidence caused frequency components to be perceptually fused, whether they were harmonically
related or not, and whether they came from the same location or from spatially-separated sources, 60, 120 or 180 degrees
apart.
There was a clear monotonic decrease of d' with SOA [p < 10⁻⁵]. This powerful effect of SOA upon the fusion of the
maskers and captors is consistent with past results found in the laboratory for the diotic and dichotic fusion of the same tonal
stimuli presented over headphones, as well as for the fusion of brief noise bursts in a free field (Turgeon and Bregman, 1996,
1999). From the mean SOA thresholds estimated for the eight ∆θ-by-harmonicity conditions, an SOA between 26 and 37 ms
(i.e., the range extending from one SE below the lowest mean threshold found in the present experiment to one SE above the
highest one found) seems to trigger the perception of concurrent brief tones as separate events. This is in good agreement
with the estimated 23-to-42 ms range for the perception of the same tones as separate events when they were presented over
headphones (chapter 4 in Turgeon, 1999).
The 25-to-40-ms range of the mean SOA thresholds for the diotic, dichotic and free-field segregation of brief tones agrees
with the literature on auditory grouping, reviewed by Darwin and Carlyon (1995), showing that an SOA of 30 to 40 ms is
required to prevent a partial from contributing to the overall timbre, to the lateralization and to the vowel identity of a
complex sound. There is a close correspondence between the magnitude of the SOA leading to the perception of separate
sounds and that for the computation of its emergent properties, since timbre, vowel quality and lateralization are properties of
perceptually-segregated sound events. It is worth noting that this does not seem to apply to all perceptual properties of
sounds. For instance, the SOA needed to prevent a partial from entering the computation of the pitch of a complex tone,
estimated as 300 ms by Darwin and Ciocca (1992), is an order of magnitude higher than our estimated 30 ms SOA for event
perception. This discrepancy between the temporal limits for pitch and event perception may be related to differences in their
underlying neural mechanisms (Brunstrom and Roberts, 2000).
Effect of harmonicity and of the spatial separation of sound sources (∆θ). There was a weak but consistent effect of
harmonicity in promoting the fusion for asynchronous masker and captor tones, as measured by d' scores for each listener
[p<0.01]. The spatial separation of sources (∆θ) did not affect rhythm sensitivity at all, as estimated by d' [p>0.1]. The d'
scores are compatible with the highly consistent mean SOA thresholds found across the different spatial and spectral
relations, as shown in Figure 1. The mean thresholds all fell between 28.5 ms (for ∆θ of 120 degrees and different F0s) and
34.2 ms (for ∆θ of 180 degrees and the same F0). Note that a higher threshold indicates more fusion, since a larger asynchrony is
required to perceptually segregate the maskers from the captors. From this figure, it is clear that the effect of harmonicity was
weak and that ∆θ had no effect on fusion; still, harmonicity slightly affected the temporal disparity needed for the perception of
separate events - a mean SOA of 32.4 ms (SE=3 ms) for harmonic stimuli, versus 30.1 ms (SE=3.9 ms) for inharmonic ones. This 2-ms
difference between the mean SOA threshold estimates for harmonic and inharmonic tones corresponds to that found for their
presentation over headphones (chapter 4 in Turgeon, 1999). The present results suggest that only spectro-temporal
regularities matter for the cross-spectral segregation of concurrent brief tones in a free field, SOA making by far the greatest
contribution.
Figure 1: Mean SOA thresholds across individual listeners for different spectral and spatial relations between the masker and
captor tones having the same F0 (harmonic) or different F0s (inharmonic), at four angular separations of their sources in a
semi-circular speaker array. Standard errors (SE) are indicated.
Discussion
Temporal coincidence, and deviations from it as induced by onset and offset asynchrony, was by far the most important
factor for the perception of short-duration tones as one or two sounds. Whereas masker and captor tones fused into a single
masker-captor event when they were synchronous, when they were separated by an SOA of about 30 ms, they were
segregated as two distinct events. Strong fusion was clearly shown by the perfect rhythm-identification performance at 0-ms
SOA (PC of 0.99). On the other hand, clear segregation was shown by the low performance at 36 ms (mean PC of 0.70) and
48 ms SOA (mean PC of 0.67). Intermediate values of SOA of 12 and 24 ms produced ambiguous cases of grouping, in
which the maskers and captors were neither fully fused, nor fully segregated. This ambiguous grouping might be linked to
the inherent temporal constraints of the auditory system due to short-term adaptation of the auditory-nerve fibers (Kiang et
al., 1965). As a result of the 10-to-20 ms period that it takes for an onset-sensitive neuron to return to its baseline activity,
there might be a minimum temporal disparity required for the system to distinguish two consecutive sound events which are
temporally contiguous. This is the situation when two sounds are close together in time and separated by a brief period of
silence, as is the case for the detection of a temporal gap, or when they are temporally overlapping, as is the case in our RMR
studies. The hypothesis of short-term adaptation as imposing some limit for the temporal resolution of sound events at
different places in the spectrum is consistent with the estimated minimum 30 ms disparity needed to detect a gap across the
spectrum, i.e., an offset-to-onset interval (Formby, Sherlock and Li, 1998) and to detect an onset-to-onset and offset-to-offset
disparity across the spectrum in our RMR studies.
The presence or absence of a common F0 does not seem to play an important role for the segregation of brief concurrent
tones as shown by the small differences in PC and d' obtained for harmonic and inharmonic maskers and captors.
Furthermore, Figure 1 shows that it only weakly affected the temporal disparity needed for their segregation as separate
events. This is consistent with the results found for the presentation of the same stimuli over headphones (chapter 4 in
Turgeon, 1999). Further experimentation should attempt to determine whether the weak role of harmonicity for the fusion of
short-duration sounds is related to differences in the temporal limits for the segregation of sounds as separate events and the
computation of their pitch (Darwin and Carlyon, 1995).
In this study, the angular separation of the sources (∆θ) did not yield any difference in fusion, whether fusion was estimated
from d' or from SOA thresholds based on PC scores. This runs contrary to the results of research in which the same sounds
were presented over headphones (chapter 4 in Turgeon, 1999). It might be that dichotic separation is more efficient for sound
segregation because it is an extreme case of interaural differences for sounds happening simultaneously, the stimulation of
one sound being delivered to one ear only, while that of the other sound(s) is delivered to the other ear only. Free-field
testing is more akin to real-world situations in which each of many individual sounds stimulates both ears, though at slightly
different times and intensities, allowing for the computation of the location of each sound source. When drawing conclusions
about the contribution of spatial disparities, one should not consider dichotic presentation as reflecting ecologically valid
differences in sound-source locations. Even when two sound sources are close to different ears, a sound coming from one of
them usually stimulates the two ears, albeit with larger binaural differences in intensities and time of arrival than if sources
were closer to the midline axis. For this reason, the separation of sources in a free field is considered as more representative
of the true contribution of spatial separation to sound-source segregation. This contribution seems to be very weak when two
sound sources are simultaneously active. It is also worth noting that steady-state sounds were used in the present study.
Tones fluctuating in amplitude might permit spatial differences to cause segregation, especially with longer tones. This
remains to be empirically investigated.
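SOA thresholds of the kind discussed above are typically obtained by fitting percent-correct data with a Weibull psychometric function (Weibull, 1951) and reading off the SOA supporting a criterion performance level. A minimal sketch of that step, with all parameter values hypothetical:

```python
import math

def weibull_pc(soa_ms, alpha, beta, chance=0.5):
    """Weibull psychometric function: proportion correct rises from the
    chance level towards 1.0 as the temporal disparity (SOA) grows."""
    return chance + (1 - chance) * (1 - math.exp(-((soa_ms / alpha) ** beta)))

def soa_threshold(pc_criterion, alpha, beta, chance=0.5):
    """Invert the fitted function to find the SOA giving pc_criterion."""
    p = (pc_criterion - chance) / (1 - chance)
    return alpha * (-math.log(1 - p)) ** (1 / beta)

# With hypothetical fitted parameters (alpha = 30 ms, beta = 2), the SOA
# supporting 75% correct segregation in a 2AFC-style task would be:
print(soa_threshold(0.75, 30, 2))
```

In practice alpha and beta would be estimated from the observed PC-versus-SOA data before the threshold is read off.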
An important implication of this research is that when brief complex tones happen at the same time, sound-source
segregation ("how many" individual sources are perceived) is independent of sound-source separation ("where" individual
sources are relative to each other in the immediate environment). This is consistent with the claim that localization ("where")
entails segregation ("how many"), but not the reverse. To localize a source, a source has to be perceived in the first place. For
instance, if the bark of a dog or the sound of an unknown animal is heard as coming from a precise location, its source has to
be segregated from the other environmental sources, and this whether it is identified or not. However, a source can be
perceived and identified without being localized. Everyone has at some point experienced hearing a familiar sound distinctly without being able to tell exactly where it was coming from. A similar reasoning holds for pitch: pitch perception
is a property of a perceptually-segregated sound; nevertheless a sound can be segregated without having a pitch, as happens
when a brief click without a definite pitch is perceived. Sound segregation is such a basic property of audition that one might
expect that the system computes it even in the face of ambiguity in the signal (e.g., as to "where" it comes from).
Summary of conclusions
The use of the RMR paradigm, which creates ambiguous auditory figures, allows for the evaluation of the relative
importance of auditory-grouping cues in sound-source determination. It has shown that: i. temporal coincidence is sufficient
for the perceptual fusion of short-duration tones; ii. an onset-to-onset disparity between 28 and 35 ms segregates them as
separate sound events; iii. spectral regularities, such as simple harmonic ratios, weakly affect the degree of fusion at intermediate temporal disparities from 12 to 24 ms; and iv. the fusion of short-duration tones which are spectrally non-overlapping appears to be independent of the angular separation of their sound sources. The short tones that were used in
this study may have been responsible for these results. Whether these conclusions apply to sounds of a longer duration and to
other types of complex sounds (e.g. speech sounds) awaits further experimentation.
References
Brunstrom, J.M., and Roberts, B. (2000). Separate mechanisms govern the selection of spectral components for
perceptual fusion and for the computation of global pitch. J. Acoust. Soc. Am., 107, 1566-1577.
Darwin, C. J. and Carlyon, R. (1995). Auditory grouping. In B.C.J. Moore (Ed.), Hearing: The handbook of
perception and cognition (2nd ed., Vol. 6, pp. 387-424). London: Academic Press.
Darwin, C. J. and Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation
of a mistuned component. J. Acoust. Soc. Am., 91, 3381-3390.
Formby, C., Sherlock, L. P., and Li, S. (1998). Temporal gap detection measured with multiple sinusoidal markers:
Effects of marker number, frequency, and temporal position. J. Acoust. Soc. Am., 104, 984-998.
Hartmann, W. M. (1988). Pitch perception and the organization and integration of auditory entities. In G.W. Edelman,
W.E. Gall and W.M. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 623-645). New York:
John Wiley and Sons.
Kiang, N. Y-S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965). Discharge patterns of single fibers in the cat's
auditory nerve. Research Monograph No. 35. Cambridge, MA: MIT Press.
Kidd, G., Mason, C., Rohtla, T. L., and Deliwala, P. S. (1998). Release from masking due to spatial separation of
sources in the identification of nonspeech auditory patterns. J. Acoust. Soc. Am., 104, 422-431.
Macmillan, N. A. and Creelman, C. D. (1991). Detection Theory: a User's Guide. Cambridge, MA: MIT Press.
Turgeon, M. (1999). Cross-spectral grouping using the paradigm of rhythmic masking release. Doctoral
dissertation, McGill University.
Turgeon, M., and Bregman, A.S. (1996). 'Rhythmic Masking Release': A paradigm to investigate the auditory
organization of tonal sequences. In: Proceedings of the 4th ICMPC, pp. 315-316.
Turgeon, M. and Bregman, A.S. (1999). Rhythmic Masking Release II: Contribution of cues for perceptual
organization to the cross-spectral integration of concurrent narrow-band noises in a free field -- asynchrony,
correlation of rapid intensity changes, frequency separation and spatial separation. Unpublished manuscript, Dept. of
Psychology, McGill University Montreal, Quebec, Canada. Submitted to J. Acoust. Soc. Am.
Weibull, W. A. (1951). A statistical distribution function of wide applicability. J. Appl. Mech., 18, 292-297.
Proceedings paper
The Voice In Therapy: Monitoring Disease Process In Chronic Degenerative Illness.
Wendy L. Magee PhD SRAsT(M)
Music Therapy Department
Royal Hospital for Neuro-disability
West Hill
London SW15 3SW
Department of Music
University of Sheffield
Sheffield S10 2TN
Jane W. Davidson PhD
Department of Music
University of Sheffield
Sheffield S10 2TN
Background.
From the moment we are born, our voice is the instrument with which we communicate through
non-verbal vocalisations (H. Papousek, 1996). Intuitively, care-givers respond to these non-verbal
vocalisations in an interactive way, imitating, extending and developing the pitch, melodic contour,
rhythm, phrasing and volume of the infant's vocal gestures (M. Papousek, 1996; Stern, 1985). In this
way, a child learns to interact and develops socially and emotionally.
Therefore, the voice, with its shifting, fluid, musical make-up provides the basic vehicle for human
communication and interpersonal relationships. However, an individual who acquires
neuro-degenerative disease faces the possibility of total loss of voice and the most primitive and
spontaneous means of communication. Although there are many augmentative communication aids
and assistive technologies now available for people who can no longer speak, the psychosocial
consequences of losing all ability to voice cannot be underestimated.
Music Therapy is the planned and intentional use of music to meet an individual's social,
emotional, physical, psychological and spiritual needs within an evolving therapeutic relationship. In
the therapy session, the therapist and client explore the client's world together, basing all interaction
on the client's musical utterances or musical preferences. This forms the basis for the therapeutic
relationship.
Within the clinical literature with a neuro-degenerative population, particular focus has been given to
the use of music for emotional expression and personal interaction skills (Magee, 1995a, 1995b;
Brandt, 1996; O'Callaghan and Turnbull, 1987 & 1988; O'Callaghan and Brown, 1989) and life
review processes through song choice and song-writing (O'Callaghan, 1984, 1990, 1995, 1996, 1999).
Although music therapy programmes have also aimed to improve functional speech through rhythmic
speech drills and singing (Erdonmez, 1976; Crozier and Hamill, 1988), little attention has been given
to the role that singing in therapy may play in meeting the holistic social, emotional and physical
needs of the patient who faces gradual voice loss as part of a degenerative disease process.
Aim.
This paper presents a single case study taken from a larger study investigating music therapy in
chronic neurological illness (Magee, 1998). This case study explores the experience of the physical
act of singing in the therapeutic process for an individual living with chronic degenerative illness.
Method.
A group of adults with Multiple Sclerosis were recruited from multidisciplinary referrals and
self-referrals at a residential and day care facility for complex neuro-disability. Participants received
individual music therapy from a qualified, state registered music therapist as part of a wider clinical
programme. The music therapist was the primary researcher and so worked as a participant researcher
for the study.
The music therapy sessions took place weekly for a period of approximately six months for each
participant. The session format included active participation in exploring instruments, joint clinical
improvisation with the therapist and singing songs of the participant's choice which had particular
meaning to them. Discussion of the musical material or personal material relating to it was included in
the session if the participant indicated a desire to do so.
Primary data were collected in the form of focussed interviews held after sessions by the
therapist/researcher. Secondary sources of data included the verbal, musical and behavioural
responses from sessions, as well as open coding analytical notes made during transcription of the
interviews. Three forms of data therefore emerged from the process.
A modified grounded theory paradigm was used to analyse data employing the steps of open and axial
coding (Strauss & Corbin, 1990). Trustworthiness was gained through prolonged involvement,
persistent observation, long-standing clinical experience with this population, and peer debriefing
with the multidisciplinary team. Triangulation was implemented on several levels. Ongoing analysis
of the clinical material was taken to an independent music therapy supervisor whose theoretical
framework differed from the therapist/researcher's, offering alternative interpretations of events to
those made by the researcher and thereby enhancing objectivity. This process was also implemented
with selections of the interview analyses, using an independent auditor familiar with therapeutic
theory. Case-study design was used to report the findings.
Results.
Open coding of the data found that individuals overtly or subtly monitored the changes in their
physical, vocal or cognitive functioning resulting from their disease process. This phenomenon was
entitled 'Illness monitoring'. This action included individuals describing a particular ability in different
situational contexts, comparing one's ability with others', monitoring the type of change experienced,
the extent of any change, and making temporal comparisons of 'now' to 'before'.
Individuals often consistently assessed different aspects of their own physical, cognitive and vocal
functioning in relation to those around them, who lived on the same ward. Others with whom general
living space was shared may have had the same diagnosis, but may have been in a more advanced
stage of the disease. Monitoring change in this way served to increase awareness and self-knowledge,
thereby helping individuals to regain some sense of control. Furthermore, increased self-knowledge
left one better prepared to employ strategies for dealing with the emotional consequences of a
negative change in abilities.
A single case study will be used to illustrate the results of axial coding, examining the particular
phenomenon of 'vocal monitoring'.
From the larger analysis of the group's data, it was found that individuals drew on coping
strategies to a greater degree when the individual felt a lower sense of control, a higher sense of threat,
and a greater sense of confrontation by their disease. Drawing on coping strategies in this way served
to mask deeper responses to the disease process. It can therefore be assumed that the process of vocal
monitoring stimulated emotional responses for Jack which he felt the need to control. When he was
able to engage more emotionally with the music however, vocal monitoring was less likely to take
place.
Conclusions.
This case study highlights the importance which singing can hold for an individual with chronic
degenerative disease. However, the meaning of singing found in this study does not support the ideas
put forward in previous music therapy literature reporting on song-based techniques: song themes
were not a primary facilitative factor in Jack's use of song. Despite his illness and the difficulty he
was experiencing in vocalising, he used the songs within his music therapy as a way to defy his
illness process. Certainly
he gained greater meaning in life through his act of singing songs within a therapeutic relationship. In
reality, Jack died of pneumonia and respiratory failure two years after his participation in this study
finished. Retrospectively, it is apparent that his experience of singing his songs represented life's
breath running through him. His continual references to his breathing, throat and voice, on reflection,
indicate a high level of anxiety which he was attempting to conceal.
Considering theoretical frameworks offered by health sociology, through the act of singing Jack was
testing the physical limits of his body and making comparisons in terms of temporal and situational
parameters (Corbin and Strauss, 1987). In this way, he achieved greater senses of independence, skill
and ability, which helped to shift his sense of identity using the therapist to validate his performance.
The phenomenon 'Illness monitoring' which emerged in this study has elsewhere been entitled the
'dialectical self' (Charmaz, 1991). This phenomenon, like illness monitoring, involved taking the body
as an object, appraising it, and comparing it with the self in different temporal and situational
frameworks.
Active involvement in music therapy through singing facilitates a physical expression in which
individuals explore their remaining physical capabilities. Through sustained exploration of their own
individual physical change and loss, the physical experience becomes an intensely emotionally
charged one relating directly to aspects of the illness identity. It is imperative for the music therapist
working with this population to understand that through physical monitoring during the act of singing,
individuals with chronic degenerative illness may become more acutely aware of their emotional
responses to their illness process.
References.
Brandt, M. (1996). "'This is my life.' Songwriting and song interpretation with Huntington's patients."
In: Smeijsters, H. and Mecklenbeck, F. (Eds.), Book of Abstracts of 'Sound and Psyche', 8th World
Congress of Music Therapy, Hamburg, 1996, p. 216. Druck: Service-Cruck Kleinherne, Dusseldorf.
Charmaz, K. (1991). Good Days, Bad Days. The Self in Chronic Illness and Time. New Brunswick:
Rutgers University Press.
Corbin, J. & Strauss, A. (1987). Accompaniments of Chronic Illness: Changes in Body, Self,
Biography, and Biographical Time. In Roth, J. & Conrad, P (Eds.), Research In the Sociology of
Health Care: A Research Annual. The Experience and Management of Chronic Illness, 6, 249-281.
London: JAI Press Inc.
Crozier, E. and Hamill, R. (1988). The benefits of combining speech and music therapy. Speech
Therapy in Practice, November, 9-10.
Erdonmez, D. (1976). The Effect of Music Therapy in the Treatment of Huntington's Chorea Patients.
Proceedings of the 2nd National Conference of the Australian Music Therapy Association
Incorporated, 58-64.
Magee, W. (1995a). 'Case studies in Huntington's Disease: music therapy assessment and treatment in
the early to advanced stages.' British Journal of Music Therapy, 9(2), 13-19.
Magee, W. (1995b). 'Music Therapy as Part of Assessment and Treatment for People Living with
Huntington's Disease'. In: C. Lee (Ed.), Lonely Waters: Proceedings of the International Conference
Music Therapy in Palliative Care, 173-183. Oxford: Sobell Publications.
Magee, W. (1998) 'Singing my life, playing my self'. Investigating the use of familiar pre-composed
music and unfamiliar improvised music in clinical music therapy with individuals with chronic
neurological illness. Unpublished doctoral dissertation, University of Sheffield, UK, #9898.
O'Callaghan, C. (1984). Musical Profiles of Dying Patients. Australian Music Therapy Association
Bulletin, 7(2), 5-11.
O'Callaghan, C. (1990) Music therapy skills used in song writing within a palliative care setting.
Australian Journal of Music Therapy,1, 15-22.
O'Callaghan, C. (1995). Songs Written by Palliative Care Patients in Music Therapy. In: C. Lee (Ed.),
Lonely Waters: Proceedings of the International Conference Music Therapy in Palliative Care, 31-40.
Oxford: Sobell Publications.
O'Callaghan, C. (1996). Lyrical Themes in Songs Written by Palliative Care Patients. Journal of
Music Therapy, 33(2), 74-92.
O'Callaghan, C. (1999). Lyrical Themes in Songs Written by Palliative Care Patients. In: D. Aldridge
(Ed.), Music Therapy in Palliative Care, 43-58. London: Jessica Kingsley Publishers.
O'Callaghan, C. and Brown, G. (1989). Facilitating Communication with Brain Impaired Severely Ill
People: Using Neuropsychology and Music Therapy. Presented at N.A.L.A.G.'s Sixth Biennial
Conference, Melbourne, September 1989.
O'Callaghan, C., and Turnbull, G. (1987). The Application of a Neuropsychological Knowledge Base
in the Use of Music Therapy With Severely Brain Damaged Adynamic Multiple Sclerosis Patients.
Proceedings of the 13th Conference A.M.T.A., Melbourne, 92-100.
O'Callaghan, C., and Turnbull, G. (1988). The Application of a Neuropsychological Knowledge Base
in the Use of Music Therapy With Severely Brain Damaged Disinhibited Multiple Sclerosis Patients.
Proceedings of the 14th Conference A.M.T.A., Adelaide, 84-89.
Papousek, H. (1996) Musicality in infancy research: biological and cultural origins of early
musicality. In: Deliege, I. & Sloboda, J. (Eds), Musical Beginnings: Origins and Development of
Musical Competence, 37-55. Oxford: Oxford University Press.
Papousek, M. (1996) Intuitive parenting: a hidden source of musical stimulation in infancy. In:
Deliege, I. & Sloboda, J. (Eds), Musical Beginnings: Origins and Development of Musical
Competence, 88-112. Oxford: Oxford University Press.
Stern, D. (1985) The Interpersonal World of the Infant. New York: Basic Books.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research. Grounded Theory Procedures and
Techniques. Newbury Park: Sage Publications, Inc.
Authors' note.
Wendy L. Magee BMus PhD ARCM SRAsT(M) is Head of Music Therapy at the Royal Hospital for
Neuro-disability, London, holding a clinical post as a music therapist working with adults with
acquired and complex neuro-disability, and developing research projects with this population. This
research is part of doctoral research undertaken whilst registered at the Department of Music,
University of Sheffield. Jane W. Davidson BA PGCE MA PhD Cert. Counselling is Senior Lecturer
in Music at the Department of Music, University of Sheffield. She is editor of the international journal
Psychology of Music and has researched on a wide range of topics from self and identity in singers
through to expressive body movement and piano performance, having over 50 publications to her
name in international peer-reviewed journals. Besides researching, she teaches a wide range of
courses and is an active performer, artistic director and producer.
The authors would like to acknowledge the Living Again Trust, the John Ellerman Foundation, the
Juliette Alvin Trust and the Music Therapy Charity who all contributed to funding this project. The
authors also would like to thank the research participants who took part in this study. The Royal
Hospital for Neuro-disability received a proportion of its funding to support this paper from the NHS
Executive. The views expressed in this publication are those of the authors and not necessarily those
of the NHS Executive.
Address for correspondence: Dr. Wendy L. Magee, Music Therapy Department, Royal Hospital for
Neuro-disability, West Hill, London SW15 3SW, UK
Proceedings paper
This paper draws from my own PhD research on assessing pupils' compositions. Within the National
Curriculum in England composing is a statutory part of the programmes of study for Music. From
September 2000 the revised orders for music introduce level descriptors to be used by teachers to best
fit a pupil's performance at the end of Key Stage 1 (age 7), Key Stage 2 (age 11) and Key Stage 3 (age
14).
From the original implementation of the Music National Curriculum, to subsequent revisions in 1992,
followed by the Dearing revision (1995) and the recent simplification (1998), the expectations at each
of the three Key Stages have narrowed. This is presented in the form of a sequential progression
through the elements of music (pitch, duration, dynamics, tempo, timbre, texture and structure). Thus:
Key Stage 3 pupils should be 'discriminating within and between the musical elements'
In 'Teaching Music in the National Curriculum', Pratt and Stephens (1995) presented this model in the
form of a table to indicate progression within each element. According to this, it follows that pupils
would first talk about loud, quiet and silence before going on to talk about gradating levels of volume,
and before progressing to recognise subtle differences in volume. This implies that musical conceptual
understanding is developed through an increasingly discriminatory vocabulary based on the principle
of quantitative addition. As Swanwick (1996) argues, most of the Key Stage statements in the Music
National Curriculum document are essentially quantitative in character rather than qualitative; he
urges 'we need to have criterion statements to pick up these qualitative shifts' (p. 34).
The theoretical framework draws from several research projects which have tried to map out these
qualitative shifts. For example, Swanwick and Tillman's model (1986) was adapted by Swanwick to
suggest a basis for establishing criteria for assessing composition and more recently for assessing
performance and listening (Swanwick, 1988, 1996; Hentschke, 1993). Other researchers, such as Flynn
and Pratt (1995), used a 'bottom-up' approach, which sought to make explicit the criteria
which the teachers identified in making such qualitative shifts. The DELTA Project (Development of
Learning and Teaching in the Arts) conducted by Hargreaves, Galton and Robinson (1996) devised a
methodology which claimed to make explicit the implicit criteria which teachers use to make
judgements about children's products. The findings for music made ground in developing a language
of assessment. Hargreaves et al. (1996) provided a five phase model which incorporated domain
specific as well as general cognitive aesthetic developments. The researchers reported the
developments in terms of phases so as not to be confused with Piagetian 'stages'. The five phases are
denoted as sensorimotor, figural, schematic, rule systems and professional. The model draws from
current domain-specific research and acknowledges its somewhat sketchy form at the time of writing
as reflecting the paucity of research in this area. Nevertheless, it provides some interesting insights
which draw together psychological research into aesthetic development. Rather than attempting to
draw out level descriptors Hargreaves and Galton's model sketches phases of development which
invite further research and 'real-world' application.
This paper draws from that part of my earlier research which set out to exemplify a 'real world'
application. It was also set in the climate of the raised profile of literacy across the curriculum taking
into account the reports on the use of Language within the Common requirements of the National
Curriculum (SCAA, 1997) and their attempt to provide a way forward for the role of language in
music education:
Teachers are encouraging the use of technical vocabulary with greater confidence but more help is
requested with regard to the musical vocabulary which should be taught at each key stage. (SCAA
1998, Section 11)
Teachers in all key stages need guidance on subject knowledge and how this knowledge can be
integrated in practical work including the development of aesthetic awareness and musical
vocabulary. (SCAA, 1998, Summary)
One objective of the research was to investigate how children used the language of the musical
elements as defined by the English National Curriculum as pitch, duration, dynamics, tempo, timbre,
texture and structure. I was interested to see how the terms were used by the children and to what
extent these revealed qualitative shifts in their conceptual understanding of their own and their peers'
compositions.
Methodology
The research included 154 children, 78 girls and 76 boys aged 9-13 years. The sample was taken from
Years 5-6 in the Upper Primary School (Upper Key Stage 2: ages 9-11) and Years 7-8 of the
Secondary School (Key Stage 3; ages 11-13). The research task was designed as part of the children's
curriculum music sessions led by the teacher (new in post) who was also the researcher. The children
had been asked to compose 'what they thought made a good tune'. The design took account of the pilot
study research in the following ways.
First, the composing task was presented in an open-ended way. To avoid influencing the listening
responses the task was not directed in a series of sequential stages. Second, the starting point was a
melodic composition; as such, it did not have an extra-musical referent. Third, the composing activity
was organised on an individual basis and not in groups, thereby allowing each composition to be
identified with each pupil's individual response. Fourth, the school was fortunate to have sufficient
keyboards for the children to share one between two. Although there may have been some pair-work
influence the children used separate headphones and were asked to work individually on their own
tunes. Keyboards were used as a means of 'controlling' the sound sources used, so that the children's
subsequent responses were not limited to a simple recognition of the instrument. They also proved to
be a highly motivating sound source across the sample age range. For the purposes of the research the
pupils were asked to choose their own sound from the sound bank but not to use a rhythm or beat
accompaniment. Note names and/or staff notation could be used as a means to map out the tune for
performance but this was not obligatory. The children worked on their tunes in 3 x 50 minute lessons
and performed and recorded their compositions onto audio tape.
The design took into account that for many children composing was a new experience. Some had
more experience of playing an electronic keyboard than others. The research also acknowledged that
some children had piano skills and that this would influence the musical outcome. However, it was
considered that the task was equally accessible for all children, open-ended enough to allow
individual approaches and age appropriate (in both the instruments used and also the type of task).
Equally, the task fulfilled the requirements of the Music National Curriculum and was presented in
such a way as to encourage pupil ownership of the learning which was synonymous with the school's
philosophy of education.
In the final week the children were invited to appraise their compositions. For research purposes the
design investigated the children's listening responses to their own compositions and to those of their
peers. In practice the children listened to the recording of their class's compositions and, in the pause
in between each piece, they gave a mark out of 10 and wrote a reason for their choice on a given pro
forma. The results were collated and used as a basis for both quantitative and qualitative analysis.
Data Analysis
In order to analyse the data a coding scheme was developed to categorise the content of written
listening responses. An initial survey of the data produced 22 categories of response. These were
subsequently reduced to five broad categories as follows:
1. Musical Elements
Responses in this category refer to the elements of music as defined in the Music National
Curriculum. They include references to: Pitch, Duration, Dynamics, Tempo, Timbre, Texture,
Structure. Responses in this category might include for example, 'it was loud', 'it was short', 'the notes
went up and down', 'it had a hollow sound', 'it repeated'.
2. Style
In this category are responses which make stylistic references, for example, 'it sounds classical', 'it
sounds like Jazz', 'it sounded Japanese'.
3. Mood
Responses in this category indicate an affective response to the music, for example, 'it made me feel
happy', 'it was depressing', 'it was spooky'.
4. Evaluation of Composition
Responses in this category demonstrate an evaluative statement of the composition itself, for example,
'it was good', 'it was well put together'.
5. Evaluation of Performance
Responses in this category refer to an awareness of the qualities of the performance, for example, 'he
missed a note', 'it was played well'.
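The five categories above were applied by hand during analysis. Purely as an illustration of how such a coding scheme behaves, a crude keyword-based coder could be sketched as follows; all keyword lists here are hypothetical, drawn only from the example responses quoted above, and do not represent the study's actual procedure.

```python
# Hypothetical keyword lists drawn from the example responses quoted in
# the text; the study's real coding was done by hand, not by matching.
CATEGORIES = {
    "Style": ["classical", "jazz", "japanese"],
    "Mood": ["happy", "depressing", "spooky"],
    "Evaluation of Performance": ["missed a note", "played"],
    "Evaluation of Composition": ["good", "well put together"],
    "Musical Elements": ["loud", "short", "up and down", "hollow", "repeated"],
}

def code_response(text):
    """Return the first category whose keywords appear in the response."""
    text = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorised"
```

For example, `code_response("it sounds like Jazz")` would fall into the Style category, while `code_response("it made me feel happy")` would fall into Mood.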
An independent rater and I performed a reliability study in which a sample of the responses was
categorised. The results were correlated using statistical measures and showed that the coding scheme
allowed a satisfactory level of agreement and that further analysis was justified.
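The text does not specify which agreement statistic was used for the reliability study. One common chance-corrected choice for two coders assigning categorical labels is Cohen's kappa, sketched here with hypothetical coding data for illustration:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement between two raters, corrected
    for the agreement expected by chance given each rater's marginals."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical: two raters coding six responses into the five categories
a = ["Elements", "Mood", "Style", "Elements", "Mood", "Performance"]
b = ["Elements", "Mood", "Style", "Mood", "Mood", "Performance"]
print(cohens_kappa(a, b))
```

Values above roughly 0.6-0.7 are conventionally read as substantial agreement, which is the kind of "satisfactory level" that would justify further analysis.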
Results
For the purposes of this paper I shall present a summarised version of the
qualitative results. The initial analysis involved mapping the responses into the five categories
described above as Musical Elements, Style, Mood, Evaluation of Composition and Evaluation of
Performance. I shall discuss each category in turn focusing on the language used by the children. The
extracts from the children's responses were chosen because they were representative of particular
types of response.
Musical Elements
Within this category the responses were subdivided into a further 7 subcategories which corresponded
to the musical elements within the Music National Curriculum (DFE, 1995). I shall focus on each in
turn.
Pitch
In the Music National Curriculum, pitch is described from Key Stages 1-3 as:
(KS 1) high /low
(KS 2) gradations of pitch e.g. sliding up/down, moving by step/leap, names for pitch
(KS 3) various scales and modes e.g. major, minor
The children's responses revealed a range of ways of talking about pitch in relation to their tunes,
showing a greater degree of aesthetic differentiation than the schema presented in the Music National
Curriculum. To summarise, responses which refer to pitch show:
● children at KS 2 and 3 identify high and low and gradations of pitch, and some recognise
scales; e.g. 'it is a really high tune', 'I like it because it goes up and down', 'it did sound very like
a scale'
● children show personal preferences and prefer tunes which are not too high or too
low; e.g. 'it was too low for me', 'too high and boring'
● younger children prefer tunes where the pitch does not move around too much, whereas
older children tolerate a greater range of pitch; KS 2 e.g. 'it was a good tune and none of the
notes clashed', 'all the notes go really well', 'the notes moved around too much'; KS 3 e.g. 'the
notes are low and go with one another', 'very interesting - big range of notes'
● movement metaphors are used to describe pitch contour; e.g. 'running up and down'.
Duration
In the Music National Curriculum, duration is described from Key Stages 1-3 as:
(KS 1) long/short; pulse or beat; rhythm
(KS 2) groups of beats, e.g. in 2s, 3s, 4s, 5s; rhythm
(KS 3) syncopation, rhythm
The responses divided between those which focused on the qualities of the duration as beat or rhythm
and those which focused on the duration of the tune as a whole. The children's responses which
focused on duration/beat-rhythm were considerably less differentiated than those within the
category of pitch. To some extent some of the responses might have been made in relation to the
quality of performance as much as to the rhythmic qualities of the tune. No explicit responses
demonstrated an understanding of groups of beats per se. To summarise, responses which refer to
duration/beat-rhythm show:
● recognition of a beat or rhythm; e.g. 'I like the rhythm', 'I liked the beat'
● a sense of rhythm which can be followed and which flows; e.g. 'needs a better beat - should
flow more', 'quite bitty, rhythm hard to follow'
● rhythm in time; rhythm off or on the beat; e.g. 'out of time', 'a good off beat'.
Apart from one response, which comments on the length of the notes ('[it's] good how it is staccato'),
most of the children's responses which use the words long and short refer to duration as the length of
the tune as a whole. In this way duration is linked to the element of structure. To summarise,
responses which refer to duration/long-short show:
responses which refer to duration/long-short show:
● preference for tunes which are neither too short nor too long; e.g. 'it was short and boring',
'short and snappy', 'it went on too long and was boring'.
● preference for longer tunes by some KS 3 children; e.g. 'good flow and rhythm, but a bit
short'
● appropriate duration for each particular tune; e.g. 'short and effective', 'it was nice that it
was short'.
Dynamics
In the Music National Curriculum dynamics are described from Key Stage 1-3 as:
(KS 1) loud, quiet, silence
(KS 2) different levels of volume, accent
(KS 3) subtle differences in volume, e.g. balance of different parts
Not all the keyboards were touch sensitive and so the volume was controlled at source rather than
through touch. The most marked difference in this sub-category was that the Key Stage 2 children
made far more references to dynamics than the Key Stage 3 children. From this initial analysis there is
evidence to suggest that the girls produced more responses which showed a preference for quiet music
and boys produced more responses which favoured loud music, especially in upper Key Stage 2. To
summarise, responses which refer to dynamics show:
● recognition of loud or quiet, decreases and increases in volume; e.g. 'it is quiet', 'I like the
fading out bits'
● parts of the tune which varied in volume; e.g. 'it has one side soft and one side loud', 'it was
quiet and has a slight echo'
● dislike for tunes which were either too soft or too loud; e.g. 'too quiet', 'too loud all a long'
● preferences which relate mood to dynamics; e.g. 'it was soft and calming'
● KS 2 produced more responses than KS 3, boys prefer loud tunes especially at KS 2; e.g.
'it was not loud enough' (boy), 'I don't really like it because it is loud' (girl).
Tempo
In the Music National Curriculum tempo is described from Key Stage 1-3 as:
Timbre
In the Music National Curriculum timbre is described from Key Stage 1-3 as:
(KS 1) quality of sound, e.g. tinkling, rattling, smooth, ringing
(KS 2) different qualities, e.g. harsh, mellow, hollow, bright
(KS 3) different ways timbre is changed, e.g. by mute, bowing/plucking, electronically; different
qualities, e.g. vocal and instrumental tone colour
To summarise, responses which refer to timbre express :
● qualities of the sound which relate to other sound sources and instruments; e.g. 'it sounded
like bottles', 'it sounded like the flute', 'it sounded like a bassoon', 'it sounded like someone
playing the sitar'
● preference for changes of the sound; e.g. 'its good with lots of sounds', 'I liked the sound
effects', 'I don't think the sound was relevant to the tune'
● qualities of the sound in terms of mood, depth and association; e.g. 'funny sound', 'weird
sound', 'heavy sound', 'ghostly sound'
● more responses by the girls than by the boys.
Texture
In the Music National Curriculum texture is described from Key Stage 1-3 as:
(KS 1) several sounds played or sung at the same time/one sound on its own
(KS 2) different ways sounds are put together e.g. rhythm on rhythm; melody and accompaniment;
parts that weave, blocks of sounds, chords.
(KS 3) density and transparency of instrumentation; polyphony and harmony
There were very few responses in terms of texture and this can be accounted for by the nature of the
composition task. This was essentially a linear melodic construction and did not require more than one
part at once. Some children used chords to accompany their melodies and some responses reflect this
e.g. 'long, with nice chords', 'the chords go well together'.
Structure
In the Music National Curriculum structure is described from Key Stage 1-3 as:
(KS 1) different sections, e.g. beginning, middle, end; repetition, e.g. repeated patterns, melody, rhythm;
(KS 2) different ways sounds are organised in simple forms, e.g. question and answer, round, phrase,
repetition, ostinato (a musical pattern that is repeated many times), melody;
(KS 3) forms based on single ideas e.g. riff, forms based on alternating ideas e.g. rondo, ternary,
forms based on developmental ideas e.g. variation, improvisation.
The responses showed that the pupils perceived structure in a number of ways.
To summarise, responses which refer to structure show:
● attention to beginnings and endings more than to middle events; e.g. 'I liked the beginning
bit', 'good start', 'it finished well'
● extra-musical associations and musical events within the structure; e.g. 'at the start it is a bit
creepy and heavy', 'in the beginning it sounds like birds'
● structure used to locate one or more particular musical events within the same tune;
e.g. 'in the middle it was a bit of a copy', 'a tiny difficulty in the middle'
● perception of simple/complicated structures, in terms of repetition, change and pattern;
e.g. 'a bit repetitive to begin with', 'he mostly uses the same keys', 'it kept on continuing itself
forever and forever', 'it didn't change much and had no variation', 'he just repeated', 'it changed a
lot which was good'
● that KS 3 pupils are more aware of the structural process i.e. how a tune is built up and
its effectiveness; e.g. 'he should have added more in between', 'it was like it was gradually
building up', 'it was put together well', 'plain but good', 'its simple but it has something to it'.
Style
Children's style sensitivity is represented in a number of ways. To summarise, responses in this
category refer to:
● chronology; e.g. 'like a 1900's tune', 'a bit old', 'medieval'
● musical features from the music of other countries; e.g. 'it sounded Chinesy', 'it sounds good
like Indian Music', 'sounds Japanese', 'sounds oriental', 'I like the Caribbean beginning', 'sounds
very Egyptian'.
Children who responded in this way have picked out a quality in the sound, such as the use of the
Indian sitar in the sound bank, or a musical feature, such as the intervallic pitch relationships in the
'Egyptian tune', or the syncopated rhythm of the 'Caribbean tune'. As they do not yet have the
vocabulary to describe the specific musical features, they use stereotypes.
● particular musical styles, qualities of style and style preference; e.g. 'sounded like jazz',
'very into rock music', 'quite classical', 'it sounded jazzy', 'it was funky', 'it has a swinging beat'
● styles associated with the media, with other songs, styles which would be appropriate for
films, style similarity between peers; e.g. 'it was obviously copied from a nursery tune', 'it was
copied off a pop song that came out recently', 'definitely heard it on TV before', 'mix of When
the Saints and London's Burning', 'it's like a computer game', 'it sounds like its out of a cartoon',
'sounds like the beginning of a TV programme', 'sounds like a cat food advert', 'it struck me as
something out of a film', 'like space music', 'something out of a Walt Disney Film', 'like
something out of a fairy tale', 'like something out of a child's detective movie', 'too much like
the Snowman', 'like the Little Mermaid', 'something from Maid Marion', 'like something out of
Grease', 'like something out of Bugs Bunny', 'nearly the same as Jai's'.
● identification of tunes for and from particular contexts; e.g. 'like something from a circus
or a fair', 'I liked it because it reminded me of a church', 'for a horror movie', 'good for a play',
'sounds like a disco', 'sounds like a piece for ballet', 'it reminded me of a holiday'.
Whereas at Key Stage 2 style responses were dominated by references to film, video and TV, Key
Stage 3 pupils' perception of style related to personal experience, preference and identity.
Mood
Children responded in this category in a number of ways. To summarise, responses in this category
include:
● positive and negative moods; e.g. 'it was jolly and fun', 'a nice happy tune', 'fun and
entertaining'
● responses where the listener identifies with the mood; e.g. 'I liked it because it was a happy
tune', 'it's a tune that makes me feel lonely', 'that tune makes me feel jolly', 'it makes you feel
good'
● responses where the tune is identified as having a mood relating to atmosphere or to
movement qualities; e.g. 'OK and very calm', 'a relaxing piece of music', 'it sounds restful', 'its
lively', 'lumpy and springy', 'jerky but good', 'it's got bounce', 'I love this its so bouncy and great',
'like an old man walking', 'like a fairy dancing', 'like someone diving'
● recognition of a change of mood and juxtaposition of two mood states; e.g. 'it goes scary,
then normal', 'sweet and catchy', 'funny and strange'
● moods relating to the 'life' and 'feeling' of a piece; e.g. 'it has a nice feel to it', 'it doesn't have
any life', 'it was good and full of life', 'she should have added a bit more spice to the tune', 'it
was playful'.
Evaluation of Composition
To summarise, responses in this category include :
● value judgements and qualified value judgements; e.g. 'it is good', 'I like it', 'it is my kind of
music', 'not to my taste', 'I liked the beat'.
As above, many children justified their preference by valuing one or more aspects of the tune in the
categories of Musical Elements and Style.
● responses referring to the fit of the tune as a 'whole'; KS 2 e.g. 'I didn't like this because it
bumped', 'it was a bit rickety', 'it was a bit wobbly'. KS 3 e.g. 'it doesn't fit together properly', 'it
didn't mix well', 'a bit unstable', 'it sounded together', 'synchronised', 'it was in place'
● responses referring to the quality of thinking/organisation of ideas and expectations
perceived in the music; KS 2 e.g. 'it was a bit messy', 'it was a bit muddled'. KS 3 e.g. 'it was
well organised and she knew what she was doing', 'good and well organised', 'well thought up
and practised', 'too random', 'he didn't know what he was doing', 'it sounded like it was made up',
'it's just anything that he is playing'.
● responses valuing originality, imagination and creativity; e.g. 'he was being quite
imaginative', 'very creative and good'.
Interestingly the focus of the responses showed that the younger children were more likely to express
themselves in terms of whether the piece was copied whereas the older Key Stage 3 children were
more concerned with the quality of originality, difference, imagination and creativity.
● responses which 'match' with the pupils' view; e.g. 'it suited her', 'not as good as I thought it
would be'
● responses referring to the expectations in the music; e.g. 'it was too predictable', 'you sort of
knew which note would come next', 'a bit off course towards the end'
● the more intuitive aesthetic judgements and comments on the 'properness' of the music at
KS 3; e.g. 'started brilliant - like a proper piece', 'good but no proper ending', 'very
professional', 'it sounded like a real tune', 'I just enjoyed it so much as it was getting to the point
of a piece'.
Evaluation of Performance
Children produced different types of response in this category. To summarise, responses in this
category refer to :
● how well the tune was played, identification of mistakes; e.g. 'very good, well played', 'good
but a few mistakes', 'it has few jolts and didn't go well'
● practice and technical mastery of the instrument; e.g. 'could be played better', 'very good, but
I think she practised'
● differentiation between composition and performance; e.g. 'it was skilfully played and
composed', 'it was a good effort and it came out well'
● differentiation between technical demands and the experience of the player; e.g. 'she
stumbled a bit but it was quite good', 'very good except for the slip at the start', '[it was] dull but
well played'.
Discussion
To a certain extent the responses in each sub-category confirm the way the Music National
Curriculum defines the increasing levels of discrimination within each musical element i.e. pitch,
duration, dynamics, tempo, timbre, texture and structure across Key Stages 1-3. The results therefore
provide verbal evidence of how children listen to and appraise music in relation to the English Music
National Curriculum.
However, the results also show ways in which the children's musical understanding becomes
increasingly differentiated both within each sub-category and between categories of perception,
beyond the definition presented within the documentation of the Music National Curriculum. This
gives a more detailed picture of how children use language in their responses in each of the
subcategories and, more particularly, shows us what they value. Responses also change across the
categories, within the categories and with respect to age and gender, leading to a fuller picture of the
qualitative shifts in the conceptual understanding of music. This is an area for future research.
The results also reveal that responses which show an absence of technical vocabulary may
nevertheless communicate a sense of the music. However, some responses use technical statements
inappropriately e.g. 'out of tune' was used of the intervallic range within the pitch contour. The
conclusion to be drawn is that the use of technical vocabulary may not be evidence of musical
understanding.
Another consideration in the analysis is how far the responses were influenced by musical expertise
and peer group issues of perceived musical expertise, status, friendship and competition. This is
illustrated by examples from the qualitative data which take into account biographical and social
observations of the children (Mellor, 1999). For example, the experienced pianist responds with a
voice of expertise : 'could have practised more', 'original and good for someone who doesn't play the
piano'. The saxophonist with experience of playing jazz responds using phraseology common to
jazz style e.g. 'doesn't make the most of the rests, needs to sit back on the beat'.
From my experience as teacher and researcher some responses reflect the relative social status of the
children within the class. The use of the term high/low status is defined by my observation of how the
children interacted and whom they held in esteem amongst their peers. The qualitative analysis
therefore reveals that additional factors need to be taken into consideration. The particular value of a
teacher/researcher is the ability to analyse internal social hierarchy within classes which produces
another level of subjectivity beyond that of gender.
As the full research shows, whilst some responses share characteristic patterns or phases of
development, concerned with issues of recognition, conformity, appropriateness, originality and
reflection, listening responses show individualised profiles which are mediated by the listening
context and the social structure of the group. The question for policy makers and music education
research must be how to integrate these observations into the assessment model. Whilst general
guidelines may be welcomed, over-simplification as presented by the Qualifications and Curriculum
Authority (2000) might be limiting and misleading. In seeking to 'level out' the types and range of
performance that pupils demonstrate I hope we don't inadvertently 'level out' the richness of this
inquiry which is still a largely uncharted territory.
References
Department of Education and Science (1991) Music for Ages 5-14. London: HMSO.
Department of Education and Science (1992) Music in the National Curriculum. London: HMSO.
Department for Education (1995) Music in the National Curriculum. London: HMSO.
Flynn, P. and Pratt, G. (1995) Developing an understanding of appraising music with practising
primary teachers. British Journal of Music Education, 12, 127-158.
Hargreaves, D.J., Galton, M.J. and Robinson, S. (1996) Teachers' assessments of children's classroom
work in the creative arts. Educational Research, 38, 2, 199-211.
Hentschke, L. (1993) Musical development: testing a model in the audience-listening setting.
Unpublished PhD thesis. Institute of Education: University of London.
Mellor, L. (1999) The language of self-assessment: towards aesthetic understanding in music. In E.
Bearne (Ed.) Use of Language across the Secondary Curriculum. London: Routledge.
Pratt, G. and Stephens, J. (Eds.) (1995) Teaching Music in the National Curriculum. National
Curriculum Music Working Group. Oxford: Heinemann.
Qualifications and Curriculum Authority (1998) Breadth and Balance in the National Curriculum
School Curriculum and Assessment Authority (1997) Music and the Use of Language at Key Stage 3.
London: SCAA.
Swanwick, K. and Tillman, J. (1986) The sequence of musical development. British Journal of Music
Education, 3, 305-339.
Swanwick, K. (1996) Music before the National Curriculum. In G. Spruce (Ed.) Teaching Music.
London: Routledge.
Back to index
Proceedings abstract
Dr Ian Cross
ic108@cus.cam.ac.uk
Background:
The theory of natural selection constitutes the ontological core of many recent
theories of biology, mind and culture. From the perspective of evolution, music
has been variously appraised as a significant agent in mechanisms of group
selection, as an elaborate means of mate selection, as an activity that is
parasitic on other, more adaptive behaviours and as an entirely hedonistic,
potentially maladaptive, optional extra in human evolution and behaviour.
Aims:
This paper re-examines the notion of "music" in the context of human evolution
and development, drawing on evidence from studies on infant capacities and
behaviours, from cross-cultural studies of musical behaviour, from theories in
cognitive archaeology and on the archaeological record to suggest that music
may have played a significant role in human evolution and still plays such a
role in cognitive development.
Main contributions:
Implications:
The ideas presented here have implications that are scientifically pragmatic,
in that they present hypotheses that are intended to be empirically testable,
and political, in that they suggest a possible basis for the valorisation of
musical activity in contemporary cultures.
Back to index
Proceedings abstract
Department of Psychology
CANADA
sandra.trehub@utoronto.ca
Background: In recent years there has been increasing interest in
mothers' vocal behaviour with prelinguistic infants. Not only do mothers
talk to their non-comprehending infants; they sing to them as well. The
apparent parallels between maternal speech and song to prelinguistic infants
may stem from the principal goals of such interactions, which are related to
modulation of infant arousal and attention.
Aim: This paper distinguishes between the broad parallels that have been made
across investigations of maternal speech and singing and those that are
emerging from direct comparisons of speech and singing within the same
mother-infant dyads.
Main contribution: Some of the reported parallels between maternal speech and
singing break down on closer inspection. Nevertheless, the differences shed
light on the nature and function of such vocal interactions. For example, the
style of maternal singing is relatively stable over considerably longer periods
than is that of maternal speech. Moreover, the two types of vocal interaction
have distinct consequences for infant attention. When infants watch and listen
to their mothers sing, they seem to be mesmerised, as reflected in prolonged
gaze at mother and relative passivity. By contrast, episodes of maternal speech
lead to intermittent looking at mother but to greater vocal and gestural
responsiveness.
Back to index
Proceedings abstract
Roger A. Kendall
kendall@ucla.edu
Abstract
The majority of research in timbre has involved a single note drawn from the
middle range of the group of instruments studied. A common finding among such
studies is a strong mapping (r>.89) between long-time-average spectral centroid
(spectral center of gravity) and the principal dimension in multidimensional
scaling analysis of perceptual similarities when continuant steady-states are
considered. Another acoustical variable, often mapping to the second dimension,
is spectral flux or variability. In fact, it has been shown that when
synthetic emulations of instruments fail to capture these two variables, their
similarity to natural instruments suffers (Kendall, Carterette, Hajda, 1999).
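The spectral centroid referred to above has a simple definition: the amplitude-weighted mean frequency of a magnitude spectrum. As a rough single-frame illustration only (not the authors' actual long-time-average analysis procedure; the function name and the NumPy-based implementation are assumptions for the sketch):

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Spectral centroid (spectral 'center of gravity'):
    the amplitude-weighted mean of the frequencies present
    in the signal's magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Sanity check: a pure 440 Hz sine has (almost) all of its
# spectral energy at 440 Hz, so its centroid lies near 440 Hz.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(tone, sr))
```

A long-time-average version would average magnitude spectra over successive analysis frames of a sustained tone before computing the same weighted mean.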
Back to index
Proceedings paper
Singing on High: investigating the use of singing in Christian worship
Diana Meadows BA (Hons) MMus
42 Yarmouth Road
Blofield
Norwich
NR13 4LQ
Department of Music
University of Sheffield
Sheffield S10 2TN
Background
The role of singing in Christian worship has always been important. References to early examples can
be found in the Bible. In the Old Testament, in Exodus 15:1, after the Israelites had crossed the Red
Sea the Song of Moses was sung, and in the New Testament, in his Letter to the Colossians 3:16,
Paul writes:
Let the word of Christ dwell in you richly as you teach and admonish one another with all wisdom, and
as you sing psalms, hymns and spiritual songs with gratitude in your hearts to God.
Hymn singing is seen to play an important part in uniting a community during public worship, with
the congregation standing, or sitting, and singing as one. This can provide comfort and strength to the
individual as well as to the entire congregation (Tamke, 1978). Singing in worship provides an
opportunity to praise and worship God, provides a focus in prayer, and can be seen as an aid to
evangelism (Archbishop's Commission, 1992).
Music and dance have united communities for centuries, not only in a religious context but also within
battle and sport. The relationship between religion and sport is of special significance. Liverpool,
Everton, Celtic, Rangers, Manchester United and Manchester City Football Clubs have their roots in
Protestantism, Catholicism and Methodism. Christian rallies such as Spring Harvest, Easter People
and Soul Survivor share several characteristics with football matches (Percy & Taylor, 1997).
Historically these rallies have attracted large crowds. In the eighteenth century two great preachers,
John Wesley and George Whitefield, preached to crowds in excess of 20,000 people, while during the
nineteenth century, the American evangelists Ira D. Sankey and Dwight L. Moody drew similar
numbers to their meetings in the large cities in Britain.
There is anecdotal evidence recording the effects of singing at these meetings. In 1859, when Revival
came to Britain from America, there were reports that congregational singing changed from being
formal and constrained to joyful and full of praise, the result of which was a sense of peace (Phillips,
1989). This was followed by the Welsh Revival, where the quality of the singing was especially noted
and the crowds sang with great joy, making use of their bodies as well as their voices (Evans,
1969). Similar accounts can be found in reports of Christian events past and present.
Today thousands of people attend the large Christian events, and now, as in the past, singing by those
At the other end of the scale, ten churches have congregations of fewer than 50 people. Five churches,
four Anglican and one Evangelical Free, rarely use worship choruses, and five Anglican churches
never use them. The majority of these churches have no members under the age of 50 in their
congregations.
Many church leaders reported that they chose the hymns or choruses purely for the words, and gave
little thought to the setting of the verse or whether the congregation could sing them. Worshippers
found that in many cases, the tunes to some of these texts were unknown and difficult to sing, which
often led to dull, uninspired singing.
On the other hand, church leaders liaising with their musical directors or worship leaders chose
settings of the texts that enabled meaningful participation in worship.
A selection of reasons for including music in worship was given and leaders were asked to place them
in the order of importance. 'Praising God' was by far the most significant reason for the inclusion of
music, and 'establishing a mood' to aid worship followed this. 'Fostering a sense of community' and
'aiding evangelism' came at the bottom of the list. Interestingly, individual worshippers believed these
two points to be important factors for the inclusion of music.
Leaders were asked whether their congregations made specific requests for hymns or choruses. The
majority of congregations preferred a balance of traditional hymns and worship choruses. Some would
like to have the opportunity to sing new hymns or songs, but because of small congregations did not
have the confidence to attempt them. One congregation often commented specifically that they
wanted to enjoy "a good sing" when attending a service.
The wish to improve the music and singing in a service was widespread, but unfortunately resources
were limited, whether in personnel or by the restrictions of the building. One church, with a small
congregation, accompanied the singing with a flute, a trombone and a euphonium.
In reply to the qualitative questionnaire, it was clear that worshippers from all denominations
recognised the importance of singing in their worship. They reported that the involvement of joining
with fellow Christians in praising and worshipping God through song heightened their emotions,
encouraged an intimate personal relationship with God and provided a sense of belonging. A hymn or
worship song provided the opportunity for the individual to communicate on a personal level with
God, praising Him, giving Him thanks and asking Him into their lives. New traditional-style hymns
provided the opportunity to sing their concerns with texts written about issues of today such as
homelessness, the environment and racism.
Today's worshippers report that if the first hymn was dull, unknown or difficult to sing, this had a
detrimental effect on their enjoyment of the service, whereas if the first hymn was joyful and easy to
sing it lifted the spirits and prepared the way for worship. One person stated that they could be
enjoying a service, but when a hymn or song was sung which they did not like, it ruined the entire
service.
The confidence to sing was encouraged when the texts were set to music written in an accessible,
secular style. Many churches now make extensive use of musical instruments to accompany singing,
and this again helped to increase confidence. The use of percussion was particularly helpful in
heightening awareness and emotions. One church has a collection of African drums, tom-toms and
other percussion instruments for worshippers to use during worship.
The singing of hymns at funerals was found to help, as it provided the opportunity to express personal
grief and emotion.
A significant number of those worshipping at New Churches admitted that it was the singing that
drew them away from the traditional denominations. Others, who preferred quieter services with less
personal participation, found a church offering this. There is no doubt that one of the most important
factors for changing the place of worship is the music used for congregational singing.
The growing use of overhead projectors and computer-generated lyrics instead of hymnbooks initiates
other forms of personal involvement for the worshipper. Worshippers have more freedom to clap,
wave their arms or dance as they sing, leading to increased feelings of euphoria and well-being.
Several individuals reported they had experienced tears, joy, euphoria or ecstasy as a result of singing
within worship.
There is also a contextual element to worship. At an alternative Halloween service in Norwich
Cathedral, the congregation was identified, in the main part, as consisting of members of the
Evangelical and Charismatic churches. In this context, worshippers who would, in their less
traditional church buildings, have sung enthusiastically and with great feeling were controlled and
restrained in the Cathedral.
First-hand reports record that many worshippers became Christians during the singing of a hymn or
chorus, and they can remember vividly which hymn this was. Billy Graham, the great American
evangelist, provided an excellent example of how to build an atmosphere with the use of hymns and
choruses. The hymn 'Just as I am' (1834) led to his conversion, and this hymn has meant a great deal
to many new Christians. Other favourites include 'How great Thou art', again made popular by Billy
Graham and 'And can it be' by Charles Wesley, especially the fifth line of the fourth verse 'My chains
fell off'. These and other older hymns have remained popular with their strong melodies that are easy
to sing and members of congregations often sing spontaneously in four-part harmony. In the Billy
Graham Crusades in Britain during the 1980s a popular chorus was 'Majesty' to which many made a
commitment. Recently the modern worship choruses 'Lord, the light of your love' by Graham
Kendrick, and 'My Jesus, my Saviour' by Darlene Zschech have been sung with great assurance.
Conclusions
Worshippers may use singing in order to heighten an emotional experience. The simple repetitive
music found in worship choruses can be very effective in setting the atmosphere for a service.
Feelings of joy, praise, sorrow, love, compassion and contemplation can all be encouraged by music
with the help of a sympathetic worship leader or musical director, but it is not always possible to
know which particular hymn or chorus will affect members of the congregation, or when. Churches of
all denominations are becoming increasingly aware of and sensitive to this and are making use of singing
to promote a sense of community, both within and outside the church. The implications for the power
of singing in psychological terms are immense. There is a need for the theory of religious singing to
be developed, using music from the past and present.
References
Tamke, S. S.: Make a joyful noise unto the Lord (Ohio, Ohio University Press, 1978)
Archbishop's Commission: In Tune with Heaven (London, Church House, 1992)
Percy, M. & Taylor, R.: 'Something for the weekend sir? Leisure, ecstasy and identity in football and
contemporary religion': in Leisure Studies 16 (1997)
Phillips, T.: The Welsh Revival (Edinburgh, The Banner of Truth Trust, 1989)
Evans, E.: The Welsh Revival of 1904 (London, Evangelical Press, 1969)
Author's note
Diana Meadows BA (Hons), MMus is Chairman of Musical Keys, a Norfolk Charity providing music
and movement for children under the age of eight with special needs. The author has had a great deal
of experience as a music director in nonconformist churches.
This research is part of doctoral research undertaken under the joint supervision of Dr. Jane Davidson,
Department of Music and Rev. Canon Dr. Martyn Percy, Director of The Lincoln Theological
Institute, both at The University of Sheffield.
Proceedings paper
During the composition sessions, all on-screen manipulations of the program were unobtrusively recorded to
videotape through a video card installed in the computer. In addition to this videotape data, MIDI files were
saved under different name references via the 'save as' method (e.g., David 1, David 2, etc.) for each
participant at the end of each composition session. The videotape recordings of on-screen manipulations and
the MIDI files provided data on the process of composition for investigation.
Measure of participants' self-evaluations
Questionnaires were administered at two points in time: Time one, prior to both training in the use of the
Cubase program and engaging in computer-based composition; and Time two, after participants had completed their
computer-based compositions.
computer-based compositions. At time one, participants were asked questions designed to reveal their levels
of confidence in their ability to compose pieces of music in relation to 'other students' with, and separately
without, FIMT. At time two, participants were asked to evaluate their own compositions in relation to 'other
students' with, and separately without, FIMT. According to Diener and Dweck (1980) knowing the
adolescent's ratings of 'others' performance allows a clearer interpretation of the evaluation of their own
performance. For example, an adolescent may rate his or her own performance as 8 on a 10-point scale; but if
that adolescent thinks that most other adolescents would rate a 9 or 10 on the scale, then he or she may not
consider 8 to be a successful score. On the other hand, if the adolescent believed most other adolescents
would rate a 4 or 5, then his or her performance might be outstanding by comparison (p.994). Thus the
difference between the questions 'How good do you think most students who have had [have not had]
instrumental tuition are at composing pieces of music?' and 'How good do you think you are at composing
pieces of music?' (time one) was calculated. Also 'How good do you think the compositions of most students
who have had [have not had] instrumental tuition will be?' and 'How good do you think your composition
sounds?' (time two) was calculated.
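The difference-score logic borrowed from Diener and Dweck (1980) can be sketched as follows; the function name and the example values are illustrative only:

```python
# Sketch of the difference-score logic described above. Ratings are on a
# 10-point scale; the function name and values are illustrative, not the
# study's actual data.

def difference_score(own_rating, rating_of_others):
    """Positive: participant rates self above 'others'; negative: below."""
    return own_rating - rating_of_others

# An adolescent who rates themselves 8 but expects peers to score 9
# has a negative difference score despite the high absolute rating.
print(difference_score(8, 9))
# The same self-rating with a lower expectation of peers is positive.
print(difference_score(8, 4.5))
```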
Teacher evaluations of compositions
The completed compositions were recorded to CD for evaluation by specialist music teachers using
'consensual assessment' procedures (Amabile, 1982; Daignault, 1996; Hickey, 1998). Evaluations of the
compositions were made by four practising, experienced, specialist music teachers. Each teacher rated the
compositions separately, using a different CD on which the compositions were recorded in a different random
order. The compositions were identified by number only. The teachers made their evaluations using pre-prepared forms
that required them to listen to the CD twice. On the first listening they were asked to rate for 'overall
impression'. On the second listening, they were asked to rate for 'creativity' and 'craftsmanship'. 'Overall
impression' was rated using a 7 point rating scale (anchored from 1= very poor, 4= average, to 7= excellent).
'Creativity' and 'craftsmanship' were rated using a 7 point rating scale (anchored from 1= low, 4= medium, to
7= high). Instructions to the teachers included: 'Please try to use the full range of the scale from 1-7', 'When
rating for 'creativity' please consider the following dimensions: originality, novel use of timbres, novel
musical ideas and variety.', 'When rating for 'craftsmanship' please consider the following dimensions: form,
technical goodness, detail, complexity and overall organisation'. Teachers were also advised:
'Though it is certainly possible to give similar ratings on all three categories ('overall impression', 'creativity'
and 'craftsmanship') do not allow how you rate on one scale to necessarily affect how you rate the
composition on the others. Keep the ratings for each category separate as you listen to the compositions'.
Results
Participants' self-evaluations
T-tests were carried out to compare difference means at 'Time 1' and 'Time 2'; the means are displayed in Table 1.
Table 1. Mean scores (and standard deviations) of difference means for participants with and without FIMT
when compared to 'others' with, and separately, without FIMT at 'Time 1' and 'Time 2'

                    'Time 1'                          'Time 2'
                    vs. with FIMT   vs. without FIMT  vs. with FIMT   vs. without FIMT
Without FIMT (23)   4.09 (1.76)     0.61 (1.80)       1.48 (1.34)     -0.044 (1.02)
'Time 1'
Participants with and without FIMT rated themselves worse at composing than 'others' with FIMT but
participants with FIMT had significantly lower difference means than participants without FIMT (t = 4.95
(46), p<.001).
Participants with FIMT rated themselves better at composing than 'others' without FIMT, while participants
without FIMT rated themselves worse at composing than 'others' without FIMT; the difference between these
ratings was significant (t = 2.64 (46), p <.05).
'Time 2'
Participants with and without FIMT rated their compositions worse than the compositions of 'others' with
FIMT but participants with FIMT had significantly lower difference means than participants without FIMT (t
= 2.15 (46), p<.05).
Participants with and without FIMT rated their compositions better than the compositions of 'others' without
FIMT but participants with FIMT had higher difference means than participants without FIMT, although this
result failed to reach significance (t = 1.87 (46), p =.066).
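The group comparisons reported above can be reproduced in outline with an independent-samples t-test. The data below are fabricated stand-ins for the two groups' difference means, so only the procedure, not the numbers, mirrors the study:

```python
# Hypothetical sketch: comparing difference means between the FIMT and
# non-FIMT groups with an independent-samples t-test, as in the analyses
# reported above. The values are invented for illustration.
from scipy import stats

fimt_diffs = [1.2, 0.8, 1.5, 2.0, 0.5, 1.1, 1.7, 0.9]       # illustrative
non_fimt_diffs = [3.8, 4.2, 3.5, 4.9, 4.1, 3.9, 4.4, 4.6]   # illustrative

t, p = stats.ttest_ind(fimt_diffs, non_fimt_diffs)
df = len(fimt_diffs) + len(non_fimt_diffs) - 2   # pooled degrees of freedom
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```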
Chi-square analyses were carried out to investigate these findings further. Participants were cross-classified
according to whether or not they had prior experience of FIMT and, for 'Time 1', whether they rated their
ability to compose pieces of music the same as/better than, or worse than, 'others' with, and separately without,
FIMT; for 'Time 2', the cross-classification used whether they rated how good their composition sounded the
same as/better than, or worse than, 'others' with, and separately without, FIMT. The results are summarised in Table 2.
Table 2. Number (and percentage) of participants with and without FIMT according to whether they rated
their ability to compose pieces of music same/better than or worse than 'others' with and separately without
FIMT at 'Time 1' and whether they rated how good their composition sounded the same/better than or worse
than 'others' with and separately without FIMT at 'Time 2'.
                   'Time 1'                                        'Time 2'
                   vs. with FIMT          vs. without FIMT         vs. with FIMT          vs. without FIMT
                   same/better  worse     same/better  worse       same/better  worse     same/better  worse
With FIMT (25)     3 (12.0)     22 (88.0)  20 (80.0)    5 (20.0)   13 (52.0)    12 (48.0)  21 (84.0)    4 (16.0)
Without FIMT (23)  1 (4.3)      22 (95.7)  13 (56.5)   10 (43.5)    5 (21.7)    18 (78.3)  16 (69.6)    7 (30.4)
At Time 1 most participants with FIMT rated themselves 'worse' at composing than 'others' with FIMT but the
same or better at composing than 'others' without FIMT. At Time 2 participants with FIMT were more evenly
divided when comparing their completed compositions with 'others' with FIMT but there was little change
when comparing their compositions with 'others' without FIMT.
At Time 1 most participants without FIMT rated themselves 'worse' at composing than 'others' with FIMT but
were more evenly divided when comparing their completed compositions with 'others' without FIMT. At
Time 2 more participants without FIMT considered themselves the same or better when comparing their
completed compositions with 'others' with FIMT than at Time 1. This trend was repeated when comparing
their completed compositions to 'others' without FIMT.
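A chi-square analysis of this kind of cross-classification can be sketched as follows, using the 'Time 1' versus-'others'-without-FIMT counts from Table 2; scipy stands in for whatever statistics package the authors used:

```python
# Illustrative chi-square test of the kind described above: participants
# cross-classified by FIMT experience and by same/better vs. worse
# self-rating (counts taken from Table 2, Time 1 vs. 'others' without FIMT).
from scipy.stats import chi2_contingency

#               same/better  worse
table = [[20, 5],    # with FIMT (n = 25)
         [13, 10]]   # without FIMT (n = 23)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```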
To investigate these changes from Time 1 to Time 2, the difference means were entered as the
dependent measures in a repeated measures ANOVA with two between-subjects factors (FIMT/NON-FIMT
and gender) and two within-subject factors (Time 1/Time 2 and 'others' with/without FIMT). The results are
summarised in Table 3.
Table 3.
There were significant main effects for Time 1/ Time 2 (F=28.95, p< .001) and 'others' with/without FIMT
(F= 189.84, p< .001). There were significant interactions between Time x 'others' with/without FIMT (F=
27.77, p< .001) and Time x FIMT (F= 4.55, p<.05). There were no significant main effects or interactions for
gender.
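A simplified sketch of the repeated-measures part of this design is below, using statsmodels' AnovaRM on fabricated data. AnovaRM handles only the within-subject factors (Time and comparison target); the between-subject factors (FIMT group and gender) would require a mixed-design analysis, which is omitted here:

```python
# Sketch of the within-subject part of the repeated measures ANOVA described
# above: difference means as the dependent variable, Time (T1/T2) and the
# comparison target ('others' with/without FIMT) as within-subject factors.
# All data are fabricated; effect sizes are chosen arbitrarily.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(12):
    for time in ("T1", "T2"):
        for target in ("with_FIMT", "without_FIMT"):
            base = 2.0 if target == "with_FIMT" else -0.5   # target effect
            shift = -0.8 if time == "T2" else 0.0           # time effect
            rows.append({"subject": subj, "time": time, "target": target,
                         "diff": base + shift + rng.normal(0, 0.3)})

df = pd.DataFrame(rows)
res = AnovaRM(df, depvar="diff", subject="subject",
              within=["time", "target"]).fit()
print(res)
```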
Teacher evaluations of compositions
Mean scores were calculated for each teacher on each category (see Table 4).
Table 4. Mean scores (and standard deviations) by teachers for 'overall impression', 'creativity' and
'craftsmanship'
An examination of the correlation matrix for each teacher on each of the three categories ('overall impression',
'creativity' and 'craftsmanship') revealed that for all four teachers the three categories were highly correlated,
indicating that they were not differentiating between the categories. It was therefore decided to create an
'overall rating' for each teacher by calculating a composite mean of the 'overall impression', 'creativity' and
'craftsmanship' ratings. An examination of the full correlation matrix for this 'overall rating' revealed that the
scores for three of the teachers were significantly
correlated. Since no significant correlation was found between the ratings of one of the four teachers this
teacher's rating was omitted from the analysis. Further examination of this teacher's ratings may reveal some
interesting differences in approach, however for the purposes of the present study, a consensus of agreement
between the other three teachers suggested more reliable evaluations (see Table 5).
Table 5. Correlation between teachers for 'overall rating'
T1    T2    T3
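The reliability check described above (correlating raters' overall ratings, then dropping any rater who does not agree with the rest) can be sketched like this; the ratings and the outlier rater T4 are invented for illustration:

```python
# Sketch of the inter-rater reliability check described above: correlate
# teachers' 'overall ratings', then drop any rater whose ratings do not
# correlate with the others. All ratings here are invented.
import pandas as pd

ratings = pd.DataFrame({
    "T1": [6, 5, 3, 7, 4, 2, 5, 6],
    "T2": [5, 5, 2, 6, 4, 3, 5, 7],
    "T3": [6, 4, 3, 7, 5, 2, 4, 6],
    "T4": [3, 6, 5, 2, 6, 5, 3, 2],   # hypothetical outlier rater
})

corr = ratings.corr()
print(corr.round(2))

# Composite 'overall rating': mean across the raters who agree (T1-T3).
consensus = ratings[["T1", "T2", "T3"]].mean(axis=1)
print(consensus.round(2).tolist())
```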
increase in participants' overall levels of self-confidence as a result of engaging with the composition task.
Ratings at Time 2 were made in relation to the composition itself rather than speculation about composition
ability. Results also revealed participants with and without FIMT awarded lower ratings to their completed
compositions when compared to 'others' with FIMT but participants without FIMT awarded significantly
lower ratings than participants with FIMT. This lends support to a previous study which found children
without FIMT were more likely to rate their own compositions lower than children with FIMT (Seddon and
O'Neill, 1999). However, when comparing their compositions to 'others' without FIMT participants with and
without FIMT awarded higher ratings to their own compositions. When interpreting these results it is
important to note that the results of the teachers' evaluations found no significant differences between the
compositions based upon prior experience of FIMT. This means that either the adolescents are employing
different evaluation criteria than the teachers or their levels of self-confidence are influencing the accuracy of
their self-evaluations.
Implications
If self-assessment of adolescent computer-based composition is to be employed, issues of self-confidence in
relation to prior experience of FIMT need to be addressed to improve the accuracy of these measures. Based
upon the evidence of this study, it seems likely that self-confidence in computer-based composition
(regardless of prior experience of FIMT) will increase as a result of engaging with the process. Adolescents
should be encouraged to make self-evaluations of their compositions based upon the composition itself rather
than being influenced by their self-confidence in relation to their prior experience of FIMT.
References
Amabile, T. M., (1979). Effects of external evaluation on artistic creativity. Journal of Personality and Social
Psychology, 37(2), 221-233.
Covington, M.V., and Omelich, C.L. (1979). Effort: The double-edged sword in school achievement. Journal
of Educational Psychology, 71, 2, 169-182.
Daignault, L. (1996). A study of children's creative musical thinking within the context of a
computer-supported improvisational approach to composition. Unpublished doctoral dissertation. Chicago,
U.S.A.: Northwestern University.
Diener, C. L. and Dweck, C. S. (1980). An analysis of learned helplessness: II. The processing of success.
Journal of Personality and Social Psychology, 39, 940-952.
Folkestad, G. (1998). Musical learning as cultural practice: as exemplified in computer-based creative
music-making. In B. Sundin, G.E. McPherson, and G.
Folkestad (Eds.), Children composing: research in music education (pp. 97-134). Malmo Academy of Music:
Lund University.
Hickey, M. (1998). Consensual assessment of children's musical compositions. Submitted to Creativity
Research Journal January 20, 1998.
Scripp, L., Meyaard, J., and Davidson, L. (1988). Discerning musical development: Using computers to
discover what we know. Journal of Aesthetic Education, 22 (1), 75-88.
Seddon, F.A., & O'Neill, S.A. (1999). An evaluation study of computer-based compositions by children with
and without prior experience of formal instrumental music tuition. Accepted for publication Psychology of
Music January 1999.
Vispoel, W. P. and Austin, J. R. (1993). Constructive response to failure in music: The role of attribution
feedback and classroom goal structure. British Journal of Educational Psychology, 63, 110-129.
Vispoel, W. P. and Austin, J. R. (1998). How American adolescents interpret success and failure in classroom
music: relationships among attributional beliefs, self concept and achievement. Psychology of Music, 26, 1,
26-45.
Proceedings paper
1. Introduction
When listeners hear a musical stimulus, they immediately orient themselves in the sound and use surface
cues to make musical judgments, such as "this is by Bach" or "I hate this kind of music." This
orientation process is apparently pre-conscious, relating to basic auditory organization rather than to
high-level cognitive musical abilities. These musical judgments may be immanent or may lead to overt
acts such as foot-tapping, speech acts, musical gestures (such as vocalization) or other observable
behaviors.
Naturalistic real-world settings exist that provide opportunities to see these behaviors in action. Perhaps
the most significant is scanning the radio dial. A preliminary report on scanning-the-dial behavior and
its implications was recently presented by Perrott and Gjerdingen. They found that college students were
able to accurately judge the genre of a piece of music (about 50% correct in a ten-way forced choice
paradigm) after listening to only 250-ms samples. The kind of musical information that is available after
only 250 ms is quite different than the kind of information that is treated in the traditional sort of
music-psychology experiment (notes, chords, and melodies).
Immediate music-listening behaviors like these are fundamentally inexplicable with present models of
music perception. It is not at all clear what sort of cognitive structures might be built that could support
this sort of decision-making. The stimuli are too short to contain melodies, harmonic rhythms, or much
hierarchical structure. On the other hand, the spectral content, in many styles of music, is not at all
stationary even within this short duration. Thus, it seems quite possible that listeners are using dynamic
cues in the short-time spectrum at least in part to make these judgments. This sort of description makes
genre seem very much like timbre classification. Such a viewpoint is in concert with the writing of many
modern-day composers on the relationship between timbre and orchestration.
We define the musical surface to be the set of representations and processes that result from immediate,
preconscious, perceptual organization of an acoustic musical stimulus and that enable a behavioral
response. There are then three questions that immediately concern us. First, what sorts of representations
and processes are these? Second, what sorts of behaviors do they afford the human listener? Third, what
is the interaction between the representations and the processes as the listening evolves in time?
In this paper, we present exploratory experimental and computer-modeling research that investigates the
role of perceived complexity in the musical surface.
2. Listening Experiment
As part of a larger project on the perception and modeling of immediate music-listening behavior, we
conducted an experiment dealing with the human perception of musical complexity directly (along with a
number of other perceptual attributes that will not be reported here). We define this perceptual feature to
be the sense of how much is going on. It is the scale on which listeners can rate sounds along a range
from simple to complicated. This experiment was investigatory in nature and was not designed to test
any hypotheses in particular.
Overview of procedure
Thirty musically trained and untrained subjects listened to two five-second excerpts taken from each of
75 pieces of music. The subjects used a computer interface to listen to the stimuli and make judgments
about them. Among the judgments elicited was the subjects' sense of the music as simple or complex.
1. Subjects
The subjects were drawn from the MIT community, recruited with posts on electronic and
physical bulletin boards. Most (67%) were between 18 and 23 years of age, the rest ranged from
25 to 72 years. The median age was 21 years. Of the 30 subjects, 10 were male and 20 were
female, although there were no gender-based differences hypothesized in this experiment. All but
four subjects reported normal hearing. Twenty-two reported that they were native speakers of English, and
six reported that they were not.
Nine subjects reported that they had absolute-pitch (AP) ability in response to the question "As far as
you know, do you have perfect pitch?" No attempt was made to evaluate this ability, and it is not
clear that all respondents understood the question. However, as reported below, there were small
but significant differences on the experimental tasks between those who claimed AP and those
who did not. The subjects had no consistent previous experience with musical or psychoacoustic
listening tasks.
After completing the listening task, subjects were given a questionnaire regarding their musical
background, and thereby classified into three groups: M0 (nonmusicians, N = 12), M1 (some
musical training, N = 15) and M2 (experienced musicians, N = 3). No formal tests of audiology or
musical competence were administered.
Breakdowns of musical ability by age and by gender are shown in Table 1. Note that the
experiment was not counterbalanced properly for the evaluation of consistent demographic
differences.
       Male   Female   18-23   Older (three groups)
M0      1      11        9       0     2     1
M1      8       7        9       5     0     1
M2      1       2        2       0     1     0
2. Materials
The experimental stimuli were 5-second segments of real, natural music. Two non-overlapping
segments were selected at random from each of 75 musical compositions. The 75 source
compositions were selected by randomly sampling the Internet music site MP3.com, which hosts
a wide variety of musical performances in all musical styles by amateur and professional
musicians. Samples were mixed down to mono by averaging the left and right channels,
resampled to 24000 Hz, and amplitude-scaled such that the most powerful frame in the 5-second
segment had power 10 dB below the full-power digital DC. The music was not otherwise
manipulated or simplified. The stimulus set contains jazz, classical, easy-listening, country, and a
variety of types of rock-and-roll music.
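The signal preparation described above can be sketched as follows. The frame length is an assumption (the paper does not state one), and resampling to 24000 Hz is left to a resampling library:

```python
# Sketch of the stimulus preparation described above: mix to mono by
# averaging channels, then scale amplitude so the most powerful frame sits
# 10 dB below full power. The frame length is an assumed parameter, and
# resampling to 24000 Hz is omitted here.
import numpy as np

def prepare_stimulus(stereo, frame_len=1024):
    """stereo: float array of shape (n_samples, 2), full scale = 1.0."""
    mono = stereo.mean(axis=1)                      # average L and R channels
    n_frames = len(mono) // frame_len
    frames = mono[:n_frames * frame_len].reshape(n_frames, frame_len)
    peak_power = (frames ** 2).mean(axis=1).max()   # most powerful frame
    target_power = 10 ** (-10 / 10)                 # 10 dB below full power
    return mono * np.sqrt(target_power / peak_power)

rng = np.random.default_rng(0)
sig = prepare_stimulus(rng.normal(0, 0.1, size=(24000 * 5, 2)))
```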
It is worthwhile to explore the implications of this method of selecting experimental materials.
MP3.com is presently the largest music web site on the Internet, containing about 400,000
freely-available songs by 30,000 different performing ensembles. Using materials from such a site
enables studies to more accurately reflect societal uses of music than does selecting materials from
personal music collections. The materials are certainly more weighted toward rock-and-roll and
less toward music in the "Western classical" style than is typical in music-psychology
experiments. However, this weighting is only a reflection of the fact that the listening population
is more interested in rock-and-roll than it is in "Western classical" music.
A second advantage of selecting music this way is that scientific principles may be used to choose
the particular materials. In this case, since the set to be studied is a random sample of all the music
on MP3.com, it follows from the sampling principle that the results we will show below are
applicable to all of the music on MP3.com (within the limit of sampling variance, which is still
large for such a small subset). This would not be the case if we simply selected pieces from a
more limited collection to satisfy our own curiosity (or the demands of music theorists).
3. Detailed procedure
Subjects were seated in front of a computer terminal that presented the listening interface, as
shown in Figure 1. The interface presented six sliders, each eliciting a different semantic judgment
from the listener. The scales were labeled simple-complex, slow-fast, loud-soft,
interesting-boring, and enjoyable-annoying (only the first will be directly discussed here). The
subject was instructed that his task was to listen to short musical excerpts and report his judgments
about them. Three practice trials were used to familiarize the subject with the experimental
procedure and to set the amplification at a comfortable listening level. The listening level was
allowed to vary between subjects, but was held fixed for all experimental trials for a single
subject.
Figure 1
Each of the 150 stimuli (75 musical excerpts x 2 stimuli/excerpt) was presented in a random
order, different for each subject. When the subject clicked on the Play button, the current stimulus
was presented. After the music completed, the subject moved the sliders as he felt appropriate to
rate the qualities of the stimulus. The subject was allowed to freely replay the stimulus as many
times as desired, and to make ratings in any order after any number of playings. When the subject
felt that the current settings of the rating sliders reflected his perceptions accurately, he clicked the
Next button to go on to the next trial. The sliders were recentered for each trial.
The subjects were encouraged to proceed at their own pace, taking breaks whenever necessary. A
typical subject took about 45 minutes to complete the listening task.
4. Dependent measures
For each trial, the final setting of the simple-complex slider was recorded to a computer file. The
computer interface produced a value from 0 (the bottom of the slider) to 100 (the top) for this
rating on each trial. Any trial on which the slider was not moved at all (that is, for which the slider
value was 50) was rejected and treated as missing data for that stimulus. Approximately 5.2% of
the ratings were rejected on this basis.
The response variables were shifted to zero-mean and scaled by a cube-root function to improve
the normality of distribution. After this transformation, the responses (labeled SIMPLE for
brevity) lie on a continuous scale in the interval [-3.68, +3.68] and are bimodally distributed, with
modes at about ± 2.5. Two additional dependent variables were derived. The SIGN variable
indicates only whether the response was above or below the center of the scale; it is a binary
variable. The OFFSET variable indicates the magnitude of response deviation from the center of
the scale on each trial, without regard to direction. It is calculated by collapsing the two lobes of the
bimodal response distribution.
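A sketch of this response coding is below. The exact transform is inferred from the reported range: a signed cube root of the centered slider value gives endpoints of plus or minus 50^(1/3), roughly 3.68, matching the interval quoted above:

```python
# Sketch of the response transformation described above. The constants are
# inferred: centering the 0-100 slider at 50 and applying a signed cube
# root yields the reported range [-3.68, +3.68], since 50**(1/3) ~ 3.68.
import numpy as np

def transform_response(raw):
    """raw: slider value in [0, 100]; returns (SIMPLE, SIGN, OFFSET)."""
    if raw == 50:
        return None                      # unmoved slider -> missing data
    centered = raw - 50                  # shift to the center of the scale
    simple = np.sign(centered) * abs(centered) ** (1 / 3)
    sign = simple > 0                    # binary: above/below scale center
    offset = abs(simple)                 # magnitude, ignoring direction
    return simple, sign, offset
```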
Several analyses of variance were conducted to explore the relationship between subject demographics
and the rated judgments of complexity. Results are summarized in Table 2. In each case, the dependent
variable was OFFSET, calculated by collapsing the two lobes of the bimodal response distribution, since
the main judgment was not normally distributed. OFFSET measures the degree to which subjects use the
ends of the scale relative to the center. Rejecting the null hypothesis in an analysis of variance of
OFFSET (that there is no effect of the subject condition) is a sufficient condition to reject the null
hypothesis for the main variable, SIMPLE.
As seen in the table, each of the demographic variables had a significant effect on the subject ratings.
The first two effects, based on subject and stimulus number, were expected: some subjects consistently find
all stimuli to be more complex than do other subjects, and some stimuli are rated more complex by all
subjects than others. The rest of the effects were unexpected and difficult
to interpret. The means and 95% confidence intervals of OFFSET broken down by each of these
independent variables are plotted in Figure 2.
Independent variable    df    F    p
Figure 2
Experienced musicians (M2 subjects) used the ends of the scale slightly more than other subjects.
Subjects claiming absolute pitch used the ends of the scale slightly more. Subjects whose native
language was not English, female subjects, and older subjects also used the ends of the scale more.
Without many more subjects to fill out a complete multidimensional ANOVA, it is difficult to interpret
these small but significant differences. One possibility is that the independent variables shown here are
actually covariates of some unmeasured demographic variable that is more fundamental, perhaps
corresponding to social cohort. Small but consistent effects of subject demographics similar to these
have been measured in previous research on loudness judgments of natural music examples by Fucci et al.
3. Computational modeling
In parallel to the experimental research, we developed a psychoacoustic model that incorporates
submodels of tempo and rhythm perception, auditory scene analysis, and the extraction of sensory
features from musical stimuli. The auditory model is implemented as a set of signal-processing computer
programs. It operates directly on the acoustic signal, not from symbolic models of stimuli, and so can be
used to study naturalistic samples of music taken from compact discs or other acoustic sources.
1. Modeling technique
The psychoacoustic model extracted 16 features from each of the 150 musical excerpts. Brief
descriptions of the features are shown in Table 3. Scheirer provided more details on these features
and how they are extracted from musical signals. Note that there are no features that relate to the
cognitive structure of the musical signal. All of the features deal with sensory aspects of the
musical sound such as loudness, pitch, tempo, and auditory scene analysis.
Name    Meaning
The features were entered in a multiple-regression procedure, where they were used to predict the
mean complexity ratings for each stimulus that were collected in the experiment of Section 2.
(Even though the individual ratings were bimodally distributed, the mean stimulus-by-stimulus
ratings across all subjects were normally distributed, and so can be modeled with linear
regression). Two kinds of multiple regressions were computed. The first entered all features at
once, to determine how much of the mean complexity could be explained with this psychoacoustic
model. The second entered the features one-at-a-time in a stepwise regression procedure, to see
which features are most useful for explaining the primary degrees of freedom of the complexity
judgments.
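The two regression procedures can be illustrated with synthetic data. The features and ratings below are random stand-ins, and the least-squares helper is our own rather than anything from the original study:

```python
# Illustrative version of the two analyses described above: a regression
# entering all 16 features at once, then forward stepwise selection.
# Features and ratings are random stand-ins for the real data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 16))                  # 16 psychoacoustic features
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=150)   # synthetic mean ratings

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid.var() / y.var()

print(f"all features entered: R^2 = {r_squared(X, y):.3f}")

# Forward stepwise: repeatedly add the feature that most improves R^2.
selected, remaining = [], list(range(16))
for _ in range(5):
    best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
    selected.append(best)
    remaining.remove(best)
    print(f"entered feature {best}: R^2 = {r_squared(X[:, selected], y):.3f}")
```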
2. Modeling results
The first model, in which all features were entered, was strongly significant, with R = 0.536 (p <
0.001). Thus, compared to the correlations with the counterpart stimuli calculated in Section 2.6,
the psychoacoustic model explains slightly more of the variance in the ratings (R2 = 0.294 for the
psychoacoustic model, compared with R2 = 0.250 for the counterpart ratings).
Figure 3
Further, when the psychoacoustic features and the counterpart ratings were included in a single
regression, the combined R2 value was 0.448. This is remarkably close to the result
(0.294 + 0.250 = 0.544) that would be obtained if the covariance explained by the counterpart
ratings were precisely orthogonal to that explained by the psychoacoustic model. This finding is
compatible with the hypothesis that the sources of complexity shared between each stimulus and
its counterpart are primarily cognitive (musical style, genre, use of lyrics) while the sources of
complexity captured in the psychoacoustic model are primarily sensory.
The second model, in which the psychoacoustic features were entered in a stepwise regression,
was strongly significant at every step, as shown in Table 4. The +/- signs on each feature in Table
4 indicate the direction of the partial correlation of that feature with the residual at that stage of the
stepwise regression (recall that larger values for SIMPLE indicate simpler stimuli). In total, five
features are entered in the stepwise model. Two of these are features that relate to the
auditory-scene-analysis of the signal (CHANCOH and VARIM) and two are features that relate to
the tempo and beat structure of the signal (VARIBI and BESTT). In some cases, the sign of the
partial correlation seems counterintuitive. For example, the negative VARIM partial correlation
indicates that, once the effects of CHANCOH are accounted for, stimuli are simpler when they
have a more-frequently changing number of auditory streams. However, since in each stage of the
stepwise regression, only the residual from the previous stage is being explained, it is impossible
to interpret the role of the later features without a more-detailed analysis of the feature covariance.
The most important conclusion is that a model based on only five psychoacoustic features can
explain nearly 20% of the variance in mean ratings of stimulus complexity.
3. Individual differences
The results in the previous section indicate only that the overall mean ratings can be predicted with
psychoacoustic models. It is also useful to explore individual within-subject ratings to examine whether
they, too, can be predicted with such a model. Since the individual ratings are not normally distributed, a
linear regression model is not appropriate. Rather, we converted the ratings into a binary response
variable (above center/below center) and used logistic regression to model this variable, called SIGN.
We computed 30 separate logistic regressions, one for each subject, using the 16 psychoacoustic features
to predict SIGN. That is, the logistic regression for a subject tries to predict whether the subject gave a
response above center, or below center, for each stimulus.
In the grand average, 73.6% of the responses were correctly predicted (50% is the chance level). There is
a clear advantage to using separate models for each subject. If a single model is used to predict the
responses of all subjects, only 58.6% of the responses can be predicted correctly.
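Per-subject logistic modeling of this kind can be sketched as follows, with synthetic features and responses, and with scikit-learn standing in for the authors' statistics package:

```python
# Sketch of the per-subject logistic-regression modeling described above:
# predict each subject's binary SIGN response from the psychoacoustic
# features. Data are synthetic; subject-specific weights stand in for
# individual listening strategies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 16))                 # features for 150 stimuli

accuracies = []
for subject in range(5):                       # the study had 30 subjects
    w = rng.normal(size=16)                    # subject-specific weighting
    sign = (X @ w + rng.normal(size=150)) > 0  # that subject's SIGN responses
    model = LogisticRegression(max_iter=1000).fit(X, sign)
    accuracies.append(model.score(X, sign))    # in-sample prediction rate

print(f"mean proportion predicted: {np.mean(accuracies):.3f}")
```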
Of the 30 subjects, the responses of 14 of them (46.7%) could be modeled significantly well to the p <
0.05 level. The other 16 subjects could not be modeled in this fashion. For some of the nonmodeled
subjects, the difficulty was that the responses given by that subject were so heavily weighted to one side
of the complexity scale that the constant term in the model explains nearly all of the log-probability,
leaving no residual for the predictors. For example, for subject #30, more than 80% of his/her responses
were correctly predicted by the model, yet this performance can be expected reasonably often by chance
(p = 0.18). Such results indicate that a larger set and even broader range of stimuli is required to evaluate
these models more carefully.
There were no significant effects of the demographic variables on the proportion of responses that could
be predicted. The null hypotheses that musicians' responses are as easy as nonmusicians' to predict,
males' as easy as females', and so forth, cannot be rejected with this testing methodology.
Thirty independent stepwise logistic-regression analyses were also computed, to examine the various
features that helped to predict the different subjects' ratings. In these analyses, since the number of
predictors and thus the degrees of freedom are fewer, more of the analyses reach significance. Twenty-five of
the 30 subjects (83.3%) had their responses predicted significantly well with a logistic model containing
between one and five predictors. The most frequently-entered features were MEANMOD, entered in 8
of the 25 models; TEMPENT and VARIM, entered in 7 of the models; and BESTT, entered in 6 of the
models. All of the features except DYNRNG were entered in at least one model.
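A forward-stepwise procedure of this kind can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the plain gradient-ascent fit, the likelihood-ratio entry criterion, and the feature values are all assumptions; only the feature names (MEANMOD, TEMPENT, DYNRNG) come from the text.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Fit a logistic model (intercept + weights) by plain gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def forward_stepwise(feats, y, max_terms=5, min_gain=1.92):
    """Greedily enter the feature that most improves the log-likelihood.

    min_gain = 1.92 mimics a p < .05 likelihood-ratio entry criterion
    (2 * delta-LL > 3.84); this is an assumption about the procedure.
    """
    def matrix(names):
        return [[feats[n][i] for n in names] for i in range(len(y))]
    entered, remaining = [], set(feats)
    ll_cur = log_likelihood(fit_logistic(matrix([]), y), matrix([]), y)
    while remaining and len(entered) < max_terms:
        best, best_ll = None, ll_cur + min_gain
        for name in sorted(remaining):
            X = matrix(entered + [name])
            ll = log_likelihood(fit_logistic(X, y), X, y)
            if ll > best_ll:
                best, best_ll = name, ll
        if best is None:
            break
        entered.append(best)
        remaining.discard(best)
        ll_cur = best_ll
    return entered

# One synthetic subject whose "above center" responses follow MEANMOD.
random.seed(1)
n = 60
feats = {"MEANMOD": [random.gauss(0, 1) for _ in range(n)],
         "TEMPENT": [random.gauss(0, 1) for _ in range(n)],
         "DYNRNG":  [random.gauss(0, 1) for _ in range(n)]}
y = [1 if 2.0 * m + random.gauss(0, 0.5) > 0 else 0 for m in feats["MEANMOD"]]
print(forward_stepwise(feats, y))
```

On data of this sort the driving feature is entered first, and features that add no likelihood beyond the entry criterion are left out.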
If many more subjects had been used in the study, it might be possible to divide them into groups based
on the features that predict their responses. But this is difficult when the features number more than half
the subjects as they do here. As one example of this sort of analysis, we divided the subjects into two
groups. The first group consisted of those subjects (N = 12) for whom MEANMOD or CHANCOH were
entered as predictors in the stepwise regression (these two features are strongly correlated, r = .270). The
second group consisted of the rest (N = 18). Using these two groups, we determined how many of the
intersubject correlations in rating patterns were significant, as was done for the whole subject pool in
Section 2.6. 62.1% of the 66 intersubject correlations in the first group and 49.0% of the 153 intersubject
correlations in the second group were significant at p < 0.05 or better. Thus both groups seem to be more
homogeneous, according to this metric, than the subject pool as a whole.
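This homogeneity metric, the fraction of subject pairs whose rating profiles correlate significantly, can be sketched as follows. The synthetic data and the use of Fisher's z-transformation for the significance test are illustrative assumptions; the paper does not state which test it applied.

```python
import math
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two rating profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def frac_significant(ratings, z_crit=1.96):
    """Fraction of subject pairs whose profiles correlate significantly.

    Significance is judged with Fisher's z-transformation (a normal
    approximation), an assumption about how significance was assessed.
    """
    sig = total = 0
    for sa, sb in combinations(ratings, 2):
        r = max(min(pearson(sa, sb), 0.999999), -0.999999)
        z = math.atanh(r) * math.sqrt(len(sa) - 3)
        total += 1
        sig += abs(z) > z_crit
    return sig / total

# A homogeneous group (shared pattern plus noise) vs. a heterogeneous one.
random.seed(0)
base = [random.gauss(0, 1) for _ in range(20)]            # 20 stimuli
homogeneous = [[b + random.gauss(0, 0.5) for b in base] for _ in range(8)]
heterogeneous = [[random.gauss(0, 1) for _ in range(20)] for _ in range(8)]
print(frac_significant(homogeneous), frac_significant(heterogeneous))
```

A group sharing a common rating strategy yields a much higher fraction of significant pairwise correlations than a group of unrelated raters.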
This argument by itself is not conclusive, as it is somewhat circular (second-order statistics are used to
identify subjects to put into groups, who are then found with related second-order statistics to have
something in common). However, it indicates a method that a larger study might use to identify
groups of subjects who share common strategies for making complexity judgments. This is a first step
towards a broader study of individual differences in listening behavior.
4. Discussion
Let us return to the concepts put forth in the introduction. We assume for the moment that there is a stage of
perceptual processing that can reasonably be called the musical surface. How could we determine whether a
particular feature of music (complexity, in this case) is a surface feature, and whether a particular judgment or
behavior is based partly, mostly, wholly, or not at all on the surface features of music?
Of course we do not mean to argue that only surface information is used for making musical judgments.
Surely, low-level surface information and high-level cognitive information interact in complicated ways in any
music-perception situation. However, most previous research on music perception has focused exclusively on cognitive
cues, such as tonal constraints, melody construction and identification, and other structural aspects of music.
This approach limits both the styles of music that can be addressed (since the overwhelming majority of
cognitive-structural hypotheses about music perception narrowly target Western classical music) and the
explanatory power of the models. It is difficult to see how theories of music perception could ever relate to the
acoustic signal when the basic theoretical elements are so distinct from the sensory aspects of hearing.
The modeling of cognitive aspects of music perception must be considered in relationship to the sensory
modeling results that we have presented. The statistical results shown here demonstrate that significant
proportions, more than a quarter, of the variance in human judgments of complexity can be explained without
recourse to cognitive models. In other words, we have demonstrated that a sensory model suffices to explain a
significant proportion of the variance in this judgment. The only explanatory space left to cognitive models
remains in the residual.
The independence of the variance in judgments explained by the counterpart ratings and that explained by the
psychoacoustic model allows us to formulate a coherent hypothesis regarding these two factors: namely, that
the variance explained by the counterpart ratings is primarily due to cognitive or structural similarities and
differences among a set of stimuli, while the variance explained by the psychoacoustic model is primarily due to
sensory similarities and differences. One test for this hypothesis would be to control the length of the stimuli
used in the listening task, as was done by Perrott and Gjerdingen in their scanning-the-dial experiment. If the
hypothesis is correct, as stimuli become very short, the counterpart ratings should be able to explain relatively
less variance than the psychoacoustic model, because there will be little basis for examining structural
similarities and differences among the stimuli. In contrast, as the stimuli become longer, the counterpart ratings
should be able to explain relatively more variance, as the structural properties of the music become more
important for mentally summarizing it for comparison.
A pressing question regarding experimental judgments of the sort we have reported here is that of individual
differences. Although the intersubject variance in this task was small enough that experimental effects could be
observed, it still seems large relative to the ratings being made. It is obviously inadequate to divide listeners so
crudely into categories by their musical backgrounds.
Considering again the modeling results from Section 3, we can formulate several hypotheses regarding
individual differences. The question at hand is what sorts of differences there are among listeners. We
distinguish three hypotheses targeting only the sensory aspects of musical hearing (although we do not mean to
claim that this list is exhaustive):
H1. There are no important differences among listeners. Different listeners use essentially the
same features weighted the same way to make judgments.
H2. Individual differences are based on different weights applied to a single feature set. Each
listener extracts the same auditory cues from sounds, and then these cues are combined with
different weights to form judgments.
H3. Individual differences are based on different features of sound. Different listeners extract
different cues and combine them in idiosyncratic ways to form judgments.
The present results are not compatible with hypothesis H1. If H1 were true, then a single regression model
would be as good a model for subjects' judgments as the individually-adapted models. But we found that
individual models could predict the subjects' judgments much more accurately than a single model.
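The pooled-versus-individual comparison behind this argument can be sketched as follows. A tiny perceptron stands in for the paper's regression models, and the simulated subjects follow hypothesis H2 (same features, different weights); all of this is illustrative, not the study's actual data or method.

```python
import random

def perceptron(X, y, epochs=200):
    """Tiny perceptron classifier: a stand-in for the regression models."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(a * b for a, b in zip(w[1:], xi))
            err = yi - (1 if z > 0 else 0)
            if err:
                w[0] += err
                for j, xj in enumerate(xi):
                    w[j + 1] += err * xj
    return w

def accuracy(w, X, y):
    hits = sum(((1 if w[0] + sum(a * b for a, b in zip(w[1:], xi)) > 0 else 0) == yi)
               for xi, yi in zip(X, y))
    return hits / len(y)

# Six simulated subjects judging the same 40 two-feature stimuli,
# each with different per-subject weights (an H2 world).
random.seed(2)
stimuli = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(40)]
subj_w = [[1, 2], [2, -1], [-1, 1], [1, 0], [0, 2], [-2, -1]]
data = [(stimuli, [1 if wa * x[0] + wb * x[1] > 0 else 0 for x in stimuli])
        for wa, wb in subj_w]

pooled_X = [x for X, _ in data for x in X]
pooled_y = [yi for _, ys in data for yi in ys]
wp = perceptron(pooled_X, pooled_y)
pooled_acc = sum(accuracy(wp, X, ys) for X, ys in data) / len(data)
indiv_acc = sum(accuracy(perceptron(X, ys), X, ys) for X, ys in data) / len(data)
print(round(pooled_acc, 2), round(indiv_acc, 2))
```

Because the simulated subjects weight the same cues differently, no single model can fit them all, and the individually fitted models predict markedly better, which is the pattern that argues against H1.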
We did not collect enough data in this experiment to distinguish H2 and H3. Although it is clear that different
stepwise models enter different features, a few of the features are entered very often, and the overall space of
features is really quite small. In the music-psychology literature, there seems to be almost no discussion of
different listening strategies that listeners might adopt, the reason that different listeners (even those with
similar musical experience) hear different things in music, or the perceptual and cognitive bases of musical
preference. These topics must be considered crucial if we wish to develop a coherent psychology of
music-listening behavior. Continuing evaluation of these hypotheses, and other hypotheses regarding
individual differences in listening, awaits future research.
References
Erickson, R. (1985). Sound Structure in Music. Berkeley, CA: University of California Press.
Fucci, D., Petrosino, L., & Banks, M. (1994). Effects of genre and listeners' preference on
magnitude-estimation scaling of rock music. Perceptual and Motor Skills, 78(3), 1235-1242.
Perrott, D., & Gjerdingen, R. O. (1999). Scanning the dial: An exploration of factors in the identification of
musical style. Paper presented at the Society for Music Perception & Cognition, Evanston, IL.
Scheirer, E. D. (1998). Tempo and beat analysis of acoustic musical signals. Journal of the Acoustical Society
of America, 103(1), 588-601.
Scheirer, E. D. (1999). Sound scene segmentation by dynamic detection of correlogram comodulation. Paper
presented at the International Joint Conference on AI Workshop on Computational Auditory Scene Analysis,
Stockholm.
Scheirer, E. D. (2000). Music-Listening Systems. Unpublished Ph.D. thesis, Massachusetts Institute of Technology,
Cambridge, MA.
Back to index
Proceedings abstract
Patrik.Juslin@psyk.uu.se
Background:
Studies of music performance have been conducted for a hundred years. This
research has yielded a large body of findings regarding different aspects of
performance. In particular, a lot of research has concerned a phenomenon
referred to as performance expression; that is, variations in timing, loudness,
timbre, and pitch that form the so-called microstructure of a performance. A
number of different approaches to performance expression have been advanced,
but few attempts have been made to relate the different approaches.
Aims:
Main contributions:
Implications:
The preliminary evaluation of the GERM model suggests that (a) different
sources of expression can be integrated into a common model, (b) the model may
contribute to our understanding of how different sources of expression
interact, and (c) different performers might be characterized in terms of their
relative weights regarding different sources of expression.
Proceedings. Keynote
Klaus R. Scherer
University of Geneva
Klaus.Scherer@pse.unige.ch
Proceedings paper
Sounds may possess symbolic meaning stemming from their association with the parts of the body to
which they are "tuned in" through resonance.
The unconscious is always considered as something that lies "under" consciousness and its center, the
ego. We can easily observe that lower sounds resonate with the chest or even the stomach, while
higher sounds resonate with the throat and one or another section of the cranium. Throughout the
Indo-European tradition, beginning with the Vedas, the chest (heart) is associated with the will and
the passions, the stomach with drives or desires (unconscious and relatively richer in
energy), and the head, together with some sapient division, with reason or the mind (conscious but
very often lacking energy).
Such observations, as well as the everyday experience of speech and music, lead us to formulate the
basic assumption of the present model: a rising tune corresponds to the ego's more intensive
demand for energy, while a falling tune corresponds to a demand of reduced intensity.
energy demand, instead of its relative decrease in the major. Correspondingly, 3 - 4 = -1.
Now we can easily feel the reason why, in particular in the 17th and 18th centuries, music was often
considered as closely related to mathematics. (Mozart himself said that music is joy that wants to
calculate itself).
The parallel to mathematics is obvious here: the relation that determines whether the key is major or
minor corresponds, so to say, to the second derivative from the function of the rising of the tune.
Examples
In the two aforementioned examples (Almayev N., in press) we considered low energy demand in
combination with the high level of available energy, and high energy demand in combination with low
level of available energy. One can refer to the famous "Ride of the Valkyries" theme by R. Wagner as an
example of an intensive energy demand combined with a large amount of available energy.
Part2. Quasi-experimental study
The main assumption of the proposed approach is that the meanings of both natural-language words and
music can be described in the same way, through models of meaning realization. Being
living-through processes, meaning realizations are subject to the intentional modifications
described by Husserl, to which the two functions of psychic-energy management were added.
If this can be considered true, then the task of the mutual elucidation of music and natural language
becomes a real one. We need natural-language words to qualify musical pieces; and music,
for its part, being a very well structured quantitative system, could help to elucidate how the
meanings of words are realized internally.
Therefore, first of all, the reliability of the correspondence between verbal qualifications and musical
pieces should be checked. Will subjects who share a common native language and belong to the
same culture, but who differ in general and musical education, show consistency
in their qualification of different musical pieces?
In order to answer this question we (with my student L. Elkhimova, whose graduation diploma at the
Moscow State Open University this work was part of) conducted an empirical study in 1999.
Method
Two experts in music (teachers at a musical school) were asked to select 6 melodies (three by
19th-century Romantics and three "contemporary" rock pieces) and to describe each of them with the
4 adjectives that would suit that composition best. The experts were also asked to keep in mind, if
possible, metaphors of taste.
As a result, 24 descriptors were obtained. Some of them evidently implied an estimation of tempo
("agile", "turbulent"); others more or less resembled traditional adjectives from Osgood's (1976)
semantic differential; and several were very specific, such as "aerated like soda water" and "a little
bit sweet" ("Land of Confusion"), or "strict" ("Du hast"). The correspondence of adjectives across
different languages is a separate topic, so I will not try to translate the whole list.
The experts were asked only to propose metaphoric definitions for each composition, not to compare
the compositions to each other according to the definitions already made.
Hypothesis
Different subjects of the same culture but of different educational and social backgrounds will in
general agree with the experts in describing the musical compositions with adjectives of their native
language (Russian).
H1. Subjects' categorizations of the different compositions will differ significantly and will coincide
with those of the experts.
H0. Subjects' categorizations of the different compositions will be random and will not coincide with
the experts' estimations.
Stimuli
The following 6 compositions were selected: "Flight of a bumble-bee" by N. Rimsky-Korsakov,
"Hungarian Dance" by J.Brahms, F.Kreisler's "The Torments of Love", Genesis "I can't dance",
Genesis "Land of Confusion", Rammstein "Du hast".
Subjects
Twenty subjects (10 male, 10 female), predominantly young (17-35 years), with different musical
preferences and educational backgrounds participated. Each of them received 6 response blanks, one
per composition, each containing all 24 descriptors. They were asked to rate the correspondence of
each composition to every descriptor on a scale from 1 ("fits very poorly") to 7 ("fits very well").
Results
An ANOVA with repeated measures was applied, with the scores on each descriptor as the dependent
variable and composition as the levels of the independent (grouping) variable. The distributions of
all the dependent variables were quite close to normal. Data for each descriptor were analysed
separately. The compositions differed significantly on all the descriptors. Nine cases can be
considered "definite hits": the experts' description achieved the highest score in the subjects'
ratings, and there was a significant difference between the leading composition and the next closest
one. In 7 more cases two compositions, one of which was the one predicted by the experts, shared
first place with no significant difference between them.
In the other 8 cases the composition predicted by the experts was either among three or more
leading compositions or differed significantly from the first one.
Nevertheless, in almost all cases the prototypical compositions were rated significantly above the
mean on the corresponding descriptor.
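The analysis can be sketched as a one-way repeated-measures F statistic computed directly from a subjects-by-compositions score table. The synthetic ratings below are an illustration, not the study's data, and the exact ANOVA details in the paper may differ.

```python
import random

def rm_anova_F(scores):
    """One-way repeated-measures ANOVA F for a subjects x conditions table."""
    n, k = len(scores), len(scores[0])            # subjects, compositions
    grand = sum(map(sum, scores)) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_tot - ss_cond - ss_subj           # residual after removing subjects
    return (ss_cond / (k - 1)) / (ss_err / ((k - 1) * (n - 1)))

# Synthetic 1-7 style ratings on one descriptor: 20 subjects x 6 compositions,
# with composition 0 rated about three points higher than the rest.
random.seed(3)
ratings = [[random.gauss(7 if j == 0 else 4, 1) for j in range(6)]
           for _ in range(20)]
print(round(rm_anova_F(ratings), 1))
```

A composition that clearly leads on a descriptor, as in the "definite hits" above, produces a large F against the within-subject error term.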
We also performed regression analyses with the scores on each descriptor as the dependent variable
and tempo as the independent one.
The results were very different for different descriptors. The fit was high for descriptors that
evidently presupposed an estimation of tempo, low for all the taste-based metaphors, and moderate for
the rest. The greatest dependency on tempo (R-sq. = 0.661, S-function) was found for the descriptor
"energetic".
Quadratic and cubic functions frequently served as the best approximations of the various "U"- and
"inverted-U"-shaped dependencies of some descriptors on tempo. For example, for the descriptor
"lucid" the best approximation was a cubic function with R-sq. = 0.381, while a linear approximation
explained only about 15% of the variance.
For all the taste-based metaphor descriptors, the R-sq. for the dependency on tempo was very small,
only a few per cent.
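The linear-versus-cubic comparison can be sketched with an ordinary least-squares polynomial fit, here on synthetic "inverted-U" data, since the study's raw scores are not reproduced. The fitting code and the shape of the synthetic dependency are assumptions for illustration.

```python
import random

def polyfit(xs, ys, deg):
    """Least-squares polynomial coefficients via the normal equations."""
    m = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):                      # Gaussian elimination with pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef

def r_squared(xs, ys, coef):
    mean = sum(ys) / len(ys)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    ss_res = sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
                 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot

# Synthetic "inverted-U" dependency of a rating on tempo.
random.seed(4)
tempos = [random.uniform(60, 200) for _ in range(40)]
ratings = [6 - ((t - 130) / 40) ** 2 + random.gauss(0, 0.4) for t in tempos]
xs = [(t - 130) / 70 for t in tempos]         # centred/scaled for stability
lin = r_squared(xs, ratings, polyfit(xs, ratings, 1))
cub = r_squared(xs, ratings, polyfit(xs, ratings, 3))
print(round(lin, 2), round(cub, 2))
```

On a symmetric inverted-U relation the linear fit explains almost nothing, while the cubic fit captures most of the variance, mirroring the "lucid" result in the text.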
Discussion
In general, H1 may be considered accepted and H0 rejected. All the compositions differed
significantly on all the descriptors. All the descriptors may be considered adequate, because the
mean scores of the corresponding compositions were significantly higher than the overall mean.
As for the fact that in 15 cases the selected composition shared its primacy with one or more other
compositions, this is rooted in the experimental task as it was formulated for the experts. They had
to propose four descriptors for each composition, but not to evaluate which composition fitted a
given descriptor best of all. As a result, descriptors with a high loading on tempo were repeated
several times across compositions: "dynamic", "turbulent", "exciting" ("agitating") for the
"Bumble-bee"; "agile" for "Land of Confusion"; "energetic" for "Du hast"; "cheerful" for "I can't
dance"; and some others whose adequate translation puzzles me. Consequently, compositions with a
higher tempo either occupied the first rank or formed a whole group by contrast with the slow
"Torments of Love", as happened with the descriptor "cheerful", which was supposed to designate
"I can't dance".
The most striking and unexpected result for us was that the subjects reproduced the strange
taste-based metaphors "aerated" and "a little bit sweet". The mean scores of "Land of Confusion" on
those descriptors were significantly higher than those of any other composition. All my attempts to
propose a reasonable explanation for this finding have so far been unsuccessful.
The descriptor "strict" was also reproduced, with "Du hast" differing greatly from all the other
compositions. This may lead to a hypothesis about how "strict" is encoded in music.
"Strict" means that protentions of "not allowed" objects will be repressed quickly and without
hesitation. "Du hast" differs from the other compositions in its substantially greater number of
pauses, which break the melody: we are always left with some expectation that the previous sounds
will continue when the next pause occurs. This explanation, of course, has to be tested in a special
experiment that could identify the concrete temporal and other characteristics of a melody
responsible for encoding "strictness".
The meaning of "aired" (not to be confused with "aerated like soda water"), which was initially
proposed for "Land of Confusion", seems to be encoded by a constant rise of the tune, without any
significant dependency on tempo. First on this descriptor was the "Flight of the bumble-bee",
although second was the slow "Torments of Love", which is nevertheless characterized by an almost
constant rise of the tune.
Tempo, although a very important variable, cannot predict the results by itself. Even for
"energetic", for which R-sq. was the largest, the most energetic composition was "Du hast" at tempo
115, while the not significantly different "Bumble-bee" had tempo 188!
Nevertheless, regression analysis seems the most appropriate procedure for evaluating the influence
of musical variables on the rating of a given descriptor. Correlation, which presupposes linearity
and is so widespread in the different branches of psychology, can hardly be of much value here,
because most of the dependencies, as we have seen, differ considerably from the linear.
What is needed at the exploratory stage of investigating the relations between the meanings of words
and the meanings of musical pieces is the ability to include more significant variables in the
regression equations. Such variables might be: the number and duration of rises and falls of the
melody, the number and duration of pauses, the "speed" of the rises and of the falls, etc.
Unfortunately, we had neither software that could compute such statistics for melodies, nor time to
calculate them manually.
Concrete constants, for example the time at which the estimation of available energy takes place,
are a matter for experiments of a more precise character. We plan to apply single-subject designs
and paired comparisons of stimuli that vary only in the duration of the event under investigation in
order to determine these constants.
References
Almayev N. (1999a) Dynamic Theory of Meaning: New Opportunities for Cognitive Modeling. Web
Journal of Formal, Computational & Cognitive Linguistics. http://fccl.ksu.ru/fcclroot.htm .
http://fccl.ksu.ru/winter.99/cog_model/proceedings.htm
Almayev N. (in press). The Concept of Psychic Energy and the Phenomenon of Music. Analecta
Husserliana: The yearbook of phenomenological research / Published under the authority of the
World Institute for Phenomenological research and learning - Dordrecht: Kluwer Academic
Publishers.
Penultimate draft is available at:
http://www.psychol.ras.ru/strukt/ALMAEV/penult.htm
Anochin, P. K. (1978). Beiträge zur allgemeinen Theorie des funktionellen Systems. Jena: Fischer.
Bernstein, N. A. (1967). The Co-ordination and Regulation of Movements. Oxford: Pergamon Press.
Husserl, E. (1939). Erfahrung und Urteil. Prag: Academia.
Husserl, E. (1969). Gesammelte Werke Bd. 10. Vorlesungen über das innere Zeitbewusstsein. Den Haag:
Martinus Nijhoff.
Luescher, M. (1983). The Luescher Color Test. London-Sydney: Pan.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The Measurement of Meaning. Urbana: University of
Illinois Press.
Osgood Ch. E., (1976). Focus on Meaning vol.1.The Hague-Paris: Mouton.
Simonov P.V.(1985). The Science of the Human Higher Neural Activity and Artistic Creation.
Moscow: Nauka.
Sloboda, J. (1986). The Musical Mind. Oxford: Clarendon Press.
Proceedings paper
Introduction
Music can arouse deep and profound emotions within us, often in conjunction with external situations. However, one of the main questions
that remains unanswered is whether emotional feelings in response to music are due to inherent features of the music or are learnt by
association with concurrent events.
Cooke (1959) suggests that particular sequences of notes are always associated with, and express, particular emotions. He proposes sixteen
different basic sequences of notes, and gives examples from the repertoire of western classical music to support his proposal. Meyer (1956)
however suggests that emotion is aroused when a tendency to respond is inhibited. He argues that music sets up expectations, which produce a
mental response to complete these expectations. If the actual music is different from the mental expectation, then an emotional response is
produced. He points out that the communication of shared meaning can only take place within a cultural context. Only if one is familiar with
the music within a culture will one generate appropriate expectations. Sloboda (1991) demonstrated some empirical support for Meyer's ideas.
He asked musicians to describe musical passages that produced very intense emotional experiences, such as 'tears' or 'shivers down the spine',
and some of these experiences followed unexpected changes, such as a change of key or rhythm, or a new vocal or instrumental entry.
One approach to elucidating the question as to whether the emotional responses to music are learnt within a culture or are inherent to the
music is by cross-cultural studies of responses to music. If listeners can detect the emotional content of music from cultures with which they
are unfamiliar, then this would suggest that the emotional content is inherent in the music.
Several studies have been conducted using Hindustani classical music, which is prevalent throughout North India and Pakistan. This music
involves improvisation within a raga, which is a complex melody structure, similar to the western concept of mode, but more detailed.
Different ragas are traditionally associated with particular emotional feelings.
Deva & Virmani (1980) showed that Indian listeners are sensitive to the emotional content of classical Hindustani music. Studies comparing
Indian with western listeners have generally found differences in their sensitivity to such music. Castellano, Bharucha & Krumhansl (1984)
asked Indian and western listeners to rate how well a probe tone fitted a theme from each of ten ragas. Apart from the notes corresponding to
the tonic and fifth of the western musical scale, which are equally important on Indian scales, only Indian listeners were sensitive to the scales
underlying the ragas. Vaughn (1994) found that affective ratings by western musicians of melodies from ragas did not agree well with those
used by Indian musicians.
Gregory & Varney (1996) found differences between the responses of western and Indian listeners to Hindustani ragas. The listeners in this
study were all students living in Britain, and it could be argued that many of the Asian students were not so familiar with classical Hindustani
music, perhaps only coming across this music in the popularised form used in Indian "Bollywood" films. If so, this would tend to diminish the
extent of any differences. Gregory (1996) has however confirmed the differences between the emotional responses of listeners from different
cultural backgrounds in a more detailed study comparing western listeners with those from an Indian/Pakistani background, most of whom
were quite familiar with Hindustani classical music.
These studies would all seem to support the contention that the emotional responses to music are learnt within a particular culture. However
this may be an oversimplification. The possibility still exists that some features of music are universal across cultures in producing an
emotional response, while others are specific to particular musical cultures and are learnt within the culture.
Balkwill & Thompson (1999) propose a multiple cue model for the perception of emotion in music. They suggest that some musical features,
such as tempo and melodic or rhythmic complexity, may provide universal cues as to the emotional content, whilst other features such as
modality may be specific to particular cultures. Listeners may therefore rely on either, or both, of these types of cues depending upon their
cultural knowledge. They carried out an empirical study, playing Hindustani ragas to western listeners, and showed that the listeners were
sensitive to the intended emotions of joy, sadness and anger in the music, but not to that of peace. They also showed that listeners' judgements
of different emotions were significantly related to their ratings of certain musical features of the ragas. For example the perception of joy was
associated with fast tempo and low melodic complexity, whilst sadness was associated with slow tempo and high melodic complexity.
The present study looks at the issue from a different perspective, by using Qawwali music. Qawwali is a recognised musical genre in the
Indian subcontinent, but has unique characteristics related to its religious function. The term Qawwali (an Arabic word meaning "utterance")
applies both to the medium and to the occasion of its performance, the devotional assembly of Islamic mysticism (Sufism) in India and
Pakistan. Qawwali as music is a group song sung by qawwals. A group of qawwals is made up of a lead singer, one or two secondary singers
and musicians, and clapping junior members. Performers believe they have a religious mission: to praise the Name of Allah using rhythmic
handclapping, vigorous drumming on a barrel-shaped dholak, harmonium and a vast repertoire of sung poetry. Qawwals present mystical
poetry usually in either Farsi, Urdu, or Hindi. By repeatedly and hypnotically chanting salient phrases, they claim to transport audiences to a
[Stimulus table: piece, performers, duration (min); e.g. traditional Christian hymn "Come Thou long expected Jesus" by the St. Michael's Singers, 2 min.]
Results
For each adjectival scale, the mean responses of the four groups of listeners to the different pieces of music are shown in Figure 1.
Figure 1. Mean rating scores by each listener group on adjectival scales
The mean scores of the four groups of listeners were compared by means of a one-way analysis of variance, calculated for each piece of music
on each adjectival scale. The values of F and levels of significance are shown in Table 1.
Table 1. One-way analysis of variance comparing the difference between the groups of listeners on each adjective scale on each piece of
music.
References
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural
cues. Music Perception, 17.
Proceedings paper
Music is known to be connected with emotion. Yet, it is evident that music is a product of cultural
development. Consequently it is nearly impossible to fully understand the meaning of music from a
different culture. This can even apply to different styles or epochs within one's own culture. Following
Dowling & Harwood (1986), there are musical signs of a symbolic character that can induce
emotions, provided they are familiar. The understanding of music often takes place at levels of a high
degree of complexity. Nevertheless, music is often supposed to be a "language" common to all humans.
If this is true, there must also exist signs which go back to early ages of human development and which
retained their meaning. They are very difficult to discover, because early communication has not been
fossilized. To get round this problem it seems useful to focus attention on the musical elements
used by parents in motherese to communicate with their infants. Motherese appears spontaneously
when there is need for it, and it has been shown to be cross-cultural (Grieser
& Kuhl, 1988; Fernald et al., 1989; Fernald, 1992; M. Papoušek, 1994). H. Papoušek (1985) suggests
that motherese is a revival of an early form of communication used to build a bridge to the preverbal
child.
In this special way of talking, several different pitch contours are produced by marked modulation of
the fundamental frequency and by prolongation of the underlying syllable. These melodic contours, as
they are called, are effective in mediating emotional messages. This is in line with the observation of
Williams & Stevens (1982) that the course of the fundamental frequency in time most clearly
characterizes a speaker's emotional state. The use of melodic contours is linked to social context.
To catch the infant's attention and encourage it to imitate or to take its turn in a dialogue,
parents increase the use of rising contours. By contrast, to soothe an infant, softly declining
falling melodic contours are used. Bell-shaped melodic contours, which decline softly, prevail in
approval of a desired behaviour such as smiling. To discourage unwanted behaviour, steeply declining
falling and bell-shaped melodic contours occur (Stern, Spieker & MacKain, 1982; Jacobson, Boersma,
Fields & Olson, 1983; Werker & McLeod, 1989; M. Papoušek, H. Papoušek & Symmes, 1991).
It is to these musical elements of parent-infant communication that H. & M. Papoušek (1995) trace back
the roots of musical development. So the question arises, whether these melodic contours, which are
known to transform into prosody of speech, remain effective in music, especially in the melody of
singing.
In order to answer this question, four different song categories were chosen with two each forming
contrasting pairs. They are connected to different social situations which also occur in parent-infant
contacts. These categories are 'Songs to arouse Attention' (A), 'Lullabies' (L) as soothing songs,
'Warriors' Songs' (W) and 'Praise Songs' (P).
The songs were taken from ethnological archives. They were required to come from a wide variety of cultures and to
have been transmitted orally only. Their melodic construction was analyzed for comparison with the linguistic
research. For this purpose the tapes were played at half speed and the pitch contours were transcribed
by hand. If the pitch contours are truly comparable to the melodic contours of motherese, they should be
distributed differently across the examined song categories, according to each category's purpose.
As a first result it can be noted that the melodies of the songs are really composed of single pitch
contours whose figures are similar to the melodic contours of motherese.
In accordance with the melodic contours of motherese they were specified as Level, Rising, U-shaped,
Falling, Bell-shaped, Sinusoidal and Complex.
The Level contours can be on one level, "1l", or on two levels like cuckoo calls, "2l".
To facilitate interpretation the following forms were subdivided further:
Rising melodic contours can rise steeply or softly. This is described as "st" or "so".
The same differentiation applies to the U-shaped contours which in motherese are also used to get the
child's attention.
Falling and Bell-shaped melodic contours can decline steeply or softly. That is correspondingly marked
"st" or "so".
The Sinusoidal melodic contours can be shaped differently, producing different effects. Therefore such
Sinusoidal contours which consist of lined-up Bell-shaped contours are called Sinusoidal-Bell-shaped
contours with the subdivisions SBst and SBso. Often Sinusoidal melodic contours contain large leaps,
named SLst and SLso respectively. Sinusoidal melodic contours can also be softly swinging, specified as
SSst or SSso.
Finally, Complex melodic contours comprise lip trills, short exciting cries, whistling, etc.
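The classification scheme above can be approximated by a small heuristic. The sketch below is only an illustration: the shape tests, the 1-semitone flatness cut-off, and the 4-semitone threshold separating "st" from "so" variants are my own assumptions, not values taken from the study.

```python
def classify_contour(pitches, steep=4.0):
    """Classify a pitch contour (a list of pitch values in semitones,
    sampled over time) into one of the basic contour forms named above.
    The flatness cut-off (1 semitone) and the `steep` threshold
    separating 'st' from 'so' variants are invented parameters."""
    lo, hi = min(pitches), max(pitches)
    first, last = pitches[0], pitches[-1]
    grade = "st" if hi - lo >= steep else "so"
    if hi - lo < 1.0:                          # essentially flat
        return "Level"
    if first == lo and last == hi:             # overall rise
        return "Rising-" + grade
    if first == hi and last == lo:             # overall fall
        return "Falling-" + grade
    if pitches.index(hi) not in (0, len(pitches) - 1):
        return "Bell-shaped-" + grade          # peak in the middle
    if pitches.index(lo) not in (0, len(pitches) - 1):
        return "U-shaped-" + grade             # trough in the middle
    return "Complex"                           # none of the simple shapes
```

For example, `classify_contour([0, 2, 5])` returns a steeply rising contour, while `classify_contour([3, 2, 0])` returns a softly falling one.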
As a second result it can be noted that the composition of forms of melodic contours shows
different focal points in the four song categories.
This is presented in the table below.
Song Amount of Melodic Contours (%)
category Level Rising U-Shaped Falling Bell-shaped C / sinusoidal
n = 22 15.39 1.67 19.40 2.51 1.84 0.67 7.86 11.87 2.17 3.85 0.50 0.17 7.69 8.03 0.67 15.72 0.00
n = 38 11.90 1.25 3.83 2.79 2.71 0.83 22.18 12.81 5.37 4.12 0.13 0.54 11.15 10.98 1.21 1.46 6.74
n = 42 5.30 0.51 1.99 1.45 0.40 1.88 9.05 12.86 7.80 7.68 3.47 5.75 10.13 12.81 4.61 12.52 1.76
n = 83 3.01 1.02 3.40 1.60 0.83 1.84 8.49 15.28 8.64 11.79 1.31 2.81 12.03 11.50 3.59 12.42 0.44
Table 1. Distribution of forms of melodic contours in the four song categories (%). The upper numbers
always show the whole amount of the respective form. The differentiated values are given in the
numbers below.
As can be seen from the table, steeply Rising melodic contours prevail in the 'Songs to arouse Attention'
and, compared to other song categories, they have the largest amount of 2l-melodic contours. This is
reminiscent of the findings about corresponding social contexts in linguistic research. Also, the
1l-melodic contours as well as the SSso form of Sinusoidal melodic contours are emphasized. 1l-
melodic contours are presumed to fix the listener's attention and the softly Swinging Sinusoidal melodic
contours seem to have a similar effect, but in a somewhat moderate manner.
The 'Warriors' Songs' are characterized by falling melodic contours with the focal point on the steeply
declining version, which also fits the results on motherese. Additionally, 1l-melodic contours are of
importance in these songs. Sinusoidal melodic contours containing large leaps (SLst and SLso) account
for a large share. Also noteworthy is the number of Complex contours compared to the other song categories.
The latter two have the effect of being rather arousing. So they may support the effect of the steeply
declining Falling melodic contours.
In 'Praise Songs' both versions of Falling melodic contours with the focal point now on the softly
declining form and both versions of Bell-shaped melodic contours predominate. Additionally they
contain the largest number of Sinusoidal melodic contours. All of this is reminiscent of the preferential
use of melodic contours in parent-infant communication concerned with rewarding.
'Lullabies' also show mainly Falling- and Bell-shaped melodic contours, but with the difference that now
the focus is in each case on the softly declining part. Sinusoidal melodic contours are also highly
important. Interestingly, only the amount of Sinusoidal-Bell-shaped forms is smaller than in the 'Praise
Songs'; the melodic contours in 'Lullabies' are of less complexity in accordance with the infant's capacity
of reception.
However, certain reservations have to be made. On closer inspection we will find at least four different
types of lullabies. They differ with regard to the composition of melodic contours and, connected to that,
they differ in function. Lullabies can be very soothing, moderately soothing, entertaining, or even of
a warning character (Cordes, 1998). The prevailing type of lullaby, however, is the moderately soothing one,
which also praises the infant's behaviour. Therefore the averages reflect those contours which are known from
motherese to have the corresponding effect, i.e. softly declining Falling- and Bell-shaped melodic
contours. This explains why 'Lullabies' and 'Praise Songs' look rather similar with regard to the
composition of melodic contours.
Summarising the findings it can be stated that the outstanding forms of melodic contours of each
song category are comparable to those which are preferably used in corresponding social contexts
in motherese.
Finally there is one more important fact which came out. From the duration of a song and the number of
its melodic contours the average duration of the melodic contours was calculated. Thereby it became
obvious that average duration differs according to song category. In 'Songs to arouse Attention' they are
longest, with 5.14 sec on average, followed by 'Lullabies' and 'Praise Songs' with 4.51 sec and 3.99 sec
of average duration respectively. The melodic contours of 'Warriors' Songs' turned out to be shortest, with 2.50
sec on average. These findings are particularly relevant when placed in relation to the ethologist
Tembrock's (1971) research. According to him, in the acoustic system of animals, the control of distance
is very important and he claims that distance-reducing calls, which have an attracting effect, are
characterized by a relatively long rise time, until they reach full amplitude, as well as by longer temporal
extension. In vertebrates they have a tonal character with dominating frequencies. In contrast, calls
which widen the distance between individuals reach maximum amplitude quickly and are of short
duration. They are not or only irregularly repeated and are of a noisy character. The broad spectrum of
frequencies prevents or restricts an adaptation in recipients and causes them to seek refuge, often in
flight. Spittka (1969) was able to show that rats, given the opportunity to choose between different
sounds, prefer those with long rise times, of longer duration and with rhythmic repetition.
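The duration comparison above rests on a simple calculation: each song's duration divided by its number of melodic contours, then averaged within a song category. A minimal sketch, in which the example data are invented placeholders rather than values from the corpus:

```python
def mean_contour_duration(songs):
    """Average melodic-contour duration for one song category.
    `songs` is a list of (song_duration_seconds, contour_count) pairs;
    each song contributes its own mean contour duration."""
    per_song = [duration / n_contours for duration, n_contours in songs]
    return sum(per_song) / len(per_song)

# Invented example: two songs in a category, 36 s with 7 contours
# and 20 s with 8 contours.
print(round(mean_contour_duration([(36.0, 7), (20.0, 8)]), 2))  # -> 3.82
```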
In communication with animals we use these effects, though unconsciously, when luring or shooing
them away. The findings of M. Papoušek et al. (1990) prove that these effects are meaningful in
preverbal parent-infant communication, too. They have shown that expanded melodic contours manage
to attract the attention of four-month-old infants, whereas a short rising-falling contour fails to do so.
Relating these findings to my own results it is obvious that the temporally stretched melodic contours of
'Lullabies' and 'Praise Songs' and contrastingly the short duration of melodic contours in 'Warriors'
Songs' conform to Tembrock's statements regarding animal acoustic behaviour. The same is true for the
large amount of steeply declining Falling melodic contours in 'Warriors' Songs', which just start at the
highest point, as well as for those Bell-shaped melodic contours that rise and decline suddenly.
Following Tembrock, they have the effect of being most aggressive. In line with this is the outstanding
number of noisy complex melodic contours in this category and finally the noticeably larger number of
different forms of melodic contours which are used in a song. By contrast, the prevailing Falling- and
Bell-shaped melodic contours of 'Lullabies' and 'Praise Songs', which belong to pleasant situations,
mainly decline softly. A moderately sudden decline serves to keep the listener's attention (M.
Papoušek, H. Papoušek & Symmes, 1991).
Finally, the especially long duration of melodic contours in 'Songs to arouse Attention' has a counterpart
in the 'long-distance calls' of animals which are used to call for someone very distant or as alarm calls
(Tembrock, 1971).
Conclusion
The melodies of songs have been shown to be composed of melodic contours similar to those that
parents of preverbal infants use to convey emotional meanings. This correspondence suggests that the
primordial connection of music and emotion has developed from an early form of human
communication. Additionally, the correspondence of important features of melodic contours with
features in animal sounds suggests that in some ways the emotional meaning of these human acoustic
figures can be traced back to the pre-human state of development. One can therefore suppose that musical
expression and comprehension originate, at least in part, on a basic level shared by all humans.
Indeed, a large number of researchers have shown that pitch contour is highly important for the recognition of
melodies, especially by children and untrained adults. While this lends general
support to my findings, further research is needed to buttress them.
References
Cordes, I. (1998) Melodische Kontur und emotionaler Ausdruck in Wiegenliedern. In K. E. Behne, G.
Kleinen & H. de la Motte-Haber (Eds.). Musikpsychologie: Jahrbuch der Deutschen Gesellschaft für
Musikpsychologie. Göttingen, Bern, Toronto, Seattle, Hogrefe-Verlag.
Fernald, A. (1992) Meaningful Melodies in Mothers' Speech to Infants. In H. Papoušek, U. Jürgens & M.
Papoušek (Eds.). Nonverbal Vocal Communication: Comparative and Developmental Approaches.
Cambridge University Press, Cambridge and Editions de la Maison des Sciences de l'Homme, Paris. pp.
262-282.
Fernald, A., Taeschner, T., Dunn, J., Papoušek, M., De Boysson-Bardies, B. & Fukui, I. (1989). A
Cross-Language Study of Prosodic Modifications in Mothers' and Fathers' Speech to Preverbal Infants. Journal of
Child Language, 16, 477-501.
Grieser, D. L. & Kuhl, P. K. (1988). Maternal Speech in a Tonal Language: Support for Universal
Prosodic Features in Motherese. Developmental Psychology, Vol. 24, No. 1, 14-20.
Jacobson, J. L., Boersma, D. C., Fields, R. B. & Olson, K. L. (1983). Para-linguistic Features of Adult
Speech to Infants and Small Children. Child Development, 54, 436-442.
Papoušek, H. (1985) Biologische Wurzeln der ersten Kommunikation im menschlichen Leben. In W.
Böhme (Ed.). Evolution und Sprache: Über Entstehen und Wesen der Sprache. Tron, Karlsruhe. pp.
33-47.
Papoušek, H. & Papoušek, M. (1995) Beginning of Human Musicality. In R. Steinberg (Ed.). Music and
the Mind Machine. Verlag Springer, Berlin- Heidelberg-New York-London-Paris-Tokyo-Hong
Kong-Barcelona-Budapest. pp. 27-34.
Papoušek, M. (Ed.) (1994) Vom ersten Schrei zum ersten Wort. Anfänge der Sprach-entwicklung in der
vorsprachlichen Kommunikation. Verlag Hans Huber, Bern-Göttingen-Toronto-Seattle.
Papoušek, M., Papoušek, H., Bornstein, M. H., Nuzzo, C. & Symmes, D. (1990). Infant responses to
prototypical melodic contours in parental speech. Infant Behavior and Development, 13, 539-545.
Papoušek, M., Papoušek, H., Symmes, D. (1991). The meaning of melodies in motherese in tone and
stress languages. Infant Behavior and Development, 14, 415-440.
Spittka, O. (1969) Akustische Wahlversuche mit Albino-Ratten in der Skinner Box. PhD.,
Humboldt-Universität, Berlin. Unpublished.
Stern, D. N., Spieker, S. & MacKain, K. (1982). Intonation Contours as Signals in Maternal Speech to
Prelinguistic Infants. Developmental Psychology, Vol. 18, No. 5, 727-735.
Tembrock, G. (Ed.) (1971). Biokommunikation. Informationsübertragung im biologischen Bereich, Teil
II. Akademie-Verlag, Berlin.
Werker, J. F. & McLeod, P. J. (1989). Infant Preference for Both Male and Female Infant-Directed Talk:
A Developmental Study of Attentional and Affective Responsiveness. Canadian Journal of Psychology,
43 (2), 230-246.
Williams, C. E. & Stevens, K. N. (1982) Akustische Korrelate diskreter Emotionen. In K. R. Scherer und
P. Ekman (Eds). Approaches to Emotion. Lawrence Erlbaum Associates, Inc. Publishers, Hillsdale, New
Jersey. pp. 307-325.
Back to index
Proceedings paper
MELODIC MUSICAL INTERVAL OCCURRENCE AND PERCEIVED EMOTIONS IN CLASSICAL AND SERIAL MUSIC
Marco Costa, Serena Rossi, Luisa Bonfiglioli, Pio Enrico Ricci Bitti
Department of Psychology, University of Bologna, Italy
costa@psibo.unibo.it
Aim. The aim of this study was to investigate the relations between the statistical occurrence of the different musical intervals in melodies and the evoked emotions. The
hypothesis was that the expression of a particular emotion in music is associated with a distinct interval frequency distribution.
Method. Thirty-four melodic pieces from tonal classical music and four melodic pieces from serial music (Le Pierrot Lunaire by Schönberg) were chosen and presented to a
sample of 19 students without specific musical training, who evaluated their emotional content on a semantic differential composed of 10 bipolar adjective scales.
Melodies that polarized on each adjective were selected and a complete intervallic analysis was performed. Ignoring pauses, each interval was classified into one of 15
categories. An interval frequency distribution was then obtained and analyzed for the melodies that characterized each adjective.
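The intervallic analysis can be sketched as follows. The category names below and the decision to fold the tritone into a single "dim/aug" bin are my assumptions; the abstract does not spell out the paper's exact 15-category scheme.

```python
from collections import Counter

# Assumed category names, indexed by absolute interval size in semitones.
NAMES = ["unison", "min II", "maj II", "min III", "maj III", "perfect IV",
         "dim/aug", "perfect V", "min VI", "maj VI", "min VII", "maj VII",
         "octave"]

def interval_category(semitones):
    """Map a melodic interval (signed semitones) to a category label."""
    d = abs(semitones)            # melodic direction is ignored
    return "beyond octave" if d > 12 else NAMES[d]

def interval_distribution(melody):
    """Relative frequency of interval categories in a melody given as
    a list of MIDI pitch numbers (pauses already removed)."""
    counts = Counter(interval_category(b - a)
                     for a, b in zip(melody, melody[1:]))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

A melody repeating a note and then rising a whole tone, e.g. `[60, 60, 62]`, yields half unisons and half major seconds.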
Results. A global interval ranking showed that in tonal music unisons and seconds accounted for 66.5% of all intervals, and intervals up to the perfect fifth accounted
for 92.5% (Figure 1). In Schönberg's pieces there was a significantly lower frequency of unisons, min II, maj II and min III, and a greater frequency of
diminished and augmented intervals within the octave (Figure 2). Melodies rated as pleasant and agreeable showed a higher presence of unisons, maj II, min III and perfect
IV, and a lower occurrence of maj VII, diminished and augmented intervals, and intervals greater than the octave (Figure 3). Melodies expressing unhappiness contained a higher
occurrence of unisons, whereas melodies expressing happiness contained a higher occurrence of maj VI, the octave, and intervals greater than the octave (Figure 4).
Melodies with positive affective connotations (relaxed, stable, calm, serene, carefree) were characterized by frequent use of maj II, min III and perfect IV. By contrast,
melodies with negative affective connotations (restless, unstable, fearful, worried, anguished) were characterized by a higher occurrence of unisons, min II, and diminished and
augmented intervals (Figure 5). Melodies evoking power had a higher frequency of unisons, a lower frequency of min II, maj II, min III and maj III, and a higher frequency of
diminished, augmented, and intervals wider than the octave (Figure 6).
Conclusions. The study of interval occurrence proved to be an effective tool for investigating the emotional content of melodies. Furthermore, this research showed that the
dimensional property of musical intervals (their span, ranging from the unison to intervals greater than the octave) is important for explaining perceived emotions in
melodies.
Figure 1
Figure 3
Figure 4
Figure 5
Figure 6
Back to index
Proceedings abstract
THE INFLUENCES OF A CONCURRENT AUDITORY FREQUENCY ON THE PERCEPTION OF AN
AMBIGUOUS VISUAL STIMULUS
This study began to bridge the gap between visual film and musical underscoring
by examining the relationships between very basic and simple visual and
auditory stimuli. Allowing for precise laboratory control over stimuli, it was
somewhat removed from the level of complexity involved in musical underscoring.
Even so, robust effects of auditory frequency were found, thus helping to
establish the role of auditory stimuli in the interpretation of visual stimuli
on a firm scientific basis.
Back to index
Proceedings paper
Background
The study of historical performance practice has occupied a prominent place in musicological investigations of baroque music
in the 20th century, but particularly since the 1960s. Instruments of the period have been revived and contemporary documents,
including treatises of performance practice, studied and interpreted (Dolmetsch 1949, Donington 1963, Hubbard 1965,
Neumann 1978, 1982, 1989). By the 1980s-1990s a consensus began to be formed as to the characteristics of a historically
stylish performance of baroque compositions. Although certain publications of the field, especially the more philosophically
and psychologically oriented ones (Farnsworth, 1969; Kenyon 1984, 1988, Kivy 1995, Taruskin 1995, Wiora 1968), implicitly
acknowledged that taste and enculturation have a major part in evaluating musical performances, musicologists failed to study
the performance style and its components from an empirical point of view. In other words, the listening process has largely
been ignored. The current project undertook the first step in rectifying this situation.
During her investigations of baroque performance practice issues the first author (Fabian, 1998) questioned the validity of some
of the claims musicologists made regarding the important characteristics of the historically informed performance. While they
often referred to tempo, ornamentation and use of baroque instruments, her experience suggested that these elements were not
as crucial as pulse or meter, articulation and accenting. Before being able to discuss this issue, a broader question needed to be
addressed: what are the dimensions that underlie listeners' perception of baroque music? To address this problem, we
decided to adopt an empirical approach. The study reported here is largely exploratory; however, we propose that key dimensions of
the perception of baroque music will be related to the constructs of stylishness and expressiveness. This is in line with the view
of certain writers (eg. Avison, 1752; Babitz, 1952, 1967, 1970; Donington, 1973, 1989; Rosenstiel, 1972; Schulenberg, 1990)
who look at the performer's perspective, rather than that of the listener.
Stylishness refers to the appropriateness of a performance in its historical and musical context: adherence to historical
performance circumstances, such as the size of the ensemble and the type of instruments, the use of historically documentable
instrumental techniques, such as fingering, tonguing and bowing, and other period performance practices such as articulation,
ornamentation and metre (as understood by eighteenth-century practitioners). In this sense, stylishness is the most important
issue in performance practice theory. Significantly, its evaluation depends on perception as well, and not just on musicological
prescriptions. Expressiveness generally refers to interpretative and individual aspects of a performance. However there is a
crucial need to identify the parameters of musical expressiveness as distinguished by various musical styles. For example, in
baroque music, using rhetorical gestures and a speech-like flexibility in rhythmic patterns is perfectly in line with the period's
performance practice. This is a kind of expressiveness that is peculiar to baroque and distinct from the common understanding
of expressiveness, such as rubato used in reference to romantic music.
Aim
Stimuli
Five recordings were chosen of each of two musical excerpts: bars 82-125 of the 1st movement (Allegro) of J. S. Bach's
Brandenburg Concerto No. 4 (ca. 1 minute in length), and bars 1-16 of the 2nd movement (Adagio) of J. S. Bach's Brandenburg
Concerto No. 1 (ca. 1 minute 50 seconds; see Appendix for details). These were played from a cassette tape in two
different orders. The choice of the examples was based on the first author's study of over 40 complete Brandenburg sets
(Fabian, 1998), and represented various aspects of debatable performance practice issues, such as choice of instruments,
phrasing, articulation, tempo, tone production, clarity of texture, and so on.
Participants
44 volunteers took part in the study. Approximately half of them were 3rd year or graduate music students at the University of
New South Wales or musicians specialising in performing on baroque instruments. The other group consisted of novices (i.e. 1st
year introductory music students at University of New South Wales). Only pooled data responses are reported here.
Procedure
Participants were seated in a classroom and first filled in a questionnaire regarding their musical background. Following this
they received brief instructions about how to complete the inventory. Three versions of the inventory were distributed to
participants. Each version had a different order of items and different order of scale poles. Participants were encouraged to
provide their first reaction to each example. Each example was played once (cassette player operated by instructor) and was
followed by enough time (silence) for everybody to complete his or her answers. The session took approximately 60 minutes.
Results
Factor analysis was conducted, with components extracted for eigenvalues greater than 1, using SPSS Version 6.1.1 for
Macintosh software. The first three factors explained 45.8% of variance, and the first two explained 27.1% and 11.6%
respectively. Factor loadings are shown in Figure 1. The items which loaded onto the first factor were of two kinds: those
which were related to the stylishness of the musical items, such as Speech-like, Articulated, Clear Structure and, of course
Stylishness; and those which required value judgements (Good-Articulation, Good-Performance, Good-Pulse, etc.). This
suggests two things: (1) the first dimension is related to Osgood, Suci and Tannenbaum's ‘Evaluative'
dimension, and, therefore, (2) judgments related to the stylishness of the excerpts are associated with positive evaluations. The
second dimension was loaded highly with variables that were associated with expressive aspects of the performance, such as
Flexible, Over-Expressive, Romantic and a negative loading from Mechanistic. The third factor did not suggest a clear cut label
to the researchers, and given its relatively weak contribution to the variance it was omitted from further analysis.
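The extraction rule described (retain components whose eigenvalues exceed 1, i.e. the Kaiser criterion) can be sketched without SPSS. This is only a plain principal-components version with unrotated loadings, under the assumption that the input is an observations-by-scales rating matrix; it is not a reimplementation of the authors' analysis.

```python
import numpy as np

def kaiser_components(data):
    """Retain principal components of the scale correlation matrix
    whose eigenvalues exceed 1 (Kaiser criterion).
    `data` is an (observations x scales) matrix of ratings.
    Returns unrotated loadings and per-component explained variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardise scales
    corr = np.corrcoef(z, rowvar=False)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                                # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals.sum()
    return loadings, explained
```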
The first two factors were investigated further. Given that the theoretical interest was that stylishness is an important dimension
of the perception of baroque music, we tried to explain stylishness response as a composite of other responses. A stepwise
linear regression was performed (using SPSS) in which the Stylishness variable was modelled in terms of the other scales used.
The results of the analysis are summarised in Table 1. The scales which could be used to explain stylishness (at p = 0.05) were
Articulation, Speech-Likeness and, with a negative coefficient, Romanticness. Interestingly, only two evaluative scales (Good
Performance and Well-Ornamented) were entered into the model. The model explained 78.7% of the variance in Stylishness
response. These results suggest that the first factor relates to something more complex than evaluation. Most likely, part of a
perceived good performance is that it is stylish, and stylishness together with judged quality may form an important dimension
of the baroque listening experience.
Table 1. Summary of Stepwise Regression Model of Stylishness
Multiple R .89378
R Square .79885
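A stepwise model like the one summarised in Table 1 can be approximated by greedy forward selection. The sketch below uses an R²-gain stopping rule rather than the p = 0.05 entry criterion the authors report, and all variable names are placeholders.

```python
import numpy as np

def forward_select(X, y, names, r2_gain=0.01):
    """Greedy forward selection, a simplified stand-in for the SPSS
    stepwise procedure: repeatedly add the predictor that most improves
    R^2 until the gain falls below `r2_gain` (an invented threshold)."""
    chosen, best_r2 = [], 0.0

    def r2(cols):
        # OLS fit with intercept on the chosen columns.
        A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return 1 - resid.var() / y.var()

    while True:
        gains = [(r2(chosen + [c]), c)
                 for c in range(X.shape[1]) if c not in chosen]
        if not gains:
            break
        new_r2, c = max(gains)
        if new_r2 - best_r2 < r2_gain:
            break
        chosen.append(c)
        best_r2 = new_r2
    return [names[c] for c in chosen], best_r2
```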
A similar regression analysis was conducted with the proposed ‘expressiveness' label of the second factor. The resulting
model is summarised in Table 2. This model explained 66.8% of variation in Expressiveness responses and did so in terms of
the Over-Expressiveness, Romantic and not Mechanistic scales (at p = 0.05). In this analysis three evaluative scales were also
included: Good Performance, Good Tempo and Well Accented. In contrast to the factor analysis, the regression analysis
suggests that evaluative scales are used to describe elements of expressiveness and stylishness. However, by taking only the
non-evaluative scales of the factor analysis and the regression analyses together, we tentatively conclude that the stylish
performance is judged primarily according to articulation and speech-likeness. An expressive performance is judged primarily
according to how romantic and unmechanistic it was.
Table 2. Summary of Stepwise Regression Model of Expressiveness
Multiple R .83
R Square .69
Conclusions
This study provides a first step in understanding how listeners interpret baroque performance. While a clear identification of the
theoretical dimensions of stylishness and expressiveness await further investigation, the present study provides musicologists
with an alternative methodology for understanding highly complex performance practice issues from a more objective and
scientifically plausible stance.
A thorough investigation is required in the method of selecting the scales upon which to judge items. Since the terminology
stems from highly sophisticated listeners (musicologists), there will be occasions when the terminology becomes unclear or
interpreted ‘incorrectly' by the more typical listener. For example, in debriefing participants, we discovered that the terms
‘Texture' ‘Detailed' and ‘Well-Accented' were probably used with different shades of meaning than that intended in
musicological literature.
Acknowledgement
The authors are grateful to Kate Stevens and members of the Australian Music Psychology Society (AMPS) for their comments
on an earlier draft of this work.
References
Avison, C. (1752). An essay on musical expression. London
Babitz, S. (1952). A problem of rhythm in baroque music. The Musical Quarterly 38: 533-565
Babitz, S. (1967). Concerning the length of time that every note must be held. Music Review 28: 21-37
Babitz, S. (1970). The great baroque hoax: a guide to baroque music and performance for connoisseurs. Los
Angeles: Early Music Laboratory
Dolmetsch, A. (1949). The interpretation of the music of the 17-18th centuries. London: Novello (1st published
1915).
Donington, R. (1963). The interpretation of early music. London: Faber (revised: 1973, 1989).
Donington, R. (1973). The performer's guide to baroque music. London: Farber
Fabian, D. (1998). J. S. Bach recordings 1945-1975: The Passions, Brandenburg Concertos and Goldberg
Variations: A study of performance practice in the context of the early music movement. Unpublished doctoral
dissertation. The University of New South Wales.
Farnsworth, P. R. (1969). The social psychology of music (2nd ed.). Ames, Iowa: Iowa State University Press.
Gotleib, H. and Koneĕni V. J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg
Variations by Johann Sebastian Bach. Music Perception, 3: 87-102.
Hubbard, Frank (1965). Three centuries of harpsichord making. Cambridge, Mass: Harvard University Press
Kenyon, N. (Ed.) (1988). Authenticity and early music. London: Oxford
Kenyon, N. (1984). The limits of authenticity: a discussion. Early Music, 12: 3-25.
Kivy, P. (1995). Authenticities - philosophical reflections on musical performance. Ithaca: Cornell University
Press
Neumann, F. (1978). Ornamentation in baroque and post-baroque music. Princeton: Princeton University Press
Neumann, F. (1982). Essays in performance practice. Ann Arbor, Mich: UMI Research Press
Neumann, F. (1989). New essays in performance practice. Ann Arbor, Mich: UMI Research Press
Osgood, C. E., Suci, G. J. & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of
Illinois Press.
Rosenstiel, L. (Ed.) (1972). The spheres of music: harmony and discord. Current Musicology 14: 81-172
Schulenberg, D. (1990). Expression and authenticity in the harpsichord music of J. S. Bach. The Journal of
Musicology 8: 449-476
Taruskin, R. (1995). Text and act: essays on music and performance. Oxford: OUP
Wiora, W. (Ed.) (1968). Alte Musik in unserer Zeit: Referate und Diskussionen der Kasseler Tagung 1967.
Musikalische Zeitfragen Vol. 13. Kassel-Basel: Bärenreiter
Appendix. List of Recordings Used in Experiment
Virtuosi of England, directed by Arthur Davison. EMI Classics for Pleasure CFP 40010 (rec.1971) [Brandenburg
No. 4 ‘Allegro' only]
Collegium Aureum, no director listed. Victrola VICS 6023 RCA (rec: 1965) [Brandenburg No. 1 ‘Adagio' only]
Academy of St Martin-in-the-Fields, directed by Neville Marriner. Philips 6700045 (rec. 1971)
Concentus Musicus Wien, directed by Nikolaus Harnoncourt. Telefunken ‘Das Alte Werk' SAWT 9459-60 (rec.
1964)
Sigiswald Kuijken and other soloists, directed by Gustav Leonhardt. Seon RL 30400 EK (rec: 1976-1977)
Concentus Musicus Wien, directed by Nikolaus Harnoncourt. Teldec ‘Harnoncourt Edition' DAW 8.42823 XH /
242925-2 (rec. 1982)
back to index
Proceedings paper
This particular study is concerned with an experimental testing of Deryck Cooke's theory about
emotion in music as expounded in his book "The Language of Music" (1959). In it he claimed that
an analysis of tonal music reveals a consistent use of particular patterns of pitches linked to
quite precise emotional meanings. He argued that these meanings constituted a shared language
available to anyone familiar with the idiom and further, that they arose from tensions inherent
in the relationships between musical sounds arising from their origins in the harmonic series.
Cooke illustrated his thesis by reference to many examples of music where the expressive aim of
the composer could be inferred from an accompanying text. He then extended the argument by
analogy to other, purely instrumental, music. In the central chapter of his book he identified
16 basic melodic patterns derived from this material, the meanings of which, he claimed, could
be reliably understood and agreed.
2. Background
Cooke's theory has a persuasiveness about it which invites serious consideration. Given the
importance we attach to melodic contour in our identification and memory of music, it would
indeed seem to have a reasonable claim to be a significant carrier of emotional meaning.
Further, the sheer quantity and range of the examples Cooke produced in support of his ideas
means that they cannot be lightly disregarded, although some writers have been unimpressed by
this. Zuckerkandl (1960), for example, argued that the musical examples used had been selected
to fit the hypothesis, whereas counter-examples could also be easily found. Langer (1957),
though not referring to Cooke, considered that this kind of methodology simply reveals
conventions in musical usage.
However, Cooke specifically rejects the notion that cultural conditioning can be the whole
explanation for his observations:
".....it is difficult to believe that there is no more to it than that. .....one can only
wonder how certain patterns of tone setting ever came about in the first place to 'correspond
with certain emotional reactions on the listener's part', unless the correspondences were
inherent....." (The Language of Music, pp. 24-25).
Cooke's thesis fits into a model of the relationship between music and emotion which was
current at the time he was writing. Earlier research had established that the perceived
emotional content of music was widely shared among listeners (Schoen & Gatewood, 1927;
Gundlach, 1935; Hevner, 1935a). Hevner, in a series of experiments, (1935b, 1936, 1937),
systematically explored the expressive effects produced by varying specific features of the
music. She concluded that, in order of effectiveness, predictable effects upon the perceived
emotion in music were produced by changes to the tempo, mode, pitch, harmony and rhythm,
although sometimes the influence of one dimension was so strong as to inhibit the impact of
others. Only one of the features she examined, that of melodic direction, had no impact. Given
the significance imputed to melodic direction by Cooke this would seem to undermine one of his
major claims. However, since he makes no attempt to offer any psychological evidence himself,
it is impossible to know how he would have countered that finding.
In more recent years there has been a move away from this essentially static view of the
relationship between music and emotion, particularly since Meyer (1956) drew attention to the
importance of expectation and resolution in the emotional structure of a composition. Interest
has switched to identifying the features which are believed to be implicated in inducing
personal reactions (e.g. Sloboda, 1991). From that approach is developing a broader theory
based on analogies between the dynamics of music and what might broadly be called the human
condition (Sloboda, 1998). There has also been an increasing interest in the role of the
performer and what it is that s/he does which affects how listeners respond (e.g. Gabrielsson
& Juslin, 1996).
At the same time, however, the notion of there being universal components to our perceptions of
music has continued to be argued. Dowling & Harwood (1986), for example, suggest a number of
apparently culturally universal features which appear to derive from underlying properties of
the human information processing system, while a recent study by Papoušek (1996) concludes that
Cooke himself foreshadowed one of the major concerns which Gabriel's methodology raises by
drawing attention to the importance of the holistic nature of the musico-emotional experience,
i.e. if the pitch sequences are divorced from a musical context, the power of the emotional
message will inevitably be lost.
A second problem is highlighted by more recent research findings which have drawn attention to
the importance of the role of the performer with regard to the emotional 'tone' of the music
(Gabrielsson & Juslin, 1996). By removing that human element entirely from the experiment,
Gabriel may again have compromised the result.
A third difficulty arises with his use of sine-wave tones as a stimulus, since the overtones of
the harmonic series, whose presence, Cooke argues, gives emotional tension to the intervals of a
motif, are missing.
Finally, he paid scant regard to Cooke's emphasis on the importance of the listener's musical
sophistication, since there is no evidence that his 22 students were musically experienced in
Cooke's terms.
All these factors suggest that it might be worthwhile trying to devise a fairer test for
Cooke's claims, so as to give the musical motifs the best opportunity to convey whatever
emotional charge they might carry.
5. A Naturalistic Alternative
There are a number of concerns which arise when proposing an experiment which uses real musical
examples as its stimuli rather than specially created ones. The first is the need to control as
much as possible for any variables, other than pitch change and modality, which could affect
the emotional interpretation of music. The second is to ensure that any excerpts used are long
enough to have some musical integrity, while being confined to only one basic term and avoiding
such features as modulations which could cloud both the tonality and the emotional tone of the
passage. A third is that the performances offered should be appropriately stylish without being
mannered. Finally there is the issue of how to focus the listener's response on the basic term
as opposed to the longer contextual passage. In consideration of these and other problems the
following criteria were established:
- The extracts presented should be limited to 19th-century piano music, since that would have
the advantages of restricting timbre and stylistic range while offering a wide choice of
repertoire.
- Suitable performances should be identified using the criteria that they are considered to be
idiomatic and that the recorded sound is of good quality.
- Each extract chosen should contain a clear and tonally unambiguous exposition of just one
basic term.
- The selected motifs should have thematic significance rather than being primarily linking or
passage work and, as far as possible, be similar in terms of general tempo, rhythmic features
and dynamic levels.
- A coherent extract should be played to the participants, followed by two repetitions of the
particular basic term being considered before asking for a response.
- Participants should be provided with a written response sheet which would allow them to
quantify the strength of all of Cooke's descriptors for each extract.
- Participants should be experienced in the tonal idiom.
Test material was prepared and trialled with members of the music psychology unit at Keele
University. The response instrument began with a cover page requesting information on age, sex,
and musical experience, followed by a brief explanation of the purpose of the study and a set
of instructions as to how the task should be completed. Attached to the cover page were further
sheets containing a series of response blanks, one for each extract. These listed Cooke's
descriptions of his basic terms. Listeners were asked to rate them as to how well each matched
their own perception of the emotion of the extract. Ratings were on a scale of 1 (No match) to
5 (Perfect match).
A tape of musical extracts was prepared. Each extract was copied from compact disc using a
Marantz CD-67 player linked to a Denon DRM 550 Stereo Cassette Tape Deck. Each was long enough
to establish a context for the basic term and the notes comprising the basic term itself were
then immediately recorded twice more. Participants completed the response sheets while
listening to the tape and were then invited to give feedback both about the experiment and the
rationale behind it. Subsequently a number of revisions were made, the most important being
that the number of basic terms to be tested was reduced from 16 to 9 so as to eliminate
redundancy caused by some terms being musically and descriptively close to one another. The
final selection of the basic terms, together with their musical exemplars, is given in Table 1
below.
Table 1: Basic Terms Selected for the Test
6. Test Methodology
In order to ensure a sample which would be experienced in the tonal idiom, several groups of
adult amateur instrumentalists were contacted. Three groups agreed to take part in the test, as
did a number of music students at a Midlands conservatoire. Numbers were as follows:
The volunteers were each given a copy of the response sheets and the page of explanations and
instructions was read through with them verbatim. An opportunity was given to ask questions.
These were rare and none revealed any confusion about the nature of the test or the
instructions. For the final test, four random sequences of recordings had been prepared. Each
sequence began with the opening bars of Chopin's Ballade in A flat major followed by two
repeats of the first 8 notes as a trial 'basic term'. One of the four sequences of recordings,
previously assigned at random to that group, was then played. Of the 34 participants in the
sample, 9 heard the first sequence of extracts, 14 the second sequence, 7 the third sequence
and 4 the fourth sequence.
The tape was stopped after each item and re-started after one minute. Participants were warned
10 seconds before each new sequence was about to start. Visual observation suggested that this
time was more than adequate for all responses to be completed. The tape was played on a Bush MC
113 CD Micro System provided by the writer. The whole test from initial explanation and
distribution of the response sheets through to their final collection took about 20 minutes.
Many participants expressed interest in the task and in the outcome of the test.
On the completion of all the tests the results were transcribed from the response sheets so
that scores for each basic term were collated, regardless of which particular sequence had been
heard. The raw scores were entered into an SPSS spreadsheet for analysis.
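The collation step described above, pooling scores for each basic term regardless of which presentation sequence a participant heard, can be sketched as follows. The records and names used here are purely illustrative; they are not the study's raw data:

```python
from collections import defaultdict

# Each response record: (sequence heard, basic term, descriptor, rating 1-5).
# These records are illustrative placeholders, not the study's data.
responses = [
    (1, "term3", "statement3", 5),
    (2, "term3", "statement3", 4),
    (4, "term3", "statement3", 5),
    (1, "term2", "statement4", 3),
]

# Pool ratings per (basic term, descriptor), ignoring which sequence was heard.
pooled = defaultdict(list)
for sequence, term, statement, rating in responses:
    pooled[(term, statement)].append(rating)

# Mean rating per basic-term/descriptor pairing, ready for analysis.
means = {key: sum(vals) / len(vals) for key, vals in pooled.items()}
```

The point of the pooling is that the four randomised sequences become irrelevant once scores are keyed only by basic term and descriptor.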
7. Results
The results of the tests were analysed first to provide evidence for the hypothesis that it is
possible to detect significant relationships between 'real music' examples of Cooke's basic
terms and his verbal characterisations of them.
A failure to find such a significant relationship, however, need not mean that the results have
nothing to say. While Cooke's theory makes claims specifically for the emotional charge
generated by tonal relationships, he also recognises three component elements in their make-up.
One is modality: major motifs express positive emotions, minor ones express negative emotions.
A second is the direction of the movement: rising motifs are perceived as representing outgoing
or assertive emotions while falling motifs denote passive, receptive ones. A third is the
terminal point of the motif: those ending on the tonic are seen as expressing fulfilment or
completion while those moving away from the tonic lack that element. The responses were
therefore analysed to see if there is any evidence to support these components of the basic
terms.
It also seemed worthwhile investigating whether responses varied according to sex or to
previous familiarity with the specific musical examples chosen. Given the number of analyses to
be performed upon the data it was decided to consider only those factors which generated a
probability of less than 1% as being significant.
a) Overall Responses
First and foremost the results showed that there was only one extract (No. 3) where the
predicted definition was rated highest. The conclusion therefore is that Cooke's basic terms
have not been shown to convey the precise meanings he ascribed to them. However, the mean
rating of the statements did vary markedly for many of the extracts so further analyses were
undertaken to explore the effects of modality, contour and final note.
b) Modality
Scrutiny of the results suggested that, except for extract 8, the means fell into two clusters
in which the valency of the descriptors varied according to the modality of the music. Thus in
extract 1 (major) the most highly scored statements were the positively valenced ones. By
contrast extract 2 (a minor basic term) produced the strongest mean scores for the negatively
valenced descriptors.
In order to confirm this apparent relationship, the responses were aggregated and subjected to
an ANOVA test, which proved significant at the p < 0.001 level. So, insofar as Cooke's
explanations for his theory include a modal dimension, this result offers some confirmation of
them.
The motifs in minor keys produced similarly unconvincing evidence for the relationship between
active/passive emotions and rising/falling themes.
d) The position of the tonic
(i) Major mode
The significance or otherwise of the tonic termination of a basic term was examined by
comparing the responses to extracts 3 & 7. Cooke's theory would predict that statement 3
(Joyful acceptance .......fulfilment or homecoming) would be preferred for the former, and
statement 7 (Confident incoming emotion of ....anticipation or expectation) for the latter. The
actual results showed that statement 3 was the preferred descriptor for both extracts. In the
case of the basic term which ends on the tonic (Extract No. 3) the difference is not
significant, whereas there is a small but significant difference at the p = 0.025 level for
Extract No. 7 (i.e. the 'wrong way').
(i) Sex
There were significant differences in the responses of men and women to extract 1 (p = 0.001)
and to extract 8 (p < 0.001). Scrutiny did not reveal any consistent pattern in the former
case. However, in extract 8, which had already been shown to be atypical in the relationship
between tonality and valency of emotion, women consistently responded more favourably to the
positive statements and less favourably to the negative statements than did the men in this
sample. Possible reasons for this will be explored below.
(ii) Experience
Participants were chosen with regard to their practical experience of working in a tonal idiom.
Analysis confirmed that differences in the length of this experience were not a significant
factor.
(iii) Familiarity with the Extracts
One final possible influence on the participants' responses was examined, namely whether or not
any previous familiarity with the music might have influenced the results. Here again analysis
revealed no significant differences.
g) Rank Order of Descriptors
It was observed that certain descriptors seemed to be more frequently chosen than others. The
responses to the major basic terms favoured statement 3 above all the others, while the
responses to the minor motifs favoured statement 4 (though Extract 8 is an exception [see
below]).
At the other end of the scale, statement 1 was the least favoured descriptor for the major
extracts and statement 9 for the minor ones. The likeliest reason for this, and for the
The sex of the respondent may have some influence on the perception of specific emotional
characteristics in music.
8. Discussion
The core finding of this study is that the precise emotional connotations which Deryck Cooke
ascribed to certain musical basic terms were, with one exception, not agreed by a sample of
listeners experienced in the western tonal idiom. That is not to say that there were no
agreements about the emotional tone of the extracts but these appear to be influenced by
factors other than the tonal tension upon which Cooke built his hypothesis.
Principal amongst these factors is the modality of the fragments. In most cases the findings
show a significant link for listeners between positive affect and major modality, and between
negative affect and minor modality, which is in accordance with many studies made over the
years (e.g. Hevner, 1935; Rigg, 1964). Although clearly not supporting the detail of Cooke's
claims, it could be argued that this linkage is evidence of an inherent mechanism at work.
Various studies have set out to establish whether or not there is an acoustic foundation for
the association between modality and emotion and also the age at which it reliably begins to
appear. The evidence is contradictory because of the different frameworks which have
been used for such enquiries. Kastner & Crowder (1990), working with children ranging in age
from 3 years to 12 years, showed that even the youngest were able to discriminate and register
culturally shared emotional perceptions of the two modes. They speculated that the reason why
such young children were able to recognise this link is to do with the relative familiarity of
the major and minor modes. They referred to hypotheses by Zajonc (1984) that mere exposure to a
stimulus leads to familiarity with it and that, in all organisms, familiar stimuli tend to be
preferred, novel stimuli feared. Familiarity in this case, the writers argued, could be to do
with the presence of the major triad as easily audible harmonics early in the overtone series,
which would support the notion of an inherent perceptual component in the modal-emotional
linkage. However, the link between familiarity and preference is contradicted in the musical
field by Hargreaves and Castell (1986) who reported that 4/5 year old children did not
discriminate in their preference ratings between familiar and unfamiliar melodies because it
takes time to develop the cognitive maturity necessary to link familiarity with liking. This
suggests a developmental process.
A further difficulty lies in the fact that even this powerful influence is not always
over-riding. As has already been discussed, extract 8, the F minor study, was perceived as
primarily having positive affect. This suggests that such factors as a fast tempo or a
distinctive rhythm can sometimes be more significant in moulding perceptions than can
mode or any particular tonal relationships between the pitches. Indeed, Hevner (1937) concluded
that tempo was the most important dimension of all for carrying the emotional connotation of
music, though she warned against assuming any simple correspondence between any one factor and
a particular emotional tone.
Despite Cooke's many references to instances where the rising and falling of melodies, to and
from the tonic, has particular emotional resonances, the factors of musical contour and
key-note position were found not to be significant in this experiment. This is consonant with
developmental studies, which do not support the notion that these elements are perceived in
Reference was made earlier to the identification by Dowling & Harwood (1986) of certain
'universals' in music. They also consider its adaptive value in terms of evolution and suggest
that it can be explained by reference to its value to human groups rather than necessarily to
individuals, allowing them to express shared experiences, values and cultural identity which
are important for survival (op. cit., p. 236). The work of Papoušek (1996) quoted earlier draws
attention to its value in terms of individual social development. Cross (1998) has suggested
the outlines of a developmental theory which would encompass both these elements. Perhaps
future work in such areas as ethnomusicology or neuromusical research will yet reveal important
inherent features of music to help explain why it is so all pervasive in the cultures of the
world.
REFERENCES
Abeles, H. F. & Chung, J. W. (1996). Responses to Music. In: Hodges, D. A. (Ed.), Handbook of
Music Psychology (2nd edn.). IMR Press, UTSA.
Cooke, D. (1959). The Language of Music. Oxford University Press.
Cross, I. (1998). Is Music the Most Important Thing We Ever Did? Music, Development and
Evolution. Proc. 5th Int. Conf. on Music Perception & Cognition, 35-40.
Dowling, W. J. & Harwood, D. L. (1986). Music Cognition. Academic Press.
Gabriel, C. (1978). An Experimental Study of Deryck Cooke's Theory of Music and Meaning.
Psychology of Music, 6(1), 13-20.
Gabrielsson, A. & Juslin, P. N. (1996). Emotional Expression in Music Performance: Between the
Performer's Intention and the Listener's Experience. Psychology of Music, 24(1), 68-91.
Gerardi, G. M. & Gerken, L. (1995). The Development of Affective Responses to Modality and
Melodic Contour. Music Perception, 12(3), 279-290.
Giomo, C. J. (1993). An Experimental Study of Children's Sensitivity to Mood in Music.
Psychology of Music, 21(2), 141-162.
Gundlach, R. H. (1935). Factors Determining the Characterization of Musical Phrases. Amer. J.
Psychol., 47, 624-643.
Hargreaves, D. J. & Castell, K. C. (1986). Development of Liking for Familiar and Unfamiliar
Melodies. Reported in The Developmental Psychology of Music. C.U.P., pp. 117-118.
Hevner, K. (1935a). Expression in Music: A Discussion of Experimental Studies and Theories.
Psychological Review, 42, 186-204.
Hevner, K. (1935b). The Affective Character of the Major and Minor Modes in Music. Amer. J.
Psychol., 47, 103-118.
Schoen, M. & Gatewood, E. L. (1927). The Mood Effects of Music. In: Schoen, M. (Ed.), The
Effects of Music. Kegan Paul, Trench, Trubner & Co., Ltd., London.
Sloboda, J. A. (1991). Musical Structure and Emotional Response: Some Empirical Findings.
Psychology of Music, 19, 110-120.
Sloboda, J. A. (1998). Does Music Mean Anything? Musicae Scientiae, 11(1), 21-31.
Sundberg, J. (1993). How Can Music be Expressive? Speech Communication, 13, 239-253.
Trehub, S., Schellenberg, G. & Hill, D. (1997). The Origins of Music Perception and Cognition:
A Developmental Perspective. In: Deliège, I. & Sloboda, J. (Eds.), Perception & Cognition of
Music. Psychology Press.
Zajonc, R. B. (1984). On the Primacy of Affect. American Psychologist, 39, 117-123.
Zuckerkandl, V. (1960). Review of "The Language of Music" by Deryck Cooke. Journal of Music
Theory, 4(1), 104-109.
Proceedings abstract
THE RELATION OF MELODY AND TEXT IN JAPANESE POPULAR SONGS: AN EXAMINATION BASED
ON A COLOUR-CHOICE METHOD AND A COMPARISON OF GENERATIONS.
UENO-GAKUEN UNIVERSITY,
DZE04250@nifty.ne.jp
Aims:
Songs consist of two components: melody and text. The purpose of this study was
to ascertain and compare the influences of image on the two components.
Results:
1) 30% of the colour responses for melodies and texts were chosen in common from the same
colour categories, such as "warm colour", "cool colour", "neutral colour" and "muddy colour"
(Kawakaki et al., 1994).
2) There was a stronger relation of image between text and song than between melody and song.
Proceedings abstract
EMOTIONAL PROTOTYPES IN MUSIC
Some empirical findings on how basic emotions are expressed in musical structure
Kari Kallinen, Department of Musicology, University of Jyväskylä
Background.
It has been suggested that qualities of musical syntax are central to the expression or
communication of different kinds of emotions. However, there seem to be only a few studies
providing empirical evidence to support this claim.
Aims.
This study examines how basic emotions with different intensity levels are expressed in the
structure of (tonal) Western art music.
Method.
Music listeners were asked to choose the basic emotional facial expression (joy, sadness,
anger, fear, surprise, disgust) corresponding to the emotional character of the recorded music
sample heard, and to evaluate the intensity of the emotion by marking a cross on an intensity
segment. Passages considered the best examples of basic emotions (those for which consensus
among the chosen facial expressions was high) were then subjected to structural analysis.
Features such as loudness, tempo and complexity of harmony ("iconic" meaning), and sudden
dynamic or textural change and new or unprepared harmony ("symbolic" meaning), were examined.
The combinations of structural features related to distinct emotions and intensities were then
considered as prototypes.
Results.
The results seem to give some support to the claim that emotional qualities in music are
qualities of musical syntax. However, the final results of this study will be presented at
Keele in 2000, since at the time of submission the analysis was not fully completed.
Conclusions.
The study suggests that expressions of basic emotions (or at least the recognition of basic
emotional characters) have some analogy with visual pattern recognition: a figure can be
perceived when all, or some, of the most important partials are present.
Proceedings paper
Charlie Parker and the Golden Section : An Examination of Musical Proportion in the Released and
Alternate takes of "An Oscar for Treadwell"
B. Kenny
ABSTRACT
This paper examines Charlie Parker's contrasting approaches to musical pacing in consecutive takes of "An Oscar For Treadwell." Following a brief analysis of recurrent thematic material common to both takes,
incorporating a discussion of Parker's complex approach to motivic deconstruction, a proportional examination of musical pacing is presented. Four nodes corresponding to Golden Section proportions were
chosen along the 64 bar time lines for each take. At each of these nodes, the Released Take was compared with the Alternate Take in respect to motivic choice, melodic contour, rhythmic impetus and overall
successful dramatic realisation of each node. Results show a close relationship between significant musical events articulated by the Released Take and Golden Section proportions. It is suggested that one of the
key factors which contribute to the organic unity found in Parker's best work is his ability to conceive of improvisation in long range terms. With all the inherent drama, logic and fluidity of a musical
conversation, Parker is able to generate and sustain rhetorical tension throughout each multi-chorus improvisation, although to a greater extent in the Released Take. Such rhetorical constructs assist Parker in
overcoming artificial formal and musical divisions implicit in the 32 bar chorus song forms he improvises on, a characteristic which distinguishes his playing from other "chorus to chorus" bebop contemporaries.
PROPORTIONAL ANALYSIS
The main issues to be addressed in the following discussion and overview of proportional analysis can be summarised as follows:
1. What is proportional analysis and, in particular, the Golden Section ratio?
2. What analytic advantages does this methodology bring to a greater psychological understanding of such ephemeral but important concepts as rhetoric and pacing in improvisation?
3. What is its history of application to both notated and improvised musics?
4. What is the likelihood of an improviser such as Charlie Parker consciously articulating its aesthetic properties?
One of the primary recommendations for any form of proportional analysis is its very generality, namely that such analyses necessitate a discussion of many interrelated musical parameters at the chosen
proportional moments in a given work. Furthermore, proportional analyses acknowledge the important temporal aspects of music as a series of discrete linear events. The identification of these events and the
manner in which they are prepared and resolved often holds the key to understanding a work's inherent structural drama. In a genre perhaps more indebted to concepts of spoken 'rhetoric' than notated music,
such analyses may also illuminate the improviser's unique approach to the original structure.
The GS ratio involves the division of a line, an area or a musical work into two parts, "so that the ratio of the shorter portion to the longer portion equals the ratio of the longer portion to the entire length"
(Howat, 1983, p. 2). The exact value of the ratio is irrational, approximating 0.618034, or a little less than 2/3 of the entire length measured (Figure 1).
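The defining proportion can be checked numerically; a minimal sketch (the variable names are illustrative):

```python
import math

# phi satisfies short/long == long/(short + long); solving the quadratic
# gives phi = (sqrt(5) - 1) / 2, an irrational number close to 0.618034.
PHI = (math.sqrt(5) - 1) / 2

total = 1.0                  # any length will do; 1.0 keeps the arithmetic plain
long_part = total * PHI
short_part = total - long_part

# The two ratios in Howat's definition agree (to floating-point precision):
ratio_a = short_part / long_part
ratio_b = long_part / total
```

The equality of the two ratios is exactly the self-similar property that makes the division "golden": the whole relates to the longer part as the longer part relates to the shorter.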
Most often, discussions of GS proportions have focused on musical form (i.e. AABA, ternary, sonata), which is either viewed to be underpinned by the ratio or exists in a complementary relationship to it. As
Dorfman (1986, p. 20) notes, GS divisions either coincide with major formal divisions in a work or, as is more common, with major turning points. In the compositions of Debussy, for example, Howat (1983)
demonstrates how GS proportions give shape to and make sense of the composer's smaller episodic divisions. These overriding proportions account for significant thematic events which defy explanation by more
conventional analytic methodologies.
Musical works have the potential for at least two major GS divisions, which occur roughly 2/3 from the beginning (Long GS) or 2/3 from the end of the work (Short GS). These two main GS divisions can
likewise subdivide existing sections or be subdivided themselves in a seemingly infinite number of ways (Figure 2).
Naturally, the greater the number of such subdivisions, the more likely it is that they will either be confused with the primary GS for the work or be subsumed under a consideration of other formal divisions.
While GS proportions have received greater attention in the visual arts, the relationship between logarithmic proportions and music has been an enduring and intriguing one. Throughout history, music theorists
and composers have made connections between this dynamic ratio which can be found in nature and similar proportions in music. In Ancient Greek music theory, GS proportions played a key role in the
Pythagorean and Euclidean formulation of the harmonic overtone series (Dorfman, 1986, p. 26-39). Logarithmic proportions are likewise present in other theoretical constructs of music, such as the relationship
between loudness and sound intensity and, to an extent, in the traditional Western system of proportional notation (Dorfman, 1986, p. 54).
In the absence of verifiable anecdotal evidence, various musicologists have noted a remarkable tendency in the works of Western composers towards GS proportions. These proportions either coincide with larger
formal divisions or with other significant musical events which cannot be accounted for by static form alone. Authors with more reliable anecdotal evidence have also examined the intentional use of GS proportions as a
compositional device in the works of Debussy (Howat, 1983), Satie (Adams, 1996) and Stravinsky (Watkins, 1994). To this author's knowledge, the ratio has never yet been applied to a dedicated study of
proportion in improvisation.
What is the likelihood of a jazz improviser and composer such as Charlie Parker deliberately employing GS proportions as a compositional or improvisatory method? Parker's intentional use is probably highly
unlikely, for several reasons which will be addressed at the conclusion of this paper. If not ignorant of GS proportions (one cannot dismiss Parker's acquaintance with them out of hand), it is improbable that
Parker would have intentionally wished to straitjacket his improvisation in this manner. As for any subconscious applications of the ratio, one enters the realms of speculation and hypothesis. Discussing evidence
METHODOLOGY
The two works chosen for analysis were Charlie Parker's improvisations on his own composition "An Oscar For Treadwell." Both the Released and Alternate takes were recorded consecutively at Mercury
Studios in New York City on June 6, 1950. The musicians on the date were Dizzy Gillespie (trumpet), Thelonious Monk (piano), Curly Russell (bass) and Buddy Rich (drums). The particular analytic focus for
this paper is Parker's two improvised choruses (bars 41 to 104) and their relationship to the composed Head melody which precedes them (Figure 3).
The 'head' or composed melody for this song was most likely a contrafact, meaning that Parker composed a new melody over the chord changes of an existing jazz standard. While it could be argued that one
should ultimately look for similarities between the original jazz standard and the resulting improvisation, such an approach poses several problems. First, it is often difficult to establish beyond doubt which tune
the changes were originally appropriated from. Second, this exercise is rather pointless in that countless jazz standards share the same or very similar sets of chord changes - the most famous set being those based
on Gershwin's "I Got Rhythm", usually referred to as the "Rhythm" changes. For these reasons, I opted for Parker's composed head which bookends the improvisations of this song.
In each take, Parker's improvised choruses (64 bars in total) articulate two cycles of the 32 bar Head. The Head itself can be subdivided into four 8 bar phrases which express an AABA form.
These clear formal divisions implicit in the Head meant that all A and B sections could be analysed in respect to their A or B section groups, irrespective of the take they originally came from. With the A and B
sections for both takes combined, this brought the total number of A section motives to twelve and B section motives to four (Figure 4).
Within each of the section groups, motives were further subclassified according to their thematic material. For example, A Section motives incorporating the notes of the theoretical 'Blues' scale were labelled
'Blues' motives. Following this identification and classification procedure, four nodes corresponding to Golden Section proportions were chosen at corresponding points along 64 bar time lines for both takes (bar
41 to bar 105). These four nodes, ranging from greater to lesser significance, are detailed in Figure 5.
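Node positions of this kind can be computed directly from the GS ratio. The sketch below maps the Long GS, the Short GS and one subdivision on either side onto absolute bar numbers; this particular choice of subdivisions is illustrative only, not a reproduction of the paper's Figure 5:

```python
PHI = 0.6180339887  # Golden Section ratio

def gs_nodes(n_bars, first_bar):
    """Return four GS-derived node positions as absolute bar numbers.

    Uses the Long GS (~61.8% from the start), the Short GS (~61.8% from
    the end) and one subdivision on either side of them. The choice of
    subdivisions is an illustrative assumption.
    """
    long_gs = n_bars * PHI
    short_gs = n_bars * (1 - PHI)
    sub_short = short_gs * PHI                     # subdividing the opening span
    sub_long = long_gs + (n_bars - long_gs) * PHI  # subdividing the closing span
    return [round(first_bar + x) for x in (sub_short, short_gs, long_gs, sub_long)]

# Candidate nodes along a 64-bar solo beginning at bar 41:
nodes = gs_nodes(64, 41)  # [56, 65, 81, 96]
```

Because the ratio is self-similar, each span can be subdivided again in the same way, which is how the "seemingly infinite" nesting of GS divisions mentioned above arises.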
At each of the four nodes, the Released Take is compared with the Alternate Take in respect to motivic usage, melodic contour, pacing and overall successful dramatic realisation of each node.
ANALYSIS
CONCLUSIONS
Throughout this paper, it has been argued that the Released Take's greater conformity to idealised GS proportions demonstrates a greater awareness of pacing than the Alternate Take. It must be stated, however,
that this issue of pacing may represent only one of the reasons why the Released Take was ultimately chosen for release. Although Parker may have personally participated in the final editing and selection
procedure, it is more likely that release decisions would have ultimately been made by the record producer. Furthermore, issues other than pacing may have determined the decision outcome. These may include
factors affecting the musical quality of the performance (e.g. miscued notes, the relative merit of the sidemen's solos, the tempo of each take, the arrangement of the Head, etc.) or those affecting the sound
recording process (e.g. balance, intrusive noises, etc.). In spite of these important variables, it cannot be altogether discounted that the release decision process may have been influenced by issues of pacing and
proportion.
Another important point is that, in all probability, Parker would only have been aware of GS proportions at an intuitive and subconscious level. It is doubtful whether Parker would have been able, or would even
have wanted, to employ GS proportions with any degree of precision. He may, however, have acquired them intuitively, and three equally plausible hypotheses suggest how.
The first is that, as a consummate jazz improviser, Parker would certainly have understood the importance of melodic rhetoric and how to structure it to achieve optimal communication. Through years
of experience, experimentation and self-critical feedback, Parker would have also realised the importance of building and releasing tension across large formal entities. His very attraction to such proportions as
the Golden Section may be explained simply as a rational expression of a generalised arch curve, one which prolongs tension for the majority of a work and leaves sufficient time to resolve it.
The second hypothesis is that Parker would have in time internalised many of the jazz Head arrangements he improvised on to the extent that their proportions could not but influence his improvisations. As a
composer and regular performer of jazz standards, Parker would instinctively come to know where the important moments of most tonal works occur. In well constructed song forms, these peaks would be
especially articulated in the Head arrangement's melodic contour, harmony, rhythmic figuration, dynamics, texture, articulation, etc. The argument here is that if one plays and improvises on enough of these
composed song forms, one cannot help but internalise their logic.
The third and final hypothesis is that Parker may have acquired GS proportions from nature. As remarked earlier, GS proportions have been uncovered in the works of many Western art music composers who
may never have intentionally employed them. Although musicologists have generally refrained from drawing firm conclusions as to why these proportions occur, it may be construed that many of these
occurrences are in fact manifestations of the ratio as a naturally occurring phenomenon. It should come as no surprise, therefore, that Parker, a very successful exponent of a genre which owes much of its
impetus to a spontaneous rhetoric, may have intuitively constructed his musical argument within GS proportions.
This analysis of Parker's successful and less successful pacing of the same work bears testimony to his improvisational genius and also hopefully demonstrates, on a small scale, a great improviser's work in
progress. Questions surrounding Parker's intentional or subconscious use of GS are perhaps irrelevant, since there will never be sufficient anecdotal evidence to substantiate any of the assertions
made in this paper. That GS proportions are nevertheless better adhered to in the Released Take is evident in the music itself.
ACKNOWLEDGEMENT
The author of this paper is indebted to Associate Professor Gary McPherson (University of NSW) for all his advice, enthusiasm and critical feedback, without which the writing of this paper would have
been impossible.
BIBLIOGRAPHY
Adams, C. (1996). "Erik Satie and Golden Section analysis." Music & Letters, 77(2), pp. 242-252.
Bauer, S. (1994). Structural Targets in Modern Jazz Improvisation: An Analytic Perspective. Ph.D. diss., University of California, San Diego.
Berliner, P. (1994). Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press.
Blanq, C. (1977). Melodic Improvisation in American Jazz: The Style of Theodore "Sonny" Rollins, 1951-1962. Ph.D. diss., Tulane University.
Clarke, E. F. (1988). "Generative processes in music." In J. A. Sloboda (Ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and Composition. Oxford: Clarendon Press, pp. 1-26.
Dorfman, A. (1986). A Theory of Form and Proportion in Music. Ph.D. diss., University of California, Los Angeles.
Ernest, G. (1903). "Some Aspects of Beethoven's Instrumental Forms." Proceedings of the Royal Musical Association, Twenty-Ninth Session 1902-1903, pp. 73-98.
Ford, A. (1997). Illegal Harmonies: Music of the 20th Century. Sydney: Hale and Iremonger.
Forte, A. (1973). The Structure of Atonal Music. New Haven: Yale University Press.
Howat, R. (1983). Debussy in Proportion. Cambridge: Cambridge University Press.
Järvinen, T. (1995). "Tonal Hierarchies in Jazz Improvisation." Music Perception, 12(4), pp. 415-437.
Johnson-Laird, P. N. (1991). "Jazz Improvisation: A Theory at the Computational Level." In P. Howell, R. West and I. Cross (Eds.), Representing Musical Structure. New York: Academic Press, pp. 291-325.
Kernfeld, B. (Ed.). (1988). The New Grove Dictionary of Jazz. New York: Macmillan.
Kernfeld, B. (1981). Adderley, Coltrane and Davis at the Twilight of Bebop: The Search for Melodic Coherence (Volumes I and II). Ph.D. diss., Cornell University.
Kramer, J. (1995). "Beyond Unity: Toward an Understanding of Musical Postmodernism." In E. Marvin and R. Hermann (Eds.), Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
Martin, H. (1996). Charlie Parker and Thematic Improvisation. Lanham: Scarecrow Press.
Marvin, E. (1995). "A Generalization of Contour Theory to Diverse Musical Spaces: Analytical Applications to the Music of Dallapiccola and Stockhausen." In E. Marvin and R. Hermann (Eds.), Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
Marvin, E. and Laprade, P. (1987). "Relating Musical Contours: Extensions of a Theory for Contour." Journal of Music Theory, 31(2), pp. 225-267.
Marvin, E. and Hermann, R. (Eds.). (1995). Concert Music, Rock and Jazz Since 1945: Essays and Analytic Studies. New York: University of Rochester Press.
McCreless, P. (1997). "Rethinking Contemporary Music Theory." In D. Schwarz, A. Kassabian and L. Siegel (Eds.), Keeping Score: Music, Disciplinarity, Culture. London: University Press of Virginia.
Monson, I. (1996). Saying Something. Chicago: University of Chicago Press.
Morris, R. (1993). "New Directions in the Theory and Analysis of Musical Contour." Music Theory Spectrum, 15(2), pp. 205-228.
Norden, H. (1968). Form: The Silent Language. Boston: Branden Press.
Proceedings abstract
mmishra@siue.edu
Background:
Aims:
The purpose of this study was to determine whether altering environmental
context affected performance accuracy of memorized music.
Method:
Results:
Conclusions:
Proceedings paper
The Problem
The overall meaning of the Seventh Symphony of the Austrian composer Gustav Mahler (1860-1911) has
consistently baffled analysts and musicologists. It used to be thought of as one of the weakest of the
composer's symphonies. The English musicologist Deryck Cooke said: 'The Seventh is undoubtedly the
Cinderella among Mahler's symphonies.'(1) He was particularly critical of it: 'The truth is that
No.7...presents an enigmatic, inscrutable face to the world...one which arouses suspicions as to its
quality.'(2) James L. Zychowicz noted the criticism: 'It is rare, indeed, when an international symposium
is devoted to a controversial - and sometimes castigated work - such as the Seventh Symphony of Gustav
Mahler.'(3) It was this symposium that threw much light on the nature of the work and particularly its
individual parts. The symphony's overall 'meaning' is something that has troubled many commentators.
Typical is the opinion of Peter Franklin: 'The Seventh Symphony (1904-5) makes use of as wide a range
of allusive musical imagery as any of his works, while remaining mysteriously canny about its cumulative
meaning.'(4) Henry-Louis de La Grange voiced similar thoughts: 'Not only is it accompanied by no
"programme" that would allow its meaning to be deciphered, but, unlike the other Mahler symphonies,
it does not seem to have a grand design, a general purpose capable of justifying the overall plan and
the strangeness of its detail.'(5) La Grange also gave extensive and sympathetic consideration to
the possibility of a programme, citing the influence of the poet Eichendorff and the ideas of Peter
Davison, Peter Revers, Willem Mengelberg and Alphons Diepenbrock without coming to any firm
overall conclusions.(6)
It is an article of many writers' faith that while the middle movements (II, III and IV) are among the
composer's most attractive creations, the first and last movements, for one reason or another, fail to
convince. The second movement, entitled Nachtmusik, gives a vivid picture of nocturnal activities, horn
calls, sinister marches with whirling counterpoints, birdcalls and even screams. The fourth movement,
also entitled Nachtmusik, is superficially a charming serenade, complete with mandolin and guitar. It has
an engaging character with its memorable melodies, simple repetitive rhythms and chamber-music like
orchestral textures. The central third movement, the scherzo, is also very nocturnal in character. Its
heading Schattenhaft (shadowy) clearly indicates this. The way that the accompaniment is built up in the
bass register, with timpani, cellos and basses, bass clarinet and horns, before the violins' whirligig runs
emerge, is typical. The volume, apart from a couple of violent outbursts, is kept low and the textures are
mostly delicately scored. It would appear to be generally uncomplicated.
The problem seems to involve the outer movements, the first and last. In various ways they are thought
not to be convincing. Let us look at these in turn. Various analysts consider the first movement
somewhat diffuse in form. The slow introduction is integrated thematically into the allegro section, a
feature which should cause no problems, but the central part of the allegro appears to some to be a series
of sections of episodic character. There are sections which exemplify Adorno's 'suspension of time' and
frequently the music seems to lose its momentum. One can compare this movement to the equivalent one
of the Sixth Symphony. The latter's fairly straightforward classical sonata structure is not difficult to
follow, with the composer making his points in an orderly and traditional way, even if the content is
powerful and imposing. The first movement of the Seventh Symphony contains themes with a family
resemblance to those in the Sixth Symphony, but they are handled far less formally and much more
flexibly and, probably for Mahler, intuitively. The finale presents even more serious problems. It is said to
be untypical of the composer, especially coming after the three very evocative night movements. The
almost forced joyous character of the movement comes with something of a jolt after the delicate
serenade. The main thematic material is uncommonly four-square for Mahler and has an uncanny
resemblance to the prelude to Wagner's Die Meistersinger. Rather than point an accusing finger at
Mahler, we should rather try to understand what is happening.
The polarisation of favourable opinion concerning the central movements on the one hand and the critical
opinion of the outer movements on the other hand suggests that Mahler's inspiration was in some way
faulty or that the work itself may have been misunderstood by its audiences. Mahler himself, always his
severest critic, was reportedly satisfied with the work and no less a musician than Arnold Schoenberg was
very impressed by the work.(7)
It is my contention that the overall significance or meaning of the work is lost if one concentrates too
closely on the individual parts, attractive though they are. It is a symphony not a suite and as such it can
be expected to present a coherent and unified message. Further, the traditional methods of analysis,
especially if used singly, are unlikely to illuminate the richness of the work. As Mahler's music responds
to approaches from so many different angles and perspectives, surely it is sensible to take a number of
different viewpoints together to see if the 'meaning' of the work can be better uncovered.
Analytical Methods
The music of Mahler is particularly rich in its features. Much more than the music of, say, the Classical
period, it can be viewed in a number of different ways. Consequently it is reasonable to believe that there
are many different analytical approaches which are valid for this music. Let us look at these in turn.
The traditional approach to music often uses some type of formal analysis. One can thus take a standard
model of a symphony, usually from the Classical period, and map this model on to the work being studied.
The relationship that this reveals between the two can be considered in a number of ways: themes, keys,
relative lengths of the 'standard' sections are probably the most important. This form of comparison to a
notional model can of course be very revealing. On the most simplistic and obvious level, points of
similarity indicate a valid contact, while differences show the divergence from tradition. The broad
conclusion that one can arrive at is that one can recognise some common characteristics, even though
there are very many divergences.
One feature which suggests to us that Mahler's music does not adhere strictly to classical procedures is
the appearance of ideas that indicate some overt extra-musical influence. This comes in two forms. The
first is basically musical in nature, involving the complex use of dance forms and marches. Dances are
not restricted to consistent and uniform tempos and styles. They are organic features that are constantly
being developed and changed. The composer's use of marches and march-like music has a similar variety
in its use; it is equally liable to change its character, without any obvious internal musical explanation.
The second is the use of features that do not normally find a place in music, for example, birdsong, and
the sounds of cowbells, that hint at something outside the normal range of music. This imagery is a potent
feature in Mahler's music.
It is not difficult to imagine that some kind of narrative underlies the music. Although the idea of
narrative in music has generated a great deal of controversy, following Adorno, it has gained a wide
currency in discussions of Mahler's symphonies. The questions one should ask are: in what way can
music 'narrate', and, if it can, does this process apply to the music of Mahler?
We now move naturally into the area of musical hermeneutics, the study of the meaning of music, the
ultimate aim of this study. Hermann Kretzschmar, one of the pioneers of this type of study, rejected the
purely formal conception of music advanced by Hanslick and the so-called
Formalists, but he also rejected the poetising descriptions of much of the music writing of his day. He tried to
work out the real emotions which, he argued, were inherent in the music itself, drawing on biographical
and general historical data to support his explanations.(8)
In Mahler's case there is indeed much biographical information, including the diary and notes of Natalie
Bauer-Lechner, the letters and diaries of his wife, and his own letters. Detailed biographical work has
been done by Donald Mitchell and especially by Henry-Louis de La Grange that reveals many important
details. It is to some of these that we can turn to support what may appear to be speculative suggestions
of various interpretations.
Formal Models
Despite their complexity the broad structures of Mahler's Seventh Symphony can be related to traditional
symphonic forms. The first movement is a loose sonata structure with a slow introduction. That some of
the introduction's material is worked into the form of the main allegro should not concern us much at the
moment. The main allegro hinges around a thrusting march-like first theme and an important subsidiary
section (Mit großem Schwung, bar 118) which is strongly linear, with a rich melodic chromaticism and
frequent emotionally charged pauses on the second beats of the bar. It is not difficult to recognise a
similarity with the comparable music from the first movement of the Sixth Symphony, a point made by
numerous commentators.
Leaving aside for the moment the extended development section, we can see a certain regularity in the
recapitulation. The return of the adagio introduction (bar 338) and its transformation into the Allegro
come prima - maestoso (bar 373) is broadly similar to the opening, but significantly abbreviated by the
omission of the march which has played an important part in the music so far. The second theme (bar
465) charts a similar course, but now without the emotionally charged pauses found in the exposition. The
aspect that has drawn most comment is the extended development section (bars 171-337) which works
through numerous sub-sections of very varying tempos. In the slow sections time stands still: Meno
mosso (bars 256-265), Etwas gemessener (bars 298-316) and the subsidiary theme (bars 317-337). The last in
B major forms a link to the return of the slow introduction. The later appearances of the march which was
first heard in the slow introduction are most interesting: the end of the exposition in which it is very loud,
then augmented as a slow chorale in the middle of the development (bars 256-265) and finally loudly in
the coda. We are entitled to ask what is the significance of these changes in the character of the march,
and further what is the relationship between the main themes and between them and the march itself.
The first of the Nachtmusik movements can be seen as a very varied sonata model, but Constantin Floros
identifies a plausible quasi-arch model.(9) The fact that this rather neat plan is strangely unrecognisable in
practice should alert us to the problems of interpreting the music in traditional terms. The sections that
return always come in a different form, something which demands some explanation. The Scherzo leads
to more difficulties of interpretation. The simple plan could look like this: scherzo with repeat, trio,
scherzo with repeat, coda. By further sub-dividing the movement into many sections, it is possible to give
some details of Mahler's micro-structural working, but this kind of analysis becomes pointless.(10) What
is in fact a patchwork of materials that are juxtaposed in various ways is a contradiction of the simple
plan. We might also ask why the first reprise is in the 'wrong' key - 'false recapitulation' is Berio's
phrase.(11)
The second Nachtmusik movement would appear more straightforward. Floros postulates the sequence:
introduction - main section - development - trio - recapitulation - coda.(12) There is a considerable rondo
feel about the music, although it is impossible to connect this with traditional rondo structures.
The problematic finale in some ways is easiest to understand in traditional terms, that of a baroque
ritornello. Floros identified six elements that are used for the ritornello theme. In only two of the eight
appearances of the ritornello (the first and the last) do all six elements appear; in all the others only some
(between one and four) elements are used, a procedure in line with Baroque practice.(13) Why did Mahler
use this strangely archaic formal plan?
In all five movements one can recognise some vestiges of traditional formal structures. However, in all
cases, there are good reasons to believe that there is much more to the music than a mapping of it on
to an earlier model.
Musical Imagery
We now can look at one of the richest sources of clues in this symphony: the composer's use of musical
imagery. This can take a number of forms: birdsong, country sounds, military rhythms, dance-like
passages, references to other works of his own, melodies and melodic fragments previously set by Mahler
to words of some significance and quotations or quasi-quotations from other composers' works.
The opening movement seems to be carrying on the drama of the first movement of the Sixth Symphony -
the melodic shapes of the main themes are clearly related. The first Nachtmusik shows this in a
particularly vivid form. An early passage (bars 9-27) was said by Alma Mahler, the composer's wife, to
represent birdsong in its triplet woodwind figures. There are numerous military features that take their
inspiration from the Wunderhorn song Revelge, composed in 1899. Note especially the rhythm: quaver, 2
semiquavers, 2 quavers, crotchet. The three references in this movement (bars 28-29, 187-88, 337-38) to
the motto of the Sixth Symphony (major to minor chord shift) must hold some significance. The
appearance of cowbells as part of an episode that recalls the echoing horn calls of the introduction must
have some possibly pastoral significance. Peter Davison wrote: 'The presence of cowbells, echoing horns,
march music and exotic dance rhythms could initially seem to convey a[n] unconnected sequence of
extramusical significance'.(14) We can listen to the spectral whirligig music of the scherzo and imagine
all kinds of nocturnal activity, some of it very sinister. The second Nachtmusik, a movement that has
caused no end of problems for analysts using traditional criteria, is also full of evocative ideas that
conjure up the image of a beautiful serenade, lovingly played by the wind instruments with gurgling
accompaniments from the clarinets and gentle plucking from the guitar and mandolin. But perhaps the
most intriguing aspect of the work is the way that the finale seems to derive its main theme from the
prelude to Wagner's Die Meistersinger. Even the appearance of a derivative of Franz Lehár's Die lustige
Witwe ('The Merry Widow') would seem to have some hidden significance.(15)
The polarity between traditional forms which Mahler superficially follows and explicit programme music
(which Mahler rarely uses) is put very forcibly by Peter Davison: 'Complications arise from the need to
explain anomalous formal characteristics within the framework of traditional formal concepts, instead of
within a common-sense approach to the musical narrative.'(16)
Musical Narrative
Other clues to a narrative interpretation are readily forthcoming. The main thematic material of the
opening movement interacts in a very interesting way. The main allegro section has two groups of
materials that have something in common with the first movement of the Sixth Symphony. What is
interesting is the brief and innocent sounding march that first appears in the introduction (bar 19) which
shows dramatic transformations in its reappearances. What is being indicated by these changes? John
Williamson noted: 'In Mahler, Sonata form and march are frequently equated with motivic struggle, even
motivic disorder.'(17) There is indeed some conflict between the main allegro's processes and this march.
It appears fast (Flott) in bars 136-44 and bar 238, but very slowly and quietly in the central development
section (Meno mosso) at bars 258-65 and (Sehr gehalten) at bars 304-11. Its final appearance (Frisch) at
487-94 is very powerful. It seems to infiltrate itself into the other thematic material.
The Nachtmusik movements have strong connections to traditional forms in their reprises and
symmetries. One can sense that they are probably more descriptive than narrative in their nature. The
scherzo that separates them is different. It is the third of a series of developmental scherzos that Mahler
composed for these middle period symphonies. Its complex structure relates clearly to the traditional
scherzo in its overall plan: scherzo with repeat, trio, scherzo with repeat, coda. This simple plan conceals
the constant changes to the thematic material at each appearance. Very notable is the reprise, part of
which appears a semitone higher, in E flat minor rather than D minor. Also of some significance must be
the section marked 'Wild' (bars 416-20) in which there is a violent outburst from the trombones and tuba.
The sinister element in this movement disrupts the generally peaceful mood of the two Nachtmusik
movements.
The finale presents the conflict between form and content dramatically. The most plausible model seems
to be a Baroque ritornello with the complete version of the opening part (all six elements) heard only in
the first and last of eight appearances. In the other reprises only between one and four elements are used.
The tonic key of C major is used in only the first three appearances and the last. Unlike in the first
movement, the music barely stops for breath. There are no slow episodes. Is there any narrative
significance in this plan? The return of the allegro music from the first movement must also surely have
some, probably narrative, significance. This type of reference to an earlier movement is, of course, a very
common procedure in Romantic symphonies.
So far we have only a disconnected group of suggestions about narrativity in this music. It does not appear
to have the clear sense of direction found in its two predecessors. The Fifth Symphony presents in its first
two movements a conflict that ends in an attempt at a triumph (the D major chorale) which collapses into
fragments and a return to A minor. After an invigorating developmental scherzo and a beautiful
intermezzo (the Adagietto), Mahler takes us through a rondo that again rises to the D major chorale found
in the second movement, but this time it sustains its tonality and key right up to a final triumph.
In the Sixth Symphony, the process is turned on its head, or nearly so. The first movement contrasts a
vigorous minor-key march section with exultant major-key material (the composer referred to this as his
'Alma' music, referring to his wife). Taking the order of the movements that Mahler adhered to in his
lifetime (Andante moderato second, Scherzo third), this is followed by the calm idyll of the Andante
moderato. The parodistic scherzo shatters this calm and mocks the music and tonalities of the first
movement. It propels the narrative with progressively compressed appearances of the main scherzo to a
collapse whose tonalities link directly with the finale. This mammoth movement with its rondo-like
introduction superimposed on an extended sonata structure leads us through great striving for the same
goal as the Fifth Symphony. It is the three hammer blows, placed in somewhat unpredictable places, that
make the narrative convincing, with the final collapse horribly inevitable. Mahler's removal of the third
hammer blow seems to have been the result of a superstition about his own fate. Nothing so obvious can
be related to the Seventh Symphony, so are we asking the wrong questions?
Biographical Evidence
An intriguing piece of biographical information has a bearing on the Seventh Symphony. In 1908 (or
possibly 1909) Mahler conducted in Amsterdam a performance of this symphony, which was prefaced by
three works by Wagner: Eine Faust-Ouvertüre, Siegfried Idyll and the prelude to the opera Die
Meistersinger. This is contained in a letter from Amsterdam to his wife.(18) This may be an example of
Mahler's imaginative programme planning, but it could also be a clue to the inspiration for the symphony.
There is a considerable amount of evidence that Mahler used ideas from his own and other composers'
music in his own works. Inevitably these ideas are modified, sometimes nearly out of recognition. They
occur with such frequency that they can hardly be considered incidental. In his article on the
phenomenon, Henry-Louis de La Grange presents a large number of reasons for the composer doing
this.(19)
A Faust Symphony?
The first suspicion of some underlying idea in the choice of works might be the presence of the name of
Faust. Mahler was very familiar with Goethe's Faust, as his setting of the last section of part 2 in his
Eighth Symphony was to show. The Faust Overture might also be the catalyst for the structure of the first
movement of the Seventh Symphony. The overture itself is a single-movement allegro (Sehr bewegt) with
a slow introduction (Sehr gemessen). Despite the fact that it lasts only ten minutes and its internal
construction is relatively straightforward, it could have acted as a distant model for the first movement of
the Seventh Symphony. It was originally intended in 1840 as an overture to Goethe's Faust Part 1. Is the
connection with this idea just a coincidence or are we looking at a Faust symphony? If one does follow
the Faust theory, a great many apparently disconnected features fall into place (see Table 1).
The plan of the first movement presents less of a problem than has been suggested. The two main groups
of the allegro are clearly differentiated in character. It is possible to imagine that the thrusting first theme
stands for Faust himself and the more romantic and slightly sentimental second theme for Gretchen. The fact that the
latter corresponds with the section in the Sixth Symphony that Mahler said represented his wife, Alma,
adds further corroboration to this idea. The march that appears briefly in the introduction of the Seventh
Symphony can be seen as a disruptive element, as a sinister and slightly threatening element at first and a
much more powerful one at the end of the exposition. Its appearance, now as a quiet chorale slowed down
so that it is almost unrecognisable, in the slow episodes of the middle of the movement, is calm and
restrained. In the recapitulation it does not return at first, presumably because it was held back until
the end of the movement, where it makes an aggressive reappearance. If we follow this, the three elements
are seen to be in some kind of unresolved conflict.
The vivid and picturesque first Nachtmusik seems to be the dreamy Faust himself. There are recollections
of the countryside and memories, some slightly threatening. The setting of night is entirely in character
with what we find in Goethe's Faust; scene after scene has a nocturnal setting. The sub-plot of the nature
poetry of Eichendorff fits in perfectly with this. The sinister element which casts its shadow on the scene
is the three appearances of major-minor chord shift that acted as the motto for the terror in the Sixth
Symphony. They are almost like the three hammer blows in the finale of the latter. This should prepare us
for the terrifying experience of the scherzo.
Without using a Wagnerian point of reference, the scherzo can be seen as a sinister night ceremony,
perhaps even an encounter with Mephistopheles. There are screams, 'things that go bump in the night',
and eerie rustlings. The reprise that is in the 'wrong' key can be thought of as a bad omen. Then there is at
bar 146 a savage outburst, from the trombones and tuba, marked Wild, that seems to be the final waltz of
the devil. What follows is a typically Mahlerian collapse, with disjointed fragments that disappear into
nothing just as in the scherzo of the Sixth Symphony. What can be the significance or meaning of this
movement? One possibility can be found toward the end of part 1 of Goethe's Faust. The scene called
'Walpurgisnacht' concerns a nocturnal meeting in the Harz Mountains between Faust and the devil,
Mephistopheles. The Nachtmusik that follows seems to know nothing of what has taken place. The
delightful serenade that Peter Davison maps on to Wagner's Siegfried Idyll, the latter composer's song of
love for his wife, appears as a way of obliterating the memories of the scherzo. It is not implausible to
think of this as a Gretchen movement.
This brings us to the finale. Mahler clearly wants some sort of redemption. In the Fifth Symphony, he
achieved it on the second attempt. In the Sixth Symphony, he failed heroically, despite a moment of
tranquillity early on. That work's Mephistophelean scherzo destroyed that peace. In the Seventh
Symphony Mahler must have wanted to expunge the overwhelming experience of the Sixth's finale. It
emerges then as a headlong and joyous affirmation of his belief in love. The quasi-quotation from the
prelude to Wagner's Die Meistersinger must surely confirm this - the story of Walter in the opera
vindicates his belief in love, something that will triumph over everything. Mahler did not want the idea to
be lost, so he hardly lingered at all in this finale. There are no slow episodes and the second main material
is specifically marked to be played at the same speed as the first. Just in case the music did not convince,
Mahler made a second attempt to represent the redemption of love by a woman, in the finale of the Eighth
Symphony. This time he made no mistake: it was explicit and open.
There is one question that remains to be answered and that is, if one accepts this Faustian interpretation of
the Seventh Symphony (and there will be many who find it impossible to agree with the points presented
here), to what extent are we talking about Faust, the mythical hero, or are we really talking about Gustav
Mahler, the composer himself? Because some of the earlier symphonies, particularly the First and Sixth,
do seem to be concerned with a hero who can easily be seen as a parallel with Mahler, it is not
unreasonable also to connect the alleged Faust figure of the Seventh Symphony with the composer. In that
case we are dealing with another 'biographical' work whose secret has for so long been hidden
in the felicities of the quite remarkable nocturnal central movements and the controversies of the
perplexing outer ones.
Table 1
Proposed Programme
4 | Andante amoroso | Nachtmusik | Wagner's Siegfried Idyll | 'Love, love, love', based on Wagner's Siegfried Idyll
5 | Allegro ordinario | Rondo-Finale | Wagner: Prelude to Die Meistersinger and Lehár's Die lustige Witwe | The triumph of love over the devil, Mephistopheles.
References
1. Deryck Cooke: Gustav Mahler: An Introduction to his Music, Faber, London, 1980, p.88
2. Cooke,1980, p.88
3. James L. Zychowicz (ed): The Seventh Symphony of Gustav Mahler: A Symposium, University of
Cincinnati, Cincinnati, 1990, p.v
4. Peter Franklin: The life of Mahler, Cambridge UP, Cambridge, 1997, pp.158-59
5. Henry-Louis de La Grange: 'L'énigme de la Septième', in Zychowicz (ed), op.cit., p.13
6. Henry-Louis de La Grange: Gustav Mahler, Vienna: Triumph and Disillusion (1904-1907), Oxford
UP, Oxford, 1999, pp.849-53
7. In Alma Mahler: Gustav Mahler Memories and Letters, ed. Donald Mitchell, John Murray,
London, 1968, pp.325-27
8. Tibor Knieff: The New Grove Dictionary of Music and Musicians, Macmillan, London, 1980, vol
8, p.511
9. Constantin Floros: Gustav Mahler: The Symphonies, Scolar, Aldershot, 1994, pp.198-99
10. See my investigation in Niall O'Loughlin: 'The Rondo in Mahler's Middle Period Symphonies:
Valid Model or Useful Abstraction', Muzikološki zbornik 35, 1999, p.138
11. Talia Pecker Berio: 'Perspectives of a Scherzo', in Zychowicz (ed): op.cit., p.88
12. Floros: op.cit., p.204
13. Floros: op.cit., pp.206-11; Martin Scherzinger: 'The Finale of Mahler's Seventh Symphony: A
Deconstructive Reading', Music Analysis vol 14, 1995, no 1, pp.69-88
14. Peter Davison: 'Nachtmusik I: Sound and Symbol', in Zychowicz (ed): op.cit., p.68
15. Henry-Louis de La Grange: 'Music about music in Mahler: reminiscences, allusions or quotations?',
in Stephen E. Hefling: Mahler Studies, Cambridge UP, Cambridge, 1997, p.166
16. Peter Davison: 'Nachtmusik I: Sound and Symbol', in Zychowicz (ed): op.cit., p.68
17. John Williamson: 'Mahler and Episodic Structure: The First Movement of the Seventh Symphony',
in Zychowicz (ed): op.cit., p.34
18. Alma Mahler: Gustav Mahler: Memories and Letters, ed. Donald Mitchell, John Murray, London,
1968, pp.308-9
19. See in particular: Henry-Louis de La Grange: 'Music about music in Mahler: reminiscences,
allusions or quotations?', in Stephen E. Hefling: Mahler Studies, Cambridge UP, Cambridge, 1997,
pp.122-68
Proceedings paper
environmental influences. This is what is most fundamental about man's nature, that is, that he has to
learn everything. The 'nature' of man is that he acts in the world as a being with the limits, strengths
and weaknesses of which he is persuaded. This peculiarity of man (his vulnerability to influence and
persuasion) can and should be the basis for his growth in all types of development and transcendence.
As we think about the whole matter of education, we should remember that the history of man is that
he is always surpassing what once was believed to be ultimate limits--it clearly matters what people
believe to be true about man's potentialities. For example, we now think a child's cognitive abilities
are developed much earlier than was previously thought. Pre-kindergarten training is now known to
be pedagogically essential (Gordon, 1986); even pre-natal 'education' is presently widely researched.
When does the music experiencing begin? The music environment of the young child has long been
the concern of professional musicians, but little has been discovered about the effects on the foetus of
the music and sound environment until the last two decades or so. Wilkin (1994), after a decade of
research, reports that as early as 38 weeks gestation, the foetus appears to be selective in the music to
which it responds--thus she opines that the learning process in humans begins before birth. Also,
during the last two months of gestation it was possible to condition the human foetus. Abrams and
Gerhardt (1997) offer more details to substantiate the case for prenatal music for the foetus by saying
that the degree to which airborne music is heard by the foetus depends on the attenuation features of
the abdominal surface, the distance to the foetal head and the low-pass features of bone conduction.
However, music produced by mechanical vibrators against the skin is transmitted more effectively and
with less attenuation to the amniotic fluid. (It has been known for many years that the peripheral and
central components of the auditory system are formed and functional prior to birth.) In consideration
of the continuing research results being reported on the importance of aural experiences for the
unborn child, this educative process may become recognized as equally important as we now
recognize the place of musical experiences in early childhood development.
What does the 'foetal training' have to do with music cognition? The aural sense, being the first of the
five senses to be physically developed, can be musically stimulated, thus enhancing brain
development in its functioning--and the sooner begun, the more learned in the lifetime of the human.
Hodges (March, 2000), although making no claim about the efficacy of pre-natal musical training,
reported that we know more and more about the (musical) environment producing physical
brain changes (wiring and structure), that music influences brain development, and that, because of
the plasticity of the brain, early development (music stimuli) will have a decisive, long-lasting impact.
Contrarily, early negative (no music) experiences may have dramatic and sustained consequences.
One concern very basic to those of us in the field of pedagogy (and all musicians are music
pedagogues) is to shape experiences as a way for persons to act, guided by intelligence and respect for
life, so that needs are satisfied and that they will grow in awareness, confidence and the capacity to
experience meaningfully the humanly unique aspects of life. Music experiences minister to the three
basic 'needs' of the human, viz., spiritual, emotional, and intellectual (the first of the three, spiritual, is
related to man's need to transcend himself . . . and music is the major enhancer of the individual's
worship experiences). Also, two categories into which responses to music by the human have been
labeled are affective and aesthetic. Most authors have focused on the differences and similarities of
the labels while a very few have defined them as distinct one from the other. For most, the aesthetic
experience is an intense, subjective, personal experience that includes some mood, emotional, or
feelingful aspect, that is a component of the affective response. The affective response is generally
discussed as a less superficial response than the aesthetic response/experience. Such terms as taste,
attitude, or preference have been used inconsistently. (Parker, 1978)
The relationship (interaction) of psychological concepts and music is also exemplified by the fast
growing discipline of Music Therapy which needs to be mentioned here in the context of the
importance of musical experiences in the human's life. Mental illness is certainly a psychological
aberration which is treatable in a music therapy setting. Oversimplified for purposes of this
discussion: although a person may have withdrawn from reality (is non-communicative) and may
refuse to hear/interpret the spoken word, he cannot refuse to hear music stimuli and respond to
them IF the music stimuli are reminiscent of his past (learned) music experiences, because those have
been encoded in his brain.
The origin of psychology (as a discipline) was that it was the 'science of the soul' (psyche equals soul
or mind, logos equals science). However, we are not always able to distinguish between what
concerns the body and what belongs to the mind. In applying knowledge and research from
psychology in music to the endeavors of musically educating, we need not think of splitting the
soul/mind and the body--we simply deal with the musical behavior of the individual--holistically, that
is.
Each human has a need for music (there is no society/culture that does not have a music). Therefore,
what music, when? Briefly, Sloboda (1998) opines that feelings are tied to musical structure, that
there is a consistency in musical responses given the relatively same environment. Basically he is
saying that there is a strong cognitive content in the emotional experience that results from the
musical stimuli. He is making a strong case for the cognitive approach in the experiencing of music
stimuli.
There should be constant interaction between the fields of music and psychology, because
psychologists are interested in the interpretation of human behavior and because music, in addition to
being an art, is a form of human behavior which is unique and powerful in its influence. Some
questions which come to mind, in attempting to successfully investigate this uniqueness and influence
of music on the human's behavior might include:
1. What is the language aspect/syntax of music?
2. What is the appropriate combination of aural, visual, and kinesthetic experiences? and,
3. From where does the meaning of music emanate? In searching for answers to such questions,
perhaps the most far-reaching endeavors by the music profession were the three Ann Arbor
symposia, held in 1978, 1979, 1982, involving circa 150 musicians and psychologists.
The format of the symposia centered around questions/presentations by 12 musicians and responses
by 12 psychologists. The six categories selected for discussion and debate were: auditory perception,
motor learning, child development, cognitive skills, memory and information, and affect and
motivation. Subsequent publications, presentations at profession meetings, and seminars were
developed, and answers promulgated/published disseminating the long term pedagogical goals of the
music profession. Out of the intense discussions between musicians and psychologists at the Ann
Arbor symposia important benefits are coming, not only pedagogically, but for institutional
collaboration and for reaching out to other disciplines, e.g., all the social sciences as well as the pure
sciences.
At this time it would be germane to generalize on questions/answers under each of the six categories.
As to auditory perception, people come to any task, especially music, with their sum heritage,
training, and disposition--with apparent auditory limitations depending more on the limitations of the
stimulus than on the person's ability to discriminate. As to motor learning, despite the 'staggering' complexity
of music performance, psychomotor execution still attracts only a paucity of research in the discipline;
only speculative replies were managed at the symposia, such as 'globals' over 'locals'. Wilson (1986),
with much more lucidity offered helpful understandings in motor learning, such as answering why it is
refereed research papers, representing 26 countries. The presenters were about evenly divided--50%
psychologists and 50% musicians. Hodges (1997) addressed one of the main concerns of both
disciplines as follows:
For its current richness, music psychology is a fractionated discipline. Much high quality
work is going on here and around the world, but often without an overarching conception
of the field as a whole. One of the reasons this is so is that differences in training may keep
us from a more coherent and complete view of the field. . .simplistically, musicians may
regard some psychologists' research as musically naive, while psychologists may view
the research of some musicians as less than rigorous. Other differences arise between
basic and applied researches, between researchers and practitioners, between differences
based on geography or language, etc. (p. 33)
The ESCOM Conference was, in essence, a call for unification of the goals and objectives of the
researchers and practitioners of the two disciplines, as promulgated by Hodges (1997): . . . one of the
simplest and quickest changes to make would be for everyone working in the field to adopt a multi-
and interdisciplinary view of music psychology. That is, we should be encouraged to read broadly and
to adopt a holistic attitude toward musical behavior. (p. 40)
It is my feeling that there are important implications in what the psychologists can tell us about such things as:
1. Is there a difference between music perception and general perception;
2. Is there transfer of auditory perceptual skills to music from other modes of listening; and,
3. Is there a best learning time, mental development, for the acquisition of music perceptual skills?
For another instance, could it be that pitch and not rhythm is the crucial factor in learning
music, as some psychologists are now suggesting, which is contrary to what we have believed
these many years?
We are also having some reports of research that tell us that we do not learn to read music as we learn
to read our language, viz., learning musical syntax is different than learning language syntax.
(Hodges, March, 2000) The foregoing idea is supported by a previous study reported by Storr (1992)
wherein the left hemisphere was sedated while the right hemisphere was in a normal state. The subject
exhibited the emotional effect of music that was heard. As practitioners, we will have to remember
that research reports will be limited in effectiveness only as we fail to put them into practice. It is our
concern to provide the appropriate environment for the human as a child to coincide with his
developmental potentials.
Another major source of confirmation is Gagne's (1970) hierarchies of learning which provide an
understanding of essential conditions for learning, labeling eight learning types from simple to complex,
the prerequisite capabilities, and the external conditions of learning. The empirical adaptation of these,
in connection to the cognitive, psychomotor, and affective domains, has helped music teachers in their
successes because there is a simple to complex continuum for almost all music outcomes. This
continuum is irrevocably tied to the kind of learning necessary for step-by-step achievement.
Developmental psychology has influenced music education curriculum makers to almost establish
firm and inalterable stages, determined by age, for the musical development of children. This means
that steps in musical development are linked with age and an exact coincidence of a certain level of
musical development for the certain age is said to exist. Michel (1973) warns against the
establishment of such narrow and rigid age limits because they can have a serious effect on the whole
range of music education which would lead to selecting teaching materials matched with each stage
and any transgression of the age stages is branded as an educational crime. He feels that this rigid
view overlooks the fact that the essential process of musical development is always an individual
matter in which age can be one factor among several, and not the sole determinant. He suggests that 'far
greater significance be attached to the opportunities that exist for practical music activity on the part
of the child, that is, his active dialogue with the musical phenomenon of his environment'.
Michel does suggest that music teachers should not wait for the development alleged to occur
spontaneously at particular age levels, but that teachers must provoke and encourage development to
the fullest possible extent by conscious organization of musical activity. Not only is it imperative that
in the training of young children we teachers be concerned about the nature of their
acquiring perceptual technical skills (cognitive and psychomotor) but we must additionally concern
ourselves with the training of each individual to have the most possible significant aesthetic
experiences.
Significant aesthetic experiences, it follows, are based on choices that have been developed from past
musical experiences; mainly I am referring to musical tastes--which are defined as 'stable, long-term
preferences for particular types of music, composers, or performers'. (Russell, 1997, p. 141). Tastes
develop out of experiences gained in home, church, club, school, and out of contacts with the concert
stage, recordings, radio, television, and the printed page. The agencies of education, propaganda, and
censorship help persons to revere certain composers and/or performers, and to take less seriously
other composers and their works. Age, intelligence, special training, and all musical experiences can
be important variables in this process of taste formation. In summary of how we
come to acquire standards of musical taste: musical taste is not whimsical, and it is
culture-bound, not culture-free.
The idea that the need for music is universal is a viable premise. Therefore, if there is music in every
society, there will be some sort of music education in every society, whether formal or informal.
Music is learned wherever it occurs; therefore the principles of learning are at work, hopefully
psychologically based. Thus, the disciplines of music and psychology are compatible, not
alternatives. This interdependence becomes more profound as the mental processes develop
sequentially, and as the individual develops a set of music preferences/tastes. The sheer ubiquity of
music's presence in each society, whether as an art form or in a functional mode, establishes music as
a cultural activity, an artifact, which shapes and controls so much of human behavior in an
all-pervasive manner.
Finally, then, teachers/performers of music must search out and employ all the psychological avenues
of learning in order to be a major force in the achievement of the uomo universale--that maximal
musical literacy does result for all the world's societies. Utilizing psychological principles of learning
in serving as purveyors of cultures, teachers of music will serve as catalysts in achieving world-wide
musical literacy. And, this must be achieved before the world's population can experience the
sought-for profundity of humanness! 'We are the world.'
References
Abrams, R. M. & Gerhardt. K. K. (1997). Some aspects of the foetal sound environment. In I. Deliege
& J. Sloboda (Eds.), Perception and cognition of music(pp. 83-99). Hove, East Sussex: Psychology
Press Ltd.
Bannon, N. (1999). Out of Africa: The evolution of the human capacity for music. International
Journal of Music Education, 33, 3-9.
Bjorklund, D. F. (1995). Children's thinking: Developmental function and individual differences.
New York: Brooks/Cole Publishing Company.
Bowman, E. (1998). Universals, relativism, and music education. Bulletin of the Council of Research
in Music Education, 135, 1020.
Boyle, J. D. (1992). Evaluation of music ability. In R. Colwell (Ed.), Handbook of research on music
teaching and learning (pp. 247-265). New York: Schirmer Books.
Carlsen, J. C. (1979) Ann Arbor symposium: A forum on the psychology of music education. Journal
of Research in Music Education, 27(1), 51-52.
Crozier, W. R. & Chapman, A. J. (1985). Psychology and the arts: The study of music. Music
Perception, 2(3), 291-298.
Crummer, G. C., Walton, J. P., Wayman, J. W., Hantz, E. D., & Frisina, R. D., (1994). Neural
processing of musical timbre by musicians, nonmusicians, and musicians possessing absolute pitch.
Journal of the Acoustical Society of America, 95(5, Part l), 2720-7.
Davidson, L., & Scripp. L. (1992). Surveying the cognitive skills in music. In R. Colwell (Ed.),
Handbook of research on teaching and learning music (pp. 392-413). New York: Schirmer Books.
Frisina, R. D. & Walton, J. P. (1988). Neural basis for music cognition: Neurophysiological
foundations. Psychomusicology, 7(2), 99-107.
Gagne, R. M. (1970). The conditions of learning. New York: Holt, Rinehart, and Winston, Inc.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Gaston, E. T. (1957). Factors contributing to responses to music. Music therapy. Lawrence, KS: The
Allen Press, pp. 23-30.
Gordon, E. E. (1986, April). Musicality: Preschool/early childhood pedagogical. Paper presented at
the biennial meeting of the Music Educators National Conference, Anaheim, CA.
Hargreaves, D., & Zimmerman, M. P. (1992) Developmental theories of music learning. In R. Colwell
(Ed.), Handbook of research on music teaching and learning (pp. 377-391). New York: Schirmer
Books.
Hodges, D. A. (1997). Standing together under one umbrella: A multidisciplinary and
interdisciplinary view of music psychology, In A. Gabrielson (Ed.), Proceedings: Third triennial
conference of the European Society for the Cognitive Sciences of Music (pp. 33-42). Uppsala,
Sweden: Uppsala University Press.
Hodges, D. A. (2000, March). Brain research applied to music education. Paper presented at the
biennial meeting of the Music Educators National Conference, Washington, D. C.
Killam, R. N., & Baczewski, P. (1985). The perception of music by professional musicians. In G. C.
Turk (Ed.), Proceedings of the research symposium on the psychology and acoustics of music (pp.
71-82). Lawrence, KS: University of Kansas Press.
Kleinen, G. (1997). The metaphoric process: What does language reveal about music experience? In
A. Gabrielson (Ed.), Proceedings: Third triennial conference of the European Society for the
Cognitive Sciences of Music (pp. 644-649). Uppsala, Sweden: Uppsala University Press.
Latham-Radocy, W. B., & Radocy, R. E. (1996). Basic physical and psychoacoustical processes. In
D. A. Hodges (Ed.), Handbook of music psychology (pp. 69-82). San Antonio, TX: IMR Press.
Florida.
Wilson, F. (1986, April). The neurological basis of musical ability. Paper presented at the biennial
meeting of the Music Educators National Conference, Anaheim, CA.
Proceedings abstract
Dominique M. Richard
drichard@seas.upenn.edu
Background.
Aim.
Main contribution.
Implications.
Proceedings abstract
carlos-x-rodriguez@uiowa.edu
Background:
Aims:
The purpose of this study was to replicate the earlier study using Italian
children in order to compare their performance with that of American children,
with particular attention to the distinctions that might obtain in a different
cultural environment.
Method:
Sixty Italian children aged seven, nine, and eleven performed a MIDI file using
the computer keypad as a trigger for musical events. Four days later, the
children were asked to identify which of three performances of the same MIDI
file from within their age group was their own. The children also verbally
explained their decision. Three sets of factors were used to categorize the
subjects' responses; sensory variables, cognitive strategies, and product
variables.
Results:
At the time of this submission, the data are being collected in Italy. We will
report discrimination scores as percentages for age groups. We will categorize
verbal data using a judging task. To detect age-related tendencies in the
verbal responses, we will use ANOVA to test for a linear trend component.
Conclusions:
Proceedings paper
Anders Friberg, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm
1. INTRODUCTION
In Western European tradition, musical works generally exist in the form of written notation, or score, which has been produced by a composer and
which must be converted into sound by a (group of) performer(s). The recent decades have witnessed an increase in empirical studies on musical
performance (Gabrielsson 1999). It is generally agreed that if the score were converted into sound without any
modification, the result would be the so-called deadpan version, i.e. something musically unacceptable. It is believed that expressive devices
complementary to the score are used by performers mainly for two purposes. First, to make it easier for the listener to differentiate between
musically relevant categories in the domains of pitch and duration and, second, to provide for a better grasp of the hierarchical structure of the musical
work (Sundberg 1999b).
Friberg (1991) has successfully modelled the performance of a musical work with some twenty-odd generative rules that automatically convert input
note files into sound performance on a synthesizer. The rules introduce into the performance micropauses, lengthenings and shortenings of tone
duration as well as long and short-term increases and decreases of sound level. The system of rules should be understood as a generative grammar
of musical performance, reflecting the musical competence available to its authors.
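To make the idea of such generative rules concrete, here is a minimal sketch in that spirit. It is emphatically not Friberg's actual rule system: the note representation, the 10% phrase-final lengthening, and the 20 ms micropause are invented for illustration only.

```python
# Toy "generative performance rules" in the spirit described above.
# All parameters (10% lengthening, 20 ms micropause) are invented
# for this example and are NOT Friberg's actual rule values.

def phrase_final_lengthening(durations, phrase_ends, factor=1.10):
    """Lengthen the nominal duration of each phrase-final note."""
    return [d * factor if i in phrase_ends else d
            for i, d in enumerate(durations)]

def micropause_after_phrase(durations, phrase_ends, pause=0.02):
    """Shorten the sounding part of each phrase-final note,
    leaving a micropause (gap) before the next phrase."""
    return [d - pause if i in phrase_ends else d
            for i, d in enumerate(durations)]

# Nominal (score) durations in seconds for two four-note phrases.
nominal = [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5, 1.0]
ends = {3, 7}  # indices of phrase-final notes
performed = micropause_after_phrase(
    phrase_final_lengthening(nominal, ends), ends)
```

A real system like Friberg's chains many such rules, each applying a small, musically motivated deviation to the nominal note values before synthesis.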
There exist other cultures in the world, however, which do not resort to prescriptive notation in the process of musical communication. An absolute
majority of musical folklore, as well as much of popular music (e.g. jazz) belongs to the oral tradition. Performance in such traditions is the result
of improvisation, i.e. spontaneous (re)creation of music from memory. If notations exist for folk music performance, these must be regarded as
descriptive, in the sense that they have been produced post factum by an ethnomusicologist or anthropologist, rather than in advance by a
composer. (One can thus hardly think of the category 'composer' in folk music.) Due to the fact that notation in such cases aims at describing what
is happening in the performance, as opposed to recreating the music anew, the target of such notation is research, not performance.
The old Baltic-Finnic folksongs are an example of such an oral musical tradition (Lippus 1995). This tradition has been shared by native speakers
of most of the Baltic-Finnic languages: the Finns, Estonians, Karelians, Votes, and Izhorians. The other two Baltic-Finnic ethnic groups, the
Vepsians and Livs (Livonians), do not evidence it. (There are about five million speakers of Finnish and about one million of Estonian in the
world; both of these succeeded in establishing nation states after World War I. The other Baltic-Finnic languages have significantly fewer speakers
left.) The old folksongs are also called Kalevala songs (after the famous Finnish epic) or runic songs. In this paper the three terms will be used as
synonyms. The old folksongs are estimated to be two to three thousand years old. Although preserved in a relatively extensive body of archival
recordings, they have been fading from daily circulation since the 18th century. The written part of the recordings has mainly been collected during
the second half of the 19th and the first half of the 20th century; the majority of sound recordings come from the years 1930 to 1970.
In the present study, we will look into the extent of similarity between musical competence in the performance of works of the professional musical
tradition of Western Europe and that of the old Baltic-Finnic folksongs. We will be restricting ourselves to the domain of duration, i.e. we will not
study pitch, sound level or the timbral characteristics of musical performance. The majority of musical performance studies have so far been
concentrating on the European classical piano repertoire from the 19th century, their favourite objects of study being shorter compositions by e.g.
Fryderyk Chopin or Robert Schumann (e.g. Repp 1998, 1999a and b). No doubt there are substantive differences between the performative
situations of such piano pieces and runic folksongs. We see the following two differences to be the most significant.
1. A 19th century piano work is performed from a score, while a folksong is improvised. At a first glance, the conversion of note values from the
score into acoustic events of certain duration seems to have no equivalent in folksong performance. We have to take it into account, however, that
the runic folksong melodies are mostly isochronous, i.e. consisting of note durations of (nearly) equal value. This enables us to compute the
average note duration value over a certain portion of the musical work (provided that the tempo remains constant) and to hypothesize that
deviations from the average note duration value are used by the performer for expressive goals, in a similar way to that in which the performer of a
piano piece by Chopin employs deviations from normative durational values for expressive purposes.
2. In folksongs, the durations of sound events may also depend on the sung text (lyrics) and the verse metre. In the Baltic-Finnic languages,
quantity plays an important role in speech prosody. Duration differences in these languages serve a semantic function, i.e. they distinguish the
meanings of words. In the Estonian language, one and the same disyllabic sequence may have three different meanings depending on whether the
approximate ratio of its constituent syllables equals 0.66, 1.5 or 2.0 (Lehiste 1997). Also the metre in Estonian folksongs, defined as trochee, uses
oppositions not only between the stressed and unstressed syllables but also between the long and short syllables for contrasting the ictus and
off-ictus positions in verse lines (Tampere 1983). The issue of the extent to which the requirements of word prosody and metre are combined or contrasted
in the musical realisation remains largely open. If the former were supported by musical rhythm in folksongs, it would certainly
enhance the intelligibility of words to the listener. What cannot be ruled out, however, is that it may be difficult for the performer to meet the
structural requirements coming from three separate domains (speech, metre and music) at the same time. Nor can it be ruled out that possible
conflicts between the three systems may be creatively used by performers for aesthetic purposes.
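Both computations outlined in points 1 and 2 above are simple to state concretely. The sketch below (with invented sample values; the nearest-target classification rule is our simplification, not Lehiste's actual procedure) computes per-note deviations from the mean duration of an isochronous melody, and classifies a disyllabic sequence into one of the three Estonian quantity degrees by its syllable-duration ratio:

```python
# Hedged illustration of the two analysis steps above.
# All numeric sample values are invented for this example.

def duration_deviations(durations):
    """Per-note deviations (percent) from the mean note duration,
    assuming a (nearly) isochronous melody at constant tempo."""
    mean = sum(durations) / len(durations)
    return [100.0 * (d - mean) / mean for d in durations]

# The three Estonian quantity degrees and the approximate
# first/second syllable duration ratios cited above.
QUANTITY_RATIOS = {"Q1": 0.66, "Q2": 1.5, "Q3": 2.0}

def quantity_degree(first_syll, second_syll):
    """Classify a disyllabic sequence by whichever target ratio is
    closest to the measured syllable-duration ratio (a simplification)."""
    ratio = first_syll / second_syll
    return min(QUANTITY_RATIOS, key=lambda q: abs(QUANTITY_RATIOS[q] - ratio))

measured = [0.42, 0.40, 0.44, 0.38, 0.41, 0.45]  # hypothetical durations (s)
devs = duration_deviations(measured)             # devs[0] ~ 0.8 (%)
degree = quantity_degree(0.12, 0.18)             # ratio ~0.67
```

The deviation values are the raw material for comparing expressive timing in folksong with that in notated repertoire; the quantity classification indicates how the verse text could constrain or compete with those timing choices.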
2. MATERIAL
Seven one-voiced folksongs from the repertoire of the female singer LK were used for analysis in this study. All recordings had been made in 1937.
3. SEGMENTATION
From the structural point of view the shortest meaningful sound events in the runic songs are syllables (defined on the phonetic basis), or notes
(defined on the basis of the melody). Provided that the number of syllables in a verse line equals the number of notes, the question would be: Do
the boundaries between successive syllables in old folksongs coincide with the boundaries between successive notes? The answer to this question
need not necessarily be affirmative. Sundberg (1999a) argues that in sung performances, the tone (note) is expected to start with the onset of a
vowel. The use of quantity in spoken Estonian, however, involves differentiating the length of vowels as well as consonants. Thus the lengthening of a stop consonant yields a geminate which, by definition, consists of two parts belonging to neighbouring syllables. The problem Sundberg (id.) points to concerns the segmentation of Estonian words, e.g. [saakki]: either into [saak-ki] as two syllables, or into [(s)aakk-i] as two tones (notes).
The determination of the onset of tone as coinciding with the onset of vowel, however, seems to be related to the theory of P-centers in phonetics
(see e.g. Pompino-Marschall, Tillmann and Kühnert 1987). According to this theory, in alternating sequences of monosyllables the perceived onset
(P-center) of a syllable as a rule does not correspond to its acoustic onset. Generally, the syllable onset (the beat) is highly correlated with the onset
of syllable nucleus (the vowel), while being somewhat displaced as a function of both the initial consonant(s) and the length of rhyme.
The method of segmentation adopted by Sundberg would be hard, if not impossible, to combine with the theory of quantity in spoken Estonian
(Lehiste 1997). According to Lehiste's theory, quantity in spoken Estonian utterances is determined on the level of disyllabic sequences because of a strong tendency of these sequences towards isochrony. For the disyllabic sequence there are three contrastive duration degrees, called the short, the long and the
overlong. The three durational degrees may be applied to the first syllable in a disyllabic sequence, while the duration of the second syllable
depends on the duration of the first. If the first syllable is longer, the second syllable must be shorter and vice versa. Therefore the best device for
an acoustical description of the functioning of the Estonian quantity is the duration ratio of the syllables in disyllabic sequences (see above).
The present study has adopted the 'phonetic' method of segmentation, which establishes boundaries according to the syllable structure of Estonian.
Table 1. Covariance analysis of sound event durations in seven old Estonian folksongs by a single female performer. N is the total number of
syllables (notes) in each song. r2 is the coefficient of determination which measures the amount of sound event duration variability in each song
that can be accounted for by the total effect of seven individual variables. Columns 4 to 10 present the statistical significance level of the effect of each variable on the sound event duration in each song. In some of the songs the effect may be non-significant (n/s), or the variable not applicable (n/a).
Song      N      r2      Significance level of each variable (columns 4-10)
(1)       (2)    (3)     (4)              (5)          (6)        (7)             (8)      (9)      (10)
                         phonol syll      metric pos   mel peak   # of phonemes   dotted   final    mel charge
                         (exp syll durat)
leskim    204    .128    .0021            n/s          n/s        n/s             n/a      n/s      n/s
läksinm   367    .400    n/s              .0001        n/s        .0001           .0001    .0405    n/s
minulk    284    .396    .0073            .0014        n/s        .0001           n/a      n/s      .0295
vendas    493    .168    .0267            .0001        n/s        .0001           n/a      n/s      n/s
peren     318    .515    .0507            .0001        n/s        .0001           .0001    n/s      n/s
Overall   1837   .298    .0008            .0001        n/s        .0001           .0001    n/s      n/s
Covariance analysis of the material was performed in order to estimate the influence of the selected variables on sound event durations. A summary of the results appears in Table 1.
The effects of four of the seven variables on sound event durations were highly significant. The variables concerned are metrical position (strong or weak), the number of phonemes per event, deviations in the score from an isochronous sequence (p< .0001 in all three cases) and the phonological length of the event (short or long, p= .0008). The effects of three variables, the melodic peak, the final note of the line,
and melodic charge, were not significant.
The covariance analysis model is capable of accounting for an average total of 30 per cent of sound event duration variance in the seven songs
studied. The percentage varies across songs. It reaches the highest value of 52 per cent in the song 'Peren' and the lowest value of 17 per cent in the
song 'Vendas'.
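The kind of model summarised in Table 1 can be sketched as an ordinary least-squares fit; the data below are synthetic (invented coefficients and noise), not the folksong measurements, and serve only to show how the coefficient of determination r2 is obtained:

```python
import numpy as np

# Sketch: regress sound event durations on two predictors and compute r2.
rng = np.random.default_rng(0)
n = 200
metric_strong = rng.integers(0, 2, n)   # 1 = ictus position, 0 = off-ictus
n_phonemes = rng.integers(2, 6, n)      # phonemes per sound event

# Synthetic durations (ms): longer on strong positions and with more phonemes.
duration = 300 + 40 * metric_strong + 25 * n_phonemes + rng.normal(0, 30, n)

# Design matrix with intercept; ordinary least-squares fit.
X = np.column_stack([np.ones(n), metric_strong, n_phonemes])
beta, *_ = np.linalg.lstsq(X, duration, rcond=None)

fitted = X @ beta
r2 = 1 - np.sum((duration - fitted) ** 2) / np.sum((duration - duration.mean()) ** 2)
print(round(r2, 2))  # proportion of duration variance explained by the model
```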
An earlier finding (Ross and Lehiste 1998), according to which metrically strong, or ictus, positions of the line are performed longer than the
metrically weak, or off-ictus, positions, was confirmed by the covariance analysis. There seem to exist at least two possibilities for interpreting this
result. When projected against the background of the Kalevala metre theory (Lippus 1995), it suggests that the partly quantitative nature of the
trochaic metre, as described on the 'phonological' level, is indeed acoustically realized in runic songs. Application of the swing rule (Inegales)
should in this case be specific to the runic song tradition. If, however, the swing rule also operates in Friberg's (1991) set of rules intended to
simulate the performance of a musical idiom different from the runic songs, it would suggest that making stressed positions longer and unstressed
positions shorter is a more universal cognitive principle in the musical performance, specific neither to the old Baltic-Finnic folksongs nor to the
7. CONCLUSIONS
Seven generative rules affecting sound event duration in musical performance (Friberg 1991) were examined in order to determine their suitability
for modelling the performance of old Estonian folksongs. Three of the seven rules were rejected because of their irrelevance for the musical tradition examined. The remaining four rules were complemented by three additional ones derived from the nature of articulatory production in singing, the prosodic description of the Estonian language, and the specifics of the old folksong performance tradition. The influence of seven variables on the acoustical duration of sound events in folksong performance was investigated by means of covariance analysis. Of the four original rules adopted from Friberg's (1991) set, the Inegales rule, which makes metrically strong positions in isochronous melodies longer
and weak positions shorter, was found to apply in runic songs. We did not find evidence of the effect of three rules: Melodic Charge, Faster Uphill
and Phrase. The Melodic Charge rule is expected not to work because of weak tonal structure in the runic songs. The Faster Uphill rule is expected
not to work because of the relatively narrow ambitus of melody in those songs. Two of the additional variables, which were formulated ad hoc and
were not present in Friberg's set, relate to the prosodic characteristics of Estonian speech. Their significant influence on sound event durations in
runic songs, together with the negligible effect of some of Friberg's rules, suggests a pronouncedly speech-like character of the analysed folksongs.
The latter retain a number of characteristics stemming from speech prosody, while failing to evidence other characteristics thought to be
specifically musical.
ACKNOWLEDGEMENTS
We wish to thank Meelis Mihkla of the Institute of the Estonian Language, Tallinn, for making it possible to use the text-to-speech Estonian
synthesis software for prediction of expected syllable durations in texts, Professor Ene Tiit of the University of Tartu, for help in statistical
processing of data, and Professor Ilse Lehiste of the Ohio State University, for productive discussions on many aspects of this work.
REFERENCES
Friberg A (1991). Generative rules for music performance: A formal description of a rule system. Computer Music Journal 15, 56-71.
Gabrielsson A (1999). The performance of music. In D Deutsch (Ed). The Psychology of Music. San Diego et al: Academic, pp 501-602.
Laugaste E (1989). Vana Kannel VI. Haljala regilaulud (Old Folksongs from Haljala District, in Estonian, 2 vols). Tallinn: Eesti Raamat.
Lehiste I (1997). Search for phonetic correlates in Estonian prosody. In I Lehiste and J Ross (Eds). Estonian Prosody: Papers from a Symposium.
Tallinn: Institute of Estonian Language, pp 11-35.
Lippus U (1995). Linear Musical Thinking. A Theory of Musical Thinking and the Runic Song Tradition of Baltic-Finnish Peoples (= Studia
Proceedings paper
DIANA SANTIAGO
Assistant Professor, Department of Applied Music
Universidade Federal da Bahia, Brazil
Researchers' conceptions of meaning in music do not coincide. For some of them, musical meaning is found at the level of structural description; for others, musical meaning is something much broader in scope (Sloboda, 1998, p. 25). Besides, several layers of meaning may be ascribed to a piece of music (Dunsby, 1988, pp. 217-218).
The discussion directs us to the very nature of the musical event, and there is no consensus on its basic features either. For some, music is comparable to a language (Sloboda, 1990, p. 65), while others do not believe there is such a thing as a musical grammar (Dempster, 1998). As a performer, I have been intrigued by two questions. What is it that performers struggle to convey in the music they play? What are the necessary steps for building an appropriate musical performance?
The construction of a theory of musical performance is still under way. Although studies on the nature of such performance conducted by music psychologists during the past decades have shed much light upon it, there is still a vast field to be explored. Studies on musical understanding, for instance, usually focus on the listeners (Shaffer, 1995, p. 18). Furthermore, these studies are usually conducted by psychologists; they should not be neglected by performers, who could themselves stress aspects overlooked by other researchers. Certainly a better understanding of the performing action itself would greatly benefit performers and instrument teachers.
This paper aims to present a practical view of the study of meaning in music, a view that details the aspects which the author, herself a performer, took into account while
practising the piece. It is expected that the findings will provide not only a better understanding of the way an individual performance was conceived, but also may be applied to
the elaboration of other performance plans.
The numbers indicate the measures, and the vertical lines demarcate relevant structural points. The dotted lines indicate both the half and the golden section of the piece. Circles,
squares, and the rectangle represent the main thematic material. Some tonal references appear below the horizontal line, while the small dashes represent notes in the bass register
which assume pedal function.
Well-balanced proportions indicate that symmetry was a guiding principle for the composer. The half of the piece is surrounded by equivalent portions of music. The 40
millimetres of mm.[18-37] correspond to the 41 millimetres of mm.[38-58], and the 34 millimetres of mm.[1-17] correspond to the 32 millimetres of mm.[59-74].
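The proportion measurements above can be checked with a few lines of arithmetic; the millimetre values are those reported in the text:

```python
# Sketch: checking the proportions reported for the score, measured in
# millimetres of printed music (values taken from the text above).
sections = {"mm.1-17": 34, "mm.18-37": 40, "mm.38-58": 41, "mm.59-74": 32}

total = sum(sections.values())            # total printed length in mm
half = total / 2
golden = total * (5 ** 0.5 - 1) / 2       # golden section, ~0.618 of the whole

# Distance from the start to the boundary between mm.[37] and mm.[38]:
to_midpoint = sections["mm.1-17"] + sections["mm.18-37"]

print(total, half, round(golden, 1), to_midpoint)
```

The boundary at m.[38] lies 74 mm from the start, within half a millimetre of the midpoint (73.5 mm), which is consistent with the symmetry argument.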
The form of the piece is clarified by the thematic material that occurs over the D pedal of mm. [1-4, 18-21, 59-62, 71-74]. The melody exposed in the introductory measures
returns at m.[18]. However, transformed at m.[22], this melody participates in what can be considered the development of the piece, mm.[18-58]. At m.[59], the introductory
material is recapitulated, and, at m.[71], it appears in transformation, functioning as a coda.
The predominant texture of the piece is that of accompanied melody. As a matter of fact, the accompaniment is one of its most characteristic features. Arpeggios circulating on the
low and medium registers of the instrument create a wave-like movement easily associated with the title of the piece. This type of accompaniment is very common in barcaroles.
Commentary on the melodic material will be given after examining the work's tonal structure.
There is not a single authentic cadence in the piece. Resources such as evaded cadences (mm.[10-11, 12-13, 16-17, 33-34, 46-47]) and augmented sixth chords (mm.[17, 58, 72]) are used to lend continuous mobility to the harmonic progression. This mobility, however, is contrasted by the static quality characteristic of added sixth chords. Their frequent use (see Example 1) lends a special colour to the sonority of this piece. Although they do include a dissonance of a second or a seventh (as they can be seen as first-inversion seventh chords), added sixth chords have their static character determined mostly by the absence of the interval of a tritone. Example 1 makes it easy to observe that, among the versions used in the piece, only one has a tritone (D-F-A-B).
Example 1. Added sixth chords
Each one of the twelve pitches of the octave (in the system of equal temperament) is represented in the piece by a chord or a tonal area. In major or minor versions, or in both,
some pitches receive more projection, although a real tonic quality is not attributed to any one of them. For instance, D major assumes a prominent position because it is located
over the pedal that begins the piece and delimits important formal divisions, but it is not established as a tonality. Prominent at mm. [13 and 53], C major is also not established.
The study of the proportions of the piece lets us perceive, however, that although E major does not appear on the pure triadic version which would convey to it the stability of
tonic, it comes out as the principal candidate to what can be seen as the "focal point of the tonal structure". Four arguments favour this candidature:
1. pedal E in the middle of the structure, followed by E major at m.[39],
2. V13 of E as the final sonority,
3. modulation to B major - which enhances the dominant of E - in symmetrical disposition in the structure, for mm.[30 and 43] are equidistant, respectively, from the
beginning and the end of the piece, and
4. A flat at mm.[33 and 46], which enharmonically enhances the third of E major.
It is worth observing that the enhancement of these areas represents an expansion of the pitches that constitute the triad of E major: E - G#/Ab - B. (Other important areas in the
piece will be considered later.)
The main melody, on the top of the texture, confirms the importance of E major. By observing the melodic contour in the first half of the piece, it becomes possible to verify that
the notes of the dominant triad of E (B - D# - F#) work as melodic points of reference. The initial F sharp note, reiterated at m.[7], after ascending to the note D sharp in the treble
register at m.[15], prevails again at m.[16], while the note B is emphasised at m.[30]. In the second half, departing from F sharp at m.[38] in the middle register of the instrument,
the melodic line ascends to the highest note in the piece, the C of m.[53], which may be seen as lying a half-tone above the dominant of E. F sharp comes back at m.[59]
and, after being reiterated at m.[71], ascends to the treble register at m.[73] to compose the dominant of E in the final sonority of the piece.
The contour of the main melody is full of surprises, and this seems to explain the harmonic progression with all its evaded cadences and unforeseen resolutions. It is worth
observing the arrival of the G note on the downbeat of m.[13], when the E flat note should resolve the tension of its leading note D in the preceding measure. Still in regard to the linear treatment, there is an aspect which ought to be commented upon. It consists of melodic fragments that create a subtle counterpoint to the main melody. These fragments, which
originate in the accompaniment, sometimes interweave with the main melody. This can be clearly observed, for instance, from m.[27] to m.[28]. The melodic gesture of the left
hand from the first beat to the second at m.[26], after being imitated on the two following beats, appears again and, by acquiring new melodic strength, interweaves with the main
melody on the C sharp note of m.[28]. This type of counterpoint is much used by Chopin, a composer who was certainly studied by Velasquez.
The melodic character given to the accompaniment creates an uninterrupted musical discourse. At m.[8], for instance, while the right hand sustains a melodic C sharp, the
accompaniment gains melodic character when, at m.[9], it reaches the G sharp of the main melody. This sense of continuity searched by Velasquez in the piece is what mostly
suggests that he might have been in contact with Wagner's "endless melody". It would be necessary to investigate whether his contact with the music of Wagner was direct or indirect. Velasquez is said to have been the first Brazilian composer to be seduced by the new harmonic conceptions at the beginning of the twentieth century. He became acquainted with them through the works presented by Alberto Nepomuceno (1864-1920) in the concerts of the Exposição Nacional which, in 1908, celebrated the centennial of the opening of Brazilian ports during the Portuguese Kingdom. In Devaneio, diverse factors such as the absence of authentic cadences, evaded cadences, unresolved tensions,
A comparative study of both pieces transcends the limits of this paper. Nevertheless, some aspects will be approached here, because the understanding of Devaneio's tonal
organisation is altered after the tonality of Sobre las olas is examined.
With G major as the tonal centre, the piece by Rosas is made of 14 portions, with 32 measures each. As can be seen in Graph 2, the thematic material in these portions is repeated
(AA, BB, CC, DD, EE, AA, BB), organising 7 sections of 64 measures each. Sections A and B are marked in the score as Waltz 1, and the sections C and D as Waltz 2. Section E is
not numbered. Waltz 1 returns at m.[325]. The sole interruption in the regularity of the piece's organisation is due to the insertion of 4 measures with a modulation from G major to
E major between Waltz 1 and Waltz 2.
From some angles, Sobre las olas is extremely simple. From others, it displays very original traits. One of these traits is its tonal organisation. G major is used for Waltz 1, for the D section, and for the recapitulation of Waltz 1 at m.[325], i.e., at both extremes and in the middle of the structure. The use of C major in section E does not bring any novelty, for C is the subdominant of G. In section B, however, there occurs an unexpected tonal event: eloquently announced by its individual dominant, E major is established at m.[133]. But, instead of maintaining it, the composer creates a modulation to B major and ends the section in this tonal area. Would not the unusual procedure of beginning the section in one tonality and concluding in another have suggested to Velasquez the originality of his Devaneio? This question has sparked my curiosity about the relationship between the two works
● at m.[23], E flat functions as the dominant of A flat, but resolution in this tonality does not occur;
● at m.[32], integrating the circle of fifths F-B flat-E flat-A flat, E flat acts as the dominant of A flat, which occurs at m.[33];
● at m.[36], as happened at m.[23], E flat functions as the dominant of A flat, which does not appear; what appears instead, both at mm. [24-25] and at mm. [37-38], is a bass
E followed by E major at mm.[26 and 39], respectively; this leads to the understanding that, on a large scale, E flat acts as an enharmonic expansion of the leading tone in E
major;
● at m.[45], E flat plays the same role it had played at m.[32], since mm.[31-34] are equivalent to mm.[44-47];
● at m.[53], E flat is inserted in the accompaniment and hinders the establishment of C major;
● in the last two measures, the enharmonic transformation in D sharp confirms that the events in E flat are nothing but an expansion of the leading tone in E.
Concerning E flat, it is also worth mentioning that the distance from the beginning of the piece up to the event at m.[23] (22 measures) is the same as that which separates the event at m.[53] from the end of the piece. Thus, the placement of E flat and B major allows one to perceive the principle of symmetry acting in the tonal organisation.
Would symmetry be emulating the symmetrical organisation of the sections in Sobre las olas? Would the above mentioned hemiolic effects in Devaneio be emulating the explicit
hemiola of the portions in E major and C major (see Example 2) of Sobre las Olas? Would the expressive melodic line of Velasquez be emulating the so popularised melodies by
Juventino Rosas? An answer to these questions demands further research.
References
APPLEBY, D. P. (1989). Music in Brazil. Austin: University of Texas Press.
DEMPSTER, D. (1998). Is there even a grammar of music? Musicae Scientiae, 2(1), 55-65.
DUNSBY, J. & WHITTALL, A. (1988). Musical analysis in theory and practice. London: Faber Music.
HOWAT, R. (1983). Debussy in proportions: A musical analysis. Cambridge: Cambridge University Press.
Glauco Velasquez. http://pub2.lncc.br/dimas/velasquez
Proceedings paper
Statement of Purpose
This study is a comparison of two frames of mind belonging to those who are drawn to music and to
business. It is hoped that insights will be yielded regarding the psychodynamics of object choice and
usage, particularly in the area of musical creativity.
College music and business majors are administered three instruments: one measuring a dimension of ego functioning referred to as boundaries; one measuring relational tendencies; and a third measuring the ability to find connections between sets of words. The extent to which artists, as compared to those engaged in business studies, have greater access to their emotions, are more open to accessing connections between ideas, are less adept socially, and are more insecure and inclined to emotional vulnerability, will be addressed. Stereotypes of the musician will be addressed via concepts of boundary thinness,
insecurity and attachment issues, and the capacity for heuristic thinking. The stereotype of the
business-person who distances himself from emotions in addressing the bottom line and impersonal
material exchange, who places a premium on efficiency in his operations and relationships, will be
tested with regard to boundaries, social competency, flexibility in association and openness to inner
experience.
The impulse for this study began with the question "why are artists creative?", relevant irrespective of
talent or quality of output, and addresses the urge to create, to manifest the "aesthetic of the
personality" (Bollas, 1992), to seek or create environments, relationships and occupations which
permit this. Psychologists and philosophers of many schools have addressed the nature of creativity.
The urge to create, the cultural, social and psychological functions of creativity, the significance of
that which is created, the link between creativity, states of consciousness and psychopathology, the
use of the creative process as therapy - all have been explored. However, most work in the area of
psychodynamics and creativity has been theoretical or anecdotal. There has been a paucity of
empirical work linking personality with creativity from a psychodynamic perspective. I hope to show
that what we do reveals who we are and how we are constructed, through the use of valid
psychological instruments. This study is founded on recent developments in object relations and self
theory which have focused increasingly on use and choice of objects, as in the work of Christopher
Bollas (1992). Additionally, essays on music and psychology such as those by Anthony Storr (1992),
and case studies and empirical research by Janet Dvorkin (1991, 1992) highlight the accessibility of
this area of inquiry.
and through deflected or corrupted strategies embodied in the limited dimension of desire. Eigen
thereby addresses the question of whether character, as akin to calling and true self, or any of the other
phrases used to illuminate the poetic but necessary signifier of a wholly personal essence, can find
expression and harmony in the act of creating, and in so doing reach toward a medium of expression
with some degree of fidelity to that essence. He also raises interesting questions addressed elsewhere
in the literature on intrinsic and extrinsic motivation as to the deflection and compromise encountered,
here by the musician, and he as a subset of humanity, in the inevitable encounter with demands,
agendas and concepts superimposed upon the act of true-self-expression. The marriage of ideas of
Lacan and Winnicott is relevant to this study in its implications regarding the ravages of language as
essential limiters and deflectors of meaning, and the extra- (or "pre-") verbal power of music as a
symbol-world offering privileged access to some manner of authentic self-experience. This view of music is relevant also as a bridge to communal experience, which, again, is far more constrained by the limiting and refractory nature of language. These are potent ideas which are compatible and
complementary with the operational ideas of boundaries, cognitive strategies, object relations and self
or character used in this study, and the challenge to establish sufficient access to the energies of that
hypothetical true or authentic self.
The ways in which one does this will be studied using the Bell Object Relations/Reality Testing
Inventory.
Association
Association refers to the connection between signs or symbols. This connection may be arrived at
"algorithmically", through reference to a formula. As such, it has both a predetermined solution as its
end product and a finite series of steps established to reach that end.
Connections may also be made heuristically. Heuristic describes a non-linear process by which
responses or solutions are arrived at through unpredictable, idiosyncratic means. Heuristic refers to
the process by which thoughts, images, fantasy and feelings emerge uniquely for each individual,
rather than by reference to a fixed, external source or algorithm for deriving meaning.
A test of word association will be used to test the hypothesis that musicians have greater access to
obscure connections, due to their ability to heuristically use or create unfamiliar cognitive paths - a
hallmark of creativity.
Literature Review
Little empirical work has been done regarding the psychodynamic characteristics of artists, alone or in
comparison with people in other fields. However, the following reviews some significant work supporting the relevance of this study from different perspectives. Coney and Serna, who use an information processing perspective, state that the common thread of thought on
creativity, from Aristotle, Locke, through Ribot, Hollingsworth, and Freud and contemporary
information processing researchers and philosophers of aesthetics, was that "the essence of creative
thought inheres in the process of bringing disparate mental elements together to form new and useful
combinations." (Coney and Serna, 1995).
Coney and Serna discuss a number of information processing theories which would seem to be
consistent with psychodynamic explanations. One such, put forth by Lewis and Anderson (1976), has
been dubbed "the fan effect", and is based on Anderson's (1976) ACT model. This suggests that, in
information processing parlance, the capacity for new associations is inversely proportional to the
number of links supported by a given knowledge structure, or which connect a given "node" to other
nodes. More creative people have access to a greater number of source nodes, which can communicate
via a greater number of different pathways or links, rather than all cognitive tasks being supported by
a limited number of links from a limited number of source nodes. The capacity to access, modulate
and interrelate greater and more vivid modes of experience is precisely one of the goals of
psychodynamic therapy and a measure of psychological adaptability. While numerous secondary
hypotheses of Mednick's have not been supported, the basic one, that creative people were able to
produce significantly more associations to target stimuli than non-creative people, has been supported
in a variety of contexts and supports the use of the Remote Associates Test in this study.
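The fan effect summarised above can be illustrated with a toy calculation. The particular formula used here (S - ln(fan)) follows the later ACT-R convention for associative strength and is an illustrative assumption, not a claim about the Lewis and Anderson (1976) model itself:

```python
import math

# Sketch of the "fan effect": the activation a source node passes to each
# associate decreases as the node's fan (its number of outgoing links) grows.
def associative_strength(fan, S=2.0):
    """Strength of one link from a node with the given fan (ACT-R style)."""
    return S - math.log(fan)

# A concept linked to few facts supports stronger individual associations...
print(associative_strength(2))   # fan = 2
# ...than one whose activation is divided among many links.
print(associative_strength(10))  # fan = 10
```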
Keith Sawyer (1992) explores, via interviews with jazz improvisers and citations of the creativity and
cognitive-processing literature such as Csikszentmihalyi, Simonton, Campbell, Hadamard and Rothington et al., the subjective and functional aspects of playing music. He refers to the two-stage
model of creativity which divides the process of creativity into the ideation and selection phases,
corresponding to the unconsciously generated material and the process and criteria by which it is
edited and selected. Sawyer explores the subjective experience of this in which the non-intellectual or
seemingly spontaneous, "non-conscious" generation of musical ideas in improvisation is then filtered
through partly-conscious form, personal criteria and social necessity (the ongoing effort of the band
and the response of the audience). The selection and refinement process applied to raw musical
impulse describes what would be subsumed under ego functioning in the psychoanalytic paradigm.
An important stimulus for this study is the work of Mary Louise Serafine (1988) in her exploration of music as a cognitive faculty inherent in the biological mind. Dr. Serafine explores the social functions of music, as well as studying the correspondence between learning, music theory and the experience of making and hearing music, linking epistemological concerns and developmental psychology. She explores music as an activity of mind akin to language, its development in the child, as well as the nature of abstraction, time, emotion, collaboration, and the nature of vocal and
non-vocal musics. She argues against the notion that music is a human appropriation of physics and
thus only a method, technique or artifact.
Boundaries and Relational Phenomena
One explicitly psychodynamically-oriented study was done by Juni et al (1986). The authors correlate
preferences for selected passages of music noted for a distinct emotional tone or program with the
subject's Rorschach percepts. Subjects were first administered the Rorschach, which was standardized
to 25 uniform responses, and scored for anal, oral, sadistic and phallic fixations, according to Juni and
Fischer's (1985) expanded lexical word count. Musical preferences were scored on a 1-4 score, and
correlated to the tonalities of each piece of music presented to subjects. The study identified oral
fixation issues correlating to the preference for minor tonality music; minor tonalities are commonly
experienced as sad or evoking feelings of loss or passage. This is taken to illustrate the emotional
valence of music and its influence upon affective, psychodynamic dispositions. These results lend
strength to the expectation that oral-stage attachment issues are reflected in those more susceptible to
and seeking strong musical-emotional valences.
In Sousa's study The Relationship Between Boundary Permeability of Psychoanalysts and their
Attitudes Toward Countertransference (1997), Dr. Sousa points out that the capacity for mature object
relations is predicated upon a clear differentiation between internal and external reality - understood
as a chief function of the ego. She cites work from theoretical positions including those of Mahler,
Pine and Bergmann (1975), Winnicott (1971), Searles (1986), Hartmann, Kris and Loewenstein
(1949), and Blatt and Wild (1976), supporting the notion that psychic functions are interdependent.
However, of immediate concern for this study is Sousa's finding of correlations between measures of
"insecure attachment" on the Bell Object Relations/Reality Testing Inventory, and thin inner
boundaries on the Hartmann Boundaries Questionnaire. This correlation and relevance of these
variables in the very different context of her study supported the choice of these two instruments for
this study.
Kohut (1955) describes music in terms of regression in the service of the ego, and implies functioning
across boundaries:
"Music...as an extraverbal mode of mental functioning, permits a specific, subtle regression to
preverbal, i.e. to truly primitive forms of mental experience while at the same time remaining socially
and aesthetically acceptable".
Kohut's comment illustrates the psychoanalytic notion that the ego brings into cooperation the drives,
external criteria and the need for social conformity. This implies that the successful regression in
service of the ego depends upon "social acceptability", involving managing relational necessity in a
pre-verbal mode - straddling developmental levels without violating reality-testing and social
participation. It also raises the issue of compromise between conflicting agencies, a way of making
the unacceptable acceptable and giving structure and expression to what is chaotic and primitive - a
questionable understanding of the creative process.
A few entries in the literature address clinical links between object relations and music. Dvorkin
(1991) offers clinical examples of the link between pathology, therapeutic process, and affect on the
one hand, and musical creation, affective tone of the music and verbal reflection upon the music in the
treatment of a 17-year-old borderline girl. She supports the notion of music as a transitional object
facilitating and structuring the emergence and expression of primitive contents. Types and ranges of
tonality and musical dynamics are linked to particular affective and self-states - precisely those areas
left open by Serafine's work.
Dvorkin (1992) explores the use of music as a transitional form of non-verbal communication and
social involvement among high school students. She finds higher degrees of capacity for trust and
intimacy among music students than a control group, with evidence of higher developmental levels,
but no greater capacity for individuality, suggesting that this supposed relational capacity is dependent
upon the ego-binding and interpersonally-connective functions of the music. This raises the question
of whether the engagement in music provides an ego-binding function at the same time relying upon a
degree of ego permeability, which manages to exist without a significant increase in pathology or
distress.
However, pathological considerations are not the focus of this study. More immediately relevant is the
question of how and why one engages with - or is engaged by - the materials of one's life, consistent
with Bollas' idea of the personal idiom. This is tested by the Bell Object-Relations/Reality Testing
Inventory (BORRTI). The BORRTI reveals general, clearly defined and operationalized relational
patterns and tendencies, as well as indicating pathological extremes where they are evident. Of
greatest relevance here are attachment issues, social competence, egocentricity and alienation.
Instruments
Hartmann et al. (1981) laid the groundwork for the Boundaries Questionnaire in studies of nightmare
sufferers. Two studies compared nightmare sufferers with non-nightmare vivid dreamers and
non-nightmare non-vivid dreamers, giving subjects at least two psychiatric interviews, the Rorschach,
MMPI and five TAT cards. The results are as follows: Compared with controls and with population
norms, nightmare sufferers dreamed in greater length and frequency. They displayed greater fluidity
with respect to the content and transformative quality of their dream images, self- and
other-representations and emotions; they shifted readily from one dream into another or awakened
from one directly into another. They reported difficulty knowing whether they had awakened or not
after nightmares or other intense dreams. They reported more drowsiness and/or daydreaming, with
more "daymares", or reverie drifting into unpleasantness.
During interviews, the nightmare subjects were reported to have free-associated more readily, taken
more time offering detailed answers and many more associations. They were "immensely trusting and
open...sharing all kinds of intimate detail...much more so than the control groups". They reported
more conflictual relationships in their personal lives. They all described childhood and adolescence as
difficult or complicated, more so than control groups. However, there was no greater incidence of
trauma or abuse. Nearly all nightmare sufferers reported involvement in the arts, teaching or forms of
healing or therapy. No nightmare sufferers (26 out of n=50 subjects) were in blue-collar or 9-5 white
collar jobs.
Hartmann points out that although descriptions suggest psychopathology, based upon
symptomatology fewer than one third of 26 nightmare sufferers qualified for formal DSM-III
diagnosis, despite reported intensity and chronicity of nightmares. Of these, most were tentative Axis
II diagnoses, two were possible schizophrenics and none were anxiety disorders.
Nightmare sufferers showed distinct MMPI characteristics, with significant elevations on psychotic
scales (Pa, Pt, Sc, Ma) but no elevations on the "neurotic" side associated with depression and
anxiety. Hartmann states that compared to controls, this does not indicate a "sick" population. He also
points out that elevations on psychotic scales are equally characteristic of borderline patients, people
with psychotic diagnoses and art students - the latter having no greater incidence of serious diagnoses.
On the Rorschach, nightmare sufferers showed more primary process and vivid content in their
percepts, but did not differ from the other groups on any standard Exner measures. However, with
respect to "permeable boundaries", following the work of Blatt and Ritzler (1974) and Fisher and
Cleveland (1958), the nightmare group scored significantly higher (p < .01).
On TAT cards targeting interpersonal aggression and hostility, nightmare sufferers showed no
elevation. The highest levels of hostility and aggression were displayed by male
non-nightmare/non-vivid dreamers. This suggests the relevance of further study regarding the effects
of repression.
Hartmann summarizes the findings as indicating that nightmare sufferers, who tended to be in the arts
and helping professions, were no more anxious, depressed or hostile than controls, and displayed only
slightly greater incidence of specific, well-contained pathologies. He reports that the interviewers' and
testers' descriptions of the nightmare subjects frequently contained words and phrases like
"vulnerable", "undefended", "vivid" (with respect to both verbal imagery and behavior), and
"tendency to merge". Hartmann states that these descriptions yielded the term "thin boundaries".
In order to systematically study the boundary phenomena emerging from the dream research,
Hartmann, et al. devised the 145 item Boundary Questionnaire. Hartmann distinguishes between inner
boundaries and outer boundaries. Inner refers to phenomena of feeling and dreaming and the ways in
which thinking, feeling, and particular thoughts and feelings are separate or continuous with each
other. Outer refers to tendencies, preferences and opinions about the outside and social world.
Hartmann suggests two areas of inquiry for this study, reporting that subjects with thick inner and thin
outer boundaries had little psychiatric difficulty and close ties to significant people in their lives.
Conversely, those with thin inner and thick outer, including artists, had significantly more psychiatric
difficulty, whereas successful artists had a more even balance of inner and outer thinness, perhaps
indicating less vulnerability to chaotic primary process and a more efficient handling of inner and
outer reality.
The Hartmann Boundaries Questionnaire is outlined further in Appendix A.
BORRTI
Object relations are measured utilizing the Bell Object Relations-Reality Testing Inventory. Bellak,
Hurvich and Gediman (1973) devised an ego-functioning-oriented clinical interview to identify
aspects of object relations and reality testing, the thrust of which had many points of convergence
with key boundaries concepts. Reality testing consisted of "reflective awareness", "accuracy of
perception", and the "ability to distinguish between internal and external" experience. Object
relations are derived from the quality of relationships and self-experience in relation to others.
Bell, Billington and Becker (1985, 1986) created a self-report, true-false measure which would
address the areas aimed at by Bellak, et al. Their inventory consists of subscales assessing Object
Relations (OR) issues of alienation (Aln), insecure attachment (IA), egocentricity (Egc), social
incompetence (SI), and Reality Testing issues (RT) of reality distortion (RD), uncertain perception
(UP) and hallucinations and delusions (HD).
The BORRTI subscale most relevant to this study is the insecure attachment scale (IA), which Bell
(1991) points out is the most likely of the scales to be elevated in high-functioning individuals. This
serves the dual purpose for this study of reducing the influence of pathology per se as a focus, and of
identifying intrapsychic issues hypothesized to be characteristic of creative people in any number of
fields. Individuals with elevations on this scale are likely to be sensitive to rejection, criticism and
threats to closeness. The sensitivity of the BORRTI to indicators of pathology, should these be
relevant in the sample tested, will serve additionally as a control mechanism.
The BORRTI is outlined further in Appendix B.
Remote Associates Test
Sarnoff Mednick (1962) studied subjects' associations to stimulus words (with predetermined correct
solutions) as a measure of creativity, looking at the clustering of correct associations to both obvious
and obscure stimulus words. He states that his method is not intended to identify any particular
creative process in any particular field, but rather to tap into a set of processes which underlie all
creative thought. Mednick's original published use of the instrument achieved reliability scores of .92
among women and .91 among men of college age. In this study, it is used as a general measure of the
cognitive capacity to access conceptual connections; the distinction between algorithmic and heuristic
problem solving is relevant to this ability, with respect to the ability to find the idiosyncratic
connections necessary for the creative process.
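The two measures used for Hypothesis 1 - number correct and number attempted - can be sketched as a simple scoring routine. This is an illustrative sketch only: the items, the answer key and the function name are invented for demonstration and are not Mednick's actual test materials.

```python
# Hypothetical answer key for RAT-style items: each item is a triple of
# stimulus words with one predetermined correct associate.
ANSWER_KEY = {
    ("cottage", "swiss", "cake"): "cheese",   # classic example of an RAT-type item
    ("cream", "skate", "water"): "ice",
}

def score_responses(responses):
    """Count correct and attempted responses for one subject.

    `responses` maps each item to the subject's answer, or None if the
    item was left blank within the 30-second limit.
    """
    correct = sum(
        1 for item, answer in responses.items()
        if answer is not None and ANSWER_KEY.get(item) == answer.strip().lower()
    )
    attempted = sum(1 for answer in responses.values() if answer is not None)
    return correct, attempted

subject = {
    ("cottage", "swiss", "cake"): "cheese",
    ("cream", "skate", "water"): None,   # blank: not attempted
}
print(score_responses(subject))  # -> (1, 1)
```

A "Difficult" versus "Easy" breakdown, as in Table 1, would simply apply the same count to two disjoint subsets of items.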
To clarify algorithmic and heuristic: The former is, by definition, a formula for discovering or
clarifying a correct solution through the least number of pre-determined steps and is thus reductive in
nature, and non-creative. Summarizing Mednick, heuristic refers to an idiosyncratic or novel approach
to a problem and presumes that neither the method nor the result are predictable or predetermined, but
depend upon access to personal associations rather than learned method or technique. With respect to
the RAT items, it is expected that the "Difficult" items require more heuristic thinking, based on a
fluidity of boundaries allowing an ease of association and an absence of obvious or ready-made
connections between words. Though there has been argument in the literature as to whether the RAT
tests creativity per se, the abilities it taps - to access and synthesize both established forms and
methods and personal image and idea - are part and parcel of creative work.
Methodology:
40 subjects participated in this study. These were major-declared students, recruited from music and
business classes at two neighboring Long Island universities of similar demographic constitution.
They were approached, with the cooperation of professors and department chairpersons, via brief talks
given to classes, and via sign-up sheets, approved by department chairpersons. Incentive was offered
in the form of 3 lottery style cash prizes to be paid at the completion of the study. Students were
encouraged to attend group testing sessions held in university classrooms. Those who were unable to
do so were tested individually or in pairs at times and places selected for minimum distraction.
Procedure did not differ between settings, except for waiting for all subjects in group administration to
finish one instrument before commencing with the next. Each subject was administered a questionnaire
packet consisting of a consent form, a demographic questionnaire, the Remote Associates Test, the
Boundaries Questionnaire and the BORRTI. Each RAT item was allotted 30 seconds for completion.
Subjects were informed that the experiment concerned personality and career choice. They were asked
to read and sign the consent form and complete a brief demographic questionnaire. The timed Remote
Associates Test was administered next. After instructions, subjects were given up to thirty seconds for
each of the word association problems. They were then instructed on the procedure for the Hartmann
questionnaire, followed by the BORRTI questionnaire, neither of which were timed. The examiner
was available for questions and debriefing subsequent to testing.
RAT items were hand scored, and Boundaries and BORRTI questionnaires were computer-scored and
analyzed using dedicated software.
Pilot Hypotheses
Hyp.1: On the Remote Associates Test, music majors will achieve a greater number of correct
answers than business majors, particularly on the Difficult test items. As an exploratory hypothesis, it
is predicted that this will be accompanied by a greater willingness to try, indicated by higher number
of attempted responses.
The "heuristic" thinking, or freer associating, necessary to complete the difficult items will be
easier for the music majors, both because of the previously discussed personality factors and
because, given their continuous exposure to the novel and non-reductive challenges associated with
their work, they will be more adept at finding remote or counterintuitive solutions. This hypothesis
follows Mednick's original results to this effect.
Hyp.2: Musicians will have thinner boundary scores than business majors, as indicated by higher
Sumbound scores.
Hartmann's initial research indicated that thinner-bounded individuals were found more often, among
others, in artistic professions. In addition, thick responses correspond to the organized, discrete,
quantifiable, emotionally neutral axis of cognitive style associated with business and finance.
Hypothesis 3:
Musicians will score higher than business majors on the BORRTI measure of insecure attachment.
The functions of music as a special category of transitional object, as previously discussed, would
suggest the proximity of oral and attachment issues, among the other "primitive forms of mental
experience", (Kohut, 1955), such as the omnipotence and pre-verbal expressive functions of playing
music. However, as BORRTI measures are more pathology-dependent than not, and previous findings
(Dvorkin, '92) indicate the adaptive value of musical involvement, this is a tentative prediction.
Hypothesis 4: There will be a positive correlation between boundary thinness and Insecure
Attachment subscales, based upon Sousa's (1993, 1996) findings to this effect. Sousa (1994) found a
significant positive correlation between Insecure Attachment and the Sumbound scale of the
Hartmann Boundaries questionnaire (r= .4276, p < .001), suggesting that people with thinner
boundaries overall will also have issues in the areas described by Insecure Attachment. This may be a
stronger indicator of the idea suggested in Hyp. 3.
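The statistic behind Sousa's reported r = .4276 is the Pearson product-moment correlation. As a minimal sketch, with invented illustrative scores (the data below are not from any study cited here; only the formula is standard):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores pairing thinner boundaries (higher Sumbound)
# with higher Insecure Attachment z-scores.
sumbound = [250, 280, 300, 317, 340, 360]
insecure = [-0.5, -0.2, 0.0, 0.1, 0.4, 0.6]
print(round(pearson_r(sumbound, insecure), 3))
```

A significance test on r (the p < .001 reported by Sousa) would additionally require the sample size and a t- or permutation test, which this sketch omits.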
Results
N = 40 (20 music majors, 20 business majors)
CORRELATIONS BETWEEN SUBSCALES: BOUNDARIES AND BORRTI
Hypothesis 1: Music majors will achieve a greater number of correct answers than business majors on
the Remote Associates Test, particularly the Difficult test items. As an exploratory hypothesis, it is
predicted that this will be accompanied by a greater willingness to try, as indicated by a higher
number of attempted responses.
TABLE 1
Means
            TOTAL CORRECT    DIFFICULT CORRECT    EASY CORRECT
MUSIC       9.00             2.10                 6.89
BUSINESS    6.87             2.00                 4.87
            (p = .035)       (p = .425)           (p = .006)
TABLE 1a.
CORRELATIONS BETWEEN SUBSCALES: BOUNDARIES AND REMOTE ASSOCIATES
TEST
Music majors did achieve a significantly greater number correct as indicated by the Total Correct
figures, but these were clustered within the Easy items, with no significant difference found on the
Difficult items. Hypothesis #1 is thus supported, but not with the expected strength. The exploratory
hypothesis is not supported, as there was no significant difference in number of responses attempted.
Hypothesis #2: Musicians will have thinner boundaries than business majors, as indicated by higher
Sumbound scores.
Sumbound
Music: M=317.8
Musicians demonstrated thinner overall boundaries by a wide margin. This finding was consistent
across most boundary subscales with the exception of the Precision and Sensitivity subscales, in
which there was no difference.
Hypothesis #3: Musicians will score higher than business majors on the BORRTI measure of insecure
attachment.
z-score
Music: M=.0863
While the mean score was, at first glance, considerably higher for musicians, this hypothesis is
rejected because the within-group variance left the difference short of significance.
Hypothesis #4: There will be a positive correlation between boundary thinness and Insecure
Attachment.
Discussion
A number of issues point toward further research. One comes from the pathological/non-pathological
distinction. Sousa (1996) documents the correlation between severe psychiatric illness and the very
thin inner/very thick outer boundary profile. While the musicians in this study had thinner inner
boundaries than outer, the differences between musicians and business majors with respect to outer
boundary thickness were far less than for inner boundaries, suggesting that the ego functions of both
groups were intact, consistent with the dismissal of pathology as a factor. In other words, the
populations may be considered well adapted and not, on the whole, pathological. The issue may
be one of degree, wherein a "normal" population of artists has social coping capacities (outer
boundaries) that are serviceable, though less flexible and sophisticated than the inner associative
fluidity responsible for creativity (inner boundaries). This profile can easily be seen as yielding
to pathology when the inner experience becomes overly fluid, undifferentiated and irreconcilable
with outer reality, which is then dealt with in a brittle and inept manner - a description which,
though incomplete, is consistent with psychosis. Measures of egocentricity, alienation and uncertain
perception may reflect both a preoccupation with one's own inner experiences and interpretations and
a resultant sense of otherness, doubt and separateness - an interpretation which seems consistent with the thin inner/thick
outer profile.
The pathology-sensitive BORRTI yields results supporting the predicted links between object
relations and career choice, while indicating no significant psychopathology. Hartmann's findings that
his thin-bounded subjects had more vivid and distressing access to inner, irrational material but
without diagnosable disorders corresponds to this, and suggests that "psychopathology" rests along a
loose continuum of common human experience and may often be a matter of degree, not kind, of
factors producing reality and relational distortion. With regard to Kissen's (1995) work on the linkage
of affects and internalized objects, the fact that the inner boundaries of musicians are far thinner than
outer and that thin inner boundaries as a whole are highly correlated with attachment issues may
demonstrate both the links between creativity and inner fluidity, and that the affects which "drive" the
creative process and motivate the choice of an artistic career derive from the affect and personal
meaning of internalized early object relationships. Given the thin inner boundaries of musicians and
the strong correlation between inner boundary thinness and insecure attachment, the variance-related
lack of significance in the difference between musicians and business majors with regard to insecure
attachment may be considered a statistical anomaly. Additionally, the lack of correlation between
insecure attachments and outer boundary fluidity may suggest higher level ego functions "layered
atop" the oral issues explored in the aforementioned study by Juni et al. (1986), corresponding most
closely to the attachment issues as reflected in the BORRTI. This is a fruitful topic for future research.
An offshoot of this would be the study of Hartmann's (1981) preliminary finding that composers
("pure" artists) scored thinner than instrumentalists ("interpretive" artists).
It may be more difficult to account for the reversal of the inner/outer profile with regard to
performance on the Remote Associates Test than the Boundaries Questionnaire. There was a strong
correlation between performance on the RAT and outer boundary thinness, but no correlation between
RAT performance and inner boundaries. It may be that there is a qualitative difference
psychodynamically between fluidity in verbal and non-verbal associative ability, and that the
object-activating functions utilized by musicians while playing music (i.e. those involving objects
active in insecure attachment issues) tap into pre-verbal psychic territory; both music and enterprise
tap into distinct but related psychological processes. Thus, the "creativity" which the musician uses in
making music is distinct from that which he uses when solving a heuristically challenging word
puzzle, and it is due to a globally, reasonably well-adapted and diversified personality that he is
able to do both. This should not be surprising given recent advances in understanding the neural and
informational mechanism activated in various cognitive processes, such as those of Anderson and
Lewis (1976) discussed earlier.
As previously discussed, a key area for further study involves how character is both a determinant and
a result of choosing "objects" in the broader sense, a choice which involves confronting several
"boundaries", mandates, personal criteria, etc., and modifying them by adapting them to the emerging
structure of the individual character, which is never fixed but is in a dynamic process of fixing itself
and modifying its boundaries through such choices. One can consider a reed player's angst-ridden
decision to play the sax in a jazz quartet instead of the clarinet in an orchestra as a determining factor -
a vector, or moment of truth - in the emerging sense of himself and the life he will lead, as much as
having been brought about by earlier factors of which he was merely the locus or result. This can only
expand the conception of the individual self as a key player in his own dynamics, not merely a dyad of
conscious puppet and unconscious puppeteer residing in a single body, or of historical cause and
human effect. Further study should address the complementarity and interaction of psychic
mechanisms such as those addressed here and in related study, perhaps with respect to insights from
other areas of study of dynamic systems. There is much in this fertile field to apply to the uniqueness
of the individual and the phenomenon of the dynamic relation with the inner and outer world which
yields something as odd as a personality or a self. In so doing, we may achieve valuable insights into
creativity, work and love in vivo, and not merely in vitro, in cross-section or in theory.
References
Anderson, J.R., (1983). The Architecture of Cognition. Cambridge, MA: Harvard University Press
Bell, M. (1991). An Introduction to the Bell Object Relations and Reality Testing Inventory, Los
Angeles, CA: Western Psychological Services
Bell, M., Billington, R. & Becker, B. (1985). A scale for the assessment of object relations:
reliability, validity and factorial invariance. Journal of Consulting and Clinical Psychology, 42(5),
733-741.
Blatt, S.J. & Ritzler, B.A. (1974). Thought disorder and boundary disturbance in psychosis. Journal
of Consulting and Clinical Psychology, 42(3), 370-381.
Bollas, C. (1992). Being a Character: Psychoanalysis and Self Experience. New York: Hill and Wang
Bollas, C. (1989). Forces of Destiny: Psychoanalysis and Human Idiom. New Jersey: Jason Aronson
Coney, J. & Serna, P. (1995). Creative thinking from an information processing perspective: A new
approach to Mednick's theory of associative hierarchies. Journal of Creative Behavior, Vol. 29,
Number 2, 109-132
Dvorkin, J. (1992) Ego Development and Self Representation Among High School Adolescents in
Music Performing Groups, Doctoral Dissertation, Pace University, Department of Psychology, New
York, NY.
Dvorkin, J., (1991). Individual Music therapy for an adolescent with borderline personality disorder:
An object relations approach. In Case Studies in Music Therapy (ed. Bruscia, K.E.). Phoenixville, PA:
Barcelona Publishers
Eigen, M. (1996). Psychic Deadness. New Jersey: Jason Aronson.
Forbach, G.B. & Evans, R.G. (1981). The Remote Associates Test as a predictor of productivity in
brainstorming groups. Applied Psychological Measurement, 5, 333-339.
Greenberg, J.R., & Mitchell, S.A. (1983). Object Relations in Psychoanalytic Theory. Cambridge,
MA: Harvard University Press.
Hartmann, E. (1990). Thin and thick boundaries: personality, dreams and imagination. In Mental
Appendix A
Hartmann et al. organized their questions into twelve general categories, which comprise the subscales
of the questionnaire. Each subscale contains questions about experiences, opinions, preferences,
tendencies, etc., reflecting more concretely the personal boundary tendencies of the subject. They are:
1. Sleep/Wake Dream
example: "When I wake up in the morning, I am not sure whether I am really awake for a few minutes"
2. Unusual Experiences
example: "I have had deja vu experiences"
3. Thoughts, Feelings, Moods
example: "Sometimes I don't know whether I am thinking or feeling"
4. Childhood, Adolescence, Adulthood
example: "I am very close to my childhood feelings"
5. Interpersonal
example: "When I get involved with someone, sometimes we get too close"
6. Sensitivity
example: "I am very sensitive to other people's feelings"
7. Neat, Exact, Precise
example: "I keep my desk and worktable neat and well organized"
8. Edges, Lines, Clothes
example: "I like houses with flexible spaces, where you can shift things around and make different
uses of the same rooms"
9. Opinions about Children
example: "I think a good teacher must remain in part a child"
samples undergoing treatment. Validity estimates are based on the use of the BORRTI with diverse
and divergent clinical populations over time. The authors report on more than a dozen such studies
prior to 1991 in which the BORRTI has demonstrated discriminate, concurrent and predictive validity,
others in which it has been used as an outcome measure, and a review in Tests Critiques (Alpher,
1990) which concluded that it is a reliable and valid instrument for assessment of object relations and
reality testing.
Proceedings abstract
wcooper@utdallas.edu
Background:
Aims:
Participants in this study will engage in a dual-task paradigm where they will
be exposed to both a musical stimulus (either a brief rhythm sequence or a
brief pitch sequence) and a non-musical stimulus (either a string of digits or
a temporally presented sequence pattern of lighted squares). They will be asked
to remember both sequences simultaneously. Immediately following this
presentation, subjects will perform a two-alternative forced choice decision
for each of the two stimuli presented. Each of the two-alternative forced
choice tasks will present a correct and incorrect sequence, respective to the
earlier presented stimulus. For each of the two tasks, the subject responds by
indicating which stimulus sequence was presented earlier.
Results:
Results from the previous study indicate that the performance on memory tasks
for the four stimuli sequences is poorer when digit sequences are paired with
rhythm or pitch sequences, and when rhythm sequences are paired with block
sequences (lighted squares). It is predicted that the current experiment will
produce similar results.
Conclusions:
Baddeley's model of working memory seeks to account for how one might store
phonological and visual-spatial information in working memory. However, it is
not clear how this model accounts for the storage of different types of musical
information. The line of research described here helps to further
define Baddeley's model by indicating that the resources used to process
musical and non-musical information overlap in predictable ways.
Proceedings paper
Nearly all memory theorists agree that two forms of memory storage exist: short-term memory and long-term memory (James, 1890, was the
first to propose this duality). Short-term memory refers to the information that forms the focus of current attention, that remains in
consciousness after it has been perceived, and that forms part of the psychological present, while long-term memory contains information
about events that have left consciousness and are therefore part of the psychological past. It holds information for a long time - days,
months, years. It is obvious that short-term memory plays an important role in the perception of music. Successful processing of
just-perceived pitch and temporal information requires keeping the perceived stimuli in a short-term memory store for a certain period of time.
Empirical work concerning memory for tempo is scarce. The significant study by Levitin and Cook (1996) was devoted to long-term
remembering of the tempo of familiar songs. However, no research has yet addressed short-term remembering of tempo in general form,
i.e. without association with a specific piece of music. The psychology of music possesses a large body of knowledge in the domain of
short-term memory for pitch (see Deutsch, 1999). It is known that there is a special memory store for pitch and that memory for pitch
decays very slowly. However, no similar knowledge is available on short-term memory for temporal information, for instance for the rate
of a tempo.
Our research was devoted to short-term memory for the rate of a presented metronomic sequence. The aim of the present study was to
investigate how memory for the rate of a tempo gradually decays and to test whether the decay of memory differs in various tempo
zones. The experiment was designed as a pilot study of this problem.
Method
Thirty-four subjects, music amateurs, aged between 19 and 28 years, participated in the experiment. They were asked to listen to a short
sequence in the standard tempo and, after a retention interval, to reproduce the rate of the sequence by finger tapping on a special tapping
device connected to a computer. The device made it possible to measure the durations of the intertap intervals produced by the subjects.
The standard tempo was presented via an electronic metronome. The rate of a tempo was defined by the duration of the intervals between
metronome clicks. The following tempi were used: 300, 600, 900, 1200, and 1500 msec. The retention interval was 3, 10, 20, or 30 sec.
The end of the retention interval was marked by a green light, which appeared on the computer display.
All stimuli were presented in a random order. During the experimental session, subjects completed 20 trials, each trial consisting of a
particular combination of tempo rate and retention interval (5 tempi x 4 retention-interval durations). Subjects were asked to avoid
rhythmic body movements and/or internal continuation of the tempo (counting or continuous mental presentation of the tempo) during
the retention interval.
Results
For each trial, the mean duration of the intertap intervals, representing the rate of the retrieved tempo, was computed. Memory decay was expressed as the magnitude of the error of the retrieved tempo with respect to the rate of the standard tempo, in the form of the absolute deviation (the absolute magnitude of the positive or negative deviation between the retrieved and the standard tempo; see Fig. 1).
Fig. 1. Memory decay of the retrieved tempo as a function of tempo zone and duration of the retention interval. Means and standard errors are presented. The error is expressed as the absolute deviation between the retrieved and the standard tempo.
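The error measure used here can be sketched in a few lines. The function below is an illustrative sketch only (the paper gives no code, and the names are assumptions): it averages a trial's intertap intervals and takes the absolute difference from the standard metronome interval.

```python
def tempo_error(intertap_intervals_ms, standard_interval_ms):
    """Absolute deviation of the reproduced tempo from the standard.

    The reproduced rate is taken as the mean intertap interval; the
    error is its absolute difference from the standard interval.
    (Illustrative sketch; names are assumptions, not from the paper.)
    """
    reproduced = sum(intertap_intervals_ms) / len(intertap_intervals_ms)
    return abs(reproduced - standard_interval_ms)

# A subject tapping slightly fast against the 600 ms standard:
print(tempo_error([570, 585, 590, 575], 600))  # 20.0
```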
The effects of (1) the duration of the retention interval and (2) the rate of the tempo on the magnitude of the error (absolute deviation) were tested with a repeated measures ANOVA. Both the duration of the retention interval (F = 51.59, p < .000001) and the tempo (F = 5.38, p < .000001) showed significant effects. Generally, the longer the retention interval, the greater the error. However, a clear effect of memory decay as a function of retention-interval duration was observed only in the fast and intermediate tempi. The error was greater in slower tempi than in faster tempi. When the effect of retention-interval duration on the absolute deviation was tested within particular tempo groups, a significant effect was found only for the 300 and 1200 ms tempi.
Discussion
In our experiment we demonstrated that memory for the rate of a tempo depends on the duration of the retention interval and on the tempo zone. However, the memory decay we observed was not very large; especially in fast and intermediate tempi the decay was rather small even after a retention interval of 30 seconds. On the other hand, it is surprising that even after the shortest retention interval of 3 seconds the tempo recall was not very precise. This contrasts with the assumption of time psychologists that a temporal interval of 3 seconds falls within the psychological present (for instance, Fraisse, 1984).
To date there is no adequate theory describing the process of remembering a tempo rate. There is obviously a substantial difference between the task of remembering a tempo and classical memory experiments, where subjects are asked to remember, e.g., digits or letters. Schulze (1978) and Keele et al. (1989) describe a memory model for the discrimination of tempo change. According to the model, a listener derives from a regular temporal sequence an internal reference interval, which he or she uses as a mental representation of the rate of the sequence. In line with this model, we may assume that in our experimental task the subjects encoded the rate of the standard tempo in the form of a reference interval, which they then remembered. Memory decay would then be caused by gradual forgetting of the precise duration of the reference interval. We also assume that the subjects aided their remembering by applying a categorization strategy: they assigned the tempi they perceived to particular tempo categories (for instance "very fast", "fast", "intermediate", ...). If they forgot the precise rate of a tempo, they could simply produce a similar tempo falling into the same category. This encoding strategy might explain the relatively small and slow memory decay of the remembered tempo rate.
References
Deutsch, D. (1999). The processing of pitch combinations. In D. Deutsch (Ed.), The Psychology of Music (pp.349-411), 2nd Edition. San
Diego: Academic Press.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1-36.
Keele, S.W., Nicoletti, R., Ivry, R. I., & Pokorny, R. A. (1989). Mechanisms of perceptual timing: Beat-based or interval-based judgment?
Psychological Research, 50, 251-256.
Levitin, D. J. & Cook, P. R. (1996). Memory for musical tempo: additional evidence that auditory memory is absolute. Perception &
Psychophysics, 58, (6), 927-935.
Schulze, H. H. (1978). The detectability of local and global displacements in regular rhythmic patterns. Psychological Research, 40,
173-181.
Proceedings paper
Introduction
The purpose of this paper is to investigate the nature of implicit memory for musical rhythm.
A great deal of recent research on memory has been devoted to examining the relation between explicit and implicit forms of memory. Explicit memory refers to conscious or intentional recollection of previous experience; implicit memory, in contrast, refers to unintentional retrieval of previously acquired information on tests that do not require intentional recollection of a specific prior episode. Recent research has revealed dissociations between explicit and implicit memory. For example, some studies have shown that explicit and implicit memory are differentially affected by such variables as study/test modality shifts (e.g., Graf, Shimamura, & Squire, 1985; Roediger & Blaxton, 1987; Schacter & Graf, 1989), levels and types of study processing (e.g., Graf & Schacter, 1989; Jacoby, 1983; Schacter & Graf, 1986) and various other manipulations (e.g., Hayman & Tulving, 1989a, 1989b; Mitchell & Brown, 1988).
Implicit memory can be confirmed by examining what is called the priming effect. Priming is a phenomenon in which the processing of a preceding stimulus influences the processing of a succeeding stimulus; it is classified into two types, direct priming and indirect priming. Direct priming is observed when the preceding stimulus is exactly the same as the succeeding stimulus; it is therefore also called repetition priming or perceptual priming. In contrast, indirect priming is observed when there is a semantic relation between the preceding and succeeding stimuli. In this article the term "priming" is used to refer to direct priming.
Research concerning implicit memory has focused almost exclusively on tests involving visual processing. Among studies using verbal materials, for example, word identification (e.g., Graf & Ryan, 1990; Jacoby & Dallas, 1981), fragment and stem completion (e.g., Hayman & Tulving, 1989a; Roediger & Blaxton, 1987), and lexical decision (e.g., Rueckl, 1990; Scarborough, Gerard, & Cortese, 1979) have been used as tests involving visual processing. There are also many papers on implicit memory for nonverbal objects, using tasks such as picture completion (e.g., Jacoby, Baker, & Brooks, 1989; Snodgrass, 1989), picture naming (e.g., Bartram, 1974; Mitchell & Brown, 1988), object decision (e.g., Schacter, Cooper, & Delaney, 1990), and pattern completion and identification (e.g., Musen & Treisman, 1990).
Similarly, some research has explored implicit memory in the auditory domain. Several studies have demonstrated priming effects on auditory word-identification and sentence-identification tasks (e.g., Franks, Plybon, & Auble, 1982; Jackson & Morton, 1984; Schacter & Church, 1992), on an auditory stem-completion task (Bassili, Smith, & MacLeod, 1989; McClelland & Pring, 1991), and the like. Although there is relatively little research in this field, a number of studies of implicit memory in the auditory domain have appeared over the past few decades.
In comparison with these lines of research, the study of memory for musical information has so far only scratched the surface of the topic. More precisely, there are some studies concerned with musical information, but most of them deal with explicit memory; for example, pitch information (e.g., Deutsch, 1970; Massaro, 1970; Sloboda, 1976), short tone sequences with pitch height (e.g., Mikumo, 1990, 1992, 1994a, 1994b), melodic contour (e.g., Bartlett & Dowling, 1980; Dowling & Bartlett, 1981; Dowling & Fujitani, 1971), and so on. These studies have investigated the nature of explicit
These tone sequences were generated to satisfy three constraints: (a) the tone sequence consisted of more than one kind of note value (in other words, it did NOT consist of a single note value); (b) the tone sequence was not extremely syncopated; and (c) the tone sequence was metrical rather than random. The reason for these constraints was that a tone sequence consisting of only one note value is too simple to be appropriate as experimental material, and that a non-metrical tone sequence is more difficult to memorize than a metrical one (cf. Povel and Essens, 1985). With regard to constraints (b) and (c), two musicians who did not participate in the present experiment rated all the candidate tone sequences in terms of how "metrical" each tone sequence was on a 7-point scale (1 = not metrical, 7 = metrical enough), and the tone sequences judged "metrical" were used in the present experiment.
Participants studied 21 tone sequences. The remaining 21 were not studied; they were included in the priming task in order to determine baseline levels of performance, and in the recognition task as distractor items. The priming task and recognition task thus consisted of 42 critical items: 21 studied tone sequences and 21 nonstudied tone sequences. The presentation order of tone sequences on both tasks was determined randomly for each subject. The sound pressure level of all tones was equal, at a comfortable listening level (about 75 dB SPL). The distance between the subject and the speaker was about 70 cm. The timbre of all notes was the same (a piano sound). The pitch of all notes was the same in the study phase, A4, and was shifted either to E5 or D4 in the test phase (this is described more precisely in the procedure section).
Procedure
All subjects were tested individually in a soundproof chamber. Each experiment was conducted under conditions of incidental encoding: subjects were told that they were participating in a preliminary experiment on music perception, and no mention of a later memory test was made.
Study phase. In the study phase, subjects were informed that rhythmic tone sequences would be presented from the speaker and that their task was to rate the "coherence" of each tone sequence on a 7-point scale (1 = not coherent at all, 7 = completely coherent). They were further instructed to listen to each tone sequence carefully, because the tone sequences were very short, and were told that it was important for them to make an accurate rating. The study phase then began with five practice items, followed by presentation of the 21 critical tone sequences in a random order.
Test phase. Immediately after the study presentation, half of the subjects were given instructions for the priming task and the other half for the recognition task.
The priming task was a two-alternative forced-choice task. On each trial, two tone sequences (an old and a new item, in random order) were presented in succession at an interval of 2.0 sec. The loudness of the two tone sequences (that is, the intensity of all notes in each sequence) was the same.
The pitch of one third of the old items, or 7 tone sequences, in the priming task was raised by shifting all pitches 5 scale steps higher, from A4 to E5. Similarly, another 7 tone sequences were lowered by shifting all pitches 5 scale steps lower, from A4 to D4 (Figure 1). The pitch of the old and new items was the same on each trial. Subjects were told that their task was to judge which of the two tone sequences was louder; they were instructed to listen carefully and mark their answer at the designated place on the sheet.
The recognition task was a surprise yes/no task. Subjects were instructed to mark either "Hai" ("yes" in Japanese) on the sheet if they remembered hearing the tone sequence during the prior rating task, or "Iie" ("no" in Japanese) if they did not remember hearing it.
Figure 1. Example of the stimuli used in this experiment: A is the original tone sequence used in the study phase and in the pitch-identical condition of the test phase, B is the tone sequence used in the pitch-up condition, and C is the tone sequence used in the pitch-down condition.
The pitch of one third of the items, or 14 tone sequences, in the recognition task was raised by shifting all pitches 5 scale steps higher, from A4 to E5. Similarly, another 14 tone sequences were lowered by shifting all pitches 5 scale steps lower, from A4 to D4. Six practice items (three new and three old) were presented before the 42 critical items. The order of presentation was randomized. As in the priming task, a period of about 1 min intervened between the completion of the study task and the appearance of the first critical item on the recognition test. The exact length of time needed to complete the recognition task varied from subject to subject, but it generally took about 10 min.
After the completion of the test, all subjects were debriefed about the nature and purpose of the experiment.
Results
The results of the priming and recognition tasks are first considered separately and then followed by a contingency analysis
of the relation between them.
Priming task. Because the priming task was a two-alternative forced-choice task, only the hit rate was analyzed. The hit rate was the proportion of studied items called "louder." The priming effect was defined as the difference between the hit rate and the chance level (50%; Figure 2).
Figure 2. The result of the priming task. The dotted line represents the chance level of 50%.
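As a minimal sketch of this measure (the function name and data layout are assumptions, not from the paper), the priming effect is simply the hit rate minus the 50% chance level:

```python
def priming_effect(chose_studied, chance=0.5):
    """Hit rate (proportion of trials on which the studied item was
    called "louder") minus the chance level. Illustrative sketch."""
    hit_rate = sum(chose_studied) / len(chose_studied)
    return hit_rate - chance

# A subject who picked the studied item on 3 of 4 trials:
print(priming_effect([True, True, True, False]))  # 0.25
```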
Three important points should be noted about the results of the priming task. First, a priming effect was observed in all conditions: subjects selected the studied items correctly in the pitch-identical, pitch-up, and pitch-down conditions. Second, the priming effect in the pitch-identical condition was larger than in the pitch-up and pitch-down conditions. Third, the priming effect was almost the same in the pitch-up and pitch-down conditions, which shows that the direction of the pitch shift does not influence the magnitude of the priming effect.
An analysis of variance (ANOVA) confirmed this description of the results. A significant main effect of pitch shift was observed (F(2, 38) = 12.33, p < .01). Tukey's HSD test showed significant differences between the pitch-identical condition and the pitch-up condition and between the pitch-identical condition and the pitch-down condition (HSD = 0.24, p < .01, and HSD = 0.19, p < .01, respectively). No significant difference was observed between the pitch-up and pitch-down conditions.
A further analysis was then performed to confirm the difference between the hit rate and the chance level in each condition. In all conditions, t tests revealed that the hit rate was significantly higher than the chance level (t(19) = 3.11, p < .001; t(19) = 2.96, p < .001; t(19) = 3.21, p < .001, respectively).
Figure 3. The result of recognition task. The dotted line represents the chance level of 50%.
Recognition task. Two different measures of recognition were subjected to an ANOVA: the hit rate, and the hit rate minus the false alarm rate. Since both analyses led to an identical conclusion, only the hit rate analysis is reported; this simply reflects the fact that false alarm rates were relatively constant across conditions.
An overall ANOVA revealed no significant main effect of pitch shift, and the difference between the hit rate and the chance level was not significant in any condition (Figure 3).
Contingency analysis of coherence judgment and recognition performance. The purpose of the contingency analysis was to determine whether priming effects on coherence performance are dependent on, or independent of, recognition memory. To determine the relation between the priming task and the recognition task, the Yule Q statistic was used. Q is a measure of the strength of the relation between two variables that can vary from −1 (negative association) to +1 (positive association); 0 indicates complete independence (Hayman & Tulving, 1989a). For the present data, Q = +.082 in the pitch-identical condition, +.99 in the pitch-up condition, and +.099 in the pitch-down condition. These values did not differ significantly from zero; significance was assessed by a chi-square test suggested by Hayman and Tulving: χ2(1, N = 20) = 0.52 in the pitch-identical condition, χ2(1, N = 20) = 0.42 in the pitch-up condition, and χ2(1, N = 20) = 0.48 in the pitch-down condition. The contingency analysis thus demonstrates stochastic independence between recognition and coherence judgment performance.
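For a 2×2 contingency table with cell counts a, b, c, d, Yule's Q is (ad − bc)/(ad + bc). The sketch below illustrates the statistic; the cell labels are assumptions for illustration, not taken from the paper:

```python
def yule_q(a, b, c, d):
    """Yule's Q for a 2x2 contingency table.

    Assumed layout: a = correct on both tasks, b = priming correct /
    recognition wrong, c = priming wrong / recognition correct,
    d = wrong on both. Q = (a*d - b*c) / (a*d + b*c), ranging from -1
    (negative association) through 0 (independence) to +1.
    """
    return (a * d - b * c) / (a * d + b * c)

print(yule_q(5, 5, 5, 5))  # 0.0 -> complete independence
print(yule_q(9, 1, 1, 9))  # close to +1 -> strong positive association
```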
Discussion
In this study, implicit memory for music was investigated experimentally. A priming task and a recognition task were performed with musical rhythms, that is, rhythmic tone sequences consisting of more than one note value. The following results were obtained in the priming task. First, perceptual priming was observed not only in the pitch-identical condition but also in the pitch-up and pitch-down conditions; this shows that information independent of pitch height is encoded in the priming representation of a rhythmic tone sequence. Second, the priming effect in the pitch-up and pitch-down conditions was smaller than in the pitch-identical condition, which indicates that information dependent on the pitch height of the stimuli is also encoded in the priming representation. These results lead to the conclusion that the priming representation of a rhythmic tone sequence encodes both pitch-dependent and pitch-independent information.
Third, the magnitude of the priming effect was the same in the pitch-up and pitch-down conditions; in other words, the direction of the pitch shift did not influence the priming effect. This result can be explained as follows: the priming effect in the pitch-up and pitch-down conditions is carried only by information independent of pitch height, because the pitch-dependent information is lost when the pitch is raised or lowered, and the remaining information is unaffected by the direction of the shift.
On the recognition task, performance was at chance level in all conditions, and the results of the recognition task were statistically independent of the priming task. Coherence judgment performance therefore does not appear to have been influenced by recognition performance.
After the experiment, almost all subjects assigned to the priming task stated that they had been unable to judge which of the tone sequences was louder; they often said that they had judged haphazardly. The results show, however, that they correctly selected the studied items as the "louder" tone sequences. This suggests that the task was performed under incidental encoding and that the subjects' judgments were unconscious. We may therefore reasonably conclude that the loudness judgment task reflects implicit memory.
As stated at the outset, the primary purpose of this research was to investigate whether a direct priming effect for a rhythmic tone sequence could be observed. The experiment showed a priming effect for rhythmic tone sequences, confirming implicit memory for musical tone sequences. The results also suggested that pitch has a certain influence on the priming representation of rhythm. We conclude that pitch information is encoded in the priming representation of a rhythmic tone sequence equally, regardless of the direction of the pitch change.
This research examined implicit memory for musical rhythm, clarifying the nature of the priming representation of rhythmic tone sequences with particular attention to the pitch height of the notes. Needless to say, the whole nature of the priming representation of rhythmic tone sequences cannot be clarified by the single experiment described above. There is room for further investigation: for example, whether the degree of pitch shift influences the magnitude of the priming effect, whether other note information, such as timbre and intensity, can be represented in the priming representation, and whether a non-metrical rhythmic tone sequence produces a similar priming effect. These questions remain to be examined in future investigations.
References
Bartlett, J. C., & Dowling, W. J. (1980). The recognition of transposed melodies: A key-distance effect in developmental perspective. Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bassili, J. N., Smith, M. C., & MacLeod, C. M. (1989). Auditory and visual word-stem completion: Separating data-driven and conceptually driven processes. Quarterly Journal of Experimental Psychology, 41A, 439-453.
Bartram, D. J. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of chords. Journal of Experimental Psychology: Human Perception and Performance, 12, 403-410.
Bharucha, J. J., & Stoeckig, K. (1987). Priming of chords: Spreading activation or overlapping frequency spectra? Perception and Psychophysics, 41, 519-524.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168, 1604-1605.
Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term memory for melodies.
Psychomusicology, 1, 30-49.
Dowling, W. J., & Fujitani, D. A. (1971). Contour, interval, and pitch recognition in memory for melodies.
Perception and Psychophysics, 9, 524-531.
Franks, J. J., Plybon, C. J., & Auble, P. M. (1982). Units of episodic memory in perceptual recognition. Memory & Cognition, 10, 62-28.
Goto, Y., & Abe, J. (1998). Psychological reality of metrical units in rhythm perception. Proceedings of the Fifth International Conference on Music Perception and Cognition, 335-340.
Graf, P., & Ryan, L. (1990). Transfer-appropriate processing for implicit and explicit memory. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 16, 978-992.
Graf, P., & Schacter, D. L. (1989). Unitization and grouping mediate dissociations in memory for new associations,
Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 930-940.
Graf, P., Shimamura, A. P., & Squire, L. R. (1985). Priming across modalities and priming across category level:
Extending the domain of preserved function in amnesia. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 11, 385-395.
Hebert, S., & Peretz, I. (1997). Recognition of music in long-term memory: Are melodic and temporal patterns equal
partners? Memory & Cognition, 25, 518-533.
Hayman, C. A. G., & Tulving, E. (1989a). Contingent dissociation between recognition and fragment completion: The method of triangulation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 228-240.
Hayman, C. A. G., & Tulving, E. (1989b). Is priming in fragment completion based on a "traceless" memory system? Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 941-956.
Jackson, A., & Morton, J. (1984). Facilitation of auditory word recognition. Memory & Cognition. 12, 568-574.
Jacoby, L. L. (1983). Perceptual enhancement: Persistent effect of an experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 21-38.
Jacoby, L. L., Baker, J. G., & Brooks, L. R. (1989). Episodic effects on picture identification: Implications for theories of concept learning and theories of memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 275-281.
Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.
Johnson, M. K., Kim, J. K., & Risse, G. (1985). Do alcoholic Korsakoff's syndrome patients acquire affective reactions? Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 22-36.
Kawaguchi, J., & Mikumo, M. (1994). Implicit memory for music information: Priming effect on major-minor decision task for chord. Abstract of the 3rd Practical Aspects of Memory Conference, Maryland, U.S.A., 107.
McClelland, A. G. R., & Pring, L. (1991). An investigation of cross-modality effects in implicit and explicit memory. Quarterly Journal of Experimental Psychology, 43A, 19-33.
Mandler, G., Nakamura, Y., & Zandt, B. J. S. V. (1987). Nonspecific effects of exposure on stimuli that cannot be recognized. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 646-648.
Mitchell, D. G., & Brown, A. S. (1988). Persistent repetition priming in picture naming and its dissociation from recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 213-222.
Massaro, D. W. (1970). Retroactive interference in short-term recognition memory for pitch. Journal of Experimental
Psychology, 83, 32-39.
Mikumo, M. (1990). Merodi no fugouka to sainin [ Coding strategies and recognition of melodies]. The Japanese
Journal of Psychology, 61, 291-298. [In Japanese].
Mikumo, M. (1992). Encoding strategies for tonal and atonal melodies. Music Perception, 10, 73-81.
Mikumo, M. (1994a). Finger-tapping for pitch encoding of melodies. Japanese Psychological Research, 36, 53-64.
Mikumo, M. (1994b). Motor encoding strategy for pitches of melodies. Music Perception, 12, 175-197.
Musen, G., & Treisman, A. (1990). Implicit and explicit memory for visual patterns. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 127-137.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Roediger, H. L. III, & Blaxton, T. A. (1987). Effects of varying modality, surface features, and retention interval on priming in word-fragment completion. Memory & Cognition, 15, 379-388.
Rueckl, J. G. (1990). Similarity effects in word and pseudoword repetition priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 374-391.
Scarborough, D. L., Gerard, L., & Cortese, C. (1979). Accessing lexical memory: The transfer of word repetition effects across task and modality. Memory & Cognition, 7, 3-12.
Schacter, D. L., & Church, B. A. (1992). Auditory priming: Implicit and explicit memory for words and voices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 915-930.
Schacter, D. L., Cooper, L. A., & Delaney, S. M. (1990). Implicit memory for unfamiliar objects depends on access to structural descriptions. Journal of Experimental Psychology: General, 119, 5-24.
Schacter, D. L., & Graf, P. (1986). Effect of elaborative processing on implicit and explicit memory for new associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 432-444.
Schacter, D. L., & Graf, P. (1989). Modality specificity of implicit memory for new associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 3-12.
Schulkind, M. D. (1999). Long-term memory for temporal structure: Evidence from the identification of well-known
and novel songs. Memory & Cognition, 27, 896-906.
Sloboda, J. A. (1976). Visual perception of musical notation: Registering pitch symbols in memory. Quarterly
Journal of Experimental Psychology, 28, 1-16.
Snodgrass, J. G. (1989). Sources of learning in the picture fragment completion task. In S. Lewandowsky, J. C. Dunn,
& K. Kirsner (Eds.), Implicit memory: Theoretical issues (pp. 259-282). Hillsdale, NJ: Erlbaum.
Tekman, H. G., & Bharucha, J. J. (1992). Time course of chord priming. Perception and Psychophysics, 51, 33-39.
White, B. W. (1960). Recognition of distorted melodies. American Journal of Psychology, 73, 100-107.
Proceedings paper
"Lunch Box" was chosen as the chanting material for the Japanese subjects for good reason: it is not a traditional rhyme but a relatively new one, which spread among the postwar generation. Nowadays it is no exaggeration to say that all Japanese children learn it in kindergarten or nursery school. It is often said that the rhythmic feeling of the younger Japanese generation differs considerably from that of the older one. This study therefore focuses on the clapping behavior of the younger generation as the Japanese subjects, which is an important reason why "Lunch Box" was selected.
English rhyme, "One Potato"
Óne Potato Twó Potatoes Thrée Potatoes Fóur,
Fíve Potatoes Síx Potatoes Séven Potatoes Móre,
Japanese rhyme, "Lunch Box"
Korékkurainó Obéntobakoní, Ónigiriónigiri Chóttotsumeté,
Kízamishóugani Gómajiopáppa, Nínjinsánn, Góbosán
Ánanoáita Rénkonsán, Sújinotóutta Fúkí
Apparatus
A clipboard holding a sheet printed with the rhyme, a metronome, and a videotape recorder were set in front of the subjects. Each subject wore a motion sensor, "ATOM 8", on the right wrist (see Photos 1 and 2), which detects the speed of rotary motion. Each performance was recorded on digital videocassette and simultaneously digitized by a computer (Macintosh PowerBook G3) through ATOM 8. The data were sampled every 0.02 sec.
Photo 1: "Open". Photo 2: "Hold".
Procedures
First of all, each subject was asked whether he was familiar with the prepared rhyme. Everyone answered "yes" and was then oriented to the experimental phases. Subjects were asked to wear "ATOM 8" on the right wrist and to clap on the downbeat while chanting the rhyme individually. A common tempo, M.M. = 108, which seemed somewhat slow for both groups, was adopted because it is rhythmically acceptable and because a slow tempo makes the differences in clapping behavior between the two groups clearer. The experimental procedure consisted of two phases, a training phase and a testing phase. In the former, each subject practiced chanting the rhyme and clapping in synchrony with the metronome sounds. After two or three trials they entered the testing phase, in which they performed the task three times. All the participants accomplished the task successfully.
DATA PROCESSING
The last trial of each subject was taken as representative of that subject's trials. The six claps at the beginning and the six at the end of the trial were excluded from further analysis, because performance was sometimes unstable at the beginning and end of trials. Thus, twelve claps from the middle part of the last trial of each subject were analyzed.
A cycle of clapping motion was divided into three parts: "Close", "Hold", and "Open". In addition, a state of "Stop" within "Hold" was taken into account in the data processing. Table 1, a fragment of the digitized raw data of a subject from the English group, shows the progression of one cycle of clapping motion. The rotary motion value (R.M.) indicates the direction and speed of the rotary motion of ATOM 8: if ATOM 8 turns clockwise, R.M. is positive, and if it turns anticlockwise, R.M. is negative. Furthermore, the absolute value of R.M. correlates with the speed at
RESULTS
See Figures 3 and 4 again. In the English group the "Open" motion starts before the upbeat, whereas in the Japanese group it starts together with the upbeat. In other words, at the moment the upbeat arrives, the English native speakers are already in the middle of the "Open" motion, while the Japanese subjects are only beginning to open their hands. In order to explain this difference, the second point, namely the difference in the time allocated to the "Hold" motion between the two groups, should be
Proceedings paper
Background
Musicologists have addressed the issue of dotting ratios in performance practice by citing the commentary of contemporary writers such as
Couperin (1716), Quantz (c.1752) and C. P. E. Bach (1753), and by relying on their own experience, knowledge and intuitions (eg.,
Donington, 1989; Neumann, 1982, 1993). However, no performance practice researchers have attempted to determine the issue of the
perception of dotting empirically. Dotting ratio refers to the performance of a long note followed by a short note. For example, a mechanical
performance of a dotted quaver (dotted eighth note) followed by a semiquaver (sixteenth note) lasting for 1 beat divides the beat into the relative
temporal durations 0.75 of a beat and 0.25 of a beat respectively. In practice, however, this ratio is rarely found. The second author's own
findings using digital sampling and analysis techniques have shown that across a sample of 30 different performances of the opening of
'Variation 7' of J. S. Bach's Goldberg Variations, dotting ratios were consistently greater than 0.81 (Fabian, 1998).
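The ratio arithmetic above can be sketched as follows; the function name and the sample values are illustrative only, not taken from the study:

```python
def dotting_ratio(long_dur, short_dur):
    """Relative duration of the long note within the long+short pair."""
    return long_dur / (long_dur + short_dur)

# Mechanical dotted quaver + semiquaver within one beat:
print(dotting_ratio(0.75, 0.25))  # 0.75
# A hypothetical overdotted performance:
print(dotting_ratio(0.85, 0.15))
```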
Some ambiguity in the actual measured ratios and the perceived dottedness was also noticed. Some performances sounded more dotted when
the duration of the third note in the group was shortened (Figure 1d, described in more detail later). The finding pointed to a possible illusion
which had not yet been discovered. Consequently, we designed an experiment to test whether there was an illusion of dottedness when the
third note was shortened, or 'kerned'. The methodology was based on an experimental psychology approach. Since this is not an approach
used in traditional musicology, the experiment was also designed to address the issue of the suitability of the methodology in addressing
performance practice issues.
Figure 1. Transcription of four hypothetical performances of the right hand part of first two bars of 'Variation 7' from J. S. Bach's Goldberg Variations.
KO - no kerning
Kerning
The temporal gap between a dotted note and the short note which follows has been documented in the literature and is a known performance
technique. It is referred to as silence d'articulation (noun) after Quantz (Quantz 1752, Dolmetsch 1949). However, its use in the third note of
a group of three in 6/8 has not received attention in scholarly writing despite its prominence in the recordings analysed. To investigate the
effect of shortening the first or third note of the group of three notes in 6/8 metre, we defined the quantification of this auditory gap as the
'kerning' (verb) of the note. Mathematically:
Kerning = IOI - Duration
Where IOI is the interonset interval between the first and second note, and Duration is the length of time for which the first note is sounded.
Two examples of kerning are shown in Figure 1c and d.
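The definition above translates directly into code; the timing values below are hypothetical, chosen only to illustrate the computation:

```python
def kerning(ioi, duration):
    """Silent gap between a note's offset and the next onset (Kerning = IOI - Duration)."""
    return ioi - duration

# Hypothetical timings in milliseconds for a dotted quaver:
ioi = 425       # interonset interval to the following semiquaver
duration = 360  # length of time for which the note is sounded
print(kerning(ioi, duration))  # 65 ms of silence d'articulation
```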
Aim
In this paper we report the findings of a study in which participants made judgements about the perceived 'dottedness' of a 6/8 musical
pattern. In particular we investigated whether the participants were susceptible to the hypothesised 'kerning illusion', where the dotted pattern
in 6/8 is perceived as being more dotted if the third note (quaver) is kerned (e.g., see Figure 1d).
Method
The dotting ratio and kerning was manipulated in the context of a 6/8 metre, based on the opening of 'Variation 7', meaning that the metric
unit consists of an additional quaver as a third member of the dotted quaver-semiquaver pattern. Given the range of tempi found in real
performances, the tempo of the test stimuli was also manipulated to encompass the extreme tempos found in the sample initially analysed.
The design of the experiment was 2 x 3 x 2 and consisted of the manipulation of a MIDI sequence recorded from a student performance as
shown in Table 1.
Table 1. Independent Variable Manipulation
Kerning: no kerning (K0); first note kerned (K1); third note kerned (K2). Dotting: mechanical (D1); overdotted (D2). Tempo: two levels.
Participants
Forty people took part in the study. Most were music students at the University of New South Wales, while others were friends and colleagues
of the authors. All were reasonably or very experienced listeners of Baroque music. Participation was voluntary.
Stimuli
Stimuli were produced from a performance by a harpsichord-major music student on a Roland JV-35 touch-sensitive keyboard, recorded as a MIDI
sequence (using Cubase VST 4.1 software). The MIDI file was manipulated to produce the 12 test stimuli described in Table 1. These
sequences were converted to QuickTime sound files using the Harpsichord 1 sound on the Roland JV-35 General MIDI sound module.
Sound files were presented to the participants and controlled by QMaker software (Schubert, 1999). Two additional 'real' recordings of the
same passage were also converted to sound files: one by Ralph Kirkpatrick (1959) and the other by Gustav Leonhardt (1965).
Procedure
Each participant sat at a Macintosh Computer (PowerMac 8500) and completed a preliminary questionnaire. They received training
Results
Repeated measures t-tests demonstrated no significant difference between first and second presentation for each stimulus (alpha = 0.05, df =
39). The data were then analysed in two ways, once with responses collapsed by dotting, tempo and kerning levels, and again by comparing
responses across each of the 14 examples (initial data retained, not averaged).
Analysis of variance produced significant differences along each of the three independent variables investigated, with no significant two- or
three-way interactions (alpha = 0.05). Post-hoc Tukey HSD tests were performed to determine how the independent variables affected
dotting response.
Overdotted examples were rated as being more dotted than mechanical performances. This unsurprising finding provides support for the
validity of the experimental design. A more interesting result is that K2 examples (third note shortened) were rated as the most dotted
relative to K1 (first note shortened) and K0 (neither note shortened), supporting the authors' thesis that there exists an illusion of dottedness.
Finally, faster tempo examples were rated as more dotted.
A second analysis consisted of a Friedman K-Related samples test on difference in perceived dotting piece by piece. Again there was a
significant overall difference (alpha = 0.05, Chi-Square = 40, df = 13). The significant contrasts for dottedness ratings are shown in Table 2.
All D2 (overdotted) examples received significantly higher mean ratings and were rated as significantly more dotted than the D1
(mechanical) examples. Third note kerned examples (K2) all received positive mean dottedness ratings, and were rated as more dotted when
all other variables were held constant. For example K2D1106 has a higher mean dottedness rating than its K0 and K1 equivalents. While
none of these K2 examples on their own were rated as significantly more dotted than their K0 and K1 counterparts, the grouped effect of the
first analysis points to an important trend.
Finally, the placement of the real recordings (by Kirkpatrick and Leonhardt) among the dottedness responses further supports the validity of the
design and the selection of dotting ratios, because they were positioned near the expected location with respect to the other examples.
This study points to the possible existence of an illusion which occurs in 6/8 dotted patterns. When the third note of the first dotted crotchet
beat is truncated or kerned, the listener perceives the group as sounding more dotted. The illusion could be due to the brevity of the third note
itself making the beginning of the next group sound delayed and hence more dotted. Alternatively, there may be some kind of high level,
backward temporal masking which interferes with the perception of the first note. Neither of these explanations provides a mechanistic,
functional, or structural account of the illusion. If the kerning illusion continues to be replicated, an associated challenge is to find a
tenable explanation of it.
Further research is required to demonstrate the reproducibility of the result. For example, it needs to be determined whether this effect is an
illusion or an ambiguous figure. One issue that requires addressing is whether the response of musically less sophisticated participants is the
same as those represented by the sample.
The expected perception of dotted stimuli, and the generalised responses to the 'real' stimuli support the validity and reliability of the
reductionist, controlled approach we chose in investigating this performance practice issue. Further work is now required to see how these
findings can be used to inform duple and quadruple metre performance practice theory. Also, the major issue of performance style needs to
be investigated, since an important aim of studying performance practice is to identify the underlying principles which determine a
stylistically appropriate performance (Fabian, 1998; Fabian Somorjay & Schubert, this volume). By the same token, the perception of dotting
of non-Baroque musical contexts, such as marches, dances and popular music, also provides a rich area for further research.
The discovery of the kerning illusion, if it is indeed an illusion, has important implications for musicology. First, it demonstrates that dotting
theory proposed in duple and quadruple metre cannot be generalised to compound metres. No past theory predicted anything like the kind of
dotting response we noticed (and supported experimentally) using the 6/8 pattern of 'Variation 7' of the Goldberg Variations. Second, the
perceptual approach to investigating musicological questions can help to inform and, we argue, drive musicological theory. Specifically, we
posit that perceptual, reductionist experimental designs provide an appropriate methodology from which musicological study, such as the
present case of performance practice issues, can be greatly enriched. It might be argued that the present study is far removed from music
because of the brevity and the highly controlled manipulation of the stimuli. While the criticism may be true, it is also true that the rather
individualistic approaches seen in traditional musicology lack the methodological rigour of the present approach. Indeed, the present
experiment aimed to address a question that arose from a thorough musicological investigation (i.e. the study of longer/complete and not at
all controlled stimuli). Further, we stand by the view that many kinds of methodologies should be embraced in developing deeper insights
into musicological issues. We argue that interdisciplinary methodologies should be used to inform one another, with the aim of finding
convergent evidence and substantial theoretical frameworks.
Acknowledgement
The authors are grateful to Kate Stevens and members of the Australian Music Psychology Society (AMPS) for their comments on an earlier
draft of this work.
References
Bach, C. P. E. (1753; facsimile 1969). Versuch über die wahre Art das Clavier zu spielen. Leipzig: Hoffmann-Erbrecht.
Couperin, F. (c.1716; facsimile 1977). L'Art de toucher le clavecin. Leipzig: Breitkopf und Härtel.
Dolmetsch, A. (1949). The interpretation of the music of the 17-18th centuries. London: Novello. (1st ed. 1915.)
Donington, R. (1989). Interpretation of early music. (1st ed. 1963.)
Fabian, D. (1998). J. S. Bach recordings 1945-1975: The Passions, Brandenburg Concertos and Goldberg Variations - A study
of performance practice in the context of the early music movement. Unpublished Doctoral Dissertation, University of New
South Wales, Sydney
Neumann, F. (1982). Essays in performance practice. Ann Arbor: UMI RP.
Neumann, F. (1993). Performance practices of the 17-18th centuries. New York: Schirmer.
Quantz, J. J. (1752). On playing the flute (Eng. trans. E. R. Reilly). New York: Schirmer.
Schubert, E. (1999). QMaker 1.02 [computer software]. Sydney, Australia. Author.
Sound Recordings
Kirkpatrick, R. (1959). CD re-issue 1994. DGG Classikon 439 465-2.
Leonhardt, G. (1965). CD re-issue 1995. Teldec DAW 4509-97994-2.
Back to index
Proceedings paper
Participants listened to pure tone sequences. The standard unaccented version of the sequences consisted of twelve pure
tones of fixed frequency. Each tone had a duration of 210 ms. This duration contained a 10 ms rise time and a 10 ms
decay time. The tones were separated by silent intervals of 50 ms, resulting in IOIs of 260 ms. Pitches of the tones for a
sequence were selected randomly from among the chromatic pitches within a one-octave range. Accented versions of the
sequences were prepared by increasing the intensity of four of the twelve tones by 4.5 dB. In the regular version of the
accented sequences two consecutive tones with higher intensity always had two tones with the standard intensity
between them. In the random version of the accented sequences the four tones in a sequence were selected randomly to
be the accented tones, with the restriction that they would not make up a triple rhythm.
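As a sketch of the stimulus construction described above (210 ms pure tones with 10 ms linear rise and decay, accents raised by 4.5 dB), assuming a 44.1 kHz sample rate and linear ramps, neither of which is stated in the text:

```python
import math

SR = 44100  # assumed sample rate

def tone(freq, dur=0.210, ramp=0.010, gain_db=0.0):
    """Pure tone with linear rise/decay ramps; accented tones get a dB gain."""
    amp = 10 ** (gain_db / 20)          # +4.5 dB -> amplitude factor ~1.68
    n = int(dur * SR)
    r = int(ramp * SR)
    out = []
    for i in range(n):
        # envelope rises linearly over the first ramp, decays over the last
        env = min(1.0, i / r, (n - 1 - i) / r)
        out.append(amp * env * math.sin(2 * math.pi * freq * i / SR))
    return out

standard = tone(440)                # standard-intensity tone
accented = tone(440, gain_db=4.5)   # accented tone, +4.5 dB
```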
The signal detection task also required sequences in which the IOIs were unequal. In these sequences four of the tones
came either slightly late or slightly early. In Experiment 1, in the case of positive deviations these deviant tones followed
the preceding tone with an IOI of 285 ms and the consecutive tone followed it by an IOI of 235 ms. Thus only the
deviant tone came late by 25 ms. In the case of negative deviations the deviant tone followed the preceding tone by an
IOI of 235 ms and the consecutive tone followed it by an IOI of 285 ms. Thus the tone came early by 25 ms. In the
accented sequences, the temporally deviant tones also had higher intensity. Both the accented and the unaccented
sequences had regular and random versions depending on the distribution of the deviant tones. In the regular sequences
each consecutive pair of deviant tones were separated by two tones with standard timing. In the random sequences four
tones were selected randomly to be the deviant intervals, with the restriction that two deviant tones would not follow
each other.
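The timing manipulation for Experiment 1 can be sketched as follows; the particular regular positions [2, 5, 8, 11] are an assumption consistent with "separated by two tones with standard timing", not taken from the text:

```python
import random

BASE, SHIFT = 260, 25  # standard IOI and deviation, in ms

def onsets(deviant_idx, direction):
    """Onset times for a 12-tone sequence. Shifting a single tone by
    +25 ms (late) or -25 ms (early) makes the preceding IOI 285/235 ms
    and the following IOI 235/285 ms, without moving any other tone."""
    times = [i * BASE for i in range(12)]
    for i in deviant_idx:
        times[i] += direction * SHIFT
    return times

def random_deviants(rng):
    """Pick 4 deviant positions so that no two deviant tones are adjacent."""
    while True:
        idx = sorted(rng.sample(range(1, 11), 4))
        if all(b - a > 1 for a, b in zip(idx, idx[1:])):
            return idx

regular_late = onsets([2, 5, 8, 11], +1)   # regular, positive deviations
print(random_deviants(random.Random(7)))   # one random arrangement
```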
In Experiment 2, in the accented sequences the tones that followed the temporally deviant tones, rather than the
temporally deviant tones themselves, had higher intensity. Since positive deviations meant longer IOI preceding an
intensity accent, if it was present, a longer (285 ms) IOI was followed by a shorter (235 ms) IOI in sequences with
positive deviations. The shorter IOI was terminated by a higher intensity tone in the accented sequences. In the
sequences with negative deviations, on the other hand, a shorter IOI was followed by a longer IOI. The creation of the
regular and random versions of the sequences was identical to Experiment 1.
Apparatus
Participants were placed in a sound attenuated booth during the experiment. Creation and presentation of the stimuli, and
recording of responses were controlled by an IBM compatible computer equipped with a Creative SoundBlaster 16
Value sound card. Participants heard the stimuli through a Technics SU-V300 amplifier and Telephonics TDH-39P
earphones.
Procedure
Each participant took part in eight experimental signal detection sessions. The experimental sessions were preceded by
two practice sessions. Each session consisted of 72 trials. In each trial one sequence was heard. In 36 trials of a session
the duration of all silent intervals in the sequence were the same. In the remaining 36 trials the sequence contained
deviant intervals. Sequences with and without deviations in a session were ordered randomly. The task of the participants
was to determine whether the duration of all the time intervals separating consecutive tones were the same or not for
each sequence. The participant initiated the trials by pressing any one of the keys on the computer keyboard. Participants
responded by pressing one of two keys assigned to the "Same" and "Different" responses, respectively. Feedback about
the accuracy was given visually on the computer monitor after each response.
The eight experimental sessions were created by factorial combinations of three variables: (1) The sequences were either
unaccented or accented. (2) In case of unequal IOIs, the deviations were either positive or negative. (3) The deviant
intervals were arranged regularly or randomly. In each experimental session the sequences with equal IOIs and unequal
IOIs were similar in all other respects. The order of the eight experimental sessions was changed from participant to
participant according to a Latin square.
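The factorial combination of the three two-level variables yields the eight experimental sessions; a minimal enumeration:

```python
from itertools import product

# The three session-defining variables from the text, each with two levels.
factors = {
    "accenting": ("unaccented", "accented"),
    "deviation": ("positive", "negative"),
    "arrangement": ("regular", "random"),
}

# Each combination defines one of the eight experimental sessions.
sessions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(sessions))  # 8
```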
Each one of the two practice sessions also consisted of 72 trials. One practice session included unaccented sequences
only and the other included accented sequences only. All variations of these two types of sequences were sampled with
equal probability in the two respective practice sessions. The number of sequences with equal intervals and the number
of sequences with unequal intervals were equal in each practice session.
Results
Sensitivity was calculated for the eight conditions in Experiments 1 and 2 separately. Average sensitivities are given in
Figure 1. In both experiments the main effect of regularity was not significant [F(1, 31) < .01 for Experiment 1 and F(1,
31) = 2.73 in Experiment 2]. Thus, the alterations in the stimuli did not create an effect in favor of regularly distributed
deviations, but they eliminated the effect in favor of randomly distributed deviations that was observed in the earlier
experiment. The main effects, or lack thereof, had to be considered in the light of a possible interaction of the effects of
regularity and accenting. In both experiments sensitivity was higher for regular than for random deviations with the
accented sequences but the difference was in the opposite direction with the unaccented sequences. This interaction
approached but fell short of significance in both experiments [F(1, 31) = 2.9, p = .098 for Experiment 1 and F(1, 31) =
3.88, p = .058 for Experiment 2].
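The sensitivity measure is not specified in this excerpt; assuming it is d′ computed from "Different" responses to deviant sequences (hits) and to equal-interval sequences (false alarms), a standard calculation would be:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a 1/(2N) correction
    to keep proportions of 0 or 1 away from the bounds."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    h = min(max(hits / n_signal, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    f = min(max(false_alarms / n_noise, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    return z(h) - z(f)

# Hypothetical counts for one session of 36 deviant and 36 equal trials:
print(round(d_prime(30, 6, 9, 27), 2))  # → 1.64
```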
Because of the similarity of the methods and stimuli the data from both experiments were analyzed in a single ANOVA.
In this analysis the only effect involving experiment that reached significance was the experiment by direction of
deviation interaction [F(1, 62) = 14.25, MSe = 0.48, p < .001]. The difficulty of the positive and negative deviations was
reversed across experiments. This is not surprising considering the sequences with positive deviations in Experiment 1
were identical to the sequences with negative deviations in Experiment 2 except for the placement of the higher intensity
tones, and vice versa.
In the analysis of the combined data from the two experiments the regularity by accenting interaction reached
significance [F(1, 62) = 6.72, MSe = 0.43, p < .05]. Thus, although regularity did not always lead to better performance
[F(1, 62) = 1.06 for the regularity main effect] presence of accents mediated such facilitation. This was consistent with
the dynamic attending theory.
Discussion
The unexpected negative effect of regularity in earlier work could be eliminated by changing the stimuli such that
detection of temporal irregularities depended on noticing temporal alteration of single sounds and not local changes in
tempo. It was found that the combination of regularity with the presence of accents provided a small advantage of regular over
random distribution of deviant time intervals. This effect was still small, and the critical interaction required combining
the data from the two experiments in order to reach significance. However, the finding was consistent with the theory of
dynamic attending: the addition of regularly distributed accents had a positive effect on the detection of small temporal
variations, and the addition of irregular accents had a negative effect on it.
References
Boltz, M. & Jones, M.R. (1986). Does rule recursion make melodies easier to reproduce? If not, what does?
Cognitive Psychology, 18, 389-431.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory.
Psychological Review, 83, 323-355.
Jones, M. R. & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Perception & Psychophysics, 32, 211-218.
Jones, M. R., Summerell, L., & Marshburn, E. (1987). Recognizing melodies: A dynamic interpretation. Quarterly
Journal of Experimental Psychology, 39A, 89-121.
Tekman, H. G. (August, 1996). Effect of regular and irregular accents on detection of variations in tone sequences.
Paper presented at the 4th International Conference on Music Perception and Cognition, Montreal, Canada.
Back to index
Proceedings paper
Introduction
Timing in this narrower sense can be considered on several levels. The most
general level would be the average tempo (the reciprocal of the total time) of
a performance. Also, there may be tempo changes for sections of a piece (like a
new tempo indication, or piu or meno mosso). On an intermediate level there are
tempo deviations which affect a few successive notes in the same direction and
which are perceived as ritardando or accelerando. At last, as only modern
performance research has revealed, there are oscillating "micro" deviations
from the theoretical (mathematical) durations of successive tones which are
perceived not as tempo changes, but as making up the "correct" rhythm. These
micro deviations are present everywhere in a performance, and they have a
systematic character: they are repeated in parallel (analogous) places of a
piece, in repeated performances by the same player, and in performances by
different players (see, e.g., Gabrielsson 1987, 90 f.; Palmer 1989, 336 f.,
342). Therefore one can try to find rules for these deviations (see, e.g.,
Friberg 1991). So in the first 8 measures of Mozart's K. 331, a theoretical
duration relation of 3:1 tends to be sharpened - the second note is shortened
relative to the first note -, and a 2:1 tends to be flattened - the second note
is lengthened (Gabrielsson 1987, 83, 92 f.).
Mazzola and Beran (1998, 1999) analysed the IO's in the 28 performances of
Schumann's "Traeumerei" (op. 15, 7) provided by Bruno H. Repp. The properties
of the score to which they related the IO's were (1) tempo prescriptions
The weights are described by the authors as follows (1999, 52): "A note in the
score is metrically important if it is part of many [repeating metric
structures]. ... Essentially, a note is considered melodically important if it
is part of motifs that are similar to many other motifs that occur elsewhere in
the score. Finally, the harmonic analysis gives higher weights to notes that
are harmonically important in the sense of an extended classical theory of
harmony (Riemann theory)." Let us add that the characterisation of a motif
comprises only the "melodic" (pitch) and not the metric (duration) aspect
(1998, 46) which is treated separately - presumably in order to reduce somewhat
the enormous combinatorics involved in the motivic analysis. (A further
reduction of the melodic aspect from the size to the mere direction of the
intervals was envisaged by Mazzola on other occasions.) In fact, the
computation of the weights "involve[s] a large number of combinatorial
calculations. For example, the motivic calculations exceed any reasonable
amount of calculation if handled with ideal boundary conditions. The same is
true for harmonic weights." (1999, 78.)
The weights often vary strongly between successive events, and it was
considered appropriate by the authors to use them also in smoothed forms. The
value of a smoothed weight for an event was the mean of the weights within a
symmetrical window around the event (in fact, a weighted mean with the weight
decreasing linearly with increasing distance from the event at the centre of
the window - 1999, 60; 1998, 49 f.). Larger window widths correspond to
stronger smoothing. The authors applied widths of 8, 4, and 2 bars (1999, 67).
In addition, the first and second time derivatives of these weight functions
were used, as approximated on the basis of the finite time differences (1999,
67).
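The smoothing described above (a weighted mean with coefficients decreasing linearly with distance from the window centre) amounts to a triangular moving average; a sketch, without the authors' exact edge handling, which is not specified here:

```python
def smooth(weights, half_width):
    """Weighted mean over a symmetric window; the coefficient of each
    neighbour decreases linearly with its distance from the centre."""
    out = []
    n = len(weights)
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            c = half_width + 1 - abs(i - j)   # triangular coefficient
            num += c * weights[j]
            den += c
        out.append(num / den)
    return out

# A single spike is spread out over its neighbours:
print(smooth([0, 0, 10, 0, 0], half_width=1))  # → [0.0, 2.5, 5.0, 2.5, 0.0]
```

Wider windows (larger `half_width`) correspond to the stronger smoothing the authors obtain with 8-bar windows.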
attributed, according to the model, to the independent variables. For the mean
performance the authors obtained R²=0.84 (1998, 54). This means that the
similarity between the timing of the real performance and the version
constructed by the statistical model from the properties of the score was
R²=0.84 or R=0.92. For the 28 individual performances R² ranged between 0.65
and 0.85 (1999, 69; 1998, 54).
It is good to remember that some trivial success is certain for any model even
if there is no true relationship at all between the dependent and the
independent variables. This is so because the sample correlations fluctuate
around their true values, and the model exploits these random correlations. Let
n=sample size, then, in the multinormal case, the expected sample value of R²
when its true value is zero is given by E(R²)=df/(n-1) (see Kendall and Stuart,
1973, p. 354, formula (27.84)). In the present case there were n=212 events
(including the repetition of the first 8 bars - 1999, 69) to which the
regression model was applied, so with df=57 an R² of approximately 57/211=0.27
(or R=0.52!) is to be expected in any case and would figure as completely
insignificant.
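The expectation E(R²) = df/(n-1) under the null hypothesis can be checked by simulation; a sketch using ordinary least squares with purely random regressors (numpy assumed available):

```python
import numpy as np

def null_r2(n, k, reps=300, seed=0):
    """Mean sample R^2 when y is unrelated to the k random regressors."""
    rng = np.random.default_rng(seed)
    r2 = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
        y = rng.standard_normal(n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        res = y - X @ beta
        r2.append(1 - (res @ res) / ((y - y.mean()) ** 2).sum())
    return float(np.mean(r2))

# With n=212 events and df=57 regressors, as in the text:
print(null_r2(212, 57))   # close to E(R^2) = 57/211 ≈ 0.27
```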
The formula shows that the trivial success in terms of R² can always be
increased by choosing an appropriate value of df (by using more independent
variables - no matter what they are like). For df=n-1 (i.e., if the number of
nonredundant independent variables is equal to the number of data points minus
one) we have E(R²)=1 (=R² in this case): such an excessively rich model always
fits exactly.
The Mazzola-Beran approach requires the RUBATO software with the implemented
mathematical definitions of the weight functions, and it requires (as the first
author stated in a radio interview) hours or days of computing time even if it
is not "handled with ideal boundary conditions", i.e., if the length of the
metric and melodic patterns taken into consideration is severely restricted.
Therefore we wanted to test whether a similar R² can be obtained on the basis
of a simple surface analysis of the score which any musician can easily give,
and which requires only a negligible amount of computing time.
Method
Besides the ritardandos and the fermata prescribed in the score (excluding the
second fermata, the one on the final chord which, for want of a successor, does
not have an IO) we chose the metric figure of which an event is an element,
restricted to the two predecessors and two successors of the event. This is a
very narrow window as compared to the Mazzola-Beran windows of up to 8 bars.
Furthermore, the RUBATO weights reflect how often a metric or melodic pattern
is repeated in the whole piece, whereas our analysis never extends beyond the
second predecessor and the second successor of an event.
indicator variables. Now while Mazzola and Beran use quantitative variables
that can take an infinity of values, our indicator variables take only two
values and thus convey much less information, so that also in this respect they
are weak competitors of the Mazzola-Beran variables.
The next and last independent variable in our model is the interval which the
melody voice has taken in reaching the present tone. (If the melody voice held
the same tone from the preceding event, a zero interval was coded, as for the
case that the melody voice repeats the tone.) This again yields a qualitative
classification (decomposable into indicator variables), for we did not want to
prejudge a linear effect of the intervals.
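Decomposing such a qualitative classification into indicator variables can be sketched as follows; the interval values are hypothetical, and dropping the first level as the reference category is one common convention, not necessarily the authors':

```python
def indicator_code(values):
    """One-hot (indicator) coding of a qualitative variable. The first
    level is dropped as the reference category, leaving k-1 nonredundant
    indicator variables for k distinct levels."""
    levels = sorted(set(values))
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]

intervals = [0, 2, -1, 2, 0]   # hypothetical melody intervals in semitones
print(indicator_code(intervals))  # → [[1, 0], [0, 1], [0, 0], [0, 1], [1, 0]]
```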
Results
Our model was applied to a data set which (like the one used in Repp 1992) did
not contain the repetition of the first 8 bars, but instead the mean of the two
executions. Taking the mean reduces the residual variance; for independent
residuals, the variance is halved. In order to make the two models comparable,
we ran a simulated analysis with doubled residual variance for the first 8 bars
and two successive executions of these to obtain as many data points as
Mazzola-Beran, and corrected values for our R². This correction strategy is rather radical in view of the fact that the two
executions (mean performance, including simulated identical executions for two pianists who did not repeat) correlate no
less than 0.987 (Repp 1992, 2552). This means that they were practically identical and free of random errors, so that the
residuals in the regression models were in fact predominantly not random errors of repetition but systematic deviations
from the model, whose variance would not have to be doubled.
Using only the ritardandos and the first fermata (the second one is on the
final chord which, as remarked earlier, has no IO) led to R²=0.54 (uncorrected:
0.70). Adding the metric figure led to R²=0.83 (uncorrected: 0.87), and using
also the melody interval led to R²=0.90 (uncorrected: 0.91) with df=55, i.e.,
with 55 nonredundant independent (in our case, indicator) variables.
Mazzola-Beran had obtained R²=0.84 with df=57 for the mean performance, and R²=0.65...0.85 for the 28 individual performances.
Discussion
It appears that the success of the regression model based on the RUBATO depth
analysis of the score is surpassed by a model based on a very simple surface
analysis of the score and requiring only a negligible amount of computing time.
Since R² is remarkably high in both cases, it seems appropriate to discuss
possibilities of extending the analyses presented here to come closer to the
distant goal sketched in the last paragraph of the introduction.
weights, new pieces do not introduce new independent variables. The question,
of course, is how far the weights have the same effects on timing in different
pieces; and ritardandos, say, will not always be executed in the same manner as
in "Traeumerei". For our model, on the other hand, new pieces will introduce
new metric figures and melody intervals till saturation is reached.
When pieces with widely differing tempos are considered, it cannot be assumed
that a metric figure defined by the same notes is performed in the same way
e.g. in largo and in presto. It will be necessary to introduce a variable for
the approximate absolute speed at which, say, quavers are played (or for the
metronome indication for quavers). This variable will again be defined as a
qualitative classification of the tempos (decomposable into indicator
variables) in order not to prejudge a linear effect of the overall tempo.
Even then the overall tempo of a piece will hardly suffice to determine the way
metric figures are executed. Of course, if the data base is large enough, the
surrounding metric figure upon which the IO of an event is made to depend can
be extended to include more than two predecessors and successors of the event.
And, of course, further properties of the score can be introduced, e.g., a
harmonic analysis of the chords, or progressions to different keys.
Once a large variety of pieces (and interpreters) have been covered, one could
try to predict a new standard or particular performance, i.e., to generate it
on the basis of the parameters of the model as estimated from different pieces,
and to compare it, in terms of correlation, with the real standard or
particular performance. If many such attempts prove satisfactory, one might
dare to produce a synthetic performance of a piece not played by the
interpreter in question or by any one - if the piece is sufficiently similar to
those from which the parameters have been estimated. In principle it will also
be possible to generate a new style of interpretation (an artificial
interpreter) - if the parameters characterising existing interpreters are
sufficiently systematised so that meaningful interpolations or extrapolations
become possible.
The main practical problem of such an approach is to obtain the IO's. Disk
recordings are innumerable, but obtaining the onset times from wave data (as
done by Repp for "Traeumerei") is very laborious and does not seem to have been
satisfactorily automated thus far. Obtaining onsets from MIDI data is simple
and automatable, but the data base is scarce. With a recording device and
skilled volunteers, one can obtain such data. But if one wants to analyse
performances of well-known interpreters in non-wave format, the only - and
quite limited - source would seem to be the Yamaha disklavier recordings. They
are not in MIDI format, but the disklavier is reported to have a MIDI output.
Acknowledgements
References
Gabrielsson, Alf (1987): Once again: The theme from Mozart's piano sonata in A
major (K. 331): a comparison of five performances. In A. Gabrielsson (ed.),
Action and Perception in Rhythm and Music. Stockholm: Publications issued by
the Royal Swedish Academy of Music, no. 55, 81-102.
Kendall, M. G., and A. Stuart (1973): The Advanced Theory of Statistics, vol.
2, 3rd ed. London: Charles Griffin.
Proceedings paper
Introduction
The recognition of rhythmic structures is a task that music listeners master every day without
effort. Musicians and music teachers readily recognize and describe rhythmic structures. Yet it is not clear
how this task is accomplished. In Osnabrück at the Forschungsstelle Musik- und Medientechnologie
music tutorial programs are being developed. The development of these interactive music tutorial
programs brought the need for an automatic recognition of rhythmic structures, since many other
musical structures like melodies, cadences, and musical form incorporate rhythm. So a method for the
recognition of rhythmic structures is necessary for a wide range of interactive musical applications.
Most studies on rhythm perception, cognition and production focus on how a person would typically
perceive, play or memorize rhythmic structures. In interactive music applications the focus must be on
a highly flexible and fault tolerant reaction to user input. The question concerning user input is not
`How would a listener normally perceive this rhythm?' but `How can the system make sense of it?'.
Knowledge about rhythm perception is useful and necessary, of course, and should be used by the
system. But we do not have complete and exact knowledge of this field and the knowledge we have
needs to be used in a way that makes it work for the application task.
It is also desirable to make use of the knowledge of human experts. A recognition system should be
open to integrating explicit knowledge as well as to learning from implicit knowledge encoded in
examples provided by experts. Fuzzy logic systems provide a way of using incomplete and inexact
knowledge, and when combined with neural networks they can learn to change their behaviour from
examples. Compared to a plain neural network, a neuro-fuzzy system has the additional advantage
that the changes made by learning remain interpretable; it need not have the black box effect of a
neural network.
Concept
The aim is to develop a system that performs an analysis of a user's rhythmic performance as input
(per MIDI) for a given task. Output should be an assignment of structures and information on the
similarities and differences. The system should make use of results from music theory and empirical
findings. Based on these 'clues' the system should find an adequate matching of the rhythmic structure.
Also it should be able to learn from examples provided by an expert, e.g. a music teacher. Thirdly, it
should have an open architecture for the integration of further dimensions (e.g. pitch) and further
implicit and explicit knowledge.
Such a system should have certain features with regard to rhythm recognition. It should be tolerant of
expressive or imprecise timing. Similar to beat-tracking systems it should follow changes in tempo. It
should also be capable of following structurally distorted patterns, like a score following system. Yet it
should not only find synchronization points but a structural description for the whole input.
Our way to achieve this is to combine a segmentation and assignment-process with an extraction of
parameters that are rated by a fuzzy-logic system. This system can be trained by an expert user by
providing better solutions where the system makes mistakes. For training the fuzzy-logic system is
transformed into a neural network and trained with a modified backpropagation algorithm.
Using and training the system involves five stages of processing:
● segmentation
● assignment of groups
● assignment of notes within the groups
● extraction of parameters
● rating by the fuzzy-logic system
The overall rating can be interpreted as a measure for the similarity of rhythms. We have not yet
integrated an explicit model of beat or meter. If a musical beginner is playing, cues for metrical
structure may be weak or inconsistent, thus leading to inadequate results. So a system for this
application context should rather rely on the figural quality of the rhythmic patterns ([Bam80]).
Metrical information may help to improve system performance, but it does not seem essential at the
current stage of development.
It is generally agreed that perception and memory of sequences of discrete events are organized in
groups, just as letters or phonemes are grouped into words ([PP97], [Slo85], [Dem98]). Another piece of
evidence for this assumption is subjective rhythmisation. Even completely uniform isochronous
sequences of sounds are perceived as groups of tones with different accents ([Fra82], [Deu86]).
Grouping enables efficient and structured storage of rhythms in short and long term memory,
especially of repetitions. The segmentation into groups is essentially an automatic process that seems
to be governed (among other factors) by physiological constraints, for instance the capacity of sensory
memory, the numerical and temporal capacity of short-term memory and the lower boundary of
temporal discrimination of events ([Pöp89], [Sch97]). Groups as learned schemata can have an influence on the
perception of rhythms ([Swa86], [Bre90]). Although there is a considerable amount of research done
on grouping, a generally accepted model for the process of grouping auditory events has not yet been
developed.
Hierarchical levels
For the comparison of rhythms it is necessary to detect and qualify differences. There can be structural
differences, like missing events, as well as differing temporal or qualitative relations of events, like
longer notes, shorter notes or groups played faster or earlier or in differing order. We concentrate on
the onsets of the notes, since the length has less relevance for the rhythmical structure ([Han93],
[Slo85]). We have two levels of comparison:
● the comparison of rhythmic groups with respect to the temporal structure of their
elements or - in musical terms - the comparison of rhythmic motifs
● the comparison of the temporal placement of groups, i.e. their order and their relative
position or - in musical terms - comparing the rhythmic structure of a phrase or a theme
There are of course more hierarchical levels up to a whole work or song ([LJ83], [Cla87]), but these
levels are not considered here.
Which grouping of events and which relation of groups is appropriate results from matching the input
groups with the task groups. Since a group match affects the grouping of the other events, the grouping
process cannot be separated from the matching process. Humans perform this matching task in real
time: patterns are matched to known patterns or stored as new patterns. How this process is performed
is not exactly known yet. It is clear that some kind of structuring has to take place in real time since
memory and processing capacity are limited ([Cow84], [Sch97]).
Neuro-fuzzy systems
As stated before, the basis of rhythm production, perception and cognition is only partially known, or
the knowledge is vague: we know that some factors support certain effects but cannot quantify
them for arbitrary situations and cannot reduce them to a simple `if - then' condition. Often, however,
we do know tendencies, i.e. that some factor influences a certain consequence to some extent. Especially in
rhythm we cannot reduce statements to binary logic as easily as in, e.g., harmony, since the values
dealt with are not discrete. (In actual computation they are discrete, but represented at high enough
resolution to eliminate the influence of discretization.) Even if we could find absolute values at which a rhythmic
interpretation switches over, these values would change with the context. So it seems more adequate to
find different parameters and model their dynamic interaction.
A Fuzzy-Prolog program is defined by an assignment of truth values to a set of rules and facts that
is non-zero only for a finite number of rules. Further rules can be derived from these by using the
modus ponens generalized for Fuzzy-Prolog.
When trying to model how the rules contribute to an output in a fuzzy system, the rules must be
weighted individually. These weights can be ad hoc estimates, e.g. derived from the literature, but they
have to be adjusted by trial and error. The idea is now to automate this process by optimizing the
weights using sample ratings. Here we can make use of the neural net paradigm which is structurally
similar to Fuzzy-Systems. It is shown in [NKK96] that Fuzzy-Prolog programs can be transformed
into feed-forward neural nets and trained with the backpropagation algorithm if they meet some
constraints which do not mean any real limitations of the system design.
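As a minimal sketch of how weighted fuzzy rules can be evaluated (illustrative only; the rule names, the truth values and the min t-norm are assumptions, not the paper's exact operators):

```python
# A minimal sketch (not the authors' implementation) of weighted fuzzy
# rules: each rule combines its antecedent truth values with a t-norm
# (min here) and scales the result by a trainable weight.

def rule(weight, *antecedents):
    return weight * min(antecedents)

# Hypothetical truth values of facts for one input group:
tempo_stable    = 0.9
tempo_plausible = 0.8
precise         = 0.7

# Rule: good_group <- tempo_stable AND tempo_plausible AND precise
good_group = rule(0.95, tempo_stable, tempo_plausible, precise)
print(round(good_group, 3))  # 0.95 * min(antecedents) = 0.665
```

Transforming such rules into a feed-forward net amounts to making the weights the trainable parameters.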
The problem with training an ANN from sample expert ratings is that a consistent absolute rating of
examples is hard to achieve. The correct relations may become apparent only after comparing
many examples, which may make a re-rating of many examples necessary. A direct decision which one
of two examples is better is usually easy to achieve. In this situation it is desirable to train the net by
relative ratings. This can be done for feed forward nets using gradient descent training if the net is
duplicated, one for each example of the pair. This method is described in detail in [Bra97].
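The pairwise training idea can be sketched with a linear scorer standing in for the full net; the logistic pairwise loss and the feature values are assumptions for illustration, while the paper itself uses a modified backpropagation on the duplicated net ([Bra97]):

```python
import math

# Sketch of training from relative ratings: a scorer is applied to both
# examples of a pair, and gradient descent pushes the preferred
# example's score above the other's (logistic pairwise loss).

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairs(pairs, dim, lr=0.5, epochs=200):
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            d = score(w, better) - score(w, worse)
            g = -1.0 / (1.0 + math.exp(d))   # gradient of log-loss wrt d
            for i in range(dim):
                w[i] -= lr * g * (better[i] - worse[i])
    return w

# Hypothetical feature vectors (tempo, precision, order ratings):
pairs = [([0.9, 0.8, 1.0], [0.4, 0.9, 0.5]),
         ([0.8, 0.7, 0.9], [0.6, 0.8, 0.4])]
w = train_pairs(pairs, 3)
print(all(score(w, b) > score(w, a) for b, a in pairs))  # True
```

The expert only ever states which of two assignments is better, never an absolute rating.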
The program RhythmScan is an experimental implementation of our neuro-fuzzy system for rhythm
recognition. It is meant to be used by students and teachers. The student plays the input for a given
task on a keyboard. This can be a MIDI keyboard or the computer keyboard, but the latter provides no
velocity data, which reduces the input by one dimension. The input is analyzed and the teacher
provides the corrections of system output for training the system. The program computes an analysis
of the input which states the assignment of groups, the assignment of notes within the groups and
information about the differences between the input and the task. If the teacher finds an inadequate
assignment of the rhythmic structure he or she can provide a better assignment which is stored as a
training sample pair. With these samples the program can train the neuro-fuzzy system. The training
optimizes the weights of the rules in the Fuzzy-Prolog program to give the assignments by the teacher
a better rating than those produced by the system before. If the program produces new mistakes after
the training, the teacher can again provide better assignments. By this procedure we hope to iteratively
improve the overall performance of the system. Also it might be of interest to examine the weight
settings produced by the training process since the weights can be interpreted directly to the role the
corresponding parameter plays in the (machine) recognition process.
Group assignments
The computation starts with the segmentation and assignment algorithm (SAA). The SAA first defines
coversets, i.e. sets of groups, for the task and the input. Currently the only restriction on the structure of
the coversets is that the number of notes per group must be between 2 and 5. This is motivated by the
findings that groups in memory have at most 7 to 9 elements, but that with more than 4 to 5
elements sequence recall already becomes less reliable ([Mil56]). The preferred length is between 2 and 5
elements; especially in music, longer groups are subdivided into smaller units. These numbers, like all
constants in this system, are tentative and still subject to experimental change. It is desirable to
incorporate more constraints for segmentation, especially to reduce computation time but also to give
the system more clues as to which assignments are good ones. Still, the groupings should not be too
restricted, to keep the system flexible.
Note assignments
With the coversets for both task and input a group of the task is assigned to each group of the input. It
is possible to assign an input group to no task group to deal with notes that are unrelated to input e.g.
produced by problems with the MIDI instrument or by gross errors. Then the notes within the groups
are assigned. Two constraints are imposed on the note assignment:
● every input note and every task note can only be assigned once
● serial order must be respected, i.e. if two input notes a and b and two task notes c and d
are assigned a-c and b-d, then if b is after a, d must be after c.
The first constraint ensures that additional notes will be marked as such and not as timing deviations.
The second rule is obvious with plain rhythm. It might have to be changed when the system is used
with melodies where order information is provided by pitch.
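The two constraints can be checked mechanically; the following sketch represents an assignment as (input index, task index) pairs, a simplification of the system's actual data structures:

```python
# Sketch of the two note-assignment constraints: indices may be used at
# most once, and the mapping must preserve serial order.

def valid_assignment(pairs):
    ins = [i for i, _ in pairs]
    tasks = [t for _, t in pairs]
    once = len(set(ins)) == len(ins) and len(set(tasks)) == len(tasks)
    by_input = sorted(pairs)
    ordered = all(t1 < t2 for (_, t1), (_, t2)
                  in zip(by_input, by_input[1:]))
    return once and ordered

print(valid_assignment([(0, 0), (1, 1), (2, 3)]))  # True: order preserved
print(valid_assignment([(0, 1), (1, 0)]))          # False: order violated
```

Unassigned input notes simply do not appear in the pair list and are reported as structural errors.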
The rules are similar to those outlined in [EW96]. Since they work well so far they have not been
included into training by the neural net yet. Four parameters are extracted for each input group:
● group correctness: It is allowed to leave up to two notes unassigned. These unassigned notes are
regarded as structural errors where different rules apply to errors that leave the rest of the notes
unchanged in structure or errors that involve shifting the following notes on the time axis. The
first type can occur for instance by hitting a wrong key and moving quickly to the correct one
without losing tempo while the second type is mostly a reading or memory error which usually
involves a metrical shift as well.
● group tempo: An input group's tempo is retrieved by calculating all tempo variants based on
taking two assigned notes to define the relation to the input pattern. All these tempo variants are
considered when searching for the best assignment of notes within the groups. The quality of a
group's tempo is calculated from its stability with respect to the tempo of the last group (or the
given tempo for the first group) and its plausibility, i.e. whether the tempo is within the range of
musically plausible tempos.
● group precision: This value is calculated from the sum of squared deviations of the assigned
notes (corrected for the tempo and position differences of the groups)
● group position: The position relative to the expected beginning as calculated from the group
before or the initial tempo and position. Here metrical aspects are implicit if the task is
metrically structured.
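A minimal sketch of the precision parameter, assuming (for illustration only) that tempo is expressed in seconds per beat and position as an onset time in seconds:

```python
# Sketch of the group-precision parameter: the sum of squared onset
# deviations of the assigned notes once the group's tempo and position
# have been accounted for. All values are hypothetical.

def group_precision(input_onsets, task_onsets, tempo, position):
    # Map task onsets (beats) into input time using the group's tempo
    # (s/beat) and position (s), then sum squared residuals.
    expected = [position + t * tempo for t in task_onsets]
    return sum((i - e) ** 2 for i, e in zip(input_onsets, expected))

task   = [0.0, 0.5, 1.0, 1.5]        # score onsets in beats
played = [2.00, 2.26, 2.49, 2.77]    # measured onsets in seconds
print(round(group_precision(played, task, tempo=0.5, position=2.0), 4))
```

A perfectly timed group yields zero; the fuzzy rating then maps this value to a truth degree.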
The whole structure of coversets, group assignments and note assignments is called a coverset
assignment. When the best coverset assignment is found we can determine the tempo per group, the
deviation of the groups from their expected positions, and the deviation of the notes. This
information can then be used to generate an adequate reaction for the user.
Terms that do not appear on the left hand side of a rule are facts. Their truth values are calculated from
the input data. The facts in rule (6), aeqb(), aeq2b(), aeqb2(), aeq3b() and aeqb3(), compare
the tempo of the input group with that of the task group. Since errors may lead to double or half
tempo, or even triple or one-third tempo, the last four rules are introduced, but their weight is
(initially and after training) much lower than that of the rule checking for identical tempo.
The plausibility of a group's tempo (GTpoPlsbl) states whether it is within the range of musically
sensible tempos. We use a trapezoid function which decreases above 200 and below 60. This is a
preliminary solution. It might improve results to use a more elaborate model like [Par94].
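The trapezoid function can be sketched as follows; the 60-200 range is taken from the text, while the ramp width of 20 BPM is an assumption for illustration:

```python
# Sketch of the trapezoid plausibility function: full membership for
# tempos between 60 and 200 BPM, decreasing linearly outside. The
# ramp width (20 BPM) is assumed, not taken from the paper.

def tempo_plausibility(bpm, lo=60.0, hi=200.0, ramp=20.0):
    if lo <= bpm <= hi:
        return 1.0
    if bpm < lo:
        return max(0.0, 1.0 - (lo - bpm) / ramp)
    return max(0.0, 1.0 - (bpm - hi) / ramp)

print(tempo_plausibility(120))   # 1.0
print(tempo_plausibility(210))   # 0.5
print(tempo_plausibility(250))   # 0.0
```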
The precision value (GPrcsn) is calculated as the sum of squares of the deviations of the assigned
notes. The correctness (GCorrect) is computed from the error values for unassigned notes. The order
(COrder) of a coverset assignment is calculated by counting groups that are missing, added or in
wrong order.
The overall rating of a coverset assignment (ca) (3) is combined from the four parts tempo,
correctness, precision and order. The rating for the tempo is calculated from the corresponding group
ratings (4). Those are combined from stability and plausibility (5). Tempo stability is combined from
the conjunction of the comparisons of the five variants (6). The correctness (8), precision (7) and
relative position (10) are combined from the conjunction of the corresponding group values.
These functions allow for an amount of compensation between the operands that can be adjusted by
the q parameter; for these functions we use a q value of 2. At one extreme of q the combination
approaches min, at the other it approaches max. The use of these evaluation functions instead of
weighted sums makes a change in the standard backpropagation algorithm necessary.
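One family of functions with this compensation property is the generalized (power) mean, shown here purely as an illustration; the paper's actual evaluation functions may differ:

```python
# Illustration of a compensatory combination with an adjustable
# parameter q (generalized power mean, an assumption, not the authors'
# exact operator): very negative q approaches min, very positive q
# approaches max, and moderate q lets operands compensate.

def power_mean(values, q):
    n = len(values)
    return (sum(v ** q for v in values) / n) ** (1.0 / q)

vals = [0.9, 0.5, 0.7]
print(round(power_mean(vals, -50), 3))  # close to min(vals) = 0.5
print(round(power_mean(vals, 50), 3))   # close to max(vals) = 0.9
print(round(power_mean(vals, 2), 3))    # compensating value in between
```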
Discussion
The RhythmScan program is still work in progress and has to date not been tested thoroughly and
systematically. First experiments show some strengths and weaknesses. The assignment of notes
within the groups works well so far. The assignment of groups did not always lead to correct results,
which is why we used the neural net for this part of the system. False assignments could so far be
eliminated by training. It might of course be necessary to extract more parameters in the preprocessing
that can be used by the net. The principle of optimizing with relative ratings is absolutely necessary to
make optimization feasible: the expert (music teacher) can enter his or her preferred assignment when
the system makes a mistake, without having to take care of the consistency of ratings, which would be
practically impossible.
A problem at the current stage of development is that, due to combinatorial explosion in the
segmentation and assignment process, computation takes too long for use in interactive systems for
all but very short patterns. Here optimization is necessary, especially for use in real-time systems.
The system seems to reflect relevant aspects of rhythmic structure. It recognizes rhythmic structures
without explicitly modeling metrical aspects. Nevertheless a model of metrical aspects should be
integrated in the future. Feedback about deviations of structure and tempo is useful when the group
assignment is correct, but greater reliability is needed for practical use. When that is achieved the
system should be able to aid learners in interactive music tutorials. Provided computational
efficiency is increased, the system could also be used for other tasks like automatic notation or meter
recognition if the task is replaced by a library of rhythmic or metrical patterns or an ad hoc pattern
generator. Although it is not primarily a model of perception it could be used for testing hypotheses
and discovering factors in grouping and timing of musical events.
References
Bam80
J. Bamberger. Cognitive structuring in the apprehension and description of simple rhythms.
Archives of Psychology, 48:177-199, 1980.
Bra97
Heinrich Braun. Neuronale Netze. Springer, Berlin Heidelberg, 1997.
Bre90
A. Bregman. Auditory scene analysis: The perceptual organization of sound. The MIT Press,
Cambridge, Mass., 1990.
Cla87
Eric F. Clarke. Levels of structure in the organization of musical time. Contemporary Music
Review, 2(1):211-38, 1987.
Cow84
Nelson Cowan. On short and long auditory stores. Psychological Bulletin, 96(2):341-370, 1984.
Dem98
Steven M. Demorest. The role of phrase groupings in children's memory for melodies. In
Suk Won Yi, Hee Sook Oh, Sang Wook Nam, Serin Kim, and Mee Bae Lee, editors,
Proceedings of the Fifth International Conference on Music Perception and Cognition, pages
75-80, Seoul, Korea, 1998. Western Music Research Institute, Seoul National University.
Deu86
Diana Deutsch. Auditory pattern recognition. In K. R. Boff, L. Kaufman, and J. P. Thomas,
editors, Handbook of Perception and Human Performance: Cognitive Processes and
Performance, volume 2. John Wiley and Sons, New York, 1986.
EW96
Bernd Enders and Tillman Weyde. Automatische Rhythmuserkennung und -vergleich mit Hilfe
von Fuzzy-Logik. Systematische Musikwissenschaft, IV(1-2):101-113, 1996.
Fra82
Paul Fraisse. Rhythm and tempo. In D. Deutsch, editor, The Psychology of Music, chapter 6, pages 149-180. Academic Press, New York, 1982.
Han93
S. Handel. The effect of tempo and tone duration on rhythm discrimination. Perception and
Proceedings paper
1. INTRODUCTION
We often perceive rhythmic movements in our behavior and environment. Fraisse (1982) reviewed
psychological experiments concerned with rhythmic perception and behavior. In this article, he defined a
cadence as the rhythm produced by the simple repetition of the same stimulus at a constant rate, and
designated it as the basis of all rhythms. He reviewed studies regarding spontaneous and preferred tempi
of cadence, and then discussed the organization of rhythms based on the accents which were added to the
cadence in intensity, duration etc. He finally showed that the performed tempi of many musical works
correlate with the spontaneous and preferred tempi of the cadence.
Moreover, in the field of musicology, Cooper and Meyer (1960) defined a pulse as one of the series of
regularly recurring, precisely equivalent stimuli. They assigned the series of the pulse as the basis of
musical rhythms and analyzed the complex rhythmic structures of musical pieces. Fraisse (1982) and
Cooper and Meyer (1960) suggest that the rhythmic behavior of music is based on the behavior of the
cadence or pulses.
Equal interval tapping produces the cadence or pulses. Musha et al. (1985) requested non-musicians and
an amateur pianist to tap castanets at equal intervals under two conditions: Under one set of conditions,
the subjects synchronized their tapping to the ticking of a metronome (metronome tapping). Under the
other set of conditions, the subjects only listened to the metronome before tapping but not while tapping
(free tapping). They analyzed the temporal fluctuation observed in these tapping experiments using
Fourier analysis and determined the power spectrum of the fluctuation. As a result, in the case of free
tapping, the power of the fluctuation was small and constant in the high frequency region above 0.1 Hz,
while it increased as the frequency decreased in the low frequency region below 0.1 Hz. The amplitude
or the power of the fluctuation indicates the difficulty of temporal control and the frequency of 0.1 Hz
corresponds to a period of 10 sec. Therefore, the critical phenomenon in the spectrum of free tapping
implied that the temporal controllability was excellent for a period which was less than 10 sec, but for
periods over 10 sec, the controllability worsened as the period increased. On the other hand, for
metronome tapping, the power was large and constant in the high frequency region above 0.1 Hz, similar
to free tapping, while it never increased as the frequency decreased in the low frequency region below
0.1 Hz. The power of the low frequency components was still as large as the power of the high frequency
components, or the power decreased as the frequency decreased. Musha and his colleagues suggested
that the critical phenomena of 0.1 Hz indicated that temporal control in equal interval tapping is
governed by a memory of 10 sec.
Yamada (1996, 1998) pointed out that there was a possibility that the memory capacity corresponded not
to a real time of 10 sec, but to a given number of taps, because the tempi were limited to 300-500 ms/tap
in the experiments of Musha et al. (1985). Yamada conducted free tapping experiments at various
tempi ranging from 180 to 800 ms/tap. He requested that the musicians tap at equal intervals without
metronome ticking, using the index or middle fingers of their right hands. As a result, the critical
period determined by the method of least mean squares was around 20 taps for all tempi and for all
subjects. ANOVA showed neither significant main effects nor the interaction with regard to the factors of
tempo and finger used. Moreover, he applied auto-regressive (AR) models to the temporal fluctuation of
free tapping. The best AR model was determined as the model that minimizes the value of Akaike's
Information Criterion (Akaike, 1969). The order of the best AR model was also around 20 for all tempi
and for all subjects. Yamada (1996, 1998) concluded that the memory capacity, which governs equal
interval tapping, was not 10 sec, but 20 taps, i.e., the preceding 20 intervals of the tapping is preserved
and used to determine the interval of the present tap.
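The AR-model selection described above can be sketched as follows; the least-squares fitting method and the synthetic data are assumptions for illustration, not Yamada's exact procedure:

```python
import math, random

# Sketch of AR-model analysis: fit AR(p) models to an IOI series by
# least squares and compare orders with AIC = N * ln(RSS / N) + 2p
# (Akaike, 1969). The fitting method is assumed for illustration.

def fit_ar(x, p):
    n = len(x)
    rows = [x[t - p:t][::-1] for t in range(p, n)]   # lagged predictors
    y = x[p:]
    # Normal equations, solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    for i in range(p):
        for k in range(i + 1, p):
            f = a[k][i] / a[i][i]
            for j in range(p):
                a[k][j] -= f * a[i][j]
            b[k] -= f * b[i]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    rss = sum((yt - sum(c * r[i] for i, c in enumerate(coef))) ** 2
              for r, yt in zip(rows, y))
    return coef, rss

def aic(x, p):
    _, rss = fit_ar(x, p)
    n = len(x) - p
    return n * math.log(rss / n) + 2 * p

# Synthetic AR(1)-like IOI fluctuation (deterministic seed):
random.seed(1)
x = [0.0]
for _ in range(400):
    x.append(0.8 * x[-1] + random.gauss(0, 0.05))
coef, _ = fit_ar(x, 1)
print(round(coef[0], 2))  # recovered coefficient near 0.8
```

In Yamada's analysis the order minimizing AIC was around 20 for all tempi and subjects.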
Musha et al. (1985) used non-musicians and an amateur musician as subjects, while Yamada (1996,
1998) used only musicians. Yamada and Tsumura (1997) investigated the temporal controllability in
equal interval tapping as a function of musical training. They used skilled pianists and novice pianists as
subjects. As a result, when they performed tapping with one finger, skilled and novice pianists showed
the same temporal controllability and they consistently showed the critical phenomenon of 20 taps in the
spectrum. However, when they used multiple fingers, there were significant differences between the two
groups: The temporal controllability of the skilled pianists was unchanged, while the temporal
controllability of the novice pianists significantly decreased. These results suggested that the critical
phenomenon of 20 taps, which was observed in single finger free tapping, correlated with a basic feature
of temporal control that did not change with musical training.
The series of experiments by Yamada and his colleague verified that the 20-tap memory associated with
free tapping existed with various tempi. On the other hand, Musha et al. (1985) made both free tapping
and metronome tapping experiments, but the tempi were limited to 300-500 ms/tap. Therefore, the
control of equal interval tapping with metronome ticking at various tempi has not yet been clarified.
In the present study, we have conducted free and metronome tapping experiments at various tempi to
examine the temporal control of metronome tapping in relation to the control of free tapping.
2. EXPERIMENTAL METHOD
Ten students from the Department of Musicology at the Osaka University of Arts were used as subjects.
While in a soundproof room, the subjects tapped an aluminum board on a table with the middle fingers of
their right hands, keeping the interval and intensity as equal as possible. All subjects had experience in
playing the piano and other instruments, but only at intermediate levels. Each subject performed equal
interval tapping at the tempi of 200, 370 and 800 ms/tap and at the spontaneous tempo, i.e., the comfortable tempo
for the subject to tap in equal intervals. The subjects were instructed not to count numbers of taps or to
imagine music during the tapping.
In metronome tapping, subjects listened to metronome ticking during the tapping. On the other hand, in
the case of free tapping for the fixed tempi of 200, 370 and 800 ms/tap, they listened to metronome
ticking for 20 sec before each trial, but not while tapping. The subjects were not exposed to metronome
ticking in spontaneous tempo, free tapping.
The metronome ticking was produced by a computer system with a D/A converter of 48 kHz. Each tick
consisted of a 4000 Hz tone with the triangle time envelope of 6 ms. The metronome ticking was
presented through headphones at about 73 dB(A). Small speakers attached to the aluminum board
converted the pressure generated by the subject's finger to a voltage. The computer system
converted this voltage to numeric data with a 12 kHz sampling A/D converter and measured the
inter-onset intervals (IOIs) of the tapping. The voltage was also used for each subject to monitor the
clicking sounds of his/her own tapping. The clicking sounds were monitored at about 73 dB(A) through
the same headphones through which the metronome ticking was presented.
One trial of tapping consists of 1701 taps. In some cases, the IOI was not stable in the initial 100 taps,
therefore we used the stable IOI fluctuation of 1600 taps, from the 101st to the 1700th tap, for the
analysis in all tapping trials. In some cases of free tapping, the IOI showed the divergence phenomenon,
i.e., the IOI gradually increased or decreased, and the IOI values of the last part were quite different
from the values of the initial part. We defined as a failed trial any case in which the mean IOI of the
last ten taps differed by more than 20% from the mean IOI of the 101st to the 110th taps.
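The failed-trial criterion can be stated directly in code (the IOI values below are hypothetical):

```python
# Sketch of the failed-trial criterion: a trial fails when the mean IOI
# of the last ten taps differs by more than 20% from the mean IOI of
# taps 101-110.

def is_failed_trial(iois):
    # iois: list of inter-onset intervals (s) for one trial (1700 values).
    ref = sum(iois[100:110]) / 10.0    # taps 101-110 (0-based slicing)
    last = sum(iois[-10:]) / 10.0
    return abs(last - ref) / ref > 0.20

stable   = [0.370] * 1700
drifting = [0.370] * 1690 + [0.470] * 10   # IOI drifted upward at the end
print(is_failed_trial(stable), is_failed_trial(drifting))  # False True
```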
The experiment was organized as follows: it consisted of four blocks, each corresponding to one of the
previously defined tempi. Each block consisted of two phases, each corresponding to one condition
(free or metronome tapping). The order of the blocks was randomized for each subject. The
order of the phases was also randomized for each block and for each subject with the exception of the
spontaneous tempo block. In the spontaneous tempo block, the free tapping phase was first performed,
then the average tempo of the spontaneous tempo was calculated and this tempo was used for the
following metronome tapping. In each phase, each subject carried out the trials until the number of
successful trials (not failed trials) reached seven. In free tapping phases, subjects carried out seven to 15
trials including failed trials, but there were no failed trials in the metronome tapping phases. A 5-10 min
rest separated the trials, and the subjects took a rest between phases and between blocks for at least 20
min. Each subject performed one to three phases a day and completed the entire experiment within seven
to ten days. 35 successful trials were obtained (five subjects, seven trials) for each tempo and each
condition by the process described above.
3. RESULTS AND DISCUSSION
The IOI was plotted as a function of the order of the taps. As mentioned above, because the IOI was not
consistently stable in the initial portion of the trials, the initial 100 taps in the IOI fluctuation were
eliminated. The fluctuation of the remaining 1600 taps was decomposed into Fourier components by
DFT with a Hanning window, and the power spectrum was calculated for each trial. The power was
averaged over every 1/2-octave band, and then the resulting spectra were averaged over the same tempo
and the same condition on a logarithmic scale. Using this process, a smooth 1/2-octave band power
spectrum was obtained for each tempo and for each condition.
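The spectral analysis can be sketched as follows; a short synthetic IOI series stands in for the 1600-tap trials, and the 1/2-octave band averaging step is omitted for brevity:

```python
import math

# Sketch of the spectral analysis above: the IOI fluctuation is
# mean-removed, windowed (Hanning), and decomposed by a direct DFT;
# the power at k cycles per trial is |X_k|^2.

def power_spectrum(x):
    n = len(x)
    mean = sum(x) / n
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xs = [(xi - mean) * wi for xi, wi in zip(x, w)]
    spec = []
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(xs))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(xs))
        spec.append(re * re + im * im)
    return spec  # power at 1, 2, ..., n/2 - 1 cycles per trial

# Hypothetical IOI series with a slow fluctuation at 4 cycles per trial:
n = 200
iois = [0.370 + 0.010 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
spec = power_spectrum(iois)
print(spec.index(max(spec)) + 1)  # peak frequency in cycles -> 4
```

Averaging such spectra over 1/2-octave bands and over trials yields the smooth spectra of Fig. 1.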
Fig. 1 Power spectra of temporal fluctuation in free tapping (lines with filled marks) and metronome
tapping (lines with empty marks) for various tempi.
3.1. Free Tapping
The lines with filled marks in Fig. 1 show the power spectra of the temporal fluctuation for free tapping.
As can be seen, the spectral features are similar for all tempi. In the high frequency region above
approximately 80 cycles for 1600 taps, the power is constant or slightly increases as the frequency
increases. On the other hand, the power increases as the frequency decreases in the low frequency region
below 80 cycles. The power of a frequency component indicates the difficulty of temporal control for the
frequency, and the relation between frequency f [cycles] and period p [taps] is p = 1600 / f.
Therefore the spectral features show that the temporal control in free tapping is excellent for a short
period which is less than approximately 20 taps, but the control becomes worse as the period increases in
the long period region above 20 taps. These spectral features are consistent with the 20-tap memory
shown by Yamada (1996,1998), although the critical period is not definitively shown in the present
study.
This uncertainty may stem from the averaging process: Yamada (1996, 1998) calculated the power spectrum for each subject. In those spectra, the critical change in control was clearly observed, and the critical period was distributed around 20 taps, ranging from 12 to 27 taps. In the present study, these individual spectra were averaged into one spectrum. This averaging may have smoothed out the features of the spectrum, leaving the critical period uncertain.
The lines with filled marks in Fig. 1 also show that slower tempi resulted in a higher spectral position on the power axis. Yamada (1998) showed that the coefficient of variation of the IOI in free tapping was consistently distributed around 4.3% for various tempi. Because the mean IOI at a slow tempo is long, the standard deviation, and hence the power of the fluctuation, is large. Therefore, the difference in spectral position between different tempi for free tapping in Fig. 1 is consistent with Yamada (1998).
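This tempo dependence can be made explicit with a short worked calculation (my own, not from the paper). With a constant coefficient of variation, the standard deviation of the IOI, and hence the fluctuation power, scales with the mean IOI:

```latex
\sigma = c_v \cdot \overline{\mathrm{IOI}}, \qquad c_v \approx 0.043, \qquad P \propto \sigma^2
```

For a mean IOI of 200 ms this gives sigma of about 8.6 ms, while for 800 ms it gives about 34.4 ms; quadrupling the mean IOI therefore raises the total fluctuation power by a factor of roughly 16, consistent with the higher spectral position of the slower tempi in Fig. 1.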
In conclusion, the spectra for free tapping in Fig. 1 suggest that the temporal control of free tapping is characterized by the 20-tap memory and a consistent coefficient of variation of IOI, as in the previous studies by Yamada.
In the low frequency region below two cycles, i.e., the long period region above 800 taps, the power seems to approach a plateau. This phenomenon requires further study, with experiments in which tapping is observed over a larger number of taps.
3.2. Metronome Tapping
The lines with empty marks in Fig. 1 show the power spectra for metronome tapping. There are no
significant differences between different tempi in the low frequency region below approximately 50
cycles. In this region, the slope of all spectra is steep, which implies that the power rapidly decreases as
the period increases. On the other hand, the spectral features are quite different between different tempi
in the high frequency region above 50 cycles. For example, the spectrum for the 200 ms/tap tempo shows
a decrease in slope above 50 cycles, whereas in the case of the 800 ms/tap tempo, the spectrum maintains
a steep slope up to 500 cycles, above which it then shows a decrease in slope.
The spectrum for the 800 ms/tap tempo is interpreted as follows: the power of the highest frequency components, above 400 cycles, is significantly larger than that of the other frequency components. This implies that virtually the entire fluctuation consists of a few components with short periods, below about four taps; in other words, the metronome ticking suppresses fluctuation with periods longer than four taps. In the case of the 200 ms/tap tempo, on the other hand, the fluctuation consists of many components with periods of 1-30 taps, and the metronome ticking suppresses fluctuation with periods longer than 30 taps.
Now, the question is what mechanism(s) yield such differences between the different tempi. Let us
observe the correlation between the spectra of metronome tapping and free tapping for each tempo. In the
cases of the 200 and 370 ms/tap tempi and the spontaneous tempo, the spectral features show that the
power of metronome tapping increases as the frequency increases in the low frequency region. However,
once the power intersects with the power spectrum of free tapping, the power of the metronome tapping
coincides with the spectrum of the free tapping above the intersecting frequency.
The metronome and free tapping tasks differ only in the presence of the metronome ticking. Therefore, the excellent control exhibited in metronome tapping in the low frequency region corresponds to a consistent suppression by the metronome across all tempi. This suppression may itself produce a steep slope over a wide frequency range, as in the spectrum of the 800 ms/tap tempo. In the metronome tapping task, this suppression mechanism is active; however, the 20-tap memory mechanism that governs free tapping is also active. Figure 1 clearly shows that, for the 200 and 370 ms/tap tempi and the spontaneous tempo, both mechanisms are active in metronome tapping, and the fluctuation of a frequency component is determined by whichever mechanism provides better control at that frequency.
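This "better of the two mechanisms" account can be illustrated with a toy calculation. The spectra below are entirely hypothetical (chosen only to mimic the shapes in Fig. 1, not the measured data): if each mechanism sets the fluctuation power it can achieve at each frequency, the observed spectrum follows the elementwise minimum of the two.

```python
import numpy as np

# Hypothetical mechanism spectra (shapes are assumptions, not data):
freqs = np.arange(1, 801)                  # cycles per 1600 taps
memory = 50.0 / freqs                      # 20-tap-memory-like 1/f spectrum
suppression = 10.0 * (freqs / 800.0) ** 3  # metronome suppression: steep slope

# The observed fluctuation at each frequency is set by whichever
# mechanism provides the better (lower-power) control there.
observed = np.minimum(memory, suppression)
```

Below the intersection of the two curves the suppression mechanism dominates (the steep low-frequency slope); above it the spectrum coincides with the free-tapping (memory) spectrum, as described for the 200 and 370 ms/tap tempi and the spontaneous tempo.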
In the case of the 800 ms/tap tempo, the power of metronome tapping is significantly larger than that of
free tapping in the high frequency region. This suggests that metronome tapping is governed only by the
suppression mechanism in the case of a slow tempo. Why the 20-tap memory mechanism is not active for
metronome tapping in the slow tempo of 800 ms/tap still remains to be studied.
In the low frequency region below two or three cycles, the power seems to approach a minimum value. As with the plateau observed in free tapping, this phenomenon also requires further study in which tapping is observed over a larger number of taps.
4. CONCLUSIONS
The present study confirms that the temporal control of free tapping is governed by a 20-tap memory across various tempi. Moreover, it shows that in metronome tapping a suppression mechanism due to the metronome operates across various tempi. For fast and intermediate tempi, the temporal control of metronome tapping is governed by both the suppression and the 20-tap memory mechanisms. However, in the case of a slow tempo, the 20-tap memory mechanism does not govern metronome tapping directly. This aspect of the 20-tap memory mechanism requires further study.
REFERENCES
Akaike, H. (1969). Fitting autoregressive models for prediction. Ann. Inst. Statist. Math. 21, 243-247.
Cooper, G. W. and Meyer, L. B. (1960). The Rhythmic Structure of Music. Chicago, The Univ. of
Chicago Press, pp. 3-4.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.). The Psychology of Music. New York,
Academic Press, pp.149-180.
Musha, T., Katsurai, K. and Terauch, Y. (1985). Fluctuations of human tapping intervals. IEEE Trans.
Biomed. Eng. BME-32, 578-582.
Yamada, M. (1996). Temporal control mechanism in equaled interval tapping. Appl. Hum. Sci. 15, 105-110.
Yamada, M. (1998). Temporal fluctuation in musical performances - Fluctuations caused by the limitation of performers' controllability and by artistic expression -. Proc. 5th Int'l. Conf. Music Percept. Cogn., 353-358.
Yamada, M. and Tsumura, T. (1997). Do piano lessons improve basic temporal controllability of
maintaining a uniform tempo? J. Acoust. Soc. Jpn. (E) 19, 121-131.
Back to index
Proceedings paper
For time-difference stereo, each instrument is recorded on the left and right channels of a CD-R at the same level but with a specified time difference between the channels. The time difference for each instrument is decided from the difference in distance to the instrument's assumed position. Because the time difference between the two channels for each instrument was given exactly in integer multiples of the sampling interval, MIDI sound production introduced no timing error.
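The recording scheme just described can be sketched in code. The function name is mine, and the 44.1 kHz CD sampling rate (sampling interval of about 22.7 microseconds) is assumed; positive delays mean the left channel leads, following the paper's sign convention.

```python
import numpy as np

def time_difference_pan(mono, delay_samples):
    """Place a mono signal in 2-channel time-difference stereo.
    Positive delay_samples: the left channel leads (sound radiated
    from the left loudspeaker first); negative: the right leads.
    Levels are identical on both channels."""
    n = len(mono)
    d = abs(int(delay_samples))
    left = np.zeros(n + d)
    right = np.zeros(n + d)
    if delay_samples >= 0:
        left[:n] += mono          # left channel first
        right[d:d + n] += mono    # right channel delayed by d samples
    else:
        right[:n] += mono
        left[d:d + n] += mono
    return np.stack([left, right], axis=1)

# e.g. a 50-sample lead at 44.1 kHz corresponds to 50 / 44100 s, about 1.13 ms
```

Summing the stereo outputs of all five instruments, each with its own signed delay as in Table 1, yields the two-channel test stimulus.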
Table 1 shows the experimental design for the time differences assigned to each instrument for the configurations in Fig. 1, where In represents the n-th instrument and Sn denotes the n-th stimulus. The unit of time in Table 1 is the sampling interval of the CD, i.e., about 22.7 microseconds. Positive values mean that the sound is radiated from the left loudspeaker first, and negative values from the right first, relative to the reference timing assigned to the centre.
Table 2 shows the reproduction level of the loudspeakers for each instrument for the configurations in Fig. 1. The unit of level in Table 2 is dB. Positive values mean that the presentation level is higher than the reference level assigned for the centre, and negative values lower. Table 3 shows the two patterns of instrument assignment used in the experiments.
Table 1: Time differences assigned to each instrument (unit: sampling intervals)
       I1    I2    I3    I4    I5
S1     10     5     0    -5   -10
S2     30    15     0   -15   -30
S3     50    25     0   -25   -50

Table 2: Reproduction levels assigned to each instrument (unit: dB)
       I1    I2    I3    I4    I5
S8      1   0.5     0  -0.5    -1
S9      4     2     0    -2    -4
S11    10     5     0    -5   -10
Table 3: Patterns of instrument assignment
I1 I2 I3 I4 I5
Pattern 1 Fl Bs Gt Vib Ds
Pattern 2 Vib Ds Bs Gt Fl
6. COMMENTS ON STIMULI
Wavefronts
As the precedence effect is also called the "law of the first wavefront", it matters greatly which ear, left or right, the wavefront reaches first, although even for continuous waveforms the temporal correspondence between the signals at the two ears may be sensed if the sounds are not exactly periodic. Some concern arises over the onset shapes of the sounds employed here, as the precedence effect may differ depending on the waveforms of the wavefronts. The waveforms of the virtual instruments employed in the present experiment differ significantly and may affect the results, so the beginning parts of the waveforms of the instruments are provided in Fig. 3 for future reference.
Figure 3: Beginning parts of the waveforms: (a) Flute (b) Vibraphone (c) Guitar (d) Bass guitar (e) Drums
Figure 2: Three methods to generate the test signals having d-point difference: (a) preceding (b) delayed (c) shared time difference
7. EXPERIMENTAL RESULTS
Figure 4 compares the performance of the proposed time difference method and the conventional level difference method for the loudspeaker-listener configurations depicted in Fig. 2. The abscissa for the time difference method is the absolute maximum time difference, in sampling counts, while that for the level difference method is the maximum level difference, in dB, both assigned to the left- or right-most instruments. The ordinate denotes the average of the direction IDs that subjects reported. The proposed method locates clearer source images roughly in the desired direction (not exactly, but in the direction of the loudspeakers) than the level difference method does, even in cases where the level of sound from the loudspeaker located in the direction of the presumed source is lower than that from the other.
Comparing the performance of the two methods for configuration 1, the standard listening configuration, the upper half of Fig. 4 shows that the time difference method yields sharp source image displacements with a time difference as small as 50 samples, i.e., about 1 ms, while the level difference method shows a gradual movement of instrument images as the level difference increases. The level difference method can locate sources in any direction between the loudspeakers, but the time difference method locates sources only in the direction of one of the two loudspeakers or exactly at the centre between them.
Figure 4: Performance comparison between Time-Difference Stereo and Level-Difference Stereo for configurations depicted in Fig. 2.
Pn: Instrument Assignment Pattern n (n=1~5). Cn: Configuration Pattern n (n=1~5).
Ordinate: perceived direction (+ = right, - = left).
Abscissa: time difference (samples) for the left-most instrument in Time-Difference stereo, and level difference (dB) for the left-most instrument in Level-Difference stereo.
In the car cabin situation, as shown in the lower half of Fig. 4, the time difference method performs far better than the level difference method. Whereas the source images produced by the level difference method are pulled toward the nearer loudspeaker, the time difference method locates the source images near the designed directions even though the relative sound levels suggest otherwise. The two figures at the bottom left of Fig. 4 indicate that the time difference required for satisfactory localization is around 200 sampling intervals or more, i.e., 4 to 5 ms.
Subjects perceive an echo-like sensation when two sounds have a time difference of more than about 800 sampling intervals, i.e., 18 ms, particularly for pulsive sounds such as percussion. Some subjects reported that the flute is difficult to localize and that its image sometimes moves after it has once been fixed.
8. CONCLUSIONS
It is confirmed that stereophonic effects are obtained by the proposed time difference method with delays of 1 to 8 ms, and that it yields better localization performance than the conventional level difference method, in the car-cabin situation in particular. Although live recording for the proposed method is problematic, time-difference recording is easily accomplished with MIDI synthesizers. Moreover, conventional 2-channel stereo reproduction systems are fully compatible with disks or tapes recorded in the proposed scheme.
Further studies are required on the precedence effect for periodic signals such as the synthetic flute sound, and on the localizability of the time difference method in arbitrary directions where no loudspeaker exists. The precedence effect for composite signals should also be studied, as little is known about signals other than sinusoidal waves. Although it is reported that the precedence effect is not obtained for sinusoidal waves above 1 kHz, many frequency components generally exist in ordinary sounds, including those generated by MIDI synthesizers. Basic research on the precedence effect for composite sounds containing higher frequency components is therefore expected.
ACKNOWLEDGEMENT
The authors thank Prof. Masayuki Morimoto, Kobe University, for his initial help and useful comments on this work, which has been partly supported by a Grant-in-Aid for Scientific Research (C) [10680395] "Stream Segregation in Music", Ministry of Education, Science and Culture, Japan.
REFERENCES
Blauert, J. (1983) Spatial Hearing. MIT Press, Cambridge.
David, E. E., Guttman, N., & van Bergeijk, W. A. (1959) Binaural interaction of high-frequency complex stimuli. J. Acoust. Soc. Am.,
Vol.31, pp.774-782.
Gilkey, R. H., & Anderson, T. R. (1997) Binaural and spatial hearing in real and virtual environments. pp.191-197.
Gourevitch, G. (1987) Directional Hearing. Springer Verlag, pp. 85-98.
Harris, G. G. (1960) Binaural interaction of impulsive stimuli and pure tones. J. Acoust. Soc. Am., Vol.32, pp.685-692.
Ito, Y., Ishiyama, Y., Ishii, H., & Ogushi, K. (1994) Study on escape guidance with voice using precedence effect in two-dimensional space.
Back to index
Symposium introduction
Rationale: Although there is a growing body of literature focusing upon issues surrounding children's composition, there is still much to be understood about this topical and, at times, controversial area. This symposium brings together a number of researchers who have focused upon
developing both a theoretical and practical understanding in different but related ways. Particular
emphasis is placed upon investigating the musical and social psychological processes involved in
compositional activities.
Aims: The aim of this symposium is to present a number of complementary approaches to
understanding children's compositional activities.
Speakers: Two of the papers investigate the nature of children's collaborative compositions while two
of the presentations discuss music activities that children undertake by themselves. Louise Morgan,
David Hargreaves & Richard Joiner focus upon the group dynamics (both social and musical)
involved when children work in mixed gender groups. Raymond MacDonald, Dorothy Miell and
Laura Mitchell outline musical and verbal coding systems that can be utilised for analysing the
processes occurring between children working in pairs, highlighting the importance of social factors
such as friendship. Frederick Seddon and Susan O'Neill will discuss computer based composition,
with an emphasis upon the impact that formal music tuition has upon process and outcome in
composition. Charles Byrne focuses upon computer technology that can be used to enhance children's
musical inventing skills. His paper outlines a theoretical context for Spider's Web Composing Lessons,
a World Wide Web-based interactive teaching resource.
Back to index
Proceedings paper
It is possible to engage in these 'on task' activities with or without employing the replay facility available in the 'Cubase' composition program. This replay facility enables previously recorded parts to be heard during 'on task' activities. The 'click' (an electronic metronome device to assist performance in strict time) may be used as a substitute for replay of a previously recorded part if preferred. If replay (or 'click') facilities are engaged during 'on task' activities, this is labelled as engaging with 'aural reference'; if they are not, this is labelled as engaging without 'aural reference'. The second transcripts were then coded using the 'analysis codes'. The 'coded transcripts', containing sequentially numbered coded 'events', were also colour co-ordinated for the type of instrumental sound used (see Table 2).
Table 2
Coding of transcript
Session one
Event 1: PK/NAR
Event 2: PK/NAR
Playing keyboard on bass sound experimenting.
Event 3: PK/NAR
Using 'cut and paste' techniques a 'construction of parts' document was produced from the 'coded transcripts' for each participant in order to
trace the development of each instrumental sound part in the sequential order of 'events' (see Table 3).
Table 3
Construction of parts
Session one
Event 3: PK/NAR
Playing keyboard on sound one experimenting.
Event 5: PK/NAR
Event 8: PK/AR
Using 'midi files' a 'musical score' of each instrumental part (as it was saved at the end of each recording session) was produced to allow for
comparison and cross referencing with the 'construction of parts' document (see Example 1).
By cross referencing all three documents (coding of transcript, construction of parts and musical scores) it was possible to trace
sequentially the 'activities' involved in the development of each instrumental part to include not only musical material that was recorded
and retained but also musical material that had been discarded. This gave a very detailed record of the composition process for each
participant. Examination of these detailed records revealed emerging patterns of behaviour enabling the formulation of propositional
statements leading to 'rules of inclusion'. Typical examples of two of the emerging patterns are described below along with their
propositional statements and 'rules of inclusion'.
Example of 'Pattern A' composition strategy.
Neither of the other two sounds (strings and cello) is engaged with until the piano part is completed. At Event 11 (RK/AR), the participant begins recording the cello 'accompaniment' with aural reference to the piano part. This first recording of the cello part is made without prior experimentation and is deleted. The keyboard is then played on the cello sound, developing the same part; the participant chooses not to aurally reference the piano part while doing so. Event 15 (RK/AR) results in a 14-bar cello part recorded with aural reference to the piano part. This part is subsequently reduced to 9 bars through a 'cut/delete' note edit of bars 9-14 (Event 17) and remains as in Ex. 2 until it is deleted at the beginning of session two (Event 43).
Ex. 2.
The string part is started at Event 18: PK/NAR (i.e., playing keyboard on string sound); as with the cello part, the participant chooses not to aurally reference the previously recorded parts. Three recordings of the same string part are made with aural reference to the piano and cello parts and are subsequently deleted before the fourth recording of the same part is made, also with aural reference to the piano and cello parts. The fourth recording remains, but the last three bars, 13-15, are 'cut/deleted' in a note edit (Event 32). The string part remains as in Ex. 3 until it is deleted in session two (Event 54).
Ex. 3.
Session two
After replay, the cello part is deleted (Event 43) and re-recorded, this time four bars longer (bars 10-13). This extension to the part is made without any prior keyboard playing activity. Two recordings are deleted before the third is accepted (Event 49); all recordings are made with aural reference to the piano part but with the strings muted. The cello part is now as in Ex. 4.
Ex. 4.
Comparison of Ex. 2 with Ex. 4 reveals that, although the notation of bars 1-9 appears different, the sound remains unchanged. The cello part remains unchanged from Ex. 4 in the final composition.
Having completed the cello part, the participant resumes work on the string part. The string part is deleted (Event 54) and re-recorded with a note change and a slight extension. Two recordings are rejected before the third is accepted (Event 58; see Ex. 5). All recordings are made with aural reference to the piano and cello parts.
Ex. 5.
Comparison of Ex. 3 with Ex. 5 reveals that, although the notation of bars 1-9 appears different, the sound remains unchanged. The string part remains unchanged from Ex. 5 in the final composition. There are no changes made to the piano part in this session.
Session three
There are no events during this session that make changes to any of the parts. The session is spent almost exclusively in 'off task' activity, in particular playing recognisable tunes, e.g., 'We Three Kings', 'Little Drummer Boy' and 'Super Trouper'.
This typical example of 'Pattern A' is characterised by its lack of experimentation. The melody appears during the first event, and although three recordings of the melody are made, these were made to correct performance errors. The recording of this melody which is
After deleting the 5-bar drum part (Event 22), a period of experimenting on the drum sound with and without aural reference to 'sound one' and 'click' results in a 25-bar improvised drum part (see Ex. 7). This part is 8 bars longer than the existing 'sound one' part and is recorded at Event 33 (RK/AR), making aural reference to 'sound one' and 'click' during recording.
Ex. 7.
No bass part is recorded during session one, but two events, Event 2 (PK/NAR) and Event 28 (PK/AR), reveal experimentation with and without aural reference.
Session two
Bars 18-25 of the session one drum part (Ex. 7) are deleted after replay at Event 37. The deleted section (bars 18-25) of the original drum
part is replaced and extended to 39 bars by 'overdubbing' at Event 41 (RK/AR). Comparison of (Ex. 7) and (Ex. 8) reveals this change to
the drum part.
Ex. 8.
After a brief period of experimentation with possible bass 'riffs' (Event 43, PK/NAR), a bass part is recorded with aural reference to the 'sound one' and drum parts (Event 46, RK/AR). This recording of the bass part runs from bars 21-30, but bars 28-30 are subsequently deleted in a note edit (Event 48, ED(N)/AR). A period of practice (Events 50 and 52, PK/AR), in which the participant plays the keyboard on the bass sound while the drum part replays, results in an extension by 'overdubbing' of the bass part from bars 28-43 (Event 54, RK/AR). Bars 36-43 of this recording had not been practised prior to recording and were an improvised section, which was subsequently deleted (Event 57). The bass part at the end of session two is bars 21-36 (see Ex. 9).
Ex. 9.
The participant then moves to work on the drum part by playing the keyboard on the drum sound (Events 68 and 70, PK/AR), with aural reference to the latest bass recording (see Ex. 10). Eventually the drum part is extended by 'overdubbing' to bar 51, with aural reference to the bass part (Event 71, RK/AR; see Ex. 11).
Ex. 11.
The final event (Event 73) is the recording of a solo 'coda' on sound one bars 52-65 (see Ex. 12).
Ex. 12.
This typical example of 'Pattern B' is characterised by the way the composition develops over all three sessions. The participant experiments with alternative musical material for the instrumental parts, employing aural referencing techniques. This is exemplified by the sequence of coded events PK/AR-RK/AR-RP/AR followed by a considered response to the outcome. Musical material is reviewed, and sections are deleted and replaced by different material, indicating that closure of the creative process is not reached (if at all) until late in the composition process. On occasion the composition is extended by employing improvisation techniques, with the previously recorded parts providing the stimulus for the current improvisation. It is doubtful that these improvised parts remain in aural memory. The absence of any 'off task' behaviours, and the fact that the final event is a 'recording with the keyboard' event, further indicate that the creative process is ongoing rather than completed.
Propositional statement for 'rules of inclusion' for 'Pattern B'
'Pattern B' is characterised by the predominant use of 'improvising' techniques rather than 'practising' techniques during 'playing the keyboard' activities. New ideas are experimented with throughout the process of composition. Recording is used to 'capture' improvisations that may or may not survive in the final composition. The composition evolves in sections rather than each part being through-composed. Aural reference is made during 'playing the keyboard' activities in addition to 'recording' activities.
A further three 'patterns' of composition have emerged from the data: Patterns 'C', 'D' and 'E'. Broad descriptions of these patterns appear
below but space restrictions do not allow for detailed examples of these 'patterns' like those provided for 'A' and 'B'.
'Pattern C'
As with 'Pattern A', individual parts are completed from beginning to end before the next part is engaged with. The main difference is that the harmonic structure of the accompaniment is completed first, with the melody composed to fall within the harmonic boundaries created by the accompaniment.
'Pattern D'
The main focus of attention in this 'pattern' is to achieve synchronisation between all three parts in strict time. A melody is recorded from beginning to end using performance skill. On discovering the high level of performance skill required to synchronise the remaining parts exactly with the original, the remaining time is spent manipulating various editing techniques ('copy/paste') to ensure the timing of the parts is exactly the same. This can lead to parts being identical except for timbre.
'Pattern E'
This 'pattern' is predominantly random in nature, lacking any observable structure in either the use of the program or the composition process. Both the Cubase program and the available sounds are extensively experimented with. The 'composition' is formed by the expiration of the allowed time rather than by a decision to 'save' what has been produced.
Implications: The data collection methods described above address the shortcomings of past studies. Detailed and complete data can be collected with reduced 'surveillance effect' and without relying upon participants' memory, levels of awareness or articulation skills. The proposed procedures for analysis provide a more appropriate method for this type of data than can be achieved solely through a statistical analysis of time spent in each activity.
We are currently engaged in a study for which data has been collected and has been through the initial process of analysis described in this paper. This study involves 48 adolescents (aged 13-14 years). Twenty-five (12 female, 13 male) had between 2-4 years' prior experience of FIMT and twenty-three (12 female, 11 male) had no prior experience of FIMT. A review of the initial non-coded transcripts suggests some of the identified 'patterns' of composition were adopted by groups of participants identified by their prior experience and/or gender (e.g., 8 of the 12 females with prior experience of FIMT adopted 'Pattern A' and 6 of the 8 participants who
The detailed coded analysis of data from all 48 participants is in progress, and it is predicted that the results will further our understanding of the extent to which: a) adolescents employ different 'patterns of composition', similar to those described above, when engaged in computer-based composition and b) these 'patterns of composition' may be linked to prior experience of FIMT and gender.
References
Daignault, L. (1996). A study of children's creative musical thinking within the context of a computer-supported improvisational approach
to composition. Unpublished doctoral dissertation. Chicago, U.S.A.: Northwestern University.
Folkestad, G. (1991). Music composition in the upper primary school with the help of synthesisers-sequencers. (Report No. 1991:19),
Stockholm: Center for Research in Music Education.
Folkestad, G., Hargreaves, D. J., & Lindstrom, B. (1998). Compositional strategies in computer-based music making. British Journal of Music Education, 15(1), 83-97.
Getzels, J., & Csikszentmihalyi, M. (1976). The creative vision: a longitudinal study of problem finding in art. New York: John Wiley.
Glaser, B.G., & Strauss, A.L. (1967). The discovery of grounded theory. Chicago, Il: Aldine.
Goertz, J.P. & LeCompte, M.D. (1981). Ethnographic research and the problem of data reduction. Anthropology and Education Quarterly,
12, pp.51-70.
Hickey, M. (1995). Qualitative and quantitative relationships between children's creative musical thinking processes and products.
Unpublished doctoral dissertation. Chicago, U.S.A.: Northwestern University.
Hickey, M. (1997). The Computer as a tool in creative music making. Research Studies in Music Education No.8 July 1997.
Lincoln, Y. & Guba, E. (1985). Naturalistic enquiry. Beverly Hills, CA: Sage.
Maykut, P., & Morehouse, R. (1994). Beginning qualitative research: A philosophic and practical guide. London: The Falmer Press
Odman, P. J. (1992). Didactical/phenomenological aspects of creative music making with the help of computers. In Datorer I musikundervisningen (11-21), Stockholm: Center for Research in Music Education.
Richardson, C.P., & Whitaker, N.L. (1996). Thinking about think alouds in music education research. Research Studies in Music Education, No. 6, June 1996, pp. 38-49.
Scripp, L., Meyaard, J., and Davidson, L. (1988). Discerning musical development: Using computers to discover what we know. Journal of
Aesthetic Education, 22 (1), 75-88.
Seddon, F.A., & O'Neill, S.A. (1999). An evaluation study of computer-based compositions by children with and without prior experience of formal instrumental music tuition. Accepted for publication, Psychology of Music, January 1999.
Sloboda, J. A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford University Press.
Strauss, A. & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Appendix A
Analysis codes:
'On task activities'
PK/AR Playing the musical keyboard with aural reference to previously recorded part(s) or with 'click' (similar to playing with a metronome) during replay, experimenting with and developing ideas or practising prior to recording.
PK/NAR Playing the musical keyboard without aural reference to either 'click' or a previously recorded part(s).
RK/AR Recording using the musical keyboard with aural reference to either 'click' or a previously recorded part(s).
RK/NAR Recording using the musical keyboard without aural reference to either 'click' or a previously recorded part(s).
'Replay' Codes
('Specific' replays that take place immediately following editing procedures are included in 'edit' coding with and without AR respectively.)
RP/AR 'Global' replay with aural reference to previously recorded parts or 'click'.
RP/NAR 'Global' replay without aural reference to previously recorded parts or 'click'.
'Editing' Codes.
ED(N)/AR Edits performed to change notes (or a group of notes) in time and/or pitch, erase, insert, extend, or identify, with aural
reference to previously recorded part(s).
ED(N)/NAR Edits performed to change notes (or a group of notes) in time and/or pitch, erase, insert, extend, or identify, without
aural reference to previously recorded part(s).
ED(V)/AR Edits performed to change the volume of notes (or a group of notes) with aural reference to previously recorded part(s).
ED(V)/NAR Edits performed to change the volume of notes (or a group of notes) without aural reference to previously recorded part(s).
DP Deletes part
C/P Copy/paste of part.
TC Tempo change
'Off Task activities'
'Off task' behaviours codes
PI Period of inactivity (Mouse remains motionless for more than 5 seconds).
PPI Prolonged period of inactivity (Mouse remains motionless for longer than one minute).
PK/OT Playing the keyboard in 'off task' way, random, displaying 'frustration' or 'recognisable' tunes.
'Error' Codes
EUP Events resulting from 'errors' using the program, 'accidents' or actions exploring the program that have no obvious intent.
PM Program malfunctions resulting from unknown sources or misuse of the program.
PD Program defaults preventing participants actions (e.g., 'cut' mid bar or when program defaults to the start when record button is
pressed.)
Proceedings paper
convention that the first entry in the vector corresponds to the tone C, the second to C#/Db, the third
to D, and so on, then the vector for C major is: <6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39,
3.66, 2.29, 2.88>, the vector for C# major is: <2.88, 6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19,
2.39, 3.66, 2.29>, and so on. The vectors for the different keys result from shifting the entries by the
number of places appropriate to the tonic of the key.
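The transposition-shift construction can be sketched in a few lines of Python. The helper name `key_vector` is illustrative; the values are the C major entries quoted above, with entry 0 always denoting the tone C.

```python
# Sketch of the transposition-shift construction: entry 0 of every vector
# denotes the tone C, so the profile for a key whose tonic lies `tonic`
# semitones above C is a rotation of the tonic-on-C profile.
C_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def key_vector(profile, tonic):
    """Rotate `profile` (tonic on C) so its tonic lies `tonic` semitones above C."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]

c_sharp_major = key_vector(C_MAJOR, 1)
# c_sharp_major == [2.88, 6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29]
```

Applying `key_vector` with tonics 0 through 11 to the major and minor base profiles yields all 24 key vectors.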
Krumhansl and Kessler (1982) then used these data to study how the sense of key develops and
changes over time. They used ten nine-chord sequences, some of which contained modulations
between keys. Listeners did the probe tone task after the first chord, then after the first two chords,
then after the first three chords, and continued until the full sequence was heard. This meant that 12
(probe tones) x 9 (chord positions) x 10 (sequences) = 1080 judgments were made by each listener.
Each of the 90 sets of probe tone ratings was compared with the ratings made for the unambiguous
key-defining contexts. That is, each set of probe tone ratings was correlated with the K-K profiles for
the 24 major and minor keys. For some of the sets of probe tone ratings (some probe positions in some
of the chord sequences), a high correlation was found, indicating a strong sense of key. For other sets
of probe tone ratings, no key was highly correlated, which was interpreted as an ambiguous sense of
key.
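This correlational key-finding step can be sketched as follows. The helper names (`key_profiles`, `best_key`) are assumptions of this sketch; the major and minor base profiles are the published K-K values.

```python
import math

# Illustrative sketch: correlate one set of probe-tone ratings with all
# 24 K-K profiles and report the best-matching key. The base profiles
# are the published K-K values; the helper names are assumptions.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def pearson(x, y):
    """Product-moment correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def key_profiles():
    """All 24 key profiles, built by rotating the two base profiles."""
    return {f"{NAMES[t]} {mode}": [base[(pc - t) % 12] for pc in range(12)]
            for t in range(12) for mode, base in (("major", MAJOR), ("minor", MINOR))}

def best_key(ratings):
    """Return (key name, correlation) for the best-matching K-K profile."""
    return max(((name, pearson(ratings, prof)) for name, prof in key_profiles().items()),
               key=lambda pair: pair[1])
```

A set of ratings for which no profile correlates highly would, on this reading, indicate an ambiguous sense of key.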
Probe tone methodology: The concurrent judgment
As should be obvious from the above, the retrospective probe tone judgment requires an intensive
empirical effort to trace how the sense of key develops and changes, even for short sequences. In
addition, the sequence must be interrupted, and the judgment is made only after the interruption. For
these reasons, the judgments may not faithfully mirror the experience of music in time, and we were
motivated to develop an alternative form of the probe tone methodology. In this method, which we
call the concurrent judgment, the probe tone is presented continuously while the
music is played. The complete passage is sounded together with a probe tone. Then the passage is
sounded again, this time with another probe tone. This process is continued until all probe tones have
been sounded.
In our initial application of this method, the passage was J. S. Bach's Organ Duetto IV, BWV 805. Its
duration is slightly longer than three minutes. The piece contains an interesting pattern of modulations
including a repeated, highly chromatic passage. At the beginning of the session, the listener heard the
entire passage from beginning to end without any probe tone so that they could become familiar with
the piece. During each trial, the piece was repeated twelve times, each time with a different probe
tone. The probe tone was sounded over six octaves spanning the range of the piece, similar to a
'Shepard tone'. The order of the probe tones was determined randomly and was different for each
subject.
To reduce the effects of sensory dissonance, the probe tone was sounded only in the left ear, while the
music was sounded only in the right ear. To help listeners continue to attend to the probe tone, it was
pulsed at the beginning of each measure. Listeners were instructed to use a computer mouse to move a
slider left and right to indicate the extent to which the probe tone fit with the music. The left end of
the scale was labeled "Fits poorly"; the right end of the scale was labeled "Fits well". The computer
program, written in MAX, recorded the position of the slider every 200 msec. Because the task
requires concentration, only highly trained musicians were run in this initial application.
A geometric map of key distances from the tonal hierarchies
Krumhansl and Kessler (1982) used the K-K profiles to generate a geometric representation of
musical keys. The basic assumption underlying this approach was that two keys are closely related to
each other if they have similar tonal hierarchies. That is, keys were assumed to be closely related if
tones that are stable in one key are also relatively stable in the other key. To measure the similarity of
the profiles, a product-moment correlation was used. It was computed for all possible pairs of major
and minor keys, giving a 24 x 24 matrix of similarity values showing how similar the tonal hierarchy
of each key is to every other key. The correlations between the C major profile and the 24 major and
minor keys, and the correlations between the C minor profile and all the 24 major and minor keys
were presented in Krumhansl (1990, p. 38). To give some examples, C major correlated relatively
strongly with A minor (.651), G major and F major (both .591), and with C minor (.511). C minor
correlated relatively strongly with Eb major (.651), C major (.511), Ab major (.536), and F minor and
G minor (both .339). The same transposition-shift principle can be used to find the correlations for all
pairs of major and minor keys.
A technique called multidimensional scaling was then used to create a geometric representation of the
key similarities. The algorithm locates 24 points (corresponding to the 24 major and minor keys) in a
spatial representation to best represent their similarities. It searches for an arrangement such that
points that are close correspond to keys with similar K-K profiles (as measured by the correlations). In
particular, non-metric multidimensional scaling seeks a solution such that distances between points
are (inversely) related by a monotonic function to the correlations. A measure (called 'stress')
measures the amount of deviation from the best-fitting monotonic function. The algorithm can search
for a solution in any specified number of dimensions. In this case, a good fit to the data was found in
four dimensions.
The four-dimensional solution located the 24 keys on the surface of a torus (generated by one circle in
dimensions 1 and 2, and another circle in dimensions 3 and 4). Because of this, any key can be
specified by two values: its angle on the first circle and its angle on the second circle. The result can
be depicted in two dimensions as a rectangle where it is understood that the left edge is identified with
the right edge, and the bottom edge is identified with the top edge. The solution obtained was similar
to that shown in Figure 1 (see below). As can be seen, the locations of the 24 keys are interpretable in
terms of music theory. There is one circle of fifths for major keys (...F#/Gb, Db, Ab, Eb, Bb, F, C, G,
D, A, E, B, F#/Gb..) and one circle of fifths for minor keys (...f#, c#, g#, d#/eb, bb, f, c, g, d, a, e, b,
f#,...). These wrap diagonally around the torus such that each major key is located near both its
relative minor (for example, C major and a minor) and its parallel minor (for example, C major and C
minor).
Figure 1. a) The configuration of a toroidal SOM trained with the 24 K-K profiles. b) the response of
one subject, displayed on the SOM, at a point with a clear tonality (at 9.5 measures); c) the response
of Model 1 at the same point as in b; d) the response of the subject at a point with a less clear tonality
(at 49 measures); e) the response of Model 1 at the same point as in d; f) the response of the subject at
a point with a weak tonality (at 89 measures); g) the response of Model 1 at the same point as in f.
In this manner, each of the ten nine-chord sequences used by Krumhansl and Kessler (1982) generated
a series of nine points in the torus representation of keys. For nonmodulating sequences, the points
remained in the neighborhood of the intended key. For the modulating sequences, the first points were
near the initial intended key, then shifted to the region of the second intended key. Modulations to
closely related keys appeared to be assimilated more rapidly than those to distantly related keys, that
is, the points shifted to the region of the new key more rapidly.
Measurement assumptions of the multidimensional scaling and unfolding methods
The above methods make a number of assumptions about measurement, only some of which will be
noted here. The torus representation is based on the assumption that correlations between the K-K
profiles are appropriate measures of interkey distance. It further assumes that these distances can be
represented in a relatively low-dimensional space (four dimensions). This latter assumption is
supported by the low stress values (high goodness-of-fit values) of the multidimensional scaling
solution. It was further supported by a subsidiary Fourier analysis of the K-K major and minor
profiles, which found two relatively strong harmonics (see Krumhansl, 1990, p. 101). In fact, plotting
the phases of the two Fourier components for the 24 key profiles was virtually identical to the
multidimensional scaling solution. This supports the torus representation, which consists of two
orthogonal circular components. Nonetheless, it would seem desirable to see whether an alternative
method with completely different assumptions reproduces the same toroidal representation of key
distances.
The unfolding method also adopts correlation as a measure of distances from keys, this time using the
ratings for each probe position and the K-K vectors for the 24 major and minor keys. The unfolding
technique finds the best-fitting point in the four-dimensional space containing the torus. It does not
provide a way of representing cases in which no key is strongly heard because it cannot generate
points outside the space containing the torus. Thus, an important limitation of the unfolding method is
that it does not provide a representation of the strength of the key or keys heard at each point in time.
For this reason, we sought a method that is able to represent both the region of the key or keys that are
heard, together with their strengths.
SOM map of keys
The self-organizing map (SOM; Kohonen, 1997) is an artificial neural network that simulates the
formation of ordered feature maps. The SOM consists of a two-dimensional grid of units, each of
which is associated with a reference vector. Through repeated exposure to a set of input vectors, the
SOM settles into a configuration in which the reference vectors approximate the set of input vectors
according to some similarity measure; the most commonly used similarity measures are the Euclidean
distance and the direction cosine. The direction cosine between an input vector x and a reference
vector m is defined by
cos(x, m) = (Σ_k x_k m_k) / (‖x‖ ‖m‖), where ‖·‖ denotes the Euclidean norm. (1)
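Equation 1 amounts to the cosine of the angle between the two vectors; a minimal sketch:

```python
import math

# Minimal sketch of Equation 1: the direction cosine is the dot product of
# the input and reference vectors divided by the product of their norms.
def direction_cosine(x, m):
    dot = sum(a * b for a, b in zip(x, m))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in m))
    return dot / norm
```

It equals 1 for vectors pointing in the same direction regardless of their lengths, and 0 for orthogonal vectors, which makes it a scale-invariant similarity measure.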
Another important feature of the SOM is that its configuration is organized in the sense that
neighboring units have similar reference vectors. For a trained SOM, a mapping from the input space
onto the two-dimensional grid of units can be defined by associating any given input vector with the
unit whose reference vector is most similar to that particular input vector. Because of the organization
of the reference vectors, this mapping is smooth in the sense that similar vectors are mapped onto
adjacent regions. Conceptually, the mapping can be thought of as a projection onto a non-linear
surface determined by the reference vectors.
We trained the SOM with the 24 K-K profiles. The SOM had a toroidal configuration, that is, the left
and the right edges of the map were connected to each other as were the top and the bottom edges.
The resulting map is displayed at the top of Figure 1. The configuration of the SOM is highly similar
to the multidimensional scaling solution (Krumhansl & Kessler, 1982) and the Fourier-analysis-based
projection (Krumhansl, 1990) obtained with the same set of vectors. Furthermore, Euclidean distance
and direction cosine used as similarity measures in training the SOM yielded identical maps.
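A compact, illustrative training run of this kind is sketched below. The grid size, learning schedule, and random initialization are assumptions of this sketch, not the settings used in the study; the K-K base profiles are the published values, and Euclidean distance is used for the best-matching-unit search.

```python
import math
import random

# Illustrative sketch: train a SOM with a toroidal 8 x 12 grid of units on
# the 24 K-K profiles. Grid size, learning schedule, and random initial
# reference vectors are assumptions of this sketch.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def profiles():
    """The 24 key profiles, obtained by rotating the major and minor profiles."""
    return [[base[(pc - tonic) % 12] for pc in range(12)]
            for tonic in range(12) for base in (MAJOR, MINOR)]

def wrap(a, b, n):
    """Grid distance along one axis of a torus of size n."""
    d = abs(a - b)
    return min(d, n - d)

def train_som(data, rows=8, cols=12, epochs=50, seed=0):
    rng = random.Random(seed)
    units = [[rng.uniform(2.0, 6.5) for _ in range(12)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)               # learning rate decays linearly
        sigma = max(0.5, 3.0 * (1.0 - frac))  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):
            # best-matching unit by Euclidean distance
            bmu = min(range(len(units)),
                      key=lambda u: sum((units[u][i] - x[i]) ** 2 for i in range(12)))
            br, bc = divmod(bmu, cols)
            for u, ref in enumerate(units):
                r, c = divmod(u, cols)
                # squared grid distance with wrap-around on both axes (torus)
                d2 = wrap(r, br, rows) ** 2 + wrap(c, bc, cols) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(12):
                    ref[i] += h * (x[i] - ref[i])
    return units

def quantization_error(units, data):
    """Mean distance from each input to its nearest reference vector."""
    return sum(min(math.sqrt(sum((ref[i] - x[i]) ** 2 for i in range(12)))
                   for ref in units) for x in data) / len(data)
```

Because the neighborhood wraps around both edges of the grid, keys at opposite edges of the resulting map are neighbors, which is what allows the two circles of fifths to close on the torus.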
Representing the sense of key on the SOM
In addition to this localized mapping, a distributed mapping can be defined by associating each unit
with an activation value. For each unit, this value depends on the similarity between the input vector
and the reference vector of the unit. Specifically, the units whose reference vectors are highly similar
to the input vector have a high activation, and vice versa. The activation value of each unit can be
calculated, for instance, using the direction cosine of Equation 1. Dynamically changing data from
either probe-tone experiments or key-finding models can be visualized as an activation pattern that
changes over time. The location and spread of this activation pattern provides information about the
perceived key and its strength. More specifically, a focused activation pattern implies a strong sense
of key and vice versa.
Tone transitions and key-finding
All the key-finding models presented to date are static in the sense that they ignore the temporal order
of tones. The order in which tones are played may, however, provide additional information that is
useful for key-finding. This is supported by studies on both tone transition probabilities (Fucks, 1962;
Youngblood, 1958; Knopoff & Hutchinson, 1978) and perceived stability of tone pairs in a tonal
context (Krumhansl, 1979, 1990). Fucks (1962) found that, in samples of compositions by Bach,
Beethoven, and Webern, only a small fraction of all the possible tone transitions were actually used
(the fractions were 23, 16, and 24 percent, respectively). Furthermore, Youngblood (1958) showed
that, in a sample of 20 songs by Schubert, Mendelssohn, and Schumann, there is an asymmetry in the
transition frequencies in the sense that certain tone transitions were used more often than their
inversions. For instance, the transition B-C was used 93 times, whereas the transition C-B was used
only 66 times. A similar asymmetry was found in the study on perceived stability of tone pairs in a
tonal context by Krumhansl (1990). The study showed that, after the presentation of a tonal context,
tone pairs that ended with a tone that was high in the tonal hierarchy were given higher ratings than
their inverses. For instance, in the context of C major, the ratings for the transitions B-C and C-B were
6.42 and 3.67, respectively.
Determining tone transitions in a piece of polyphonic music is not a trivial task, especially if one aims
at a representation that corresponds to perceptual reality. Even in a monophonic piece, the transitions
can be ambiguous in the sense that their perceived strengths may depend on the tempo and may vary
from one individual to another. Consider, for example, the tone sequence C4-G3-D4-G3-E4, where all
the tones have equal durations. When played slowly, this sequence is heard as a succession of tones
oscillating in pitch. With increasing tempi, however, the subsequence C4-D4-E4 becomes
increasingly prominent. This is because it is segregated from the stream of tones due to the temporal
and pitch proximity of its members. With polyphonic music, the ambiguity of tone transitions
becomes even more obvious. Consider, for instance, the sequence consisting of a C major chord
followed by a D major chord, where the tones of each chord are played simultaneously. In principle,
this passage contains nine different tone transitions. Some of these transitions are, however, perceived
as stronger than the others. For instance, the transition G-A is, due to pitch proximity, perceived as
stronger than the transition G-D.
It seems thus that the analysis of tone transitions in polyphonic music should take into account
principles of auditory stream segregation (see Bregman, 1990). Furthermore, it may be necessary to
code the presence of transitions on a continuous instead of a discrete scale. In other words, each
transition should be associated with a strength value instead of just coding whether that particular
transition is present or not. Below, a dynamical system that embraces these principles is described. In
regard to the evaluation of transition strength, the system bears a resemblance to the model of
apparent motion in music presented by Gjerdingen (1994).
Pitch transition model
Let the piece of music under examination be represented as a sequence of tones, where each tone is
associated with pitch, onset time, and duration. The main idea of the model is the following: given any
tone in the sequence, there is a transition from that tone to all the tones following that particular tone.
The strength of each transition depends on three factors: pitch proximity, temporal proximity, and
duration of tones. More specifically, a transition between two tones has the highest strength when the
tones are proximal in both pitch and time and have long durations. These three factors are
included in the following dynamical model.
Representation of input. The pitches of the chromatic scale are numbered consecutively. The onset
times of tones having pitch k are denoted by t_ki, i = 1, ..., N_k, and the offset times by t'_ki, i = 1, ..., N_k,
where N_k is the total number of times the kth pitch occurs.
Pitch vector p(t). Each component p_k(t) of the pitch vector has a non-zero value whenever a tone
with the respective pitch is sounding. It has the value of 1 at each onset at the respective pitch, decays
exponentially after that, and is set to zero at the tone offset. The time evolution of p_k(t) is governed by
the equation
dp_k/dt = -p_k/τ_p + Σ_i δ(t - t_ki), (2)
where dp_k/dt denotes the time derivative of p_k and δ the Dirac delta function (unit impulse
function). The value of the time constant τ_p is chosen so that the integral of p_k saturates at about
1 sec after tone onset, thus approximating the durational accent as a function of tone duration
(Parncutt, 1994).
Pitch memory vector m(t). The pitch memory vector provides a measure of both the
perceived durational accent and the recency of notes played at each pitch. In other words, a high value
of m_k(t) indicates that a tone with pitch k and a long duration has been played recently. The dynamics
of m_k(t) are governed by the equation
dm_k/dt = -m_k/τ_m + p_k. (3)
The time constant τ_m determines the dependence of transition strength on the temporal distance
between the tones. In the simulations, a value of τ_m corresponding to typical estimates of the length
of the auditory sensory memory has been used (Darwin, Turvey & Crowder, 1972; Fraisse, 1982;
Treisman, 1964).
Transition strength matrix s(t). The transition strength matrix provides a measure of the
instantaneous strengths of transitions between all pitch pairs. More specifically, a high value of s_jk(t)
indicates that a long tone with pitch j has been played recently and a tone with pitch k is currently
sounding. At each point of time, s_jk(t) is given by
s_jk(t) = g(|j - k|) m_j(t) p_k(t), (4)
where g is a decreasing function of the pitch distance |j - k|, implementing the dependence on pitch
proximity.
Dynamic tone transition matrix T(t). The dynamic tone transition matrix is obtained by
temporal integration of the transition strength matrix. At a given point of time, it provides a measure
of the strength and recency of each possible tone transition. The time evolution of T_jk(t) is governed
by the equation
dT_jk/dt = -T_jk/τ_m + s_jk. (5)
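The onset-impulse, exponential-decay behaviour of the pitch vector and the slower temporal integration performed by the pitch memory vector, as described above, can be sketched with a simple Euler discretization. The step size and both decay constants below are illustrative assumptions, not the model's actual parameter values.

```python
# Illustrative Euler-step sketch of a single pitch component: p receives a
# unit impulse at each onset, decays exponentially, and is reset at offsets;
# m integrates p and decays on a slower, sensory-memory time scale.
DT = 0.01      # integration step in seconds (assumption)
TAU_P = 0.5    # pitch-vector decay constant (assumption)
TAU_M = 3.0    # pitch-memory decay constant (assumption)

def simulate(notes, total_time):
    """notes: (onset, offset) pairs, in seconds, for a single pitch.
    Returns the sampled time courses of the pitch value p and memory value m."""
    p = m = 0.0
    p_trace, m_trace = [], []
    for step in range(int(total_time / DT)):
        t = step * DT
        for onset, offset in notes:
            if abs(t - onset) < DT / 2:
                p = 1.0        # unit impulse at tone onset
            if abs(t - offset) < DT / 2:
                p = 0.0        # pitch value reset at tone offset
        p += -(p / TAU_P) * DT      # exponential decay while sounding
        m += (p - m / TAU_M) * DT   # memory integrates p, decays slowly
        p_trace.append(p)
        m_trace.append(m)
    return p_trace, m_trace
```

For a single tone, p jumps to 1 at the onset, decays, and vanishes at the offset, while m rises during the tone and then fades gradually, so that a recently played long tone leaves a large memory trace.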
Key-finding Model 1
Model 1 is based on pitch class distributions only. It uses a pitch class vector c(t), which is similar
to the pitch vector p(t) used in the dynamic tone transition matrix, except that it ignores octave
information. Consequently, the vector has 12 components that represent the pitch classes. The pitch
class memory vector v(t) is obtained by temporal integration of the pitch class vector according to
the equation
dv_k/dt = -v_k/τ_m + c_k. (6)
Again, the time constant τ_m corresponds to typical estimates of the length of the auditory sensory
memory. To obtain estimates for the key, the vector v(t) is correlated with the probe-tone rating
profiles for each key.
Key-finding Model 2
Model 2 is based on tone transitions. Using the dynamic transition matrix T(t), it calculates the
octave-equivalent transition matrix U(t) according to
U_jk = Σ_{j' ≡ j (mod 12)} Σ_{k' ≡ k (mod 12)} (T_j'k' + T_k'j'). (7)
In other words, transitions whose first and second tones have identical pitch classes are considered
equivalent, and their strengths are added. Consequently, the direction of transition is not taken into
account. To obtain estimates for the key, the pitch class transition matrix U(t) is correlated with the
matrices representing the perceived stability of two-tone transitions for each key (Krumhansl, 1990).
Sample results
Figure 1 shows some sample results from one of the participants in the experiment, a highly trained
musician. This musician is a graduate student of composition with more than twenty years'
performance experience on the piano and some additional years on other instruments. Figure 1 b
shows the results for the listener at measure 9.5. A V-I cadence in A minor has just occurred and the
melody contains a descending diatonic line ending on a half-note A, followed by a tonic - leading tone
- tonic alternation. This is the conclusion of the opening passage played by the left hand only and the
right hand joins at this point in time. As can be seen, the sense of tonality is strongly focused on A
minor. Figure 1 c shows the results for Model 1 which are highly similar, again with a strong focus on
A minor. (Model 2 results were in general highly similar to Model 1, agreeing with the subject
slightly more than Model 1. Because of issues about how best to visualize the results of Model 2, we
show only Model 1 here.) Figure 1 d, e shows the results at measure 49. The right hand contains what
would be a tonic - leading tone - tonic in E major and E minor; the mode is ambiguous because both
G and G# appear. This leads to an ambiguity that spreads to other closely related keys, which contain the
other chromatic tones, C#, D#, and A#, that appear in this passage. Figure 1 f, g show the results at
measure 89. As can be seen, no clear tonal focus is found. The music is highly chromatic; of the 12
tones of the chromatic scale, all but G# appear in the three preceding measures. Thus, these
results suggest that both listeners and the algorithm can generate musically interpretable, and highly
dynamic representations of tonality.
References
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: M.I.T. Press.
Darwin, C. J., Turvey, M. T., & Crowder, R. G. (1972). An auditory analogue of the Sperling partial
report procedure: evidence for brief auditory storage. Cognitive Psychology , 3, 255-267.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music. San Diego, CA:
Academic.
Fucks, W. (1962). Mathematical analysis of the formal structure of music. IRE Transactions on
Information Theory, 8, 225-228.
Knopoff, L. & Hutchinson, W. (1978). An index of melodic activity. Interface, 7, 205-229.
Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal
organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a
diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5,
579-594.
Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms.
Music Perception, 11, 409-464.
Treisman, A. M. (1964). Verbal cues, language, and meaning in selective attention. American Journal
of Psychology, 77, 206-219.
Youngblood, J. E. (1958). Style as information. Journal of Music Theory, 2, 24-35.
Proceedings abstract
MAKING MUSIC MEAN
Dr Nicola Dibben
n.j.dibben@sheffield.ac.uk
Background:
In very broad terms, two received views of music and meaning can be identified.
The first is that meaning is inherent in musical material. This approach can be
seen in the work of music psychologists on emotional responses to music and the
perception of meaning. The second view is that the meanings attributed to music
are wholly constructed - an approach that can be found in recent musicological
writings.
Aims:
This paper argues against both an entirely immanent view of musical meaning,
and a naively constructivist account, and puts forward an alternative which
attempts to capture the mediating role of musical material conceived as
socially and historically constituted.
Main contributions:
I argue that music is made to mean through a range of processes (e.g. discourse
about music such as journalistic and musicological writing, the values created
for music by its use in advertising and its marketing, the rituals and
practices which accompany music performance and consumption, etc). The way in
which music is made to mean is not simply a free-for-all in which "anything
goes" but a process in which meanings are created, shared, sustained, and
appropriated. The role of musical material in this is that it bears the traces
of its history of use, and thus embodies social sediment in its material form.
Implications:
This view of musical meaning recognises the mutuality of listener and musical
material; it avoids essentialising meaning as inherent in the sounds of music
but recognises the role of compositional material as socially and historically
formed; and acknowledges the role of a wide range of social processes in the
construction of meaning in music. Empirical work which already begins to
explore meaning and music in this way is presented and the implications for
future empirical research are outlined.
Proceedings paper
1. Introduction
1.1 Aim
In the past fifty years, many studies have revealed a large number of factors that play a role
in the perception of music (Krumhansl, 2000). However, knowing what factors play a role is only a
first step towards understanding music perception. Real insight presupposes a theory that specifies
how these factors function in the processing of music, or more precisely a theory that specifies what
transformations are performed on an input leading to a mental representation. Only such a theory can
make specific predictions about how a concrete series of tones is perceived. Frameworks for a theory
of music perception have been proposed by Deutsch & Feroe (1981) and Lerdahl & Jackendoff (1983).
Although experimental evidence has been reported supporting these frameworks, no concrete
predictions can be derived from these theories.
The goal of this study is to develop a computational model, based on a set of assumptions, that
captures the on-line processing of music. The model construes music perception in terms of 1) the
activation of pertinent musical knowledge stored in the listener's long term memory, and 2) the
application of perceptual mechanisms that organize the elements in the input into a coherent mental
representation. The viability of the model is investigated in experiments that examine how perception
evolves while the stimulus is presented incrementally, by studying goodness judgments and the
expectations that arise in the process. The model we are developing mainly pertains to the stage in
which the elements in the input are transformed into a mental representation.
The points of departure of this study are: 1) a theoretical framework and 2) two earlier experimental
studies.
2. Theoretical framework
First we present a global outline of a model of music perception that schematically represents the
primary processes in music perception. See Figure 1.
First, the scheme indicates that music perception is a process in which two types of information
interact: bottom-up information, consisting of the series of pitches presented to the listener
(represented as f1, f2, f3,... in the figure), and top-down information represented by all knowledge
relevant to music perception stored in long term memory. Second, the scheme conveys the incremental
character of music perception by the cyclic pattern in which pitch input is entered sequentially and a
succession of processes is executed repeatedly. Third, three groups of processes are displayed, all
relying on information stored in long term memory (LTM) denoted by the arrows, and each generating
different perceptual products. The first group of processes relates to the establishment of the
interpretative frames required: key-inference and meter-inference respectively. The second group of
processes is concerned with encoding in which series of pitches in the input are grouped into chunks
on the basis of structural regularities. In the third group of processes, the chunks generated in the first
encoding phase are integrated into even larger chunks, leading to a complete mental representation of
the input. Next we shall describe these processes in more detail.
The process of music perception may be conceived as the mapping of the input on musical knowledge
stored in the long term memory of the listener, and as the application of perceptual mechanisms to an
input consisting of a sequence of pitches. The aim of the process is to transform a series of
unconnected pitches into an integrated mental representation in musical terms. A sequence of sounds
which is conceived musically (rather than linguistically or otherwise), will be mapped on two
dimensions: the pitch-height dimension yielding the pitch of the sound, and the key-dimension
yielding the attribute of scale-degree. The key-dimension is the hierarchically organized mental tone
space in which the relations between tones and chords are specified (Krumhansl, 1990). It serves as an
interpretational frame that supplies the musical function of the sounds. As soon as the pitches of a
sequence have activated a specific key, they are identified as tones in a scale. All tones in a key are
associated with a certain degree of 'stability' (e.g., the first tone of the scale is the most stable; the
seventh, the 'leading tone', the least stable), and with a tendency to resolve to other tones (Cooke,
1959; Povel, 1996; Zuckerkandl, 1956). Thus, in making a musical interpretation of a tone series, the
tones function simultaneously in these two dimensions. Each dimension plays a specific role in the
formation of musical percepts. Perceptual
mechanisms are applied to the input, establishing relations between its elements, and lead
to a representation in terms of clusters of tones. Thus we assume that the aim of a listener is to
generate a mental description or code that encompasses as much as possible all elements in the input.
If this process is successful, the listener will have the impression that (s)he understands the input, that
it makes sense musically. If, conversely, a listener does not succeed in finding such relations, no
coherent musical percept will result.
3. Influential earlier studies
1. Van Dyke Bingham (1910)
Two studies have played a role in shaping this research. The first is Van Dyke Bingham (1910)
who studied the factors that determine whether or not a tone series is perceived as a melody. As
an example he describes two sequences respectively containing the pitches: c' e' g' e' f' d' c' and
c' f' d' g' e' f' c'. The first of these was judged by listeners as a coherent sequence in which the
sounds seem to follow each other naturally, thus forming an esthetic unity, i.e. a melody. The
second sequence, however, was judged to be a non-melody. Van Dyke Bingham asserts that the
concept of tonality plays a decisive role in the processing of tone series as expressed in his
definition of the term:
'By a tonality is meant a group of mutually related tones, organized about a single tone, the
tonic, as the center of relations. Subjectively, a tonality is a set of expectations, a group of
melodic possibilities within which the course of the successive tones must find its way, or suffer
the penalty of not meeting these expectations or demands of the hearer and so of being rejected
as no melody.' (p. 36-37)
From this study we borrowed some of the theoretical ideas proposed above, as well as the idea
for a response, asking people whether a tone series can be conceived as a melody.
2. Cuddy, Cohen, & Mewhort (1981)
The second study is the seminal article by Cuddy, Cohen & Mewhort (1981) in which the authors studied the
perception of tone sequences having "varying degrees of musical structure". Starting from the prototypical
sequence: {C4 E4 G4 F4 D4 B3 C4}, they constructed a set of sequences by altering one or more tones thereby
gradually degrading the "harmonic structure", contour complexity, and excursion size (interval between first
and last tone). From the results of Experiment 1 in which subjects judged the "tonality or tone structure" of
32 seven-tone sequences, 5 levels of harmonic structure were constructed by combining 3 rules: 1)
diatonicism (whether or not a series consists solely of diatonic tones); 2) leading-note-to-tonic ending; 3) the
extent to which a sequence follows a I - V - I harmonic progression. These levels of harmonic structure were
factorially combined with 2 levels of contour complexity and 2 levels of excursion, yielding 20 stimuli.
Recognition of the stimuli under transposition was tested in Experiment 2, and the tonal structure of the
stimuli was rated in Experiment 3. Findings indicate that the ratings were influenced mostly by the factor
harmonic structure and less by contour and excursion.
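The factorial design described above can be sketched as follows (an illustrative sketch only; the level labels are ours, since Cuddy et al. order the 5 harmonic-structure levels empirically rather than by name):

```python
from itertools import product

# Sketch of the 5 x 2 x 2 factorial design of Cuddy, Cohen & Mewhort (1981):
# 5 levels of harmonic structure, 2 of contour complexity, 2 of excursion,
# yielding the 20 stimulus conditions mentioned in the text.
harmonic_levels = ["H1", "H2", "H3", "H4", "H5"]   # ordering determined empirically
contour_levels = ["simple", "complex"]
excursion_levels = ["small", "large"]

conditions = list(product(harmonic_levels, contour_levels, excursion_levels))
print(len(conditions))  # 20 stimulus conditions
```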
The importance of this study is twofold: the use of similarity judgments and goodness ratings to measure the
perception of tone sequences, and its aim to discover factors that play a role in the perception of tone series.
Yet the study is limited in a number of respects. First, the concept of harmonic structure is rather ambiguous:
the ordering of the 5 levels of harmonic structure is not theoretically but empirically determined. This means
that it is unclear how the three rules precisely determine the variable harmonic structure. Second, it is not
clear what the subjects actually judged: besides being asked to judge the tonality or tonal structure, they were
instructed 'to reserve the highest ratings for sequences with "musical keyness" or "completeness" and to
assign lower scale values to sequences that contained "unexpected" or "jarring" notes.' (p. 875). Thus it
seems likely that the subjects have judged how well the tone series sounded as a melody; this is supported by
a study of Smith & Cuddy (1986) that obtained comparable results when listeners rated the same sequences
on "pleasingness". Third, although the rules affect the 20 sequences used in the study, it is unclear to what
extent the rules can be generalized to other tone sequences. For instance, the sequences {C4 E4 G4 F4 B3 D4
C4} and {C4 E4 G4 B3 D4 G4 C4}, which violate the leading-note-to-tonic ending, will probably be rated about as
high as the prototypical sequence {C4 E4 G4 F4 D4 B3 C4}, which obeys that rule; and the sequence {C4 E4
F#4 G4 F4 D4 B3 C4}, violating the rule of diatonicism (but allowing anchoring), will probably be rated much
higher than the sequence {C4 E4 G4 F4 D#4 B3 C4} used in the study. These examples do not undermine the
general finding that harmonic structure plays a role, but indicate that the definition of harmonic structure in
terms of stimulus characteristics is still incomplete. Finally, although the study shows that perception is
strongly influenced by the presence of detectable structure in tone sequences, it does not indicate the concrete
processes that are performed on the input resulting in a mental description.
2. Tracing the perceptual mechanisms in music processing.
1. Introduction
As stated before, the general aim of our research is to understand the on-line processes that a listener
performs when perceiving music. This processing is conceived as the application of mechanisms that
combine elements in the input into larger chunks. The concrete goal is to develop a computational
model that describes how mechanisms are applied to the input leading to a more or less successful
mental description of the input. The success of the undertaking is determined by how well predictions
derived from the model are borne out in experiments.
Thus the specific goal of this study is to understand why some tone series are perceived as a melody
and other series are not. On the assumption that a tone series is considered a melody if the perceiver
can create an efficient code that includes, if possible, all tones of the series, the challenge for the approach
is to discover all perceptual mechanisms that listeners use in coding music.
There are a number of tasks one may use to examine the perception of tone series. Listeners may be
asked to judge the goodness or pleasantness (tonal structure etc.) of a series, or they can be asked to
indicate whether the series contains jarring notes. In other tasks subjects judge the similarity between
sequences (for which a transposition paradigm may be used), or indicate which tones they expect at
different moments in the series. In our experiments we have used goodness judgments and expected
continuations. These experiments are described below.
2. Experimental studies
1. Series containing diatonic and chromatic tones
In a few experiments (Povel & Jansen, 1998) we studied the perception of a series of tone
sequences consisting of a subset of all orderings of the collection {C4 E4 F#4 G4 Bb4}. The
presentation of a tone sequence was preceded by the chords C7 - F to induce the key of F-major.
Based on a pilot study in which subjects judged how well fragments of the tone sequences
sounded as a melody, it was hypothesized that a tone series is judged a melody if either one or
both of the mechanisms chord recognition and anchoring can be applied to the series. Chord
recognition is the mechanism that describes a series of tones as a chord, and anchoring is the
mechanism that links a tone to a (chord) tone occurring later in the series. Applied to the stimuli
used in the experiments, a sequence of tones may be conceived as a chord, namely C7, which is
feasible when the F#4, which does not belong to the chord, can be "anchored" to a subsequent
G4. Anchoring (Bharucha, 1984) may either be immediate, when the G follows the F# as in the
tone series {C4 E4 F#4 G4 Bb4}, or more or less delayed when one or more tones intervene
between the F# and G as in the series {E4 F#4 C4 G4 Bb4} or {Bb4 F#4 E4 C4 G4}.
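The anchoring mechanism just described can be sketched as follows (a minimal sketch in our own notation, not Povel & Jansen's implementation): a non-chord tone is anchored if its resolution tone occurs later in the series, and the delay is the number of intervening tones.

```python
# Sketch of the anchoring mechanism: F#4 does not belong to the C7 chord
# and must be "anchored" to a subsequent G4. A delay of 0 means immediate
# anchoring (Bharucha, 1984); larger delays mean more intervening tones.
def anchoring_delay(series, non_chord_tone="F#4", resolution="G4"):
    """Return the number of tones between the non-chord tone and its
    resolution, or None if the resolution never follows it."""
    if non_chord_tone not in series:
        return None
    i = series.index(non_chord_tone)
    for j in range(i + 1, len(series)):
        if series[j] == resolution:
            return j - i - 1   # 0 = immediate anchoring
    return None

print(anchoring_delay(["C4", "E4", "F#4", "G4", "Bb4"]))   # 0: immediate
print(anchoring_delay(["E4", "F#4", "C4", "G4", "Bb4"]))   # 1: one tone intervenes
print(anchoring_delay(["Bb4", "F#4", "E4", "C4", "G4"]))   # 2: two tones intervene
```

The three example series are the ones given in the text; the last case, where the G precedes the F#, would return None, matching the lowest-rated condition.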
This hypothesis was tested in two experiments using a paradigm in which the participants heard
stepwise lengthened fragments (beginning with a fragment of length three) and rated the
melodic goodness of the fragment (Experiment 1) or played a few tones that completed the
fragment (Experiment 2). It was found that goodness ratings were highest if the fragment only
contained elements of the C7 chord, lower if the F# was immediately followed by the G, still
lower if one tone intervened between F# and G, and lowest if the G preceded the F#.
Unexpectedly, it was found that series in which two or three tones intervened between F# and G,
were rated higher than those with only one tone between F# and G. As in these series the
non-fitting F# occurred relatively early and the last three or four tones formed a C7 chord, this
finding was tentatively explained by assuming that goodness ratings are mainly based on the
most recent tones heard. Listeners' expectations collected in the second experiment corroborated
the above findings: series that activate the chord C or C7 (according to the hypothesis) tended to
be continued with the tone F, whereas series ending with the tone F# tended to be continued
with the tone G, later followed by the tone F.
Overall, the results support the hypothesis that the coding of these sequences was based on the
application of the mechanisms chord recognition and anchoring. As the interaction between the
two mechanisms is still not quite understood, we decided to subsequently study tone sequences
only containing diatonic tones.
2. Series only containing diatonic tones
In this experiment 20 subjects rated the goodness of 60 orderings of the collection {D4 E4 F4 G4
A4 B4} on a 5-point scale. In the experiment each series was preceded by the chords G7 and C
to induce the key of C-major and presented at a different pitch height. To explain the results a
number of computational models were developed based on a set of general assumptions
concerning music perception and a number of specific assumptions regarding the processing of
music. The general assumptions were: The pitches that are the basic constituents of a tone
sequence can be conceived in two ways: 1) As a sequence of pitches forming a contour the
sequential regularities of which can be described in a code; 2) As a sequence of tones conceived
within a key as a result of which the tones acquire the perceptual attributes stability and
expectation. These assumptions lead to the hypothesis that a tone sequence will be judged a
melody if the listener can mentally construct a code that includes all tones and in which the
raised expectations are resolved.
Regarding the coding aspect we assume a number of mechanisms that organize elements in the
input into higher order mental units such as: runs, chords, trills, motives, ornaments etc.
The expectations that are created when the input is interpreted in a key, are described in terms of
vectors. A vector has a direction, that points to some future musical unit, and a magnitude
representing the strength of the expectation. Specific assumptions regarding vector assignment
are: 1) vectors may be created by all mental units in which the listener codes the input, e.g. tones
and chords; 2) vectors are assigned by reference to the currently activated region. For example
in the series {B4 F4 G4 D4 A4}, the first four tones will induce the chord G7, as a result of which
the tone A will get a vector pointing towards the closest most stable element in the G7 chord,
namely G. Specific assumptions regarding vector resolution are: a vector will resolve
(disappear) 1) with time (the magnitude decreasing with some time function); 2) if the expected
tone occurs either immediately or after some delay; 3) if the vector-carrying tone is integrated in
a code (e.g. if the series {D4 E4 F4 G4 A4 B4} is conceived as a run, only the last tone B4 will
carry a vector).
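The vector formalism above can be sketched as follows (a hedged sketch; the class and method names are ours, and the exponential decay is an assumption, since the text only states that magnitude decreases "with some time function"):

```python
from dataclasses import dataclass

@dataclass
class ExpectationVector:
    """A vector points from a coded unit towards an expected future unit,
    with a magnitude representing the strength of the expectation."""
    source: str            # unit that raised the expectation, e.g. "A4" or "G7"
    target: str            # expected resolution, e.g. "G4"
    magnitude: float       # strength of the expectation
    resolved: bool = False

    def decay(self, elapsed, half_life=2.0):
        # Assumption: exponential decay; the paper leaves the time function open.
        self.magnitude *= 0.5 ** (elapsed / half_life)

    def hear(self, tone):
        # Resolution rule 2: the vector resolves if the expected tone occurs.
        if tone == self.target:
            self.resolved = True
            self.magnitude = 0.0

# Example from the text: in {B4 F4 G4 D4 A4} the first four tones induce G7,
# so A4 carries a vector towards the closest most stable chord element, G.
v = ExpectationVector(source="A4", target="G4", magnitude=1.0)
v.decay(elapsed=2.0)       # magnitude halves after one half-life
v.hear("G4")               # expected tone occurs: vector resolves
print(v.resolved, v.magnitude)  # True 0.0
```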
Based on these assumptions, a model was developed that describes the coding of the tone series
in terms of runs and chords, and the resolution of expectations in terms of the logic of the
succession of recognized chords. The model was implemented as follows: Neighboring tones
having an interval of 1 or 2 semitones are chunked into runs, while the remaining tones are
recognized as triads on one of the seven scale degrees. Several assumptions regarding chord
recognition were made as a tone series may in principle allow for several harmonic
interpretations. For instance, the series {E4 A4 F4 D4 B4 G4} presented in the key of C-major,
may activate several chords: vi, (E4 A4); IV, (A4 F4); ii, (A4 F4 D4); vii, (F4 D4 B4); V, (D4 B4
G4); V7, (F4 D4 B4 G4); and V9, (A4 F4 D4 B4 G4). Chord recognition was implemented as
follows: 1) three different subsequent tones always lead to the unique identification of a chord;
2) two tones forming an interval of a fifth also always lead to the identification of a unique
chord; 3) a series of two different tones forming an interval of a third is interpreted as the major
triad (I, IV, or V) in which that third occurs; 4) vii is interpreted as V.
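The run-chunking step of this implementation can be sketched as follows (our own simplified sketch, not the NICI implementation): neighboring tones at most 2 semitones apart are grouped into runs, and the remaining tones are left for chord recognition.

```python
# Sketch of run chunking: neighbouring tones with an interval of 1 or 2
# semitones are chunked into runs; remaining tones go to chord recognition.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def semitone(tone):
    """Convert e.g. 'F#4' or 'Bb4' to an absolute semitone number."""
    name, octave = tone[:-1], int(tone[-1])
    pc = PITCH_CLASS[name[0]] + name.count("#") - name.count("b")
    return 12 * octave + pc

def chunk_runs(series):
    """Group maximal stretches of neighbouring tones <= 2 semitones apart."""
    chunks = [[series[0]]]
    for prev, cur in zip(series, series[1:]):
        if abs(semitone(cur) - semitone(prev)) <= 2:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks

# The scale series {D4 E4 F4 G4 A4 B4} is coded as a single run,
# so only its last tone carries a vector (resolution rule 3 above):
print(chunk_runs(["D4", "E4", "F4", "G4", "A4", "B4"]))
# Leaps break the run: {E4 A4 F4 D4 B4 G4} leaves only singletons,
# which must be accounted for by chord recognition.
print(chunk_runs(["E4", "A4", "F4", "D4", "B4", "G4"]))
```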
Finally, the logic of chord order is based on Piston's (1941/1989) Table of usual root
progressions, in which three categories of progressions were distinguished occurring in
Povel, D. J. (1996). Exploring the elementary harmonic forces in the tonal system. Psychological Research, 58,
274-283.
Povel, D. J., & Jansen, E. (1998). Perceptual Mechanisms in Music Perception. Internal Report NICI.
Smith, K. C., & Cuddy, L. L. (1986). The Pleasingness of Melodic Sequences: Contrasting Effects of Repetition
and Rule-familiarity. Psychology of Music, 14, 17-32.
Van Dyke Bingham, W. (1910). Studies in melody. Psychological Review, Monograph Supplements. Vol. XII,
Whole No. 50.
Zuckerkandl, V. (1956). Sound and Symbol. Princeton University Press.
Proceedings abstract
Patrik N. Juslin
Department of Psychology
Uppsala University
Box 1225
SE-751 42 Uppsala
SWEDEN
Background: A number of philosophers, psychologists, and natural scientists have speculated that speech and music
share a common origin - a notion that implies that the two modalities should have much in common. However, despite
considerable interest in this issue, there has been little empirical evidence to support such cross-modal parallels. This is
unfortunate since evidence of cross-modal parallels could offer a partial explanation of why music is perceived as
expressive of emotion.
Aims: This paper reports the results of a systematic review of studies of emotional expression in speech and
music performance. The principal aim of the review was to explore the extent to which there are cross-modal
similarities between speech and music performance by integrating the results from a large number of empirical studies
in both domains.
Main Contribution: The results show that there are many parallels between speech and music performance with regard
to (a) accuracy, (b) coding, (c) code usage, (d) cue intercorrelations, (e) gender differences, and (f) the use of
expressive contours. However, the results also show that many of the acoustic cues remain to be studied systematically,
and that the relationships among acoustic cues and emotions are not consistent across different conditions.
Implications: The results support the often suggested hypothesis that speech and music share a common origin. A
theoretical explanation of the obtained results is provided and implications for future research are discussed. It is
argued that cross-modal comparisons yield insights that would be difficult to obtain from studying the two domains
separately.
Proceedings abstract
HOW TO GET A PIANO INTO YOUR HEAD - EFFECTS OF PRACTICE ON CORTICAL AND
SUBCORTICAL REPRESENTATIONS OF THE SOUNDING KEYBOARD
Marc Bangert
Marc.Bangert@hmt-hannover.de
Background:
Aims:
Method:
Dissociation paradigm: Subjects' task was either to (1) listen to piano tones
passively, (2) press mute piano keys, or (3) practice on a modified piano with
randomly re-assigned key-to-pitch coupling. Data acquisition: Cortical:
32-channel DC-EEG (task-related slow potentials, event-related
desynchronisations, coherences). Subcortical: Classical conditioning of the
eyeblink reflex on particular notes and motor transfer. Behavioural: Detailed
performance analysis based on MIDI.
Results:
After practice, cortical auditory and sensorimotor areas are jointly activated
for purely auditory as well as for mute motor tasks. In addition, a right
dorsolateral prefrontal area engages in this corepresentation in beginners and
experts but not in controls who practiced on the manipulated piano in a way
that they could not establish a mental "map" of the keyboard. The eyeblink
experiment revealed interindividually heterogeneous results, but subcortical
audiomotor integration seems to be possible after years of training. The
manipulation experiments suggested a correlation between the flexibility to
re-learn a shuffled keyboard and individual practice habits (jazz vs. classic).
Conclusions:
Proceedings Paper
Dorothy Miell, Department of Psychology, The Open University, Walton Hall, Milton Keynes MK7
6AA
Telephone: + 44 (0)1908 654 546
email: D.Miell@Open.ac.uk
Introduction
This paper reports two studies which have investigated the impact that social factors have upon
children's musical creativity. We have been concerned to explore these social factors since making
music is so essentially a social process, particularly in the collaborative settings of UK classroom
music lessons, with the interaction between children both affecting and being affected by the evolving
music. In exploring the processes involved we have drawn on the literature from social and
developmental psychology, particularly the growing literature on collaborative learning, which
emphasizes the importance of quality communication between children and the importance of
agreeing shared goals and ways of working together (Kruger, 1993; Rogoff, 1990). However, most of
the studies conducted on collaborative learning are of children's maths and science work, and there are
few empirical studies of children's collaborations on more open-ended tasks such as music
composition or creative writing (see Johnston, Crook & Stevenson, 1995 for an exception). A crucial
aspect of studying collaborative music making is that music affords a channel of communication other
than verbal interaction (Morgan, 1999). In addition, there is a reciprocal interaction between the
ongoing musical and verbal communication between children in this context (MacDonald, Miell &
Morgan, in press).
A key variable investigated in the studies reported here is the effect of an existing relationship
between the children on the way in which they communicate and work together. We expected that,
given the need for quality communication and for establishing a 'shared social reality' (Rogoff, 1990)
in order to achieve successful collaboration, working with a friend would be particularly helpful for
children. This might be expected to be particularly the case in creative, open-ended tasks where the
children not only have to work together on the task itself (e.g. composing a piece of music), but also
have to define the goals of their work and negotiate with each other without a 'right answer' to guide
them, as well as stimulate and build on each other's creative input. Such interactional
work is, we hypothesised, more likely to be achieved successfully when a child is working with
someone they have an established friendship with, where they have experience of working, talking
and playing together successfully.
Study One
In this study, 10-11 year old children were asked to compose a piece of music entirely of their own
and in a style of their choosing to reflect the theme of 'the rain forest'. The children all began their
involvement with the project by attending a workshop with one of the researchers during which they
experimented with different instruments, rhythms, dynamics etc and discussed ways in which
compositions can be developed and different effects achieved. The experimental sessions involved the
children working on their compositions in same sex pairs and they were given 15 minutes to complete
the task, using a full range of instruments typically available to them in school music lessons (tuned
and untuned percussion and keyboards). Half the children worked on the task with one of their best
friends while the other half of the children worked on the task with a child from a different class who
they would have known by sight but who was not a friend.
We were interested in both the nature and quality of the interactive process as well as in the quality of
the musical end product, and with this in mind we videotaped all the composition sessions and also
recorded onto an audiocassette each pair's final performance of their composition. All the verbal
utterances and musical motifs from the videotapes were transcribed and the talk was then coded in
accordance with a system introduced by Berkowitz et al. (1980) and developed by Kruger (1992).
This coding system divides utterances into 'transactive' and 'non-transactive' types. Transactive
communication is defined as communication which builds upon and extends ideas that have already
been voiced (either by the self or the partner) and the presence of transactive communication has been
shown to be a key factor in good quality collaboration. We adapted this verbal coding system to allow
us to also code the music played by the children as either transactive or non-transactive and to track
the occurrence and elaboration of each musical motif throughout the composition session
(MacDonald, Miell & Morgan, in press). The final compositions were rated for quality by a teacher
from another school who worked from the audiotape of each composition and was unaware of the
hypothesis of the experiment, the experimental conditions and all details of individual pairs. She rated
the compositions using a set of marking scales developed by Hargreaves, Galton & Robinson (1996).
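The transactive/non-transactive coding just described can be illustrated with a small sketch (the category labels follow Kruger's scheme, but the session data and function names here are invented for illustration, not taken from the study):

```python
# Illustrative sketch of the coding analysis: each verbal utterance or
# musical motif is coded "transactive" (builds on an already-voiced idea)
# or "non-transactive", and pairs are compared on the proportion of
# transactive communication, overall or per channel.
coded_session = [
    ("talk", "transactive"), ("music", "transactive"),
    ("talk", "non-transactive"), ("music", "transactive"),
    ("talk", "transactive"),
]

def transactive_proportion(session, channel=None):
    """Proportion of transactive events, optionally restricted to one channel."""
    events = [code for ch, code in session if channel in (None, ch)]
    return sum(code == "transactive" for code in events) / len(events)

print(transactive_proportion(coded_session))           # 0.8 overall
print(transactive_proportion(coded_session, "talk"))   # 2/3 of the talk
```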
The results of this study highlighted the impact that social factors such as friendship have upon both
the process and outcomes of children's collaborative compositional work. Looking first at the outcome
measure, the teacher rated the compositions produced by friends as of significantly higher quality than
the compositions of children who had been working in non-friendship pairs. Having established this
difference in the overall quality of the music produced, we then turned to the measures of the
processes involved in the talk and music of the interaction to see if there were also differences there
which related to the outcome scores. We found that both the musical and verbal communication styles
of the friendship pairs were qualitatively different from those of the non-friends. The friends both
spoke and played more music in total than the non-friends, but also had a different pattern of
interacting within these overall differences in amount. The friendship pairs used proportionally more
transactive communication in both the verbal and the musical domains than the non-friends. This
meant that the friends were building on, extending and elaborating on each other's ideas, expressed in
both the talk and music, and developing their compositions by this gradual process of offering and
refining suggestions. This style of interaction was found to be significantly positively related to the
teacher's higher score for these pairs, suggesting that the presence of more transactive communication
was what led to the higher quality compositions from the pairs of friends. In contrast, non-friends
were more likely to spend their time in the session experimenting with the instruments for themselves
and did not offer up or develop ideas together in the same way. The smaller amount of talk which they
produced was characterised by information giving and simple, unelaborated agreements and
disagreements with each other. Sometimes the music seemed to be played to cover their
embarrassment and the lack of talk between them.
Thus it appears from this study that social factors such as friendship are key variables that influence
the nature of children's interactions - in both the verbal and musical domains. The musical coding
scheme which we developed allowed us to track interactive processes expressed musically as they
occurred in the composition sessions and holds great promise for future studies of other groups and
pairs collaborating to compose and improvise.
Study Two
A second study was designed to extend the first. Two key issues were identified as important for
further investigation. The first was to explore the extent to which the friendship effect found in the
first study might generalise to other settings. In particular, given the finding by Azmitia &
Montgomery (1993) that working with a friend mainly helped children when they tackled difficult
tasks, we wanted to vary the difficulty of the task. One way of changing the difficulty was to change
the level of structure and guidance given to the children, so that children would find they had fewer
choices and decisions to make for themselves. The participants in this case were limited to using only
a keyboard and to starting their composition with a middle C (instructions derived from a study by
Kratus (1989) looking at the composition process in 7 year old children). In order to see whether the
friendship effect was also found in younger children, this second study also involved 8 year old as
well as 11 year old children. As in the first study, children worked in same sex pairs of friends or
non-friends, with one child in each pair being more experienced musically than the other. Again as in
the first study, the interactions between the pairs and the outcome of the collaborations were
examined. Dialogue between the pairs of friends and their musical interaction were analysed using
measures of transactive and non-transactive communication. The musical processes used by the pairs
were also examined. A school music teacher finally graded each composition.
Results highlighted that older children and those working with a friend took part in more transactive
communication - both in their dialogue and in the music that they played. At 11 years old it appeared
that there were no differences between the friends and non-friends in either the amount of transactive
communication or the scores received for the compositions, whereas at 8 years old the differences
between friends and non-friends were more apparent. Compositions by younger children paired with a
friend were given a higher score by the teacher and used more transactive verbal and musical
communication. In the analysis of the musical processes used, it also became clear that 8 year olds
paired with a friend were able to organise their time to include sufficient quantities of development
and rehearsal of their piece in a manner similar to that of the older children. The 8 year olds paired
with a non-friend, however, were found to spend most time in individual exploration of sounds or
silence.
The 8-year-old children paired with a non-friend took part in considerably less transactive dialogue
than the older children and than those of the same age paired with a friend. At 11 years of age, little
difference could be seen between the discussion style of the friends and non-friends, both taking part
in high amounts of transactive and useful non-transactive dialogue such as making proposals. In this
age group, scores given to the compositions were similar for the two groups, suggesting that the ways
of working together on a structured music task were of a similar nature by this age level whether one
is working with a friend or acquaintance, although we had observed differences in the previous study
where the task was unstructured. At 8 years of age, however, the relationship with the partner appears
to make more impact on the type of discussion that took place. At this age level, those paired with
friends took part in more other-oriented transactive communication - in terms of their statements,
questions and responses. The group scoring the lowest final composition score, the 8-year-old
non-friends, were found to use the least amount of dialogue overall, in both transactive and
non-transactive categories. Their counterparts of the same age paired with a friend, meanwhile, were
discussing the task in a style much closer to that of the 11-year-old participants.
In order to find out whether the musical interaction between the children matched the differences
found in the verbal interaction, an analysis of the amounts of transactive and non-transactive musical
communication was then carried out. In comparing the two age groups, it became clear that the
younger children had spent significantly more time playing 'music for self' - experimenting
individually without any apparent attempt to communicate with the partner. The older children had
also taken part in a significantly greater amount of musical repetition, rehearsing their composition
more thoroughly. The variable of friendship also revealed expected differences such as the
non-friends playing more 'music for self', and the friends using a greater number of other-oriented
transactive musical motifs. Although these effects for friendship and age separately were found, it was
again the interaction between these two variables that clarified the different ways of working in the
pairings.
As found in the verbal interaction, little difference in musical communication could be seen between
the friends and non-friends aged 11. In the 8-year-old group, however, the amount of transactive
musical communication among friends was found to be more than double that of the non-friends. At the
same time, the musical activity of the 8-year-old non-friends consisted of more music for self and less
musical repetition. Whereas the older children and the younger children paired with friends spent time
working on one another's suggestions, therefore, this group devoted most time to experimenting with
sounds individually, and without aiming to rehearse and close on one musical product.
The musical techniques used by each child were analysed using the categories devised by Kratus
(1989). In his study he examined the minute-by-minute development of the children's music as it fell
into 4 categories: repetition, development, exploration and silence. The same analysis was conducted
on the music played by the various pairs in the present study. The older children structured their time
to include greater repetition, which gave a greater opportunity for rehearsing their composition. The
8-year-olds, meanwhile, were found to have longer periods of silence and of musical exploration -
playing for themselves rather than their partner. The younger children spent more time in complete
silence with no musical or verbal interaction going on, and when actually playing were more likely to
be experimenting with new musical ideas for themselves. It was apparent that in the younger age
group it was the non-friends who took part in a significantly greater amount of exploration. This
group was also found to spend the least amount of time on repetition.
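The minute-by-minute analysis after Kratus (1989) can be sketched as follows (the session data here is invented for illustration; only the four categories come from the study):

```python
from collections import Counter

# Sketch of the Kratus-style analysis: each minute of the 10-minute session
# is assigned one of four categories, and groups are compared on the
# resulting time budget.
session = ["exploration", "exploration", "development", "repetition",
           "development", "repetition", "repetition", "silence",
           "repetition", "repetition"]

def time_budget(minutes):
    """Minutes spent in each of Kratus's four categories."""
    counts = Counter(minutes)
    return {c: counts.get(c, 0)
            for c in ("repetition", "development", "exploration", "silence")}

print(time_budget(session))
# {'repetition': 5, 'development': 2, 'exploration': 2, 'silence': 1}
```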
The analysis of the processes used at different times during the 10-minute period furthermore revealed
the planning of time by children in each age group and type of friendship pair. The pattern of playing
over time for the 11-year-old friends and non-friends could be seen to be generally similar, with
repetition as the predominant technique, increasing over time in order to allow for sufficient rehearsal
before the end. Exploration during this time decreased gradually, and development increased around
the middle of the session yet decreased later in order to make way for further repetition as the end of
the session approached.
One aspect of the collaborative work of friends suggested by Azmitia and Montgomery (1993) was
that of greater checking and evaluating of solutions, a feature which in fact became apparent when
carrying out the study. Watching the friends at work together revealed that a noticeable part of their
discussion revolved around requests to each other to practice again, going over fine points quickly in
order to do so. The findings of the current study, particularly those relating
to the 8-year-old children were therefore found to support this theory. Whereas 8 year olds working
with a non-friend took part in little rehearsal of their composition, those working with a friend used
repetition within their 10 minutes at a level close to that of the 11 year olds. Relating to the suggestion
of Nelson and Aboud (1985) that criticism of the partner's work allows friends to give a superior
performance, it was also confirmed that friends do check and make suggestions on the partner's work
to a greater extent, taking part in greater verbal and musical transactive other-oriented suggestions.
A further proposal from the work of both Azmitia and Montgomery (1993) and from Newcomb and
Brady (1982) is that differences in performance between friends and non-friends are more likely to
emerge in challenging tasks rather than straightforward ones. This can be seen as one explanation for
the advanced interaction and superior performance of the younger friends, in that due to the younger
children having had less musical, and indeed less collaborative problem solving experience in general,
the task obviously presents a more challenging assignment for them than for the older children. For
the older children, we would suggest that no differences were found between friends and non-friends
in this study since the greater degree of structuring to the task made it easier than the more open ended
task set in Study 1, where differences between 11 year old friends and non-friends were found. In
Study 1, the children had to make a number of choices and decisions for themselves and it was
suggested that the friends were more successful since they had a more developed style for working
together and making such decisions collaboratively. With this more structured task, the advantage of
being with a friend was less apparent and the effect was not observed (as indeed is often the case in
studies of structured maths and science tasks with 11 year olds).
Hartup (1996) suggested that the important feature of friends' collaborative work was the ability to
establish 'joint productive activity' and this is backed up by several features of the children's
interactions in the current study. It became clear when conducting the study that the 8-year-olds
working with a non-friend partner required considerably more prompting to become involved in the
task. Even then, the non-friends at this age appeared unable to move from one stage of the process to
the next with ease, either maintaining a high level of experimentation throughout the 10 minutes or
waiting in silence until one partner could make a suggestion. Non-friends, then, had to struggle to
establish a way of working together before any productive activity could take place (particularly, as
here, when no structure was provided in the task instructions), a feature which comes naturally to
friends with a history of such interactions.
Despite the findings of this and previous studies of the effective verbal interaction style of
collaborating friends, the fear of off-task talk between friends often appears to stop teachers from
pairing children in this way. Previous work by Miell and MacDonald (in press) and Hartup (1996),
however, found that contrary to expectations, friends spent less time in off-task talk than non-friends.
Although, in the current study, friends often appeared more tempted to play a tune they knew to
entertain their partner, the smaller overall amount of general exploration by the friends suggests that
this did not affect the productiveness of the collaborations.
Although, therefore, more off-task talk was found between the friends, and in particular those of 11
years old, this was compensated for by the much greater amount of on-task talk by the friends. It
appears in the case of the 8-year-olds that more off-task talk in the friends did not mean that they were
communicating less effectively, merely that the non-friends generally failed to communicate with one
another at all.
A clear picture of the effect of friendship in these two groups has emerged, revealing that being in a
friendship pair does allow 8 year olds to engage in transactive discussion, musical interaction and
effective use of musical processes in a way that their non-friend same age counterparts cannot, and
that older children show the same pattern when working on a less structured task. These results
contrast with suggestions by Harrison and Pound (1996) that setting composition tasks to younger
children may curb enthusiasm and imagination; rather, it appears that collaborating with a friend
allowed 8-year-olds to use their friendship as a resource, encouraging increased motivational,
organisational and imaginative ability.
Conclusions
The two studies reported here were designed to investigate the social processes involved in children's
creative collaborations. The studies focused upon the process and outcomes of both the musical and
verbal interactions. The first study highlighted that when children aged 11 work with someone they
know well they produce proportionally more transactive communication, that is, communication that
builds upon ideas previously proposed. This result was evident in both the verbal and musical
domains. In addition, the compositions produced by pairs of friends were rated as being of a higher
quality than the compositions produced by children working with someone they did not know. In
Study 2 it was found that children aged eight produced findings similar to those of the first study
when they were working on a more structured musical task. No differences of this nature were found
between friendship and non-friendship pairs of children aged eleven in this study. It is suggested that
differences in the nature of the musical tasks employed help explain the differences in the findings of
the two studies.
REFERENCES
Azmitia, M. & Montgomery, R. (1993) 'Friendship, transactive dialogues, and the
development of scientific reasoning.' Social Development, 2 (3), 202-221
Berkowitz, M.W., Gibbs, J.C. & Broughton, J. (1980) 'The relation of moral judgement
disparity to developmental effects of peer dialogue'. Merrill-Palmer Quarterly, 26,
341-357
Gottman, J. (1983) 'How children become friends'. Monographs of the Society for
Research in Child Development, 48(3).
Hargreaves, D.J., Galton, M.J. & Robinson, S. (1996) 'Teachers' assessments of primary
children's classwork in the creative arts.' Educational Research, 38 (2), 199-211
Harrison, C. & Pound, L. (1996) 'Talking Music: Empowering children as musical
communicators.' British Journal of Music Education, 13 (3)
Hartup, W.W. (1996) 'The company they keep: friendships and their developmental
significance' Child Development, 67, 1-13
Johnson, P.G., Crook, C.K. & Stevenson, R.J. (1995) 'Childs play: Creative writing in
playful environments' in H.C. Foot, C.J. Howe, A. Anderson, A.K. Tolmie & D.A.
Warden (Eds) Group and Interactive Learning. Boston: Computational Mechanics
Publications.
Kratus, J.K. (1989) A time analysis of the compositional processes used by children aged
7 to 11. Journal of Research in Music Education, 37, 1, 5-20
Kruger, A.C. (1992) 'The effect of peer- and adult-child transactive discussions on moral
reasoning'. Merrill-Palmer Quarterly, 38, 191-211
Kruger, A.C. (1993) 'Peer collaboration: conflict, co-operation or both?' Social
Development, 2 (3), 165-182
MacDonald, R.A.R., Miell, D. & Morgan, L. (in press, 2000) Social processes and
creative collaboration in children. European Journal of the Psychology of Education.
Miell, D. & MacDonald, R.A.R. (in press, 2000). Children's creative collaborations: The
importance of friendship when working together on a musical composition. Social
Development
Morgan, L. (1999) 'Children's Collaborative Music Composition: Communication
through Music'. Unpublished dissertation, University of Leicester, UK.
Nelson, J. & Aboud, S. (1985) 'The resolution of social conflict between friends' Child
Development, 56, 1009-1017
Newcomb, A.F. & Brady, J.E. (1982) 'Mutuality in boys' friendship relations.' Child
Development, 53, 392-395.
Rogoff, B. (1990) Apprenticeship in thinking: Cognitive development in social context.
Oxford University Press: Oxford.
Proceedings paper
More about the (weak) difference between musicians' and nonmusicians' abilities to process
harmonic structures
Bigand, E.*, Poulain, B.*, D'Adamo, D.*, Madurell, F.** & Tillmann, B.***
* LEAD-CNRS, Université de Bourgogne, France
** Music Department, Université Paris IV Sorbonne
*** Dartmouth College
Musicians and non-musicians often behave similarly when they are required to perform musical tasks
that are no more familiar to the former than to the latter. The purpose of the present study was to
further investigate this issue by using an on-line experimental paradigm designed to assess the
influence of cognitive and sensory components on the development of harmonic expectancy.
Experiment 1 involved a harmonic priming task in short contexts. The aim was to assess the
contribution of musical expertise on the differentiation between regular and less regular resolutions of
a diminished chord. In one condition, a prime chord (say a B diminished chord) was followed by a
target that was either one of the four possible resolutions of the diminished chord (C major chord for
example) or a less legal resolution (C# for example). Interestingly, less legal targets share more
component tones with the prime than do legal targets. According to sensory priming, illegal targets
should be processed more easily than legal ones due to the shared component tones. According to
Western musical rules, the processing of the target chord should be more facilitated (more accurate
and faster) for legal targets than for illegal targets. The critical point of the study was to investigate
the contribution of sensory and cognitive components of harmonic priming as a function of the extent
of musical expertise. Sensory priming was expected to predominate over cognitive priming in non
musicians, and a reverse tendency was expected in musicians.
Method. Participants performed a harmonic priming task by providing a simple perceptual judgment
on the target chord. Following Bharucha and Stoeckig (1987), the required perceptual judgment
concerned the sensory consonance of the target. For the purpose of the experimental task, half of the
target chords were rendered dissonant by adding either an augmented octave or an augmented fifth to
the perfect major triad. The velocity of this added tone was adjusted in order to render the dissonance
moderately hard to perceive. Participants had to judge as quickly and as accurately as possible
whether the second chord of the pair was consonant or dissonant. This perceptual judgment does not
require participants to pay attention to the harmonic relationship between prime and target. However,
we expected that this judgment would be more or less difficult depending on this harmonic relationship.
Results. Participants with a high level of musical training and participants with no musical training
processed the target chord more easily in the legal resolution condition than in the illegal condition.
This outcome suggests that for both musicians and non-musicians, cognitive components predominate.
The goal of Experiment 2 was to extend this conclusion to longer harmonic contexts. Harmonic
priming has been shown to occur in chord sequences of different lengths for both musicians and non
musicians. In Bigand & Pineau (1997), the target chord was easier to process when it acted as a tonic
rather than as a subdominant chord in the preceding context. Bigand and Pineau's experiments were
not designed to definitely contrast the respective contribution of sensory and cognitive priming. The
chord sequences used in our present study attempted to address this issue. In two conditions (no
sensory priming conditions), the target chord never occurred in the preceding context. As a
consequence, the context never contained a tonic or a subdominant chord. Nevertheless, we expected
a stronger priming effect on the (implied) tonic chord than on the (implied) subdominant chord. In
two further conditions (sensory priming conditions), the subdominant chord occurred one or two times
in the context. The prior occurrence of the subdominant chord should increase the influence of
sensory priming. If long chord priming primarily depends on a sensory component, the processing of
the subdominant chord should be easier than the processing of the tonic target chord. If it primarily
depends on a cognitive component, the processing of the subdominant chord should remain more
difficult than the processing of the tonic target. Once again, the critical point of the study
was to assess the influence of these sensory and cognitive components as a function of the extent of
musical expertise.
Method and results. The method was identical to that of Experiment 1. The results demonstrated a
strong harmonic priming effect for the tonic chord. This priming effect was unchanged when
comparing the sensory priming conditions with the no sensory priming conditions. Once again,
musicians and non-musicians showed highly similar patterns of results.
Following the same rationale, Experiment 3 investigated the influence of horizontal motion on the
processing of the target chord. There was a main effect of horizontal motion (target chords were more
difficult to process in the bad voice-leading condition), but no influence of horizontal motion on
harmonic priming. Irrespective of the voice leading, tonic target chords were easier to process than
subdominant target chords. This finding was observed for both musicians and non-musicians.
Conclusions. These experiments provided evidence that, for both musicians and non-musicians, the
processing of subtle changes in harmonic structure involves a sophisticated cognitive component that
does not depend on the extent of musical expertise.
References.
Bharucha, J. J. & Stoeckig, K. (1987). Priming of chords: Spreading activation or
overlapping frequency spectra? Perception and Psychophysics, 41, 519-24.
Bigand, E., & Pineau, M. (1997). Global context effects on musical expectancy.
Perception and Psychophysics, 59, 1098-1107.
NOTE:
This research was supported by the International Foundation for Music Research.
Proceedings paper
While speech and music perception have received considerable attention in psychology (Deutsch, 1999; Jusczyk, 1997), relatively little
is known about the way in which humans perceive other environmental sounds. Recent literature within the field of ecological acoustics
has focused on the way in which the soundwave contains meaningful information for the listener (Ballas & Howard, 1987; Ballas, 1993;
Gaver, 1986, 1989, 1993a, 1993b; Heine & Guski, 1991; Jenison, 1997; Pressing, 1997; Rosenblum, Wuestefeld, & Anderson, 1996;
Stoffregen & Pittenger, 1995; Warren & Verbrugge, 1984). For example, Gaver (1986) proposed that acoustic properties of the sound
signal convey information that enables identification of an associated event. Sound-event mappings that express consistent information
regarding the source are termed nomic, whereas symbolic mappings consist of the pairing of unrelated dimensions. Gaver (1986)
predicted that the redundancy of information expressed within nomic mappings results in an intuitive association, and that this initial
advantage aids learning relative to the learning of symbolic mappings.
Surprisingly, few of today's 'informative' sounds would appear to build on the inherent meaning in nomic mappings. The ring of a
telephone, the buzz of an alarm clock, and the wail of an ambulance siren seem to be designed to gain attention, but may require an
additional cognitive step to link sound and meaning. Although ultimately effective, symbolic mappings of this kind may be relatively
inefficient and require an unnecessary period of learning. Accordingly, the aim of the present study was to conduct a systematic,
experimental investigation of the relative ease of learning nomic and symbolic sound-event mappings. A review of the relevant literature
begins with an outline of Gaver's theoretical framework for the field of ecological acoustics.
Everyday versus Musical Listening
Gaver (1993b) distinguished between two types of auditory perception that reflect the attentional focus of the individual. Musical
listening is the experience of perceiving properties of the proximal stimulus as it reaches the ear. Sounds are perceived in terms of pitch,
loudness, rhythm, and other acoustic components typically analysed by psychologists and psychophysicists. This auditory experience is
common during the perception of music. As an example, the sound of a melody is usually appreciated with reference to the rhythm and
pitch variations that signify the song.
The study of musical listening has received considerable attention in psychology (Deutsch, 1999; Tighe & Dowling, 1993). As a result
we understand a good deal about the way humans perceive acoustic features. However, listeners do not always focus on the proximal
stimulus but on the distal stimulus. For example, while listening to a melody it is also possible to determine the musical instrument
responsible. Contemporary psychological theories are less adept at explaining how the listener identifies the source of the sound as a
guitar.
Gaver (1993b) addressed this issue with the notion of everyday listening. During the everyday listening experience the distal stimulus is
the focus of perception. Rather than perceiving the rhythm and melodic contour of sound, the individual perceives the event responsible
for producing the soundwave. According to Gaver, perception of the event is made possible by detecting consistent causal relationships
as described by physical law. For instance, the action of plucking a metal string suspended over a resonant cavity produces a specific
pattern of air disturbances that can only be produced by a constrained number of objects and events. An individual engaged in everyday
listening is consequently able to perceive the strum of a guitar.
While everyday or musical listening can be applied during the perception of any sound, the majority of psychological research on
audition has studied the proximal stimuli as described by psychoacoustics (e.g. Rasch & Plomp, 1999). This empirical emphasis on
musical, at the expense of everyday, listening is most likely a product of the assumption that auditory stimulation requires processing
before it becomes informative. However, proponents of ecological acoustics counter this assumption, arguing that the structure of the
soundwave conveys meaning for the listener. The present study provides the first experimental examination of Gaver's conception of
everyday and musical listening, with the prediction that meaningful information regarding the sound source is accessible only during the
everyday listening experience. This hypothesis was tested by systematically manipulating instructions to encourage either everyday or
musical listening, a technique that has been shown to influence the expectations and performance of listeners (Ballas & Howard, 1987).
Learning Sound-Event Mappings
Within Gaver's (1993a, 1993b) framework, the structure of a soundwave does not specify one certain source, but rather constrains the
range of events that could have produced the sound. As a result, the specifics of the sounding event presumably have to be learned. The
association of a signal (sound) to a referent (event) can be conceptualised as a mapping (Familant &
Detweiler, 1993). A nomic mapping expresses consistent information regarding its source, whereas a
symbolic mapping pairs unrelated dimensions. In the present study, the nomic and symbolic
categories were operationalised in pilot testing using the Garner interference paradigm (Garner &
Felfoldy, 1970). The purpose of this procedure is to identify integral dimensions that combine to
produce a unitary perception. Nomic mappings may be
thought of in terms of the association of two perceptually integral dimensions, as both the signal and referent depict the same event. As
predicted, pitch and bar length were found to be perceptually integral, suggesting that they are nomically related. Damping and bar
length were confirmed as a symbolic mapping.
Acquisition of nomic and symbolic mappings was tested using a variation of the paired-associate learning task employed by Leiser,
Avons, and Carr (1989). There are several advantages provided by this design that are relevant. Most importantly, participants learn the
correct sound-meaning associations 'online' during an experiment via feedback: participants were required to guess the required
combinations at first exposure. This feature is useful for examining Gaver's (1986) claim that the intuitive association of sound and the
circumstances of production should be straightforward in nomic relationships.
As with all learning tasks, it is imperative to control the features of the associated referent to minimise confounds. For this reason bar
length was represented in numerical terms, with a three-digit measurement in millimetres used to distinguish among the different lengths.
It was concluded that these numbers were unlikely to differ in terms of familiarity, semantics, phonology, imagery, complexity, or
difficulty. While the selection of numerals to indicate length may be considered somewhat abstract, Pansky and Algom (1999) used the
Garner interference paradigm to demonstrate that numerical magnitude and physical size are perceptually integral dimensions.
Aim, Design, and Hypotheses
The aim of the present study was to examine the relative ease of learning nomic and symbolic sound-event mappings. The experiment
employed a 2X(2X2) factorial design: the two mapping levels of nomic and symbolic, an immediate and delayed test phase, and the
between-subjects factor of everyday versus musical listening. The dependent variable was the percentage of correct responses, a measure
widely considered to be a valid indicator of learning in humans (Brand & Jolles, 1985; Greene, 1988; Lachner, Satzger, & Engel, 1994;
Savage & Gouvier, 1992). The main hypothesis under investigation was that nomic mappings are more easily learned than symbolic
mappings. In an important qualification to this prediction, it is hypothesised that these learning advantages manifest only in the
immediate phase of the everyday listening condition. The experimental hypothesis can therefore be sub-divided into four specific predictions:
Hypothesis 1: Learning in the nomic condition is superior to learning in the symbolic condition during the immediate phase within the
everyday listening group.
Hypothesis 2: Learning in the nomic condition is equivalent to learning in the symbolic condition during the delayed phase within the
everyday listening group.
Hypothesis 3: Learning in the nomic condition is equivalent to learning in the symbolic condition during the immediate phase within the
musical listening group.
Hypothesis 4: Learning in the nomic condition is equivalent to learning in the symbolic condition during the delayed phase within the
musical listening group.
Method
Participants
Participants were 40 students from the University of Western Sydney, Macarthur. Participation was voluntary and the only requirement
was that individuals had self-reported normal hearing and, for control purposes, no formal training in music.
Materials
Auditory stimuli. For the nomic variable of frequency, a scale consisting of ten sounds was constructed. An estimation was then made of
the bar lengths (measured in millimetres) necessary to produce these frequencies, when both bar width and thickness were held constant,
using a wooden xylophone as a guide. Another ten sounds were then produced for the symbolic category of damping. All sounds were
found to be distinguishable during pilot testing.
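The estimation described above can be sketched from the standard physical relation for a transversely vibrating bar of fixed width and thickness, whose fundamental frequency scales as f ∝ 1/L² (so L ∝ 1/√f). The following minimal Python sketch illustrates the idea; the reference bar length and frequency, and the choice of an equal-tempered ten-step scale, are hypothetical and not taken from the paper.

```python
# Sketch: estimating bar lengths for a scale of target frequencies,
# assuming the fundamental of a bar of fixed width and thickness
# scales as f proportional to 1/L^2 (hence L proportional to 1/sqrt(f)).
# The reference length/frequency values below are hypothetical.

REF_LENGTH_MM = 300.0   # hypothetical reference bar length (mm)
REF_FREQ_HZ = 440.0     # hypothetical frequency produced by that bar (Hz)

def bar_length_mm(target_freq_hz: float) -> float:
    """Bar length needed for target_freq_hz, given f proportional to 1/L^2."""
    return REF_LENGTH_MM * (REF_FREQ_HZ / target_freq_hz) ** 0.5

# A ten-sound scale, here taken as equal-tempered semitone steps:
scale = [REF_FREQ_HZ * 2 ** (i / 12) for i in range(10)]
lengths = [round(bar_length_mm(f), 1) for f in scale]
print(lengths)  # monotonically decreasing lengths for rising pitch
```

Doubling the frequency two octaves up (quadrupling f) halves the required length, which is why even a coarse reference measurement from a real xylophone suffices as a guide.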
Visual stimuli. Prior to hearing the struck bar, participants were presented with a green circle positioned in the centre of the screen. This
served as a focus for the mouse pointer and ensured the pointer was equidistant from all numbers at the start of each trial. At the same
time as the sound was presented, five numbers were displayed in a circular configuration surrounding the area previously occupied by
the circle. The positions of the numbers were altered in each block to control for spatial organisation.
Apparatus. The experiment was designed and conducted using Powerlaboratory (Chute & Westall, 1996) and was run on one of two
Apple Macintosh computers; a Power Macintosh 7300/200 and a Power Macintosh G3.
Procedure
Prior to testing, participants were given a brief summary of the experimental procedure. The participants were prompted to position the
mouse on their favoured side and were fitted with headphones.
The experimental session was then initiated, with a more detailed set of instructions provided on screen. The instructions varied
according to the type of listening that the condition was attempting to induce. Participants assigned to the everyday listening group were
told that they would be presented with the sound of a struck pipe, and that this sound would be paired with the length of the pipe in
millimeters. Those in the musical listening condition were told that they would hear a sound and that it would be associated with a label.
The paired-associate learning task required participants to select (using the mouse) the appropriate number after hearing a sound.
Discussion
The results support the four experimental hypotheses. Specifically, nomic mappings were learned significantly better than symbolic
mappings, but the advantage was restricted, as predicted, to the immediate phase of the everyday listening group. Figure 2 shows that the
largest discrepancy in performance between nomic and symbolic mappings occurred on the first block of the everyday listening
condition. This observation endorses the notion that nomic mappings were more intuitive than symbolic mappings.
An ecological perspective provides one explanation of these findings. The nomic mapping of pitch-size afforded useful information to
participants about the length of the struck bar. A biologically-based explanation is that humans, possibly through evolution, have adapted
to associate, relatively easily, the pitch of a sounding object with size. Such a combination represents a nomic mapping because it
conforms to unchanging physical laws or states of affairs. These conditions have accompanied humans throughout history, and
phylogenetic imprinting of these laws may provide the basis for direct event perception. However, the present experiment used adult
participants highly familiar with relations between sounds and object size and, as a result, cannot rule out the possibility that the
facilitatory effects of nomic mappings are the result of experience.
Implications for Design of Auditory Icons
The present study demonstrates that adults can determine the relative length of a struck bar from the acoustic quality of pitch.
Importantly, this finding need not be restricted to impact sounds. Gaver (1993a, 1993b) suggests that more complex sound-producing
events may be reducible to a series of impacts. For example, Gaver proposed that scraping involves multiple impacts as the moving
object falls into depressions and hits raised ridges. As a result, the current research findings should generalise to a large number of
events.
The learning advantages of nomic mappings are not necessarily confined to pitch. Damping indicates the material of a struck object, and
amplitude, with its perceptual correlate of loudness, affords information about the proximity of an event and the force of the interaction
(Gaver 1993a). Warren and Verbrugge (1984) have illustrated the role of temporal properties during event perception. The challenge for
ecological psychology is to investigate the extraction of such relevant features from the soundwave and to map each component to the
invariant information that it affords. The resultant taxonomy of sound-event mappings would provide a framework for understanding
realised and potential meaning inherent in the acoustic array.
Results of the present study also imply that nomic mappings, as illustrated with pitch and object size, provide significant initial learning
advantages over symbolic mappings. Although the advantages are realised only during the initial stages of learning, such intuitive
mappings may minimise the need for training. If invariant relations are mapped together then the meaning of an auditory icon should be
obvious, learned quickly, and resistant to extinction.
Limitations and Future Directions
Criticisms of the current experimental research may question extrapolation to perception in the real world. For instance, the reductive
psychophysical analysis employed in the present study reduced the auditory stimuli to mere caricatures of everyday sounds. Importantly,
identification of the elements crucial for event perception was what prompted Gaver (1993a) to devise algorithms for synthesising
auditory stimuli. His reasoning was that if an artificial sound manages to produce an accurate identification of the desired
sound-producing event, then the essential spectral components have been included. It is also likely that the proposed one-to-one
relationship between pitch and object size is more complex in real world settings. Frequency is known to reflect the material and shape
of an object, as well as its size (Gaver, 1993a). However, when the attributes of shape and material are held constant, there is probably a
direct relationship between pitch and size (Gaver, 1993a). Thus, despite the risk of corrupting the natural listening experience, a
controlled laboratory experiment was considered appropriate for examining the predictions of the current study.
Further exploratory, empirical research is still required in the relatively new field of ecological acoustics. First, invariant acoustic
properties need to be examined and mapped to source related information. For instance, Gaver's (1993a) predictions regarding
sound-event mappings remain to be investigated. Pilot testing during operationalisation of the nomic and symbolic categories during the
present study employed the Garner interference paradigm, which appears to hold potential for the future investigation of sound-event
mappings. Second, while the present results are consistent with notions of direct perception and preparedness, they provide definitive
evidence for neither. Developmental and even comparative studies are needed to identify the basis for the advantage of nomic mappings.
Finally, it would be of significant theoretical interest to examine the possibility of generalising these concepts to linguistic and musical
domains. Interestingly, research into infant-directed speech suggests that prosodic cues, such as melodic contour and rhythm, rather than
semantic content, communicate intent to young infants via direct manipulation of instinctive physiological responses (Fernald, 1989). It
would be most interesting to examine whether invariant acoustic relations maintain their affordances in various auditory contexts from
language to music.
References
Aslin, R. N., & Smith, L. B. (1988). Perceptual development. Annual Review of Psychology, 39, 435-473.
Ballas, J. A. (1993). Common factors in the identification of an assortment of brief everyday sounds. Journal of
Experimental Psychology: Human Perception and Performance, 19(2), 250-267.
Proceedings paper
Table of Contents
● Background
● Aims
● Main Contribution
❍ The Model ARTIST
❍ Melodic Expectancies
❍ Stylistic Expectancies
❍ Implications
❍ References
❍ Figures
Background
It is often not until we are faced with an unfamiliar musical style that we fully realize the importance of the musical mental schemata gradually acquired through our past listening experience. These cognitive structures automatically intervene as music is heard, and they are necessary to build integrated, organized percepts from acoustic sensations. Without them, as when listening to a piece in a musical style foreign to our experience, a flow of notes is like a flow of words in a foreign language: incoherent and unintelligible. The impression is that all pieces or phrases sound more or less the same, and musical styles such as Indian raga, Chinese guqin or Balinese gamelan are often described as monotonous by Western listeners new to these kinds of music. This happens to experienced, musically trained listeners as well as to listeners with no musical experience other than listening itself. It is thus clear that the mental schemata required to interpret a certain kind of music can be acquired through gradual acculturation (Francès, 1988), the result of passive listening in the sense that it requires no conscious effort or attention directed towards learning. This is not to say that formal training has no influence, only that it is not necessary and that exposure to the music is sufficient.
Becoming familiar with a particular musical style usually implies two things: (1) the memorization of particular melodies, and (2) an intuitive sense of the prototypicality of musical sequences relative to that style (e.g., the sense of tonality in the context of Western music). These underlie two kinds of expectancies: melodic and stylistic. Melodic (also called 'veridical') expectancies rely on the listener's familiarity with a particular melody, and refer to knowing which notes will come next after hearing part of it. Stylistic expectancies rely on the listener's familiarity with a particular musical style, and refer to the sense of which notes should, or will probably, follow a passage for the piece to fit that style. These expectancies can be probed in different ways: for instance with Dowling's (1973) recognition task of familiar melodies interleaved with distractor notes, and with Krumhansl and Shepard's (1979) probe-tone technique, respectively.
Aims
Some connectionist models of tonality have been proposed before, but they are rarely realistic in that they often use a priori knowledge from the musical domain (e.g., octave equivalence) or are built without going through learning (Bharucha, 1987; extended by Laden, 1995). This paper presents an Artificial Neural Network (ANN), based on a simplified version of Grossberg's (1982) Adaptive Resonance Theory (ART), to model the tonal acculturation process. The model does not presuppose any musical knowledge except the categorical perception of pitch for its input, which is a research problem in itself (Sano and Jenkins, 1989) and beyond the scope of this paper. The model develops gradually through unsupervised learning: it needs no information other than that present in the music to generate the schemata, just as humans do not need a teacher. Gjerdingen (1990) used a similar model for the categorization of musical patterns, but did not aim at checking the cognitive reality of these musical categories. Page (1999) also successfully applied ART2 networks to the perception of musical sequences. The goal of the present paper is to show that this simple and realistic model is cognitively pertinent, by comparing its behaviour directly with that of humans on the same tasks. As mentioned in the previous section, these tasks were chosen because they are robust, having stood the test of time, and because they reflect broad and fundamental aspects of music cognition.
Main contribution
The Model ARTIST
The ART2 self-organizing ANN (Carpenter and Grossberg, 1987) was developed for the classification of analogue input patterns and is well suited to music processing. It is somewhat more complex than what is needed here, so a few simplifications were made to build the present model, ARTIST (Adaptive Resonance Theory to Internalize the Structure of Tonality). ARTIST is made up of two layers (or fields) of neurons, the input field (F1) and the category field (F2), connected by synaptic weights that play the role of both Bottom-Up and Top-Down connections. Learning occurs through the modification of the weights, which progressively tune the 'category units' in F2 to be most responsive to a certain input pattern (the 'prototype' for that category). The weights store the long-term memory of the model.
❍ Input Layer
The neurons in F1 represent the notes played. For now the model is tested only with conventional Western music, so an acoustic resolution of one neuron per semitone is sufficient to code the musical pieces used. This is the only constraint the assumption of Western music imposes on the model, and it can easily be overridden simply by changing the number of input nodes. Bach's 24 preludes from the Well-Tempered Clavier were used for learning. The notes they contain span 6 octaves; with 12 notes per octave, 72 nodes are needed to code the inputs. The activation of the inputs is updated at the end of every measure. Each note played within the last measure activates its corresponding input neuron proportionally to its loudness (or velocity; notes falling on beats 1 and 3 were accentuated) and according to an exponential temporal decay (activation is halved every measure). Before the activation is propagated to F2, the activation in F1 is normalized. Each prelude was transposed into the 12 possible keys, so 288 pieces were available for training ARTIST.
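As a rough sketch, the measure-by-measure input coding described above might look as follows. The function name, note indices and velocity values are illustrative assumptions; the paper specifies only the 72-node layout, the halving decay, the loudness-proportional activation and the normalization.

```python
import numpy as np

N_NODES = 72  # 6 octaves x 12 semitones per octave

def update_input(prev_activation, notes_in_measure):
    """One update of the F1 input field at the end of a measure.

    notes_in_measure: list of (node_index, velocity) pairs; velocity is
    assumed to be already accentuated for notes on beats 1 and 3.
    """
    a = prev_activation * 0.5           # exponential decay: halved every measure
    for node, velocity in notes_in_measure:
        a[node] += velocity             # activation proportional to loudness
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a  # normalize before propagation to F2

# Toy example: two notes sounded in the current measure
a1 = update_input(np.zeros(N_NODES), [(24, 1.0), (28, 0.8)])
```

The decay keeps recently heard notes active across measure boundaries, so the F1 pattern reflects a short temporal context rather than a single measure.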
❍ Classification Layer and Learning
Melodic Expectancies
When we are very familiar with a melody, we can usually still recognize it after various transformations such as transposition or rhythmic and tonal variations. This is not the case when distractor (random) notes are added in between the melody notes: even the most familiar tunes become unrecognizable as long as the distractors 'fit in' (i.e., no primary acoustic cue such as frequency range, timbre or loudness segregates them; Bregman, 1990). However, when given a few possibilities regarding the identity of the melody, listeners can positively identify it (Dowling, 1973). This means that Top-Down knowledge can be used to test hypotheses and categorize stimuli. For melodies, this knowledge takes the form of a pitch-time window within which the next note should occur, and it enables the direction of auditory attention (Dowling, Lung & Herrbold, 1987; Dowling, 1990). As the number of possibilities offered to the subject increases, the ability to name the tune decreases: when Top-Down knowledge becomes less focused, categorization gets more difficult. With its built-in mechanism of Top-Down activation propagation, ARTIST can be subjected to the same task.
❍ Method
To make ARTIST very familiar with the first 2 measures of 'Twinkle twinkle little star', the learning rate and vigilance were set to their maximum (equal to 1 for both), so that the learning procedure would create two new F2 nodes to memorize those two exemplars and act as labels for the tune. Had the vigilance level been too low, the tune would have been assimilated into an already existing category, whose activation could not have been interpreted as recognition of the tune. After learning the tune, the activation in F2 was recorded under 5 conditions, corresponding to the presentation of:
■ The original melody alone
■ The melody with distractors, testing only the 'Twinkle' hypothesis
■ The melody with distractors, testing multiple hypotheses
■ The melody with distractors, with no Top-Down activation
■ The distractors alone, without the melody (control)
The control condition is necessary to make sure that testing the hypothesis that the tune is 'Twinkle twinkle...' by activating the label nodes does not always provoke false alarms.
For each condition, the activation ranks of the 2 label nodes were computed (1 for the most activated node, 2 for the next, and so on) and summed. A low sum indicates few categories competing and interfering with the recognition, and a probable "Yes" response to the question "Is this Twinkle twinkle?". As the sum of the ranks increases, meaning the label nodes are overpowered by other categories, the response tends towards "No".
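The summed-rank measure can be sketched as below; this is a hypothetical reconstruction, and the function name and toy activation values are not from the paper.

```python
import numpy as np

def summed_rank(f2_activation, label_nodes):
    """Sum of the activation ranks of the label nodes (1 = most activated).

    A low sum means the label categories dominate the F2 field,
    i.e. a probable "yes, this is the tune" response.
    """
    order = np.argsort(-f2_activation)         # indices by decreasing activation
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(1, len(order) + 1)
    return int(sum(rank[n] for n in label_nodes))

# Toy F2 field with 6 categories; nodes 0 and 1 are the tune's label nodes
act = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.05])
s = summed_rank(act, [0, 1])  # 1 + 2 = 3, the smallest possible sum
```

Ranks rather than raw activations are used because, with many categories, small activation differences can translate into large rank differences, sharpening the contrast between conditions.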
❍ Results
Results appear in Figure 1. The straight presentation of the tune results in the smallest possible summed rank, equal to 3: ARTIST recognizes the melody unambiguously. When the melody is presented with distractors, the ranks are higher, indicating some difficulty in recognition. Among these 3 conditions, the lowest ranks are found when testing the 'Twinkle' hypothesis and only that one. The label nodes are amongst the top 5 most activated, which suggests a strong possibility of identifying the melody. Identification gets much more difficult when testing multiple hypotheses, about as difficult as without Top-Down activation (no explicit hypothesis being tested), exactly as when human subjects are given no clue about the possible identity of the melody. Finally, the control condition shows that ARTIST does not imagine recognizing the melody amongst distractors when it is not there, even after priming the activation of its notes through Top-Down propagation.
Equivalent results can be obtained by summing the activations of the label nodes instead of computing their ranks. However, given the large number of categories, small differences in activation (especially in the middle range) imply strong differences in ranks, and therefore the latter measure was preferred to exhibit the contrast between conditions. In any case, the ordering of the conditions by likelihood of recognition of the familiar melody is the same for ARTIST and humans, and the effects of melodic expectancies can easily be observed in ARTIST.
Stylistic Expectancies
The most general and concise characterization of tonality (and therefore of most Western music) probably comes from the work of Krumhansl (1990). With the probe-tone technique, she empirically quantified the relative importance of pitches within the context of any major or minor key, by what is known as the 'tonal hierarchies'. These findings are closely related to just about every aspect of tonality and of pitch use: frequency of occurrence, accumulated durations, aesthetic judgements of all sorts (e.g., pitch occurrence, chord changes or harmonization), chord substitutions, resolutions, and so on. Many studies support the cognitive reality of the tonal hierarchies (Jarvinen, 1995; Cuddy, 1993; Repp, 1996; Sloboda, 1985; Janata and Reisberg, 1988). All of this suggests that subjecting ARTIST to the probe-tone technique is a good way to test whether it has extracted a notion of tonality (or its usage rules) from the music it was exposed to, or at least elements that enable a reconstruction of what tonality is.
❍ Method
The principle of the probe-tone technique is quite simple. A prototypical sequence of chords or notes is used as musical context, to establish a sense of key. The context is followed by a note, the probe tone, which subjects have to rate on a scale reflecting how well the tone fits within this context. Repeating this procedure for all 12 possible probe notes generates the tone profile of the given key. Out of the many types of contexts used by Krumhansl et al. over several experiments, the 3 standard ones were retained to test ARTIST: for each key and mode (major and minor), the corresponding chord, the ascending scale and the descending scale were used as contexts. Several keys are used so the results do not depend on the choice of a particular reference pitch. Here all 12 keys are used (as opposed to 4 by Krumhansl), and ARTIST's profile is obtained by averaging the profiles obtained with the 3 contexts for each key, after transposition to a common tonic. The tone profile obtained for each mode is thus the result of 432 trials (3 contexts × 12 keys × 12 probes). After each trial, the activations of the F2 nodes were recorded. Following Katz's (1999) idea that a network's total activation relates directly to pleasantness, the sum of all activations in F2 is taken as ARTIST's response to the stimulus, the index of its receptiveness/aesthetic judgement towards the musical sequence.
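The averaging of profiles over contexts and keys, after transposition to a common tonic, can be sketched as follows; the array layout and function name are assumptions, not from the paper.

```python
import numpy as np

N_CONTEXTS, N_KEYS, N_PROBES = 3, 12, 12

def tone_profile(responses):
    """Average probe-tone profile over contexts and keys.

    responses[c, k, p]: total F2 activation after context c in key k,
    followed by probe tone p.  Each key's profile is transposed
    (rotated) to a common tonic before averaging.
    """
    profile = np.zeros(N_PROBES)
    for c in range(N_CONTEXTS):
        for k in range(N_KEYS):
            profile += np.roll(responses[c, k], -k)  # align tonic with index 0
    return profile / (N_CONTEXTS * N_KEYS)           # mean over 36 profiles
```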
In the first studies using the probe-tone technique, it appeared that subjects judged the fitness of the probe tone more as a function of its distance in acoustic frequency from the last note played than as a function of tonal salience. This follows naturally from the usual structure of melodies, which favours small steps between consecutive notes. The problem was circumvented by using Shepard tones (Shepard, 1964). These are designed to preserve the pitch identity of a tone while removing the cues pertaining to its height, and their use to generate ever-ascending scale illusions proves that they indeed possess this property. Shepard tones are produced by generating all the harmonics of a note, filtered through a bell-shaped amplitude envelope. To simulate Shepard tones for ARTIST, the notes are played simultaneously in all 6 octaves, with different velocities (loudness) according to the amplitude filter: high velocities for the middle octaves, decreasing as the notes approach the boundaries of the frequency range.
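The Shepard-tone simulation described above might be coded as below; the Gaussian envelope and its width are assumptions, since the paper says only 'bell-shaped'.

```python
import numpy as np

N_OCTAVES, SEMITONES = 6, 12

def shepard_input(pitch_class, sigma=1.5):
    """Shepard-tone-like input: one pitch class in all 6 octaves, with a
    bell-shaped velocity envelope peaking in the middle octaves.
    """
    octaves = np.arange(N_OCTAVES)
    centre = (N_OCTAVES - 1) / 2.0
    velocities = np.exp(-((octaves - centre) ** 2) / (2 * sigma ** 2))
    a = np.zeros(N_OCTAVES * SEMITONES)
    a[octaves * SEMITONES + pitch_class] = velocities
    return a
```

Because every octave carries the same pitch class, the input conveys chroma without a single well-defined height, mirroring the role Shepard tones play for human subjects.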
❍ Results
Figures 2 and 3 allow direct comparison of the tone profiles obtained from human data (Krumhansl and Kessler, 1982) and from ARTIST, for major and minor keys respectively. Both Pearson correlation coefficients between the human and ARTIST profiles are significant: -.95 and -.91 respectively, p<.01 (2-tailed). Surprisingly, the correlations are negative, so ARTIST's profiles are inverted in the figures for easier comparison with the human data; this is discussed in the next section. Once the tone profile for a particular key is available, the profiles of all other keys can be deduced by transposition. The correlation between two different key profiles can then be computed as a measure of the distance between the two keys. This is the procedure Krumhansl used to obtain all the inter-key distances, and the same was done with the ARTIST data, to check whether its notion of key distances conforms to that of humans. Both graphs of the distances between C major and all minor keys are shown in Figure 4. Keys on the X-axis appear in the same order as around the circle of fifths. It is immediately apparent that the two profiles are close to identical. This is even more true for the key distances between C major and all major keys, as well as for those between C minor and all minor keys: the correlations between human and ARTIST data for major-major, minor-minor and major-minor key distances are respectively .988, .974 and .972, all significant at p<.01. ARTIST thus clearly emulates human responses on the probe-tone task, and can therefore be said to have developed a notion of tonality, with the tonal invariants extracted directly from the musical environment.
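Krumhansl's transposition-and-correlation procedure for inter-key distances can be sketched with the published Krumhansl & Kessler (1982) C major profile; the function name is an illustrative assumption.

```python
import numpy as np

# Krumhansl & Kessler (1982) C major tone profile (C, C#, D, ..., B)
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def key_distances(profile):
    """Correlation of a tone profile with its 12 transpositions, used as
    a measure of inter-key distance (Krumhansl's procedure).
    profile: 12-element tone profile with the tonic at index 0.
    """
    return np.array([np.corrcoef(profile, np.roll(profile, k))[0, 1]
                     for k in range(12)])

d = key_distances(KK_MAJOR)
# The closest major keys to C are G and F, a fifth away on either side,
# as expected from the circle of fifths.
```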
Implications
From the two simulations above we can see that it is easy to subject ARTIST, in a natural way, to the same musical tasks as given to humans, and that it approximates human behaviour very closely on these tasks. When probed with the standard techniques, it shows both melodic and stylistic expectancies, the two main aspects of musical acculturation. ARTIST learns unsupervised, and its knowledge is acquired from exposure to music alone, so it is a realistic model of how musical mental schemata can be formed. The implication is that all that is needed to accomplish these complex musical processes and to develop mental schemata is a memory system capable of storing information according to similarity and of abstracting prototypes from similar inputs, while constantly interpreting the inputs through the filter of Top-Down (already acquired) knowledge. From the joint action of the mental schemata results a musical processing sensitive to tonality. This property emerges from the internal organisation of the neural network; it is distributed over its whole architecture. It can thus be said that the structure of tonality has been internalized. Testing ARTIST with other musical styles could further establish it as a general model of music perception. In the simulation of the probe-tone task, ARTIST's response has to be recorded before any lateral inhibition occurs in F2; otherwise, the sum of all activations in F2 would simply be that of the winner, all others being null, and much information regarding ARTIST's reaction would be lost. This takes one step further Gjerdingen's (1990) argument for using ANNs, namely that cognitive musical phenomena are probably too
References
Bharucha, J.J. (1987). Music cognition and perceptual facilitation: A connectionist framework. Music Perception, 5(1), 1-30.
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Carpenter, G.A., & Grossberg, S. (1987). ART2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.
Cuddy, L.L. (1993). Melody comprehension and tonal structure. In T.J. Tighe & W.J. Dowling (Eds.), Psychology and music: The understanding of melody and rhythm. Hillsdale, NJ: Erlbaum.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. (1990). Expectancy and attention in melody perception. Psychomusicology, 9(2), 148-160.
Dowling, W.J., Lung, K.M.T., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception and Psychophysics, 41, 642-656.
Francès, R. (1988). La perception de la musique (W.J. Dowling, Trans.). Hillsdale, NJ: Erlbaum. (Originally published 1954, Librairie philosophique J. Vrin, Paris.)
Gjerdingen, R.O. (1990). Categorization of musical patterns by self-organizing neuronlike networks. Music Perception, 7, 339-370.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition and motor control. Boston: D. Reidel/Kluwer.
Janata, P., & Reisberg, D. (1988). Response-time measures as a means of exploring tonal hierarchies. Music Perception, 6(2), 161-172.
Jarvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, 12(4), 415-437.
Katz, B.F. (1999). An ear for melody. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 199-224). Cambridge, MA: MIT Press.
Krumhansl, C.L. (1990). The cognitive foundations of musical pitch. Oxford Psychology Series, No. 17.
Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C., & Shepard, R. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579-594.
Laden, B. (1995). Modeling cognition of tonal music. Psychomusicology, 14, 154-172.
Page, M.P.A. (1999). Modelling the perception of musical sequences with self-organizing neural networks. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 175-198). Cambridge, MA: MIT Press.
Repp, B.H. (1996). The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14(2), 161-184.
Sano, H., & Jenkins, B.K. (1989). A neural network model for pitch perception. Computer Music Journal, 13(3).
Shepard, R.N. (1964). Circularity in judgments of relative pitch. The Journal of the Acoustical Society of America, 36, 2346-2353.
Sloboda, J.A. (1985). The musical mind. Oxford Psychology Series, No. 5.
Figures
From top to bottom:
Figure 1. Summed ranks of the 2 label nodes for 'Twinkle twinkle' as a function of the stimulus played and the hypothesis tested.
Figure 2. Comparison of the ARTIST and Krumhansl & Kessler (1982) C major tone profiles (correlation = .95).
Figure 3. Comparison of the ARTIST and Krumhansl & Kessler (1982) C minor tone profiles (correlation = .91).
Figure 4. Comparison of the ARTIST and Krumhansl & Kessler (1982) inter-key distances between C major and all minor keys (correlation = .972).
Proceedings paper
Unresolved Issues in Continuous Response Methodology: The Case of Time Series Correlations
Emery Schubert
School of Music and Music Education
University of New South Wales
Sydney 2052 NSW
AUSTRALIA
Phone: +61-2-9385 6808
Fax: +61-2-9313-7326
Email: E.Schubert@unsw.edu.au
ABSTRACT
Background: While continuous response methodologies have become increasingly popular among researchers of
emotional response to music, the literature is very light on critical analysis of the methodology.
Aims: This paper investigates the common formats in which the methodology has appeared: open-ended, checklist, and
rating scale; the kinds of problems for which it has been used: validation, comparison, the relationship between
stimulus and response, and the dynamic lag structure of the music-emotion system; and the analytic techniques
which have been applied: interoccular tests, correlation analysis, repeated measures approaches, and traditional time
series analytic techniques.
Main Contribution: The most popular continuous response format is the rating scale; however, there is little
experimental evidence to support the reliability of this format over the checklist or the open-ended format. Also unclear
are the kind of rating scale to use (unipolar or bipolar), the number of scales to use simultaneously (one, two, or three),
the response sampling rate, and the label identifiers of the scales. Another serious problem facing continuous response
research is the analysis of data. While time-series textbooks have for a long time warned against the use of visual
inspection as the sole method of analysing continuous data, the literature is riddled with conclusions based on just such
a technique.
Implications: In this paper I argue that elementary methods of time series analysis can be applied by researchers to
produce a more valid basis for investigating their data. I also argue that continuous response methodology in
music-emotion research is in its infancy, as evidenced by the large proportion of validation and comparative studies.
If and when the methodology matures, its most beneficial application will be in helping to understand the dynamic
structure of the music-emotion system, and not so much the understanding of basic stimulus-response relationships,
which traditional asynchronous approaches can do more efficiently.
FULL PAPER
Introduction
A common problem in studying emotional responses to music is that of lacking ecological
(naturalistic) validity. In the typical study, a listener will hear an excerpt of music and at the end of the
excerpt he or she will be asked to indicate the emotion expressed by the music or experienced by the
listener (e.g., Gabrielsson & Juslin, 1996; Heinlein, 1932). This is a highly efficient way of collecting
data on emotional response to music. However, such instantaneous responses cannot tap into the
subtle patterns of emotion which change from moment to moment through the course of a listening.
For example, they cannot provide information about the lag structure between one response and
another, or between response and stimulus (Schubert & Dunsmuir, 1999).
One of the remedies for this problem is to measure responses to the musical stimulus continuously.
Instead of making a response at the end of an excerpt, the individual is continually assessing the
expressed or experienced emotion during listening. Such continuous responses enable the researcher
to build up a profile of the relationship between the stimulus and the response within a more realistic
musical context and psychologically valid framework. However, this methodology brings with it a
range of problems, many of which are yet to surface.
In this presentation I will mention methodological issues and concerns of which researchers using
continuous response devices should be aware. I will then focus on one such problem, specifically the
question of correlating comparative time series data.
General Methodological Issues
Two broad categories of problems in continuous response methodology are the response task
requirement and the analysis of continuous response data. An example of the response task problem is
that concentrating continuously on the response task is itself unnaturalistic. This should be a cause
for concern, for it contradicts the initial motivation for the methodology (ecological validity). However,
continuous response researchers have found what appear to be reasonably adequate solutions to the
problem. A typical solution is to make the continuous task simple by having responses made on a
single scale, such as amount of emotion (Krumhansl, 1998), tension (Madsen & Fredrickson, 1993;
Nielsen, 1983) or aesthetic experience (Madsen, Brittin & Capperella-Sheldon, 1993), which a
computer samples automatically in the background.
More serious are the issues regarding the analysis of continuous, time-series data. Many researchers of
emotional response to music who have chosen to adopt continuous response methodologies have yet
to come to terms with the issues that are pertinent in time series data analysis (Schubert, in press).
Amongst the analytic issues there are problems which lie on either side of a spectrum of
methodological issues (Figure 1). On one pole a large amount of continuous data is obtained but the
point of the collection is not immediately apparent (I call the extreme of this pole 'no analysis'). For
example, if a researcher is going to collect time series data and then report on the time-average
(perhaps because he or she cannot find an appropriate way to analyse the data in its time series form),
the researcher should consider whether the extra effort in collecting continuous data was worthwhile.
On the other end of the spectrum, analysis is often applied which is appropriate for parametric data,
but not for serially correlated data (this is a problem of using parametric inferential statistical
analysis). For example, Analysis of Variance is generally not an appropriate form of analysis for time
series data because the assumption of independent, normally distributed data is usually violated
(Gibbons, 1993). Somewhere along this spectrum lies the most common problem of emotional
response studies which analyse continuous responses: the interpretation of a visual inspection of the
time series. Gottman (1981) refers to this as an 'interoccular test' and warns against the use of this
descriptive approach as the sole means of analysing time series data. Many of these issues are well
known, particularly in the fields of Economics, Geography and Engineering, and there exists firmly
grounded literature explaining and correcting these problems (e.g., Box & Jenkins, 1976; Hamilton, 1994).
In this presentation I will focus on one issue: the use of Pearson's product-moment correlation
technique for comparing two or more time series.
Figure 1. Spectrum of problems associated with time series data analysis of emotional
response literature.
Many of the studies investigated do not attempt to support their findings by falsification (Stanovich, 1998). They tend to
report positive relationships (significant correlations) without examining correlations that should not
be significant. Consequently, these researchers cannot know whether their significant
correlations (and they almost always are, or appear to be, significant) are meaningful, or whether they
are in fact false correlations that have appeared due to underlying serial correlation. (A meaningful
correlation would give information about the music stimulus, not just the measuring instrument.) The
assumption of independent sampling behind the Pearson product-moment correlation is quite likely
violated in time series data.
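The danger can be illustrated with a small simulation that is not from the paper: two independent random walks, which are strongly serially correlated like continuous emotion responses, often show a large Pearson correlation by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two *independent* random walks, i.e. strongly serially correlated
# series, the length of a 3-minute response sampled once per second.
n = 180
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))

r = np.corrcoef(x, y)[0, 1]
# |r| is frequently large even though x and y share no common cause:
# the nominal p-value of the Pearson test assumes independent samples,
# an assumption these series violate.
```

Running this for many seed pairs shows that |r| > 0.5 occurs in a sizeable fraction of trials, whereas for white noise of the same length it almost never does; this is the classic spurious-correlation problem for serially correlated data.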
A particular study prompted me to investigate the situation with correlations and time series data
further. Fredrickson (1997) reported the results of continuous tension responses as sample-by-sample
mean responses from 2nd grade, 5th grade, 8th grade, 11/12 grade students, professional and
non-professional musician listeners. All groups were highly correlated with one-another. Fredrickson
ranked the coefficients to demonstrate the strength of similarity between the various groups. He then
reported that "of particular note is that the correlations, including the lowest one of .71 between the
second graders and the musicians, were all significant at the [sic] p = 0.001 level" (p. 630). However,
if the correlation of 0.71 is inflated, or is a false correlation due to underlying serial correlation, it
leaves the way open for a flood of studies to report incorrectly high correlation coefficients and to treat
them as meaningful results. While I believe that something like this is already happening in the
literature, I do not argue that Fredrickson's analysis is necessarily wrong. Instead I
felt that the use of correlation analysis of time series emotional response data required some
investigation.
In this paper I present some data which address the issue of correlation coefficients for serially
correlated, time-series responses. I do not claim to find a solution to the problem, but I do intend to
make researchers cognisant of issues concerning the application of correlation coefficients.
Monte Carlo Study
Using a sample of data from a study by Schubert (1999a), I constructed a pseudo-Monte Carlo study
to investigate the behaviour of the correlation coefficient calculation. The study is pseudo-Monte
Carlo because I did not select data from a predetermined distribution (Mooney, 1997). Instead, I used
actual time series data which were collected in the form of emotional responses to music. The data
come from continuous responses to three pieces of music: Edvard Grieg's ‘Morning Mood' from Peer
Gynt, Joaquin Rodrigo's Adagio movement from the Concierto de Aranjuez for Guitar and Orchestra,
and Antonin Dvorak's Slavonic Dance No. 1, Op. 46. For each piece there existed two bipolar time series
responses: the arousal response (the amount of arousal or sleepiness expressed by the music) and the
valence response (the amount of happiness or sadness expressed by the music). Each response was
recorded by computer once per second on a scale of -100 to +100 for arousal and valence. Over
seventy participants' data were available for each piece and emotional response dimension.
Hypothesis
In a Monte Carlo-type study, the hypothesis is assumed to be true, and the data are evaluated
according to how well they fit the hypothesis (cf. Mooney, 1993). In the present study, the hypothesis is
that responses by different participants will be correlated for the same dimension and piece of music,
and in all other conditions they will be uncorrelated (falsification). For example, all subjects' arousal
responses to Morning will be correlated; however, their valence responses to Morning, or to any other
piece (or dimension), will not be correlated with this arousal response.
Method
For the present study I randomly sampled 16 responses from each of the three pieces. The first 200
seconds of each piece was selected. For simplicity and to conserve space, I will make reference to a
subset of five responses from each piece, but the processes and findings discussed apply to the entire
sample of sixteen.
Analysis
The data were factor analysed using a six-factor varimax-rotated solution in SPSS 6.1.1 for the
Macintosh. The analysis was conducted using the original data sets. A second analysis was
conducted using the first-order differenced data sets. Differencing examines changes in responses
rather than absolute responses, and is a technique used to reduce serial correlation in time series data
(Gottman, 1981). By subtracting each sample from the one that follows it (in time), a first-order
difference series is generated. This series corresponds to the gradient of the original series.
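As a minimal sketch (the response values here are invented, not taken from the study), the transformation is simply:

```python
import numpy as np

# A made-up continuous-response fragment on the -100..+100 scale,
# sampled once per second.
arousal = np.array([10, 14, 19, 23, 22, 20, 25])

# First-order difference: each sample minus the one before it,
# i.e. the second-by-second change (gradient) of the response.
d_arousal = np.diff(arousal)

print(d_arousal.tolist())  # [4, 5, 4, -1, -2, 5]
```

Note that the differenced series is one sample shorter than the original.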
A sample of the factor loadings for each analysis is shown in Table 1. Only factor loadings above
.4 are considered. Relative to the undifferenced data, the differenced data are much closer to the
expected model stated by the hypothesis. When the data are differenced, they tend to load fairly neatly
onto separate factors grouped by the response dimension (Arousal or Valence) and musical item. For
example, the first-order differenced arousal responses to the Dvorak (responses labelled A_Dxx in
Table 1) load onto factor 3 for each of the five participants shown. When the same data are not
differenced, there is still a grouping of the data but, in the case of the Dvorak arousal data, the loading
occurs on two factors (1 and 3), contrary to the hypothesis. Further, undifferenced data factors are
more frequent and more scattered.
Table 1 Factor loadings for undifferenced (untreated) data and differenced (serial correlation
adjusted) data.
Undifferenced Data Factors Differenced Data Factors
UF 1 UF 2 UF 3 UF 4 UF 5 UF 6 RS DF 1 DF 2 DF 3 DF 4 DF 5 DF 6
0.78 A_GAL
0.71 A_GAI
0.75 A_GCO
-0.46 V_D15
0.71 V_GAN
V_RJU 0.69
UF = Undifferenced Factor
DF = Differenced Factor
RS = Response Sample
A_ = Arousal response
V_ = Valence response
G = Grieg Morning
R = Rodrigo Adagio
D = Dvorak Slavonic Dance
The factor analyses provide evidence that serial correlation is present in the undifferenced data, and
suggest that a correlation between any pair of participants is more likely to yield a misleadingly
high coefficient than when the data are first-order difference transformed. For example, the
undifferenced arousal response to the Dvorak for any particular participant is likely to correlate with
another participant's arousal response to the same piece. This result is consistent with the hypothesis;
however, Table 1 also demonstrates that a significant correlation is likely to occur with any of the
other pieces or dimensions, because reasonably large loadings onto factors 1 and 3 exist for each of
the other examples. According to the hypothesis, this is an incorrect result.
The differenced data still posed some problems. Dvorak Valence and Dvorak Arousal load onto the
same factor for all but one of the participants. Further, some factor loadings have signs that are
inconsistent with the rest of their group. For example, factor 3 in the ‘Dvorak Valence' group has
two negative loadings and two positive loadings. (Note: Factor 4 contains no loadings, probably
because of sampling error and because only factor loadings greater than 0.4 are shown.) The first
problematic finding can be reconciled by a closer examination of the Dvorak
responses. For this piece the valence and arousal were more correlated than for other pieces (meaning
that the hypothesis requires correction, or that different-dimension responses to the same piece should
not have been compared). The second problematic finding could be explained by sampling error. With
such a small sample chosen for analysis (Monte Carlo studies are considerably larger) the effect of
sampling error becomes quite problematic (16 per group in the original study, five shown in Table 1).
The important point, however, is that the differenced responses are considerably better grouped than
the undifferenced responses.
Discussion and Conclusions
The Monte Carlo-type study demonstrates that Pearson product-moment correlations between time
series responses tend to be inflated and misleading. A better result was obtained when each time series
was first-order differenced. Differencing, in this case, removes some of the serial correlation from the
data. The amount of serial correlation in the data can be diagnosed by examining the autocorrelation
function, not discussed here (see Schubert & Dunsmuir, 1999). Another possible method of
controlling the inflation of the correlation coefficient and the possibility of false correlation is to use
more conservative correlation analyses such as Spearman's rho or Kendall's Tau (Howell, 1997).
However, the mathematical derivation of these methods is not based on principles of time series. My
own investigation of correlation coefficient matrices (again a Monte Carlo-like study on the above
data) demonstrated minimal reduction in the number of false correlations when data is undifferenced
(matrices not shown here to conserve space). Consequently, the conclusion drawn from the present
investigation is that it is appropriate to control serial correlation before calculating correlation
coefficients, and that a simple method of controlling serial correlation is to apply a
first-order-difference transformation to the data.
While there are many issues that are of concern to emotion-in-music researchers who adopt
continuous response methodologies, the present investigation and those discussed elsewhere (Beran
and Mazzola, 1999; Schubert, 1999b; Schubert & Dunsmuir, 1999) suggest that there are simple
techniques available for dealing with many of these matters. However, for continuous response
methodology to be a plausible alternative to conventional, more efficient approaches, researchers must
become aware of the issues and the solutions. In particular, the issue of serial correlation needs to
receive more consideration than is currently the case in the literature.
References
Beran, J. & Mazzola, G. (1999). Analysing musical structure and performance - A
statistical approach. Statistical Science, 14, 47-79.
Box, G. E. P. & Jenkins, G. M. (1976). Time series analysis: Forecasting and control
(Rev. ed.). San Francisco: Holden-Day.
Fredrickson, W. E. (1997). Elementary, middle, and high school perceptions of tension in
music. Journal of Research in Music Education, 45, 626-635.
Fredrickson, W. E. (1999). Effect of musical performance on perception of tension in
Gustav Holst's first suite in E-flat. Journal of Research in Music Education, 47, 44-52.
Frego, R. J. D. (1999). Effects of aural and visual conditions on response to perceived
artistic tension in music and dance. Journal of Research in Music Education, 47, 31-43.
back to index
Proceedings abstract
1. Background
Since the question "Is hearing all cochlear?" was posed some seven decades ago
(Tait, 1932), considerable evidence has amassed that the sacculus, an
organ of hearing in fish (Popper et al, 1982), has retained some acoustic
sensitivity throughout phylogeny (McCue and Guinan, 1995). In humans, myogenic
vestibular evoked potentials (MVEP) may be obtained from motorneurones
innervated by the vestibulo-spinal tract, particularly from the cervical region
of the spinal cord (Ferber-Viart et al, 1999). MVEP has been studied
principally as a non-invasive clinical tool for the evaluation of normal otolith
vestibular function, since traditional nystagmographic methods assess only
canal function. However, we have been interested in using MVEP as a window on
what possible 'auditory' function the acoustically sensitive sacculus may have.
2. Aim
3. Method
4. Results.
1. MVEP shows frequency tuning (Todd, Cody and Banks, in press), with a best
frequency between 300 and 350 Hz and a band-width of about 3 octaves.
This response is consistent with its being saccularly mediated,
particularly since we were able to model the selectivity by means of a
mass-spring-damper system with a Q of about 0.7.
2. MVEP can be obtained to 'natural' acoustic stimuli (Todd and Cody, 2000),
such as dance music, above about 90 dB SPL.
3. MVEP can be obtained to continuous sounds (Todd, in preparation), i.e. a
frequency-following response may be obtained with longer duration stimuli,
suggesting that acoustically evoked phase-locking takes place in the
saccular nerve, giving rise to the adaptation and inhibition characteristic
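The tuning claim in point 1 can be checked against a textbook second-order resonance. A band-pass response with the reported parameters (best frequency around 325 Hz, Q of about 0.7) peaks at the best frequency and, under a half-amplitude criterion, spans roughly three octaves; the transfer-function form here is my assumption, not necessarily the authors' model.

```python
import numpy as np

f0, Q = 325.0, 0.7  # parameters reported in the abstract

def gain(f):
    # Magnitude of a standard second-order band-pass resonance
    # (mass-spring-damper); normalised to 1 at the best frequency.
    r = f / f0
    return (r / Q) / np.sqrt((1 - r**2) ** 2 + (r / Q) ** 2)

f = np.linspace(50, 2000, 40000)
g = gain(f)

f_peak = f[np.argmax(g)]               # best frequency of the tuning curve
half = f[g >= 0.5]                     # half-amplitude (-6 dB) passband
octaves = np.log2(half[-1] / half[0])  # bandwidth in octaves

print(f"peak: {f_peak:.0f} Hz, half-amplitude bandwidth: {octaves:.1f} octaves")
```

With Q = 0.7 the resonance is broad and gentle, which is consistent with the roughly three-octave band-width the abstract reports.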
5. Conclusions
Given the above, there are a number of natural stimuli, including vocal
and musical sounds, where saccular acoustic sensitivity may play a role in
perception. Such sensations could be vestibular or 'auditory' since there
exists anatomical evidence of a saccular projection to the cochlear nucleus
(Burian et al, 1989). Further, such a mechanism could interact with a more
general sensory-motor mechanism (Todd, 1999) involved in rhythmic processing,
particularly given the input that the vestibular system has to the cerebellum.
Proceedings abstract
DEVELOPING PROCESS BASED INVENTING ACTIVITIES: A SPIDER'S WEB OF INTRIGUE AND
CREATIVITY
Mr Charles Byrne
c.g.byrne@strath.ac.uk
Background:
The Scottish system of examination and assessment of the secondary school music
curriculum places emphasis on the certification of children's achievements in
musical invention, which may be to the detriment of developing lifelong
learning skills, interests and enthusiasm. World Wide Web-based materials have
been created which provide teachers and pupils with the opportunity to develop
creative music making through exploration and experimentation (Spider's Web
Composing Lessons).
Aims:
This paper sets out the theoretical context for the Spider's Web Composing
lessons and the suggested methodologies related to self-directed learning,
reviews the progress to date in the creation of materials, and identifies
further questions and possible strategies for developing critical thinking
skills through musical composition and improvisation.
Main contributions:
In order to bring about change in the way that composing and improvising
activities are conceived within the music curriculum, more feedback is needed
from users of the world wide web based materials. Both formal and informal
evaluative techniques are being used to collect information on teacher, student
teacher and pupil views and attitudes to process based activities in composing
and improvising.
Implications:
Little evidence is available regarding the use of the spider's web composing
lessons other than statistics on the number of 'downloads' from the web server.
Evaluations and feedback will be reviewed in order to shape and inform the
development of musical activities within a 'knowledge unrestricted problem
environment'.
Proceedings abstract
fineberg@music.columbia.edu
Background:
The literature in the field of music cognition has shown that music is
processed as units of 'discourse' unfolding in time, forming larger scale
structures of various levels of complexity. Grouping into larger scale
structures might rely on computation of similarity or difference between units.
Studies on similarity between musical units have, so far, concerned mainly
melodic stimuli.
Aims:
This study investigates features which are used in similarity judgments for
pairs of harmonic phrase units. Stimuli were constructed by varying three
music-theoretically distinct harmonic dimensions and a subtle aspect of rhythm
for a two-measure excerpt from a Chopin Prelude. Stimuli were written in a
number of different musical idioms.
Method:
Results:
Conclusions:
Proceedings abstract
MR TIM HORTON
tjh20@cam.ac.uk
Background:
Although there have been various theories of how primitive structural units in
music might be said to have meaning, such theories are generally unable to deal
with the composition of the meaning components they identify into complex
meaning structures. These theories thus remain somewhat superficial analogies
to conceptions of meaning in other domains, such as natural language.
Aims:
Various formal parallels between tonal structure and linguistic syntax will be
examined. It will be suggested that the functional properties of tonal harmony
play a role in the domain of tonal music analogous to the semantic properties
of natural language.
Main contributions:
Implications:
Proceedings paper
excerpt of music (most often taken from some Bach instrumental piece), a brief definition of what is typically
called "compound melody," and a diagram showing the melodic line separated into multiple voices. But there is
rarely any description of how this separation was determined or any consistent statement about which specific
musical features contributed to that particular parsing of the melodic line (cf. Piston, Kennan).
General principles of auditory stream segregation may help to explain some cases of this type of linear
polyphony. In many ways, these principles coincide with the fundamental claim of Gestalt psychology. As
Lerdahl and Jackendoff describe, this claim is that "perception, like other mental activity, is a dynamic process of
organization, in which all elements of the perceptual field may be implicated in the organization of any particular
part" (Lerdahl and Jackendoff 303). Two Gestalt principles that seem especially applicable to auditory perception
are proximity and similarity. Proximity is basically the idea that listeners tend to perceptually group elements
together that are closer to one another, while similarity refers to the tendency to group elements of similar shape
or other likeness together. According to Albert Bregman, auditory stream segregation seems to follow most
directly from the Gestalt law of grouping by proximity (20).
In audition, the two most important influences on the segregation of tones by proximity are the rate of the
sequence and the frequency separation of different elements within the sequence (Bregman 643). Separations of
this kind represent a bottom-up type of processing, with emphasis on the more detailed, note-to-note level of the
music. A grouping determined according to similarity, however, can work on a variety of different levels.
Examples in music might include the grouping of instrumental sounds by timbre (low level processing) and by
motivic parallelism (larger level processing).
In creating melodies, composers have long realized the influence that these grouping principles have on
perceptual coherence, especially the repetition rate and the frequency separation of tones. For instance, various
studies have shown that much Western music is dominated by small melodic intervals, thereby reflecting the idea
that notes closer together in frequency tend to produce stronger perceptual groupings (Ortmann 7). Even though
many composers seek to achieve this melodic coherence by avoiding any extended use of those features which
are apt to create segregation, others choose to purposely maximize the tendency for tone sequences to break apart
(given a sufficient degree of frequency separation, for instance). In an interesting reference to the very style of
music that Bach's unaccompanied string pieces represent, Bregman states,
Rapid alternations of high and low tones are sometimes found in music, but composers are aware that such
alternations segregate the low notes from the high. Transitions between high and low registers were used by the
composers of the Baroque period to create compound melodic lines - the impression that a single instrument,
such as a violin or flute, was playing more than one line of melody at the same time. These alternations were not
fast enough to cause compulsory segregation of the pitch ranges, so the experience was ambiguous between one
and two streams. Perhaps this was why it was interesting (675-76).
The previously mentioned dissertation by Elias Dann contains one of the few published attempts to separate one
of Bach's monophonic movements into multiple voices. Dann bases his analysis on the assumption that the
melodic function of each individual tone in this music is dependent on the tones that surround it, its rhythmic
placement within a measure or phrase, and whether its range ever crosses into the frequency space of another
voice (199). He also points out that each individual tone has a dual role, first as part of a single melodic line
simply because the tones actually are heard one after the other and then as a member of one of the many
polyphonic lines that can be followed throughout the course of the piece (212-13). Dann then provides his
interpretation of how this music could be separated into multiple voices, using the opening four measures of the
Sarabande Double from Bach's B minor Partita as his material.
As can be seen in Example 1, Dann separates this brief excerpt of music into five different voices, with some
instances of doubling reflecting times when two different lines have coincided or when a single tone functions as
part of more than one polyphonic strand. At least initially, it seems that Dann's analysis is mainly determined by
his interpretation of the opening three tones, a simple root position arpeggiation of the tonic triad. Although he
does acknowledge that these three tones could be heard as a single entity due to the influence of harmony, he
ultimately chooses to interpret them as "carving out an area of musical space in which they will start operation as
three individual voices" (Dann 215). He then goes on to explain in detail the different lines that he sees emerging
out of just the first tone. While it is seen as a sustained note in one voice that remains constant until the leading
tone enters on the downbeat of the third measure, it is also considered the beginning point of both an ascending
and a descending voice, each of which he marks in the diagram with its own symbol.
Example 1.
While further explaining his particular analysis, Dann states the following:
In a polyphonic complex such as the one under consideration, no system of analysis can be expected to present
more than a partial picture of the various voices and their interrelationship. The following analysis does not
presume to be the only possible one, nor even to be entirely correct; it merely attempts to illustrate one way in
which the inner ear may gather together the threads implicit in this piece of one-line polyphony . . . the five staves
have been chosen for convenience, to bring out certain polyphonic relationships and not to argue that there are
exactly five voices to be heard (215).
This brings up the important point of the perceptual relevance of this type of analysis and also returns us to issues
of auditory stream segregation. Although Dann's interpretation does provide an interesting perspective on the
implied polyphony based mostly on the harmonic and rhythmic functions of each individual tone, he admittedly
has not taken into account how this passage might actually be heard by both performers and listeners.
One of the first issues that arises when principles of auditory stream segregation are applied to Dann's analysis is
the way he parses the opening b minor arpeggiation into three different voices. In a 1975 study, Leo van Noorden
presented subjects with an alternation of two tones in varying rates of repetition, with one tone remaining fixed
and the other tone moving to various frequency differences. The subjects' task was to indicate the points at which
the frequency separation became too large to hear one coherent stream and too small for separate streams to be
perceived. Van Noorden essentially concluded that "the degree of association varies inversely as the pitch
difference, or pitch distance", with streams played at high rates of repetition being heard as a single, coherent
unity when the frequency separation was less than five semitones (13). It, therefore, seems unlikely that the
opening arpeggiation of the Bach Sarabande Double would actually be perceptually segregated into three
different strands based on frequency separations of only three or four semitones.
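That reasoning reduces to a one-line check. Taking the opening tonic arpeggiation as, say, B3-D4-F#4 (the octave placement is my assumption for illustration), successive intervals stay below van Noorden's approximate five-semitone coherence boundary:

```python
COHERENCE_LIMIT = 5  # semitones; approximate and rate-dependent (van Noorden)

# Opening B minor arpeggiation as MIDI note numbers (octave placement
# chosen for illustration): B3 = 59, D4 = 62, F#4 = 66.
arpeggio = [59, 62, 66]
intervals = [b - a for a, b in zip(arpeggio, arpeggio[1:])]

segregates = any(abs(i) >= COHERENCE_LIMIT for i in intervals)
print(intervals, "segregates" if segregates else "coheres as a single stream")
```

With intervals of only three and four semitones, the arpeggiation stays on the coherent side of the boundary, matching the argument above.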
Another issue concerns the fact that Dann separated this brief passage into five total voices. In a 1989 study
whose aim was to determine the number of simultaneously sounding polyphonic voices that a listener could
identify and count, David Huron found that a threshold seems to exist at three voices. One subject even
commented after the study that he found himself using two different techniques for determining the number of
concurrent voices. The subject felt very confident in his ability to provide an accurate count when there was a
small number of voices, but instead found himself comparing the density of surrounding textures and simply
estimating the number of voices when the total number was greater than three. As Huron then concluded,
It appears that in the perceptual denumeration of sounds of homogeneous timbre, listeners do not follow the
arithmetic sequence: one, two, three, four, etc. to infinity, but proceed in a manner similar to the counting
language of the San bushmen: auditorily we may count: one, two, three, many - where one might admit only
gradation of "manyness" rather than definite discrete values (378).
After taking into consideration both the lack of extensive research into this issue and the apparent disregard for
fundamental perceptual tendencies, it is clear that a more detailed set of guidelines is necessary in order to shed
greater light on this idea of implied polyphony. For this reason, a simple rule system was created to provide a
concrete and consistent method for parsing these solo instrumental lines into multiple voices. This rule system
focuses on bottom-up conditions, or note-to-note interactions, and intentionally does not take into account every
possible musical parameter. For this reason, some voice changes will be blatant or obvious, while others will be
harder to distinguish or perhaps not present at all. Some degree of such ambiguity certainly seems appropriate
since even the most superficial glance at the movements consisting of mostly chords and multiple stops shows
that Bach did not consistently maintain the same number of voices throughout a single piece. There are numerous
instances where one voice seems to be suspended while other voices move around it, with that original voice only
later reappearing for resolution and further melodic motion. It thus seems reasonable to assume that similar
compositional techniques were applied in the monophonic movements. Again, this is something that Bregman
recognized as stemming from fundamental principles of auditory stream segregation. As he stated,
The alternation of registers in a single instrument tends to produce a more ambiguous percept in which either the
two separate lines or the entire sequence of tones can be followed . . . It is not certain that the composers who
used this technique would have called for such an unambiguous separation of musical lines even if the players
could have achieved it. It is likely that the technique was not used simply as a way of getting two instruments for
the price of one, but as a way of creating an interesting experience for the listener by providing two alternative
organizations (464-65).
In this rule system, weights are assigned to the transitions between different pitches based on the degree to which
three basic features are present. These weights essentially only signify the extent to which these features might act
in conjunction with one another at any single point of transition to suggest a clearer or more obvious change of
voice. Transitions whose weight crosses a threshold of four points are seen to signal a change of voice, thereby
generally ensuring that more than one of the following rules would be enforced. In order to facilitate the analysis
of this entire repertoire, a computer program representing this rule system was created using the Humdrum
Toolkit (Huron 1999). This program works from a file of each original score with all pitches translated into a
succession of ascending and descending intervals.
Rule 1: Interval Size
Given a sequence of four notes (n1, n2, n3, n4), let Int2 = n3-n2.
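To make the mechanics concrete, here is a sketch in the spirit of the rule system described above. The threshold of four points comes from the text; the individual feature tests and weights below are hypothetical stand-ins, since the paper's actual values are not reproduced here.

```python
VOICE_CHANGE_THRESHOLD = 4  # points, as stated in the text

def transition_weight(n1, n2, n3, n4):
    """Score the transition into n3 (MIDI pitches); weights are hypothetical."""
    int1, int2, int3 = n2 - n1, n3 - n2, n4 - n3
    w = 0
    if abs(int2) >= 7:                      # large leap into the note
        w += 3
    if int1 * int2 < 0 or int2 * int3 < 0:  # change of contour around it
        w += 1
    if abs(int3) <= 2:                      # conjunct motion afterwards
        w += 2
    return w

# A made-up line that alternates registers, as a compound melody might.
melody = [74, 62, 60, 59, 76, 64, 62, 60]
changes = [(i + 2, transition_weight(*melody[i:i + 4]))
           for i in range(len(melody) - 3)
           if transition_weight(*melody[i:i + 4]) >= VOICE_CHANGE_THRESHOLD]

for idx, w in changes:
    print(f"voice change at note index {idx} (weight {w})")
```

Because each transition's score combines several features, no single feature triggers a voice change on its own, mirroring the text's point that crossing the threshold generally means more than one rule has been enforced.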
The first step in analyzing the results of this activity was simply to identify the specific places in the music where
students most often notated a change of voice, regardless of which voice they placed that passage into within the
overall texture. A quantitative analysis of the students' orchestrations was then performed by marking each note in
the score with the number of students that signified that note as the beginning of a new voice. These numbers can
be found below each line of the score provided in Example 2. The upper row of numbers then represents the
output of the model described above, with each score reflecting the degree to which interval size, contour, and
conjunct motion combine to suggest a change of voice. The highest possible score for the students' responses is
fifteen, while there is no upper limit for the output of the rule system.
Example 2.
Since these two sets of scores are measuring two fundamentally different things, their significance does not
ultimately lie in their actual numerical values. The numbers, instead, simply reflect the relative strength of any
potential change of voice. In general, then, a comparison of these strengths shows a fairly high degree of
correlation between the model's output and the students' responses. This correlation is most easily seen in the
notes which have been circled in the score, which show places where both the model and at least 10 out of 15
students indicated an obvious change of voice. Due to the large leaps, the dramatic changes in register, and the
instances of stepwise motion, the clearest example of this correlation is found in measures six and seven of the
Allemande, where almost all of the fifteen students consistently indicated the same changes of voice that the rule
system strongly suggests.
There are, however, places in the score which exhibit a much greater degree of ambiguity, where there is more
discrepancy between the output of the model and the responses of the students. This is mostly due to the fact that
the rule system described here essentially only analyzes note-to-note interactions in order to determine the extent
to which bottom-up processing can adequately explain the implied polyphony in this type of music. Many other
larger level structures and processes certainly influence the tendency for this music to separate into multiple
voices, though. Based on the students' responses, rhythmic placement, motivic repetition, timbre, and articulation
are some of the most powerful of these influences.
In measures four and five, for example, a majority of the students only indicated changes of voice in three places
(the circled C, D, and E in Example 2). While these results certainly reflect the fact that larger intervals are more
likely to suggest a change of voice than smaller intervals, they also reflect an awareness of articulation changes
and motivic parallelism. It is evident that the students were recognizing the shift between slurred and separately
bowed notes, as well as the repetition of a two-beat motivic pattern (which is misaligned by one sixteenth note
with the meter). Even though the model indicates additional voice changes inside this repeated pattern, fewer
students chose to include them as part of their analysis. Perhaps this is because these changes are only influenced
by the note-to-note details that the model addresses, not by either one of the larger level groupings that the
students were taking into consideration.
One instance of the influence of meter then occurs in the sequence which begins in measure eleven. A large
majority of the students consistently noticed that the first note of each leg of this sequence creates a descending
melodic line and chose to signal a change of voice prior to each of those notes. The rule system also marks the
same points as changes of voice, mostly because of the descending sevenths and the change of contour that occurs
between each leg. It is interesting to note, however, that there is a corresponding descending melodic line
occurring in the last note of each leg of the sequence. Perhaps this line was not recognized by as many students as
a change of voice because it is not strengthened by a metric accent, thereby supporting Leonard Meyer's idea that
an accent is "a stimulus (in a series of stimuli) which is marked for consciousness in some way" (8).
Although these larger level issues have certainly been recognized and addressed to some degree, their
formalization as part of the model set forth here is still forthcoming. Perhaps the fundamental value of this model
as it currently stands thus lies in its contrast to the ways that this music is typically discussed, especially by
performers and pedagogues. As a final example, noted violinist and pedagogue Yehudi Menuhin stated,
Even though there can be no allowance in the music of Bach for arbitrary effects, personal indulgence, or changes
of direction, as there are indeed in the romantic literature, there is every justification for a flexibility, a fluidity of
line, a play of accent, colour and stress within a given series of notes, but only of course when these are justified
by a sensitive and disciplined musical intuition and by an intellectual awareness . . . For instance, although many
of Bach's movements for solo violin and particularly for 'cello are written in one voice, that is without
counterpoint and harmony, the counterpoint and the harmony are in fact implied and every effort must be made to
bring the different voices out clearly, even though there is never more than one voice sounding at a time (119).
Although this statement acknowledges both the existence of this implied polyphony and the need for
flexibility and sensitivity in its performance, Menuhin does not actually describe how to identify this
polyphony or how to "bring the different voices out clearly." Perhaps this phenomenon is related in some way to basic
ideas about automaticity, which suggest that someone who has mastered any technique or skill often has a
difficult time adequately describing what he is doing and instead prefers to teach by demonstration or imitation. It
is possible that many musicians have developed their skill to the extent that they no longer consciously think
about what might be considered the fundamental technical aspects of the music, instead turning their thoughts to
larger level groupings or phrasings. In turn, this results in an underestimation of the power of these note-to-note
details that are the very focus of the rule system presented here.
For this reason, it is important to have some method which can assist in clarifying the issue, thereby helping to
identify those features of the music that might otherwise be overlooked, taken for granted, or left unexplained.
Although it is unlikely that any single system could fully account for all aspects of this intensely complex music,
this study has shown that a model based on a small number of simple guidelines is actually a fairly powerful
indicator of how musicians might interpret the implied polyphony in this piece. As performers and pedagogues
become more aware of these specific polyphonic potentials, they also become more conscious of the expressive
potential in this music. Ultimately, an informed performer has utmost freedom to either let the implied polyphony
emerge on its own or to provide added emphasis through the use of a variety of expressive techniques, thereby
making this structure more perceptually relevant to audiences of all kinds.
REFERENCES
Bregman, A. S. (1994) Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT
Press.
Cooper, G., Meyer, L. (1960) The Rhythmic Structure of Music. Chicago: University of Chicago Press.
Dann, E. (1968) Heinrich Biber and the Seventeenth Century Violin. Ph.D. Diss., Columbia University.
Huron, D. (1989) Voice Denumerability in Polyphonic Music of Homogeneous Timbres. Music Perception, 6 (4),
361-382.
Huron, D. (1999). Music Research Using Humdrum: A User's Guide. Stanford, California: Center for Computer
Assisted Research in the Humanities.
Kennan, K. (1987) Counterpoint: Based on Eighteenth-Century Practice, 3rd ed. Englewood Cliffs, N. J.:
Prentice-Hall.
Lerdahl, F., Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Menuhin, Y. (1976) The Violin. New York: Schirmer Books.
Ortmann, O. (1926) On the melodic relativity of tones. Psychological Monographs, 35, Whole No. 162.
Piston, W. (1947) Counterpoint. New York: W. W. Norton & Company, Inc.
Stevens, D. (1976) "Bach's Sonatas and Partitas for Violin." In Violin and Viola. New York: Schirmer Books.
Van Noorden, L.P.A.S. (1975) Temporal Coherence in the Perception of Tone Sequences. Unpublished doctoral
dissertation, Eindhoven University of Technology.
Proceedings abstract
Alf Gabrielsson
Department of Psychology
Uppsala University
Box 1225
SE - 751 42 Uppsala
SWEDEN
Background:
Studies on emotion in music can be divided into two main categories: studies focusing on recognition of emotional
expression in music and studies focusing on listeners' own emotional response to the music (induced emotions). The
distinction between these categories is not always upheld in investigations.
Aims:
This paper aims at comparing and discussing methods and results of studies in both categories mentioned above with
regard to parallels and contrasts.
Main Contribution:
On the basis of extensive reviews of empirical studies, various classifications of both recognised and induced emotions
have been made. Classifications may be data-driven, theory-driven, or both. In either case, a basic classification into
aspects of valence and arousal seems feasible. However, it has to be supplemented in various ways to do reasonable
justice to the manifold nature and subtleties of emotions recognised in or induced by music. Results depend to a
great extent on the music used, voluntarily or involuntarily chosen, as well as on the methods for obtaining and analysing data.
Furthermore, results are by necessity influenced by listener and situation characteristics.
Implications:
Investigations and discussions on emotion in music would benefit from a clearer distinction between recognised and
induced emotions. Although this distinction may be somewhat blurred, awareness thereof should help in interpreting
results, reveal inconsistencies, and contribute to increasing both internal and external validity of studies on emotion in
music.
Proceedings paper
mystical aura that surrounds the concept of creativity can be said to exist in the absence of a complete
theoretical explanation of the phenomenon (Finke, Ward & Smith, 1996). However, Boden suggests
that creativity may be no more mysterious than other unconscious processes and systems, such as
vision, language, and commonsense reasoning (p. 75). The essence of the novelty in artistic creativity
may be metaphorical thinking. All humans are likely to use such thinking, and perhaps people who are
creative, such as artists and scientists, simply use it more often or to more focussed purposes
(McKechnie, 1996). Alongside novelty, unconscious processes and metaphor, another element
common to a number of accounts of creative thinking is the juxtaposition of two seemingly
contradictory ideas. Rothenberg (1994) refers to this as a Janusian process: the ability to hold two
competing, contradictory ideas, images or concepts in mind simultaneously. He proposes that
creativity is the synthesis or coalescence of these. Koestler (1964) pointed to useful distinctions
between creativity as it appears in humour (the collision of matrices or planes of thought), in science
(integration), and in art (analogy). More recent accounts of creativity emphasize processes of problem
solving and problem finding (Kay, 1994; Wakefield, 1994). Putting these notions together, Boden
argues that a theory that considers unexpected combinations, together with a psychological
explanation of analogy, may suffice as a theory of creativity.
Creativity in Choreographic Cognition
By nature contemporary dance is difficult to study as it is ephemeral and, unlike a musical score,
painting or sculpture, there are few notes of the development of the work or even good records of all
aspects of the performance. Fortunately, since early 1999 a collaborative research team involving the
Victorian College of the Arts, dance industry partners, and researchers in Australia has captured on
digital video the inception and development of new dance works by two elite choreographers. We
draw on this video and journal documentation seeking examples of problem finding and problem
solving, metaphorical thinking, and evidence of the synthesis of competing ideas.
An important characteristic of creativity in contemporary choreographic cognition is that dancers and
choreographers increasingly work together exploring, selecting, and developing dance material.
Australian choreographer Anna Smith developed new dance material working closely with eight
experienced and professional dancers over a period of six months. The dance materials were generated
from improvisations of the whole group. At one stage, spoken cues were given to the dancers such as
'Right elbow behind back, shoulders tilting, left hand reaching' and each dancer interpreted the cue.
Individual solutions were found and the group gradually selected and developed the interpretations
made by one or more of the dancers. Importantly, the choreographer was not in control of the material
thus generated but the choreographic process took place through interactive dance-making to which
everyone contributed (McKechnie & Grove, 2000). An explanation of creativity in choreography must
therefore address the complex of dynamics and interactions among dancers and choreographer in this
community of creative minds. In addition to motivation, memory, and personality factors that
underpin the individuals' thoughts and behaviour, there are dyads and triads within the group and
concomitant ideas, tensions, conflicts, attractions and defences. Thus the social and cognitive
psychologist searching for a fresh domain to test current theoretical assumptions will be pleased with
the uncharted territory offered by choreographer and dancer interactions.
Instances of problem finding and problem solving in choreographic cognition are easily found. The
development of movement as art brings with it challenges of the limits of the human body and the best
use and negotiation of the dimensions of space and time. Although difficult to capture in writing, video
footage of Smith and her team demonstrates the cognitive complexity of a segment that involved rapid
and continuous whole body movement from all dancers with each performing a different series of
complex transitions. As well as the motor and spatial complexity of each transition, the dancers were
to carry out their individual movements while the group traced a DNA-like double helix. Before the
sequence could be performed, a logistical analysis was carried out to determine a way in which it
could work spatially. Finally, movement of the complex spatial configuration of parts (dancers) and whole
(group forming the helix) was realized using colour-coded paper trails of the path of each dancer.
Thus the spatial and temporal configuration was modelled with concrete materials and after much
analysis and trial and error it was achieved in real time and space.
In another example, McKechnie describes creation of a work commissioned for a very small dance
space. The space elicited ideas and images related to the use of simple forms in small spaces.
Pondering this problem led to images of Ikebana, to the similar asymmetry of human lives lived in
close contact in small rooms, to the alienation of separate lives closely entwined spatially but
separated by emotional chasms. The source of the solution to the problem lay in synthesis between the
imagery of confined spaces and the experience of contained tensions. A final example involves
synthesis of real and imagined time in the perceptions of observers. Amplification, choreographed in
1999 by Phillip Adams, reflects an interest in the contemporary cult of the pornography of car crashes.
The choreographer faced a problem of how to represent a distorted experience of time in dance terms.
The seemingly endless expansion of time experienced by car crash victims during the few seconds of
a violent accident became the source of a central image. The problem of conveying the nature of the
experience in real time was solved by breaking up movement material into brief distorted and
fractured components and performing a long and complex sequence of them at a tempo verging on the
perilous, a feat accessible only to highly trained contemporary dancers. The presence of imagery is
evident in these two examples and in most accounts of creativity. The examples also demonstrate that
imagery can occur in all sensory modalities and in contemporary dance, unlike other artforms, the
creative search is embodied in the human form.
Memory and Imagery in Rehearsal and Performance of Contemporary Dance - The Performer
Although choreography and contemporary dance have only rarely captured the interest of
experimental psychologists, classical ballet and contemporary dance have been used as tools to
examine coding in human short- and long-term memory. Results include the observation that memory
for complex movements is more kinaesthetic than verbal (Starkes, Caicco, Boutilier & Sevsek, 1990).
Anecdotal accounts suggest that recall is often multi-modal such that activity in one mode triggers
knowledge or recall in another. Smyth and Pendleton (1994) used an interference paradigm and
measured effects of articulatory and movement suppression. Dancers' spans were longer than those of
non-dancers for both classical ballet and modern movement and both articulatory and movement
suppression decreased the dancers' spans. This implies that material is coded, at least in the short term,
in both verbal and kinaesthetic form. Long-term memory for dance material has been examined by
Solso and Dallob (1995). They propose that a class of movements is represented abstractly in memory
in the form of a prototype. Solso & Dallob conclude that there is an underlying scheme that governs
the formation of body actions in general and dance routines in particular and that it may be possible to
determine basic laws of motor performance and transformation as part of a comprehensive theory of
dance 'grammar' and general kinaesthetic 'grammar'.
Of the experimental studies of memory for dance most have used classical ballet in which a sequence
of prescribed steps is drawn from an established repertoire of labelled formal movements. By contrast,
contemporary dance frequently consists of idiosyncratic movement derived from the theme being
explored and is less easily reduced to verbal description. At one point in developing Red Rain the
dancers commented on the extraordinary amount of information they needed to retain while working
with new and demanding movement material. On another occasion a dancer watched herself perform
a slow and intricate move on video but had little recollection of performing the movement or how she
made her body move in a particular way. Such observations have implications for memory in
choreographic cognition. One testable hypothesis is that verbal labels or cues for single movements
(such as 'Deirdre's wrist; Kathleen's sitting bones; Nicole's no. 3') are used initially. Over time, longer
and more complex movements are sequenced, rehearsed, and chunked in long-term memory. With
repetition, the entire sequence becomes part of kinaesthetic memory. A crucial question then arises:
what is the nature of the representation in memory that stores and integrates visual, auditory,
propositional, spatial, temporal, and kinaesthetic features?
Imagery is used extensively in dance because "an arsenal of images has the ability to find a concise
way of describing a movement" (Smith, 1990, p. 17). Using psychometric tools, Overby (1990)
showed that experienced dancers differed significantly from novice dancers on three of four imagery
ability measures, namely body image, cognitive imagery and spatial ability. Interpreting results on the
Individual Differences Questionnaire (IDQ), Overby suggested that, while novice dancers preferred a
visual mode of thought, experienced dancers were equally inclined to verbal and visual modes of
thinking. She speculated that dance experience may be related to a tendency to process visual and
verbal information on equal terms. Foley, Bouffard, Raag & DiSanto-Rose (1991) demonstrated that
subjects who performed movements or imagined themselves performing them were better at
recognizing the movements than those who observed or imagined another performing. This
self-performed task effect was evident only for "uncommon" movements (modern dance and ballet).
Visualising movement and movement patterns is now common in sport and in systems of training in
kinesiology (Sweigard, 1974). Foley et al. concluded that further research is needed to establish
whether imagery abilities differ in general or specific ways across expert and novice dancers. The
ideas of Damasio (1999) and improved methods of investigation using new scanning technology are
likely to contribute to our understanding of imagery and movement.
In his analysis of Red Rain development footage, Grove (1999) noted that, as the dancers explored
movement triggered by a verbal cue, they appeared to intellectualise their task. Paradoxically, "the
movement became more internal, establishing its own pathways through the body, internal
realizations, instead of relying on a picture or mirror-image of what the spectators see". For Grove, "it
was as if the piece was being created from the inside out". There is imagery indeed behind the
movement created and explored by the eight dancers but the way in which experimental studies have
considered imagery seems to fall short of the processes involved in an actual creative act. Experiments
have dichotomised memory codes as either propositions or non-verbal structural descriptions and
images. However, categorisation of movement in terms of what it is not, i.e. non-verbal, is
uninformative and simply reflects the lingua-centric bias of cognitive psychology. In its stead, Grove
refers to dance-making as "an utterance of the body". The artist, whether poet or choreographer, does
not necessarily start out with words or a visual image, but instead material may come from a pulse or
a rhythm. A challenge for the experimental study of choreographic cognition is to divest itself of
reference to verbal versus non-verbal features and turn to the seemingly simple notion of a generative
pulse or rhythm. A dynamical view is based on this simple but powerful assumption.
As the medium of contemporary dance is time we propose that the artistry of movement is in
trajectories, transitions, and in the temporal and spatial configurations in which moves, limbs, and
bodies relate to one another. Choreographic cognition can be conceived as a dynamical system wherein
change to a single component can affect the entire interacting network of elements. In a dynamical
system, time is not simply a dimension in which cognition and behaviour occur but time, or more
correctly dynamical changes in time, are the very basis of cognition.
Meaning and Communication in Contemporary Dance - The Observer
The power of movement and dance to evoke memories has been identified as an important factor in
the communication achieved via contemporary dance. Hanna (1979) suggests that affective and
cognitive communication in dance are intertwined and she gives a broad account of the way in which
emotion is communicated. For example, physical movements associated with affect may stimulate or
sublimate a range of feelings and may be elicited for pleasure or coping with problematic aspects of
social involvement. Adults may find succour and release cathexis in culturally permissible motor
behaviour; this may be reminiscent of nurturance and protection of prenatal and infancy stages and
imitates satisfaction of childhood behaviour. Dance may communicate a kind of excitement; it may also
provide a healthy fatigue or distraction that may abate temporary crises. Examples of the intoxication
that occurs with rapid movement abound. Such therapeutic matters are unlikely to be of concern to the
choreographer. However, such responses on the part of the observer constitute communication and
will reinforce pursuit of dance as art or entertainment. The psychological issue that remains is to
explain the mechanism that underpins release and cathexis. Sympathetic kinaesthesia is one possible
explanation.
Conversations with elite choreographers and dancers suggest the presence of intriguing somatic and
kinaesthetic processes when they observe dance performance and this leads to many possibilities for
research into communication via kinaesthetic perception. Anecdotal reports suggest that expert
observers actually feel the movement or feel as if they are performing the movement; a kind of
sympathetic kinaesthesia. One way to examine this would be to take detailed physiological recordings
of changes in tension, galvanic skin response, muscle response, heart rate, and blood pressure, as an
observer watches a performance. We can examine the effects of differing levels of experience and
performance expertise. We can also assess the way the presence of music might moderate
physiological change. Finally, we can ask whether there is evidence of comparable physiological
change in other performance artists such as elite musicians as they observe a virtuosic performance on
their instrument. Is observation for all elite artists a virtual performance? Indeed, one could imagine
that if such muscular and physiological changes occur during mere observation then styles of dance
do not evolve or change from simply watching seminal works but aspects of the performance may
literally be stamped into the choreographer's kinaesthetic memory: a kind of virtual plagiarism!
Interestingly, recent neurophysiological findings suggest a mechanism that may underpin sympathetic
kinaesthesia. Neurons have been identified in both monkeys and humans that fire according to
particular actions of the hands and mouth, rather than with the individual movements that form them.
Furthermore, a class of these same neurons fires when the action is observed being performed by
another (Di Pellegrino, Fadiga, Fogassi, Gallese & Rizzolatti, 1992; Fadiga, Fogassi, Pavesi &
Rizzolatti, 1995). Rizzolatti and Arbib (1998) suggest that the mirror system represents in an observer
the actions of another. If this is so, then as we observe a dance performance particular neurons are
firing that represent particular dance actions in us.
A Dynamical Systems View of Choreographic Cognition
Time is the glue, the medium of choreography and contemporary dance and, for this reason,
contemporary dance lends itself to analysis in terms of dynamical systems theory. In this theory
complex wholes and forms emerge from simple elements and in self-organising dynamical systems
structures emerge from chaos. It is possible to apply the dynamical view to identify pulses, rhythms,
patterns that spark an idea that is utterable in movement. The pulse or rhythm can occur in any
modality but, for the creative choreographer, will be expressible as a composition of movement in
space and time. Ultimately, we can apply the notion of dynamical systems to better understand,
possibly to model, the movements and form of a single body, or many bodies, in space and time.
Another artform - music - has been described as a dynamical system (Burrows, 1997; Sloboda, 1998)
and Sloboda's account in particular is relevant to dance. Sloboda argues that meaning in music comes
from the way it embodies the physical world in motion. Human understanding of music comes from
our capacity for analogical thinking. If contemporary dance too embodies the physical world in
motion it may be doubly powerful in that it can be understood both by analogy and by direct
perception. That is, the trajectory of objects in motion, through time, is the very stuff of dance - real
objects moving in real space and real time. Adams' Amplification is a good example of a
contemporary dance work that can be understood both directly and by analogy.
A dynamical systems view of choreographic cognition holds that behaviour is continuous and each
component acts and interacts with others in the system. Each state of the system determines the next
state so that a structure or form evolves. Change occurs at many time scales and change at one scale
shapes and is shaped by change at others. The process is one of self organization where solutions
emerge to problems defined by particular constraints of the immediate situation (Thelen, 1995). In the
context of movement there will be a number of physical constraints that will influence and determine
the evolving form. Constraints might include mass, limb structure, size, weight, flexibility, space
limitations, and so on. Within the set of constraints there will only be a certain number of possibilities
so that the evolving movement is determined by what has come before and the context in which the
movement is set. Importantly though, the movement is flowing, continuous, transitional - it is motion
rather than simply 'moves' or 'steps'.
Constructing and Testing a Dynamical Theory of Choreographic Cognition
One of the first tasks for the development and evaluation of a dynamical theory of choreographic
cognition is to specify the constraints and features/variables of the system. Such information may be
procured from detailed three-dimensional analysis of (initially) simple movements/transitions. Even a
five-second sample of a transition in a modern dance piece will generate a multitude of possible
features. The simplest starting point would be to identify a feature as a single oscillator and
demonstrate entrainment and coupling to a relatively simple motion task. Thelen (1995), Saltzman
(1995) and Large and Jones (1999) provide examples of the way in which single- and
dual-degree-of-freedom oscillatory models lead to testable predictions of human timing behaviour. Although a single
oscillator is unlikely to capture the richness of the creation and performance of modern dance, an
investigation of a single oscillator model of dance-like movement or body transition would provide
the needed existence proof of the viability of the dynamical approach.
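As a minimal existence sketch of this kind (an illustration, not a model drawn from any of the studies cited), a single adaptive oscillator can be driven by an isochronous pulse train and shown to entrain its period to the pulse; the coupling and adaptation constants below are illustrative assumptions.

```python
# Minimal sketch of a single adaptive oscillator entraining to an
# isochronous pulse, loosely in the spirit of the cited oscillator
# models; the coupling and adaptation constants are illustrative guesses.

def entrain(osc_period, pulse_period, coupling=0.4, adapt=0.2, n_pulses=50):
    """Drive an oscillator with periodic pulses; return its final period."""
    period = osc_period
    phase = 0.0                        # oscillator phase at each pulse, in cycles
    for _ in range(n_pulses):
        phase = (phase + pulse_period / period) % 1.0
        # Signed phase error relative to the pulse (range -0.5 .. 0.5).
        error = phase if phase <= 0.5 else phase - 1.0
        phase -= coupling * error      # phase correction toward the pulse
        period *= 1.0 + adapt * error  # period adaptation toward the pulse
    return period

# An oscillator starting fast (0.5 s period) locks onto a 0.6 s pulse train.
print(round(entrain(0.5, 0.6), 3))
```

After a few dozen pulses the oscillator's period settles close to the pulse period, which is the entrainment behaviour such a model would need to exhibit before richer dance-like movement could be addressed.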
At a higher level of complexity, detailed analysis of dances (in Adshead, 1988) and the movement
notation system of Laban (1975) are fertile ground for identifying the key features and movement
variables in contemporary dance. The three broad categories in Laban Movement Analysis are use of
the body, use of space, and use of dynamic energy. Adshead expands on these concepts to include
analysis of relationships between the parts and the whole, and of interpretation and evaluation. Precise
elements are then defined within each of the categories. To take an example, movement may leave
straight lines as vapour trails, an action may result in curved and arc-like trails, and other motions
leave behind complex three-dimensional loops, twists, and spirals. These so-called trace-forms can
then be analysed in more detail: linear trace forms may be accomplished simply with flexion and
extension of various joints; curved trace-forms require abduction/adduction and sometimes rotation.
Technology now provides us with motion capture systems via digital video and computer imaging
that can reveal the most subtle and complex movement pathways. To scrutinize samples of
outstanding contemporary dance works with the tools of dynamical systems theory and digital
technology would be a fascinating and informative exercise.
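As one illustration of how such trace-forms might be quantified from motion-capture data (a hypothetical sketch, not a procedure taken from Laban or Adshead), a sampled trail can be classified as linear or curved by comparing its chord length with its path length; the 0.95 threshold is an arbitrary assumption.

```python
# Hypothetical sketch: classifying a sampled trace-form as "linear"
# or "curved" from motion-capture points (x, y, z). The straightness
# ratio (chord length / path length) is 1.0 for a straight trail and
# drops as the trail bends; the 0.95 threshold is illustrative only.

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def classify_trace(points, threshold=0.95):
    """Label a sampled trail by its chord-to-path straightness ratio."""
    path = sum(dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    chord = dist(points[0], points[-1])
    return "linear" if chord / path >= threshold else "curved"

straight = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 3, 0)]
arc = [(0, 0, 0), (1, 1, 0), (2, 1, 0), (3, 0, 0)]
print(classify_trace(straight), classify_trace(arc))  # linear curved
```

Loops, twists, and spirals would need richer measures (curvature and torsion along the trail), but even this crude ratio shows how digital capture can make trace-form categories computable.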
A dynamical view of choreographic cognition has great explanatory power in that it is relevant to the
three key actors we have described - choreographer, performer and observer. Our dynamical view
proposes that the basis for an idea in movement can come from a pulse, beat, rhythm, or action. The
task and artistry of the choreographer is to notice such a pulse and to express it in bodily form. The
germ of an idea may multiply and develop so that from a single movement other variations,
approximations, caricatures, inversions, emerge. (At some later stage the movement may be described
using verbal language or visual images - but this is not necessarily its original form). In dynamical
terms, there may be structure and order, perhaps self-similarity, that emerges from apparent chaos.
Complexity increases when choreographer and dancers interact and dancers perform - transitions,
explorations continue to be conceived as a state space of many dimensions. Finally, for the observer,
there is understanding from recognition, perhaps via analogy, of objects, organisms, moving in space
and time and experiencing the conflicts, tensions, resolutions of a biological object negotiating the
world, a world of time, space, and others. Within the dynamical scheme, a dance work consists of
transitions of elements in high-dimensional space. The meaning for choreographer, dancer and
observer lies in the dynamics of these transitions and their embodiment of the physical and biological
world.
Author Notes
This research was supported by an Australian Research Council SPIRT grant. Details of the project
Unspoken Knowledges and Red Rain can be found at http://ausdance.anu.edu.au/unspoken
References
Adshead, J. L. (Ed.) (1988). Dance Analysis: Theory and Practice. London: Dance Books.
Boden, M. A. (1996). What is creativity? In M. A. Boden (Ed.), Dimensions of Creativity. Cambridge,
Mass: MIT Press. pp 75-117.
Burrows, D. (1997). A dynamical systems perspective on music. The Journal of Musicology, 15,
529-46.
Damasio, A. R. (1999). The Feeling of What Happens: Body and Emotion in the Making of
Consciousness. Harcourt Brace.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor
events: a neurophysiological study. Experimental Brain Research, 91, 176-180.
Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during action
observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608-11.
Finke, R. A., Ward, T. B., & Smith, S. M. (1996). Creative Cognition: Theory, Research, and
Applications. Cambridge, Mass: MIT Press.
Foley, M. A., Bouffard, V., Raag, T., & DiSanto-Rose, M. (1991). The effects of enactive encoding,
type of movement, and imagined perspective on memory for dance. Psychological Research, 53,
251-59.
Grove, R. (1999). In the house of breathings. Proceedings of the Second International Dance
Research Conference. Auckland, New Zealand: Danz.
Hanna, J. L. (1979). To Dance is Human: A Theory of Nonverbal Communication. Austin: University
of Texas Press.
Johnson-Laird, P. N. (1988). Freedom and constraint in creativity. In R. Sternberg (Ed.), The Nature
of Creativity. Cambridge: Cambridge University Press.
Saltzman, E. (1995). Dynamics and coordinate systems in skilled sensorimotor activity. In R. Port
& T. van Gelder (Eds.), Mind as Motion: Explorations of the Dynamics of Cognition. Cambridge,
Mass: MIT Press. pp 69-100.
Wakefield, J. F. (1994). Problem finding and empathy in art. In M. A. Runco (Ed.), Problem Finding,
Problem Solving, and Creativity. Norwood, NJ: Ablex Publishing Corporation. pp 99-115.
Wales, R., & Thornton, S. (1994). Psychological issues in modelling creativity. In T. Dartnall (Ed.),
Artificial Intelligence and Creativity. Netherlands: Kluwer Academic Publishers. pp 93-105.
Proceedings paper
INTRODUCTION
Since the introduction of the National Curriculum (1988) in schools in England and Wales, it is now
required that all children study music up to the age of 14, and composition forms a large part of this.
Composition is defined very broadly in the primary school and refers to the briefest musical
utterances, as well as to more sustained inventions. This paper reports three studies of children's
collaborative music composition.
Collaborative work among children has been the focus of much research since the initial writings of
Piaget and Vygotsky. Essentially, it was claimed that children working in pairs or groups on any kind
of task can achieve a higher level of understanding than any one child could achieve alone (see Doise
& Palmonari, 1984). Research has since sought to explain what is learned through social interaction
and how the interaction takes place.
Research of this kind is vast, and has looked at children working collaboratively on a wide range of
scientific tasks, such as mathematical problem solving, logical reasoning and so on (e.g. Tudge &
Rogoff, 1989). Little of the research to date has examined the role of peer collaboration in creativity,
where goals are less clearly defined and measures are more ambiguous. It could be that factors found
to be responsible for productivity in the science-based tasks differ from those responsible for
productivity in creative tasks.
Much of the previous (science-based) peer collaboration research suggests that the most important
element of task activity in groups is the dialogue among group members (e.g. Tolmie et al., 1993). The
recurring theme is one of sharing ideas verbally, arguing through alternatives, and providing
justifications for accepted and rejected solutions. That is, the more of this type of talk that occurs
among collaborating children, the greater their productivity. It is suggested here that in collaborative
music composition tasks, rather than discussing their ideas, the children would prefer to try their ideas
out directly on the musical instruments, and thus somehow communicate with each other through the
music itself.
Possible support for this hypothesis comes from studies of computer based problem solving tasks.
Pheasey & Underwood (1995) and others have found evidence of peer facilitation effects but low
levels of verbal interaction. Children working on a computer based task were found to produce higher
quality work when they collaborated with a partner than if they worked alone, but they did not appear
to be sharing their ideas verbally. Subsequent analyses revealed that the children were trying their
ideas out directly on the computer, thus they were said to be communicating with actions rather than
words. It is suggested that children working on music composition tasks will work in the same way.
The gender composition of the collaborating group is reported by previous researchers to be a salient
factor (e.g. Underwood, 1994). A detailed examination of the gender issues is beyond the scope of the
present paper, therefore this paper will consider only two key issues: firstly, the previous finding that
in mixed gender groups, boys tend to dominate over girls; and secondly, that single gender groups
tend to achieve better results than mixed gender groups. It should be noted that all of these findings
come from studies of science-based tasks and it is important to examine these issues in relation to
creative tasks.
To summarize, the overall aim of the present research was to examine what factors within a group of
children lead to the production of a good music composition. Particular attention was paid to the
amount of verbal interaction, to whether the children communicated their ideas through the music,
and, if so, to whether this form of communication was important for group productivity. Also of interest was how
the gender composition of the group affected the composition process and the quality of the work
produced.
The assessment of quality in music composition is a difficult area. It is argued here that assessment
procedures need to be task specific and should be tailored to meet the needs of the researcher and the
demands of the task. Thus, procedures to assess the compositions are discussed within the context of
each of the studies. For a detailed discussion on these issues see Morgan (1998) and Webster (1992).
Three studies are reported here that differ only in the nature of the task given to the children. The first
task was a representational composition task, in which children were asked to compose a piece of
music to represent the events of a story. The second task was a formal music composition task and
required children to compose a piece of music 'that has a beginning, middle and end'. The third study
used an emotion-based composition task, and required the children to compose a piece of music 'that
will make me happy'. For a detailed discussion on these three types of task, see Barrett (1995).
STUDY 1
A TRIP TO THE SEASIDE: A REPRESENTATIONAL MUSIC
COMPOSITION TASK
METHOD
Eighty-eight children aged 9-10 were put into groups of 4 of varying gender compositions. The music
composition stimulus was a story about a family's trip to the seaside, which is presented in full below.
The children were asked to work together to produce a series of sounds or music to represent the
events of the story. They were given four musical instruments: a xylophone, a drum, a triangle and a
cabasa. It was ensured beforehand that they had prior experience of this kind of task, and of these
kinds of instruments. The children were told that they would have 20 minutes to work on the task and
they were videotaped throughout. They were also told that they would be asked to give a final
performance of their composition for the video camera.
Process variables
The collaborative working period was assessed independently of the performance of the finished
composition. Several aspects of the children's interaction during the working period were timed with a
stopwatch.
Total talk for each child was sub-divided into task directed talk, time spent reading the story aloud,
off task talk and interaction with the researcher.
Task directed talk was defined as any talk directed towards the successful completion of the task.
This type of talk included the presentation of ideas and suggestions to other group members, the
discussion of alternatives and the justifications of accepted and rejected solutions. Task directed talk
was therefore assumed to be indicative of attempts to share the social reality of the problem-solving
situation.
Read was simply the time spent reading the story aloud. This was included because it comprised a
large part of the child's talk time, and while it was task directed by nature, it was not seen as actively
sharing one's ideas with other group members.
Off task talk was defined as any talk not directed towards completion of the task, suggesting time out
from actively working to complete the task.
Interaction with the researcher was any time spent talking to the researcher, including questions of
help.
Similarly, there were two sub-variables of total time playing the instruments: task directed play and
exploratory play.
Task directed play was defined as play directed towards completion of the task and towards other
members of the group. This definition included the presentation of ideas directly on the instruments,
and was viewed as an alternative means of sharing the social reality.
Exploratory play refers to the exploration of the sound materials, and was seen as being directed
towards the individual, or 'playing for oneself'. This type of play was not seen as contributing to a
mutual understanding of the task, and did not move the group closer towards establishing shared
understanding or towards the completion of the task.
To assess the possible effects of the gender composition of the collaborating group in terms of verbal
and musical interaction and subsequent group performance, the type of group in which the children
were placed was coded as consisting of all boys, all girls or mixed gender.
Assessment of the Compositions: The Selectivity Rating Scale
A five-point rating scale was developed to assess the quality of the compositions. The scale provides
guidelines to assist the raters in their marking of the compositions and is presented in full below.
Three independent raters used the scale to give each of the compositions a mark out of 5. The group
score was therefore the mean mark given by the three raters.
The essence of this scale is the extent to which the children display selectivity or discrimination of
both the actions and events within the story, and of the instruments chosen to represent these. There
are many actions and events within the story that could be represented by an infinite number of
musical sounds. The children must then select a variety of actions or events from the story and decide
how to illustrate these with the available musical apparatus. In this way, groups of children who score
well on the rating scale will be those who demonstrate a certain degree of musical thinking, apparent
in this context through the selection and rejection of sounds. This is based largely on Swanwick's
(1979) three proposed criteria for attempting a definition of music, namely selection, relation and
intention. For a full discussion of the development of this scale see Morgan (1998).
Rating Scale
Score 1:
All sound effects* are played, with no evidence of selection or discrimination. Sound
effects are stereotyped. No evidence of decision making as to which sound should
represent which event or action within the story. No apparent organisation.
Score 2:
More selective with a sense of unity. One or two instruments have been chosen to
represent certain elements of the story. The sound effects tend to focus on events, rather
than actions, and are still very stereotyped. Little structural control and the impression of
spontaneity without development of ideas.
Score 3:
Further selection of events/actions and of instruments is apparent. Sounds become more
appropriate and more inventive. Evidence of a structure to the finished piece.
Compositions still rather predictable.
Score 4:
More selective still. Less narrative. Clear beginning and ending.
Score 5:
High level of selection and discrimination, of both the events/actions chosen and of the
instruments. Clear beginning, middle and ending. A more abstract level than previously.
Equal representation of events, actions, emotions, etc.
*The use of the term 'sound effects' is for descriptive clarity only. At no time was it suggested to
the children that they work on producing a series of sound effects. For the children, the emphasis was
put on the transformation of elements within the story into a musical medium.
RESULTS
Verbal and musical interaction
Table 1 shows significant relationships between group productivity, as determined by the selectivity
rating scale, and task directed talk (r=.47, p<.05) and task directed play (r=.44, p<.05). There were no
significant relationships between group productivity and the time spent reading the story aloud, off
task talk, interaction with the researcher or exploratory play. A t-test revealed that there was
significantly more talk than play (t=2.30, p<.05).
Table 1: Pearson correlations between the process variables and the group score.
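The analyses reported above (Pearson correlations between timed process variables and group scores, and a paired t-test comparing talk with play within groups) can be sketched in a few lines. The data below are hypothetical, purely for illustration; the study's raw timings are not reproduced in the paper, and only the test statistics are computed here (the reported p-values additionally require the relevant sampling distributions).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """t statistic for a paired-samples t-test on the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

# Hypothetical per-group data: seconds of task directed talk and play,
# and the group score (mean of three raters, maximum 5).
task_talk = [310, 250, 420, 180, 365, 290, 240, 400]
task_play = [150, 200, 120, 90, 260, 170, 140, 230]
group_score = [3.3, 3.0, 4.0, 2.3, 3.7, 3.3, 2.7, 4.3]

print(f"talk vs score: r = {pearson_r(task_talk, group_score):.2f}")
print(f"play vs score: r = {pearson_r(task_play, group_score):.2f}")
print(f"talk vs play:  t = {paired_t(task_talk, task_play):.2f}")
```

A positive t here corresponds to the study 1 finding of more talk than play; in studies 2 and 3 the sign of the comparison reversed.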
Gender
Table 2 suggests that the all-girl groups achieved the highest marks for their finished compositions
(mean 3.60), followed by the mixed gender groups (mean 3.30), then the all-boy groups (mean 3.14).
These differences were not significant.
Table 2: Mean group scores by gender composition (maximum score = 5).
In the mixed gender groups, a series of t-tests revealed that the girls engaged in significantly more
total talk than the boys (t=2.02, p<.05).
DISCUSSION
Verbal and musical interaction
An important finding in this study was the significant relationship between task directed play and
group productivity. This suggests that the children were communicating their ideas through the music,
and that this type of communication was important for group productivity.
A significant relationship was also found between group productivity and task directed talk, and there
was significantly more talk than play. It is suggested here that this may be due to the nature of the
task. The stimulus was highly verbal and the children's ideas may be adequately expressed verbally.
An alternative task is necessary to study this further.
Gender
In this study, the girls in the mixed gender groups talked significantly more than the boys. This
finding is in contrast to those of previous peer collaboration research, where boys tend to dominate
over girls. A possible explanation for this comes from status theories (e.g. Lee, 1993), which suggest
that if a task is perceived as being within the domain of expertise of one particular gender, that gender
will dominate in a mixed gender setting. The previous peer collaboration research has focused on
science-based tasks which could be perceived as being more 'for boys'. Music in schools is perhaps
seen as more 'for girls', and may help explain female verbal domination in the present study (Archer,
1992).
STUDY 2
COMPOSE A PIECE OF MUSIC THAT HAS A BEGINNING, MIDDLE
AND END: A FORMAL MUSIC COMPOSITION TASK
It was suggested that the children's ideas in study 1 might have been adequately expressed verbally
given the verbal nature of the task. The task in study 2 was a formal music composition task that
required the children to work directly with musical structure and form, and moved away from the
direct representation of external events. It was proposed that with a formal composition task, musical
interaction will be related to group productivity and that verbal interaction will show no relationship.
The two key gender issues are again examined here: whether one gender consistently takes control of
the task verbally and non-verbally in the mixed gender groups, and the relative productivity of single
and mixed gender groups.
METHOD
Seventy-two children aged 9-11 were taken from a second primary school. The same procedure used
in study 1 was used in study 2. The only difference was the composition task. In this study, the
children were asked to work together to compose a piece of music that has a beginning, middle and
end. The quality of the compositions was assessed by three raters using the semantic differential scale of Hargreaves et al. (1996), presented below.
UNEVOCATIVE 1 2 3 4 5 6 7 EVOCATIVE
DULL 1 2 3 4 5 6 7 LIVELY
UNVARIED 1 2 3 4 5 6 7 VARIED
UNORIGINAL 1 2 3 4 5 6 7 ORIGINAL
INEFFECTIVE 1 2 3 4 5 6 7 EFFECTIVE
UNINTERESTING 1 2 3 4 5 6 7 INTERESTING
UNAMBITIOUS 1 2 3 4 5 6 7 AMBITIOUS (adventurous)
DISJOINTED 1 2 3 4 5 6 7 FLOWING (articulate)
AESTHETICALLY UNAPPEALING 1 2 3 4 5 6 7 AESTHETICALLY APPEALING
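The scale above gives each rater nine 7-point bipolar ratings per composition. The paper does not specify how these were aggregated into a single group score; a minimal sketch, assuming the simplest rule of equal weighting across items and raters (an assumption, not the authors' stated method), would be:

```python
def composite_score(ratings):
    """Grand-mean composite from per-rater lists of nine 1-7 item scores.

    Assumes equal weighting of the nine bipolar items and of the raters;
    the paper does not state the actual aggregation rule.
    """
    per_rater = [sum(items) / len(items) for items in ratings]
    return sum(per_rater) / len(per_rater)

# Hypothetical ratings for one composition from three raters
raters = [
    [5, 6, 4, 5, 6, 5, 4, 5, 6],  # rater 1
    [4, 5, 5, 4, 5, 4, 5, 4, 5],  # rater 2
    [6, 6, 5, 5, 6, 5, 5, 6, 6],  # rater 3
]
print(f"group score: {composite_score(raters):.2f}")
```

Other aggregation rules (e.g. summing items, or weighting some dimensions more heavily) are equally compatible with the scale as presented.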
RESULTS
Verbal and musical interaction
Table 3 shows that a significant relationship was found between group productivity and task directed play, but not between group productivity and task directed talk, and there was significantly more play than talk.
Table 3: Pearson correlations between the process variables and the group score.
Gender
In the mixed gender groups, neither gender consistently took control of the task. Table 4 shows the
mean marks awarded to the compositions. The all-girl groups were awarded the highest marks,
followed by the all-boy groups and the mixed gender groups. These differences were not significant.
Table 4: Mean group scores by gender composition.
DISCUSSION
Verbal and musical interaction
In this study, there was a significant relationship between group productivity and task directed play.
This suggests that, in line with study 1, musical interaction was important for productivity. However,
in contrast to study 1, no relationship was found between group productivity and task directed talk,
and there was significantly more play than talk. It is suggested here that this was due to the nature of
the task. The formal music composition task required the children to work directly with musical
structure and form rather than with the direct representation of external events. The children's ideas
were more efficiently expressed directly through the music and not through verbal discussion.
Gender
In this study, neither gender consistently took control of the task in the mixed gender groups. The
boys and girls may have felt on a more equal footing in this study than in Study 1 in terms of their
ability to tackle the task. Perhaps this type of task was one to which the boys were better able to relate
and may be more akin to the sort of music they enjoy. Or perhaps it was a type of task to which the
girls were less able to relate. This requires further investigation.
STUDY 3
COMPOSE A PIECE OF MUSIC THAT WILL MAKE ME HAPPY: AN
EMOTION-BASED COMPOSITION TASK
Studies 1 and 2 have looked at children working on a representational and formal music composition
task respectively. With the representational music composition task, where the stimulus was highly
verbal, both verbal and musical interaction were related to productivity. With a formal music
composition task, while musical interaction was related to productivity, verbal interaction was not. It
was suggested that these differences were due to the nature of the task. It is therefore important to
examine a further type of task in order to support these claims. On the basis of the above findings, it is
suggested that with an emotion-based music composition task, musical interaction will have a
significant relationship with group productivity and that verbal interaction will be both less prevalent
and less important.
The two key gender issues are again examined here: whether one gender consistently takes control of
the task in the mixed gender groups, and the relative productivity of single and mixed gender groups.
METHOD
Seventy-two children aged 9-11 were taken from a third primary school. The same procedure as
before was used in this study. The children were asked to work together to compose a piece of music
'that will make me happy'. The quality of the compositions was assessed by three raters using the
Hargreaves et al (1996) scale discussed above.
RESULTS
Verbal and musical interaction
Table 5 shows that a significant relationship was found between group productivity and task directed
play (r=.56, p<.05). No relationships were found between group productivity and task directed talk,
exploratory play, off task talk or interaction with the researcher. A t-test revealed that there was
significantly more play than talk (t=19.76, p<.001).
Table 5: Pearson correlations between the process variables and the group score.
Gender
Neither gender was found to consistently take control of the task in the mixed gender groups. Table 6
shows that the all-boy groups were awarded significantly higher marks for their compositions than the
mixed gender groups (F=5.28, p<.05). The all-girl groups lay in the middle.
Table 6: Mean group scores by gender composition.
DISCUSSION
The results support the hypothesis that musical interaction would be related to group productivity and
that there would be no relationship between group productivity and verbal interaction. It is suggested
that this is due to the nature of the task.
In this study, the all-boy groups were significantly more productive than the mixed gender groups,
with the all-girl groups lying between the two. It was suggested in the discussion of study 1 that the
status theories may explain female verbal domination in the mixed gender groups, and that music in
schools may be perceived as being more 'for girls' than 'for boys'. The picture may not be quite so
clear cut: rather than the subject as a whole being gender specific, it may be particular tasks within it that are. There is some suggestion
that girls prefer to work on tasks that are more structured and verbal in nature (Morgan, 1998). Boys
tend to fare better on more open-ended, creative tasks. The task in study 3 is the most abstract of the
three and may therefore appeal more to the boys than the girls.
GENERAL DISCUSSION
A significant relationship between task directed talk and group productivity was found in study 1, but no such
relationship was found in studies 2 and 3. One possible explanation for these different findings is the
nature of the task. In studies 2 and 3 the children had to work directly with musical form and structure
and thus communication through music was more important. This explanation is only tentative,
because each of the studies was carried out in a different school with different approaches to music
education. More research is needed to establish which of the findings are due to the nature of the task,
and which are due to the differences among the schools.
All three studies showed there was a significant and positive relationship between musical interaction,
as measured by task-directed play, and the quality of the music compositions.
No relationship was found between exploratory play and the quality of the music composition in any
of the three studies. The exact nature and function of what was called exploratory play is still rather
unclear. It was defined as an individualistic form of play, as opposed to play directed towards the
group or towards completion of the task. It was essentially the exploration of the musical instruments.
While this element of play is considered individualistic rather than co-operative, it did not have a
negative relationship with group productivity as would be expected. Rather it showed no relationship
with group productivity. It is therefore dangerous to assume that exploratory play is somehow
detrimental, it may in fact be a vital part of task accomplishment, or have some other role that the
present analysis has not tapped into. It may be an important precursor to task directed play, where the
child may be trying out ideas for him/herself before feeling ready or able to share those ideas with the
rest of the group. What begins its life as an exploration of ideas at the individual level may somehow
make the transition to task directed play at the group level. 'Group score' may not be the most
effective means of assessing its importance.
A fundamental difficulty with the definition of exploratory play was that it did not distinguish
between individualistic playing involving trying out ideas, and simply 'messing around' with the
instruments. On a behavioural level, this distinction is problematic to make as it involves inferences of
intention on the part of the child. While exploratory play did not show a clear relationship with group
productivity, high levels of this behaviour were observed in all three studies, and so it would seem
feasible to suggest that it must have some function. Is it improvisation, exploration of ideas,
exploration of the instruments or simply a time wasting activity to avoid working on the task? It is
important to study the elements which make up the category of exploratory play as it may consist of
all of these.
Although all three studies showed that children can make use of musical interaction for the effective
communication of ideas, many questions remain unanswered. Does musical interaction act like verbal
interaction? That is, if the purpose of verbal interaction in collaborating groups is to present ideas and
discuss their alternatives, how is this happening in music? To what extent are ideas presented
musically and subsequently modified musically? Verbal interaction essentially involves reciprocity; to
what extent does this occur in musical interaction? Does one person in the group dominate in their
instrumental playing as sometimes occurs in verbal interaction? These issues require further
investigation.
Given Allison's (1986) argument that problem solving in the arts requires the use of thought patterns
different from those in science, it may have been expected that the children would work in a way that
was different from the way they might approach a science-based task. However, composition is a
form of problem solving, where a problem is set up and decisions are taken to solve it, resulting in
the satisfaction of having answered it (Salaman, 1988). While it is accepted that there
may be infinite solutions to this problem, the results of the present research suggest that the work
needed to complete the task may involve similar processes to those observed in science-based tasks.
That is, behaviourally, the same factors found to be responsible for productivity in science-based tasks
account for productivity in music composition tasks, namely the communication of ideas and the
establishment of a shared social reality.
Gender Composition of the Collaborating Group
The studies concentrated on two main gender issues: firstly, the finding of previous research that boys
in mixed gender groups dominate verbally and non-verbally over girls; and secondly, the suggestion
that mixed gender groups tend to be less productive than single gender groups. Neither finding was
consistently replicated here: girls dominated verbally in the mixed gender groups in study 1, while no
gender dominance was observed in studies 2 and 3; and single gender groups significantly
outperformed mixed gender groups only in study 3, where the all-boy groups scored highest.
SUMMARY
In sum, the present research has been concerned with children's collaborative music composition, with
the principal aim of establishing which factors within groups of children are important for group
productivity. Previous peer collaboration research has suggested that the most important element of
task activity within groups is the dialogue among group members. In the three studies reported here,
the importance of verbal communication was found to be dependent on the composition task. The
present research also showed that this 'dialogue' could occur musically, that is through the music itself
rather than through words. Thus, talking about music composition is not always productive, and there
is no substitute for the experience of the music itself.
References
Allison, B. (1986). Some aspects of assessment in art and design education. In M. Ross (Ed.)
Assessment in arts education: A necessary discipline or a loss of happiness. Pergamon Press, Oxford.
Archer, J. (1992). Gender stereotyping of school subjects. The Psychologist: Bulletin of the British
Psychological Society, 5, 66-69.
Barrett, M. (1995). Children composing: What have we learnt? In H. Lee & M. Barrett (eds.), Honing
the Craft: Improving the Quality of Music Education. Conference proceedings of the 10th National
Conference of the Australian Society for Music Education, Hobart: Artemis Publishing Consultants.
Doise, W., & Palmonari, A. (Eds.) (1984). Social interaction in individual development. Cambridge:
Cambridge University Press.
Fitzpatrick, H. & Hardman, M. (1995). Gender and the classroom computer: Do girls lose out? In H.
C. Foot, C. J. Howe, A. Anderson, A. K. Tolmie, & D. A. Warden (Eds.), Group and interactive
learning. Boston: Computational Mechanics Publications.
Hargreaves, D.J., Galton, M. J. & Robinson, S. (1996). Teachers' assessments of primary children's
classwork in the creative arts. Educational Research, 38 (2), 199-211.
Lee, M. (1993). Gender, group composition and peer interaction in computer-based co-operative
learning. Educational Computing Research, 9 (4), 549-577.
Morgan, L.A. (1998) Children's collaborative music composition: Communication through music.
Unpublished doctoral dissertation, University of Leicester.
Pheasey, K. & Underwood, G. (1995). Collaboration and discourse during computer-based problem
solving. In H. C. Foot, C. J. Howe, A. Anderson, A. K. Tolmie, & D. A. Warden (Eds.), Group and
interactive learning. Boston: Computational Mechanics Publications.
Rogoff, B. (1998). Cognition as a collaborative process. In W. Damon, D. Kuhn, & R. S. Siegler
(eds.) Handbook of Child Psychology: Cognition, Perception & Language (5th Edition). (pp.
679-744), New York: Wiley.
Salaman, W. (1988). Personalities in world music education. No. 7 - John Paynter. International
Journal of Music Education, 12, 28-32.
Swann, J. (1992). Girls, boys and language (First ed.). Oxford: Blackwell.
Swanwick, K. (1979). A basis for music education. Windsor: NFER.
Tolmie, A., Howe, C., Mackenzie, M. & Greer, K. (1993). Task design as an influence on dialogue
and learning: Primary school group work with object flotation. Social Development, 2 (3), 189-211.
Tudge, J. & Rogoff, B. (1989). Peer influences on cognitive development: Piagetian and Vygotskian
perspectives. In M.H. Bornstein & J.S. Bruner, Interaction in human development. Lawrence Erlbaum
Associates.
Underwood, J. (ed.) (1994). Computer-based learning (First ed.). London: David Fulton Publishers
Ltd.
Wegerif, R., Mercer, N., & Dawes, L. (1999). From social interaction to individual reasoning: an
empirical investigation of a possible socio-cultural model of cognitive development. Learning &
Instruction, 9, 6, 493-516.
Proceedings abstract
EFFECT OF TEMPORAL CONTEXT ON PITCH SALIENCE IN MUSICAL CHORDS
parncutt@kfunigraz.ac.at
Background:
Aims:
Method:
In each trial, music students heard a chord of octave complex tones followed by
a single tone, and rated how well the tone followed the chord. Different
combinations of chord and probe tone were presented in random order and in
random transposition. In a second experiment, the chord was either preceded or
followed by a distractor tone, which listeners were instructed to ignore. When
the distractor followed the chord, the distractor also followed the probe tone.
Results:
We hypothesized that distractor tones would stream with nearby chord tones,
attracting attention to them and increasing their perceptual salience. This was
not confirmed. Instead, peaks were generally observed at both chord and
distractor pitches. More importantly, when the chord and the distractor tone
together created a more familiar tonal fragment or progression, the tone
profile was more clearly structured.
Conclusions:
Temporal context and streaming do not destroy pitches that are implied
according to the pitch model, but not played. This seems to confirm the model's
music-theoretic potential. However, responses were strongly influenced by
listeners' familiarity with specific sound structures that occur frequently in
western music and are not represented in the model.
Proceedings paper
The hypothesis holds that we understand all of the overt gestures of performers (the finger, arm, trunk, and leg movements) via overt and covert
imitation. Overt forms of mimetic participation include toe-tapping, swaying, dancing to music, and singing along with music; covert forms
include subvocalization and other aspects of motor imagery. Because the overt forms are generally more occasional, they are somewhat less
informative than the covert forms, which, according to the evidence presented below, appear to occur regularly as an automatic part of music
perception and cognition.
Notice that the mappings are one-directional, from the source domain of the human voice to the target domain of instrumental sounds.
Although singers may sometimes speak of their "instrument," this reverse mapping of instrumental sounds onto the voice is far less common.
Now if it were simply a matter of vocal and instrumental sounds being alike, then the mappings might go in either direction. But in addition to
whatever acoustic similarities there may be, the voice provides most of us with an experiential basis for understanding the majority of
instrumental sounds. The unidirectionality of the mappings, from this generally applicable vocal experience onto the more specific cases of
instrumental sounds, is consistent with the mimetic hypothesis. The mappings of the conceptual metaphors MUSICAL SOUNDS ARE VOCAL
SOUNDS and INSTRUMENTAL SOUNDS ARE VOCAL SOUNDS indicate that part of how we understand musical sounds generally is in terms of
our own vocal experience, while the mimetic hypothesis holds that the process whereby we perform these cross-domain mappings involves,
and is perhaps motivated by, mimetic participation on the part of listeners.
Implications
If the mimetic hypothesis is correct, it holds fundamental implications for various aspects of musical meaning. At a broad, philosophical level,
it helps show explicitly what role embodied experience plays in the imagination of what might otherwise seem like autonomous, objective
musical properties, such as musical verticality. We normally treat the concept of musical verticality as if it were literal ("music-literal," to use
Guck's (1991) term), while perhaps acknowledging that on some level it is metaphoric. One problem with this is the implicit suggestion that
musical tones simply have some property, which we understand metaphorically as "high" and "low," and which we perceive in ways that only
incidentally involve embodied experience, a position that might be put: "Yes, we have ears with which we hear, but this does not determine the
property of the tones which we understand in terms of verticality." The mimetic hypothesis and analysis of the metaphor, however, say
something quite different.
Tones are neither "high" nor "low," and they neither "ascend" nor "descend," and yet so much music discourse and meaning depend on the
imagination that they somehow do. As I have explained elsewhere (Cox 1999), we can understand the logic of this metaphor in terms of the
conceptual metaphor GREATER IS HIGHER (Lakoff and Johnson 1980) and the folk theory to increase is to raise, whereby we regularly
understand greater and lesser quantities and magnitudes in terms of vertical relations. In the same way, "higher" notes are, by and
large, produced via greater quantities and magnitudes of air, effort, and tension. There are exceptions of course, but when we recognize the role of vocal
mimesis, the exceptions become even fewer. If we understand musical sounds in general in terms of our own vocal experience, and that vocal
experience involves greater and lesser amounts of air, effort and tension, then we have a basis for understanding sounds in general
metaphorically in terms of "higher" and "lower." This view of things brings together embodied musical experience, in terms of mimetic
participation, and the embodied metaphoric reasoning that we use every day, in order to account for the fundamental concept of musical
verticality which heretofore has not been explained beyond the level of identifying it as a metaphor. The further implication of this view is
that any concept and any claim about music based on the concept of verticality (whether in terms of melodic motion or shape) describes not a
property of the music itself, but an interpretation based in part on the imagined (re)production of the sound described, understood via the logic
of the conceptual metaphor GREATER IS HIGHER. Musical properties that might otherwise seem to be located in the music itself are instead
shown to emerge in the imagination of listeners, as we draw on embodied experience and the logic of metaphoric thought. This claim has
further philosophical implications with regard to the autonomy of musical works and related issues, but I will proceed to the more directly
References
Armstrong, D. F., Stokoe, W. C., and Wilcox, S. E. (1995). Gesture and the Nature of Language. Cambridge: Cambridge University Press.
Baddeley, A. (1986). Working Memory. Oxford, Clarendon Press.
Baddeley, A., and Logie, R. (1992). Auditory imagery and working memory. In D. Reisberg (Ed.). Auditory Imagery. Hillsdale, NJ, Lawrence
Erlbaum.
Carpenter, P. (1967). The musical object. Current Musicology, 5, 56-87.
Clifton, T. (1983). Music As Heard. New Haven, Yale University Press.
Cox, A. (1998). As time goes by: the past, present, and future of musical motion. Society for Music Theory, Chapel Hill.
Cox, A. (1999). Verticality, conceptual blending, and the mimetic hypothesis. Society for Music Theory, Atlanta.
Crowder, R. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15, 472-78.
Crowder, R. G., and Pitt, M. A. (1992). Research on memory/imagery for musical timbre. In D. Reisberg (Ed.). Auditory Imagery. Hillsdale,
NJ, Lawrence Erlbaum. pp. 29-44.
Cumming, N. (1997). The subjectivities of 'Erbarme Dich'. Music Analysis, 16/1, 5-44.
Fadiga, L., Buccino, G., Craighero, L., Fogassi, L., Gallese, V., and Pavesi, G. (1998). Corticospinal excitability is specifically modulated by
motor imagery: a magnetic stimulation study. Neuropsychologia, 37/2, 147-158.
Fadiga, L., and Gallese, V. (1997). Action representation and language in the brain. Theoretical Linguistics, 23/3, 267-280.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti G. (1996). Action recognition in the premotor cortex. Brain, 119, 593-609.
Gallese, V., and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2/12,
493-501.
Gathercole, S. E., and Baddeley, A. D. (1993). Working Memory and Language. Hillsdale, NJ, Lawrence Erlbaum.
Gibson, E. J., and Levin, H. (1975). The Psychology of Reading. Cambridge, MA, The MIT Press.
Grafton, S. T., Fadiga, L., Arbib, M. A., and Rizzolatti, G. (1997). Premotor cortex activation during observation and naming of familiar tools.
Neuroimage, 6/4, 231-236.
Guck, M. (1991). Two types of metaphoric transfer. In J. C. Kassler (Ed.). Metaphor: A Musical Dimension. Sydney, Currency Press.
Lakoff, G., and Johnson, M. (1980). Metaphors We Live By. Chicago, University of Chicago Press.
Larson, S. (1993). Modeling melodic expectation: using three "musical forces" to predict melodic continuations. Proceedings of the Fifteenth
Annual Conference of the Cognitive Science Society, 629-34.
Lidov, D. (1987). Mind and body in music. Semiotica, 66/1, 66-97.
Lochhead, J., and Fisher, G. (1997). Performance and gesture: on the projection and apprehension of musical meaning. Society for Music
Theory Annual Meeting, Phoenix.
Logie, R., and Edworthy, J. (1986). Shared mechanisms in the processing of verbal and musical material. In D. G. Russell, D. Marks, and J.
Richardson (Eds.). Imagery 2. Dunedin, New Zealand, Human Performance Associates. pp. 33-37.
McClary, S. (1991) Feminine Endings: Music, Gender, and Sexuality. Minneapolis, University of Minnesota Press.
Mead, A. (1996). Bodily hearing. Annual Meeting of the Society for Music Theory, Baton Rouge.
Neisser, U. (1976). Cognition and Reality: Principles and Implications of Cognitive Psychology. New York, Freeman.
Walton, K. (1993). Understanding Humor and Understanding Music. In Krausz, M. (Ed.). The Interpretation of Music: Philosophical Essays.
Oxford, Clarendon Press. pp. 259-70.
Walton, K. (1997). Listening with imagination: is music representational? In Robinson, J. (Ed.). Music and Meaning. Ithaca, Cornell
University Press. pp. 57-82.
Young, I. (1990). Throwing Like a Girl and Other Essays in Feminist Philosophy and Social Theory. Bloomington, Indiana University Press.
Back to index
Proceedings paper
The Relation of body movement and voice production in early childhood music learning
Wilfried Gruhn, Musikhochschule Freiburg, Germany
Background
Observational and experimental research on early music learning (Gordon 1990; Wilson & Roehmann 1990; Deliège &
Sloboda 1996; Gruhn 1999) as well as brain studies on music learning (Altenmüller & Gruhn 1997; 1999; Liebert &
Gruhn 1999) have demonstrated that the brain's high degree of plasticity gives it a powerful potential that is
fundamental to learning. Although all primary sensory cortices are genetically predetermined, the extension and neuronal
connectivity within particular brain areas vary over time according to experience and practice. In learning, humans
profit from this plasticity, which enables the brain to modulate different functions in response to new demands.
This neurobiological potential underlies the "competent infant" (Dornes 1993), whose competence is grounded in
the ability to develop dynamic neural networks within a given genetic program and to form mental representations of all
kinds of experiences and incoming information.
Since cognitive psychology has described different types of mental representation as "figural" and "formal" (Bamberger
1991) depending upon what aspects the perceptive mind focuses on,
and since EEG studies have identified different activation patterns that can be related to those types, it is reasonable to
assume that the very nature of infant learning centres on the genetically determined and environmentally stimulated
growth of neuronal connections, which form genuine musical representations as their neuronal correlates. In a long-term
observational study over more than five years (Gruhn 1999; 2000) we exposed groups of young children to various
materials derived from Gordon's learning theory (Gordon 1980; 5th ed. 1997) and applied a language acquisition model to the
informal teaching and learning process. One of the goals of the longitudinal study was to examine the interaction of
motor skills with musical activities such as singing in tune and chanting rhythmically. What educators and musicians like
Jaques-Dalcroze, Laban, Jacoby, and Gordon have assumed intuitively or by observation - that different dimensions of
body movement reflect musical experience and procedural knowledge - was to be tested experimentally.
Experimental study
Subjects
Parents of children from birth up to two years of age from the municipal area of Freiburg volunteered for the study, which ran for 15 months
from October 1998 until December 1999. Children were selected from a larger sample according to the criteria of age (1
year +/- 3 months) and gender (balanced distribution of male and female). Of a total of 13 children we lost 4 because
the parents moved away or for other reasons. Finally, 9 children (M = 19.5 months, SD = 5.83) completed the study. The social
structure was not representative of the average population; all children grew up in musically active families and
exhibited - as reported by the parents - high musical sensitivity (attraction to music, conscious listening, spontaneous
movement along with music). A control group (9 children from a nursery, M = 23.2 months, SD = 3.15) was matched with the
experimental group as to age, gender and social background.
Procedure
Children with their parents or caregivers participated in informal instruction once a week for 30 minutes. The entire
teaching period of 15 months was segmented into four 10-week sections. The materials presented were children's songs with
words, tunes without words, chants with words (nursery rhymes), chants without words, tonal patterns, and rhythm
patterns. Songs included all tonalities (major, minor, modes); chants were in duple and triple meter and included
unusual combinations of either. Whatever type was presented, singing was always accompanied by body movements.
However, movement did not necessarily reinforce beats; rather, all movement focused on continuous flow and weight.
As long as the teacher kept eye contact with a child, (s)he persisted in presenting the same sound pattern to the child.
Additional materials (such as scarves, balls, hoops, and a trampoline) were introduced insofar as they were apt to support the
experience of flow and weight.
All sessions were videotaped; additionally, the children were observed individually by two independent judges
(interjudge reliability r = .88) using the structured observation form CBOF. Each child's data were averaged over successive
4-week periods. For analysis, the means at the beginning and end of each 10-week section were taken from the CBOF. In
all, data from 8 measurements (2 for each of the 4 sections) were related to the means of the control group, which did not receive
any musical instruction whatsoever. However, teachers spent approximately the same time with the control children on the playground
and in other classroom activities, to gain the children's confidence and to avoid any attention effect in the study group.
Additionally, parents reported regularly at the end of each section by means of a questionnaire.
Results
The behaviour over time exhibits slow changes in some areas, such as type of movement (explorative, imitative, creative),
quality of movement (flow, coordination, synchronisation, expressiveness), and voice response (rhythm patterns) (Fig.
1). Since the children started at a very early age, one cannot expect much progress in voice production (singing and
chanting), and only little enhancement in movement at the beginning, when children are just beginning to stand freely and walk alone.
Accordingly, only after new models of movement were introduced during the first section do we find a significant change at the
end of that section (flow p = .000; coordination p = .020; synchronisation p = .042). The major and most obvious
development in the quality of movement happens only during the second half of informal instruction. In contrast, the
readiness for sound reproduction grows slowly. Children need a long time to perceive structured patterns and
process them before they start imitating them beyond mere babble. Therefore, significant improvement in sound
reproduction - if observed at all - is only to be recognised in rhythm patterns.
The development of each particular skill (e.g. coordination, synchronisation, accuracy, consistency, intonation, pitch
discrimination, expressiveness) within any criterion (movement, tonal and rhythm response, imitation, improvisation,
and audiation) does not proceed continuously, but with accelerations and decelerations (Fig. 2). Moreover, voice and body
movement progress at different speeds, depending on many biological, social, and environmental
influences. From sociology we know about U-shaped growth, which is also reflected in several developmental curves
of voice and movement achievement in this study. Each child follows his/her own developmental path, with varying
phases of progress and retardation.
To look for interactions between the various attributes, Pearson correlations were calculated for each of the 45 criteria.
A significant correlation appears only between body movement and voice production, at all measurements throughout the
four sections. In most cases the results are even highly significant (r ≥ .80) (Fig. 3). This indicates an important interaction
between the way children use their bodies in movement and their ability to match pitches accurately and keep a
consistent tempo. The better they can control their bodies, the more precisely they can also control their vocal apparatus.
With growing experience and practice, the correlation then shifts from patterns to songs and tunes at the end of the
long-term observation.
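The correlational analysis described above can be illustrated with a minimal sketch in Python (not part of the original study; the variable names and rating values below are hypothetical stand-ins for CBOF scores across the 8 measurement points):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical mean CBOF ratings for one child across the 8 measurements:
movement_quality = [1.2, 1.5, 1.9, 2.4, 2.8, 3.1, 3.5, 3.8]
voice_response   = [1.0, 1.3, 1.6, 2.2, 2.5, 3.0, 3.3, 3.7]

r = pearson_r(movement_quality, voice_response)
print(round(r, 2))  # a high positive r, of the kind reported (r >= .80)
```

A correlation this strong says nothing about causal direction, which is why the Discussion below is careful to treat the movement-voice link as an interaction rather than an effect.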
Most interestingly, children in the control group show exactly the same direction in their development, but at a lower
level, although they started from exactly the same level as the children in the study group. Changes in their
musical behaviour over the entire period are nonsignificant except for imitation of movements (p = .009) and coordination
(p = .001).
Figure 4
Compared development of movement of children in the study group (S) with those of the control group (C) at the beginning (measurement 1)
and at the end of each section (measurements 2 - 5).
Because of their lack of exposure to a musically enriched environment which offers many opportunities to explore their
body as well as their voice as a means of expression and communication, the control children could only perform
movements according to their biological maturation. In the experimental group that process appeared significantly
enhanced.
Discussion
Although one must take into consideration that correlational findings do not allow causal statements, it is remarkable that
the empirical data support a strong interaction between movement and voice production, which has already been observed
by other music educators. Furthermore, researchers have investigated infants' development of movement and its impact
on early childhood music learning (Metz 1989; Blesedell 1991; Hicks 1993; Reynolds 1995). However, it is not easy to
interpret the data presented here adequately. One might assume that there is a common neuronal basis for connected
mechanisms underlying the performance of movement and vocal sound, but there is no neurological evidence for such
an assumption. Rather, it is evident that precise control of the efferent neuronal transmission that governs motor coordination
affects both general motor skills for moving arms and legs and the fine motor skills needed for controlling the
adjustment of the vocal cords. Movement supposedly facilitates somatosensory and sensorimotor stimulus transmission
and enhances the primary sensorimotor cortex, which in turn affects muscular motor processes in the larynx and
enables it to react physically to sound by matching a perceived pitch with the vocally produced pitch. If these
motor skills are developed properly, correct voice production can function more easily. EEG, MEG and fMRI studies
might demonstrate whether there is a neuronal basis for this interaction and how it works. For now one can conclude that
music training at an early age is best supported by integrating body movement.
The salient interaction of body movement and voice production can also be interpreted in terms of transfer effects. If we
differentiate between internal and external transfer effects (Gruhn 2000), the interaction may be based on an internal
effect that connects different brain functions to a more complex network. However, strong research data which support
this hypothesis are still missing.
More likely, Condon's observation (1975) that infants immediately after birth respond to their mother's voice with synchronous
movements parallels our findings. From long-term behavioural observations of newborns he concludes that infants learn
their mother tongue even before the age of actual language acquisition by first imitating the structure of movements
along with their mothers' speech. Rhythmically structured vocal sequences are - according to Condon - basically perceived as
rhythmically structured movement patterns. Investigations of mother-infant communication (Malloch 1999/2000;
Trevarthen 1999/2000) support the evidence of rhythmically structured patterns in terms of call and response and of
children's exploration of pitch ranges. Those interactions, which are "built from the units of pulse and quality found in
the jointly created gestures of vocalisations and bodily movement" (Malloch 1999/2000, 45), are fundamental to the
development of skills necessary for communication through sound production.
References
Altenmüller,E. & Gruhn,W. (1997). Music, the brain, and music learning. Chicago: G.I.A. Publ.Inc. (GIML series
vol. 2)
Altenmüller, E., Gruhn, W., Parlitz, D. (1999). Was bewirkt musikalisches Lernen in unserem Gehirn? In
H.G.Bastian (Ed.). Musik begreifen. Mainz: Schott. pp. 120 - 143.
Bamberger, J. (1991). The mind behind the musical ear. Cambridge: Harvard Univ.Press.
Blesedell, D.S. (1991). A study of the effects of two types of movement instruction on the rhythm achievement and
developmental rhythm aptitude of preschool children. Dissertation Abstracts International, 52 (07), 2452.
Condon, W. (1975). Speech makes babies move. In R.Lewin (Ed.). Child alive. New York: Anchor. pp. 75 - 85.
Deliège, I. & Sloboda, J. (Eds.) (1996). Musical beginnings. Origins and development of musical competence.
Oxford: Oxford Univ.Press.
Dornes, M.(1993). Der kompetente Säugling. Frankfurt: Fischer.
Gordon, E.E. (1980). Learning sequences in music. Chicago: G.I.A.Publ.Inc. ( 5th edition 1997)
Gordon, E.E. (1990). A music learning theory for newborn and young children. Chicago: G.I.A. Publ.Inc.
Gruhn, W. (1999). The development of mental representations in early childhood. In Suk Won Yi (Ed.). Music,
Mind, and Science. Seoul: Seoul Nat.Univ.Press. pp.434 - 451.
Gruhn, W. (2000). Does brain research support the hope for musical transfer effects? SRPMME Conference,
Leicester UK (mscr.)
Hicks, W.K.(1993). An investigation of the initial stages of preparatory audiation. Dissertation Abstracts
International, 54 (04), 1277.
Liebert, G., Gruhn, W. et al. (1999). Kurzzeit-Lerneffekte musikalischer Gehörbildung spiegeln sich in kortikalen
Aktivierungsmustern wider. Proceedings of the Deutsche Gesellschaft für Musikpsychologie. Karlsruhe.
Malloch, S.M. (1999/2000). Mothers and infants and communicative musicality. Musicae Scientiae. Special Issue,
29 - 54.
Metz, E. (1989). Movement as a musical response among preschool children. Journal of Research in Music
Education, 37 (1), 48 - 60.
Reynolds,A.M. (1995). An investigation of the movement responses performed by children 18 months to 3 years
of age and their caregivers to rhythm chants in duple and triple meters. Dissertation Abstracts International, 56
(04), 1283.
Trevarthen, C. (1999/2000). Musicality and the intrinsic motive pulse: evidence from human psychobiology and
infant communication. Musicae Scientiae. Special Issue, 155 - 211.
Wilson, F.R. & Roehmann, F.L. (Eds.) (1990). Music and child development. St.Louis: MMB Music.
Back to index
Proceedings
Convenors:
Drake,C.,
Palmer,C.
Chair: Palmer,C.
Papers in this session (titles reconstructed from the flattened programme grid; "(Abstract)" placement is approximate):
- Development of musical organisation in children's music notations
- More on the meaning of natural schemata - their role in shaping types of directionality (Abstract)
- How much time do we need to process harmonic structures (Abstract)
- Music cognition illuminates our understanding of the experience of film
- Student practice habits in the United States and Japan
- The effect of musical styles and experiences on melodic expectancy (Abstract)
- Narrative and musical time: children's perceptions of structural norms
- Musical meaning and the line of fifths (Abstract)
- Effect of harmonic relatedness on the detection of temporal asynchronies
- When your ear sets the stage: musical context effects in film perception
- Musical and extra-musical factors and the career of professional musicians: the perspective of employment agencies (Abstract)
- Melodic expectancies in 7 and 8 year olds (Abstract)
- The distance and ratio model of pitch perception (Abstract)
- Musical timing data as performance gesture
- Music, mood and memory
- Preparing professional performers? Music students' perceptions and experiences of their orchestral training at Birmingham Conservatoire
- Similarity perception of variations of tonal and twelve-tone melodies
Proceedings abstract
Eva Brand
evabrand@netvision.net.il
Background:
Aims:
The aim of this study was to identify and describe the development of musical
organization found in children's notations of the song.
Method:
A total of 36 children participated in the study, drawn from three age groups:
6, 9 and 12 year-olds. The children were asked to learn an unfamiliar Zulu song
through independent use of a tape recording of the song, a tape player, a xylophone,
a drum, and paper and colored pens. Making a visual representation of the song was one
of the learning strategies they used.
Results:
Conclusions:
Musical notations contain "more than meets the eye".
Back to index
Proceedings paper
A. INTRODUCTION
The topic of this paper links certain aspects of three partly overlapping concepts: meaning in music, natural schemata, and types of
directionality. Although much has been written about them, they require clarification. We assume here that meaning in music refers
to the types of experiences evoked by the music. We classify these types from three overlapping standpoints
that may contribute to characterizing and distinguishing between cultures, periods, or composers, since they represent different
messages or ideals: (1) the existence or absence of a direct link with the extramusical world (as expressed in "functional,"
"programmatic," and other music, as opposed to "absolute" music); (2) association with one of two poles: "ethos" or "pathos" (to use
the terminology of Curt Sachs [1946]; these poles can also be termed "tranquility/excitement," "clarity/blurring," or
"Classical/Romantic"); (3) types of directionality and complexity.
"Directionality," in the most general terms, can be said to represent the sense of certainty as to the continuation of the musical
progression on various levels of musical organization. It is related to expectations that may arise from learned and natural schemata
and how they are realized. The concept of directionality has been referred to in theoretical discussions by many terms, including
"progression," "process," "processive forms," "flow," "goal," "direction," and "approaching." (For a summary of studies on
expectancy in music from a historical perspective, beginning in 1903, see Carlsen 1990.) Different styles have different forms of
directionality: momentary, overall, clear, suspensive (or blurred), and various combinations thereof, and these forms of directionality
B. NATURAL SCHEMATA
Natural schemata that are related to universal phenomena can be regarded as complementary contrasts of learned schemata. They are
familiar from outside music, too, and are not expressed in precise terms. (The concept of the schema, a term that was coined by
Bartlett [1932] and is in fairly common use today, is still referred to by somewhat equivalent terms such as archetype, prototype,
alphabet, model, and "structured system of knowledge.")
Despite their tremendous contribution to all types of experiences, natural schemata have been largely neglected in discussions and
analyses of music, especially Western tonal music. Western musical theory focuses almost exclusively on the learned schemata and
not on their realization through natural schemata. In non-Western cultures, some natural schemata are integral parts of musical
theory. Another reason why the natural schemata are neglected or ignored is that they are not defined in quantitative terms.
Awareness of them increased in the twentieth century, as they became more important in the organization of pieces of music at the
expense of the learned schemata and as research into music and cognition and various universal phenomena developed substantially.
Many analyses of twentieth-century music are based on natural schemata, as manifested in the texture in its broad sense, both by
theoreticians (Lansky 1974; Goldstein 1974; Scolnic 1993) and by the composers themselves (Stockhausen 1957, Varèse 1967;
Boulez 1971; Ligeti 1993). In contrast, few analyses of Western tonal music have been done based on natural schemata (Lorince
1966; Rothgeb 1977; Ratner 1980; Levy 1982), although there have been many studies on various aspects of natural schemata,
especially those related to expectations and realization of them (Meyer 1975; Narmour 1991; Krumhansl 1997; Yeger-Granot 1996;
to mention just a few) and curves of pitch (see below). Here we would like to discuss an additional aspect of natural schemata and
examine their selection in light of the stylistic ideal.
Due to the large number of parameters and combinations thereof, there is a huge variety of manifestations of natural schemata. In
order to understand their contribution to shaping perceptions of types of directionality, we will attempt to define cognitive principles,
and in light of them, we will examine the meaning of the natural schemata.
We can see the manifestations of natural schemata in four main realms:
1. The absolute and relative range of occurrence of various parameters, with treatment of the normative range of expectations
following an inverted U function (on the U function with respect to musical phenomena, see Hargreaves 1986). Any deviation
in either direction (greater or less) runs counter to natural expectations, spoils the clear directionality, and elicits tension
(therefore music that is very low, very slow, or without change also produces tension). Extensive deviation introduces another
factor-infrequency/frequency-that affects the perception of expectations.
The relative range is represented by the ambitus-medium, large, or small-which may appear in any of the absolute ranges for
each parameter.
2. Curves of change over time for each parameter. Most research on curves has focused on the parameter of pitch and involved
various topics: musical perception and memory (e.g., Dowling 1978; Edworthy 1985; Andrews and Dowling 1991); the
contribution to musical structure (Eitan 1997), especially to the characterization of folk songs (Nettl 1977; Huron 1996); the
curves as gestures that contribute to emotional expression, as a component of expectations that arise in melodic progressions,
and as an important factor in determining the parameters that accompany it (changes in intensity and duration) during
performance (Repp 1998; Sundberg et al. 1991). In general, one can speak of six meaningful basic curves: ascending,
descending; convex, concave; flat; and zigzag. These curves may appear on various levels and in various combinations. The
convex curve (also called "arch") represents a natural model of predictability as to the continuation of the musical progression.
It may be regarded as defining a closed unit in which tension (or excitement) results in relaxation, where tension and
relaxation may be realized as contrasting pairs in various parameters: ascent-descent; crescendo-diminuendo;
accelerando-ritardando; sudden change (e.g., a skip)-gradual change (scale steps).
Both random zigzag and horizontal (no change) curves portray lack of organization (based on difference and similarity [see
Tversky 1977]). They therefore hinder clear directionality and cause stress, due to expectancy of some sort of repetition in the
case of zigzag and expectancy of change in the horizontal case. One can also say that in the case of continuous exact repetition
(A A A ...) "before" and "after" are meaningless, and the units are interchangeable. (To be more precise, the units are never
identical in the sense that each appears after a different number of A's).
It is important to distinguish between kinds of repetition. For example, a single repetition (with varying degrees of precision),
which underlies the "Classical period" (and 2^n organization), enhances directionality; multiple repetitions distinguish at least
between various levels, such as repeated background events (e.g., meter) and repeated frontal events. The first helps to
reinforce a sense of directionality in the changing frontal events, whereas the second evokes excitement due to expectation of
change (up to a certain threshold at which one despairs of change, when repetition turns the frontal events into background).
The type of repetition also depends on the length of the repeated unit and events on various levels.
3. The degree of definability of all the above-elements, events, curves, units, etc.-depends on psychoacoustic and cognitive
constraints, and, of course, it affects the perception of clear directionality. As for the definability of elements such as the notes
of the scale, intervals, chords, and rhythms, a precondition is the existence of clear categories for our perceptions (Burns et al.
1978; Rakowski 1990; Kefe et al. 1991). The categories, for their part, depend on the quantitative aspect and the existence of a
hierarchy. For example, an examination of hierarchy and the conditions for coherence in connection with the interval system
showed a preference, in terms of possible forms of organization, for the specific division of the octave into twelve and seven,
as in the West, over other hypothetical divisions (Balzano 1980; Burns 1981; Agmon 1989). Thus we can regard some learned
schemata as natural schemata, since they are the outcome of cognitive activity.
As for the definability of events or units based on combinations of parameters, this depends on the degree of concurrence or
nonconcurrence of the parameters (Cohen and Wagner 2000). Full concurrence, for example, is obtained when a note that
appears on a stressed beat in the measure serves as a peak for its neighbors in pitch, duration, and intensity. In such a case
there is no doubt about its salience, which corresponds to its location on a stressed beat. In contrast, in all states of
nonconcurrence (seven possibilities in all), the degree of definability regarding its salience is smaller.
Concurrence/nonconcurrence may be seen between different learned schemata, between natural schemata, and between
learned and natural schemata. Nonconcurrence heightens complexity, uncertainty, and excitement.
4. Categories of operations that denote the principle of change that occurs in a transformation. Transformation, by its very
nature, includes the two conditions required by every form of organization: difference and similarity; and repetition. It can be
characterized by the parameters in which it occurs, the level (immediate or overall), the degree of change, and most
importantly, the operation. We can group the operations in five categories-contrast, shift, reduction/expansion,
segregation/grouping, and equivalence-which represent categories of cognitive operations; consequently, we can regard them
as natural schemata (Apel 1993).
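The six basic curves named above (ascending, descending, convex, concave, flat, zigzag) lend themselves to a simple operational test on a pitch sequence. The following sketch (Python; not part of the original paper, and the tolerance parameter is a hypothetical simplification of the authors' levels of analysis) checks the endpoints and the interior extremum of a sequence:

```python
def classify_curve(pitches, tol=0):
    """Assign a pitch sequence to one of the six basic curve types:
    ascending, descending, convex (arch), concave, flat, or zigzag."""
    def nondecreasing(seq):
        return all(b >= a - tol for a, b in zip(seq, seq[1:]))
    def nonincreasing(seq):
        return all(b <= a + tol for a, b in zip(seq, seq[1:]))

    if max(pitches) - min(pitches) <= tol:
        return "flat"                      # no change: expectancy of change
    if nondecreasing(pitches):
        return "ascending"
    if nonincreasing(pitches):
        return "descending"
    peak = pitches.index(max(pitches))     # rise to a peak, then fall: the arch
    if nondecreasing(pitches[:peak + 1]) and nonincreasing(pitches[peak:]):
        return "convex"
    trough = pitches.index(min(pitches))   # fall to a trough, then rise
    if nonincreasing(pitches[:trough + 1]) and nondecreasing(pitches[trough:]):
        return "concave"
    return "zigzag"                        # random alternation: no clear directionality

print(classify_curve([60, 62, 64, 65, 67]))      # ascending
print(classify_curve([60, 64, 67, 64, 60]))      # convex ("arch")
print(classify_curve([60, 65, 58, 66, 59, 64]))  # zigzag
```

As in the text, the same test can be applied recursively: classify the immediate level note by note, then classify the sequence of unit peaks to obtain the curves of the higher levels.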
Fig. 1: The structure of the upper voice in the canon from Krenek's Etude (op. 38, no. 1) in terms of pitch curves (and written
dynamics concurrent with pitch) on several levels.
a. Specification of curves on the various levels. Dots (•) stand for non-repeating notes; diamonds around dots indicate
notes that open a row; solid lines (-) above bar numbers denote pauses; broken lines (- - -) represent the overall curves of
units (convex or ascending); broken vertical lines denote the division between the two subparts in I and II. An oval in I
surrounds the opening subunit of two motives; in II the oval surrounds each of the two motives, which have become separate
units.
b. Overall structure of the piece, in terms of the curves of the parts (I and II are concave, whereas the overall curve is
convex).
In terms of the natural schemata, we can think of the piece as being divided into two parts (I and II) and a finale (see fig. 1). The first
part (eight measures, ending with the end of the third appearance of the row) is divided into two unequal subparts (4 1/2 + 3 1/2
measures). The first subpart has three units that are fairly zigzagged, but their overall curves are convex. The units, which are
separated by rests, become gradually shorter (together with the rests, their durations in number of measures are 2, 1 3/4, and 3/4). All
three are subject to the overall descending curve. The second subpart contains two zigzagged units, with rests before, between, and
after them. In this subpart the ascending curve is salient on several levels. The second subpart may be regarded as a contrast to the
first in terms of the curve: ascent following descent. The overall curve of the first part is concave. Note that the curves on the
different levels, both convex and ascending, are reinforced by appropriate changes in intensity, but, as we mentioned, there is no
concurrence with the rows, except at the end of the first part and at the very end of the piece (altogether the row appears five times,
with a gradual increase in duration).
The second part (the second line in the figure) constitutes a repetition of sorts and is also divided into two subparts. The
first subpart consists only of the subunit that opens I (circled in the figure) and contains two motives. In II it breaks up into two units
(set off by two ellipses in the figure); each of them is expanded by a repetition of notes, an increase in duration, and a pause between
them. The second unit is shorter than the first (the first has six notes and the second has only three). Then, in the second subpart,
there are three ascending lines that become shorter and shorter (the first has eight notes, the second has six, and the third has three)
with an ascent between them, up to a peak (in intensity, too). Like the first part, the second part as a whole is represented by a
concave curve, with a simpler and smaller descent and a stronger ascent. The second part as a whole is ascending. Then comes the
finale, which is marked by segregating staccato and rests between the notes and ends with descent, legato, long notes, and piano (i.e.,
a decrease in all parameters, including density).
Thus we can regard the piece as comprising several levels: the parts (two plus the finale), the subparts, the units of the subparts, and
the immediate level. As for the curves, on the immediate level most of them are zigzags with varying degrees of steepness; on the
level of the units, the curves are convex or ascending; on the level of the subparts, they are ascending or descending; on the level of
the parts, they are concave (in the two parts) and descending (in the finale); on the level of the piece as a whole (b in the figure), the
curve is convex. The operations are contrast, shift, reduction, and intensification. Unquestionably, these natural schemata contribute
to directionality on various levels of the piece, and they fill in what the dodecaphonic system lacks.
D. THE EXPERIMENT
The experiment examined subjects' responses to monophonic, atonal lines intended to represent types of directionality by
means of various characteristics. Atonality was selected in order to prevent competition from learned schemata.
D.1 Method
The following questions were asked: (1) Is the pattern a closed one (in which case the unit is coherent and directional), or does it
require a continuation (i.e., it lacks clear expectations)? (2) Does it elicit a sense of pleasantness, unpleasantness (annoyance), or apathy?
(3) What adjectives are appropriate for characterizing the pattern?
The subjects: All were adults; some had studied music, and the others could not read music.
The patterns examined: The patterns were characterized by pitch curves (convex, concave, zigzagged, and flat), combinations of
them on various levels, and interval sizes (small/large). All of them appear in various (more than two) realizations, with equal
durations (i.e., neutralization of the duration factor) and without repetition of notes. Some of them reappeared with additional facets
of organization: of durations (meter, regular or irregular rhythms), with or without concurrence between pitch and duration factors,
and with repetitions of notes. There were a total of 47 patterns; the number of notes in each pattern ranged from 8 to 16; and the
ambitus did not exceed an 11th (17 half-tones). The patterns were recorded on a computer in the timbre of a piano; between the
patterns, the subjects were given 12 seconds to write down their responses.
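As a concrete illustration of these constraints, a generator along the following lines could produce such stimuli: 8 to 16 equal-duration notes, no repeated adjacent notes, and an ambitus within 17 semitones. The contour logic, step sizes, and pitch range here are our own assumptions, not the authors' actual procedure.

```python
# Sketch of a constrained atonal-pattern generator (illustrative only).
import random

def make_pattern(n_notes, contour, ambitus=17, seed=None):
    """Return a list of MIDI pitches following a rough contour:
    'convex' rises then falls; 'zigzag' alternates direction.
    All pitches stay within the given ambitus, with no repeated notes."""
    rng = random.Random(seed)
    low, high = 60, 60 + ambitus          # assumed register (middle C upward)
    pitches = [rng.randrange(low, high + 1)]
    for i in range(1, n_notes):
        if contour == "convex":
            going_up = i < n_notes // 2   # ascend in the first half, then descend
        else:                             # zigzag: alternate direction each note
            going_up = i % 2 == 1
        step = rng.choice([1, 2, 3, 4]) * (1 if going_up else -1)
        nxt = min(max(pitches[-1] + step, low), high)  # clamp to the ambitus
        if nxt == pitches[-1]:            # avoid repetition of notes
            nxt += 1 if nxt < high else -1
        pitches.append(nxt)
    return pitches

p = make_pattern(10, "convex", seed=1)
print(p, "ambitus:", max(p) - min(p))
```

Equal durations would then be imposed at playback (e.g., one note per fixed time slice), neutralizing the duration factor as described above.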
D.2 A Selection of Significant Findings
General
1. There was usually a high correlation between "pleasant" and "directional."
CONCLUSION
To sum up, we have tried to highlight the importance of natural schemata in shaping musical directionality. To do so, we have
outlined the background to our work, which views directionality as one of the three main variables in characterizing
the stylistic ideal.
REFERENCES
Agmon, E. (1989). A mathematical model of the diatonic system. Journal of Music Theory, 33, 1-25.
Andrews, M. W. & Dowling, W. J. (1991). The development of perception of interleaved melodies and control of auditory
attention. Music Perception, 8, 349-368.
Apel, R. (1993). A cognitive model for transformation processes in Western tonal music. Doctoral diss., University of Haifa.
Arom, S. (1991). African Polyphony and Polyrhythm. Cambridge, Cambridge University Press.
Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. Computer Music Journal, 4
(4), 66-84.
Bartlett, F. (1932). Remembering: a Study in Experimental and Social Psychology. Cambridge, Cambridge University Press.
Boulez, P. (1971). Boulez on Music Today. London, Faber and Faber.
Burns, E. M., & Ward, W. D. (1978). Categorical perception: phenomenon or epiphenomenon? Evidence from experiments
in the perception of melodic musical intervals. Journal of the Acoustical Society of America, 63, 456-468.
Carlsen, J. C. (Ed.). (1990). Music Expectancy (special issue). Psychomusicology, 9 (2).
Cohen, D. (1971). Palestrina counterpoint-a musical expression of unexcited speech. Journal of Music Theory, 15, 84-111.
Cohen, D. (1986). The performance practice of the Rig Veda: a musical expression of excited speech. Yuval, 6, 292-317.
Cohen, D. (1994). Directionality and complexity in music. Musikometrica, 6, 27-77.
Cohen, D., & Dubnov, S. (1997). Gestalt phenomena in musical texture. In M. Leman (Ed.), Music, Gestalt, and Computing.
Berlin, Springer, pp. 386-405.
Cohen, D., & Granot, R. (1995). Constant and variable influences on stages of musical activities: research based on
experiments using behavioral and electrophysiological indices. Journal of New Music Research, 24, 197-229.
Cohen, D., & Michelson, I. (1999). Directionality and the meaning of harmonic patterns. In I. Zamos (Ed.), Music and Sign:
Semiotic and Cognitive Studies in Music, Systematica Musicologica, vol. 2. Bratislava, Asco Art and Science, pp. 278-298.
Cohen, D., & Mondry, H. (1997). Learned and natural schemata in music. In Proceedings of the Third Triennial ESCOM
Conference, Uppsala, pp. 605-610.
Cohen, D., & Wagner, N. (2000). Concurrence and nonconcurrence between learned and natural schemata: the case of Johann
Sebastian Bach's saraband in C minor for cello solo. Journal of New Music Research (in press).
Dowling, W. J. (1978). Scale and contour: two components of a theory of memory for melodies. Psychological Review, 85,
341-354.
Edworthy, J. (1985). Melodic contour and musical structure. In P. Howell, J. Cross, and R. West (Eds.), Musical Structure
and Cognition. London, Academic Press.
Eitan, Z. (1997). High Points: a Study of Melodic Peak. Philadelphia, University of Pennsylvania Press.
Fónagy, I., & Magdics, K. (1972). Emotional patterns in intonation and music. In D. Bolinger (Ed.), Intonation: Selected
Readings. Harmondsworth, England, Penguin Education, pp. 286-312.
Goldstein, M. (1974). Sound texture. In J. Vinton (Ed.), Dictionary of 20th Century Music. London, Thames and Hudson, pp.
747-753.
Huron, D. (1996). On the kinematics of melodic contour: deceleration, declination and arch-trajectories in vocal phrases. In
Proceedings of the 4th ICMPC. Montreal.
Krumhansl, C. (1997). Effects of perceptual organization and musical form on melodic expectancies. In M. Leman (Ed.),
Music, Gestalt, and Computing. Berlin, Springer, pp. 294-320.
Lansky, P. (1974). Texture. In J. Vinton (Ed.), Dictionary of 20th Century Music. London, Thames and Hudson, pp. 741-747.
Levy, J. (1982). Texture as a sign in Classic and early Romantic music. JAMS, 35, 482-531.
Ligeti, G. (1993). States, events, transformation. Perspectives of New Music, 31, 164-171.
Lomax, A. (1977). Universals in song. The World of Music, 19, 117-129.
Lorince, F., Jr. (1966). A study of musical texture in relation to sonata-form as evidenced in selected keyboard sonatas.
Doctoral diss., University of Rochester.
Meyer, L. B. (1975). Explaining Music. Berkeley, CA, University of California Press.
Mondry, H. (1999). Schemata as a basis for perception and creation of music. M.A. thesis, Hebrew University of Jerusalem
(Hebrew).
Narmour, E. (1991). The top-down and bottom-up systems of musical implication: building on Meyer's theory of emotional
syntax. Music Perception, 9, 1-26.
Nettl, B. (1977). On the question of universals. The World of Music, 19, 2-7.
Rakowski, A. (1990). Intonation variants of musical intervals in isolation and in musical context. Psychology of Music, 18,
60-72.
Ratner, L. G. (1980). Texture, a rhetorical element in Beethoven's quartets. Israel Studies in Musicology, 2, 51-62.
Reese, G. (1959). Music in the Renaissance. New York, W. W. Norton.
Repp, B. H. (1998). Obligatory expectations of expressive timing induced by perception of musical structure. Psychological
Research, 61, 33-43.
Rothgeb, J. (1977). Design as a key to structure in tonal music. In M. Yeston (Ed.), Readings in Schenker Analysis and Other
Approaches. New Haven, CT, Yale University Press, pp. 72-93.
Sachs, C. (1946). The Commonwealth of Art. New York, W. W. Norton.
Shanon, B., & Atlan, H. (1990). Von Forster's theory: semantic application. New Ideas in Psychology, 9, 81-90.
Stevens, D. (1980). The Letters of Claudio Monteverdi. London, Faber and Faber.
Stockhausen, K. (1957). How time passes... Die Reihe, 3, 10-40 (English edition).
Sundberg, J., Friberg, A., & Fryden, L. (1991). Common secrets of musicians and listeners: an analysis by synthesis study of
musical performance. In P. Howell, R. West, and J. Cross (Eds.), Representing Musical Structure. London, Academic Press,
pp. 161-197.
Tversky, A. (1977). Features of similarity. Psychological Review, 84 (4), 327-352.
Varèse, E. (1967). The liberation of sound, excerpts from lectures of 1936-1962. In C. Chou (Ed.), Contemporary Composers
on Contemporary Music. New York.
Yeger-Granot, R. (1996). A study of musical expectancy by electrophysiological and behavioral measures. Doctoral diss.,
Hebrew University of Jerusalem.
Back to index
Proceedings abstract
HOW MUCH TIME DO WE NEED TO PROCESS HARMONIC STRUCTURES?
Bigand, E.*, d'Adamo, D., Madurell, F.**, Poulain, B., & Tillmann B.
* LEAD CNRS, Faculte des Sciences, Bld Gabriel, 21000 Dijon, FRANCE
** Music Department, Universite Paris IV La Sorbonne, France
*** Dartmouth College, USA
Background. The processing of a given musical event partly depends on the harmonic context in
which it occurs. Harmonically related events are usually processed faster than harmonically unrelated
ones.
Aims. The purpose of the present study was to investigate the time course of harmonic priming in
short and long contexts. In a short context (one-chord priming), the duration of the prime, the SOA, and
the ISI were manipulated in order to specify the speed at which abstract knowledge of Western
harmony may be activated in musicians and nonmusicians. In a longer context (nine-chord sequences),
the tempo of the sequence was manipulated in order to trace the time course at which harmonic
modulations are processed by musicians and nonmusicians.
Method. The harmonic priming paradigm was used. Participants were asked to quickly process a
target chord occurring after a short or a long musical context. The harmonic relationship between the target
and the previous context was manipulated. In single-chord priming, the harmonic relationship between the
target and the prime was varied around the circle of fifths. In long-chord-sequence priming, the target
chord was the tonic chord of a new key whose distance from the first key was manipulated around the
circle of fifths. It was assumed that distant key or chord relationships require more time to be
processed.
Results. Previous findings provide evidence that single-chord priming occurs for prime durations as
short as 50 ms. A 50 ms prime is no longer explicitly recognizable as music, but participants (even
nonmusicians) reacted differently to more or less harmonically related primes. This suggests that
knowledge of Western harmony may be activated very quickly. Experiments with long
(modulating) chord sequences are currently under way.
Conclusions. Previous findings provide evidence that simple harmonic relationships may be very
quickly processed, even by musically naive listeners.
Proceedings abstract
acohen@upei.ca
Background:
Aims:
This paper examines the validity of this view of music as metaphor for cinema.
The evaluation will be based on the applicability of contemporary knowledge
about music cognition to the experience of the art film.
Main contributions:
Implications:
Consistent with the intuitions of early French and American psychological film
theorists, this paper suggests the direct application of principles of music
cognition to cinema. The present approach sanctions the application of
music-theoretic ideas to film (see also Cook, 1998), but advocates, in
addition, the importance of considering cognitive constraints and proclivities
as determined by music cognition research.
Proceedings paper
In this study the researchers surveyed applied music students at five schools in the United States and one
conservatoire in Japan. The survey investigated student perceptions of their efficiency, motivation, concentration
and planning of practice time. Students were also asked to report total practice times per week, general
demographic information and other aspects of the student's musical background.
Method
The present study in the United States and Japan extended the work of Harald Jørgensen (1997), who created the
survey for students at the Norwegian Conservatoire of Music in Oslo, Norway. The present study broadened the
sample to include voice as well as instrumental students, both primary and secondary instruments, and both
music majors and non-music majors. Jørgensen translated the instrument into English for use in the United States
at two liberal arts colleges and three schools of music at major universities. Kiyoshi Miyamoto of Crown College
in Minnesota translated the survey into Japanese for use at a conservatoire in Japan.
Nine hundred seventy-seven applied music students participated in the study: two hundred eighty-nine
in Japan, four hundred thirty-nine at liberal arts colleges, and two hundred forty-nine at schools of
music in the United States. In addition to the survey, participating schools in the United States were asked to
have each of their applied music instructors identify one student who, in their opinion, was successful in the
practice studio and another who was less successful in their practice habits. Fifty-seven of those students were
interviewed. These interviews were recorded and then coded for study; that part of the analysis has not been
completed.
Results
Lammers and Kruger (1999) reported that there are substantial differences between instrument groups in minutes
per week of practice time (see Table 1 below). Practice times also differ between liberal arts colleges and schools
of music in the US. Students who report more planning also engage in more practice time. They also report
higher levels of efficiency, concentration, and motivation. It was also found that students who study two
instruments do not necessarily practice more than if they studied a single instrument.
Table 1
Mean Practice Time (Minutes per Week) by Nation and Instrument Group

Group                    Self-Reported Practice Time    SD      N
Keyboard
  Japan - Majors                   765.0               502.3   114
  Japan - Non-Majors               758.8               386.4    28
Wind/Percussion
  Japan - Majors                  1036.3               605.6    50
  Japan - Non-Majors              1230.0               212.1     2
Strings
  Japan - Majors                   695.6               360.8    18
  Japan - Non-Majors               665.0               643.5     2
Total practice time per week was divided into three categories: (1) those who practice five hours or fewer per
week, (2) those who practice between five and twenty hours per week, and (3) those who practice more than
twenty hours per week. Table 2 below illustrates differences between the schools.
Table 2
Self-Reported Practice Hours Per Week (HPW) by Music Majors
Country/Type School Less than 5 HPW 5 - 20 HPW More than 20 HPW
In our previous presentation we reported differences between when students plan their practice
time and the total amount of practice time they report. Students were asked when they planned practice time. The
five choices were: (1) before the practice day, (2) just before practice, (3) during practice, (4) just after practice,
and (5) between practice days. Students in general, regardless of country of origin, report that they plan 'just
before practice' and 'during practice'. However, students who spend more time practicing were more likely to
report planning 'before the practice day', 'just after practice', and 'between practice days'. These differences,
which can be seen in Figure 1, are found in each of the four primary instrument groups studied. They are
not noticeable among Japanese students (see Figure 2).
Systematic Planning
In this section we will suggest influences upon practice habits and differences between the two countries of Japan
and the United States. In particular, we will examine the impact of reporting that one is a systematic planner. At
one point in the survey participants were asked to rate themselves on a five-point scale in response to the
question, "Do you regard yourself as a person who uses planning of practice in a systematic way?" The mean
response of music majors at schools of music in the United States was 2.8, (SD 1.1, N = 87). Music majors at the
liberal arts colleges had a mean of 3.1, (SD 1.04, N = 290) while in Japan music majors had a mean of 2.6, (SD
0.89, N = 226). Some caution is advised in comparing means across nations because of potential differences in
the way Likert scales are used. Several experts on Japanese culture we've spoken to indicate that Japanese
students may simply be less likely to use the extreme ends of these scales. Differences within nations remain
quite interesting.
Connection to when Practice is Planned
When practice is planned was found to be associated with the extent to which subjects perceive that they are
systematic planners as well as to how much time students practice. Students in the United States who think they
are 'systematic' planners tend to report that they plan their practice day at the beginning of the day (r (255) = .46,
p < .01), between practice days (r (255) = .36, p < .01) and at the beginning of practice (r (255) = .28, p < .01).
These effects were weaker for Japanese students. The correlation between self-reports of systematic planning and
planning at the beginning of the day was found to be (r (222) = .35, p < .01). The effect was weaker but
significant for between practice days (r (222) = .20, p<.01) and even less at the beginning of practice time (r
(222) = .14, p <.01).
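The correlations reported here are Pearson product-moment coefficients, written r(df) with df = n - 2. A minimal sketch of the computation, using hypothetical ratings for illustration (the survey data are not reproduced here):

```python
# Pearson's r over paired Likert-style ratings, stdlib only.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings, NOT the survey data:
systematic = [1, 2, 2, 3, 4, 4, 5, 5]   # "systematic planner" self-rating
plans_early = [1, 1, 3, 2, 3, 5, 4, 5]  # "plan before the practice day" rating
r = pearson_r(systematic, plans_early)
print(round(r, 2), "df =", len(systematic) - 2)
```

The p < .01 values in the text would then come from testing r against zero with n - 2 degrees of freedom.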
Perceived Change in Systematic Planning
All students were asked to answer on the five-point scale to, "Compare your planning now to the year
immediately preceding your study at this college/university? (We are interested in your general development, not
the often occurring and short time deviations)." Second, third and fourth year students were also asked, "How is
your planning now compared to earlier study at this college/university?" This allows us to ask if those who now
report that they are systematic planners also report that they have improved their practice habits. These
relationships are stronger for the Japanese students than for those in the United States. When comparing
'systematic' planning to 'year before' planning, the statistics are as follows: US (r (250) = .34, p < .01), Japan (r
(209) = .41, p < .01). Similar results occur in statistics that relate 'systematic' planning now to planning in
earlier years at their institution for second-, third-, and fourth-year students: US (r (250) = .31, p < .01), Japan
(r (209) = .42, p < .01). This pattern is also revealed when relationships are examined between 'year before
entering this school' and, for second-, third-, and fourth-year students, 'previous experience at this school': US
(r (255) = .48, p < .01), Japan (r (222) = .78, p < .01).
In Japan, wind and percussion students practice more minutes per week than keyboard players, who practice more
than string players, who in turn practice more than voice students. In the United States, keyboard students practice
more than wind and percussion students, followed by string players and voice students.
A comparison was made between the Conservatoire of Music in Oslo and the schools in Japan and the United States.
Forty-two percent of the Oslo students practice twenty hours per week or more, while only two percent of the
students at liberal arts colleges in the United States practice that amount of total time. Those same liberal arts
colleges have forty-six percent of their students reporting less than five hours of total practice time per week,
whereas only five percent of the Oslo students practice a total of five hours or less per week.
The researchers also reported earlier that, for some students, when practice time is planned has a relationship to total
practice time. Students in the United States who plan before the practice day, just after practice, and
between practice days also report that they practice more. This did not apply to students in Japan.
Students were asked if they consider themselves to be 'systematic' planners for practice. Music majors at the
liberal arts colleges in the United States report more systematic planning than do the students at schools of music
in the US and they in turn more than Japanese students. Also, those students in the US who rated themselves
systematic planners of practice time, plan at the beginning of the day, between practice days and at the beginning
of practice. Japanese students agree, to a lesser degree, only to planning at the beginning of the day.
When instrument groups are examined, Japanese students relate systematic planning now to planning in the year
before entering their current school more strongly than students in the United States in all areas except strings,
where the reverse was reported. In those same groupings, second-, third-, and fourth-year Japanese students relate
systematic planning now to planning in previous years at their school to a greater degree than students in the US,
again in all areas except strings. They also relate planning before their school enrollment to planning in previous
years at their school more strongly than students in the United States for all instrument groups except string
performers.
The influence of persons known to students on practice habits was considered in relation to self-reports of how
students rated themselves as systematic planners of practice time. The strongest relationship between the two was
found in the vocal area, where 'yourself' was the major influence upon planning of practice time in both
countries. 'Yourself' is also the most important influence upon planning by Japanese keyboard students, who are
also influenced by persons outside the school. Voice students in the United States relate, to a
lesser degree, to 'other faculty/advisor(s)'. The instrument teacher and theory teachers are reported as influencing
planning of practice time by US string students. US wind and percussion students have planning influenced by
'other voice/instrument teachers' and 'other faculty/advisor(s)'.
Relationships were found between planning in the year before attending their school and possible influences
upon the planning of practice time, and, for second-, third-, and fourth-year students, between planning in
previous years and whom they think influenced practice behavior. The strongest relationship of 'year before' to
'influence' was for Japanese vocal students, who reported 'yourself' as the person with the most influence. When
relating influences to planning in previous years, second-, third-, and fourth-year Japanese vocal students again
report the strongest influence as 'yourself'. For upper-class voice students in the United States, the strongest
relationship was in the 'year before' category, where they gave credit to their teacher for improvement. They also
reported the teacher as the strongest influence when comparing themselves to the year before they entered their
current school.
Conclusion
This paper has reported a set of relationships among and between the countries of the United States and Japan as
well as between types of schools and instrument categories. Analysis of the large amount of data produced by
this study has not been completed, notably, a comparison analysis of the data reported here and that reported
elsewhere by Harald Jørgensen at the Norwegian Conservatoire in Oslo, Norway. That project is planned for the
end of the year 2000. The researchers are also preparing to examine the information gathered through the
interview process in the United States. This study has shown that planning of practice time stands out as a very
important aspect of successful practice. How that planning is learned and taught, as well as the many other aspects
of successful instrument practice, may be answered with further study of the data now captured in our database.
Proceedings paper
Background
There have been ceaseless efforts to find a model of listeners' melodic expectancy, especially over
the past ten years. Narmour's implication-realization model (1990, 1992) is a music-theoretical and
psychological approach to listeners' melodic expectancy and confirmation, having its roots in Meyer's
theory (1956) and in Gestalt principles such as proximity, similarity, and common fate. This model
hypothesizes that melodic expectancy is influenced by two perceptual systems: a flexible,
variable, empirically driven top-down system, and a rigid, automatic, unconscious,
preprogrammed bottom-up system. In contrast to the learned top-down process, the bottom-up process is
regarded as an innate and pan-stylistic system operating at the tone-to-tone level. This bottom-up
system is partially based on universal, Gestalt-like principles and can be summarized as the following
five principles: registral direction, intervallic difference, registral return, proximity, and closure. The
five principles are defined as follows.
(1) Registral direction: small implicative intervals (≤ 5 semitones) imply continuation in the same
direction, whereas large implicative intervals (≥ 7 semitones) imply change in a different direction.
(2) Intervallic difference: small implicative intervals imply a subsequent realized interval that is
similar in size, whereas large implicative intervals imply a subsequent realized interval that is
relatively smaller in size.
(3) Registral return: it occurs when an implicative interval moves to a third note that is identical or
near (within ± 2 semitones) to the first note of the implicative interval.
(4) Proximity: any implicative interval implies a subsequent note near to (≤ 5 semitones) the second
note of the implicative interval.
(5) Closure: it occurs when registral direction of the implicative and realized intervals is different, or
when a large implicative interval is followed by a smaller realized interval, or both.
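The five definitions above can be sketched as Boolean predicates over three-note fragments (MIDI pitch numbers). The numeric thresholds follow the definitions just given; the function names, the +/- 2 semitone "similarity" margin in intervallic difference, and the treatment of the tritone (6 semitones, which the definitions leave unspecified) are our own illustrative assumptions.

```python
# Sketch of the five bottom-up principles as predicates on a three-note
# fragment (MIDI pitches p1, p2, p3); illustrative, not Narmour's notation.

def _direction(interval):
    """Sign of a melodic interval: +1 up, -1 down, 0 lateral."""
    return (interval > 0) - (interval < 0)

def registral_direction(p1, p2, p3):
    """Small implicative intervals (<= 5 semitones) imply continuation in
    the same direction; large ones (>= 7) imply a change of direction."""
    imp, real = p2 - p1, p3 - p2
    if abs(imp) <= 5:
        return _direction(imp) == _direction(real)
    if abs(imp) >= 7:
        return _direction(imp) != _direction(real)
    return True  # the tritone is left unspecified by the definitions

def intervallic_difference(p1, p2, p3):
    """Small implicative intervals imply a similar-sized realized interval;
    large ones imply a relatively smaller one."""
    imp, real = abs(p2 - p1), abs(p3 - p2)
    if imp <= 5:
        return abs(real - imp) <= 2  # 'similar in size': assumed margin
    if imp >= 7:
        return real < imp
    return True

def registral_return(p1, p2, p3):
    """The third note is identical or near (within +/- 2 semitones)
    to the first note of the implicative interval."""
    return abs(p3 - p1) <= 2

def proximity(p1, p2, p3):
    """The realized note lies within 5 semitones of the second note."""
    return abs(p3 - p2) <= 5

def closure(p1, p2, p3):
    """Direction change, or a large implicative interval followed by a
    smaller realized interval, or both."""
    imp, real = p2 - p1, p3 - p2
    change = _direction(imp) != _direction(real)
    narrowing = abs(imp) >= 7 and abs(real) < abs(imp)
    return change or narrowing

# C4 -> E4 -> G4: a small upward interval realized by continued ascent.
print(registral_direction(60, 64, 67))  # True
print(closure(60, 64, 67))              # False: no change, no narrowing
```

A fragment such as C4 -> C5 -> G4 instead satisfies registral direction (large interval, direction change) and closure, showing how the principles diverge across fragments.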
Of the five bottom-up principles mentioned above, the two primary ones are registral direction and
intervallic difference. Narmour's five bottom-up principles, including these two,
have recently been supported by several experimental studies (Cuddy & Lunney, 1995; Krumhansl,
1991, 1995; Russo & Cuddy, 1996; Schellenberg, 1996, 1997; Thompson, Cuddy & Plaus, 1996,
1997; Thompson & Stainton, 1998).
Cuddy & Lunney (1995) and Thompson, Cuddy & Plaus (1996, 1997) evaluated five bottom-up
principles with simple two-tone melodic intervals in a melodic continuation or completion test.
Krumhansl (1991, 1995) expanded the applicability of bottom-up principles into melodic excerpts
ranging from three to eight measures in three different styles: British folk songs (tonal melodies),
Webern Lieder (atonal melodies), and Chinese songs (non-Western tonal melodies). She concluded
that the judgments of continuation tones were predicted quite well by Narmour's bottom-up principles
and this was true regardless of style. Schellenberg (1996) also strongly supported Krumhansl's results,
using the same musical materials and data.
However, on closer examination, we may find that the bottom-up principles are not equally applicable
to all musical styles. In Krumhansl's studies (1991, 1995), the predictive power of the bottom-up principles
was weaker in atonal music than in tonal or non-Western ton