
Cognition, 52 (1994) 23-54

0010-0277/94/$07.00 © 1994 - Elsevier Science B.V. All rights reserved.

What child is this? What interval was that?


Familiar tunes and music perception in
novice listeners
J. David Smith*,a, Deborah G. Kemler Nelsonb, Lisa A. Grohskopfb,
Terry Appletonc
aPsychology Department, Park Hall, State University of New York at Buffalo, Amherst, NY 14260,
USA
bPsychology Department, 500 College Avenue, Swarthmore College, Swarthmore, PA 19081, USA
cMassachusetts General Hospital, Neuropsychology Laboratory, VBK 8300, 55 Fruit Street, Boston,
MA 02114, USA

Received June 7, 1993, final version accepted November 13, 1993

Abstract

In the laboratory, musical novices often seem insensitive even to basic structural
elements of music (octaves, intervals, etc.), undermining long-held theories of
music perception, and threatening to leave current theories applicable only to
experts. Consequently it is important to demonstrate novices’ basic listening
competence where possible. Two experiments examined the perception of musical
intervals (minor thirds, major thirds and perfect fourths) by musical novices.
Subjects received either standard instructions or familiar folk-tune labels to aid
performance. The folk-tune labels greatly improved identification performance,
producing expert-caliber performance by some musically inexperienced subjects.
The effectiveness of the folk-tune manipulation was much more limited in a difficult
discrimination task. The results suggest that novices do have some basic competence
when assayed appropriately, and that familiar musical tokens may be a critical
element in such assays. Larger implications of the role of familiarity in novices’
competence are discussed, including those that relate to music cognition and
aesthetics.

*Corresponding author.
We thank Jenny Saffran for her assistance in the conduct of this research, and Larry Nelson who
guided the production of all the stimulus materials.

SSDI 0010-0277(93)00603-5

The musical novice: tin-eared and tonally deaf?

Musical novices perform poorly on many basic musical tasks. For example,
Allen (1967) had subjects rate the similarity of a central pitch to surrounding
tones. Sensitive listeners should show octave generalization - that is, similarity
ratings should show a generalization “scallop” with local maxima at octave
intervals to either side of the central pitch. Experts show this pattern (as in Fig.
1A). Novices do not (as in Fig. 1B); instead, psychological similarity decreases

[Figure 1. Panels: A. Experts; B. Novices; C. Experts; D. Novices. Axis label recovered for panels A and B: log frequency.]
Figure 1. Judgements by experts (A) and novices (B) of the similarity between a central pitch and
others. Only experts show octave generalization; for novices mere pitch difference controls
similarity. Judgements by experts (C) and novices (D) of the closure provided when tones
complete a melodic fragment. Only experts reveal diatonic sensitivity; novices’ judgements
reflect mere pitch distance.

monotonically with pitch distance (see also Thurlow & Erchul, 1977). Burns and
Ward (1982, p. 263) concluded that “only musically trained observers give results
indicative of octave similarity”.
In another example, Krumhansl and Shepard (1979) asked listeners to judge
how well different tones completed a musical scale. Sensitive listeners should
judge finality in accordance with the diatonic hierarchy of Western music. Trained
listeners show this pattern (Fig. 1C). Novices do not (Fig. 1D) - again their
judgements obey pitch-distance relations, not tonal relations (see also Shepard,
1982).
This paper focuses on another basic musical ability, that of recognizing or
identifying musical intervals. Predictably, experts show sharply tuned identifica-
tion functions with clearly defined category boundaries (Burns & Ward, 1978,
1982; Siegel & Siegel, 1977). In contrast, novices show ragged and chaotic
identification functions (Siegel & Siegel, 1977). Moreover, their interval categori-
zation is strongly distorted by a contextual shift in the range of interval sizes. This
suggests the absence of any overlearned or privileged mnemonic anchors to hold
the response categories in place, and signals that subjects are distributing an
arbitrary set of responses over whatever stimulus range is employed in the
experiment (see also Burns & Ward, 1982).
In two cases novices have had some success with interval identification under
rarefied conditions. Zatorre and Halpern (1979) had subjects identify two chordal
intervals (minor and major thirds), when all trials had the same base pitch. Locke
and Kellar (1973) presented chordal triads in which the middle pitch varied from
C (524) to C# (554) in the context of fixed tones at A (440) and E (659).
However, generalizations from these two reports about interval perception are
limited, because, in both cases, trials were fixed at a particular pitch level. Hence,
as suggested by Zatorre and Halpern, accurate performance could have been
grounded in acoustic, pitch-matching strategies rather than in interval-identifica-
tion processes per se. In any event, there is a wide gap between the performance
of experts and normal listeners in several tasks which supposedly tap the basic
elements of music perception.

Novices’ failure and music’s psychophysical and cognitive imperatives

However interesting this performance gap might potentially be, it has immedi-
ate and important implications for our old theories and new science of music
cognition - if it is real. Consider that in many musical systems worldwide the
octave has a privileged position which underlies the cyclicity of musical scales.
The special status of the octave has been ascribed to an invariant of natural sound
which human (perhaps mammalian) perceptual systems have learned or evolved
to use (Deutsch, 1977, 1982; Roederer, 1973; Terhardt, 1978). From this
psychophysical perspective, octave similarity should be perceived by experts and
novices alike. For as long as novices defeat this prediction, they infirm such
classical interpretations.
Alternatively, consider the principal focus here - interval identification. Pre-
sumably, the intervals and chromas of music are constituents of a shared category
system among composers, performers and listeners (Frances, 1958, pp. 34-35) -
one that is useful for providing the psychological mileposts for mapping melodic
or harmonic motion (Helmholtz, 1877, pp. 250 ff.; see also Dowling & Harwood,
1986). The use of interval categories has sometimes been viewed as inevitable or
at least adaptive in order for human listeners with limited capacity to make sense
of the informational richness of music (Burns & Ward, 1982; Dowling &
Harwood, 1986). By analogy to the categorical perception of speech sounds (e.g.,
Studdert-Kennedy, Liberman, Harris, & Cooper, 1970), interval and chroma
categorization during on-line music listening should render manageable the task
of apprehending the complex auditory information fast unfolding over time. And
trained listeners do show categorical perception of musical stimuli (Burns &
Ward, 1974, 1978; Locke & Kellar, 1973). In fact, this categorical imperative has
been used to explain why musical systems worldwide use only about 5-7 discrete
pitch categories in their scales (Burns & Ward, 1982). Such a limit may be
responsive to a restriction on humans’ capacity to categorize continuously varying
stimuli along a single dimension (Estes, 1972; Miller, 1956).
Thus it has been said that musical interval categories are overlearned,
entrenched within cognition, and may “constitute one of the most durable
families of perceptual-motor schemata that have been observed in psychology,
ranking perhaps only after the schemata of natural language in their stability and
resistance to change in adult life” (Dowling, 1978, p. 345). But if musical interval
categories are to play the role that has been ascribed to them in music listening,
everybody should use them - birthday-party goers, Christmas carolers, even
shower singers.
The problem is that only experts have ever shown such categorization
processes. And thus music’s categorical imperative, presumed to be a fundamen-
tal cognitive accommodation a listener should make to music, and perhaps even a
fundamental accommodation which musical scales have made to human cognition,
risks being demoted to some artifact or convenience of expertise. The bottom line
is that uncovering basic listening competencies in novice listeners will be critical to
sustaining a variety of classic interpretations of the structure and evolution of
music.
Fundamental novice/expert differences must also be addressed from the
perspective of contemporary research in music perception. Suppose, as the data
suggest, that novices really do listen to music as contourish tinkling which is
octaveless, chromaless, intervallically vague, and so forth. In fact, in a provocative
summary of some of the literature, Krumhansl and Shepard (1979, p. 529;
see also Dowling & Harwood, 1986, pp. 112-113; Shepard, 1982, p. 370) verged
on just this conclusion, that novices essentially listen to music psychophysically,
and not musically. Then one would conclude that the helical model of musical
pitch, the multi-dimensional metric spaces, the abstract octave organizing pitch
classes, and the structural representations of tonal hierarchies only describe the
cognitive systems of the musical elite. If so, conclusions from much contemporary
research on music perception would seem stratified, or worse, rarefied.
To ensure that music cognition research today is more than just a 2% solution,
and provides more than just an expert-centered account of musical phenomena, it
will be valuable to search further and harder for conditions that reveal basic
musical competencies in the vast majority of music listeners.

Inadequate orientation and inappropriate tasks: are these the problem?

One might begin by hypothesizing that the prior history of novices’ failure in
many music perception paradigms is due to impoverished instructions or artificial
tasks that have not adequately prepared novices to display their underlying
competence. Perhaps novices receive insufficient orientation to the underlying
octave, scale and interval concepts before testing begins. Or perhaps they are not
sufficiently prompted to think musically in such tasks.
However, a variety of arguments suggests that interpretations of this sort are
inadequate to explain the breadth and persistence of novice failure. For one
thing, octave generalization on many accounts should need no embellished
instruction or special musical set. If the auditory system has learned or evolved to
process the octave invariant that resides in acoustic signals, then the perceptual
similarity of octaves should be easy to elicit, and not require extensive instruc-
tions. Similarly, if musical intervals are presumed to be massively experienced,
overlearned and deeply engrained, like the perceptual categories underlying
speech, it is difficult to understand why access to such categories should be so
hard to reveal in the laboratory.
The “musical set” argument also gives novices too little credit as motivated
subjects. Subjects experience frustration in these experimental paradigms, fre-
quently sensing how deficient their performances are. If novices could find a
simple, workable strategy, such as adopting a musical set, or imagining instru-
ments playing the tones, or singing to themselves, it seems likely they (often
highly motivated college students) would do so and improve their lot. Indeed,
even when tasks are deliberately modified to help the novice, they still fail at tasks
like octave recognition (Thurlow & Erchul, 1977).
Moreover, novices’ qualitatively different performance remains in force even
when the musical stimuli are whole musical scales (Krumhansl & Shepard, 1979),
tonal melodic fragments (Dowling, 1978), fully composed harmonic progressions
(Smith & Melara, 1990), or even the works of real composers (Smith, 1987).
These findings strain the validity of a claim that novices produce their failures
only because their true musical competence has not been engaged by the task
contexts that investigators have provided.
And the strain increases given a century of scholarship in musical aesthetics
which focuses on novices’ qualitative weakness in listening choices, preferences,
styles and abilities (Bell, 1958; Hanslick, 1957; Lee, 1918; Ortmann, 1927; Smith
& Witt, 1989). All of this scholarship reflects real music contexts, actual composer
preferences, first performance riots, and the like, even as it foreshadows the
existing novice data. Thus, there is much more to be explained away than just
sterile laboratory findings using amusical methodologies. The novices’ problem
appears to transcend inappropriate tasks and sets. However musically novices are
“trying to think”, they would seem to need some additional aid to gain access to
their musical competence, if it is there at all.
In fact, all of the novice failures in the laboratory, and many of their failures in
the concert hall, have come from musical stimuli and compositions that are novel
and unfamiliar. These musical tokens do not make contact with novices’ long-
term memory of, and familiarity with, well-known music and melodies. A
principal hypothesis behind the current experiments was that musical competence
would emerge most strongly for musical material in long-term memory, and
especially for familiar musical tokens. This hypothesis accords with the ideas and
findings of Attneave and Olson (1971) and with other research showing that
novices recognize familiar tunes easily (Bartlett & Dowling, 1980; Deutsch,
1969).
The familiarity hypothesis opens new possibilities for revealing basic musical
skills like interval identification, the focus here. In the abstract, novices identify
intervals poorly (Burns & Ward, 1982). But if one reformulated the task,
providing well-known folk-tune labels to guide the assimilation of the intervals
into intervallic categories, one might then show novices’ competence in interval
identification convincingly for the first time.
In the current experiments, novices were tested under two different sets of
instructions for their ability to categorize in accord with three standard musical
intervals of Western music (minor thirds, major thirds and perfect fourths). Both
instruction sets involved rather extensive instructions and orientation to the task
at hand. But, to test the hypothesis that access to familiar musical tokens is a
critical contributor to novice competence, participants in one of the conditions
were provided familiar folk-tune labels to guide their interval-identification
performance. For this Folk-tune condition, we took advantage of the fact that
three highly familiar melodies - What child is this? (or Greensleeves), Kumbahyah
and Here comes the bride - begin, respectively, with the intervals of a minor third,
a major third and a perfect fourth.

EXPERIMENT 1

Method

Subjects

Twenty-eight adults (19-57 years old) served in the experiment. Subjects were
screened to ensure their inexperience, using a music experience questionnaire.
Before ninth grade, subjects had received an average of 0.16 years of music
theory or harmony training (s.d. = .38), 0.11 years of music appreciation (s.d. =
.21) and 1.0 year of musical performance training (s.d. = 2.16). After ninth grade,
subjects had received an average of 0.02 years of music theory or harmony
training (s.d. = .09), 0.18 years of music appreciation (s.d. = .33), and 0.38 years
of musical performance training (s.d. = .61). All subjects received $10 for their
participation.

Experimental design

Subjects were assigned randomly to either the Standard condition, in which the
intervals were to be identified using three different numerical labels (after Zatorre
& Halpern, 1979), or to the Folk-tune condition, in which intervals were to be
identified as the first two pitches of one of three familiar melodies.¹

Stimulus materials

Each trial consisted of a single musical interval presented in an ascending,
melodic manner (i.e., each pitch was played separately with the lower one
sounding first). An interval consisted of two 375 ms electronically synthesized
sinusoidal tones separated by 250 ms of silence, and it was followed by at least a
6 s period of silence during which the subject made an identification. The tuning
of all pitches was accomplished with PX81Z Editor Librarian (version 5.3;
Opcode, Inc.), run on a Macintosh II computer (Apple, Inc.) interfaced with a
Yamaha PX81Z electronic synthesizer. The programming of the pitches from
stored tuning banks into trials and blocks was accomplished with Performer

¹Throughout this paper we will refer to interval perception and identification. However, we intend
this usage to be consistent with either of two views of music perception, namely that subjects encode
pitch or chroma classes and derive intervallic information from these (cf. Dowling, 1982), or that
subjects encode intervallic distances directly.

(version 2.31; Mark of the Unicorn, Inc.). Once programmed, the final products
were taped using a Teac Z-5000 cassette deck.
Intervals were built on any of seven possible first tones. These “base pitches”
were chosen (as in Burns & Ward, 1978), to be 262 Hz ± 0, 60, 120 and 180 cents.
It is this feature of the task which defeats any pitch-matching strategies, which
forces the use of more abstract intervallic categories, which undermined the
identification performance of subjects in Zatorre and Halpern (1979, Experiment
2), and which makes this identification task among the most difficult administered
previously to novices or even expert listeners.
Intervals were of 17 different sizes, ranging from 300 cents (a minor third)
through 400 cents (a major third) to 500 cents (a perfect fourth) in 12.5-cent steps.
There were 119 possible different trials (7 base pitches × 17 interval sizes). These
were permuted in four different random orders for use as one practice block and
three test blocks.
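
The stimulus set described above can be sketched in a few lines of Python. This is an illustration only, not the authors' software (the stimuli were produced with the synthesizer and sequencing tools named earlier). The function and variable names are hypothetical, and the sketch assumes the standard definition of the cent (1200 cents per octave) and base-pitch offsets of 0, ±60, ±120 and ±180 cents around 262 Hz.

```python
# Illustrative sketch; all names are hypothetical, not taken from the study.
BASE_HZ = 262.0
# Assumed reading of "262 Hz +/- 0, 60, 120 and 180 cents": seven base pitches.
BASE_OFFSETS_CENTS = [-180, -120, -60, 0, 60, 120, 180]
# 17 interval sizes from 300 to 500 cents in 12.5-cent steps.
INTERVAL_SIZES_CENTS = [300 + 12.5 * k for k in range(17)]


def shift_by_cents(freq_hz, cents):
    """Return freq_hz raised by `cents` (1200 cents = one octave)."""
    return freq_hz * 2 ** (cents / 1200)


def build_trials():
    """List every (base offset, interval size, low Hz, high Hz) combination."""
    trials = []
    for offset in BASE_OFFSETS_CENTS:
        low = shift_by_cents(BASE_HZ, offset)
        for size in INTERVAL_SIZES_CENTS:
            high = shift_by_cents(low, size)
            trials.append((offset, size, low, high))
    return trials


trials = build_trials()
print(len(trials))  # 7 base pitches x 17 interval sizes
```

Because the base pitch roves over a 360-cent range, the absolute frequency of the upper tone does not identify the interval category; only the ratio between the two tones does.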

Procedure

Subjects were run individually, using a sound system that contained a Sony
TC-W7ES stereo cassette deck, a Technics SA-310 FM/AM stereo receiver, a
Teac MB-20 mixer and Realistic Nova-Pro headphones. At the beginning of each
session, there was a period of instruction during which the experimenter oriented
(or re-oriented) subjects to the stimuli and task at hand. Instruction was aided by
a small electronic keyboard (Casio CZ-1000) to illustrate certain concepts.
Each subject participated in two sessions, each about an hour long, both held
within the same week. The first session included the orientation and instructions
for the task, the practice block and one test block. The second session began with
a brief review of the instructions and a re-run of the first 42 trials of the practice
block, and ended with the remaining two test blocks. The order of the three
stimulus test blocks was randomized across subjects.

Standard condition

Orientation and instructions for subjects in the Standard condition were
intended to provoke a musical set, by leading subjects to focus on piano notes and
intervals as the experiment began. In addition, subjects were given an intensive
introduction to musical concepts relevant to the task.
Subjects were first introduced to the concept of the musical interval as the
distance between two notes played in succession. It was demonstrated that
intervals can be large or small by the experimenter’s playing a minor second and
an octave on the keyboard and showing the subject that the melodic distance
correlates with the number of keys in between those used to play the interval. It
was emphasized to subjects that intervals are defined solely by the difference in
pitch between their first and second notes, and that where they are played on the
keyboard does not matter. To illustrate this, the experimenter played five
different octaves on Cs and Gs throughout the keyboard and explained that they
“are really all the same type of interval”, because they all span the same distance
on the keyboard.
The subject was then acquainted with the three basic intervals to be used in the
experiment, and with the labels to be applied to each in the identification task.
Minor thirds were labeled as “size 1” intervals, major thirds as “size 2” intervals,
and perfect fourths as “size 3” intervals. With the aid of the keyboard, the
experimenter familiarized subjects with the sounds of the three intervals by first
(1) playing all three intervals in succession on middle C to demonstrate the
relative distances spanned by each, then (2) playing five instances of each type on
different parts of the keyboard (emphasizing again that distance is the sole
defining characteristic), and finally, (3) playing a random assortment of nine
instances at Cs and Gs at various locations on the keyboard, and telling the
subject what kind of interval it was after each example. During these three steps,
the subject was not permitted to view the keyboard and was asked to listen to
each interval closely and concentrate on the difference in pitch between the two
notes.
Since most stimuli to come would not be exact minor thirds, major thirds, or
perfect fourths, it was necessary to acquaint the subject with the concept of
“out-of-tuneness”. This was accomplished by playing a tape of examples of
out-of-tune intervals and their in-tune counterparts. All mistuned intervals on the
tape were off by 25 cents sharp or flat. The tape included a minor third and its
sharp counterpart, a major third and its flat counterpart, and a perfect fourth and
its flat counterpart. Subjects were assured that the difference between the well-
and mistuned intervals was small and that they should not be discouraged if it was
difficult to hear.
Finally, the subjects were given response sheets and instruction in how to
perform the identification task. They were instructed to listen to each interval and
decide whether it sounded most closely like a size 1 interval (minor third), size 2
interval (major third) or size 3 interval (perfect fourth). The response sheets were
arranged so that the subjects could circle their responses on each trial.
The second session reprised this orientation to reacquaint the subjects with the
sounds of the intervals. Subjects heard five examples of each interval played on
the keyboard and concentrated on the distance between the notes. Then the
identification task instructions were repeated.

Folk-tune condition

The instructions given for the Folk-tune condition followed the same scheme as
those for the Standard condition, except that subjects were encouraged to
associate the isolated tone pairs of each interval with the first two notes of a
familiar melody. The melodies used were Greensleeves, also known as What child is this?
(which begins with a minor third), Kumbahyah (which begins with a major third)
and Here comes the bride (which begins with a perfect fourth).
To ensure subjects’ familiarity with the melodies, subjects were asked to sing
or hum each one. If the title did not evoke the correct tune, the melody was
played on the keyboard and subjects were asked to hum or sing it back. When
asked, all the subjects judged the three tunes to be familiar.
As in the Standard condition, the subject was first introduced to the concept of
the interval and shown that intervals are defined only in terms of the distance
between two notes. In addition, it was pointed out that melodies are made up of
sequences of tones with sequences of intervals in between. Using the melody
Greensleeves (What child is this?), the experimenter showed that a melody can be
played starting at any note on the keyboard, and that it will sound like
Greensleeves (What child is this?) as long as all of the intervallic distances are the
same.
Next subjects were acquainted with the three types of intervals and with the
labels to be applied to each. Here, however, instead of the numerical labels used
in the other condition, each interval was associated with one of the melodies.
Minor thirds were referred to as intervals that sound like the beginning of
Greensleeves, major thirds as intervals that sound like the beginning of Kum-
bahyah, and perfect fourths as intervals that sound like the beginning of Here
comes the bride. In order to familiarize subjects with the sounds of the intervals,
the experimenter played each type on five different Cs and Gs throughout the
keyboard, followed by a presentation of the appropriate melody and then the
interval alone again. Then each interval was presented alone at these same five
points on the keyboard, and subjects were asked to “imagine hearing the rest of
the melody following” as they listened to each. Finally, the experimenter played
on the keyboard an assortment of nine intervals and after each informed subjects
of the appropriate melody label. Again, subjects were asked to think of the
melodies as they listened.
The concept of out-of-tune-ness was then introduced. This was done in the
same manner as in the Standard condition, except that the intervals were referred
to by their melody labels. The subjects were then given response sheets and
instruction in how to perform the identification task. The response sheets were
printed with the options of Greensleeves, Kumbahyah and Here comes the bride.
Subjects were told to listen to each interval, to decide if each sounded most like
Greensleeves, Kumbahyah or Here comes the bride, and to circle the appropriate
melody on the answer sheet.
The second session began with an abridged version of this orientation, to
reacquaint the subjects with the sounds of the intervals. Subjects were asked to
concentrate as five examples of each interval were played on the keyboard. Then
the identification instructions were repeated and testing began.

Results and discussion

Figs. 2A and 2B show the average identification functions for the Folk-tune
and Standard groups, respectively. Folk-tune subjects appeared to identify
intervals substantially better - only they seemed to have differentiated appro-
priately among all three interval categories.
To systematically compare the performance of the two groups, the data were
remapped by pooling the occurrence of the three types of response at comparable
stimulus distances (in 12.5-cent steps) from each of the category centers (300, 400
and 500 cents). For example, to calculate the percentage of responses that named
the category when the stimulus was 0 steps from the category center, label 1 (or
Greensleeves) responses to 300 cents, label 2 (or Kumbahyah) responses to 400
cents, and label 3 (or Here comes the bride) responses to 500 cents were pooled
together. To calculate the percentage of category responses when the stimulus
was 1 step from the category center, a label 1 (or Greensleeves) response to 312.5
cents, a label 2 (or Kumbahyah) response to 387.5 or 412.5 cents, and a label 3
(or Here comes the bride) response to 487.5 cents were pooled together, and so
on. The composite identification function that results can then be judged for
overall steepness - the degree to which identification responses fall off as a
function of stimulus distance from the category center.
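
The pooling just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the names are hypothetical, responses are coded 1, 2 and 3 (for the Folk-tune group, Greensleeves, Kumbahyah and Here comes the bride), and a stimulus lying within 8 steps of two category centers is assumed to be counted once for each of them.

```python
from collections import defaultdict

# Response label -> category center in cents (labels as coded above).
CENTERS = {1: 300.0, 2: 400.0, 3: 500.0}
STEP_CENTS = 12.5
MAX_STEPS = 8


def composite_function(trials):
    """Pool responses by distance from each category center.

    trials: iterable of (interval_size_cents, response_label) pairs.
    Returns {steps: percentage of responses naming that category}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for size, response in trials:
        for label, center in CENTERS.items():
            steps = abs(size - center) / STEP_CENTS
            # Skip stimuli outside the 0-8 step window or off the 12.5-cent grid.
            if steps > MAX_STEPS or steps != int(steps):
                continue
            steps = int(steps)
            totals[steps] += 1
            if response == label:
                hits[steps] += 1
    return {s: 100.0 * hits[s] / totals[s] for s in sorted(totals)}


# Toy data for illustration only, not data from the experiment.
example = [(300.0, 1), (400.0, 2), (500.0, 3), (312.5, 1), (387.5, 1)]
result = composite_function(example)
```

A listener with sharply tuned categories would yield percentages near 100 at step 0 that fall steeply toward 0 by step 8; a listener responding at random would yield a flat function.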
Fig. 2C shows the mean percentage of responses that occurred as stimuli were
0-8 steps from category center, for the Standard and Folk-tune conditions,
respectively. These data were entered into a two-way ANOVA with Condition
and Steps as between- and within-subjects factors, respectively. There was a
significant effect of Steps, F(8,208) = 22.22, p < .01, MSe = 0.44, and a significant
interaction between Condition and Steps, F(8,208) = 3.02, p < .01, MSe =
.044. Comparisons at each step level revealed that Folk-tune subjects had reliably
higher response rates at steps 0, 1, 2 and 3, but reliably lower response rates at
steps 6, 7 and 8, and thus more sharply tuned identification functions overall
(these and all other post hoc tests in this paper relied on the LSmeans procedure
from the SAS package). Regression analyses showed that Steps accounted for
30% of the variance in Folk-tune response frequencies, but only 11% of the
variance in Standard response frequencies.

[Figure 2. Panels: A. Folk-tune condition; B. Standard condition (horizontal axis: cents, 300-500; curves: m3, M3, P4); C. Comparison of conditions (horizontal axis: steps from category center, 0-8).]

Figure 2. Three-category identification performance in a roving-level, melodic identification task by
(A) the 14 Folk-tune subjects and (B) the 14 Standard subjects in Experiment 1. (C) The
composite identification functions of both groups, remapped as described in the text.

Finally, individual regressions were performed on each subject’s data, from
steps 0 to 8, to determine the steepness of each subject’s composite identification
function. Among the 28 novices, four of the five steepest slopes were found
among Folk-tune subjects, as were eight of the steepest ten. The six remaining
Folk-tune subjects and the remaining 12 Standard subjects all identified equiva-
lently poorly.
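
A per-subject steepness measure of this kind can be sketched with an ordinary least-squares fit of response percentage on steps. The sketch below is an assumption-laden illustration, not the authors' SAS analysis: the names are hypothetical, and reporting steepness as the negated slope (so that steeper fall-off gives a larger positive number) is a convention chosen here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared is the proportion of variance in ys explained by xs.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Toy composite function for one hypothetical subject (not real data):
# percentage of category responses at steps 0..8 from the category center.
steps = list(range(9))
percent = [90 - 10 * s for s in steps]
slope, intercept, r_squared = fit_line(steps, percent)
steepness = -slope  # assumed sign convention: larger means steeper fall-off
```

Subjects can then be ranked by steepness, with r_squared indicating how well a straight line describes each composite function.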
Individual differences within the Folk-tune condition are themselves notable.
Fig. 3A plots r² against the slope of the composite identification curve for each of
the individual regressions performed on the 14 Folk-tune subjects. Eight subjects
produced relatively steep and well-fitting composite identification curves; six did

[Figure 3. Panels: A (horizontal axis: regression slope); B (horizontal axis: cents, 300-500; curves: m3, M3, P4).]

Figure 3. (A) Scatter plot showing the steepness and the r² (variance explained) for each of the
composite identification functions of the 14 Folk-tune subjects in Experiment 1. (B)
Identification performance by 8 Folk-tune subjects strongly affected by the Folk-tune
strategy.

not. Thus, as shown in Fig. 3B, the Folk-tune manipulation produced impressive
identification performance in 8 novices, especially given that it emerged in a
demanding form of identification task - one that used a roving base tone and that
required the listener to use reliably three different interval categories. It appears
likely that it was the familiarity of the musical tokens, of the folk tunes
themselves, which conferred the benefit that is evident here. In particular, the
suggestion is that the folk-tune instruction allowed a majority of novices access to
familiar-tune schemas in long-term memory, which, unlike many novel musical
tokens in short-term memory, do make chroma and interval information available
to the perceiver.
It is difficult to know why a significant minority of the subjects in the Folk-tune
condition were unaffected by the experimental manipulation. It could be that
these subjects tried to use the instructed strategy, but that access to analyzed
representations of familiar melodies in long-term memory was difficult for them to
achieve under the conditions of the experiment. Another possibility is that these
subjects were simply less motivated than the others to execute the recommended
strategy - that is, that they were less likely to put out the effort continuously
required to hear the intervals as the beginnings of familiar melodies. In
Experiment 2, identification performance in novices was tested again in an
attempt to replicate the group results, but, in addition, the subjects were selected
from a relatively homogeneous college community reputed to attract highly
motivated students.
Subjects in Experiment 2 also received a 50-cent, roving-level, discrimination
task following the same identification task used in Experiment 1. The discrimina-
tion task was a first test of the possibility that a manipulation like providing
folk-tune labels can push novices still farther, and reveal the role of interval-
categorization processes in interval-discrimination performance.

EXPERIMENT 2

Method

Subjects

Fourteen young adults (18-24 years old), strictly screened as musical novices,
served in the experiment. The subjects had had no training in music theory,
harmony, sight-singing or identifying musical intervals. They had experienced no
more than one semester-long course in music appreciation and history, or one
six-month informal encounter with these topics. They had had no more than 1.5
years of lessons on musical instruments or voice, and any such instruction took
place exclusively before grade nine. This strict screening, beyond that in other
published studies, was intended to provide subjects who were unquestionably
novices. An expert musician also participated in the study. The expert was 29
years old, had studied piano formally for 12 years and had studied music theory at
the college level for four semesters. Subjects were paid $15 for their participation.

Experimental design

All novice subjects were assigned randomly to either Standard or Folk-tune
conditions. The expert performed under the Standard condition. All subjects
completed the identification task and then the discrimination task.

Stimulus materials

The stimuli for the identification task were exactly the same as those used in
Experiment 1.
The discrimination stimuli were tuned and recorded using software and
equipment identical to those already described. The intervals for discrimination
trials were built on the same seven base pitches used in identification intervals,
and included 11 different intervals spanning 275 cents (a flat minor third) to 525
cents (a sharp perfect fourth) in 25-cent steps. Given these intervals, discrimina-
tion trials involving 50-cent differences in interval size could be built at 9
mid-interval sizes, from 300 cents (275 vs. 325 cents) up to 500 cents (475 vs. 525
cents) in 25-cent steps.
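The stimulus arithmetic just described can be checked with a short sketch. The variable names are illustrative only and do not come from the original stimulus software.

```python
# Interval sizes (in cents) for the discrimination stimuli:
# 275 to 525 cents in 25-cent steps.
interval_sizes = list(range(275, 526, 25))

# Each Different pair joins two intervals that are 50 cents apart.
# The pair is indexed by its mid-interval size.
different_pairs = {
    (small + large) // 2: (small, large)
    for small in interval_sizes
    for large in interval_sizes
    if large - small == 50
}

print(len(interval_sizes))      # number of interval sizes
print(sorted(different_pairs))  # mid-interval sizes available for 50-cent pairs
```

Running the sketch confirms that 11 interval sizes yield 9 mid-interval sizes for 50-cent comparisons.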
For presentation to subjects, five blocks of 81 trials were prepared: a practice
block and four test blocks. In each block, 63 of the trials were Different trials.
The larger interval accompanying each smaller interval was always 50 cents larger,
with a base pitch chosen randomly except that it could not share the base pitch of
the smaller interval. Seven instances of such 50-cent Different trials were
constructed at each of the nine mid-interval sizes for each block of trials. Whether
the smaller or larger interval occurred first on each trial was randomly de-
termined.
The remaining 18 trials were Same trials, in which the two intervals were the
same size (but different in base pitch). For the 18 Same trials for each block, each
of the 9 intervals from 300 to 500 cents appeared twice. The base pitches for each
interval were chosen randomly without replacement. The 18 Same trials were
intermixed among the Different trials, in two randomly determined positions in
every group of seven Different trials.
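One way to assemble a block with this structure is sketched below. It is only an illustration under stated assumptions: the base pitches are placeholder indices, and the sketch assumes that the seven Different instances at each mid-interval size use each base pitch once for the smaller interval, which the text does not specify.

```python
import random

BASE_PITCHES = list(range(7))          # placeholder indices for the seven base pitches
MID_SIZES = list(range(300, 501, 25))  # nine mid-interval sizes, in cents


def build_block(rng):
    """Return one 81-trial block as a list of trial dicts."""
    different = []
    for mid in MID_SIZES:
        # Assumption: one Different instance per base pitch at each size.
        for base_small in BASE_PITCHES:
            # The larger interval may not share the smaller interval's base pitch.
            base_large = rng.choice([b for b in BASE_PITCHES if b != base_small])
            pair = [(mid - 25, base_small), (mid + 25, base_large)]
            rng.shuffle(pair)          # random order of smaller and larger interval
            different.append({"type": "different", "first": pair[0], "second": pair[1]})
    rng.shuffle(different)

    same = []
    for size in MID_SIZES:
        for _ in range(2):             # each size appears twice as a Same trial
            b1, b2 = rng.sample(BASE_PITCHES, 2)  # two distinct base pitches
            same.append({"type": "same", "first": (size, b1), "second": (size, b2)})
    rng.shuffle(same)

    block = []
    for g in range(9):
        group = different[7 * g: 7 * g + 7]
        # Two Same trials at random positions within each group of seven.
        for trial in same[2 * g: 2 * g + 2]:
            group.insert(rng.randrange(len(group) + 1), trial)
        block.extend(group)
    return block


block = build_block(random.Random(0))
print(len(block))
```

The block length follows from the design: 9 sizes x 7 Different instances plus 9 sizes x 2 Same instances gives 63 + 18 = 81 trials.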
Each discrimination trial included the presentation of one interval, a 1 s silent
period, the presentation of the second interval, and at least a 6 s silent response
period. Each subject within a condition received a different order of the four test
blocks.

Procedure

Subjects were run individually, using the same sound system employed in
Experiment 1. The experimental time was divided into four sessions of approxi-
mately 1 h each. These sessions were scheduled at varying intervals, all completed
within one week. The first two sessions were devoted to the identification task,
the last two to the discrimination task. In session one, subjects received the
extensive orientation already described, and two blocks of identification trials, the
first of which was practice. In session two, the subjects were briefly reoriented,
presented with the first 42 trials of the practice block, followed by the remaining
two blocks of identification trials. In session three, reorientation led to three
blocks of discrimination trials, the first of which was practice. In session four,
subjects completed the first 36 practice discrimination trials, and then the
remaining two discrimination blocks.

Standard condition

Instructions and orientation for the first two sessions were identical to the
Standard condition of Experiment 1.

At the beginning of the third session, subjects were first reacquainted with the
intervals by playing five examples of each on Cs and Gs throughout the keyboard.
They were instructed that they would now hear pairs of intervals, and that they
were to determine whether each pair was “exactly the same, or somehow
different”. It was pointed out that now, as in the identification task, some of the
intervals would be in-tune versions of size “1”, “2” or “3” intervals, and that
some would be out-of-tune versions. It was emphasized that Same pairs would be
those which consist of two in-tune examples of the same interval, or two versions
of an interval out-of-tune in exactly the same way. Subjects were given response
sheets which had “same” and “different” responses for each trial, and were asked
to circle their responses on these sheets. A brief review covering the same
information occurred at the beginning of the last session.

Folk-tune condition

Instruction and orientation for the first two sessions were identical to those
used in the Folk-tune condition of Experiment 1.
In the third and fourth sessions for the Folk-tune condition, the intervals
continued to be labeled by their appropriate melody referents. Otherwise, all
aspects of the orientation and the instructions for the discrimination task were just
like those for the Standard group.

Results and discussion

Identification

Fig. 4A shows the expert’s identification performance under Standard instructions.
His level of identification within category was near 100%, with sharp
transitions at the category boundaries of 350 cents (where minor thirds give way
to major thirds) and 450 cents (where major thirds give way to perfect fourths).
Figs. 4B and 4C show the group identification functions for the Folk-tune and
Standard groups, respectively. Folk-tune subjects were far superior in identifica-
tion to Standard subjects, and in several cases their individual curves rivaled the
expert’s.
The identification functions of the novices were transformed as before. Fig. 4D
shows the changing response frequencies as stimulus tokens were 0 to 8 steps from
category centers. These data were entered into a two-way ANOVA with
Condition and Steps as between- and within-subjects factors, respectively. There
was a significant effect of Steps, F(8,96) = 31.72, p < .01, MSe = .050, and a
significant interaction between Condition and Steps, F(8,96) = 11.50, p < .01,

[Figure 4: panels A. Expert, B. Folk-Tune Condition, C. Standard Condition,
D. Comparison of Conditions. Panels A to C show curves for m3, M3 and P4 against
CENTS (300 to 500); panel D is plotted against STEPS FROM CATEGORY CENTER (0 to 8).]

Figure 4. Three-category identification performance in a roving-level, melodic identification task by
(A) an expert, (B) the 7 Folk-tune subjects in Experiment 2, and (C) the 7 Standard subjects
in Experiment 2. (D) The composite identification functions of both groups, remapped as
described in the text.

MSe = .050. There was a steeper decrease in response frequencies for Folk-tune
subjects (68%) than for Standard subjects (22%) over steps 0-8.
Comparisons at each step level revealed that Folk-tune subjects had reliably
higher response rates at steps 0, 1, 2 and 3, and reliably lower response rates at
steps 6, 7 and 8.
Regression analyses revealed that steps accounted for 64% of the variance in
Folk-tune response frequencies, but only 18% of the variance in Standard
response frequencies.
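As a rough illustration of what such a regression measures, the sketch below fits a least-squares line to response frequency as a function of steps and reports the slope and the proportion of variance explained. The function name and the example frequencies are hypothetical. They are not the study's data, and the exact form of the regressions reported here may differ from this sketch.

```python
def linear_fit(steps, freqs):
    """Ordinary least-squares fit of freqs on steps.

    Returns (slope, intercept, r_squared).
    """
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, freqs))
    syy = sum((y - mean_y) ** 2 for y in freqs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)  # squared correlation
    return slope, intercept, r_squared


# Hypothetical response frequencies at steps 0..8 (NOT data from the study).
steps = list(range(9))
freqs = [0.90, 0.85, 0.75, 0.60, 0.45, 0.35, 0.25, 0.20, 0.15]
slope, intercept, r_squared = linear_fit(steps, freqs)
print(slope, r_squared)
```

A steeply negative slope with a high r² corresponds to a sharply tuned composite identification function; a shallow slope with a low r² corresponds to responding that depends little on distance from the category center.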
Individual regression analyses were performed on each subject’s data, from
steps 0 to 8. They showed that six of the seven steepest composite identification
functions belonged to Folk-tune subjects. Findings for individual subjects will be
discussed further below.

Discrimination

Fig. 5A shows the expert’s discrimination performance, plotted as the proportion
of Different trials correctly discriminated as a function of the mid-point of the
two intervals presented. For example, successful discriminations between 350-cent
and 400-cent intervals are indicated at 375 cents.

[Figure 5: panels A. Expert, B. Novices (Folk-Tune, Standard), C. Folk-Tune Condition
(Discrimination, Identification), D. Folk-Tune Condition. All panels are plotted
against CENTS (300 to 500).]

Figure 5. Observed 50-cent discrimination performance in a roving-level, melodic-interval discrimination
task by (A) a musical expert and (B) novices in the two conditions of Experiment 2. (C)
Comparison between observed discrimination performance by Folk-tune subjects in Experiment 2
and the discrimination performance predicted from identification performance. (D)
Comparison between the sensitivity (d’) shown by Folk-tune subjects in Experiment 2 in
discrimination performance and in identification performance.

The expert clearly showed troughs of poorer discrimination at the three
category centers, and peaks of discrimination near the category boundaries. His
performance was above his overall false-alarm rate of 40%, except near two
category centers. Thus this subject showed the scalloped discrimination function
that has been interpreted to mean that categorization processes figure prominent-
ly in the interval-discrimination performance of experts (Burns & Ward, 1978,
1982; Zatorre & Halpern, 1979; see also Miller, Wier, Pastore, Kelly, & Dooling,
1976; Studdert-Kennedy et al., 1970).
Fig. 5B shows the discrimination performance of the Folk-tune and Standard
conditions. These data were submitted to a two-way ANOVA, with Condition and
mid-interval size in Cents as between- and within-subjects factors, respectively.
The analysis revealed a main effect for Cents, implying that subjects’ discrimina-
tion performance varied reliably over the range of interval sizes, F(8,96) = 2.42,
p < .05, MSe = .015. The analysis failed to show an effect of condition, a failure
to which the discussion will return.
The appearance in Fig. 5B, consistent with the reliable main effect of interval
types, is that both Folk-tune and Standard subjects showed better discrimination
between tokens in two different interval categories than between tokens within
one interval category. To systematically assess this possibility, the data were
pooled in a way that accorded with the suggestion that both groups showed
category boundaries at 350 and 425 (not 450) cents. Discrimination trials were
classified as within-category trials (trials centered around 300, 400 and 500 cents),
intermediate trials (trials centered around 325, 375, 450 or 475 cents) and
between-category trials (trials centered on 350 or 425 cents). These pooled data
were entered into a two-way ANOVA with Condition and Trial type as between-
and within-subjects factors, respectively. There was a reliable effect of Trial type,
F(2,24) = 3.50, p < .05, MSe = .018, suggesting that between-category discrimination
(58%) was better than within-category discrimination (49%), and post hoc
tests confirmed this. Indeed, within-category performance stood just at the overall
false alarm rate for the whole sample (49%). It appears, then, that the novices in
both conditions here are showing reliable hints that categorical processes are
affecting their discrimination of the musical intervals.
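The pooling scheme described above can be expressed compactly. The sketch below is an illustration only: the function names and the trial format (mid-interval size plus whether the subject responded “different”) are assumptions, and the example records are made up.

```python
# Mid-interval sizes (cents) assigned to each trial type, per the pooling above.
WITHIN = {300, 400, 500}
INTERMEDIATE = {325, 375, 450, 475}
BETWEEN = {350, 425}


def trial_type(mid_cents):
    """Classify a Different trial by its mid-interval size."""
    if mid_cents in WITHIN:
        return "within"
    if mid_cents in INTERMEDIATE:
        return "intermediate"
    if mid_cents in BETWEEN:
        return "between"
    raise ValueError(f"unexpected mid-interval size: {mid_cents}")


def pooled_hit_rates(trials):
    """trials: iterable of (mid_cents, responded_different) for Different trials."""
    counts = {"within": [0, 0], "intermediate": [0, 0], "between": [0, 0]}
    for mid, responded_different in trials:
        entry = counts[trial_type(mid)]
        entry[0] += int(responded_different)  # hits
        entry[1] += 1                         # trials
    return {k: hits / n for k, (hits, n) in counts.items() if n}


# Made-up records, for illustration only.
example = [(300, True), (300, False), (350, True), (425, True), (475, False)]
print(pooled_hit_rates(example))
```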
This finding is an important hint of novice competence in its own right. Recall
that the subjects in this experiment were filtered more strongly for their lack of
musical training than in almost any previous study. Moreover, the current task
was a particularly difficult version of the discrimination paradigm. Melodic
presentation of intervals meant that each trial involved four separable auditory
events, two of which had to be integrated into interval 1, and the other two of
which had to form interval 2 before a judgement was made. In contrast, many
studies of speech perception (or those using chordal musical tokens) present only
two integral stimuli on each discrimination trial, not four to-be-integrated stimuli.
Moreover, the sheer length (in time) of the present discrimination trials risked
some fading of the pre-categorical acoustic images which might also subserve
discrimination. Finally, the two intervals on a trial were played at different
absolute pitch levels, adding yet another layer of difficulty.
In a study by Burns and Ward (1978), novices proved almost helpless in a
50-cent, melodic, roving-level discrimination task on which the present task was
modeled. Their discrimination threshold was about 80 cents (almost a semitone),
and their discrimination functions did not show the reliable scalloping observed
here, which might imply categorical processes underlying discrimination. Burns
and Ward (1982, pp. 252-253) noted that non-musicians “show no evidence of
categorical perception” of intervals. The present results indicate that such a
conclusion is too extreme.
However, it should also be recognized that subjects in the current study
entered the discrimination task only after 2 h of intensive experience listening to
musical intervals. Perhaps what we are seeing, in both groups, is a warm-up effect
wherein a couple of hours of orientation and interval familiarizing and practice
start to have an impact on the performance of subjects. In this regard, it should
be noted that the final level of performance and the degree of scalloping that are
observed across the entire group of novices are hardly overwhelming, particularly
in comparison to the expert.
In light of the dramatic effect of the folk-tune manipulation on the identifica-
tion performance of the very same subjects, the apparent failure to see an effect
of the manipulation in the discrimination task is surprising. The Folk-tune group
showed no better discrimination performance and no more pronounced a
performance scallop than the Standard group. The superior identification of
intervals that the folk-tune labels afforded did not transfer to interval discrimina-
tion.
There is a puzzle here: why didn’t Folk-tune subjects just identify the two
intervals within a discrimination trial, and respond “different” whenever the
labels were different? If they had done just this simple thing, they would have
discriminated far better and far more categorically than they did, producing the
predicted function overlain on observed performance in Fig. 5C. (The predicted
discrimination function was computed from the identification data as the com-
bined probability of identifying differently the two intervals in a discrimination
trial.) Likewise, Fig. 5D shows measures of sensitivity (d’) estimated from the
identification and discrimination tasks (Macmillan & Creelman, 1991) for the
Folk-tune subjects. Clearly, as a group, these subjects fundamentally failed at
bringing to bear in the discrimination task a strategy which transformed identifica-
tion. The pace of the discrimination task, its memory demands, or other factors
could have caused this breakdown of transfer. In fact, subjects objected that the
1 s pause between intervals within a trial left them too little time to perform the
necessary recoding into an interval label. This may suggest future research with
slower tasks, or more directive instructions to encourage the transfer of identifica-
tion strategies into the discrimination trials. (Note: at the limit, a discrimination
trial could last 30 s, and the subject could identify one interval, then identify the
other, and then base a discrimination response solely on those two judgements.
Then, definitely, albeit somewhat artificially, identification performance would
have transferred robustly to discrimination performance.)
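The predicted discrimination function mentioned above can be sketched as follows. This is one reading of “the combined probability of identifying differently the two intervals”: it assumes the two intervals on a trial are labeled independently, so that the probability of different labels is one minus the probability that both receive the same label. The function name and the example probabilities are hypothetical.

```python
def predicted_different(p_a, p_b):
    """Probability that two intervals receive different labels.

    p_a and p_b are identification probabilities over the same ordered
    set of labels (e.g., m3, M3, P4), assumed independent across intervals.
    """
    p_same_label = sum(pa * pb for pa, pb in zip(p_a, p_b))
    return 1.0 - p_same_label


# Hypothetical identification probabilities (NOT data from the study)
# for two intervals that straddle a category boundary.
p_small = [0.9, 0.1, 0.0]
p_large = [0.2, 0.8, 0.0]
print(predicted_different(p_small, p_large))
```

Under this reading, two intervals that are always given the same label are never predicted to be discriminated, and two intervals that are always given different labels are always predicted to be discriminated.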

Individual differences

Another look at individual differences may be informative and even encouraging.
Fig. 6A shows, for each of the 14 subjects, the performance advantage for
between-category over within-category discrimination as well as the steepness of
the composite identification function. The performance advantage, delta discrimi-
nation, was computed as the difference between the subjects’ hit rates on
between-category trials (trials with a mid-interval size of 350 or 425 cents) and
their hit rates on within-category trials (trials with a mid-interval size of 300, 400
or 500 cents). The expert is indicated as “E” in the figure.
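For concreteness, delta discrimination for one subject can be sketched as below. The function name and the example hit rates are hypothetical. The sketch averages per-size hit rates, which matches pooling over trials only on the assumption that every mid-interval size contributes the same number of Different trials, as the design above implies.

```python
def delta_discrimination(hit_rate_by_mid):
    """Between-category minus within-category hit rate for one subject.

    hit_rate_by_mid maps mid-interval size (cents) to hit rate on Different trials.
    """
    between = [hit_rate_by_mid[m] for m in (350, 425)]
    within = [hit_rate_by_mid[m] for m in (300, 400, 500)]
    return sum(between) / len(between) - sum(within) / len(within)


# Hypothetical hit rates for one subject (NOT data from the study).
example = {300: 0.5, 350: 0.8, 400: 0.5, 425: 0.6, 500: 0.5}
print(delta_discrimination(example))
```

A positive value indicates better discrimination across a category boundary than within a category.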
First, the individual difference information in Fig. 6A reinforces the group
analyses of identification. The Folk-tune and Standard groups overlapped almost
not at all, and four Folk-tune subjects showed very steep, expert-caliber
identification functions. Thus, the effectiveness of the Folk-tune manipulation in
identification is especially clear for the novice subjects in this experiment.
Second, the information in Fig. 6A provides a useful new look at performance
in the discrimination task. It is clear that instruction in the folk-tune strategy was
not sufficient to ensure categorical processes in discrimination: Five of seven
Folk-tune subjects joined seven Standard subjects in producing only weak
evidence of categorical perception in the task.
Yet, at least as striking is that two Folk-tune subjects showed near expert-
caliber discrimination functions. Fig. 6B shows the identification and discrimina-
tion performance of the highly trained expert, and Fig. 6C shows the identifica-
tion and discrimination performance for the two select novices. Recall that these
two subjects are truly novices, who had never had training in music theory,
sight-singing or interval recognition, who had had at most one appreciation
course, and at most 18 months of instrument playing and that all before ninth
grade. The combination of the folk-tune strategy with the extensive orientation,
and perhaps the earnestness of these particular subjects, essentially balanced the
scales with the expert’s extensive history of serious and formal music study.
Clearly we have skimmed the cream off our sample, and thus this comparison
only illustrates the competence novices may be able to show when circumstances
are optimal. Still, there is at least a hint in these data that the effectiveness of the
folk-tune manipulation on interval perception is not restricted to an identification
task alone.

[Figure 6: panel A, legend Folk-Tune, Standard, Expert, x-axis ID REGRESSION SLOPE
(0 to 15); panels B and C, curves for m3, M3, P4 and discrimination (Disc.), x-axis
CENTS (300 to 500).]

Figure 6. (A) The steepness of the composite identification function plotted against the improvement in
discrimination performance from within- to between-category trials, for 7 Folk-tune subjects,
7 Standard subjects, and a musical expert. Discrimination performance overlain on
three-category identification performance for (B) the musical expert and (C) the two strong
discriminators in Experiment 2.

GENERAL DISCUSSION

Toward a music science of the novice

Our results help narrow a significant gap in music cognition research, while
directing attention to the theoretical riddle posed by novice listeners.
The gap opened as researchers focused on the developed cognitive systems of
experts (cf. Krumhansl, 1990), seeking the clearest cases of the textured
knowledge structures underlying the perception of tonal music. Thus, researchers
proceeded in the venerable tradition of expert-centrism among music analysts,
musicologists and grammarians of music (Meyer, 1956, 1973; Lerdahl & Jacken-
doff, 1983). Certainly, it did make sense to look for our (musical) keys under the
lamppost of expertise, because the light is likely to be brightest there. However,
the consequence was that novices received scant attention. The present results
should help to encourage more interest in this population which, after all,
represents the majority of music listeners throughout human history.
Understanding music novices becomes more important given the theoretical
puzzle they pose. On the one hand, it is tacitly assumed that, when queried in the
right way, novices will be shown to have the same musical knowledge structures
as experts, albeit in less developed or rough-hewn form. Many classic interpreta-
tions of the structure, evolution and perception of music have incorporated this
assumption, by emphasizing species-general cognitive capacities and constraints
regarding music, as opposed to expert-specific skills and strategies. Meanwhile,
novices have quietly piled up failures in very basic tests of musical concepts. As a
group, the failures risk undermining the classic cognitive interpretations, and they
also risk narrowing the scope of existing music science to the élite cadre. A
central purpose of the present paper is to make plain that novices’ failures -
collectively - are not discountable, and remain a serious thorn in music science’s
side. Not only is it important to demonstrate basic musical competence in novice
listeners, but also to explain why competence emerges only when it does.

Familiar musical tokens and novices’ access to chroma and interval information

The present results undermine the idea that novice competence simply emerges
when novices are thoroughly instructed in the nature of the musical task. All
subjects in both experiments received an extensive orientation to intervals and
other concepts relevant to the interval-perception tasks. Judging by the Standard
subjects, this orientation produced scant benefits, while the simple folk-tune
manipulation accomplished far more.
A more compelling interpretation of novices’ performance emphasizes that
they fail particularly when given novel musical events which make no contact with
their long-term memories of familiar musical tokens. This is just the condition
that the folk-tune strategy averted for many of the novice listeners. Seen in this
light, several converging lines of evidence also suggest that novices show musical
sensitivity primarily when processing familiar melodies and other long-term
representations of musical material.
Attneave and Olson (1971) provided an important demonstration in this area.
They found that novice listeners could not produce logarithmic transpositions of
novel musical intervals, raising the possibility (p. 162) that they “lack the musical
scale”. However, novice listeners produced accurate transpositions of a familiar
musical token (the NBC chimes), making clear that they have the musical scale
when (p. 162) “given a well-defined, highly overlearned pattern as a standard”.
Analogously, novices are insensitive to chroma information when comparing
brief and novel tunes over the short term - that is, they are completely fooled by
tonal lures which share the diatonic contour, but not the exact scale steps, of the
standard (Cuddy & Cohen, 1976; Dowling, 1978). However, given identical tasks
with familiar-melody targets, novices accurately reject diatonic impostors (Bart-
lett & Dowling, 1980; Deutsch, 1969). Indeed, the mere repetition of a novel
target melody increases subjects’ ability to discriminate transpositions from lures
(Deutsch, 1979).
Other evidence shows that the most powerful musical schemata are not
abstract musical entities like scales, intervals or tonal frameworks, but are the
familiar tune schemata which we all have internalized. When given the names of
tunes, subjects can follow their performance, detect errors and so forth, even
through violent contour distortions spanning octaves, and even when the tune is
interleaved within other music (Dowling, 1973, 1978; Dowling & Harwood,
1986).
All this research suggests that the encoding of chroma and interval information
may be as easy to show within the context of long-term memories of familiar
tunes as it is difficult to show outside of familiar contexts. The present interval-
identification results reveal exactly this pattern once again.
Other predictions of the remediability of novices’ deficits follow from this view
also. For example, novices show scant sensitivity to Western music’s diatonic
hierarchy in probe-tone experiments where they rate the finality of different
endings to a musical scale (Fig. 1D, Krumhansl & Shepard, 1979). By replacing
an abstract scale with a familiar tune (e.g., Joy to the world begins with a
descending major scale), one might reveal their sensitivity to the tonal hierarchy.
Thus, a familiarity perspective offers special promise of solving the theoretical
problem posed by novices’ failures. In essence, novices often succeed in musical
tasks when performance has the support of familiar musical tokens in long-term
memory. Novices largely fail when that support is cut off. The simple theoretical
move, of linking novices’ competence to the availability of familiar musical
tokens, brings together novices’ failures and their successes within one coherent
framework.

Abstract musical knowledge and familiar-tune schemata

Acknowledging the role of familiarity for novices may be a useful addition to
the dominant abstractionist framework in current theories of music cognition.
These theories begin by assuming the abstract knowledge representations built up
by the listener - chroma categories, the scale template, schemata for harmony,
key and key relatedness - and then trace the assimilative processes which fit music
(novel or familiar) to these schemata. From these processes, it is claimed, come
expectations, implications, closures and consequently all the aesthetic possibilities
inherent within the syntax of music that make hearing into listening and sound
into music. In this framework, little distinction need be made between the
psychological processes evoked by familiar and unfamiliar musical tokens: similar
schema-instantiation processes occur in both cases. In fact, the role of familiarity
in music cognition has been de-emphasized (it shouldn’t matter) by musicologists,
aestheticians and music cognition researchers, which accounts for why novel
stimuli (melodic fragments, isolated chord progressions) were deemed sufficient to
elicit an appreciation of diatonicism.
Presumably novices must have and use some of these abstract knowledge
representations, particularly when they are lacking any familiar tune schemata to
fall back on. For example, in melody-comparison experiments, subjects show a
slight capacity to correctly reject atonal unfamiliar lures, suggesting that they are
using the abstract concept of “tonality” as a weak cue (Dowling, 1978).
Furthermore, there is slight evidence of a tonal hierarchy in novices if one
artificially manipulates the stimulus materials to remove the powerful cue of pitch
height/contour (Cuddy & Badertscher, 1987). Indeed, children (Krumhansl &
Keil, 1982) and even infants (Trehub & Trainor, 1993) sometimes express
behavioral preferences which seem to accord with Western music’s diatonic
principles, especially when the perfect fifth of the tonic-dominant interval is
strongly featured. However, the conditions under which novices show these
capacities are quite restricted, and the effects quite modest, compared to the
transforming effect of giving subjects familiar musical tokens. In the present case
familiarity supported good performance even in an extremely difficult test of
music perception.
Thus, at least in the case of novices, the assimilation to abstract musical
knowledge representations seems to be joined by the putatively different
assimilative processes which underlie the recognition of familiar tunes.
This possibility invites new questions about the nature of the representations
which encode familiar musical tokens and the processes which utilize them in
recognition. For example, one might ask about the important psychological
components of the representation of a familiar tune (rhythm, contour, chroma,
lyrics?). After White (1960), researchers sought more controlled stimulus materials
and increasingly used rhythmless and novel melodic fragments in their studies,
leaving the perception and memory of familiar musical tokens largely aside.
One might also ask about the size of the representational units which constitute
familiar-tune schemata: do we encode individual chromas or intervals or do we
perhaps encode larger musical phrases as unanalyzed wholes? The answer will
relate to the smallest unit which is psychologically accessible to the listener, and
the appropriate unit of analysis for theory and research. An interesting possibility
is that the appropriate unit of analysis is actually quite large - perhaps spanning a
musical phrase or motive. It may be difficult to analyze or unpack these larger
representations into smaller interval or chroma units or to extract tonal patterns
from the rhythms in which they are embedded. Gestalt psychologists seem to have
had just this idea in mind when they rejected the idea that melodies are simply
the sum of their individual notes and intervals (von Ehrenfels, 1890; Wertheimer,
1938). Indeed, the holistic and configural properties of melody, and its invariance
under transposition, accounted for Gestalt psychology’s abiding interest in it.
A holistic view of familiar-tune representations does explain a variety of
observations. For example, one wonders why, in music as actually experienced,
the thousands of presentations of minor thirds, perfect fourths and so on, do not
let these intervals themselves become familiar musical tokens, so that the
processing benefits of familiarity attend them also, in effect creating the abstract
and general intervallic categories which defy empirical verification in novices. This
is explained if intervals are mainly experienced within larger tune units, which are
not analyzed or unpacked down to the level of the chroma or the interval, so that
multiple presentations of intervals (within and across familiar tunes) lack in-
dependent psychological status, and do not contact each other as familiarizing
repetitions of the same token. In contrast, repetitions of larger phrases, such as
the first phrase of Here comes the bride, which do have psychological independence
and accessibility, and which do stand at the appropriate level of analysis
and representation for the system, would appear to the system as familiarizing
repetitions.
The holistic view also predicts the integrality of familiar musical tokens as they
are extended left-to-right in time. Does the song Happy birthday rise or fall in
pitch between the words “to” and “you”? Most people cannot answer this
question without starting at the beginning and singing through. The appropriate
contours, intervals and chromas are elicited just when the inner voice gets to them
in the song. This phenomenon has been the subject of auditory imagery-scanning
studies by Halpern (1988) and by Smith, Reisberg, and Wilson (1992).
The holistic view not only accounts for the linking of interval and chroma
information to familiar tunes, but it also may explain some of the individual
differences in the current findings. If the intervals of familiar tunes are themselves
difficult to analyze and access - though this is possible under some conditions -
then identifying which of several familiar melodies begins with a 300-cent
interval, for example, should be effortful, and should depend on having highly
motivated listeners; hence some of the individual differences that were observed.
Quickly identifying two intervals (a good strategy for between-category trials on
the discrimination task) would be even harder and perhaps daunting for many
listeners; this also is consistent with what we saw in the discrimination data.
Furthermore, it seems likely that the effectiveness of the folk-tune strategy that
was observed in the current studies was aided by the possibility of labeling the first
interval of the tune, rather than some interval in the middle of a (holistically
encoded) familiar musical phrase. The first interval should be easier and faster to
extract from the whole than a later interval, if the whole is a representation of a
musical phrase that can only be unpacked in a left-to-right manner.
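The cent measure used above (a "300-cent interval") can be made concrete with a short calculation. The following is a minimal sketch, not part of the reported procedure: the function names are illustrative assumptions, and equal temperament is assumed for the three interval categories studied.

```python
import math

def interval_cents(f_low, f_high):
    """Size of the interval between two frequencies, in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

def equal_tempered_ratio(semitones):
    """Frequency ratio of an equal-tempered interval spanning `semitones` semitones."""
    return 2.0 ** (semitones / 12.0)

# Equal-tempered sizes of the three interval categories examined in the experiments.
for name, semitones in [("minor third", 3), ("major third", 4), ("perfect fourth", 5)]:
    cents = interval_cents(1.0, equal_tempered_ratio(semitones))
    print(f"{name}: {cents:.0f} cents")
```

Under equal temperament the minor third, major third and perfect fourth come out at 300, 400 and 500 cents respectively, so a listener asked which familiar melody begins with a 300-cent interval is in effect being asked which one begins with a minor third.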
Hypotheses about the holistic encoding of familiar musical tokens connect
naturally to well-known exemplar models in the literature on human categoriza-
tion, such as those of Brooks (1978) and Medin and Schaffer (1978). An
important feature of these models is that observers tend to categorize together
complex stimuli that are nearly identical to one another in nearly all their
components, while even one or two salient component differences reduce
dramatically the tendency to relate the stimuli. In these models, only nearly
identical tokens seem psychologically similar to each other, and trigger each other
in memory. Tokens which share many common attributes, but have some
differences, are quite distinct psychologically, and may even be essentially
incomparable.
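The sharp drop in relatedness described above follows from the multiplicative similarity rule that characterizes context-type exemplar models. The sketch below is a simplified illustration only: it assumes a single mismatch parameter shared by all attributes (the value 0.2 is arbitrary), and the eight-attribute tokens are hypothetical rather than drawn from any study.

```python
from math import prod

def context_similarity(probe, exemplar, mismatch=0.2):
    """Multiplicative similarity between two tokens.

    Each matching attribute contributes 1.0 to the product; each mismatching
    attribute contributes `mismatch` (an assumed value between 0 and 1).
    """
    if len(probe) != len(exemplar):
        raise ValueError("probe and exemplar must have the same number of attributes")
    return prod(1.0 if p == e else mismatch for p, e in zip(probe, exemplar))

# Hypothetical eight-attribute tokens (illustrative only).
stored = (1, 1, 1, 1, 1, 1, 1, 1)
identical = (1, 1, 1, 1, 1, 1, 1, 1)
one_off = (1, 1, 1, 0, 1, 1, 1, 1)
two_off = (1, 0, 1, 0, 1, 1, 1, 1)

for label, token in [("identical", identical),
                     ("one difference", one_off),
                     ("two differences", two_off)]:
    print(label, round(context_similarity(token, stored), 3))
```

With the assumed mismatch value, a token that shares six of its eight attributes with a stored exemplar retains only about 4% of the similarity of an identical token, even though the two overlap on most attributes.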
Intuition suggests that the apprehension of the relationships between familiar
tunes conforms to this pattern. Rarely, it seems, do we compare familiar tunes
with each other, and seldom does one familiar tune remind us of another
(although two different renditions of the same familiar tune are easily and
instantly relatable). This is true despite the fact that groups of familiar tunes have
a good deal in common (chromas, intervals, etc.). For example, Three blind mice,
Mary had a little lamb, and other well-known songs start with the same 3-2-1
chromas. Similarly, even though the descending major scale and the first phrase
of Joy to the world share eight identical chromas, many listeners may never have
apprehended the similarity of these passages. The same is true for the songs Rock
of ages and Rudolph the red-nosed reindeer, which are quite hard to relate despite
sharing the same seven initial chromas. That schemata for familiar tunes would be
highly distinct representational islands, rarely compared, and almost incompar-
able, is expected if they are holistic representations that preserve information
about a large number of component attributes that combine in an integral or
strongly interactive manner.
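The overlap among these tunes is easy to verify once their openings are written as scale degrees. The sketch below is illustrative only: the transcriptions (1 = tonic, 8 = upper tonic) and the helper name are assumptions introduced here, and only tunes whose openings are uncontroversial are included.

```python
def shared_opening(a, b):
    """Number of leading scale degrees that two tunes have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

# Opening scale degrees; these transcriptions are assumptions for illustration.
TUNES = {
    "Three blind mice": [3, 2, 1, 3, 2, 1],
    "Mary had a little lamb": [3, 2, 1, 2, 3, 3, 3],
    "Joy to the world": [8, 7, 6, 5, 4, 3, 2, 1],
    "descending major scale": [8, 7, 6, 5, 4, 3, 2, 1],
}

print(shared_opening(TUNES["Three blind mice"], TUNES["Mary had a little lamb"]))
print(shared_opening(TUNES["Joy to the world"], TUNES["descending major scale"]))
```

An analytic comparison of this kind finds the shared material immediately, which is exactly the comparison that listeners holding holistic tune representations seem not to make.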

How do normal listeners hear novel music?

A familiarity perspective also promotes interesting suggestions about how
novices approach novel music. Novel musical materials may well be encoded with
little exact chroma and interval information, and they may be perceived in some
fashion that verges on contourish tinkling: they will seem to be largely key-less,
chroma-less, like long development sections, tonal lure upon tonal lure, allowing
the perception of rhythm and contour far more than chroma and interval. In the
process, their tonal dynamism would be largely lost, for those dynamic qualities
depend heavily on the functions of the different intervals and scale steps in music.
It is those functions which help create motion, rest, tension and resolution
(Meyer, 1956, 1973; Narmour, 1990; Piston, 1941; Ratner, 1962; Schenker, 1973;
Shepard, 1982, pp. 378-379; Zuckerkandl, 1956, 1972). Accordingly, the psy-
chology of novices facing novel music would be quite different from the
expectational dance which has been depicted in centuries of music scholarship.
In fact, contour-driven listening, which de-emphasizes chroma and interval
information, may be more common than acknowledged. For example, even
well-trained listeners have substantial difficulty processing scale-step information
in short-term melodic comparison tasks (Bartlett & Dowling, 1980). Further-
more, in wide swaths of Western music, counterpoint and contour and line and
texture may dominate the musical experience over chroma and scale and key.
This applies to many of Bach’s works, the majority of jazz compositions, all
development sections within the common practice epoch, many Romantic
compositions which strain tonality anyway, nearly all contemporary works which
have broken the chords to tonality, and a variety of Medieval and Renaissance
motets and madrigals. In many other cases the speed and note densities of music
may also help rule out the processes which monitor chroma and interval
information. Research using real listening contexts could establish the conditions
under which chroma, scale and harmonic information move to the forefront of the
listening experience. We wonder whether even expert music listeners bring the
same exquisite sensitivity to abstract musical concepts that they show in the
laboratory to spontaneous, real-world listening. An affirmative answer to this
question has always been simply assumed.

Familiarity and aesthetics

A further implication of the current focus on familiarity is that familiar musical
tokens will be more aesthetically pleasing than novel compositions, for being
processed more richly and tonally. Listeners will like what they know: the hymns
and songs their parents sang them, the old “standards” and ballads and folk-
songs, the rock favorites of their college years. They will always have to get “over
the hump” with new songs and music, getting to know them, getting them in
their ear. This prediction is confirmed daily by orchestra programs (which feature
a group of familiar “warhorses”), by daily broadcasts of Top-40 lists which change
but slowly over time, and by classic rock stations which perpetually re-play the
same set of “golden oldies”. In the laboratory, Smith and Melara (1990) showed
that musical novices, and even music majors, found the most familiar and
prototypical harmonic progressions the most aesthetically pleasing, and wanted no
syntactic deviation or atypicality at all.
The role of familiarity has received little comment in the principal theories of
music aesthetics. These theories hold that syntactic musical utterances which are
optimally discrepant from abstract schemata, and which therefore afford optimal
levels of expectational denial and fulfillment, will be most preferred (Meyer,
1956, 1973). If pressed, these theories might predict no aesthetic consequences for
familiarity (familiar and novel musical tokens should interact with musical
schemata equivalently) or they might predict that familiarity is a detracting factor
(the familiar, expected event cannot be the optimally unexpected event). In that
both these predictions conflict with the listening preferences of many listeners, the
aesthetic consequences of familiarity seem a definite area for further research.

Familiarity and musical practice worldwide

Given the impact of familiar-tune schemata on music perception in the
laboratory, and given the familiarity preference listeners show outside the
laboratory, we speculate that familiarity has been a significant force of gravity in
musical practice worldwide. Many cultures feature a standard repertory of pieces,
stable over generations, which everyone knows how to hear and even how to
produce (Nettl, 1973). These familiar pieces recur in each year’s ritual cycle, in
children’s games, at caregivers’ knees, and may dominate the culture’s musical
offering. In a culture suffused with such music, the representations of familiar
tunes would be the most salient musical knowledge structures, and a music
psychology of the familiar would be an essential tool for understanding music
cognition and appreciation there.
Thus a familiarity perspective may also have implications for cross-cultural
studies of music, and for defining what a culture’s music is. Theory in eth-
nomusicology, like that in Western music theory and music science, has been
highly abstractionist, seeking the structural qualities which systematize the
exemplars of a music, or building the grammars which could allow the open-
ended production of new musical utterances in that genre (cf. Becker & Becker,
1979; Hughes, 1991). But it may be that the mere sum of the exemplars in a
culture’s repertory describes that music in a more psychologically accurate way
than abstract systematizations do. Arguably, listeners understand the existing set
pieces as specific versions of themselves, and not as engaging instantiations of any
abstract system. Moreover, the relevance of a generative grammar would be
restricted because new musical utterances would seldom be generated, and if they
were, listeners would process these novel tokens in an impoverished and un-
appreciative way. We close by entertaining the curious possibility that music, an
open and potentially infinitely productive grammatical system, may actually be
realized in quite a closed and unproductive way, obeying a familiarity constraint,
for long stretches of anthropological time.

References

Allen, D. (1967). Octave generalization in musical and non-musical subjects. Psychonomic Science, 7,
421-422.
Attneave, F., & Olson, R. (1971). Pitch as a medium: A new approach to psychophysical scaling.
American Journal of Psychology, 84, 147-166.
Bartlett, J.C., & Dowling, W.J. (1980). The recognition of transposed melodies: A key-distance effect
in developmental perspective. Journal of Experimental Psychology: Human Perception and
Performance, 6, 501-515.
Becker, A., & Becker, J.O. (1979). A grammar of the musical genre Srepegan. Journal of Music
Theory, 23, 1-43.
Bell, C. (1958). Art. New York: Capricorn.
Brooks, L. (1978). Nonanalytic concept formation and memory for instances. In E.H. Rosch & B.B.
Lloyd (Eds.), Cognition and categorization (pp. 169-211). Hillsdale, NJ: Erlbaum.
Burns, E.M., & Ward, W.D. (1974). Categorical perception of musical intervals. Journal of the
Acoustical Society of America, 55, 456.
Burns, E.M., & Ward, W.D. (1978). Categorical perception - phenomenon or epiphenomenon:
Evidence from experiments in the perception of melodic musical intervals. Journal of the
Acoustical Society of America, 63, 456-468.
Burns, E.M., & Ward, W.D. (1982). Intervals, scales, and tuning. In D. Deutsch (Ed.), The
psychology of music (pp. 241-269). New York: Academic Press.
Cuddy, L.L., & Badertscher, B. (1987). Recovery of the tonal hierarchy: Some comparisons across
age and levels of musical experience. Perception and Psychophysics, 41, 609-620.
Cuddy, L., & Cohen, A.J. (1976). Recognition of transposed melodic sequences. Quarterly Journal of
Experimental Psychology, 28, 255-270.
Deutsch, D. (1969). Music recognition. Psychological Review, 76, 300-307.
Deutsch, D. (1977). Memory and attention in music. In M. Critchley & R.A. Henson (Eds.), Music
and the brain (pp. 95-130). London: Heinemann.
Deutsch, D. (1979). Octave generalization and the consolidation of melodic information. Canadian
Journal of Psychology, 33, 201-205.
Deutsch, D. (1982). The processing of pitch combinations. In D. Deutsch (Ed.), The psychology of
music (pp. 217-316). New York: Academic Press.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for melodies.
Psychological Review, 85, 341-354.
Dowling, W.J. (1982). Melodic processing and its development. In D. Deutsch (Ed.), The psychology
of music (pp. 413-429). New York: Academic Press.
Dowling, W.J., & Harwood, D.L. (1986). Music cognition. New York: Academic Press.
Estes, W.K. (1972). An associative basis for coding and organization in memory. In A.W. Melton & E.
Martin (Eds.), Coding processes in human memory (pp. 161-190). Washington, DC: Winston.
Frances, R. (1958). La perception de la musique. Paris: J. Vrin.
Halpern, A.R. (1988). Mental scanning in auditory imagery for songs. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 14, 434-443.
Hanslick, E. (1957). The beautiful in music. New York: Liberal Arts Press.
Helmholtz, H. (1877). The sensations of tone (transl. 1954). New York: Dover.
Hughes, D.W. (1991). Grammars of non-western musics: A selective survey. In P. Howell, R. West &
I. Cross (Eds.), Representing musical structure (pp. 327-362). New York: Academic Press.
Krumhansl, C.L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
Krumhansl, C.L., & Keil, F.C. (1982). Acquisition of the hierarchy of tonal functions in music.
Memory and Cognition, 10, 243-251.
Krumhansl, C.L., & Shepard, R.N. (1979). Quantification of the hierarchy of tonal functions within a
diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5,
579-594.
Lee, V. (1918). The varieties of musical experience. North American Review, 207, 748-757.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Locke, S., & Kellar, L. (1973). Categorical perception in a non-linguistic mode. Cortex, 9, 355-368.
McMillan, N.A., & Creelman, C.D. (1991). Detection theory: A user’s guide. Cambridge, UK:
Cambridge University Press.
Medin, D.L., & Schaffer, M.M. (1978). Context theory of classification learning. Psychological
Review, 85, 207-238.
Meyer, L.B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Meyer, L.B. (1973). Explaining music: Essays and explorations. Berkeley, CA: University of
California Press.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for
processing information. Psychological Review, 63, 81-96.
Miller, J.D., Wier, C.C., Pastore, R.E., Kelly, W.J., & Dooling, R.J. (1976). Discrimination and
labeling of noise-buzz sequences with varying noise-lead times: An example of categorical
perception. Journal of the Acoustical Society of America, 60, 410-417.
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication realization
model. Chicago: University of Chicago Press.
Nettl, B. (1973). Folk and traditional music of the western continents. Englewood Cliffs, NJ:
Prentice-Hall.
Ortmann, O. (1927). Types of listeners: Genetic considerations. In M. Schoen (Ed.), The effects of
music (pp. 38-77). New York: Harcourt, Brace.
Piston, W. (1941). Harmony. New York: Norton.
Ratner, L.C. (1962). Harmony: Structure and style. New York: McGraw-Hill.
Roederer, J. (1973). Introduction to the physics and psychophysics of music. New York: Springer-
Verlag.
Schenker, H. (1973). Harmony. Cambridge, MA: MIT Press (transl. from 1954 edition).
Shepard, R.N. (1982). Structural representations of musical pitch. In D. Deutsch (Ed.), The
psychology of music (pp. 343-390). New York: Academic Press.
Siegel, J.A., & Siegel, W. (1977). Absolute identification of notes and intervals by musicians.
Perception and Psychophysics, 21, 399-407.
Smith, J.D. (1987). Conflicting aesthetic ideals in a musical culture. Music Perception, 4, 373-392.
Smith, J.D., & Melara, R. (1990). Aesthetic preference and syntactic prototypicality in music: ‘Tis the
gift to be simple. Cognition, 34, 279-298.
Smith, J.D., Reisberg, D., & Wilson, M. (1992). Subvocalization and auditory imagery: Interactions
between the inner ear and inner voice. In D. Reisberg (Ed.), Auditory imagery (pp. 95-119).
Hillsdale, NJ: Erlbaum.
Smith, J.D., & Witt, J. (1989). Spun steel and stardust: The rejection of contemporary compositions.
Music Perception, 7, 169-186.
Studdert-Kennedy, M., Liberman, A.M., Harris, K.S., & Cooper, F.S. (1970). Motor theory of
speech perception: A reply to Lane’s critical review. Psychological Review, 77, 234-249.
Terhardt, E. (1978). Psychoacoustic evaluation of musical sounds. Perception and Psychophysics, 23,
483-492.
Thurlow, W.R., & Erchul, W.P. (1977). Judged similarity in pitch of octave multiples. Perception and
Psychophysics, 22, 177-182.
Trehub, S.E., & Trainor, L.J. (1993). Listening strategies in infancy: The roots of music and language
development. To appear in S. McAdams & E. Bigand (Eds.), Cognitive aspects of human
audition. London: Oxford University Press.
von Ehrenfels, C. (1890). Über Gestaltqualitäten. Vierteljahrsschrift für wissenschaftliche Philosophie,
14, 249-292.
Wertheimer, M. (1938). Gestalt theory. In W.D. Ellis (Ed.), A source book of Gestalt psychology (pp.
1-11). London: Routledge & Kegan Paul.
White, B. (1960). Recognition of distorted melodies. American Journal of Psychology, 63, 100-107.
Zatorre, R.J., & Halpern, A.R. (1979). Identification, discrimination, and selective adaptation of
simultaneous musical intervals. Perception and Psychophysics, 26, 384-395.
Zuckerkandl, V. (1956). Sound and symbol. Princeton, NJ: Princeton University Press.
Zuckerkandl, V. (1972). Man the musician. Princeton, NJ: Princeton University Press.
