
Vol 7, No 3-4 (2012)

Empirical Musicology Review

Table of Contents
EDITORS' NOTE

Editors' Note
Nicola Dibben, Renee Timmers 102

ARTICLES
The Harmonic Minor Scale Provides an Optimum Way of Reducing Average Melodic Interval Size, Consistent
with Sad Affect Cues
David Huron, Matthew Davis 103-117
Major-Minor Tonality, Schenkerian Prolongation, and Emotion: A commentary on Huron and Davis (2012)
Richard Parncutt 118-137
Interval Size and Affect: An Ethnomusicological Perspective
Sarha Moore 138-143
A method for testing synchronization to a musical beat in domestic horses (Equus ferus caballus)
Micah R. Bregman, John R. Iversen, David Lichman, Meredith Reinhart, Aniruddh D. Patel 144-156
If horses entrain, don’t entirely reject vocal learning: An experience-based vocal learning hypothesis
Adena Schachner 157-159
A commentary on Micah Bregman et al.: A method for testing synchronization to a musical beat in domestic horses
(Equus ferus caballus)
Sandy Venneman 160-163

BOOK REVIEW
Aaron L. Berkowitz, The Improvising Mind: Cognition and Creativity in the Musical Moment. New York:
Oxford University Press, 2010.
Jakub Matyja 164-166
Empirical Musicology Review Vol. 7, No. 3-4, 2012

Editors’ Note
NICOLA DIBBEN
RENEE TIMMERS
University of Sheffield, UK

WE are happy to present the delayed, but in many respects updated, Volume 7, Numbers 3-4, of Empirical
Musicology Review. A number of changes have occurred since the previous special issue on ‘Meaning and
Entrainment in Language and Music’. The journal is now published using a new online system, hosted by
the Ohio State University Libraries. It is the first issue published under our editorship and the editorial
management of Daniel Shanahan, and in which Joseph Plazak joins Randolph Johnson as Assistants to the
Editors.

Volume 7, Numbers 3-4, represents the breadth of empirical musicology as an area of research.
The paper by David Huron and Matthew Davis reports an empirical investigation of the association
between sadness and the presence of small melodic intervals. It argues that the minor mode makes optimal
use of small intervals. The commentaries by Richard Parncutt and Sarha Moore contextualise this study
within a broader historical and ethnomusicological context, respectively. Richard Parncutt argues for the
relevance of Schenkerian theory in highlighting the role of the minor triad in the experienced sadness of
the minor mode, while Sarha Moore looks at alternative emotional experiences of flattened scale degrees
in non-Western cultures.

By contrast, the paper by Micah Bregman, John Iversen, David Lichman, Meredith Reinhart and Aniruddh
Patel presents a paradigm for investigating entrainment to a beat in horses. While Adena Schachner argues
against attributing entrainment to vocal learning alone, Sandy Venneman provides a methodological critique
from the perspective of a researcher and horse trainer.

Finally, the issue includes a book review by Jakub Matyja interpreting improvisation from a cognitive
perspective.

We hope that this issue, in its new online format, will generate renewed interest in empirical
musicology, and we look forward to further constructive discussions of the prominent musicological topics
that this issue exemplifies so well.


The Harmonic Minor Scale Provides an Optimum
Way of Reducing Average Melodic Interval Size,
Consistent with Sad Affect Cues
DAVID HURON
Ohio State University

MATTHEW J. DAVIS
Ohio State University

ABSTRACT: Small pitch movement is known to characterize sadness in speech
prosody. Small melodic interval sizes have also been observed in nominally sad music
—at least in the case of Western music. Starting with melodies in the major mode, a
study is reported which examines the effect of different scale modifications on the
average interval size. Compared with all other possible scale modifications, lowering
the third and sixth scale tones from the major scale is shown to provide an optimum or
near optimum way of reducing the average melodic interval size for a large diverse
sample of major-mode melodies. The results are consistent with the view that Western
melodic organization and the major-minor polarity are co-adapted, and that the
structure of the minor mode contributes to the evoking, expressing or representation of
sadness for listeners enculturated to the major scale.

Submitted 4 September 2012; accepted 24 September 2012.

KEYWORDS: minor mode, sadness, melodic interval, scales

KRAEPELIN (1899/1921) identified five characteristics of sad speech: (1) low overall pitch, (2) small
pitch movement, (3) quiet dynamics, (4) mumbled articulation, and (5) slow speaking rate. Over the past century,
Kraepelin’s clinical observations have been confirmed through controlled experimental studies. With regard
to small pitch movement, sad vocal prosody is more “monotone” compared with normal voice (Banse &
Scherer, 1996; Bergmann, Goldbeck & Scherer, 1988; Breitenstein, van Lancker, & Daum, 2001; Davitz,
1964; Eldred & Price, 1958; Fairbanks & Pronovost, 1939; Huttar, 1968; Skinner, 1935; Sobin & Alpert,
1999; Williams & Stevens, 1972). Wide pitch excursions are associated with high physiological arousal,
such as occurs in joy or anger. By contrast, low variability of the fundamental pitch (F0) is characteristic of
low arousal, including both sad voice and sleepy voice.
In historical and cross-cultural studies, music-related sadness is one of the most commonly
described affects. In his volume De Medicina, the first-century Roman physician Aulus Celsus described
how “playing soft music” might be used to treat the mental stupor thought to be caused by an excess of
black bile—a condition Celsus described by its literal Latin term, melancholia. Similar descriptions of
music’s ability to evoke or temper sadness can be found in many historical texts spanning many cultures,
including ancient Egyptian, Chinese, Hebrew, Persian, Arabic, and Sanskrit sources. Apart from historical
sources, there also exist both ethnographic and empirical descriptions of grief- or sadness-related musical
experiences in many cultures. Laments, sorrow songs, dirges, elegies, and mourning-song traditions are
evident around the world, especially in eastern Europe (e.g., Mazo, 1994; Seremetakis, 1991; Wilce, 2009),
Africa (e.g., Anyumba, 1964; Nketia, 1975), the Middle East and Asia (e.g., Naroditskaya, 2000; Racy,
1986; Wilce, 2002), and Oceania and Australia (e.g., Feld, 1982/1990; Magowan, 2007; Moyle, 1987).
Sadness has been one of the most commonly studied affects in research on music and emotion. At
least in the case of Western-enculturated listeners, both adults and children readily identify particular
melodies or passages as sounding “sad” (e.g., Dolgin & Adelson, 1990; Terwogt & Grinsven, 1991). The
ability of music to evoke or portray sadness or grief does not appear to be limited to Western music.
Anthropologists are not easily given to claims of universality; however, in the case of music-related grief, a
number of scholars have pointed to an apparent commonality. For example, writing about grief expressions
and situations in non-Western cultures, ethnomusicologist Greg Urban has noted that the experiences are
“transparently understandable, not in need of detailed ethnographic description” (Urban, 2000, p. 151).


For Western-enculturated listeners, sadness has a long association with the minor mode. The
association of the minor third and minor triad with sadness was already described in the sixteenth century
by Zarlino (1558). Experimental studies by Heinlein (1928) and Hevner (1935) show that the minor mode
continues to evoke sad, dejected, or serious connotations for Western-enculturated listeners.
It is important to note that the minor mode is not solely associated with sadness. For Western-
enculturated listeners, the minor mode is also associated with exoticism (especially “orientalism”; Said,
1978), as well as passion and seriousness (Hevner, 1935). When the minor mode is linked to Kraepelin’s
cluster of acoustic cues for sadness (low pitch, small pitch movement, quiet, slow, mumbled articulation),
the resulting music has a strong tendency to exhibit a sad character. “Exceptions” to the minor-sadness
association appear to confirm the need for this cluster of features. For example, W.A. Mozart’s popular
“Rondo Alla Turca” can hardly be described as “sad” despite the prevalence of the minor mode. Similarly,
the aria “He was Despised” from G.F. Handel’s Messiah is widely regarded as very sad, despite the use of
the major mode. However, the “Rondo Alla Turca” is fast-paced with relatively detached articulation;
whereas “He was Despised” is slow and quiet. In short, it is difficult to identify exceptions to the
minor-sadness association unless the music also violates one or more of the other acoustic features identified
by Kraepelin as characteristic of sad speech prosody. Other challenges to the minor-sadness association can be
found in the ethnomusicological literature. For example, scales similar to the Western harmonic minor can
be found in traditional Balkan music and in various North African and Middle Eastern musics, without any
accompanying association with sadness (Rice, 2004).
Most music scholars now reject the notion that the minor scale is inherently sad, and attribute its
sad connotations to learned associations. An early proponent of this view was Valentine (1913/1914), who
proposed that Ivan Pavlov’s recently-formulated concept of the conditioned reflex might account for the
sad connotations evoked by the minor mode:

The general significance of the minor key for modern European ears is not due to an effect
inherent in the relation of the notes in a minor interval, but is more probably the effect of
association...with the custom of setting sad songs to minor keys (Valentine, 1913/1914, pp. 197-198).

In effect, Valentine proposed that the sad connotations of the minor mode originated in an early accidental
pairing of sad lyrics with the minor scale. This association then provided a bootstrap for a self-sustaining
cultural practice unique to Western European music-making.
This view implies that the association of the minor mode with sadness is arbitrary or random. For
example, the view implies that, with a different history, it might well have been the major mode that
formed a learned association with sadness whereas the minor mode formed a learned association with
happiness. Without suggesting that there is anything inherent about either the major or minor modes, and
without suggesting that the connotations of the minor mode are anything other than culture-bound, in the
study that follows, our data will suggest that the relationship between the major and minor modes may not
be arbitrary.
If the minor mode tends to be used to express or represent sadness for some listeners, then we
might expect to observe an association between the minor mode and those acoustic characteristics of sad
speech prosody identified by Kraepelin. That is, if the minor mode is commonly involved in the
expression of sadness, we might expect that, compared with major-mode music, music in the minor mode
typically: (1) is lower in overall pitch, (2) employs smaller pitch movements, (3) is quieter, (4) entails
“mumbled” articulation (e.g., legato), and (5) exhibits a slower tempo.
The results of several correlational studies are consistent with these conjectured associations. For
example, Huron (2008) calculated the average melodic interval size for nearly 10,000 Western classical
instrumental themes and found that the average interval size is slightly smaller for themes written in the
minor mode compared with themes written in the major mode. As small pitch movements are found in sad
speech prosody, one might speculate that small melodic interval sizes might also contribute to the
perception of sadness––at least for Western-enculturated listeners.
One might suppose that many factors contribute to the size of melodic intervals in music. One
factor is the scale itself. A pentatonic scale, for example, might encourage larger average melodic intervals
than the major scale merely because there are fewer scale tones per octave. Consider, by way of
illustration, the pentatonic scale consisting of the pitch classes C, D, E, G and A. For each pitch-class
pairing, we might calculate the minimum interval size in semitones (e.g., C-D = 2, C-E = 4, C-G = 5,
C-A = 3, D-E = 2, etc.).[1] If all pitch transitions were equally probable, then the average interval class
for random pitch sequences in the common pentatonic mode would be 3.60 semitones. By contrast, if all
pitch-class transitions were equally probable, then the average interval class for random pitch sequences in
the major mode would be 3.43 semitones. Of course, in real music, not all pitch transitions are equally
probable, so the actual average interval size will depend on the nature of the melodic organization.
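These uniform-transition figures are straightforward to verify. The following sketch is our own illustration (not code from the study); it treats every distinct pitch-class pair as equally likely and averages the interval classes:

```python
from itertools import combinations

def interval_class(a, b):
    """Minimum distance in semitones between two pitch classes (mod 12)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def mean_interval_class(pitch_classes):
    """Average interval class over all distinct pitch-class pairs, i.e. the
    expected interval if all pitch transitions were equally probable."""
    pairs = list(combinations(pitch_classes, 2))
    return sum(interval_class(a, b) for a, b in pairs) / len(pairs)

pentatonic = [0, 2, 4, 7, 9]      # C, D, E, G, A
major = [0, 2, 4, 5, 7, 9, 11]    # C major

print(round(mean_interval_class(pentatonic), 2))  # 3.6
print(round(mean_interval_class(major), 2))       # 3.43
```

Because interval class is symmetric, averaging over unordered pairs gives the same result as averaging over ordered transitions.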
Rather than presuming that all pitch transitions are equally likely, in this study, we employ the
pitch transitions found in actual melodies. We will then explore the influence of scale structure on average
interval size. Specifically, we will begin with the major scale and make systematic pitch modifications; for
each modification we determine its effect on the average interval size for a sample of melodies. Our goal
is to determine which scale modifications most reduce the average melodic interval size. If smaller
interval sizes are robust acoustic cues for sadness, one might expect that the scale modification that results
in the smallest average interval size would also tend to most contribute to the perceived sadness of a
melody.
To anticipate our results, we will see that among the optimum modifications to the major scale,
lowering the third and sixth scale tones––as in the harmonic minor scale––provides one of the best means
for reducing the average melodic interval size. That is, the results are consistent with the idea that
transposing a major-mode melody to the harmonic minor scale is among the very best pitch-related
transformations that can be done to modify a major-mode melody in order to render a sad affect. Skeptics
are likely to view this result as too good to be true. Accordingly, in the final Discussion, we will consider
and identify the statistical properties of Western major-mode melodies that are the proximal cause of this
striking result.

METHOD

In brief, our method is best understood by considering the following hypothetical scenario. Suppose a
composer has created several melodies in the major mode, and wants to transform these melodies so that
they sound sadder. Research in speech prosody suggests that making the melodic intervals smaller will
contribute to the perceived sadness. Rather than merely compressing all of the melodic intervals, our
composer wishes to achieve a smaller average interval size by modifying the pitches of the major scale. Are
there pitch modifications that can be made to the major scale that will, on average, transform most melodies
so that the average melodic interval is smaller?
Our method involved assembling three contrasting samples of melodies in the major mode,
calculating the average melodic interval size for each melody, and then determining whether the average
interval size increases or decreases for different scale modifications. In this study, we will limit ourselves to
modifications of the Western diatonic major scale. In principle, the same method could be applied to any
scale from any culture––and so the potential cross-cultural generality of the hypothesis might be tested.
In modifying the pitch of a scale tone, one consideration is the amount of modification. In the case
of the Western equally-tempered pitch set, the smallest modification would be a semitone. Larger
modifications are possible, but they risk potential perceptual confusions. Obviously, if we were to raise the
second scale degree (supertonic) by 2 semitones the resulting scale tone would be indistinguishable from
the third scale degree (mediant). It is therefore unlikely that the resulting pitch would be heard as a
“modified supertonic.” In the major scale, any modification greater than one semitone will lead to such
confusions. Moreover, even in the case of semitone modifications, there are four changes that will also lead
to similar scale-degree confusions: raising the third scale tone, lowering the fourth scale tone, raising the
seventh scale tone, and lowering the tonic. A further restriction is that the tonic pitch remains fixed: one
might reasonably argue that changing the tonic will cause the scale to be perceived as an entirely different
scale rather than as a modified version of some original. For similar reasons, we will avoid modifying more
than half of the scale tones at a time. That is, we will consider only modifying up to three scale tones. We
conjecture that listeners would tend to hear four or more modified pitches as an entirely different scale––
rather than a modified major scale. Accordingly, we will limit ourselves to one, two or three pitch
modifications of just one semitone; we will avoid modifying the tonic, and we will exclude from
consideration those semitone modifications that result in pitches equivalent to other scale tones.
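Taken together, these restrictions leave only a handful of admissible single-semitone changes, which can be enumerated mechanically. The following sketch is our own illustration (not code from the study), applying the fixed-tonic and collision rules to the major scale:

```python
MAJOR = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}  # scale degree -> pitch class

def valid_single_modifications(scale=MAJOR):
    """Single-semitone modifications that keep the tonic fixed and do not
    collide with an existing scale tone (which would cause scale-degree
    confusion)."""
    pcs = set(scale.values())
    mods = []
    for degree, pc in scale.items():
        if degree == 1:
            continue  # the tonic remains fixed
        for step, label in ((-1, 'b'), (+1, '#')):
            if (pc + step) % 12 not in pcs:
                mods.append(f"{degree}{label}")
    return mods

print(valid_single_modifications())
# 9 admissible changes: 2b, 2#, 3b, 4#, 5b, 5#, 6b, 6#, 7b
```

These nine surviving modifications are exactly the rows that appear in Table 1 below.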
Intuitively, one might suppose that lowering one or another scale tone might tend to result in
smaller melodic intervals. However, this is not necessarily the case. Lowering a scale tone might well result
in larger average melodic intervals. If, in some hypothetical culture, all of the scale tones occurred with
equal probability and the successions of different scale tones were randomly ordered, then modifying one
of the scale tones would have no effect on the average interval size. Figure 1 illustrates this effect for a
hypothetical five-note scale. In this case, the third tone has been lowered. Notice that intervals such as 1-3
and 2-3 will be smaller; however, the intervals 3-4 and 3-5 will be larger in size. In this case, it is not clear
that lowering the third scale tone would have any effect whatsoever on the average interval size for
melodies employing this scale.

[Fig. 1 image: “Original Scale” vs. “Modified Scale (3rd tone lowered),” with smaller and larger intervals marked]

Fig. 1. Effect of pitch modification for a hypothetical five-note scale. Lowering the third scale tone (lower
image) causes intervals such as 1-3 and 2-3 to be smaller, whereas intervals 3-4 and 3-5 are larger. If all
pitches occur with equivalent frequency and all pitch transitions are equally likely, then modifying a scale
tone should have no overall effect on the average interval size.
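The cancellation illustrated in Figure 1 can be checked numerically. The sketch below is our own illustration, assuming a hypothetical equally spaced five-note scale: because the lowered third tone has two scale neighbours below and two above, the shrinking and growing intervals cancel exactly under the uniform-transition assumption.

```python
from itertools import permutations

def mean_interval(scale):
    """Mean melodic interval over all ordered pairs of distinct scale tones,
    i.e. assuming every tone succession is equally probable."""
    pairs = list(permutations(scale, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

original = [0, 2, 4, 6, 8]   # hypothetical equally spaced five-note scale
modified = [0, 2, 3, 6, 8]   # third tone lowered by one semitone

print(mean_interval(original) == mean_interval(modified))  # True: no net change
```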

In real music, however, scale tones are not equally common, and some tone successions occur
much more frequently than others. Figure 2 shows the probability of different scale-tone successions for a
large sample of Germanic folk melodies in the major mode (from Huron, 2006). The width of each arrow is
proportional to the probability that one tone is followed by another. Pitch transitions whose probabilities are
less than 0.015 are not indicated. Notice that there is considerable variability in the likelihood of different
tone successions. For example, the third and fourth scale tones are closely linked. In the case of the seventh
scale tone, there is a much stronger connection to the first scale tone (7-1) than to the sixth tone (6-7). In
fact, there is so little movement between 6 and 7 that the probability falls below the 0.015 threshold needed
to draw a line connecting them. In the major scale, if we were to lower the seventh scale tone by one
semitone, the distance between 6 and 7 would be reduced, but the distance between 7 and 1 would be
increased. Since alternations between 7 and 1 are more common than between 6 and 7, the likely
consequence of lowering the seventh scale tone would be to increase the average melodic interval size.
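The effect of transition frequencies can be illustrated with a toy weighted calculation. The counts below are hypothetical values of our own choosing (not the empirical probabilities behind Figure 2), picked so that 7-1 alternations outnumber 6-7 movement:

```python
def weighted_mean_interval(pitches, counts):
    """Mean interval size, weighting each tone succession by how often it occurs.
    `counts` maps (from_degree, to_degree) -> number of occurrences."""
    total = sum(counts.values())
    return sum(n * abs(pitches[a] - pitches[b])
               for (a, b), n in counts.items()) / total

major   = {6: 9, 7: 11, 8: 12}   # scale degrees 6, 7 and the upper tonic (8)
lowered = {6: 9, 7: 10, 8: 12}   # seventh scale tone lowered by one semitone

# Hypothetical counts: 7-1 alternations far more common than 6-7 movement
counts = {(7, 8): 4, (8, 7): 4, (6, 7): 1, (7, 6): 1}

print(weighted_mean_interval(major, counts))    # 1.2
print(weighted_mean_interval(lowered, counts))  # 1.8
```

Lowering the seventh shrinks the rare 6-7 interval but widens the common 7-1 interval, so the weighted average increases, as the text predicts.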
Given the fact that some scale tones are more common than others, and that some successions
between scale-tones occur more frequently, modifying the pitch of a given tone might well be expected to
influence the average melodic interval size for common melodies, unlike the neutral situation illustrated in
Figure 1. Accordingly, we will systematically modify the different scale tones and measure the effect on the
average interval size for samples of real major-mode melodies. In order to pursue this approach, we must
first identify a suitable sample of melodies.


Fig. 2. The probability of different scale-tone successions for a very large sample of melodies in the major
mode. The width of each connecting line is proportional to the frequency of occurrence. Lines are drawn only
for those transitions with a probability of 0.015 or greater (Huron, 2006).

SAMPLE

Since we are interested in melodic intervals, we restricted our musical sample to single-line melodies and
themes. For this study, only melodies in the major mode were used. Creating a “representative” sample of
music in the major mode raises significant challenges. Instead of attempting to create a broadly
representative sample, we proposed to examine three subsamples, each exhibiting a different set of stylistic
properties and biases. Specifically, our three subsamples included:
1) 151 major-mode national anthems including countries such as Albania, Colombia, North Korea,
Palau and Zimbabwe. Although the anthems originate from different cultures, they all share a basically
Western-European musical template which includes the tendency to employ the major scale. This sample
was selected from a longer list of 195 national anthems. Roughly 20 anthems were deemed not to be clearly
in the major mode and so were excluded from the sample. In addition, in order to minimize the potentially
confounding effect of modulation, anthems longer than 50 measures were excluded. By this latter criterion,
another 20-odd anthems were eliminated.
2) 935 randomly selected major-mode instrumental themes from the Barlow and Morgenstern
Dictionary of Musical Themes (Barlow & Morgenstern, 1948). This collection includes themes from the
period-of-common-practice Western art-music literature with a bias towards orchestral works from the 19th
century.
3) 103 major-mode works from a collection of songs most well-known to residents of the United
States, including such songs as “Jingle Bells,” “Happy Birthday,” “Three Blind Mice,” “Frosty the
Snowman,” “Yankee Doodle,” “Mary Had a Little Lamb,” and “Auld Lang Syne.”
Although we would not claim that our sample of major-mode melodies is representative of
Western music-making in general, our three samples include both instrumental and vocal works, music of
European and American origin, art-music and popular styles, music spanning a period of roughly four
centuries, and instances of music from a genre that spans many countries. The sample also includes whole
melodies and shorter thematic statements. It should be noted that the average passage lengths differ for the
three samples. For the national anthems, the average length was 104 notes; for the American songs, the
average length was 53 notes; and for the Barlow and Morgenstern instrumental themes, the average length
was 20 notes. In general, effect sizes are likely to be smaller for the shorter sampled passages.
All of the musical materials were encoded in a computer database and were processed using the
Humdrum Toolkit (Huron, 1995). In total, our sample represents passages from some 1,189 individual
works involving roughly 40,000 notes. In carrying out our analyses, we will report separate results for each
of the three samples.

PROCEDURE

For each work in the three samples, the average melodic interval size was measured only for immediately
consecutive tones. No intervals were calculated for tones separated by a rest. Figure 3 illustrates the
measurement method for the beginning of “Mary Had a Little Lamb.” Figure 3a shows the original
(unmodified) song in C major. The overall average interval size is indicated below each example. Figure 3b
shows what happens when the third scale tone (E) is lowered by one semitone. Five intervals are affected,
and the overall average interval size is reduced slightly.

Fig. 3. The effect of lowering the third scale tone (E) for the beginning of “Mary Had a Little Lamb.”
Fig. 3a shows the original (unmodified) song in C major (average interval size: 1.33 semitones). Fig. 3b
shows what happens when the third scale tone (E) is lowered by one semitone (average interval size: 1.00
semitones).
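The measurement illustrated in Figure 3 can be reproduced in a few lines. The sketch below is our own; the opening of “Mary Had a Little Lamb” is encoded as MIDI-style semitone numbers (C4 = 60), and this excerpt contains no rests:

```python
def average_interval(pitches):
    """Average absolute melodic interval (in semitones) between
    immediately consecutive tones."""
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(intervals) / len(intervals)

# "Mary Had a Little Lamb" opening: E D C D E E E (C4 = 60)
original = [64, 62, 60, 62, 64, 64, 64]
# Third scale tone (E) lowered by one semitone to Eb
modified = [63, 62, 60, 62, 63, 63, 63]

print(round(average_interval(original), 2))  # 1.33
print(round(average_interval(modified), 2))  # 1.0
```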

In interpreting the results, readers should bear in mind the four restrictions imposed in our
simulations: (1) pitch modifications of only one semitone; (2) the tonic remains unmodified; (3)
modifications that result in pitches equivalent to other scale tones are avoided; and (4) no more than three
pitch modifications.

RESULTS

One-Pitch Manipulations

In our first simulation, we modified a single scale tone by either raising or lowering it by one semitone.
With seven scale tones there would nominally be 14 possible up/down modifications; however, with our
imposed restrictions that number drops to 9. Once again, we avoided modifying the tonic, raising the 3rd or
7th scale tones, and lowering the 4th scale tone. After making the appropriate scale modification, we
recalculated the average melodic interval size for each of the 1,189 passages in our musical sample. For each
sampled passage, the average interval size for the modified scale was compared with the average interval
size for the original major-mode version. The modified passage can then be classified as exhibiting either a
smaller average interval size, a larger average interval size, or no change in interval size. We could have
amalgamated all of the melodies together and compared the means of the interval distributions for the
modified and unmodified melodies. However, we a priori elected to treat each melody as a single
observation in order to maximize data independence.
Table 1 identifies the effect on average interval size for each of the scale modifications according
to the three different musical samples. Numerical values represent the percentage of melodies exhibiting
smaller average interval size for the given scale modification––excluding those melodies for which the
modification had no effect on the average interval size. For example, suppose that a given modification
produced 35 melodies whose average interval size decreased, 65 melodies whose average interval size
increased, and 25 melodies whose average interval size did not change. Discarding the unchanged
melodies, 35% of the remaining melodies exhibited a decrease in the average melodic interval. Values
around 50% suggest that the modification has little appreciable effect on interval sizes. Values larger than
50% suggest that the modification tends to cause melodies to have a smaller average interval size than the
unmodified major-mode version. The right-most column (Average) provides a summary result for all three
samples, where all samples are weighted equally. As can be seen in Table 1, the greatest decrease in average
interval size occurs when scale degree 6 is lowered by one semitone: 68.0% of major-mode passages
exhibit smaller average interval sizes when this scale modification is made. The second greatest decrease
(60.3%) occurs when scale degree 3 is lowered.
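The per-melody classification and percentage computation described above can be sketched as follows (a minimal illustration of ours, using the hypothetical 35/65/25 counts from the worked example rather than the study's data):

```python
def percent_smaller(avg_pairs):
    """Percentage of melodies whose average interval decreased under a scale
    modification, excluding melodies whose average did not change.
    `avg_pairs` is a list of (original_avg, modified_avg) tuples."""
    smaller = sum(1 for orig, mod in avg_pairs if mod < orig)
    larger = sum(1 for orig, mod in avg_pairs if mod > orig)
    return 100 * smaller / (smaller + larger)

# Toy data: 35 melodies decrease, 65 increase, 25 unchanged
avg_pairs = [(2.0, 1.9)] * 35 + [(2.0, 2.1)] * 65 + [(2.0, 2.0)] * 25
print(percent_smaller(avg_pairs))  # 35.0
```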

Table 1

Modification National Anthems American Songs Instrumental Themes Average


2♭ 42.1 52.0 37.2 43.8
2# 57.1 45.9 62.2 55.1
3♭ 55.0 69.8 56.0 60.3
4# 24.8 7.1 33.1 21.7
5♭ 42.6 42.6 47.0 44.0
5# 55.7 57.4 53.0 55.4
6♭ 73.3 65.3 65.4 68.0
6# 26.2 34.7 34.0 31.6
7♭ 29.4 16.7 31.1 25.7

Two-Pitch Manipulations

Tables 2a-c report the results of our second simulation in which two pitches were concurrently modified.
Separate tables are provided for each of the three musical samples. Note that these results need not echo the
results of the one-pitch manipulations since changing two pitches at the same time causes complex
interactions to occur. Once again, results are reported as the percent of melodies exhibiting a smaller
average interval size for the given scale modification––excluding those melodies for which the
modification had no effect on the average interval size.
Table 2d provides numerical averages for Tables 2a-c. As can be seen, the greatest effect occurs
when the 3rd and 6th scale tones are concurrently lowered by one semitone: 68.0% of major-mode passages
exhibit smaller average interval sizes when these scale modifications are made. The second largest effect
(63.6%) occurs when the lowered 6th scale tone is combined with a raised 2nd scale tone. The third largest
effect (60.7%) occurs with a raised 2nd and a raised 5th.

Table 2a

National Anthems 3♭ 4# 5♭ 5# 6♭ 6# 7♭
2♭ 50.4 35.4 33.6 52.8 63.7 33.6 34.3
2# 53.9 43.9 45.5 65.7 66.4 36.0 45.4
3♭ . 41.0 51.0 53.5 70.6 41.1 40.8
4# . . 20.4 44.1 48.9 13.5 22.0
5♭ . . . . 59.0 32.6 32.1
5# . . . . 64.8 39.2 45.7
6♭ . . . . . . 59.6
6# . . . . . . 16.8


Table 2b

American Songs 3♭ 4# 5♭ 5# 6♭ 6# 7♭
2♭ 64.2 38.0 40.0 57.3 59.3 40.2 37.0
2# 54.3 32.9 40.4 58.9 59.8 40.0 42.1
3♭ . 57.0 52.1 66.0 70.2 60.2 60.0
4# . . 31.2 45.7 39.0 18.8 9.4
5♭ . . . . 51.0 40.6 34.7
5# . . . . 58.3 49.0 48.9
6♭ . . . . . . 51.3
6# . . . . . . 20.8

Table 2c

Instrumental Themes 3♭ 4# 5♭ 5# 6♭ 6# 7♭
2♭ 47.9 37.5 42.1 46.6 51.6 35.1 31.6
2# 48.8 49.0 52.9 57.5 64.5 47.4 52.6
3♭ . 47.9 53.3 56.0 63.0 47.7 47.6
4# . . 30.6 46.0 48.8 29.5 31.6
5♭ . . . . 55.2 44.2 42.6
5# . . . . 50.3 43.1 42.9
6♭ . . . . . . 53.4
6# . . . . . . 24.9

Table 2d

Summary results for


Tables 2a-c 3♭ 4# 5♭ 5# 6♭ 6# 7♭
2♭ 54.2 37.0 38.5 52.2 58.2 36.3 34.3
2# 52.4 41.9 46.3 60.7 63.6 41.1 46.7
3♭ . 48.6 52.1 58.5 68.0 49.7 49.5
4# . . 27.4 45.3 45.6 20.6 21.0
5♭ . . . . 55.1 39.2 36.5
5# . . . . 57.8 43.8 45.8
6♭ . . . . . . 54.8
6# . . . . . . 20.8

Three-Pitch Manipulations

Tables 3a-c report the results of our third simulation in which three pitches were concurrently modified.
Table 3d provides numerical averages for Tables 3a-c. Two scale modifications tie for exhibiting the
greatest effect on reducing the average melodic interval size: (1) lowered 2, 3 and 6, and (2) lowered 3 and
6 with raised 5 (both at 62.9%). The second largest effect (58.9%) occurs with lowered 3, 6, and 7. The
third largest effect (57.6%) occurs with lowered 6 and 7 combined with raised 2.


Table 3a

National Anthems 4# 5♭ 5# 6♭ 6# 7♭
2♭, 3♭ 41.0 44.0 51.4 65.5 40.7 40.0
2♭, 4# . . 45.0 52.8 27.6 32.1
2♭, 5♭ . . . 46.5 29.3 3.3
2♭, 5# . . . . 39.6 47.3
2♭, 6♭ . . . . . 56.0
2#, 4# . . 46.5 58.7 22.5 39.3
2#, 5♭ . . . 58.8 37.8 41.9
2#, 5# . . . . 50.7 53.7
2#, 6♭ . . . . . 60.0
3♭, 4# . . 49.3 52.1 30.6 32.4
3♭, 5♭ . . . 61.0 39.5 41.4
3♭, 5# . . . 65.5 47.3 47.3
3♭, 6♭ . . . . . 58.6
4#, 5# . . . . 35.9 35.1
4#, 6♭ . . . . . 46.5
5♭, 6♭ . . . . . 62.1

Table 3b

American Songs 4# 5♭ 5# 6♭ 6# 7♭
2♭, 3♭ 54.6 58.0 65.3 69.5 58.9 60.0
2♭, 4# . . 51.5 44.9 32.6 28.8
2♭, 5♭ . . . 42.3 36.8 31.5
2♭, 5# . . . . 50.5 52.0
2♭, 6♭ . . . . . 46.3
2#, 4# . . 44.7 46.7 24.4 30.1
2#, 5♭ . . . 48.5 38.1 36.0
2#, 5# . . . . 57.3 47.3
2#, 6♭ . . . . . 53.5
3♭, 4# . . 57.7 58.7 51.1 50.0
3♭, 5♭ . . . 58.4 47.9 44.9
3♭, 5# . . . 67.0 65.0 60.6
3♭, 6♭ . . . . . 62.8
4#, 5# . . . . 45.9 39.8
4#, 6♭ . . . . . 39.3
5♭,6♭ . . . . . 44.4


Table 3c

Instrumental Themes 4# 5♭ 5# 6♭ 6# 7♭
2♭, 3♭ 45.5 45.2 50.1 53.7 44.2 41.6
2♭, 4# . . 43.4 45.5 33.1 32.7
2♭, 5♭ . . . 47.7 38.3 35.9
2♭, 5# . . . . 40.1 41.6
2♭, 6♭ . . . . . 44.6
2#, 4# . . 50.3 55.2 38.8 45.3
2#, 5♭ . . . 58.5 49.3 49.4
2#, 5# . . . . 49.4 50.7
2#, 6♭ . . . . . 59.2
3♭, 4# . . 52.0 53.2 42.7 43.5
3♭, 5♭ . . . 59.5 49.6 46.0
3♭, 5# . . . 56.1 49.9 50.8
3♭, 6♭ . . . . . 55.2
4#, 5# . . . . 38.6 38.8
4#, 6♭ . . . . . 44.1
5♭, 6♭ . . . . . 48.9

Table 3d

Summary results for Tables 3a-c

          4#    5♭    5#    6♭    6#    7♭
2♭, 3♭ 47.0 49.1 55.6 62.9 47.9 42.7
2♭, 4# . . 46.6 47.7 31.1 31.2
2♭, 5♭ . . . 45.5 34.8 23.5
2♭, 5# . . . . 43.4 47.0
2♭, 6♭ . . . . . 49.0
2#, 4# . . 47.2 53.6 28.6 38.2
2#, 5♭ . . . 55.3 41.8 42.4
2#, 5# . . . . 52.5 50.6
2#, 6♭ . . . . . 57.6
3♭, 4# . . 53.0 54.7 41.5 42.0
3♭, 5♭ . . . 59.6 45.6 44.1
3♭, 5# . . . 62.9 54.1 52.9
3♭, 6♭ . . . . . 58.9
4#, 5# . . . . 40.1 37.9
4#, 6♭ . . . . . 43.3
5♭, 6♭ . . . . . 51.8

DISCUSSION

Since the percentages reported in Tables 1, 2 and 3 were all calculated using the same method, we can
compare the values between tables. If we ask, of all the 1-, 2- or 3-tone modifications examined in this
study, which produced the greatest proportion of passages exhibiting smaller average melodic interval
sizes, the answer is a tie: (1) lowering the sixth scale tone alone, and (2) lowering the third and sixth scale
tones concurrently. Notice that the summary percentages for three-pitch modifications are generally lower
than for one- and two-pitch modifications. That is, modifying three pitches does not appear to produce a
more effective way of reducing average melodic intervals compared with either one- or two-pitch
modifications.
By way of summary, for melodic passages in the major mode, those scale modifications that most
reduce the average melodic interval size involve either lowering the 6th scale tone, or lowering the 3rd and
6th scale tones. Musicians will recognize the latter modification as the harmonic minor scale—the most
common form of the minor mode. If (as the speech prosody research suggests) small pitch movement is
associated with conveying or representing sad affect, and if a Western-enculturated musician aims to
modify a major-mode melody so that it conveys or represents sadness, then it appears that lowering the
third and sixth scale tones is likely to provide an optimum, or near optimum, solution.

Why?

Skeptics are likely to view the above results as too good to be true. Why, one might ask, are the lowered
third and sixth scale tones so effective in reducing the average melodic intervals for major-mode melodies?
The answer is evident in Figure 2, which summarizes the probabilities for different scale-degree
successions. The sixth scale degree has a strong affinity for the fifth scale degree, while the seventh scale
degree has a strong affinity for the tonic. Lowering 6 reduces the distance between 5 and 6. Normally this
gain would be offset by the increased distance between 6 and 7; however, since there is relatively little
melodic activity between 6 and 7, the reduction is only partially offset.
In the case of the third scale degree, there is more activity between 3 and 2 than there is between 4
and 3. In addition, there is more activity between 3 and 1 than between 5 and 3. Together, these
asymmetries mean that lowering the third scale tone shrinks more intervals than it enlarges. Since the
asymmetry is greatest for the 5-6 and 6-7 transitions, lowering 6 has a greater impact than lowering 3. In
short, the results of our computationally intensive simulations could already have been inferred from the
raw statistical probabilities of scale-degree transitions in major-mode melodies. Said another way, the
apparent effectiveness of the harmonic minor mode in reducing melodic interval sizes can be attributed
directly to the statistical properties of major-mode melodies themselves.
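The observation that the simulation results follow directly from the raw transition statistics can be made concrete with a small sketch. The transition counts below are invented for illustration only (they are not the Figure 2 data); they merely encode the asymmetries described above: heavy traffic between 5-6 and 2-3, very little between 6-7.

```python
# Expected mean melodic interval size implied by a scale-degree transition
# table. Counts are hypothetical, not the Figure 2 data; they encode heavy
# 5-6 and 2-3 traffic and sparse 6-7 traffic.

MAJOR = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}  # degree -> semitones

transitions = {  # (from_degree, to_degree): count
    (5, 6): 90, (6, 5): 120, (6, 7): 10, (7, 6): 8,
    (2, 3): 80, (3, 2): 110, (3, 4): 40, (4, 3): 60,
    (1, 3): 50, (3, 1): 70, (3, 5): 30, (5, 3): 35,
}

def mean_interval(offsets):
    """Count-weighted mean interval after shifting degrees by `offsets`."""
    total = weight = 0
    for (a, b), n in transitions.items():
        pa = MAJOR[a] + offsets.get(a, 0)
        pb = MAJOR[b] + offsets.get(b, 0)
        total += n * abs(pb - pa)
        weight += n
    return total / weight

baseline = mean_interval({})
lower6 = mean_interval({6: -1})
lower36 = mean_interval({3: -1, 6: -1})
print(round(baseline, 2), round(lower6, 2), round(lower36, 2))  # → 2.29 2.02 1.81
```

Even with these toy counts, lowering 6 reduces the expected interval size, and lowering both 3 and 6 reduces it further, for the reasons given in the text.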

CONCLUSION

In this study we have sought to explain one small aspect of how sad affect may be represented or conveyed
to Western-enculturated listeners. Existing perceptual research suggests that small pitch movement is
associated with sad affect in speech and other auditory stimuli. By modifying one, two, and three pitches
for major-mode melodies and comparing the average interval sizes to the unmodified versions, we were
able to calculate which modifications could best contribute to reducing the overall melodic interval size,
and so potentially contribute to the melody’s ability to convey or portray sadness.
For single-note modifications, we found that lowering the sixth scale degree provided the best
choice for reducing the overall average interval size. The lowered sixth was followed closely by the
lowered third. The reason why these scale tones have such a marked impact on average interval size arises
from common scale-degree patterns in major-mode melodies. Specifically, the proximal cause is the
relative absence of melodic “traffic” between 6 and 7, along with a preponderance of melodic traffic
between 6 and 5 and between 3 and 2.
Two-note modifications produce many complex interactions. Nevertheless, we found that
simultaneously lowering the third and sixth scale degrees produced an optimal or near-optimal effect in
reducing the average melodic interval size for our sample of major-mode melodies. Of course, these
modifications result in the common harmonic minor scale.
Three-note modifications produced a somewhat more complicated picture. The natural minor was
not the most effective modification in lowering the average interval size. None of the three-pitch
modifications proved as effective as either the one-pitch modification of lowering 6, or the two-pitch
modification of lowering 6 and 3.
In general, the results replicate Huron (2008), who calculated the average melodic interval size for
nearly 10,000 Western classical instrumental themes and found that the average interval size is slightly
smaller for themes written in the minor mode compared with the major mode. However, the current study
extends the earlier work by demonstrating that the harmonic minor mode represents an optimum or near
optimum transformation for major-mode melodies if the aim is to minimize average melodic interval size.


Caveats

It is important to place the results of this study in appropriate context. First, as can be seen from Tables 1-3,
there is considerable variability in the results between the three musical samples. Since the melodic
passages used in our study do not necessarily constitute a representative sample of Western major-mode
melodies, the generalizability of the results remains uncertain. Second, the results reported here say nothing
about the relationship between the major and harmonic minor scales per se. Rather, the results depend on
the structure of major-mode melodies. It is not that the minor and major scales are co-adapted for
contrasting sad and non-sad expression. Instead, it is that the harmonic minor mode is well adapted to
major-mode melodies for contrasting sad and non-sad expression. Third, our results say nothing about the
historical relationship between the major and minor modes. For example, the results should not be viewed
as suggesting that the major scale arose first, and then the harmonic minor scale grew out of this, or that the
existence of the minor mode ultimately reshaped major-mode melody-writing so as to enhance the
intervallic contrast. Fourth, although it is possible that the major and harmonic minor scales have some sort
of “natural” origin, the results in this study cannot be interpreted as demonstrating any natural origin for
either scale.
Fifth, it is important to recognize that this study has addressed only one factor known to contribute
to auditory-related sadness for Western-enculturated listeners. In addition to smaller pitch movement, other
factors known to contribute to sad affect include quieter dynamic level (e.g., Turner & Huron, 2008),
slower tempo (e.g., Post & Huron, 2009), darker timbre and more legato articulation (e.g., Schutz, Huron,
Keeton, & Loewer, 2008), and lower than normal pitch (e.g., Huron, 2008; Huron, Yim, & Chordia, 2010).
Transposing a passage from major to minor mode is not, in itself, sufficient to evoke, express, or represent
a sad affect (e.g., Ladinig & Huron, 2010). Nevertheless, for Western-enculturated listeners, transposition
to the minor mode has a strong likelihood of making a passage sound sadder (Heinlein, 1928; Hevner,
1935).
Having shown that the harmonic minor scale is among the optimum transformations for reducing
the average interval size for major-mode melodies, one might ask whether a reverse relationship is also
evident. That is, suppose one assembled a sample of minor-mode melodies and posed the question: what
modifications to the minor scale would most increase the average interval size for minor-mode melodies?
Is the major mode an optimum scale for increasing the melodic interval size of minor-mode melodies––and
so for transforming a nominally sad melody into a more joyful or happy passage? We did not pursue this
reverse test because of difficulties assembling a sample of melodies that are unambiguously in the minor
mode. Even short melodies (such as Greensleeves) commonly exhibit mixed major- and minor-mode
passages. In addition, minor-mode melodies often employ melodic minor contours in which the 6th and 7th
scale tones are treated differently in ascending and descending contexts. In principle, a future research
project could establish a systematic method for handling the various distinctive properties and exceptions
found in minor-mode melodies.
The phenomenon apparent in this study may or may not be limited to Western culture. In principle,
the cultural generality or cultural specificity of this effect can be empirically tested (cf. Moore, 2008). If, in
some non-Western culture, there exists an association where one mode is linked to “sadness” and another
mode is not so linked, then the conjecture of cultural generality can be tested by measuring the effect on
average interval size when musical lines in one mode are modified to conform to the other. Failing to show
such an effect would suggest that the relationship identified in our study is specific to Western culture.
Although the results of such a test might invite speculation, a scholarly approach would warrant actual
empirical study.[2]

NOTES

[1] Among music theorists, these minimum interval measures are referred to as interval classes (Rahn,
1980). For any pair of pitch-classes, the corresponding interval-class ignores pitch direction, collapses
octave and compound (supra-octave) intervals into their sub-octave equivalents, and also treats inversions
as equivalent.
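The interval-class measure defined in Note [1] reduces to a short computation over pitch classes; this sketch is illustrative and not part of the original article.

```python
# Interval class of two pitch classes, as described in Note [1].
# Illustrative sketch, not part of the original article.

def interval_class(pc_a, pc_b):
    """Smallest pitch distance between two pitch classes (0-6), ignoring
    direction, octave compounds, and inversion (Rahn, 1980)."""
    d = (pc_b - pc_a) % 12
    return min(d, 12 - d)

# A major seventh (11 semitones) and its inversion, the minor second,
# share interval class 1.
print(interval_class(0, 11))  # → 1
```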

[2] An earlier version of this study was published as Huron & Davis (2010).


REFERENCES

Anyumba, H. Owuor (1964). The Nyatiti Lament Songs. East Africa Past and Present. Paris: Présence
africaine.

Banse, R., & Scherer, K.R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality
and Social Psychology, Vol. 70, pp. 614–636.

Barlow, H., & Morgenstern, S. (1948). A Dictionary of Musical Themes. New York: Crown.

Bergmann, G., Goldbeck, T., & Scherer, K.R. (1988). Emotionale Eindruckswirkung von prosodischen
Sprechmerkmalen. Zeitschrift für Experimentelle und Angewandte Psychologie, Vol. 35, pp. 167–200.

Breitenstein, C., van Lancker, D., & Daum, I. (2001). The contribution of speech rate and pitch variation to
the perception of vocal emotions in a German and an American sample. Cognition & Emotion, Vol. 15, pp.
57–79.

Davitz, J.R. (1964). Auditory correlates of vocal expressions of emotional meanings. In J.R. Davitz (Ed.),
Communication of Emotional Meaning. New York: McGraw-Hill, pp. 101–112.

Dolgin, K.G., & Adelson, E.H. (1990). Age changes in the ability to interpret affect in sung and
instrumentally presented melodies. Psychology of Music, Vol. 18, pp. 87–98.

Eldred, S.H., & Price, D.B. (1958). A linguistic evaluation of feeling states in psychotherapy. Psychiatry,
Vol. 21, pp. 115–121.

Fairbanks, G., & Pronovost, W. (1939). An experimental study of the pitch characteristics of the voice
during the expression of emotion. Speech Monographs, Vol. 6, pp. 87–104.

Feld, S. (1982/1990). Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression (2nd
ed.). Philadelphia: University of Pennsylvania Press.

Heinlein, C.P. (1928). The affective characteristics of the major and minor modes in music. Journal of
Comparative Psychology, Vol. 8, pp. 101–142.

Hevner, K. (1935). The affective character of the major and minor mode in music. American Journal of
Psychology, Vol. 47, pp. 103–118.

Huron, D. (1995). The Humdrum Toolkit: Reference Manual. Menlo Park, California: Center for Computer
Assisted Research in the Humanities.

Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge,
Massachusetts: MIT Press.

Huron, D. (2008). A comparison of average pitch height and interval size in major- and minor-key themes:
Evidence consistent with affect-related pitch prosody. Empirical Musicology Review, Vol. 3, pp. 59–63.

Huron, D., & Davis, M.J. (2010). The effect of scale degree modifications on average interval size. In S.M.
Demorest, S.J. Morrison, & P.S. Campbell (Eds.), Proceedings of the 11th International Conference on
Music Perception and Cognition. Seattle, Washington: Causal Productions, pp. 439–444.

Huron, D., Yim, G., & Chordia, P. (2010). The effect of pitch exposure on sadness judgments: An
association between sadness and lower than normal pitch. In S.M. Demorest, S.J. Morrison, & P.S.
Campbell (eds.), Proceedings of the 11th International Conference on Music Perception and Cognition.
Seattle, Washington: Causal Productions, pp. 63–66.


Huttar, G. (1968). Relations between prosodic variables and emotions in normal American English
utterances. Journal of Speech and Hearing Research, Vol. 11, pp. 467–480.

Kraepelin, E. (1899/1921). Psychiatrie. Ein Lehrbuch für Studierende und Ärzte, ed. 2. Klinische
Psychiatrie. II. Leipzig: Johann Ambrosius Barth, 1899. Trans. by R.M. Barclay as Manic-Depressive
Insanity and Paranoia. Edinburgh: E. & S. Livingstone, 1921.

Ladinig, O., & Huron, D. (2010). Dynamic levels in Classical and Romantic keyboard music: Effect of
musical mode. Empirical Musicology Review, Vol. 5, pp. 27–35.

Magowan, F. (2007). Melodies of Mourning: Music and Emotion in Northern Australia. Crawley, Australia:
University of Western Australia Press.

Mazo, M. (1994). Lament made visible: A study of paramusical elements in Russian lament. In: B. Yung &
J.S.C. Lam (Eds.), Themes and Variations: Writings on Music in Honor of Rulan Chao Pian. Hong Kong:
Harvard Department of Music and the Center for Chinese Studies, The Chinese University of Hong Kong,
pp. 164–211.

Moore, S. (2008). The Doom of the Flattened Supertonic: The ‘other leading note’ in Turkish makam,
Indian raga, klezmer and heavy metal musics. MA Thesis in World Music Studies, Department of Music,
University of Sheffield.

Moyle, R. (1987). Tongan Music. Auckland, NZ: Auckland University Press.

Naroditskaya, I. (2000). Azerbaijanian female musicians: Women’s voices defying and defining the culture.
Ethnomusicology, Vol. 44, pp. 234–256.

Nketia, J.H.K. (1975). The Music of Africa. London: Gollancz.

Post, O., & Huron, D. (2009). Music in minor modes is slower (Except in the Romantic Period). Empirical
Musicology Review, Vol. 4, pp. 1–9.

Racy, A.J. (1986). Lebanese laments: Grief, music, and cultural values. The World of Music, Vol. 28, pp. 27–
40.

Rahn, J. (1980). Basic Atonal Theory. Prentice-Hall.

Rice, T. (2004). Music in Bulgaria: Experiencing Music, Expressing Culture. Oxford: Oxford University
Press.

Said, E.W. (1978). Orientalism. New York: Vintage.

Schutz, M., Huron, D., Keeton, K., & Loewer, G. (2008). The happy xylophone: Acoustic affordances
restrict an emotional palate. Empirical Musicology Review, Vol. 3, pp. 126–135.

Seremetakis, C.N. (1991). The Last Word: Women, Death, and Divination in Inner Mani. Chicago:
University of Chicago Press.

Skinner, E.R. (1935). A calibrated recording and analysis of the pitch, force and quality of vocal tones
expressing happiness and sadness. Speech Monographs, Vol. 2, pp. 81–137.

Sobin, C., & Alpert, M. (1999). Emotion in speech: The attributes of fear, anger, sadness, and joy. Journal
of Psycholinguistic Research, Vol. 28, pp. 347–365.

Terwogt, M.M., & Van Grinsven, F. (1991). Musical expression of moodstates. Psychology of Music, Vol.
19, pp. 99–109.


Turner, B. & Huron, D. (2008). A comparison of dynamics in major- and minor-key works. Empirical
Musicology Review, Vol. 3, pp. 64–68.

Urban, G. (2000). A Discourse-Centered Approach to Culture: Native South American Myths and Rituals.
Tucson, Arizona: Hats Off Books, 2nd edition.

Valentine, C.W. (1913/1914). The aesthetic appreciation of musical intervals among school children and
adults. British Journal of Psychology, Vol. 6, pp. 190–216.

Wilce, J.M. (2002). Genres of memory and the memory of genres: Forgetting lament in Bangladesh.
Comparative Studies in Society and History, Vol. 44, pp. 159–185.

Wilce, J.M. (2009). Crying Shame: Metaculture, Modernity, and the Exaggerated Death of Lament.
Chichester, UK: Wiley-Blackwell.

Williams, C.E., & Stevens, K.N. (1972). Emotions and speech: Some acoustic correlates. Journal of the
Acoustical Society of America, Vol. 52, pp. 1238–1250.

Zarlino, G. (1558). Le istitutioni harmoniche. Venice.

APPENDIX

National Anthems Used

Afghanistan, Albania, Andorra, Angola, Antigua and Barbuda, Armenia, Australia, Austria, Bahamas,
Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bosnia and Herzegovina, Botswana, Brunei, Burkina
Faso, Cambodia, Cameroon, Canada, Cape Verde, Chad, China, Colombia, Comoros, Congo, Congolaise,
Costa Rica, Croatia, Cuba, Cyprus, Czech Republic, Denmark, Djibouti, Dominica, Dominican Republic,
Egypt, Equatorial Guinea, Eritrea, Estonia, Fiji, Finland, France, Gambia, Germany, Ghana, Gibraltar,
Greece, Grenada, Guatemala, Guinea, Guinea Bissau, Guyana, Haiti, Hungary, Iceland, Indonesia, Iran,
Iraq, Ireland, Isle of Man, Ivory Coast, Jamaica, Jordan, Kazakhstan, Kiribati, North Korea, South Korea,
Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg,
Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritius, Mexico,
Micronesia, Moldova, Monaco, Mongolia, Mozambique, Myanmar, Namibia, Nauru, Nepal, Netherlands,
New Zealand, Nicaragua, Nigeria, Norway, Oman, Palau, Papua New Guinea, Poland, Portugal, Principe,
Russia, Rwanda, Saint Lucia, San Marino, Saudi Arabia, Senegal, Serbia and Montenegro, Seychelles,
Sierra Leone, Singapore, Slovenia, Solomon Islands, Somalia, Sri Lanka, St. Kitts and Nevis, St. Vincent
and Grenadines, Sudan, Surinam, Swaziland, Sweden, Switzerland, Syria, Taiwan, Tajikistan, Tanzania,
Thailand, Togo, Tonga, Tunisia, Tuvalu, USA, Uganda, United Arab Emirates, United Kingdom,
Uzbekistan, Vanuatu, Vietnam, Wales, Yemen, Zambia, and Zimbabwe.

AUTHOR NOTE

Correspondence concerning this article should be addressed to:

David Huron, School of Music, 1866 College Road, Ohio State University, Columbus, Ohio, 43210, USA,
huron.1@osu.edu


Major-Minor Tonality, Schenkerian Prolongation, and
Emotion: A commentary on Huron and Davis (2012)
RICHARD PARNCUTT
University of Graz, Austria

ABSTRACT: On average, melodies in minor keys have smaller intervals between
successive tones than melodies in major keys—consistent with the emotional
difference between major and minor (Huron, 2008). Huron and Davis (2012)
additionally showed that a part of this difference is inherent in the structure of major
and minor scales, in combination with typical patterns of transition between scale
steps: if one takes a typical major melody and lowers scale steps 3 and 6 by a semitone,
then the average interval size is optimally reduced. I present an alternative theory of the
origin of major and minor scales/keys and their emotional connotations. Huron’s
(2006) data on scale-step transitions in typical melodies is consistent with Schenker’s
(1922, 1935) idea that a piece of tonal music can be interpreted as a prolongation of its
tonic triad (mediated by the Ursatz). The emotional difference between major and
minor may ultimately and primarily depend on the third of the tonic triad in the
psychological background. Major music may tend toward positive valence simply
because emotionally positive music is more common than emotionally negative music,
and major triads and keys are more common than minor. Minor music may tend toward
negative valence simply because scale degrees 3 and 6 sound lower than expected,
consistent with emotional cues in speech (Huron, 2008).

Submitted 2012 October 20; accepted 2012 December 12.

KEYWORDS: major, minor, emotion, Schenker, prolongation

WHY are major and minor scales like they are (with a specific ordering of tones and semitones) and
not completely different? Why are major keys associated with positive emotional valence (happiness,
contentment, serenity, grace, tenderness, elation, joy, victory, majesty…), and minor with negative
emotional valence (sadness, anger, fear, tension, solemnity, lament, tragedy…)? For Meyer (1956), “the
minor mode is not only associated with intense feeling in general but with the delineation of sadness,
suffering and anguish in particular” (p. 227). Why? One would think that music psychologists would
have answered these apparently simple questions by now. Evidently we have not, but things are moving
in a promising direction. The contribution by Huron and Davis (2012) is a significant step towards a
new explanation, and it also has interesting broader implications. In this extended commentary, I will
present a new approach that builds upon their work.
Consider first the origin of the ordering of tones and semitones in major and minor scales. I
addressed that question in Parncutt (2011a). My basic assumption was that any passage of music in a
major or minor key may be considered a Schenkerian prolongation of its tonic triad. I will examine this
idea in detail below. For the moment, allow me to quickly consider the relationship between tones of
the tonic triad and scale degrees in major-minor tonality (MmT).
In Parncutt (2011a), I proposed that scale degrees in major and minor keys may be divided
into three categories: the tones of the tonic triad, missing fundamentals of the tonic triad, and leading
tones. In general, the missing fundamentals of a chord (and their salience) depend on voicing, but the
main candidates for an octave-generalized chord type such as “major triad” can be derived by a simple
octave-generalized calculation (Parncutt, 1988). The main missing fundamentals of a C-major triad are
A, F, and D; for example, there is a missing fundamental at A because E corresponds to the 3rd
harmonic of A, and G to the 7th. Similarly, the main missing fundamentals of a C-minor triad are F and
Ab. Thus, the C-major scale comprises the tonic triad (C, E, G), its missing fundamentals (A, F, D), and
the leading tone (B). The C-harmonic-minor scale comprises the tonic triad (C, Eb, G), its missing
fundamentals (F, Ab), and the leading tone (B). According to this logic, the gap between C and Eb
could be bridged by either D or Db. D is preferred for one or more of the following reasons: the


analogy to the familiar major scale, the perfect 5th above the dominant G, and the avoidance of
adjacent half-steps (Pressing, 1977). Incidentally, the 5th above the dominant plays an important role in
Schenkerian theory: scale step 2 in the fundamental line (Urlinie) implies dominant harmony.
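A toy version of the octave-generalized missing-fundamental calculation can illustrate the claim. The support weights below are simplified stand-ins in the spirit of Parncutt (1988), not the published model, but they reproduce the candidates named above.

```python
# Toy version of the octave-generalized missing-fundamental calculation.
# The support weights are simplified stand-ins in the spirit of Parncutt
# (1988); the published model differs in detail.

# Pitch-class interval above a candidate fundamental -> support weight:
# unison/octave, perfect fifth, major third, minor seventh, major ninth.
SUPPORT = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}

def fundamental_scores(chord_pcs):
    """Score every pitch class 0-11 as a candidate fundamental."""
    return {f: sum(SUPPORT.get((t - f) % 12, 0) for t in chord_pcs)
            for f in range(12)}

def missing_fundamentals(chord_pcs, n):
    """The n top-scoring candidates that are not themselves chord tones."""
    scores = fundamental_scores(chord_pcs)
    outside = [f for f in range(12) if f not in chord_pcs]
    return sorted(outside, key=lambda f: -scores[f])[:n]

print(missing_fundamentals({0, 4, 7}, 3))  # C major -> [9, 5, 2]: A, F, D
print(missing_fundamentals({0, 3, 7}, 2))  # C minor -> [5, 8]: F, Ab
```

Even in this crude form, the highest-scoring non-chord candidates for the C-major triad are A, F, and D, and for the C-minor triad F and Ab, matching the missing fundamentals cited in the text.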
Thus, six out of seven scale tones of the major scale, and five out of seven tones of the
harmonic minor scale, are octave-equivalent with virtual pitches evoked by the tonic triad. Leading
tones may be treated as a special case, consistent with their instability and “sensitivity” (French: la note
sensible), the existence of special terms to describe them, and their apparently unique historical and
psychological origin. To understand the special status of leading tones, consider first the simplest and
most common form of the diatonic scale: the scale represented by the white keys on the modern piano,
which is based on a continuous section of the cycle of fifths (F, C, G, D, A, E, B). Among the intervals
created by this collection of 7 pitch classes, there are 2 semitones: B-C and E-F. In Medieval music
theory, the hexachord ut-re-mi-fa-sol-la can be mapped onto this diatonic scale in two different places:
C-D-E-F-G-A or G-A-B-C-D-E. From this perspective, the two semitones in the diatonic scale both
correspond to the interval mi-fa. Statistical analysis of representative Medieval (Gregorian) chants
shows that the tone fa (C or F) consistently occurs more often than mi (B or E). The reason may be
because the harmonic series above fa better fits the prevailing diatonic scale, so fa has more pitch
commonality with its immediate context and sounds more consonant (Parncutt & Prem, 2008). Modern
research on the frequency of occurrence of scale steps and their tonal stability in different tonal styles
(Järvinen, 1995; Krumhansl, 1990; Krumhansl et al., 1999; Oram & Cuddy, 1995; Smith &
Schmuckler, 2004), when applied retrospectively to Medieval music, suggests that fa was perceived as
more stable than mi. Thus, early listeners may have learned to associate the lower tone of any semitone
interval with instability and the higher tone with stability.
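The derivation of the white-key collection from the cycle of fifths, and the location of its two mi-fa semitones, can be checked mechanically. This sketch is illustrative and not part of the original commentary.

```python
# Derive the white-key diatonic collection from a continuous segment of
# the cycle of fifths (F C G D A E B) and locate its two semitones.

NAMES = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}

fifths = [(5 + 7 * i) % 12 for i in range(7)]  # F, C, G, D, A, E, B
scale = sorted(fifths)                         # C, D, E, F, G, A, B

# Adjacent scale steps one semitone apart (wrapping at the octave).
semitones = [(NAMES[a % 12], NAMES[b % 12])
             for a, b in zip(scale, scale[1:] + [scale[0] + 12])
             if (b - a) % 12 == 1]
print(semitones)  # → [('E', 'F'), ('B', 'C')]
```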
Admittedly, this is not the simplest conceivable explanation for the origin of major and minor
scales. But each aspect of the explanation is supported by independent psychological, historical and
music-theoretic evidence or arguments. I know of no other explanation that is plausible from all three
viewpoints. In the following, I will combine this approach with Huron’s findings to construct a new
explanation for the emotional connotations of major and minor keys that brings together ideas and
approaches from both humanities and sciences.

PROBABILITY DISTRIBUTIONS OF MAJOR MELODIES

Huron and Davis (2012) observed that the pitch range of speech is larger in happy than sad speech, and
asked whether the same is true for music. A comparison of major and minor melodies shows that the
answer is yes, but the difference is rather small. This leads to two further questions. Do composers or
improvisers choose smaller intervals for music in the minor mode because (i) that music is normally
sad and smaller intervals better convey sadness, or (ii) smaller intervals are a consequence of the
structure of the minor scale? Huron and Davis answered “yes” in both cases, but focused on (ii).
To address (ii) systematically, Huron and Davis first made the question more specific by
looking at what happens to the average successive interval size in major melodies when selected scale
degrees are shifted up or down by a semitone. If the answer to (ii) is yes, the average successive
interval size should fall if we take a melody in a major key and lower scale steps 3 and 6 to create a
minor-key melody.
Huron and Davis began their argument with a simple observation. If every scale degree and all
transitions between scale degrees happened equally often, shifting selected scale degrees would have
no effect on mean interval size. In fact, there are large variations in the prevalence (frequency of
occurrence) of both scale degrees and scale-degree transitions. For that reason, any shift in scale
degrees will lead to a change in average successive interval size. Regarding the prevalence of scale
steps, Krumhansl and Kessler (1982) explained their key profiles (hereafter “K-K profiles”) on that
basis, which led to a surge of research interest in prevalence profiles of scale degrees in MmT.
The prevalence of scale-step transitions in MmT-melodies depends on at least three separate
factors:

1. Smaller intervals occur more often than larger intervals, consistent with the gestalt
principle of proximity. Vos and Troost (1989) and Huron (2006) considered two exceptions.
First, the minor second generally occurs less often than the major second; possible
explanations involve categorical perception and intonational limitations. Second, consonant
intervals such as fifths tend to occur more often than adjacent dissonant intervals such as
tritones.


2. Melodic leaps more often rise than fall, and steps more often fall than rise. Or put another
way: rising intervals are more often leaps and falling intervals are more often steps. This was
demonstrated empirically by Vos and Troost (1989), and confirmed by Huron (2006). Meyer
(1973) regarded a melodic leap as a structural gap; it implies stepwise motion in the opposite
direction to fill the gap. Schenker explained the stepwise filling-in of intervals between
harmonic tones with the term Zug (linear progression), which is a form of prolongation (Forte
& Gilbert, 1982).

3. Most interesting for the present study, transition probabilities depend on scale degrees
relative to the tonic. As Huron and Davis (2012) showed in their Figure 2, which is the same
as Figure 9.7 in Huron (2006), some scale-degree transitions are much more common than
others. The most common transitions in major-key melodies are the falling steps from 5 to 4, 4
to 3, 3 to 2 and 2 to 1—consistent with points 1 and 2 above. The most likely scale degrees to
be followed by a rest (which usually indicates the end of a phrase, and hence closure) are 1, 3
and 5—the tones of the tonic triad. Surprisingly uncommon is the transition from 6 to 7 in
either direction. Scale degree 6 most often moves down a step to 5, and 7 most often moves up
a step to 1. For melodies whose range is smaller than an octave, these aspects taken together
imply that the lowest tone of a melody is often 1 or 7, while the highest is often 6 or 5. In the
following, I will refer to melodies that are largely confined to this range as “Huron’s
stereotype.”
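Huron's stereotype, as characterized in point 3, is easy to operationalize. The function below is one possible formalization; the encoding and the example melody are my own, not Huron's.

```python
# One possible operationalization of "Huron's stereotype": range less than
# an octave, lowest tone scale degree 1 or 7, highest tone 5 or 6. Pitches
# are encoded as semitones relative to the tonic (leading tone below the
# tonic = -1). The encoding and the test melody are illustrative.

DEGREE = {0: 1, 2: 2, 4: 3, 5: 4, 7: 5, 9: 6, 11: 7}  # major-scale degrees

def fits_stereotype(melody):
    lo, hi = min(melody), max(melody)
    if hi - lo >= 12:  # range must be smaller than an octave
        return False
    return DEGREE.get(lo % 12) in (1, 7) and DEGREE.get(hi % 12) in (5, 6)

# "Twinkle, twinkle, little star" (opening phrase): C C G G A A G F F E E D D C
twinkle = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
print(fits_stereotype(twinkle))  # → True
```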

THE PITCH RANGE OF MAJOR MELODIES

Before continuing, allow me to drive this point home by considering some well-known examples of
Huron’s stereotype: “Oh Susanna,” “Twinkle twinkle little star,” “Frère Jacques,” “Mary had a little
lamb,” and the national anthems of Germany (“Einigkeit und Recht und Freiheit…,” based on the 2nd
movement of Haydn’s Kaiserquartett) and Great Britain (“God Save the Queen”). (Readers are asked
to imagine/audiate these melodies and confirm for themselves that their opening themes correspond to
the stereotype before proceeding.) But the stereotype is not limited to so-called trivial music, nor is it
limited to national anthems. As we shall see, it happens in most or perhaps all styles of MmT.
Consider Western art music. If we wanted to choose a representative composer, that might be
Mozart, and if we wanted to choose a representative repertoire we might choose his 18 piano sonatas. If
we look at the melodic line in the first two measures of each sonata, we find that in 10 of 18 cases (KV
279, 280, 281, 283, 284, 310, 331, 332, 333, 545) the melody conforms to Huron’s stereotype: the
range is less than an octave, the lowest tone is scale degree 7 or 1 and the highest is 5 or 6. In the other
8 cases, the range is expanded to include a lower 5 (KV 282, 330, 533) or an upper 1 (KV 311), or the
theme is a triadic flourish—an arpeggiation of the tonic triad (KV 309, 457, 570, 576).
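The criteria just applied to the Mozart themes amount to a simple three-part test. The sketch below is my own illustration, not code from Huron or from this commentary; the encoding of melodies as semitones relative to the tonic and the function name are assumptions for this sketch.

```python
def fits_stereotype(melody):
    """Check whether a melody conforms to Huron's stereotype.

    `melody` is a list of pitches in semitones relative to the tonic
    (low scale degree 7 = -1, tonic 1 = 0, 5 = 7, 6 = 9).
    The stereotype: range smaller than an octave, lowest tone is
    scale degree 7 or 1, highest tone is 5 or 6.
    """
    lo, hi = min(melody), max(melody)
    return (hi - lo) < 12 and lo in (-1, 0) and hi in (7, 9)

# "Twinkle, twinkle" opening: degrees 1 1 5 5 6 6 5  4 4 3 3 2 2 1
twinkle = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
print(fits_stereotype(twinkle))  # True: lowest tone is 1, highest is 6
```

Encoding pitches relative to the tonic makes the check key-independent: a low leading tone is always -1 and a high submediant always 9, whatever the key of the melody.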
Just in case that is a coincidence, let’s consider another representative Western composer, J. S.
Bach, and one of his best-known works, the Well-Tempered Clavier. There are 24 fugues in Book 1, and
each has a clearly defined theme. Of these, 10 conform to the stereotype (excluding the modulating
parts): those in C, D, d, Eb, d#, F, f#, g, Ab and g# (upper case for major keys, lower case for minor).
Like the Mozart example, other themes deviate from the stereotype, but in ways that are themselves
stereotypical.
I also did a quick, non-systematic search for Huron’s stereotype in a list of pop, rock and
traditional songs that I happen to know (and presumably most English-speaking middle-aged people
know). In the following alphabetical list, the best-known (or first few) melodic phrases of the song
(regardless of whether verse or chorus) conform to Huron’s stereotype. Some of the melodies also include
non-diatonic tones, but these too lie within the stereotypical range:
All you need is love, Blowin’ in the wind, Blue moon, Bye bye blackbird, Bye bye love,
Climb every mountain, Crocodile rock, Danny boy, Don’t worry be happy, Fernando, Fire and
rain, Great pretender, I don’t know how to love him, If I had a hammer, Michael row the boat
ashore, Mister Bo Jangles, Mrs. Robinson, Sound of music (The hills are alive…), Stand by
me, Sweet baby James, The lion sleeps tonight, The rose, The times they are a-changin’, With
a little help from my friends.
When making this list, I left out many more songs than I included. But many of those left-out songs
had a melody that extended down to a lower scale degree 5 or up to an upper 1—just as in the Mozart piano
sonata examples. I did not consider jazz standards, because their melodic range tends to exceed one
octave. Note also that I have left out melodies in minor keys, for the same reason that Huron and Davis
left them out of their study: they often modulate quickly. All in all, the results of this very preliminary
and subjective investigation are indicative of the psychological and musical reality of Huron’s
stereotype.

THE ANALYSIS OF HURON AND DAVIS

Huron and Davis (2012) asked: If a given scale degree in a major melody is altered by shifting it up or
down a semitone, how will that affect the average size of intervals between successive tones in the
melody? The question is important because, as Huron previously demonstrated, sad melodies, like sad
speech, tend to have smaller intervals between successive tones (or phonemes).
To answer this question, we first need to consider which scale degrees can be shifted. If we
limit our investigation to shifts of one semitone, we must avoid scale steps that are a semitone apart:
new scale degrees are not created if scale degree 7 is shifted up, 1 is shifted down, 3 is shifted up or 4 is
shifted down. Huron and Davis also decided not to explore the effect of shifting the tonic (1) up a
semitone, which I think would have been interesting—although it would not have affected their general
conclusions.
Huron and Davis systematically explored all remaining possibilities, applying all remaining
scale-degree shifts to a large database of melodies in major keys. The result: the best way to reduce the
average size of successive intervals is to lower scale degrees 3 and/or 6 by a semitone. In other words,
the harmonic minor scale minimizes the average successive interval size in typical melodies. Is that the
reason why the harmonic minor is so popular?
Huron and Davis examined a large number of real melodies, but it is also possible to predict
their result without doing any analysis. Consider again the typical range of a major-key melody: scale
degree 1 (or 7 just below it) is often the lowest tone and scale degree 6 (or 5) is the highest. In this case,
it is clear that lowering 7 will increase the average successive interval size: 7 is the lowest tone and
mainly resolves up a semitone to 1, so lowering it widens that step. Lowering 2 will increase the size of
intervals between 2 and 3, so it is unlikely to lead to an overall reduction in interval size. Raising 4
increases the size of intervals between 4 and scale degrees 1, 2, and 3, and since transitions among these
degrees are rather common, that change is also likely to increase the average. The only
two possibilities that clearly reduce the mean successive interval size are lowering 6 and lowering 3.
Lowering 3 has this effect due to the relatively high rate of transitions among scale degrees 1, 2 and 3
by comparison to 3, 4 and 5, but the effect is rather small.
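This prediction can be illustrated with a toy computation. The sketch below is my illustration only, not Huron and Davis's actual corpus analysis; the example melody and its encoding in semitones relative to the tonic are my own choices. Lowering scale degrees 3 and 6 turns the major encoding into harmonic minor.

```python
def mean_interval(melody):
    """Mean absolute size, in semitones, of successive melodic intervals."""
    return sum(abs(b - a) for a, b in zip(melody, melody[1:])) / (len(melody) - 1)

def lower_degrees(melody, degrees=(4, 9)):
    """Lower the given pitch classes by a semitone; the defaults (major 3 = 4
    semitones above the tonic, major 6 = 9) yield the harmonic minor."""
    return [p - 1 if p % 12 in degrees else p for p in melody]

# "Twinkle, twinkle" opening in semitones relative to the tonic
major = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
minor = lower_degrees(major)

print(round(mean_interval(major), 3))  # 1.385
print(round(mean_interval(minor), 3))  # 1.231: the average interval shrinks
```

Lowering 6 shrinks the common 5-6 step from a whole tone to a semitone; lowering 3 shrinks the 2-3 steps while widening the rarer 3-4 steps, so the net effect is a smaller average, in line with the corpus result.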
The results of Huron and Davis are consistent with the following scenario. Historically,
melodies in major and minor keys developed in parallel. Both were essentially diatonic (like
transpositions of the white keys on the piano) but with different tonal centers. For the moment, we have
no explanation for the specific choice of these two tonal centers. The relative prevalence of major and
minor keys varies across styles and periods (e.g., there is relatively more minor in music by Bach, and
relatively more major in Mozart), but a systematic analysis of large corpora in different styles and
periods would presumably reveal a consistent dominance of major over minor—consistent with the
general observation that people tend to prefer happy over sad sounding music (Thompson,
Schellenberg, & Husain, 2001; Hunter & Schellenberg, 2010). Because major was more common,
minor was perceived as a variant of it, rather than the reverse: minor became “the Other” of the major-
minor system. In minor melodies, two or three scale degrees were typically lower than the equivalent
scale degrees in major, which made the melodies sound sad—just as the average fundamental
frequency of sad speech is lower than expected (Huron, 2008). The idea of the minor as “the Other” is
not new: in the words of Meyer (1956), “States of calm contentment and gentle joy are taken to be the
normal human emotional states and are hence associated with the more normative musical
progressions, i.e., the diatonic melodies of the major mode and the regular progressions of major
harmony. Anguish, misery, and other extreme states of affectivity are deviants and become associated
with the more forceful departures of chromaticism and its modal representative, i.e., the minor
mode” (p. 227).
According to this account, minor is sad compared to major simply because selected scale steps
are lower than expected. This idea is direct and parsimonious, and therefore particularly convincing.
But it is not the whole story, because the major-happy/minor-sad association appears to be confined to
MmT; the association may be absent in historical or non-Western traditions that are not based on triadic
harmony (see the historical section below).
To solve this problem, I offer an alternative hypothesis: The ultimate origin and foundation of
the sad feeling of music in minor keys is the prolonged minor triad in the background. The prolonged
minor triad sounds sad by comparison to the prolonged major triad in the background of music in major
keys because one of its tones (the third) is lower than the corresponding tone in the major triad. That, in
turn, is because major is perceived as a standard from which minor deviates, which is because music in
major keys is more common. That, again in turn, is because the major triad is more harmonic (i.e.,
more similar to the harmonic series) and in this sense more consonant than the minor triad.
There is convergent evidence for each point in this argument, and each point can be
generalized to other situations. Regarding prolongation, not only triads can act as tonic (or referential)
sonorities when they are prolonged, but also open-fifth sonorities (e.g., in Medieval polyphony) and
major-minor seventh chords (in blues and bebop jazz; cf. Salzer, 1952/1962). Huron has argued that the
principle of communicating sadness by lowering an expected pitch has considerable generality both in
music and speech. That more common percepts are perceived as standards from which less common
percepts deviate is the basis of the theory of stereotype-based category perception (Taylor, Fiske,
Etcoff, & Ruderman, 1978) and is related to the availability heuristic: more common things tend to
come more easily to one’s mind (Tversky & Kahneman, 1973). Finally, harmonicity is an important
general foundation of Western consonance (McDermott, Lehr, & Oxenham, 2010; Parncutt & Hair,
2011; Terhardt, 1974).

A SCHENKERIAN APPROACH

Regarding his Figure 9.7, Huron (2006) mentioned that


One of the most striking features is the sequence of descending arrows from 5 to 4 to 3 to 2 to
1. For Schenkerian theorists, this is strikingly reminiscent of the five-line Urlinie—although it
should be emphasized that these transitions are note-to-note, rather than the transitions
between structural tones (p. 160).
That is an interesting and promising observation, but unless I have missed something (for which I
apologize), Huron (2006) and Huron and Davis (2012) did not follow it up. Instead, they considered the
consequences of this pattern for the use of chromatically altered scale degrees.
Huron’s stereotype is consistent with the idea that any passage of music in a major or minor
key is a prolongation or embellishment of its tonic triad—or can be perceived as such. In making this
claim, I am indebted to Schenker for the idea of prolonging the tonic triad; but I have also adapted his
idea for my purpose, which is more music-psychological than music-theoretical. In his analyses,
Schenker considered different kinds of prolongation on different hierarchical levels (e.g., any triad on
any scale degree can be prolonged, either contrapuntally or harmonically; Salzer, 1952/1962), and
focused on the analysis of German masterworks. By comparison to Schenker, I apply less theory (tonic
prolongation is just one of many Schenkerian ideas) to more music (I claim that my idea applies to all
of MmT).
Forte and Gilbert (1982) give several examples of chord progressions in which one chord
(e.g., the tonic triad) is more tonally stable than another (e.g., a subdominant). The less stable chord
may be considered a prolongation of the more stable. These examples illustrate a central feature of
prolongation, both melodic and harmonic: It generally involves stepwise motion from more stable to
less stable and back.
If you add passing and neighbor tones to the tones of a tonic triad, you essentially get Huron’s
stereotype. The rarity of the 6-7 transition is easily explained by the idea that 6 is a neighbor of 5, and so
progresses naturally to 5, while 7 is a neighbor of 1, and so progresses naturally to 1. If that is the case,
Huron’s stereotype is evidence that the tonic triad exists somehow in the background throughout a tonal
melody – just as the Krumhansl-Kessler (K-K) profiles are understood to exist in the psychological
background of any passage of MmT, which would explain the high correlation between the K-K profiles
and prevalence profiles of scale steps (Krumhansl, 1990). Different terms can be used to
describe this background: a
physicist might call it a frame of reference, and a psychologist might call it a schema, gestalt, or
cognitive representation.
Schenker introduced the idea of Ursatz (fundamental structure) to describe this psychological
background structure. His idea was later taken up in different ways by (mainly American) music
theorists and provoked a quantum leap forward in music-theoretic thinking. But the idea is also
famously problematic:

1. For most listeners, the Ursatz has no psychological reality across long time-spans such as an
entire movement of a classical or romantic symphony that lasts for several minutes (Cook,
1987). It may be possible for listeners with good absolute pitch or music theorists with a good
understanding of the score to conceptualize the tonic triad or Ursatz throughout a piece and
hear everything relative to it (which would be an example of structural hearing or Fernhören;
Salzer, 1952/1962), but that is an unusual form of music perception. Even if we ignore this
psychological problem, there is a music-analytical problem associated with longer timespans:
the Ursatz may work well for achieving theoretical understanding of shorter passages (e.g., the
Brahms/Haydn Variations theme) but the situation is less clear (more ambiguous) for longer or
more complex pieces, because the details of prolongation may be implemented differently at
larger levels than at smaller ones (Graham Hair, personal communication).

2. The structural details of the Ursatz are too specific. It is not the only possible background
structure for a piece in MmT. A stepwise soprano descent (the fundamental line or Urlinie) and
rising and falling fifth intervals in the bass (bass arpeggiation or Bassbrechung) are not the
only forms of prolongation of a tonic triad. Even if we focus on German masterworks, as
Schenker did, forcing pieces into the Ursatz mold is not necessarily the best way to achieve
music-theoretical insights—let alone explain perception.

To solve this problem, we need a representation of the musical background that is more general
and more fuzzy. The representation should have the following basic properties. First, it should
encapsulate the principle of moving from consonance through dissonance and back to consonance,
which I am taking to be axiomatic for most Western music. Second, it should be consistent with the fact
that almost all MmT-music starts and ends with a major or minor triad (either real or implied), and in
most cases the two triads are the same. Third, it should be consistent with the observation that much
music in major or minor keys can be regarded as an elaboration (prolongation) of key-defining
progressions, or cadences in the sense of Caplin (1998), such as I-II-V-I, I-III-V-I, I-IV-V-I, and
I-VI-V-I (Salzer, 1952/1962).
A solution to this problem was offered by Salzer (1952/1962) in his example 481 (p. 263 in the
Dover edition). This diagram reduces the structure of a whole piece to its tonic triad, which Salzer
labels “chord (tonality-indicating),” and beyond that to the root of that triad, which Salzer labels
“tone.” Starting from the “chord,” he separates “primordial harmonic prolongation” from “primordial
contrapuntal prolongation”; both lead to “structure = tonality-determining harmonic and melodic
framework.”
It is interesting that here and elsewhere Salzer avoids any reference to Schenker’s Ursatz. Forte
(1959, p. 9) similarly claimed that “Schenker’s major concept is not that of the Ursatz, as it is
sometimes maintained, but that of structural levels, a far more inclusive idea” (cited in Wikipedia
“Schenkerian analysis”). Boenke (2005) explained that when theorists such as Salzer applied
Schenker’s theory to earlier and later music, central aspects such as the Ursatz had to be weakened or
abandoned:
Attempts to extend the scope of Schenker’s theory through modifications remained a defining, though
controversially discussed, motif in the engagement with Schenker. To the extent that parts of his theory
(for example, the concept of hierarchically related levels, or the notion of the “composing-out”
(Auskomponierung) of sonorities) were applied to works outside the time span Schenker considered,
other central aspects, above all the theory of the “Ursatz,” had to be weakened or even abandoned
altogether. The wider the time window was opened, the more strongly individual ideas of Schenker’s
could prove their general validity across epochs. The flip side, however, was that the theory was
hollowed out at its foundations.

Returning to Salzer’s diagram: in his accompanying text, Salzer explains that it


represents an attempt to demonstrate graphically the “distance” and at the same time the inner
connection between the most remote, quasi-abstract, musical factors (such as a tone and its resulting
chord) and the finished product of composition … just as a prolongation of lower order, to be
understood, must always be referred to the one of next higher order (which is its structure), so also
can the structural framework be referred further back to the tonality-indicating fundamental chord of
which it logically is a harmonic or contrapuntal prolongation (p. 231).

In this passage, Salzer reinforces and reformulates Schenker’s idea that a passage of music can be reduced
to its tonic triad, and hence regarded as a prolongation of its tonic triad—consistent with the idea that, from
a harmonic viewpoint, the Ursatz is a prolongation of the tonic triad (whereas from a melodic viewpoint it
is like the gradual fall in pitch at the end of a phrase of speech or music; Huron & Davis, 2012). Similarly, I
argued in Parncutt (2011a) that the K-K profiles are merely a cognitive representation of the prolonged
tonic triad. The evidence for this statement is both qualitative and quantitative. The qualitative evidence can
be found in Schenkerian theory, which explains the process of prolongation and the hierarchy of structural
relationships that exist between a tonic triad and the details of the musical surface. The quantitative
evidence is the correlation between the K-K profiles and the pitch-salience profiles of major and minor
triads according to Parncutt (1988)—a simple algorithm that was inspired by two other algorithms, those of
Terhardt (1982) and Terhardt, Stoll, & Seewann (1982), and whose degree of complexity lies midway between
them. The correlation is equally strong when a more complex algorithm is used, such as those defined in
Terhardt et al. (1982), Parncutt (1989), or Parncutt (1993).
The quantitative evidence is based on the idea of a chord’s pitch-salience profile—an experiential
representation of the chord, in which pitch corresponds not to pitch as notated in a musical score or to
frequency as physically measured, but to the experience of a tone with a given pitch. The strength or
salience of this experience depends in general on the number of audible partials corresponding to
harmonics of that pitch, and their perceptual salience. When this simple idea, which is consistent with many
psychoacoustic studies of pitch perception, is applied systematically to a C-major triad, we can predict
firstly that the pitch class C is on average more salient than E or G, because E and G, and their harmonic
overtones, are more often harmonics of C than vice-versa. We may then predict that a C-major triad has
missing fundamentals at pitch classes D, F and A; their salience is lower than that of the tones of the tonic
triad, and on average A is the most salient and D is the least salient of the three. Similar arguments apply for
the minor triad, which can be treated in exactly the same way (and not differently, as in the 19th-century
theory of harmonic dualism; see Ortmann, 1924).
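These qualitative predictions can be mimicked with a deliberately crude score. This is a sketch under stated assumptions: the interval set and the integer weights below are illustrative round numbers in the spirit of Parncutt (1988), not the published algorithm, which operates on audible partials rather than bare pitch classes.

```python
# Illustrative "root-support" intervals, in semitones: a chord tone lying at one
# of these intervals above a candidate pitch class counts as evidence for that
# pitch class as a (possibly missing) fundamental. The weights are assumed
# round numbers, not Parncutt's (1988) published values.
SUPPORT = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}  # P1, P5, M3, m7, M2

def salience_profile(chord):
    """Score every pitch class as a candidate fundamental of `chord`
    (a set of pitch classes, 0 = C)."""
    return {pc: sum(SUPPORT.get((tone - pc) % 12, 0) for tone in chord)
            for pc in range(12)}

profile = salience_profile({0, 4, 7})  # C-major triad: C, E, G
print(profile[0], profile[4], profile[7])  # C outscores E and G: 18 10 10
print(profile[2], profile[5], profile[9])  # missing fundamentals D, F, A: 3 6 7
```

Among the non-chord tones, A scores highest and D lowest, matching the ordering described above; a minor triad such as {0, 3, 7} can be passed to the same function unchanged, in keeping with the rejection of harmonic dualism.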
This link between the Ursatz, the key profiles, and pitch-salience profiles of tonic triads, if valid,
has the potential to become the foundation for a new general understanding of MmT. Consistent with
Adler’s (1885) concept of systematic musicology (Parncutt, 2007), I am thinking of a new paradigm that
brings together and synergizes ideas from the humanities and sciences—ideas that originally emerged
independently in strikingly different intellectual traditions and contexts. Salzer (1952/1962) anticipated this
development when he wrote
I firmly believe that there is a need for a theory of music and composition which never loses
contact in all its branches and disciplines with what seems to me to be its principal goal and
justification: leading the ear and mind to understand all details as organic offshoots of the whole,
which means the perception of total musical organization. (p. 283; italics RP)
A music psychologist may disagree with the extent to which “total musical organization” can be perceived,
or can exist in a listener’s imagination—but at the same time welcome the opportunity to collaborate with
music theorists to achieve deeper insights into these issues.

THE PSYCHOLOGY OF PROLONGATION

The key words in Schenker’s approach are prolongation and (compositional) unfolding (Ausfaltung,
Auswicklung, Auskomponierung). Forte and Gilbert (1982) explained:
Prolongation refers to the ways in which a musical component—a note (melodic prolongation)
or a chord (harmonic prolongation)—remains in effect without being literally represented at
every moment. Of the two main categories of prolongation, melodic and harmonic, the latter is
easier to grasp. Essentially, a given harmony is prolonged so long as we feel it to be in control
over a particular passage. (p. 142)

What exactly does it mean for a note or chord to “control” a passage, or to “remain in effect”? In an
empirical study, Deutsch (1972) showed that short-term memory for the pitch of a tone is affected most
by a following distractor tone whose frequency is about 1/3 of a whole tone higher or lower; the effect
disappears at an interval of about one whole tone. This is evidence that a pitch, and stepwise departures
from that pitch, can “remain in effect” for at least several seconds. The limit of about one whole-tone is
broadly consistent with the gestalt principle of (pitch) proximity, van Noorden’s (1975) theory of temporal
coherence, and Bregman’s (1990) theory of auditory scene analysis. Psychological experiments have
also provided evidence for the existence of a prolonged tonic triad in the background of MmT-music.
Several studies reported by Krumhansl (1990) are consistent with that assumption. In a priming
paradigm, for example, Bigand, Tillmann, Poulin-Charronnat, and Manderlier (2005) found that
response times in consonant/dissonant judgments are shorter for tonic than for nontonic targets.
Larson (1997) introduced psychological ways of thinking into the music-theoretic discourse
surrounding this issue:

To auralize means to hear internally sounds that are not physically present. A trace is the
internal representation of a note that is still melodically active (p. 104).
In a melodic step, the second note tends to displace the trace of the first, leaving one trace in
musical memory; in a melodic leap, the second note tends to support the trace of the first,
leaving two traces in musical memory (p. 105).
“Traces” can exist, and “displacement” can happen, in either the background or the foreground (using
these terms relatively—not in the strict sense of Schenker and his followers). Melodic tones that are
adjacent to tonic triad tones (neighboring or passing tones) can prolong the triad by displacing them in
the foreground but allowing them to continue as psychological references in the background. When
Larson says that “a second note tends to displace the trace of the first,” he is referring to the musical
surface or foreground (Deutsch’s experiment was also confined to the musical foreground). If the first
of a pair of melodic tones is part of a prolonged sonority, that sonority may continue to exist in the
background, but disappear (be “displaced”) in the foreground.
From a psychological viewpoint, the musical foreground is enabled by a shorter-term memory
with a duration or half-life of perhaps one second (cf. Huron & Parncutt, 1993); the background, by a
longer-term memory with a duration of roughly one minute (cf. Cook, 1987). The shorter-term memory
may be considered either passive echoic memory or a working memory buffer in which active,
dynamic cognitive processing occurs—similar to the visuo-spatial sketch pad and phonological loop of
Baddeley and Hitch (1974), but specialized for (musical) pitch (cf. Deutsch, 1970, 1975). In such
cognitive theorizing, there is a danger of reifying memory as purpose-built storage, but it can also be
considered a byproduct of information processing, and its effective duration as a byproduct of the kind
of stimulus or processing: “a proceduralist orientation deals with this variety as reflecting the variety of
kinds of auditory information processing, with memory as a side effect of this processing in all
cases” (Crowder, 1993, p. 140). The memory functions associated with the Schenkerian foreground and
background may correspond in general ways to other kinds of auditory memory (e.g., for speech) but
also have their own unique properties.

Larson (1997) continued:


Thus, prolongation—and only prolongation—always determines which notes are heard as
stable in a given context. … To hear a note as unstable also means to hear it as embellishing a
more stable pitch—that is, to hear it as embellishing a pitch at a more remote level of pitch
structure (p. 112).
I have argued that prolongation is embellishment; embellishment (and only embellishment)
determines the relationships between tones that make some tones of lesser and greater
structural weight than others (p. 130).
The word “prolongation” emphasizes the temporal aspect (the way a sonority can continue in the
background as a psychological reference although it is not physically sounding), whereas the word
“embellishment” emphasizes movement (usually stepwise) in pitch-time space. Since embellishment
generally has the effect of prolonging a sound or pattern, the terms are closely related.

The melodic embellishment of a tonic triad can be divided into three processes:

1. Arpeggiation: When the tones of a triad are presented successively, we still recognize the triad.
Simultaneous and successive versions have the same tonal meaning.
2. Passing notes: Within a 1-3-5 triad, we can pass through 2 on the way from 1 to 3, or from 3 to 1.
Similarly, we can pass through 4 on the way from 3 to 5 or from 5 to 3.
3. Neighbor notes: 2 can also be considered a neighbor of 1 or 3, and 4 a neighbor of 3 or 5.
But the 7 just below 1 is also a neighbor, as is the 6 just above 5.

Following this logic, any melody based on the scale steps 7, 1, 2, 3, 4, 5, and 6 can be considered a
prolongation of the triad 1 3 5, provided the listener is somehow imagining this triad as a background
or goal of melodic motion, and typical embellishment figures connect the background with the
foreground. The concept of triadic prolongation can explain diverse common melodic progressions and
hence the basic structure of MmT.
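The three processes suggest a minimal mechanical check, sketched below. This is my own illustration; the labels and the restriction to diatonic pitch classes are simplifying assumptions. It relates every tone of a stereotypical melody to the background triad as either a triad tone or an adjacent embellishment.

```python
TRIAD = {0, 4, 7}  # scale degrees 1, 3, 5 as semitones above the tonic
EMBELLISHMENTS = {
    11: "7, lower neighbor of 1",
    2: "2, passing/neighbor between 1 and 3",
    5: "4, passing/neighbor between 3 and 5",
    9: "6, upper neighbor of 5",
}

def classify(melody):
    """Relate each tone (semitones relative to the tonic) to the tonic triad."""
    labels = []
    for p in melody:
        pc = p % 12
        if pc in TRIAD:
            labels.append("triad tone")
        else:
            labels.append(EMBELLISHMENTS.get(pc, "chromatic"))
    return labels

# "Frère Jacques" opening: degrees 1 2 3 1
print(classify([0, 2, 4, 0]))
```

Any melody confined to Huron's stereotypical range receives only "triad tone" and embellishment labels, which is one way of operationalizing the claim that such melodies prolong the tonic triad.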

Schenker first hinted at these ideas in his harmony text (1906) as explained in the introduction
to the English translation by Jonas (1954, p. ix):
According to the theory of prolongation, free composition, too, is subject to the laws of strict
composition, albeit in “prolonged form.” The theory of Auskomponierung shows voice-leading as the
means by which the chord, as a harmonic concept, is made to unfold and extend
in time. This, indeed, is the essence of music. Auskomponierung thus insures the unity and
continuity of the musical work of art.
Schenker later expressed his idea more clearly (1922, p. 4; cited in Wikipedia under “Fundamental
structure”):
The fundamental line presents the unfolding (Auswicklung) of a basic sonority, expressing
tonality in the horizontal plane. The tonal system too, joins in expression of tonality. Its task is
to bring a purposeful organization into the world of chords by selecting the scale degrees from
among them. The liaison between the horizontal version of tonality through the fundamental
line and the vertical through the scale degrees is voice leading.
The idea of MmT as a prolongation of the tonic triad can also be found in Schenker (1935).
Schenker used these ideas to analyze the “great music” of Bach, Beethoven and Brahms. But
the last quote, taken out of context, says nothing about “great music,” nor does it mention the expert
subjectivity that is usually considered to be the foundation of Schenkerian analysis. It is tempting to
consider the more general and potentially objective, psychological significance of this quote. According
to Salzer (1952/1962), “tonality is the expression of tonal unity and coherence based on the principle of
structure and prolongation” (pp. 226-227), and “Tonality may thus be defined as prolonged motion
within the framework of a single key-determining progression, constituting the ultimate structural
framework of the whole piece” (p. 227). This quote, and the work of later theorists such as Larson
(1997) and Väisälä (2002), suggests that prolongation is not confined to the “high art” of “common-
practice” composers such as Bach, Beethoven and Brahms. The idea of “unfolding of a basic sonority,
expressing tonality in the horizontal plane” may also be applied to related (“pretonal,” “posttonal”)
music. Salzer (1952/1962) offered many examples of prolongation in Medieval and Renaissance music
(e.g., Leonin, Perotin, Machaut, Dunstable, Dufay, Josquin) as well as the prolongation of polychords
by Copland and Stravinsky.
I would like to consider an even more radical generalization of Schenkerian thought. Triadic
prolongation may represent the foundation of all MmT, from the most trivial to the most profound
(whatever such value judgments mean, exactly). By MmT I mean harmonic tonality in the sense of
Dahlhaus (1967) and not the more general case of music in the Ionian or Aeolian mode, which may or
may not imply a background triad, depending on its historic or cultural origin or who is listening to it.
Given this definition, any passage of music in a major or minor key can be regarded as a prolongation
of its tonic triad—even if there is no harmonic accompaniment whatsoever. As we have seen, Huron’s
data on transitions between scale steps in major-key melodies are consistent with that idea. We can
always perceive major or minor music in this way (structural hearing), and at some level it seems that
we usually do, regardless of our musical expertise.
At this point, I would like to invite the reader to test these ideas by returning to the above lists
of pieces and songs. It is one thing to imagine the melody and confirm that it is confined to the range
from a low scale degree 7 to a high 6, but it is another thing to imagine the melody as a prolongation of
the tonic triad. For me and (presumably) any Schenkerian theorist, the feeling of tonic prolongation is
obvious. But for others it may not be so obvious, so they may be right to question this kind of
explanation.

INCORPORATING SCHENKER INTO MUSIC PSYCHOLOGY

According to Straus (1987, cited by Larson, 1997),


Prolongation is an idea of extraordinary power. It has afforded remarkable insights into
common-practice music, enabling us to hear through the musical surface to the remoter
structural levels and ultimately to the tonic triad itself (p. 1).
From the point of view of music theory, it is remarkable that cognitive music psychologists have been
so reluctant to accept and build on this idea. Narmour (1977) criticized Schenkerian dogma and offered
his implication-realization model as an alternative. Huron (2006) expressed some widely shared
reservations: “The rewrite rules used for reductions are not fully systematic and so there is considerable
latitude for interpretation. No controlled studies have been carried out to determine whether analyses
are unduly influenced by confirmation bias” (p. 97). Huron goes on to suggest that the Urlinie may be
based on a more fundamental phenomenon, namely the gradual fall in pitch at the end of a phrase of
speech or music.
For scientific purposes, Schenker’s concept of Ursatz is too exact and detailed—and hence
arbitrary. Empirical psychologists know that we cannot empirically determine background pitch
structures with this degree of precision. Schenker laid himself open to criticism by presenting such a
sharply defined structure as the ultimate background of a range of tonal musical styles and works. If
background pitch structures exist in the awareness or imagination of listeners, those structures must be
more fuzzy than Schenker’s Ursatz.
This applies even to the most sophisticated listeners, and even to the scenario of “great”
composers listening to their own music. Of course it is possible to train oneself to hear in ways
specified by Schenker, and in that way to achieve new insights into a piece, but that is a different
question. Moreover, the Ursatz is only one possible way to prolong the tonic triad. Most or all of the melodies
listed above may also be considered prolongations of the tonic triad. In a radical interpretation, the
middleground reduction of any melody (plus bass line) may be considered an alternative to the Ursatz.
Given these arguments, it may be appropriate for music psychologists to accept the
psychological reality of prolongation, but to reject the specificity of the Ursatz (perhaps pending a
more realistically approximate formulation). A compromise solution of this kind may help music
psychology and music theory to grow closer together, and achieve productive synergetic interactions.
The goal of Schenker’s theory was appropriate in its historical and cultural context. He wanted
to understand why great pieces of Western music were so great, and in that way to understand the
genius of great composers. Today, we understand greatness and genius in more relative terms (Cook,
1998). We take a broader view of what is “good music” and indeed what is “music.” This applies in
particular to music psychology. We are aiming for a general understanding of the psychological
foundations of music. We want to understand MmT in a way that may be applied to any Western style
or repertoire. Great 19th-century symphonies are not intrinsically more or less important than other
common-practice, traditional, popular, sacred or secular music. MmT includes such diverse styles,
genres and other classifications as bel canto opera, disco, jazz standards, bebop, blues, country, sacred
harp, gospel, folk, lullabies, Christmas carols, easy listening, muzak, new-age relaxation music, chill-
out, electronic dance, calypso, hip-hop, funk, metal, techno, gothic rock, indie, post-grunge, Afrobeat,
Brazilian funk, salsa, reggae, flamenco, acid rock, Arabic pop, and Celtic punk.
The point of this long list, which could be extended almost indefinitely, is that musical styles
that appear to be based on the principle of prolonging a major or minor triad (or perhaps another
relatively consonant sonority such as an open fifth or, in bebop, a major-minor seventh chord) are
continuing to dominate in today’s globalized and technologized musical world—even if many theorists
would not consider them to be examples of MmT. Even if a musical style is not based on major or
minor scales, it may still be based on the idea of harmonic prolongation; examples include Flamenco
based on harmonic progressions in the Phrygian mode, Arabic pop based on harmonic progressions in
recognizably Arabic scales including quarter tones, and Indian classical music in which the melody
may be considered a prolongation of a background tonic-fifth drone. This raises the interesting question
of whether chords or harmonic progressions are a necessary ingredient of MmT. The theory I
developed in Parncutt (2011a) suggests that they are, but a thorough study of non-Western tonal
systems may yield a different conclusion.
Many Western musical styles, both today and in the past, cannot be conceived of as a
prolongation of a major or minor triad. But they represent a small minority. Western music has been
dominated by triadic prolongation since about the 14th century (Salzer, 1952/1962), and the situation is
unlikely to change in the foreseeable future. Conversely, a lot of non-Western music involves
prolongation of sonorities such as perfect fifths and major or minor triads (but may not involve or
imply chord progressions), suggesting that the arguments in this paper would apply to it (Sarha Moore,
personal communication). Due to my limited expertise in ethnomusicology, I prefer to avoid this
question and focus on Western music as Schenker and Salzer did.

HISTORICAL ORIGIN OF THE MAJOR-HAPPY/MINOR-SAD ASSOCIATION

I have gone to considerable lengths to explain how prolongation can help us understand the structure of
MmT. The basic idea is that the tonic triad exists constantly as a psychological reference in the
background, and may in that sense be regarded as the ultimate foundation of MmT. I will now argue
that prolongation can also help us to understand the emotional connotations of MmT—in particular, the
link between major keys and positive emotions, and between minor keys and negative emotions.
Huron and Davis mentioned that “the association of the minor third and the minor triad with
sadness was already described in the sixteenth century by Zarlino (1558).” The major-happy/minor-sad
association was increasingly accepted in the 17th century. For example, Lippius (1612) agreed that the
Ionian, Lydian, and Mixolydian modes were essentially happy, while Dorian, Phrygian, and Aeolian
were weak, sad, and serious; similar ideas were expressed by Cruger (1630). Werkmeister (1687, pp.
124-125; cited in Lester, 1977) considered the major triad to be “more joyful and perfect than anything
else.”
But there was also considerable argument and disagreement about the emotional connotations
of major and minor keys (and their modal relatives) in the 16th and 17th centuries. Gumpelzhaimer
(1591) presented old-fashioned ideas about the character of church modes: he considered Dorian to be
cheerful, Hypodorian sad, Phrygian severe, Hypophrygian enticing, Lydian harsh, Hypolydian gentle,
Mixolydian impatient, Hypomixolydian placable, Aeolian pleasant, Hypoaeolian sorrowful, Ionian
delightful, and Hypoionian tearful (Landner, 1997). Inspecting this list, we can find no clear link
between pre-major (Lydian, Mixolydian, Ionian) and positive emotions, or pre-minor (Dorian,
Phrygian, Aeolian) and negative emotions. In Wikipedia under “Mode: Western Church,” I found a
table that compares interpretations of the “character” of church modes by three historic theorists: Guido
of Arezzo (995–1050), Adam of Fulda (1445–1505), and Juan de Espinosa Medrano (1632–1688)—
again with little agreement. Along similar lines, Judd (2002) presented a list of affects associated with
the eight modes according to Vanneus (1533).
The uncertainty continued into the 18th century. Mattheson (1713, p. 232, cited in Lester,
1977, footnote 66) wrote:
Those who are of the opinion that the entire secret resides in the minor or major third and
would prove that all minor keys, speaking generically, are necessarily sad, and on the contrary,
that all major keys commonly foster a lusty character—it is not so much that they are wrong,
but they have not yet gone far enough in their investigations. Those who are of the opinion that
if a piece has a signature with flats it must necessarily sound soft and tender; if, however, it is
set with one or more sharps, then its nature must be hard, fresh and gay – they have even less
going for them.
Incidentally, the idea that sharp and flat keys have different character had a big influence on the history
of music and music theory, but it does not withstand modern psychological scrutiny (Powell & Dibben,
2005).
The emotional connotations of church modes in the Renaissance were not simply about
positive and negative emotions. According to Meier (2009), “the authentic modes were regarded as
‘joyful to moderate’, and the plagal modes as ‘moderate to mournful’” (p. 182). Given that the lowest
tone in an authentic mode is near the final and the lowest tone in a plagal mode is about a fourth below
the final, a possible explanation is that music in authentic modes has a higher average pitch, or a higher
average pitch compared to the final. If so, Huron’s (2008) idea of the relationship between emotion and
pitch relative to average or expected pitch could explain the effect. Meier (2009) also reminded us that
any such theory should be taken with a grain of salt: Tinctoris (1475) thought that a competent
composer could render any of the modes joyful or mournful, and this relativist approach was shared by
the 16th-century Swiss music theorist Glarean and others. Moreover, “the major-minor duality of the
Ionian and Aeolian modes, or of any other modes, plays no part in any of Glarean’s thinking—not in
the generation of the modes, their ordering, differentiation, relationships, or affects” (Lester, 1977, p.
212).
The variation and uncertainty of historical interpretations of the emotional qualities of modes
before the 17th century suggest that it was easier to override the conventional emotional qualities of
modes than it was later to contradict the stereotypical emotional qualities of major and minor keys:
… it is not always the affection of the text that determines the choice of mode; … the choice of
mode does not always depend on the composer’s choice; and … the affective character, peculiar
to a mode ‘by its nature,’ may be altered by various compositional procedures (Meier, 2009, p.
184)
… each of the modes may be rendered ‘joyful’ (or alternatively ‘hard’) if the composer
introduces movimenti veloci and uses many major thirds, sixths or tenths over the bass;
conversely, in each mode the music will become ‘mournful’ or ‘languid’ … if the composer
makes use of slow rhythms and introduces many minor thirds, sixths or tenths over the bass
(Meier, 2009, p. 186)

This kind of comparison between pre-MmT and MmT suggests that there is more to the major-minor
emotional distinction than mere arbitrary associations. Of course it is possible to make music in a major
key seem sad by choosing a slow tempo, or music in a minor key seem happy by choosing a fast
tempo; the emotional effect can also be changed by other parameters such as pitch range, articulation
and rhythmic pattern (Tagg & Clarida, 2003, pp. 310-317). But since the 17th century there has been
remarkable agreement about the idea that major keys are associated with positive emotional valence
and minor with negative. If we model emotion as a combination of different contributions that include
major versus minor tonality and other features investigated by Huron and colleagues (tempo, average
pitch, dynamic level, timbre, articulation; cf. Gabrielsson & Lindström, 2010), few music psychologists
today would question the psychological reality of the major-happy versus minor-sad association, given
the strength and diversity of the empirical evidence (Costa, Fine, & Ricci Bitti, 2004; Gabrielsson,
2009; Gabrielsson & Juslin, 2003; Gagnon & Peretz, 2003; Juslin & Laukka, 2004). That is true in
spite of some contradictions in the literature. For example, Kastner and Crowder (1990), whose
experimental participants were aged 3-12 years, observed that “all children, even the youngest, showed
a reliable positive-major/negative-minor connotation, thus confirming the conventional
stereotype” (abstract); but Gabrielsson and Lindström (2010) found that children do not recognize the
emotional connotations of major and minor until 6-8 years of age.
The above historical survey suggests that the psychological association between major/minor
modes and emotion emerged in the Renaissance. But that is also the period during which the system of
major and minor keys emerged—depending on how you define it (Dahlhaus, 1967). Salzer (1952/1962)
offered a broader definition of MmT and a correspondingly different date of origin:
since chord prolongation, contrapuntal or harmonic, is the force which creates tonal
coherence, the history of tonality begins not with the detection and establishment of harmonic
relationships and harmonic chord progressions, but with the first use of contrapuntal chord
prolongations in the twelfth century (p. 26).
These differences in definition mean that the margin of error within which “tonality” began is
enormous: three or even five centuries. It follows that we cannot separate the question of the origin of
MmT from the question of the origin of its emotional connotations. Sometime during those centuries,
the structural and syntactic aspects of MmT gradually became consolidated, both in musical practice
and in the minds of musical practitioners and listeners. In the absence of clear evidence to the contrary,
we may assume that the consolidation of the emotional connotations of major and minor happened
gradually and in parallel.
In Parncutt (2011a), I attempted to explain the evolution of musical structure in a way that
combined psychological and historical arguments. The structures that later became known as major and
minor triads started to appear sporadically in Western polyphony almost from its beginnings in Notre
Dame in the 12th century, e.g., in the works of Perotin (Flotzinger, 2007), in both prepared and
unprepared forms (Parncutt, Kaiser, & Sapp, 2011). They were already surprisingly prevalent in the
14th century in the polyphony of Machaut. In the 15th and 16th centuries, major and minor triads
became even more common by comparison to other possible pitch-class sets of cardinality three; in
typical scores by Palestrina and Lassus, almost every sonority is a 5/3, or in modern terminology a
major or minor triad in root position; 6/3 chords are remarkably unusual. This is all the more surprising
when we consider that Renaissance composers had no clear concept of triad, root, or inversion: they
apparently did not consider the first inversion of a triad to be related to its root position, nor did they
use this terminology. Instead, they conceived of sonorities in terms of intervals above the bass (Fuller,
1986). They may simply have regarded the 5th above the bass as more consonant than the 6th, which
privileged 5/3 chords over 6/3s (Väisälä, personal communication). In the history of music theory, clear
concepts of root and inversion first emerged in the early 17th century, for example in Lippius (1612)—
over a century before Rameau (1722).
Stereotypical structures of MmT such as subdominant-dominant-tonic cadences gradually
emerged during the 14th–16th centuries (cf. Eberlein, 1994). That is the same period during which
major and minor triads increasingly dominated harmonic progressions by comparison to other possible
sonorities. In Parncutt (2011a), I assumed that during this period music was increasingly perceived
relative to tonic triads. In Schenkerian terms, we might say that Renaissance polyphony increasingly
included passages that can be considered as prolongations of (tonic and other) sonorities (cf. Leech-
Wilkinson, 1984; Stern, 1981). It became increasingly feasible to regard entire pieces as prolongations
of triads.

The historical literature suggests that the major-happy/minor-sad association emerged during a
long period, beginning in about the 14th century and ending in the 17th. The more pieces of music were
structured around their opening and closing triads, the more these triads determined the music’s
emotional connotations. The question of why major and minor keys are perceived to be happy or sad thus
reduces to the question of why major or minor triads may be perceived to be happy or sad.

SO WHY IS MINOR SAD?

Isolated major and minor triads tend to be perceived as happy and sad respectively relative to each
other, but the data are noisy and affected by pitch height, loudness and timbre: higher, louder and/or
brighter chords sound happier (Crowder, 1984; Heinlein, 1928; both cited in Gabrielsson & Lindström,
2010). Moreover, a major triad in the context of a minor key (e.g., a flat submediant bVI) may sound
sad because the whole minor context sounds sad, and a minor triad in the context of a major key (e.g., a
mediant III) may sound happy due to the major key context, suggesting that the effect of tonal context
may be emotionally stronger than the effect of the harmony in isolation (I am not aware of an empirical
test of this intuition). Why does a piece of music whose background triad is major tend to sound
happier than a piece whose background triad is minor? I offer the following explanations.
First, the major triad is on average more consonant than the minor triad. This claim is
consistent with both history and psychological theory. In retrospect, we may regard the 13th and 14th
centuries as periods of experimentation with vertical pitch-class sets of cardinality three, from which
major and minor triads emerged as more consonant than other possible combinations of three pitch
classes (Parncutt et al., 2011). This result can be explained if we assume that consonance/dissonance
(C/D) has three main components: roughness, harmonicity, and familiarity (McLachlan, Marco, Light,
& Wilson, 2013; Parncutt & Hair, 2011). People are not immediately sensitive to differences in
roughness and harmonicity, which can only be clearly perceived after many repetitions over a long
period (the familiarity effect). On the basis of their structure, major and minor triads are clearly more
consonant than any other triads due to their low roughness (no seconds) and high harmonicity (perfect
fifth) (Parncutt, 1988; cf. Johnson-Laird, Kang, & Leong, 2012). That explains why they have been the
most common sonorities in Western music since the 16th century; even in the 14th and 15th centuries,
they were the most common sonorities of three pitch classes (Parncutt, Kaiser, & Sapp, 2012). The
same theory explains why major is more consonant than minor: major triads have higher harmonicity
(i.e., they are more similar to the harmonic series as it exists among the audible partials of everyday
harmonic complex tones such as voiced speech sounds). There is probably no consistent or significant
difference between major and minor triads in terms of roughness (in both cases roughness varies
considerably across different voicings of the same chord) or familiarity (both are extremely familiar)—
although one could argue that major triads are somehow more expected, since they are more common.
Second, there may be a general, global tendency for music to be associated with positive
emotions. Music accompanies rituals of all kinds and marks important social events including birth,
initiation, marriage, death, yearly festivals, preparation for battle, victory, communication with gods
and spirits, and (increasingly in Western society) entertainment and everyday social interaction. Most
such events and functions carry positive connotations, so it is natural to associate music generally with
positive emotions. It is “normal” for music to be happy. Of course most (presumably all) cultures also
use music for sad occasions, but this happens less often. While many cultures are especially proud of
their sad traditional music (e.g., Russia, Finland, Portugal, Turkey), my guess is that—even in such
cases—if one were to document the music that most people listen to in everyday life, one would find
that happy music predominates.
It follows that since major triads are more common than minor in common-practice Western
music (Eberlein, 1994; Parncutt et al., 2011), and music in major keys is on average more common than
music in minor keys, music in major keys may be perceived as “normal” while music in minor keys is
somehow exceptional. Variations on this idea can often be found in music theory treatises; for example,
Werkmeister (1687, pp. 124-125; cited in Lester, 1977) considered the major mode to be “natural” or
“perfect” and the minor to be “less natural” or “less perfect.” But Werkmeister’s reasoning was based
on number ratios rather than frequency of occurrence.
This last point explains why major is happy, but we need a further argument to explain why
minor is sad. The most parsimonious explanation is that in minor passages, scale degrees 3 and 6 sound
lower than expected, relative to equivalent major passages—consistent with emotional cues in speech
(Huron, 2008). That does not mean that people necessarily have a cognitive representation of the
equivalent major passage every time they hear a minor melody; but I am assuming that this is
sometimes the case, or has sometimes been the case in the past few hundred years.

COMPARING THEORIES

Huron and Davis presented findings based on mean melodic interval size that can explain why music in
minor keys is perceived as sad relative to equivalent music in major keys, without reference to any
other theory. I have presented an alternative theory, for which the same can be said (although there is
considerable overlap between the two). Which theory is correct?
More generally: How do we evaluate different theories that predict the same result? Is it
possible that two or more theories are simultaneously correct? In a purely quantitative approach, one
might solve this problem by first establishing a “ground truth” of representative datapoints and then
checking which theory best accounts for the entire dataset. But things are seldom that simple.
Interesting theories can generally be applied to a range of problems or datasets, so one has to evaluate
and compare the performance of each theory across these situations. That inevitably introduces
subjective and qualitative elements into the evaluation. A possible approach is to consider the best
theory to be the one that makes the best predictions (precision), in the widest variety of relevant
situations (generalization), and based on the fewest assumptions or the least complex model
(parsimony).
Consider for example theories of the origin of music. Several competing theories seem
capable of independently explaining music’s origin. I recently attempted to solve this problem by first
defining music as a long list of attributes and comparing how each theory predicts each attribute
(Parncutt, 2011b). In the present case, that approach is hardly applicable, because there is no equivalent
problem of definition surrounding terms like major, minor, happy and sad.
The analyses of Huron and Davis (2012) can contribute to an explanation of why the harmonic
minor scale is good at expressing sadness, but that is only part of the story. We must also separate
effects due to individual scale steps from effects due to scale-step transitions. The reduction in average
interval size that is achieved by using the harmonic minor scale is quite small (the effect size is small),
suggesting that the emotional difference between major and minor has more to do with individual scale
steps than interval size. I have argued above that the emotional effect of individual scale steps is
anchored to the tonic triad, whose function can be understood in two equivalent ways: Schenker’s
background and Krumhansl’s key profiles.
Meyer (1956) explained the affective power of the minor mode by noting that it is “both more
ambiguous and less stable than the major mode” (p. 226): Scale degrees 6 and 7 come in two chromatic
variants. Similarly, the root of the minor triad is more ambiguous (Parncutt, 1988; Terhardt, 1982).
Independent of music, sadness is associated with uncertainty. Different negative moods have different
functions, suggesting different evolutionary origins (Keller & Nesse, 2005): grief (exhibited as sadness
and crying) has a social function of strengthening social ties to replace lost ones, while the sadness that
is associated with fatigue or pessimism may have the function of conserving energy—for example,
during the winter, or more generally while waiting for a new opportunity for effective action. In any
case, sadness slows people down, which gives them time to think about and evaluate options in
uncertain situations. In the present theory, minor is less common because it is less harmonic (it does not
match the aurally familiar harmonic series so well) and hence more ambiguous and less certain. Along
similar lines, Schenker (1906) regarded the major triad, due to its similarity with the harmonic series,
as “natural” and hence the best basis for composition; for him, the minor triad was “artificial,” an
artistic product.
Is there a causal psychological link between uncertain life situations and uncertain relationships
between musical tones? Intuitively, this seems likely, but such hypotheses are difficult to demonstrate
empirically. It is clearer that successful predictions are rewarded by the brain (Huron, 2006; Volz,
Schubotz, & Cramon, 2003). Uncertain situations are situations in which successful predictions are
unlikely, so the positive emotions associated with neural “rewards” are likely to be absent. Similarly, it
is difficult to confirm empirically the idea that musical structure is a mirror of social structure (Feld,
1984).

IMPLICATIONS

I have proposed that the ultimate origin and foundation of the negative emotional valence of music in a
minor key is the prolonged minor triad in the background. A prolonged minor triad has negative
emotional valence by comparison to a prolonged major triad for the simple reason that the third of the
triad is lower in pitch, and hence lower than expected if major is perceived as the norm. Similarly,
speech can sound sad if its pitch is lower than expected. Of course there are, or may be, other reasons,
but this parsimonious observation appears sufficient to explain most of the effect.
If this finding holds, it is part of an emerging solution to an old puzzle. Why is Western music
dominated by MmT? Why does MmT have just two modes, major and minor? Why has the emotional
difference between major and minor been so stable across periods, styles, and (Western) cultures? Why
is the tonal system itself so stable in spite of the long and quite intense history of attempts to usurp it,
originally inspired by composers such as Wagner and Schoenberg, and continuing to the present day?
Current evidence points toward the following tentative explanations. First, assuming that the
C/D of Western sonorities is based on a combination of roughness, harmonicity and familiarity, the
major and minor triads are clearly the most consonant possible chord types or “Tn types” (Rahn, 1980)
of cardinality greater than two, because only they have a perfect fifth (harmonicity) and no seconds
(roughness). Second, three is the greatest number of pitch classes (Parncutt, 1993) or melodies (Huron,
1989) that can be simultaneously perceived (i.e., independently noticed), based on empirical
comparisons between perceived and performed numbers of simultaneous tones and melodies
respectively. These two constraints may have eliminated any and all other candidates for tonic
sonorities composed from tones in the chromatic scale.
Familiarity based on exposure (frequency of occurrence) may have exaggerated the difference
in perceived C/D between major/minor triads and competing candidates for tonic sonorities such as
suspended, diminished and augmented triads—just as constant exposure to music of “great
composers” (or any music, for that matter) tends to increase the preference gap between their music
and the music of lesser-known composers or in lesser-known styles. This is an example of a more
general psychological effect:
The vast literature on the mere-repeated-exposure effect shows it to be a robust phenomenon
that cannot be explained by an appeal to recognition memory or perceptual fluency. The effect
has been demonstrated across cultures, species, and diverse stimulus domains. It has been
obtained even when the stimuli exposed are not accessible to the participants’ awareness, and
even prenatally (Zajonc, 2001, abstract).
The mere exposure effect (Bornstein, 1989; Zajonc, 1968) has been repeatedly confirmed for music: the
more often people are exposed to a particular piece or style of music, the more they like it (e.g., Peretz,
Gaudreau, & Bonnel, 1998; Szpunar, Schellenberg, & Pliner, 2004). The popular idea that “familiarity
breeds contempt” has not been confirmed in mainstream music psychological studies; an inverted-U
function reminiscent of Wundt’s (1874) relationship between stimulus intensity/activity/arousal and
pleasantness/affect/liking appears to exist for the relationship between liking and subjective complexity, but
not for the relationship between liking and exposure or familiarity (North & Hargreaves, 1995).
Since C/D is fundamental to Western music and only consonant sonorities can be prolonged, the
major and minor triads are the best available candidates (pc-sets, or more precisely Tn-types) for
prolongation. Since they also imply almost-complete diatonic scales by means of missing fundamentals
(only leading tones are not implied in this way), they are the best candidates for tonic sonorities of music
that is based on diatonic scales.

Acknowledgments. I am grateful to Graham Hair, Sarha Moore, Renée Timmers, Olli Väisälä, and
Karim Weth for valuable comments on previous drafts. This paper is dedicated to Steve Larson, whose
untimely death on June 7, 2011 was a great loss to both the music theory and the music psychology
communities. Steve was uniquely qualified to evaluate this paper, being equally at home as a
Schenkerian music analyst, a cognitive psychologist, and a performing musician. He would have
emailed me all kinds of interesting and useful reactions. Even without his direct feedback, this paper
might never have been written without several publications in which Steve combined Schenkerian and
cognitive-psychological theory, and convinced researchers in both areas of the virtues of the
combination. On a personal note, I should say that no theoretical discussion of the nature and origins of
musical sadness will make Steve’s premature departure easier for me to understand or accept.

REFERENCES

Adler, G. (1885). Umfang, Methode und Ziel der Musikwissenschaft. Vierteljahresschrift für
Musikwissenschaft, Vol. 1, pp. 5–20.

Baddeley, A. D. & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed): The psychology of
learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New York: Academic
Press.

Bigand, E., Tillmann, B., Poulin-Charronnat, B., & Manderlier, D. (2005). Repetition priming: Is music
special? Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology,
Vol. 58, pp. 1347–1375.

Boenke, P. (2005). Zur amerikanischen Rezeption der Schichtenlehre Heinrich Schenkers. Zeitschrift
der Gesellschaft für Musiktheorie, Vol. 2, pp. 181–188.

Bornstein, R.F. (1989). Exposure and affect: overview and meta-analysis of research 1968–1987.
Psychological Bulletin, Vol. 106, pp. 265–289.

Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge,
MA: MIT Press.

Caplin, W.E. (1998). Classical Form. New York: Oxford University Press.

Cook, N. (1987). The perception of large-scale tonal closure. Music Perception, Vol. 5, pp. 197–206.

Cook, N. (1998). Music: A Very Short Introduction. New York: Oxford University Press.

Costa, M., Fine, P., & Ricci Bitti, P.E. (2004). Interval distributions, mode, and tonal strength of
melodies as predictors of perceived emotion. Music Perception, Vol. 22, pp. 1–14.

Crowder, R.G. (1984). Perception of the major-minor distinction: I. Historical and theoretical
foundations. Psychomusicology, Vol. 4, pp. 3–12.

Crowder, R.G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The
Cognitive Psychology of Human Audition. Oxford: Oxford University Press, pp. 113–145.

Cruger, J. (1630). Synopsis musicae. Berlin: Kall.

Dahlhaus, C. (1967). Untersuchungen über die Entstehung der harmonischen Tonalität. Kassel:
Bärenreiter.

Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science, Vol.
168, pp. 1604–1605.

Deutsch, D. (1972). Mapping of interactions in the pitch memory store. Science, Vol. 175, pp. 1020–
1022.

Deutsch, D. (1975). The organization of short-term memory for a single acoustic attribute. In D.
Deutsch & J. A. Deutsch (Eds.), Short-Term Memory. New York: Academic Press, pp. 107–151.

Eberlein, R. (1994). Die Entstehung der tonalen Klangsyntax. Frankfurt/Main: Lang.

Feld, S. (1984). Sound structure as social structure. Ethnomusicology, Vol. 28, pp. 383–409.

Flotzinger, R. (2007). Von Leonin zu Perotin: Der musikalische Paradigmenwechsel in Paris um 1210.
Bern: Lang.

Forte, A. (1959). Schenker’s conception of musical structure. Journal of Music Theory, Vol. 3, pp. 1–
30.

Forte, A., & Gilbert, S.E. (1982). Introduction to Schenkerian Analysis. New York: Norton.

Fuller, S. (1986). On sonority in Fourteenth-Century polyphony: Some preliminary reflections. Journal
of Music Theory, Vol. 30, pp. 35–70.

Gabrielsson, A. (2009). The relationship between musical structure and perceived expression. In S.
Hallam, I. Cross, & M. Thaut (Eds.), Oxford Handbook of Music Psychology. Oxford: Oxford
University Press, pp. 141–150.

Gabrielsson, A., & Juslin, P.N. (2003). Emotional expression in music. In R.J. Davidson, K.R. Scherer,
& H.H. Goldsmith (Eds.), Handbook of Affective Sciences. Oxford: Oxford University Press, pp. 503–
534.

Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In
P.N. Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, Applications.
Oxford: Oxford University Press, pp. 367–400.

Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to “happy-sad” judgments in
equitone melodies. Cognition & Emotion, Vol. 17, pp. 25–40.

Gumpelzhaimer, A. (1591). Compendium musicae latino germanicum. Augsburg: Schoenigius.

Heinlein, C.P. (1928). The affective character of the major and minor modes in music. Journal of
Comparative Psychology, Vol. 8, pp. 101–142.

Hunter, P.G., & Schellenberg, E.G. (2010). Music and emotion. In M.R. Jones, R.R. Fay, & A.N.
Popper (Eds.), Music Perception. New York: Springer, pp. 129–164.

Huron, D. (1989). Voice denumerability in polyphonic music of homogeneous timbres. Music
Perception, Vol. 6, pp. 361–382.

Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA:
MIT Press.

Huron, D. (2008). A comparison of average pitch height and interval size in major- and minor-key
themes: Evidence consistent with affect-related pitch prosody. Empirical Musicology Review, Vol. 3,
pp. 59–63.

Huron, D., & Davis, M.J. (2012). The harmonic minor scale provides an optimum way of reducing average
melodic interval size, consistent with sad affect cues. Empirical Musicology Review, Vol. 7, No. 3-4, pp. 103–117.

Huron, D., & Parncutt, R. (1993). An improved model of tonality perception incorporating pitch
salience and echoic memory. Psychomusicology, Vol. 12, pp. 152–169.

Järvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, Vol. 12, pp. 415–437.

Johnson-Laird, P.N., Kang, O.E., & Leong, Y.C. (2012). On musical dissonance. Music Perception, Vol.
30, pp. 19–35.

Jonas, O. (1954). Introduction. In O. Jonas (Ed.), Heinrich Schenker: Harmony (transl. E.M. Borgese).
Chicago: University of Chicago Press.

Judd, C.C. (2002). Renaissance modal theory. In T. Christensen (Ed.), Cambridge History of Music
Theory. Cambridge, UK: Cambridge University Press, pp. 364–406.

Juslin, P.N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review
and a questionnaire study of everyday listening. Journal of New Music Research, Vol. 33, pp. 217–238.

Kastner, M.P., & Crowder, R.G. (1990). Perception of the major/minor distinction: IV. Emotional
connotations in young children. Music Perception, Vol. 8, pp. 189–201.

Keller, M.C., & Nesse, R.M. (2005). Is low mood an adaptation? Evidence for subtypes with symptoms
that match precipitants. Journal of Affective Disorders, Vol. 86, pp. 27–35.

Krumhansl, C.L. (1990). Cognitive Foundations of Musical Pitch. New York: Oxford University Press.

Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization
in a spatial representation of musical keys. Psychological Review, Vol. 89, pp. 334–368.

Krumhansl, C.L., Louhivuori, J., Toiviainen, P., Järvinen, T., & Eerola, T. (1999). Melodic expectation
in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and computational approaches.
Music Perception, Vol. 17, pp. 151–196.

Lander, N.S. (1997). Ricercares a quattro voci by Vincenzo Galilei, 1584. recorderhomepage.net/
galilei.html

Larson, S. (1997). The problem of prolongation in tonal music: Terminology, perception, and
expressive meaning. Journal of Music Theory, Vol. 41, pp. 101–136.

Leech-Wilkinson, D. (1984). Machaut’s ‘Rose, Lis’ and the problem of early music analysis. Music
Analysis, Vol. 3, pp. 9–28.

Lester, J. (1977). Major-minor concept and modal theory in Germany, 1592-1680. Journal of the
American Musicological Society, Vol. 30, pp. 208–253.

Lippius, J. (1612). Synopsis musicae novae. Strasbourg: Kieffer.

Mattheson, J. (1713). Das neu-eröffnete Orchestre. Hamburg: Schiller.

McDermott, J.H., Lehr, A.J., & Oxenham, A.J. (2010). Individual differences reveal the basis of
consonance. Current Biology, Vol. 20, pp. 1035–1041.

McLachlan, N., Marco, D., Light, M., & Wilson, S. (2013). Consonance and pitch. Journal of
Experimental Psychology: General. DOI: 10.1037/a0030830.

Meier, B. (2009). Rhetorical aspects of the Renaissance modes (transl. G. Chew). Journal of the Royal
Musical Association, Vol. 115, No. 2, pp. 182–190.

Meyer, L.B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.

Meyer, L.B. (1973). Explaining Music: Essays and Explorations. Berkeley, CA: University of
California Press.

Narmour, E. (1977). Beyond Schenkerism: The Need for Alternatives in Music Analysis. Chicago:
University of Chicago Press.

Noorden, L. van (1975). Temporal Coherence in the Perception of Tone Sequences. Dissertation,
Technical University Eindhoven.

North, A.C., & Hargreaves, D.J. (1995). Subjective complexity, familiarity, and liking for popular
music. Psychomusicology, Vol. 14, pp. 77–93.

Oram, N., & Cuddy, L.L. (1995). Responsiveness of western adults to pitch distributional information
in melodic sequences. Psychological Research, Vol. 57, pp. 103–118.

Ortmann, O. (1924). The fallacy of harmonic dualism. Musical Quarterly, Vol. 10, No. 3, pp. 369–383.

Parncutt, R. (1988). Revision of Terhardt’s psychoacoustical model of the root(s) of a musical chord.
Music Perception, Vol. 6, pp. 65–94.

Parncutt, R. (1989). Harmony: A psychoacoustical approach. Berlin: Springer-Verlag.


Parncutt, R. (1993). Pitch properties of chords of octave-spaced tones. Contemporary Music
Review, Vol. 9, pp. 35–50.

Parncutt, R. (2007). Systematic musicology and the history and future of western musical scholarship.
Journal of Interdisciplinary Music Studies, Vol. 1, pp. 1–32.

Parncutt, R. (2011a). The tonic as triad: Key profiles as pitch salience profiles of tonic triads. Music
Perception, Vol. 28, pp. 333–365.

Parncutt, R. (2011b). Defining music as a step toward explaining its origin (spoken paper). Society for
Music Perception and Cognition (SMPC, Rochester, NY, 11–14 August).

Parncutt, R., & Hair, G. (2011). Consonance and dissonance in theory and psychology: Disentangling
dissonant dichotomies. Journal of Interdisciplinary Music Studies, Vol. 5, pp. 119–166.

Parncutt, R., & Prem, D. (2008). The relative prevalence of Medieval modes and the origin of the
leading tone (poster). International Conference on Music Perception and Cognition (ICMPC10,
Sapporo, Japan, 25–29 August).

Parncutt, R., Kaiser, F., & Sapp, C. (2011). Historical development of tonal syntax: Counting pitch-
class sets in 13th-16th century polyphonic vocal music. In C. Agon et al. (Eds.), Mathematics and
Computation in Music. Berlin: Springer-Verlag, pp. 366–369.

Parncutt, R., Kaiser, F., & Sapp, C. (2012). Estimating historical changes in consonance by counting
prepared and unprepared dissonances in musical scores (spoken presentation). International
Conference on Music Perception and Cognition (Thessaloniki, Greece, July).

Peretz, I., Gaudreau, D., & Bonnel, A.M. (1998). Exposure effects on music preferences and
recognition. Memory and Cognition, Vol. 15, pp. 379–388.

Powell, J., & Dibben, N. (2005). Key-mood association: A self-perpetuating myth. Musicae Scientiae,
Vol. 9, No. 2, pp. 289–312.

Pressing, J. (1977). Towards an understanding of scales in jazz. Jazzforschung, Vol. 9, pp. 25–35.

Rahn, J. (1980). Basic Atonal Theory. New York: Schirmer.

Rameau, J.-P. (1722). Traité de l’harmonie reduite à ses principes naturels. Paris: J.B.C. Ballard.

Salzer, F. (1952/1962). Structural Hearing: Tonal Coherence in Music. New York: Dover.

Schenker, H. (1906). Harmonielehre (Neue musikalische Theorien und Phantasien, Vol. 1). Stuttgart: J.
G. Cotta'sche Buchhandlung Nachfolger.

Schenker, H. (1922). Der Tonwille, No. 2. Vienna: Tonwille-Flugblätterverlag (Universal). English
translation: Der Tonwille: Pamphlets in Witness of the Immutable Laws of Music, Ed. William Drabkin
(transl. Ian Bent). New York: Oxford University Press, 2004–2005.

Schenker, H. (1935). Der freie Satz. Neue Musikalische Theorien und Phantasien (part 3). Vienna:
Universal.

Smith, N.A., & Schmuckler, M.A. (2004). The perception of tonal structure through the differentiation
and organization of pitches. Journal of Experimental Psychology: Human Perception and
Performance, Vol. 30, pp. 268–286.

Stern, D. (1981). Tonal organization in modal polyphony. Theory and Practice, Vol. 6, No. 2, pp. 5–39.

Szpunar, K.K., Schellenberg, E.G., & Pliner, P. (2004). Liking and memory for musical stimuli as a
function of exposure. Journal of Experimental Psychology: Learning, Memory, and Cognition, Vol. 30,
pp. 370–381.

Tagg, P., & Clarida, B. (2003). Ten Little Title Tunes. New York: Mass Media Music Scholars’ Press.

Taylor, S.E., Fiske, S.T., Etcoff, N.L., & Ruderman, A. J. (1978). Categorical and contextual bases of
person memory and stereotyping. Journal of Personality and Social Psychology, Vol. 36, pp. 778–793.

Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, Vol.
55, pp. 1061–1069.

Terhardt, E. (1982). Die psychoakustischen Grundlagen der musikalischen Akkordgrundtöne und deren
algorithmische Bestimmung. In C. Dahlhaus & M. Krause (Eds.), Tiefenstruktur der Musik. Berlin:
Technical University of Berlin, pp. 23–50.

Terhardt, E., Stoll, G., & Seewann, M. (1982). Algorithm for extraction of pitch and pitch salience from
complex tonal signals. Journal of the Acoustical Society of America, Vol. 71, pp. 679–688.

Thompson, W.F., Schellenberg, E.G., & Husain, G. (2001). Arousal, mood, and the Mozart effect.
Psychological Science, Vol. 12, pp. 248–251.

Tinctoris, J. (1475). Terminorum musicae diffinitorium. Paris: Richard-Masse.

Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability.
Cognitive Psychology, Vol. 5, pp. 207–232.

Väisälä, O. (2002). Prolongation of harmonies related to the harmonic series in early post-tonal music.
Journal of Music Theory, Vol. 46, pp. 207–283.

Vanneus, S. (1533). Recanetum de musica aurea. Rome: Valerio Dorico.

Volz, K.G., Schubotz, R.I., & Cramon, D.Y. von (2003). Predicting events of varying probability:
Uncertainty investigated by fMRI. NeuroImage, Vol. 19, pp. 271–280.

Vos, P.G., & Troost, J.M. (1989). Ascending and descending melodic intervals: Statistical findings and
their perceptual relevance. Music Perception, Vol. 6, pp. 383–96.

Werkmeister, A. (1687). Musicae mathematicae hodegus curiosus oder Richtiger Musicalischer
Weg-Weiser. Frankfurt: Calvisius.

Wundt, W.M. (1874). Grundzüge der physiologischen Psychologie. Leipzig: Engelmann.

Zajonc, R.B. (1968) Attitudinal effects of mere exposure. Journal of Personality and Social
Psychology, Vol. 9, No. 2, Pt. 2, pp. 1–27.

Zajonc, R.B. (2001). Mere exposure: A gateway to the subliminal. Current Directions in Psychological
Science, Vol. 10, pp. 224–228.

Interval Size and Affect: An Ethnomusicological Perspective


SARHA MOORE
The University of Sheffield

ABSTRACT: This commentary addresses Huron and Davis’s question of whether “The
Harmonic Minor Scale Provides an Optimum Way of Reducing Average Melodic Interval
Size, Consistent with Sad Affect Cues” within any non-Western musical cultures. The
harmonic minor scale and other semitone-heavy scales, such as Bhairav raga and
Hicaz makam, are featured widely in the musical cultures of North India and the
Middle East. Do melodies from these genres also have a preponderance of semitone
intervals and low incidence of the augmented second interval, as in Huron and Davis’s
sample? Does the presence of more semitone intervals in a melody affect its emotional
connotations in different cultural settings? Are all semitone intervals equal in their
effect? My own ethnographic research within these cultures reveals comparable
connotations in melodies that linger on semitone intervals, centered on concepts of
tension and metaphors of falling. However, across different musical cultures there may
also be neutral or lively interpretations of these same pitch sets, dependent on context,
manner of performance, and tradition. Small pitch movement may also be associated
with social functions such as prayer or lullabies, and may not be described as “sad.”
“Sad,” moreover, may not connote the same affect cross-culturally.

Submitted 24 November 2012; accepted 12 December 2012.

KEYWORDS: sad, harmonic minor, Phrygian, Hicaz, Bhairav, Yishtabach

HURON and Davis’s article states that major scale melodies, on having their third and sixth degrees
flattened, contain smaller intervals on average, and that if small pitch movement connotes “sadness,” then
altering a standard major melody to the harmonic minor is “among the very best pitch-related
transformations that can be done to modify a major-mode melody in order to render a sad affect” (pp. 105,
114). I look forward to the testing of this hypothesis, playing these pieces to “major-scale enculturated”
listeners to find out their opinions.
Although Huron and Davis’s study is focused on the Western-enculturated listener, I am
addressing their interest in extending this study to cross-cultural situations (p. 103, 114). Empirical
Musicology Review has an established history of interdisciplinary discussion between music psychology
and ethnomusicology, as in Vol. 4, No. 2, where Martin Clayton and John Baily discuss how
Ethnomusicology’s predecessor, Comparative Musicology, began in the Institute of Psychology in Berlin.
Clayton argues that both disciplines are “inherently interdisciplinary” (Clayton, 2009, p. 75); while Baily
states that he has used what he knows of psychology to “understand more about processes of music
cognition” within his work in Afghanistan—also suggesting that music psychologists might “change the
parameters slightly” to include cross-cultural samples (Baily, 2009, p. 86).
Cognitive psychology studies are generally conducted within a Western setting, and Huron and
Davis are at pains to point out that in some cultures there are no particularly “sad” associations to the
harmonic minor scale (p. 104). People who have not been “Western-enculturated” in their listening habits
as children may have very different perceptions. My research and ethnographic interviews, including within
North Indian classical music and music of the Middle East, have turned up relevant comments concerning “sad”
connotations of scales. Huron and Davis’s results hinge on the “relative absence of melodic ‘traffic’”
between pitch degrees 6 and 7, the augmented second interval within a harmonic minor scale, in the tunes
selected for study, and the extra semitone interval between pitch degrees 5 and 6. I will focus on the
semitone interval: Does the general presence of more semitone intervals make a piece sound “sadder”? Are
all semitone intervals equal? I also consider what we mean by the word “sad.”

THE CONNOTATIONS OF THE SEMITONE INTERVAL

The harmonic minor scale has three semitone intervals, compared to the two in a major scale. Beyond
speech prosody, what extra-musical connotations are there connecting the semitone and “sadness”? The
association of semitones with negative emotions is strong in the West: “large pitch variation may be
associated with happiness, pleasantness, activity, or surprise; small pitch variation with disgust, anger, fear
or boredom” (Gabrielsson & Lindström, 2010, pp. 240-1). Since Pythagoras and Plato there have been
connotative links between the semitone and dysfunction, incompleteness, and effeminacy. The harmonic
series identified unequal pitch steps and the “problem” of the semitone (Leach, 2006, p. 1). Leach reports
that in the Medieval era the term “semi” was regarded by authors such as Guido d’Arezzo as meaning an
“incomplete” tone. Connections were even made between the semitone “leading note” attraction to the
tonic and the leading of the “simple and masculine” to the “effeminate and violent.” Music “rich in
intervals smaller than a tone” was thus deemed to be of a “morally dubious nature” (Leach, 2006, pp. 2, 5),
in a move most would now surely be wary of making.

Metaphors

Many binary metaphors that relate to large and small intervals are based on the body, with “Joy is Wide;
Sadness is Narrow” perhaps being based on the expansive feeling of a smile (Lakoff & Johnson, 1980, p.
18). Cross-modal mapping takes this to the notion of expansive intervals being best for joyful music,
narrow for sadness. The energy required to perform large intervals is also sometimes connoted as “Power is
Wide; Weakness is Narrow” (Kivy, 1989, pp. 39, 41, 55). Empirical studies on Western emotional
responses to melodic intervals include Maher and Berlyne’s 1982 study using 12 binary rating scales,
which finds that small intervals are considered weaker, simpler, more indefinite, and melancholic than
larger intervals. In the specific case of the semitone, it is considered the most melancholic, and more tense
and more beautiful than most other intervals (Maher & Berlyne, 1982, p. 16). The perhaps surprising
introduction here of the concept of beauty takes us away from binary thinking and towards nuanced
connotations, including those of the semitone interval that I have encountered within non-Western cultures.

A VIEW FROM ETHNOMUSICOLOGY

In an interview, Indian Sitar player Baluji Shrivastav (personal communication) uses the generic term
“expressive” to give the idea of a semitone interval’s capacity to carry meaning in music. He describes
how: “the closer notes are very expressive. Anything that is closer, you can feel more expression, it’s very
physical. If you play different notes, the closer you get, the vibrato gets faster and stronger.” Shrivastav
describes the semitone nearest to the tonic as the most expressive, the most “powerful.” This conflicts with
the above interpretation of small intervals as weaker.
The “power” of this “leading note” may, perhaps, be more “expressive” within a tonal system than
other semitone intervals. Bharucha’s model of melodic anchoring describes the pull, especially of the
semitone, towards stable pitches in the scale, as like an arrow “yearning” towards the stable tone, a
“psychological force pulling a musical event up or down,” with the notes that are a semitone away from the
tonic having the greatest “yearning” effect (Bharucha, 1996, pp. 383, 398). These concepts of “attraction”
may also be emotionally neutral, a structural feature, as when Chew addresses the “attractive force” of the
major 7th leading note towards the tonic, noting its centrality to Common Practice tonality (Chew, 1983, p.
35).

The Harmonic Minor may not be “Sad”

In some musical cultures the harmonic minor is a fundamental scale. Yossi Sa’aron, an Israeli guitarist
(personal communication), says that, for him, the harmonic minor is the “most natural thing; far from
dissonant; more normal and natural than Ionian or Dorian.” When composing, he starts with the harmonic
minor, with its “half-tone, magical” qualities, and “everything else is variations of it,” introducing what he
calls “flat scales” such as the Aeolian “if I need something for the mood, for the sadness of a piece or
phrase.” Seemingly, then, Huron and Davis’s conclusions are refuted by these remarks. However, there may
be other factors at play, particularly the issues of cultural familiarity.
The musical culture that surrounds you in life clearly determines what you regard as aurally
“normal.” It has come up in various studies that on hearing notes that are different to “normal” we have
particular and specific reactions. For instance, a study by Huron, Yim, and Chordia found an association

between sadness and “lower than normal pitch,” where the realization of melodic expectation contributed to
a feeling of pleasure (Huron, Yim, & Chordia, 2010, pp. 63-6). The most well-trodden melodic path in
Huron and Davis’s probability study of different scale-tone successions, in Germanic major mode folk
melodies, is the 4-3-2-1 tetrachord (see Fig. 2, p. 107). If the 2nd and 3rd degrees were flattened to give the
tetrachord 4-♭3-♭2-1, the average melodic size may stay roughly the same. But does the latter sound sadder
to those enculturated to the major scale? One might expect that the harmonic minor scale may not evoke as
strong an emotional reaction as very unfamiliar modes such as the Locrian or Phrygian. Research
may shed light on whether this factor would outweigh general effects of interval size.
The sound of weeping, heard as a falling semitone, became an iconic musical symbol of grief, pain
and loss in Western Classical music. Semiotician Raymond Monelle writes that “the moan of the dissonant
falling second expresses perfectly the idea of lament” (Monelle, 2000, p. 72). Shrivastav (personal
communication) agrees that for him, too, the falling semitone is sad, but continues that the same interval
rising is not sad: “This is quite a happy aspect, going back down is the sad aspect.”
In Jewish Liturgical music there is a mode called the Yishtabach mode that principally uses the
Aeolian mode, except at cadences where the second degree is flattened. Professor Alexander Knapp
(personal communication) told me that the flattening of the second here is: “A colouristic effect, making a
more effective cadence, somehow reflecting the prose of the liturgy of devotion.”
This “lower than normal” note, then, is used for adding expressive meaning, such as of “sadness”
or “devotion.” Familiarity on the one hand, and “falling” from “normal” on the other, may influence
interpretations as much as actual interval size.

Musical Motifs and Ornamentation

Within North Indian classical music the notes equivalent to the Western major scale are called shuddha
(natural) notes, other pitches being called “altered” (Sobhana, 1989, p. 132). Singer Subroto Roy (personal
communication) gives a different perspective to flattened pitches, concerning their use in music played at
dawn and dusk: “flattened notes help you to get into the active mode of life, or make you sleep well, they
bridge between the unconscious and conscious minds, or between different levels of consciousness.” Roy
here opens up a new world of interpretations that are affected by extra-musical association and how notes
are treated: “You can’t say that flat notes are sad, it all depends on the context, the relationship with other
notes, and your traditional understanding.” Roy thus highlights the radical differences between Western
melodic concepts and Indian raga, where ragas are defined by the uniqueness of their notes.
This, to a greater or lesser extent, applies to the classical musics of Turkish makam and Arabic
maqam that also have strong connections between “scale,” musical motifs, and context. It would be
challenging to attempt Huron and Davis’s present study within any of these genres, as altering pitch degrees destroys
the concept of a raga or makam/maqam, rendering it simply “wrong,” with the musical motifs of one raga/
makam/maqam and the notes of another. It might be possible to make the interval-size calculation but it
would be meaningless in terms of the actual music. The equivalent change within Western music may make
a particular piece feel “wrong,” yet the result of changing the tonality into the harmonic minor would, I
suggest, still be perfectly acceptable as a piece within the style/genre.

Empirical Evidence within South Indian Music

An empirical study of Carnatic Classical Music addresses a similar issue to Huron and Davis’s: Bowling,
Sundararajan, Han, and Purves (2012) compare the distribution of melodic intervals between melodies in
ragas associated with the “positive/excited” rasa (emotion) Hasya, and those connected with the “negative/
subdued” rasa Karuna. I would point out here that the “subdued” aspect of these ragas is not tied to the
“negative” in any “bad” sense: the rasa Karuna is “compassion” which may have associations of pathos,
love, sympathy, and tenderness. The two “positive/excited” ragas are based on notes equivalent to the
major scale, and the three “negative/subdued” ragas all have ♭2 and ♭6, together (in two cases) with other
“altered” notes.
Bowling et al. (2012) find the principal difference to be the proportion of intervals smaller or
larger than a major second. Their results are that there are significantly more semitone intervals in the
“negative/subdued” raga melodies, making a correlation between these results and scales with flattened
pitch degrees. This study highlights the prevalence of the ♭6 in the “negative/subdued” raga melodies,
rather than the ♭3 in their comparative American study, concluding that “the use of a particular tonic
interval(s) is not critical for the expression of emotion” (Bowling et al., 2012). So, put alongside Huron and

Davis’s study, this is the primary difference: the flattening of notes from a “major scale” makes for smaller
intervals generally, but Bowling et al. find no particular flattened notes to hold special significance.

CONTRADICTIONS AND SUBTLETIES OF CONNOTATION

Within North Indian Classical music the ragas Bhairav, Bhairavi and Todi correspond to the three
“negative/subdued” ragas studied above. They also are associated with rasa Karuna, each with particular
characters: Bhairav is considered a devotional raga, Bhairavi a light, loving raga (Bor, 1999, pp. 32, 34),
and Todi has associations of abandonment (Goswami, 1995, p. 42). None would be regarded as simply
“negative.” For instance, raga Bhairav, with its four semitone intervals, is described by Roy (personal
communication) as, for him, evoking a “delight, a spiritual bliss, as when meditating.” Sufi singer Rafaqat
Ali Khan (personal communication) described the presence of a ♭2 as “sad” and said: “Sad in our Sufi
religion doesn’t mean: ‘Oh, I’ve lost my purse,’ it means that I’m closer to God, it is a beautiful and lovely
sensation.” Shrivastav (personal communication) agreed that “sad” was how he, too, understood a ♭2:
“sad, and relaxed…. Sad can be romantic, when you long for someone, a longing mood.” So the question
arises of what we mean when we use the word “sad.”
Small melodic intervals, low frequency, slow tempo, smooth and with quiet dynamics are the
parameters identified by Huron and Davis that convey “sadness” (p. 104). However, Clayton finds the
terms “calm” and “sad” both to be reactions to raga Shri, another raga with four semitone intervals, and
surmises that “both [‘calm’ and ‘sad’] would be associated with slow introverted movements” within that
raga (Clayton, 2005, p. 371). And Johnstone and Scherer describe how “joy” may be a quiet emotion
expressed with small speech movements (Johnstone & Scherer, 2000, p. 229). Huron and Davis
acknowledge “sleepy” as another interpretation of these parameters (p. 103), and Turkish violin player
Cahit Baylav (personal communication) told me that the makam Hicaz (comparable to a mode of the
harmonic minor) was used for a large proportion of lullabies. He further gave his opinion that makam Hicaz
was very popular with the adult Turkish population for this nostalgic connection. These parameters, then,
including small melodic intervals, might combine to express more than “sadness” in different cultures: in a
prayer, a lullaby, or a joyful or tender musical expression, not as exceptions but as widespread phenomena.
Huron and Davis also write of how small melodic movement may be interpreted very differently
when treated in a different way (p. 104). For instance, the North Indian raga Basant comes from the same
pitch set as raga Shri, yet Shri is always serious, while Basant is played faster and lightly, for the joyous,
playful Spring holi festival. Another instance is how the scales of Jewish liturgical song are transformed in
Klezmer, Eastern European dance music. The “sad,” devotional song “Ahava Raba,” using the same pitch
set as Hicaz makam, becomes a lively, joyful dance tune, often with accentuated semitone intervals.

“Melancholic Airs of the Orient”

Interestingly, the interval whose relative absence in Huron and Davis’s study has produced a low average
interval size in the harmonic minor, the augmented second between the 6th and 7th degrees (p. 113), itself
receives connotations of “sad” affect. For instance, the Bosnian musical genre Sevda takes its name
from a word related to melancholia, and Milošević associates this “melancholia” with the augmented second
interval (Milošević, quoted in Pennanen, 2010, p. 78). Pennanen writes, of the augmented second, that
“Muslim Slavs adopted the interval because it reflects love’s yearnings and expressions of Oriental
melancholy” (Pennanen, 2010, p. 80). However, Bosnian accordionist Merima Ključo (personal
communication) attributes the “loving” affect to the semitone flattened second: “it’s so interesting to see
people react when you just change the second note…. all of a sudden your body moves, your ear,
everything just turns to the different direction. It is like looking to the wonderful baby and giving a kiss.”
Augmented second or “lower” than normal flattened second, melancholia or tenderness, varying
interpretations abound. Ironically, considering Huron and Davis’s hypothesis, melodic “traffic” between the
6th and 7th degrees may actually add to “sad” associations for the harmonic minor.

CONCLUSIONS

The variance of structural melodic concepts in classical musics from North India and Turkey may preclude
a repetition of Huron and Davis’s study there, though related studies may produce comparable results.
Melodies in the harmonic minor scale, and other scale types that contain many semitones, receive wide and
varied associations in different musical cultures. The semitone is associated with the “sad” and the
“beautiful,” it can be “joyful,” “tense,” or “neutral,” and some semitone intervals are considered to be of
greater affect than others. Familiarity, metaphorical differences, ideologies, and cultural meanings introduce
other perspectives to the study of interval size within and across cultures, offering an opportunity to
embed cultural diversity in cognitive musical discourse.

INTERVIEWS

Interviews conducted by Sarha Moore between 2008 and 2012:


Cahit Baylav, Turkish violin player based in London
Rafaqat Ali Khan, Pakistani Sufi singer based in Lahore
Merima Ključo, Bosnian accordionist based in Amsterdam
Alexander Knapp, pianist, Professor Emeritus of Jewish Music based in London
Subroto Roy, Indian singer based in Pune
Yossi Sa’aron, Israeli guitarist based in Tel Aviv
Baluji Shrivastav, Indian sitar player based in London

REFERENCES

Baily, J. (2009). Crossing the boundary: From experimental psychology to ethnomusicology. Empirical
Musicology Review, Vol. 4, No. 2, pp. 82–88.

Bharucha, J. (1996). Melodic anchoring. Music Perception: An Interdisciplinary Journal, Vol. 13, No. 3,
pp. 383–400.

Bor, J. (1999). The Raga Guide: A Survey of 74 Hindustani Ragas. Rotterdam: Nimbus Records.

Bowling, D. L., Sundarajan, J., Han, S., & Purves, D. (2012). Expression of emotion in Eastern and
Western music mirrors vocalization. PLOS ONE.
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0031942

Chew, G. (1983). The spice of music: Towards a theory of the leading note. Music Analysis, Vol. 2, No. 1,
pp. 35–53.

Clayton, M. (2005). Communication in Indian raga performance. In: D. Miell, D. Hargreaves, & R.
MacDonald (Eds.), Musical Communication. Oxford: Oxford University Press, pp. 361–381.

Clayton, M. (2009). Crossing boundaries and bridging gaps: Thoughts on relationships between
ethnomusicology and music psychology. Empirical Musicology Review, Vol. 4, No. 2, pp. 75–77.

Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In:
P.N. Juslin & J.A. Sloboda (Eds.), Handbook of Music and Emotion: Theory, Research, Applications.
Oxford: Oxford University Press, pp. 367–400.

Goswami, R. (1995). Meaning in Music. Shimla: Indian Institute of Advanced Study.

Huron, D., Yim, G., & Chordia, P. (2010). The effect of pitch exposure on sadness judgments: An
association between sadness and lower-than-normal pitch. In: S.M. Demorest, S.J. Morrison, & P.S.
Campbell (Eds.), Proceedings of the 11th International Conference on Music Perception and Cognition, pp.
63–66.


Johnstone, T., & Scherer, K.R. (2000). Vocal communication of emotion. In: M. Lewis & J.M. Haviland-Jones
(Eds.), Handbook of Emotions. New York: Guilford Press, pp. 220–235.

Kivy, P. (1989). Sound Sentiment. Philadelphia: Temple University Press.

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.

Leach, E.E. (2006). Gendering the semitone. Music Theory Spectrum, Vol. 28, No. 1, pp. 1–22.

Maher, T., & Berlyne, D.E. (1982). Verbal and exploratory responses to melodic musical intervals.
Psychology of Music, Vol. 10, pp. 11–27.

Monelle, R. (2000). The Sense of Music: Semiotic Essays. Princeton: Princeton University Press.

Pennanen, R.P. (2010). Melancholic airs of the Orient—Bosnian Sevdalinka music as an Orientalist and
national symbol. In: R.P. Pennanen (Ed.), Music and Emotions. Helsinki: University of Helsinki, pp. 76–90.
http://hdl.handle.net/10138/25832

Sobhana, N. (1989). Bhatkhande’s Contribution to Music: A Historical Perspective. Delhi: Popular
Prakashan.


A method for testing synchronization to a musical beat in
domestic horses (Equus ferus caballus)
MICAH R. BREGMAN
Department of Cognitive Science, UC San Diego
and The Neurosciences Institute

JOHN R. IVERSEN
The Neurosciences Institute,
and Swartz Center for Computational Neuroscience, UC San Diego

DAVID LICHMAN
Parelli Natural Horsemanship, 5-Star Professional

MEREDITH REINHART
Private Scholar

ANIRUDDH D. PATEL [1]
The Neurosciences Institute
and Department of Psychology, Tufts University

ABSTRACT: According to the “vocal learning and rhythmic synchronization
hypothesis” (Patel, 2006), only species capable of complex vocal learning, such as
humans and parrots, have the capacity to synchronize their movements to a musical
beat. While empirical research to date on a few species (e.g., parrots and monkeys) has
supported this hypothesis, many species remain to be examined. Domestic horses are
particularly important to study, as they are vocal non-learners who are occasionally
reported to move in synchrony with a musical beat, based on informal observations. If
these reports are substantiated by scientific experiments, this would challenge the vocal
learning hypothesis and provide a new species for the comparative study of musical
rhythm. Here we present a new method for testing whether horses can synchronize
their trotting to a musical beat, including an illustration of data analysis based on data
collected from one horse.

Submitted 2012 July 30; accepted 2012 September 6.

KEYWORDS: musical beat, rhythm, synchronization, animals, horses, evolution

INTRODUCTION

THIS paper presents a method for testing synchronization of movement to a musical beat in domestic
horses (Equus ferus caballus). As discussed below, horses provide a crucial test of the “vocal learning and
rhythmic synchronization hypothesis” (Patel, 2006), which states that the capacity for complex vocal
learning is an evolutionary and neurobiological prerequisite for the ability to synchronize movements with
a musical beat. One prediction of this hypothesis is that vocal non-learners (such as nonhuman primates,
dogs, cats, and horses) lack the capacity to synchronize movements to a musical beat. Among vocal non-
learners, domestic horses are of particular interest because unlike dogs, cats, and nonhuman primates, they
are the subject of numerous anecdotes about spontaneous entrainment to a musical beat (one such anecdote
is reproduced in Appendix 1; author AP has received several such anecdotes from equestrians). If these
anecdotes can be confirmed by scientific experiments, this would provide evidence contrary to the vocal
learning hypothesis and provide a new species for the comparative study of musical rhythm. Hence the
current paper presents a method for testing synchronization to a musical beat in horses, including
illustrations of data analysis based on data collected from one horse. It should be noted that these data are
not sufficient in quantity to determine if horses synchronize to music, and are presented here for didactic
purposes. The focus of the paper is on describing a new experimental method, called “circular trotting to


music” (CTM), with the hope that future researchers will use it to decisively answer whether horses
synchronize to the beat of music. Before describing the CTM method, we first briefly discuss the
theoretical background for this research and review several key features of musical beat perception and
synchronization.

Theoretical Background

Why study music perception in other animals? Cross-species studies of music cognition provide a way to
study the evolutionary roots of our musical capacities (McDermott & Hauser, 2005; Fitch, 2006). Music
cognition involves many distinct capacities, ranging from “low-level” capacities not specific to music, such
as the ability to perceive the pitch of a complex harmonic sound, to “high-level” capacities which appear
unique to music, such as the processing of tonal-harmonic relations based on learned structural norms
(Peretz & Coltheart, 2003; Koelsch, 2011). It is unlikely that all of these capacities arose at the same time
in evolution. Instead, different capacities are likely to have different evolutionary histories. Cross-species
studies can help illuminate these histories using the methods of comparative evolutionary biology (for
further discussion, see Patel & Demorest, 2013).
For example, consider the capacity to sense, and move in synchrony with, a perceived regular
temporal pulse (e.g., via foot tapping, head bobbing, or dance). Musical beat perception and
synchronization (BPS) is a fundamental aspect of music cognition that develops without special training in
young children, and is observed in every human culture (Brown & Jordania, 2011; Nettl, 2000). Did the
capacity for BPS arise uniquely in the human lineage as part of an evolutionary adaptation for music-
making (Bispham, 2006; Honing, 2012), or does it reflect ancient abilities shared by many species?
Darwin (1871) seems to have favored the latter view, stating that “The perception, if not the enjoyment, of
musical cadences and of rhythm is probably common to all animals, and no doubt depends on the common
physiological nature of their nervous systems” (p. 1207 in the 2006 edition, E.O. Wilson, Ed.). Darwin’s
intuition is reasonable: it seems plausible that the capacity to sense a beat in music and synchronize to it
could be widespread among animals. After all, the auditory systems of humans and other mammals
(especially primates) have many structural and functional parallels (Rauschecker & Scott, 2009), and all
vertebrates produce voluntary rhythmic movements (e.g., as part of locomotion). Therefore, it is
reasonable to expect that many animals could learn to coordinate rhythmic movements with a perceived
auditory beat. Furthermore, nonhuman animals (henceforth, animals), such as rhesus monkeys, can learn
complex sensorimotor tasks in the laboratory (Georgopoulos, Taira, & Lukashin, 1993). Thus one might
expect that many animals could learn to synchronize movements with an auditory beat, especially since
rhythm is increasingly thought to be a fundamental organizing principle of brain function (Buzsáki, 2006).
Another conceptual possibility is that the capacity for BPS is neither uniquely human nor
widespread among animals, but restricted to a few species. This view was suggested by Patel (2006), who
proposed that the capacity for BPS arose as a fortuitous byproduct of the brain circuitry for complex vocal
learning. According to this view, the capacity for BPS did not arise as a result of an evolutionary
adaptation for music-making, but a secondary use (or “exaptation”, Gould & Vrba, 1982) of brain circuits
that evolved for other reasons. This “vocal learning and rhythmic synchronization hypothesis” was
motivated by the fact that vocal mimicry, like BPS, involves tight links between the auditory and motor
systems, and the fact that the neural substrates for vocal learning and BPS appear to overlap in the brain
(see Patel, 2006 for details). Crucially, vocal learning is a rare capacity found only in a few groups of
mammals (including humans, dolphins, elephants, and seals) and birds (songbirds, parrots, and
hummingbirds), and is associated with neuroanatomical specializations including modified auditory-motor
cortical connections (Jarvis, 2007). Thus the vocal learning hypothesis makes a testable prediction: only
vocal learners are capable of BPS.
The discovery of BPS in parrots has supported this hypothesis (Hasegawa, Okanoya, Hasegawa, &
Seki, 2011; Patel, Iversen, Bregman, & Schulz, 2009a; Schachner, Brady, Pepperberg, & Hauser, 2009), as
has the finding that rhesus monkeys (who are vocal non-learners) could not learn to synchronize their taps
to a metronome despite over a year of concerted training (Zarco, Merchant, Prado, & Mendez, 2009).
However, many other vocal non-learner species remain to be tested. As noted above, horses are of
particular interest due to the many anecdotal accounts of BPS in domesticated horses. If horses show the
key features of BPS, this would challenge the vocal learning hypothesis and raise new questions about the
evolutionary substrates of this ability.


Key features of musical beat perception and synchronization

Synchronization to a musical beat in humans has several important features that distinguish it from other
examples of rhythmic entrainment in nature, such as the synchronous chorusing of certain insects and frogs
(Patel et al., 2009b). First, BPS involves the extraction of a beat from a complex acoustic stimulus (i.e.,
from music rather than from a simple pulse train). Second, BPS involves substantial flexibility in movement tempo:
humans can easily synchronize their movements to a musical beat across a fairly wide range of tempi.
Third, BPS is cross-modal, with rhythmic sound driving movement that is not necessarily aimed at sound
production. Thus a convincing demonstration of BPS in animals requires demonstrating these three
features. In addition, an important feature of BPS is phase matching: people spontaneously align their taps
and other rhythmic movements with a beat. That is, people tap slightly before the beat or on the beat.
Stated more formally, the average temporal asynchrony between taps and beats is slightly negative or
around zero (Patel, Iversen, Chen, & Repp, 2005; Rankin, Large, & Fink, 2009). This indicates accurate
temporal prediction of beats. Hence if an animal is trained to tap with a beat, and taps at the correct tempo
after each beat (e.g., with a delay of a few hundred ms), this behavior could be largely reactive rather than
anticipatory. (This sort of behavior was observed in the study of rhesus monkeys mentioned above.)
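The anticipatory/reactive distinction described above can be quantified with the circular mean of tap-beat asynchronies. The following sketch uses hypothetical relative-phase values in degrees (negative = movement before the beat); the function name and data are illustrative, not the authors' analysis code:

```python
import math

def mean_asynchrony_deg(relative_phases_deg):
    """Circular mean of relative-phase values in degrees.

    Negative values mean movements tend to precede the beat
    (anticipatory); positive values mean they lag it (reactive).
    """
    rads = [math.radians(p) for p in relative_phases_deg]
    s = sum(math.sin(r) for r in rads) / len(rads)
    c = sum(math.cos(r) for r in rads) / len(rads)
    return math.degrees(math.atan2(s, c))

# Hypothetical human tapper: taps cluster slightly before or on the beat.
human = [-12, -5, -9, 2, -7, -15, -3, -6]
# Hypothetical reactive responder: movements lag each beat by ~1/4 cycle.
reactive = [85, 92, 88, 95, 90, 87, 93, 89]

print(mean_asynchrony_deg(human))    # slightly negative: anticipatory
print(mean_asynchrony_deg(reactive)) # near +90 degrees: reactive
```

A circular (vector) mean is used rather than an arithmetic mean because phase values wrap around at ±180 degrees.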

PRIOR RESEARCH ON HORSES AND MUSIC

To our knowledge, there is no prior research on synchronization of movement to music in horses. Indeed,
there appears to be almost no scientific research on how horses respond to music. This is somewhat
surprising given the frequent use of music in certain types of horse training and dressage (a competitive
sport based on guiding a horse through a complex set of movements based on subtle cues from the rider).
For example, the Spanish Riding School of Vienna, a traditional riding school for Lipizzan horses that dates
back to the 1500s, has long used music in its performances. The belief that horses had “an almost human
sensibility for music” was proposed as early as 1563 by French author François de Provane, and in 1612 the
French court staged an elaborate equestrian ballet for the engagement of Louis XIII (Van Orden, 2005:
235-284).
The lack of modern research on equine responses to music is also somewhat surprising given that
several other species have been studied for responses to human music, motivated by either basic cognitive
research or applied animal science (e.g., McDermott & Hauser, 2007; Patel et al., 2009; Schachner et al.,
2009; Snowdon & Teie, 2010; Wells, Graham, & Hepper, 2002; Wells, 2009). We were able to find only
one prior scientific paper on horses and music, which examined the effect of four types of music on stress-
related behaviors (such as neighing) in ponies (Houpt, Marrow, & Seeliger, 2002). The authors found no
significant differences between different music types.
Fortunately, there is excellent psychophysical research on horse hearing that is relevant to the
current work. Figure 1 shows an audiogram for domesticated horses (solid line) compared to a human
audiogram (dashed line) (Heffner & Heffner, 1992). These audiograms represent the hearing thresholds for
pure tones of different frequencies, i.e., the lowest intensity at which a pure tone can be detected 50% of the
time. Larger threshold values indicate that the tone had to be more intense before it could be detected.
Horses have a broad region of best hearing sensitivity, which overlaps considerably with the range of best
hearing in humans. One notable difference is that the horse hearing range does not extend quite as low as
the human range, but exceeds the human range on the high end. Heffner and Heffner (1992) suggest that
high-frequency hearing in horses plays an important role in sound localization, rather than being used for
acoustic communication with conspecifics (for one recent study of horse vocalizations, see Policht,
Karadios & Frynta, 2011). They also suggest that domestication of horses by humans, which has occurred
for at least 5000 years, is unlikely to have significantly affected the horse audiogram.
Given the extensive overlap between horse and human audiograms, it seems likely that horses hear
human music quite well. Spectral analysis of the two songs used in the current study indicated that the
majority of the energy was below 2 kHz, which is typical for human music and well within the hearing
range of both species.


Fig 1. Audiogram of the domestic horse, Equus ferus caballus (solid line) compared to a human audiogram
(dashed line). From Heffner & Heffner (1992). The human audiogram was obtained from speakers placed
in front of the subject, to make the testing conditions similar to animal testing. The horizontal line at 60 dB
is conventionally used to define the low- and high-frequency hearing limits, based on where it intersects the
audiogram. Audiogram values below 0 on the y-axis indicate sounds whose intensity is less than the 0 dB
reference level (i.e., 0 dB is not silence, but an established conventional sound pressure level). Reproduced
with permission.

TESTING MUSICAL BEAT PERCEPTION AND SYNCHRONIZATION IN HORSES

General considerations

In testing animals for synchronization to a musical beat, several important methodological issues arise (see
Patel et al., 2009b for a detailed discussion of 11 such issues). A primary issue is eliminating possible
visual, auditory, or tactile rhythmic cues from humans, which could lead to a “Clever Hans” effect. This
concern particularly applies to horses, which are capable of sensing human cues in many modalities (e.g.,
Proops et al., 2010; Saslow, 2002). In particular, the horse’s sensitivity to tactile cues means that an
essential requirement for studies of BPS is that the horse not be ridden. In addition, any humans in close
proximity to a horse during testing should be “deaf” to the music that the horse is hearing so that they
cannot give inadvertent cues.
Another primary issue is that the animal be tested for tempo flexibility, i.e., for its ability to
synchronize to music at different tempi. The danger with using a single tempo is that a horse could appear
to synchronize to music if the music tempo happened to match its natural trotting tempo, even without any
auditory-motor coupling. In prior research on BPS in another species (a cockatoo), testing for tempo
flexibility was accomplished by choosing a single song with a clear beat and creating versions at several
different tempi using software that allows changing tempo without changing pitch (Patel et al., 2009a). In
that prior work, 11 different versions of a song were created ranging from 20% slower to 20% faster than
the original. The use of different tempi allows one to test for tempo flexibility, a key feature of BPS, and is
also crucial for the statistical permutation tests used to check for true synchronization to a musical beat, as
discussed further below. For certain animals, the number of tempi used might need to be fewer, depending on
the particular motor abilities of the animal. For example, with horses one should ensure that the horse is
capable of trotting as slow as the slowest tempo and as fast as the fastest tempo. In general, we suggest that
no fewer than five distinct tempi be tested (e.g., -15%, -7.5%, 0, +7.5%, and +15% relative to the original
musical piece). According to one source (Gallo, 2007), the average trotting tempo for horses is 152 beats
per minute [BPM], based on the timing of footfalls of the front legs. That is, if one measures the number of
steps per minute as the horse trots, counting each front leg footfall as a step, the average rate is 152 steps
per minute, though this number depends on the particular horse. Author DL feels that the comfortable
range of trotting for horses likely spans about 140 – 200 BPM. Thus if using a tempo range of +/-15%
relative to the original musical piece, a base tempo of about 165 BPM (e.g. “Hound Dog” by Elvis Presley)
would ensure that the slowest and fastest tempi fall within this range.
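As a quick sketch of this suggestion (the 165-BPM base, the ±15% shifts, and the 140–200 BPM comfort range are all taken from the text; the code itself is only illustrative), one can verify that the five suggested tempo versions stay within the comfortable trotting range:

```python
# Suggested tempo manipulation: five versions of a song spaced at
# -15%, -7.5%, 0, +7.5%, and +15% around an assumed 165-BPM base,
# checked against the ~140-200 BPM comfortable trotting range
# estimated by author DL.

BASE_BPM = 165.0
SHIFTS = [-0.15, -0.075, 0.0, 0.075, 0.15]
COMFORT_LOW, COMFORT_HIGH = 140.0, 200.0

tempi = [BASE_BPM * (1.0 + s) for s in SHIFTS]
print(tempi)

# All five versions should remain trottable for the horse.
assert all(COMFORT_LOW <= t <= COMFORT_HIGH for t in tempi)
```

The slowest version comes out at about 140 BPM and the fastest at about 190 BPM, so the whole set falls inside the estimated comfort range.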


A final issue is the use of objective methods to measure synchronization to a musical beat. Simply
examining videos and making perceptual judgments of synchrony is not a valid way to study BPS in other
species, as humans have a tendency to perceive synchrony between rhythmic auditory and visual stimuli
even when it is not objectively present (Arrighi, Alais, & Burr, 2006). Thus objective measures, such as
frame-by-frame coding of the animal’s movement from video (with the sound track turned off) are needed.
Some portion (ideally all) of the data should be coded independently by at least two researchers in order to
measure inter-rater reliability.

The circular trotting to music (CTM) method

The circular trotting to music (CTM) method is a way of testing whether horses spontaneously synchronize
their trotting to a musical beat. This method involves a horse trainer and an equipment controller working
with one horse in an indoor enclosure (to minimize background noise). The trainer stands at the center of
the enclosure and the horse trots in circles while music is played over loudspeakers in the arena. To prevent
the trainer from hearing this music, the trainer wears sound-isolating insert earphones and closed-ear noise-
protecting earmuffs (e.g., of the type used when operating loud power equipment). The earphones are
connected to a portable digital audio device, and throughout the experiment the trainer listens to music
without any beat, such as meditation music, at a sound level sufficiently high to mask out the music heard
by the horse. The equipment controller triggers the start of experimental trials by starting the music and
video camera, at which point s/he gives a visual cue to the trainer that an experimental trial has begun.
Further information on the roles of the trainer and equipment controller are given below. Trials from the
current study can be viewed online (see Appendix 2), and the reader is encouraged to watch these videos in
conjunction with reading the information below. Our study focused on a young adult male horse named
“True” (Breed: Morgan), who was approximately 6 years old. Testing took place in an indoor enclosure
measuring approximately 75' x 125', with a dusty sand floor (Horse Quarters in Davis, CA). The study took
place in March of 2010.

ROLE OF THE TRAINER

The trainer interacts with the horse during the experiment. Specifically, once the music for a given trial has
started, the trainer gives the horse a “go” signal to start trotting in circles around the trainer. The horse
continues trotting until it elects to stop or the music ends (in which case the trainer gives a “stop” signal).
The trainer can give the horse a food reward at the end of trials if needed.
In our study, the trainer and horse were connected by a long slack rope: a 22-ft “lead
line” (pronounced “leed” line) of yachting braid, made by Parelli Natural Horsemanship and used in
conjunction with a 1/4" yacht-braid rope halter. The trainer held one end of this rope with one hand, and
with the other hand held a thin rigid pole connected to a flexible whip (in Parelli terminology, a “carrot
stick and string”, Figure 2). Using body language, the trainer gave the “go” signal to the horse using the
rope and/or whip. The horse was allowed to circle the trainer either clockwise or counterclockwise,
depending on its preference, which could vary from trial to trial. As the horse circled (typically at a
distance of about 10’ from the trainer), the trainer turned in place to maintain eye contact with the horse.
The trainer continued to hold the rope while the horse circled, but no tension was applied to the rope, which
often dragged lightly on the ground, meaning that the horse set its own pace. The whip was never applied
once the horse started trotting, and thus was not a source of rhythmic cues. On five trials in the current
study the horse trotted around the trainer “at liberty”, meaning that no rope connected the horse and trainer
(the trainer still turned in place to face the horse as it trotted). Also, on three trials the horse started trotting
before the music started playing. In future work, the use of a lead line (vs. liberty) and the onset of horse
trotting relative to the music (before or after music onset) should be made consistent.


Fig. 2. Screenshot from video of an experimental trial, showing the horse and trainer. Please see the online
videos for full experimental trials.

ROLE OF THE EQUIPMENT CONTROLLER

The equipment controller handles audio playback and video recording. S/he stands at a table some distance
from the horse and trainer. The music is played over high-quality, powered portable speakers at a sound
level that is clearly audible at the horse’s position. The video camera, mounted on a tripod, is used to film
all trials and should give a good view of both horse and trainer, with the camera angle set so that the horse’s
feet are clearly visible (since the timing of footfalls is used to measure synchronization of movement to the
beat). Once the music and video camera have been triggered for a trial, the equipment controller should
step behind a blind to ensure that s/he is invisible to the horse.
In our study, music was presented via a JBL EON-10 powered speaker (280 Watts, dimensions 19”
x 12” x 10”, with 10” woofer, 1” tweeter, and frequency range of 58Hz-18.5kHz), connected to an iPod
Classic (8 GB). Trials were filmed with a Canon Optura 20 digital video camera (frame rate = 30 frames
per second [fps]; to achieve higher-resolution analysis of animal movements, future work should use
cameras with higher frame rates, e.g., 100 fps or more). The equipment controller and video camera were
located about 50’ from the horse at the closest point of his trotting circle; the audio speaker was on a
viewing platform on the side of the arena, 5’ off the ground (about 15’ from the equipment controller and
55’ from True at the closest point of his trotting circle). We did not use a blind for the equipment
controller, which raises the possibility that the horse may have picked up on inadvertent rhythmic cues from
the controller, who heard the same music as the horse. Author DL (an experienced horse trainer who served
as a trainer in the current work) feels it is unlikely that True sensed visual cues from the equipment
controller, because True was focused on his trotting circle, and because the controller was out of view of
True for at least 1/3 of the circle while True was heading away from the controller. However, future work
should use a blind to eliminate the possibility of cues from the equipment controller to the horse. An alternative to
video coding might be to use an accelerometer attached to the horse, which has been used to capture human
rhythmic movement to music (Phillips-Silver et al., 2011).

CHOICE OF MUSIC AND NUMBER OF TRIALS

Music that conveys a clear, steady beat should be selected for research on animal BPS. Much pop music is
suitable for this purpose. It may be advantageous to use music familiar to the animal. For example, in the
case of horses, a song that the horse has heard in its barn or during training or showing could be chosen.
Care should be taken that the music does not sound muddy at the horse’s position (e.g., due to poor quality
audio speakers or echoes in the arena). As noted above, versions of the music at several different tempi
should be presented. Furthermore, multiple “usable” trials should be gathered at each tempo, where a
“usable” trial is a trial where the horse trots continuously for a sustained amount of time while the music is
on (e.g., at least 50 steps). For example, if 5 distinct tempi are used, it would be desirable to have at least
two or three usable trials at each tempo.
In our study two songs were used: “No One Emotion” by George Benson (tempo = 155 BPM,
duration = 3:58) and “The Rhythm of the Ride” by Mary Ann Kennedy (tempo = 156 BPM, duration =
2:45; note that tempi reported here were obtained from the beat tracking algorithm discussed in the Data


Analysis Method section below). These songs were chosen because they had a strong beat and because
author DL, who had prior experience with True, felt that True could trot comfortably at these tempi.
Due to time constraints, we were not able to conduct trials with tempo-manipulated versions of the
songs. Thus we cannot conduct the statistical permutation tests needed to determine if there is significant
synchronization to the beat, although such tests are explained in the section on data analysis below. As
noted at the beginning of this paper, these data are not sufficient to determine if True showed significant
synchronization to a musical beat, but they are sufficient for an explanation of data analysis methods for
testing BPS in horses. These methods are described below.

Data analysis method

The experimental video was segmented into individual trials based on the start and end of the music
stimulus on each trial. For each trial, the video and audio tracks were separated so that an audio file and a
silent video file were obtained for further analysis, using AviSynth (avisynth.org).

AUDITORY BEAT EXTRACTION

The auditory beat was extracted for each trial using a dynamic programming algorithm in Matlab (Ellis,
2007). Using this method, we obtained a sequence of music beat times for each of the experimental trials.
To verify the accuracy of the beat tracker, we synthesized beeps representing each beat and overlaid them
on top of the audio track. Any segments where the musical beat and beat tracker diverged were excluded
from the analysis. In our data, one trial where beat tracking was poor was excluded.

VISUAL RHYTHMIC MOVEMENT EXTRACTION

The timing of the horse’s left and right front footfalls was used as a measure of rhythmic movement. The
timing of each footfall was extracted from the silent videos by two independent coders. For each segment,
the coders recorded the frame number in which a footfall occurred, as defined by the point where either the
left or right hoof made full contact with the ground (each footfall was labeled as a left or right footfall, but
these labels were not used in the analysis: rather, the time sequence created by the alternation of the front
two feet provided the movement series for analysis). Inter-rater reliability was very high (90% of the frame
numbers recorded by coder 2 were within 1/15th of a second of coder 1’s, and 99% were within 1/10th of a
second), so one coder’s data were used for all of the reported analyses. By converting frame numbers
to time, we were able to obtain a sequence of footfall times (in seconds) for each segment. The total
number of coded footfalls per trial ranged from 57 to 156 (mean = 101.0).
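The coding pipeline just described can be sketched as follows. The frame numbers below are invented for illustration, and the simple matched-pair tolerance check is an assumption about how an agreement figure of this kind could be computed; it is not the authors' code:

```python
# Hypothetical sketch: footfall frame numbers from two coders are
# converted to times at 30 fps, and agreement is the fraction of
# matched footfalls whose coded times differ by at most a tolerance.

FPS = 30.0

def frames_to_times(frames, fps=FPS):
    """Convert video frame numbers to footfall times in seconds."""
    return [f / fps for f in frames]

def agreement(times1, times2, tol):
    """Fraction of matched footfalls whose coded times differ by <= tol
    seconds (assumes both coders labeled the same footfalls in order)."""
    n = min(len(times1), len(times2))
    hits = sum(1 for a, b in zip(times1, times2) if abs(a - b) <= tol)
    return hits / n

coder1 = frames_to_times([12, 24, 37, 49, 61, 74])
coder2 = frames_to_times([12, 25, 37, 52, 61, 75])

print(agreement(coder1, coder2, tol=1/15))  # 5 of 6 footfalls within 1/15 s
```

At 30 fps, a 1/15-second tolerance corresponds to a two-frame disagreement between coders.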

STATISTICAL SYNCHRONIZATION ANALYSIS: IDENTIFYING SYNCHRONIZED BOUTS

Following the methods of Patel et al. (2009a), the statistical test of BPS is divided into two steps. The first
step is to examine each trial for bouts of synchrony (i.e. time windows when footfalls and musical beats are
aligned). This process was conducted on the current study, and is described in this subsection. The second
step (not possible in the current study, due to lack of trials at different tempi) is a statistical permutation test
that provides the true test of BPS. This permutation test is further described in the next subsection.
To look for synchronized bouts within a trial, the timing of each footfall was converted into a
relative phase (RP) value within the time interval bounded by the preceding and succeeding musical beats
(Figure 3). For example, if a musical beat occurred at 1.0 and 2.0 seconds and a footfall at 1.25 seconds, the
footfall’s RP would be coded as 90 degrees (1/4 of the beat interval after the beat); whereas a footfall at
1.50 seconds would be labeled as 180 degrees (1/2 of the beat interval), and a footfall of 1.75 seconds
would be coded as -90 degrees (1/4 of the beat interval before the beat) Thus a sequence of N footfall
times in a given trial would be converted to a sequence of N RP values.
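The RP conversion just described can be sketched in code. This is an illustrative implementation, not the authors' own; here footfalls lying before the first or after the last coded beat are simply skipped.

```python
from bisect import bisect_right

def relative_phase(footfall_times, beat_times):
    """Convert footfall times to relative phase (RP) in degrees.

    Each footfall is placed within the beat interval bounded by the
    preceding and succeeding musical beats; phase is expressed relative
    to the nearer beat, so values fall in (-180, 180]. Footfalls before
    the first or after the last coded beat are skipped.
    """
    phases = []
    for t in footfall_times:
        i = bisect_right(beat_times, t) - 1   # last beat at or before t
        if i < 0 or i >= len(beat_times) - 1:
            continue
        frac = (t - beat_times[i]) / (beat_times[i + 1] - beat_times[i])
        deg = 360.0 * frac          # 0..360, measured from the preceding beat
        if deg > 180.0:
            deg -= 360.0            # negative = footfall before the nearest beat
        phases.append(deg)
    return phases
```

Applied to the worked example above (beats at 1.0 and 2.0 seconds), footfalls at 1.25, 1.50, and 1.75 seconds map to 90, 180, and -90 degrees respectively.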

150
Empirical Musicology Review Vol. 7, No. 3-4, 2012

Fig. 3. Illustration of relative timing of footfalls and musical beats. Four footfalls (vertical grey lines) and
four musical beats (black dots) are depicted (FL = Front left foot, FR = front right foot). The first two
footfalls have positive relative phase (40 and 20 degrees) since they occur after the nearest musical beat.
The third footfall has zero relative phase since it aligned in time with the musical beat. The fourth footfall
has negative relative phase (-20 degrees) since it occurs before the nearest musical beat.

Once the RP sequence for a given trial was computed, we searched for bouts of synchrony by performing a
windowed analysis on each sequence. Specifically, the first eight RP values were examined with a phase-
sensitive Rayleigh test (Fisher, 1993) to test if footfalls and beats were synchronized. Then, the window
was shifted by 4 RP values and this test was repeated (i.e., window 1 was RP values 1-8, window 2 was RP
values 5-12). This process of window-shifting and statistical testing was repeated until the end of the RP
sequence was reached. A synchronized bout was defined as at least two consecutive windows (i.e., a
minimum of 12 total RP values) with p<0.05.
How does the phase-sensitive Rayleigh test work? Briefly, this test uses a sequence of RP vectors
to compute a mean vector, which can be represented on the unit circle (Figure 4). After computing this
mean vector, the length of this resultant vector projected onto zero phase (the horizontal axis) provides a
measure of synchrony. More intuitively, the test examines whether the RP vectors tend to cluster around
zero phase, as would be expected if footfalls and beats were in synchrony and phase-matched.

Fig. 4. Illustration of 8 relative phase values depicted as light blue vectors on the unit circle. The numbers
around the circle are degrees (0 represents perfect alignment of footfall and musical beat; negative numbers
indicate footfalls preceding the nearest musical beat; positive numbers indicate footfalls following the
nearest beat). Note that two vectors have a very similar value (near 3 degrees) and hence overlap in this
image. The dark red arrow is the mean vector of these eight phase vectors, and the black arrow is a
projection of the mean phase vector onto the zero phase axis.
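As a rough sketch of this procedure (not the authors' code), the phase-sensitive Rayleigh test can be implemented as the V-test described by Fisher (1993), using the large-sample normal approximation for the p-value, with the windowed bout search layered on top:

```python
import math

def v_test_p(phases_deg, mu_deg=0.0):
    """Phase-sensitive Rayleigh (V-) test, after Fisher (1993).

    Tests whether circular data cluster around the target direction
    mu_deg (zero phase here, i.e., footfalls aligned with beats).
    Uses the large-sample approximation u = V * sqrt(2n) ~ N(0, 1)
    and returns a one-sided p-value (small p = significant clustering).
    """
    n = len(phases_deg)
    rads = [math.radians(p) for p in phases_deg]
    c = sum(math.cos(r) for r in rads) / n   # mean cosine component
    s = sum(math.sin(r) for r in rads) / n   # mean sine component
    mu = math.radians(mu_deg)
    v = c * math.cos(mu) + s * math.sin(mu)  # mean vector projected onto mu
    u = v * math.sqrt(2.0 * n)
    return 0.5 * math.erfc(u / math.sqrt(2.0))  # upper tail of N(0, 1)

def find_bouts(phases, win=8, step=4, alpha=0.05, min_windows=2):
    """Windowed search for synchronized bouts over an RP sequence.

    Window 1 covers RP values 1-8, window 2 covers 5-12, and so on; a
    bout is at least min_windows consecutive windows with p < alpha.
    Returns the starting window index (0-based) of each bout.
    """
    sig = [v_test_p(phases[i:i + win]) < alpha
           for i in range(0, len(phases) - win + 1, step)]
    bouts, run = [], 0
    for i, ok in enumerate(sig):
        run = run + 1 if ok else 0
        if run == min_windows:          # record each bout once, at its onset
            bouts.append(i - min_windows + 1)
    return bouts
```

Phases tightly clustered near zero yield small p-values, phases clustered in antiphase (near 180 degrees) do not, and a uniform spread of phases yields p near 0.5.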

STATISTICAL SYNCHRONIZATION ANALYSIS: TESTING FOR BPS WITH A PERMUTATION TEST

While the phase-sensitive Rayleigh test described above can identify synchronized bouts within each trial,
one must consider the null hypothesis that such bouts occur by chance (cf. Patel et al., 2009b, subsection
“Could synchronization have happened by chance?”): that is, the hypothesis that the animal moves
rhythmically in response to music and that, because of natural variability in movement tempo, there are
periods when (by pure chance) the movements have a consistent relationship to the beat.
Patel et al. (2009a) used a permutation test to check this null hypothesis. The essential idea behind this test
is to count the total number of movements (across trials) that are part of synchronized bouts, then to
randomly pair each sequence of footfall timings with a musical beat time sequence from a different trial.
The permuted data are then analyzed for synchronized bouts, and the total number of movements in such
bouts is computed. If synchronized movements in the original data are largely due to chance, then there
should be a comparable number of synchronized movements in the permuted data. To make this a valid
statistical test, the process of permutation and synchronized-movement-counting must be done many times
(e.g., 1,000), to create a distribution of values of the number of synchronized movements that are found by
chance. The actual number of synchronized movements in the original data is then compared to this
distribution to compute the probability (p-value) that the observed amount of synchrony is a chance result
(for an application of this method to real data from another animal species, see Patel et al., 2009a).
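The logic of this permutation test can be sketched as follows. This is an illustration of the approach, not the authors' code; the count_sync argument stands in for the full bout-finding analysis described above, and the simple shuffle used here occasionally re-pairs a trial with its own beats, which is tolerated in many shuffling schemes.

```python
import random

def permutation_test(trials, count_sync, n_perm=1000, seed=0):
    """Permutation test for chance synchronization (sketch of the
    approach in Patel et al., 2009a).

    trials: list of (footfall_times, beat_times) pairs, one per trial.
    count_sync: function returning the number of footfalls in
        synchronized bouts for one footfall/beat pairing (e.g., the RP
        conversion plus windowed Rayleigh test described above).
    Each permutation re-pairs the footfall series with beat series
    drawn from the shuffled set of trials; the p-value is the fraction
    of permutations matching or exceeding the observed count.
    """
    observed = sum(count_sync(ff, bb) for ff, bb in trials)
    rng = random.Random(seed)
    beat_series = [bb for _, bb in trials]
    exceed = 0
    for _ in range(n_perm):
        shuffled = beat_series[:]
        rng.shuffle(shuffled)   # a trial may occasionally keep its own beats
        total = sum(count_sync(ff, bb)
                    for (ff, _), bb in zip(trials, shuffled))
        if total >= observed:
            exceed += 1
    return exceed / n_perm
```

If synchrony in the original data were purely coincidental, the observed count would fall well within the permuted distribution and the returned p-value would be large.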
For the permutation test to be meaningful, one must have trials at several different tempi, because
this ensures variability in the structure of the musical beat time-series across trials. Since both of our
musical stimuli were very close in tempo, and we were unable to conduct trials at different tempi, we could
not apply the permutation test to our data. Future work with horses should endeavor to include at least five
different tempi in the experiment, as noted previously in this paper. Synchronized bouts may not be found
at all tempi, but if they are found at several tempi then the permutation test provides a way of testing
whether the observed amount of synchrony is due to chance.

Results from the current data

We collected data from 16 trials and analyzed 15 of them (one trial was excluded since we were unable to
get an accurate result from the beat-tracking algorithm). Trials varied in length from 21.5 - 62.7 seconds
(mean = 41.3 seconds; trial onset was defined as the point when True started trotting to the music, and
offset as the point when he ceased trotting, which always occurred before the music ended). The number of
footfalls per trial ranged from 57-156 (mean = 101 footfalls per trial, total number of footfalls across trials
= 1537). The two musical stimuli used for this experiment were very similar in tempo (both had tempi
between 155.05 and 156.81 BPM when measured using the beat tracker).
After converting the footfall and musical beat timing data to sequences of relative phase values (as
described above), we found that nearly all trials (14/15) had at least one synchronized bout (range: 1-4
bouts, mean: 1.79 bouts per trial). The number of footfalls that were part of synchronized bouts ranged
from 12-72, with a mean of 35.42 synchronized footfalls per trial in the 14 trials with bouts (on these trials,
an average of 33.9% of the footfalls were part of synchronized bouts). The total number of synchronized
footfalls across all trials was 496. To test whether this amount of synchrony could occur by chance, and to
test for tempo flexibility (a key feature of BPS), one would need data from trials at several different tempi,
as discussed in the previous section.
Figure 5 shows relative phase data from one trial with a single bout of 40 synchronized footfalls.
The synchronized bout is indicated with a black box. One notable aspect of this figure is that when True’s
tempo is slightly faster or slower than the music’s, the relative phase drifts slowly from -180 to +180
degrees (“phase wrapping”), but during the bout it stabilizes near zero, indicating footfalls at the same
tempo as the music and synchronized with the musical beat.
In summary, True exhibits synchronized bouts, but given the limitations of this dataset it is not
possible to say whether or not this is a coincidence. Testing genuine synchronization to a musical beat in
horses awaits future research that includes more experimental trials and a range of musical tempi.

Fig. 5. Time series of relative phase values from one trial of the current study. The 40 values within the
black box are part of a synchronized bout. Relative phase (in degrees) is plotted on the vertical axis with
zero indicating perfect alignment of footfalls and musical beats. Negative values indicate footfalls that
occur before the nearest beat, positive values indicate footfalls that occur after the nearest beat, and
values of +/-180 indicate footfalls that are midway between beats (i.e., in antiphase with the beat).

DISCUSSION

We have presented a novel experimental method for testing synchronization to a musical beat in horses, and
reviewed data analysis methods using pilot data collected from one horse. It is our hope that the methods
presented here will be used to test the “vocal learning and rhythmic synchronization” hypothesis, which
predicts that horses (and other vocal nonlearners) cannot synchronize their movements with a musical beat.
Given current interest in human-horse interactions involving music (such as the Equus Projects,
www.dancingwithhorses.org) and the scientific study of how animals process music (reviewed in Patel &
Demorest, 2013), hopefully such research will be conducted soon.
In closing, we briefly touch on a few conceptual issues relevant for future research on BPS in
horses. The first is that there are over 300 horse breeds; the results obtained may depend on which breed is
tested. Demonstration of BPS in even one breed would be sufficient to challenge the vocal learning
hypothesis. The second issue concerns a conceptual distinction between tempo sensitivity and BPS. It may
be that horses trot faster to fast music and slower to slow music, without showing true synchronization of
movements to a musical beat. This would show tempo sensitivity, but not BPS. BPS involves the stable
temporal alignment of rhythmic movements and musical beats, and further, is demonstrable at a range of
different musical tempi (cf. Patel et al., 2009c). Finally, if after extensive testing it appears that horses are
not capable of BPS, it would be worth asking if their lack of vocal-learning brain mechanisms is
responsible, or if the key factor is the lack of some other ability also required for BPS. For example, the
propensity to engage in joint social action (i.e., coordinated movement patterns with conspecifics) and the
ability to imitate nonvocal movements may also be necessary foundations for BPS (Fitch, 2009; Patel et al.,
2009b; Schachner, 2010). If horses have these two traits but are not capable of BPS, this would provide
more specific support for the vocal learning hypothesis.

ACKNOWLEDGMENTS

We thank JoAnna Mendl Shaw of The Equus Projects for sharing her experience with horses, music, and
dance with author AP and for introducing him to author DL. We also thank Sarah Gardner for her
outstanding work as a research assistant, Sarah Gardner and Noah Friedman for coding True’s movements
from video, and Mark McLean for help with video recording. Supported by Neurosciences Research
Foundation as part of its program on music and the brain at The Neurosciences Institute, where AP was the
Esther J. Burnham Senior Fellow.

NOTES

[1] Corresponding Author. Department of Psychology, Tufts University, 490 Boston Ave., Medford, MA,
02155. Email: a.patel@tufts.edu

REFERENCES

Arrighi, R., Alais, D., & Burr, D. (2006). Perceptual synchrony of audiovisual streams for natural and
artificial motion sequences. Journal of Vision, Vol. 6, pp. 260–268.

Bispham, J. (2006). Rhythm in music: What is it? Who has it? And why? Music Perception, Vol. 24, pp.
125-134.

Brown, S., & Jordania, J. (2011). Universals in the world's musics. Psychology of Music, Advance online
publication. DOI: 10.1177/0305735611425896

Buzsáki, G. (2006). Rhythms of the Brain. New York: Oxford University Press.

Darwin, C. (1871). The Descent of Man, and Selection in Relation to Sex. London: John Murray.

Ellis, D. (2007). Beat tracking by dynamic programming. Journal of New Music Research, Vol. 36, pp. 51–
60.

Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge: Cambridge University Press.

Fitch, W.T. (2006). The biology and evolution of music: A comparative perspective. Cognition, Vol. 100,
pp. 173–215.

Fitch, W.T. (2009). Biology of music: Another one bites the dust. Current Biology, Vol. 19, pp. R403-R404.

Gallo, T.C. (2007). You’ve got the beat. Practical Horseman, Vol. 35, pp. 47-49.

Georgopoulos, A.P., Taira, M., & Lukashin, A. (1993). Cognitive neurophysiology of the motor cortex.
Science, Vol. 260, pp. 47–52.

Gould, S.J., & Vrba, E.S. (1982). Exaptation – a missing term in the science of form. Paleobiology, Vol. 8, pp.
4-15.

Hasegawa, A., Okanoya, K., Hasegawa, T., & Seki, Y. (2011). Rhythmic synchronization tapping to an
audio–visual metronome in budgerigars. Scientific Reports 1, 120; DOI:10.1038/srep00120.

Heffner, H.E., & Heffner, R.S. (1992). Auditory perception. In: C. Phillips & D. Piggins (Eds.), Farm
Animals and the Environment. Wallingford, UK: C.A.B. Intl., pp. 159-184.

Honing, H., & Ploeger, A. (2012). Cognition and the evolution of music: Pitfalls and prospects. Topics in
Cognitive Science, Vol. 4, No. 4, pp. 513-524.

Houpt, K., Marrow, M., & Seeliger, M. (2000). A preliminary study of the effect of music on equine
behavior. Journal of Equine Veterinary Science, Vol. 20, No. 11, pp. 691-737.

Jarvis, E.D. (2007). Neural systems for vocal learning in birds and humans: a synopsis. Journal of
Ornithology, Vol. 148 (Suppl. 1), pp. S35-S44.

Koelsch, S. (2011). Toward a neural basis of music perception – a review and updated model. Frontiers in
Psychology, Vol. 2.

McDermott, J.H., & Hauser, M.D. (2005). The origins of music: Innateness, development, and evolution.
Music Perception, Vol. 23, pp. 29–59.

McDermott, J.H., & Hauser, M.D. (2007). Nonhuman primates prefer slow tempos but dislike music
overall. Cognition, Vol. 104, pp. 654–668.

Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and musical culture. In:
N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music. Cambridge, MA: MIT Press, pp. 463–
472.

Patel, A.D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, Vol. 24, pp.
99-104.

Patel, A.D., & Demorest, S. (2013). Comparative music cognition: Cross-species and cross-cultural studies.
In: D. Deutsch (Ed.). The Psychology of Music, 3rd ed. San Diego: Academic Press/Elsevier, pp. 647-681.

Patel, A.D., Iversen, J.R., Bregman, M.R., & Schulz, I. (2009a). Experimental evidence for synchronization
to a musical beat in a nonhuman animal. Current Biology, Vol. 19, pp. 827-830.

Patel, A.D., Iversen, J.R., Bregman, M.R., & Schulz, I. (2009b). Studying synchronization to a musical beat
in nonhuman animals. Annals of the New York Academy of Sciences, Vol. 1169, pp. 459-469.

Patel, A.D., Iversen, J.R., Bregman, M.R., & Schulz, I. (2009c). Avian and human movement to music: Two
further parallels. Communicative and Integrative Biology, Vol. 2, pp. 1-4.

Patel, A.D., Iversen, J.R., Chen, Y.C., & Repp, B.H. (2005). The influence of metricality and modality on
synchronization with a beat. Experimental Brain Research, Vol. 163, pp. 226-238.

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, Vol. 6, pp. 688–
691.

Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piché, O., Nozaradan, S., Palmer, C., & Peretz, I. (2011).
Born to dance but beat-deaf: A new form of congenital amusia. Neuropsychologia, Vol. 49, pp. 961-969.

Policht, R., Karadios, A., & Frynta, D. (2011). Comparative analysis of long-range calls in equid stallions
(Equidae): Are acoustic parameters related to social organization? African Zoology, Vol. 46, pp. 18-26.

Proops, L., Walton, M., & McComb, K. (2010). The use of human-given cues by domestic horses, Equus
caballus, during an object choice task. Animal Behaviour, Vol. 79, pp. 1205-1209.

Rankin, S.K., Large, E.W., & Fink, P.W. (2009). Fractal tempo fluctuation and pulse prediction. Music
Perception, Vol. 26, pp. 401-413.

Rauschecker, J.P., & Scott, S.K. (2009). Maps and streams in the auditory cortex: Nonhuman primates
illuminate human speech processing. Nature Neuroscience, Vol. 12, pp. 718-724.

Saslow, C.A. (2002). Understanding the perceptual world of horses. Applied Animal Behaviour Science,
Vol. 78, pp. 209-224.

Schachner, A. (2010). Auditory-motor entrainment in vocal-mimicking species: Additional ontogenetic and
phylogenetic factors. Communicative & Integrative Biology, Vol. 3, pp. 1-4.

Schachner, A., Brady, T.F., Pepperberg, I., & Hauser, M. (2009). Spontaneous motor entrainment to music
in multiple vocal mimicking species. Current Biology, Vol. 19, pp. 831–836.

Snowdon, C.T., & Teie, D. (2010). Affective responses in tamarins elicited by species-specific music.
Biology Letters, Vol. 6, pp. 30-32.

Van Orden, K. (2005). Music, Discipline, and Arms in Early Modern France. Chicago: University of
Chicago Press.

Wells, D. (2009). Sensory stimulation as environmental enrichment for captive animals: A review. Applied
Animal Behaviour Science, Vol. 118, pp. 1-11.

Wells, D.L., Graham, L., & Hepper, P.G. (2002). The influence of auditory stimulation on the behavior of
dogs housed in a rescue shelter. Animal Welfare, Vol. 11, pp. 385-393.

Wilson, E.O. (Ed.). (2006). From So Simple A Beginning: The Four Great Books of Charles Darwin. New
York: W.W. Norton & Co.

Zarco, W., Merchant, H., Prado, L., & Mendez, J.C. (2009). Subsecond timing in primates: Comparison of
interval production between human subjects and rhesus monkeys. Journal of Neurophysiology, Vol. 102,
pp. 3191–3202.

APPENDIX 1

An anecdote about horse synchronization to a musical beat. The source is an Aug 11, 2009 essay “Using
music to train your horse”, from a blog about horse training (URL: http://www.carolynresnickblog.com/
using-music-to-train/). Bold font has been added to highlight critical sentences for this paper:
“Over the years, I have discovered that horses not only listen to music, they respond to music with
rhythmic strides. I also found the music increases their desire to work close with me in the training process.
At the age of 10, I began training horses for the show arena. I had also studied ballet most of my childhood
and planned to be a professional ballet dancer. However, when I reached the age of 18, I had to give up
dancing due to an inner ear problem that affected my balance. It was the natural choice to pursue a career in
showing horses. I fulfilled my need for dancing by listening to music when I was around the barn. If you
should drop by my ranch, you would hear anything from Swan Lake to country music. It wasn’t until years
later that I started playing music while I trained horses.
One day I was riding and listening to music when I noticed the horse I was riding was keeping tempo to the
music. It was uncanny. I wasn’t sure if my listening to music was influencing the manner in which I gave
the signals that I communicated to my horse or if he was choosing to let the music influence his steps.
Perhaps it was a little of both. Whatever it was, I this found unity we were sharing. Then I noticed horse
after horse that I rode was responding the same way to the music. I became more creative in my training
sessions asking the horse to perform the rhythm or various types of music. What I discovered when I took
this approach surprised me.
The horse learned their new elements much faster. They seemed to understand my need to stay within the
tempo. It was a marvelous discovery. I experimented and turned the horses loose on their own to see how
much I had influenced them. What I learned was amazing. On their own, each horse stepped to the
music perfectly like a metronome. Horse after horse, at liberty, ran, walked and trotted in time with
the music. The horses demonstrated that they were actively listening to the music and being attentive
to the tempo and rhythm.”
APPENDIX 2

Videos 1-6:
http://libeas01.it.ohio-state.edu/ojs/public/journals/8/EMR_v7_03-04_video1-6.html

Videos 7-12:
http://libeas01.it.ohio-state.edu/ojs/public/journals/8/EMR_v7_03-04_video7-12.html

Videos 13-16:
http://libeas01.it.ohio-state.edu/ojs/public/journals/8/EMR_v7_03-04_video13-16.html

If horses entrain, don’t entirely reject vocal learning:
An experience-based vocal learning hypothesis
ADENA SCHACHNER
Department of Psychology, Harvard University

ABSTRACT: Bregman and colleagues describe methods for testing whether horses
entrain their actions to an auditory beat. If horses can entrain, does this necessarily
imply that there is no causal relationship between vocal learning and entrainment? I
propose an alternative way in which vocal learning may relate to entrainment – one that
is consistent with entrainment in some vocal non-learning species. Due to engaging in
the developmental process of vocal learning, there may be early experiences common
to vocal learners, but rare in vocal non-learning species. It is possible that it is these
experiences that are critical for entrainment – not vocal learning itself, nor related
genes. These experiences may result in critical changes in neural development, leading
to the development of cognitive mechanisms necessary for both vocal learning and
entrainment. This hypothesis changes the causal story from one of genetic change to
one of changes in experience, and from a focus on evolution to a focus on individual
ontogeny. Thus, if horses can entrain, we should not immediately reject the idea of a
relationship between vocal learning and entrainment: First, we should consider whether
some unusual aspect of the horses’ experience effectively replicates the unusual
experiences of vocal learning animals.

Submitted 2013 Jan 11; accepted 2013 Jan 14.

KEYWORDS: entrainment, vocal learning, vocal imitation, evolution, development

THE vocal learning hypothesis predicts that only vocal mimicking animals should be able to entrain, or
synchronize their movements with an auditory beat. Thus far, comparative evidence involving a number of
species has borne out this prediction (Patel, Iversen, Bregman & Schulz, 2009; Schachner, Brady,
Pepperberg & Hauser, 2009). In the current paper, however, Bregman and colleagues describe anecdotal
reports suggesting that horses may be able to entrain as well, in spite of their lack of vocal learning ability.
This observation is important to test empirically: As Bregman notes, evidence of entrainment in horses
would refute the vocal learning hypothesis. To this end, Bregman and colleagues describe methods for
testing whether horses are able to entrain their actions to an auditory beat. By laying out necessary
information in a clear step-by-step manner, Bregman and colleagues make it possible for other authors with
access to horses to easily test for entrainment.
Individual animals within a species may differ in their tendency to entrain, and even a single
animal may not respond to all rhythmic stimuli or may at times fail to move in response to music (Patel et
al., 2009; Schachner et al., 2009). Thus, to have reasonable power to detect entrainment in a species, it is
necessary to obtain data from a large range of subjects, in a variety of different situations, with a variety of
different stimuli. By allowing for data collection from a larger range of subjects, the current paper serves a
valuable purpose.
In addition, the current paper points out the value of testing trained animals. Such animals may be
instructed to produce an isochronous, repetitive movement (such as walking), which may then potentially
become entrained. This method avoids a major risk of comparative entrainment research: the risk that the
subject fails to produce any movement whatsoever. If a subject remains still during entrainment testing, this
provides a null result, but also a complete lack of relevant data – thus, the null result is difficult to interpret.
In selecting new animals and new species to test for entrainment, it may be useful to begin by testing
animals which can be instructed to produce an isochronous, repetitive movement – one which has a wide
range of possible tempos, and thus may potentially become entrained.

IF HORSES ENTRAIN:
AN ALTERNATIVE TO THE VOCAL LEARNING HYPOTHESIS

If we find that horses can entrain, how might we make sense of the overall pattern of comparative data?
Would this finding necessarily imply that there is no causal relationship between vocal learning and
entrainment? I agree with Bregman and colleagues that such a finding would refute the current vocal
learning hypothesis. However, I believe that even in this case, it is possible that vocal learning may play a
role. Here I propose an alternative way in which vocal learning may relate to entrainment – one that is
consistent with entrainment in some vocal non-learning species.
In the original vocal learning hypothesis, Patel hypothesized that natural selection for vocal
learning gave us the required cognitive machinery for entrainment (Patel, 2006; Patel et al., 2009). This
hypothesis implies that the crucial changes were genetic in nature: certain genes emerged and were favored
through natural selection, and these genes were needed for the development of neural mechanisms used for
vocal learning and entrainment. Without these genes, the required brain mechanisms cannot develop, and
both vocal learning and entrainment are impossible. By this hypothesis, other factors, like having certain
experiences, should not be sufficient to allow vocal non-learning species to entrain.
However, there is an alternative way in which vocal learning may play a role: by shaping
experience. Due to engaging in the developmental process of vocal learning, there may be early
experiences that are common to all vocal learners, but rare in vocal non-learning species. For instance,
vocal learning birds and humans go through analogous stages of vocalization development (Doupe & Kuhl,
1999). During vocal learning, individuals develop fine motor control (over the vocal apparatus) and hone
their motor programs by comparing the sound of their own vocalizations to an auditory template stored in
memory, and engaging in error correction based on this real-time feedback (Konishi, 1965; Doupe & Kuhl,
1999).
If vocal learning causes animals to have certain common experiences, it is possible that it is these
experiences that are critical for entrainment – not vocal learning itself, nor related genes. These
hypothesized necessary experiences may result in critical changes in neural development, leading to the
development of cognitive mechanisms necessary for both vocal learning and entrainment. This process
would be analogous to that of reading, another capacity with domain-specific neural mechanisms that only
develop with specific experience (Dehaene, 2009).
If it is the unusual experiences of vocal learners that are critical for entrainment (instead of
specific genetic factors), this leaves open the possibility that the required experiences may be available to
some animals that are not vocal learners, through the introduction of unusual experiences. One possibility is
that the process by which trained horses (e.g. dressage horses) develop fine motor control, and learn motor
skills, may be meaningfully similar to the process of developing fine motor control for vocal learning. For
instance, the horses’ riders may train them to produce specific movements using real-time feedback and
error correction, perhaps engaging in a process similar to that which occurs naturally in vocal learning
species.
This alternative hypothesis changes the causal story from one of genetic change to one of change
in experience, and from a focus on evolution to a focus on individual ontogeny. However, this hypothesis
maintains an important role for vocal learning, positing that vocal learning leads animals to have certain
early experiences, which in turn cause them to develop the neural mechanisms necessary for entrainment.
Thus, if we do find that horses can entrain, we should not immediately reject the idea of a relationship
between vocal learning and entrainment. First we should consider the role of vocal learning in guiding early
experience, and ask whether some unusual aspect of the horses’ experience effectively replicates the
unusual experiences of vocal learning animals.

REFERENCES

Dehaene, S. (2009). Reading in the Brain: The Science and Evolution of a Human Invention. New York:
Viking Press.

Doupe, A.J., & Kuhl, P.K. (1999). Birdsong and human speech: common themes and mechanisms. Annual
Review of Neuroscience, Vol. 22, No.1, pp. 567–631.

Konishi, M. (1965). The role of auditory feedback in the control of vocalization in the white-crowned
sparrow. Zeitschrift für Tierpsychologie, Vol. 22, pp. 770–783.

Patel, A.D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, Vol. 24,
No.1, pp. 99–104.

Patel, A.D., Iversen, J.R., Bregman, M.R., & Schulz, I. (2009). Experimental evidence for synchronization
to a musical beat in a nonhuman animal. Current Biology, Vol. 19, No.10, pp. 827–830.

Schachner, A., Brady, T.F., Pepperberg, I.M., & Hauser, M.D. (2009). Spontaneous motor entrainment to
music in multiple vocal mimicking species. Current Biology, Vol.19, No.10, pp. 831-836.

A commentary on Micah Bregman et al.: A method for testing synchronization to a musical beat in
domestic horses (Equus ferus caballus)

SANDY S. VENNEMAN
University of Houston-Victoria

ABSTRACT: This commentary provides additional information related to equines and
suggestions for strengthening the proposed protocol for testing synchronization to a
musical beat in this species.

Submitted 2012 November 8; accepted 2012 November 8.

KEYWORDS: equine perception, synchronization, nuisance variables

INTRODUCTION

THE authors propose a methodology for testing, in domestic horses, the vocal learning and rhythmic
synchronization hypothesis put forward by Patel (2006), which predicts that horses are incapable of
rhythmic synchronization because they are vocal non-learners. However, anecdotal accounts of horses
spontaneously entraining to musical beats abound in equestrian circles. Since there has been no scientific study of
rhythmic synchronization, the authors propose a new experimental method which they term “circular
trotting to music”, or CTM, to allow for such scientific investigation. The authors explain that if horses are
capable of musical beat perception and synchronization (BPS), the vocal learning and rhythmic
synchronization hypothesis is not supported. Should such experimental data in horses be produced, it would
open a dialog regarding what substrates are central to this ability.
Their paper describes an experimental procedure to test whether horses are capable of BPS. My
commentary addresses this protocol in an effort to minimize nuisance variables that could impair
researchers’ ability to detect BPS if it occurs, and to provide equine terminology that is used
internationally.

PRIOR RESEARCH ON HORSES AND MUSIC AND PROPOSED RESEARCH PROTOCOL

Audition

Perceptual abilities of the test animal, in this case the horse, must be addressed: if the stimulus falls outside
those abilities, the experiment is invalid. The authors describe the auditory capabilities of horses
under the heading “prior research on horses and music.” Heffner (1992) is cited and an audiogram is
displayed that compares human and horse hearing thresholds. In a more recent paper, Heffner (1998) notes
that horses have an auditory range from 55 to 33,500 Hertz (Hz) and their best sensitivity is at a frequency
of 2,000 Hz and 7 decibels (dB). Given this information, the proposed protocol could be strengthened if
both frequency and intensity are experimentally controlled. Currently the protocol calls for music to be
played in an indoor arena over speakers with a range of 58 Hz to 18,500 Hz, but it makes no mention of
measuring the intensity (dB) of the music. Measuring and adjusting the decibels
of trial music at the level of the test horse’s ears would allow for consistency across multiple trials with the
same animal, or across animals in different environments.
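Level calibration of this kind is straightforward to standardize. The Python sketch below shows one way trial loudness could be checked and adjusted against a target level measured at the horse's ear; the 72 dB target and 2 dB tolerance are illustrative assumptions, not values from the protocol.

```python
def gain_adjustment_db(measured_spl_db, target_spl_db):
    """Gain change (in dB) needed to bring playback to the target level."""
    return target_spl_db - measured_spl_db

def within_tolerance(measured_spl_db, target_spl_db, tolerance_db=2.0):
    """True if the level measured at the horse's ear is close enough to target."""
    return abs(measured_spl_db - target_spl_db) <= tolerance_db

# Example: the SPL meter reads 68 dB at the ear; the (assumed) target is 72 dB.
print(gain_adjustment_db(68.0, 72.0))   # → 4.0 (raise the amplifier by 4 dB)
print(within_tolerance(68.0, 72.0))     # → False (re-measure after adjusting)
```

Logging the measured level for every trial would document consistency across sessions and animals.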
The proposed protocol calls for the use of an indoor arena to decrease external stimuli and thereby
help control for the possibility of the horse attending to sounds other than the independent variable of
music. The indoor arena could also minimize visual differences in the experimental environment that may
confound results (Heffner, 1998).

The authors note that the case study uses a gelding. This would minimize the impact of sex
hormones in this one experiment, but future researchers may want to include sex as a variable since
hormones have been shown to impact audition in multiple species (Al-Mana et al., 2008).

Vision

Visual stimuli can also impact the behavior of horses. One of the earliest psychological experiments
involved a horse named Clever Hans, who responded to very subtle cues from his handler by pawing out the
answers to arithmetic questions (Pfungst, 1911). When the person posing the question did not know the answer, or was
hidden from Clever Hans’ view, the horse could no longer answer correctly. The impact of
almost imperceptible signals was thus documented and has since been called the “Clever Hans Effect.” Hanggi
and Ingersoll (2012) demonstrated that horses have a visual field of nearly 360 degrees, so the
assumption in the case study that the controller was out of the horse’s view for one third of the circle is
likely inaccurate. However, the researchers have controlled for this confound in their current
proposed protocol by hiding equipment controllers from the horse. Perhaps more importantly, the authors
eliminate inadvertent rhythmic cues from the horse handler by proposing the use of headsets supplying
sounds that have no beat.

The circular trotting to music (CTM) method

The authors of this research need a procedure that eliminates as much human-horse communication
as possible. With a mounted rider, the horse can respond to multiple cues in addition to the auditory one
supplied by the experimenter. Educated equestrians use all four aids to communicate with their mounts:
hands, legs, seat, and voice. The seat (weight distribution and movement) is particularly influential, as a
horse will follow the rhythm of a rider even when that rhythm is not synchronized with its own. The size of the
horse’s stride is also shaped by a rider with an educated seat, which allows the rider to
influence the rhythm of the horse’s footfalls. The difficulty of using a mounted rider arises when the rider
unknowingly influences the movement of the mount with the aids. To eliminate seat and leg aids, the
authors employ an un-mounted technique termed lunging. They minimize the impact of the hands by
calling for a loose contact with the horse, and eliminate the voice by having the handler remain silent
during data collection.
The training of horses has been described in texts since ancient times, starting with Xenophon
(430–354 BC) (Xenophon, 1925). The current study protocol would be strengthened if it used classical
methods of training and terminology that are understood by equestrians internationally. In the proposed
protocol, the authors describe a procedure that is classically termed lunging, in which the horse moves
around the handler on a rope. This rope is generally termed a lunge line, rather than the stated “lead line”;
a lead is shorter and is used to lead a horse, not for groundwork. In the same paragraph they describe the
lunge line placing the horse ten feet from the handler. This distance is unsafe, as a horse may show
exuberance on the lunge and kick out, potentially injuring the handler. In addition, a circle of such small
radius will inhibit the movement of most horses, except for very small ones or ones with minimal stride
(step length). Common practice in lunging is to spend the majority of the session on a 20 meter circle,
with the horse ten meters from the handler instead of ten feet. Likewise, using traditional equipment and
terminology would increase confidence in the procedure: a lunge whip would be used in place of the
protocol’s “carrot stick and string” (Bryant, 2006).
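The practical difference between the two circle sizes can be made concrete with a little geometry. The sketch below compares the circumferences and, under an assumed working-trot stride of roughly 3 m (an illustrative figure that varies by horse), the number of strides available per lap.

```python
import math

FEET_TO_METERS = 0.3048

def circumference_m(radius_m):
    """Circumference of the lunging circle for a given radius in meters."""
    return 2 * math.pi * radius_m

small = circumference_m(10 * FEET_TO_METERS)  # horse 10 feet from the handler
large = circumference_m(10.0)                 # horse 10 m from the handler (20 m circle)

stride_m = 3.0  # assumed trot stride length; real values vary by horse and breed
print(round(small, 1), round(small / stride_m, 1))  # → 19.2 6.4
print(round(large, 1), round(large / stride_m, 1))  # → 62.8 20.9
```

Roughly six strides per lap on the ten-foot circle leaves little room for a steady rhythm, whereas the 20 meter circle allows about twenty.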
Future researchers could investigate the impact of the lunging technique by replicating with
circles of varying sizes and assessing their influence on the ability of the horse to use its body and entrain
to the musical rhythm. The authors also note that they conducted some trials off the lunge line. This variation
would help eliminate any unintentional communication from the handler through the lunge line. This
modification could call for the use of a lunging pen, also called a round pen, provided it were of ample size to avoid
restricting the movement of the horse. The horse in the case study was a Morgan and appears fairly
typical of the breed, being small and compact. Even at his small size, he is leaning toward the center of
the circle and falling on the forehand. Both indicate a lack of balance that might be reduced if the
procedure employed a larger circle.
A likely subject pool for further study would come from the discipline of horsemanship termed
dressage, which is the French word for training. Any horse can be improved through dressage training, but
some breeds have been selectively bred with qualities that fit its demands; these include warmbloods, which
are typically larger than the case-study Morgan. Increasing the size of the circle will enable horses with a
larger stride, as well as small ones, to remain in balance and move naturally. Additionally, on a less
restrictive circle, the time the horse remains in motion may increase, yielding more trials with
sufficient data for coding.
Dressage riding has a long history of using music in training, exhibition, and competition. Musical
rides are part of dressage competitions nationally and internationally, and are showcased at the Olympic
Games. The US national governing body for dressage, the United States Dressage Federation, sets
rules for scoring rides to music, termed musical freestyles. The current rulebook instructs the judge to
reward how well the music expresses the gaits of the horse and the appropriateness of the rhythm and
tempo of the music to the horse (USDF Member Guide, 2011). In the case study presented by the authors,
the owner of the horse picked the music. Experimental validity would suffer if each horse were tested to
different music in the experimental protocol. A strength of the proposed musical protocol, however, is
ensuring that the test music has a strong beat and that each horse has the physical ability to trot to
the chosen tempo. The authors’ proposal to use the same music and mechanically alter its tempo for
different trials also strengthens the protocol.
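A tempo-shifted stimulus set of this kind can be generated systematically. The sketch below builds a symmetric ladder of tempi around a base value; the 150 BPM base and 5% steps are hypothetical choices for illustration, not figures from the proposed protocol.

```python
def trial_tempi(base_bpm, step_percent=5, n_steps=2):
    """Ladder of trial tempi around a base tempo, e.g. -10%, -5%, 0%, +5%, +10%."""
    return [round(base_bpm * (1 + step_percent / 100 * k), 1)
            for k in range(-n_steps, n_steps + 1)]

# Assumed base: a horse trotting near 150 steps per minute.
print(trial_tempi(150))  # → [135.0, 142.5, 150.0, 157.5, 165.0]
```

Randomizing the order of these tempi across trials would further guard against order effects.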
The protocol also states that the handler should “maintain eye contact” with the horse. Horses
respond to non-verbal cues, such as gaze, from other horses as well as from handlers. Facing the horse obliquely
toward the rear, with the line of sight on the hindquarters, encourages forward movement, while
facing obliquely toward the head, with the line of sight on the horse’s eye, signals the horse to stop or slow.
Unspecified gaze could therefore add unwanted variation to trials. To reduce this variance, a simple
change to the protocol could create stability over trials and across participants: placing an adhesive target
on the horse’s hip and instructing the handler to look at the “target” would minimize variation due to this
non-verbal cue, as well as promote longer sets of trotting for sampling.

Data analysis method

The protocol description of recording the horse’s footfalls includes video, as in the case study, and a
proposal to alternatively use an accelerometer. The procedure calls for multiple researchers to code footfalls
frame by frame with the sound turned off. This eliminates the potential threat to validity from experimenter
expectancies that would manifest as coders unconsciously finding synchronization when there is none
(Cook & Campbell, 1979).
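Once footfall times have been coded from the silent video, synchronization can be quantified with standard circular statistics: each footfall is assigned a phase within the beat cycle, and the mean resultant length of those phases approaches 1 when the animal is phase-locked to the beat. The sketch below illustrates this common BPS analysis; it is not the authors' own procedure, and the footfall times and 120 BPM beat are invented example data.

```python
import math

def beat_phase(event_time, beat_period):
    """Phase of an event within the beat cycle, in radians [0, 2*pi)."""
    return 2 * math.pi * ((event_time % beat_period) / beat_period)

def mean_resultant_length(phases):
    """Circular concentration: 1.0 means perfect phase locking, ~0 means none."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return math.hypot(c, s)

# Footfall times (s) coded frame by frame, against a 0.5 s beat period (120 BPM).
footfalls = [0.02, 0.51, 1.03, 1.49, 2.01, 2.52]
phases = [beat_phase(t, 0.5) for t in footfalls]
print(round(mean_resultant_length(phases), 2))  # → 0.99 (tight clustering near the beat)
```

A Rayleigh test on the same phases would give a significance level for the clustering, and comparing trials at different tempi helps rule out coincidental matches.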
Humans show a strong tendency to favor the use of one hand over the other (“handedness”) and also
exhibit “footedness” from a young age (Berger et al., 2011). Like humans, horses exhibit a preferred or
dominant foot (Lucidi et al., 2013). Most horses will use their dominant foot to start an upward transition
(a change to a faster gait) if not cued by a rider. For this reason it might be beneficial to examine the beat on the
dominant forelimb in addition to the current protocol of counting the footfalls of both forelimbs.

DISCUSSION

The authors note that there are many breeds of horses and that breed may impact results. Horses of different
breeds generally have different temperaments and energy levels that could influence trials if not considered
and controlled through experimental procedures. For example, will the trials be run before the horse is
exercised for the day or after? Horses often “blow off steam” at the start of a training session and pay less
attention during this period than after they have settled down. It might be advisable to allow the horse a
specified warm-up period before collecting data, or to operationalize “warmed up” as the point at which the horse
voluntarily stops such excess movement. In addition to variation between breeds of horse, there is
considerable variation within breeds. We might logically extrapolate from humans that some horses may
have “musical inclinations” while others do not. A fairly large sample size would therefore be necessary to
investigate this issue adequately, ensuring the sample is not composed only of horses “with two left feet.”
In closing, the authors have proposed an interesting experimental design, CTM, to test whether
anecdotal accounts of horses exhibiting BPS can be supported. If BPS is experimentally supported in
horses, the vocal learning and rhythmic synchronization hypothesis (Patel, 2006) will need revision.

NOTES
[1] Corresponding author. Departments of Psychology & Biology, University of Houston-Victoria, 3007
North Ben Wilson, Victoria, TX 77995. Email: vennemans@uhv.edu

REFERENCES

Al-Mana, D., Ceranic, B., Djahanbakhch, O., & Luxon, L.M. (2008). Hormones and the auditory system:
A review of physiology and pathophysiology. Neuroscience, Vol. 153, No. 4, pp. 881–900.

Berger, S.E., Friedman, R., & Polis, M.C. (2011). The role of locomotor posture and experience on
handedness and footedness in infancy. Infant Behavior and Development, Vol. 34, No. 3, pp. 472–480.

Bryant, J.O. (2006). The USDF Guide to Dressage. North Adams, MA: Storey Publishing.

Cook, T.D., & Campbell, D.T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings.
Boston: Houghton Mifflin Company.

Hanggi, E.B., & Ingersoll, J.F. (2012). Lateral vision in horses: A behavioral investigation. Behavioural
Processes, Vol. 91, No. 1, pp. 70–76.

Heffner, H.E. (1998). Auditory awareness. Applied Animal Behaviour Science, Vol. 57, pp. 259–268.

Lucidi, P., Bacco, G., Sticco, M., Mazzoleni, G., Benvenuti, M., Bernabo, N., & Trentini, R. (2013).
Assessment of motor laterality in foals and young horses (Equus caballus) through the analysis of
derailment at trot. Physiology & Behavior, Vol. 109, pp. 8–13.

Patel, A.D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, Vol. 24, pp.
99–104.

Pfungst, O. (1911). Clever Hans: A Contribution to Experimental Animal and Human Psychology
(Translated by Rahn, C.L.). New York: Henry Holt and Company.

USDF Member Guide (2011). www.usdf.org

Xenophon (1925). Scripta Minora and Constitution of the Athenians. (E. C. Marchant & G.W. Bowersock,
Trans.) Cambridge, MA: Harvard University Press.

Aaron L. Berkowitz, The Improvising Mind: Cognition and Creativity in the Musical Moment.
New York: Oxford University Press, 2010.

Although the process of musical improvisation appears ineffable, its mysteries have recently come to
the attention of cognitive scientists. Aaron L. Berkowitz is a neuroscientist, composer, and practicing
pianist. In his recent book he aims, ambitiously, to explore cognition in musical improvisation. The author
combines manifold methodologies (from historical examination of treatises and interviews to brain
imaging) and draws on findings from a variety of disciplines. Some readers may find the impressive
number of quotations irritating, insofar as they sometimes distract from the narrative; on the other hand,
this makes the book a potentially valuable source of reference for researchers new to the field of musical
improvisation.
Berkowitz defines musical improvisation as “the spontaneous rule-based combination of elements
to create novel sequences that are appropriate for a given moment in a given context” (p. xix). To set even
more ambitious aims for his work, Berkowitz aims to go beyond music and provide insights into similarly
rule-based human sets of behavior, specifically spontaneous speech and language acquisition. Importantly,
readers interested in the embodied dimensions of music cognition (e.g. Leman, 2008) and musical
improvisation (e.g. Iyer, 2004) may be disappointed by Berkowitz’s definition, since it lacks reference to
the active role of the human body in musical processing.
This relatively short book consists of nine chapters and a summarizing coda. The work is
structured as two parts: Part I (Chapters 1–6) studies the knowledge base necessary for improvisation (see
below) in comparison to language learning; Part II (Chapters 6–9) explores cognition in improvised
performance. In the course of this review I describe and discuss its content.
The first, introductory, chapter of the book aims to define the concepts used throughout, while
defining and examining musical improvisation in terms of constraints, whether they are conventions,
stylistic constraints, or performance/performer constraints. Regarding the performance constraints,
Berkowitz turns to the works of Jeff Pressing (e.g. 1984), which are often referred to in the course of the
book. Using Pressing and ethnomusicologist Bruno Nettl’s ideas, Berkowitz points here to the existence of
the knowledge base necessary for any type of improvisation. He writes: “Improvised performance in any
tradition requires years of training to acquire the rules, conventions, and elements of the style that make up
the knowledge base” (p. 7). The rest of this chapter is devoted to brief descriptions of issues from the
psychology of learning and memory, including implicit and explicit learning and memory, and declarative
and procedural memory (the “knowing that” and “the knowing how”, respectively). The chapter concludes
with a short comparison of music and language, this being the starting point for more detailed deliberation
on this particular topic throughout the course of The Improvising Mind.
Chapter 2 is devoted to the examination of pedagogical treatises on keyboard improvisation for
amateurs and music students from the late 18th to the 19th centuries. It seeks the answer to questions
connected with the process of acquiring the knowledge base (or competence conceptualization) “in a
manner for spontaneous use in improvised performance” (p. 17). In the following Chapter 3 Berkowitz
describes the pedagogical approaches used by treatise writers to develop the improviser’s “brains in the
fingers” (p. 38), but does not refer to the embodied basis of music cognition. Returning to the analysis of
works of Pressing, Berkowitz writes:

Once installed in memory, however, the elements of the knowledge base must be
organized and refined – “enriched” in Pressing’s words – if the learner is to have the kind
of instant and creative access to them that is necessary for improvisation. The authors of
the treatises introduced in the previous chapter converged upon four basic pedagogical
strategies to accomplish these goals: transposition, variation, recombination, and the use
of models that exemplify these processes in musical context. Learning a formula in
various transpositions or varied realizations allows for acquaintance with the material “in
intimate details, and from different perspectives”, while learning how elements can be
recombined, provides for “cross-linking,” and “connections” between materials, to use
Pressing’s terminology. (p. 40)

In Chapter 3 Berkowitz focuses in detail on the abovementioned basic pedagogical strategies, drawing
interestingly on concepts and research from cognitive psychology. He discusses the process of automatization
as the outcome of repeated rehearsal in terms of the neurolinguistic work of Michel Paradis and of
John Anderson’s adaptive control of thought (ACT) model of learning in cognitive psychology.
Berkowitz then turns to the variation strategy, analyzed from the perspective of the
cognitive study of oral tradition undertaken by the psychologist David Rubin (1995) and from research in
the psychology of concepts (Eysenck & Keane, 2005). In this chapter Berkowitz also turns, interestingly, to the
philosophy of the medieval Spanish thinker Ramon Llull and his idea of the “ars combinatoria” (“the
method through which two or more elements can be combined”, later developed by the philosopher
Gottfried Leibniz), which led to the development of combinatorial principles in 18th-century music, in
the form of musical “dice games”. These games “were designed by composers for amateurs, and provided a
matrix of musical choices for each measure of a waltz, minuet, or other genres, with a number given to
each possible option for a particular measure” (pp. 64–65). The author further analyzes recombination from
developmental-psychology perspectives on language and statistical learning, finally turning to the work of
the music theorist Robert Gjerdingen (1988) and his musical “corpus” studies (analogous to corpus
studies in linguistics).
In Chapter 4, Berkowitz approaches learning to improvise in the Classical style from the
perspective of the learner. The author draws heavily on interviews he conducted with piano soloists who
have learned to improvise in this style, including Robert Levin and Malcolm Bilson. Chapter 5 compares
music and language cognition from the perspective of acquisition. Berkowitz focuses on the findings
discussed in the previous chapters of the book, and extends them by reference to research and theoretical
works on language learning. He aims to understand what a knowledge base is and how it is acquired. The
author does this by making a comparison of the learning processes in language and music while describing
the elements of their respective knowledge bases. Going further, he defines some of the terminology used
in the study of language and language acquisition, while looking for analogues to such concepts in music
(including phonology, morphology, syntax, semantics, and pragmatics). This chapter also contains a
discussion of theories of how these knowledge bases are acquired in language and music. Among these are
Chomskian nativist approaches to language acquisition and various empirical approaches (constructivism
and cognitive-functional usage-based linguistics).
Chapter 6 opens Part II of the book. Berkowitz explores the musical knowledge base described in
previous chapters and the way in which it can be used in performance, returning to interviews conducted
with Levin and Bilson. The author returns to the concepts of implicit/procedural and explicit/declarative
memory in relation to studies of amnesic patients. In Chapter 7 Berkowitz focuses on the neural correlates
of phenomena described in previous chapters. Interestingly, he starts by discussing his own study
(Berkowitz & Ansari, 2008) on the neural substrates of musical improvisation, together with a
complementary study by Limb & Braun (2008). The subjects of the first study were
classical pianists who were asked to perform, under controlled conditions, four different tasks designed to
provide varying degrees of improvisatory freedom. The latter study, by contrast, “…sought to
examine improvisation in as close to its real-world form as possible, providing a more panoramic view of
the full panoply of neural activity involved in improvising” (p. 143). Berkowitz concludes:

Since our study did not demonstrate activation changes in many of the frontal, temporal,
or limbic regions that were shown to be active in the study of Limb and Braun, it is
possible that these regions come into play only when true musical intent is present, from
moment to moment and/or in the attempt to create a musical narrative over a longer time-
span using one’s stock of musical materials, as was the case in their experiment. (ibid.)

Chapter 8 pursues the connection between improvisation and spontaneous speech, while comparing music
and language cognition from the perspective of production. Berkowitz analyzes the findings of Chapters 6
and 7 in the context of theoretical and neuroscientific studies of spontaneous speech. He aims to answer the
question of how the knowledge base is used in musical performance. The last chapter of the book takes the
Mozart-style cadenza as a case study. Berkowitz examines it from the perspectives of pedagogical treatises
on cadenzas and interviews with Robert Levin on cadenza improvisation.
To summarize, Berkowitz is interested in the knowledge base necessary for improvisation.
Although the title of the book suggests a broader understanding of (music) cognition, Berkowitz’s main
focus is on the following questions: (1) What is this knowledge (of what elements and processes is it
composed)? (2) How do we acquire and internalize this knowledge? (3) How is this knowledge used in
performance? Thus, although his ambition to use a range of different methodologies is admirable (including
the interviews that the author conducted himself), the book’s title may be misleading. Among numerous
references to classical books, Berkowitz limits his discussion of current empirical research to the two studies
examined in Chapter 7, only one of which he himself conducted. Interestingly, in his article on motor
sequences (Berkowitz & Ansari, 2008) he acknowledges the literature on embodiment (including influential
work on the human mirror neuron system
(Molnar-Szakacs & Overy, 2006)), but it is unclear why he chooses to refer solely to brain-centered
manipulations in his book.

Jakub Matyja
University of Huddersfield, UK

References

Berkowitz, A.L., & Ansari, D. (2008). Generation of novel motor sequences: the neural correlates of
musical improvisation. NeuroImage, Vol. 41, pp. 535–543.

Eysenck, M.W., & Keane, M.T. (2005). Cognitive Psychology: A Student’s Handbook, 5th edition. East
Sussex: Psychology Press.

Gjerdingen, R. (1988). A Classic Turn of Phrase: Music and the Psychology of Convention. Philadelphia:
University of Pennsylvania Press.

Iyer, V. (2004). Improvisation, temporality and musical experience. Journal of Consciousness Studies, Vol.
11, No. 3-4, pp. 159–173.

Leman, M. (2008). Embodied Music Cognition and Mediation Technology. Cambridge, MA: MIT Press.

Limb, C.J., & Braun, A.R. (2008). Neural substrates of spontaneous musical performance: An fMRI study
of jazz improvisation. PLoS ONE, Vol. 3, No. 2, e1679.

Molnar-Szakacs, I., & Overy, K. (2006). Music and mirror neurons: From motion to ‘e’motion. Social
Cognitive and Affective Neuroscience, Vol. 1, No. 3, pp. 235–241.

Pressing, J. (1984). Cognitive processes in improvisation. In: W.R. Crozier & A.J. Chapman (Eds.),
Cognitive Processes in the Perception of Art. Amsterdam: Elsevier.

Rubin, D. (1995). Memory in Oral Traditions: The Cognitive Psychology of Epic, Ballad, and Counting-out
Rhymes. New York: Oxford University Press.
