

qxd 01/13/2007 03:57 AM Page 315

Major and Minor




Kansai University
Osaka, Japan
triads is discussed in relation to the tension inherent to
triads showing intervallic equidistance. The fact that
tension triads resolve to minor chords with a semitone
increase and to major chords with a semitone decrease
reflects the well-known affective valence of rising and
falling pitches in speech and animal vocalizations (the
frequency code).
Key words: harmony, major, minor, tension, triads

In a recent review of Tone of Voice and Mind
(Cook, 2002), Scherer & Sander (2005) dismissed
what I think is a straightforward argument concerning the perception of musical harmony. Since the psychophysics of harmony was the thread that connected
the first five chapters and is essential for proper evaluation of the book, I address here the core issue of 2-tone
vs. 3-tone auditory psychophysics. Details of the model
and further experimental work on speech prosody have
appeared elsewhere (Cook, 2006; Cook & Fujisawa,
2006; Cook, Fujisawa, & Takami, 2006; Fujisawa &
Cook, 2006) and may answer some of the questions
raised by Scherer & Sander, but the central point concerning the inherent meaning of pitch movement in
music and speech can be stated succinctly.
To begin with, the main negative argument in Tone of
Voice and Mind was simply that context-free rises and
falls in pitch are an insufficient basis for explaining the
emotional valence of either musical harmony or speech
prosody. With regard to prosody, among the authorities
I cited in making that argument was of course Scherer,
who has frequently reported that the first-order statistics (mean, range, variability, etc.) on the acoustical signal alone do not distinguish between sadness, boredom,
and contentedness, on the one hand, or between joy and
anger, on the other hand, despite the fact that human
beings are rather good at hearing these emotions in

Music Perception








0730-7829, ELECTRONIC




spoken utterances even when the signal is degraded

to hide all linguistic content, played backwards or randomly spliced (e.g., Banse & Scherer, 1996; Scherer,
1986, 1995; Scherer, Johnstone, & Klasmeyer, 2003).
Although Scherer & Sander (2005) claim that their
recent work in press now indicates that affective
meaning is due to F0 level differences (p. 89), the
previous half-century of empirical work (some of the
best of which has come from Scherer's lab) suggests
that factors in addition to F0 level also will need to be
considered.
With regard to musical harmony, it is demonstrably
the case (Chapter 3 and Appendix 2 in Tone of Voice and
Mind; Cook & Fujisawa, 2006) that pitch intervals alone
cannot explain the affect of 3-tone harmonies (despite
contrary claims, e.g., Parncutt, 1989; Terhardt, 1974;
Tramo et al., 2001). That is, interval-based models (with
or without consideration of upper partials, masking
and fusion of tones, etc.) inevitably predict that some of
the inherently unresolved (augmented, diminished,
suspended fourth) triads are more sonorous (consonant, stable, etc.) than some or all of the resolved
(major and minor) triads, contrary to musical common sense and experimental results (e.g., Roberts, 1986).
If 2-tone pitch intervals alone are not the key to
affect, what are the relevant factors? To sort out the difficult problem of defining pitch contours, I argued in
Tone of Voice and Mind that it is easiest to begin with
diatonic music where fixed pitches are used and the
affective qualities of 3-tone harmonies are well known.
This was admittedly an unusual approach to the study
of speech prosody, but the proposed psychophysical
model of harmony provided an unambiguous and
quantitative method for examining the more problematical (non-scalar, ever-changing) pitch phenomena of
emotional speech. In the prosody literature, there are of
course many qualitative discussions of prosodic contours, but even in the music perception literature there
have been no previous quantitative discussions of 3-tone
psychophysics unencumbered by the concepts and
terminology of traditional harmony theory. It was the
complete absence of the relevant psychophysics of
3-tone harmony that demanded a detailed discussion in
Tone of Voice and Mind so that a connection between


1533-8312 2007







N. D. Cook

the pitch phenomena of music and speech could be

The main constructive argument in Tone of Voice
and Mind was therefore concerned with 3-tone harmonies. This topic was discussed in terms of Leonard
Meyer's (1956) idea about the tension produced by
intervallic equidistance, an argument that leads
directly to solutions to old problems in harmony theory. Specifically, instead of following Rameau and the
theorists of the Renaissance in building a theory of
harmony focused entirely on the sonority of the
major chord, if we start with the inherently unresolved tension of triads containing two equivalent
intervals (e.g., the augmented chord or diminished
chord in root position) and then bring the effects of
the upper partials into consideration, it is surprisingly easy to explain the regularities of major and
minor resolution and the sonority of all triads solely
on the basis of the relative size of the intervals contained in 3-tone chords. Such a model lies within the
empirical tradition of auditory psychophysics (and
explicitly rejects the radical ethnomusicological view
that diatonic harmony is a learned cultural artifact
with no acoustical basis), but it is one step above the
desperate reductionism of interval-based theories
that already have proved incapable of explaining even
the basics of traditional diatonic harmony.
Full details of our model and a comparison with
various interval models have been published subsequent to Tone of Voice and Mind, but the confusion in
the Scherer & Sander (2005) review indicates that further discussion concerning 3-tone psychophysics is
needed. Simply stated, Meyer's musical argument is
that the perception of two (melodically or harmonically) neighboring intervals of equivalent size produces
a sense of tonal ambiguity or tension that can be
resolved only by pitch changes resulting in unequal
intervals. Just as a dissonant interval of 1-2 semitones is
perceptually the most salient 2-tone combination (and
demands resolution toward unison or toward any of
several consonant intervals), the most salient 3-tone
harmonies are those where the tones are equally spaced
(and, in the Western tradition, demand resolution
toward a major or minor chord). Meyer suggested that
the perception of such tension in the diminished and
augmented chords (and chromatic scales, in general) is
a basic Gestalt concerned with the grouping of tones in
pitch space; we have found that, if the 3-tone spacing
among the upper partials also is considered, then the
relative sonority of all triads can be accounted for
quantitatively (Cook & Fujisawa, 2006; Fujisawa &
Cook, 2006).
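
The notion of intervallic equidistance can be illustrated with a minimal sketch. It is not the published model: it considers only the fundamental pitches of root-position triads and ignores the upper partials that the full account requires. The function names and the semitone encoding of the chords are assumptions made for this illustration.

```python
# Minimal sketch (not the published model): flag 3-tone chords whose two
# stacked intervals are equal in size. Chords are encoded as semitone offsets
# from the lowest tone; the names below are hypothetical helpers.

def interval_structure(chord):
    """Return the (lower, upper) interval sizes in semitones."""
    low, mid, high = sorted(chord)
    return mid - low, high - mid


def is_equidistant(chord):
    """True when the lower and upper intervals have the same size."""
    lower, upper = interval_structure(chord)
    return lower == upper


ROOT_POSITION_TRIADS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
    "augmented": (0, 4, 8),
}

for name, chord in ROOT_POSITION_TRIADS.items():
    label = "equal intervals (tension)" if is_equidistant(chord) else "unequal intervals"
    print(name, interval_structure(chord), label)
```

In this simplified form only the diminished and augmented chords are flagged. Chords such as the suspended fourth show equidistance only once the spacing among upper partials is taken into account, which this sketch does not attempt.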

What is most significant about intervallic equidistance, but where Scherer & Sander (2005) suddenly
turn light-hearted in detecting a "juicy argument," is
that it leads to a possible solution to the oldest conundrum in all of Western music: the affective valence of
major and minor harmonies. That is, if we take the
unsettled ambiguity of the tension chords as the reference point for discussing harmony, then major and
minor chords represent the only two possible directions
for resolution of the tension. The clearest example is the
augmented chord, where the rising or falling direction
of semitone movement of any tone determines the
mode of resolution: major (downward) or minor
(upward; Figure 1). It is a remarkable feature of diatonic harmonies that changes in any of the tension triads (diminished, augmented, and suspended fourth)
give similar results.
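
The direction of resolution can be checked mechanically. The sketch below is an illustration only, assuming chords encoded as semitone numbers and triads identified by pitch class, regardless of inversion. It shifts each tone of a root-position tension chord by one semitone and reports whether the result is a major triad, a minor triad, or neither. The function names are hypothetical.

```python
# Sketch: classify the chord obtained by moving one tone of a tension chord
# up or down by a semitone. Helper names and the encoding are assumptions.

def triad_mode(chord):
    """Return 'major', 'minor', or None for a 3-tone chord given in semitones."""
    pitch_classes = {p % 12 for p in chord}
    if len(pitch_classes) != 3:
        return None
    # Try each tone as the root and compare the pitch classes above it.
    for root in pitch_classes:
        relative = {(p - root) % 12 for p in pitch_classes}
        if relative == {0, 4, 7}:
            return "major"
        if relative == {0, 3, 7}:
            return "minor"
    return None


def semitone_shifts(chord, step):
    """Shift each tone in turn by `step` semitones and classify each result."""
    results = []
    for i in range(len(chord)):
        shifted = list(chord)
        shifted[i] += step
        results.append(triad_mode(shifted))
    return results


TENSION_CHORDS = {
    "augmented": (0, 4, 8),
    "diminished": (0, 3, 6),
    "suspended fourth": (0, 5, 7),
}

for name, chord in TENSION_CHORDS.items():
    print(name, "up:", semitone_shifts(chord, +1), "down:", semitone_shifts(chord, -1))
```

For the augmented chord, every upward shift gives a minor triad and every downward shift gives a major triad. For the diminished and suspended fourth chords in root position, some shifts give neither, but no upward shift gives a major triad and no downward shift gives a minor triad.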
Such structural arguments alone do not, however,
explain why major and minor chords are typically characterized as having positive and negative affect, respectively. The main musical argument in Tone of Voice and
Mind was therefore an attempt to answer that question
on the basis of the frequency code (or sound symbolism) of animal vocalizations. Briefly, it is known that
animals signal their social strength, aggression, and territorial dominance using vocalizations with a low
and/or falling pitch; conversely, they signal social weakness, defeat, and submission using a high and/or rising
pitch (Morton, 1977). Many other signals with species-specific meanings also are used, but it is the rising or
falling F0 that has been found to have cross-species generality and profound affective significance for any animal within earshot, regardless of night-time obscurity,
visual angle, or jungle obstructions. A falling F0 implies
that the vocalizer is not in retreat, has not backed down
from a direct confrontation, and has assumed a stance
of social dominance. Conversely, a rising F0 indicates
weakness. How and why this F0 signal has evolved, and
its correlation with facial expressions and vowel sounds,
have been discussed amply in the literature, but its reality is not in dispute. Unfortunately, Scherer & Sander
(2005) confuse the issue by noting the additional contributions of vocal energy, spectral quality, and timbre
(p. 88), all of which is true, but irrelevant to the fact
that, all else being equal, changes in F0 have intrinsic
meaning (which is why it is called the frequency code).
Interestingly, the frequency code is known to have
spilled over into human languages, where rising and
falling intonation have related, if greatly attenuated,
meanings concerning social status. Across diverse languages, falling pitch is again used to signal social
strength (commands, statements, dominance) and



FIGURE 1. A semitone increase in any tone of an augmented chord resolves to minor, whereas a semitone decrease resolves to major (interval structure is noted below each chord). This general pattern is found for all tension chords (augmented, diminished, suspended fourth in all inversions), unless
a semitone change results in dissonance (see Cook & Fujisawa, 2006, for a comprehensive discussion).

rising pitch to indicate social weakness (questions,

politeness, deference, and submission). As Ohala (1983,
1984, 1994) has emphasized, the sound symbolism of
pitch rises and falls is one of a very few cross-cultural
constants of human languages (see also Bolinger, 1978;
Cruttenden, 1981; Juslin & Laukka, 2003; Ladd, 1996;
Levelt, 1999), and demonstrates the importance of our
biological roots for our emotional lives.
To establish a link between the affect of animal vocalizations, human speech prosody, and music, however, it
is essential to consider the auditory contexts within
which pitch changes are heard. For both human and
animal vocalizations, the pitch context is provided by
the tonic or natural frequency of the speaker's voice.
The frequency code can therefore be stated solely as rising or falling pitch intervals from the tonic. The lion's
roar or the dog's growl or the boss's lowered register is
alone an indication of (intended) dominance. In diatonic
music, however, a range of equally possible tonic frequencies is available, so that any musical composition
must go through the gradual process of establishing
harmonic key with a definite tonic. As a consequence, in
addition to whatever inherent meaning there may be
to isolated pitches or context-free pitch movement,
there is a tonal context in diatonic music within which
pitch changes are heard. The musical question then
becomes: What is the minimal context from which

tonal movement can lead to the establishment of a

major or minor key? The answer is given in traditional
harmony theory: the minimal indication of key is a
major or minor triad requiring a pitch range of at least
seven semitones. The minimal harmonic context is
therefore a diminished chord spanning six semitones. It
is a simple consequence of the regularities of diatonic
harmony that, given the minimal context (or other
unresolved tension chord), a semitone increase can
resolve to a minor key and a semitone decrease can
resolve to a major key, but not vice versa.
It is this regularity of diatonic harmony (Figure 1) that
makes a direct connection with the frequency code. That
is, the positive or negative affect implied by tonal movement from the ambivalence of harmonic tension to a
major or minor triad is identical to the affective valence
of the pitch changes in both animal vocalizations and
language intonation: pitch decreases connote positive affect
and pitch increases connote negative affect. Although the
affective charge of music in a major or minor modality is
further attenuated relative to the emotional power of animal calls or emotional speech, the core argument is that,
all else being equal, rising or falling pitch from tonal
ambiguity to harmonic resolution has inherent emotional connotations because of the biological significance
of similar pitch changes in establishing social dominance
among competing animals (Figure 2).



[Figure 2 diagram: relative to the normal tonic of the voice, rising pitch is labeled with weakness, defeat, politeness, assent, and negative affect; falling pitch is labeled with strength, victory, assertions, statements, and positive affect.]

FIGURE 2. The meaning of pitch changes in animal calls, human language, and music.

Scherer & Sander (2005) dismissed the frequency

code as the mechanism underlying the affect of musical
mode ("a disservice to the cause," p. 88) and chose to see
the similarity between the affective responses to the
pitch changes of the frequency code and those in harmony as a "generous analogy," rather than as a noteworthy identity. Clearly, however, if "service to the cause" is
the coherent explanation of human perception on the
basis of biological principles, the evolutionary frequency code is an interesting possibility that appears to
link the affective significance of animal calls, speech
prosody, and harmonic mode. Having rejected this evolutionary argument, apparently in favor of the nonexplanation of "historically grown preferences"
(p. 88), whatever that may mean in terms of perceptual mechanisms, they proceed to reverse the logic and
suggest that, if the positive/negative affect of the
major/minor modes is due to the frequency code, then
animal vocalizations should be perceived as musical!
They earnestly note that low-frequency animal calls of
dominance and threat will not necessarily result in
bright, major-like sounds...and a shrill high-pitched
alarm call does not convey a dark, minor mood (p. 88).
This is a non sequitur comparable to insisting that if
human beings ask questions using "Who?" then we
should hear an interrogative intonation in the owl's
"hoo!" On the contrary, the cause-and-effect runs in the

opposite direction: the affective twinge produced by

major or minor chord resolution percolates up from the
evolutionarily older meaning of pitch changes that has
been ingrained through countless generations of animals competing over mates and territories. The biological argument behind the frequency code is that animals
use, among several tricks, a lower voice to imply larger
physical size and therefore strength (Morton, 1977;
Ohala, 1994). To bluff their physical dominance with a
mere vocalization suffices for animal communications,
but our inheritance of the ancient evolutionary baggage
of the frequency code implies that, in both speech and
music, we cannot easily shrug off the affective quality of
changing pitch. As a consequence, when a 3-tone combination shifts away from the unresolved tension of
intervallic equidistance toward harmonic mode, we
infer an affective valence from our detection of the
direction of tonal movement. This, I maintain, may be
the biological origin of the affect of major and minor
Author Note

Correspondence concerning this article should be addressed

to Norman D. Cook, Department of Informatics, Kansai
University, 2-1-1 Reizenji, Takatsuki, Osaka, 569-1085
Japan. E-MAIL: cook@res.

References

Banse, R., & Scherer, K.R. (1996). Acoustic profiles in vocal
emotion expression. Journal of Personality and Social
Psychology, 70, 614-636.
Bolinger, D.L. (1978). Intonation across languages. In J.H.
Greenberg, C.A. Ferguson, & E.A. Moravcsik (Eds.),
Universals of human language: Phonology (pp. 471-524). Palo
Alto: Stanford University Press.
Cook, N.D. (2002). Tone of voice and mind: The connections
between music, language, cognition and consciousness.
Amsterdam: John Benjamins.


Cook, N.D. (2006). A psychophysical explanation for why
major chords are bright and minor chords are dark. In
Proceedings of the First International Conference on Kansei,
Kyushu, Japan, pp. 45-48. Retrieved February 2, 2006 from
Cook, N.D., & Fujisawa, T.X. (2006). The psychophysics of
harmony perception: Harmony is a three-tone phenomenon.
Empirical Musicology Review, 1(2), 106-126. Retrieved April 15,
2006, from
Cook, N.D., Fujisawa, T.X., & Takami, K. (2006). Evaluation of the affective
valence of speech using pitch substructure. IEEE Transactions
on Audio, Speech and Language Processing, 14, 142-151.
Cruttenden, A. (1981). Falls and rises: meanings and
universals. Journal of Linguistics, 17, 77-91.
Fujisawa, T.X., & Cook, N.D. (2006). How to calculate
harmoniousness and draw the curves: Dissonance, tension
and modality. Journal of Informatics, 25, 13-29.
Juslin, P.N., & Laukka, P. (2003). Communication of emotions
in vocal expression and music performance: Different
channels, same code? Psychological Bulletin, 129, 770-814.
Ladd, D.R. (1996). Intonational phonology. Cambridge:
Cambridge University Press.
Levelt, W.J.M. (1999). Producing spoken language: a blueprint
of the speaker. In C.M. Brown & P. Hagoort (Eds.), The
neurocognition of language (pp. 83-122). Oxford: Oxford
University Press.
Meyer, L.B. (1956). Emotion and meaning in music. Chicago:
Chicago University Press.
Morton, E.S. (1977). On the occurrence and
significance of motivation-structural rules in some
bird and mammal sounds. American Naturalist,
111, 855-869.
Ohala, J.J. (1983). Cross-language use of pitch: an ethological
view. Phonetica, 40, 1-18.
Ohala, J.J. (1984). An ethological perspective on common
cross-language utilization of F0 in voice. Phonetica, 41, 1-16.
Ohala, J.J. (1994). The frequency code underlies the sound-symbolic
use of voice-pitch. In L. Hinton, J. Nichols, & J.J.
Ohala (Eds.), Sound symbolism (pp. 325-347). New York:
Cambridge University Press.
Parncutt, R. (1989). Harmony: A psychoacoustical approach.
Berlin: Springer.
Roberts, L. (1986). Consonant judgments of musical chords
by musicians and untrained listeners. Acustica, 62, 163-171.
Scherer, K.R. (1986). Vocal affect expression. Psychological
Bulletin, 99, 143-165.
Scherer, K.R. (1995). Expression of emotion in voice and
music. Journal of Voice, 9, 235-248.
Scherer, K.R., Johnstone, T., & Klasmeyer, G. (2003).
Vocal expression of emotion. In R.J. Davidson, H. Goldsmith, &
K.R. Scherer (Eds.), Handbook of the affective sciences
(pp. 433-456). New York and Oxford: Oxford University Press.
Scherer, K.R., & Sander, D. (2005). The musical tuning of
the brain. Music Perception, 23, 87-90.
Terhardt, E. (1974). Pitch, consonance and harmony. Journal
of the Acoustical Society of America, 55, 1061-1069.
Tramo, M.J., Cariani, P.A., Delgutte, B., & Braida, L.D.
(2001). The neurobiological foundations for the theory of
harmony in Western tonal music. In R.J. Zatorre & I. Peretz
(Eds.), The biological foundations of music (vol. 930, pp.
92-116). The New York Academy of Sciences, New York.