An empirical approach to studying intonation tendencies in choral performances
Johanna Devaney
Department of Music, York University, Canada
jdevaney@yorku.ca - http://www.yorku.ca/jdevaney

Daniel Ellis
Department of Electrical Engineering, Columbia University, USA
dpwe@ee.columbia.edu - http://www.ee.columbia.edu/dpwe

In: K. Maimets-Volt, R. Parncutt, M. Marin & J. Ross (Eds.)
Proceedings of the third Conference on Interdisciplinary Musicology (CIM07)
Tallinn, Estonia, 15-19 August 2007, http://www-gewi.uni-graz.at/cim07/

Background in Music Theory and Analysis. Choral intonation practices have been addressed in a number of studies
on choral acoustics. Our research both builds on this work and supplements it with a theoretical paradigm based on
work done in the areas of sensory consonance and tonal attraction.

Background in Computing. Recent work in the field of music information retrieval has discussed the main obstacles
related to tracking pitches in a polyphonic signal and has provided some techniques for working around these
problems. Our methods for analyzing the pitch content of recorded performances draw extensively on this work.

Aims. Our research is focused on the study and modeling of choral intonation practices through the intersection of
computational and theoretical approaches. We employ a methodology that allows for a detailed model of this aspect of
choral performance practice to be built from analyses of numerous recordings of real-world performances, while
working within a robust theoretical paradigm.

Main contribution. In the computational component of the research a number of a cappella choral recordings are
analyzed with signal processing techniques to estimate the perceived fundamental frequencies for the sung notes.
Local signal processing measures determine the precise instantaneous frequency in the neighborhood of the
fundamentals anticipated from the score. Averaging these features over a perceptual integration window yields a
reasonable estimate of the perceived notes in the performance. These observations can be related to the musical
context of the score through machine learning techniques to determine likely intonation tendencies for regularly
occurring musical patterns. These patterns include both composed examples that highlight commonly occurring
vertical/horizontal conflicts, such as typical cadential patterns, and repertoire excerpts that explore more specific
tuning issues.
A major issue in developing a theory of intonation practices is the potential conflict between the vertical and horizontal
intonational impetuses. To assess this conflict in greater detail we have constructed a theoretical model where theories
of sensory consonance account for vertical tuning tendencies and theories of tonal attraction account for the horizontal
tendencies. Theories of sensory consonance suggest that singers will attempt to maximize the coincidence of partials.
Our analyses of a number of vocal performances partially confirm this, though the results indicate that there are other
factors, i.e. the horizontal impetuses, at work. Theories of tonal attraction allow us to analyze how local tonal goals
function in relation to one another and within the general tonal character of the piece. The quantification of these
interactions allows us to relate them more explicitly to horizontal intonation tendencies.

Implications. In the field of music cognition, our research relates to work being done in the area of musical
expression. If the intonation tendencies inferred from the theoretical models are taken as a norm, the deviations from
this norm, when these deviations are musically appropriate, can be viewed as expressive phenomena. Most of the
relevant work in this area has dealt with rhythm, though there has been some work in the area of violin intonation that
addresses this issue. This work also has implications for research and pedagogy. Computer software implementing the
model will allow composers and musicologists to hear more intonationally accurate digital re-creations and may also
function as a training guide for vocalists.
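
As a rough illustration of the computational step summarized above, the following Python sketch averages
the frequency estimates that fall near a score-derived fundamental and expresses the result as a deviation
from equal temperament. The frame layout, the function names, the A4 = 440 Hz reference, and the reading of
the collection band as plus or minus a percentage of the expected frequency are assumptions made for this
example; it is not a description of the implementation used in the study.

```python
import math

A4_HZ = 440.0  # assumed reference pitch for the example


def midi_to_hz(midi_note, a4_hz=A4_HZ):
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def cents(f_hz, ref_hz):
    """Signed distance from ref_hz to f_hz in cents."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def note_frequency(frames, expected_hz, band_percent=10.0, weighted=True):
    """Average the frequency points that lie near the expected fundamental.

    frames: list of frames; each frame is a list of
            (frequency_hz, linear_amplitude) pairs (hypothetical layout).
    Points outside expected_hz +/- band_percent are ignored. Each frame is
    averaged first (optionally weighted by amplitude), then the frame means
    are averaged. Returns None if no point falls inside the band.
    """
    lo = expected_hz * (1.0 - band_percent / 100.0)
    hi = expected_hz * (1.0 + band_percent / 100.0)
    frame_means = []
    for frame in frames:
        points = [(f, a) for f, a in frame if lo <= f <= hi]
        if not points:
            continue
        if weighted:
            total = sum(a for _, a in points)
            if total <= 0.0:
                continue
            frame_means.append(sum(f * a for f, a in points) / total)
        else:
            frame_means.append(sum(f for f, _ in points) / len(points))
    if not frame_means:
        return None
    return sum(frame_means) / len(frame_means)
```

For a note notated as A4 (MIDI 69), `cents(note_frequency(frames, midi_to_hz(69)), midi_to_hz(69))` would give
the signed deviation of the averaged estimate from the equal-tempered pitch.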

CIM07 - Conference on Interdisciplinary Musicology - Proceedings

The goal of our research is to develop a model of choral intonation practices. We recognize that the
human voice, like non-fretted string instruments, is not locked into a single tuning system or
temperament; rather, the exact tuning of each pitch may vary each time it is sounded. The central
assertion of this work is that a choir's tuning cannot be consistently related to a single reference
point; rather, a combination of horizontal and vertical musical factors forms the reference point for
the tuning (Backus, 1969; Barbour, 1953). Assessing the relation between the horizontal and vertical
factors is complicated by the strong probability that the weighting of these factors often differs in
different musical contexts. Our proposed model attempts to address these issues by employing a
methodology that is built on the interaction of two distinct approaches. The first is theoretical,
drawing on the various branches of music theory that address harmonic and voice-leading practices,
so-called musical forces and musical expectation, acoustics and psychoacoustics, and tuning,
temperament, and intonation. The second is computational, based on frequency analyses of a number of
choral recordings. This computational approach will provide the model with an objective basis, while
the theoretical approach will provide a contextual depth that empirical analysis cannot produce. We
believe it is a strength of this methodology that the approaches inform one another throughout.

Historical Background

Our methodology builds and expands on centuries of theory about choral intonation practices and
attempts to address these practices in greater detail. Questions of tuning have preoccupied a
significant number of theorists from antiquity through the present. For some ancient Greeks, tuning
was predominantly numerically based; the Pythagoreans, c. 500 BC, limited their definition of
consonance to intervals corresponding to monochord divisions which employed only super-particular
ratios of the numbers 1, 2, 3, and 4 (a series of numbers which, because they sum to ten, was known as
the tetractys). This resulted in a system in which only the octave (2/1), fifth (3/2), fourth (4/3),
and the compounds of the octave and the fifth (i.e. the perfect fifteenth, 4/1, and the perfect
twelfth, 3/1) were considered consonant. Archytas, c. 428-365 BC, argued for the use of the 5/4 ratio
for the Major Third; this idea was echoed by Didymus of Alexandria, c. 63 BC-10 AD. Aristoxenus's
Elementa Harmonica, c. 350 BC, articulated a potentially opposing view, arguing that the ear, rather
than strict mathematics, should be the guide for determining consonance. Ptolemy's Harmonics, c. 120
AD, took the middle road between the Pythagorean numerically based methodology and the Aristoxenean
aurally based methodology, contending that the Pythagorean approach was essentially correct but that
it should be informed by aural perception. In the course of his study of harmonics, the aim of which
was to address both physical and perceived musical phenomena, Ptolemy defined a seven-note diatonic
scale system with a variety of tunings, which was clearly influenced by Archytas' expanded notion of
consonance and which had a significant impact on a number of later theorists. In particular, his
Syntonic Diatonic tuning system, which is generated by the syntonic comma, the 81/80 differential
between the 81/64 Pythagorean major third and the 5/4 major third, was influential on later tuning
theory as it was, in effect, the first articulation of 5-limit Just Intonation.

Boethius' De institutione arithmetica musica libri quinque, c. 520 AD and known in English as
Foundations of Music, was the major source of Greek music theory in the late-medieval and early
Renaissance periods. In this work Boethius laid out an extensive discussion of consonance and
dissonance in the Pythagorean tradition. The Pythagorean doctrine of limited consonance served the
music of the early and middle centuries of the medieval era, where only the fifth, fourth, and octave
were considered consonances. The rise of the third and the sixth as imperfect consonances in the late
medieval period led to the use of tuning theories and systems in which these intervals sounded more
agreeable, most notably 5-limit Just Intonation.

In spite of the increasingly widespread acceptance of thirds and sixths as consonant intervals, a
number of high Renaissance theorists retained the Pythagorean method of monochord division in their
discussions of tunings, such as Ramis' Musica Practica of 1482. In the late Renaissance, the
Pythagorean method of monochord divisions was expanded to include chromatic and enharmonic pitches in
Glarean's Dodecachordon of 1547, and 5-limit just-intonation divisions in Salinas's De musica libri
septem of 1577. The first explicit
discussions concerning intonation arose around the same time, primarily because of the increased
interest in keyboard tuning/temperament systems, which highlighted the difference between singers'
tuning practices and the fixed tuning of keyboard instruments.

The first attempt to systematically address the issue of singers' intonation practices, as well as to
develop a keyboard instrument to emulate them, was Nicola Vicentino's L'antica musica ridotta alla
moderna prattica (Vicentino, 1555). Here Vicentino put forth two tuning systems for his 31-tone gamut,
one of which is explicitly named “Tuning System for the Purposes of Accompanying Vocal Music” and
which he claims, somewhat erroneously, provides pure fifths in every key; in reality a number of the
fifths are slightly tempered. The other system produces strikingly similar results, but is conceived
of as an augmented 1/4 comma meantone system. This is characteristic of a recurring conflict
throughout the late Renaissance and the Baroque, between the desire for idealized systems and the need
for practicality in keyboard tuning. The conflict is later articulated in Zarlino's Le Institutioni
Harmoniche (Zarlino, 1558), where both an idealized system using a 5/4 ratio for the Major Third and
the need to systematically temper intervals when tuning string and keyboard instruments are discussed.
Likewise, in his Harmonie universelle (Mersenne, 1636), Mersenne presents a mathematical proof for the
size of an equal tempered semitone and argues that consonances are produced through the use of 1, 2,
3, 4, and 5 part divisions of the string, i.e. 5-limit Just Intonation.

Debates concerning which was the most appropriate keyboard temperament continued to dominate tuning
theory for the next hundred and fifty years, until equal temperament eventually won out over the
various meantone and well-temperament systems. In the eighteenth and nineteenth centuries, discussions
of tuning ratios, and later overtones, were limited to music theoretic treatises considering arguments
for consonance and dissonance within the context of equal temperament, most notably in Rameau's theory
of harmony (Rameau, 1722, 1737, 1750) and Helmholtz's theory of Konsonanz (Helmholtz, 1863).

Contemporary Work

By the early twentieth century tuning theory had become something of a fringe interest. Intonation
practices were only rarely considered, as there was no reliable way of assessing what was going on.
Though there was some notable mid-twentieth century interest in intonational practices, or
preferences, the relevant studies were in the tradition of the eighteenth and nineteenth century music
theoretic approaches cited above. For example, Boomsliter and Creel attempted to develop a theory of
melody based on their experiments with musicians' preferences for various tuning systems on a
monochord-like instrument (Boomsliter and Creel, 1961).

An increased interest in the accurate performance of early music over the past forty years has
prompted deeper investigations into historical tuning and temperament. Most of these studies are,
however, a prescriptive endeavor; the musicians/singers are instructed in the ways in which they
should modify their usual intonation practices in hopes of achieving a more historically accurate
performance. Our work, in contrast, is descriptive in its attempt to create a model of common choral
intonation practices from actual choral performances. The descriptive nature of this study falls in
line with a small, but growing, number of studies that address questions related to intonation aspects
of performance, particularly Fyk's large-scale study on violin intonation (Fyk, 1995), where she
identified gravitational attractions at work within the tonal system, and work done at the “Speech,
Music, and Hearing” group at the Royal Institute of Technology in Stockholm (Ternstrom and Sundberg,
1988; Nordmark and Ternstrom, 1996; Ternstrom, 2002; Jers and Ternstrom, 2005) on the intonation
practices of singers, as assessed in laboratory experiments. The use of choral recordings also
contextualizes this study in terms of actual performances, rather than contrived experiments.
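
The interval sizes that recur in the survey above can be compared directly by converting frequency
ratios to cents. The short Python sketch below does this for the Pythagorean and 5-limit major thirds,
the syntonic comma, the quarter-comma meantone fifth, and the equal-tempered semitone. The constant and
function names are illustrative choices, and the cent (1/1200 of an octave) is a convenient modern unit
rather than one used by the theorists cited.

```python
import math
from fractions import Fraction


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(float(ratio))


PYTHAGOREAN_MAJOR_THIRD = Fraction(81, 64)  # four pure fifths less two octaves
JUST_MAJOR_THIRD = Fraction(5, 4)           # 5-limit major third
# The syntonic comma is the differential between the two thirds (81/80).
SYNTONIC_COMMA = PYTHAGOREAN_MAJOR_THIRD / JUST_MAJOR_THIRD
# Equal temperament divides the octave into twelve equal semitones.
EQUAL_SEMITONE = 2.0 ** (1.0 / 12.0)
# Quarter-comma meantone narrows each pure fifth by a quarter of the comma.
MEANTONE_FIFTH = 1.5 / float(SYNTONIC_COMMA) ** 0.25

INTERVALS = [
    ("Pythagorean major third", PYTHAGOREAN_MAJOR_THIRD),
    ("5-limit major third", JUST_MAJOR_THIRD),
    ("equal-tempered major third", EQUAL_SEMITONE ** 4),
    ("syntonic comma", SYNTONIC_COMMA),
    ("pure fifth", Fraction(3, 2)),
    ("quarter-comma meantone fifth", MEANTONE_FIFTH),
    ("equal-tempered semitone", EQUAL_SEMITONE),
]

if __name__ == "__main__":
    for name, ratio in INTERVALS:
        print(f"{name:30s} {cents(ratio):8.2f} cents")
```

Four quarter-comma meantone fifths stack to a ratio of exactly 5, i.e. a pure 5/4 major third two octaves
up, which is why that temperament favors pure thirds at the expense of slightly narrow fifths.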

Our study of choral intonation practices is aided by a number of these theoretical works, particularly
those late Renaissance writings that addressed tuning issues in light of the contemporary vocal
practices. The large body of a cappella repertoire that complemented this theoretical tradition is a
fruitful area from which to draw musical examples. Though this project is not intended as a historical
survey, these late Renaissance writings will serve as a window into general choral intonation
practices. However, the temporal scope of the musical repertoire examined in this study will be
limited to music from the Renaissance and the early Baroque because of the wealth of a cappella vocal
music written in these periods. We have chosen the early Renaissance as the starting point, primarily
because this music demonstrates an increased use of thirds and sixths compared to earlier music.

Music Theory

The major issue in theorizing intonation practices is the conflict between vertical and horizontal
tendencies at any given point in time. We anticipate that the vertical aspects of the intonation
practices will conform to the harmonic series, i.e. that the upper voices will coincide with the
partials of the note sung by the bass, and vice versa. With horizontal aspects there is no acoustical
template to refer to; rather, we are utilizing recent work in the area of tonal tension as the basis
of our theory of the horizontal intonation tendencies.

Vertical Aspects

Our assumption that the vertical intonation tendencies will conform to the overtone series is rooted
in Helmholtz's theory of consonance and dissonance (1863), where he postulated that the coincidence of
a significant number of partials between two pitches produced a consonance whereas the absence of such
coincidence produced a dissonance. The degree of coincidence can be determined by measuring the number
of beats produced when the two tones are played simultaneously. Beating is produced by interference
between tones of proximate frequency.

Helmholtz's work is also the foundation of relevant psychoacoustic theories which both corroborate our
ear's ability to discern small variations in frequency and support the idea that maximal coincidence
of partials in a vertical sonority produces maximal consonance. Plomp and Levelt revisited some of
Helmholtz's ideas through a series of tests on untrained subjects. They discovered that interval size
relative to the critical band is also a significant factor; their general results using sine tones
indicated that their subjects judged intervals less than a minor third and greater than a unison as
dissonant and intervals of a minor third or greater as consonant. Their results also demonstrated that
there is some variation based on the frequency range; the same interval in a lower frequency range was
generally perceived as being less consonant than in a higher frequency range. Plomp and Levelt's
theory also applied their critical band findings to the interactions between the partials of pairs of
complex tones.

Terhardt expanded the work of Plomp and Levelt with a theory of consonance that reconciles
psychoacoustic phenomena, which he termed sensory consonance, and tonal significance, or harmony
(Terhardt, 1984). He aligns sensory consonance with Helmholtz's concept of Konsonanz; in this
phenomenon a greater degree of consonance corresponds to a lesser amount of beating. According to
Terhardt, beats that are slower than 20 Hz are audible; this phenomenon is observable when two
instances of the same note are played slightly detuned. If the beating is faster than 20 Hz it is
perceived as roughness, an aural sensation akin to rattling. Beating and roughness may occur both
between the tones' fundamentals and their partials. The greater the degree of coincidence between the
partials of the two tones the less rough, i.e. less dissonant, the resultant sound is. The theory of
sensory consonance makes a case for purely tuned vertical intervals, as there is a greater coincidence
of partials between them than with tempered intervals.

The second component of Terhardt's theory of consonance is the tonal, or harmonic, context. He aligns
this component with both Helmholtz's theory of Klangverwandtschaft and his own virtual pitch theory.
Both of these theories suggest that the perception of harmonic consonance in western art music is
dependent on the mind's acquisition of an acoustical template. In his virtual pitch theory, Terhardt
argues that this template, based on the harmonic series, allows the listener to perceive the pitch of
complex tones as being that of the fundamental, whether or not the fundamental is actually present. He
expands this to harmonic consonance by arguing that the template acts as a reference point for
determining whether or not the bass note of the current sonority corresponds to the virtual
fundamental note that is suggested by the template. When the real bass note and the virtual
fundamental note align, the sonority is perceived as consonant. The learning process associated with
the acquisition of this template allows for varying degrees of consonance, which correspond with the
different degrees of consonance that are typically assigned to different types of sonorities. Terhardt
argues that the majority of this learning comes from exposure to the complex tones found in speech
sounds; so although this learning impacts musical perception, its acquisition is predominantly
non-musical. He uses this postulation to support a further argument that the basis of harmonic
consonance, like sensory consonance, is psycho-acoustical rather than cultural. This can also be
applied to the current issues of tuning preferences in vertical sonorities.

Horizontal Aspects

Theories of tonal tension, particularly those put forth by Lerdahl and Larson, offer tools with which
to address the horizontal dimension. Lerdahl's approach is a component of his tonal pitch space
theory. He formalizes the tendency of a dissonant pitch to resolve to a consonant neighbor (which may
be a neighbor at either the chromatic, diatonic, or triadic level of his pitch space model) with a
rule that observes both Bharucha's principles of proximity and stability and proceeds in part on an
analogy with Newton's law of gravitation (Lerdahl, 2001). The attraction of one pitch to another is
the anchoring strength of the goal pitch ($s_2$, derived from a modified version of the model's ‘basic
space’) divided by the anchoring strength of the source pitch ($s_1$) times the inverse of the square
of the number of semitones between the two pitches, $(s_2/s_1) \times (1/n^2)$. In this context
Lerdahl discusses the asymmetries in attraction when moving from unstable pitches to stable ones and
from stable pitches to unstable ones. This demonstrates how the same interval functions differently in
different musical contexts. For example, the attraction of the leading tone to the tonic is 2
($= 4/2 \times 1/1^2$) while the attraction of the tonic to the leading tone is only 0.5
($= 2/4 \times 1/1^2$). Lerdahl suggests that these asymmetries relate to the melodic intonation
tendencies of instruments with flexible-intonation capabilities. One of the aims of this dissertation
is to investigate this connection.

Larson posits a more complex calculation for the phenomenon of tonal tension, which is more explicitly
focused on quantifying how listeners' expectations are met or confounded by particular musical
patterns (Larson, 2004). Larson's model combines the forces of gravity, magnetism, and inertia
explicitly in a single equation. Gravity is defined as the tendency of a musical line to go down, and
is rooted in Lakoff and Johnson's notion of embedded metaphors. Magnetism, the tendency of unstable
notes to move to stable ones, is, like Lerdahl's attractions, rooted in the psychological principles
of proximity and stability. Inertia, the tendency of a musical pattern to continue rather than vary,
is based on the gestalt principle of good continuation. The cumulative force acting on a note in a
given context, or pattern, is calculated by summing the results of individual calculations for each
force ($F = w_G G + w_M M + w_I I$). Gravity ($G$) is binary, 1 or 0, as patterns can be assessed as
either giving in to gravity or not; i.e. if a pattern descends towards a more stable pitch it has a
$G$ value of 1, otherwise a $G$ value of 0. Magnetism ($M = 1/d_{to}^2 - 1/d_{from}^2$) is the inverse
of the square of the distance in semitones from the initial note to the closest stable pitch
($1/d_{from}^2$) subtracted from the inverse of the square of the distance in semitones from the
initial note to the goal note in the current musical context ($1/d_{to}^2$). Inertia ($I$) has a value
of 1, 0, or -1, depending on whether the musical pattern has inertial potential and fulfils it, has no
inertial potential, or has inertial potential but goes against it; i.e. if a pattern continues in the
direction it started with it has an I
value of 1, if it moves in the opposite direction it has an I value of -1, and if it stays on the same
pitch it has an I value of 0. He uses the technique of multiple regression to find the weightings
($w_G$, $w_M$, and $w_I$) for multiple musical contexts.

Both of these equations serve as a starting point for quantifying the musical intuitions that will
form the basis of the theoretical model. Quantification of such intuitions is necessary to facilitate
the interaction of the theoretical model with the computational one. In turn, the results of this
study may be able to provide empirical data to help substantiate further investigations in this area.
The tension models provide a means of exploring intonational tendencies in linear pitch sequences by
providing representations of how the pitches within a tonal system exist in relation to one another
outside of the harmonic context in which they occur. Both of these systems require some adaptation to
be useful for modal music; however, both are flexible enough to accommodate the required
modifications. Specifically, adjustments to the basic space in Lerdahl's model and the choice of
reference pitch collection, or alphabet, in Larson's model would sufficiently adapt the systems for
this purpose. Lerdahl's model is particularly useful because it is internally consistent; thus it
generates a full complement of attractional relations within a musical system. Larson's system has
great potential but requires a certain amount of tweaking as, in its current form, it has a number of
weaknesses: the inability of the model to accommodate a change in the governing tonic part-way through
a musical sequence, the inability of the model to calculate attractions from a stable pitch, and the
possibility of generating negative values with the equation, which makes comparisons between results
difficult.

Once the tension models have been calculated they will serve as a template for horizontal intonation
tendencies in much the same way as the overtone series does for the vertical. While they will not
provide exact tuning predictions, the tension models serve as a more appropriate reference than equal
temperament.

These theories are a point of intersection with the field of music cognition, specifically the ways in
which musical forces shape musical expectation (Narmour, 1990, 1992; Margulis, 2003, 2005). In
particular, horizontal intonation practices often function as expressive phenomena; thus, they may be
related not only to musical expectation but also to musical meaning or emotion, as it relates to
performance. In his 1938 text, Psychology of Music, C.E. Seashore suggested that emotion is conveyed
in performance through deviations from a norm (Seashore, 1938). This idea was superseded by Meyer's
theory of musical emotion, which argued that emotional responses to music are rooted in the denial of
the listener's expectations. Recent work by Palmer, Gabrielsson and Sloboda reconsiders the issue of
musical emotion from the angle of performance through their systematic explorations of expression.
This coincides with a rise in studies of intonation practices, in particular Fyk's, which have
discussed the expressive aspects of intonation. Once the proposed model has been fully developed it
will be able to provide some quantitative data about the typical deviations in choral performance.
These deviations would be measured between individual choral recordings and the normalized results
produced by the model when it is queried with the same musical passage. This data can be correlated
with the results of recent psychological experiments on musical emotion, which could serve as the
basis of a more nuanced theory of how musical emotion is expressed in performance.

The reconciliation of the vertical and the horizontal is dependent on a number of factors, including
the duration and metrical position of a given vertical sonority, its function within the musical
context, and the significance of the horizontal lines moving through a vertical sonority, both in
terms of the relationship of its pitch material to the current harmonic context and of the overall
texture of the music at that point in time. The development of a theory of intonation from these
parameters is complicated by the fact that the measurement of such factors may vary greatly under
different musical contexts. At points where there are conflicting interpretations, the data-oriented
model will provide additional insight into intonation practices. This will require careful querying of
the computer-based model with examples that are informed by both the needs of the theoretical model
and representative instances in the chosen repertoire. Thus, the range of examples encompasses both
composed examples that highlight commonly occurring vertical/horizontal conflicts, such as cadential
patterns, and repertoire excerpts that explore more specific tuning issues, such as the asymmetry
between a melodic line descending from the tonic to the leading tone versus a melodic line ascending
from the leading tone to the tonic.

Computing

The goal of the computational component of the model is to use statistical machine learning techniques
to build a model of choral intonation practices from the microtonal pitch variations between recorded
performances and twelve-tone equal temperament. Twelve-tone equal temperament is a useful reference
because it remains consistent in spite of changes to the tuning references. It is also the standard
system used both by music software and in western art music practices as a whole, making it the most
universal of any available reference points. Due to the challenges associated with accurately
extracting pitch information from a polyphonic signal, the data collection is a two-step process. The
first step is the preliminary, and not entirely trivial, step of temporally aligning a MIDI score of
the work to the audio recording. The second step, and the main obstacle, is developing a method to
accurately extract pitch data from the polyphonic choral recordings. Once we have collected a
sufficient amount of data we will develop the computational model with probabilistic machine learning
techniques.

Data Extraction

The main issue with using a MIDI file as a frequency template in the pitch-tracking algorithm is that
of temporal alignment. Just as the MIDI file provides equal-tempered tunings, it also provides
rhythmic renditions according to a specified meter and tempo. In order to serve as a reference, the
temporal events in the MIDI file must be aligned with the temporal events in the audio file. This can
be achieved either manually or by applying the results of a rhythmic analysis of the recording. There
are a variety of techniques for automated alignment of a MIDI score to polyphonic audio (Hu,
Dannenberg, and Tzanetakis, 2003; Raphael, 2004; Rodet, Schwarz, and Soulez, 2003; Shalev-Shwartz,
Keshet, and Singer, 2004), most of which deal quite capably with a limited number of instruments with
well-defined onsets and timbres. There are three main challenges in aligning choral recordings: first,
the note onsets are often difficult to determine, particularly when notes change under a single
syllable; second, the very nature of a choir means that all of the parts have roughly the same timbre,
making it more difficult to distinguish between parts in certain musical circumstances; and third,
there is often a considerable amount of reverberation present in choral recordings. Existing
techniques for MIDI score alignment would have to be modified in order to account for the first two
idiosyncrasies. We are currently exploring ways of computationally reducing the amount of
reverberation in the recordings (Allen, Berkley, and Blauert, 1977). Another potential issue arises
when there are multiple voices singing a single part. However, as this also affects the intonational
activity, we will restrict our study to performances with one voice per part.

Once the MIDI file has been appropriately aligned to the audio file, we can use it to guide our search
for relevant frequency information in the signal. There are currently a number of relatively robust
procedures for accurate fundamental frequency (F0) estimation that may be used to acquire a set of
exact tunings for the recordings. Like the MIDI alignment procedures referenced above, these F0
estimation techniques will have to be adapted for the particular idiosyncrasies of
polyphonic choral recordings. Our preliminary work in this area has focused on the feasibility of using only the expected F0 for tuning information, rather than correlating the relevant partials.

For this study we made multi-track recordings of four parts of a Bach chorale and used the musical score as a guide to manually identify note onsets and offsets in the recorded signal. In order to achieve a good estimate of the frequency, unencumbered by the transitions between notes, we marked the start time at 10% of the note’s total duration after the attack and the end time at 10% before the end of the note’s decay. Once all of the notes were labeled we used Alain de Cheveigné’s YIN for fundamental frequency estimation (de Cheveigné and Kawahara, 2002) and calculated the perceived fundamental frequency over the duration of the note by taking the mean across frequencies and then the mean of that result across time. Following from perceptual studies done in this area, we are working with the assumption that the perceived pitch of a note with vibrato is the mean of the recorded fundamental frequency (Brown, 1996). There is some debate in the literature as to whether the perceived pitch is the arithmetic or geometric mean. For deviations on the order of a few percent the differences between the two were insignificant; we have opted to use the arithmetic mean, i.e. the mean of the linear frequency.

A composite polyphonic signal of the recording was analyzed with instantaneous frequency analysis (Abe, 1996). The output of this analysis was then examined, making use of the same labels used for the monophonic tracks to delineate the start and end points of the notes, and of the musical score to determine the expected frequency. The frequency derived from the musical score is in equal temperament, which is almost certainly not the frequency being sung. In order to compensate for this we allow a window around this frequency in our analysis. In order to determine the optimal window size, we repeated the analysis with frequency windows of 2, 4, 6, 8, and 10% of the expected fundamental frequency. We were also interested in the impact of taking into account the amplitude of the frequencies within the window, so we repeated the tests with amplitude weightings added to the equation. In order to assess the mean linear frequency for each note we found all of the points in the instantaneous frequency analysis within the specified frequency window, averaged the points in each frame, and then averaged the frames. For calculations where we factored in the amplitude of the frequencies we weighted the points in each frame by their linear amplitude.

When we examined the results on a note-by-note basis we observed that the inclusion of amplitude weightings in our calculations improved our results. The general trend in the data also indicated that wider collection bands (i.e. 6, 8, or 10%) typically generated better results. We also calculated the root mean square error of all of the analyses for each test against the results from YIN. Interestingly, the root mean square error assessment didn’t completely conform to the general trend that we observed in the dataset, namely that the 10% window with amplitude weighting typically produced the results closest to those produced by YIN. We determined that this discrepancy was due to the presence of a small number of outliers in the results. Closer inspection of these outliers revealed that most of them occurred when there was another note just above or below the note being analyzed. Upon removing these outliers, the root mean square error assessment was more representative of the patterns observable in the data.

We were able to draw a number of conclusions from the results of these tests. First, it might not be the best solution to use a single percentage range. Although the larger ranges generally produced more accurate estimates, there were instances where the fundamental lay close enough to the fundamentals or partials of another note to conflict with them. Generally, the 6% window, which is roughly equivalent to one semitone, produced the most consistent results, but in the absence of these conflicts the 10% window produced far superior results. Second, the use of a standard set of frequency predictions from the score runs contrary to the main premise underlying this research, i.e. that the tuning and tuning reference points are not consistent
for an entire piece. We plan to examine the possibility of creating a more flexible guide for these reference frequencies that is sensitive to both musical context and tuning drift. Our next step is to implement the automatic MIDI-Audio alignment methods described above and to repeat this test on a larger group of multi-track choral performances. Once we have developed a robust approach to frequency estimation we will be able to extract pitch and tuning information from a corpus of existing polyphonic recordings for which the score is available.

Data Analysis

While data extracted from the recordings may be examined manually, we believe there is value in developing a computer model with statistical machine learning techniques that can be trained to emulate some of the intonation practices of a real-world choir. Such a model can serve as a normalized template against which to compare various recordings of the same piece, as well as providing some empirical data for relevant theories. Once such a model has been sufficiently trained to produce reliable results, i.e. results which fall within the range of possible tuning deviations for a given musical context observed in the data, it can be queried with specific musical circumstances. The results of these queries, themselves informed by the theoretical model described above, may either verify or encourage reconsideration of the theoretical model. This is conceived as an iterative process, initially to discover which machine learning technique is most appropriate to address this problem and later to refine both the specific application of the chosen technique and the components of the theoretical model. This process will be applied to a number of recordings of representative Renaissance and Baroque choral works.

Once the ML model has been trained to the extent that it is able to produce reasonable results, we can begin to derive observable rules that will ultimately serve as the complement to the theoretical model. Taken in combination, these two models will form a generalized theory of choral intonation practices. As discussed above, these queries will provide both the most probable tuning and other tuning possibilities. Representing a sufficient variety of queries is important so that we can properly examine the range of conflicts between the horizontal and the vertical, through which we will be able to assess the large-scale intonational activity of the singers.

Other Applications of the Model

Once this model has been developed and implemented as a usable algorithm it will have numerous applications. In the academic world, it will allow composers and musicologists to hear accurate temperament re-creations and it has pedagogical potential as a training guide for vocalists. It also has applications in the area of digital music production as a means of rendering both MIDI and audio recordings more accurate in intonation. In terms of MIDI recordings, the model would be applied through the pitch-bend controller available in the MIDI protocol. This type of MIDI manipulation has a precedent in the ‘groove-mapping’ algorithms introduced in the mid-nineties, where the timing idiosyncrasies of real drummers were measured, modeled, and then applied in a type of quantization algorithm. It could also be used for intonational corrections in audio signals, e.g. autotuners. In this application, though, the model would need to piggyback on one of the numerous methods that are able to achieve pitch-shifting without any time stretching or compression, and would initially be limited to single voices or very simple polyphonic textures where the individual voices are easily parsed.

References

Abe, T., Kobayashi, T., and Imai, S. (1996). “Robust Pitch Estimation with Harmonics Enhancement in Noisy Environments Based on Instantaneous Frequency.” ICSLP.
Allen, J.B., Berkley, D.A., and Blauert, J. (1977). “Multimicrophone signal-processing technique to remove room reverberation from speech signals.” Journal of the Acoustical Society of America, 62:4.
Backus, J. (1969). Acoustical Foundations of Music. New York: W.W. Norton & Company Inc.
Barbour, J.M. (1953). Tuning and Temperament: A Historical Survey. East Lansing, MI: Michigan State College Press.
Boomsliter, P. and Creel, W. (1961). “The Long Pattern Hypothesis in Harmony and Hearing.” Journal of Music Theory, 5, 2-30.
Brown, J.C. and Vaughn, K.V. (1996). “Pitch center of stringed instrument vibrato tones.” Journal of the Acoustical Society of America, 100:3.
de Cheveigné, A. and Kawahara, H. (2002). “YIN, a fundamental frequency estimator for speech and music.” Journal of the Acoustical Society of America, 111:4.
Fyk, J. (1995). Melodic Intonation, Psychoacoustics, and the Violin. Zielona Góra: Organon Publishing House.
Helmholtz, H. (1863). On the Sensations of Tone. Translated by A.J. Ellis (1954). New York: Dover Publications.
Hu, N., Dannenberg, R., and Tzanetakis, G. (2003). “Polyphonic Audio Matching and Alignment for Music Retrieval.” WASPAA 03.
Jers, H. and Ternstrom, S. (2005). “Intonation analysis of a multi-channel choir recording.” TMH-QPSR Speech, Music and Hearing: Quarterly Progress and Status Report, 47, 1-6.
Larson, S. (2002). “Musical Forces and Melodic Expectations: Comparing Computer Models with Experimental Results.” Music Perception, 21:4, 457-498.
Lerdahl, F. (2001). Tonal Pitch Space. Oxford: Oxford University Press.
Mersenne, M. (1636). Harmonie universelle: The books on instruments. Translated by Roger E. Chapman (1957). The Hague: M. Nijhoff.
Nordmark, J. and Ternstrom, S. (1996). “Intonation preferences for major thirds with non-beating ensemble sounds.” TMH-QPSR Speech, Music and Hearing: Quarterly Progress and Status Report, 1, 57-61.
Rameau, J.-P. (1722). Traité de l’harmonie. Trans. by P. Gossett (1971) as Treatise on Harmony. New York: Dover.
Rameau, J.-P. (1737). Génération harmonique. Trans. by D. Hayes (1968) in Rameau’s Theory of Harmonic Generation: An Annotated Translation and Commentary of Génération harmonique. Stanford University: Ph.D. dissertation.
Rameau, J.-P. (1750). Démonstration du principe de l’harmonie. Trans. by R. Briscoe (1975) in Rameau’s Démonstration du principe de l’harmonie and Nouvelles réflexions de M. Rameau sur sa Démonstration du principe de l’harmonie: An Annotated Translation of Two Treatises by Jean-Philippe Rameau. Indiana University: Ph.D. dissertation.
Raphael, C. (2004). “A Hybrid Graphical Model for Aligning Polyphonic Audio with Musical Scores.” ISMIR 2004.
Rodet, X., Schwarz, D., and Soulez, F. (2003). “Improving polyphonic and poly-instrumental music to score alignment.” ISMIR 2003.
Seashore, C.E. (1938). Psychology of Music. New York, NY: Dover Publications, Inc.
Shalev-Shwartz, S., Keshet, J., and Singer, Y. (2004). “Learning to Align Polyphonic Music.” ISMIR 2004.
Sundberg, J. (1987). The Science of the Singing Voice. Dekalb, IL: Northern Illinois University Press.
Ternstrom, S. and Sundberg, J. (1988). “Intonation precision of choir singers.” Journal of the Acoustical Society of America, 84:1, 59-69.
Terhardt, E. (1984). “The concept of Musical Consonance: A Link between Music and Psychoacoustics.” Music Perception, 1:3, 276-295.
Ternstrom, S. (2002). “Choir acoustics – an overview of scientific research published to date.” TMH-QPSR Speech, Music and Hearing: Quarterly Progress and Status Report, 43, 1-8.
Vicentino, N. (1555). Antica musica ridotta alla moderna prattica. Translated by M.R. Maniates (1996) as Ancient music adapted to modern practice. New Haven: Yale University Press.
Zarlino, G. (1558). Institutione harmoniche, 4a pt. Translated by V. Cohen (1983) as On the modes, Part four of Le institution harmoniche. New Haven: Yale University Press.
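The note-level estimate described in the preliminary study (a percentage window around the score-derived frequency, optional amplitude weighting, averaging within each frame and then across frames) might be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the list-of-(frequency, amplitude) data layout, the A4 = 440 Hz reference, and trimming 10% of the frames (rather than 10% of the labeled duration) are all hypothetical choices.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69).

    The 440 Hz reference is an assumption; a recording may use another.
    """
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def note_mean_frequency(frames, expected_hz, window_pct, weighted=True, trim=0.10):
    """Mean frequency of one note near its expected fundamental.

    frames:      frames between the labeled onset and offset; each frame is a
                 list of (frequency_hz, linear_amplitude) points (assumed layout).
    expected_hz: equal-tempered frequency predicted from the score.
    window_pct:  half-width of the search window, in percent of expected_hz.
    trim:        fraction of frames dropped at each end to skip note transitions.
    Returns None when no point falls inside the window.
    """
    n = len(frames)
    skip = int(n * trim)
    core = frames[skip:n - skip] if n - 2 * skip > 0 else frames

    lo = expected_hz * (1.0 - window_pct / 100.0)
    hi = expected_hz * (1.0 + window_pct / 100.0)

    frame_means = []
    for frame in core:
        points = [(f, a) for f, a in frame if lo <= f <= hi]
        if not points:
            continue
        if weighted:
            total_amp = sum(a for _, a in points)
            if total_amp <= 0:
                continue
            # Amplitude-weighted arithmetic mean within the frame.
            frame_means.append(sum(f * a for f, a in points) / total_amp)
        else:
            frame_means.append(sum(f for f, _ in points) / len(points))

    if not frame_means:
        return None
    # Arithmetic mean across frames, i.e. the mean of the linear frequency.
    return sum(frame_means) / len(frame_means)


if __name__ == "__main__":
    # Toy data: ten frames for a sung A4, with a strong partial at 880 Hz.
    toy = [[(441.0, 2.0), (439.5, 1.0), (880.0, 4.0)] for _ in range(10)]
    for pct in (2, 4, 6, 8, 10):
        print(pct, note_mean_frequency(toy, midi_to_hz(69), pct))
```

Running the same note through each window size, with and without weighting, mirrors the comparison reported above. Points belonging to a neighbouring note that fall inside a wide window are not filtered out here, which is the kind of conflict the text identifies as the source of outliers.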

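The MIDI application mentioned under Other Applications of the Model, applying a tuning deviation through the pitch-bend controller, can be illustrated with a short Python sketch. It converts a deviation from equal temperament, expressed in cents, into a 14-bit pitch-bend value. The function names are hypothetical, and the bend range of ±2 semitones is an assumption: it is a common default, but the actual range depends on how the receiving synthesizer is configured.

```python
import math


def cents_between(f_hz, ref_hz):
    """Signed interval from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def pitch_bend_value(cents, bend_range_semitones=2.0):
    """Map a deviation in cents to a 14-bit pitch-bend value.

    Pitch bend runs from 0 to 16383 with 8192 meaning "no bend". The mapping
    assumes the receiver's bend range is +/- bend_range_semitones.
    """
    span_cents = bend_range_semitones * 100.0
    value = 8192 + int(round(cents / span_cents * 8192))
    # Clamp, since the positive extreme is 16383 rather than 16384.
    return max(0, min(16383, value))


def pitch_bend_data_bytes(value):
    """Split a 14-bit value into the (LSB, MSB) 7-bit data bytes of the message."""
    return value & 0x7F, (value >> 7) & 0x7F


if __name__ == "__main__":
    # Example: lower an equal-tempered major third to the just ratio 5:4.
    deviation = cents_between(5.0 / 4.0, 2.0 ** (4.0 / 12.0))
    bend = pitch_bend_value(deviation)
    print(round(deviation, 2), bend, pitch_bend_data_bytes(bend))
```

Because pitch bend acts on a whole MIDI channel, a sketch like this would only retune voices independently if each voice were assigned its own channel.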