Ambiguity in Tempo Perception: What Draws Listeners to Different Metrical Levels?

Article in Music Perception · December 2006

DOI: 10.1525/mp.2006.24.2.155


2 authors:

Martin Franciscus McKinney, Starkey Hearing Technologies
Dirk Moelants, Ghent University


Ambiguity in Tempo Perception: What Draws Listeners to Different Metrical Levels?
Author(s): Martin F. McKinney and Dirk Moelants
Source: Music Perception: An Interdisciplinary Journal, Vol. 24, No. 2 (December 2006), pp. 155-166
Published by: University of California Press
Stable URL: http://www.jstor.org/stable/10.1525/mp.2006.24.2.155

AMBIGUITY IN TEMPO PERCEPTION: WHAT DRAWS LISTENERS TO DIFFERENT METRICAL LEVELS?

MARTIN F. MCKINNEY
Digital Signal Processing, Philips Research Laboratories, Eindhoven, The Netherlands

DIRK MOELANTS
IPEM-Dept. of Musicology, Ghent University, Belgium

The distribution of listeners’ perceived tempi across large collections of music has been modeled previously by a resonance function with a peak near the “preferred tempo” of 120 beats per minute (BPM) [Van Noorden and Moelants, J. New Music Res., 28, 43–66]. Here, through a series of experiments in which listeners were asked to tap to the most salient pulse of musical excerpts, we examined distributions of tapped tempi from single musical excerpts to see if the global resonance of preferred tempo is dependent on musical content. Results show that for some musical excerpts, the distribution of perceived tempi conforms to the global resonant form in that metrical levels with tempi near 120 BPM were perceived as most salient, while for other excerpts the most saliently perceived tempo sat well above or below 120 BPM. We then used a model, which quantifies relative strengths of periodicities in the audio signal, to demonstrate that deviations from the “preferred tempo” can be partially explained by dynamic rhythmic accents drawing listeners to tempi away from the resonance.

Received March 17, 2006, accepted August 31, 2006

Key words: perceptual tempo, beat induction, metric ambiguity, dynamic accents

Music Perception, Volume 24, Issue 2, pp. 155–166, ISSN 0730-7829, electronic ISSN 1533-8312. © 2006 by the Regents of the University of California. All rights reserved. Please direct all requests for permission to photocopy or reproduce article content through the University of California Press’s Rights and Permissions website, http://www.ucpressjournals.com/reprintInfo.asp. DOI: MP.2006.24.2.155

It is well known that listeners tend to perceive the tempo of music to fall in the range between 100 and 140 beats per minute (BPM) (Fraisse, 1982; Moelants, 2002). We can extend this “preferred tempo” range slightly, from 80 to 160 BPM (Moelants, 2002), to define a preferred “tempo-octave,” in which the tempo of every piece of music should be interpretable. It is clear, however, that for some pieces of music, the most salient perceived tempo falls outside of this range. In these cases, the most salient perceived tempo is typically a multiple or divisor of a tempo in the preferred range, corresponding to a metrical level outside of the preferred tempo zone.

Experiments with artificial stimuli (e.g., Handel & Oshinsky, 1981; Parncutt, 1994) show that when subjects are asked to tap the beat of a rhythmic sequence, the tempi of the tapped responses vary across subjects. In some cases, two metrical levels appear equally salient, while in other cases one level is clearly more salient. The amount of perceptual ambiguity in the tempo of the beat depends on the general presentation speed of the sequences as well as on their structure (e.g., the ratio between two components in a polyrhythm). Although extensive research on this subject using real musical excerpts is scarce, we can assume that similar principles apply in the perception of tempo in music: The tempo of some pieces is perceived relatively unambiguously, while for other pieces, different metrical levels could serve equally well as the most salient tempo.

The ambiguity inherent in the perception of musical tempo is an important aspect in the distinction between notated tempo and perceptual (or perceived) tempo. By perceptual tempo we mean the tempo of the most saliently perceived pulse in the music, which is typically measured by letting listeners move (tap) along to what they consider the most salient pulse. There is only one notated tempo for a given piece of music, while there may be multiple (and equally plausible) perceptual tempi, each corresponding to different metrical levels. The term tactus is often used to describe the moderately paced pulse heard in the music, typically between 40 and 160 BPM (Lerdahl & Jackendoff, 1983), but this also differs from our definition of perceptual tempo in that it has a fixed range.

Van Noorden and Moelants (1999) modeled tempo preferences using a resonance curve with a natural frequency around 2 Hz (120 BPM). Based on the responses from an extended (1,500 excerpts) listening test in which subjects were asked to tap to the perceived pulse of musical excerpts, they fit a resonance curve to a histogram of tapped tempi. This curve, which fit the data


very well (r = 0.97), can be used as a model to predict the distribution of perceived tempi for large collections of music. While similar curves have been used in beat detection models as a filter that amplifies the tempi in the preferred tempo zone (Parncutt, 1994; Todd, 1999), it is not clear whether a global fixed resonance curve accurately describes the relative perceptual salience of different metrical levels for individual musical excerpts, regardless of their musical content. In other words, if an excerpt has a metrical level whose tempo falls in the middle of the resonance (120 BPM), is that always the most saliently perceived tempo? Or, if two metrical levels have tempi that straddle the resonant frequency evenly (e.g., 80 and 160 BPM), are those tempi always perceived with similar salience?

We address these questions among others in the current study through a large-scale tapping experiment that builds on two previous pilot experiments, expanding the range of musical styles, rhythms, and tempi (McKinney & Moelants, 2004; Moelants & McKinney, 2004). We showed previously that a resonance model of preferred tempo seems to predict the population perception of tempo for some individual musical excerpts but not for all. Data from some excerpts deviate from a global fixed resonance in that the most salient perceived tempi fall well above or below the resonant tempo of 120 BPM. A likely explanation for the deviations in the data is that different types of rhythmic accents draw listeners to metrical levels outside of the preferred tempo range. Previous studies on pulse perception have shown that listeners can be drawn to different metric levels by perceptual accents on particular beats, including dynamic, durational, and melodic accents (Parncutt, 1994; Povel & Okkerman, 1981; Toiviainen & Snyder, 2003). While these findings are likely to also be applicable to musical audio, it has never been shown.

We also examine, in the current study, correlations between musical genre and perceived tempo and between subjects’ music training and perceived tempo. Van Noorden and Moelants (1999) have previously shown that the mean tapped tempo varies slightly as a function of musical style. Further confirmation of this finding would be useful as a step toward linking specific stylistic aspects to perceived tempo. Other studies have shown that musicians tend to tap slower than nonmusicians when asked to tap the beat to music (Drake, Penel, & Bigand, 2000); however, we have seen the opposite trend in our pilot experiments. Further clarification of this effect could contribute to a better understanding of the cognitive processes involved in keeping time to music.

Finally, we attempt to model the effect of rhythmic accents in the musical audio signal in order to predict, for some excerpts, the second-order “deviation” from the distribution predicted by a global resonance model. While it is relatively straightforward to systematically manipulate artificial stimuli (e.g., pulse trains) to attain a desired effect of perceptual accents, it is not clear whether or not the relevant information on perceptual accents can be readily extracted from a complex acoustic waveform. A method to extract relevant accent information in order to better predict the relative salience of perceptual tempi for all individual musical excerpts would be a boon for systems that automatically extract tempo from musical audio signals. Previous attempts at automatic tempo extraction (typically measured against notated tempo or that annotated by a single person) have met with some success but the most common error is a tempo octave error (Alonso, David, & Richard, 2004; Goto & Muraoka, 1998; Klapuri, Eronen, & Astola, 2006; Scheirer, 1998). A model that could predict the relative strength of tempi at different metrical levels would be able to guide such systems to more accurately predict the perceived tempo.

Method

Experiment

The perceived tempo of 170 30-second musical excerpts¹ was collected from 40 subjects.

¹ Complete list available from the authors on request.

PROCEDURE

Subjects were asked to tap to the most salient beat of the music, that is, the regular pulse that they perceived as most salient. The experiment was run on a personal computer and subjects listened to the music over headphones while tapping beats of their perceived tempo on the keyboard space bar (McKinney & Moelants, 2004; Moelants & McKinney, 2004). The data was recorded in two separate sessions using partially overlapping subject groups.

STIMULI

Session 1. Fifty musical excerpts were chosen to represent extreme tempo conditions. The 50 excerpts were selected from a larger set of 100 fragments, collected by 6 musicology students on the criterion of having an extremely slow or extremely fast tempo and keeping the range of styles broad, with a preference for lesser-known music. In a first stage the students tested each other’s examples in a small-scale tapping experiment and the selection of 50 pieces was made on the basis of the results of this test. Included were pieces that yielded


a uniform response (with at least 5 out of 6 students tapping at the same metrical level) and those for which the tapped tempo was found relatively far from preferred tempo (slower than 65 BPM or faster than 175 BPM).

Session 2. One hundred twenty music excerpts were chosen to provide a representative view of the diversity of music in the world. In addition to Western popular music, we also included classical music from different periods and a wide range of “world music,” extending from Bollywood to Pygmy music and from Cuban Son to Beijing opera.

The sum of the two experimental sessions gives us a total of 170 excerpts, which can be grouped in 10 stylistic categories (cf. Leman, Vermeulen, De Voogdt, Moelants, & Lesaffre, 2005), each with a different number of examples according to their diversity and popularity (see Table 1). Additionally, we assured that different meters were present in the stimulus set. About 80% of the stimuli were binary, most others were ternary, and two examples of complex meters were included. For both sessions, excerpts were chosen in which the tempo was relatively stable and the meter and texture were relatively homogeneous. Both are necessary to avoid complications. If the tempo changes within the excerpt, it is impossible to characterize the music with one single tempo value and if there are strong changes of texture within the fragment, listeners’ perception of tempo could change somewhere in the middle of the excerpt.

TABLE 1. Distribution of musical excerpts according to genre

Category number   Description                                                 Number of excerpts
1                 Pop music                                                   17
2                 Rock music                                                  17
3                 Heavy metal & punk                                          17
4                 African American pop music (R&B, soul, reggae, rap, hip-hop)  11
5                 Dance (techno, house, drum & bass)                          11
6                 Classical music                                             32
7                 Jazz                                                        11
8                 Folk, chanson & cabaret                                     11
9                 Ethnic & world music                                        32
10                Soundtracks, light classical music                          11

SUBJECTS

Session 1. The 40 subjects in Session 1 were Ghent University Musicology Department students and staff members; 21 were male and 19 female, with a mean age of 22 years. Music experience varied from none to conservatory graduates performing at a professional level. Subjects were divided into two groups based on their musical education and performance level: 19 musicians and 21 nonmusicians. The musicians all received at least 5 years of music training and all practiced their instrument for at least 3 hours a week, with an average of 8 hours.

Session 2. The 40 subjects were staff members and students from Ghent University and Philips Research; 22 were male and 18 female, with a mean age of 29 years. As in Session 1, subjects were divided into two groups based on their musical background: 26 musicians and 14 nonmusicians.

Analyses

For each excerpt and subject, the perceived tempo in beats per minute (BPM) was calculated from the tap times using linear regression. Each tapped beat was assigned a number based on how many times the median tapped interval fell between its time point and the first tapped beat. Using this beat number and the beat time as dimensions, a line was fit through the data using linear regression. The slope of the line was taken as the tapped tempo. This method has the advantage that it treats every tapped beat equally in the calculation of the tempo rather than just using the first and last tapped beat-times, as does, for example, the calculation of the mean beat-interval.

We then assessed the reliability of the tapping data by examining its regularity. Subjects whose tap intervals yielded 95% confidence intervals greater than 10 BPM in the linear regression procedure were deemed “irregular tappers.” Two subjects were excluded due to irregular tapping.

For each excerpt, a histogram of perceived tempo values from all subjects was generated and used to estimate the most salient perceived tempi; see H(T1) and H(T2) in Figure 1. Histograms were generated with a binwidth of 0.5 BPM and then smoothed over ten bins. Peaks in the histogram were detected and then taken to represent the salient perceived tempi of the excerpt as long as they included 10% of the subjects and were separated by at least 25% of their tempo from the next highest peak. This last criterion was used in order to avoid an occasional problem with double peaks in the histogram.

We then checked the data from each excerpt for beat salience. For a given excerpt, if the total fraction of subjects contributing to the detected peaks in the tempo histogram was greater than 2/3, we considered


[Figure 1: percentage of subjects (0 to 1) plotted against tempo (0 to 300 BPM). Legend: tapped-tempi histogram, peak tempi, resonance model.]
FIG. 1. Example of a perceived-tempo histogram with peaks at two tempi. Dashed curve is the resonance curve from Van Noorden and Moelants (1999).
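The tap analysis described in the Analyses section (regression-based tempo, smoothed tempo histogram, peak criteria, and the 2/3 salience check) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, time is regressed on beat number and the slope converted to BPM, "smoothed over ten bins" is read as a ten-bin moving sum, the 10% criterion is read as "at least 10%", and the 25% separation is measured relative to the already accepted (taller) peak. All of these are assumptions.

```python
def tapped_tempo(tap_times):
    """Tempo in BPM from one subject's tap times (seconds) for one excerpt."""
    if len(tap_times) < 3:
        raise ValueError("need at least three taps")
    intervals = sorted(b - a for a, b in zip(tap_times, tap_times[1:]))
    mid = len(intervals) // 2
    median = (intervals[mid] if len(intervals) % 2
              else 0.5 * (intervals[mid - 1] + intervals[mid]))
    # Beat number: how many median intervals lie between the first tap and this tap.
    beats = [round((t - tap_times[0]) / median) for t in tap_times]
    n = len(beats)
    mean_b = sum(beats) / n
    mean_t = sum(tap_times) / n
    sxx = sum((b - mean_b) ** 2 for b in beats)
    sxy = sum((b - mean_b) * (t - mean_t) for b, t in zip(beats, tap_times))
    seconds_per_beat = sxy / sxx  # least-squares slope of time on beat number
    return 60.0 / seconds_per_beat


def salient_tempi(tempi, bin_width=0.5, smooth_bins=10, max_bpm=400.0,
                  min_share=0.10, min_separation=0.25):
    """Return (tempo, share of subjects) for each salient histogram peak."""
    n_bins = int(max_bpm / bin_width)
    counts = [0.0] * n_bins
    for t in tempi:
        i = int(t / bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1.0 / len(tempi)
    half = smooth_bins // 2
    # Assumed smoothing: moving sum, so heights read as shares of subjects.
    smoothed = [sum(counts[max(0, i - half):i + half]) for i in range(n_bins)]
    candidates = [i for i in range(1, n_bins - 1)
                  if smoothed[i] >= min_share
                  and smoothed[i] >= smoothed[i - 1]
                  and smoothed[i] >= smoothed[i + 1]]
    candidates.sort(key=lambda i: -smoothed[i])  # tallest first
    peaks = []
    for i in candidates:
        tempo = (i + 0.5) * bin_width  # bin centre
        if all(abs(tempo - p) >= min_separation * p for p, _ in peaks):
            peaks.append((tempo, smoothed[i]))
    return peaks


def has_salient_beat(peaks, threshold=2.0 / 3.0):
    """Salience check: peaks must jointly cover more than 2/3 of the subjects."""
    return sum(share for _, share in peaks) > threshold
```

Because the peak tempo is reported as a bin centre inside a ten-bin plateau, it is only accurate to within a few BPM in this sketch; a finer estimate would need an additional step that the text does not specify.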

that excerpt to have a salient beat and used the data from that excerpt in subsequent analyses. If the fraction of subjects contributing to the peaks was less than 2/3, we assumed the perceived beat (and associated tempo) was too diffuse and we excluded the excerpt from further analyses. This value of 2/3 was chosen somewhat arbitrarily, but, empirically, it provided a clear boundary between those histograms that showed distinct peaks and those that did not.

We used the parameterized resonance model for pulse perception from Van Noorden and Moelants (1999) as a reference against which to compare peak tempi. Their resonance model is described by the following:

$$A_e = \frac{1}{\sqrt{(f_0^2 - f^2)^2 + \beta f^2}} - \frac{1}{\sqrt{f_0^4 + f^4}}, \tag{1}$$

where A_e is the effective resonance amplitude, f_0 (132 BPM) is the resonant tempo (frequency), β (0.5) is the damping constant, and f is tempo (frequency).

Results

We collected 40 tapped responses for each of the 170 musical excerpts. Initial analyses of the tapping data showed that two subjects tapped irregularly (see Method) and were eliminated from further analyses. Histograms of subjects’ tapped tempi showed clear peaks for 157 of the 170 excerpts, while the histograms for the remaining 13 excerpts showed more diffuse distributions of tapped tempi, indicating that subjects had no clear consensus of perceived tempo for those excerpts. Figure 2 illustrates the variety of responses by showing examples of histograms of the tapped tempi for four different excerpts. The histograms are plotted along with the resonance model from Van Noorden and Moelants (1999).

If the peaks of the tapped tempi histogram fall along the resonance curve, the level of ambiguity in perceptual tempo can be predicted by the position of the metrical levels (peaks) relative to the resonance. Peaks of equal height (indicating high ambiguity in perceptual tempo) occur if the tempi of metrical levels straddle the resonant frequency (Figure 2A), and peaks of unequal height (indicating low ambiguity in perceptual tempo) occur if the tempo of a metrical level falls near the resonant frequency (Figure 2D). The tapped tempi histograms in Figure 2B and C show cases where the peaks of tapped tempi histograms do not fall along the resonance curve and where the perceptual tempo is unambiguous.

Different subjects tapped to at least two different metrical levels when listening to each of the 170 excerpts, although the amount of ambiguity between metrical levels (i.e., relative height of the peaks in the perceived-


[Figure 2: four panels (A to D), each plotting percentage of subjects (0 to 1) against tempo (0 to 300 BPM). Legend: tapped tempi, peak tempi, resonance model.]
FIG. 2. Results from four excerpts to illustrate the variety of responses: histograms of tapped tempi (solid line), peaks at salient tempi (markers), and the resonance curve from Van Noorden and Moelants (1999) (dashed line). (A) Two tapped-tempo peaks of relatively equal height (distributed along the resonance curve), indicating high ambiguity between metrical levels. (B) A single tapped-tempo peak at a tempo above the “preferred-tempo” range. (C) A single tapped-tempo peak at a tempo below the “preferred-tempo” range. (D) Three tapped-tempo peaks (distributed along the resonance curve) showing a moderate level of ambiguity between metrical levels.
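The resonance curve drawn in these figures (Equation 1, with the stated parameters of 132 BPM and 0.5) can be evaluated with a few lines of code. The sketch below is illustrative only. It assumes that tempi are converted from BPM to Hz before entering the formula; the text does not state the units, but with this choice the numerical maximum lands close to the peak of 129 BPM quoted in the Model section. The function names are hypothetical.

```python
import math


def resonance_amplitude(tempo_bpm, f0_bpm=132.0, beta=0.5):
    """Effective resonance amplitude of Equation 1 for a tempo in BPM."""
    f = tempo_bpm / 60.0   # assumption: the formula operates on Hz
    f0 = f0_bpm / 60.0
    first = 1.0 / math.sqrt((f0 ** 2 - f ** 2) ** 2 + beta * f ** 2)
    second = 1.0 / math.sqrt(f0 ** 4 + f ** 4)
    return first - second


def peak_tempo(lo=30.0, hi=300.0, step=0.1):
    """Grid search for the tempo (BPM) at which the curve is highest."""
    grid = [lo + k * step for k in range(int((hi - lo) / step) + 1)]
    return max(grid, key=resonance_amplitude)


if __name__ == "__main__":
    # Two metrical levels an octave apart, as in the 80 vs. 160 BPM question.
    for bpm in (80.0, 160.0):
        print(bpm, round(resonance_amplitude(bpm), 3))
    print("peak near", round(peak_tempo(), 1), "BPM")
```

Comparing the amplitudes at two candidate metrical levels gives the relative salience that a fixed global resonance would predict for them, which is the baseline the histograms are compared against.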

tempo histograms) differed across excerpts. In two cases, the ambiguity was extremely low and only one subject tapped at a metrical level different from the most common response. In contrast, nine excerpts elicited responses in which the most common tempo was tapped by a minority of the subjects, indicating a high degree of ambiguity in metrical level across subjects. On average, for a single excerpt, 71.3% of the subjects tapped to the most salient tempo, while 22.5% tapped to the secondary tempo. In 75% of the cases, a third metrical level was reported at least once, and, overall, 5.3% of the responses were given to this third level. In 18.75% of the fragments, a fourth level is reported, although this represents only a small percentage of the total answers (0.8%).

We found considerable differences between responses of individual subjects. The mean perceived tempo from each subject in Session 2 varied from 51 to 141 BPM. A significant difference between male and female subjects or between musicians and nonmusicians was not found, nor was there a relation between age and mean perceived tempo. One interesting finding is that, in the data from Session 2, the musicians tapped on average faster than the nonmusicians. The difference is not significant, F(1, 38) = 1.99, p = .167, but it contradicts earlier research that shows that musicians tap slower than nonmusicians (Drake et al., 2000). Because this effect has already been found elsewhere (McKinney & Moelants, 2004), it would be interesting to investigate it further.

To examine the effect of genre on tapped tempo, we grouped the excerpts into 10 different genres (see Table 1) and computed tempo statistics per genre. We found a highly significant effect of genre on the mean of the most salient (primary) tempo, F(9, 160) = 8.24, p < .001, with averages varying from 76.2 BPM (classical) to 167.1 BPM (metal/punk) (see Figure 3A). This effect was not present in the second most salient (secondary) tempo, where there was much more variance. In most genres, the primary and secondary tempi were found more or less in the same range; however, t tests show a significant difference in three genres: metal/punk (means 167.1/108.6, t = 2.98, df = 32, p < .01), classical (means 76.2/106.5, t = 2.65, df = 62, p < .05), and world/ethnic (means 119.4/93.9, t = 2.12, df = 62, p < .05). Clearly in Western classical music a lower metrical level seems to dominate, while in the other


[Figure 3: two panels with genre on the horizontal axis. (A) Tempo (BPM, 40 to 220) of the primary and secondary tempo. (B) Percentage of subjects (0 to 1) at the primary, secondary, and third tempo.]
FIG. 3. Effect of genre on perceived tempo. Left (A): the tempo of the two most common responses for each genre (mean and 95% confidence interval of the mean). Right (B): the percentage of responses in the three most common tempi of each excerpt for each genre.

two genres, the faster metrical level is usually dominant (see Figure 3A). Additionally, we saw significant effects of genre on the relative share of responses at the primary tempo, F(9, 160) = 3.27, p < .01, and at the secondary tempo, F(9, 160) = 3.31, p < .001 (see Figure 3B). There was no significant effect of genre on the third tempo. These results show that, for some genres, there is generally more ambiguity in metrical level than in others. Here, ambiguity was lowest for the metal/punk excerpts (82.6% responses at the primary tempo versus 13.4% at the secondary tempo) and highest for the jazz excerpts (66.2% responses at the primary tempo, 28.4% at the secondary tempo) (see Figure 3B).

In addition to the ambiguity in tempo (metrical level) perception, we also observed an ambiguity in the perception of meter for a few excerpts. While most of the excerpts contained binary or ternary meters, two pieces had asymmetric meters, one in 5/4 and one in 7/8. As expected for these excerpts with the asymmetric meters, we found responses that were characterized by salient perceived tempi with nonsimple ratios. In addition, we saw for several of the “straight” excerpts (most with 6/8 or 12/8 meter) an ambiguity between duple and ternary meter evidenced by salient perceived tempi with a 2:3 ratio. This “metric confusion” was found 4 times between the first and second most salient tempi, 7 times between the first and the third, and an additional 4 times between the second and third most salient tempo. Thus 8.8% of the excerpts contain some metric ambiguity.

The peaks of the 157 tapped-tempo histograms are plotted in Figure 4 along with the resonance curve from Van Noorden and Moelants (1999). As illustrated for individual histograms in Figure 2 (panels B and C), the data in Figure 4 show many high peaks at tempi above the “preferred-tempo” region as well as below it. There are also numerous low peaks at tempi in the middle of the “preferred-tempo” region, indicating that other factors in those excerpts are drawing listeners away from the “natural tempo” of 120 BPM. Previous work on accents in pulse trains suggests that perceptual accents, including dynamic accents, can draw listeners to a particular metrical level (Parncutt, 1994). It is unclear how directly these findings apply to more complex musical excerpts, so we started by examining the effect of dynamic accents in our data.

Model

In order to test the idea that perceptual accents in the musical excerpts can determine the relative importance of different metrical levels, we tried to predict, from


[Figure 4: percentage of subjects (0 to 1) plotted against tempo (0 to 400 BPM). Legend: peak tempi, resonance model.]
FIG. 4. Salient perceived tempi (histogram peaks) from the current experiment and the resonance curve from Van Noorden and Moelants (1999) (f_0 = 132 BPM, β = 0.5). Correlation coefficient: r = 0.16 (p < .01).
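The “metric confusion” reported in the Results (pairs of salient tempi in a 2:3 ratio rather than a simple integer ratio) suggests a simple check on pairs of histogram peaks. The sketch below is only an illustration: the function name, the set of candidate ratios, and the 5% tolerance are assumptions, not values taken from the study.

```python
def tempo_relation(tempo_a, tempo_b, tolerance=0.05):
    """Label the ratio between two salient tempi (BPM).

    Returns "2:1", "3:1", "4:1" for metrical-level (integer) relations,
    "2:3" for the duple/ternary confusion, or "other".
    """
    slow, fast = sorted((tempo_a, tempo_b))
    ratio = fast / slow
    # Hypothetical candidate set; the tolerance is an assumed value.
    candidates = {"2:1": 2.0, "3:1": 3.0, "4:1": 4.0, "2:3": 1.5}
    label, target = min(candidates.items(),
                        key=lambda item: abs(ratio / item[1] - 1.0))
    if abs(ratio / target - 1.0) <= tolerance:
        return label
    return "other"


if __name__ == "__main__":
    print(tempo_relation(80.0, 120.0))   # duple/ternary pair
    print(tempo_relation(60.0, 121.0))   # neighbouring metrical levels
    # Share of excerpts with metric confusion from the reported counts.
    print(round(100.0 * (4 + 7 + 4) / 170, 1))
```

The last line reproduces the reported share from the counts given in the text: 4 + 7 + 4 = 15 of 170 excerpts, or about 8.8%.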

acoustic analyses, which excerpts would elicit perceived-tempi distributions that deviate from an equal distribution around the peak in the resonance curve. The most direct accent type to gauge via acoustic analysis is the dynamic accent, which we estimated from the relative energy in a time-limited portion of the signal.

The first step in the analysis was to define a value to represent the deviation of perceived tempo for a particular excerpt with respect to preferred tempo. We used the peak in the resonance curve from Van Noorden and Moelants (1999), Tpeak = 129 BPM, as the preferred tempo, and expressed the perceptual tempo deviation, ∆p, of a tapped response as a function of its peak tempi, Ts:

$$\Delta_p = \frac{\sum_{s=1}^{S} H(T_s) \cdot T_s / T_{peak}}{\sum_{s=1}^{S} H(T_s)}, \tag{2}$$

where H(Ts) is the histogram value at tempo Ts, s is the histogram peak number, and Tpeak is the tempo at the peak of the resonance curve (Van Noorden & Moelants, 1999). The number of peak tempi (S) per histogram ranged from 1 to 3. A value of less than 1.0 for ∆p indicates that more subjects tapped at a slower tempo than the resonance model would predict, and a value greater than 1.0 indicates the opposite.

Next we calculated a statistic, ∆a (acoustic periodicity difference), from the acoustic waveform of each musical excerpt to represent the relative strengths of periodicities at the peak tempi, Ts. A statistic such as ∆a should embody the relative strength of dynamic accents occurring at time intervals corresponding to all peak tempi Ts and therefore allow us to relate acoustic dynamic accents to ∆p, our measure of perceptual tempo deviation. Our hypothesis was that ∆a should correlate positively with ∆p, reflecting the idea that dynamic accents occurring at periodicities (i.e., tempi) below or above the preferred tempo should draw listeners away from the preferred tempo. In what follows, Ts is redefined to include all peak tempi (Ts from the ∆p calculation), as well as their integer multiples and quotients in the range of 20 to 350 BPM. This was done in order to more fairly represent the entire range of perceivable tempi in the calculation of ∆a.

The acoustic tempo deviation, ∆a, was calculated in two stages, following a similar strategy used in many systems for automatic tempo extraction (Alonso et al., 2004; Goto & Muraoka, 1998; Klapuri et al., 2006; Scheirer, 1998). The typical strategy is to first


calculate a driving signal from the audio excerpt and then use it to drive a series of periodicity detectors in the second stage. Periodicities detected in the driving signal are then taken as tempo candidates for that particular excerpt. Here, to calculate ∆a, we use the relative strengths of predefined tempo candidates (Ts) to calculate a weighted tempo ratio relative to Tpeak. For each musical excerpt we calculated the driving signal for ∆a in a similar manner to that in Scheirer (1998):

• The audio signal, x[n], was first band-pass filtered into 6 frequency bands (0–200, 200–400, 400–800, 800–1600, 1600–3200, and 3200–22050 Hz), yielding six filtered versions of the original signal, x_b[n], where b is the frequency band (1–6), and n is the sample number. Further processing was the same in each band and occurred independently.

• The instantaneous power of the signal in each band was calculated, low-pass filtered, and down-sampled to 900 Hz in order to arrive at the temporal envelope for each frequency band, e_b[k].

The temporal envelopes, e_b[k], were taken together as the driving signal for ∆a. In the second stage of processing we calculated the strength of periodicities in e_b[k] at the tempi of interest, Ts:

• Period histograms, P_{b,s}[m], were generated from the temporal envelopes, e_b[k], for all (s) periods corresponding to the tempi Ts:

$$P_{b,s}[m] = \sum_{p=0}^{(K-M)/M} e_b[m + pM], \quad \text{for } 0 < m \le M, \tag{3}$$

where K is the length of the excerpt (in samples), and M is the length of the period (sample_rate/Ts).

• The energy, E_{b,s}, in each period histogram was calculated by summing the histogram, P_{b,s}[m], and dividing by the length of the excerpt (number of samples):

$$E_{b,s} = \frac{1}{K} \sum_{m=1}^{M} P_{b,s}[m] \tag{4}$$

• The synchronization index, SI_{b,s}, was calculated from each period histogram to estimate the strength of periodicity (at tempo Ts) in the temporal envelope from frequency band b:

$$SI_{b,s} = \frac{\left| \sum_{m=1}^{M} P_{b,s}[m] \cdot e^{j 2 \pi m / M} \right|}{\sum_{m=1}^{M} P_{b,s}[m]}, \tag{5}$$

where M is the length of the period histogram.

• Finally, the energy values and the synchronization indices were combined across frequency bands to form the acoustic tempo deviation:

$$\Delta_a = \sum_{b=1}^{6} \frac{\sum_{s=1}^{S} E_{b,s} \cdot SI_{b,s} \cdot T_s / T_{peak}}{\sum_{s=1}^{S} E_{b,s} \cdot SI_{b,s}}, \tag{6}$$

where s is the index to the number of peak tempi (S), b is the frequency band index, and Tpeak is the tempo at the peak of the resonance curve (Van Noorden & Moelants, 1999).

The value of ∆a is greater than 1.0 if there is more energy (i.e., dynamic accents) in the musical excerpt at the periodicities (Ts) greater than Tpeak than at periodicities less than Tpeak. It is less than 1.0 if the reverse is true.

Figure 5 shows the perceptual tempo deviation, ∆p, plotted against the acoustic tempo deviation, ∆a, for all musical excerpts. While there is quite a bit of spread in the data, there is a significant positive correlation between the two parameters. This correlation is consistent with the idea that periodic dynamic accents can draw listeners to metric levels away from the preferred tempo region. The deviation parameters are best correlated at values of ∆p less than 1.0 (r = 0.65, p < .0001), indicating that periodic dynamic accents are more effective at drawing listeners to slower tempi than to faster tempi.

General Discussion

Our main findings can be summarized as follows:

1. When asked to tap to the most salient beat of musical excerpts, different listeners tapped at different tempi, corresponding to the various metrical levels of the excerpt. The ambiguity in which metrical level best served as the basis for the perceived tempo can be seen in histograms of tapped tempi from different listeners, where several peaks are typically present (Figure 2). For some pieces, up to four different metrical levels are present in these histograms of tapped tempi.


FIG. 5. Perceptual tempo deviation ratio (∆p, vertical axis) versus acoustic tempo deviation ratio (∆a, horizontal axis) for data from all experiments. Correlation coefficient: r = 0.51, p < .0001.

2. Tapped tempi across subjects varied considerably; however, no significant effect of age, sex, or musicianship was seen in the data. There was a slight trend for musicians to tap faster than nonmusicians, but it was not statistically significant.
3. There was a significant effect of music genre on the mean primary (most salient) tempo and also on the percentage of subjects tapping at the primary tempo. This can be interpreted to mean that the perceived tempo is significantly less ambiguous for some genres (e.g., metal/punk) than for others (e.g., jazz).
4. In addition to tempo ambiguity, we also saw an ambiguity in meter perception for a few excerpts that yielded salient perceived tempi with nonsimple integer ratios.
5. For many excerpts, the distribution of peak tempi from perceived-tempo histograms is not predicted by the preferred tempo resonance curve from Van Noorden and Moelants (1999). However, a large portion of this deviation from a global preferred resonance can be accounted for if we estimate and quantify only one type of perceptual (dynamic) rhythm accent from the music audio.

Our experiments clearly show that we cannot speak of one single tempo if we consider tempo as a perceptual phenomenon. In many scores and theoretical works, the tempo is determined with a single metronome number. However, we see that for almost all musical excerpts, different subjects perceive different metric levels as being the most salient. The relative importance of the different metric levels can be considered an image of the metric character of the music. The presence of very fast responses, for example, can reflect a sense of speed perceived even by people who tap at a lower metric level.

The ambiguity of meter perception was also seen in our results, and although this phenomenon was not extremely widespread, it has important consequences when thinking about metric structures. In music theory, the meter of a piece is considered an unambiguous factor; although it can change from one bar to the next, it is typically fixed within a given bar or section. Yet, it is clear from our results that this is not true from a perceptual point of view; some pieces could just as easily be interpreted with a ternary metric structure as with a binary. It is likely that cues for both interpretations are present in the music; however, their perception and relation to tempo need further exploration.

The resonant behavior of tapping rates seen here and in previous studies is likely linked to constraints in the motor system (Todd, 1999). This leads to a difficulty
with using a tapping method to study tempo perception, as noted by Toiviainen and Snyder (2003): We cannot separate the actual perception of the tempo from subjects’ tapping (in)abilities. Subjects may, for example, perceive a salient tempo that is simply too fast for them to tap. Our results might look quite different if we had asked subjects to tap with their feet, legs, or heads, which would probably constrain fast tapping even more. Nevertheless, if we require all subjects to use the same method for tapping (here, their finger on a spacebar) we see certain cases where virtually all subjects tap (to the same excerpt) at very high tempi (e.g., Figure 2B) and others where they are completely divided over slower tempi (Figure 2A). Thus tempo cues from the music and subjects’ perception of tempo cannot be completely masked or defined by motor-system constraints. It is likely that a combination of factors from the music, the motor system, individual training, and experience (both motor and music) contributes to individuals’ preferred tempo. This is reflected in our results that show clear differences across individual subjects, where some were slow tappers, in general, and others were fast.

We must also bear in mind that the distribution of tapped tempi for a particular excerpt from the population of subjects (e.g., the histograms in Figure 2) does not necessarily depict the relative salience of tempi in any individual subject. Our data and statistics here represent the perception of the population as a whole rather than of individual subjects. We have not yet found any significant factor of subject background on the tapping speed. This would be better explored in a larger study that looks at many different types of subject groupings, including aspects such as musicianship, age, and culture.

Our starting point was the assumption that relative salience of different metrical levels in a piece of music could be predicted by the distribution of a resonance curve. In order to explain the distribution of tapped tempi to different components of polyrhythms, Van Noorden and Moelants (1999) attributed relative weights to the different components in the polyrhythm. This can be done when modeling a fixed set of stimuli, but is not feasible when modeling music tempo in general. In this study we extracted acoustic cues from the music in order to assign weights to the different metrical levels. Our modeling results suggest that if the resonance is combined with models for perceptual accents, the distributions of tapped tempi can be predicted more accurately. We have started by modeling dynamic accents, but it is clear that other types of accents should also be included, such as melodic, durational, and other rhythmic and metrical accents.

Author Note

Address correspondence to: Martin McKinney, Philips Research, High Tech Campus 36, 5656 AE Eindhoven, The Netherlands. E-mail: martin.mckinney@philips.com

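The second processing stage of the acoustic tempo deviation (period histograms, energy, and the combination in Equations 3, 4, and 6) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. All function and parameter names are hypothetical, the band-pass filtering and envelope extraction are skipped (the code starts from ready-made envelopes), tempi are assumed to be given in beats per minute, and the synchronization index uses the common vector-strength definition, which may differ from the paper's Equation 5. Equation 6 is implemented as printed, summing the per-band ratios; whether the original analysis also normalized by the number of bands is not stated in this section.

```python
import cmath
import math


def pulse_train(n_samples, spacing):
    """Toy envelope: unit accents every `spacing` samples (illustration only)."""
    return [1.0 if k % spacing == 0 else 0.0 for k in range(n_samples)]


def period_histogram(envelope, period):
    """Eq. 3: fold the envelope at `period` samples and sum the folds."""
    if period < 1:
        raise ValueError("period must be at least one sample")
    n_folds = len(envelope) // period  # only whole periods are used
    hist = [0.0] * period
    for p in range(n_folds):
        for m in range(period):
            hist[m] += envelope[m + p * period]
    return hist


def energy(hist, n_samples):
    """Eq. 4: histogram sum divided by the excerpt length in samples."""
    return sum(hist) / n_samples


def synchronization_index(hist):
    """ASSUMPTION: vector strength of the histogram (0 = flat, 1 = one bin)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    n_bins = len(hist)
    vec = sum(h * cmath.exp(2j * math.pi * m / n_bins)
              for m, h in enumerate(hist))
    return abs(vec) / total


def acoustic_tempo_deviation(envelopes, tempi_bpm, t_peak_bpm, env_rate=900.0):
    """Eq. 6 as printed: per-band weighted mean of T_s / T_peak, summed over bands.

    envelopes  -- one temporal envelope (list of floats) per frequency band
    tempi_bpm  -- candidate tempi T_s, assumed to be in beats per minute
    t_peak_bpm -- tempo at the peak of the resonance curve (caller-supplied)
    env_rate   -- envelope sample rate in Hz (900 Hz in the text)
    """
    delta = 0.0
    for env in envelopes:
        num = 0.0
        den = 0.0
        for tempo in tempi_bpm:
            period = round(env_rate * 60.0 / tempo)  # M = sample_rate / T_s
            hist = period_histogram(env, period)
            weight = energy(hist, len(env)) * synchronization_index(hist)
            num += weight * tempo / t_peak_bpm
            den += weight
        if den > 0:
            delta += num / den
    return delta
```

With a single band, an envelope whose accents fall only at half the assumed peak tempo gives a value below 1.0, and one whose accents fall at twice the peak tempo gives a value above 1.0, which matches the interpretation of ∆a given in the text.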
References

Alonso, M., David, B., & Richard, G. (2004). Tempo and beat estimation of musical signals. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR) (pp. 158–163). Barcelona: Universitat Pompeu Fabra.
Drake, C., Penel, A., & Bigand, E. (2000). Why musicians tap slower than nonmusicians. In P. Desain & L. Windsor (Eds.), Rhythm perception and production (pp. 245–248). Lisse: Swets & Zeitlinger.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–180). New York: Academic Press.
Goto, M., & Muraoka, Y. (1998). Music understanding at the beat level: Real-time beat tracking for audio signals. In D. F. Rosenthal & H. G. Okuno (Eds.), Computational auditory scene analysis (pp. 157–176). Mahwah, NJ: Lawrence Erlbaum Associates.
Handel, S., & Oshinsky, J. S. (1981). The meter of syncopated auditory polyrhythms. Perception & Psychophysics, 30, 1–9.
Klapuri, A., Eronen, A., & Astola, J. (2006). Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 14, 342–355.
Leman, M., Vermeulen, V., De Voogdt, L., Moelants, D., & Lesaffre, M. (2005). Prediction of musical affect using a combination of acoustic structural cues. Journal of New Music Research, 34, 39–68.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
McKinney, M., & Moelants, D. (2004). Deviations from the resonance theory of tempo induction. In R. Parncutt, A. Kessler, & F. Zimmer (Eds.), Abstracts of the Conference on Interdisciplinary Musicology (pp. 124–125; Full text on CD-ROM). Graz: Dept. of Musicology, University of Graz.
Moelants, D. (2002). Preferred tempo reconsidered. In C. Stevens, D. Burnham, G. McPherson, E. Schubert, & J. Renwick (Eds.), Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney, 2002 (pp. 580–583). Adelaide: Causal Productions.
Moelants, D., & McKinney, M. (2004). Tempo perception and musical content: What makes a piece fast, slow or temporally ambiguous? In S. D. Lipscomb, R. Ashley, R. O. Gjerdingen, & P. Webster (Eds.), Proceedings of the 8th International Conference on Music Perception & Cognition, Evanston, IL, 2004. Adelaide: Causal Productions.
Parncutt, R. (1994). A perceptual model of pulse salience and metrical accent in musical rhythms. Music Perception, 11, 409–464.
Povel, D.-J., & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics, 30, 565–572.
Scheirer, E. D. (1998). Tempo and beat analysis of acoustic musical signals. Journal of the Acoustical Society of America, 104, 588–601.
Todd, N. P. M. (1999). A sensory-motor theory of rhythm, time perception and beat induction. Journal of New Music Research, 28(1), 5–28.
Toiviainen, P., & Snyder, J. S. (2003). Tapping to Bach: Resonance-based modeling of pulse. Music Perception, 21, 43–80.
Van Noorden, L., & Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28, 43–66.
