Professional Documents
Culture Documents
Convention Paper
Presented at the 126th Convention
2009 May 7–10 Munich, Germany
The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have
been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from
the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes
no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Perceived Roughness -
a Recent Psycho-Acoustic Measurement
Robert Mores1 , Thorsten Smit1 and Jana-Marie Wiese1
1
University of Applied Science, Hamburg, Germany
ABSTRACT
This paper relates to an investigation on perceived roughness from Aures in 1984 where findings are based on
psycho-acoustic tests with synthetic sounds and a small group of people. The related results have repeatedly
been used for modelling roughness perception since then, for instance in the context of noise perception.
Roughness is again an issue when investigating the perceived quality or timbre of musical sounds. In this
context, roughness is one among some ten mid-level features to be extracted. Here, perceived roughness
is measured again, but on a wider basis than in the earlier investigation. This paper outlines the psycho-
acoustic investigation, basically following the method of Aures, but modifying some of the issues under
question. The results are reasonable and differ from the earlier findings in various aspects.
cations. It also describes the true test conditions for carrier frequency of 1 kHz are specifically important.
presented results. Here this circumstance is appreciated in particular
by defining an anchor parameter set. As this set is
2. METHOD USED AND ALTERATIONS important for linking the results of the two phases,
In principle the method employed here follows the the group size has been enlarged to 50 individuals.
basic idea already used by Aures [2]: in a first The larger group on this parameter set also helps to
test phase, two tones of same carrier frequency estimate confidence levels for the other parameter
are presented, and individuals have to adjust sets with only 20 participants.
the modulation intensity for specific modulation
frequencies until the perceived roughness matches Apart from these modifications, there are a
that of a reference sound of different modulation few other issues where the Aures publication does
frequency. This first test phase therefore delivers not fully outline details of interest to the reader or
sensitivity of perceived roughness against variations of relevance to the procedure:
of modulation frequency. In a second phase, rough-
ness across different carrier frequencies is compared a) The Aures investigation says nothing about
within sound pairs, maintaining the modulation the eight individuals involved in the test. In
frequency for both sounds, and, again, adjusting the this investigation, the 50 individuals are ran-
modulation intensity until the perceived roughness domly chosen people, mainly from the faculty
matches. This second phase delivers sensitivity of environment. None of the individuals has been
perceived roughness against variations of carrier involved in the research questions behind the test.
frequency. Finally, results from the two phases are None of the individuals has specifically been trained.
roughly brought into a common context, and the
perceived roughness is mapped against the three b) Individuals have been interviewed on musi-
parameters used. This investigation uses the same cal skills or training after execution of the test.
sequence of test phases and the same final context Therefore, individuals have not been biased prior to
map. It also employs the same test procedure, using the test, nor have they been under pressure in terms
the method of adjustment on sound pairs. Most of expected performance. The subsequent question-
of the parameters of the synthetic sounds are also naire allows for classification across individuals and
identical. for meaningful evaluation of data, e.g. perception
of musicians vs. non-musicians.
The few modifications of our test approach
are fully conform with the general recommendations c) The sound pairs are randomly permutated.
for psychophysical tests from Hellbrück et al. [5]. However, the permutation has been manually
re-edited to maximize options for cross checking
a) The number of sound pairs to be adjusted the obtained raw data between groups and between
by individuals is 18 for the first test phase and 12 parameter sets.
for the second test phase, compared to 124 and
112 sound pairs in the respective Aures test. The d) The authors expected that individuals would
authors felt more confident when requesting only need the first few sound pairs for adapting to the
some 20 minutes of attention from individuals, test environment and for finding some confidence
rather than hours. when adjusting the modulation intensity. Therefore
three sounds were presented to individuals prior to
b) The group size is more than doubled com- the test phase: a slightly vibrating tone, a rough
pared to the former test, 20 instead of eight tone, and a rather rough tone.
individuals. This larger group size should deliver an
improved statistical basis. e) In addition, the uncertainty of the early
test phase is addressed by allocating a larger group
c) According to Aures’ method for the final size for the first few sound pairs. The permutation
context mapping, results for sound pairs with a has been adjusted in a way that parameter sets
with a larger group size are likely to be presented fc in Hz 125 250 500 1000 2000 4000
in the early test phase. fm in Hz 50 50 50 70 70 70
3.1. Parameters Sets, Group Size and Allocation (TG). Test groups one to five are allocated across the
The sounds consist of a carrier with carrier frequency different carrier frequencies:
fc , AM modulated with modulation frequency fm
and modulation intensity m. Sounds are synthesized index fc a b c d e f
according to TG1 x x x
TG2 x x x
ym = sin(2πfm · t) (1)
TG3 x x x
yc = sin(2πfc · t) (2) TG4 x x x
TG5 x x x
yAM = (1 + m · ym ) · yc . (3)
Table 4: Allocation of test groups (TG) across car-
Sampling rate is always fs = 44.1 kHz and the sig- rier frequencies fc (see table 1)
nal duration is tsig = 1 s.
This test uses the same set of carrier frequencies as
Aures did for his investigation: According to this allocation, the carrier frequency 1
kHz is presented to 50 individuals, whereas all other
carriers are presented to only 20 individuals.
index a b c d e f 3.2. Permutation
fc in Hz 125 250 500 1000 2000 4000 Table 5 lists the permutation map for test group
two. Letter and number for each entry correspond
Table 1: set of carrier frequencies used
to the indices of carrier and modulation frequency
according to Tables 1 and 2. For example, the
sound c1 represents a 500 Hz carrier which is
The set of modulation frequencies is reduced com-
modulated with 40 Hz. In the first test phase
pared with the Aures approach in order to reduce
the respective reference sound is a signal with the
the number of parameters and to enhance the test
same carrier frequency. Specifically, the carrier is
quality for the parameters of interest:
again fc = 500 Hz, modulated with fm = 50 Hz,
according to Table 3. Sounds under investigation
index 1 2 3 4 5 6 are always modulated with m = 0.7, whereas the
fm in Hz 40 55 65 75 90 120 modulation intensity of the reference sound is to
be adjusted. Sound pairs number four to 18 are
Table 2: set of modulation frequencies used randomly permutated with some manual correction
to avoid long sequences of always the same carrier
frequencies. The early sound pairs number one to
Sound pairs always consist of a sound under inves- three employ two distinct sequences to allow for
tigation and a corresponding reference sound, pre- cross checking raw data within and between test
defined according to table 3. The modulation fre- groups.
quency fm is chosen roughly in the area of an ex-
pected maximum for the perceived roughness. In test phase two the reference sound always
consists of a 1 kHz carrier using the same modu-
50 individuals have been organized in five test groups lation frequency as the sound under investigation
VP 2.1 VP 2.2 VP 2.3 VP 2.4 VP 2.5 VP 2.6 VP 2.7 VP 2.8 VP 2.9 VP 2.10
1 d1 d1 d1 d1 d1 d6 d6 d6 d6 d6
2 d5 d5 d5 d5 d5 d3 d3 d3 d3 d3
3 d2 d2 d2 d2 d2 d4 d4 d4 d4 d4
4 e5 b2 b1 e6 e6 b5 b4 e2 b1 e3
5 e4 d6 b3 e1 b5 e5 b5 e6 b4 e2
6 b1 b6 e5 b5 b6 e6 d5 b6 b2 b1
7 b4 b5 b2 b1 b3 b6 e5 d1 d2 b2
8 e6 e4 d3 d6 e5 b4 e3 d2 e2 b3
9 e3 b3 d4 b4 e1 e3 b1 e1 e3 e4
10 e1 d4 e4 e3 e4 d1 b3 e4 d1 d5
11 d3 d3 e2 e2 d3 d2 b2 b2 d5 d2
12 d4 e1 e3 e4 d6 d5 e4 b3 e4 d1
13 b3 e3 b4 d4 e3 e4 e1 b1 e1 e1
14 e2 e6 d6 d3 e2 b3 d2 e3 e5 b4
15 b5 b4 b6 b2 d4 b2 d1 e5 b3 b6
16 b6 b1 b5 e5 b2 b1 b6 d5 b6 e6
17 d6 e2 e1 b3 b4 e2 e6 b5 b5 e5
18 b2 e5 e6 b6 b1 e1 e2 b4 e6 b5
and a modulation intensity m = 0.7. This allows earlier decisions. Individuals were always aware of
evaluation of perceived roughness across different their work progress.
carrier frequencies.
Before the test, individuals were allowed to set their
favoured loudness level for comfortable listening, but
3.3. Test Environment and Conditions they were also instructed to maintain the loudness
Individuals executed the test alone in an acousti- level over the test. Three learning sounds were pre-
cally dry and silent room. They were given enough sented to individuals prior to the test: (i ) a slightly
time to comfortably do the test. Short steady-state vibrating tone, fc = 1 kHz, fm = 10 Hz, m = 0.4,
sounds of one second duration were automatically (ii ) a rough tone d1 with m = 0.6 and (iii ) a rather
generated on a computer and binaurally presented rough tone d4 with m = 0.9. Prior to the test,
via external sound board (U A − 25) and a headset individuals were also introduced to the use of the
(HD 202). Individuals had the free choice to repeat- very simple GUI. They were encouraged to settle all
edly listen to the sounds and to jump between the questions before they were left alone with the test.
reference sound and the sound under investigation in Questions on the purpose of the test or the related
order to adjust the level of modulation intensity m. research questions have not been answered. Individ-
Control and adjustment were done via a MATLAB- uals were encouraged to take a break between the
based graphical user interface and mouse. The con- two test phases.
trol bar for the modulation intensity allowed for ad-
justing m continuously between zero - no modulation 3.4. Test Log Book
at all - and one - maximum depth. Therefore, the The mean working duration was 13.4 min with a
technically limited parameter space was much wider, deviation of 4.0 min for the first test phase (18 sound
than the space needed for the task. Each sound pair pairs), and was 12.9 min with a 3.9 min deviation
was presented on a separate page and adjustment re- for the second test phase (12 sound pairs) after an
sults were captured from individuals with each step average break of 10 min. From 50 individuals nine
through the sequence of pages. There was no way persons set the loudness to a slightly lower level and
back in reverse direction of the sequence to lean on one person to a slightly higher level.
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
fm fm fm
Fig. 1: Results from the first test phase - comparison of different modulation frequencies for individual
carriers - graphs represent modulation intensity versus modulation frequency
fc = 125 Hz fc = 250 Hz fc = 500 Hz
1 1 1
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
fm fm fm
Fig. 2: Results from the second test phase - comparison of different carriers for individual modulation
frequencies - graphs represent modulation intensity versus modulation frequency
1.2
fc = 1 kHz
0.8
fc = 125 Hz fc = 2 kHz
fc = 4 kHz
m̃ 0.6
fc = 250 Hz
0.4
fc = 500 Hz
0.2
0
20 40 60 80 100 120 140
fm
Fig. 3: Sensitivity of perceived roughness against carrier frequency and modulation frequency
R = mα , where α ranges from roughly 1 to 2 in the comparison, the participants have been interviewed
literature, and is not necessarily considered as con- about their musical practice, instruments and time
stant over carrier frequency. Therefore the abscissa spent. The classification was done after the test,
represents a qualitative measure rather than a quan- there was no pre-selection or call for musicians. Par-
titative measure. Reading the figure appropriately, ticipants without any skills on musical instruments
the scaling still reflects the method from the test, are classified as non-musicians, whereas all partici-
where parameter sets match for equally perceived pants with training on at least one musical instru-
roughness. Absolute measures for the perceived ment are classified as musicians. By pure chance, the
roughness would require further investigations. two groups have almost equal size: 23 non-musicians
and 26 musicians.
The main result is that humans are most sensitive
in the 1 kHz region. Perceived roughness declines
with higher carrier frequency. It declines for lower Figures 4 and 5 show the results for the group of mu-
carrier frequencies and even stronger when the mod- sicians and non-musicians, respectively. There are
ulation frequency rises. For the 1 kHz carrier the only few minor differences in the results. Quartiles
maximum roughness is not confirmed at 70 Hz mod- are slightly smaller for many entries in the group
ulation. Our tests deliver a sensitivity maximum at of musicians. There are minor differences for the
a higher frequency, approximately at 90 Hz. While 125 Hz and 250 Hz carriers at low modulation fre-
maintaining Aures scaling function (Fig. 3), m̃ be- quencies. However, this seems to be an area of gen-
comes slightly larger than unity. eral uncertainty. It is the same parameter range
where the results here differ from Aures’ results.
4.3. Results across Classes Apart from this area, the difference between me-
Perceived roughness seems to be independent from dians from the two groups is always significantly
the fact, whether an individual has been actively smaller than the uncertainty of the entire approach,
exercising a musical instrument or not. For this expressed in the quartiles. Therefore it seems, that
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
fm fm fm
Fig. 4: Results from the first test phase - comparison of different modulation frequencies for individual
carriers - non-musicians
fc = 125 Hz fc = 250 Hz fc = 500 Hz
1 1 1
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
0 0 0
20 60 100 140 20 60 100 140 20 60 100 140
fm fm fm
Fig. 5: Results from the first test phase - comparison of different modulation frequencies for individual
carriers - musicians
there are no large differences in human perception zur gehörbezogenen Schallanalyse”, Disserta-
of roughness, whether musician or not. tion, TU München, 1984.
[3] Daniel, P., “Berechnung und kategoriale
5. FINDINGS Beurteilung der Rauhigkeit und Unan-
There are four main findings in this contribution. genehmheit von synthetischen und technischen
a) The findings of Aures can roughly be confirmed Schallen”, C. v. O. Universität Oldenburg,
concerning the medians of raw data gained from 1995.
both two test phases. Researchers can generally be [4] Fastl, H., Zwicker, E., “Psychoacoustics - Facts
reassured when trusting Aures’ data for their per- and Models”, Springer Verlag, Berlin Heidel-
ception model. However, anyone may feel free to use berg, 3rd edition, 2007.
this alternative data set for tuning or conformance
evaluation. Raw data under [12]. [5] Hellbrück, J., Ellermeier, W., Kohlrausch, A.,
b) The precision of Aures’ results, however, cannot Zeitler, A. “Kompendium zur Durchführung
be confirmed. Quartiles are generally larger here von Hörversuchen in Wissenschaft und in-
and are very reasonable as outlined. Looking back dustrieller Praxis”, Deutsche Gesellschaft für
from the experience gained in this project, the sharp Akustik e.V., Berlin 2008.
results are only obtainable with strongly precondi- [6] Sontacchi, A., “Entwicklung eines Mod-
tioned individuals. ulkonzeptes für die psychoakustische
c) Differences between medians of the two works are Geräuschanalyse unter MatLab”, Diplo-
in many cases larger than the quartiles of the Au- marbeit, TU Graz, 1998.
res results. This observation recommends that re-
[7] Schulz, H.H., “Rauhigkeit, Lautstärke und
searchers should not overemphasize accuracy when
Mithörschwellen von amplitudenmoduliertem
using a solitary data basis only.
Breitbandrauschen”, Diplomarbeit, TU
d) Musicians and non-musicians seem to perceive München, 1975.
roughness in the same way. The method used here
does not identify differences that would justify build- [8] Sottek, R., “Modelle zur Signalverarbeitung
ing separate perception models. im menschlichen Gehör”, Dissertation RWTH
Aachen, 1993.
e) The results from a group with 50 participants
did not deliver qualitatively different quartiles when [9] Sottek, R., “Gehörgerechte Rauhigkeitsberech-
compared to results from a group with 20 partici- nung”, in: Fortschritte der Akustik - DAGA
pants. 1994.
Test person Musical Active years Time spent Listening to Age Hearing
instrument hours/week classical deficiency
TP1.1 - - - y 28 n
TP1.2 guitar 17 abandoned n 31 n
bass guitar 17 abandoned
TP1.3 guitar 18 14 n 32 y1
bass guitar 18 14
TP1.4 - - - n 26 n
TP1.5 Trombone 8 abandoned n 27 n
TP1.6 saxophone 12 3-4 n 27 n
guitar 4 2
piano 2 abandoned
TP1.7 drums 6 abandoned n 27 y2
piano 3 abandoned
TP1.8 - - - n 26 n
TP1.9 - - - y 26 n
TP1.10 - - - n 35 n
Test person Musical Active years Time spent Listening to Age Hearing
instrument hours/week classical deficiency
Test person Musical Active years Time spent Listening to Age Hearing
instrument hours/week classical deficiency
Test person Musical Active years Time spent Listening to Age Hearing
instrument hours/week classical deficiency
TP4.1 - - - n 27 n
TP4.2 guitar 15 0.1 y 26 n
piano 2 0.25
vocals 1 abandoned
TP4.3 guitar 25 abandoned y 50 n
vocals 30 2
TP4.4 - - - n 30 n
TP4.5 - - - n 27 y6
TP4.6 - - - n 33 n
TP4.7 piano 20 abandoned y 27 n
saxophone 10 3.5
vocals 15 1
TP4.8 vocals 2.5 3 n 25 y7
violine 8 abandoned
TP4.9 - - - n 28 n
TP4.10 guitar 13 2 n 29 n
Test person Musical Active years Time spent Listening to Age Hearing
instrument hours/week classical deficiency
TP5.1 - - - n 24 n
TP5.2 drums 16 1 n 27 n
TP5.3 organ 8 abandoned n 31 n
TP5.4 - - - n 28 n
TP5.5 Cello 24 0.015 n 30 n
piano 12 5
TP5.6 vocals 20 0.5 y 33 n
TP5.7 - - - n 27 n
TP5.8 piano 28 8 y 25 y8
vocals 24 5
guitar 20 5
trumpet 6 abandoned
TP5.9 - - - n 27 n
TP5.10 bass guitar 6 4 n 25 n