Rodrigo Segnini
Stanford University
ABSTRACT
The tristimulus [Pollard & Jansson, 1982] is a timbre descriptor based on a division of the frequency spectrum into three bands. Like its homonymous model for color description, it provides an approximation to a perceptual value through parametric control of a physical measurement. The original method for representing this descriptor circumscribes a point in a triangular space where corners represent the maximum values of each band; however, this approach is prone to cluttering when dealing with a large number of data points. In this work we introduce Timbrescape, a method for visualizing tristimulus data as well as the structural boundaries that result from timbre information thus represented. This method uses a multi-timescale approach that offers at-a-glance information for arbitrarily sized sections and the whole sound sequence in a single picture; it is based on the Scoregram [Segnini & Sapp, 2006]. We conducted an experiment to verify whether the structural boundaries suggested by Timbrescape are related to those reported by listeners when segmenting a music sequence. Results suggest that Timbrescape reflects well the salient features present in the frequency spectrum as captured by the tristimulus model, and that it may serve to illustrate the implicit strategies used by listeners when segmenting music information.

Keywords
Music Structure, Timbre Analysis, Visualization

In: M. Baroni, A. R. Addessi, R. Caterina, M. Costa (2006) Proceedings of the 9th International Conference on Music Perception & Cognition (ICMPC9), Bologna/Italy, August 22-26 2006. ©2006 The Society for Music Perception & Cognition (SMPC) and European Society for the Cognitive Sciences of Music (ESCOM). Copyright of the content of an individual paper is held by the primary (first-named) author of that paper. All rights reserved. No paper from this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval systems, without permission in writing from the paper's primary author. No other part of this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval system, without permission in writing from SMPC and ESCOM.

INTRODUCTION
Timbre has a myriad of ad-hoc representations. Observations of the sound’s frequency spectrum are regarded as among the most objective. Several methods exist for its visualization; the spectrogram is a well-known example.

One of the main disadvantages of spectrograms and similar methods lies in their inability to represent detailed time-related and frequency-related events simultaneously. This is a consequence of an intrinsic trade-off in the size of the analysis window. A smaller window provides finer detail on time-related events yet only broad frequency information, while a larger window provides only a coarse depiction of time-related events but finer detail on frequency information.

A multi-timescale approach that provides in a single picture the output of a range of analysis window sizes, from an arbitrary minimum to the whole duration of a sound sequence, was reported in [Segnini & Sapp, 2006]. We applied this approach to timbre information in audio obtained through the tristimulus descriptor, and considered an application of this new representation, labeled Timbrescape, in the context of timbre-based musical structure organization.

Discussion about what determines musical structure has focused historically on the organization of pitch into melodic, harmonic, and rhythmic groups. While composers have been aware of the unique aspects that timbre organization brings to their music edifice, and have attempted formalizations in this regard (Schoenberg’s Klangfarbenmelodie comes to mind), it has only been investigated scientifically as an important contributor to musical structure in recent times (see [McAdams, 1999] for an overview).

We describe an experiment where listeners generated a continuous measurement resulting in boundaries marking perceived timbrally contrasting sections. We found certain coherence between the boundaries generated by listeners
and those revealed by Timbrescape, allowing us to propose Timbrescape as a template to depict and model listeners’ strategies when structuring timbrally-organized music.

TRISTIMULUS
In the visual domain, tristimulus stands for the values of the three primary light wavelengths which in combination represent any perceivable hue; hue is one of the attributes of color, together with saturation and intensity.

The acoustic version, proposed by [Pollard & Jansson, 1982], divides the frequency spectrum of the acoustic signal into three bands, with the first band reflecting the amplitude of the fundamental, the second band the sum of partials 2-4, and the third band the sum of the most significant remaining spectral components. This proportion of energy at the fundamental, mid and high-frequency regions likewise attempts to describe any sound’s spectrum.

While each tristimulus point represents a unique percept, other combinations can also lead to the same color¹ perception. This is due to additional attributes of color in both the visual and auditory domains, as well as coarticulation effects and time-variant characteristics.

The tristimulus method was originally meant to convey visually the time-variant dynamics of timbre. It is normally represented by circumscribing a point within a triangle where corners represent maximum values of each band. The values of any two bands are sufficient to determine the value of the remaining one by complementarity. Figure 1 shows the evolution of timbre from a flute sound as its dynamics change from pianissimo (pp) to fortissimo (ff).

[Figure 1: tristimulus triangle with corners labeled "Maximum mid-freq. band energy" and "Maximum high-freq. band energy"; data points numbered 1 to 5.]

¹ While timbre is also referred to as sound color, no patent analogy exists between the visual and auditory domains (except in synaesthetes who experience correspondences across multiple modalities).

[Figure 2 legend: Blue, Green, Red, Sum.]
Figure 2. Colored tristimulus representation of data in Fig. 1.

pose onto each other (e.g. a whole piece’s rendition), diminishing the benefits of this descriptor to the naked eye.

TIMBRESCAPE
Timbrescape attempts to provide tristimulus information for both single sound events and longer sequences at once. It assigns each band’s value in an analysis window to a primary hue: Red, Green, and Blue (see Fig. 2), and since color is additive, incrementally adds the results of contiguous analysis windows. In this way, every new level displays the aggregate results of the previous one, effectively increasing the size of the analysis window while maintaining continuity in the representation. Also, since the hop size is kept at one, each new level becomes one window shorter, resulting in a hierarchical picture with the shape of a triangle. At the highest level, the tip of the triangle sums up all colors, reflecting the global tristimulus information, whereas at the bottom, each color provides local tristimulus information.

For example, if there were a single preponderant color across the whole piece, this same color would prevail almost unfaded all the way to the top. On the other hand, many different colors (timbres) at the bottom would aggregate into a grayish, indistinct hue. Figure 3 shows the picture resulting from applying this process to Ligeti’s Concerto for Violoncello.

Figure 3. Timbrescape of Ligeti’s Concerto for Violoncello.
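The band division and the level-by-level aggregation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the figures. It makes several assumptions: the function names `tristimulus` and `timbrescape` are hypothetical, the third band sums all remaining partials rather than only the "most significant" ones, the three bands are normalized to sum to one, the low, mid, and high bands map to red, green, and blue respectively, and each level averages (rather than adds) the contiguous windows so that values stay displayable.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def tristimulus(partials: Sequence[float]) -> Color:
    """Split harmonic partial amplitudes into three normalized bands.

    partials[0] is the fundamental, partials[1:4] are partials 2-4,
    and partials[4:] are all remaining partials (an assumption).
    """
    total = float(sum(partials))
    if total <= 0.0:
        return (0.0, 0.0, 0.0)  # silent frame
    low = partials[0] / total
    mid = sum(partials[1:4]) / total
    high = sum(partials[4:]) / total
    # Assumed mapping: low -> red, mid -> green, high -> blue.
    return (low, mid, high)


def timbrescape(frames: Sequence[Color]) -> List[List[Color]]:
    """Return the triangle of levels for a sequence of per-frame colors.

    Level k (0-based) holds one color per window of k + 1 contiguous
    frames with a hop of one, so each level is one cell shorter than
    the one below it. Colors are averaged here (an assumption).
    """
    n = len(frames)
    # Per-channel prefix sums make every window sum a single subtraction.
    prefix = [(0.0, 0.0, 0.0)]
    for r, g, b in frames:
        pr, pg, pb = prefix[-1]
        prefix.append((pr + r, pg + g, pb + b))
    levels = []
    for width in range(1, n + 1):
        row = []
        for start in range(n - width + 1):
            end = start + width
            row.append(tuple(
                (prefix[end][c] - prefix[start][c]) / width for c in range(3)
            ))
        levels.append(row)
    return levels
```

With three frames that are purely red, green, and blue, the bottom level keeps the three local colors while the single cell at the tip averages them into an even gray, which matches the behavior described for pieces with many contrasting timbres.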
Fig. 3 reveals a fluctuation of red, green, and blue hues according to the proportions of energy in each band. If one shifts the observation level to somewhere above the middle of the picture, then approximately four main sections become defined (from left to right, in color): a reddish section at the beginning of about 1/6 of the total length, followed by a bluish one of about 1/6, a reddish one in the middle, and a final bluish one taking most of the second half. At the top of the picture there is a mostly magenta hue reflecting an overall combination of energy in the low and high bands, which is somewhat representative of the whole piece.

The previous inspection led us to the intuition that listeners may also switch among various levels of detail regarding timbre throughout the processing of an incoming music stream. If so, then Timbrescape may provide a landscape on which to represent these strategies when the structuring decisions are based on timbre information.

TIMBRE-BASED MUSIC ORGANIZATION
Listeners subject incoming auditory information to some sort of organization in order to make its understanding possible. Musicians may have an advantage in this process by using prior knowledge and tools related to their trade; however, non-musicians can perform equally well in similar tasks, especially if these involve previously unheard music that is also unrelated to the musician's domain of experience.

Much of this experience-independent ability is based on innate processes for which there is usually a lack of awareness as to what is being done while listening. And even after the experience, it is hard to verbalize the knowledge that was invoked and generated. It is also not clear how this implicit knowledge, for which no deliberate learning strategy was used, is represented, stored, and perused. Arguments suggest either a collection of relatively unprocessed sequences of information, also called ‘whole exemplars’, or an abstract set of rules describing the internal relationship of components. In any case, it is assumed that listeners acquire this knowledge by mere exposure, and that similarity among elements helps determine groupings underlying this knowledge [Tillmann & McAdams, 2004].

Experiment
The goal of this experiment is to evaluate how much of these implicit grouping processes we can infer through assessment of a subjective measure by the listener. We implemented an approach analogous to [McAdams et al., 2004] using continuous ratings but tailored for novelty detection. A novelty scale is indeed an open-ended barometer of perception, and many features in the music stream (pitch trajectories, amplitude, timbre, etc.), and even factors outside of it, may influence listeners’ responses. However, in the case of timbrally organized music, we expect that transitions between contrasting timbre regions will have a prominent role when discerning whether something is familiar or novel, resulting in boundaries which may be explained through the Timbrescape representation.

Method
Six (3M/3F) non-musician subjects aged 25-35 volunteered to participate in this experiment. Ligeti’s Concerto for Violoncello and Orchestra (WER 60163-50) was played back using headphones in a semi-quiet setting. This piece was selected because of the importance given to timbre in shaping the structure of the work, instead of the more common compositional traits based on pitch melody and harmony. It has a duration of 12’44”.

Subjects were instructed to move a slider in relation to perceived changes in the music they were about to listen to. One end of the slider was labeled ‘most familiar’, and the opposite end ‘not familiar’ or ‘most novel’. The size of the movement had to be proportional to others in the context of the present audition. A clarification was made that if something initially considered novel remained unchanged for a certain time, it would have to be reconsidered, since by then it had sounded long enough to become familiar. No further instructions were given, except that they were free to interpret these directives with great flexibility, that there was no ‘right’ or ‘wrong’ in executing this task, and that further clarification was possible if needed. Two trials were given to each participant.

Data acquisition was conducted using a custom-built microcontroller-based slider device sending OSC² packets through the serial port to a Pd³-based application. Data was captured at a 10 Hz rate and processed in Matlab to find the time points where the slider went above a certain threshold (i.e. the mean value), and then the coincidences among a minimum number of subjects within a given lag.

² http://www.cnmat.berkeley.edu/OpenSoundControl/
³ http://www-crca.ucsd.edu/~msp/software.html (PureData)

Results
No overall correlation was found among all subjects (see examples by two subjects in Fig. 4); however, a great number of coincidences were found between at least three participants given a lag of three seconds (plus or minus).

Figure 5. Common boundaries for most subjects overlaying a Timbrescape representation from Ligeti’s Concerto for Violoncello. The ovoid shape results from a logarithmic aggregation that preserves the bottom colors longer in the overall picture (see [Sapp, 2001] for details).
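The two processing steps reported in the Method section (threshold crossings per subject, then coincidences across subjects within a lag) might look roughly like the following Python sketch. The original analysis was done in Matlab and its details are not given, so everything here is an assumption made for illustration: the function names `onsets_above_mean` and `common_boundaries` are hypothetical, a boundary is taken to be an upward crossing of the subject's own mean, and coincident times closer together than the lag are merged into the earliest one.

```python
from typing import List, Sequence


def onsets_above_mean(samples: Sequence[float], rate_hz: float = 10.0) -> List[float]:
    """Times (in seconds) at which a slider trace rises above its own mean."""
    mean = sum(samples) / len(samples)
    times = []
    for i in range(1, len(samples)):
        # Upward crossing: previous sample at or below the mean, current above.
        if samples[i - 1] <= mean < samples[i]:
            times.append(i / rate_hz)
    return times


def common_boundaries(
    per_subject: Sequence[Sequence[float]],
    min_subjects: int = 3,
    lag_s: float = 3.0,
) -> List[float]:
    """Boundary times supported by at least min_subjects within +/- lag_s."""
    candidates = sorted(t for times in per_subject for t in times)
    found: List[float] = []
    for t in candidates:
        # Count subjects with at least one boundary close enough to t.
        support = sum(
            1 for times in per_subject
            if any(abs(t - u) <= lag_s for u in times)
        )
        # Keep t only if it is not a near-duplicate of the last kept boundary.
        if support >= min_subjects and (not found or t - found[-1] > lag_s):
            found.append(t)
    return found
```

The defaults mirror the values reported above (a 10 Hz capture rate, at least three participants, a lag of three seconds), so calling `common_boundaries` on one list of onset times per subject yields candidate shared boundaries that could then be overlaid on the Timbrescape picture.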
REFERENCES
Lamont, A. & Dibben, N. (2001) Motivic Structure and the
Perception of Similarity. Music Perception, 18 (3), 245-74