

Timbrescape: a Musical Timbre and Structure Visualization Method using Tristimulus Data

Conference Paper · August 2006





Alma Mater Studiorum University of Bologna, August 22-26 2006

Timbrescape: a Musical Timbre and Structure Visualization Method Using Tristimulus Data
Rodrigo Segnini Sequera (rsegnini@shimojo.jst.go.jp)
Shimojo Implicit Brain Function Project (ERATO-JST), NTT R&D, Atsugi, Japan
Center for Computer Research in Music and Acoustics (CCRMA), Stanford, CA, USA

In: M. Baroni, A. R. Addessi, R. Caterina, M. Costa (2006) Proceedings of the 9th International Conference on Music Perception & Cognition (ICMPC9), Bologna/Italy, August 22-26 2006. ©2006 The Society for Music Perception & Cognition (SMPC) and European Society for the Cognitive Sciences of Music (ESCOM). Copyright of the content of an individual paper is held by the primary (first-named) author of that paper. All rights reserved. No paper from this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval systems, without permission in writing from the paper's primary author. No other part of this proceedings may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval system, without permission in writing from SMPC and ESCOM. ISBN 88-7395-155-4 © 2006 ICMPC.

ABSTRACT
The tristimulus [Pollard & Jansson, 1982] is a timbre descriptor based on a division of the frequency spectrum into three bands. Like its homonymous model for color description, it provides an approximation to a perceptual value through parametric control of a physical measurement. The original method for representing this descriptor circumscribes a point in a triangular space whose corners represent the maximum values of each band; however, this approach is prone to cluttering when dealing with a large number of data points. In this work we introduce Timbrescape, a method for visualizing tristimulus data as well as the structural boundaries that result from timbre information thus represented. This method uses a multi-timescale approach that offers at-a-glance information for arbitrarily sized sections and the whole sound sequence in a single picture; it is based on the Scoregram [Segnini & Sapp, 2006]. We conducted an experiment to verify whether the structural boundaries suggested by Timbrescape are related to those reported by listeners when segmenting a music sequence. Results suggest that Timbrescape reflects well the salient features present in the frequency spectrum as captured by the tristimulus model, and that it may serve to illustrate the implicit strategies used by listeners when segmenting music information.

Keywords
Music Structure, Timbre Analysis, Visualization

INTRODUCTION
Timbre has a myriad of ad-hoc representations. Observations of the sound’s frequency spectrum are regarded as among the most objective. Several methods exist for its visualization; the spectrogram is a well-known example.

One of the main disadvantages of spectrograms and similar methods lies in their inability to represent detailed time-related and frequency-related events simultaneously. This is a consequence of an intrinsic trade-off in the size of the analysis window. A smaller window provides finer detail on time-related events yet only broad frequency information, while a larger window provides only a coarse depiction of time-related events but finer detail on frequency information.

A multi-timescale approach that provides in a single picture the output of a range of analysis window sizes, from an arbitrary minimum to the whole duration of a sound sequence, was reported in [Segnini & Sapp, 2006]. We applied this approach to timbre information in audio obtained through the tristimulus descriptor, and considered an application of this new representation, labeled Timbrescape, in the context of timbre-based musical structure organization.

Discussion about what determines musical structure has focused historically on the organization of pitch into melodic, harmonic, and rhythmic groups. While composers have been aware of the unique aspects that timbre organization brings to their music edifice, and have attempted formalizations in this regard (Schoenberg’s Klangfarbenmelodie comes to mind), it has only been investigated scientifically as an important contributor to musical structure in recent times (see [McAdams, 1999] for an overview).

We describe an experiment where listeners generated a continuous measurement resulting in boundaries marking perceived timbrally contrasting sections. We found a certain coherence between the boundaries generated by listeners and those revealed by Timbrescape, allowing us to propose Timbrescape as a template to depict and model listeners’ strategies when structuring timbrally organized music.
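The window-size trade-off described in the Introduction can be made concrete with a little arithmetic. The sketch below is illustrative only: the 44.1 kHz sample rate, the window sizes, and the helper name `window_resolution` are assumptions, not values taken from the paper. For a window of N samples at sample rate fs, the window spans N/fs seconds, while the spacing between the frequency bins of a discrete Fourier transform is fs/N Hz, so the product of the two is always 1.

```python
def window_resolution(window_size: int, sample_rate: float):
    """Return (time span in seconds, DFT bin spacing in Hz) for one window."""
    time_span_s = window_size / sample_rate
    bin_spacing_hz = sample_rate / window_size
    return time_span_s, bin_spacing_hz


SAMPLE_RATE = 44100.0  # assumed sample rate, not stated in the paper

for size in (256, 1024, 4096, 16384):  # hypothetical window sizes
    span, spacing = window_resolution(size, SAMPLE_RATE)
    # Larger windows: longer time span (coarser timing), finer bin spacing.
    print(f"N={size:6d}  time span={span * 1000:8.2f} ms  bin spacing={spacing:7.2f} Hz")
```

A multi-timescale picture avoids committing to a single row of this table by showing a whole range of window sizes at once.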
TRISTIMULUS
In the visual domain, tristimulus stands for the values of the three primary light wavelengths which in combination represent any perceivable hue (hue is one of the attributes of color, together with saturation and intensity).

The acoustic version, proposed by [Pollard & Jansson, 1982], divides the frequency spectrum of the acoustic signal into three bands, with the first band reflecting the amplitude of the fundamental, the second band the sum of partials 2-4, and the third band the sum of the most significant remaining spectral components. This proportion of energy at the fundamental, mid-frequency, and high-frequency regions likewise attempts to describe any sound’s spectrum.

While each tristimulus point represents a unique percept, other combinations can also lead to the same color[1] perception. This is due to additional attributes of color in both the visual and auditory domains, as well as coarticulation effects and time-variant characteristics.

[1] While timbre is also referred to as sound color, no patent analogy exists between the visual and auditory domains (except in synaesthetes who experience correspondences across multiple modalities).

The tristimulus method was originally meant to convey visually the time-variant dynamics of timbre. It is normally represented by circumscribing a point within a triangle whose corners represent the maximum values of each band. The values of any two bands are sufficient to determine the value of the remaining one by complementarity. Figure 1 shows the evolution of timbre from a flute sound as its dynamics change from pianissimo (pp) to fortissimo (ff).

[Figure 1 image: triangle with corners labeled "Maximum mid-freq. band energy" and "Maximum high-freq. band energy"; trajectory points numbered 1 to 5.]
Figure 1. Tristimulus triangle with trajectory points of a flute sound (adapted from [Ystad & Voinier, 2001]).

This approach is informative for sounds of short duration (e.g. a single instrument’s note), but it is prone to cluttering as the number of data points increases and the points potentially superimpose onto each other (e.g. a whole piece’s rendition), diminishing the benefits of this descriptor to the naked eye.

[Figure 2 image: curves labeled Red, Green, Blue, and Sum.]
Figure 2. Colored tristimulus representation of data in Fig. 1.

TIMBRESCAPE
Timbrescape attempts to provide at once tristimulus information for single and longer sequences of sound events. It assigns each band’s value in an analysis window to a primary hue: Red, Green, and Blue (see Fig. 2), and, since color is additive, incrementally adds the results of contiguous analysis windows. In this way, every new level displays the aggregate results of the previous one, effectively increasing the size of the analysis window while maintaining continuity in the representation. Also, since the hop is kept to one, each new level becomes one window shorter, resulting in a hierarchical picture with the shape of a triangle. At the highest level, the tip of the triangle sums up all colors, reflecting the global tristimulus information, whereas at the bottom, each color provides local tristimulus information.

For example, if there were a single preponderant color across the whole piece, this same color would have an almost non-faded prevalence all the way to the top. On the other hand, many different colors (timbres) at the bottom would aggregate into a grayish indistinct hue. Figure 3 shows the picture resulting from applying this process to Ligeti’s Concerto for Violoncello.

Figure 3. Timbrescape of Ligeti’s Concerto for Violoncello.
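The band split of the TRISTIMULUS section and the level-by-level aggregation of the TIMBRESCAPE section can be sketched together as follows. This is a minimal illustration, not the author's implementation: the function names, the use of plain partial amplitudes as input, the normalization of the three bands so they sum to one, and the division of each aggregate by its window length (so that colors stay in a displayable range) are all assumptions.

```python
from typing import List, Sequence, Tuple

Bands = Tuple[float, float, float]  # (low, mid, high), displayed as (R, G, B)


def tristimulus(partials: Sequence[float]) -> Bands:
    """Split partial amplitudes into three bands that sum to one.

    partials[0] is the fundamental, partials[1:4] are partials 2-4,
    and partials[4:] are the remaining components (assumed layout).
    """
    low = partials[0]
    mid = sum(partials[1:4])
    high = sum(partials[4:])
    total = low + mid + high
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (low / total, mid / total, high / total)


def timbrescape(frames: Sequence[Bands]) -> List[List[Bands]]:
    """Return the triangle of levels, bottom (local) level first.

    Level k holds one color per run of k+1 contiguous frames (hop of one),
    so each level is one cell shorter than the level below it.
    """
    n = len(frames)
    sums = [list(frame) for frame in frames]  # running band sums per start index
    levels = [[tuple(frame) for frame in frames]]
    for size in range(2, n + 1):
        # Extend every window by one frame to the right.
        sums = [
            [s + x for s, x in zip(sums[start], frames[start + size - 1])]
            for start in range(n - size + 1)
        ]
        # Assumption: divide by window length to keep values in [0, 1].
        levels.append([tuple(v / size for v in window) for window in sums])
    return levels


# Hypothetical input: two "red" frames followed by two "blue" frames.
frames = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
levels = timbrescape(frames)
print([len(level) for level in levels])  # [4, 3, 2, 1]
print(levels[-1])                        # [(0.5, 0.0, 0.5)]
```

In this toy input the single cell at the tip has equal red and blue components, which would be displayed as magenta: the global mixture of low-band and high-band energy, while the bottom row keeps the local colors.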


A close look at the colors at the bottom of the picture in Fig. 3 reveals a fluctuation of red, green, and blue hues according to the proportions of energy in each band. If one shifts the observation level to somewhere above the middle of the picture, then approximately four main sections become defined (from left to right, in color): a reddish section at the beginning of about 1/6 of the total length, followed by a bluish one of about 1/6, a reddish one in the middle, and a final bluish one taking most of the second half. At the top of the picture there is a mostly magenta hue reflecting an overall combination of energy in the low and high bands, which is somewhat representative of the whole piece.

The previous inspection led us to the intuition that listeners may also switch among various levels of detail regarding timbre throughout the processing of an incoming music stream. If so, then Timbrescape may provide a landscape on which to represent these strategies when the structuring decisions are based on timbre information.

TIMBRE-BASED MUSIC ORGANIZATION
Listeners subject incoming auditory information to some sort of organization in order to make its understanding possible. Musicians may have an advantage in this process by using prior knowledge and tools related to their trade; however, non-musicians can perform equally well in similar tasks, especially if they involve previously unheard music that is also unrelated to the musician's domain of experience.

Much of this experience-independent ability is based on innate processes for which there is usually a lack of awareness as to what is being done while listening. And even after the experience, it is hard to verbalize the knowledge that was invoked and generated. It is also not clear how this implicit knowledge, for which no deliberate learning strategy was used, is represented, stored, and perused. Arguments suggest either a collection of relatively unprocessed sequences of information, also called ‘whole exemplars’, or an abstract set of rules describing the internal relationship of components. In any case, it is assumed that listeners acquire this knowledge by mere exposure, and that similarity among elements helps determine groupings underlying this knowledge [Tillmann & McAdams, 2004].

Experiment
The goal of this experiment is to evaluate how much of these implicit grouping processes we can infer through assessment of a subjective measure by the listener. We implemented an approach analogous to [McAdams et al, 2004] using continuous ratings but tailored for novelty detection. A novelty scale is indeed an open-ended barometer of perception, and many features in the music stream (pitch trajectories, amplitude, timbre, etc.), and even outside of it, may influence listeners’ responses. However, in the case of timbrally organized music, we expect that transitions between contrasting timbre regions will have a prominent role when discerning whether something is familiar or novel, resulting in boundaries which may be explained through the Timbrescape representation.

Method
Six (3M/3F) non-musician subjects aged 25-35 volunteered to participate in this experiment. Ligeti’s Concerto for Violoncello and Orchestra (WER 60163-50) was played back using headphones in a semi-quiet setting. This piece was selected because of the importance given to timbre in shaping the structure of the work, instead of the more common compositional traits based on pitch, melody, and harmony. It has a duration of 12’44”.

Subjects were instructed to move a slider in relation to perceived changes in the music they were about to listen to. One end of the slider was labeled ‘most familiar’, and the opposite end ‘not familiar’ or ‘most novel’. The size of the movement had to be proportional to others in the context of the present audition. A clarification was made that if something initially considered novel remained unchanged for a certain time, it would have to be reconsidered, since by then it would have sounded long enough to become familiar. No further instructions were given, except that they were free to interpret these directives with great flexibility, that there was no ‘right’ or ‘wrong’ in executing this task, and that further clarification was possible if needed. Two trials were given to each participant.

Data acquisition was conducted using a custom-built microcontroller-based slider device sending OSC[2] packets through the serial port to a Pd[3]-based application. Data was captured at a 10 Hz rate and processed in Matlab to find the time points where the slider went above a certain threshold (i.e. mean value), and then the coincidences among a minimum number of subjects within a certain given lag.

[2] http://www.cnmat.berkeley.edu/OpenSoundControl/
[3] http://www-crca.ucsd.edu/~msp/software.html (PureData)

Results
No overall correlation was found among all subjects (see examples by two subjects in Fig. 4); however, a great number of coincidences were found between at least three participants given a lag of three seconds (plus or minus).
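The thresholding and coincidence search outlined in the Method section might look like the sketch below. It is written in Python rather than the Matlab used in the study, and several details are assumptions: treating a threshold event as an upward crossing of each subject's own mean, the function names, and merging matches that fall within one lag of an already reported boundary.

```python
from typing import Dict, List, Sequence


def onsets_above_mean(samples: Sequence[float], rate_hz: float = 10.0) -> List[float]:
    """Times (in seconds) at which the slider rises above its own mean value."""
    mean = sum(samples) / len(samples)
    times = []
    for i in range(1, len(samples)):
        # Upward crossing: previous sample at or below the mean, current above.
        if samples[i - 1] <= mean < samples[i]:
            times.append(i / rate_hz)
    return times


def coincidences(
    onsets_by_subject: Dict[str, List[float]],
    min_subjects: int = 3,
    lag_s: float = 3.0,
) -> List[float]:
    """Onset times shared by at least `min_subjects` subjects within +/- lag_s."""
    candidates = sorted(t for ts in onsets_by_subject.values() for t in ts)
    boundaries: List[float] = []
    for t in candidates:
        # Count subjects with at least one onset close enough to t.
        agreeing = sum(
            any(abs(t - u) <= lag_s for u in ts)
            for ts in onsets_by_subject.values()
        )
        # Assumption: skip candidates within one lag of the last boundary.
        if agreeing >= min_subjects and (not boundaries or t - boundaries[-1] > lag_s):
            boundaries.append(t)
    return boundaries


# Hypothetical onset times (seconds) for four subjects.
example = {"a": [10.0, 50.0], "b": [11.5, 80.0], "c": [12.0], "d": [49.0]}
print(coincidences(example))  # [10.0]
```

With the study's settings (10 Hz sampling, at least three subjects, a lag of three seconds) the defaults above apply directly; only the onset near 10 s is shared by three subjects in this made-up example.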



For example, the first three minutes of Ligeti’s piece have a single note performed continuously by the cello in such a way that pitch information becomes easily discarded as a source of variance in the music stream, prodding the listener to dig into the timbre subtleties that slowly signal the character of the piece. Accordingly, most subjects left the slider untouched for the first few minutes of the task.

Some subjects reported concern about their ability to develop a consistent criterion for moving the slider. They wondered whether to let their own musical taste intervene in the decision, as in considering novel that which they did not like. We tried to reassure them by saying that they were not going to be asked for the reasons behind their response. At the same time, we welcomed these reports, for they lent support to a concurrent intuition, to be explored in forthcoming work, about familiarity being at the base of music ‘understanding’ and preferences, and novelty being necessary to avoid boredom and generate interest.
We superimposed the coincidental points among at least any three subjects, given a lag of three seconds (plus or minus), onto a Timbrescape representation of Ligeti’s Concerto (see Fig. 5). By inspection, it can be noted that places where listeners reported structural boundaries have coherence with some of those revealed by Timbrescape. This suggests that acoustic features captured by the tristimulus model and highlighted by Timbrescape may indeed participate in the implicit processing underlying the understanding of this music.

Figure 4. Slider movements by two subjects while listening to Ligeti’s Concerto for Violoncello. These highly uncorrelated examples are representative of all subjects. Still, similar above-threshold boundaries (stem marks) appear on particular landmarks, like the steady-state plateau near the middle of the piece.

Figure 5. Common boundaries for most subjects overlaying a Timbrescape representation from Ligeti’s Concerto for Violoncello. The ovoid shape results from a logarithmic aggregation that preserves the bottom colors longer in the overall picture (see [Sapp, 2001] for details).

CONCLUSIONS
An alternative method for visualizing tristimulus data called Timbrescape has been introduced. Boundaries reflecting timbre differences in a music sequence are revealed by this representation, providing results at the micro and macro levels in the same picture. We conducted an experiment to verify whether such boundaries are related to those reported by listeners when segmenting a music sequence. Results suggest that Timbrescape reflects well the salient features present in the frequency spectrum as captured by the tristimulus model, and that it may serve to illustrate strategies used by listeners when segmenting a music stream. Extensions to this line of work include determining whether a different performance or even a different instrumentation using the same music material would elicit similar results. We are also interested in the development of an objective evaluation method of listeners’ implicit segmenting strategies based on a least-cost traverse (e.g. Viterbi-like) of a Timbrescape representation.

ACKNOWLEDGMENTS
The author would like to express sincere appreciation to Craig Sapp, with whom prior collaboration spurred the core methods underlying this work. Much gratitude is also owed to Makio Kashino and Minae Okada for enriching discussions during the preparation of the manuscript.



REFERENCES
Lamont, A. & Dibben, N. (2001). Motivic Structure and the Perception of Similarity. Music Perception, 18 (3), 245-74.

McAdams, S. (1999). Perspectives on the Contribution of Timbre to Musical Structure. Computer Music Journal, 23 (3), 85-102.

McAdams, S., Vines, B., Vieillard, S., Smith, B., Reynolds, R. (2004). Influences of Large-Scale Form on Continuous Ratings in Response to a Contemporary Piece in a Live Concert Setting. Music Perception, 22 (2), 297-350.

Pollard, H. & Jansson, E. (1982). A Tristimulus Method for the Specification of Timbre. Acustica, 51, 162-71.

Sapp, C. (2001). Harmonic Visualizations of Tonal Music. Proceedings of the International Computer Music Conference (ICMC), Havana, Cuba, pp. 423-430.

Segnini, R. & Sapp, C. (2006). Scoregram: Displaying Gross Timbre Information from a Score. In Kronland-Martinet, R., Voinier, T., & Ystad, S. (Eds.), Computer Music Modeling and Retrieval CMMR 2005, Pisa, Italy (pp. 54-59). Berlin Heidelberg: Springer-Verlag.

Tillmann, B. & McAdams, S. (2004). Implicit Learning of Musical Timbre Sequences: Statistical Regularities Confronted With Acoustical Dissimilarities. Journal of Experimental Psychology: Learning, Memory and Cognition, 30 (5), 1131-42.

Ystad, S. & Voinier, T. (2001). A Virtually Real Flute. Computer Music Journal, 25 (2), 13-24.
