To cite this article: Horacio Vaggione (1994) Timbre as syntax: A spectral modeling approach, Contemporary Music Review,
10:2, 73-83, DOI: 10.1080/07494469400640311
Contemporary Music Review, © 1994 Harwood Academic Publishers GmbH
1994, Vol. 10, Part 2, pp. 73-83 Printed in Malaysia
Reprints available directly from the publisher
Photocopying permitted by license only
This paper proposes some general comments concerning a syntactic approach to the domain of
micro-time, that is, to the realm of what we call "timbre". I will attempt to do this by referring to some
steps toward Spectral Modeling. However, this is not a historical essay, since these problems and
perspectives are related to the present state of the field.
Introduction

The concept of syntax, as it is used here, refers to the modalities of any temporal articulation. It can
be seen as a generic categorization of hierarchies dealing with nested chains of events which appear
as composed time. In this perspective, the usual link between syntax and common language constitutes
a special case, which cannot be generalized to all situations involving an articulation of temporal
relationships. To situate the concept in this context I can recall the aims of the science of Syntactics,
which emerged in the field of formal logic at the beginning of the century (defined by Cournot as a
théorie des enchaînements, a theory of linkages), as well as its prolongations in the field of early Artificial
Intelligence, especially in relation to pattern analysis and recognition strategies (Fu, 1974).
phenomena as fusion and streaming. In this light, we can also regard Schönberg's
ideas about timbre (in German Klangfarben, the "color" of sound) as the first
conscious statement concerning a prolongation of harmony into spectrality. The
term "Klangfarben" does not refer to the cause of sound (as the French word
"timbre" suggests: a postage stamp, a declaration of origin). One of the main
problems that made the concept of timbre so "fuzzy" and controversial - to the
point that sometimes nobody knows what it is all about - comes from this
difference of etymology, which corresponds in fact to two different stages of
development of the idea.
In the 1950s, the serialization of timbres was first understood as a permutation
of instrumental sounds; a row of timbres: clarinet, piano, oboe, trumpet, etc. As
long as the number of objects in the row of timbres did not coincide with the
number of objects in the row of pitches, each rotation of the
timbre row resulted in a different combination with pitch elements, as well as with
other parameters. While almost never realized in this rigid way, this mechanism
constituted the basic model of the serial approach. In any case, even in the
pre-serial period, we can find in the so-called timbral manipulations of Webern
this meaning of timbre, corresponding strictly to the instrumental source of a
sound.
Schönberg was the creator of serial combinatoriality, but he was indeed aware
of another signification of the phenomenon of timbre, related to the structure and
not to the origin of a sound. It is not by chance that the famous passage about pitch
as a component of timbre comes at the end of his Treatise on Harmony (1911). But
the postulated expansion from harmony to timbre needed pertinent tools allowing
a descent in the time scale to the level of micro-time, in order to develop a
micro-composition to be linked to the surface of the musical discourse - the level
of the "note" - encompassed by an articulated view of the ensemble of relations
involved. Unfortunately, Schönberg had no chance to pursue this idea,
lacking the necessary technological means - precisely the means that we have
today.
Indeed, we cannot affirm that timbre as structure - as musical structure - has
always been a matter of concern in the field of electroacoustic music. The tape
recorder, like the digital sound editor, provides time-domain manipulations -
cut and paste, whether real or virtual - which are very useful in articulating a
musical discourse, but do not touch the frequency domain. As for variable tape
speed and its successive improvements, from the analogue phonogène constructed
by Pierre Schaeffer in 1950 to the digital sampler of today, they provide only very
rudimentary means for changing the tonal pitch, and lack the power to control
at the same time the coordinates that define these changes at a deeper level - that
of the particular spectral structure of the sound to be transposed.
It is difficult to consider sampling as a sound synthesis technique as long as this
procedure is concerned only with macro-time manipulations, that is, as long as it
is not completed with tools to manipulate the recorded sounds in the frequency
domain, and, furthermore, to perform the pertinent correlations between fre-
quency and time domain. If we want to incorporate timbre structure into the act
of composition, we should look for means to compose also at the level of
micro-time. In order to do this, as Julius Smith recently pointed out (Smith, 1991),
sampling needs to be integrated into a spectral modeling paradigm.
One of the first approaches to dynamic spectral modeling was developed by Risset
in his work on the analysis/synthesis of trumpet tones by means of additive
clustering of partials whose temporal behavior was represented by piece-wise
linear segments - that is, articulated amplitude envelopes. Given the complexity of
the temporal features embedded in natural sounds, a reproduction of all these
features was an impossible task; hence Risset applied a data-reduction procedure
- what he called analysis by synthesis, reversing the normal order of these terms
(Risset, 1966, 1969; Risset and Wessel 1982). Beyond its success in imitating
existing sounds, the historical importance of the Risset model resides in the
"in general offer a unique graphic perspective for music acoustic research
(...) We can study how the harmonic envelope changes during the course of
a single sustained tone. We can also view how the spectrum changes between
played notes. We can also watch the effect of articulation, such as
legato-tonguing, on the resulting spectrum" (Piszczalski, 1979, p. 18)
"to find a paradigm for controlling the threshold (for a given wave-form) so
that the projected perceptually salient features (e.g. blips in the attack) can be
retained or removed without concomitant, significant changes in the rest of
the wave-form. If the threshold is large, then the analysis delivers the overall
shape; with a small threshold, supposedly necessary details are retained
along with presumably superfluous ones. In general, our experience has led
to the conclusion that no single algorithm of this type will ultimately be
sufficient for the systematic exploration of timbre and data reduction"
(Strawn, 1980/1989, p. 681).
Here I would like to recall the statements made by Norbert Wiener in a lecture
given at Göttingen University in 1925, and reproduced in 1930 in a paper entitled
precisely "Spatio-Temporal Continuity, Quantum Theory and Music" (Wiener,
1964). Wiener claims in his paper that Heisenberg was influenced in the solution
of quantum complementarity by his remarks about harmonic analysis, notably his
assumption that
From this Wiener inferred a basic fact, encountered later in sound synthesis:
Dennis Gabor (1946, 1947) was perhaps the first to propose a method of sound
analysis derived from quantum physics. Assuming the uncertainty problem stated
by Wiener, he proposed to merge the two classic representations (the time-varying
waveform and the static frequency-based Fourier transform) into a single one, by
means of concatenated short windows or "grains". Meanwhile, the engineering
community developed a more effective Fourier analysis, removing its static nature
by taking many snapshots of a signal during its evolution. This technique became
known as the "short-time Fourier transform". However, the Gabor transform still
remains conceptually innovative by the fact of conceiving a two-dimensional space
of description. This original paradigm has been taken as a starting point for
developing granular synthesis (Roads, 1978) and later the wavelet transform
(Grossmann and Morlet, 1984). While the first granular synthesis technique used
a stochastic approach, and hence did not touch the problem of frequency-time
local analysis and control, the wavelet transform addressed a straightforward
analytical orientation. The main difference between the wavelet transform and the
original Gabor transform resides in the fact that in the Gabor transform changes
in frequency are analyzed with a grain of unvarying size, whereas in the wavelet
transform the grain's size increases and decreases following these changes (Arfib,
1990).
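The fixed-size analysis grain that distinguishes the Gabor/STFT picture from the wavelet picture can be made concrete in a few lines. The following is only an illustrative short-time Fourier transform sketch - the window length, hop size, and test signal are arbitrary choices, not values taken from any of the systems cited above:

```python
# A minimal short-time Fourier transform: slide a fixed-size windowed
# "grain" along the signal and take one spectrum per window position.
import numpy as np

def stft(signal, win_len=256, hop=128):
    """Return a 2-D time-frequency array: one magnitude spectrum per frame."""
    window = np.hanning(win_len)  # the analysis grain, of unvarying size
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (n_frames, win_len // 2 + 1)

sr = 8000
t = np.arange(sr) / sr
sweep = np.sin(2 * np.pi * (200 + 400 * t) * t)  # a rising glissando
tf_plane = stft(sweep)  # the peak bin migrates upward from frame to frame
```

A wavelet analysis of the same sweep would instead shrink the grain as the frequency rises; here every frame shares the same time-frequency trade-off.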
"it should be reasonable to require that the same primitives be used in the
analysis of both amplitude and frequency functions, even if it turned out that
analysis of the two at higher levels followed different rules" (Strawn, 1980,
p. 682)
These primitives constitute small line segments, each having a duration and a
slope that determines the direction of the line segment: up, down, or
horizontal.
The analysis is then articulated into three hierarchical levels, depending on the
threshold used to capture the line-segment structure: (a) micro-time elements -
very small features such as "blips" in the attack, or noise appearing in some small
portions of the wave-form; (b) longer line segments modeling the wave-form,
capturing essential features emerging as additions of smaller line segments - thus
retaining the singularities (non-regular events) belonging to the first level; (c) still
longer line segments, described as "part of a note", where the salient features
of the other levels are still present.
The method described by Strawn is roughly bottom-up. A refinement of the
technique might be to develop a top-down parse, using the results of the analysis
to "guide" a search for features to be included in the final set of break-points, one
direction confirming the findings of the other.
For those familiar with object-oriented programming, this process can be viewed
as a kind of encapsulation of features into an object. I have myself developed this
approach (Vaggione, 1991), which expands the syntactic analysis method into a
wider frame than pattern recognition.
The line of reasoning for this expansion relies on the assumption that if we can
encapsulate into an object some micro- and macro-time syntactic descriptions of
sound processes, we have gained a significant handle to deal with their
interactions.
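A hypothetical illustration of this encapsulation (the class and field names are invented for the example; they do not describe Vaggione's actual system): a sound object can carry both a macro-time description (the "note") and a micro-time description (envelope break-points), so that an operation on one scale is obliged to touch the other.

```python
from dataclasses import dataclass, field

@dataclass
class SoundObject:
    # Macro-time description: the event as a "note"
    onset: float
    duration: float
    # Micro-time description: per-partial envelope break-points
    # as (time, amplitude) pairs - a hypothetical representation
    envelopes: dict = field(default_factory=dict)

    def transpose(self, ratio: float):
        """A pitch change is also a micro-time operation: here the
        envelope time axis contracts with the duration, keeping the
        two descriptions correlated."""
        self.duration /= ratio
        for partial, points in self.envelopes.items():
            self.envelopes[partial] = [(t / ratio, a) for t, a in points]

note = SoundObject(onset=0.0, duration=1.0,
                   envelopes={1: [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]})
note.transpose(2.0)  # an octave up: the micro-time axis contracts with it
```

The point of the sketch is only that, once both descriptions live inside one object, their interaction becomes an explicit, composable method rather than a side effect.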
Another recent technique to confront explicitly the basic acoustic dualism was
developed by Xavier Serra and Julius Smith (1990), who propose a spectral modeling
synthesis approach based on a combination of a deterministic and a stochastic
decomposition: the deterministic part represents the Fourier-like
components (harmonic and inharmonic) as separate sinusoids evolving in
time, and the stochastic part provides what cannot be analyzed within the
Fourier paradigm - namely, the noise elements present in the attack portion, but
also throughout the production of a sound (think of the noise produced by a bow,
or by breath, etc.).
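The deterministic-plus-stochastic idea can be caricatured in a few lines. This is a drastically simplified sketch, not Serra and Smith's actual SMS system (which tracks time-varying partials frame by frame): here the strongest spectral peaks of a whole tone are resynthesized as steady sinusoids, and whatever remains is treated as the noise part.

```python
import numpy as np

def decompose(signal, sr, n_partials=5):
    """Split a signal into a crude deterministic (sinusoidal) part and
    a stochastic residual."""
    n = len(signal)
    spectrum = np.fft.rfft(signal * np.hanning(n))
    mags = np.abs(spectrum)
    peaks = np.argsort(mags)[-n_partials:]  # strongest bins
    t = np.arange(n) / sr
    deterministic = np.zeros(n)
    for k in peaks:
        freq = k * sr / n
        amp = 2 * mags[k] / np.sum(np.hanning(n))  # undo window scaling
        phase = np.angle(spectrum[k])
        deterministic += amp * np.cos(2 * np.pi * freq * t + phase)
    stochastic = signal - deterministic  # the residual models the noise part
    return deterministic, stochastic

sr = 8000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(sr)
det, sto = decompose(tone, sr, n_partials=1)
# det recovers the 440 Hz partial; sto is left holding (mostly) the noise
```

Even this caricature shows why the decomposition matters compositionally: the two parts can be transformed independently before being recombined.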
The mention of these latter elements leads us to recall the existence of another
approach to sound analysis and synthesis, which cannot be characterized
"While a measure is cyclic, in that after the music has moved through beats
1, 2, 3, and 4 (for example), it goes back to (another) beat 1 (...) the rhythmic
groups are not usually cyclic, because they vary considerably and because
they are comprised of music, not just beats. It is because meter is cyclic that
it is more resistant to change than rhythm. Rhythm is the force of motion,
while meter is the resistance to that motion" (Kramer, 1988, p. 99).
can be said of the relations between time and frequency domains. Faced with this
situation, it is evident that the solution cannot consist in closing the borders, but
in considering all the possibilities of interaction, and hence of articulation -
composed articulation - between these different scales. Timbre as a syntax, as a
field for syntactic articulation, is to be related to the other time-scales not by
imposing a syntax common to all (a utopia unfortunately present in the
Stockhausen article "How Time Passes" (Stockhausen, 1957), otherwise so full of
insights about the dualism frequency/time), but by articulating all relationships into
interactive, contextual fields. The conditions of extensibility of these contextual
fields to all possible time-scales are themselves a subject of syntactic analysis.
In any case, the structuring of the micro-time of timbre is not without
consequences for the macro-time, and vice versa. The very absence of a
continuum between the different scales of time gives us the opportunity
to implement powerful means of sound transformation, if only we take care of
knowing what we are doing. Surface changes in tonal pitch cause immediate
changes in spectral content, even beyond a simple rearrangement of the partials:
a non-linear correlation of dimensions is at work here. And it is not an
exaggeration to say that each operation concerning a simple change in tonal pitch
is to be conceived as a case of correlation where a compositional strategy directs
its formal coherence.
There is certainly a correlation between a given degree of spectral density (what
is often called "brightness" or spectral energy) and the spiral of the tonal pitch
space as illustrated by Shepard (1976) and Risset (1978). The mapping of this
timbre space, initiated by the work of Wessel (1979), can be developed and tested
in different situations handling different coordinates. But this model needs to be
syntactically refined in order to embrace timbre as a time-frequency domain
structure - and to incorporate the many non-linearities arising from the interaction of
"a note and its timing", as Wiener would say.
Moreover, consider simply the limit of density that MIDI (which is a macro-time
protocol) can handle: if we overstep this limit, at the threshold which causes the
MIDI interface to break down, we will start to hear side-bands, that is, the raising
of a spectrum. MIDI ends here, at the threshold of micro-time. On the other hand,
granular synthesis is not limited by this low-density problem. That is why this
technique deals pertinently with micro-time operations. However, there is a
twofold deficiency in simple granular synthesis, namely (a) its non-analytical
orientation, and (b) its lack of bridges in the direction of the macro-time domain.
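A minimal stochastic granular texture, in the spirit of the early technique cited above (Roads, 1978), can be sketched as follows; all parameter values are illustrative, and the non-analytical character the text criticizes is visible in the code, since nothing here is derived from an analysis of the source:

```python
import numpy as np

def granulate(source, out_len, grain_len=256, n_grains=400, seed=0):
    """Scatter short enveloped grains of a source waveform at random
    positions in the output - a purely stochastic micro-time operation."""
    rng = np.random.default_rng(seed)
    out = np.zeros(out_len)
    window = np.hanning(grain_len)  # grain envelope, avoids clicks
    for _ in range(n_grains):
        src = rng.integers(0, len(source) - grain_len)
        dst = rng.integers(0, out_len - grain_len)
        out[dst:dst + grain_len] += window * source[src:src + grain_len]
    return out

sr = 8000
t = np.arange(sr) / sr
source = np.sin(2 * np.pi * 220 * t)
texture = granulate(source, out_len=2 * sr)  # a two-second granular cloud
```

The grain positions are drawn blindly, with no frequency-time analysis and no handle reaching up to the macro-time level - precisely the twofold deficiency noted above.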
The problem here is inverse to that of MIDI. Of course, we can always recuperate
a granular texture by making it function as a wavetable (of whatever length),
"At that time (1925) I had clearly in mind the possibility that the laws of
physics are like musical notation, things that are real and important provided
that we do not take them too seriously and push the time scale down beyond
a certain level. In other words, I mean to emphasize that, just as in quantum
theory, there is in music a difference of behavior between those things
belonging to very small intervals of time (or space) and what we accept on
the normal scale of every day, and that the infinite divisibility of the universe
is a concept which modern physics cannot any longer accept without serious
qualification" (Wiener, 1964, p. 545)
References
Arfib, D. (1990) In the intimacy of a sound. In Proceedings of the 1990 International Computer Music
Conference, Glasgow. San Francisco: ICMA.
Cadoz, C. et al. (1984) Responsive Input Devices and Sound Synthesis by Simulation of Instrumental
Mechanisms. Computer Music Journal, 8 (3): 60-73. Reprinted in C. Roads (Ed.): The Music Machine
(1989), Cambridge: MIT Press.
Dolson, M. (1986) The phase vocoder: a tutorial. In: Computer Music Journal 10 (4): 14-27.
Eimert, H. and Stockhausen, K. (Eds). (1955) Elektronische Musik. Die Reihe 1. Vienna: Universal Edition.
Fu, K. S. (1974) Syntactic Methods in Pattern Recognition. New York: Academic Press.
Gabor, D. (1946) Theory of Communication. Journal of the Institution of Electrical Engineers, 93: 429-457.
Gabor, D. (1947) Acoustical Quanta and the Theory of Hearing. Nature 159: 303.
Grey, J. (1975) An Exploration of Musical Timbre. Ph.D. diss., Stanford University.
Grey, J. and Moorer, J. A. (1977) Lexicon of Analyzed Tones. Computer Music Journal, 2 (2): 24-27.
Grossmann, A. and Morlet, J. (1984) Decomposition of Hardy Functions into Square Integrable Wavelets.
SIAM Journal of Mathematical Analysis 15: 723-736.
Hiller, L. and Ruiz, P. (1971) Synthesizing Sounds by Solving the Wave Equation for Vibrating Objects.
Journal of the Audio Engineering Society, 19: 463-470.
Kramer, J. (1988) The Time of Music. New York: Schirmer.
Kronland-Martinet, R. (1988) The Wavelet Transform for Analysis, Synthesis, and Processing of Speech
and Musical Sounds. Computer Music Journal, 12 (4): 11-20.
Laske, O. (1991) Toward an Epistemology of Composition. In O. Laske (Ed.): Composition Theory.
Interface 20 (3-4): 235-269.
McAdams, S. (1984a) The Auditory Image: A Metaphor for Musical and Psychological Research on
Auditory Organization. In W.R. Crozier and A.J. Chapman (Eds.): Cognitive Processes in the
Perception of Art. Amsterdam: North Holland.
McAdams, S. (1984b) Spectral Fusion, Spectral Parsing and the Formation of Auditory Images. Ph.D. diss.,
Stanford University.
McAdams, S. (1989) Contraintes psychologiques sur les dimensions porteuses de forme. In
S. McAdams and I. Deliège (Eds.): La musique et les sciences cognitives. Liège: Pierre Mardaga.
Moorer, J. A. (1978) The Use of the Phase Vocoder in Computer Music Applications. Journal of the
Audio Engineering Society, 26 (1/2): 42-45.
Morrison, J. and Adrien, J-M. (1993) MOSAIC: A Framework for Modal Synthesis. Computer Music
Journal, 17 (1): 45-56.
Pavlidis, T. and Horowitz, S. (1974) Segmentation of Plane Curves. IEEE Transactions on Computers
C-23 (8): 860-870.
Piszczalski, M. (1979) Spectral Surfaces from Performed Music. Computer Music Journal 3 (1): 18-24.
Roads, C. (1978) Automated Granular Synthesis of Sound. Computer Music Journal 2 (2): 61-62.
Risset, J. C. (1966) Computer Study of Trumpet Tones. Murray Hill, Bell Laboratories.
Risset, J.C. (1969) An Introductory Catalog of Computer Synthesized Sounds. Murray Hill: Bell
Laboratories.
Risset, J. C. (1978) Paradoxes de hauteur. Paris, Rapport IRCAM 10.
Risset, J. C. and Wessel, D. (1982) Exploration of Timbre by Analysis and Synthesis. In D. Deutsch (Ed.):
The Psychology of Music. New York: Academic Press.
Schönberg, A. (1911) Harmonielehre. Vienna: Universal Edition.
Stockhausen, K. (1957) Wie die Zeit vergeht . . . Die Reihe 3, Vienna: Universal Edition.
Serra, X. and Smith, J. O. (1990) Spectral Modeling Synthesis: A Sound Analysis/Synthesis System
Based on a Deterministic plus Stochastic Decomposition. Computer Music Journal, 14 (4): 12-24.
Smith, J.O. (1991) Viewpoints on the History of Digital Synthesis. In Proceedings of the 1991
International Computer Music Conference. pp. 1-10 San Francisco: ICMA.
Strawn, J. (1980) Approximation and Syntactic Analysis of Amplitude and Frequency Function for
Digital Sound Synthesis. Computer Music Journal, 4 (3): 3-24. Reprinted in C. Roads (Ed.): The Music
Machine (1989), Cambridge: MIT Press.
Truax, B. (1988) Real-Time Granular Synthesis with a Digital Signal Processor. Computer Music Journal,
12 (2): 14-26.
Vaggione, H. (1984) The Making of Octuor. Computer Music Journal Vol. 8 (2): 48-54. Reprinted in
C. Roads (Ed.): The Music Machine (1989), Cambridge: MIT Press.
Vaggione, H. (1991) On Object-based Composition. In O. Laske (Ed.): Composition Theory. Interface 20
(3-4): 209-216.
Vaggione, H. (1993) Determinism and the False Collective: About Models of Time in Early
Computer-Aided Composition. In J. Kramer (Ed.): Time in Contemporary Music Thought. Contempo-
rary Music Review 7 (2): 91-104.
Wessel, D. (1979) Timbre Space as a Musical Control Structure. Computer Music Journal, 3 (2): 45-52.
Wiener, N. (1964) Spatio-Temporal Continuity, Quantum Theory and Music. In M. Capek (Ed.) 1975:
The Concepts of Space and Time. Boston: Reidel.