The Composition of "Vox-5"

Author(s): Trevor Wishart


Source: Computer Music Journal, Vol. 12, No. 4 (Winter, 1988), pp. 21-27
Published by: The MIT Press
Stable URL: http://www.jstor.org/stable/3680150

Trevor Wishart
Realspace Musics
83 Heslington Rd.
York Y01 5AX
United Kingdom

Introduction

Vox-5 is the fifth in a series of six pieces exploring the musical possibilities of the human voice. All the other pieces in the series are for live performance by four amplified vocalists, and most of them also use sound-on-tape as the accompanying medium and, simultaneously, as the environment in which the vocalization takes place. Each piece in the series is concerned with different ideas of musical structure, different performance techniques, and different aspects of human experience. Prior to the composition of this series I had developed a thorough system of notation for vocal sounds together with a systematic classification of vocal techniques. Vox-5 forms a bridging recitative into the finale of this series (Vox-6), and is also a "poetic summary" of the preceding material.

The primary aural focus of Vox-5 is a (super)human voice that metamorphoses into many recognizable sonic images, such as the sounds of crowds, bees, a horse, or bells. This voice also employs various extended vocal techniques (such as ingressive complex sounds, subharmonics, ululation, and special consonant production). These are further extended by spectral manipulation of the material by the computer. In all the spectral transformations one aesthetic aim has been to retain the "source credibility" of the resulting sounds; that is, they must always be believably vocal or naturalistic, even if startlingly unexpected in their spectral development. I have been working with ideas of sound transformation since about 1970, but, not having had access to computing facilities, I developed my previous techniques in a classical tape studio, using only editing, mixing, and speed changing. The opportunity to realize these processes in a much more precise manner was one of the principal reasons for coming to the Institut de Recherche et Coordination Acoustique/Musique (IRCAM) in Paris to do a project.

This article refers to a set of sound examples. These examples will appear on a soundsheet published in the next issue, Volume 13, Number 1 (Spring 1989).

Computer Music Journal, Vol. 12, No. 4, Winter 1988, © 1988 Massachusetts Institute of Technology.

Compositional Techniques

The two major techniques used in the composition of Vox-5 are spectral manipulation and spectral interpolation. In both cases, the source sound or sounds are first analyzed using the phase vocoder program (Dolson 1986; Gordon and Strawn 1985).

The phase vocoder (PV) effectively divides the source sound into a number of channels that are equally spaced through the range of frequencies. In each channel it stores the amplitude of the signal component and its frequency (usually known as "phase" in this context). It is now possible to manipulate these analysis data. For example, to shift a spectral component in channel n up one octave, we need to move the amplitude and phase in this channel into channel 2n (which is centered around twice the frequency) and also multiply the phase by two. (Phase corresponds to frequency, so an octave shift means multiplying frequency by two.)

It is possible to devise programs that will systematically transform the complete spectrum, generating a second analysis-data file, which can then be resynthesized using the phase vocoder synthesis option. These transformations can be time-dependent (the spectral shifting can follow an envelope, or read a value-time function to ascertain its value at any particular time). The programs I wrote to do this are now available on the system at IRCAM and as part of the Composers Desktop Project software (Atkins et al. 1987; for more information, contact the Composers Desktop Project, 11 Kilburn Rd., York Y01 4DF, U.K.). (I would like to acknowledge the advice and assistance of Xavier Rodet, Yves Potard, Erik Viara, Dan Timis, Miller Puckette, and
others in helping to clarify aspects of the phase vocoder, ironing out my C programming technique, and making helpful suggestions about directions worth exploring!)

Spectral Shifting

Shifting and Stretching the Spectrum

In Vox-5, the spectral shifting techniques were confined to global expansion or contraction of the spectral data. It is useful to be able to stretch just a part of the spectrum, so the program called specsh divides the spectrum into two parts about a particular PV channel (referred to in the programs as fdcno, or frequency-dividing channel number) and then shifts the upper or lower portion (or both). In this way, for example, subharmonics and other kinds of (apparent) double-production can be introduced into the voice, and (through the use of the time-varying options) can be made to move independently of one another. At the beginning of Sound Example 1 (see the soundsheet in the next issue, Volume 13, Number 1, Spring 1989), the source sound is a voice producing a (real) subharmonic at one octave and a fifth below the fundamental. The specsh program has been used to extend this subharmonic down another octave and then cause it to slide upward toward its original register in the course of the sound.

There are two fundamentally different ways in which the spectral data can be expanded or compressed. In the first case, the harmonic relationships between (moved) components are preserved (this I refer to as shifting of the spectrum). In the other case, all the (moved) components move in the same direction, but the amount they move depends on their positions in the spectrum (this I refer to as stretching of the spectrum). As a rule of thumb for preserving vocal "credibility" in the resulting sound, shifting is best applied to clearly pitched sources, and stretching to noisy sources, but either technique can be applied to either type of source. In practice, experimentation on the particular sound source is essential. The program specsh proved to be particularly useful for a subtle type of perceptual masking discussed later under "Spectral Interpolation."

Preserving Formants

Moving spectral components of the voice a long way distorts the original spectral envelope, or formant structure, of the source. In some cases this is undesirable, so the program specf makes an approximate calculation of the spectral envelope of the source, which is retained on shifting the spectrum.

Different Kinds of Stretching

The stretching of the spectrum in specsh is linear; that is, for upward stretching above fdcno, the stretch at fdcno is zero. For intermediate spectral components, the shift is scaled by a factor (less than or equal to one) before being applied to any specific channel. The value of this factor varies (linearly) from zero to one as we progress from channel fdcno to the topmost channel (Fig. 1).

One transformation I wanted to experiment with was the stretching of a vocal source to produce an inharmonic spectrum like a bell sound (as this is one of the few sounds the voice cannot imitate). The results with linear stretching were disappointing, so I wrote a modified version of the stretch part of specsh, called spece, which allowed me to vary the type of interpolation used in the stretching process. For example, if the multiplying factor (which never exceeds one) is multiplied by itself before being applied to the stretch, we produce a stretching curve that is concave. In this case, the spectral components are shifted less at the bottom of the spectrum and more at the top (Fig. 2). Conversely, by taking the square root of the multiplying factor, we produce a convex curve, and the spectrum is shifted more at the bottom than at the top of the spectrum (Fig. 3).

The spece program permits the multiplying factor to be raised to any power, and that power factor can be specified by the composer. In Sound Example 2 the original vocal source has been stretched using a power factor of 0.4. Through the sequence, the maximum shift moves from 1.05 to 6.0.

After this spectral shifting I also applied time-stretching to the sources, taking care not to time-stretch the attack portion of the sounds. The process of spectral stretching is complemented by
the increased duration of the resulting sounds (enhancing the perception of a transition toward "bell-like"). The results are heard in Sound Example 3. The time-stretching process introduces distortions (appearing as subharmonics or amplitude-modulation-type effects), which I decided to retain as part of the final mixed sounds. In the final composition of the piece, several such sequences were synchronized, component by component, to produce the sequence heard in Sound Example 3. Here the ability to align exactly the perceived vocal attacks (which differ by between 0.06 and 0.1 sec from the onset of the sound) is crucial to the coalescence of the sound image as "bell," while remaining (at the start of the sequence) "voices."

Fig. 1. Linear spectral shifting from fdcno to the top of the spectrum.
Fig. 2. Concave spectral shift.
Fig. 3. Convex spectral shift.
(Each figure plots spectral position, from fdcno to the top of the spectrum, against the shift multiplier, from 0 to 1; component-shift = input-shift-value × shift-multiplier.)

Spectral Envelopes

All the previously mentioned programs allow one to define an envelope on the evolution of the spectral shifting. This is similar to the traditional attack-decay-sustain-release (ADSR) envelope on analog synthesizers, but is defined as attack, divergence, on, convergence, end. During the attack and end portions there is no spectral change, and during the on portion the spectral change is as programmed. In the other two portions the spectral change is interpolated between the original and the changed versions.

The program specsh goes one step further, in that it allows the parameters of the spectral shifting (stretching) to be input as a time-varying function. This program works on specifically defined bands of the spectrum.

Spectral Interpolation

The other major technical procedure used in the composition of Vox-5 is that of interpolating, through time, between two different spectra to generate a seamless transformation between the two original sounds. Mathematically and computationally speaking, the process of interpolation is quite straightforward. All we need to decide is when, for how long, and in what manner we interpolate between the phase and amplitude values in the two analysis files. Any kind of interpolation can be mathematically described. The problem is primarily that of finding musically and acoustically satisfactory solutions. There are four aspects to this problem.

1. Matching of Sources

First, it goes without saying that the two source sounds should have as many similarities as possible. In the now well-known examples of interpolation between pitched and instrumental sounds made at Stanford University in the 1970s, the problem was simplified by dealing with harmonic spectra of stable tessitura. With more complex sounds, a high level of acoustic/musical judgment is required to match the sources sufficiently before embarking on what can be a long interpolation computation. In this task, the spectral and morphological malleability of the voice as a sound producer is vital. The voice can be made to match almost any other sound, apart from bell-like stable inharmonic spectra. Knowledge of extended vocal techniques is a great asset here.

2. Preprocessing of Sources

I brought with me a simple technique from the analog studio. A complex source can be reduced in its spectral detail by a process of spectral thickening, which makes it much easier to match to another source. Thus many of the analog sources I brought with me to IRCAM had been wedged or diswedged as follows. To make a diswedge, the sound is recorded backward. It is then mixed with two copies of itself, not synchronized, with the copies diverging slightly in tape speed from the first as the mix proceeds. The two copies fade in imperceptibly during the mix, thus progressively thickening the mix. This process is then applied to the resulting mix (taking care to avoid echo effects between prominent features in the source), producing a nine-strand wedge. Three further repetitions generate a 243-strand wedge. This is then played backward, and the original sound emerges out of the 243-times-thickened spectrum. In this way the sound of a crowd can be more easily matched with a plosive unvoiced k.

3. Specifying the Interpolation

The program vocinte allows one to specify a start and end time for the interpolation of amplitude, independent of the interpolation of phase, and vice versa. The two interpolations could also be made to proceed either linearly or by specifying a power factor (see the section "Different Kinds of Stretching") for amplitude, and another for phase. A great deal of experimentation was required with each pair of sources to determine the values for start and end times and power factor that gave the best result. Other rules of thumb included the following:

- Amplitude and phase should be interpolated over the same time interval.
- Amplitude interpolation should have power factor 0.8 (rapid at start, slowing at end).
- Phase interpolation should have power factor 2 (slow at start, and rapid at end).

But these rules broke down in the case of Sound Example 1 (an interpolation between a vocal voiced zzz and the sound of bees). At the start, the curves for amplitude and phase interpolation are exactly the opposite of the rule (!), while phase interpolation takes place in a much shorter time than amplitude interpolation (Table 1). [Note again: Sound examples are on a soundsheet included with issue 13(1), Spring 1989.]

Table 1. Sound Example 1 Interpolation Parameters

Interpolation   Start Time   End Time   Power Factor
Amplitude       0.2 sec      7.8 sec    2.0
Phase           0.2 sec      3.0 sec    0.8

4. Perceptual Boundaries

The fourth and potentially most difficult area for sound interpolation is the problem of categoric recognition. No matter how smooth an interpolation may be acoustically or mathematically, we are perceptually prone to make sudden leaps in our perception of unfamiliar objects. As suggested by catastrophe theory, when perceiving a continuously changing object that we initially recognize, we continue to interpret it as the original object until a certain threshold is reached. At this point there is likely to be a sudden switch to the recognition of a new object. To minimize this perceptual switching, I reprocessed the interpolated sound, making part of the spectrum (usually the upper part) move independently through the moment of maximum ambiguity. In this way the tendency to make a perceptual switch is distracted by our following the other movement in the spectrum (which is not directly related to the spectral interpolation that is going on), and we are more likely to perceive the transition as seamless. This process can be heard at the beginning and end of Sound Example 1.

In Sound Example 4, on a first listening we hear the sound of a human voice becoming a horse neigh. However, on a second hearing (in isolation), because I knew it would become a horse, I heard it all as a horse! This difficulty can be overcome within the structure of the piece, however, by the choice of an appropriate context. As (almost) every other sound in the piece begins as a vocal source, the context is established for the listener to interpret the start of this sound as vocal (which it is!), so the perceptual ambiguity does not arise.

Making transitions back to the voice proved to be generally more problematic than divergences from the voice. This often required that the sound being interpolated from persist and, as it faded away, follow exactly the same amplitude contour and motion in space (see the next section) as the target sound, to make a convincing convergence. Sound Example 6 gives an approximate indication of the process involved (the true version has sounds converging from a quadraphonic space).

The programs described here can never automatically give valuable results. They need to be fine-tuned to the particular source sound and the sonic result required. Musical decision-making is still paramount, and a most important aspect of this work has been that composition can now approach a sculptural process in which the composer fashions the sounds just as a sculptor might work with physical materials. Each sonic object is unique, and it requires a special sort of acoustically informed, open-minded approach to mold it to the desired end. The mental attitude is quite distant from the predetermining logical thoroughness that we have been taught to bring to the writing of an instrumental score, and adds an exciting new dimension to the sphere of compositional competence.

Spatialization

One further important structural feature of Vox-5 is the use of spatial placement and movement to articulate the sonic development and the aural imagery of the materials. I have used computer-controlled, prescored spatialization in the piece Vox-1, where the live sounds of the four vocalists are moved (independently) around the quadraphonic space by a custom-built device and software that I developed, using ambisonic technology. I would have preferred to spatialize the sounds in Vox-5 directly on the DEC VAX computer at IRCAM. It is possible to do this, but there were only two digital-to-analog converters available, so it would not have been possible to hear the results of one's moves and thus test the validity of one's compositional preconceptions. The sound materials were therefore spatialized during a mixdown on IRCAM's 24-track digital tape recorder, using a simple quadrapan device.

I have written much about the significance of "landscape" in the perception and composition of electroacoustic music. The sonic space I am referring to is indicated in Fig. 4, where the numbers represent spatial positions and also loudspeaker placements. Thus the position 1/2 refers to the place equidistant from the two places 1 and 2.

Fig. 4. Layout of playback channels. Position "1/2" is between channels 1 and 2.
(Loudspeakers 1 and 2 are at the front, 3 and 4 behind, with the audience in the center; the intermediate positions 1/2, 1/3, 2/4, and 3/4 lie between adjacent loudspeakers.)

In a system with a limited number of loudspeakers our aural discrimination of direction and motion is limited. Moreover, spatial motion is best perceived with certain types of sounds (e.g., high-frequency noise bands and repeating sounds). Motion is thus confined primarily to the lines joining the loudspeaker pairs (including the diagonals). In addition, perception of spatial position tends to be strongly distorted for audience members sitting a long way off-center. This effect can be remedied significantly by ambisonic encoding and projection of the sounds; an ambisonic master of the finished piece has also been completed.

The principal sonic image of the piece is a voice located at 1/2. The emergence (genesis) of other sounds out of vocal initiators is emphasized by their spatial "ejection" from this location into the quadraphonic space around the listeners. This is most dramatically heard in the five rapid vocal noise-band attacks in the early part of the piece (recurring in curtailed form later). These are "ejected" out into the corners of the space. In other cases the process is enhanced by moving from a "single-stream" vocal source located at 1/2 to a stereo sound. Here it is necessary to separate the two channels of the stereo source, analyze and interpolate them separately (using exactly the same parameters for each), and then recombine the results.

Two different kinds of motion organization are used in Vox-5: (1) the motion of individual sounds within the space, like the three explosive unvoiced du ... sounds behind the "choral bells" material, which travel individually around the space in different arcs; and (2) "frame motions," such as that of the vocal-ululation material (near the beginning of the piece) and the ingressive complexes (near the end), where three streams of similar (but separable) materials move simultaneously around the space, redefining their relationship to one another. This latter type of motion seems (perceptually) more like a constant shifting of the aural space itself, and is almost equally effective from all listening positions.

Other Techniques

Other computer music techniques were employed in the composition of Vox-5. One important method was the use of the mix program to control very precisely the envelopes of sound events. This was particularly crucial when joining end-to-end sections of vocal material recorded independently. In order to create the illusion of a continuous vocal stream, the finishing sound had to be enveloped (often over a period of only 0.05 sec) to suggest a vocal closure or the preparation in the vocal cavity of the next sound. Conversely, the beginning of the next sound often had to be enveloped to suggest the reopening of the mouth. This kind of subtle detail should remain unperceived by the listener if it is correctly done, but would be jarringly obtrusive if not. I enjoy this kind of "self-effacing" technique, because it contradicts the mythology of a composer as a strongly projected ego in music-making!!

Coda

The creation of Vox-5 helped me to test out many ideas about the control of musical articulation in a continuum, about spectral interpolation, and about the organization of sound in space. I am particularly grateful to IRCAM for inviting me to use its facilities, as my ideas could not have been realized
without these computer music resources (no equivalent facilities exist in the U.K.).

The computer opens up areas of compositional exploration that were previously inaccessible. The precision with which sound materials can be specified implies two things: (1) Given an understanding of acoustics, sounds can be transferred directly from the composer's imagination to the performance situation; (2) Areas of sonic organization previously inaccessible to composers through the existing media of notation can be explored, opening up a new world of dreamed-of, but unsung, possibilities.

In the future I would like to explore more precisely the musical possibilities of moving and "moving/static" aural events like forms in fluids. Consider the turbulence around an obstruction that appears semistatic in space, but is constructed entirely out of rapidly moving flows. I hope to relate the sonic evolution to the global properties of fluid flow (e.g., the onset of turbulence, granulation processes) as part of a compositional research project.

References

Atkins, M., A. Bentley, T. Endrich, R. Fischman, D. Malham, R. Orton, and T. Wishart. 1987. "The Composers' Desktop Project." In S. Tipei and J. Beauchamp, eds. Proceedings of the 1987 International Computer Music Conference. San Francisco: Computer Music Association, pp. 146-150.

Dolson, M. 1986. "The Phase Vocoder: A Tutorial." Computer Music Journal 10(4): 14-27.

Gordon, J., and J. Strawn. 1985. "An Introduction to the Phase Vocoder." In J. Strawn, ed. Digital Audio Signal Processing: An Anthology. Madison, Wisconsin: A-R Editions, pp. 221-270.
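
The octave shift described under "Compositional Techniques" moves a channel's data from channel n to channel 2n and doubles its frequency. The Python sketch below illustrates that remapping, generalized to any transposition ratio. It is not Wishart's code. It assumes that one analysis frame is a list of (amplitude, frequency) pairs, one per channel, and the name shift_frame is hypothetical.

```python
def shift_frame(frame, ratio):
    """Transpose one phase-vocoder frame by `ratio` (2.0 = up one octave).

    `frame` is a list of (amplitude, frequency_hz) pairs, one per channel,
    with channels equally spaced in frequency. The data in channel n move to
    channel round(n * ratio), and their frequency is multiplied by `ratio`.
    Assumptions: components pushed past the top channel are discarded; if two
    source channels land on the same target, the louder one is kept; empty
    channels are written as (0.0, 0.0).
    """
    out = [(0.0, 0.0)] * len(frame)
    for n, (amp, freq) in enumerate(frame):
        target = round(n * ratio)
        if 0 <= target < len(frame) and amp > out[target][0]:
            out[target] = (amp, freq * ratio)
    return out


# Three partials at 100, 200, 300 Hz in channels 1-3 of an 8-channel frame.
frame = [(0.0, 0.0), (1.0, 100.0), (0.5, 200.0), (0.25, 300.0),
         (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
octave_up = shift_frame(frame, 2.0)   # partials now in channels 2, 4, 6
```

Because every component is scaled by the same ratio, the harmonic relationships are preserved. This is what the article calls shifting, as opposed to stretching.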
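
The linear, concave and convex stretches of specsh and spece (Figs. 1-3) can be illustrated by computing one transposition ratio per channel above fdcno. The article does not give the exact formula, so the following are assumptions:

- The maximum shift is treated as a frequency ratio, as the 1.05 to 6.0 range quoted for Sound Example 2 suggests.
- A channel's ratio is 1 + (max_shift - 1) × m^power, where m is the shift multiplier running from 0 at fdcno to 1 at the top.

The function name stretch_ratios is hypothetical.

```python
def stretch_ratios(num_channels, fdcno, max_shift, power=1.0):
    """Per-channel transposition ratios for an upward stretch above fdcno.

    The shift multiplier m rises from 0 at channel fdcno to 1 at the topmost
    channel. power=1 gives the linear stretch (Fig. 1), power>1 a concave
    curve (Fig. 2), power<1 a convex curve (Fig. 3). Channels at or below
    fdcno are left unshifted (ratio 1.0), so the stretch at fdcno is zero.
    Assumption: ratio = 1 + (max_shift - 1) * m**power.
    """
    top = num_channels - 1
    ratios = []
    for n in range(num_channels):
        if n <= fdcno:
            ratios.append(1.0)
        else:
            m = (n - fdcno) / (top - fdcno)   # linear multiplier, 0..1
            ratios.append(1.0 + (max_shift - 1.0) * m ** power)
    return ratios


linear = stretch_ratios(9, 4, 3.0)              # Fig. 1 shape
convex = stretch_ratios(9, 4, 3.0, power=0.4)   # power factor as in Sound Example 2
```

Each channel's amplitude and frequency would then be moved using its own ratio, as in the octave-shift case. Because the ratios differ from channel to channel, harmonic relationships are not preserved, which is the aim when pushing a voice toward an inharmonic, bell-like spectrum.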
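
The five-stage envelope described under "Spectral Envelopes" (attack, divergence, on, convergence, end) can be read as a weight between 0 (original spectrum) and 1 (fully changed spectrum). The sketch below assumes linear ramps during divergence and convergence, since the article says only that the change is interpolated there. It also assumes stage durations are given in seconds. The name spectral_change_weight is hypothetical.

```python
def spectral_change_weight(t, attack, divergence, on, convergence):
    """Weight of the spectral change at time t (seconds from sound onset).

    Stages follow one another: attack (no change), divergence (ramp 0 -> 1),
    on (full change as programmed), convergence (ramp 1 -> 0), end (no change).
    Assumption: both ramps are linear.
    """
    t1 = attack
    t2 = t1 + divergence
    t3 = t2 + on
    t4 = t3 + convergence
    if t < t1 or t >= t4:
        return 0.0                      # attack or end: original spectrum
    if t < t2:
        return (t - t1) / divergence    # divergence
    if t < t3:
        return 1.0                      # on
    return (t4 - t) / convergence       # convergence
```

A per-frame blend such as (1 - w) × original + w × changed then realizes the envelope. Keeping the attack unchanged matches the article's concern to leave attacks untouched.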
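
The strand counts quoted under "Preprocessing of Sources" follow from each pass tripling the number of strands, since the sound is mixed with two copies of itself. Writing N_k for the strand count after k passes, two passes give the nine-strand wedge and three further repetitions give 243 strands:

```latex
N_k = 3^k, \qquad N_2 = 3^2 = 9, \qquad N_{2+3} = 3^5 = 243
```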
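
The independent amplitude and phase schedules of vocinte can be illustrated per analysis frame. The sketch makes these assumptions:

- The weight is ((t - start)/(end - start)) raised to the power factor, clamped to 0 before the start and 1 after the end. This matches the article's description of 0.8 as "rapid at start" and 2 as "slow at start".
- Both frames are lists of (amplitude, frequency) pairs with the same number of channels.

The function names are hypothetical. The parameter values are those of Table 1.

```python
def interp_weight(t, start, end, power):
    """0 before `start`, 1 after `end`, ((t-start)/(end-start))**power between."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return ((t - start) / (end - start)) ** power


def interpolate_frames(frame_a, frame_b, t, amp_params, phase_params):
    """Blend two analysis frames at time t (seconds).

    amp_params and phase_params are (start, end, power) tuples, so amplitude
    and frequency ("phase") move on independent schedules.
    """
    wa = interp_weight(t, *amp_params)
    wp = interp_weight(t, *phase_params)
    return [
        ((1 - wa) * amp_a + wa * amp_b, (1 - wp) * f_a + wp * f_b)
        for (amp_a, f_a), (amp_b, f_b) in zip(frame_a, frame_b)
    ]


# Table 1 (Sound Example 1): amplitude 0.2-7.8 sec, power 2.0;
# phase 0.2-3.0 sec, power 0.8.
AMP = (0.2, 7.8, 2.0)
PHASE = (0.2, 3.0, 0.8)
```

With these values the frequencies have fully reached the target by 3.0 sec, while the amplitudes are still close to the source. This is the reversal of the rules of thumb that the article reports for Sound Example 1.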
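
Position 1/2 in Fig. 4 lies midway along the line joining loudspeakers 1 and 2, and the article notes that motion is confined mainly to such lines. The pan law of the quadrapan device used in the mixdown is not described. The sketch below therefore assumes a common equal-power (cosine/sine) law for placing a sound at a fraction x along the line from one loudspeaker to another. The name pair_pan_gains is hypothetical.

```python
import math


def pair_pan_gains(x):
    """Equal-power gains for a source at fraction x (0..1) from speaker A to B.

    x = 0 is at speaker A, x = 1 at speaker B, x = 0.5 the midpoint
    (e.g., position 1/2 between loudspeakers 1 and 2 in Fig. 4).
    Assumption: cosine/sine equal-power law, so gain_a**2 + gain_b**2 == 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)


# A voice at position 1/2: equal gains of about 0.707 in loudspeakers 1 and 2.
g1, g2 = pair_pan_gains(0.5)
```

Sweeping x over time moves a sound along one loudspeaker pair. The "ejection" of sounds from 1/2 into the corners would involve changing pairs as the motion proceeds.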
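
Under "Other Techniques", independently recorded vocal sections are joined using envelopes as short as 0.05 sec. This can be approximated by fading out the end of one segment and fading in the start of the next. The sketch assumes samples held in a Python list and linear ramps; the shapes of the mix program's envelopes are not given in the article. The name fade_edges is hypothetical.

```python
def fade_edges(samples, sample_rate, fade_in=0.0, fade_out=0.05):
    """Apply linear fade-in and fade-out ramps (durations in seconds).

    Returns a new list; the input is not modified. The first sample of a
    fade-in and the last sample of a fade-out are scaled to zero.
    """
    out = list(samples)
    n_in = min(len(out), int(round(fade_in * sample_rate)))
    n_out = min(len(out), int(round(fade_out * sample_rate)))
    for i in range(n_in):
        out[i] *= i / n_in                      # ramp up from 0
    for i in range(n_out):
        out[len(out) - 1 - i] *= i / n_out      # ramp down to 0 at the end
    return out
```

In practice the fade lengths would be tuned by ear to suggest a vocal closure or the reopening of the mouth, as the article describes.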