Compositional Control of Phonetic/Nonphonetic Perception
Author(s): David Evan Jones

Source: Perspectives of New Music, Vol. 25, No. 1/2, 25th Anniversary Issue (Winter -
Summer, 1987), pp. 138-155
Published by: Perspectives of New Music
Stable URL: http://www.jstor.org/stable/833096
COMPOSITIONAL CONTROL OF
PHONETIC/NONPHONETIC
PERCEPTION
DAVID EVAN JONES
IN PERCEIVING ORDINARY speech, we interpret a continuously variable stream
of diverse timbres (the auditory speech signal) in terms of categories
(phonemes, syllables, words, phrases) which we recognize. In thus attending to
the patterns of speech sounds as carriers of information (verbal meaning), we
have less attentional capacity to hear these patterns as structures of (purely sonic)
information. In making the categorizations necessary to understand verbal
meaning, we reduce somewhat the overwhelmingly complex sonic interactions
which characterize actual speech to familiar and much simpler reconstructions
which, in turn, make speech sound patterns seem somehow simple.
There is a wide body of literature, both poetry and music, which adopts as a
central compositional device the presentation of speech sounds in ways which
draw the attention (at times) to the purely sonic information in speech. This
literature includes some of the work of Italian Futurists such as Filippo
Marinetti, early sound poets such as Hugo Ball and Kurt Schwitters, Lettrists
such as Maurice Lemaître, and many others. With the advent of musique
concrète and electronic music, composers such as Stockhausen (e.g. Gesang der
Jünglinge), Herbert Eimert (e.g. Epitaph für Aikichi Kuboyama), and Luciano
Berio (e.g. Thema (Omaggio a Joyce)) began using recorded and processed speech
sounds (along with nonspeech materials) to focus listeners' attention on the
sounds of speech. Composers such as Luciano Berio (e.g. Circles), György Ligeti
(e.g. Nouvelles Aventures), Kenneth Gaburo (e.g. Maledetto), and many others
wrote purely acoustic compositions to the same effect. More recently,
composers of computer music have brought digital technology to bear on the
problem. These works include Charles Dodge's Speech Songs, Tod Machover's Soft
Morning, City!, John Chowning's Phone, Paul Lansky's Six Fantasies on a Poem by
Thomas Campion, and many others.
A variety of perceptual ambiguities can be found in these pieces. These
include ambiguities concerning:
SOURCE RECOGNITION: Is the source vocal, instrumental, electronic?
What voice? What instrument?
PHONETIC/NONPHONETIC PERCEPTION: Is the listener receiving a
phonetic message? What phonetic segments are being received?
MORPHEMIC/NONMORPHEMIC INTERPRETATION: Do the
phonemes received form functional units (words or syllables whic
verbal meaning) in a natural language known to the listener?
SYNTACTIC STRUCTURE: Do the morphemes combine to form

level syntactic structures in a natural language known to the liste
While a few writers have approached the speech-as-sound poetic

literature from an historical perspective (e.g.Ruppenthal 1975), or h
specific compositions (e.g.Stockhausen 1964), there currently e
oretical framework which attempts to explain the perceptual issu
the variety of approaches to composition with speech sound. Inde
of entangled ambiguities listed above suggests that such a frame
extremely difficult to establish.
I wish to address some fundamental issues regarding a very small
puzzle. I will examine situations in which listeners are aware of re
cific and intelligible phonetic message (not necessarily morphem
focus of their attention is primarily on the sounds of speech-not on
phonetically coded information, but as timbres, pitches, duration
In defining my topic rather narrowly around the issues of ph
phonetic perception, I focus upon the most elemental distinctio
speech and sound. Nonetheless, it will be impossible to avoid discu
levels of the speech hierarchy (phonetic, morphemic, syntactic). As will be
illustrated below, perception on one level of the hierarchy informs and often
determines perception on both higher and lower levels. Questions as to source,
phonetic and morphemic content, and syntactic structure are often almost
inextricably intertwined.
It will be useful, then, to begin with some fundamental distinctions and a
brief theoretical discussion of some of the perceptual principles involved. It will
then be possible to examine specific compositional strategies which make use of
these principles.
VOICE-LIKENESS
The most heavily processed sections of Eimert's Epitaph für Aikichi Kuboyama
convey the rhythms, rates, and directions of formant change which we associate
with speech. It is even possible in these sections to identify many syllable bound-
aries and even some classes of phones (sonic representation of phonemes). It is
possible, for example, to distinguish stops, as a class, from fricatives and vowels
as classes. And yet, while it is clear that these heavily processed sounds are voice-
like (or "speech-like"), no specific and intelligible phonetic message is conveyed.
We can generalize the distinction: "Voice-likeness" involves source recognition
and description; a sound can be voice-like without conveying a specific phonetic
message.
A compositionally useful definition of "voice-like" aspects of a sound might
be: those aspects which cue listeners' association with the human voice in a given
context. The subjective nature of the definition reflects the speculative nature of
composing for a diverse audience. It is more useful for the analysis of perceptual
effects to define "voice-likeness" in terms of listeners' associations than in terms
of the actual source. Some extended vocal techniques or other unusual sonic
output of the human vocal tract heard in isolation might not be "voice-like"
under the above definition; sounds produced by electronic or acoustic instru-
ments other than the voice are often considered voice-like in some aspects. Also,
because a signal may be voice-like in one aspect (e.g. vibrato rate or intonation
contour) and unvoice-like in another (e.g. timbre of the glottal source etc.) it is
often more useful to refer to voice-like aspects of a sound than to voice-like sounds.
It is intuitively obvious that an association with the human voice can be pro-
duced-by clearing the throat or coughing, for example-without any phonetic
message involved. On the other hand, the presentation of a specific phonetic
message (whether by human voice or electronic or acoustic instruments) cer-
tainly cues an association with the human voice, and must be regarded as voice-
like in that aspect-even if the glottal source or other aspects of the sound are not
at all human-sounding or "voice-like."
But there is a much stronger correlation between voice-likeness and phonetic
perception. In deriving a phonetic message from an acoustic signal, listeners
routinely make use of information they have as producers of speech sound, that
is, of information about the capabilities of the human vocal tract. The case for this point is
reviewed compellingly by Liberman and Studdert-Kennedy (1977), who cite a
wide body of supportive experimental evidence. As these authors point out, "...
the key to the (speech) code is in the manner of its production." This required
identification of listener with the source of the signal he is decoding may explain
the fact that listeners may not hear the phonetic message in some "unvoice-like"
sounds (see Remez et al. 1981, for example). Thus the issue of voice-likeness-
essentially a question of source recognition-is strongly related to the issue of
phonetic/nonphonetic perception. Voice-likeness will bear upon some of the
compositional strategies outlined below.
It should be noted in passing that the close identification of listener with
source in speech perception is one basis for the affect exerted by compositions
which play at the threshold between "unvoice-like" sounds and voice-like (and
perhaps phonetic) sounds. The acoustic difference between voice-like and
unvoice-like is often very small; the psychological difference is often very great.
Moreover, there are marked differences, discussed below, between phonetic and
nonphonetic perceptual modes. Thus, while Stockhausen may have created a
rough acoustic continuum between speech and nonspeech for the composition of
Gesang der Jünglinge, perception of the sounds along this continuum does not
change continuously (see House et al. 1962). Still, the most striking aspect of
Stockhausen's piece may be the close association between familiar vocal sounds
and electronic sounds which could not be produced in any familiar acoustic
environment.
PHONETIC AND NONPHONETIC INFORMATION IN SPEECH
It is impossible to convey only phonetic information aurally. Any sound which
can be interpreted phonetically also carries nonphonetic information: sonic
information which is not directly relevant to the phonetic code. Thus, in many
languages, voice quality and pitch carry the phonetic message but do not directly
determine that message: they can vary within wide limits without altering the
phonetic message. Voice quality, intonation contour, and other nonphonetic
aspects of the sound may, however, change the meaning of the syntactic mes-
sage-turning a statement into a question, negating the spoken text with irony or
satire, and so forth.
In synthesized or electronically altered speech, the nonphonetic aspect of the
sound can be made to behave in ways which are quite impossible for a human
vocal tract-in ways which, in fact, are quite "unvoice-like"-without directly
interfering with the phonetic message. In Charles Dodge's Speech Songs, for
example, the glottal waveform often changes pitch in unvoice-like discrete steps
and at a rate which could not be approached by a human performer. For the
most part, the texts of these songs are intelligible-despite the "unvoice-like"
behavior of the fundamental frequency in the sections to which I refer. How-
ever, the nonhuman behavior and the "weighting" of the information (rapid
discrete pitch change on a single vowel) draws attention to the nonphonetic
aspects of the sound-sometimes to the extent that the vowel is temporarily lost.
This restates the point made in the preceding section: unvoice-like behaviors
may draw listeners' attention away from phonetic information available in a sig-
nal. Moreover, the reverse can also apply: voice-like behaviors may draw lis-
teners' attention toward a phonetic interpretation of an ambiguous signal. (For a
related experiment, see Tsunoda 1971). Both of these influences play a role in the
strategies outlined below.
PHONETIC/NONPHONETIC PERCEPTION
I have distinguished above between voice-like sounds and sounds which cue
phonetic interpretation. I have also pointed out that any utterance which con-
tains phonetic information also contains nonphonetic information. It is possible
to go one step further to say that those aspects of a sound which determine
phonetic content (e.g. formant transitions in stop consonant/vowel syllables)
can, to an extent, also be discriminated on a purely sonic basis. That is to say,
listeners can, within limits, discriminate between two signals which they identify
as being the same phonetically (both identified as the syllable /bi/, for example)
and which are identical in every aspect except in the exact structure of their for-
mant transitions. Moreover, under certain circumstances, a signal can be per-
ceived-and discriminated-either alternatively or simultaneously as speech and
nonspeech (see, for example, Bailey, et al. 1977). When sounds are heard as both
speech and nonspeech simultaneously, the phenomenon is called "duplex per-
ception" (Rand 1974; Isenberg and Liberman 1978; Liberman 1979).
Thus it is possible to derive two different types of information, sonic and
phonetic, from the same aspect of the same signal. This is so because of the con-
trasting nature of the two perceptual processes the listener brings to bear.
Phonetic perceptual processing involves attending to a more or less continu-
ous stream of sound as a carrier of separate and serially ordered phonetic seg-
ments. This is not to say that the sonic signal is itself "spliceable" into separate
phonetic representations. It is not. Adjacent phones of any given syllable are co-
articulated by the speaker and "interleaved" in the resultant acoustic signal. (See
Liberman and Studdert-Kennedy 1977.) The acoustic signal is decoded into
separate and serially ordered segments, however, by the listener.
Sonic perceptual processing, on the other hand, does not usually involve the
discrimination and "labeling" of segments. Although we can discriminate
separate events in the signal, and can recognize the order of those events, we often
have no ready-made system of classification with which to label them. Instead,
"echoic" memory retains a recording of a sound we have just heard. Phonetic
memory, in contrast, retains a coded representation of a sound.
COMPOSITIONAL STRATEGIES
I stated at the outset that I wished to discuss compositional strategies by means
of which composers influence listeners to focus their attention on speech sound
while also receiving a phonetic message. I specified the goal as situations in which
"listeners are aware of receiving a specific and intelligible phonetic message (not
necessarily morphemic), while the focus of their attention is primarily on the
sounds of speech-not only as cues for phonetically coded information, but as
timbres, pitches, durations." In light of the distinctions I have made thus far, it
is clear that this attentional focus involves something of a balance between two
perceptual processes. In listening to ordinary speech, our attention is often
occupied with the process of decoding the phonetic and syntactic messages to
the extent that we have little attentional capacity left to attend to speech sound.
Thus, in order to promote the dual attentional focus I describe above, the task is
to draw the listeners' attention either:
* AWAY from the phonetic information in the signal and TOWARD the
sonic information (in the case of sounds which lend themselves easily to
phonetic interpretation), or...
* TOWARD a phonetic interpretation (in the case of sounds which carry
much sonic interest and of which a phonetic interpretation cannot be easily
or continuously made).
Moreover, as pointed out by Liberman (1979) and others, in order to hear
acoustic cues for speech as sound rather than only as the phonetic features they
represent, we must partially circumvent our specialized auditory perceptual
processes for extracting the phonetically relevant information in speech by utiliz-
ing sounds and contexts at the border of speech and nonspeech. While this is
clearly possible under some experimental circumstances (Rand 1974; Isenberg
and Liberman 1978), it is not at all clear to what extent this occurs in less
focussed listening situations.
What is clear is that the listeners' attention can be influenced as suggested in
the two approaches outlined above. I divide strategies to these ends into two
classes:
* Strategies which may utilize a SINGLE PHONETIC SOURCE which
CONTINUALLY lends itself to phonetic interpretation.
* Strategies which may utilize MULTIPLE SOURCES (including
nonphonetic sounds) and/or INTERMITTENTLY INTELLIGIBLE
phonetic materials.
The first class of strategies has its effect by de-emphasizing or destroying the
relationship of the sounds to a hierarchical linguistic structure and by drawing
the attention instead to the purely sonic information. By themselves, these strat-
egies are often less compelling than the class two strategies, discussed below,
and, if listeners allow their attention to remain on the elementary linguistic
information in the signal, they may find these approaches unrewarding. Each strategy
may be more effective-as in most of the examples cited-when combined with
other strategies.
Class one strategies include uses of:
* syntactically and/or morphemically meaningless text,
* slow-motion delivery of text,
* multiple repetitions of text.
SYNTACTICALLY AND/OR MORPHEMICALLY MEANINGLESS TEXT
Some of the earliest examples of this approach taken in this century are the
nonsense poems of the dadaists and the Italian futurists. Many of them organ-
ized their nonsense poems sonically by working with a highly restricted phonetic
vocabulary. More recent examples of this technique include Ligeti's Nouvelles
Aventures for three vocalists and seven instrumentalists (Example 1).
I also include in this category text "deconstructions" such as Daniel Lentz'
Songs of the Sirens and Cage's 62 Mesostics Re Merce Cunningham because the
texts-in their fragmented form-draw the attention away from the syntactic
level specifically by means of their lack of morphemic and syntactic content.
It should be noted at this point that, just as it is possible for a text to be syntac-
tically nonsensical or morphemically nonsensical, a text may also conform to or
violate the phonological conventions of a given language. Thus the nonsense
word "shtimp" would be "phonotactically well-formed" in German but not in
English-"sh" does not precede "t" as an initial sound in English. Moreover,
the set of phonetic sounds itself varies from language to language (with large
areas of overlap). Tongue clicks, for example, are phonetic in some African lan-
guages but not in European languages. Thus even the question of whether lis-
teners hear phonetic nonsense or nonphonetic vocal sound in a given text may
depend to some extent on the languages with which they are familiar. A non-
sense text may thus say a great deal about itself even without involving mor-
phemically or syntactically intelligible information.
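The phonotactic point can be made concrete with a toy check. The sketch below is only an illustration: the cluster inventories are small hand-picked samples written in the informal spelling used above ("sh", "sht"), not a description of English or German phonology, and the helper names `initial_cluster` and `is_well_formed_onset` are hypothetical.

```python
# Toy phonotactic check: is a word-initial consonant cluster permitted?
# The inventories are illustrative assumptions, far from complete.
ALLOWED_ONSETS = {
    "english": {"", "t", "p", "k", "s", "sh", "st", "sp", "sk", "str", "shr"},
    "german": {"", "t", "p", "k", "sh", "sht", "shp", "shtr", "shr"},
}

VOWELS = set("aeiou")


def initial_cluster(word: str) -> str:
    """Return the run of consonant letters before the first vowel."""
    for index, letter in enumerate(word):
        if letter in VOWELS:
            return word[:index]
    return word


def is_well_formed_onset(word: str, language: str) -> bool:
    """True if the word's initial cluster is in the language's toy inventory."""
    return initial_cluster(word.lower()) in ALLOWED_ONSETS[language]


if __name__ == "__main__":
    for language in ("german", "english"):
        print(language, is_well_formed_onset("shtimp", language))
```

Under these assumed inventories the same nonsense word passes for one language and fails for the other, which is all the sketch is meant to show.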
[Example 1: score excerpt from Ligeti's Nouvelles Aventures; notation not reproduced]
EXAMPLE 1
SLOW-MOTION DELIVERY OF TEXT
This approach is utilized along with other techniques in sections

sky's computer tape Six Fantasies on a Poem by Thomas Campio
Dodge's Speech Songs, in Roger Reynolds' Still, and in many other
A listener attempting to decode speech sounds phonetically
phonetic cues from transitions which are obscured when they occ
cantly slower-than-normal rate. Moreover, "slow motion" speec
teners with a slowed rate of phonetic and syntactic information
them the time necessary to "hear out" harmonics in the vowels an
to focus on timbral information in the signal.
MULTIPLE REPETITIONS OF TEXT
Perhaps the best-known example of this technique is found in

Steve Reich's Come Out. Literal repetition in some form, howev
mental strategy in a large body of works including many of the t
positions available on disc from Fylkingen in Stockholm (e.g.
Bodin's Forjon III) and in much American sound-poetry (e.g. Ch
nian'sJust).
Even a single immediate exact repetition objectifies a recorded word or phrase
by announcing, in effect, that the exactly repeated verbalization, with its
associated rate and rhythm of delivery, intonation contour, and voice quality, is itself a
unit-a "building block" which can be repeated and divided-rather than simply
a single instance of an infinitely flexible discourse. After one or more presenta-
tions, listeners have absorbed any morphemic and syntactic information in the
signal and are left to listen to the speech sound in the remaining repetitions. If
the repeated speech segment is short enough and is repeated for long enough,
the listener will involuntarily "deconstruct" the text into its acoustically related
elements. Again, the best known instance of this effect is found in Reich's Come
Out where the perceptual "streaming" is aided by other strategies discussed
below. (For a discussion of "streaming"-the perceptual segregation of acous-
tically related elements within a continuous auditory sequence into distinct
"channels"-see Lackner and Goldstein 1974). Repetition is thus used to call
attention away from syntactic information and towards the acoustic information
in a signal.
Whereas the first class of strategies may involve a single source continuously
presenting phonetic materials, the second class makes use of OTHER
SOURCES and/or INTERMITTENTLY INTELLIGIBLE phonetic materials.
These latter strategies draw attention to sonic information by means of:
* JUXTAPOSITIONS/TRANSITIONS between phonetic and
nonphonetic sounds,
* coordinated SUPERIMPOSITIONS of phonetic materials,
* extreme TEMPORAL FRAGMENTATION and/or TEMPORAL
REORDERING,
* "SOURCE/FILTER" EFFECTS.
By various means, these strategies draw our attention to auditory similarities
between sounds perceived as phonetic and sounds perceived as nonphonetic.
Moreover, the marginal phonetic intelligibility of texts presented in these
manners minimizes the syntactic information in the signal and requires listeners to
pay conscious attention to the sound of phonetic cues in their attempt to decode
an ambiguous message.
JUXTAPOSITIONS AND TRANSITIONS
(Juxtapositions of and transitions between phonetic sounds and
nonphonetic sounds, whether vocal, electronic, or instrumental, having similar
timbral characteristics.)
When we describe the acoustic attributes necessary for a sound to be
perceived phonetically, we are, in general, describing characteristics associated in
music with timbre. Given certain fairly broad constraints on fundamental
frequency, we can describe simple steady-state continuants (vowels, nasals, liquids
... ), for example, purely in terms of their spectrum (formant center frequency,
formant bandwidth, relative formant amplitudes, and so forth). Fricatives can be
described in terms of the bandwidth and the quality of the noise-source, and so
on. By juxtaposing or creating transitions between phonetic sounds and
nonphonetic sounds which have audibly similar timbral characteristics, some
composers have sought to call attention to the timbre of the speech sounds.
Three compositions by Luciano Berio will serve as illustrations. Sequenza III
for female voice employs a marginally intelligible English-language text along
with sounds which are nonphonetic in English (such as tongue clicks,
hand-over-mouth, and so forth) which draw attention to vocal sound as well as text.
Similar techniques were used by the "Lettrists" and other early sound poets. In
Berio's Circles, some of the unvoiced fricatives and rolled /r/ sounds in the voice
are imitated and overlapped by sustained percussion sounds of similar timbre.
Berio's phonetic notation at these points in both the voice part and the
percussion part makes his intentions clear (Example 2). In the opening section of Visage,
Berio uses electronics to imitate the "quasi-phonetic" text. Thus Berio has used
nonphonetic vocal sounds, instrumental sounds, and electronic sounds to
imitate the timbre of phonetic sounds, and thus to draw the listeners' attention to
speech sound as timbre.
Curiously, an inverse process also takes place. In the above examples, the
listener's attention is drawn away from phonetic information and to sonic
information in a signal. In cases where the signal does not readily lend itself to phonetic
interpretation, a balance between sonic and phonetic perceptual processes can
sometimes be obtained by drawing the listener's attention toward a phonetic
interpretation. In Gesang der Jünglinge, for example, Stockhausen's effort to
utilize an acoustic continuum between the sounds of a singing voice (at one
extreme) and nonphonetic electronic sounds (at the other extreme) sometimes
results in a marginally intelligible phonetic "sense" to electronic sounds which,
in another context, would be heard as nonphonetic. The filtered white noise
lends itself to interpretation as specific fricatives at times, while aggregates of sine
waves become interpretable as specific vowel resonances. There are numerous
psycho-linguistic experiments which illustrate this point. These experiments
utilize sounds which are not, at first, interpreted by listeners as being phonetic.
When the researchers ask the subjects to listen to the sounds as speech, the
listeners are able to interpret the signal phonetically and report the intended
phonetic message (Bailey, et al. 1977; Rand 1974; Isenberg and Liberman 1978).
By utilizing a continuum from phonetic to nonphonetic materials, Stockhausen
influences the listener of Gesang der Jünglinge to listen phonetically to sounds at
the borderline between speech and nonspeech. (Stockhausen gives a general
description of the "continuum" he created in his article "Speech and Music.")
[Example 2: score excerpt; notation not reproduced]
Luciano Berio, CIRCLES
© Copyright 1961 by Universal Edition (London) Ltd., London
All Rights Reserved
Used by permission of European American Music Distributors
Corporation, sole U.S. agent for Universal Edition
EXAMPLE 2
COORDINATED SUPERIMPOSITION OF TEXTS
(Multiple texts superimposed in such a way as to temporally al

sounds which are sonically similar but different phonetically
drawing attention to a class of speech sounds-e.g. fricatives,
etc.-while obscuring, to some extent, the phonetic message.)
Simply superimposing texts does not necessarily call attenti

sound. Depending on how it is done, superimposition may result
unintelligible or quasi-intelligible texts compelling no particul
focus. However, if the texts are carefully coordinated, so as to
sonically similar phonetic materials, the resulting texture may len
easily to sonic categorization than to phonetic categorization and id
In David Evan Jones's Passages for chamber choir, two pianos, tw
and organ, a limited vocabulary of speech sounds-an artificial n
guage"-is divided into three categories-vowels, unvoiced fricat
consonants. Vowels are superimposed on other vowels, fricatives
catives, and stops with other stops so that the phonetic identity
sounds is obscured, while the sonic characteristics of each category
nances (vowels), broadband noise (fricatives), rapid transien
emphasized. As each category serves a different function in the m
of the piece, the resulting textures lend themselves more easily to
zation and interpretation than to phonetic categorization (Examp
TEMPORAL FRAGMENTATION/TEMPORAL REORDERING
(Extreme fragmentation and/or reordering of text by electro

resulting in incomplete and ambiguous phonetic information.)
Texts electronically fragmented and/or reordered (played back

ple texts sampled alternately, and so on) can be distinguished fr
texts or "deconstructed" texts produced directly by humans in
tronic alterations often result in sounds which cannot be produc
vocal tract. Unless electronic fragmentation is performed with the
electronic "artifacts," instantaneous or abrupt transitions often re
beyond the physical capabilities of a human vocal tract. Similarl
speech sounds (vowel transitions for example) can be produced b
and backwards by a human performer, other speech sounds (su
can only be exactly reversed by electronic means. Lars-Gunnar B
ILfemploys electronic fragmentation along with rapid repetition of
other techniques.
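The operations named in this section can be sketched on a plain list of samples. This is a minimal illustration under loud assumptions: Python lists stand in for recorded audio, the function names are hypothetical, and nothing here reproduces any composer's actual procedure. The splices are deliberately abrupt, with no smoothing at the joins, which is the kind of instantaneous transition a vocal tract cannot produce.

```python
# Fragmentation and reordering on lists of samples (stand-ins for audio).
# All names are hypothetical; joins are left abrupt on purpose.

def fragment(samples, size):
    """Split samples into consecutive fragments of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def reverse_each(fragments):
    """Play every fragment backwards while keeping the fragment order."""
    return [frag[::-1] for frag in fragments]


def alternate(fragments_a, fragments_b):
    """Take fragments alternately from two texts: a0, b0, a1, b1, ..."""
    mixed = []
    for frag_a, frag_b in zip(fragments_a, fragments_b):
        mixed.append(frag_a)
        mixed.append(frag_b)
    return mixed


def flatten(fragments):
    """Concatenate fragments back into a single sample list."""
    return [sample for frag in fragments for sample in frag]


if __name__ == "__main__":
    text_a = [1, 2, 3, 4, 5, 6]
    text_b = [10, 20, 30, 40, 50, 60]
    print(flatten(reverse_each(fragment(text_a, 3))))
    print(flatten(alternate(fragment(text_a, 2), fragment(text_b, 2))))
```

The fragment size is the main variable in such a sketch: shorter fragments leave less of each phonetic transition intact within any one piece.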
[Example 3: score excerpt; staves S1, S2, A1, A2, T1, T2, B1, B2, Piano 1, Piano 2, Perc 1, Perc 2, Organ; notation not reproduced]
EXAMPLE 3
SOURCE/FILTER EFFECTS
(Use of an unusual glottal waveform, e.g. vocal "fry" (creak voice), vocal
multiphonics, or (by electronic means) acoustic/electronic sources.)
As I pointed out above, any sound which can be interpreted phonetically also
has a nonphonetic component-sonic information which is not directly relevant
to the phonetic code. In many languages, changes in voice quality (e.g. raspy,
harsh, whispered, etc.) and pitch do not change the phonetic message itself.
By employing glottal waveforms which display attack and decay patterns, par-
tial frequencies, and behaviors of pitch and timbre change which are unusual or
impossible for the human glottis to produce, composers have drawn listeners to
focus on information in the speech signal which is not directly relevant to the
phonetic code. Moreover, this "nonphonetic" focus seems to transfer, to some
extent, to a nonphonetic focus on formant resonances and other aspects of the
sound which determine the phonetic message. This transfer of attention does
not preclude the possibility of phonetic and syntactic intelligibility, but it leaves
less attentional capacity available to this task.
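The separation between source and formant structure can be illustrated numerically. The sketch below is a rough model under stated assumptions: the formant values, the simple resonance curve, and the two harmonic sources are all invented for illustration, and the function names are hypothetical. It passes two different sources through the same fixed set of formants and reports which partial ends up strongest.

```python
# Source/filter sketch: different sources shaped by one fixed set of formants.
# All numbers are illustrative assumptions, not measurements.

# Each formant: (center frequency in Hz, bandwidth in Hz, relative amplitude).
FORMANTS = [(500.0, 80.0, 1.0), (1500.0, 100.0, 0.5), (2500.0, 120.0, 0.25)]


def filter_gain(freq_hz, formants=FORMANTS):
    """Sum of simple bell-shaped resonance curves, one per formant."""
    gain = 0.0
    for center, bandwidth, amplitude in formants:
        offset = (freq_hz - center) / (bandwidth / 2.0)
        gain += amplitude / (1.0 + offset * offset)
    return gain


def harmonic_source(f0_hz, rolloff, max_hz=3000.0):
    """Partials at multiples of f0, with amplitude 1 / k**rolloff."""
    partials = []
    k = 1
    while k * f0_hz <= max_hz:
        partials.append((k * f0_hz, 1.0 / k ** rolloff))
        k += 1
    return partials


def shape(source, formants=FORMANTS):
    """Scale every partial of the source by the filter gain at its frequency."""
    return [(freq, amp * filter_gain(freq, formants)) for freq, amp in source]


def strongest_partial(spectrum):
    """Frequency of the partial with the largest amplitude."""
    return max(spectrum, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    sloped = harmonic_source(100.0, rolloff=1.0)   # falling spectrum
    flat = harmonic_source(125.0, rolloff=0.0)     # equal-strength partials
    print(strongest_partial(shape(sloped)), strongest_partial(shape(flat)))
```

With these assumed values both sources yield their strongest partial at the first formant, although their fundamentals and spectral slopes differ. In this simplified model, the source can change while the formant pattern stays put.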
In Still, for example, Roger Reynolds uses "vocal fry" extensively (and slows
the rate at which the text is presented to well below normal speaking rates). In
Soft Morning, City!, Tod Machover uses the sound of a contrabass as the glottal
source (by means of a "cross synthesis" technique) for some fragments of the
text in the tape part. In The Story of Our Lives, Charles Dodge uses a wide band of
converging and diverging sine waves to represent the "book voice" in his syn-
thesized-speech setting of Mark Strand's poem. Bengt Emil Johnson's 3/1970;
(bland) III ends with electronically processed vocal sounds most of which retain
their formant structure (and hence their phonetic identity) while the timbre of
the source is significantly altered. Thus vocal, instrumental, and electronic
sources have all been used for this effect. In each case, the marginal intelligibility
of the text is due partly to the above-cited transfer of the listener's attention to
nonphonetic aspects of the speech signal.
CONCLUDING THOUGHTS
The aim of both classes of strategies discussed above is to create a b

between attention directed to two perceptual processes-the phon
auditory. In most cases, this involves attenuating the listener's focus on
tactic content of the signal and calling attention to the sonic content.
cases, this involves calling attention to the phonetic information in a sig
does not readily lend itself to phonetic interpretation. By attenuating th
tic meaning in a clearly phonetic signal (class one strategies) or by means
ous juxtapositions, transitions, and superimpositions of phonetic soun
nonphonetic sounds (class two strategies), these approaches allow us to hear
overly familiar speech sounds with fresh ears, even as we continue to decode the
phonetic message. They make available to poets and composers "elements"
(speech sounds) which would otherwise be "bound" in their phonetic roles.
Although the perceptual issues which lead to and from the strategies outlined
above are complex, the realization of each of these strategies is potentially quite
simple. This is only to say that, in itself, it is not a significant accomplishment
merely to have brought the listener's attention to speech as sound; having done
this, most serious composers still face fundamental questions as to how to organ-
ize their pieces. To an extent, the particular strategies selected by the composer in
bringing the listener's attention to speech sound involve concomitant decisions
on compositional organization. The selection of strategies must be guided by the
composer's fundamental attitude towards the phonetic segments in the piece,
whether:
* to order the phonetic fragments compositionally (as, for example, in
Ligeti's Nouvelles Aventures), and/or...
* to highlight or to otherwise interact with a preexisting phonetic order (as in
Paul Lansky's Six Fantasies on a Poem by Thomas Campion), and/or...
* to construct the composition parallel to (but with little regard for) the
organization of the phonetic segments in the text.
There exist formidable pieces which take each of these approaches. However, a
discussion of the ways in which the perceptual issues discussed above relate to
these pieces and to these general approaches requires a separate exposition.
Concern with relationships between sound and meaning in language is by no
means an innovation of the twentieth century. But the intensity with which the
word has been dissected and reconstructed is certainly unique to this century.
Let me close then with a brief quote from Velimir Khlebnikov, one of the cen-
tury's first "sound poets."
The word lives a double life. At times it grows like a plant and produces a
cluster of sonorous crystals, then the beginning of the sound takes on its
own life, while that part of reason which we call "The Word" remains in
shadow; at other times the word places itself at the service of reason; the
sound ceases to be "omnipotent" and absolute-it becomes "NAME"
and weakly carries out reason's orders. Thus, now reason obeys sound,
now pure sound obeys pure reason. It is a struggle between two worlds, a
struggle of two powers which is constantly carried on in the heart of the
word, giving a double meaning to language: two circles of shooting stars.
(Hausmann, 1969, 53)
It gives me great pleasure to acknowledge the many helpful criticisms,
comments, and suggestions provided by speech researcher Dr. Mary Regina Smith in
support of my work on this article. Any errors which may remain, however, are
entirely my responsibility.
NOTES
1. There are nonphonetic stimuli (such as tonal melodies) which, under
some circumstances, can be shown to be coded rather than merely recorded
in memory. My discussion here focusses specifically on the differences
between sonic and phonetic perceptions of speech sounds.
REFERENCES
ARTICLES
Bailey, Peter J., Quentin Summerfield, and Michael Dorman. 1977. "On the
Identification of Sine-Wave Analogues of Certain Speech Sounds." Haskins
Laboratories Status Report on Speech Research SR-51/52: 1-25.
Hausmann, Raoul. 1969. "The Optophonetic Dawn." Studies in the Twentieth
Century 3:51-54.
House, A.S., K.N. Stevens, T.T. Sandel, and J.B. Arnold. 1962. "On the
Learning of Speech-like Vocabularies." Journal of Verbal Learning and Verbal
Behavior 1:133-43.
Isenberg, D., and A.M. Liberman. 1962. "Speech and Nonspeech Percepts
from the Same Sound." Journal of the Acoustical Society of America 64, Suppl. No.
1:J20.
Lackner, James R., and Louis M. Goldstein. 1974. "Primary Auditory Stream
Segregation of Repeated Consonant-Vowel Sequences." Journal of the Acoustical
Society of America 56:1651-2.
Liberman, A.M. 1979. "Duplex Perception and Integration of Cues: Evidence
that Speech Is Different from Nonspeech and Similar to Language." Ninth
International Congress of Phonetic Sciences, Symposium No. 8.
Liberman, A.M., and M. Studdert-Kennedy. 1977. "Phonetic Perception." In
Handbook of Sensory Physiology, Vol. 8, "Perception," edited by R. Held, H.
Leibowitz, and H.L. Teuber. Heidelberg: Springer-Verlag.
Rand, T.C. 1974. "Dichotic Release from Masking for Speech." Journal of the
Acoustical Society of America 64, Suppl. No. 1: J20.
Remez, Robert E., Philip E. Rubin, David B. Pisoni, and Thomas D. Carrell.
1981. "Speech Perception without Traditional Speech Cues." Science 212(4497):
947-50.
Ruppenthal, Stephen. 1975. "History of the Development and Techniques of
Sound Poetry in the Twentieth Century in Western Culture." Master's Thesis,
California State University, San Jose.
Stockhausen, Karlheinz. 1964. "Music and Speech." Die Reihe 6:47-56.
Tsunoda, Tadanobu. 1971. "The Difference of the Cerebral Dominance of
Vowel Sounds among Different Languages." The Journal of Auditory Research
11:305-14.
SCORES
Berio, Luciano. Circles, for female voice, harp, and two percussionists.
London: Universal Edition, 1961.
Berio, Luciano. Sequenza III, for female voice. London: Universal Edition,
1968.
Gaburo, Kenneth. Lingua II: Maledetto, for seven virtuoso speakers.
California: Lingua Press, 1976.
Ligeti, György. Nouvelles Aventures, for three singers and seven
instrumentalists. New York: C. F. Peters, 1966.
Machover, Tod. Soft Morning, City! for soprano, double bass, and
computer-generated tape. Paris: Ricordi Press, 1980.
SOUND RECORDINGS
Amirkhanian, Charles. Just. (Phono-disc) 1750 ARCH 1752.
Berio, Luciano. Circles. (Phono-disc) Wergo Schallplattenverlag GmbH,
WER60021.
Berio, Luciano. Sequenza III. (Phono-disc) Wergo Schallplattenverlag GmbH,
WER60021.
Berio, Luciano. Thema (Omaggio a Joyce). (Phono-disc) Turnabout (Vox) TV
34177.
Berio, Luciano. Visage. (Phono-disc) Candide 31027.
Bodin, Lars-Gunnar. For Jon III (They Extricated their Extremities plus for John).
(Phono-disc) Fylkingen Records FYLP 1029.
Cage, John. Sixty-Two Mesostics Re. Merce Cunningham. (Phono-disc) 1750
ARCH 1752.
Dodge, Charles. Speech Songs. (Phono-disc) Composers Recordings, Inc., CRI
SD 348.
Dodge, Charles. The Story of our Lives. (Phono-disc) Composers Recordings,
Inc., CRI SD 348.
Eimert, Herbert. Epitaph für Aikichi Kuboyama. (Phono-disc) Wergo
Schallplattenverlag GmbH, WER60014.
Gaburo, Kenneth. Lingua II: Maledetto. (Phono-disc) Composers Recordings,
Inc., CRI SD 316.
Johnson, Bengt Emil. 3/1970; (bland) III. (Phono-disc) Sveriges Radio 20-1.
Lansky, Paul. Six Fantasies on a Poem by Thomas Campion. (Phono-disc)
Composers Recordings, Inc., CRI SD 456.
Lentz, Daniel. Songs of the Sirens. (Phono-disc) ABC Command COMS 9005.
Ligeti, György. Nouvelles Aventures. (Phono-disc) Candide CE 31009.
Machover, Tod. Soft Morning, City! (Phono-disc) Composers Recordings, Inc.,
CRI SD 506.
Reich, Steve. Come Out. (Phono-disc) Odyssey 3216 0160.
Reynolds, Roger. Still. (Phono-disc) Vital Records, Inc., VR1801-2.
Stockhausen, Karlheinz. Gesang der Jünglinge. (Phono-disc) Deutsche
Grammophon 138 811.
