A Thesis Submitted to the Faculty in partial fulfillment of the requirements for the degree of Master of Arts in Digital Musics by Paul G. Osetinsky. DARTMOUTH COLLEGE, Hanover, New Hampshire, June 2010
David P. Casal, Ph.D. (co-chair)
Michael A. Casey, Ph.D. (co-chair)
Stanley B. Link, Ph.D. (Vanderbilt University)
Brian W. Pogue, Ph.D. Dean of Graduate Studies
© Copyright by Paul Osetinsky 2010
Sounds, like all natural phenomena, are always perceived in context. For sounds in particular, this context is multifaceted, encompassing the social, environmental, instrumental, and any other origins from which they arise, but also the larger acoustic structures built from sounds themselves over two dimensions: time and frequency. Musique concrète, the first music to begin from concrete, recorded sounds rather than abstract musical notation, separates sounds from their causal environments, as well as from the larger sound chains—the temporal contexts—to which they contribute with other sounds sequentially as links. Microsound, a related aesthetic that employs sonic “grains” of durations at the lower threshold of human perceptibility, merely takes the temporal sound separations characteristic of musique concrète to a lower extreme. This thesis addresses the musical implications of emancipating sounds from their frequency, or spectral, contexts—those to which they contribute with other sounds simultaneously.
Dartmouth College funded my entire experience as part of the Digital Musics Program, giving me the opportunity to devote myself to music for two years on a beautiful campus filled with extraordinary people—for this I will always be grateful. Finishing this thesis is bittersweet, and I will truly miss Hanover and the people I have met here. This thesis comes at the end of a long journey, which was by no means easy for me. I faced a significant learning curve, and I am obliged to several individuals for their patience and assistance in helping me climb over it. First, to those on my thesis committee: David Casal, whose encouragement proved indispensable when the doubts flooded in faster than the ideas, came to Hanover from London at just the right time. Without him, I would have lacked the confidence to pursue this particular thesis topic, as well as the aesthetic advice he provided for me in my compositional work. I am also indebted to Michael Casey, who invited me to study as part of this program, spent much of his valuable time meeting with me and reviewing my work over the past two years, and offered me immense freedom to explore the oftentimes overwhelming world of sound and music. His own work directly influenced this thesis, and if it were not for the time he devoted to developing SoundSplitter for our fall 2009 seminar, this thesis would certainly not have been written. Stan Link changed my undergraduate career at Vanderbilt University, and I am sure that I would not have pursued graduate studies in music without his input. His continuing kindness and support are truly admirable, qualities every student seeks in a professor. Next, I am thankful for having had such an enjoyable group of graduate students and friends with whom to pass the majority of my time at Dartmouth. Patrick Barter and Chris Peck, my fellow “second-years,” have undoubtedly taught me more about music than I have taught them since the fall of 2008.
Josh Hudelson, Roth Michaels, and Alex Wroten of the class of 2011, and Michael Chinen, Beau Sievers, and Kristina Wolfe of the class of 2009 all contributed to my learning experience at Dartmouth as well during my relatively shorter time with them. I feel privileged to have been part of such an impressive group of students and composers. I also owe special thanks to Spencer Topel, who critiqued my music and introduced me to the spatialization techniques that I explored in the latter part of this thesis. Rebecca Fawcett gave me a great first impression of the Digital Musics Program in my interactions with her as I began the application process in 2007, and she should be commended for making all our lives in Hallgarten easier and more enjoyable. I should also thank Larry Polansky and Jody Diamond, who were always generous with their time and knowledge; Kui Dong, who introduced me to several interesting composers; Newton Armstrong, whose fall 2008 seminar instilled in me a deep appreciation for musical structure; Jon Appleton, who met with me individually several times to discuss my music; Bob Duff, who introduced me to the fascinating subject of counterpoint, which heavily influenced this thesis; Charles Dodge, whose course in composition I am lucky to have taken; Dmitri Yanov-Yanovsky, who led me to Ligeti’s notion of micropolyphony; and Doug Perkins, who made it possible for me to perform my music for the first time in New York and Boston. Finally, I dedicate this thesis to my family, who has always supported me in all of my endeavors.
Table of Contents
Preface
Part I: Microsound
   Introduction
   1. Microsound Background
      1.1 Gabor
      1.2 Xenakis
      1.3 Roads & Truax
   2. Grain Anatomy
      2.1 Xenakis Grains & Gabor Grains
      2.2 Cells, Screens, & Books
      2.3 Grain Divisibility
   3. Granular Synthesis & Granulation
      3.1 Time Domain Granulation
      3.2 “Frequency Domain Granulation”
Part II: The Block of Sound: Alternative Responses
   Introduction
   1. Auditory Scene Analysis
      1.1 Sequential Integration
      1.2 Simultaneous Integration
   2. Probabilistic Latent Component Analysis
   3. PLCA Applications in Music
      3.1 Onomatoschizo
      3.2 Racquelement
      3.3 Stratovinsky
Part III: Stratosound
   Introduction
   1. Sound Objects within Sound Objects
      1.1 Mass
      1.2 Sound Objects, Musical Objects, & Magnets
   2. From Uncontracted to Contracted
      2.1 Desired (Signal) & Undesired (Noise)
      2.2 Emancipations
   3. Acousmatics
      3.1 Reorderings & Realignments
   4. Aural Images
      4.1 Sound Object Coherence
      4.2 Twice-Reduced Listening
   5. Point against Point
      5.1 An Art of Segregation
      5.2 Synthetic & Natural Micropolyphonies
      5.3 Chimeras, Hydras, & Virtual Sources
   6. Strains
The Positive of Sound: Conclusions
Appendix
Bibliography
Sounds, like all natural phenomena, are always perceived in context. For sounds, this context is multifaceted, encompassing social, environmental, instrumental, and any other origins from which they arise, but also the larger acoustic structures built from sounds themselves. Sounds are often heard sequentially within a succession of others, each inevitably influencing how the others are perceived. The words that are heard to comprise a spoken sentence provide a simple example here. Sounds are also often heard simultaneously, in which case they cannot be perceived in the same way they would be in isolation. The note C is perceived very differently in the context of a C major chord, in that of a C minor chord, and by itself. Yet, as we will see, no sound heard in nature is truly one sound, and even what we may think of as “single” sounds (e.g. a spoken word or the note C played on the piano) can themselves be conceived as the result of many smaller sounds that combine sequentially and simultaneously over two dimensions: time and frequency respectively. These smaller sounds interact with each other over time and frequency to create an acoustic context on a smaller level known as the timbre of the larger sound they comprise. This thesis addresses the musical implications of extracting those smaller sounds that combine simultaneously over frequency from their spectral contexts, the primary determinants of the timbres to which they contribute (“spectrum”). Sound is a wave of pressure propagating from one or multiple sounding bodies simultaneously. As philosopher Aden Evens notes, in perceiving a sound, in hearing, we contract[1] the compressions and rarefactions of air pressure fluctuations caused by the wave of pressure propagating from such sound sources (Evens 2005, 1). He elaborates:
What hearing contributes to sound…is a contraction. Hearing takes a series of compressions and rarefactions and contracts them, hears them as a single quality, a sound (Evens 2005, 1).
The rate or frequency at which that wave fluctuates is contracted over time by our hearing into what we perceive as pitch, the most straightforward difference between the notes A, B, and C played on the same piano. The amplitude of the pressure wave, the magnitude of difference between the maximum pressure of compression and minimum pressure of rarefaction caused by the wave, is contracted over time into what we perceive as loudness; the greater the change in pressure, the louder the sound is perceived to be (Evens 2005, 1). The piano and the violin, each playing the same pitch (i.e. frequency) at the same loudness (i.e. amplitude), still sound very different to us, as each is characterized by its own particular timbre. What we perceive as timbre, notes Evens, is the result of contractions similar to those that contribute to our perception of pitch and loudness (Evens 2005, 2). In the simplest case, timbre can be thought of as the shape of the pressure wave propagating from a sound source as it moves from maximum pressure to minimum pressure (Evens 2005, 2). This shape of the sound wave can be represented as the summation of the shapes of smaller sounds that lie within it. As it is the simplest type of wave, a sine wave represents the simplest timbre, and any sound—any timbre—can be considered as the sum of an infinite number of sinusoids (i.e. waves whose shapes resemble sine waves). Jean Baptiste Joseph Fourier presented this influential theory in the early 19th century; it states, more specifically, that any periodic waveform can be deconstructed into combinations of sinusoids of different frequencies, amplitudes, and phases[2] (Roads 2001, 243). The Fourier transform can conduct this deconstruction of a “single,” larger sound into the appropriate arrangement of its constituent sinusoids, its many smaller sounds, re-describing a signal once given as amplitude values over time as amplitude values over frequency:
[T]hat is, it specifies the relative amplitudes of sine waves of various frequencies that would have to be added together to generate the original signal…. Every sound can be decomposed into sine waves and can be regenerated by adding together sine waves in the right proportions (Evens 2005, 4).
[1] Integrate may be a more appropriate term. The “integration” of sounds is explored in Part II and Part III.
[2] The phases of these sinusoids refer to the time relationships between their successive cycles.
The spectrum of a sound “is a description giving the amplitude (and/or) phase of each frequency component in it” (Bregman 1990, 735). A spectrogram graphically represents a time-varying spectrum with time on the x-axis and frequency on the y-axis, and amplitude with a shade of darkness (Bregman 1990, 735). It is analogous to the Gabor matrix, discussed and displayed below. Sounds of the same pitch and loudness produced by the piano and violin have, “as a most significant component, the same basic sine wave as their fundamental frequency” (Evens 2005, 3). It is the additional sine waves, known as partials—characterized by their frequency values, time-varying amplitudes, and phases—which must be added to this fundamental sine wave that distinguish the timbre of, for instance, a piano from that of a violin (Evens 2005, 3). These partials may or may not be harmonics, which are frequency components that have integer multiple frequency values of a fundamental frequency (Bregman 1990, 731). Given a single sinusoid, our hearing contracts frequency and amplitude over time into pitch and loudness respectively. Given the many sinusoids of varying frequencies and amplitudes that combine to form a “single,” larger sound, our hearing contracts these multiple sinusoids over frequency into what is perceived as the timbre of that larger sound, or the overall “sound” of that larger sound (Evens 2005, 5). Again, the many sinusoids that may exist within a “single” sound together comprise that “single” sound’s “overall sound,” or timbre, which is ultimately determined by the overall shape of the sound wave resulting from those many sinusoidal waves that add together to create it (Evens 2005, 2-3). Hearing contractions, then, occur throughout both the time domain and frequency domain of any sound with multiple sinusoids. The summation of sinusoids of differing frequencies and amplitudes (A, B, C, and D below) into a sound of a more complex timbre is displayed below in E:
[Figure: Sound wave E can be decomposed into sine waves A, B, C, & D (Evens 2005, 4).]
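Fourier’s additive view can be illustrated numerically. The sketch below is a generic illustration of the principle, not material from the thesis; the frequencies and amplitudes are arbitrary choices. It sums three sinusoids into one wave, then recovers their frequencies and amplitudes with NumPy’s discrete Fourier transform:

```python
import numpy as np

sr = 8000                       # sample rate; exactly one second of signal
t = np.arange(sr) / sr
freqs = [220.0, 330.0, 440.0]   # arbitrary component frequencies (Hz)
amps = [1.0, 0.5, 0.25]         # arbitrary component amplitudes

# Sum the sinusoids into one "single" sound of a more complex timbre
wave = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs))

# The Fourier transform undoes the summation; with a 1 s signal,
# frequency bin k corresponds exactly to k Hz
spectrum = np.abs(np.fft.rfft(wave)) / (sr / 2)
peak_bins = np.flatnonzero(spectrum > 0.1)
print(peak_bins)                     # the component frequencies
print(spectrum[peak_bins].round(2))  # their amplitudes
```

Summing the recovered sinusoids back together regenerates the original wave, which is the sense in which every timbre is “always the sum of sinusoids.”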
As an individual sine wave “has a minimal timbre,” its perception involves fewer contractions: those of frequency into pitch and amplitude into loudness (Evens 2005, 4). However, as Evens notes, in sounds of more complex timbres, “hearing does more contracting”:
It hears…one sound…as a set of sine waves; it contracts this set of waves into a singular sound of a distinctive character, rich texture (Evens 2005, 4).
Sounds with more complex timbres are not at all limited to those produced by traditional instruments. Evens notes that any sound can be conceived as “nothing but timbre”:
For to describe the shape of a wave is just to describe the wave…in the totality of its detail. As a reference to shape, timbre captures not only the gross features of the wave, its overall curve, but also every tiny variation, every little blip, every notch or bump in the motion of air…. The “sound” of a sound is its timbre, which contracts its multilayered complexity into a distinctive quality; tones added together, tones subtracted or shifted in place, bursts of tone, slow undulations of tone, but always the sum of sinusoids (Evens 2005, 5-6).
This reflects Fourier’s view of sound, to which British physicist Dennis Gabor rightly took exception. Fourier’s theory, argued Gabor, badly represents human hearing because it is a “timeless” view of sound, considering the phenomenon in an infinite interval (Gabor 1947, 591). One premise forms the basis of Fourier’s theory: if a signal contains a single frequency, then that signal must be an
infinite-length sinusoid (Roads 2001, 250). That is, if the concept of frequency can refer only to infinitely long signals, then the concept of changing frequency is impossible (Roads 2001, 250). Gabor provided a solution to this problem by representing sound with a combination of time and frequency, two previously separated dimensions (Roads 2001, 58). He manifested this combination in the notion of the acoustical quantum, an atomic unit of elementary acoustical information bounded by discrete units of time and frequency (Roads 2001, vii). While a signal can be considered as the sum of its sinusoidal components, Gabor noted, the components do not have to be considered as infinite-length sinusoids (Gabor 1947, 592). He argued that any sound could be decomposed into acoustical quanta bounded by discrete units of time and frequency—into single-cycle sine waves (Roads 2001, vii). Gabor’s acoustical quanta are depicted below in the Gabor matrix:
[Figure: the Gabor matrix, a time-frequency lattice with time (t) on one axis and frequency (f) on the other. Each cell of the lattice is an acoustical quantum; a column of quanta at one time location constitutes a grain (i.e. a time component).] Gabor’s acoustical quantum represents a minimal amount of acoustic information. The grain, discussed soon, is a musical generalization of the acoustical quantum proposed by composer Iannis Xenakis.
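In present-day signal processing, Gabor’s lattice is realized by the short-time Fourier transform, which windows a signal into brief segments and measures the frequency content of each. The sketch below is a generic illustration of that idea, not a procedure from Gabor or Xenakis; the window size, hop size, and test glissando are all arbitrary choices. A rising glissando produces a peak that migrates upward from cell to cell, a changing frequency that the “timeless” Fourier representation cannot express:

```python
import numpy as np

def gabor_matrix(signal, win=256, hop=128):
    """Windowed DFT over a signal: rows index time cells, columns index
    frequency cells, so each entry is one tile of Gabor's lattice."""
    w = np.hanning(win)
    frames = np.array([signal[i:i + win] * w
                       for i in range(0, len(signal) - win + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr
# A glissando whose instantaneous frequency rises from 200 Hz to 1400 Hz
chirp = np.sin(2 * np.pi * (200 * t + 600 * t ** 2))
S = gabor_matrix(chirp)

# The strongest frequency cell climbs between the first and last frames
print(np.argmax(S[0]), np.argmax(S[-1]))
```

Each row of `S` is one vertical slice of the matrix (a grain’s worth of time), and each column one horizontal stratum of frequency, so the same array can be read in either direction.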
As discussed soon, Gabor’s 1947 paper entitled “Acoustical Quanta and the Theory of Hearing” (Gabor 1947) inspired Greek composer, architect, and engineer Iannis Xenakis to propose his musical theory of microsound, with its musical unit, the grain, deriving from the notion of the acoustical quantum. First, the work of French composer Pierre Schaeffer, which equally influenced Xenakis’ development of microsound, should be addressed. In 1948, Schaeffer used the word “concrète” to describe his music—musique concrète—the first to begin from recorded, concrete sounds rather than abstract musical notation (“musique concrète”). Schaeffer wished to distinguish his music from traditional music, which began “from an abstract conception and notation leading to a concrete performance” (“musique concrète”). The chief musical unit of musique concrète is the objet sonore, or sound object, which is obtained by separating an acoustic event from its causal environment or context by means of the microphone and recording medium. This acoustic event is usually separated further from other sounds that surround it in temporal proximity on the recording, that is, in terms of time, and freed to serve outside this chain of signification (Cox, Warner 2004, 330). Schaeffer stressed that the sound object is not to be confused with the sounding body that produces it, nor even the recording medium on which it rests (Schaeffer 1967, 73.2). It is a sound to be perceived without relation to its source. The sound object, asserted Schaeffer, calls for what he referred to as écoute réduite, or reduced listening, an attitude that consists of listening to a sound for its own sake, as a sound object, by removing its real or supposed source and any meaning conveyed by that source (“reduced listening”).
In reduced listening, “listening intention targets the event which the sound object is itself (and not to which it refers) and the values which it carries in itself (and not the ones it suggests)” (“reduced listening”). Reduced listening and the sound object, then, are correlates of each other, as “they define each other mutually and respectively as perceptual activity and object of perception” (“reduced listening”). As British composer and music theorist Trevor Wishart writes:
It is important in Schaeffer’s development of the concept of the sound-object that it be detached from any association with its source or cause. The sound-object is to be analysed for its intrinsic acoustic properties and not in relation to the instrument or physical cause which brought it into being (Wishart 1996, 129).
The sound object may facilitate reduced listening if it has been separated from the other sounds that surrounded it in temporal proximity on the original recording, certain sounds that would otherwise reveal the separated sound object’s source, but also because it is heard through loudspeakers,
separated from its sounding environment and detached from any visual association with a sound source (e.g. instrument and instrumentalist). By detaching a certain sound from its environmental origin and from other sounds at different time locations on a recording, by separating it from its former causal and temporal contexts, the sound object may be obtained. The sound object allows for a particular case of musique concrète called musique acousmatique, acousmatic music, which focuses specifically on those sounds whose source is hidden to the ear. Schaeffer borrowed the term acousmatic from the Greek akousmatikoi, the disciples of Pythagoras who, so as not to be distracted by his presence and to focus more fully on the content of his words, heard him speak only from behind a veil. One hears acousmatic music without seeing the source of a sound, by listening from behind the “veil of the loudspeaker” (Schaeffer 1977). Wishart describes acousmatic listening as “the apprehension or appreciation of a sound-object independent of, and detached from, a knowledge or appreciation of its source” (Wishart 1996, 67). He writes:
This means not only that we will ignore the social or environmental origins or intention of the sound (from a bird, from a car, from a musical instrument, as language, as distress signal, as music, as accident) but also its instrumental origin (voice, musical instrument, metal sheet, machine, animal, larynx, stretched string, air column etc.)…. We should concern ourselves solely with its objective characteristics as a sound-object (Wishart 1996, 67).
The “veil of the loudspeaker” that hides performing instrumentalists signifies a withholding of information characteristic of acousmatic music. It represents a withholding of visual information, however, and not aural information. The composers of musique concrète and musique acousmatique in particular did not limit themselves to withholding visual information, or even to separating the entire life of a complex instrumental sound from other sounds along the time axis of a sound recording (i.e. withholding aural information unrelated to the attack, decay, sustain, and release of a “single” sound). Schaeffer himself was “baffled” by his inability to decipher the instrumental source of a sound upon hearing its decay, sustain, and release without—or separated from—its attack, the beginning portion of a sound most indicative of its source (Schaeffer 1967, 41.6). Even if the various time fragments of a sound object are just displaced in time, not withheld (i.e. if the different time fragments of a larger sound are reordered in time and played back), their source may be disguised. Xenakis, a colleague of Schaeffer at the Parisian Groupe de Recherches Musicales (GRM), offered his own response to musique concrète in the form of microsound. Microsound, which employs sonic “particles” whose durations extend down to the lower threshold of perception, largely arose as a critique of musique concrète (Solomos 1996). In developing his idea of microsound, Xenakis portrayed sound objects as “block[s] of one kind of sound,” and responded to musique concrète by decomposing the sound object—by separating it further in terms of time—into these brief particles of sound, which he called grains (Solomos 1996; Xenakis 1992, 43). As the sound object constitutes the chief musical material of musique concrète, the grain does so for microsound. 
It is important to note that, while Xenakis’ notion of the grain derives directly from Gabor’s notion of the acoustical quantum, the grain as extracted from the sound object is not explicitly limited in terms of its spectral (i.e. frequency) content as it is in terms of its duration. That is, while the grain ceases to be a grain once its duration exceeds a certain threshold (i.e. 100 ms), even drastic increases in the amount of its frequency content will not alter its granular status. In contrast, the acoustical quantum is limited by discrete units of both time and frequency, and thus can be considered necessarily micro in terms of both time and frequency. In terms of acoustical quanta (the very concept from which it derives), the microsonic grain is only necessarily micro to the extent that its duration is short. The grain represents a small time fragment, or time component—not a single frequency component—of the larger sound from which it is taken. Unlike the acoustical quantum, the grain may have a divisible spectrum. The micro aspects of microsound are certainly biased toward the time domain, away from the frequency domain. The sound object of musique concrète and the grain of microsound, as they are extracted from longer sounds in terms of time, represent a prevention of those contractions that would otherwise
take place over time. These temporal sound separations may prevent a certain number of notes from combining over time into what we would perceive as a complete melody (i.e. a sequence of notes), or a certain number of words from combining over time into what we would perceive as a complete sentence. These sound separations remove shorter sounds from their longer temporal context. The types of contractions explicitly prevented by the separation of sounds from sounds in musique concrète and microsound are not contractions over frequency. That is, they do not, at least explicitly,[3] remove constituent sinusoids of a larger sound from their former spectral context—the very context that gives simultaneously occurring sinusoids the relationships yielding timbre. Musique concrète and microsonic granulation are characterized by their sound-separating techniques. However, in decomposing one larger sound into multiple, smaller ones to be arranged into music, they have focused almost exclusively on the horizontal separation of sounds from others, that is, over time. Perhaps this is because they both developed around such technologies as the blade and magnetic tape, around tape cutting and splicing, which allow only for the separation of sounds from sounds in terms of time. For the purpose of decomposing the larger sound object into many smaller musical units, neither aesthetic has focused on the vertical separation of sounds from others, that is, over frequency. Such a separation would prevent those contractions that our hearing applies to, as a simple example, the notes that combine simultaneously to comprise a chord. Concrète composers did often use filtering, an operation that attenuates certain frequency components in the time-frequency spectrum of a sound (“filter”), to transform one sound object into another.
Still, as Schaeffer himself noted, many sounds are “almost indestructible” under the filter that separates certain frequency components from others (Schaeffer 1967, 1.20). That is, filtering a sound object often cannot decompose it into smaller, highly distinctive and perceptually coherent sound objects over frequency. This is because a filter is used to separate only certain bands of frequencies from a sound object, into which important, qualitative characteristics of the original may seep. Schaeffer used the term mass to describe the degree of resistance or “indestructibility” a sound shows to the filter (Schaeffer 1967, 1.22). This thesis utilizes a new technology, Probabilistic Latent Component Analysis (PLCA) (Raj et al. 2006; Casey, Westner 2001; Casey 1998), which proves massy sounds otherwise “indestructible” under the filter to indeed be “destructible” into spectral fragments. These spectral fragments, when summed along the frequency axis, reconstruct the original sound from which they are broken. The music composed for this thesis uses PLCA to scatter these spectral fragments throughout time and space to prevent those contractions that would otherwise take place over frequency, those that our hearing applies to the “multilayered complexity” comprising the timbre of sounds otherwise perceived to be singular (Evens 2005, 6). Such layers of frequency components are separated from each other to be contracted and appreciated for their own inherent qualities, qualities that exist independently from other layers in the multilayered original. Via the deconstructive, vertical separation of sounds from sounds, groups of sinusoids from others, these layers can be perceived outside of the spectral context to which they once contributed.
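PLCA itself is an expectation–maximization procedure over a probabilistic model of the spectrogram (Raj et al. 2006), and its details lie beyond a short sketch. The closely related technique of non-negative matrix factorization conveys the core idea and is sketched below on a toy “spectrogram.” This is a hypothetical illustration, not the thesis’s implementation: the matrix is factored into spectral shapes and their activations over time, and the resulting layers sum back into the original.

```python
import numpy as np

def factor_layers(V, k, iters=500, seed=0):
    """Factor a non-negative spectrogram V (frequency x time) as V ~ W @ H:
    the k columns of W are spectral shapes ("layers") and the rows of H
    their activations over time. A stand-in for PLCA, which reaches a
    similar decomposition via expectation-maximization."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k))
    H = rng.random((k, V.shape[1]))
    for _ in range(iters):  # Lee-Seung multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy "sound": two spectral layers, active in alternation
W_true = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
H_true = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
V = W_true @ H_true
W, H = factor_layers(V, k=2)
print(float(np.abs(W @ H - V).max()))  # reconstruction error after the updates
```

Each rank-one product `W[:, i:i+1] @ H[i:i+1, :]` is one spectral stratum; summed, the strata reconstruct the original, which is the property exploited when such fragments are instead scattered in time and space.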
This thesis, then, addresses the musical implications of separating, or deconstructing, the sound object along its frequency axis into time-frequency strata of sounds, and notes how these implications differ from those of deconstructing the sound object along its time axis into time-frequency columns of sounds, or microsonic grains. It is divided into three parts: Part I discusses the separation of sounds from sounds in the time domain characteristic of microsound as derived from musique concrète. Its purpose is to highlight how the decomposition of the sound object into smaller pieces to be arranged musically has focused overwhelmingly on the time domain, but also to give the reader an idea of how smaller units of the decomposed sound object might be arranged. Part II describes the music composed for this thesis, and how it owes itself partly to the misuse of PLCA. Lastly, Part III reflects upon this music, particularly from the viewpoints of musique concrète and microsound.
[3] As discussed in Part I, if shortened to a great enough extent, the grain will lose some of its lower frequencies. In brief, lower frequencies require more time to complete their cycles and thus to be perceived, as f = 1/t. The time and frequency domains are inextricably connected (what meaning would frequency have without time?).
Everything is solid until the moment it dies and gets scattered. –Cornelia Parker
Iannis Xenakis discusses his Concret PH (1958), the first microsonic piece, as follows:
Start with a sound made up of many particles, then see how you can make it change imperceptibly, growing and developing, until an entirely new sound results…. This was in defiance of the usual manner of working with concrète sounds. Most of the musique concrète which had been produced up to the time of Concret PH is full of many abrupt changes and juxtaposed sections without transitions. This happened because the original recorded sounds used by the composers consisted of a block of one kind of sound, then a block of another, and did not extend beyond this (Solomos 1997).
The microsound aesthetic thus largely began as Xenakis’ critical response to Schaeffer’s musique concrète. Nevertheless, in his development of microsound, Xenakis still adopted the same tape-cutting techniques Schaeffer used to separate the sound object from its former temporal context of the longer recording. That is, microsound began, like musique concrète, with the separation of sounds from others—with the decomposition of a larger sound into smaller ones—in terms of time. Part I of this thesis discusses how Xenakis compositionally exploited the idea that many smaller sounds exist along the time axis of relatively larger sounds. Along with musique concrète, this concept, proposed by physicist Dennis Gabor (Gabor 1947), heavily influenced microsound. Part I also addresses the difference between the two main forms of microsound—granular synthesis and granulation—and focuses on the latter, which is related to the use of recorded sounds characteristic of musique concrète, while the former in its strictest sense is not. Exhibited in Concret PH, granulation is a chiefly time domain operation of separating time components with durations no greater than 100 ms—grains—from the larger sound object. What has been referred to as “frequency domain granulation,” a frequency domain operation of separating certain frequency components from the larger sound object, is also addressed (Roads 2001, xi). Frequency domain granulation has received far less attention as a means by which to decompose the sound object into new musical material as compared to time domain granulation, and is argued to deserve more exploration. The purpose of Part I is to elucidate how the decomposition of the sound object into smaller pieces has focused on yielding sonic, time-frequency columns—not strata—to be arranged into new musical structures. It aims to give the reader a background of microsound theory, which heavily influenced this thesis.
Part II and Part III will refer to aspects of Part I to distinguish between the musical implications of using columns and strata of sounds taken from the larger sound object.
1.1 Gabor
Microsound is largely founded in the work of British physicist Dennis Gabor, who argued that human hearing is badly represented by wave-oriented Fourier analysis. Again, Fourier’s theory states that arbitrary periodic waveforms can be decomposed into combinations of sinusoids of different amplitudes, frequencies, and phases (Roads 2001, 243). One premise formed the basis of Fourier’s theory: if a signal contains a single frequency, then that signal must be an infinite-length sinusoid (Roads 2001, 250). Gabor noted that although Fourier’s theory is mathematically irreproachable, “even experts could not at times conceal an uneasy feeling when it came to the physical interpretation of results obtained by the Fourier method” (Gabor 1946, 431). He elaborated:
The reason is that the Fourier-integral method considers phenomena in an infinite interval, sub specie aeternitatis, and this is very far from our everyday point of view. Fourier’s theorem makes of description in time and description by the spectrum, two mutually exclusive methods. If the term “frequency” is used in the strict mathematical sense which applies only to infinite wave-trains, a “changing frequency” becomes a contradiction in terms, as it is a statement involving both time and frequency (Gabor 1946, 431).
[T]he ear analyses the sound into its spectral components, and our sensations are made up of the Fourier components. But Fourier analysis is a timeless description, in terms of exactly periodic waves of infinite duration. On the other hand, it is our most elementary experience that sound has a time pattern as well as a frequency pattern (Gabor 1947, 591).
In other words, if the concept of frequency can refer only to infinitely long signals, then the concept of changing frequency is impossible (Roads 2001, 250). Gabor stressed that time analysis and “timeless” Fourier (i.e. frequency) analysis exhibited only the extreme cases of describing a signal, that “both views have their limitations, and they are complementary rather than mutually exclusive” (Gabor 1946, 431). Gabor provided a solution to this problem by representing sound with a combination of time and frequency, two previously separated dimensions (Roads 2001, 58). He manifested this combination in the notion of the acoustical quantum (Roads 2001, vii). While a signal can be considered as the sum of its sinusoidal components of different frequencies, amplitudes, and phases, Gabor noted, the components do not have to be considered as infinite-length sinusoids (Gabor 1947, 592). He proposed that any sound could be decomposed into acoustical quanta bounded by discrete units of time and frequency (Roads 2001, vii). Gabor represented acoustical quanta mathematically by relating a time domain signal s(t) to a frequency domain spectrum S(f). To obtain a single acoustical quantum, Gabor mapped an energy function from s(t) over an “effective duration” Δt into an energy function from S(f) over an “effective frequency width” Δf (Gabor 1947, 591). He stressed the motivation behind acoustical quanta in his 1947 paper, writing:
The ultimate reason for the emergence of ‘acoustical quanta’ is that we have viewed the same phenomenon simultaneously from two different aspects, and described it by two ‘quantities of interest’, time and frequency (Gabor 1947, 594).
Gabor described acoustical quanta as “quanta of information” that represent “two real data,” time and frequency information, which together constitute “one complex numerical datum” (Gabor 1947, 591). That is, Gabor combined time and frequency in the acoustical quantum for the express purpose of representing the smallest amount of acoustic information perceivable. A signal, wrote Gabor, can be associated with an acoustical quantum represented by a “characteristic rectangle or cell,” an “information diagram” in which time and frequency are orthogonal coordinates (Gabor 1947, 591). This cell is represented mathematically by the inequality ΔtΔf ≥ 1, which implies that its area must be at least of the order unity, highlighting the time-frequency uncertainty principle. Based on the fact that an effective duration Δt ≥ 1/f is required to encompass one full cycle of a signal of f Hz, the uncertainty principle states that frequency resolution is inversely related to time resolution—the shorter the duration of a signal is, the less certain one can be about its frequency. Gabor argued that there are “elementary signals” for which the inequality above turns into an equality, and that such an equality can represent a single acoustical quantum (Gabor 1947, 591). Gabor’s “information diagram” of the acoustical quantum is shown below:
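Gabor further showed that the bound is attained by sinusoids under Gaussian envelopes, his “elementary signals.” In modern notation (a paraphrase rather than Gabor’s original typography), such a signal and its limiting case can be written:

```latex
% Gabor's elementary signal: a sinusoid under a Gaussian envelope,
% the waveform for which the time-frequency inequality becomes an equality.
\[
  \psi(t) = e^{-\alpha^{2}(t - t_{0})^{2}}
            \cos\!\left(2\pi f_{0}\,t + \varphi\right),
  \qquad
  \Delta t\,\Delta f = 1 .
\]
```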
[Figure: a characteristic cell in the time-frequency plane, its sides the effective duration and effective frequency width of the signal.]
Fig. I.a: Gabor’s Information Diagram (Gabor 1947, 591).[1]
[1] Interestingly, in his 1947 paper, Gabor depicts sound with frequency on the x-axis and time on the y-axis.
Gabor argued that the “quantum of sound” is a concept of “considerable physiological significance,” as evidence regarding the human threshold of discrimination for “epoch and frequency differences” had shown that “the ear possesses a threshold area of discrimination of the order unity” (Gabor 1947, 591). In other words, the ear “can just about discriminate” one acoustical quantum described by the area Δt Δf = 1 (Gabor 1947, 593). The acoustical quantum can thus represent a minimum amount of acoustical information, an elementary signal with unchanging amplitude[2] and an epoch just large enough to encompass one cycle of a single frequency, a sine wave. Acoustical quanta can be represented by elementary signals with oscillations at any audible frequency f and amplitudes modulated by finite duration envelopes (Roads 2001, 57-58). These elementary, low information signals can represent any arbitrary, relatively high information signal if their information area is expanded into unit cells, each with a geometry Δt Δf and a particular amplitude factor (Roads 2001, 58). These cells comprise the Gabor matrix, displayed below with time on the x-axis and frequency on the y-axis, and amplitude factors of the cells indicated by the size of dark circles within them:
[Figure: acoustical quanta as cells of dimensions Δt by Δf in the time-frequency plane.]
Fig. I.b: The Gabor Matrix (Gabor 1947, 592; Roads 2001, 60).
The Gabor matrix is analogous to a projection of the time-frequency plane onto a spectrogram, which represents sound as a two-dimensional display of time versus “frequency + amplitude” (Roads 2001, 263).[3] According to Gabor, the analytical method surrounding his matrix “contains ‘time language’ and ‘frequency language’ as special extreme cases,” and only with an infinite number of matrix cells in the time direction does it reduce to Fourier analysis (Gabor 1947, 592). In an experiment related to his matrix, Gabor constructed what he called the Kinematical Frequency Convertor, a sound “granulator” that enabled the changing of a sound’s pitch without affecting its duration, and vice versa (Roads 2001, 61). According to Curtis Roads, Gabor’s machine conducted a “time-granulation of recorded sounds” (Roads 2001, 61).[4] Gabor’s granulator essentially operated on the principle of sound sampling (i.e. the cutting or temporal separation of sounds from others) and employed a rotating sampling head that spun across a tape recording of sound. As the head would only contact the tape for a brief period, it effectively sampled the sound on the tape at regular intervals. These sampled segments were analogous to what Xenakis would later refer to in a musical context as grains of sound (Roads 2001, 61). If reassembled into a continuous stream in the same time order in which they were granulated from the sound recording, Gabor’s sampled segments would resynthesize the original recording upon playback.[5] Gabor was able to change the duration of a sound recording while preserving its frequency content by adjusting the rotation speed of the sampling head upon granulation; he changed the frequency content of a sound while preserving its duration by adjusting the playback rate of the original tape recording before adjusting the rotation speed of the sampling head upon granulation (Roads 2001, 61-62). It is important to note that while Gabor’s sound granulator allowed for the independent temporal (i.e. time-stretching) and spectral (i.e. pitch-shifting) manipulation of a sound, it did not permit the separation of a sound into frequency components as it did its segmentation via granulation into time components, or grains. As discussed soon, not until the introduction of the fast Fourier transform (FFT) in the 1960s would the efficient separation of a sound into its constituent frequency components be possible (Cooley, Tukey 1965).

[2] A changing amplitude conveys more acoustic information than a constant amplitude.
[3] In his 1947 paper, Gabor wrote that spectrograms, or “sound portraits,” unlike his “matrices of elementary signals,” are “not complete representations of the original sound, though they contain most of its subjectively important features” (Gabor 1947, 592). The spectrogram is revisited in Part II.
[4] This hyphenation proves redundant when one considers Roads’ definition of granulation as “a purely time domain operation” (Roads 2001, 188). Roads’ inclination to use the redundancy may support the case for a stronger disambiguation of the terms granulation and frequency domain granulation, discussed soon.
[5] This procedure is equivalent to brassage, discussed later in Part I.

1.2 Xenakis
In the late 1940s, Xenakis read Gabor’s 1947 paper and began to extend his theories of sound into music (Robindoré 1996, 11). Drawing from Gabor’s scientific notion of acoustical quanta, Xenakis coined the term “grains of sound” and was the first composer to explicate a compositional theory for them (Roads 2001, 65). Xenakis strongly believed the arts and sciences to be mutually beneficial fields of study and constantly kept aware of scientific advances to inform his work as a composer. While the oeuvre of Xenakis is by no means limited to the musical extension of Gabor’s work, this thesis focuses on his particular writings and music that are. Xenakis described all sounds as the sum of “pure sounds,” or acoustical quanta:
Hecatombs of pure sounds [i.e. acoustical quanta] are necessary for the creation of a complex sound. A complex sound may be imagined as the creation of a multi-colored firework in which each point of light appears and instantaneously disappears against a black sky. But in this firework there would be such a quantity of points of light organized in such a way that their rapid and teeming succession would create forms and spirals, slowly unfolding, or conversely, brief explosions setting the whole sky aflame. A line of light would be created by a sufficiently large multitude of points appearing and disappearing instantaneously (Xenakis 1992, 43-44).
Xenakis stated that all sounds could be constructed granularly, thus proposing the idea of granular synthesis (Xenakis 1992, 47). He claimed that clusters, or clouds, of grains can be manipulated to produce “not only the sounds of classical instruments and elastic bodies, and those sounds generally preferred in concrete music, but also sonic perturbations with evolutions, unparalleled and unimaginable until now” (Xenakis 1992, 47). The next section demonstrates how Xenakis generalized the acoustical quantum, the most restricted unit of sound perceivable by the human ear, into the less restricted grain—far less so in terms of its capacity for acoustic information. While the acoustical quantum is limited explicitly both in terms of frequency (i.e. it has a single, constant frequency) and time (i.e. its duration is equal to the inverse of its frequency), the grain as extracted from a sample via granulation is limited explicitly in terms of time, but only implicitly in terms of frequency.[6] Xenakis implements his generalized concept of the grain in Concret PH, and he plays with the idea of the more restricted acoustical quantum in Analogique B, composed one year later (1959). The former piece exhibits granulation, the latter granular synthesis.[7] Concret PH premiered at the Philips Pavilion during the 1958 World’s Fair in Brussels. After cutting magnetic tape recordings of burning wood embers into fragments ranging from a few hundredths to a few thousandths of a second in duration (Di Scipio 1997, 168), Xenakis manipulated them in the studio of the GRM in Paris to obtain a granular texture for the piece (Roads 2001, 64). Xenakis composed Concret PH using numerous pieces of granulated tape, which he then manipulated through tape speed changes before mixing them in time to obtain grain “clouds” of varying densities (Roads 2001, 65). As mentioned, Xenakis claimed that he composed Concret PH “in defiance of the usual manner of working with concrète sounds,” implying that the sound object blocks of musique concrète imposed certain compositional limitations (Solomos 1997). The defiance to which Xenakis referred is present in the very grains of Concret PH, each of which represents a time fragment of the sound blocks with which he was dissatisfied. Still, Xenakis obtained his grains simply by applying to the sound object the same operation with which Schaeffer had obtained the sound object from the sound recording—that of cutting it down into smaller time fragments using tape-cutting techniques, of separating sounds from others in terms of time. This point is important, as it portrays the microsonic grain as a direct response to the sound object, and its procurement (i.e. granulation) as an extension of the same, temporal sound-separating technique of musique concrète. Part II and Part III of this thesis call for more attention to be given to an alternative response to the bulky “block of one kind of sound” that is the sound object (Solomos 1997), namely to what has been referred to (albeit awkwardly, as discussed soon) as “frequency domain granulation”—decomposing a sound object into frequency components as opposed to time components (Roads 2001, xi). As Xenakis himself wrote, “[a]ll sound, even continuous musical variation, is conceived as an assemblage of a large number of elementary sounds adequately disposed in time,” and his intent in Concret PH was to expose the sound object for what it was on a micro level: a mass of many sounds over time.

[6] At the lower extremes of time limitations, frequency limitations are indeed imposed upon grains. This is due to the uncertainty principle: with short enough durations, lower frequencies of grains cease to be perceivable.
[7] While granular synthesis encompasses granulation as one of its various forms, granular synthesis implies the use of synthetic waveforms to compose new, complex sounds. Granulation implies the decomposition of complex, recorded sounds (Roads 2001, 189). Thus, while both granular synthesis and granulation are related through the brief timescale of the grains they employ and are both forms of microsound, there is a significant distinction between them. This point is revisited later in Part I.
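In digital terms, the tape-cutting operation behind granulation reduces to slicing a signal into brief time segments; replayed in their original order, the grains resynthesize the source, just as Gabor’s sampled segments did. A minimal sketch (the grain length, sample rate, and test tone are assumptions, and a real granulator would also apply an amplitude envelope to each grain to avoid clicks):

```python
import numpy as np

def granulate(signal, sr, grain_ms=50):
    """Slice a signal into consecutive grains of roughly grain_ms each."""
    n = int(sr * grain_ms / 1000)            # samples per grain
    return [signal[i:i + n] for i in range(0, len(signal), n)]

def reassemble(grains):
    """Replay grains in their original order (the simplest brassage)."""
    return np.concatenate(grains)

# A one-second 440 Hz test tone stands in for a tape recording (assumed).
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

grains = granulate(tone, sr, grain_ms=50)    # twenty 50 ms grains
restored = reassemble(grains)
assert np.array_equal(restored, tone)        # in-order replay resynthesizes
```

Reordering, repeating, or time-scattering the grains before reassembly is where the compositional play of granulation begins.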
However, by using the same tape-cutting techniques of musique concrète, Xenakis could only break down the sound object into smaller masses of sounds—time-frequency columns of sounds instead of time-frequency rows of sounds, and certainly not atomic acoustical quanta. Unlike the intuitively composed Concret PH, Xenakis’ Analogique B was largely written algorithmically. Premiered in 1959, Analogique B employed a more systematic approach to working with sound grains, which came much closer to acoustical quanta than those in Concret PH. Rather than using recorded complex sounds as in Concret PH, Xenakis granulated analog tape recordings of sine tones he produced with an analog tone generator (Roads 2001, 65). These granulated sine tones (i.e. frequency-constant) began as virtual equivalents to acoustical quanta until Xenakis modified them to glissando (i.e. shift in frequency), thus generalizing their connection to Gabor’s concept of the frequency-constant acoustical quantum. Analogique B is a brief composition for tape intended by Xenakis to be played after the slightly longer instrumental work Analogique A; both pieces were composed using stochastic algorithms. In Analogique B, Xenakis scattered grains onto what he referred to as screens (Roads 2001, 92), time-grids representing elementary sonic quanta in three dimensions: difference thresholds in frequency, amplitude, and time (Roads 2001, 65). Xenakis synchronized screens with an advancing time interval Δt, combining them into a sequence that he called a book (Roads 2001, 65-66). Each screen can thus be considered as a “snapshot of sound bounded by frequency and amplitude grids,” subdivided into elementary squares of sonic energy like the Gabor matrix (Roads 2001, 65-66). A book of screens, according to Xenakis, “equals the life of a complex sound” (Xenakis 1992, 51). These concepts are revisited and depicted visually in the next section.

1.3 Roads & Truax
American composer and computer scientist Curtis Roads met Iannis Xenakis in 1972 at Indiana University during a workshop led there by Xenakis based on his book Formalized Music. Although Xenakis presented the first form of granular synthesis in Analogique B (1959), he had accomplished this only through tedious tape-cutting and tape-splicing techniques. Roads claims it was Xenakis who started him on the path that would eventually lead to the first implementation of a digital granular synthesis engine, and that granular synthesis largely remained a theoretical topic at the 1972 workshop. While Xenakis himself had not yet realized granular synthesis with a computer, he had envisioned its potential value at several points within Formalized Music, and realized he was hindered
by a certain poverty of technology as he developed microsound (Xenakis 1992, 103, 244, 270).[8] Roads’ digital implementations of granular synthesis catalyzed an explosion of granular techniques in the 1970s and 1980s still being explored today. Roads describes his contemporary Barry Truax as “the pioneer of real-time granulation and its most inveterate exponent” (Roads 2001, 192). Truax, a Canadian composer, was the first person to implement the real-time granulation of a sampled sound (Truax 1986), and in 1990, he achieved a “technical breakthrough” that allowed him to perform real-time granulation on the live sound of an instrumentalist (Roads 2001, 112). Truax was particularly interested in using granulation to reveal what he has referred to as the “inner complexity” of natural sounds (Truax 1994). By time-stretching grains of sampled sound, Truax strove to provide “a unique way to experience the inner structure of timbre” by drawing the listener into the sound (Truax 1994, 47). His process entailed lengthening grains without altering their pitch with the intent to yield rich textures from extremely small fragments of source material (Truax 1994, 38). As a sound is progressively stretched, Truax noted, one becomes less aware of its temporal envelope and more aware of its timbral characteristics (Truax 1994, 44). Truax’s interest in the “inner complexity” of sound is revisited later in Part I.
This section addresses the theory of microsound and begins with a closer look at the grain, particularly as it relates to the acoustical quantum. It then distinguishes between the two main forms of granular synthesis: the first, also referred to simply as granular synthesis, employs synthetically generated grains; the second employs sampled sound that is subjected to granulation. While both forms deal with sounds on the microsound timescale, they represent the construction and deconstruction of sounds respectively, and thus have quite different compositional implications. As this thesis is primarily concerned with the deconstruction of sounds—with the separation of sounds from others—we focus on microsonic granulation.

2.1 Xenakis Grains & Gabor Grains
Any discussion of microsound theory should commence with an anatomical description of the grain. We can look first to Roads, who writes:
A grain of sound is a brief microacoustic event, with a duration near the threshold of human auditory perception, typically between one thousandth of a second and one tenth of a second (from 1 to 100 ms). Each grain contains a waveform shaped by an amplitude envelope…. A single grain serves as the building block for sound objects. By combining thousands of grains over time, we can create animated sonic atmospheres. The grain is an apt representation of musical sound because it captures two perceptual dimensions: time domain information (starting time, duration, [amplitude] envelope shape) and frequency domain information (the pitch of the waveform within the grain and the spectrum of the grain) (Roads 2001, 86-87).
Roads depicts the grain with a time domain perspective (i.e. with amplitude on the y-axis) below:
[Figure: a grain’s waveform shaped by an amplitude envelope.]
Fig. I.c: Time Domain Depiction of the Grain (Roads 2001, 87).
[8] Xenakis did later create his GENDYN system in the 1990s, a software engine for the direct digital synthesis of sound, which led to three compositions: GENDY301, GENDY3, and S709 (Thomson 2004, 210-211).
Curtis Roads’ description of the grain evolves from Xenakis’:
All sound is an integration of grains, of elementary sonic particles, of sonic quanta. Each of these elementary grains has a threefold nature: duration, frequency, and intensity. All sound, even continuous musical variation, is conceived as an assemblage of a large number of elementary sounds adequately disposed in time. So every sonic complex can be analyzed as a series of pure sinusoidal sounds even if the variations of these sinusoidal sounds are infinitely close, short, and complex. In the attack, body, and decline of a complex sound, thousands of pure sounds appear in a more or less short interval of time, Δt (Xenakis 1992, 43).
While both descriptions of grains above address their threefold nature—time, frequency, and intensity (i.e. amplitude)—only Xenakis’ portrays them as “pure sinusoidal sounds” (Xenakis 1992, 43). This discrepancy can be attributed to Xenakis himself, who later in the same text describes the grain differently from how he originally defined it—as a very brief unit of sound that can fluctuate in frequency and amplitude (i.e. any very brief unit of sound, including complex, non-sinusoidal ones):
The particular case of the grain occurs…when [its] frequency…is constant. In general, the frequencies and intensities of the grains can be variable and the grain a very short glissando (Xenakis 1992, 55).
With this statement, Xenakis radically generalized what he referred to as “the particular case of the grain,” or a “grain of the Gabor type” (Xenakis 1992, 55, 103), and what Roads refers to as the Gabor grain. The Gabor grain, an extremely brief sound constant in both frequency f and amplitude g (i.e. sinusoidal) lasting a duration of 1/f, is equivalent to an acoustical quantum (Roads 2001, 66). The Gabor grain/acoustical quantum is thus explicitly restricted in terms of time, frequency, and amplitude. Xenakis generalized the Gabor grain by preserving its explicit time restriction and removing its explicit frequency and amplitude restrictions. Let us refer to the acoustical quantum as generalized by Xenakis more specifically as the Xenakis grain in order to stress its distinction from the Gabor grain. If we depict the Xenakis grain and Gabor grain not from the time perspective offered by Roads above, but from a more illuminating time-frequency perspective offered by the Gabor matrix, this distinction becomes quite clear:
[Figure: in the time-frequency plane of the Gabor matrix, the Xenakis grain spans multiple frequency cells, while the Gabor grain (i.e. acoustical quantum) occupies a single cell.]
Fig. I.d: Frequency Domain Depiction of the Xenakis Grain & the Gabor Grain/Acoustical Quantum.
The Xenakis grain can thus be considered “molecular” relative to the “atomic” Gabor grain. The Xenakis grain is describable in terms of the Gabor grain, but not vice versa; the Xenakis grain is a chain of Gabor grains in the frequency direction. That is, while a Gabor grain can be considered as a particular type of Xenakis grain, the Xenakis grain cannot be considered as a particular type of Gabor grain. These points are important, as granulation entails the decomposition of sound along the time axis into potentially spectrally rich Xenakis grains, but not necessarily along the frequency-axis into spectrally minimal Gabor grains. For the latter to be extracted from a complex sound, the short-time Fourier transform (STFT) is needed. The Fourier transform (FT), discussed in the Preface, is a mathematical procedure that represents arbitrary periodic waveforms as an infinite Fourier series summation of sinusoidal waveforms, each at a different frequency, amplitude, and initial phase (Roads 1995, 1076). That is, the FT maps an input signal represented as time versus amplitude into a corresponding spectrum representation, which displays the signal as time versus frequency (as in the Gabor matrix above). Until the introduction of computers in the 1940s, the computation of the FT was “a tedious and
error-prone task of manual calculation” (Roads 1995, 1076). Even the first digital implementations of the FT in the 1940s required vast amounts of time, and it was not until the introduction of the FFT in the 1960s by James Cooley at Princeton University and John Tukey at Bell Telephone Laboratories (Cooley, Tukey 1965) that the vast amount of calculations and computer time required for Fourier analysis was greatly reduced (Roads 1995, 1076). The computationally efficient FFT is at the heart of the STFT. Roads refers to the latter as the “practical” form of Fourier analysis, as it adapts Fourier analysis to finite-duration signals rather than infinite-duration ones (Roads 1995, 148). The STFT places time windows upon the input signal, breaking it down into “short-time” segments (i.e. bounded in time); this process is referred to as windowing (Roads 1995, 550). Windowing the signal, then, breaks it into a series of short-time segments that are shaped in amplitude and bounded in time according to a given window function (Roads 2001, 246).[9] The duration of the window for STFT audio applications is usually on the microsound timescale: between 1 ms and 100 ms (Roads 2001, 246). Following the windowing process, the STFT analyzes each of the windowed, “short-time” segments using a particular implementation of the FT called the discrete Fourier transform (DFT), which yields a discrete-frequency spectrum, “a measure of energy at a set of specific, equally spaced frequencies” (Roads 1995, 551). Roads notes that the DFT can be implemented efficiently within the FFT, and that the STFT, for practical purposes, most often uses the FFT for its analyses (Roads 1995, 551). The FFT generates a block of data for each “short-time” segment called a frame, which contains two spectra (Roads 2001, 247). The first is a magnitude spectrum depicting the amplitude of every analyzed frequency component (Roads 2001, 247).
The second is a phase spectrum indicating the initial phase value for every frequency component in the input signal. In summary, the STFT windows an input signal before applying a discrete-frequency FFT to each of its windowed segments, resulting in a series of frames that constitute a time-varying spectrum (Roads 1995, 551). The STFT permits the analysis, transformation, and, in conjunction with the inverse short-time Fourier transform (ISTFT), the resynthesis of sounds (Roads 1995, 172). That is, the ISTFT maps a spectrum representation of sound as time versus frequency into an output signal as time versus amplitude, i.e. it constructs an audible signal from its spectral information. The STFT and ISTFT are at the core of the phase vocoder (PV), a prevalent sound analysis tool that allows for the time-compression and time-stretching of input sounds independent of pitch-shifting and vice versa (Roads 1995, 1094). Thus, it resembles Gabor’s granulator discussed earlier in Part I. Unlike Gabor’s granulator, however, certain implementations of the PV permit the extraction of specific sinusoidal frequency components from a complex signal, and therefore the removal of a single, sinusoidal Gabor grain. It deserves reiteration that while Xenakis could compose new sounds with Gabor grains that he created using analog sine wave generators, he could not decompose existing sounds into Gabor grains using tape-cutting techniques. For the latter, he would have needed the STFT. To Xenakis, both the single-frequency “grain of the Gabor type” and his generalized multiple-frequency grain constituted microsounds, as both are necessarily extremely brief. Yet only the Gabor grain is necessarily micro in terms of its frequency content.
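The windowing and overlap-add resynthesis just described can be sketched in a few lines (the hann window, frame size, and 50% overlap are assumed parameters; this is the bare STFT/ISTFT round trip, not a full phase vocoder):

```python
import numpy as np

def stft(x, n=256, hop=128):
    """Window the signal into short-time segments and take each one's DFT."""
    w = np.hanning(n)
    return np.array([np.fft.rfft(x[i:i + n] * w)
                     for i in range(0, len(x) - n + 1, hop)])

def istft(frames, n=256, hop=128):
    """Invert each frame's spectrum and overlap-add back into a signal."""
    w = np.hanning(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    norm = np.zeros_like(out)
    for k, spec in enumerate(frames):
        out[k * hop:k * hop + n] += np.fft.irfft(spec, n) * w
        norm[k * hop:k * hop + n] += w ** 2
    return out / np.maximum(norm, 1e-12)   # undo the accumulated windowing

# Round trip on a test tone: analysis then resynthesis recovers the signal.
sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
frames = stft(x)               # each row: one frame's magnitude and phase
y = istft(frames)
assert np.allclose(y[256:-256], x[256:len(y) - 256], atol=1e-8)
```

Each row of frames carries the magnitude and phase spectra of one windowed segment; modifying the frames before resynthesis is the entry point for phase-vocoder-style transformations.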
That is, while Xenakis’ generalized concept of the grain ceases to be a grain once its duration surpasses a certain threshold—regardless of the amount of frequency domain information it may contain—the Gabor grain ceases to exist when its duration surpasses a certain threshold or when it encompasses anything more than a single sine wave. Given the “ultimate reason for the emergence” of the acoustical quantum (i.e. Gabor grain), which was to represent the smallest perceptible amount of acoustic information via time and frequency restrictions, Xenakis’ generalization of the Gabor grain is evidently biased toward its time restriction. This generalization has rendered microsound micro only necessarily in the durations, not the frequency content, of the grains it employs relative to the sound objects from which they are extracted. Part II and Part III of this thesis address the musical implications of using sound materials that are micro in terms of their frequency content, but not their durations, relative to the sound objects from which they are extracted using a tool incorporating the STFT.
This operation is analogous to time domain granulation, discussed further soon.
Cells, Screens, & Books
A closer consideration of Analogique B helps to clarify Xenakis’ generalization of the Gabor grain into the Xenakis grain. Again, central to Analogique B are what Xenakis called screens, which (as displayed below) represent acoustical quanta in three dimensions: time, frequency, and amplitude (Roads 2001, 65). Each screen is divided into amplitude and frequency cells and serves as a “snapshot of a microsound” that “represents a Gabor matrix at a specific moment” (Roads 2001, 92). “Like the frames of a film,” Roads notes, “a synchronous sequence of screens constitutes the evolution of a complex sound” (Roads 2001, 92). Xenakis referred to such a synchronous sequence of screens constituting an entire complex sound as a book, and the screens can be thought of as the pages of a book. The thickness of a screen, or page, represents a time duration Δt. Each screen contains cells whose height represents an amplitude change Δg and width represents a frequency change Δf, and any given cell may or may not contain grains.10 Xenakis’ book of screens and a three-dimensional view of an individual screen cell are displayed below to the left and right respectively:
Fig. I.e: Xenakis’ Book of Screens (above left) & Three-Dimensional Screen Cell with Gabor Grains (above right) (Xenakis 1992, 50-51).
In Analogique B, instead of decomposing audio recordings of complex sounds as in Concret PH, Xenakis used elementary sounds resembling acoustical quanta—sliced magnetic tape recordings of sine waves—and employed stochastic algorithms to distribute them onto screens in order to create complex sounds. Xenakis notes that the grains displayed in the left image above are only artificially flattened on the screen, that a grain cloud depicted by a cell full of grains “exists in the thickness of time Δt” (Xenakis 1992, 51). This time thickness Δt is depicted in the above right image, in which frequency- and amplitude-constant grains are associated with vectors parallel to the time axis. Xenakis’ screen cells should not be confused with Gabor matrix cells. The former are three-dimensional, with frequency on the x-axis, amplitude on the y-axis, and time on the z-axis, while the latter are two-dimensional, with time on the x-axis and frequency on the y-axis (and amplitude indicated by a circle size or shade of color). However, the grains depicted above in the three-dimensional screen cell to the right, their constant frequency and amplitude portrayed by the parallelism of their vectors to the axis of time, are equivalent to time-restricted, frequency- and amplitude-constant (i.e. frequency- and amplitude-restricted) Gabor grains. Again, Xenakis refers to the Gabor grain or acoustical quantum as a “particular case of the grain” and a grain “of the Gabor type,” generalizing it to serve his own musical tastes: “[i]n general, the frequencies and intensities of the grains can be variable and the grain a very short glissando” (Xenakis 1992, 55, 103).11 In Analogique B, Xenakis cut magnetic tape recordings of sine tones to produce sounds on the microsound timescale with constant frequencies and amplitudes, sounds analogous to Gabor grains or acoustical quanta.
However, once he manipulated granular frequency and amplitude to the extent that neither was constant within a given grain, the Xenakis grain became distinct from the Gabor grain. Xenakis grains, whose vectors relate fluctuations in frequency and amplitude, are depicted in the image below:
10 Only certain frequencies are present at any given moment within a complex sound.
11 Those familiar with Xenakis’ music are likely aware of his fondness for glissando.
Fig. I.f: Three-Dimensional & Two-Dimensional Screen Cells with Xenakis Grains (Xenakis 1992, 55).
British composer Trevor Wishart offers yet another definition of the grain, which, unlike Xenakis’ or Roads’, distinguishes it from the acoustical quantum as the term “Xenakis grain” has been proposed to do:
[A grain is a sound] having a duration which is long enough for spectral properties to be perceived but not long enough for time-fluctuation of properties to be perceived (Wishart 1994a, 120).
For Wishart, then, “[e]xtremely short time frames of the order of 0.0001 seconds,” sounds of durations just below the microsound timescale, “have no perceptual significance at all” (Wishart 1994a, 16). Wishart notes that each sample12 in a digital representation of a sound corresponds to a duration close to 0.1 ms, and that, even though “every digitally recorded sound is made out of nothing but samples, the individual sample can tell us nothing about the sound of which it is a part,” that is, through either its time domain or frequency domain information (Wishart 1994a, 16).13 Each such sample, in accordance with the time-frequency uncertainty principle and as Wishart points out, would be perceived as a “broad-band click of a certain loudness” (Wishart 1994a, 16). The spectral characteristics of each sample in a digital representation of sound are imperceptible due to the sample’s great brevity.

2.3 Grain Divisibility
A single wavelength of a sound close to the duration of the aforementioned sample, which Gabor refers to as the acoustical quantum, Roads as the Gabor grain, and Xenakis as the “grain of the Gabor type,” Wishart refers to as the wavecycle (Wishart 1994a, 17). Unlike Roads and Xenakis, however, Wishart makes a clear distinction between the wavecycle and the grain. In making this distinction, though, Wishart gives more emphasis to the difference in timescale between a wavecycle and a grain, the latter of which can itself contain many wavecycles (i.e. acoustical quanta) over time as long as it lasts under about 100 ms, than he does to the difference in the spectral capacity—the capacity for multiple frequency components, or acoustic information—between the two. The wavecycle and grain, according to Wishart, respectively constitute two increasingly long timeframes that precede a third timeframe of sound he calls continuation, discussed soon (Wishart 1994a, 18). As “[t]he first significant object from a musical point of view,” the wavecycle only contributes with other wavecycles to the distinctive characteristics of larger sounds:
The shape and duration of the wavecycle will help to determine the properties (the spectrum and pitch) of the sound of which it is a part. But a single wavecycle is not sufficient on its own to determine these properties (Wishart 1994a, 17).
“Sample” here is not to be confused with a sample on the sound object timescale. How much might a single sinusoid of a sound be able to tell us about the sound of which it is a part?
This statement may seem to contradict Gabor’s assertion that “the best ears in the optimum frequency range can just about discriminate one acoustical quantum [i.e. one wavecycle],” but Gabor, like Wishart, realized that such boundaries of perception are ambiguous and not completely clear-cut (Gabor 1947, 593; Wishart 1994a, 18). Nevertheless, it is important to note that, regardless of whether a single wavecycle can be heard as a certain frequency over time, it can be said with certainty that multiple wavecycles along the axis of frequency are needed to determine the spectral, or timbral, properties of any complex sound. In contrast to the wavecycle or Gabor grain, the Xenakis grain, itself a sound composed of multiple wavecycles over frequency, has a spectrum or timbre of its own that is determined by these multiple wavecycles. So while he explicitly states that the wavecycle and grain differ in their timescale, that “[t]he boundary between the wavecycle timeframe and the grain timeframe is of great importance,” Wishart does implicitly state that they differ spectrally; the spectrally and timbrally minimal wavecycle only contributes, that is, with other wavecycles, to the spectral characteristics of a larger sound such as the Xenakis grain (Wishart 1994a, 17). He writes:
Once we can perceive distinctive qualitative characteristics in a sound, we have a grain (Wishart 1994a, 17).
In other words, these “distinctive qualitative characteristics” can be contained in the potentially rich spectrum of the Xenakis grain, but not the minimal spectrum of the wavecycle (i.e. Gabor grain). Thus, two essential differences exist between the wavecycle/acoustical quantum/Gabor grain and the Xenakis grain: one of timescale (less drastic), and one of spectral richness (much more drastic). As it differs from the wavecycle, the Xenakis grain also differs from larger sound structures, such as the sound object of musique concrète, but only in timescale. Wishart notes that, because of a difference in timescale, “we cannot perceive any resolvable internal structure” in the Xenakis grain as we can in the longer sound (Wishart 1994a, 17). However, the sound of a grain “presents itself to us as an indivisible [emphasis added] unit with definite qualities such as pitch, spectral contour, onset characteristics (hard-edged, soft-edged), pitchy/noisy/gritty quality etc.” (Wishart 1994a, 17). Just as the longer sound has definite qualities such as pitch, spectral contour, etc., so does the Xenakis grain. They may differ in their timescale, but not necessarily in their spectral richness. Nevertheless, Wishart continues to stress the “indivisibility” of the (Xenakis) grain:
Although the internal structure of sounds is the cause of what we hear, we do not resolve this internal structure in our perception. The experience of a [Xenakis] grain is indivisible (Wishart 1994a, 17).
It seems that Wishart is upholding a temporal indivisibility of the Xenakis grain, however, and not a spectral indivisibility. He writes:
Longer sound events can often be described in terms of an onset or attack event and a continuation. The onset usually has the timescale and hence the indivisibility and qualitative unity of a grain…. But if the sound persists beyond a new time limit (around .05 seconds) we have enough information to detect its temporal evolution, we become aware of movements of pitch or loudness, or evolution of the spectrum. The sound is no longer an indivisible grain: we have reached the sphere of Continuation (Wishart 1994a, 51, 52).
Thus, according to Wishart, it is when the Xenakis grain is lengthened to a great enough duration, one within the timescale of “continuation,” that it becomes divisible (Wishart 1994a, 52). Continuation, for Wishart, comprises the third timescale of sound after the wavecycle and grain. While the Xenakis grain, lacking continuation, can be considered temporally indivisible in that any smaller time slice of that grain would render its distinctive qualitative (i.e. spectral) characteristics imperceptible in accordance with the time-frequency uncertainty principle, it cannot be considered spectrally indivisible. This is precisely because it, by Wishart’s definition at least, possesses such distinctive qualitative characteristics, which themselves are no less divisible than the notes perceived in a chord. As Wishart writes later in the same text:
[T]here are very subtle parameters in the spectral character of a grain which we cannot “hear out” but which are fundamental to our qualitative perception (Wishart 1994a, 78).
We may not be able to “hear out” the spectral subtleties of a Xenakis grain, perhaps both because of its brevity and because its spectral subtleties occur more or less simultaneously, but this does not mean that these subtleties are inseparable. As discussed in Part II and Part III, there are subtle parameters in the spectral character of grains and sound objects that we cannot “hear out” unless they are separated from others within the grain or sound object, that is, in terms of frequency. The grain may be temporally indivisible—make it any shorter in time and its spectrum may be distorted due to the time-frequency uncertainty principle—but it is not, like the sound object, spectrally indivisible. The significance of the distinction between the Xenakis grain and the Gabor grain lies in the difference in the amount of information content each can hold. Indeed, the microsonic grain can be macro in terms of its acoustic information content, specifically its spectral information content. The reader may consider a 1999 psychological study (Schellenberg et al. 1999), which found that listeners were able to identify brief excerpts from popular music recordings by matching 200 ms or 100 ms (i.e. microsound timescale) excerpts with song titles and artists. The study found that listener performance was well above chance for the 200 ms excerpts and poorer, but still above chance, for the 100 ms excerpts (i.e. grains). Although it did not find that the recognition of popular songs typically occurs in 100 ms, the study provided “unequivocal evidence” that 100 ms excerpts of these songs can contain enough of the information necessary for their identification (Schellenberg et al. 1999). This finding shows that 100 ms excerpts of complex sounds—Xenakis grains—are indeed perceptibly divisible in terms of their potentially vast spectral information content.
It is clear that the Xenakis grain, more through its potentially vast amount of spectral information than its timescale, is a drastic generalization of the acoustical quantum. The Xenakis grain, then, can be considered as one of the arbitrary signals that can be expanded in terms of acoustical quanta, but not an acoustical quantum itself. Part II and Part III of this thesis consider musical units that are, like the Xenakis grain, micro units of larger sounds. In terms of their capacity for acoustic information, they may be said to generalize the notion of the acoustical quantum no less drastically than does the Xenakis grain.
Granular Synthesis & Granulation
Wishart defines granular synthesis in the following:
A process, almost identical to granular reconstruction, in which very brief sound elements [i.e. grains] are generated [emphasis added] with particular (time-varying) properties and particular (time-varying) density, to create a new sound (Wishart 1994a, 120).
Granular reconstruction he defines as:
A procedure which chops sound into a number of segments [i.e. grains] and then redistributes these in a texture of definable density (Wishart 1994a, 120).
Thus, the difference between granular synthesis and granular reconstruction—a variant of brassage, discussed below—is that the former implies the use of synthetically generated grains, and the latter the use of grains extracted from sampled sound. It should be noted, however, that granular synthesis is also used as a broader, catchall term to describe any form of sound synthesis that employs grains, whether generated synthetically or extracted from sampled sound. Granular reconstruction and brassage are both particular cases of granulation:
To granulate means to segment a sound signal into tiny grains…. Granulation is a purely time domain operation (Roads 2001, 187-188).
That is, granulation separates smaller sounds from the larger sample purely in terms of time.14 Granulation has also been referred to as granular sampling, particularly by composer Cort Lippe.
14 While granulation does separate grains from a sound object purely in terms of time, exception is taken to Roads’ statement that granulation “is a purely time domain operation” (Roads 2001, 188). If used to extract
He offers a clear distinction between granular synthesis and granular sampling, or granulation; while granular sampling can be considered “as musique concrète,” as it makes explicit use of the sound object, granular synthesis cannot. Lippe writes:
While granular synthesis and granular sampling are variants of the same [microsonic] technique, their musical essences lie at opposite poles of the electronic music paradigm. One is immediately confronted, historically speaking, with the two main categories of electronic music: granular synthesis is Elektronische Musik, making use of purely synthetic sounds, while granular sampling is part of the world of musique concrète in which recorded sounds are manipulated and transformed. As the Canadian composer, Jean Piché, has suggested, granular sampling is an “input dependent” technique. Thus, using granular techniques on sampled sounds offers a level of musical implication which does not exist in granular synthesis: one is acting on and transforming a pre-existing sound object (Lippe 1994, 4).
The Xenakis grain and Gabor grain can both be created synthetically with a digital synthesis instrument, but only a Xenakis grain can be extracted from a complex sound via granular sampling. As mentioned, for a Gabor grain to be extracted from a complex sound, the STFT is needed to separate individual sinusoidal frequency components. Part II and Part III of this thesis are concerned only with the use of sampled sounds and their deconstruction and reconstruction in composition, so we will ignore the use of synthetically generated grains here. Nevertheless, organizational principles related to grains obtained via granulation largely apply to synthetically generated grains.

3.1 Time Domain Granulation
The following subsections address time domain granulation, equivalent to granulation or granular sampling discussed above, and what Roads refers to as frequency domain granulation (Roads 2001, xi).15 Although both operations entail a separation of smaller sounds from a larger sound object,16 they do so along two different dimensions: time and frequency respectively. Part II and Part III of this thesis assert that frequency domain granulation deserves closer attention as a way to procure musical materials suited for particular musical techniques. First we will discuss the basic principles behind time domain granulation and frequency domain granulation, in turn. The basic process of granulation is demonstrated below:
Fig. I.h: Granulation.
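As a sketch of the process in Fig. I.h, the following granulates a NumPy array into windowed grains. The grain length, hop size, envelope, and the name `granulate` are illustrative assumptions, not values prescribed by Roads.

```python
import numpy as np

def granulate(signal, grain_len, hop):
    """Slice a signal into grains of grain_len samples, spaced hop samples
    apart, recording each grain's onset time in the original sample."""
    env = np.hanning(grain_len)  # a smooth amplitude envelope per grain
    return [(onset, signal[onset:onset + grain_len] * env)
            for onset in range(0, len(signal) - grain_len + 1, hop)]

fs = 8000
sound_object = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)  # a 1 s sound
grains = granulate(sound_object, grain_len=400, hop=200)     # 50 ms grains
```

Each grain here is a purely temporal slice: the separation says nothing about which frequency components a grain contains, which is the point of contrast with frequency domain granulation later in this thesis.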
Composer Herbert Brün led the development of one of the first instances of digital granular synthesis, which he called SAWDUST (Thomson 2004, 209). Its name provides a useful metaphor for granulation: the computer serves as the saw, and the sound object, upon granulation, becomes the dust (Thomson 2004, 209). The SAWDUST system provided a means by which to organize these grains on a macro level, and, like any efficient granular synthesis engine, it allowed for the
grains with short enough durations, granulation may render lower frequencies of these grains imperceptible, thus affecting the frequency domain. 15 The term “time domain granulation” is redundant considering that the term “granulation” itself already refers to a sound separation that occurs along the time domain (Roads 2001, 188). Nevertheless, Roads himself refers to granulation as such in several instances of Microsound (Roads 2001, 61, 197, 259). He does this, of course, to distinguish it from frequency domain granulation, which occurs along the frequency domain. 16 Size here can be thought of as the amount of acoustic information content held.
“building of sounds and structures from the bottom up” (Thomson 2004, 209). As Xenakis wrote, only large numbers of grains are useful in composition:
For a macroscopic phenomenon, it is the massed total result that counts, and each time a phenomenon is to be observed, the scale relationship between observer and phenomenon must first be established…. [T]o work like architects17 on the sonic material in order to construct complex sounds and evolutions of these entities means that we must use macroscopic18 methods of analysis and construction. Microsounds and elementary grains have no importance on the scale that we have chosen. Only groups of grains and the characteristics of these groups have any meaning (Xenakis 1992, 49-50).
Various granulation parameters allow for a macro approach to granular organization. Although grain duration must be specified before the granulation process (a duration between about 1 ms and 100 ms), grain durations can be modified after granulation and can be uniform or varying within the microsound timescale. The duration of a grain has a profound effect on its timbre in accordance with the uncertainty principle: the greater the time resolution of a grain (i.e. the shorter its duration), the more that is lost in frequency resolution (i.e. the more broad-band its sound). Thus, a longer grain may lose some of its intrinsic timbral qualities if granulated further.19 Other parameters of grains can be modified after granulation. Grains may be subjected to time-stretching or time-shrinking, although the former allows for more drastic alterations, as grains are already extremely brief. Time-stretching elongates grains in time, independent of any pitch changes. As Truax notes, time-stretching allows the listener to perceive the “inner complexity” of a sound by enhancing its spectral makeup (Truax 1994, 42). He writes:
[A]s a sound is progressively stretched, one is less aware of its temporal envelope and more aware of its timbral characteristics. Ironically, with extreme stretching in time, a spectrum can be experienced psychoacoustically in the classical Fourier manner, namely as the sum of its spectral components…. With stretched sounds, one has time to refocus one’s attention on the inner spectral character of a sound, which with natural sounds is amazingly complex and musically interesting. Therefore, transitions from the original to the stretched versions provide an interesting shift from one dominant percept to another…. In general, time-stretching is a unique way to bring out the inner complexity of sound (Truax 1994, 44-45).
Wishart distinguishes between granular time-stretching and spectral time-stretching. In the former, the playback sequence of granulated grains is slowed down but not the grains themselves, so that “grains become detached events in their own right” (Wishart 1994a, 57). Spectral time-stretching, in contrast, is equivalent to the kind discussed by Truax above, in which grains are slowed down so that “the internal morphology of the individual events comes to the foreground of perception” (Wishart 1994a, 57). Both are illustrated below:
Fig. I.i: Granular Time-Stretching & Spectral Time-Stretching (Wishart 1994a, 89).
17 No architect like Xenakis, the author focuses on the musical demolition of sounds in Part II.
18 As Thomson notes, “macrosonic” may be a preferable term (Thomson 2004, 208).
19 According to Wishart, at such a point, the grain ceases to be a grain, as it has lost its “distinctive qualitative characteristics” (Wishart 1994a, 17). Time-shrinking, or “time-contraction” as Wishart refers to it, “may force the grain separation under the minimum time limit for grain perception” (Wishart 1994a, 57).
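Granular time-stretching, in Wishart’s first sense, can be sketched by reassembling grains at scaled onset times while leaving each grain’s contents untouched. The function below and its stretch factor are illustrative assumptions.

```python
import numpy as np

def granular_time_stretch(grains, stretch, grain_len):
    """Overlap-add grains at onsets multiplied by `stretch`; the grains
    themselves are not slowed down, only the sequence between them."""
    last_onset = max(onset for onset, _ in grains)
    out = np.zeros(int(last_onset * stretch) + grain_len)
    for onset, samples in grains:
        start = int(onset * stretch)
        out[start:start + grain_len] += samples
    return out

# Three contiguous 100-sample grains, spread apart by a factor of 4:
grains = [(i * 100, np.ones(100)) for i in range(3)]
detached = granular_time_stretch(grains, stretch=4.0, grain_len=100)
```

At large factors the grains become detached events in their own right, with silence between them; spectral time-stretching would instead lengthen each grain internally, e.g. via a phase vocoder.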
Grain selection order concerns the playback order of extracted grains relative to their original time positions, or onset times, in the granulated sample. Brassage can be used to reproduce the original sample after granulation by preserving the sequential order (i.e. onset times) in which grains existed in the original sample. Wishart defines brassage in the following:
A procedure which chops sound into a number of segments and then resplices these together tail to head. In the simplest case sounds are selected in order, from the source, and respliced together in [that same] order. However, there are many possible variations on the procedure (Wishart 1994a, 113).
Thus, “if the segments are replaced [in time] exactly as they were cut, we will reproduce the original sound” (Wishart 1994a, 53). Selection order variations include granular reversal, in which the original order of grains is reversed, and granular reordering, in which grains are rearranged in some new order. Granular reversal or reordering can hide the original source of the sound even if all of its elements are present throughout playback, as both alter the morphology of the original. Such reordering allows for an acousmaticization process, a topic revisited later in this thesis. Lippe discusses the relative importance of selection order to granular synthesis and granulation, noting that in the former, “the concept of grains in an ordinal sense remains somewhat abstract” (Lippe 1994, 5). Although ordering synthetically generated grains in arbitrarily different ways may yield very different sounding variants, the results “remain abstract synthetic sounds” (Lippe 1994, 5). However, with granulation, the waveform of a grain is not a synthetic one, but one taken from a millisecond-length duration of a sample, or sound object. Thus, writes Lippe, an additional parameter exists for granulation (i.e. granular sampling): “onset time into the stored sound” (Lippe 1994, 5). He continues:
This additional parameter can be of primary importance in granular sampling. No longer a kind of “commutative” or arbitrary parameter, grain order may have important consequences, thus creating a hierarchy of parameters. In fact, this hierarchy may be considered implicit in the very nature of granular sampling. Using a piano note as the stored sound, if onset times descend in an ordinal fashion from high to low…, the sounding result will always be recognized as a piano note played backwards even though variants may sound quite different. Furthermore, the ability to “deconstruct” sounds via the manipulation of onset times…, moving between the boundaries of recognizability and non-recognizability on a continuum,20 is one of the principle, musically interesting characteristics of granular sampling (Lippe 1994, 5).
Reordering grains, that is, playing back grains in a new selection order, can render the original sound unrecognizable even if all of its acoustic material is present in the playback. Modification of grain selection order is demonstrated below:
Fig. I.j: Granular Reordering, Granular Reversal, & Sound Reversal (Wishart 1994b, 56-57).
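The three operations of Fig. I.j can be sketched over a list of (onset, samples) grain pairs. The function names are illustrative, and `granular_reorder` uses a seeded shuffle so the result is reproducible.

```python
import random
import numpy as np

def granular_reorder(grains, seed=0):
    """Play grains back in a new, shuffled selection order."""
    out = list(grains)
    random.Random(seed).shuffle(out)
    return out

def granular_reversal(grains):
    """Reverse the ORDER of the grains; each grain still plays forwards."""
    return grains[::-1]

def sound_reversal(grains):
    """Keep the grain order, but play each grain's samples backwards."""
    return [(onset, samples[::-1]) for onset, samples in grains]

grains = [(0, np.array([1.0, 2.0])), (2, np.array([3.0, 4.0]))]
```

All of the original acoustic material survives each operation; only its morphology changes, which is why reordering can hide a recognizable source.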
Lippe’s notion of a recognizability “continuum” is considered closely in Part III.
Grain temporal pattern may be synchronous, as when streams of grains follow one another with a regular, synchronous delay period to form a longer sound (Roads 2001, 93). This form of granular synthesis can produce metric rhythms if the grains follow one another with a large enough delay. A density parameter controls the frequency of grain emission, and a frequency value in hertz can represent “grains per second” (Roads 2001, 93). Metrical rhythms decelerate with lower synchronous densities and accelerate with higher densities up to a certain point. Grain temporal pattern may also be asynchronous, as when grains follow each other at unequal intervals to produce ametrical rhythms. At high enough densities, grains fuse into continuous sounds.21 The time direction of grains can be reversed through sound reversal, not to be confused with granular reversal. In granular reversal, again, the order of the grains in the original signal is reversed. If grains are sound reversed, they are each played backwards in time. This is illustrated below (and also above to the bottom right):
Fig. I.k: Sound Reversal (Wishart 1994b, 43).
The amplitude envelope of a grain determines its attack and decay and influences its spectrum.22 Grains obtained through granulation are captured with predetermined envelopes, but these can be modified after granulation or even replaced with synthetic envelopes. Envelopes with short attacks produce percussive grains, while envelopes with long attacks produce less percussive ones. An expodec envelope features a sharp attack and an exponential decay, whereas a rexpodec envelope is characterized by a long attack and a sudden decay. Granulated concrète sounds can be perceived as sound-reversed, even when they are not, if given rexpodec envelopes (Roads 2001, 100). The density of a grain cloud specifies the number of grains per second that occur within it (Roads 2001, 105). This parameter can only describe the cloud texture, however, if it is linked with grain duration. For example, a 1 s cloud containing twenty 100 ms grains, Roads notes, is “continuous and opaque,” while a 1 s cloud containing twenty 1 ms grains is “sparse and transparent” (Roads 2001, 105). Roads refers to the difference between these two clouds as their fill factor, the product of their grain densities and durations. In more complex clouds with varying grain densities and durations, the average grain density and duration are used to derive the fill factor (Roads 2001, 105). Spatialization is a particularly interesting parameter of granulation, as every grain can be directed to a particular spatial location in a multichannel speaker setup. This location can remain fixed or move according to a trajectory if the duration of the grain is long enough (Roads 2001, 188). If a grain cloud is monaural, with every grain in the same spatial position, the resulting sound will be “spatially flat” (Roads 2001, 107). If grains are each given unique locations, however, the cloud they comprise “manifests a vivid three-dimensional spatial morphology” (Roads 2001, 107).
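Roads’ fill factor reduces to simple arithmetic: grain density (grains per second) times grain duration (seconds). A minimal sketch with his two example clouds:

```python
def fill_factor(density_grains_per_s, grain_dur_s):
    """Roads' fill factor: the product of grain density and grain duration."""
    return density_grains_per_s * grain_dur_s

opaque = fill_factor(20, 0.100)  # twenty 100 ms grains in 1 s of cloud
sparse = fill_factor(20, 0.001)  # twenty 1 ms grains in 1 s of cloud
```

A fill factor above 1 implies overlapping grains, hence the “continuous and opaque” texture; a value well below 1 yields the “sparse and transparent” cloud.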
Spatialization using spectrally decomposed sounds is discussed in Part II and Part III in greater detail. Granulation can be applied in non-real-time to a sound file, but also in real-time (Truax 1986, 1994). Real-time granulation takes two forms: on a live, incoming sound source such as that
That is, the grains are subsumed in the perception of a larger structure. This phenomenon is revisited later. A sound’s spectrum, or timbre, is the result not only of what frequencies exist within it, but also of how these frequency components fluctuate in amplitude over time.
produced by an instrumentalist, and the live, controlled granulation of stored sound files (Roads 2001, 191). Playback from a granulated sound file offers the composer more control over grains than does real-time granulation of a live source, as the former allows for the extraction of grains in any order, be it sequential, reversed, random, etc., and for time-shrinking in real-time (Roads 2001, 192; Truax 1994). Granular reordering and spectral time-shrinking of grains extracted from a live, incoming sound are both impossible, as granulation in this case is limited by the “forward direction of the arrow of time” (Roads 2001, 192). Several granulation parameters are modifiable in real-time granulation, however; the most notable is grain duration, and more particularly spectral time-stretching (Truax 1994). A single grain of sound can be extended to a duration hundreds of times its original (Truax 1994), or different grains of a sound can be looped repeatedly, resulting in a type of granular time-stretching (Roads 2001, 191). In Microsound, Roads briefly addresses what he calls selective granulation, which entails the selection and separation of certain frequency components from a sound object for granulation (Roads 2001, 191). This type of granulation, then, demands the STFT and includes frequency domain granulation as a prerequisite. Roads mentions that emerging signal processing operations are allowing for more and more ways to analyze and separate these frequency components from a signal. Some sets of components can be retained and others discarded. The components can also be modified and recombined to result in a variation of the original signal (Roads 2001, 191). Roads discusses various ways in which a signal can be separated into its frequency components (Roads 2001, 191). One way is through a bank of filters, each tuned to a different center frequency, resulting in several frequency component groupings, or bands, that differ purely in their bandwidths (Roads 2001, 191).
Others include setting amplitude thresholds that separate the low- and high-level energy parts of a signal, or setting duration thresholds that separate frequency components of varying lengths (Roads 2001, 191). Roads notes that in general, “there are no limits on the number of ways in which a given sound can be divided” (Roads 2001, 191). It should be emphasized that selective granulation only begins with frequency domain granulation, with the intent subsequently to granulate (temporally) the selected frequency components that comprise the original signal. Part II and Part III discuss how frequency domain granulation yields sound material that is valuable in its own right and has largely been neglected as a means by which to obtain primary musical material for composition.

3.2 “Frequency Domain Granulation”
This subsection briefly addresses what Roads refers to as frequency domain granulation, the decomposition of a Gabor matrix of sound into thin rows or layers as opposed to thin columns as with time domain granulation (i.e. granulation) (Roads 2001, xi, 282). Quotation marks are used around the term in the title of this subsection to emphasize that frequency domain granulation is an awkward term that needs replacing, as the operation does not yield grains (i.e. time components) at all, but frequency components. That it is coined in terms of granulation may highlight the secondary status of frequency domain granulation relative to granulation: the separation of smaller sounds from others for the procurement of musical material has focused overwhelmingly on the time domain. This is as true for microsound as it is for musique concrète. It should be noted that Roads spends no more than five pages or so on frequency domain granulation in his over four hundred page text (Roads 2001, xi, 275-282), although he devotes portions of several chapters to time domain granulation. Indeed, he only uses the term “frequency domain granulation” once or twice (Roads 2001, xi, 282), and other instances have not been discovered in relevant literature. By no means does Roads elaborately theorize frequency domain granulation or explicate its parameters as he does for granulation. Part II and Part III aim to explicate this theory, distinguishing it from that germane to microsonic grains. Certain PV implementations already allow for the direct editing of the analysis data produced by the STFT, and more particularly for frequency domain granulation (Roads 2001, 275). One can alter the amplitudes or other parameters of specific frequency components, and, of greatest interest to us now, one can even “cut” or “extract” certain frequency components from the spectrum (Roads 2001,
275, xi). Cutting frequency components, according to Roads, removes them from the spectrum, while extracting frequency components removes everything from the spectrum but these components (Roads 2001, 275). It will also be argued that the terms “cutting” and “extracting” may not be as appropriate in the context of the frequency domain as they are in that of the time domain; separating may be a better term, as will become clearer in Part II.23 Roads mentions that even a single, half-second sound object may contain hundreds of “closely-spaced” frequency components (Roads 2001, 275). Given the large number of frequency components that can exist within a single sound object (or even a grain), and that an operation on an individual frequency component within the sound object would alter it negligibly, the question becomes one of how to select frequency components to separate (Roads 2001, 275). Selecting frequency components manually is the labor-intensive way of “frequency domain granulating” a sound, and the key to efficiency in frequency domain granulation is to use other strategies (Roads 2001, 275). The simplest case of frequency domain granulation—filtering—which involves a bank of filters, each tuned to a different center frequency, is illustrated below:
[Figure: a sound object decomposed over time t into frequency component (F.C.) groupings, i.e. bands.]
Fig. I.l: Filtering: Simplest Case of Frequency Domain Granulation.
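The filter bank case shown in Fig. I.l can be sketched minimally. As an illustration only (the FFT-based method, the band count, and the test signal are assumptions, not Roads’ implementation), a signal’s spectrum is split into contiguous bands, and each band is resynthesized on its own:

```python
import numpy as np

def band_decompose(signal, n_bands=4):
    """Split a signal into n_bands contiguous frequency bands via the FFT."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]        # retain one band, zero the rest
        bands.append(np.fft.irfft(masked, n=len(signal)))
    return bands

sr = 8000
t = np.arange(sr) / sr
mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 2200 * t)
bands = band_decompose(mix)
```

Because the bands partition the spectrum, summing them reconstructs the original mixture; each band alone, however, is a smaller sound separated from the whole in frequency rather than in time.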
Roads offers some of the same frequency component selection strategies he mentions in regard to selective granulation: selecting frequency components according to a certain bandwidth in which they may exist (i.e. bandpass filtering, as shown above), or according to length or amplitude criteria, in which only frequency components over or under a certain duration or amplitude threshold are eliminated or retained (Roads 2001, 275). Removing frequency components greater than 70 ms in duration, says Roads, transforms a spoken voice into “wheezing and whispering,” while increasing the amplitudes of frequency components less than 10 ms in duration transforms a voice into a “gravely sounding voice” (Roads 2001, 275). Roads also states:
One imagines that more sophisticated track [i.e. frequency component] selection principles will become available in the future, such as selecting harmonically-related tracks (Roads 2001, 275).
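The amplitude criterion mentioned above can likewise be sketched. In this hedged illustration (the STFT settings, threshold value, and test signal are all assumptions), spectrogram cells below a fraction of the peak magnitude are discarded before resynthesis, retaining only the high-level energy parts of the signal:

```python
import numpy as np
from scipy.signal import stft, istft

def retain_loud_components(signal, sr, rel_threshold=0.1):
    """Zero all STFT cells below a fraction of the peak magnitude, resynthesize."""
    _, _, Z = stft(signal, fs=sr, nperseg=512)
    mask = np.abs(Z) >= rel_threshold * np.abs(Z).max()
    _, y = istft(Z * mask, fs=sr, nperseg=512)
    return y[:len(signal)]

sr = 8000
rng = np.random.default_rng(0)
# A sine tone buried in faint noise: thresholding keeps the tone's components.
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr) + 0.01 * rng.standard_normal(sr)
loud = retain_loud_components(x, sr)
```

Reversing the comparison (keeping only the quiet cells) would separate the low-level residue instead, illustrating Roads’ point that a sound can be divided along any such criterion.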
It seems that Roads only foresaw “the tip of the iceberg,” so to speak, as frequency component selection principles that operate in much more sophisticated ways than by simply selecting harmonically-related frequency components now exist. In particular, recent source separation technologies select and group perceptually-related frequency components of an arbitrary signal, such as those that correspond to a single instrument in a polyphonic ensemble—those that our ears tend to group together as coherent sounds—and may select harmonically-related frequency components only as a step in the procedure (Casey, Westner 2001, 1). Source separation is discussed at length in Part II and Part III. Frequency domain granulation is by no means a new concept. However, relative to time domain granulation, which has been used extensively to obtain musically functional (and fruitful) grains, it has been widely overlooked as a means of segmenting sound into fruitful musical material. Traditionally, frequency domain granulation, particularly filtering, has served as a means of transforming existing sounds, as the examples given by Roads above attest. Relative to granulation, however, it has only seldom served to separate the sound object into smaller sonic pieces to be used in their own right in distinct musical processes as grains have been, smaller sounds that
23 “Separating” seems to work equally well in describing time and frequency domain extractions.
can be perceived outside of the context of the larger sound object they comprise. As discussed in Part II and Part III, such smaller pieces may lend themselves to macro approaches of organization into new sonic structures, as grains have. The term “grain” itself bestows musical value on the products of granulation—emancipated time components of sound—and it makes clear that these sounds are no longer confined within the larger sound from which they come. The products of frequency domain granulation—emancipated frequency components of sound—enjoy no such marking descriptor. In fact, neither grains nor the products of frequency domain granulation are components of the original sound from which they are obtained once they are procured as such and freed to function otherwise—they are their own sounds. As many smaller sounds may exist throughout the frequency domain of a larger sound object as do throughout its time domain. Why, then, have musique concrète and microsound focused so heavily on separating smaller sounds from larger sounds in terms of time, but not in terms of frequency? Perhaps a deconstructive operation that has proved so musically fruitful in one dimension of sound (i.e. the time domain) can prove just as fruitful when explored just as rigorously and applied to another (i.e. the frequency domain). Frequency domain granulation deserves far greater attention, and receives the focus of Part II and Part III.
The Block of Sound: Alternative Responses
I’m concerned with…things disintegrating and coming back together…, with killing things off…and then resurrecting them, so that one set of references is negated as a new one takes its place. –Cornelia Parker
Part I of this thesis addressed microsound, which employs units of sound that are necessarily micro only in terms of their durations, but not their spectral information content.24 Part II addresses musical units that can be considered, relative to the sound object from which they are extracted, as micro in terms of their spectral information content, but not in terms of their durations. Relative to the larger sound object, they are not temporally micro columns of sound, but spectrally micro strata of sound. Xenakis described his first work of microsound, Concret PH, as written “in defiance of the usual manner of working with concrète sounds,” sounds that he portrayed disparagingly as “block[s] of one kind of sound” (Solomos 1997). In Concret PH, Xenakis separated the sound object into many time-restricted particles, or grains, to demonstrate that “[a]ll sound, even continuous musical variation, is conceived as an assemblage of a large number of elementary sounds adequately disposed [horizontally] in time” (Xenakis 1992, 43). Of course, sound can also be conceived as a combination of a large number of sounds, of sinusoids, arranged vertically in frequency. This latter point can be exploited compositionally using various techniques grouped under the category of what Roads refers to as “frequency domain granulation,” briefly addressed in Part I. As a particular case of frequency domain granulation, source separation and its musical implications receive the focus of Part II. Source separation exemplifies what Roads envisioned as one of the “more sophisticated” approaches to frequency domain granulation that would arise in the future (Roads 2001, 275). Like other forms of frequency domain granulation, source separation allows for a response to the bulky sound object alternative to Xenakis’, one that allows the sound object to be broken down into smaller pieces that are freed to serve otherwise in music.
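The two conceptions can be set side by side in a small sketch (all parameters here are illustrative assumptions): the same half second of sound is built once horizontally, as a concatenation of grains disposed in time, and once vertically, as a superposition of sinusoids disposed in frequency:

```python
import numpy as np

sr = 8000
t = np.arange(sr // 2) / sr                    # half a second of samples

# Horizontal: twenty 25 ms windowed grains disposed sequentially in time.
grain_window = np.hanning(sr // 40)
horizontal = np.concatenate(
    [grain_window * np.sin(2 * np.pi * f * np.arange(sr // 40) / sr)
     for f in np.linspace(200, 2000, 20)])

# Vertical: twenty sinusoids superposed simultaneously in frequency.
vertical = sum(np.sin(2 * np.pi * f * t) for f in np.linspace(200, 2000, 20))
```

Both arrays span the same duration; the first assembles sound from temporal particles, the second from spectral ones.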
Compared to time domain granulation, however, frequency domain granulation has received much less attention as a means of decomposing a sound object into new, primary sound material for musical composition, sound material that existed in the larger sound object all along. The influence of microsound has extended far beyond what can be considered as “art music,” or Ernst (i.e. serious) Musik (E-Musik) (Landy 2007a, 23). Phil Thomson, a former student of Truax’s, notes that microsound includes “both institutionally produced computer music, as well as an increasingly popular and broad-based body of work from independent composers and sound artists with little or no formal training” (Thomson 2004, 207). Microsound has influenced composers with roots in “popular music,” or Unterhaltung (i.e. entertainment) Musik (U-Musik), musicians who work outside of academic settings (Landy 2007a, 23).25 Many of these composers produce music that is difficult to categorize, music that exists on the border of the “E-/U-Musik boundary,” and they may “see as much of a divide between themselves and more commercial, entertainment, and fashion-based pop industry as they do between themselves and the art music world” (Landy 2007a, 147). Composer and musicologist Leigh Landy argues that much of electronica exists somewhere along the E-/U-Musik boundary (Landy 2007a, 241). He offers three definitions for electronica:
1. An umbrella term for innovative forms of popular electroacoustic music created in the studio, on one’s personal computer, or in live performance. Typically, although influenced by current forms of dance music, the music is often designed for a non-dance-oriented listening situation.
2. The unlikely meeting of several genealogical strands of music: the sonic and intellectual concerns of classic electronic music; the do-it-yourself and bruitist attitudes of punk and industrial music; and beat-driven dance floor sounds from disco through house and techno
24 Relative to the sound objects from which they are extracted.
25 Interestingly, as Thomson notes (Thomson 2004, 5), Roads’ Microsound contains almost no mention of U-Musik that incorporates microsonic techniques.
3. Real-time improvised…laptop performance approaches…. Many such works are based on the structuring and manipulation of small audio artifacts traditionally considered as defects, such as “clicks.” Noise is a common characteristic. Here the term implies a celebratory lo-fi aesthetic that is not directly influenced by either popular or art music traditions (Landy 2007a, 14-15).
Landy includes electronica with other musical categories such as organized sound, sonic art, sound art, radiophonic art, electronic music, and electroacoustic music under the broader category he describes as sound-based music (Landy 2007a, 9-17). Landy defines sound-based music as that which “designates the art form in which the sound, that is, not the musical note, is its basic unit” (Landy 2007a, 17). His conglomeration of the disparate, sound-based musics above stems from his belief that “the E/U separation is not as important today and perhaps could even rest in peace” (Landy 2007a, 146), that “there exists a ‘space’ of communality amongst sound-based music works” (Landy 2007b, 87). In 2004, Landy attended a two-day festival in Paris held by the Electronic Music Foundation (EMF) called NewMix (Landy 2007b, 85). Landy lauds NewMix as “not just another electroacoustic music festival,” writing that its “celebration of the art form’s inherent eclectic nature” made the festival special (Landy 2007b, 85-86). He elaborates:
This eclectic approach is in contrast to that of many festivals focusing on, for example, music on a fixed medium, digital music in new media contexts or interactivity or, more recently, various forms of electronica. The NewMix programme contained all of these and some works that would have caused difficulties in terms of fitting them into any of the above categories…. There was hardcore acousmatic work and techno-inspired pieces; there were improvised ambient works, new articulations of the post-digital aesthetic such as works celebrating glitch and other digital detritus, speech-driven works, and note-based works. However, most performed works belong to what I call sound-based music involving technology (Landy 2007b, 86).
The NewMix festival, organized by composers from Schaeffer’s GRM, naturally emphasized E-Musik over U-Musik. The 2003 All Tomorrow’s Parties in Camber Sands, UK, curated by Sean Booth and Rob Brown of the electronica duo Autechre, exemplifies another festival featuring a similar mix of such eclectic sound-based musics, but with an emphasis on U-Musik. The festival featured music performed by Autechre’s fellow U-Musik, experimental electronica artists of the British Warp Records label such as Richard James a.k.a. Aphex Twin, but also the E-Musik of Bernard Parmegiani, a well-known composer of acousmatic music and student of Pierre Schaeffer at GRM, and that of Curtis Roads, digital pioneer of microsound. These two festivals alone, of course, do not attest to “the ‘space’ of communality” among sound-based musics that Landy believes exists. Nevertheless, they do exemplify two separate congregations of musicians who create and extol all of these musics together. Such gatherings may perhaps serve as the most compelling indicator of these musics’ possible convergence, and Landy writes that such a convergence is indeed “slowly taking place” (Landy 2007a, 146). Of the different musics that may fall into any of the three definitions of electronica above, those that fit into the third definition (“works…based on the structuring and manipulation of small audio artifacts traditionally considered as defects, such as ‘clicks’”) have been most influenced by microsound (Landy 2007a, 15). While “microsound itself is not a proper genre,” genres that can be included under electronica as defined by Landy above “have emerged using microsound as the key means of source material creation” (Landy 2007a, 245). Some recent developments in microsound have arisen in one such genre embraced by electronica artists who work on the E-/U-Musik boundary, one referred to by Phil Thomson as “the new microsound,” and known more widely as glitch (Thomson 2004, 211-212).
Composer Kim Cascone notes that glitch “is not academically based,” that its composers are “for the most part…self-taught” (Cascone 2000, 12). He continues:
Music journalists occupy themselves inventing names for it: glitch, microwave, DSP, sinecore, and microscopic music. These names evolved through a collection of deconstructive [emphasis added] audio and visual techniques that allow artists to work beneath the previously impenetrable veil of digital media (Cascone 2000, 12).
Deconstructive audio techniques are highly characteristic of glitch, as well as the music to be discussed later in Part II. The genre has arisen from the “failure” of digital technology: “glitches, bugs,
application errors, system crashes, clipping, aliasing, distortion, quantization noise,” etc. serve as “the raw materials [glitch] composers seek to incorporate into their music” (Cascone 2000, 13). Glitch music, however, derives just as much from the misuse of digital technology as it does the “failure” of this technology. Indeed, technological failure often is a result of technological misuse:
In…[glitch], the tools themselves have become the instruments, and the resulting sound is born of their use in ways unintended by their designers…. Because the tools used in this style of music embody advanced concepts of digital signal processing, their usage by glitch artists tends to be based on experimentation rather than empirical investigation. In this fashion, unintended usage has become the second permission granted. It has been said that one does not need advanced training to use digital signal processing programs—just “mess around” until you obtain the desired result. Sometimes, not knowing the theoretical operation of a tool can result in more interesting results by “thinking outside of the box” (Cascone 2000, 16).
The tool used to deconstruct sound objects in order to procure the musical material for this thesis exhibits state of the art digital signal processing (DSP) concepts, and its misuse through experimentation, as discussed in the next section of Part II, led to a new musical technique. Glitch “arrived on the back of the electronica movement,”26 arising in part as a response to techno music of the early 1990s, which Cascone claims had “settled into a predictable, formulaic genre serving a more or less aesthetically homogeneous market of DJs and dance music aficionados” (Cascone 2000, 15). Around this time, artists of experimental electronica began to seek ways to expand the musical “tendrils” (Cascone 2000, 4) of techno into new areas by embracing the work of 20th century composers of electronic art music (Cascone 2000, 15). Despite the influence of 20th century E-Musik on glitch, an influence some glitch composers “feel best describe[s] its lineage,” glitch has tended to use this influence primarily as a means to respond to U-Musik techno:
The new genre of microsonic computer music [i.e. glitch] tends to have a different aesthetic than computer music produced within the traditional institutional framework. Although it is often informed by currents in twentieth-century concert music…, much of it is also in more or less explicit reaction to…rave-oriented techno…. [M]uch of this music is beat-oriented, engaging microsonic sound design in its vocabulary of blips and clicks used in place of the usual drum-machine sounds (Thomson 2004, 212).
Cascone notes that glitch has developed around an “aesthetics of failure” (Cascone 2000, 12-13). The many types of digital audio “failure,” writes Cascone, may result in “horrible noise,” but also “wondrous tapestries of sound” (to some these may be one and the same) (Cascone 2000, 13). Regardless, the sounds that result from digital audio “failure” are often new ones, and the explicit misuse of technology reflects a search for new sounds that may be used in music. As Landy notes:
There are many paths leading to Rome when it comes to the search for new sounds…. In musique concrète, processes of sound manipulation have led to radical new sounds…. One path followed by many in search of the new is that of various forms of microsound, creating sounds from very small sound particles. One form of microsound, granulation…, will provide an example here (Landy 2007a, 119).
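Granulation, the form of microsound Landy singles out, can be sketched in a few lines. The grain size, the Hann window, and the reordering scheme below are illustrative assumptions, not any particular composer’s settings:

```python
import numpy as np

def granulate(signal, sr, grain_ms=50, order="random", seed=0):
    """Chop a stored signal into windowed grains and resequence them."""
    n = int(sr * grain_ms / 1000)                 # samples per grain
    window = np.hanning(n)                        # taper to soften grain edges
    grains = [signal[i:i + n] * window
              for i in range(0, len(signal) - n + 1, n)]
    if order == "reversed":                       # back-to-front playback
        grains = grains[::-1]
    elif order == "random":                       # scatter grains at random
        np.random.default_rng(seed).shuffle(grains)
    return np.concatenate(grains)

sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
regranulated = granulate(tone, sr)
```

Note that the arbitrary reordering performed here presumes a stored file; a live input, bound to the forward arrow of time, would permit no such scattering.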
Digital technological detritus, the sound of “failure,” has served as the new sound for the composers of glitch music. However, Thomson remarks that, while glitch may have arisen in the spirit of technological critique, this “critical spirit may be vanishing,” and the search for new sounds by glitch composers seems to have lost its rigor (Thomson 2004, 214). He writes:
[T]he ‘aesthetics of failure’ might be on its way to being…a failure of aesthetics…. Its techniques may also be on their way to becoming as formulaic as those of the techno against which it originally railed. More and more audio software is being developed which enables one to simulate the sounds of digital failure without actually experiencing it. Digital technology is rendered capable of successfully emulating its own failure, a fact which risks undercutting whatever critical edge an aesthetics of failure may have had…. [P]erhaps it is time for it to re-invent itself in a way that offers a way past the impasse which its own self-consciously limited vocabulary has tended to produce (Thomson 2004, 214).

Because of its close connection with the dance music market, and despite its heavy inspiration from 20th century art music, Cascone notes, glitch has failed to receive the “academic consideration and acceptability it might otherwise earn” (Cascone 2000, 15).
Glitch, then, must expand its “limited vocabulary” if it is to evolve. Fortunately for the glitch genre, new digital music technologies bombard us on a daily basis, bringing with them vast opportunities for technological misuse. In their search for these new opportunities, new sounds, glitch composers may glean much from the academic setting. In reverse, academia may benefit from those glitch composers who “mess around” and “think outside the box” (Cascone 2000, 16):
Glitch…may benefit from a broader range of influences, including those stemming from research-based computer music institutions, to find their way out of the current aesthetic deadlock into which they seem to be heading, and institutionally-based composers may also be able to develop new aesthetic directions with more of an influence from composers working outside the institutional framework. Indeed, this cross-fertilisation seems to be underway already, with institutionally based composers becoming more and more aware of the new tendencies in digital music (though the reaction is not always favourable...)27 (Thomson 2004, 215).
While the gap between the worlds of E-Musik and U-Musik may be closing as Landy, Thomson, and others believe, it certainly still exists. It is uncertain whether glitch is heading into an “aesthetic deadlock” as Thomson suggests, but given that the genre thrives on the discovery and exploitation of technological “failure,” glitch would undoubtedly benefit from greater interaction with academic, E-Musik culture, where the newest technologies exist. It is, of course, easier to find new technological “failures” if one looks to new, largely unexplored technologies. Thomson is keen to note above that an influence from composers working outside the institutional framework, such as those in glitch, could benefit academic institutions. It is often misunderstanding that leads to the misuse of a technology and thus its extension into areas previously unconsidered, as was true for the technology utilized for the music discussed later in Part II. Those outside the institutional framework are perhaps more likely to misunderstand and misuse those technologies developed within the institutional framework, and this may lead to innovative musics. The music soon to be discussed represents an attempt to expand what Thomson calls the “limited vocabulary” of glitch, the “new microsound,” by exploring frequency domain granulation with a tool capable of—but not expressly intended for—this purpose. New approaches to frequency domain granulation, as discussed soon, allow not only for the procurement of new sounds, but new musical processes as well. It was a misuse of technology, adopted in the defiant nature of the glitch aesthetic, that allowed for a new technique to develop throughout the writing of this thesis. The compositions to be presented share many analogies with the work of British artist Cornelia Parker, which itself has much to do with the glitch aesthetic in its recurrent theme of deconstruction.
Parker is perhaps best known for her exploded shed in Cold Dark Matter: An Exploded View (1991), shown below:
Fig. II.a: Cornelia Parker’s Cold Dark Matter: An Exploded View (Parker 2005, 78).
27 Perhaps it is an extant lack of approval directed from each side of the E-/U-Musik divide toward the other that serves as the most significant barrier to a greater synergy between the two musical worlds.
Parker briefly discusses her work in the following:
I resurrect things that have been killed off…. My work is all about the potential of materials—even when it looks like they’ve lost all possibilities (ARTseenSOHO).
Parker describes what she refers to as the “exploded view” above as “another clichéd cartoon death,” and cites her motivation for the work as stemming from “seeing explosions on the news” and a realization that her firsthand knowledge of explosions was quite limited, that she had “never walked through the detritus [emphasis added] of a bombed-out building” (Parker 2000, 57). As is true for the music to be discussed later in Part II, the process and the finished work in Cold Dark Matter are inextricable. For the work, Parker collaborated with the British armed forces, which performed a controlled explosion on the shed at the army’s School of Ammunition (Parker 2000, 22). The destroyed, burnt, and mangled pieces of the shed were taken to a gallery and suspended from thin wires around a single bare light bulb (which once hung inside the original shed). Only after the explosion could the pieces of detritus become a “universe unto themselves” (Parker 2000, 22). Parker’s Cold Dark Matter captures a process of deconstruction, an exploded view, evident in the musical works soon discussed. It is the explosion of a coherent object that yielded the detritus for Parker’s work, the explosion that allowed her to make something uniquely new from that object. The deconstructive processes in the music to be discussed simultaneously yield new musical materials detached from the solid sound object they once comprised—new musical materials appreciable in their own right—as well as musical structures inextricably connected to these new musical materials. The next section discusses Probabilistic Latent Component Analysis (PLCA), which can be thought of as a particular case of frequency domain granulation. PLCA is an algorithm for source separation, a procedure that separates distinct sources from a sound mixture, sounds that our ears can distinguish as causally independent, such as the individual instruments in a string quartet. 
Source separation involves the extraction of certain groupings of frequency components that exist within a sound object, which correspond to individual sources within the overall sound mixture. PLCA is misused for the purposes of this thesis, as it is utilized to extract from the sound object smaller sounds that our ear is inclined to fuse into one. These sounds, the detritus of the “exploded” sound object, are then scattered over time and space to become “a universe unto themselves” (Parker 2000, 22).
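PLCA itself is presented in the next section. As a rough, hedged stand-in for the decomposition it performs, its close relative non-negative matrix factorization (NMF) can be sketched here, factoring a toy magnitude “spectrogram” into per-source spectra and envelopes; the data, rank, and iteration count are all illustrative assumptions, not the procedure used in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram" V (4 frequency bins x 16 frames): two sources with fixed
# spectra and independent amplitude envelopes, mixed additively.
spectra = np.array([[1.0, 0.0],
                    [0.8, 0.1],
                    [0.0, 1.0],
                    [0.1, 0.9]])
envelopes = rng.random((2, 16))
V = spectra @ envelopes + 1e-6

# Multiplicative updates for NMF with a KL-type objective (Lee & Seung 1999).
W = rng.random((4, 2)) + 0.1                  # learned spectral bases
H = rng.random((2, 16)) + 0.1                 # learned temporal activations
for _ in range(300):
    W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)
    H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]

# Each separated "source" is one rank-1 layer of the factorization.
source0 = np.outer(W[:, 0], H[0])
source1 = np.outer(W[:, 1], H[1])
```

Each layer groups frequency components that rise and fall together, precisely the kind of perceptually related grouping that simple bandpass filtering cannot recover when sources overlap in the spectrum.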
Auditory Scene Analysis
The “cocktail party problem,” in which several people are talking simultaneously and the objective is to follow the speech of one individual (Cherry 1953), presents a simple task to the human brain. For DSP, however, the task is a difficult one, and is referred to as source separation. The goal of source separation, then, is to extract individual audio sources from a mixture of sources (Casey, Westner 2001, 3). Source separation is a special case of frequency domain granulation (Roads 2001, xi, 275), as it selects and extracts frequency components from a signal, particularly ones that describe individual audio sources. What constitutes a single source, and what a mixture of sources? Single sources would include isolated speech or a single instrument playing; source mixtures would include speech with background sounds or multiple instruments in a polyphonic ensemble. Each source in a sound mixture consists of a particular set of frequency components that exist throughout the larger signal, each of which has an amplitude envelope dictating how it evolves over time. It is important to note that distinct sources in a mixture will likely overlap each other in different parts of the spectrum, and thus cannot be separated by simple bandpass filtering. As discussed in the next section, source separation begins with an auditory scene analysis, which decomposes a signal into “fundamental representations” (Casey, Westner 2001, 3). Further steps must be taken to identify groups of related frequency components that form individual perceptual auditory streams, or perceptually relevant groupings of frequency components that together describe a single source in a sound mixture (Casey, Westner 2001, 3). Canadian psychologist Albert Bregman, a pioneer in auditory perception who coined the term auditory scene analysis (ASA), summarizes the problem of ASA in the following:
Although we need to build separate mental descriptions of the different sound-producing events in our environments, the pattern of acoustic energy that is received by our ears is a mixture of the effects of the different events (Bregman 1990, 641).
In other words, independent sounds that occur simultaneously in our surrounding environment reach our ears as a single pressure wave. How does our auditory mechanism distinguish between the different sounds? Bregman claims that it does so in two main ways. One is through schemas that govern the listening process, which make use of our knowledge of familiar sounds (Bregman 1990, 641). The other is through primitive processes of auditory grouping. Bregman focuses on the latter. Primitive processes, according to Bregman, must first break down the incoming sound signal into many separate analyses, which “are local to particular moments of time and particular frequency regions in the acoustic spectrum” (Bregman 1990, 641). Each region of the time-frequency spectrum is analyzed in terms of “its intensity, its fluctuation pattern, the direction of frequency transitions in it, an estimate of where the sound is coming from in space, and perhaps other features” (Bregman 1990, 641). It is only after it conducts these numerous separate analyses that our auditory system must decide how to group the various parts of the incoming signal as having come from certain environmental sources (Bregman 1990, 642). This grouping occurs along two dimensions: time and frequency (Bregman 1990, 642). Bregman calls the temporal grouping sequential integration and the spectral grouping simultaneous integration (Bregman 1990, 642). These two forms of grouping “often operate in conjunction to solve the [scene analysis] problem,” and are promoted by similar factors (Bregman 1990, 642).

1.1 Sequential Integration
Features definitive of the similarity and continuity of successive sounds favor the grouping of a sequence of auditory inputs, and include fundamental frequency, temporal proximity, spectral shape (i.e. timbre), intensity, and spatial origin (Bregman 1990, 649). Bregman notes:
My description has seemed to imply that the things that group sequentially can be thought of as sounds. This was because the examples that I have given were all stated in terms of rather simple sounds rather than in terms of mixtures of sound. We find that the same factors serve to promote the sequential grouping of sound in mixtures, but, in this case, it is not whole sounds but parts of the spectrum that are caused to group sequentially. The resulting grouping helps the brain to create separate descriptions of the component sounds in the mixture (Bregman 1990, 649).
These “things that group sequentially,” which “can be thought of as sounds,” as opposed to sound mixtures, relate to causally coherent Schaefferian sound objects, as discussed in more detail in Part III. For now, suffice it to say that while Schaeffer obtained his sound object by isolating it from others in time, these “sounds” above, to be perceived alone, would have to be isolated from others in terms of frequency, that is, by separating them from the spectral context of the sound mixture. Bregman describes each of these main factors that promote sequential integration in more detail. Frequency separation and temporal distance are the most important factors in segregating simultaneous streams of successive sounds (Bregman 1990, 643). If two subsets of sounds are first heard as a single stream of sound, moving the subsets apart in frequency will result in an increased segregation, making it more difficult to hear the full sequence as a single stream (Bregman 1990, 643). Further, as the speed of the entire sequence increases, the segregation increases further (Bregman 1990, 643). Bregman notes that the “stream-forming process” behaves analogously to the Gestalt principle of grouping by proximity: a sequence of sounds closer to each other in frequency tends to be heard as a single stream, and the more closely they follow each other (i.e. the smaller the amount of time between them), the stronger will be their integration by our auditory mechanism into a single stream (Bregman 1990, 643-644). Natural sounds contain many frequencies, one of which may be a fundamental frequency that helps to integrate them into a stream of other natural sounds of the same or relatively close fundamental frequency. If sounds in a succession do not have fundamental frequencies contributive to their grouping, they still may bear a timbral similarity that integrates them into the same stream
(Bregman 1990, 646). While pure tones (i.e. sine waves) cannot be differentiated from each other timbrally (as they each have identical, minimal timbres), complex sounds may differ in timbre and undergo grouping by our auditory mechanism according to their timbral similarities. Just as frequency relations between various frequency components in a sound mixture may cause our auditory system to integrate or segregate them, so do amplitude relations. However, amplitude relations seem less important than the frequency relations, and the former may only supplement the effect of the latter in the integrations or segregations applied by our auditory system. As Bregman notes, “amplitude differences between sounds will control their grouping,” and “[l]oud sounds will tend to group with other loud ones and soft ones with soft” (Bregman 1990, 648). However, sounds that differ only in amplitude will not necessarily segregate from one another; it is more likely the case that, “when there are also other differences between the sounds, the loudness differences may strengthen the segregation” (Bregman 1990, 648). These “other differences” may be ones of fundamental frequency or timbre, but also spatial ones. Spatial location is, like amplitude, a relatively less significant segregation criterion of the human auditory mechanism—it may only supplement the grouping effects of other sequential integration factors. As Bregman reminds us, humans “can do quite well at segregating more than one stream of sound coming from a single point in space, for example, from a single loudspeaker” (Bregman 1990, 644). Again, the auditory system may use other factors of segregation to distinguish between two spatially proximal sources (e.g. temporal proximity, frequency separation, timbre, amplitude). Still, sounds originating from the same point in space more often lend themselves to sequential integration by our auditory mechanism than sounds that do not.
This point, in conjunction with the fact that other factors may segregate sounds that nevertheless derive from the same spatial origin, illuminates a certain “competition” that may arise within our auditory system:
Illusions can be created by setting up a competition between the tendency to group sounds by their frequency similarity and by their spatial similarity (Bregman 1990, 644).
Several factors promoting sequential integration may enter into a grouping competition with the spatial similarity factor. This competition is explored in the third piece discussed in Part II.

1.2 Simultaneous Integration
In addition to its ability to group sequences of related sounds, our auditory system can group simultaneities of related sounds. Bregman reiterates that sequential and simultaneous integration are complementary aspects of human ASA, that the “sequential grouping of auditory evidence…is only part of the story” (Bregman 1990, 654). He writes:
In mixtures of sound the auditory system must decide which components, among those that are received concurrently, should be treated as arising from the same sound. This process was studied in simple experiments in which two concurrently presented pure tones, B and C, were alternated with a pure tone, A…. It was found that if B and C started and ended at the same time, they tended to be treated as two components of a single complex tone, BC, that was perceived as rich in quality. On the other hand, there was a tendency to treat B as a repetition of A whenever A was close in frequency to B. B seemed to be the object of a rivalry. When it was captured into a sequential stream with A, it was less likely to be heard as part of the complex tone, BC. Conversely, when it was captured by C and fused with it, it was less likely to be heard as a repetition of A. It seemed that sequential grouping and spectral grouping were in a competition that served to resolve…evidence concerning the appropriate grouping of sensory material (Bregman 1990, 654-655).
So while they often “operate in conjunction” to solve scene analysis problems, sequential and simultaneous integration can be thought of as “competing” to group different pieces of acoustic sensory information (Bregman 1990, 642, 655). Bregman notes that in a spectrogram of a mixture of sounds, spectral content “arriving from one sound overlaps [emphasis added] the components of the remainder of the sound both in frequency and in time” (Bregman 1990, 655). An example of such an overlap is illustrated by Bregman below
to the right with a spectrogram of a mixture of sounds containing the spoken word “shoe” (Bregman 1990, 8). It is juxtaposed with a spectrogram of the word “shoe” spoken in isolation:
Fig. II.b: Above left: “shoe” spoken in isolation. Above right: “shoe” within a mixture of sounds (Bregman 1990, 7-8).
Bregman asks, “How can the auditory system know which frequency components to group together to build a description of one of the sounds?” (Bregman 1990, 655). It does so, Bregman argues, by searching for “correspondences among parts of the spectral content that would be unlikely to have occurred by chance” (Bregman 1990, 655). One type of correspondence lies “between auditory properties of different moments of time,” and thus is related to sequential integration (Bregman 1990, 655). A spectrum of a mixture of sounds “may have, embedded within it, a simpler spectrum that was encountered a moment earlier” (Bregman 1990, 655).28 The simpler spectrum might “abut against the more complex one with no discontinuity,” in which case it might be appropriate to treat the part of the spectrum that matches the earlier, simpler one as “merely a continuation” of the former, and to treat “the later one as resulting from the addition of a new sound to the mixture” (Bregman 1990, 655). Bregman refers to this abutment of distinct sounds within a complex spectrum as the old-plus-new heuristic (Bregman 1990, 393, 655). Because this abutment is a temporal one, Bregman notes that the old-plus-new heuristic “is just another manifestation of the principles that control sequential grouping” (Bregman 1990, 655). Yet sequential integration and simultaneous integration are difficult to consider independently because, as mentioned, they seem to compete with one another: the stronger the sequential integration within a sequence of sounds, the less likely the sequence is to be simultaneously integrated with another sequence of sounds with a strong sequential integration. Grouping part of an auditory event with an earlier event depends on their similarity. Two factors we have seen in sequential integration influence this similarity: frequency separation and temporal proximity (Bregman 1990, 655).
A sequence of sounds close in frequency and separated by short silences will likely be integrated sequentially and segregated from another sequence of sounds that joins it simultaneously. When one independent sequence of sounds abuts against another that shares some of the same frequency components (as is often the case), amplitude may serve as an important factor in the segregation of these two distinct sequences from each other. Bregman writes:
There is even some evidence that the auditory system uses the amplitudes of the various spectral components of the earlier spectrum to decide not only which spectral components to subtract out but also how much intensity to leave behind at each frequency. This is a good strategy because the old and new sounds might have some frequency components that are the same. The subtraction (or a process roughly equivalent to subtraction) provides an estimate of the probable intensity of the frequency components of the sound that has been added to the first to create the complex spectrum (Bregman 1990, 655-656).
This idea of multiple spectra existing within one complex spectrum is an important concept revisited soon.
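Bregman’s subtraction idea can be made concrete with a toy sketch. The function below is a hypothetical illustration (not a model of the auditory system, and not code from any cited source): it treats spectra as amplitude vectors and estimates the newly added sound by clipped subtraction of the earlier, simpler spectrum from the complex one.

```python
import numpy as np

def old_plus_new(earlier, mixture):
    """Toy sketch of the old-plus-new heuristic: treat the part of the
    complex spectrum matching the earlier, simpler spectrum as a
    continuation, and estimate the added sound by clipped subtraction."""
    return np.maximum(mixture - earlier, 0.0)

old = np.array([1.0, 0.5, 0.0, 0.0])  # simpler spectrum heard a moment earlier
mix = np.array([1.0, 0.8, 0.6, 0.0])  # complex spectrum: old sound plus a new one
new = old_plus_new(old, mix)          # estimated spectrum of the added sound
```

Because the old and new sounds may share frequency components, the subtraction leaves behind only the intensity unaccounted for by the earlier spectrum, as in Bregman’s description.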
That perceptual streams of sound within a mixture of sounds can share some of the same frequency components explains why bandpass filtering a sound cannot convincingly separate it into its constituent, perceptually coherent units. This point is revisited in Part III. The above examples involve a sequence of sounds, a simple spectrum, which is first heard independently before being joined by a new sequence of sounds; both sequences together constitute a complex spectrum. Our auditory mechanism can also decide “which components, received at the same time, should be grouped to form a description of a single auditory event” (Bregman 1990, 656). Again, relations between the many frequency components in the spectrum contribute to such groupings, and their effect “is to allow global analyses of factors such as pitch, timbre, loudness, and even spatial origin to be computed on a set of sensory evidence that probably all came from the same event in the environment” (Bregman 1990, 656). First, the farther the simultaneously sounding components are separated in terms of their frequencies, the more likely they are to be segregated into distinct sounds (Bregman 1990, 656). However, frequency components that are harmonics, or components with frequencies that are integer multiples of the same fundamental frequency, tend to be grouped by our auditory system even if they are separated distantly in frequency (Bregman 1990, 656). Bregman refers to this as the “harmonicity principle,” and describes it in the following:
Its “utility follows from the fact that when many types of physical bodies vibrate they tend to generate a harmonic spectrum in which the partials [emphasis added]29 are all multiples (approximately) of the same fundamental. Instances include many animal sounds, including the human voice. Therefore if the auditory system can find a certain number of fundamentals that will account for all of the partials that are present, then it is very likely that we are hearing that number of environmental sounds” (Bregman 1990, 656).
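The harmonicity principle lends itself to a simple computational caricature: if a small set of candidate fundamentals accounts for all the partials present, each partial can be assigned to the fundamental of which it is (approximately) an integer multiple. The function name and tolerance below are illustrative assumptions, not Bregman’s.

```python
def group_by_harmonicity(partials, candidates, tol=0.03):
    """Caricature of the harmonicity principle: assign each partial to the
    first candidate fundamental of which it is (nearly) an integer
    multiple. The tolerance is an arbitrary illustrative value."""
    groups = {f0: [] for f0 in candidates}
    for p in partials:
        for f0 in candidates:
            ratio = p / f0
            if round(ratio) >= 1 and abs(ratio - round(ratio)) < tol:
                groups[f0].append(p)
                break
    return groups

# Two interleaved harmonic series, as from two simultaneous voices
partials = [100, 130, 200, 260, 300, 390, 400, 520]
groups = group_by_harmonicity(partials, candidates=[100, 130])
```

With these inputs the eight interleaved partials fall cleanly into two harmonic series, suggesting two environmental sounds, exactly as Bregman’s description predicts.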
The relative intensities of the various partials in a mixture also affect their simultaneous integration or segregation, and a partial that is louder relative to others that surround it in frequency space is not likely to be grouped with them (Bregman 1990, 393). Bregman argues that the Gestalt principle of common fate seems to be present in spectral, or simultaneous integration (Bregman 1990, 394, 657). He remarks:
The Gestalt psychologists discovered that when different parts of the perceptual field were changing in the same way at the same time, they tended to be grouped together…because of their common fate. A visual example can be made by drawing two clusters of dots, each on a separate transparent sheet. When the two are superimposed, we see only one denser set of dots. However, if the two sheets are moved in different patterns, we see two sets of dots, each set defined by its own trajectory of motion (Bregman 1990, 657).30
The principle of common fate applies to audition in that “correlated changes in the frequencies of different partials or in their amplitudes” will group these partials together (Bregman 1990, 657). Regarding frequency, Bregman takes speech as an example and notes that pitch variations in the human voice involve similar changes in all frequency components—with a rise in pitch, the change in fundamental frequency is accompanied by a proportional movement in the same direction of all its harmonics (Bregman 1990, 657). “It is plausible to believe,” Bregman writes, “that this correlated change, if it could be detected auditorily, could tell us that the changing partials all came from the same voice” (Bregman 1990, 657). Bregman states that common fate also helps the auditory system to detect and group “synchronized amplitude changes in different parts of the spectrum” (Bregman 1990, 658). Because the many partials emitted by a single sound source often start and stop simultaneously, and those emitted by different sources tend not to, synchronized amplitude changes can help “to partition the set of frequency components derived from a mixture of sounds” (Bregman 1990, 658). Thus, distinct sources within a mixture, which may likely share the same frequency components, can be segregated by our auditory mechanism based on their amplitude characteristics. Although the overall
29 As mentioned briefly in the Preface, a partial is simply a frequency component of a tone, which is not necessarily related harmonically to a fundamental frequency (Bregman 1990, 733).
30 This same visual example of common fate is directly related to the third piece discussed at the end of Part II.
trajectories of amplitude envelopes of different frequency components may be important in integrating or segregating them, these frequency components’ onsets—the very instant when they begin to rise sharply in amplitude—are also a critical factor. When sounds have synchronous onsets, they tend to be grouped together; when they have asynchronous onsets, they are more likely to be segregated. This is an important principle of auditory perception exploited in the pieces discussed later in Part II. Onset synchrony encourages simultaneous sounds to fuse, though fusion does not always occur (e.g. in the case of onset-synchronous sounds with very distant and harmonically unrelated frequencies, or onset-synchronous sounds with different timbres). Bregman states that frequency components become fused, and no longer separately audible, when our auditory mechanism assigns them to the same analysis (Bregman 1990, 660). Fusion is related to yet distinct from masking, in which frequency components also “become less audible in the presence of other ones” (Bregman 1990, 660). Bregman distinguishes between fusion and masking in the following:
Masking and fusion differ in the tasks used to measure them. A sound is deemed to be masked if there is no way to tell whether it is present or absent in a mixture of sounds. It need not be audible as a separate sound. On the other hand, it is considered to be fused if it is not audible as a separate sound even if you can tell whether it is on or off by some change that it induces in the perceived quality of the spectrum (Bregman 1990, 660-661).
Thus, fusion implies that constituent sounds comprising a larger sound may not be “heard out” even though they influence our perception of the larger sound. Masking implies that certain sounds that exist within the spectrum are simply “covered up” by others, and that the former have a negligible influence on the larger sound. Nevertheless, fusion and masking are closely related in that, in both cases, some “[frequency] components of a complex auditory mixture lose their ability to be heard individually [emphasis added]” (Bregman 1990, 661). It is important to note that segregative factors help prevent frequency components from being masked by others. For example, it is difficult to mask one sound by another that does not start synchronously with it or by one that comes from a different spatial location (Bregman 1990, 661). Spatial location is the final important factor of simultaneous integration. Spectral components that derive simultaneously from the same spatial location tend to be segregated from those that come from different spatial locations (Bregman 1990, 659). However, as Bregman notes, “this strategy is not infallible,” as distinct acoustic events can occur in close proximity, and distant acoustic events can sound closer than they actually are due to the “reflection of sound waves from nearby surfaces” (Bregman 1990, 659). He writes:
This may be why spectral organization does not depend too strongly on spatial cues. A person can do a creditable job at segregating concurrent sounds from one another even when listening to a monaural recording. Spatial evidence is just added up with all the other sorts of evidence in auditory scene analysis (Bregman 1990, 659).
Nevertheless, as in sequential integration, while spatial location may not be an “overwhelming” cue for partitioning the spectrum, it may still be a strong one. Spatialization techniques are used in the third piece discussed later in Part II to explicitly prevent acoustic fusions from occurring. In the next section, we shall discuss a new tool, Probabilistic Latent Component Analysis, which allows for the decomposition of the sound object into perceptually relevant streams. These streams, when perceived alone, can be heard quite independently from the sound object within which they formerly existed. Special thanks are owed to Michael Casey, who introduced the author to PLCA and wrote an implementation of the PLCA algorithm in MATLAB, which allowed for the undertaking and completion of this thesis.
Probabilistic Latent Component Analysis
Recall the prediction of Roads in Microsound that “more sophisticated…[frequency component] selection principles…[would] become available in the future, such as selecting harmonically-related [frequency components]” (Roads 2001, 275).31 New auditory stream formation algorithms based on Bregman’s work “use perceptually-motivated heuristics, such as common onset of harmonically-related components” to automatically perform separations (Casey, Westner 2001, 3). While the selection of frequency components based on harmonic relation may entail a more sophisticated approach than ones based on an amplitude or length relation as discussed by Roads (Roads 2001, 275), it may only mark a primary step in algorithmic ASA procedures. PLCA is one such algorithm for ASA (Raj et al. 2006). While a technical discussion of PLCA is beyond the scope of this thesis, its basic concepts are addressed here. PLCA is a tool for the analysis and decomposition of acoustic spectra that allows for source separation in addition to feature extraction, source recognition, and denoising (Raj et al. 2006, 1). It differs from other source separation techniques in that it maintains attributes of the original input sound by extracting actual spectra and time envelopes from that sound, that is, by constructing new, smaller spectrograms derived from the original spectrogram (Raj et al. 2006, 1; Casey, Westner 2001). PLCA views the spectrogram “to be a sort of histogram which measures the amount of time-frequency ‘sound quanta’ at each point” (Raj et al. 2006, 2), and thus much like a Gabor matrix. PLCA considers the spectrogram as a distribution of acoustic energy over the time-frequency plane, and applies statistical techniques directly to an input signal to extract marginal distributions over time and frequency (Raj et al. 2006, 2).
These marginal distributions are either time marginals, which describe the temporal elements in the original spectrogram, or frequency marginals, which describe the spectral characteristics of the original spectrogram. PLCA combines corresponding time and frequency marginals, which are actual amplitude envelopes and spectra respectively, to describe the temporal evolution and frequency profile of all the independent sources within a sound mixture (Raj et al. 2006, 3). Time and frequency marginals extracted using PLCA are shown below:
Fig. II.c: PLCA applied to a drum loop sample to extract five latent components. Panels show the input time/frequency distribution (drum loop) alongside the extracted frequency marginals and extracted time marginals (Raj et al. 2006, 3).
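The decomposition that produces such marginals can be sketched compactly. The numpy toy below implements the symmetric PLCA model described by Raj et al., P(f, t) ≈ Σz P(z) P(f|z) P(t|z), with expectation-maximization updates. It is a didactic sketch under that published model, not SoundSplitter or the authors’ code, and all names are the present illustration’s own.

```python
import numpy as np

def plca(V, n_components, n_iter=200, seed=0):
    """Didactic EM sketch of symmetric PLCA: model the normalized
    spectrogram as P(f, t) ~= sum_z P(z) P(f|z) P(t|z), where the columns
    of Pf are frequency marginals and the columns of Pt time marginals."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P = V / V.sum()  # treat the spectrogram as a distribution of energy
    Pz = np.full(n_components, 1.0 / n_components)
    Pf = rng.random((F, n_components))
    Pf /= Pf.sum(axis=0)
    Pt = rng.random((T, n_components))
    Pt /= Pt.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior over components, P(z|f,t), shape (F, T, Z)
        joint = Pz[None, None, :] * Pf[:, None, :] * Pt[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight the marginals by posterior responsibility
        W = P[:, :, None] * post
        Pz = W.sum(axis=(0, 1))
        Pf = W.sum(axis=1) / (Pz + 1e-12)
        Pt = W.sum(axis=0) / (Pz + 1e-12)
    return Pz, Pf, Pt

# Toy "mixture": a low-frequency source on even frames, a high one on odd frames
V = np.zeros((6, 8))
V[:3, ::2] = 1.0
V[3:, 1::2] = 1.0
Pz, Pf, Pt = plca(V, n_components=2)
recon = (Pf * Pz) @ Pt.T  # summing the latent components reconstructs P(f, t)
```

Each column of Pf pairs with the corresponding column of Pt, and their weighted outer products sum back to the original distribution, which is the additivity property discussed below.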
Each combination of associated frequency and time marginals corresponds to multiple frequency components and their respective amplitude envelopes that form an actual time-frequency spectrogram. In this regard, PLCA draws much from Independent Subspace Analysis (ISA) to extract “maximally contrasting features from a single mixture” (Casey, Westner 2001, 4). ISA itself is based on the Ph.D. dissertation of Michael Casey, who offered the first approach to decomposing single spectrograms into new separate spectrograms, which he calls subspaces (Casey 1998).
31 Roads refers to frequency components here as “tracks” (Roads 2001, 275).
These separated spectrogram subspaces consist of static, multi-frequency component groupings of the input signal “that contain independent source elements of the input mixture” (Casey, Westner 2001, 9). Thus, each spectrogram corresponds to a subspace, which itself corresponds to a particular source in a sound mixture. The spectrogram of a given subspace is inverted using the ISTFT to obtain the source-separated audio corresponding to individual elements within a sound mixture (Casey, Westner 2001, 5). Spectrogram subspaces obtained through PLCA by combining time and frequency marginals from a mixture of speech and chimes are shown below to the far left and right as “Extracted Speech” and “Extracted Chimes” (Raj et al. 2006, 5). The middle spectrogram corresponds to the original mixture and is not a subspace:
Fig. II.d: Source-separated spectral subspaces obtained with PLCA. Panels (left to right): extracted speech, the original speech/chimes mixture, extracted chimes (Raj et al. 2006, 5).
The source-separated signals represented by the spectrogram subspaces above are comprised of latent components (Raj et al. 2006, 1). A single latent component consists of multiple frequency components whose semantic or timbral relation within the input signal is not necessarily observable before closer analysis and organization into the latent component via the PLCA algorithm (hence the “latent” descriptor). The semantic relation between the frequency components that constitute a latent component thus exists, but the realization of its existence requires analysis. The latent aspect of PLCA is derived from Latent Semantic Analysis/Indexing (LSA/LSI) and its derivative Probabilistic Latent Semantic Analysis/Indexing (PLSA/PLSI), which are used to correlate semantically-related terms that are latent in a potentially large collection of texts (Hofmann 1999). It is important to note that PLCA, like PLSA/PLSI, must consider the context of the data it analyzes. Because multiple words may have multiple meanings (polysemy) and multiple words may share the same meaning (synonymy), PLSA/PLSI must determine patterns in the relationships between words in the context of the body of texts it analyzes. Similarly, PLCA must determine patterns in the relationships between frequency components in the spectral context of the signal it analyzes. Latent components are additive in that all the latent components extracted from a signal should sum to reconstruct the original spectrum (Raj et al. 2006, 4). They are also semantically useful in that, when extracted from a given sound mixture, they should contain meaningful features indicative of a particular thematic source within the sound mixture (Raj et al. 2006, 1).
For example, by applying PLCA to a recording of separate instruments playing simultaneously in a string quartet, one could represent each of the four instruments independently with four distinct latent components, or more accurately with four clusters of relatively sparse, related latent components. Summing the four distinct latent components reconstructs the original signal containing all four instruments. The basic method of PLCA is depicted below to the left, albeit highly simplified (compare to Fig. II.d above). It demonstrates a hypothetical case of source separation applied to a sound mixture containing four distinct sources, each described by multiple frequency components. Although the illustration resembles the depiction of Xenakis’ book of screens offered in Part I and depicted again below to the right, the “screens” in PLCA represent spectral subspaces and not moments in time as they do for Xenakis. According to Xenakis, his book of screens “equals the life of a complex sound,” implying that it represents a temporal evolution. The PLCA “book of screens” depicted
below to the left instead represents the summation of individual subspaces over frequency that constitutes the complex sound from which they were extracted via PLCA:
Fig. II.e: Above left: PLCA applied to a sound object to obtain subspaces (i.e. latent components), compared to Xenakis’ screens (above right). The PLCA operation is depicted in highly simplified form (Casey, Westner 2001, 4).
Note how the same frequency components can exist in two different subspaces, and thus in two different latent components. As different sources in a mixture will inevitably share many of the same frequencies, one frequency component can describe multiple sources in a source mixture. This is shown more clearly below. Also, the number of individual frequency components that comprise any latent component can vary between latent components because particular sources within a source mixture may vary in their spectral richness. The static group of frequency components that comprises the latent component is amplitude modulated in a fashion determined by its time marginals. The amplitude dimension now replaces the spectrogram subspace dimension on the z-axis:
Fig. II.f: Left side of Fig. II.e including amplitude dimension. Latent components contain a static set of frequency components that are amplitude modulated according to their time marginals.
It should be noted that the number of latent components extracted from a signal with PLCA determines the accuracy of resynthesis: the greater the number of latent components extracted, the more accurate the resynthesis. Small numbers of latent components will often be averaged, giving a rough description of the input signal. Large numbers of components yield more detailed information that uses multiple distributions to describe one source (Raj et al. 2006, 3). Extracting and summing fifty latent components (i.e. playing them back simultaneously) from a sample (e.g. of a string quartet) resynthesizes that sample more accurately than would, for instance, four latent components, but less accurately than would one hundred latent components. A larger number of latent components extracted implies a greater fragmentation of the original signal, and thus a greater spectral sparsity of each latent component. If more latent components than the number of sources in a source mixture are extracted with PLCA, the multiple latent components that correspond to each source must be clustered together after extraction to sufficiently describe each source. Each of these latent components, however, corresponds to a certain subspace of timbral characteristics that exists even within a single source—that is, a non-mixture—within the larger sound mixture. This raises the question, addressed in Part III, of what may constitute a sound mixture; it is also an important aspect of PLCA exploited in the compositions discussed later in Part II. Like ISA, PLCA is intended for automatic music analysis and machine listening. It allows for feature extraction, its most straightforward application, by extracting time and frequency marginals from an input signal and combining them to yield latent components. Again, these latent components, which correspond to individual spectrogram subspaces, describe the independent elements within an audio scene. Feature extraction is a prerequisite for other PLCA applications. PLCA allows for source recognition by employing a pre-learned set of frequency marginals for each sound class comprising a given mixture. If such a mixture is assumed to be composed of the pre-learned frequency marginals with differing time marginals, PLCA can be performed on the mixture to estimate the time marginals by holding the input distribution’s frequency marginals to the pre-learned set of frequency marginals. The time marginals in turn reveal how much of each sound source’s frequency marginals are present at any time throughout the input distribution (Raj et al. 2006, 3-4).
Source separation extends the concept of source recognition beyond simply determining when certain frequency marginals are present and to what extent in a sound mixture. If frequency marginals are again pre-learned, individual sources can be separated from a sound mixture.32 Again, PLCA can be used to estimate time marginals corresponding to the different frequency marginals. Combining only the time and frequency marginals from a single source will yield a reconstruction of only that source, separated from the original mixture (Raj et al. 2006, 4). Denoising is quite similar to source separation. If the frequency marginals of a single source in a noisy mixture are already known (i.e. pre-learned), PLCA can be performed holding the known frequency marginals of the desired single source fixed. Computed time marginals corresponding to all frequency marginals in the noisy sound mixture can then be separated into time marginals associated with the frequency marginals of the single source to be extracted and another set of time marginals associated with all other sounds in the noisy mixture. The time and frequency marginals corresponding only to the source to be extracted can then be combined to reconstruct that source without unwanted noise (Raj et al. 2006, 5). The PLCA algorithm conducts a sophisticated form of frequency domain granulation. As a form of frequency domain granulation, PLCA conducts a spectral decomposition of a signal that yields sounds containing fewer frequency components, fewer sinusoids, as compared to the original signal. This spectral decomposition entails an informational decomposition, in which pieces of acoustic information from the original signal are allocated to different latent components.33 Depending on the degree of spectral decomposition or fragmentation of the original signal, this informational decomposition may render a recognizable source unrecognizable in a single latent component.
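The recognition and separation procedures just described can be sketched by fixing the frequency marginals and running the EM updates only for P(z) and the time marginals. The toy below (illustrative names and data, not the implementation of Raj et al.) separates one source from a trivially constructed mixture.

```python
import numpy as np

def estimate_time_marginals(V, Pf, n_iter=50):
    """Sketch of PLCA source recognition/separation: hold pre-learned
    frequency marginals Pf (one column per sound class) fixed and update
    only P(z) and the time marginals P(t|z)."""
    F, T = V.shape
    Z = Pf.shape[1]
    P = V / V.sum()
    Pz = np.full(Z, 1.0 / Z)
    Pt = np.full((T, Z), 1.0 / T)
    for _ in range(n_iter):
        joint = Pz[None, None, :] * Pf[:, None, :] * Pt[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        W = P[:, :, None] * post
        Pz = W.sum(axis=(0, 1))
        Pt = W.sum(axis=0) / (Pz + 1e-12)  # frequency marginals stay fixed
    return Pz, Pt

# Pre-learned spectra: class 0 occupies the low bins, class 1 the high bins
Pf = np.array([[0.5, 0.0],
               [0.5, 0.0],
               [0.0, 0.5],
               [0.0, 0.5]])
V = np.zeros((4, 6))
V[:2, :3] = 1.0  # source 0 active in the first half
V[2:, 3:] = 1.0  # source 1 active in the second half
Pz, Pt = estimate_time_marginals(V, Pf)
# Combining only source 0's marginals reconstructs source 0 alone
source0 = V.sum() * Pz[0] * np.outer(Pf[:, 0], Pt[:, 0])
```

The estimated time marginals reveal when each pre-learned class is active, and combining a single class’s marginals yields that source separated from the mixture, as the text describes.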
Informational decomposition has much to do with acousmatics, an important aspect of musique concrète, as discussed further in Part III. To the extent that it is intended for sound mixture applications, PLCA was misused for the compositional purposes of this thesis. It was utilized not to obtain individual sources within sound mixtures, but to obtain perceptually relevant, spectral building blocks (i.e. subspaces) of non-mixture
32 While employing a pre-learned set of marginals can improve the quality of the source recognition or source separation, doing so is not necessary.
33 Sampling (separating the sound object) and granulation involve a temporal decomposition, which entails an informational decomposition in the time domain.
sound objects for primary musical material. For example, while a sample of isolated spoken voice is typically thought of as a non-mixture, there are multiple potentially musical sounds deep within that “non-mixture” speech sound object, such as sibilants, vowels, and the interesting moments of transitions between them that are subsumed by the larger sound object they comprise. Bregman himself describes the voice as having “all kinds of sounds in it” (Bregman 1990, 646). PLCA allowed for the mining of the “non-mixture” sound object for musical material, for sounds that had existed in the sound object all along but which had not yet been “emancipated.” The misuse of PLCA also allowed for the discovery of a novel process deeply connected with the sound material obtained with it, illustrated succinctly below:34
Fig. II.g: “Exploded view” of the sound object, allowed for by PLCA and latent component misalignment.
This type of process, which manipulates the alignment and potential onset synchronies of frequency components within a signal, exploits the rhythmic qualities inherent in each of the many components that comprise that signal. It also allows for the creation of Klangfarbenmelodie, or timbre melody, as a process that runs parallel to that of the deconstruction of the original sound object. Further, it allows for the acousmaticization and deacousmaticization of sound objects, as discussed in Part III. As it derived from the misuse of a new technology and served largely as a response to beat-oriented techno, the music presented in this thesis is related to the glitch aesthetic, the “new microsound” (Thomson 2004, 211). Because PLCA is quite new, it offered the opportunity for the untapped misuse of technology, something composers of glitch constantly seek. Aside from its misuse, the “failure” of PLCA to perfectly reconstruct an input signal from its latent components was exploited. Certain artifacts of PLCA certainly add to the “glitchiness” of the sounds present in the compositions to be described, and contribute to the rhythmic qualities of the music. The music to be discussed in this thesis, like much of glitch, has everything to do with the deconstruction of the sound object. Like a repaired broken vase, a reconstructed sound will almost inevitably exhibit remnants of its formerly deconstructed state. There is perhaps something beautiful about this imperfect “rebirth,” or “resurrection,” of the sound object. As Cascone writes, “new techniques are often discovered by accident or by the failure of an intended technique or experiment” (Cascone 2000, 13). The discovery of the new often involves a misuse of preexisting tools, often due to some sort of ignorance. It must be admitted that the algorithm, the “tool” at the heart of this thesis—PLCA—was largely misunderstood by the author upon its first implementation.
Nevertheless, it was perhaps this ignorance that led to its exploration in ways that would not otherwise have been undertaken. While many of the capabilities of PLCA proved superfluous for the purposes of this thesis, those capabilities stem from a particular design that was essential to the character of the new sounds exploited in the following works.
PLCA Applications in Music
SoundSplitter, the PLCA MATLAB implementation by Michael Casey, was applied to individual sound objects in three separate pieces. The only sounds used for the music in this thesis, aside from the synthesized ones used in the first piece discussed on the following page, were excavated via PLCA from the sound objects to be described (all of which were less than 2 s in length). All extracted latent components were arranged in the digital audio workstation Ableton Live.
See Fig. II.a, page 23.
3.1 Onomatoschizo
The musical phasing techniques of American composer Steve Reich helped to inspire the music presented in this thesis, and two of Reich’s phasing pieces that employ speech, It’s Gonna Rain and Come Out, particularly informed the composition of Onomatoschizo. Just as these pieces do, Onomatoschizo exploits the rhythms inherent in speech. However, while Reich’s phasing techniques create rhythms by employing two identical audio samples played back at slightly different speeds (Reich 2002, 20), Onomatoschizo creates rhythms using different time alignments of unmodified, spectrally and morphologically distinct latent components extracted from a single audio speech sample using PLCA. While It’s Gonna Rain and Come Out work between two identical sound objects to exploit the rhythm shared by each, Onomatoschizo exploits the multiple rhythms existing inside a single sample of speech, the very rhythms that help to comprise its timbral character. Coincidentally, just as Reich realized the musical potential of phasing “by accident,” so did the author discover the particular technique employed in Onomatoschizo and the greater potential for source separation and PLCA (Reich 2002, 20). It was actually a “failure” of technology that revealed to Reich the potential of phasing, “two inexpensive tape recorders…[that] happened to be lined up in unison…[until] one of them gradually started to get ahead of the other” (Reich 2002, 20-21). It was more a misuse of technology than a “failure” that led to the procurement of the materials and the development of a process for Onomatoschizo. Again, source separation is intended for the extraction of semantically meaningful components that describe independent sources within a sound mixture. PLCA was (mis)applied to a (seemingly) non-mixture sound object, a human voice. As Xenakis noted, “[a]ll sound…is conceived as an assemblage of a large number of elementary sounds adequately disposed in time” (Xenakis 1992, 43).
Again, all sound can also be conceived as an assemblage of such elementary sounds disposed in frequency space. The “non-mixture” sound object of the human voice employed in Onomatoschizo in fact contains many distinct sounds, namely certain manners of articulation: the various types of consonants and vowels. These distinct sounds that exist in almost any case of human speech are composed of certain layers of frequency components. These sound layers have their own rhythmic characteristics and interact with each other in a natural type of counterpoint to form the overall “piece” of a spoken phrase.35 Onomatoschizo obtains its vocal sound material from a short sample extracted from an interview with the American composer Jon Appleton. The sample, a non-mixture, originally contained the female interviewer’s words, “not to mention the bloops and bleeps of most avant-garde computer music,” but was cropped to contain only “the bloops and bleeps” in order to restrict an otherwise cumbersome amount of material. This sample lasts for the duration of a half measure in common time at about 140 beats per minute, and its extracted latent components in Onomatoschizo, once introduced, are looped every half measure for a certain duration. Source separation was applied to the non-mixture audio sample using SoundSplitter, written by Michael Casey in the numerical computing environment MATLAB. After several trial resyntheses with various numbers of components, it was determined that at least 30 components were needed for a convincingly accurate resynthesis of the original audio. Resynthesis accuracy was important for the aim of the piece, which was to hint at and eventually reveal the semantic content of the original audio data by gradually aligning its components. Component numbers higher than 40, however, proved to yield diminishing returns in terms of resynthesis accuracy, as well as an undesirably large amount of sound material with which to work.
A number of latent components between 30 and 40 was chosen arbitrarily: 31. All 31 components are equal in duration (just less than 2 s), and when all are played simultaneously and in alignment, they resynthesize the original audio to reveal its speech content: “the bloops and bleeps.”
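Although SoundSplitter itself is a MATLAB implementation, the kind of decomposition it performs can be sketched in a few lines of Python. The following is an illustrative reimplementation, not Casey’s code: a magnitude spectrogram V is treated as a joint distribution over frequency and time and factored into K latent components, each defined by a frequency marginal P(f|z), a time marginal P(t|z) (the component’s rhythmic envelope), and a weight P(z), via expectation-maximization:

```python
import numpy as np

def plca(V, K, iters=300, seed=0):
    """Minimal 2-D PLCA via EM: V[f, t] ~ sum_z Pz[z] * Pf[f, z] * Pt[t, z]."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    p = V / V.sum()                               # spectrogram as a joint distribution
    Pz = np.full(K, 1.0 / K)                      # component weights P(z)
    Pf = rng.random((F, K)); Pf /= Pf.sum(axis=0) # frequency marginals P(f|z)
    Pt = rng.random((T, K)); Pt /= Pt.sum(axis=0) # time marginals P(t|z)
    for _ in range(iters):
        # E-step: posterior over latent components for every (f, t) bin
        joint = Pf[:, None, :] * Pt[None, :, :] * Pz          # shape (F, T, K)
        post = joint / joint.sum(axis=2, keepdims=True)
        # M-step: reweight the marginals by the posterior-weighted data
        w = p[:, :, None] * post
        Pz = w.sum(axis=(0, 1))
        Pf = w.sum(axis=1) / Pz
        Pt = w.sum(axis=0) / Pz
    return Pz, Pf, Pt
```

Resynthesizing component z alone amounts to the rank-one product Pz[z] * outer(Pf[:, z], Pt[:, z]) taken back to the time domain; summing all K such products approximates the original spectrogram, which is precisely the additive property the pieces described here exploit.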
35 This is an important point revisited in Part III. These rhythms that exist deep within the non-mixture voice can be thought to interact in counterpoint to form what we recognize as speech. Once the alignment, or onset synchronies, of these different rhythms is manipulated, an entirely new counterpoint and global sound results. This new global sound, which nevertheless consists of the same smaller sounds that existed within the original, allows for objective listening to the smaller sounds that comprise speech, free of any semantic distractions.
They are aligned when their looping positions coincide on the time axis, and the more of the 31 components that play simultaneously, the closer the resynthesis approximates the original audio. The components are grouped into three sets, which contain 11, 9, and 11 components respectively. These groupings were chosen intuitively, based on how their constituent components interacted rhythmically when misaligned. Once a component was used in a component set, it could not serve as a constituent component of another component set. These sets serve to structure the piece, as described soon. Lines representing set 1, 2, and 3 components have single, double, and triple layer patterns respectively:
Fig. II.h: Three latent component sets employed in Onomatoschizo, each aligned. When all 31 latent components are played simultaneously and in alignment, they resynthesize the original sound object.
An important feature of the extracted components, that they bear their own rhythmic characteristics, is exploited throughout the piece. As mentioned earlier in Part II, PLCA extracts time and frequency marginals from an input sample. Time marginals indicate when their corresponding frequency marginals are present at any point in the original audio data. As the various extracted components employed in Onomatoschizo have frequency marginals not audible continuously in the original audio sample (i.e. they have fluctuating amplitude envelopes), the components all exhibit their own temporal evolutions, or rhythmic characteristics, as they crescendo into perceptibility and decrescendo into silence. In addition to the components extracted from the speech sample, actual synthesized bloops and bleeps (or at least the author’s perception of what bloops and bleeps should sound like) are employed in the piece. While our discussion of the piece will focus on the arrangement of the speech material, as it alone underwent source separation with PLCA, it should be noted that the arrangement of the bloops and bleeps largely mimicked the patterns applied to the components of speech. The piece employs 31 different pitches, each of which is introduced in temporal proximity to the introduction of one of the 31 components. As is described soon, the components undergo misalignment and alignment processes. The pitches are gradually “misaligned” to form melodies through their onset asynchronies when the voice components are misaligned, and are gradually “aligned” to form chords through their onset synchronies when the voice components are aligned. The same basic process is employed throughout Onomatoschizo, and it exploits the additive property of the PLCA-extracted components, as well as their distinct rhythmic characteristics. Component set 1 (shown below) is introduced one component at a time.
Once a new individual component is introduced, it is looped continuously for a certain duration. Each new individual component is presented out of alignment with the component introduced immediately before it. By this it is meant that the latent components are played simultaneously but with non-coincident looping positions (looping not displayed). Such misalignment prevents the summation of the additive components from revealing the semantic content of the original, decomposed sample. This is because the unaligned components’ time marginals, which indicate the time points in the original audio data where their corresponding frequency marginals are audible or inaudible, are not in the positions relative to each other that they were in the original audio data.
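The additive logic behind this misalignment can be demonstrated numerically. In the toy sketch below (random arrays stand in for latent components; none of this is actual PLCA output), components summed at coincident loop positions reproduce the original signal exactly, while any offset pattern yields a new global sound built from the same material:

```python
import numpy as np

# Toy stand-in: four "latent components" (random arrays, not real PLCA output)
# that sum to an original signal when their loop positions coincide.
rng = np.random.default_rng(1)
components = rng.random((4, 16))
original = components.sum(axis=0)

def mix(components, offsets):
    """Sum looped components, each rotated by its own loop offset (in samples)."""
    return sum(np.roll(c, k) for c, k in zip(components, offsets))

aligned = mix(components, [0, 0, 0, 0])      # resynthesizes the original signal
misaligned = mix(components, [0, 3, 7, 12])  # same material, new global sound

assert np.allclose(aligned, original)
assert not np.allclose(misaligned, original)
```

The rotation by `np.roll` models a looped component restarted at a shifted position: the material is unchanged, only its time marginal’s position relative to the other components moves.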
Fig. II.i: Latent component set 1, introduced one component at a time. The components begin misaligned and are gradually aligned to hint at the semantic meaning of the spoken phrase from which they come. (Axes: spectral subspace vs. time t.)
It is only after all the individual components of component set 1 are introduced that they are gradually brought into alignment (shown above). Even when all 11 components of set 1 are playing aligned simultaneously, however, there are not enough out of the 31 total to closely approximate a resynthesis. At this point, the 11 combined and aligned components only begin to resemble speech. The process of misaligning and aligning components serves the goal of the piece to gradually hint at the semantic content of the original audio, but it also creates rhythmic interest. After all of set 1’s components are aligned, they are gradually taken back out of alignment (not displayed) before being interrupted by the first component of component set 2 (shown below):
Fig. II.j: Component set 2 (on top) interrupts the 11 components of set 1 (not displayed). Once all 9 components from set 2 are playing, set 1 returns (on bottom) with all of its 11 components at once, misaligned (faded area depicts misalignment). Both sets are then gradually aligned together. (Axes: spectral subspace vs. time t.)
This marks the second of three sections comprising the piece. The same basic process described above is repeated, and the remaining 8 components of component set 2 are presented one-by-one, unaligned. This time, however, after all of set 2’s components are presented, all of set 1’s 11 components are simultaneously and abruptly reintroduced (unaligned) to accompany the 9 unaligned set 2 components. At this point, both component sets together are gradually brought into alignment. The 20 aligned components of sets 1 and 2 combine to resynthesize the original audio closer than could the 11 aligned components of set 1 alone. The semantic content of the speech is not yet clear, but the sound material does become more discernable as speech. After several measures of the
20 aligned components from sets 1 and 2 pass, the 20 components are brought back out of alignment (not displayed) as were the 11 components in the first section of the piece. While the 11 components of set 1 drop out one-by-one, the 9 components of set 2 continue independently for several measures (not displayed). The set 2 components in turn drop out gradually (not displayed) with the gradual introduction of component set 3 and the final section of the piece (displayed below). After component set 2’s 9 components drop out completely and all 11 components in set 3 are presented unaligned, component set 2 returns gradually, unaligned, to accompany the unaligned set 3 components once again. After all 9 components from set 2 return and all 20 components from sets 2 and 3 are playing unaligned, they are gradually brought into alignment in the same process used in the first and second sections. Component set 1 returns at this point, one component at a time (not displayed), with each component brought into alignment with those components from sets 2 and 3 that are already aligned. Components from sets 2 and 3 continue to be aligned and components from set 1 continue to be introduced in this same alignment. This proceeds until all 31 components from sets 1, 2, and 3 play simultaneously and in alignment. At this point, the original audio is accurately resynthesized and the semantic content of the words, “the bloops and bleeps,” finally becomes intelligible. The piece ends after several loops of the original sample from the Jon Appleton interview.
Fig. II.k: Each component of set 3 (on top) is introduced misaligned, as were those from sets 1 and 2. Set 2 joins one component at a time, each also misaligned. Sets 3 and 2 are then gradually aligned before being joined by the aligned components of set 1. All 31 components from sets 1, 2, and 3 play aligned to resynthesize the original speech sound object, to reveal the hidden words, and end the piece. (Axes: spectral subspace vs. time t.)
As can be inferred, the three component sets serve to structure the piece. Each section is characterized by the unaligned presentation of a new component set and its subsequent alignment. Each section also incorporates any component set or sets that precede it. The unaligned 11
components of set 1 are presented and aligned in the first section. The unaligned components of set 2 then interrupt set 1 to begin the second section. Set 1 is reintroduced, unaligned, alongside set 2’s components before all 20 are aligned. The third section introduces component set 3, reintroduces sets 2 and then 1, and aligns all 31 components in order to reveal the speech and end the piece. In reflecting upon his phasing works, Reich came to “understand the process of gradually shifting phase relationships between two or more identical repeating patterns as an extension of the idea of infinite canon or round” (Reich 2002, 20).36 While Onomatoschizo does not employ “identical repeating patterns” (the components are by no means identical) and thus is difficult to compare to a canon or round, it can still be compared to a particular type of round known as the catch. The catch, though it employs identical melodies and lyrics in different voices as does the round, is distinct from the round in that the different voices are meant to interact so that a phrase (often a lewd one) emerges that cannot be heard when only one voice is singing. While it is a particular misalignment of the different voices in a catch that reveals a “hidden” phrase at certain points in its progression, it is the correct alignment of the complete set of latent components in Onomatoschizo that reveals the “hidden” phrase by the end of the piece.

3.2 Racquelement
Like Onomatoschizo, Racquelement misapplies the PLCA algorithm to the non-mixture sound object. However, while Onomatoschizo terminates with the reconstruction of its original sound material through the total summation of its components and their correct alignment over the time axis, Racquelement terminates with the deconstruction of its original sound material. The sound material of each of the two pieces, manipulated as it was through PLCA and the respective processes of reconstruction and deconstruction in each piece, was not linked with these processes arbitrarily. The reconstruction of the spoken phrase, “the bloops and bleeps,” was, of course, intended to come as a surprise by the end of Onomatoschizo, but also to give (latent) meaning to the actual, synthesized bloops and bleeps employed from the very beginning of the piece. Racquelement, in contrast, exposes its manipulated sound material in its original form from early on in the piece. At the heart of the piece is an audible process of deconstruction, that of the cliché elements, the now unwanted and avoided sound objects of traditional techno. Unlike in Onomatoschizo, in which the source of the sound material is unknown until the end of the piece, the source of the sound material in Racquelement is quite obvious from an early point in the piece. Thus, whereas the reconstructive process of Onomatoschizo cannot necessarily be perceived as such until the end of the piece, the cliché sounds of techno can be perceived as deconstructed throughout Racquelement. As a response to techno, Racquelement does, like much of glitch, substitute the sounds of digital detritus for those of traditional techno. However, the digital detritus here was obtained from the cliché sounds of techno themselves, and this is quite obvious. Unlike Onomatoschizo, Racquelement is not an acousmatic piece, as the source of its sound material is rather clear.
Further, much of the detritus used in post-techno glitch is often simply substitutive, that is, it only contributes to the same rhythms characteristic of techno.37 In Racquelement, the same process of deconstruction applied to the sound objects to obtain the detritus sound material serves simultaneously as an audible process of construction—the sound material is deconstructed over time to construct new pitch and timbre melodies from the detritus. This deconstructive process, then, is used not only to obtain the new sounds of detritus, but also to form rhythms uncharacteristic of traditional techno, as well as the overall structure of the piece. Much like Xenakis achieved both timbres and musical form through the creative granular processes in Analogique B (Di Scipio 1997), Racquelement finds new timbres and musical form through a destructive process permitted by PLCA.
36 The canon employs a melody with one or more imitations, or identical instances of the melody, played after a certain duration. The round is a type of canon that employs different voices singing the same lyrics. 37 Regarding glitch: “[A] compressed millisecond of static stands in for the hi-hat, recognizable as such because that’s where the hi-hat would have been” (Sherburne 1998).
Racquelement individually deconstructed for its primary sound material four samples in all, each lasting one bar at 120 beats per minute in common time:
- A 4-4 bass drum beat
- Analog synthesizer (a) sounding a chord with A1, E2, F#2
- Analog synthesizer (a) sounding a chord with A2, B2, E3, G3
- Analog synthesizer (b) sounding a chord with C4, E4, F#4
PLCA was applied to the 4-4 bass drum beat to obtain 100 components. It was then applied individually to samples of each of the chords produced by the analog synthesizers above; two of the chords were produced by the same synthesizer (a), the third by another synthesizer (b) with a different timbre. The samples of the lowest (A1, E2, F#2) and highest (C4, E4, F#4) chords were each separated using PLCA into 30 components. The middle chord sample (A2, B2, E3, G3) was separated into 40 components. Thus, the four brief samples of original sound material were separated into 200 total pieces of sound material: 100 pieces of a bass drum and 100 pieces of synthesized harmonies. It should be noted that the chords in each of the three chord samples have different onset times within each of their one bar durations. Thus, when each of the three chord samples is begun simultaneously, a harmonic progression is sounded. As in Onomatoschizo, the time displacement of components prevents their fusion into one and allows for closer attention to be given to their own inherent qualities, as sounds themselves. The same duration of attention originally devoted simultaneously to all of the frequency components in the sound objects can be devoted more fully to the relatively small number of sinusoids in the latent components. The primary sound material in Racquelement is far less interesting in terms of timbre than is the speech sample used in Onomatoschizo. Nevertheless, a world of sound exists within any complex sound, even that of the bass drum, which can be exploited economically in composition. As proved true for Onomatoschizo, the decomposition of sound objects into spectrally or informationally smaller pieces of sound allows for the construction of a sound world in Racquelement that would have been impossible with the unbroken sound objects alone. In general, it is often easier to create something new from materials at hand by breaking them down into smaller units.
The higher frequency components of the bass drum, separated through the deconstructive process of Racquelement from the lower frequency components to which they were formerly bound by onset synchrony, sound alone as something akin to the relatively higher-frequency hi-hat of traditional techno beats. The various notes comprising the chords of the original sound objects, separated from each other through the same deconstructive process, come to serve as spectrally thin units of melody. The piece begins with the gradual introduction of each of the 40 components obtained from the sample of the four-note chord produced by synthesizer (a) (A2, B2, E3, G3). As in Onomatoschizo, after each component is introduced, it loops repeatedly for a certain duration. Each of the 40 components from this sound object is introduced in alignment with the others until all combine to reconstruct the original sample. This process of introducing components accelerates, and is joined by the same process applied to the other three sound objects, two other chords and the bass drum. By the end of the first minute of the piece, the original sound objects are all reconstructed through this type of additive synthesis, their full forms exposed. This basic process, the same for each set of components associated with the four separate sound objects, is illustrated below with only three latent components for lack of space. These layers represent only three out of two hundred (see the Appendix for a fuller view):
Fig. II.l: Introduction of 3/200 latent components, aligned to reconstruct a clichéd sound object of techno.
The ticks above the components are meant to clarify the introduction of components at the rate of one bar (here subdivided into quarters), which places them in the same alignment in which they existed in the original sound objects. Once the four original sound objects are constructed from their components, each of the 100 components of the bass drum and each of the 100 components of the harmonic progression are misaligned one at a time, similarly to how the components of Onomatoschizo were misaligned. However, the components in Onomatoschizo were misaligned (tediously) by restarting each looped component at an irregular rate. This irregular rate was necessary to achieve a more-or-less metrical rhythm with ametrical speech components. In Racquelement, the four sound objects in their original relative positions do form a metrical rhythm, and thus could be introduced at regular rates to create new metrical rhythms. Below, the first realignment process, which restarts components at a rate of a triplet quarter note, is shown:
Fig. II.m: First realignment process, at the rate of a triplet quarter note. The realignment process involves latent component loops restarting before finishing their current loop (again only 3/200 components displayed).
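Each realignment rate in Racquelement corresponds to a restart interval in seconds. A small helper, hypothetical and purely illustrative, computes these intervals at the piece’s tempo of 120 beats per minute in common time:

```python
def restart_interval(bpm, note, triplet=False):
    """Seconds between component restarts for a note value such as "1/4" at `bpm`."""
    quarter = 60.0 / bpm                  # one beat (a quarter note) in seconds
    num, den = map(int, note.split("/"))
    seconds = quarter * 4 * num / den     # a whole note ("1/1") spans four beats
    # A triplet fits three notes into the span of two ordinary ones.
    return seconds * 2 / 3 if triplet else seconds

# The first realignment process restarts components every triplet quarter note:
interval = restart_interval(120, "1/4", triplet=True)   # 1/3 of a second
```

At this tempo the sixteenth-note rate of the second process works out to 0.125 s and the triplet-half-note rate of the third to 2/3 s, which is why each successive process so audibly changes the density of onset asynchronies.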
The 100 components of the bass drum and 100 components of the chord samples are misaligned one-by-one, a process applied to both of the 100-component sets simultaneously. Just as in Onomatoschizo, the misalignment process of components creates new rhythms. In Racquelement, though, perhaps due to the silences between each of the four hits of the bass drum in the drum sample and chords, and perhaps due to the larger number of components used relative to Onomatoschizo (which employed only 31), the misalignment and realignment of components in Racquelement produces clear rhythmic glissandos. Further, the multiple steps of misalignment in Racquelement, each of which restarts components at a distinct, regular rate, also gradually produce new melodies, or melodic glissandos, throughout the piece. A new misalignment process commences several bars after each of the 200 components has been restarted at the misalignment rate of a triplet quarter note and the rhythm has reached exhaustion. In the same order, the 100 components of the bass drum and the 100 components of the chord samples are restarted, this time at a rate of a sixteenth note. This second misalignment process takes over the recently achieved alignment of the components whose onset times are separated by a triplet quarter note. This overriding sixteenth note realignment rate is depicted below:
Fig. II.n: Second realignment process, this time at the rate of a sixteenth note.
The misalignment process is again allowed to complete and the rhythm to exhaust itself, this time followed by a new realignment of components at the rate of a triplet half note, as displayed below:
Fig. II.o: Third realignment process, now at the rate of a triplet half note.
Once this process completes, 199 components stop suddenly, leaving a single component of one of the chord sound objects to be heard alone. Each of the 99 other chord components is then introduced, now at a rate of a triplet eighth note. The 100 bass drum components do not begin to arrive until all 100 chord components have begun to play. As the same basic alignment process applies throughout the rest of the piece, only with different realignment rates, further images would only become redundant, and words shall suffice. After all 200 components are looping and the triplet eighth note rhythm exhausted, the next step of realignment is applied, this time at a rate of an eighth note. Two more realignment processes follow, with rates of a triplet sixteenth note and a thirty-second note respectively. The thirty-second note rate of realignment, not surprisingly, results in the most drastic deconstruction of the original sound objects. This is because it restarts components the most rapidly, yielding a higher number of onset asynchronies per unit time than the slower realignment rates. The piece ends gradually as it begins, with each of the drum components halting one-by-one after all 200 components have been realigned at the rate of a thirty-second note. The remaining components of the chord sound object stop one-by-one until the last component loops two times alone to bring the piece to a close. While the connection between Racquelement and the phasing pieces of Steve Reich is not as strong as it is for Onomatoschizo, Racquelement can still be perceived as a phasing process. Because the primary sound material for Racquelement was not as rhythmically or timbrally interesting as was the speech sample in Onomatoschizo or that of Reich’s It’s Gonna Rain, new rhythms and timbres had to be constructed (via the deconstruction and realignments) rather than exploited.
That is, the deconstruction of rather unmusical sounds directly contributed to the construction of musical rhythms, melodies, and form. Like much of Cornelia Parker’s work, Racquelement exploits the cliché, what Parker calls the “avoided object,” and allows the listener to perceive “the transformation of the ordinary object into the abstract and extraordinary” (Parker 1996, 5; Parker 2000, 11). As Parker’s work exhibits “exploded views” (Parker 1996, 5), Racquelement exhibits “exploded sounds.” Not until the cliché techno elements employed in Racquelement were “exploded” with PLCA could they be scattered over time into something entirely different. Racquelement makes audible the “death” and “resurrection” of the very familiar and overused, the transformation of the cliché sounds of techno into something new, a scattered yet organized mass of smaller, distinct sounds. Frequency domain granulation via PLCA allows for a particular process of deconstruction that can be perceived throughout time, one that cannot be achieved through time domain granulation. This is precisely because the spectral deconstruction permitted by frequency domain granulation is not at all conducted in terms of time. Of course, time domain granulation or even sampling on the larger level can result in a perceptible deconstructive process, but this must involve some exposition of a longer sound that is subsequently cut down into shorter sounds that are reordered, as discussed in Part I and revisited in Part III. If used in conjunction with granular reordering, time domain granulation can result in a deconstruction of the original sound object that can be perceived to occur over time, but this is very different perceptually from the deconstructive process discussed above. 
It is interesting to note that, while the glitch is not something overused like the cliché, both are often avoided: the former for its unfamiliar (and “unpleasant”) sound, the latter for its all too familiar (and “unpleasant”) sound. Despite all the clichés and timbral simplicity of the four primary sound objects used in Racquelement, their spectral excavation via PLCA yielded smaller sounds that were new and interesting, and not only because of the “glitchy” artifacts of the PLCA decomposition they exhibited. The cliché of these sounds was purely the result of their summation and alignment. Heard alone, the components that once combined to form the cliché can indeed be perceived as new (not cliché) sounds. They can be appreciated for their own features, uninterrupted by the other frequency components that once existed alongside them in the original sample, sounds that combined with them to comprise the cliché.

3.3 Stratovinsky
The final piece composed for this thesis is the only one to apply PLCA to what would traditionally be thought of as a sound mixture: a brief orchestral sample. As the title suggests, this sample comes from the music of Russian composer Igor Stravinsky, and encompasses the famous chord introducing Danse Sacrale (Sacrificial Dance), the last scene of his ballet, Le Sacre du Printemps (The Rite of Spring). Although the sound object employed for this piece was certainly no non-mixture, it was decomposed using PLCA into 128 latent components, far more pieces of the spectrum than the number of instrumental sources that existed in the original orchestral environment. Thus, the latent components cannot be said to represent distinct sources within the mixture sound object, but rather spectral fragments of sources, as did those in Onomatoschizo and Racquelement. Like Onomatoschizo and Racquelement, this piece exploits the alignment and misalignment of latent components as a major compositional parameter. However, Stratovinsky also explores the spatialization of latent components, which proves to be a very promising parameter for music that employs latent components. The primary inspiration to use spatialization stemmed from its applications in microsound (Roads 2001, 107). It seemed natural to embrace a compositional parameter particularly suited for music that employed vast quantities of one type of sound object decomposition (i.e. the grain) and apply it to music employing vast quantities of another (i.e. the latent component). However, as discussed soon, the spatialization of latent components has implications very distinct from that of grains, as it aids in the segregation of spectral fragments that would otherwise integrate. This phenomenon, fusion, was discussed earlier in Part II. As Bregman notes, both sequences of sounds and simultaneous sounds close in frequency that derive from the same spatial location have the tendency to integrate (Bregman 1990, 644, 659).
“Illusions can be created,” Bregman writes, “by setting up a competition between the tendency to group sounds by their frequency similarity and by their spatial similarity” (Bregman 1990, 644). Their frequency similarity is not the only factor, however; amplitude envelope trajectory (i.e. onset synchrony) is another important grouping factor. This grouping “competition” between spatial derivation and other factors is a key point of interest in Stratovinsky, which scatters different spectral strata from Stravinsky’s famous chord throughout the performance hall in both time and space. The sampled Stravinsky chord from The Rite of Spring (about 1.5 s in duration) was chosen for its seasonal relevance to the performance date of Stratovinsky (spring 2010 Festival of New Musics at Dartmouth College), but more importantly for its recognizability, as clarified soon. Important to note is that, as did the previous two compositions discussed in Part II (excepting the synthesized bloops and bleeps in Onomatoschizo), Stratovinsky used as its sole musical material only those sounds that existed all along in the sound object it decomposed via PLCA. A truly vast number of sounds, most of which may be integrated into the larger whole, exists within any complex sound object, and although Stratovinsky may seem to exhibit an economy of means in its use of musical material, this is not the case. 128 latent components proved a truly onerous amount of sound material with which to work, despite the fact that they all came from less than two seconds of sound. Each of the 128 components extracted from the Stravinsky chord is directed to one of 16 channels. The latter are each sent on their own spatialization trajectories throughout the course of the piece, to be controlled in real-time improvisation.
All 16 channels are sent through a four-channel surround speaker setup, and the spatialization is controlled using an Ambisonics spatialization toolkit for the graphical music programming language Max/MSP. As were the components in Onomatoschizo and Racquelement, the ones in Stratovinsky are arranged in the loop-based software sequencer Ableton Live (but only in Stratovinsky are they sent from Ableton Live to Max/MSP for spatialization). The four-channel speaker setup (marked with squares) and the initial spatial positions of each of the 16 channels (marked with circles) are displayed below to the left in Fig. II.p:
Fig. II.p: Above left: initial positions of 16 channels in Stratovinsky (circles), each of which contains the same set of 8 latent components throughout the piece. The 16 channels are output to a quadraphonic surround speaker setup (squares), and are each sent on distinct trajectories. Above right: each of the 16 channels is sent on a random circular trajectory to fluctuate between clockwise and counterclockwise paths.
To clarify, circle 1 (i.e. channel 1) contains latent components 1, 17, 33, 49, 65, 81, 97, and 113. Circle 2 (i.e. channel 2) contains latent components 2, 18, 34, 50, 66, 82, 98, and 114. Circle 16 (i.e. channel 16) contains latent components 16, 32, 48, 64, 80, 96, 112, and 128. The associations between these particular channels and latent components are preserved throughout the piece. Like Onomatoschizo and Racquelement, this piece begins by introducing each of its latent components misaligned and one-by-one before looping them continuously. Because the different sources in the orchestral sample have approximately synchronous onsets as the chord is sounded, the misalignment of the latent components creates asynchronies (i.e. transforms the chord into melodies). Like Racquelement (with 200 latent components), Stratovinsky (with 128 latent components) introduces its latent components at a relatively faster rate than Onomatoschizo (with only 31 latent components). Latent components are introduced in a sequential order (1-128), and thus arrive from the different spatial locations allocated to the 16 different channels spread throughout the room. The latent components are introduced at the rate of an eighth note until all 128 are looping. As more and more of the 128 latent components arrive, the spatial location of each of the 16 channels is manipulated in real-time. The 16 channels are directed to move gradually faster on a random, circular trajectory around a common center. Their positions relative to each other are not fixed, and the channels can cross each other’s paths in clockwise and counterclockwise movements. This movement is kept relatively slow as the introduction rate of an eighth note changes to a realignment rate of a triplet quarter note (the reader may wish to revisit discussions of realignment processes in Racquelement). 
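The channel groupings just described follow a simple modular pattern, which can be made explicit in a few lines of Python (the numbering is taken directly from the text; the code itself is only an illustration):

```python
# Assign latent components 1..128 to channels 1..16: component c belongs to
# channel ((c - 1) % 16) + 1, which yields the groupings described above.
channels = {n: [c for c in range(1, 129) if (c - 1) % 16 + 1 == n]
            for n in range(1, 17)}

print(channels[1])   # [1, 17, 33, 49, 65, 81, 97, 113]
print(channels[16])  # [16, 32, 48, 64, 80, 96, 112, 128]
```

Each channel thus holds 8 of the 128 components, and the associations are fixed for the duration of the piece.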
The spatial movement increases in speed as each of the 128 latent components is realigned into this new rhythm and then into a half note realignment rate. It should be noted here that the overall sounds that result simply from the time realignments of the latent components differ just as drastically from each other as they do from the sound resulting from the alignment of the latent components in the original sample. The “micropolyphonies”38 that result from the different interactions of the latent components seem to contain something reminiscent of a choir of distant voices, which chant morphing nonsense phrases with each new realignment.
38 This term is borrowed from composer György Ligeti (Ligeti et al. 1983, 14–15), as discussed in Part III.
After all 128 latent components are realigned to produce the new overall sound and rhythm via the half note realignment rate, they are realigned to reproduce the original Stravinsky chord about halfway into the approximately five-and-a-half-minute piece. The transitions from eighth note to quarter note triplet to half note realignment rates represent movements increasingly closer to the original alignment of the latent components in the original sample. With slower and slower realignment rates, more and more latent components are triggered to restart at the same time, and an increasing number of the onset synchronies that occurred in the original result. If two latent components begin simultaneously (i.e. exhibit onset synchrony), they are more likely to fuse into one, larger sound. To help prevent this from occurring—to aid in the segregation of the various latent components despite more and more onset synchronies—their movements through space are made faster to the point that they become quite noticeable. Again, as Bregman notes, a “competition” can be encouraged between sounds’ tendency to fuse, due to such factors as frequency or onset similarity, and their tendency to segregate, due to spatial separation (Bregman 1990, 644). The “illusions” that such a competition fosters, which prevent the integration of sounds that our auditory system would otherwise fuse, were the aim of Stratovinsky. As the latent components begin to realign (at the rate of one bar) into positions that resynthesize the original sample, each of the 16 channels is sent rapidly on circular trajectories to new locations in the quadraphonic setup before the onset of what begins to be perceived as the looped chord (the channels retain their radial distances from the center of the room). The channels’ rapid movement is temporarily frozen for the onset of each looped chord and unfrozen before the next onset.
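One plausible way to generate such a random circular trajectory is to let a channel’s angular velocity drift randomly, so that its motion fluctuates between clockwise and counterclockwise while its radial distance from the common center stays fixed. The sketch below is a minimal illustration in Python; the actual piece controlled spatialization in Max/MSP, and the step sizes here are invented:

```python
import math
import random

def circular_trajectory(radius, steps, max_step=0.2, seed=1):
    """Random circular path around a common center: the angular velocity
    drifts randomly, so the channel fluctuates between clockwise and
    counterclockwise motion while keeping its radial distance fixed."""
    rng = random.Random(seed)
    angle, velocity, path = 0.0, 0.0, []
    for _ in range(steps):
        velocity += rng.uniform(-max_step, max_step)  # random drift in speed/sign
        angle += velocity
        path.append((radius * math.cos(angle), radius * math.sin(angle)))
    return path

path = circular_trajectory(radius=1.0, steps=100)
# Every position stays on the circle of the given radius.
print(all(abs(math.hypot(x, y) - 1.0) < 1e-9 for x, y in path))  # prints True
```

Freezing the movement for each chord onset, as described above, would simply mean holding the current position until the onset has sounded before resuming the drift.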
Although the Stravinsky chord can be heard before all 128 latent components are realigned, the process continues until all 128 components have synchronous looping positions and the full sound of the chord is resynthesized. Examples of the 16 channels’ spatial positions corresponding to each new onset of the gradually reconstructed Stravinsky chord are illustrated below:
Fig. II.q: Newly reached spatial positions for the 16 channels, each corresponding to a new onset of the gradually reconstructed Stravinsky chord. This repositioning between chord onsets aims to emphasize the independence of latent components, each of which the larger sound of the chord would otherwise absorb.
Before the two-minute mark of the composition, microsonic grains extracted from the same Stravinsky chord sample are introduced to provide a new layer of rhythmic flow. The grains are arranged in a looping rhythm, which is borrowed from the same scene (Sacrificial Dance) from which the sample is taken (although this rhythm is sped up to about twice its true tempo). Throughout the course of the looped, granular rhythm, grains with increasing onset times in the original sample are played back. This rhythm, built from a sequence of grains from the Stravinsky sample, is employed in 8 of the 16 channels, and each looping rhythm, like each latent component before it, is introduced one at a time, out of alignment with the rhythmic loop that was introduced before it. These rhythmic loops are introduced at eighth note intervals, as were the first latent components. Because they are introduced while the latent components are looping at a rate of a quarter note triplet that produces a dense texture, the grains are not easily perceived until the latent components
are realigned at the half note rate that yields a relatively sparser rhythm with more silences between onsets (the half note realignment rate essentially divides the Stravinsky chord into two smaller chords that alternate within each bar). All 8 tracks of the misaligned, looping rhythms of grains have entered by the time the one bar realignment rate begins to reconstruct the original Stravinsky chord. As more and more sound is concentrated by this reconstructive process into one instant (i.e. the chord) within each bar, the granular rhythms become increasingly perceivable during the silences between each onset of the reconstructed chord. When the chord is finally reconstructed, all but one of the rhythm tracks cease. At this moment, with each bar the chord is looped, one of the 8 latent components in each of the 16 channels stops looping. Thus, in 8 bars, the 128 latent components that reconstructed the original sample are reduced to 16 latent components, one for each channel. While this happens, the granular rhythms are reintroduced, this time at the rate of a half note triplet. With the introduction of the fifth rhythmic loop of grains, all of the missing latent components begin to be reintroduced as well, also at the rate of a half note triplet, but this time in groups of 16 (one for each channel). The rhythms resulting from the eight granular tracks and those resulting from the latent components thus begin to mimic each other. When all 128 latent components return, aligned in a fashion determined by their half note triplet reintroduction rate, they are subjected to two more realignments, in turn: one at the rate of a quarter note, the second (and last) at the rate of an eighth note. With the commencement of each realignment, the eight tracks of granular rhythm are realigned at the same rate applied to the latent components.
The introduction/realignment rates of the latent components largely structure the piece, and their overall sequence can be summarized as follows: 1/8, 1/4t, 1/2, 1 bar, 1/2t, 1/4, 1/8. Although the quarter note triplet and half note realignment rates of the first half are answered by quarter note and half note triplet rates respectively in the second, the piece is still palindromic—divided down the middle by the reconstruction of the Stravinsky chord. When all 128 latent components are once again realigned at the rate of an eighth note, as they were introduced at the beginning of the piece, the rhythmic, granular loops begin to fall out, followed by the latent components themselves to end the piece. Although the spatial movement of the 16 channels is more rapid with the relatively slower realignment rates of the latent components (and most rapid during the reconstruction of the Stravinsky chord with the one bar realignment rate), it continues, gradually slowing, throughout the rest of the piece. As the 128 latent components begin to cease, one-by-one, the 16 channels are sent on completely random trajectories throughout the room. That is, they are released from their random circular trajectories and freed to move anywhere within the room, demonstrated below:
Fig. II.r: From left to right: transition from random circular motion to completely random motion.
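Expressed as fractions of a bar (assuming a 4/4 meter, which the text does not state), the realignment-rate sequence 1/8, 1/4t, 1/2, 1 bar, 1/2t, 1/4, 1/8 rises to the one-bar midpoint and falls away from it, even though the triplet and non-triplet values swap between the two halves. A small check:

```python
from fractions import Fraction

# Realignment rates as fractions of an (assumed) 4/4 bar: eighth, quarter
# triplet (1/3 of a half note = 1/6 bar), half, whole bar, half triplet
# (1/3 bar), quarter, eighth.
rates = [Fraction(1, 8), Fraction(1, 6), Fraction(1, 2), Fraction(1, 1),
         Fraction(1, 3), Fraction(1, 4), Fraction(1, 8)]

mid = len(rates) // 2
print(rates[mid] == max(rates))                                  # True: one bar is the peak
print(all(a < b for a, b in zip(rates[:mid], rates[1:mid+1])))   # True: rising half
print(all(a > b for a, b in zip(rates[mid:], rates[mid+1:])))    # True: falling half
```

The arch shape, with the reconstructed chord at its apex, mirrors the formal arch of the piece itself.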
Considering Stratovinsky in retrospect, it is now clear that spatialization is one of the most interesting and promising parameters of composition employing latent components. If our goal is to isolate the many beautiful and often “overheard” sounds that exist within complex sounds, their spatial separation will be conducive to their closer and independent appreciation.
I was interested in an orchestration of objects, in the sense that each object is itself, its own sound, but that all the objects together create something larger than the sum of the parts. –Cornelia Parker
Closer attention is owed to the frequency domain as a site for the musical separation of sounds from others in the sound object. While larger sounds can indeed be separated into smaller sounds along the axis of frequency, the musical separation of sounds from others—namely in musique concrète and microsound—has focused almost exclusively on the time domain. For musique concrète, the desired, separated sounds constitute the single sound object. For microsound, the desired, separated sounds constitute the single grain. In both aesthetics, desired sounds are separated in terms of time, and the undesired sounds are set aside to prevent any interference they may cause with the perception and appreciation of the desired sounds in their own right. Again, microsound largely arose as a response to musique concrète, through Xenakis’ “defiance of the usual manner of working with concrète sounds,” those sound objects he referred to as “block[s] of one kind of sound” (Solomos 1996). Xenakis’ Concret PH, the first work of microsound, exhibited microsonic granulation as a technique that decomposed the block of the sound object into smaller “particles” restricted to the microsound timescale (Solomos 1996). Granulation, which extracts grains from a larger sound object, entails the separation of these smaller sounds from the larger sound object purely in terms of time. It is important to note that this type of separation differs only in timescale from that used by Schaeffer as he developed musique concrète, which entailed the separation of the sound object in temporal terms from other sounds on the longer sound recording. This is true despite the obvious perceptual difference between the fleeting grain and the longer sound object. Cort Lippe describes granulation as a form of musique concrète, writing that it “is part of the same world of musique concrète in which recorded sounds are manipulated and transformed” (Lippe 1994, 4).
Many smaller sounds may be thought to exist over the two dimensions of sound: time and frequency. Nevertheless, the borders of separated sounds have largely been written in the time space, not the frequency space, of the larger sound from which they come. Perhaps this is due to a cultural precedent, a meme, set by the blade and magnetic tape, by tape-cutting and -splicing, technologies and techniques that heavily influenced musique concrète and microsound, and which could only separate sounds from others in terms of time. Whatever the reasons, although the sound separations characteristic of both musique concrète and microsound are largely reserved for the time domain, smaller sounds can indeed be separated from larger ones in terms of frequency. The music discussed in Part II represents an alternative approach to the “block of one kind of sound” that is the sound object, one that decomposes it in terms of frequency rather than in terms of time as microsound has done (Solomos 1996). Decomposing primary sound material along the frequency axis is nothing radical, and has been referred to by Roads (albeit awkwardly) as frequency domain granulation (Roads 2001, 275). However, it has until now been considered only secondarily as a means by which to separate sounds from others to procure musical material. It deserves much more attention, as it yields sounds suited for particular musical processes and structures unachievable through the use of those smaller sounds yielded by microsonic granulation. While filtering can and certainly has been used to decompose the sound object into qualitatively new sounds with fewer sinusoids (i.e. as in subtractive synthesis), filtering cannot break the sound object into its maximally contrasting features as can PLCA. That is, filtering cannot break the sound object into smaller yet coherent fragments of sound as can PLCA.
Further, filtering is most often used to transform one sound object into one new sound object, not to break one sound object into scores of additive frequency component groupings, new smaller sound objects. Part III begins to explicate a compositional theory around the latent components of PLCA, considering them from the perspective of musique concrète and microsound. It argues that any sound we hear, any timbre, is the result of a natural counterpoint between sinusoids, and that the latent component can indeed be considered as a sound object, a voice or line in a natural, contrapuntal texture.
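The idea of timbre as a natural counterpoint between sinusoids can be illustrated with a toy additive sketch, in which a few sinusoidal “voices,” each with its own frequency and amplitude envelope, sum into a single perceived sound (all frequencies and envelopes here are invented for illustration):

```python
import numpy as np

sr = 8000                      # sample rate (Hz)
t = np.arange(sr) / sr         # one second of time

# Three sinusoidal "voices" in a natural counterpoint: each has its own
# frequency and amplitude envelope, and their sum is heard as one timbre.
voices = [
    np.sin(2 * np.pi * 220 * t) * np.exp(-2 * t),             # fundamental, slow decay
    np.sin(2 * np.pi * 440 * t) * np.exp(-4 * t),             # octave, faster decay
    np.sin(2 * np.pi * 663 * t) * (10 * t * np.exp(-5 * t)),  # inharmonic partial, swelling
]
timbre = sum(voices)

print(timbre.shape, len(voices))
```

Listening to `timbre`, one would hear a single evolving sound; listening to any one element of `voices`, one would hear a single line of the counterpoint, much as a latent component exposes one line of a larger sound.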
Sound Objects within Sound Objects
Despite his focus on tape-cutting and the separation of the sound object from its former temporal environment on the recording, Schaeffer himself did at least touch upon the idea of obtaining the sound object by separating it from its former spectral environment.39 He asserted that a sound object can be separated into smaller sound objects through further cuttings (i.e. temporal separations), and implied that “double sounds” can be separated into these smaller sound objects through filtering, although it is difficult to do so (Schaeffer 1967, 43.1, 75.1). Schaeffer faced a poverty of technology that prevented him from decomposing the sound object in ways that considered its spectral context, and would likely have been interested in source separation technologies such as PLCA that would have enabled him to convincingly separate “double sounds” into their constituent pieces. First we shall look more closely at Schaeffer’s sound object, as proposed in Solfège de l’objet sonore (Schaeffer 1967).40 As Schaeffer wrote:
The sound object must not be confused with the sounding body by which it is produced (Schaeffer 1967, 73.2).
So while a sound object “is defined by its causal coherence” and “coincides with the fleeting history of an acoustic event,” it is not equivalent to the sounding body that produced it (Schaeffer 1967, 75.1). Schaeffer notes that as one sounding body “may supply a great variety of objects whose disparity cannot be reconciled by their common origin,” the “features of an object cannot be related to those of the parent instrument” (Schaeffer 1967, 73.2, 73.3). For example, the strings of the piano can be struck by the hammers of the instrument, but they can also be bowed—each case produces a very different sound. Schaeffer stresses that magnetic tape is no more equivalent to the sound object than is a traditional musical instrument such as the piano, even if the former may help to disguise or “veil” the origin of the sound object and thus let its meaning “transpire” (Schaeffer 1967, 73.4). Indeed, “magnetic tape conceals a new and more insidious trap,” which may entice listeners to regard “the recording itself as an object or…[to use] the same recording to set up new cause-to-effect relationships between hypothetical new causes and hypothetical new objects” (Schaeffer 1967, 73.4). Thus, the sound object “is not identifiable with the recording” (Schaeffer 1967, 73.4). Like a traditional musical instrument, the recording medium can be treated in different ways to produce different sounds. Regarding the sound object and recording, Schaeffer writes:
They so closely resemble each other that you might imagine that you had captured the sound object on tape. In fact, when the recording is played at the right speed it does restore the original sound phenomenon…. This recorded fragment is not merely a recollection but also becomes a source and an instrument; the slightest variations of the playing speed or intensity will create new objects which will be different from the initial one, just as the same sounding body can give birth to completely different sounds (Schaeffer 1967, 74.1).
To demonstrate this principle Schaeffer offers a recording of a “test object” on the LP accompanying Solfège and then this same object “speeded up and slowed down,” followed by a recording of the same “whole sound mass swept from treble to bass with a filter” (Schaeffer 1967, 74.4). He reflects:
What is it that remains common to all these objects in spite of their differences? The answer is “form” and “fabric”, which are the essential criteria of sound morphology…. But the same recording can be made to yield objects of different morphologies. Thus a variation in mass [emphasis added] changes the texture of an object, and the sound fabric is also no longer fixed…. Nevertheless, there are still one or more features common to the initial sound and the objects manipulated in this way (Schaeffer 1967, 74.4).
Mass is a rather important concept to this thesis, and points to why bandpass filtering a sound object cannot necessarily break it into perceptually coherent fragments to be used as qualitatively distinct materials in composition.
39 Temporal and spectral separations, because they both involve some kind of recording medium, both encompass causal separations—the separation of a sound object from its original sounding environment, “from the ‘audiovisual context’ to which it initially belonged” (Landy 2007a, 78).
40 Solfège accompanied an LP with numerous sound examples Schaeffer used to support his arguments.
Mass
Wishart gives his own description of mass in the following:
[V]arious properties of a sound may be altered by filtering. With certain sounds, however, though qualitative changes can be perceived as a result of filtering, we do not feel that the underlying sound-object has been fundamentally altered. The instrumental tones of conventional musical practice are typical examples of sounds which…are resistant to filtering. They are not the only ones, however, and it is necessary to define a more general characteristic of sound-objects having this property. Following the French terminology, we will refer to this as mass (Wishart 1996, 68).
Isolating a certain bandwidth of a sound object via filtering will likely preserve the identifiable features of the original sound, just as isolating sequential words of a sentence will likely preserve much of its meaning (isolating non-sequential words or all instances of a certain letter would alter the sentence much more drastically). Schaeffer proposed his notion of sound mass in the following:
If we filter bands of a given width and different central frequencies out of white noise, they follow each other like notes in a melody…. When structured sounds are manipulated in a similar way, the results are the opposite. Such sounds…are almost indestructible. Whether they are tonal or complex, whether they have a harmonic spectrum or consist of several interwoven spectra, they remain unaffected when high-pass filtered, and although their timbre is altered when the middle register is filtered out, severe band-pass filtering does not bring out melodic changes as in the case of white noise…. [T]he timbre is altered, of course, but something in the texture remains unchanged. We now hold the key to the code, which can be defined in linguistic terms: that which does not change…is its “mass” (Schaeffer 1967, 1.18–1.22).
It is interesting that Schaeffer chose the word “indestructible” (or “almost” so) to describe such massy sounds resistant to filtering.41 While sounds of high mass may be “almost indestructible” under simple filtering, they are indeed destructible under the surgical filtering effects of frequency domain granulation approaches such as source separation. That is, the qualitative characteristics of massy sounds may not be resistant to frequency domain granulation as they may be to filtering. While simple bandpass filtering removes a certain frequency bandwidth of sinusoids from a sound, the resulting, spectrally sparser sound may not differ so much morphologically from the original. Whether it contains retained frequency components or discarded ones, a bandwidth is simply a grouping of sequential frequencies (e.g. 440 Hz to 880 Hz). Just because these frequencies are sequential does not mean they are essential or inessential to the identity of a sound they constitute. Within this bandwidth may exist enough remnants of the original, unfiltered sound to betray its identity. Through it may seep distinct, qualitative characteristics of the original. In order to render a massy sound unrecognizable by removing frequency components, to fundamentally alter its texture, we must look beyond mere bandwidths, toward other, more perceptually relevant relations between frequency components throughout its entire spectrum. Frequency domain granulation may remove frequency components related in perceptually meaningful ways, component groupings that contribute to important aspects of a sound’s morphology. For instance, PLCA, based on ISA, decomposes a sound into smaller sounds with “maximally contrasting features” (Casey, Westner 2004, 4). Latent components have their own distinguishing characteristics, their own distinct morphologies, but not the general morphology of the original, larger sound from which they are separated.
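The distinction between a bandwidth (a run of sequential frequencies) and a perceptually motivated component grouping (a possibly non-sequential set of frequencies) can be made concrete with a toy spectrum. In the sketch below, the partial frequencies and the “component” grouping are invented for illustration; no claim is made that PLCA would produce this particular grouping:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
# Toy sound: four partials at exact-integer frequencies (one second long,
# so each lands in a single FFT bin).
partials = [440, 660, 880, 1320]
x = sum(np.sin(2 * np.pi * f * t) for f in partials)

spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), 1 / sr)

# (a) Bandpass: keep the sequential band 400-900 Hz.
bandpass = np.where((freqs >= 400) & (freqs <= 900), spectrum, 0)

# (b) Component-style grouping: keep the non-sequential set {660, 1320} Hz.
mask = np.zeros_like(freqs, dtype=bool)
for f in (660, 1320):
    mask |= np.abs(freqs - f) < 2      # narrow window around each partial
grouped = np.where(mask, spectrum, 0)

def surviving(spec):
    """Frequencies whose magnitude exceeds 10% of the spectral peak."""
    mags = np.abs(spec)
    return sorted(freqs[i] for i in range(len(mags)) if mags[i] > 0.1 * mags.max())

print(surviving(bandpass))  # three adjacent partials survive: 440, 660, 880 Hz
print(surviving(grouped))   # a non-adjacent grouping survives: 660 and 1320 Hz
```

The bandpass result retains a contiguous slice of the original spectrum, while the grouped result cuts across the spectrum, selecting partials that no single band could isolate together.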
Hence the potential for frequency domain granulation to destroy a massy sound that simple filtering cannot. Schaeffer continues:
So this new notion, as important as the notion of pitch, is brought to light: that of sound mass…. Whether it is tonal or complex, concise or diffuse, related to a harmonic or non-harmonic spectrum, whether it consists of a single or an unlimited number of frequencies, mass is a musical perception that accounts for the harmonic structure of a sound object…. [T]he structures to which the ear refers depend on the mass of the object which is presented to it (Schaeffer 1967, 1.23).
41 With a high enough degree of filtering, any sound can be reduced to a single sinusoidal frequency component. Of course, its particular amplitude envelope gives that frequency component a special connection to the sound from which it is filtered.
PLCA allowed for high mass sounds to be destroyed, broken down into distinctive sounds to be combined in new ways for the music presented in Part II. Source separation, a sophisticated form of frequency domain granulation, allows for high mass sounds to be decomposed in terms of frequency in ways that simple filtering cannot—it affects the mass of a sound while simple filtering may not.
1.2 Sound Objects, Musical Objects, & Magnets
Schaeffer distinguished between a sound object and a musical object, writing, “[a] sound object is not a musical object” (Schaeffer 1967, 74.6). Although a musical object is necessarily a sound object, a sound object is not necessarily a musical object because the former may contain multiple musical objects, or smaller sound objects. He stated:
[W]hen the cymbal is struck with a padded stick similar to a piano hammer it does not emit a “double sound” as the vibraphone does…. Why double? Because it consists of a very brief metallic impact followed by a resonance which is made linear by the design of the vibraphone (Schaeffer 1967, 43.1). A sound object is defined by its causal coherence; it coincides with the fleeting history of an acoustic event. This is not enough to make a musical object. A metal sheet whose edge is struck, for instance, produces…[an] object, which undeniably has a unity of sound…. It will be noticed at once that this sound object contains at least42 two musical objects which are spontaneously appreciated. This mental separation into two objects is not based on fact and it is actually rather difficult to reproduce it materially by means of filtering [emphasis added]. A high-pass filtered version retains the main characteristics of the attack…[and] a low-pass filtered version retains only the low resonance…. [O]ne is surprised by the relationships43 which are set up between the various components of the objects (Schaeffer 1967, 75.1).
This “mental separation” that Schaeffer referred to is similar to the type of automatic separation that source separation technologies aim to conduct. Still, the very term source separation highlights its intent: to separate sounds in a sound mixture produced by distinct sources. The “double sounds” Schaeffer discussed emanate from singular sources, but they are still separated by our mental mechanism into distinct sounds. As Schaeffer noted, while our brain conducts this separation for us, it is difficult to achieve with simple filtering techniques. It is the mass of such “double sounds” that makes them indestructible under simple filtering. Source separation, if “misused”—applied to sound non-mixtures such as the “double sound” rather than to sound mixtures—can be used to deconstruct massy, non-mixture sounds and isolate those smaller musical objects that may exist within them. Immediately after the statement above, Schaeffer applied this same concept—that of separating the single, larger sound object into multiple, smaller sound objects—to the time domain, a region of sound over which he held relatively more control:
One must also beware of [time domain] cuttings [i.e. as opposed to frequency domain filterings]. Just as a magnet breaks into several smaller magnets, so a sound object, when cut into three segments, gives three new objects each with a beginning, body and decay (Schaeffer 1967, 75.4).
It is likely that Schaeffer, had he access to the appropriate technology (i.e. PLCA), would have more thoroughly explored the idea of separating a larger sound object into smaller musical objects—smaller sound objects—over frequency. Indeed, the statement just quoted portrays those musical objects within the spectral space of the larger sound object as “smaller magnets” within the larger “magnet” that is the sound object itself. Nevertheless, as Schaeffer himself noted:
Musical ideas are prisoners, more than one might believe, of musical devices (Schaeffer 1977, 16).
His musical devices—magnetic tape and the blade, as well as simple filters—allowed Schaeffer to separate the larger sound object into smaller sound objects in terms of time, but not as convincingly into smaller sound objects in terms of frequency. Xenakis too was limited in this way, and broke the sound object into shorter sounds that he called grains, again in terms of time.
42 Multiple musical objects may fuse within the sound object, making them individually imperceptible. Separating them with PLCA and realigning them over time would prevent fusing and render them perceptible.
43 As argued later, such relationships can be considered as a natural counterpoint between frequency components.
A spectrogram example of what Schaeffer would have considered a “double sound,” a brief strike on the gamelan, is given below to the top left of Fig. III.a. To the right is a spectrogram of a high-pass filtered version of the sound object. To the bottom left and right are spectrograms of two musical objects extracted from the original through PLCA—the strike and resonance respectively. While the filtered version to the top right is perceived merely as an altered form of the original, the musical objects on the bottom are perceptually distinct and very new sounds. Note how the musical objects overlap in frequency, and thus cannot be separated from one another via filtering:
Fig. III.a: A “double sound” from a struck gamelan (top left) high-pass filtered into an altered form (top right) and separated via PLCA into its constituent musical objects, the strike (bottom left) and resonance (bottom right).
What we need is some rule which would provisionally hold true for any sound chain and enable us to extract from it that raw element which we have called the sound object, thus isolating it from its environment. An object, however, is always determined by the structures to which it relates; a link is always inseparable from the chain (Schaeffer 1967, 81.4).
By “environment” Schaeffer implies temporal environment, the context of the time domain. By “chain,” Schaeffer means a chain of sounds over time. With the source separation technologies now available, we can excavate coherent and perceptually relevant musical objects from a single sound object, that is, through separations along the frequency axis. As a form of frequency domain granulation, source separation may fundamentally alter the underlying sound object in ways that simple bandpass filtering cannot; the mass of a sound object is not necessarily immune to frequency domain granulation as it may be to simple bandpass filtering. The products of frequency domain granulation may not have any identifiable trace of the larger sound object from which they are separated. We can now remove such musical objects from their former context of the frequency domain of the larger sound object from which they come. Perhaps it is time for a “rule” similar to Schaeffer’s above, one which would apply to a chain of sound over frequency, enabling us to extract from it that “raw element,” isolated from its spectral environment (Schaeffer 1967, 81.4).
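PLCA is closely related to non-negative matrix factorization (NMF), which likewise factors a magnitude spectrogram into spectral bases and time-varying activations. The following sketch uses plain NMF with multiplicative updates as a stand-in for PLCA; the toy signal, component count, and STFT parameters are all invented for illustration and are not those used in the pieces discussed above:

```python
import numpy as np

# Toy "mixture": two tones with different amplitude envelopes, 1 s at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = (np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)
          + np.sin(2 * np.pi * 660 * t) * t)

# Magnitude spectrogram via a simple STFT (window = 256 samples, hop = 128).
win, hop = 256, 128
frames = [signal[i:i + win] * np.hanning(win)
          for i in range(0, len(signal) - win, hop)]
V = np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (freq bins, time frames)

# NMF: V ~= W @ H with K latent components. W holds spectral bases,
# H their time activations (multiplicative updates, Lee & Seung style).
K = 4
rng = np.random.default_rng(0)
W = rng.random((V.shape[0], K)) + 1e-9
H = rng.random((K, V.shape[1])) + 1e-9
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Each latent component is one rank-1 spectrogram layer: a candidate
# "musical object" excavated from the larger sound object.
components = [np.outer(W[:, k], H[k]) for k in range(K)]
print(V.shape, components[0].shape)
```

Resynthesizing each layer (e.g. by masking the complex STFT with its rank-1 spectrogram and inverting) would yield audible fragments analogous to the latent components used in Part II.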
From Uncontracted to Contracted
As mentioned in the introduction of this thesis, Aden Evens claims that what hearing contributes to sound “is a contraction” (Evens 2005, 1). Evens’ concept of contraction is similar to Bregman’s concept of integration (Bregman 1990). A series of contractions, or integrations, of the characteristics of many sinusoids yields what one perceives as the “single,” larger sound they comprise. Given a single sinusoid, frequency is contracted into what is perceived as pitch and amplitude into what is perceived as loudness. Given the many sinusoids of varying frequencies, amplitudes, and phases that combine to form a “larger,” “single” sound, one contracts these multiple sinusoids into what is perceived as the timbre of that larger sound, or the “sound” of that larger sound (Evens 2005, 5).
2.1 Desired (Signal) & Undesired (Noise)
While an individual sinusoid “has a minimal timbre” (Evens 2005, 4), noise, particularly white noise, represents the most complex of timbres, as it consists of “too many contractions which cancel each other, a babbling of many sounds at once” (Evens 2005, 16). Theoretically, white noise consists of an “infinite and continuous range of frequencies,” and has a spectrum distributed evenly over the entire audible frequency range (“noise”). Evens suggests that noise is a mass of sounds, uncontractible because of the very multitude of sinusoids that exist within it (Evens 2005, 15). In a more general sense, noise can be thought of as “anything that interferes with the signal,” which may or may not be white noise (Evens 2005, 8). White noise is, in and of itself and detached from any signal it might obstruct, a complex of interferences, a mass of sinusoids whose very multitude obstructs the perception of each individual sinusoid for its own qualities. If not white noise as detached from any signal, what else might noise be? This raises another question: what is the signal—the desired—and what is not? If the signal denotes the desired and noise the undesired, noise is indeed a relative term dependent upon the desires of the listener. In musique concrète, the sound object represents the desired, and any sounds surrounding the sound object in time, sounds that are discarded, the undesired. Perhaps the discarded in musique concrète can then be thought of as a sort of noise, as unwanted sounds that interfere with the perception and appreciation of the desired in its own right. This discarding of sounds, such as the physical detachment of the attack of a sound from the rest of its evolution through tape cutting, may directly facilitate the mental detachment of that sound from any association with its source or cause. If the discarded sounds in musique concrète (e.g. the attack of a sound) would otherwise lead to the perception of other sounds at different time locations on the same recording in a way unintended by the acousmatic composer (i.e. if the discarded would reveal the source of the desired), if the discarded are indeed undesired sounds that interfere with the intended perception of the desired signal, those sounds may be thought of as a sort of noise. In short, because these discarded sounds would otherwise interfere with the appreciation, in their own right, of the sounds that remain on the recording from which they are detached in time, the discarded might be thought to serve as noise to the desired, the signal.
2.2 Emancipations
The ultimate reason for the separation of sounds from larger ones in musique concrète and its derivative microsound, then, has much to do with the emancipation of sounds from their original, “noisy” context, for the emancipated to serve otherwise. As Christoph Cox and Daniel Warner write about DJ Culture regarding its links to Schaefferian thought:
From Schaeffer onwards, DJ Culture has worked with two essential concepts: the cut and the mix. To record is to cut, to separate the sonic signifier (the “sample”) from any original context or meaning so that it might be free to function otherwise [emphasis added]. To mix is to reinscribe, to place the floating sample into a new chain of signification. The mix is the postmodern moment, in which the most disparate of sounds can be spliced together and made to flow. It is exemplified by those musics of flow: disco, House, and Techno. But the mix is made possible by the cut, that modernist moment in which sound is lifted and allowed to become something else, or is fractured so that it trips and stumbles around the beat (Cox, Warner 2004, 330).
This is the same “chain” of which Schaeffer speaks (Schaeffer 1967, 81.4). See page 51.
Source separation technologies unavailable to Schaeffer and other early composers of musique concrète now allow for perceptually distinct sounds to be separated from other ones in the frequency domain of a massy sound, for them to be appreciated without any relation to the others in the original sound or even the original sound itself. Such separations may lead us to further reconsider what sounds may fall into the ambiguous category of noise, of undesired sounds that interfere with the desired and inhibit the desired from serving otherwise. If the larger sound object itself is to be perceived intact over frequency, that is, with all of its sinusoidal components, all of those sinusoidal components contribute to the signal, and all constitute the desired. Within the complex sound that is the larger sound object exist many combinations of smaller sinusoidal sounds that may constitute the individual musical objects discussed by Schaeffer. If a certain smaller sound object within the larger sound object is desired to be heard on its own, the other smaller sound objects with which it combines over frequency to create the larger sound object may interfere with, or serve as noise to, its individual perception. This phenomenon was discussed in Part II as masking. As has been discussed, sinusoidal frequency components within the larger sound may be related in many different ways, e.g. harmonically, in terms of some amplitude or length criterion, etc. We often hear only their vertical relations and any meanings associated with these relations rather than the sinusoids themselves. While within the single note exist many frequency components, we perceive these components not as individual frequency components but as a single note. Further, while within the chord exist individual notes, we still perceive the notes not as multiple, individual notes, but as a single chord. As Evens reflects:
Which parts of sound are held back and which parts contracted? Vertically, we contract the relations among pitches as harmony, so we do not so much hear several notes but a chord (Evens 2005, 18).
The individual notes within the larger sound object of the chord may exist as smaller sound objects, though they are not necessarily contracted by our perception as such—they are “held back.” These notes, heard alone and contracted, would be no less valuable musically than the chord they comprise. Alone, the note can function in musical ways that the chord cannot. A sound object such as a chord may involve multiple, smaller sound objects sounding all at once, smaller musical objects that escape our attention as they are absorbed by the larger whole. Like white noise but to a lesser degree, a sound object bombards us with a plethora of sinusoids, which cannot be analyzed as thoroughly as they could in smaller quantities. The smaller sound objects that may exist within the larger sound object serve as noise to each other, interfering with each other’s independent perception despite their complementary roles as constituents of the same, larger sound object. Again, noise is a relative term, as it depends upon what one wants (signal) and does not want (noise) to hear. If one wishes to hear a certain note within a chord more clearly, a separation of that note from the other notes in the chord that inhibit its perception as a single entity—a denoising—would help. This type of “denoising,” although it might entail the physical action of playing one note instead of several simultaneously, can perhaps still be thought of as similar to the type of denoising achievable by PLCA as discussed in Part II. In both cases, undesired sounds would be removed to aid in the closer appreciation of the desired, and possibly for the desired to function otherwise. Such separations may allow for the uncontracted to be contracted. Henri Pousseur’s Scambi (1957) consists of white noise that he filters into “clear” sounds consisting of fewer sinusoids:
As Michelangelo specified shape by chipping at his block of marble with a chisel, so Pousseur specified crisp, clear, and pitched sounds by chipping at his block of white noise with an electronic chisel called a filter (Chadabe 1987).
Pousseur’s Scambi involves the use of a filter to separate sounds from others in the frequency domain of a “block of white noise,” and thus relates to Schaeffer’s example mentioned earlier of white noise
Still, the tendency to hear a note and not sinusoids is stronger than that to hear the chord and not notes. As discussed, the notes in a chord may fuse in part due to the harmonicity principle (Bregman 1990, 656).
being filtered into melodic sounds. With Pousseur’s filtering, the uncontracted sinusoids in the noisy context of white noise become contracted only when isolated from their former spectral environment. The same (de)compositional principle can be applied to a sound object, whose sinusoidal content is much less than that of white noise but still greater than a single sinusoid. To a lesser extent than white noise but a greater extent than the single sinusoid, the sound object, with its many sinusoids, carries out an attack on our hearing attention. The extent of this attack on our attention can be diminished if groups of these sinusoids are separated from each other and displaced in time or space to be listened to and appreciated on their own. Sounds that may have escaped our attention in the larger mass of sounds may attract our closer attention if separated from this larger mass; qualities of theirs that once went unnoticed may no longer escape our notice. Frequency domain granulation, although it has “the effect of filtering the spectrum,” may still prove that a sound with high mass, indestructible under simple filtering, is indeed destructible (Roads 2001, 275). It yields sounds with fewer sinusoids as compared to the larger sound object. These smaller sounds, which may be perceived without any relation to the larger sound object they formerly comprised, no longer have to be thought of as “frequency components” of this larger sound object. They have been “emancipated,” so to speak, from the spectral confines of the larger sound object of which they previously were only a mere component. As unseparated frequency components, these sounds were the uncontracted. As the separated, they become sounds with their own timbre, waves of a shape distinct from that of the wave of the larger sound object from which they are separated. They can be contracted and appreciated as such.
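Pousseur’s “electronic chisel” can be approximated in a few lines. The sketch below is a crude stand-in for his studio filters, not a reconstruction of them: it carves a pitched sound out of white noise by zeroing every FFT bin outside a narrow band, with band edges and sample rate chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sr = 8000
noise = rng.standard_normal(sr)   # 1 s of white noise: sinusoids en masse

# The "electronic chisel," sketched as a brick-wall band-pass filter:
# zero every FFT bin outside a narrow band around 440 Hz, leaving a
# clear, pitched sound chipped out of the block of noise.
X = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), d=1 / sr)
X[(freqs < 430) | (freqs > 450)] = 0.0
pitched = np.fft.irfft(X, n=len(noise))
```

The surviving sinusoids, few enough to be contracted, are heard as a narrow-band pitched tone rather than as noise.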
While they “inherit” acoustic qualities from the larger sound object, these smaller sounds, lacking qualitative characteristics of the larger sound object, can be perceived themselves as very different sounds, as sounds for their own sake. A similar line of thought can be applied to grains as extracted from a larger sound object via granulation. Within the temporal confines of the larger sound object, that is, before granulation, these grains serve only as mere “time components” of the larger sound object. Here, they are not contracted. Once extracted, once emancipated through granulation, these grains can serve as sounds to be appreciated (albeit fleetingly) in their own right, not as time components of the original sound object, but as grains. As time components in the larger sound object, they are not contracted as individual units, as grains, but absorbed by the larger sound object. Once separated, they can be contracted, and devoted to particular musical ends that the larger, bulky sound object cannot be.
Both kinds of sound separation—those that isolate groups of sounds from either their former context of the time domain or frequency domain—may allow for the acousmaticization of the separated sounds. Although separating sounds from others in the time domain and in the frequency domain involves very different operations, in both cases certain sounds, pieces of acoustic information vital to the recognition of the source, may have been detached from the smaller sound under consideration. That is, in both cases, a suppression of acoustic information is at play. In the different forms of frequency domain granulation, these are certain frequency components from different locations in frequency space; in sampling and granulation, these are time components from different locations in time space. It should be stressed that frequency components hold time domain information in their amplitude envelopes, and that time components hold frequency domain information in their spectra. Nevertheless, frequency components and time components combine along the frequency and time axes respectively to constitute larger sounds, and they hold the majority of their acoustic information along different dimensions of the time-frequency plane. This latter point highlights the fact that each is suited for distinct musical processes and structures.
Because it can be fundamentally altered through filtering, white noise can be considered a low mass sound.
As mentioned in the Preface, Schaeffer borrowed the term “acousmatique” (acousmatic) from the Greek akousmatikoi, the disciples of Pythagoras who, in order to focus more closely on the content of their teacher’s words, only heard him speak from behind a veil. Acousmatic music employs sounds heard without knowledge of any originating cause, heard from behind the “veil of the loudspeaker” (Schaeffer 1977).
With sounds separated from either domain, there exists an information threshold on which the perceptibility of their source materializes. In the case of sampling or granulation, a certain amount of the acoustic information distributed along the time axis (e.g. the attack of an instrumental sound) is needed to distinguish the source of a sound (e.g. the sound of a bell or the sound of a muted cymbal?) (Schaeffer 1967, 41.5); in the case of filtering or frequency domain granulation, a certain amount of the acoustic information distributed along the frequency axis (e.g. higher frequencies related to speech sibilants) is needed to distinguish the source of a sound (e.g. the sound of his voice or hers?), or even to decipher meaning associated with sounds (e.g. what words are being spoken?).
3.1 Reorderings & Realignments
Sound, then, can undergo a de-acousmaticization, an unveiling of its source, in both the time and frequency domains if such missing information is retrieved and reorganized. In the time domain and the case of the sound object or grain, this entails the retrieval of missing sounds in temporal proximity to the sound object or grain and their appropriate ordering relative to their time positions in the original signal (as we saw in Part I). In the frequency domain, de-acousmaticization entails the retrieval of certain missing frequency components with their respective amplitude envelopes, but also their appropriate alignment over the time axis. Thus, a sound can be de-acousmaticized only by both retrieving acoustic information and replacing this information correctly in time; a sound can be acousmaticized either by removing acoustic information or by displacing this information in time. Regarding time domain granulation, as mentioned in Part I, Lippe notes that grain order “can be of primary importance in granular sampling,” that it “may have important consequences” affecting a continuum of “recognizability and non-recognizability” related to the original, granulated sound object (Lippe 1994, 5). It is frequency component alignment in frequency domain granulation that affects the recognizability and non-recognizability of the original sample, as demonstrated through the music discussed in Part II. In each piece, a “continuum” of “recognizability and non-recognizability” is created through the gradual alignment and misalignment of latent components respectively (Lippe 1994, 5). The particular alignment over time of the frequency components that comprise a complex sound greatly influences the morphology of the sound they combine to produce along the frequency axis; thus the misalignment of frequency components relative to their alignment in the original signal can acousmaticize a sound whose source was once recognizable.
Every frequency component that comprises a larger sound object bears a particular phase relationship with every other frequency component within that larger sound object. Once this phase relationship is manipulated, the morphology of the original sound object may change drastically, often rendering the original sound unrecognizable even if all of its spectral elements are present. It is the combination of the summation of the frequency components over the frequency axis and their correct alignment over the time axis that reconstitutes the larger sound object from which they were separated. That frequency components can be separated from a signal in terms other than time is precisely why they can be aligned (i.e. de-acousmaticized) or misaligned (i.e. acousmaticized) over time. This principle allows for interesting and distinctive musical processes of acousmaticization and de-acousmaticization that cannot be achieved with sound grains. Grains separated from a sound object cannot be de-acousmaticized by any such alignment, as they can only reconstitute the signal from which they were separated by a temporal, sequential ordering relative to their time locations (i.e. onset times) in the original signal. The reader may consider two cases: the first involves a sound object granulated into ten 100 ms grains; the second involves a separate instance of that same sound object separated into ten 1 s frequency component groupings limited by equal bandwidths (i.e. groupings achieved by simple filtering; assume the original sound object is of low mass and is destructible by simple filtering, and that no frequency component grouping alone indicates its source). In the first case, the ten grains alone can de-acousmaticize the signal; in the second case, the ten frequency component groupings alone can de-acousmaticize the signal. The grains in the first case are played over time non-sequentially, i.e. in a different time order than they existed in the original sound object; the frequency component groupings in the second case are all played simultaneously but misaligned over time, i.e. in a different alignment than they occurred in the original sound object. Both thus begin acousmaticized and undergo a de-acousmaticization process:
[Fig. III.b panels: “Unordered Grains,” “Ordered Grains,” and “Misaligned Bands,” showing grain indices 0–9 scrambled and then restored to order, and frequency bands offset in time.]
Fig. III.b: De-acousmaticization processes for time components (left) and frequency components (right).
It is clear that the de-acousmaticization process for each is different: the frequency component groupings require a realignment, the grains a reordering. These different processes obviously bear different compositional implications, at least in regard to the recognizability continuum discussed by Lippe. These acousmatic principles are applied visually to the written word “Recognizable” below:
Fig. III.c: Similar de-acousmaticization processes applied respectively to a visual example.
Although it can be separated from a sound object like a grain, the separated frequency component grouping (i.e. band), unlike the grain, may itself serve as a sound object whose duration extends beyond the microsound time scale (1 ms to 100 ms). That is, unlike the grain, the frequency component grouping has what Wishart refers to as continuation. Compared to the grain, the frequency component grouping separated from the larger sound object represents an alternative informational decomposition of the original sound object. While the grain holds most of the retained information from the decomposed sound object in the frequency domain, the frequency component grouping holds most of the retained information from the decomposed sound object in the time domain. This last point is important: while both the grain and the frequency component groupings as depicted above may represent comparably drastic informational decompositions of the larger sound object (in terms of their capacity for acoustical quanta, or quanta of information), they contain the acoustic information they retain from the original sound object along different dimensions of the time-frequency plane. It is for this reason that the frequency component grouping and the grain each call for very different processes of acousmaticization and de-acousmaticization, and for any number of other distinct musical processes as well.
As discussed in Part II, the many smaller sounds that constitute a sound object over frequency may come from many sources, yet they all contribute to the same sound, the same wave or timbre that our ear perceives. Wishart asks questions relevant to Bregman’s work in the following:
Having discovered that sound-objects may be exceedingly complex and that our perception of them may involve processes of averaging and attention to spectral evolution, an obvious question presents itself; how are we ever able to differentiate one sound-source from another? As all sounds enter our auditory apparatus via a single complex pressure wave generated in the air why do we not just constantly hear a single source with more or less complex characteristics? We might ask the same question in reverse: how is it that a complex sound does not dissociate into a number of separate aural images? Much research has been done and much is still being carried out on the problem of aural imaging (Wishart 1996, 64).
While a high enough degree of filtering may be able to fundamentally alter a sound object—to separate from that sound object new and smaller sounds without any perceivable relation to the original—it does not necessarily yield coherent sound objects, perceptually relevant groupings of frequency components. Such perceptually relevant groupings, or auditory streams, may exist within both sound mixtures (e.g. the different instruments playing in a string quartet) and sound non-mixtures (e.g. the sibilants and vowels within the speech of an individual). Only sound mixtures contain separate sources, or aural images, however. Wishart asks us another familiar question: “[O]nce our auditory mechanism has dissociated the incoming sound into its constituent sine wave components, or at least generated some kind of spectral analysis, how can it then group the various components according to the separate sources from which they emanated?” (Wishart 1996, 64). He offers a helpful summary of some of the auditory segregative factors discussed in Part II:
1. Components having the same (or very similar) overall amplitude envelope, and, in particular, components whose onset characteristics coincide will tend to be grouped together;
2. Components having parallel frequency modulation…will be grouped together;
3. Sounds having the same formant characteristics will be grouped together;
4. Sounds having the same apparent spatial location will be grouped together (Wishart 1996, 64).
These four points addressed by Wishart relate to characteristics of signals evaluated by PLCA. Wishart also discusses “onset synchrony” and “the various constituents of a sound” being “separated by increasing time-intervals,” thus pointing to the main technique behind the music discussed in Part II. Each of the three pieces employed a sound object “split into its component parts,” the latter of which align at different points to allow the original, larger sound object to “recohere” (Wishart 1996, 65). In Onomatoschizo at least, PLCA allowed for a non-mixture sound object to be separated into a “number of separate aural images,” even though each of these derived from a single source. This concept is related to the idea of virtual sources, addressed later in Part III.
4.1 Sound Object Coherence
Wishart alludes to other Schaefferian concepts (discussed soon) in the following:
Conversely, we may use these aural imaging factors compositionally to integrate sound materials which might otherwise not cohere into objects. Thus, by imposing artificial attack and amplitude characteristics on a complex of sounds (e.g. a small group of people laughing), by artificially synchronizing the onset of two or more normally quite separate sound-objects, by artificially synchronizing…two or more normally quite separate sound-objects we may create coherent composite sound-objects…. This further opens up our conception of what might be considered as a coherent musical object (Wishart 1996, 65).
Through the musical techniques explicated in Part II, sound materials that had previously cohered into a larger sound object were artificially desynchronized in processes that helped to portray the materials as “quite separate sound-objects,” though in a manner opposite to the one Wishart mentions (Wishart 1996, 65). The musical processes in Part II were intended to help further “our conception of what might be considered to be a coherent musical object” by looking deep inside the sound object itself, by splitting it into smaller sound objects Schaeffer called musical objects, “magnets” broken from the larger “magnet” of the sound object (Wishart 1996, 65; Schaeffer 1967, 75.4). But what is a coherent musical object, or sound object? What is an incoherent sound object? Is coherence a prerequisite for a sound object? These are debatable questions, and Wishart’s conviction of what may constitute a coherent sound object seems to depart from Schaeffer’s. Wishart’s assertion that “coherent composite sound-objects” may be created with “normally quite separate sound-objects” contrasts with Schaeffer’s claim that “the sound object is defined by its causal coherence” (Schaeffer 1967, 75.1). Schaeffer also writes:
That is, unless a sound non-mixture is separated via PLCA into musical objects to be displaced in time.
[A] sound that is both complex and coherent cannot be split up. Only less coherent [sound objects] lend themselves readily to sol-fa-ing, especially if one is lucky enough to hear the constituents first (Schaeffer 1967, 78.2).
It should be reiterated that Schaeffer did not have access to the source separation technology that exists today, and that he stated it was quite difficult to separate a “double sound”—one which still “undeniably” had a “unity of sound”—into its constituent musical objects simply by means of filtering (Schaeffer 1967, 43.1, 75.1). Today, such coherent sounds can be split up into their constituents. Indeed, Onomatoschizo made the listener “lucky enough” to hear the constituents first. Schaeffer calls these “less coherent” sound objects “compound objects,” which are more particularly multiple sound objects (i.e. from separate sources) that have the quality of onset synchrony. These smaller objects, says Schaeffer, combine into “chords”:
We shall call “compound objects” certain kinds of chords consisting of objects which have more or less merged at the same instant [i.e. have onset synchrony] into a single outline (Schaeffer 1967, 77.4).
Multiple objects with asynchronous onsets, says Schaeffer, contribute to a “composite object”:
When two objects are combined in succession, i.e. more in the nature of a melody than in the nature of a chord, the result will be called “composite object” (Schaeffer 1967, 78.1).
The music discussed in Part II allows the listener to perceive what Schaeffer would have considered coherent objects instead as compound objects when their constituents were aligned, and as composite objects when their constituents were misaligned. Still, the question may be posed: were the constituents of these larger sound objects, these latent components, sound objects themselves? French Canadian musical semiologist Jean-Jacques Nattiez addresses debates held in the 1970s by composers at the GRM in Paris regarding the concept of the sound object. He writes:
In my own view, the sound-object is an ambiguous phenomenon…. It is first and foremost a poietic unit…. The sound-object is nothing more than a unit, and we always need units in describing the materials and the organization of pieces. In going from one work to another, from instrumental music to electro-acoustic music, the dimensions of the units, their functions, the hierarchical structures in which they are integrated, the depth of their articulation, may all change. From the analytical standpoint, the main problem will be deciding how we go about making segmentations of these units (Nattiez 1990, 100).
Like that of the grain, the concept of the sound object may indeed be ambiguous. Like the grain, the sound object serves musically as a unit of larger sonic structures. Nevertheless, the sound object, unlike the grain, extends through time and has an appreciable morphology. Exception is thus taken to Nattiez’s contention above that the sound object “is nothing more than a unit” (Nattiez 1990, 100). As Nattiez himself writes on the same page:
Schaeffer never conceived electro-acoustic works as anything other than “studies upon objects.” In…Solfège de l’objet sonore, Schaeffer declares that his experiments in “training the ear” “from the outset avoid indicating agency”; that is, indicating the source of the sounds (Nattiez 1990, 100).
The Schaefferian sound object is indeed a musical unit, but one that calls for a special type of extended listening inappropriate for the grain. That is, the sound object calls for reduced listening, which is aided by hiding the source of the sound object so that the sound itself can be more easily appreciated for its own sake, without any meaning connected to its source.
4.2 Twice-Reduced Listening
In this way, the latent component—as a sound with a coherent inner structure detached from its former (spectral) context—can be thought of as a peculiar type of sound object. It must be noted that the latent component, obtained as it is through the careful analysis of spectral context conducted by PLCA, indeed may bear a coherence that frequency components obtained through filtering probably
By “sol-fa-ing,” Schaeffer means the creation of klangfarbenmelodie, or timbre melody, discussed in Part II.
will not. Although it can be separated from its former spectral context like a grain can be from its former temporal context, the latent component, unlike the grain, is itself a sound object whose duration extends beyond the microsound time scale into what Wishart refers to as the “realm of Continuation” (Wishart 1994a, 52). Compared to the grain, the latent component represents an alternative informational decomposition of the sound object. While the grain holds most of the retained information from the decomposed sound object in the frequency domain, the latent component holds most of the retained information from the decomposed sound object in the time domain. Because the single latent component—like the Schaefferian sound object and unlike the single grain—is not fleeting and extends in the direction of time, it can call for a reduced listening, or rather a twice-reduced listening. Such listening can be described as “twice-reduced,” because it is not the sound object itself that is appreciated in its own right, but the latent component, a spectral decomposition of the original sound object, a musical object that is perceptually coherent. A twice-decomposed sound object, the latent component is obtained from a larger sound object, which itself is obtained through a temporal decomposition of the longer recording; the larger sound object itself is then decomposed spectrally through PLCA into latent components, smaller sound objects. In twice-reduced listening, the smaller sound objects that once lay hidden in the larger sound object can be appreciated as sounds themselves, for their intrinsic acoustic qualities and not in relation to the other sounds, or frequency components, that once existed alongside them in the larger spectrum of the original sound object. Indeed, there may exist many smaller sound objects within the larger sound object, sounds that are ripe for the mining, which deserve a more focused attention devoted to their own inherent qualities. 
The latent component, as an informational decomposition of the larger sound object, allows greater focus to be given to certain related frequency components that once comprised that larger sound object, without any relation to that larger sound object as their source. In this respect, the latent component and twice-reduced listening are correlates of each other, as are the larger sound object and reduced listening: “they define each other mutually and respectively as perceptual activity and object of perception” (“reduced listening”). Latent components can be appreciated individually, in their own right, if they are each introduced gradually throughout a compositional process. Through such a gradual musical process, the sound object of which the latent components are fragments can be constructed, or rather reconstructed, after its demolition. In this way, latent components can be perceived as compositional building blocks of the larger block of sound from which they have been extracted, as constituents of the Schaefferian compound object, but still as perceptually coherent sounds for their own sake (Schaeffer 1967, 77.4).
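The spectral decomposition underlying these latent components is a concrete procedure. As a rough numerical sketch—following the published EM formulation of PLCA by Smaragdis and Raj, not the SoundSplitter implementation used in this thesis; all variable names are illustrative—a non-negative spectrogram can be factored into latent components like so:

```python
import numpy as np

def plca(V, n_components, n_iter=500, seed=0):
    """Decompose a non-negative spectrogram V (freq x time) into latent
    components, modeling P(f, t) ~ sum_z P(z) P(f|z) P(t|z) via EM."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P = V / V.sum()                                  # observed joint distribution
    Pz = np.full(n_components, 1.0 / n_components)   # component priors P(z)
    Pf = rng.random((F, n_components)); Pf /= Pf.sum(axis=0)  # spectra P(f|z)
    Pt = rng.random((T, n_components)); Pt /= Pt.sum(axis=0)  # activations P(t|z)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component z for cell (f, t)
        q = Pf[:, None, :] * Pt[None, :, :] * Pz[None, None, :]  # F x T x Z
        q /= q.sum(axis=2, keepdims=True) + 1e-12
        # M-step: redistribute the observed spectral mass according to the posterior
        w = P[:, :, None] * q
        Pz = w.sum(axis=(0, 1))
        Pf = w.sum(axis=1) / (Pz + 1e-12)
        Pt = w.sum(axis=0) / (Pz + 1e-12)
    return Pz, Pf, Pt
```

Each column of Pt is the temporal “life” of one latent component—the horizontally extending sound object discussed above—while the corresponding column of Pf is its spectral shape; resynthesizing audible components would additionally require filtering the original STFT with each component’s spectral contribution.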
Point against Point
As composer Karlheinz Stockhausen wrote, timbre, or “tone-color,” is the result of time structure (Stockhausen 1957, 19).51 Such time structure is itself determined by the increasing and decreasing amplitudes, the crescendos and decrescendos, of the sinusoids of different frequencies that comprise the sound. In another text, Stockhausen elaborates this idea:
Suppose you take a recording of a Beethoven symphony on tape and speed it up, but in such a way that you do not at the same time transpose its pitch. And you speed it up until it lasts just one second. Then you get a sound that has a particular colour or timbre, a particular shape or dynamic evolution, and an inner life which is what Beethoven has composed, highly compressed in time…. On the other hand, if we were to take any given sound and stretch it out in time to such an extent that it lasted twenty minutes instead of one second, then what we have is a musical piece whose large-scale form in time is the expansion of the microacoustic time-structure of the original sound (Stockhausen 1989, 91).
51 Timbre is also the result of frequency structure, that is, of the frequency components that exist within it. Time structure, as Stockhausen refers to it, means how these frequency components modulate in amplitude.

These instrumental layers composed by Beethoven are the same hypothetical layers that are transformed into new sinusoidal frequency components of the shorter, 1 s sound object. However, it is argued that this shorter sound would not have “an inner life which is what Beethoven has composed” (Stockhausen 1989, 91). This is because time-shrinking the Beethoven symphony as Stockhausen suggests would transform its macro-acoustic, instrumental layers, once contractible as distinct layers distinguished by their respective timbres, rhythms, dynamics, etc., into micro-acoustic layers, frequency components uncontractible in their own right. Nevertheless, a connection can indeed be drawn between the macro-acoustic strata of the Beethoven symphony and the “time-structure” of the shorter, 1 s sound object, namely its sinusoidal, micro-acoustic strata. Stockhausen’s latter point relates to Truax’s spectral time-stretching techniques discussed in Part I. Again, Truax was interested in the potential of time-stretching to reveal the “inner complexity” of sounds, particularly natural sounds, by shifting the listener’s focus from the spectral characteristics of a brief sound (i.e. a grain) to the temporal characteristics of the spectral components that existed within the original sound (Truax 1994). Yet both Stockhausen’s example and Truax’s time-stretching techniques involve a temporal manipulation of an original sound that drastically distorts its spectrum, and whatever characteristics could be newly perceived in the manipulated sound would not actually have been present in the original. While time-stretching may aid a new form of listening, it necessarily changes underlying characteristics of that which is to be perceived in order to do so. Subjecting a sound object to PLCA may allow for the perception of the same “inner complexity” of sounds that interested Truax, but without the drastic spectral distortion unavoidable in time-stretching.
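The coupling between the time and frequency axes that Stockhausen’s thought experiment must overcome is easy to verify in the simple, non-pitch-preserving case: resampling a signal to half its duration doubles every frequency component. A small numpy check (the sample rate and tone frequency are arbitrary illustrative values):

```python
import numpy as np

sr = 8000                            # sample rate in Hz
t = np.arange(sr) / sr               # one second of time
x = np.sin(2 * np.pi * 440 * t)      # a 440 Hz tone

# "Speeding up the tape" without pitch correction: keep every other sample.
# The duration halves, and every frequency component doubles.
y = x[::2]

def peak_hz(sig, sr):
    """Frequency of the largest-magnitude bin in the signal's spectrum."""
    spectrum = np.abs(np.fft.rfft(sig))
    return np.fft.rfftfreq(len(sig), 1.0 / sr)[spectrum.argmax()]

print(peak_hz(x, sr))   # ~440.0
print(peak_hz(y, sr))   # ~880.0
```

The pitch-preserving compression Stockhausen actually describes requires a phase-vocoder or granular technique that decouples the two axes; the point above is that any such temporal manipulation reworks the spectrum, whereas PLCA leaves both axes of the original sound object untouched.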
By separating a larger sound object into smaller, horizontally extending sounds, PLCA may allow greater attention and analysis to be given to the individual morphologies of those smaller sounds that once contributed to the greater morphology of the larger sound object. Each latent component obtained from a sound object, while it may share some morphological characteristics with others obtained from the same sound object, still has a distinct rhythmic character that is maximally contrasting with the rest,52 one which may not have been perceivable while it existed as part of the larger whole. Once frequency components are separated from a larger sound object and time-scattered, once they are emancipated from the larger sound object to serve otherwise, they can be perceived differently than they could as part of the larger whole. Indeed, they may not have been perceived at all as part of the larger whole. The original sound object, in turn, can be perceived as the result of complex interactions between its spectral components. Just as the different instrumental parts in a Beethoven symphony or the voices in a Bach fugue do, it is argued, the frequency components confined in any complex sound object interact in counterpoint with each other to construct that sound object, that is, to construct its timbre. This counterpoint, of course, is a natural one, not a composed one, but it can be exploited in composition (as demonstrated in Onomatoschizo). The term “counterpoint” derives from the Latin punctus contra punctum, which translates as “point against point.” Johann Joseph Fux’s Gradus ad Parnassum, first published in 1725, has largely influenced the teaching of counterpoint during the last two centuries (Kennan 1999, 3). The text is in the form of a dialogue between a teacher, Aloysius, and his pupil, Josephus (Kennan 1999, 3). Aloysius explains the term “counterpoint” to Josephus:
It is necessary for you to know that in earlier times, instead of our modern notes, dots or points were used. Thus, one used to call a composition in which point was set against or counter to point, ‘counterpoint’ (as quoted in Kennan 1999, 3).
In our world of what Landy calls sound-based music, we are no longer so heavily concerned with the note (Landy 2007a, 17). Indeed, we may concern ourselves once again with points, although not the “dots or points,” presumably on parchment, which Aloysius mentions to Josephus. We are now concerned with points on Gabor’s matrix, spectral points against others on the time-frequency plane.
52 That each latent component comprising a sound object has its own rhythmic character supports the argument for its comparison to a voice or line in a contrapuntal composition, which is often distinguished from others through a unique rhythm. This point is revisited later in this section.
An Art of Segregation
Bregman describes counterpoint as “the technique of composing the various parts in polyphonic music” (Bregman 1990, 494). He notes that contrapuntal principles have served several goals:
The first was to have more than one part sounding at the same time and in a harmonious relation to one another…. They also served the purpose of having each part make a separate contribution to the musical experience by being perceived as a melody in its own right. At times, the different parts would become partially independent in perception, but they would be tied firmly together again by good harmonies at key points, such as at the ends of phrases or passages. At the points at which the parts were to be independent, two things had to be arranged. The first was that they had to be prevented from fusing with the other parts and the second was that sequential integration had to be strong within each part. I think that some of the compositional principles that achieved these purposes did so because they corresponded to principles of primitive auditory organization (Bregman 1990, 494).
It is important to note that Bregman makes this connection between counterpoint and broader principles of auditory perception. Michael Casey and the author made similar connections between musical counterpoint and the sounds that interact in nature. Interestingly, while Casey noted that early composers of instrumental music likely imitated the sounds they heard in nature, the author noted the reverse: that a natural counterpoint exists between the spectral components that comprise any complex sound, between the many smaller sounds that interact to form timbre. Bregman notes that because music often consists of separate instruments in a polyphonic ensemble, separate sources that is, it often “tries to fool the auditory system into hearing fictional streams” (Bregman 1990, 457). While our auditory systems have evolved to help us “build perceptual representations of the distinct sound-emitting events of our environment,” and while music itself often consists of distinct sound-emitting instruments (often in distant spatial positions) or even certain locations on a single instrument (e.g. different strings on a violin), “music does not always want these to be the basic units of our experience” (Bregman 1990, 457). He continues:
A set of different musical instruments may be used to create a chord or some other composite sound that is meant to be heard as a global whole. In order to get sounds to blend, the music must defeat the scene-analysis processes that are trying to uncover the individual physical sources of sound (Bregman 1990, 457).
Music “does not always” intend for separate streams to be perceived as individual units, but sometimes it does, as implied by Bregman in the description of counterpoint quoted above. To stress this point, Bregman quotes music theorist C.W. Fox, who argued that “the art of counterpoint is essentially an art of segregation of melodic lines” (Bregman 1990, 494). Bregman follows:
[I]n counterpoint the goal was not simply to promote the perception of the harmony of the chords, but to induce the perception of multiple, concurrent parts. Therefore the vertical ties between the parts that were produced by rules of harmony were balanced by other rules that were used to keep the parts distinct…. The goal of segregation was approached in two ways. One was to strengthen the horizontal bonds between notes and the other was to weaken the vertical ones (Bregman 1990, 496).
Bregman discusses several ways in which the concurrent melodic lines in counterpoint are traditionally kept distinct from one another. These can be summarized as follows (the reader may note how several of these relate to principles of primitive auditory scene analysis discussed in Part II):
1. Binding the notes within each line into a sequential grouping by avoiding large leaps in pitch.
2. Preventing the synchronous onsets of notes in different parts.
3. Giving the different parts distinct rhythms (related to 2 above).
4. Refraining from having one line of melody cross into the pitch range of another (related to 1 above).
5. Avoiding parallel pitch movements of different parts (Bregman 1990, 496-499).
The first point is related to the factor of frequency separation in sequential integration as discussed in Part II—the smaller the difference in fundamental frequency between successive tones, the more likely we are to integrate them into a stream that can be heard as distinct from others. Because vertical note groupings compete with horizontal ones, “if a note departs too much in pitch from its predecessor,” it risks being captured by, and fused with, a tone in one of the other parts that occurs at the same time (Bregman 1990, 496). If the tones are “strongly bound into their own respective streams,” notes Bregman, this risk will be reduced (Bregman 1990, 496). The second point recalls the effect of onset synchrony, which is to fuse simultaneous tones. Bregman notes that this vertical grouping is strongest when each of the pitches in the group (i.e. the chord) “deviates quite a lot in pitch from its predecessors in the same part, since this will weaken their horizontal grouping” (Bregman 1990, 497). Again, the weaker a horizontal grouping between successive tones, the more likely these tones are to be grouped into different simultaneous streams. In contrast, by avoiding the synchronous onsets of tones in the different lines of counterpoint, by keeping, for example, a rest (i.e. silence) in one part while there is a note onset in another, the distinctness of lines may be preserved and the strongest separation obtained (Bregman 1990, 496). Related to the second point above, the third highlights that a difference in rhythm between the various lines in contrapuntal music “makes it easier to focus on the parts individually” (Bregman 1990, 497). Bregman notes that the use of different rhythms in polyphonic music can have a segregative effect that “would derive from the fact that having different rhythms in two parts guarantees that there are always some notes in one part that do not have a synchronous onset with notes in another part” (Bregman 1990, 497). When two parts have frequent synchronous onsets, the vertical integration of the parts is likely to increase. The fourth point is related to the first point above in that both refer to the “very strong tendency to perceptually group tones whose fundamentals are close in frequency to those of their predecessors” (Bregman 1990, 498).
If one line of tones crosses into the pitch range of another, the listener may tend to “follow this melody to the [pitch] crossover point and then start tracking the other part it was crossing” (Bregman 1990, 498). Unless they have very different timbres, contrapuntal lines that overlap each other in pitch range will tend not to be distinguished easily. Anyone who has even briefly studied counterpoint will likely recall its most forbidden practices: the use of parallel octaves or fifths. As we saw in Part II with the Gestalt principle of common fate, different frequencies that move concurrently in the same direction (i.e. parallel on a log-frequency scale) tend to be simultaneously integrated (Bregman 1990, 657-658). Further, harmonics of the same fundamental frequency have a strong tendency to be simultaneously integrated. Hence the prohibition by traditional counterpoint against parallel movements between different parts to an octave or a fifth, as the notes that form these intervals already have a strong tendency to fuse vertically due to their harmonicity. If parallel pitch movements of different parts are avoided, particularly movements towards pitches that already have a strong tendency to fuse vertically, the different lines in a contrapuntal work are more likely to remain perceptually distinct. Again, it may be said that these compositional principles of instrumental counterpoint, which aim to sustain the perceptual segregation of different lines, may be thought to imitate the same interactions between the spectral components of natural sounds that induce our auditory mechanism to segregate them. That is, they relate to the same acoustic principles our hearing uses to segregate natural sounds as addressed in Part II. 
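Several of these heuristics lend themselves to a mechanical check. As a toy formalization—the function, its event encoding, and its simplifying assumptions are the author’s hypothetical illustration, not anything proposed by Bregman—two parts encoded as (onset, MIDI pitch) events can be scored for synchronous onsets, voice crossings, and parallel perfect intervals:

```python
def segregation_report(upper, lower):
    """Count violations of three of Bregman's segregation heuristics for two
    parts, each a list of (onset_time, midi_pitch) events. Assumes, for
    simplicity, that events at the same index are metrically comparable."""
    # principles 2 and 3: synchronous onsets promote vertical fusion
    sync_onsets = len({t for t, _ in upper} & {t for t, _ in lower})
    # principle 4: the nominal upper voice dipping below the lower one
    crossings = sum(1 for (_, hi), (_, lo) in zip(upper, lower) if hi < lo)
    # principle 5: consecutive perfect octaves/fifths approached in similar motion
    parallels = 0
    for i in range(1, min(len(upper), len(lower))):
        prev_ivl = (upper[i - 1][1] - lower[i - 1][1]) % 12
        curr_ivl = (upper[i][1] - lower[i][1]) % 12
        same_dir = (upper[i][1] - upper[i - 1][1]) * (lower[i][1] - lower[i - 1][1]) > 0
        if same_dir and prev_ivl == curr_ivl and curr_ivl in (0, 7):
            parallels += 1
    return {"synchronous_onsets": sync_onsets,
            "crossings": crossings,
            "parallel_perfects": parallels}
```

For two parts moving in lockstep parallel octaves all three counters fire; staggering the onsets and varying the intervals drives them toward zero, which is precisely what the rules summarized above aim for.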
Bregman states that the primitive processes of our auditory system “seem to be the same ones that deal with the acoustic environments in everyday life,” that the major aspects of ASA apply largely to concepts in music theory (Bregman 1990, 528). He argues:
A musician might object that most of what is musically exciting has little to do with basic audition. Yet we cannot deny that we hear music through the same ears that provide us with the sounds of everyday life (Bregman 1990, 455).
It deserves reiterating that any complex sound can be considered as the result of a natural counterpoint between its constituent parts, its sinusoids. This is perhaps most obvious in what are normally considered to be sound mixtures, e.g. polyphonic ensembles with multiple instruments playing, a room with several people talking, a forest with the sounds of different animals, etc.—environments in which distinct sounds come in and out of our perception, trading off one against the other. Certain frequency components that exist simultaneously within these sound mixtures fuse vertically into independent sounds (e.g. a violin, viola, and cello; his voice and hers; a lion and a bird), which are naturally segregated and contracted independently by our hearing. Some of these sinusoids do not fuse vertically, as when notes with synchronous onsets fail to fuse on account of very different timbres, or when two people with very different voices speak at the same time. Yet a counterpoint exists even in any “non-mixture” sound. Indeed, it is a counterpoint among its constituent sinusoids that gives a sounding violin the sound of a violin, a sounding piano the sound of a piano, a sounding lion’s roar the sound of a lion’s roar—it is this natural counterpoint that yields timbre. Bregman reminds us that timbre is an emergent property:
When elements enter into a higher-order organization, new properties are formed…. In daily life, we can hone a knife until it is sharp, despite the fact that none of the molecules that compose it are sharp. Sharpness is an emergent property. Similarly, we can compose a sound that is voice-like, despite the fact that not one of the sine waves that compose it is voice-like. We can call this an emergent feature (Bregman 1990, 459).
Musical experience, argues Bregman, derives from both sequential and simultaneous organizations: sequential groupings produce rhythms and melodies, while simultaneous groupings result in the perception of chords, “but also other emergent qualities of simultaneous sounds, e.g. timbre” (Bregman 1990, 459).53 He reminds us that the instruments yielding the “sonic objects of music” only do so in a “very indirect way,” that sequential and simultaneous organizations are what allow us to hear the emergent qualities of interacting smaller sounds (Bregman 1990, 459).

5.2 Synthetic & Natural Micropolyphonies
Emergent acoustic qualities can be macro or micro phenomena—macro in the sense that a string quartet written by Shostakovich differs greatly from a string quartet by Mozart, even when they are performed by the same instrumentalists with the same instruments; micro in the sense that two different people will never sound exactly the same even though the sounds of their voices derive from interactions between the same sinusoidal waves. It is argued that this latter emergence is the result of what Hungarian composer György Ligeti called micropolyphony, though Ligeti proposed this as a compositional concept, not a natural phenomenon:
Technically speaking I have always approached musical texture through part-writing. Both Atmosphères and Lontano [two of his orchestral works] have a dense canonic structure. But you cannot actually hear the polyphony, the canon. You hear a kind of impenetrable texture, something like a very densely woven cobweb. I have retained melodic lines in the process of composition, they are governed by rules as strict as Palestrina's or those of the Flemish school, but the rules of this polyphony are worked out by me. The polyphonic structure does not come through, you cannot hear it; it remains hidden in a microscopic, underwater world, to us inaudible. I call it micropolyphony (such a beautiful word!) (Ligeti et al. 1983, 14-15).
Jonathan Bernard writes that Ligeti’s description of micropolyphony suggests “that the music characterised by it has two, essentially antithetical aspects: 1) the outer, audible one, which results from 2) the internal one, inaudible because it is really no more than a rule, working secretly, ‘behind the scenes’, as it were” (Bernard 1994, 227). In other words, the “outer, audible” aspect of Ligeti’s micropolyphony emerges from the “internal, inaudible one,” just as the sound of the voice emerges from inaudible sine waves, as Bregman notes above. The music discussed in Part II exploits Ligeti’s concept of micropolyphony, but in reverse. That is, instead of composing synthetic micropolyphonies with individual lines that become inaudible, it decomposes natural micropolyphonies to render their individual “lines” audible. Each piece considers the sound object it decomposes via PLCA as a timbral byproduct of the contrapuntal micropolyphony between its latent components. In each of these pieces, the sound objects’ micropolyphonies (i.e. timbres) are altered to form new timbres and structures by turning onset synchronies into asynchronies through the misalignment of latent components.54
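The misalignment operation itself is simple to state. As a sketch—assuming latent components have already been resynthesized as time-domain signals, and using random offsets where the offsets in Onomatoschizo were compositional choices—each component is delayed by a different amount before mixing, so that former onset synchronies become asynchronies:

```python
import numpy as np

def misalign(components, max_shift, seed=0):
    """Delay each latent-component signal by a different random offset so that
    former onset synchronies become asynchronies, then sum into one mixture.
    components: array of shape (n_components, n_samples)."""
    rng = np.random.default_rng(seed)
    n, length = components.shape
    shifted = np.zeros((n, length + max_shift))
    for i, comp in enumerate(components):
        offset = rng.integers(0, max_shift + 1)   # per-component delay in samples
        shifted[i, offset:offset + length] = comp
    return shifted.sum(axis=0)
```

With max_shift set to zero the components realign and fuse back into the original sound object, which is exactly the reconstruction-after-demolition gesture described earlier.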
53 Timbre is also, on a smaller scale, a result of the horizontal organization of sounds, or rather, the amplitude modulation of frequency components over time (i.e. timbre is the result of both time and frequency structure).
54 Conceiving of any natural sound as emerging from a natural micropolyphony relates to the work of Canadian composer R. Murray Schafer, who discusses the world as one vast, macrocosmic composition in his book The
Chimeras, Hydras, & Virtual Sources
Bregman claims that emergent properties arise because the auditory system conducts analyses on “larger-sized objects,” the perceived boundaries of which “are governed by principles of grouping” (Bregman 1990, 459). He distinguishes between two forms of groupings: natural ones, which group sensory effects deriving from a particular sound source, and chimeric55 ones, which group sensory effects deriving from multiple sound sources. A natural grouping would integrate the sinusoidal components deriving from the same voice as a single voice, while a chimeric grouping would result in an “accidental composition of the voices of two persons who just happened to be speaking at the same time” (Bregman 1990, 459). “Natural hearing,” argues Bregman, “tries to avoid chimeric percepts, but music often tries to create them” (Bregman 1990, 459-460). As discussed earlier in this section, while contrapuntal music often wants to keep its different lines perceptually distinct, at times it wishes to integrate them into one, as in the cadence of a string quartet. Such a chord of synchronous notes produced by different instruments “is chimeric in the sense that it does not belong to any single environmental object” (Bregman 1990, 460). Onomatoschizo, which used PLCA to split a sample of an isolated spoken voice into its maximally contrasting features, exhibits the reverse of what Bregman says music often does. That is, instead of constructing a chimeric percept from a number of separate sources, it deconstructs a natural auditory percept into separate streams, perceivable as such either because they are heard alone or because their former onset synchronies are manipulated. While the 31 latent components extracted from the spoken voice in Onomatoschizo may not derive from separate sources, they do not have to in order to be perceived that way.
This is in contrast to how the sounds of separate instruments in an orchestra, while they do represent separate sources, do not have to be perceived as deriving from separate sources if arranged in the appropriate, contrapuntal way:
To avoid chimeras the auditory system utilizes the correlations that normally hold between acoustic components that derive from a single source and the independence that usually exists between the sensory effects of different sources. Frequently orchestration is called upon to oppose these tendencies and force the auditory system to create chimeras. A composer may want the listener to group the sounds from different instruments and hear this grouping as a sound with its own emergent properties. This grouping is not a real source, however (Bregman 1990, 460).
Bregman cites his former student and music theorist Stephen McAdams, who describes the chimeric grouping as a virtual source (Bregman 1990, 460). By manipulating factors that control sequential and simultaneous stream formation, music creates virtual sources that play “the same perceptual role as our perception of a real source does in natural environments,” that is, to provide “a sense of an entity that endures for some period of time (perhaps only briefly) and serves as a center of description” (Bregman 1990, 460). In a natural environment, the idea of a common source holds a succession of sounds emitted by a single source together. Bregman argues that this must be true for orchestral sounds as well, “except that in the orchestral case the source is not a real one but a virtual one created by acoustic relations in the music” (Bregman 1990, 460). He continues:
Experiences of real sources and of virtual sources are both examples of auditory streams. They are different not in terms of their psychological properties, but in the reality of the things that they refer to in the world. Real sources tell a true story; virtual sources are fictional (Bregman 1990, 460).
Tuning of the World (Schafer 1977). In it, Schafer suggests that “the blurring of the edges between music and environmental sounds may be the most striking feature of twentieth century music” (Bartle 1977, 292).
55 As Bregman relates, the Chimera is a Greek mythological beast “with the head of a lion, the body of a goat, and the tail of a serpent;” the word chimera is used metaphorically to describe an image derived as a composition of several other images (Bregman 1990, 459).

Perhaps virtual sources do not have to be additive, the result of separate sources blending into one. Perhaps they can be subtractive, the result of a single source, a sound “non-mixture,” being separated into new musical objects. Such was the case with the latent components employed in Onomatoschizo and Stratovinsky—they derived from virtual, albeit unrecognizable, sources. If the “non-mixture” sound object separated via PLCA were to be compared to a Greek monster (see footnote 55), it would be the Chimera’s sibling—the Hydra—whose many heads, when severed, would each split into two new ones. When the latent components of speech in Onomatoschizo are misaligned so that their onset synchronies become asynchronies, the listener can more easily pay attention to these lines individually. When all 31 latent components are aligned, they naturally fuse into the original speech sample, “the bloops and bleeps.” At this moment, integrative correlations between them are no longer avoided, and the listener cannot attend as easily to each latent component, each “line.”56 However, even in contrapuntal music “designed expressly to enhance segregation,” one cannot always pay attention to each of the lines at the same time (Bregman 1990, 465). Bregman notes:
[I]f we are capable of segregating any one of six instruments from an ensemble (and this could be made easy if the part-writing avoided correlations between the lines and if the selection of instruments was designed expressly to enhance segregation), we should consider each of these as giving rise to a perceptual stream. We surely cannot pay attention to all these streams at the same time. But the existence of a perceptual grouping does not imply that it is being attended to. It is merely available to attention on a continuing basis. Metaphorically speaking, it holds up its head to be counted but does not do the counting…. One way to get the attention of the listener onto a line is to let that line be audible alone during a gap in the others (Bregman 1990, 465).
Although they may not always be attended to, perceptual groupings may indeed exist even within “non-mixture” sound objects deriving from a single source (i.e. an isolated voice). To aid the listener in giving greater attention to these perceptual groupings, they may have to be “teased out” of the “non-mixture” sound object, either through their isolation (i.e. letting them “be audible alone during a gap in the others”) or through a misalignment that prevents any synchronicities between their onsets, in a counterpoint “designed expressly to enhance segregation” (Bregman 1990, 465). In Onomatoschizo, the listener is given an “advance knowledge” of the constituents of the final complex speech sample by introducing them gradually. In the words of Schaeffer mentioned earlier in Part III, the listener “is lucky enough to hear the constituents first” (Schaeffer 1967, 78.2). In his discussion of counterpoint, Bregman refers to composer Henry Brant’s use of spatial counterpoint, noting that “[s]patial separation can also be used to keep parts perceptually distinct” (Bregman 1990, 500). Bregman describes how in Brant’s idea of spatial counterpoint, performers are spread out in the space of the concert hall. The further the performers are spread out relative to the listeners, the more independent their lines are likely to be perceived. This is similar to how two sinusoids “spread out” in frequency space are likely to be perceived independently. Bregman claims that the primary result of separating instruments spatially is an increased clarity of the melodic lines they produce (Bregman 1990, 501). As discussed earlier in this section, instruments playing notes in the same pitch range may be impossible to distinguish from each other. However, if a pair of instrumentalists are separated from one another spatially, their “melodic lines keep their independence, even when the pitches come close together” (Bregman 1990, 501). Bregman states:
This means that the composition can maintain clarity without having to keep the different parts in nonoverlapping pitch ranges…. Thus the composer has an additional dimension of separation to work with. (Bregman 1990, 501).
The spatialization of different parts allows not only for the segregation of melodies, but for that of rhythms as well. If two separated performers play unrelated rhythms, “the listener finds it easier to hear out the individual rhythms and not hear them as an incoherent blend” (Bregman 1990, 501). Bregman notes how Brant reiterates a certain point, which is particularly related to Stratovinsky:
Brant makes the point several times that not only do the separated parts appear clearer, but also louder and more resonant. Perhaps this is because both the direct sounds and the reverberant sounds of the separated instruments are kept in perceptually distinct packages, one for each location, and hence do not mask one another. We have seen…how factors that enhance perceptual segregation can minimize the effects of masking (Bregman 1990, 501).

56 This integration can be compared to that between the notes in a cadence; both integrations form a musical point of reference—the notes a tonal one, the latent components of speech a semantic one.
The spatialization parameters of Stratovinsky, which send separate groups of latent components on distinct trajectories throughout the performance hall, form a spatial counterpoint that segregates the latent components once they are realigned and fused to resynthesize the original chords from Le Sacre du Printemps. That is, once the vertically integrative correlations between the latent components are no longer avoided through misalignment, they are avoided in a different way, namely through more drastic spatial displacements. Both avoidances help to prevent the effects of masking that occur in the original, unseparated Stravinsky chord, and to foster a new way of listening to its spectral components as sounds for their own sake.
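In stereo terms—a deliberate simplification, since Stratovinsky’s diffusion uses the multiple channels of a performance hall—sending a group of components along a trajectory amounts to time-varying equal-power panning. A minimal sketch, with the two “component groups” stood in by arbitrary noise signals:

```python
import numpy as np

def pan_trajectory(signal, pan):
    """Equal-power pan a mono signal along a time-varying trajectory.
    pan: array in [0, 1] per sample, 0 = hard left, 1 = hard right."""
    theta = np.asarray(pan) * np.pi / 2
    # cos/sin gains keep total power constant at every trajectory position
    return np.stack([signal * np.cos(theta), signal * np.sin(theta)])

n = 48000
group_a = np.random.default_rng(0).standard_normal(n) * 0.1  # stand-in component group
group_b = np.random.default_rng(1).standard_normal(n) * 0.1
# opposite trajectories keep the two groups spatially segregated as they move
stereo = (pan_trajectory(group_a, np.linspace(0, 1, n))
          + pan_trajectory(group_b, np.linspace(1, 0, n)))
```

Because the two trajectories never coincide for long, each group retains its own “perceptually distinct package” in Brant’s sense, even after the latent components within a group are realigned and fused.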
As noted by Bregman, the musical art of counterpoint exploits principles of primitive auditory scene analysis to integrate and (re)segregate individual voices or lines in a musical mixture (Bregman 1990, 494). It has been argued that a natural counterpoint exists between the sinusoids of complex, “non-mixture” sounds to construct timbre. Outside of the acoustic realm of the arts, counterpoint appears in poetry. In “Counterpoint in Mallarmé’s ‘L’Après-midi d’un faune,’” Anne Holmes studies French poet Stéphane Mallarmé’s L’Après-midi d’un faune from a musical perspective, noting how two themes in the poem, memory and desire, interact in counterpoint: “the initial contradiction between the two drives leads to their increasing interdependence and final amalgamation, the traditional pattern of musical counterpoint” (Holmes 2003, 27). Holmes begins her discussion of poetic counterpoint by citing English poet Gerard Manley Hopkins, who, she claims, advocated the use of counterpoint in poetic verse (Holmes 2003, 28). She writes:
[Gerard Manley Hopkins] defined [counterpoint] as the ‘superinducing or mounting of a new rhythm upon the old’. ‘Since the new or mounted rhythm [in poetry] is actually heard and at the same time the mind naturally supplies the natural and standard foregoing rhythm’, he explained, ‘two rhythms are in some manner running at once’ and the result is ‘something answerable to counterpoint in music, which is two or more strains [emphasis added] of tune going on together’ (Holmes 2003, 28).
Holmes argues that memory and desire in L’Après-midi d’un faune “represent contrary tensions,” but that “collaboration between [these] opposites is necessary” to Mallarmé’s poetic goals (Holmes 2003, 36-37). Such collaboration between opposing entities, writes Holmes, is evident in musical counterpoint. She continues:
But this is of course what happens in counterpoint in music, in which the complex final creation absorbs the contrary strains [emphasis added] which have been introduced (Holmes 2003, 37).
The term “strain” is revisited soon. For now, it should be reiterated that the individual lines or voices in contrapuntal music may at times be absorbed by the other lines that exist alongside them, and that this same phenomenon of vertical absorption occurs on a smaller scale in natural sounds. A similar absorption occurs along the time domain of sound. Neither the grain nor the larger sound object is perceived on its own until separated from the longer sound recording in terms of time. That is, the grain and sound object do not exist until this temporal separation occurs. Before separation, the grain and the sound object are merely time components of the complete sound recording and are subsumed in the perception of that larger sound recording as a whole, not contracted as individual objects of perception. As Wishart notes:
It is important to realize that there is a perceptual threshold at which we cease to perceive individual events as individual events and begin to experience them as contributing to…a larger event (Wishart 1996, 68).
Schaeffer stated similarly:
When [brief sounds] are incorporated in a given structure they are generally absorbed or disqualified by it (Schaeffer 1967, 34.8).
The law of the jungle applies to sounds as well: the smaller ones are gobbled up by the larger (Schaeffer 1967, 35.15).
Thus, the microsounds that exist throughout the time domain of a sound object cannot be perceived as such and escape our notice, as they are “gobbled up” by the larger sound they comprise (Schaeffer 1967, 35.15). As we have seen, such is also the case for those sound strata, those “stratosounds” that exist along the frequency axis of the sound object, sounds that may nevertheless escape our attention while they exist within that sound object. They are absorbed by the spectrum of the larger sound, their individual identities disqualified by it. Only when these sounds are emancipated do they become something different entirely—their own entities—just as the time component of the longer recording does when it is freed toward different functions, embodied in the larger sound object or the shorter grain. As British composer Brian Ferneyhough states:
Was it Richard Strauss who once, when told that a certain instrument couldn’t be heard, replied, “Maybe, but I’d notice if it were not there”? Much the same could be said, I expect, for a lot of secondary strata [emphasis added], in that perhaps it’s not so much their physical presence which is contributative at any given moment as their providing of pre-planned points of structural coherence with other processual areas, points where things suddenly ‘click together’ without us always being precisely aware of what it is that’s doing the clicking (Ferneyhough et al. 1995, 388).
While Ferneyhough discusses strata as distinct layers or textures of instrumental music, his thinking applies to certain sound strata that may exist within Schaefferian sound objects. The “clicking together” he refers to is especially apparent at the end of Onomatoschizo, when the 31 latent components finally align to reconstruct the spoken phrase. Further, one certainly notices if a latent component is not present in the playback of the spoken phrase, as important building blocks of its spectrum are missing. Sound strata that comprise a “non-mixture” sound often escape our perception as uncontracted frequency components that are absorbed by the larger signal. This does not mean that they are unmusical in their own right; indeed, they may deserve the closer attention made possible by their emancipation from other sounds in the frequency space of the larger sound object. As part of the sound object, sound strata necessarily exist as part of a larger whole, as certain layers of frequency components that exist alongside other layers of frequency components. In other contexts, the term “strata” usually implies several distinct layers that combine with others to form something larger; e.g., the troposphere, stratosphere, mesosphere, thermosphere, and exosphere all represent distinct strata of the greater atmosphere, and all have distinct qualities. Similarly, sound strata only exist with other sound strata as constituents that comprise something larger. If separated from a larger sound, certain spectral strata of this sound can be heard without revealing the larger sound they once comprised, without illuminating the characteristic timbre of that larger sound. Like the Schaefferian sound object, they can be perceived as sounds for their own sake, precisely because they have been separated from other sounds that would have revealed the identity of the original, larger sound and any meaning its spectral context created.
In this respect, separated strata may facilitate a kind of reduced listening that avoids recognition of their source and context. Again, Schaeffer was particularly interested in isolating one part of a single sound object from another (specifically, the decay, sustain, and release of a sound from its attack) in order to disguise its timbre and facilitate reduced listening. It may be very difficult, notes Schaeffer, “to recognize an instrument’s timbre when taken out of its own context” (Schaeffer 1967, 65.4). Of course, the “context” Schaeffer points to is purely temporal. The timbre of a sound, though, is a byproduct of both that sound’s temporal and spectral characteristics. The context we are concerned with now is purely spectral, and we wish to disguise the timbre to which certain frequency components contribute along the frequency axis by removing them from their spectral context—in a way similar to how Schaeffer disguised a timbre to which certain time components contributed by removing them from their temporal context. The latent components of PLCA exemplify emancipated sound strata, and can be thought of as descendants of the ancestral, larger sound object from which they are separated. However, they may not always be heard as descending from this sound object, as they have been taken out of its spectral
context. Like the grain, the latent component retains certain acoustic information belonging to the larger sound object from which it is separated. While it holds certain “genetic” (i.e. acoustic) information from the larger sound object, the latent component, as emancipated, indeed is its own sound that can be perceived as such when contracted alone, just as the E of the C major chord can be perceived as an individual E only if separated from the chord. As an individual E, this note need not have any connection with the C major chord. Emancipated sound strata, though they inherit acoustic characteristics from their ancestral, larger sound object, have a waveshape, a timbre—a sound—distinct from that of the larger sound object of which they are offspring. This is because they no longer exist alongside those other sinusoidal components of the larger sound object, sinusoids that formerly contributed to the overall waveshape or timbre of the larger sound object. Sound strata then, once emancipated, are no longer strata of the larger sound object. They are their own sounds, no longer certain strata of some larger sonic structure, and may be all the more appreciable outside of the spectral context of that sonic structure. Removing sound strata from a sound object leaves that sound object with missing frequency components. Other strata can still be emancipated from the confines of the spectrally-reduced sound object, yielding yet newer sounds to be contracted independently of that sound object, without relation to the other remaining sinusoidal components that comprise it. Thus, the different strata emancipated from the larger sound object, as additive units of that larger sound object, are distinct from each other although they each inherit acoustic information from the ancestral sound object. 
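Though this thesis stresses that simple filtering cannot isolate the perceptually coherent streams that PLCA finds, a crude FFT-masking sketch can at least make concrete what it means to remove spectral strata from a sound object and audition each remainder alone. This is an illustrative stand-in only, not PLCA; the function name and band boundaries are my own:

```python
import numpy as np

def split_strata(signal, bands):
    """Separate a signal into spectral "strata" by masking FFT bins.

    `bands` is a list of (low_bin, high_bin) pairs; each stratum keeps
    only the energy in its band and is resynthesized by inverse FFT.
    """
    spectrum = np.fft.rfft(signal)
    strata = []
    for lo, hi in bands:
        masked = np.zeros_like(spectrum)   # complex zeros, same shape
        masked[lo:hi] = spectrum[lo:hi]    # keep only this band's bins
        strata.append(np.fft.irfft(masked, n=len(signal)))
    return strata
```

Splitting a two-partial tone with one band around each partial yields two “emancipated” sinusoids whose sum reconstructs the original; leaving one stratum out leaves the sound object with missing frequency components, as described above.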
They are, once emancipated, smaller sounds to be appreciated, free of any relation to the larger sound object from which they descend, free of any relation to the other strata that once existed alongside them in the spectrum of that larger sound object. It is in this sense that these emancipated sound strata are no longer components, no longer “frequency components” of the larger sound object. Again, they are their own sounds, appreciable and contractible as such. They are as free to serve different musical ends as is the E separated from the C major chord. That is, they are no more “shackled” as components by the larger sound object than is the E by the C major chord. They are, like the emancipated time components of the larger sound object of musique concrète and the grain of microsound, perceivable in their own right and able to function in musical ways that the larger sound object from which they come cannot. Perhaps certain kinds of sound strata that exist within the “non-mixture” sound, particularly those perceptually coherent streams separable by means of PLCA, can be conceived as contrapuntal strains of the larger sonic structure they comprise through their interactions. Perhaps they can be considered distinct “strains of sound,” just as those perceptual streams that exist as strata within the sound mixtures of polyphonic, contrapuntal music have been called distinct “strains of tune” (Holmes 2003, 28). In both cases, such strains may fall in and out of our attention, may blend together at times to become indistinguishable and separate at others to become distinguishable. The term strain would denote that smaller yet perceptually coherent sound object that interacts contrapuntally with others to form the larger, “non-mixture” sound object. An example would be one of the two musical objects that exist together within Schaeffer’s “double sound” (Schaeffer 1967, 43.1).

Selected definitions of strain (n.) from the Oxford English Dictionary: “Offspring, progeny…. The descendants of a common ancestor…. Any one of the various lines of ancestry united in an individual or a family…. Inherited character or constitution…. An inherited tendency or quality; a feature of character or constitution derived from some ancestor; hence, in wider sense, an admixture in a character of some quality somewhat contrasting with the rest…. A thread, line, streak…. [Music:] A definite section of a piece of music; in wider sense, a musical sequence of sounds; a melody, tune [often plural]” (“strain”).

However, as Bregman tells us, “the existence of a perceptual grouping does not imply that it is being attended to” (Bregman 1990, 465). There may be many streams of sound that exist within non-mixture sounds (e.g. the voice) that we may not consciously attend to, but which are still highly significant factors in our perception. Such streams of sound embodied in the latent component, to borrow Bregman’s metaphor, may only hold up their heads to be counted in certain cases, as when they are allowed to be “audible alone during a gap in the others” (Bregman 1990, 465). The term “strain” is by no means intended to replace “latent component.” Rather, the concept
of the strain is a way of thinking about the latent component and its functionality in music, particularly as a line or voice in the greater “composition” of timbre. Further, the latent components of PLCA are expressly designed to describe individual sources within a sound mixture, while the idea of the strain is meant to describe individual sound objects that lie within a sound non-mixture as musical objects. That is, while the latent component could represent one of two separate vibraphones playing simultaneously, the strain could not. Instead, the strain would represent one of the two simultaneous musical objects potentially emitted by the “double sound” of a single vibraphone strike (or even one of the less perceptible yet still coherent sounds even deeper within the spectrum of one of these musical objects) (Schaeffer 1967, 43.1). While the latent component is not necessarily a musical concept (it may refer to practical engineering applications of source separation), the strain necessarily is. The term emphasizes that, at least musically, the latent component need not be considered as a “component” of the larger sound object. It aims to stress the musical emancipation of frequency components from the spectral confines of the sound object. The several different meanings of the word “strain” present some useful connotations that speak to the nature of the musical strain under consideration. First, as we have seen, “strain” has been used more or less synonymously with “line” or “voice” in contrapuntal texture. However, the Oxford English Dictionary gives one definition of “strain” as “a musical sequence of sounds,” which does not necessarily imply a melodic sequence of notes as does the contrapuntal “line” or “voice.” In sound-based music, and particularly in the concept of sinusoidal frequency components interacting contrapuntally to yield the timbre of a non-mixture sound, our interests certainly go beyond notes or melodic sequences of sounds.
The musical definition of strain is perhaps antiquated enough to have lost any strong melodic connotation it may once have had. In this sense, the traditional, musical definition of “strain” seems particularly ripe for recontextualization, one that may broaden it to apply to those perceptual streams that interact in counterpoint to create timbre as part of a natural micropolyphony. As we saw in Part I, grains can undergo time-stretching, time-compression, and sound reversal. While it was not explored in this thesis, strains could be subjected to similar manipulations. However, applied to strains of sound, these manipulations would strongly allude to traditional contrapuntal parameters applied to strains of melody: augmentation, diminution, and retrograde respectively. In classical counterpoint, augmentation refers to the slowing down of a melody’s tempo while keeping its pitch content and rhythm intact; diminution refers to the reverse of this. The retrograde form of a melodic line is its reversal, which preserves its pitch content, tempo, and rhythm, just as sound reversal does in the context of concrète sounds. It deserves reiteration that strains necessarily derive from the same ancestral sound object. It is their “genealogical” tie through this common sound object that makes possible the particular processes of acousmaticization (that gradually hide this common sound object as source) and deacousmaticization (that gradually reveal it) exclusive to frequency domain granulation discussed earlier in this thesis. Hence the alternate definition of “strain” as “offspring” or “progeny” of a common ancestor serves well here too. Yet while strains descend from the same common ancestor, they each have “some quality somewhat contrasting with the rest” (“strain”).
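As a toy illustration of these analogies, the three manipulations can be sketched on a discrete strain (here simply a NumPy array of samples). The function names and the linear-interpolation stretch are my own illustrative choices; note that naive resampling also shifts pitch, so a faithful analogue of augmentation or diminution would require a phase-vocoder time-stretch that keeps pitch intact:

```python
import numpy as np

def retrograde(strain):
    # Sound reversal: the time-domain analogue of melodic retrograde.
    return strain[::-1]

def stretch(strain, factor):
    # Naive time-stretch by resampling: factor > 1 slows the sound down
    # (cf. augmentation), factor < 1 speeds it up (cf. diminution).
    # Unlike melodic augmentation, plain resampling also transposes pitch;
    # a phase vocoder would be needed to keep pitch content intact.
    n = max(1, int(len(strain) * factor))
    old_idx = np.arange(len(strain))
    new_idx = np.linspace(0, len(strain) - 1, num=n)
    return np.interp(new_idx, old_idx, strain)
```

Here `stretch(s, 2.0)` doubles the duration of `s` (augmentation), `stretch(s, 0.5)` halves it (diminution), and `retrograde(retrograde(s))` returns `s` unchanged.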
It is this quality that renders strains perceptually distinct from each other, and thus often unrecognizable as part of the original, larger sound object, which requires more than one of its strains in order to be recognized as their ancestor. Finally, like grains, strains are meant to interact in the plural, with other strains of the same ancestral sound object, and may call for more efficient macro-level approaches to their organization into musical structures. As Xenakis wrote, “[m]icrosounds and elementary grains [alone] have no importance…. Only groups of grains and the characteristics of these groups have any meaning” (Xenakis 1992, 50). While the strain and grain call for very different musical processes, they are certainly related in that they both may represent an informational decomposition of the sound object, one that allows for entirely new sonic structures to be constructed (post-deconstruction) with the many, smaller sounds that existed in the original all along. It is obviously no coincidence that “strain” rhymes with “grain.” Indeed, the former was chosen in a nod to Xenakis and his concept proposed “in defiance of the usual manner of working with concrète sounds” (Solomos 1997).
The Positive of Sound: Conclusions
Stratosound is proposed as a response to the time domain sound separations characteristic of musique concrète and microsound, and particularly as a call for musical sound separations that focus equally on the frequency domain. It stresses that greater attention needs to be given to the spectral excavation of what are traditionally thought of as “non-mixture” sounds. Separating sounds from others in a larger sonic structure entails an informational decomposition of that larger sonic structure. The dimension along which that larger sonic structure is decomposed, be it time or frequency, determines whether the products of decomposition retain more acoustic information from the original over time or frequency, and thus the musical processes for which those products may or may not be appropriate. Until now, the musical isolation of sounds from other sounds, of sounds from their former acoustic contexts, has focused overwhelmingly on the time domain as the site for sound separation. Indeed, that the awkward term “frequency domain granulation,” a chiefly frequency domain operation, is coined in terms of the chiefly time domain operation of granulation (Roads 2001, xi, 188) points to the relegated status of the frequency domain as a site for sound separations. Musique concrète, the first music to begin with concrete, recorded sounds as opposed to abstract musical notation, obtained its musical material, the sound object, by isolating it, by separating it from other sounds on the longer recording purely in terms of time. Pierre Schaeffer, the developer of musique concrète, was interested in disguising the identifying timbre of a sound by taking it “out of its own context” (Schaeffer 1967, 65.4). However, timbre is more than the result of time structure, and the “it” Schaeffer referred to was only a time component of a larger time structure, only a shorter, smaller timbre comprising a longer, larger timbre.
The “context” Schaeffer referred to was the temporal context surrounding the time component (i.e. the grain or sound object). Timbre is also the result of frequency structure, and the identifying timbre of a sound can likewise be “disguised” when certain parts of its sinusoidal strata are removed from their spectral context and heard alone. The blade and magnetic tape served as Schaeffer’s chief tools as he developed musique concrète, but they allowed him to separate sounds from others only in terms of time. These tools undoubtedly shifted his compositional focus onto the time domain as he attempted to take sounds out of their contexts. However, in his influential work Solfège de l’objet sonore, Schaeffer does speak of “double sounds” and “compound sound objects,” which consist of multiple simultaneous sound objects (Schaeffer 1967, 43.1, 77.4). While he noted that such sounds lend themselves to mental separation into their constituent parts, he remarked that this kind of separation was “actually rather difficult to reproduce…materially by means of filtering” (Schaeffer 1967, 75.1). Filtering simply separates certain bandwidths of frequencies in a spectrum from others, and thus cannot necessarily separate distinct and perceptually relevant components of a signal that overlap in frequency. Schaeffer did not have access to the technologies of today, which indeed allow us to separate perceptual streams of sound that exist within a larger sound. One such technology, PLCA, may lead us to give the frequency domain the focus it deserves as a new site for the separation of sounds from others to serve as musical material. PLCA was developed by Paris Smaragdis, Bhiksha Raj, and Madhusudana Shashanka, and is based partly on the concepts of auditory scene analysis proposed by Albert Bregman (Bregman 1990) and on Independent Subspace Analysis introduced by Michael Casey (Casey 1998; Casey & Westner 2001).
Although it is intended for the isolation of distinct sound-emitting sources in a sound mixture (i.e. source separation), PLCA may also be used to separate Schaeffer’s “non-mixture” (i.e. single source) “double sounds” into their constituent, perceptually relevant components. PLCA separates a sound into “maximally contrasting features,” and can be used to “deconstruct” sounds otherwise “indestructible” under simple bandpass filtering, sounds that Schaeffer described as having high “mass” (Schaeffer 1967, 1.22).
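A minimal numerical sketch may clarify what latent components are computationally. In two-dimensional PLCA, a magnitude spectrogram normalized to a probability distribution P(f, t) is modeled as a sum over latent variables z of P(z)P(f|z)P(t|z), and the factors are estimated by expectation-maximization. The NumPy implementation below of these standard updates is my own illustrative sketch, not the authors' reference code:

```python
import numpy as np

def plca(V, n_components=2, n_iter=100, seed=0):
    """EM estimation for 2-D PLCA: V[f, t] ~ sum_z P(z) P(f|z) P(t|z)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    V = V / V.sum()                          # treat the spectrogram as P(f, t)
    Pz = np.full(n_components, 1.0 / n_components)
    Pf = rng.random((F, n_components)); Pf /= Pf.sum(axis=0)
    Pt = rng.random((T, n_components)); Pt /= Pt.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t) for every time-frequency cell
        joint = Pf[:, None, :] * Pt[None, :, :] * Pz[None, None, :]  # (F, T, Z)
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reestimate the marginals from expected counts
        W = V[:, :, None] * post
        Pz = W.sum(axis=(0, 1))
        Pf = W.sum(axis=1) / (Pz + 1e-12)
        Pt = W.sum(axis=0) / (Pz + 1e-12)
    return Pz, Pf, Pt
```

Each column of P(f|z) is one latent component's spectral shape, and the corresponding column of P(t|z) its activation in time; a single z, resynthesized alone, is the separated stratum heard by itself.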
Perhaps a new term is needed to set things equal, one emphasizing that musical sound separations need not focus on the time domain.
Microsound arose as a critical response to musique concrète, in Iannis Xenakis’ “defiance of the usual manner of working with concrète sounds,” which he portrayed as unwieldy “block[s] of sound” (Solomos 1997). Through microsonic granulation, also known as granular sampling, Xenakis decomposed Schaeffer’s sound object into smaller time components, or grains, in order to construct new sonic structures. Nevertheless, granulation only extends the temporal sound separation techniques of musique concrète, as it conducts its separations along the time domain. Still, microsound provides a paradigm of composing with sounds obtained by decomposing a larger sound object, which heavily influenced the compositional ideas presented in this thesis. In the sense that the music presented in this thesis derived from a misuse of PLCA, that of applying it to non-mixture sounds to obtain musical material, it can be related to the glitch aesthetic, “the new microsound,” which is characterized by the misuse of technology (Thomson 2004, 211-212). It was indeed this misuse of PLCA that led to the development of the new musical processes of acousmaticization and deacousmaticization, the gradual hiding and revealing of the source of sounds respectively. The ability of PLCA to deconstruct the sound object in terms of frequency also allowed for audible processes of destruction that could be heard to occur over time, processes that have been compared to the work of artist Cornelia Parker. Lastly, in composing the music presented in this thesis, a direct connection was drawn between the macro, composed counterpoints of instrumental music and a micro, natural one that could perhaps be thought to exist in all sounds, one that determines their timbral characteristics in a natural variant of what György Ligeti called micropolyphony (Ligeti et al. 1983).
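Granular sampling itself is simple to state in code: slice a recording into short, windowed time components and reassemble them, here in shuffled order, by overlap-add. The sketch below is illustrative only; grain size, hop, and the shuffling strategy are my own choices, not a reconstruction of any particular composer's tool:

```python
import numpy as np

def granulate(signal, grain_size=512, hop=128, seed=0):
    """Slice a signal into Hann-windowed grains, shuffle them, and
    overlap-add the grains into a new sonic structure."""
    rng = np.random.default_rng(seed)
    window = np.hanning(grain_size)
    grains = [signal[s:s + grain_size] * window
              for s in range(0, len(signal) - grain_size, hop)]
    rng.shuffle(grains)              # reorder: a new macrostructure from old grains
    out = np.zeros((len(grains) - 1) * hop + grain_size)
    for i, g in enumerate(grains):
        out[i * hop:i * hop + grain_size] += g   # overlap-add resynthesis
    return out
```

With a large grain size the output reproduces recognizable chunks of the source; as the grain shrinks toward the lower threshold of perceptibility, it approaches the microsonic textures described above.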
As contrapuntal music consists of distinct strata that fuse and separate at different points in time based on certain relationships between them, so may natural, seemingly non-mixture sounds consist of distinct and coherent strata, of smaller sound objects, which blend together and divide in similar ways. These smaller sound objects that exist within the “non-mixture” sound object, to which Schaeffer briefly referred but which he could not convincingly isolate through filtering, have been portrayed as strains of sound comparable to strains of melody in contrapuntal music. An interesting piece by Cornelia Parker, The Negative of Sound, consisted of black lacquer residue obtained by cutting the original grooves in records:
The Negative of Sound (1996) by Cornelia Parker (Parker 2000, 50).
Parker was asked about this piece in the following excerpt from an interview:
Bruce Ferguson: Tell me about The Negative of Sound.
Cornelia Parker: It’s…the idea of all sound produced, and to achieve it what do you have to get rid of? I was trying to imagine what a negative of sound was. I went to a record factory, hoping to find what I was looking for. There they take a nickel silver cast off a lacquer master disc, which is made in a recording studio. So I went to Abbey Road Studios—made famous by the Beatles—to witness that process. Then I realized they engrave away these spaces from the surface with a needle to produce the grooves.
Bruce Ferguson: In order to make space for the sound, as it were?
Cornelia Parker: They’re engraving the sound, to produce the music they have to excavate the groove, the swirl of lacquer that they have to remove is what I’m interested in. It must be the negative of sound (Parker 2000, 51).
In a separate interview, she remarked:
The idea of the negative of sound, for me, is fantastic. How can you listen to the negative of sound? What does it sound like? (Parker 1996, 59).
Perhaps the larger sound object in which shackled sound strata lie becomes “the negative of sound” to the emancipated strain, “the positive of sound.” For the smaller sound object that is the strain to be produced, for it to be perceived, one may have to “get rid of” other sound strata that exist alongside it in the spectrum of the larger sound object (…to produce the music they have to excavate the groove…). Those many sinusoidal strata that comprise the larger sound object interfere with each other’s individual perception, and the desired must be separated from the undesired in order to “make space” for the perception of the former—sounds must be separated from sounds. Tony Cragg, a British sculptor whom Parker cites as a major influence (Tickner 2003, 367), presented his Britain Seen from the North in 1981, shown below:
Britain Seen from the North (1981) by Tony Cragg (Parker 2000, 12).
Britain Seen from the North challenges an ingrained frame of reference with which most Westerners view the world map, that is, with south below north. Cragg is not the only one to challenge this geographic frame of reference; the Australian Stuart McArthur’s Universal Corrective Map of the World places Australia at the center and the southern hemisphere at the top of the global map, thus challenging the “idea that Europe should be at the middle of the global map and the northern hemisphere at the top, with all the positive connotations implied by that double positioning” (Black 1997, 38). Of course, any depiction of the globe is an arbitrary one, and it is no more correct to place north above south than vice versa. Any depiction of sound is arbitrary in the same sense, and it is no more correct to place time on the x-axis and frequency on the y-axis than it is to do the reverse (Gabor 1947). Microsounds are only micro and stratosounds only stratose to the extent that they are depicted as such by the conventional time-frequency plane. It all depends on how you look at it. Nonetheless, it is time that composers shift their frame of reference toward the frequency domain as they separate sounds from others in search of musical material. Indeed, therein lies a universe of sounds that have yet to receive our undivided attention.
Onomatoschizo: Ableton Live screenshot of 31 tracks containing 31 latent components extracted from the phrase, “the bloops and bleeps.” All 31 are summed and aligned by the end of the piece to resynthesize the phrase and reveal the semantic meaning of the sounds employed throughout the piece. The tracks containing the synthesized bloops and bleeps are not displayed. (Visible excerpt from master arrangement on far left indicated within black dotted box.)
Racquelement: Ableton Live screenshot of 21/200 tracks containing 21 latent components extracted from four separate, cliché techno sound objects. (Only latent components extracted from one of these four sound objects are shown.)
Stratovinsky: Screenshot of Max/MSP patch programmed (using the Ambisonics toolkit) for real-time spatialization control in Stratovinsky. The 16 circles within the larger circle indicate the 16 channels through which 128 latent components are sent on trajectories. The arrangement of the quadraphonic speaker setup is indicated by the four circles within the smaller circle.
Stratovinsky: Ableton Live screenshot of 25 tracks containing 25/128 latent components extracted from a 1.5 s sample of a famous chord from Stravinsky’s Le Sacre du Printemps (Danse Sacrale). Midway through the piece (approx. 2:30), the Stravinsky chord is reconstructed through the correct alignment of latent components. Once reconstructed, latent components drop out to deconstruct the chord once again—this time not through a misalignment, but a removal of acoustic information.
ARTseenSOHO. Web. 12 Mar. 2010 <http://www.artseensoho.com/Art/DEITCH/parker98/parker2.html>.
Bartle, B. 1977. “The Tuning of the World by R. Murray Schafer.” Journal of Research in Music Education 25(4): 291-293.
Bernard, J. 1994. “Voice Leading as a Spatial Function in the Music of Ligeti.” Music Analysis 13(2-3): 227-253.
Black, J. 1997. Maps and Politics. Chicago: The University of Chicago Press.
Bregman, A. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Cascone, K. 2000. “The aesthetics of failure: post-digital tendencies in contemporary computer music.” Computer Music Journal 24(4): 12-18.
Casey, M. 1998. Auditory Group Theory with Applications to Statistical Basis Methods for Structured Audio. Ph.D. dissertation, Massachusetts Institute of Technology.
Casey, M. & Westner, A. 2001. “Separation of Mixed Audio Sources by Independent Subspace Analysis.” Proceedings of the 2000 International Computer Music Conference. San Francisco: International Computer Music Association.
Chadabe, J. 1997. Electric Sound. Upper Saddle River: Prentice-Hall.
Cherry, E. C. 1953. “Some experiments on the recognition of speech, with one and with two ears.” Journal of the Acoustical Society of America 25(5): 975-979.
Cooley, J. & Tukey, J. 1965. “An algorithm for the machine calculation of complex Fourier series.” Mathematics of Computation 19: 297-301.
Cox, C. & Warner, D. 2004. Audio Culture: Readings in Modern Music. New York, NY: Continuum International.
Di Scipio, A. 1997. “The problem of 2nd-order sonorities in Xenakis’ electroacoustic music.” Organised Sound 2(3): 165-178.
Evens, A. 2005. Sound Ideas: Music, Machines, and Experience (Theory Out of Bounds). Minneapolis: University of Minnesota Press.
Ferneyhough, B., Boros, J., Toop, R., & Harvey, J. 1995. Collected Writings. Contemporary Music Studies, v. 10. Amsterdam: Harwood Academic.
“filter.” Ears: ElectroAcoustic Resource Site. 2002. Web. 14 May 2010
Gabor, D. 1946. “Theory of communication.” Journal of the Institution of Electrical Engineers Part III, 93: 429-457.
Gabor, D. 1947. “Acoustical quanta and the theory of hearing.” Nature 159(4044): 591-594.
Hofmann, T. 1999. “Probabilistic Latent Semantic Analysis.” Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI’99).
Kennan, K. W. 1999. Counterpoint: Based on Eighteenth-Century Practice. Upper Saddle River, NJ: Prentice Hall.
Landy, L. 2007a. Understanding the Art of Sound Organization. Cambridge, MA: MIT Press.
Landy, L. 2007b. La musique des sons. Paris: Sorbonne MINT/OMF.
Ligeti, G., Várnai, P., Häusler, J., & Samuel, C. 1983. György Ligeti in Conversation with Péter Várnai, Josef Häusler, Claude Samuel, and Himself. London: Eulenburg: 13-82.
Lippe, C. 1993. “A Musical Application of Real-Time Granular Sampling Using the IRCAM Signal Processing Workstation.” Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association.
“musique concrète.” Ears: ElectroAcoustic Resource Site. 2002. Web. 2 Mar. 2010 <http://www.ears.dmu.ac.uk/spip.php?rubrique143>.
Nattiez, J. 1990. Music and Discourse: Toward a Semiology of Music. Princeton, NJ: Princeton University Press.
“noise.” Ears: ElectroAcoustic Resource Site. 2002. Web. 2 Mar. 2010 <http://www.ears.dmu.ac.uk/spip.php?rubrique93>.
Parker, C. 1996. Avoided Object: [exhibition]. Cardiff: Chapter.
Parker, C. 2000. Cornelia Parker. Boston: Institute of Contemporary Art.
Parker, C., & Jahn, A. 2005. Cornelia Parker: Perpetual Canon: Württembergischer Kunstverein Stuttgart. Bielefeld: Kerber.
Raj, B., Shashanka, M., & Smaragdis, P. 2006. “A Probabilistic Latent Variable Model for Acoustic Modeling.” Advances in Models for Acoustic Processing Workshop, NIPS 2006.
“reduced listening.” Ears: ElectroAcoustic Resource Site. 2002. Web. 2 Mar. 2010 <http://www.ears.dmu.ac.uk/spip.php?rubrique219>.
Reich, S. 2002. Writings on Music. New York: Oxford University Press.
Roads, C. 1995. The Computer Music Tutorial. Cambridge, MA: MIT Press.
Roads, C. 2001. Microsound. Cambridge, MA: MIT Press.
Robindoré, B. 1996. “Eskhaté Ereuna: Extending the Limits of Musical Thought—Comments On and By Iannis Xenakis.” Computer Music Journal 20(4): 11-16.
Schaeffer, P. 1967. Solfège de l'objet sonore. Paris: INA/GRM.
Schaeffer, P. 1977. Traité des objets musicaux. Second edition. Paris: Éditions du Seuil.
Schafer, R. M. 1977. The Tuning of the World. New York: Knopf.
Schellenberg, E., Iverson, P., & McKinnon, M. 1999. “Name that tune: Identifying popular recordings from brief excerpts.” Psychonomic Bulletin & Review 6: 641-646.
Sherburne, P. 1998. “click/.” Web. 15 Mar. 2010 <http://www.mille-plateaux.net/theory/download/p.sherburne.pdf>. Originally printed in liner notes for ‘Clicks and Cuts 2’. Frankfurt: Mille Plateaux MP98CD.
Solomos, M. 1997. Program notes to Xenakis: Electronic Music. Compact disc CD 003. Albany: Electronic Music Foundation.
Stockhausen, K. 1989. Stockhausen on Music: Lectures & Interviews. London and New York: Marion Boyars.
“spectrum.” Ears: ElectroAcoustic Resource Site. 2002. Web. 12 May 2010 <
http://www.ears.dmu.ac.uk/spip.php?rubrique101>. “strain, n.” The Oxford English Dictionary. 2nd ed. 1989. OED Online. Oxford University Press. Web. 10 Jan. 2010 <http://dictionary.oed.com/cgi/entry/50238816>. Thomson, P. 2004. “Atoms and errors: towards a history and aesthetics of microsound.” Organised Sound 9(2): 207-218. Tickner, L. 2003. “A Strange Alchemy: Cornelia Parker.” Art History 26(3): 364–391. Truax, B. 1986. “Real-time granular synthesis with the DMX-1000.” Proceedings of the 1986 International Computer Music Conference. San Francisco: International Computer Music Association: 138-145. Truax, B. 1994. “Discovering inner complexity: time-shifting and transposition with a real-time granulation technique.” Computer Music Journal 18(2): 38-48. Wishart, T. 1994a. Audible Design. York: Orpheus the Pantomime. Wishart, T. 1994b. Audible Design: Appendix 2. York: Orpheus the Pantomime. Wishart, T. 1996. On Sonic Art. London: Hardwood. Xenakis, I. 1992. Formalized Music: Thought and Mathematics in Music (revised and extended version of the 1971 edition). Stuyvesant, NY: Pendragon Press.