
The University of Sheffield

James R. Mooney

Sound Diffusion Systems for the Live Performance of Electroacoustic Music

An Inclusive Approach led by Technological and Aesthetical Consideration of the Electroacoustic Idiom and an Evaluation of Existing Systems

Submitted for the Degree of Ph.D. in the Faculty of Arts

Department of Music

August 2005

License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Key points are as follows.

You are free:

• To share – to copy, distribute and transmit the work.


• To remix – to adapt the work. This could take the form of
quotations/citations and/or development of concepts.

Under the following conditions:

• Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). In an academic context this simply means following the usual guidelines on referencing/citation.
• Noncommercial. You may not use this work for commercial
purposes.
• Share Alike. If you alter, transform, or build upon this work, you
may distribute the resulting work only under the same or similar
license to this one.

See http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode for full details.

Abstract

This thesis documents research in the field of sound diffusion for the live
performance of electroacoustic music. Broad and inclusive ways of
conceptualising electroacoustic music are presented, with the intention of
promoting the design of improved sound diffusion systems in the future.

Having defined ‘electroacoustic music’ in terms of the technologies involved and the unique ways in which these creative frameworks are
appropriated by practitioners (Chapter 1), a binary interpretation of the
electroacoustic idiom, whereby musical philosophies can be regarded as
either top-down or bottom-up, is given (Chapter 2). Discussion of the
process of sound diffusion itself reveals two distinct performance praxes,
which can also be characterised as top-down and bottom-up (Chapter 3).
These differing ideologies, in addition to the technical demands of the
electroacoustic idiom and the logistical demands of sound diffusion itself,
must be accommodated by the sound diffusion system if live performances
are to achieve the desired musical communication. It is argued that this is
not presently the case.

A system of criteria for the evaluation of sound diffusion systems is presented (Chapter 4). Two original concepts – the coherent audio source
set (CASS) and coherent loudspeaker set (CLS) – are also presented; these
are intended to be practically and theoretically useful in the field of sound
diffusion. Several existing diffusion systems are evaluated in terms of these
criteria (also Chapter 4). A description and evaluation of the M2 Sound
Diffusion System, which was co-developed by the author as part of this
research, is also given (Chapter 5).

The final chapter describes ways in which superior future systems can be
devised. These range from specific practical suggestions to general
methodological recommendations. Overall, the intention is to provide an
interpretation of the electroacoustic idiom that can be used as a heuristic
tool in the design of new sound diffusion systems.

Acknowledgements
In no specific order, I would like to thank:
My family, for their unwavering support throughout what has, in many
respects, been a difficult and challenging four years;
My friend and colleague Dr. David Moore, for collaborative work,
guidance, and general advice and inspiration;
Nikos Stavropoulos, Mark Horrocks, Menia Pavlakou, Carla Ribeiro, and
Themis Toumpoulidis, who are both friends and fellow Ph.D. students;
Dr. Colin Roth, whose guidance has been extremely significant, and whose
writings1 and theories have been nothing short of inspirational;
My supervisor, Dr. Adrian Moore, for giving me the opportunity to carry
out this research, and all of the other staff at the Music Department;
Professor Jonty Harrison, for his enthusiasm, support, and very kindly
donated opinions (see Appendix 1);
Tom, Cathy, Rachel, Katie, John, Becky, Simon: sorry to lump you all
together, but I had to limit myself to one page!

I would like to dedicate this work to my Mum and Dad, and also to the fond
memory of my cat, Beep, who sadly died while I was studying here.

Beep: 1988 – 20th May 2004.

1 Roth (1997). Being Happier (Sheffield; Kingfield Press).

Preface

These are some of the empirical observations and concrete experiences that
ultimately led to my writing this thesis.

When I first experienced the live diffusion of electroacoustic music – at the Sonic Arts Network Conference in 1999, which I attended mainly out of
curiosity during my undergraduate Music studies at Newcastle University –
I was impressed by the results, but confused by the process. ‘Isn’t it
strange,’ I thought, ‘to play a CD in front of a quietly seated audience?’
‘Isn’t it strange that people obediently applaud at the end?’ Like many
newcomers, I also wondered, ‘What is the performer actually doing with
that mixing desk?’ It is also true to say that, at the time, I was a newcomer
to electroacoustic music in general. As a mainly instrumental performer I
was intrigued – but equally confused – by this new musical language, with
its strange performance practice conventions.

Later, while working towards an M.Sc. in Music Technology at York University, I was able to experience more electroacoustic music, and began
also to compose some myself (as an undergraduate I had specialised in
‘music technology,’ but my compositional work had always been essentially
‘instrumental’ in nature). Similar questions crossed my mind in diffusion
concerts, only this time they were better informed: ‘How many channels are
we dealing with?’; ‘How is the surround sound effect being achieved?’;
‘What is pre-composed, and what is happening live?’ and ultimately, ‘What
is the performer doing with that mixing desk?!’ I was almost disappointed to
learn that, most of the time, we are dealing with only two channels (stereo),
and that ‘all’ the performer is doing is controlling the levels of various pairs
of loudspeakers. ‘It really is just like playing back a CD in front of an
audience,’ I thought. I don’t recall what I imagined the performer might
have been doing, but I suppose I hoped it might be more exciting than that!

Also while at York, I was introduced to the Ambisonic surround sound system, and became fascinated with the possibility of controlling sound
sources three-dimensionally. As my final M.Sc. project, I wrote two pieces
of software for Ambisonic panning. Whilst researching this project, I
happened to read a paper by Jonty Harrison entitled ‘Sound, Space,
Sculpture: Some Thoughts on the What, How, and Why of Sound
Diffusion.’2 This satiated some of my curiosities relating to sound diffusion.
It even helped to answer my recurring question, ‘What is the performer
doing with the mixing desk?’ It helped me to gain a clearer idea of the
‘what’ and ‘how’ of sound diffusion, but I was still confused about the
‘why.’ Why would composers restrict themselves to working in stereo when
multichannel formats were so easily available? I was also unconvinced by
the methods (the ‘how,’ I suppose) of sound diffusion: how could
performers be satisfied with such a crude and apparently imprecise means of
controlling the final outcome? Surely, given that the performer is doing so
‘little’ anyway (this is what I thought at the time), multichannel formats
would be far more flexible and precise; why bother with this seemingly
tokenistic ‘performance’ stage? I was fairly convinced that my Ambisonic
plugins, which would allow composers to accurately control the three-
dimensional projection of sound in the studio, were a much better solution.

It was not until I arrived at Sheffield University that I had the chance to
diffuse electroacoustic music myself. My supervisor, Adrian, asked if I
would like to diffuse a piece at the forthcoming concert (if I recall correctly,
the concert was entitled ‘Xhbt B,’ and took place in the Long Gallery of
Sheffield’s Millennium Galleries in November 2001). The piece was Åke
Parmerud’s Les Flûtes en Feu (1999).3 I wasn’t entirely sure what was
expected, but from Harrison’s paper I gathered I had to ‘make the loud
material louder and the quiet material quieter – thus stretching out the
dynamic range to be something nearer what the ear expects in a concert
situation.’4 This seemed straightforward enough, and armed with this
information, I set about creating a graphical score to aid me in my task.

2 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 117-127.
3 Various (2000). Métamorphoses 2000. (Compact Disc - Musique & Recherches: MR2000).
4 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 121.

There were plenty of flute crescendi that, I felt, would lend themselves very
well to such dynamic exaggeration. I also decided, from listening to the
piece at home, that certain sections would sound effective if they
disappeared into the distance at the front or rear of the hall.

I enjoyed my first hands-on experience of sound diffusion but, I have to say, I still wasn’t completely convinced that it was a strictly necessary
procedure, nor that the means for achieving it were especially well-suited to
the task. Much of the time, I felt limited by the fact that the material was
stereo, and restricted by the at times unrelenting pace of the music in
comparison with the cumbersome task of manipulating faders. I still ended
up thinking that – enjoyable as the experience was – it would ultimately be
better to do a multichannel composition to begin with. Certainly, some of
my observations were naïve. On the other hand, some of them were – and I
believe still are – valid, and continue to be raised by experienced
practitioners and newcomers alike.

One aspect that I found particularly hard to comprehend, both conceptually and aesthetically, was the fact that sound diffusion seemed to involve the
spatialisation of material that is itself already spatialised. I had no difficulty,
conceptually, with the idea of ‘panning’ a monophonic source in real time,
nor with the idea of spatialising stereophonic sources within a larger sound
field: the software I developed at York was capable of both of these things.
But something didn’t ‘add up.’ I began to feel that there must be a
fundamental difference between ‘panning’ and ‘diffusion.’

During this time, I was also engaged in the preparatory stages of composition. With newly acquired binaural microphones I made field
recordings of trams, ‘whirring’ sounds, traffic, populated environments,
sparse environments, and various other materials. I also continued to seek
out interesting pieces of free audio processing software, an interest I had

acquired whilst in York.5 In late 2001, two Studies in VST resulted, and – in
early 2002 – a piece entitled Coney Street, named after the road in York
where the main source recordings were taken.

5 …and one which I have consistently maintained. The ‘Freeware VST Plugins Directory’ that I started in York has since developed into ‘Freeware Audio Resources Online’ (FARO), an online database available at http://www.freeware-audio-resources.net.

In 2002, over the Easter holiday, I spent a week travelling around the coastal perimeter of Scotland, gathering many recordings on the way. On my return
to Sheffield, I visited the MAXIS Symposium, where I heard and saw many
interesting things, among them Shawn Decker’s sound sculptures. Two
compositions – Smoo Cave and Pointless Exercise No. 911980 – emerged:
the former is a ‘soundscape’ composition comprising recordings made
inside Smoo Cave, Durness; the latter uses recordings of two of Decker’s
sound sculptures as its source material. In the summer of the same year, I
paid a twelve-day visit to Bourges to take in the electroacoustic music
festival. In the months following my return, two more compositions –
Everything I Do is in Inverted Commas and Graffiti 2 – were produced. The
latter is in 5.1, reflecting my long-standing interest in surround sound.

In all, between late 2001 and early 2003, I produced just over an hour of
electroacoustic music. Overall, I was happy with this, but I must confess to
having found the compositional process difficult, and frustrating at times.
Regardless of the quality of the results, I somehow didn’t feel that I could
justify, on a personal level, my work in the studio. I felt as though I was
somehow ‘missing the point’ of what I was doing.

In November 2002 – in an attempt to address this – I recorded five lengthy interviews with composers working at the University of Sheffield Sound
Studios: two with M.Phil. students, two with Ph.D. students, and one with
my supervisor. I quizzed each interviewee about their views on
electroacoustic music, with a particular focus on one of their recent works.
Opinions were very divided. I also ‘interviewed’ myself, recording my
thoughts at the time. Seemingly, I was very concerned with the relationship
between ‘abstract’ sounds and ‘referential’ sounds. In retrospect, I was
clearly having difficulty in understanding what the structuring principles of
electroacoustic music might be: in the absence of pitch and metrical rhythm
(with which I was fully accustomed), how does a structure take the listener
from one moment to the next? I could understand that obvious referentiality
might be an alternative structuring force, but this seemed to be at odds with
a lot of the music I was hearing, which as far as I could tell was completely
non-referential. This, I could not understand: it is no wonder I was having
difficulties in the studio!

It was also at around this time that I became interested in a piece of software
that a fellow Ph.D. student, Dave Moore (whom I had met while we were
both teaching at Barnsley College) was working on. This marked the
beginning of a long-standing collaboration in the development of the M2
(Mooney and Moore) Diffusion System, incorporating Moore’s software
and making further software and hardware developments as a means to
addressing some of my early (and subsequent) observations.

Further to my formative experiences in the studio and at the diffusion console, I decided – around a year after arriving in Sheffield – that I wanted
to focus my attention on gaining a better understanding of the raisons d’être
of electroacoustic music, the purpose and process of sound diffusion, and
the relationship between the two. To my mind, I was convinced that there
must be some commonality between the diverse opinions and approaches.
What started off as the intention to read more books and papers, take a few
notes, and generally organise my thoughts more coherently, rapidly evolved
into a much larger enquiry, of which this thesis is the ultimate result.

At an early stage during my research it became apparent to me that there were many different attitudes and working procedures embodied within the
electroacoustic idiom. Opinions with respect to what electroacoustic music
‘is,’ in particular, were divided; comments such as, ‘Well, that’s not
electroacoustic music’ were fairly common. The trouble is that these
opinions were often somewhat at odds with each other: what, to one person,
constituted ‘electroacoustic music,’ was the absolute antithesis to someone

else. It also became evident that attitudes towards the process of sound
diffusion were similarly fragmented. This stands to reason: if opinions
regarding the nature of the music itself are varied, then so too will be
attitudes towards the ways in which it should be performed. This thesis is
therefore as much about electroacoustic music in general as it is about sound
diffusion in particular.

I now personally feel that I have gained an understanding of the theory and
practice of electroacoustic music, and as such it is sometimes necessary to
remember that newcomers to the idiom are quite possibly experiencing, for
the first time, similar difficulties to those I experienced. I now recognise the
cause of my original misunderstandings, and hope that this thesis will be
useful to anyone experiencing similar problems. Students at De Montfort
University, Leicester (where I have been lecturing for the past eighteen
months) have certainly responded well. Equally, I hope that practitioners
more experienced than myself will gain something from the perspective
offered in the following pages. In either case, the intention is to present my ideas about electroacoustic music and sound diffusion, and to propose ways in which the two can be regarded as homogeneous.

My approach is deliberately broad because I feel that this is an appropriate way to address an idiom that is so diverse. Of course the down-side to this
approach is that some of the issues discussed could be taken much further.
My argument, however, is this: in sound diffusion, works from across the
(constantly shifting and expanding) spectrum of the electroacoustic idiom
will need to be performed side-by-side, and the diffusion system used will
have to cope with this. If future systems are to be designed with this borne
in mind, then a broad understanding of the electroacoustic idiom – even if
this is at the expense of certain ‘details’ – will be highly beneficial.
Therefore, if this thesis seeks to answer only one question, then that
question (actually three questions!) would be, ‘What is electroacoustic
music, how is it performed, and what can we do to perform it better?’

James Mooney, August 2005.

Table of Contents

License ii

Abstract iii

Acknowledgements v

Preface vii

Table of Contents xiii

Introduction 1

1. Electroacoustic Music: Defining Characteristics 5


1.1. Introduction 5
1.2. Existing Definitions 5
1.3. Towards a More Useful Definition 7
1.4. The Omnipresent Loudspeaker 8
1.5. Fixed Media 8
1.6. Encoded Audio Streams 9
1.6.1. Encoded Audio as an ‘Abstraction Layer’ 11
1.6.2. Encoding Spatio-Auditory Attributes in Multiple Streams 13
1.7. Audio Technologies 17
1.7.1. Audio Encoding Technologies 18
1.7.2. (a) Recording and (b) Playback Technologies 18
1.7.3. Synthesis Technologies 19
1.7.4. Audio Processing Technologies 19
1.7.5. Software Technologies 20
1.7.6. Audio Decoding Technologies 20
1.8. The Technical Scope of Live Electroacoustic Music 22
1.8.1. Works presented solely from a fixed medium 23
1.8.2. Works consisting of sounds synthesised in real-time 24
1.8.3. Works consisting of acoustic sound sources 25
1.8.4. Works consisting of fixed medium sources processed in real
time before decoding 26
1.8.5. Works consisting of audio streams synthesised in real time and
processed before decoding 27
1.8.6. Works consisting of acoustically generated sounds encoded and
processed in real time before decoding 28
1.9. Creative Frameworks 29

1.10. Analysis of the Loudspeaker as a Creative Framework 31
1.11. Methodological Choices in the use of Loudspeakers 33
1.11.1. Choice and Treatment of Sound Material 34
1.11.2. Construction of Auditory and Spatial Illusion 35
1.11.3. Mode of Listening 36
1.12. Towards an Understanding of ‘Electroacoustic Music’ 38
1.13. Creative Exploration of Framework-Specific
Methodological Choices in Electroacoustic Music 40
1.13.1. Exploration of Choice and Treatment of Sound Material 40
1.13.2. Exploration of the Construction of Auditory and Spatial
Illusion 42
1.13.3. Exploration of the Mode of Listening 43
1.14. Back to Technology 47
1.15. Summary 48
1.15.1. The Technological Demands of Electroacoustic Music 48
1.15.2. Particular Approach to Audio Technologies in Electroacoustic
Music 50

2. Top-Down and Bottom-Up Approaches to Electroacoustic Music 53
2.1. Introduction 53
2.2. Electroacoustic Aesthetics: A Case History 55
2.2.1. Musique Concrète 56
2.2.2. Elektronische Musik 61
2.3. The Implications of ‘Real’ and ‘Artificial’ Sounds 65
2.4. Musique Concrète versus Elektronische Musik 67
2.5. ‘Top-Down’ and ‘Bottom-Up’ Compositional Models 73
2.5.1. Invention versus Creation; Building versus Construction 74
2.5.2. Abstracted Syntax versus Abstract Syntax 76
2.5.3. Organic versus Architectonic 77
2.5.4. The Relationship between Text and Context 79
2.6. Examples of Top-Down and Bottom-Up Approaches in
Contemporary Electroacoustic Music 84
2.7. Extending the Top-Down/Bottom-Up Paradigm 88
2.7.1. Electroacoustic Music as a means to the Potential Unification
of Top-Down and Bottom-Up Models 91
2.8. Summary 94
2.8.1. Top-Down and Bottom-Up Approaches to Electroacoustic
Music 94
2.8.2. Creative Frameworks can be ‘Directionally Biased’ 94
2.8.3. Electroacoustic Technologies allow Equally for Top-Down and
Bottom-Up Compositional Approaches 96

3. Sound Diffusion 99
3.1. Introduction 99
3.2. What is Sound Diffusion? 101
3.3. Acoustic Issues in the Performance of Electroacoustic
Music 102
3.4. Diffusion Example: Two Movements from Parmegiani’s
De Natura Sonorum 106
3.4.1. Movement 2: ‘Accidentals/Harmonics’ 109
3.4.2. Movement 3: ‘A Geological Sonority’ 115
3.5. Coherent Audio Source Sets (CASS) 122
3.6. CASS Attributes 127
3.6.1. Consists of an Arbitrary Number of Discrete Encoded Audio
Streams Treated Collectively as a Single Entity 127
3.6.2. Independent of the Number of Encoded Audio Streams
provided by any Single Audio Source 127
3.6.3. Constituent Channels Contain Phantom Imaging 128
3.6.4. Order of Channels is Spatially Determinate 128
3.6.5. Reliance on a Specific Coherent Loudspeaker Set 129
3.6.6. Hierarchical Organisation 129
3.7. Coherent Loudspeaker Sets (CLS) 132
3.8. CLS Attributes 137
3.8.1. Specific Shape 138
3.8.2. CASS Channels must be Appropriately Routed to CLS
Loudspeakers 139
3.8.3. Reliance on a Specific Coherent Audio Source Set 140
3.8.4. Multiple CLSs can Exist within a Single Loudspeaker Array 140
3.8.5. A Single Loudspeaker can be a Member of Multiple CLSs 142
3.8.6. Hierarchical Organisation 143
3.9. Sound Diffusion: A Re-Evaluation in terms of the
Concepts of CASS and CLS 144
3.10. ‘Diffusion’ versus ‘Panning’ 145
3.11. Uniphony and Pluriphony 147
3.12. Top-Down and Bottom-Up Approaches to Sound
Diffusion 149
3.12.1. The Top-Down Model 151
3.12.2. The Bottom-Up Model 154
3.13. The Composition-Performance Continuum 158
3.14. Review of Top-Down and Bottom-Up Techniques 161
3.15. Summary 163

4. Sound Diffusion Systems 167
4.1. Introduction 167
4.2. What is a Sound Diffusion System? 168
4.3. Criteria for the Evaluation of Sound Diffusion Systems 170
4.3.1. Ability to Cater for the Technical Demands of Live
Electroacoustic Music 171
4.3.2. Ability to Cater for both Top-Down and Bottom-Up
Approaches to Sound Diffusion 172
4.3.3. Flexibility of Signal Routing and Mixing Architecture 176
4.3.4. Interface Ergonomics 178
4.3.5. Accessibility with respect to Operational Paradigms 179
4.3.6. Accessibility with respect to Technological System
Prerequisites and Support thereof 181
4.3.7. Compactness / Portability and Integration of Components 183
4.3.8. Reliability 184
4.4. Criteria for the Evaluation of Sound Diffusion Systems:
Summary 185
4.5. Description and Evaluation of Existing Sound Diffusion
Systems 186
4.6. Generic Sound Diffusion Systems 187
4.6.1. Generic Diffusion System 1 (GDS 1): Description 187
4.6.2. Generic Diffusion System 2 (GDS 2): Description 190
4.6.3. Generic Diffusion System 1 (GDS 1): Evaluation 191
4.6.4. Generic Diffusion System 2 (GDS 2): Evaluation 195
4.6.5. Generic Diffusion Systems in General: Evaluation 197
4.7. The Acousmonium 199
4.7.1. Description 199
4.7.2. Evaluation 201
4.8. The BEAST Diffusion System 202
4.8.1. Description 202
4.8.2. Evaluation 203
4.9. BEAST’s Octaphonic Diffusion Experiments 206
4.9.1. Description 206
4.9.2. Evaluation 209
4.10. The Creatophone 211
4.10.1. Description 211
4.10.2. Evaluation 212
4.11. The Cybernéphone 214
4.11.1. Description 214
4.11.2. Evaluation 216
4.12. Sound Cupolas and The Kinephone 220
4.12.1. Description 220
4.12.1.1. The Kinephone 223

4.12.2. Evaluation 224
4.13. The ACAT Dome 226
4.13.1. Description 226
4.13.2. Evaluation 227
4.14. The DM-8 and the Richmond AudioBox 229
4.14.1. Description 229
4.14.2. Evaluation 231
4.15. Other Systems 233
4.16. Summary 234

5. The M2 Sound Diffusion System 239


5.1. Introduction 239
5.2. System Description 240
5.3. Physical System Architecture 245
5.4. The SuperDiffuse Software 248
5.4.1. SuperDiffuse Server Application 248
5.4.2. Super Diffuse Client Application 249
5.4.2.1. Assigning Mix Matrix Parameters Directly to Physical
Faders 250
5.4.2.2. Group Faders 251
5.4.2.3. Effect Faders and Effect Types 252
5.4.2.4. The Monitor Window 254
5.4.2.5. Summing of Parameters 255
5.4.2.6. Summary 256
5.5. Using the System 257
5.5.1. Transportation 257
5.5.2. Physical Setup 258
5.5.3. Software Configuration 259
5.6. Evaluation 263
5.6.1. Flexibility of Signal Routing and Mixing Architecture 263
5.6.2. Interface Ergonomics 265
5.6.3. Accessibility with respect to Operational Paradigms 269
5.6.4. Accessibility with respect to Technological System
Prerequisites and Support thereof 271
5.6.5. Compactness / Portability and Integration of Components 273
5.6.6. Reliability 274
5.6.7. Ability to Cater for the Technical Demands of Live
Electroacoustic Music 277
5.6.8. Ability to Cater for both Top-Down and Bottom-Up
Approaches to Sound Diffusion 278
5.7. Summary 280

6. A Way Forward for the Design of New Sound
Diffusion Systems 283
6.1. Introduction 283
6.2. Some Advantageous Paradigms and Architectures 284
6.2.1. Matrix Mixing Architecture 284
6.2.2. Software-Based Systems 285
6.2.3. Flexible and Accessible Operational Paradigms 286
6.2.4. Client/Server Architecture 287
6.2.5. Scalable Systems 288
6.2.6. Accessible System Prerequisites 289
6.2.7. Modular Systems 289
6.3. Some more Specific Possibilities 291
6.3.1. Increased Functionality via CASS and CLS Concepts 292
6.3.2. Interface/Loudspeaker-Array Independence 294
6.3.3. Diffusion ‘Actions’ 296
6.3.4. Support for Multiple Physical Control Paradigms 297
6.3.5. Extended DSP Functionality 299
6.3.6. External and/or Automated Control via Third-Party Software
Development 299
6.3.7. Diffusion System Software Plugins 300
6.3.8. Logistical Functionalities 301
6.3.9. Increased Accessibility via Online Resources 302
6.4. Areas for Further Research: an Inclusive Approach… 302
6.4.1. …in terms of Technology 303
6.4.2. …in terms of Aesthetics and the Aesthetic Affordances of
Creative Frameworks 304
6.4.3. …in terms of Specific Methodologies and Techniques 306
6.4.4. …in terms of Potential Applications Outside the
Electroacoustic Idiom 307
6.5. Summary 307

Conclusion 311

Appendix 1: An Interview with Professor Jonty Harrison 319

Appendix 2: Design of the M2 Sound Diffusion System Control Surface 353
A Modular Control Surface 353
Voltage to MIDI Conversion 353
Power Supply 354
Physical Interfacing 354
Generic Module Interface 355

M2 AtoMIC Splitter (M2AS) Module 356
M2 Four-Vertical-Fader (M2FVF) Module 358
Internal Control Surface Architecture 359
External Casing 360

Bibliography 367

Introduction

This research is intended to be of use to anyone seeking to engage in the design, development and implementation of new sound diffusion systems
for the public performance of electroacoustic music, or wishing to make
modifications and improvements to existing systems. In this capacity, it
does not assume a highly detailed knowledge with respect to the nuances of
the electroacoustic idiom, although this will, of course, be beneficial. This
thesis is also, however, intended to be of interest to practitioners already
experienced in the art of sound diffusion, and in this respect it is hoped that
new insights will be realised. On the basis of this dual functionality it is the
purpose of this thesis, on a very broad level, to deconstruct work in the field
of sound diffusion into a practice that at the highest level – like
electroacoustic music itself – comprises both technological and aesthetic
aspects. The ultimate outcome of this enquiry will consist in the proposition
of various means by which new sound diffusion systems that are better
suited to the purpose can be developed, designed, and implemented.

From a purely technological perspective, the performance of electroacoustic music can involve the mediation of a wide variety of different technologies.
These range from microphones, to compact discs, DAT tapes and other
fixed media, to synthesisers, signal processing units, sophisticated real-time
control devices, and software running on laptop computers. All of these are,
ultimately, subservient to the loudspeaker as the means to making the
musical results audible. A sound diffusion system, therefore, must ideally be
designed in such a way as to accommodate all of these technologies (and
more), in the context of a live performance. If this objective is to be
attained, it will clearly be necessary to know, broadly, what the
technological demands are likely to be. This is to be one of the primary
concerns of Chapter 1. By considering each technology as a ‘creative
framework,’ an approach will also be made towards an understanding of the
unique way in which technology is approached in the composition and
performance of electroacoustic music.

The second chapter will follow, in a sense, the opposite path to the first,
adopting a line of enquiry that is, in the first instance, aesthetic, and from
this standpoint moving towards the more technological considerations. The
context of this chapter will also be historical, to an extent. Specifically, it
will be observed that the praxes of musique concrète and elektronische
Musik – the cultural phenomena now widely acknowledged to have
developed into what we now understand to be ‘electroacoustic music’ –
embodied markedly different aesthetic standpoints. These were, in many
respects, diametrically opposed and will be classified, respectively, as ‘top-
down’ and ‘bottom-up.’ Although elektronische Musik and musique
concrète are often regarded as ‘extinct,’ it will be demonstrated that top-
down and bottom-up approaches are still evidenced in modern
electroacoustic music.

The significance of this in the context of sound diffusion rests in the fact
that each of these standpoints adopts a fundamentally different approach to
the use of technology as a creative framework. It is proposed that the broad
division of electroacoustic works into top-down and bottom-up – although
in some respects crude – will serve as a useful guide in evaluating the kinds
of communication that sound diffusion systems will be required to facilitate.

Chapter 3 will begin to tie together the technological and aesthetic threads
exposed in the previous two chapters. It will be noted, importantly, that top-
down and bottom-up approaches to the creative process engender not only
different kinds of music, but also different diffusion techniques. The basis of
this observation will rest in the assertion that top-down and bottom-up
works differ fundamentally in terms of what it is they seek to communicate
musically, and therefore necessitate entirely different modes of
performance. Of course, diffusion systems will ideally be able to facilitate
both approaches.

Chapter 4 evaluates several existing diffusion systems in terms of their
appropriateness for the task of performing works from across the
technological and aesthetic spectrum of the electroacoustic idiom. This

evaluation will be realised according to a set of evaluation criteria proposed
at the start of the chapter. It is also suggested that these criteria will be
useful in the design of future systems. Several observations and conclusions
will be made, among them the suggestion that – again, broadly speaking –
diffusion systems can be very roughly divided into those that are best suited
for the performance of top-down works, and those that are best suited for
bottom-up. On this basis there would appear to be a clear need for new
systems able to accommodate both types of work.

Chapter 5 will be devoted to a description and evaluation of the M2 Sound Diffusion System, which was co-developed by the author as an integral part
of this research. In this respect, Chapter 5 is basically an extension of
Chapter 4. It should be noted that the M2 is not being presented as an ‘ideal’
system, although it will be shown that several of its paradigms and
architectures have proved effective.

The final chapter proposes directions in which practical and theoretical research into the development of future sound diffusion systems can be
taken. These conclusions will be drawn from observations made in all of the
previous chapters. These propositions do not represent a ‘design
specification,’ as such (although several specific suggestions will be made)
but are rather less prescriptive in nature. Overall, the conclusions made will
seek to engender an approach to the development of new sound diffusion
systems that is inclusive in terms of both technologies and aesthetic
standpoints.

Overall, this thesis should be regarded as a heuristic tool directed toward the
development of sound diffusion systems. An interpretation of the
electroacoustic idiom, and an evaluation of existing systems, are both
undertaken to this end. This enquiry seeks to address the following broad
questions:

• What can we learn from the technological demands of the
electroacoustic idiom?
• What can we learn from the aesthetic nature of electroacoustic
music?
• What can we learn from the logistical nature of sound diffusion
itself, and from differing approaches to and aesthetic attitudes
towards it?
• What can be learned from existing sound diffusion systems?
• What can we learn from the design and implementation of the
M2 Sound Diffusion System and from the feedback obtained
from practitioners who have performed with it?
• What might be a constructive way forward?

These purposefully open-ended questions serve as a basic template for the discussion that follows, with each chapter, in turn, addressing one of them.
There are no definitive answers, but there are many perfectly valid
responses. It is for this reason that this thesis seeks to engender an inclusive
attitude toward the development of future sound diffusion systems, which
are – after all – the final means by which electroacoustic music is heard.

1. Electroacoustic Music: Defining
Characteristics

1.1. Introduction

As a creative and intellectual discipline, electroacoustic music can be somewhat elusive and esoteric, ill-defined and multiplicitous in nature. It is
the author’s experience that definitions of electroacoustic music tend either
to be so generalised as to be of little use, or else they in fact allude to
particular ‘sub-divisions,’ which cannot, in isolation, be regarded as
accurate descriptors of the idiom as a whole. The objective of the present
chapter is therefore to approach a broad and holistic definition of
‘electroacoustic music’ that is neither overly embracing nor overly
restrictive. The purpose of arriving at such a definition is not specifically
with a view to identifying whether or not individual works fall into the
category (although some might indeed find this useful or interesting) so
much as to present a context within which works from across the
technological and aesthetic spectrum of the idiom can be meaningfully
interpreted and performed alongside each other without jeopardising their
essential nature. In this respect, the proposed definition is intended to be of
use specifically in the field of sound diffusion.

1.2. Existing Definitions

The expression ‘electroacoustic music’ (musique électroacoustique in the original French) was first adopted in the late nineteen-fifties, and is
described by Chion as ‘a term that was intended to be ecumenical and
unifying, leaving aside questions of aesthetics and the means used to
achieve the end […] encompassing not only music on a recording medium
but also music combining musical instruments and tape.’6 Taken in its
historical context, the term was thus intended to unify the praxes of musique
concrète and elektronische Musik, which differed strongly in both technical
means and aesthetics (we will return to these in Chapter 2).

6 Chion (2002). "Musical Research: GRM's Words". Website available at: http://www.ina.fr/grm/presentation/mots.light.en.html.

This definition is useful, in one sense, because it engenders an inclusive understanding of electroacoustic music that embraces a wide variety of
technological means and aesthetic standpoints; a similar definition, in this
respect, is to be sought in the present chapter. Chion’s definition is less
useful, however, in as much as it does not positively identify any traits of
the electroacoustic idiom: it excludes specific technologies and aesthetic
standpoints, but fails to give an indication of what might alternatively be
regarded as defining characteristics.

Wishart uses the expression ‘sonic art,’ which he explains as follows:

We can begin by saying that sonic art includes music and electroacoustic
music. At the same time, however, it will cross over into areas which have been
categorised distinctly as text-sound and as sound-effects. […] I personally feel
there is no longer any way to draw a clear distinction between these areas.7

7 Wishart (1996). On Sonic Art (Amsterdam; Harwood Academic): 4.

Wishart’s essential belief that ‘all sound is music’ is one with which the
author happens to agree, but it is not particularly useful to anyone wishing
to gain a more in-depth understanding of specific musical praxes. Wishart’s
assertion that ‘sonic art includes music and electroacoustic music’ surely
implies that there must be some difference between cultural understandings
of these two expressions, yet his explanation of sonic art would seem to
suggest that these differences are inconsequential.

Poorly substantiated and/or overly generalised definitions have frequently led to the assertion that ‘electroacoustic music’ is in fact a meaningless
expression:

The hybrid word ‘electroacoustic,’ used by this organization and by [Harrison] in founding BEAST, is […] problematic – rather than reconciling antagonisms,
it merely creates confusion because it doesn’t really mean anything at all!
‘Computer music’ tells us very little too, because it describes the tool, not the
music – it is hardly any more helpful than a term such as ‘piano music.’8

However, practitioners of the electroacoustic art must surely have an understanding of the characteristics that define it. If we assume that this is
true, then the expression ‘electroacoustic music’ is certainly not
meaningless; it just happens not to have been verbally explained in a way
that adequately reflects the collective understandings of practitioners. The
following sections seek to challenge the assertion that ‘electroacoustic
music’ is meaningless, and re-instate the expression as a useful descriptor
through the identification of consistent defining characteristics that are
neither superficial nor overly specific.

1.3. Towards a More Useful Definition

Unlike many spheres of musical activity, it is difficult (if not impossible) to define electroacoustic music in terms of its musical, structural, or stylistic
attributes: these are simply too numerous and variable. It is therefore
erroneous to treat the expression as indicative of a particular ‘style’ or
‘genre’ in the same way that one might categorise ‘minimalism’ or ‘heavy
metal’; electroacoustic music would seem to be an altogether broader
designation than either of these. If electroacoustic music cannot be reliably
identified by its musical characteristics alone, then what exactly is
responsible for its unique identity?

The expression ‘electroacoustic music’ itself suggests that we are dealing with a practice that utilises electrical and/or electronic (‘electro-’)
equipment to create and present sounds (‘-acoustic’) that are in some way
‘musical.’ Accordingly, the following sections will approach a useful
definition of the idiom via an examination of the technologies used in its
creation and performance.

8 Harrison (1999). "Diffusion - Theories and Practices, with Particular Reference to the
BEAST System". EContact! 2(4). Electronic journal available at:
http://cec.concordia.ca/econtact/Diffusion/Beast.htm.

1.4. The Omnipresent Loudspeaker

It is an inescapable fact that electroacoustic music must be reproduced, partially if not totally, via loudspeakers.9 This condition, in a sense,
precedes all aesthetic considerations and is arguably the most basic (but not
necessarily the most important) defining characteristic of the idiom. Of
course it is not the only defining characteristic – if this were the case then
any sound emanating from any loudspeaker in any context could justifiably
be described as electroacoustic music. Nonetheless, it is a difficult task
indeed to find a sensible definition of electroacoustic music, or of any of its
fantastically numerous sub-divisions, that does not make reference to this
very simple reality:

It has become common practice to use the term ‘electroacoustic music’ to describe a wide variety of musical praxes involving the mediation of
loudspeakers.10

The performance of electroacoustic music necessarily entails amplifiers and speakers.11

It can safely be said, then, that the loudspeaker is a defining characteristic of electroacoustic music. In his text entitled ‘The Mirror of
Ambiguity,’ Harvey goes so far as to use the expression ‘loudspeaker
music.’12

1.5. Fixed Media

In addition to presentation over loudspeakers, the presence of a recording medium such as analogue tape or (as is nowadays most common) compact
disc is also frequently invoked as a defining characteristic of electroacoustic
music. This is mentioned, for instance, by Chion in the quotation given
earlier.

9 This is intended to mean any form of loudspeaker, including headphones or any manner of ‘loudspeaker-like’ transducer. The term ‘audio decoding technology’ will later be proposed.
10 Windsor (2000). "Through and Around the Acousmatic: The Interpretation of Electroacoustic Sounds". In Emmerson [Ed.] Music, Electronic Media and Culture: 7.
11 Rolfe (1999). "A Practical Guide to Diffusion". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/pracdiff.htm.
12 Harvey (1986). "The Mirror of Ambiguity". In Emmerson [Ed.] The Language of Electroacoustic Music: 188.

It is proposed, however, that the ‘fixed medium’, as it is sometimes called,
is not as indispensable as the loudspeaker in mediating our understanding of
what constitutes electroacoustic music. With current technology it is
increasingly possible to generate sound in real-time, without the need for an
intermediate recording medium, a practice frequently performed in concert
with laptop computers (this, of course, would not have been the case in the
formative years of the genre). Here, there may be no fixed medium, but the
sounds must still make themselves known via loudspeakers. Perhaps some
would argue that ‘laptronica’, as such, is not electroacoustic music.
However, it seems valid at this stage to suggest that such live performance
could warrant inclusion given that it takes place so frequently alongside
more traditional recorded work, and given the undeniable dialogue that
exists between ‘laptop performers’ and ‘composers for tape’. It should also
be noted that when recorded sounds are processed live in performance
(again, perhaps using portable computer technology), the recorded sounds
must be stored somewhere – on the computer’s hard drive for instance – and
in these cases a fixed medium is indeed present. However, take into
consideration the fact that live sounds could be processed in real time (as in
Hans Tutschku’s recent work,13 for example) and once again we find
ourselves without a fixed medium.

13 Tutschku’s Cito (2002), SprachSchlag (2000) and Das Bleierne Klavier (2000) are all written for instrument(s) and ‘live electronics,’ and do not include a ‘tape part’ as such: Tutschku (2004). "Hans Tutschku". Website available at: http://www.tutschku.com/.

It is proposed that the fixed medium, as a defining characteristic of electroacoustic music, is nowadays outmoded, and a more generalised
approach is therefore required.

1.6. Encoded Audio Streams

An encoded audio stream is a representation of real auditory events in some form analogous to, but immanently different from, their natural existential
state. Because electroacoustic music relies on mediation via loudspeakers, it
is additionally true to say that it relies on the presence of encoded audio
streams to drive the loudspeakers. Encoded audio streams can also,
therefore, be regarded as a defining characteristic of the electroacoustic
idiom.

Encoded audio streams can, at present, be analogue or digital, and in either case can exist in a static form on some kind of storage medium, or in a
transitory form in the case of the transmission of encoded audio streams
from one device to another. The signals represented in analogue media such
as magnetic tape and vinyl are examples of static-analogue encoding,
whereas media such as compact disc, DAT, and ADAT are carriers of
static-digital encoding. An electrical signal from a microphone, travelling
down a conductive wire is an example of transitory-analogue encoding,
while a digital signal originating from a CD player and travelling down an
optical cable is an example of transitory-digital encoding. These examples
are summarised in Table 1, below.

            Static (Fixed Media)            Transitory (Transmission)

Analogue    Signals stored on:              Electrical signal travelling
            tape; vinyl LP; etc.            down a microphone cable.

Digital     Signals stored on: CD;          Optical signal travelling in
            DAT tape; hard drive; etc.      a digital-optical cable.

Table 1. Summary of currently available audio encoding techniques, with representative examples of each.

When the streams are transitory, the encoded signal exists in a continuously
time-varying state and therefore has no immediate ‘totality’ (i.e. it is not
static). Consequently, the encoded stream itself is not fundamentally
different to the original auditory phenomenon, insofar as both can only exist
over a span of time. Static encodings, on the other hand, are fundamentally
different because they uniquely offer up the possibility of representing
auditory events in extraction from the constant passing of time.

Encoded audio streams can be obtained and manipulated in a number of
ways. Microphones are perhaps the most obvious means of obtaining
encoded audio streams; such devices therefore fall into the category of
‘audio encoding technologies.’

1.6.1. Encoded Audio as an ‘Abstraction Layer’

The process of encoding audio results in the abstraction of real auditory events that, in turn, makes the practice of electroacoustic music
possible. The existence of audio in an encoded form can be thought of
as an ‘abstraction layer’ (see section 6.2.7, page 289) as it is only ever a
means to an end, rather than an end in itself. In other words, the process
of encoding audio is only really useful if it is to be decoded – that is to
say, rendered audible via loudspeakers – at some point in the future. In
simple terms the encoded audio thus represents an intermediate stage in
the otherwise unitary process of the creation and reception of sound, as
illustrated in Figure 1, below.

(a) Without Encoding:

    Sound is created  →  Sound is heard

(b) With Encoding (Intermediate Abstraction Layer):

    Sound is created  →  Sound is encoded  →  Sound is decoded  →  Sound is heard

Figure 1. Simplified diagrammatic representation of the process of creation/reception of auditory events both with and without an intermediate encoding/decoding abstraction layer.

The advantage of the abstraction layer is that it creates a ‘point of entry’
into what would otherwise be a closed process.14 This has advantages
regardless of whether we are dealing with static or transitory encoding.
In Figure 1(a), although we may have control over the way in which the
sound is created (which objects and how they interact), this is basically
the limit of our control – there is nothing else we can do with respect to
the sound itself in the interim period between sound creation and sound
reception – and in this respect the actuality of the sound is a foregone
conclusion once it has been created. In scenario (b), by contrast, there is
the opportunity for various forms of intervention within the abstraction
layer itself.

If we are dealing with a transitory encoding, then the encoded signal can
be subjected to various real-time processes before it goes on to be
decoded. In the case of a static encoding the possibilities are more
radical because we have immediate access to the totality of data within
the ‘sound object,’ as opposed to experiencing it over a period of time.
The delightfully-named CDP algorithm ‘Sausage’ performs a ‘granular
reconstitution of several soundfiles scrambled together.’15 This is only
possible, of course, if we have simultaneous access to the totality of the
encoded audio events, and such processes can therefore only be
performed on static encodings.
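
The distinction can be made concrete with a minimal sketch (in Python, added here purely for illustration; audio is modelled simply as sequences of floating-point sample values, and the granular scramble shown is merely in the spirit of, and far simpler than, ‘Sausage’):

    import random

    def transitory_process(stream, gain=0.5):
        # Transitory encoding: samples arrive one at a time, so any
        # intervention must be applied as the signal passes through --
        # here, a simple real-time gain change.
        for sample in stream:          # 'stream' may be a generator
            yield sample * gain        # the signal's totality is never visible

    def static_process(buffer, grain_size=1000):
        # Static encoding: the whole 'sound object' is available at once,
        # so non-causal processes become possible -- here, a crude granular
        # scramble that shuffles fixed-length grains of the stored signal.
        grains = [buffer[i:i + grain_size]
                  for i in range(0, len(buffer), grain_size)]
        random.shuffle(grains)         # needs simultaneous access to all data
        return [s for g in grains for s in g]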

Another notable feature of static encodings is that there is a physical object – the storage medium itself – that indirectly represents the sound
and that may itself be open to manipulation. This is particularly true of
analogue storage media: tapes can be spliced; records can be ‘scratched’
by DJs; et cetera. In both cases the sound can be accessed and
manipulated by various means that take place within the abstraction
layer.

14 This is true of any abstraction layer, in any context. The benefits of using abstraction layers as a means to achieving modularity in diffusion system design will be stated in section 6.2.7.
15 CDP (2005). "CDP Home Page". Website available at: http://www.bath.ac.uk/~masjpf/CDP/CDP.htm.

The abstraction layer has another major advantage in that all it really
does is provide a standardised framework (or ‘language’) for the
encoding and decoding of (in this case audio) information. As an
example, a system might convert (encode) variations in air pressure
(‘real’ sound) – via a microphone – into a constantly varying voltage;
this is a transitory-analogue encoding. Assuming that appropriate
equipment exists that understands the ‘language’ (i.e. ‘knows what the
constantly varying voltage means’) then it should be possible to recover
the original sound via the inverse process of decoding. The important
point is that the decoder is merely interpreting data of a particular
format (in this case, a constantly varying voltage) and, therefore, where
those data originate from is irrelevant. This means that (to extend the
current example) continuously varying voltages can be generated
independently of any real auditory events and subsequently decoded
using appropriate technology. This is, of course, the basis for sound
synthesis (the process works in an analogous way in the digital domain)
and it is now stated that this possibility arises as a direct consequence of
the abstraction layer brought about by the process of encoding and
decoding auditory events.
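
By way of a minimal sketch (in Python, using only the standard library; the file name is arbitrary), a signal can be generated from no real auditory event at all and stored as a static-digital encoding, which any conforming playback device can then decode via a loudspeaker:

    import math
    import struct
    import wave

    RATE = 44100  # samples per second: part of the agreed 'language'

    def sine(freq=440.0, secs=1.0):
        # An encoded audio stream generated independently of any real
        # auditory event; in kind, a decoder cannot distinguish it from
        # a stream derived from a microphone.
        n = int(RATE * secs)
        return [math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]

    # Store the stream as a static-digital encoding (a 16-bit mono WAV file).
    with wave.open("sine.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(RATE)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                               for s in sine()))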

1.6.2. Encoding Spatio-Auditory Attributes in Multiple Streams

Notwithstanding the fundamental difference between static and transitory modes of audio encoding, there is an important respect in
which both are similar: multiple independent streams can be utilised in
combination to encode certain spatio-auditory attributes. Two-channel
stereophony,16 for example, involves the simultaneous use of two audio
streams to encode information regarding spatial positioning along a
single axis. This is possible by virtue of ‘phantom imaging’ between
loudspeakers upon decoding, which will be described more fully in
section 1.10. The two constituent channels are physically separate entities, but in order for the spatial effect to be realised correctly, certain conditions must be observed with respect to their relationship with each other. Conventionally, in the case of two-channel stereophonic recordings, ‘channel 1’ is ‘left’ and ‘channel 2’ is ‘right,’ with ‘left’ and ‘right’ alluding to the opposite extremities of an axis in the horizontal plane. At the most basic level, this convention must be observed during both encoding and decoding stages in order for the phantom imaging to be successful. It should be stressed, however, that this is merely a convention, and there is no strict reason why the two channels of a stereo recording could not be used to project auditory images along any other arbitrarily defined axis.17 Nonetheless, whatever spatial encoding conventions are observed, it is important that the same conventions are assumed during both encoding and decoding stages.

16 Blumlein (1931). "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems". International Pat. No. 394325. Alan Blumlein Official Website. Available online at: http://www.doramusic.com/patents/394325.htm.
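
As an illustrative sketch (in Python, added for this discussion; the constant-power law shown is one of several possible panning laws), a mono stream can be encoded as two channels whose relative levels place a phantom image along a single axis:

    import math

    def pan_stereo(mono, position):
        # position: 0.0 = one extremity of the axis ('left' by convention),
        # 1.0 = the other ('right').  The constant-power law keeps overall
        # loudness steady as the phantom image moves between loudspeakers.
        theta = position * math.pi / 2
        ch1 = [s * math.cos(theta) for s in mono]  # conventionally 'left'
        ch2 = [s * math.sin(theta) for s in mono]  # conventionally 'right'
        return ch1, ch2                            # channel order is convention

Nothing in the two streams themselves says ‘left’ or ‘right’: swapping the channels at the decoding stage would mirror the image along the axis, which is precisely why the same convention must hold at both ends.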

Following the same basic premises it is possible to combine more than two encoded audio streams to allow for the storage and transmission of
more detailed spatial information. Under present technological
constraints the minimum number of channels required to encode fully
three-dimensional spatial information is three, with varying degrees of
success depending on the particular technique used. Indeed, the
multichannel approach is becoming more and more common, with
software and hardware applications increasingly supporting various
formats. This trend has, as many have predicted, experienced something
of an acceleration in recent years with the increasing acceptance of 5.1
surround sound as a standard convention for multichannel audio.18 The
success of 5.1 can, perhaps, be attributed to the fact that, like two-
channel stereo, it represents a convenient standard that can easily be

17
There are in fact reasons, which pertain mainly to the loudspeaker-related criteria for
successful phantom imaging (some of these will be discussed more fully in 3.7) and vary
depending on the specific phantom imaging technique used. So long as these criteria are
met, it is proposed that the present argument is valid.
18
The ‘5.1’ configuration was devised by Dolby Laboratories and is therefore sometimes
also referred to as ‘Dolby Digital,’ although very similar systems are produced by others
(DTS being a good example). Introductory articles are available via the Dolby website:
Dolby Laboratories (1999). "Surround Sound: Past, Present and Future". Website available
at: http://www.dolby.com/assets/pdf/tech_library/2_Surround_Sound_Past.Present.pdf.
Dolby Laboratories (1999). "Some Guidelines for Producing Music in 5.1 Channel
Surround". Website available at: http://www.beussery.com/pdf/buessery.dolby.5.1.pdf.

14
adhered to in the encoding and decoding of audio signals. Accordingly,
practitioners can work within these constraints with the relative
assurance that appropriate apparatus will exist for the widespread
dissemination of their work in both commercial and domestic settings.
However, like stereo, it is just a convention, and some composers have
criticised the format as being ill-suited in various respects for more
creative audio-related applications:

The specification for 5.1 […] is very demarcated in terms of what you’re
supposed to put on each component in the 5.1 array. The LFE is supposed
to have that and nothing else, and the centre speaker is supposed to have
the dialogue and nothing else, and the frontal stereo is supposed to have
the soundtrack, the image, the music, et cetera, and very little else. The
surrounds kind of vaguely interact with that but mostly for effects. […].
That’s about the extent of it. So they’re kept fairly distinct in terms of their
functions. But most composition usage doesn’t want to keep them that
distinct. It wants actually to blur those things. It wants to have something
that swirls around the space and then it wants something that’s very
precise and in a very particular location; those kind of things. And I think
that’s actually quite difficult to do with 5.1. […] Very often, the surround
speakers are a smaller unit altogether than the main stereo. So, actually, it’s
very difficult to get a fully balanced equal surround, for example, in the
average 5.1 array.19

17 There are in fact reasons, which pertain mainly to the loudspeaker-related criteria for successful phantom imaging (some of these will be discussed more fully in 3.7) and vary depending on the specific phantom imaging technique used. So long as these criteria are met, it is proposed that the present argument is valid.
18 The ‘5.1’ configuration was devised by Dolby Laboratories and is therefore sometimes also referred to as ‘Dolby Digital,’ although very similar systems are produced by others (DTS being a good example). Introductory articles are available via the Dolby website: Dolby Laboratories (1999). "Surround Sound: Past, Present and Future". Website available at: http://www.dolby.com/assets/pdf/tech_library/2_Surround_Sound_Past.Present.pdf. Dolby Laboratories (1999). "Some Guidelines for Producing Music in 5.1 Channel Surround". Website available at: http://www.beussery.com/pdf/buessery.dolby.5.1.pdf.

Consequently, it is fairly usual for electroacoustic composers to wish to supersede the restrictions of 5.1, with eight-channel works at present
being fairly common. Spatial encodings encompassing even more than
eight channels are not unheard-of,20 and given the increasing acceptance
of multichannel audio in a wider context there is no reason to believe
that the demand for more channels should not continue. Regardless of
the specific number of channels, these ‘spatial encodings’ follow the
same basic principles as stereo and 5.1 – the same standard must be
followed in the encoding and decoding of the audio streams – but
unfortunately, in the absence of a more widely accepted convention,

19. Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.
20. Marco Marinoni’s Del Vuoto Incanto (2003) is an electroacoustic work for 16-channel tape and was first performed at University of Limerick on 5th April 2003:
Marinoni (2004). "Marco Marinoni's Home Page". Website available at: http://www.geocities.com/marco_marinoni.
David Worrall’s Cords (1990) also utilises sixteen audio channels and was written specifically for performance in the ACAT Dome, which will be discussed later in section 4.13:
Worrall (2002). "David Worrall Home Page". Website available at: http://www.avatar.com.au/worrall/.
numerous different methods exist (some of these will be described in
Chapter 3). It is perhaps even more unfortunate that much commercially
available software and hardware is unable to efficiently support these
demands, and given that the practice of sound diffusion is confined to a
relatively small specialist minority, it seems that this circumstance is
unlikely to change in the near future.
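
To make the principle concrete, the following hypothetical Python sketch encodes a monophonic sample into the four channels of first-order Ambisonic B-format, one well-known multichannel spatial encoding; a decoder can only recover the intended spatial image if it assumes the same conventions (channel ordering, weighting, axis orientation) as the encoder. The function name is illustrative only.

    import math

    def encode_bformat(sample, azimuth, elevation):
        """Encode one mono sample into first-order B-format (W, X, Y, Z).
        Angles are in radians; the decoder must assume the same axis
        conventions as the encoder."""
        w = sample / math.sqrt(2)                             # omnidirectional
        x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
        y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
        z = sample * math.sin(elevation)                      # up-down
        return w, x, y, z

    # A source directly to the left of the listener, at ear height:
    print(encode_bformat(1.0, math.pi / 2, 0.0))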

All of this indicates a likelihood, in the context of electroacoustic music performance, that (in addition to other varying technical demands yet to
be discussed) there will be a high degree of variation inherent in the
multichannel demands of works. It is therefore important that arbitrary multichannel configurations be allowed to co-exist in concert programmes, and hence that sound diffusion systems be able to facilitate this; it will be suggested, in Chapter 4, that this need is not presently met. This is at least in part attributable to the fact that
much hardware and software only allows for encoded audio streams to
be efficiently and straightforwardly handled on an individual basis, or at
best in groups that conform to well-established conventions such as
stereo and 5.1 (although the latter case is far less common, particularly
in the hardware domain). Ultimately, however, these circumstances are
due mainly to consensus (as opposed to any actual physical constraints),
and in the vast majority of cases access to individual audio channels is
available regardless of any conventional/theoretical ‘grouping.’
Accordingly, a conceptual framework that allows for an arbitrary
number of encoded audio streams to be treated as fundamentally related
– the ‘coherent audio source set’ – will be proposed and described in
sections 3.5 and 3.6.

It will be noted that the existence of loudspeakers and encoded audio streams is a dialectical necessity: ultimately, neither is useful in the absence
of the other. As such, loudspeakers and encoded audio streams can both be
regarded as defining characteristics of electroacoustic music. The ability to
encode auditory events – whether the encoding is static or transitory, digital
or analogue – allows us to represent and interact with these transient

happenings symbolically and, if necessary, outside of the time domain. This
can be regarded as providing an ‘abstraction layer’ between the normally
unitary processes of sound-creation and sound-reception. Within this
abstraction layer, various forms of technologically mediated intervention
can take place before the final decoding of the encoded audio stream(s). It is
therefore within this abstraction layer that electroacoustic music itself
exists.

1.7. Audio Technologies

This section surveys the main audio technologies that are currently used in the creation and performance of electroacoustic music, summarising key technologies in terms of six categories, namely: audio encoding technology,
recording and playback technology, synthesis technology, audio processing
technology, software (computer) technology, and audio decoding
technology. This summary will be followed by an overview of some of the
possible technological combinations that could be implemented in the live
performance of an electroacoustic work, and specific examples of each of
the primary combinations will be given. The technical demands of
electroacoustic music, as presented in this chapter, are not intended to be
fully exhaustive, but rather should serve as a basic template for the (purely)
technological scope of electroacoustic music. This, in turn, will give an
indication of the kinds of technical demands that are likely to be placed on
concert sound diffusion systems. It should be noted, however, that these
technologies are by no means exclusive to electroacoustic music.
Furthermore, electroacoustic music can incorporate many other technologies
that will not be listed here. We will return to both of these issues later. The following sections review those key technologies that deal directly with encoded audio streams and can therefore be used to effect various interactions within the abstraction layer of encoded audio. It can accordingly be assumed that these technologies will be used in both the composition and performance of electroacoustic music.

1.7.1. Audio Encoding Technologies

These are technologies that mediate between real auditory events and
encoded audio streams. Microphones and pick-ups of various kinds are
the most obvious examples. These convert physical vibrations into
(usually) continuously varying voltages, and as such perform a
transitory-analogue encoding. It is conceivable that in the near future
audio encoding technologies will be developed that encode directly into
transitory-digital formats. The abstract principle of converting between
real sounds and encoded audio streams, however, would remain the
same.

1.7.2. (a) Recording and (b) Playback Technologies

Recording technologies facilitate the conversion of transitory encoded audio streams (whether analogue or digital) into static encoded audio
streams. Playback technologies, on the other hand, carry out the
conversion from static back to transitory. We have already alluded to
the role of such technologies in electroacoustic music by way of
reference to works presented on a ‘fixed’ recording medium. Recording
technology dates back to the late 1870s, when inventors including
Edison and Berliner patented ‘phonograph’ inventions crudely
comparable to more modern vinyl record players.21 Subsequent
developments included magnetic recording onto wire (1898), steel tape
(1929), analogue-optical recording techniques and, finally, magnetic
tape in 1935.22 Pulse code modulation (PCM), a means of encoding
audio signals digitally that is still in widespread use today, was patented
by Reeves in 1942; computer implementations of PCM technology were
first developed by Mathews in 1957, with the first hardware recorder

21. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 1.
22. Ibid.: 12-13.
developed by NHK Technical Research Institute in 1967 following
necessary developments in digital electronics.23
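
As a highly simplified illustration of the principle behind PCM (a hypothetical sketch, not a model of any historical implementation), continuously valued amplitudes can be quantised into fixed-width integers – a static digital encoding – and rescaled for playback:

    # Sketch of pulse code modulation: amplitudes in [-1.0, 1.0] are
    # quantised to 16-bit integers and later rescaled for playback.
    def pcm_encode(samples, bits=16):
        scale = 2 ** (bits - 1) - 1                # 32767 for 16 bits
        return [round(max(-1.0, min(1.0, s)) * scale) for s in samples]

    def pcm_decode(codes, bits=16):
        scale = 2 ** (bits - 1) - 1
        return [c / scale for c in codes]

    codes = pcm_encode([0.0, 0.5, -0.25])          # [0, 16384, -8192]
    print(pcm_decode(codes))                       # original values, within
                                                   # quantisation error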

1.7.3. Synthesis Technologies

In abstract terms it can be stated that synthesis technologies are used to generate encoded audio streams directly, without the need for any audio
encoding technology. The earliest example of synthesis technology is
generally acknowledged to be the Telharmonium, patented in 1897 by
Thaddeus Cahill. Other notable early synthesis tools include the
Thérémin (1924), Sphärophon (1927), Dynaphone (1928), Ondes
Martenot (1928) and Trautonium (1930).24 All of these are used to
directly generate transitory-analogue streams (in the form of
continuously varying electrical voltages) which can then be decoded via
loudspeakers. This is in contrast with earlier musical technologies
(violins, pianos, et cetera) whose sounding characteristics are defined
directly by their physical attributes and the means of excitation
employed to cause these physical bodies to resonate. More recent
developments in synthesis use digital audio technology to implement
synthesis algorithms that would be extremely difficult or impossible to
achieve by analogue electronic means.
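
The abstract principle – an encoded stream generated directly, with no real auditory event and hence no encoding technology involved – can be illustrated in a few lines of hypothetical Python:

    import math

    def sine_stream(freq, duration, sample_rate=44100, amp=0.5):
        """Directly generate an encoded (digital) audio stream; no
        microphone or other encoding technology is involved."""
        n = int(duration * sample_rate)
        return [amp * math.sin(2 * math.pi * freq * i / sample_rate)
                for i in range(n)]

    stream = sine_stream(440.0, 1.0)   # one second of A440, awaiting decoding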

1.7.4. Audio Processing Technologies

An audio processing technology is one that facilitates some kind of modification to existing encoded audio streams before the process of
decoding takes place. Modifications could be applied ‘on the fly’ to
transitorily encoded audio streams: this is what happens, for example,
when an amplifier is used to apply gain to a signal, or when a filter is
used to attenuate part of the frequency spectrum. Equally, modifications
could be applied to static encodings: tape splicing is an example
relevant to the historical development of electroacoustic music, while
the process of ‘scratching’ performed by DJs would represent a more

23. Roland-Mieszkowski (1989). "Introduction to Digital Recording Techniques". Website available at: http://www.digital-recordings.com/publ/pubrec.html.
24. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 1-3.
contemporary example. Among the earliest examples of audio
processing were those performed by Pierre Schaeffer in his early
musique concrète compositions in 1948. Short recordings were looped,
and acetate discs played back at different speeds, thus altering the
qualities of the resulting sounds. Physical means of audio processing,
such as plate reverbs, were often used in the formative years of
electroacoustic music (although it could perhaps be argued that these are
actually audio decoding technologies that modify the sound), as well as
techniques that exploit the physical means of electrical sound
reproduction, such as tape-head delay machines. Electronic means – filters, for example – followed soon afterwards.
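
In the digital domain the same abstract principle – an existing encoded stream in, a modified encoded stream out – might be sketched as follows (hypothetical Python, for illustration only):

    def apply_gain(stream, gain):
        """The amplifier analogue: scale an existing encoded stream."""
        return [s * gain for s in stream]

    def one_pole_lowpass(stream, a=0.1):
        """A very crude filter: attenuates the upper part of the spectrum
        of an existing encoded stream (0 < a <= 1; smaller = darker)."""
        out, prev = [], 0.0
        for s in stream:
            prev = prev + a * (s - prev)
            out.append(prev)
        return out

Both functions require an encoded audio stream as input; neither can produce sound ‘from nothing,’ which is precisely what will mark processing technologies as ‘secondary’ in section 1.8.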

1.7.5. Software Technologies

The impact of software technology on electroacoustic music cannot be overestimated, not least because computer software is effectively able to emulate most of the technologies previously mentioned (with the notable exception of the necessarily physical roles performed by encoding and decoding technologies), and allows the competent user to achieve goals that would simply not be possible in the hardware domain. Generally, software used to perform recording/playback, synthesis, or audio processing tasks will not need
to be differentiated from its corresponding category given above. That
is, in many cases, implementations of computer and software
technology can be regarded as combining elements of the technologies
given in items 1.7.2 through 1.7.4. However, certain electroacoustic
music praxes (such as algorithmic composition) make use of attributes
that are exclusive to software and microprocessor technology, and that
do not fall into any of the other categories. Although structuring
principles in electroacoustic music will not be discussed in this chapter,
they will form a substantial part of the discourse of Chapter 2. Certain
live processing techniques also make use of the unique capabilities of
software applications. Some examples will be given in section 1.8.4.

1.7.6. Audio Decoding Technologies

In abstract terms these technologies effect the conversion from encoded
audio streams back into real sound. Normally this is done by converting
the continuously varying voltage of a transitory-analogue stream into
physical movements within some kind of material, the resulting
vibrations subsequently causing sound waves to travel. In the vast
majority of cases, such technologies can be classed as ‘loudspeakers,’
but alternative means are not unheard-of. Decker has implemented
various mechanical systems in sound-sculptures that could be classified
as audio decoding technologies.25 Tarsitani describes ‘planephones’ –
audio decoding technologies that would not perhaps intuitively be
regarded as loudspeakers – as follows:

‘Planephones’ [are] special multiphonic sound diffusion systems [in the sense of ‘propagators of sound,’ as opposed to ‘diffusion systems’ as they
will be described in Chapter 3] developed by Michelangelo Lupone at
CRM and presented for the first time at the Musica Scienza festival of
1998. Planephones are vibrating sound systems consisting of panels of
different materials (wood, metal, plastic, leather) and shapes installed in
artistic venues; with them it is possible to design the acoustic space
according to the architecture of the hall and give the sound the timbral
quality of the materials employed.26

Again, it is feasible that at some point in the future the necessary means
to convert from digitally encoded streams into real auditory events will
be developed, but the basic premises of the decoding process will
remain the same in this eventuality. It will be noted that, because of
their dependence on encoded audio streams, each of the audio
technologies discussed previously is ultimately indebted to audio
decoding technology as the means of generating the final, audible,
result. We must therefore continue to consider the loudspeaker as being
of primary importance.

25. Decker’s Scratch Studies (2000), for example, converts encoded streams into mechanical movements of motors attached to pieces of stiff wire, which, in turn, scratch against metal surfaces to create sound. Of course, whether or not this constitutes electroacoustic music is a matter of debate. A piece by the author, Pointless Exercise No. 911980 (2002), uses recordings of two of Decker’s sound sculptures as its source materials.
Decker (2005). "Shawn Decker - Installations". Website available at: http://www.artic.edu/~sdecker/Finstallations.html.
26. Tarsitani (1999). "Musica Scienza 1999". Computer Music Journal, 24(2): 88-89.
1.8. The Technical Scope of Live Electroacoustic
Music

It has been proposed that, in terms of technological requirements, the only indispensable prerequisite in the performance of electroacoustic music is the
decoding technology (loudspeaker). Invariably – because loudspeakers are
effectively useless in the absence of any encoded audio streams to decode –
any given piece of electroacoustic music must make use of one or more of
the other technologies given under section 1.7, and may of course include
all of them.

Of the technologies described in section 1.7, three can be described as ‘primary,’ because they are able to output a transitorily encoded audio
stream with no transitorily encoded input stream. These are: audio encoding
technologies; playback technologies; and synthesis technologies. Audio
processing technologies must be regarded as ‘secondary’ because they
require an encoded audio stream as input. Such technologies therefore
cannot be used on their own, but only in conjunction with one or more of
the other technologies. Recording technology can, of course, be used during
performance, but if this is the case then it must invariably be used in
conjunction with playback technology if the performance is real time and
the recorded audio streams are to be part of the live performance itself. Live
sampling, for example, would be regarded as a use of audio encoding,
processing, and playback technologies for the purposes of the present
discussion, even though some of the audio streams may have been recorded
during the performance. Software technology, as mentioned previously, will
not be evaluated separately: if software is used to synthesise sounds, then
this will be considered a synthesis technology, and so on. Audio decoding
technologies (loudspeakers), as we know, are always present, and therefore
are not a variable. Under these criteria, the live performance of a piece of
electroacoustic music may exert any combination of the technical demands
summarised in Table 2 below.

Playback technology…
   …only (no secondary processing): 1.8.1. Streams from a fixed medium.
   …with (secondary) audio processing technology: 1.8.4. Streams from a fixed medium processed prior to decoding.

Synthesis technology…
   …only (no secondary processing): 1.8.2. Streams synthesised in real time.
   …with (secondary) audio processing technology: 1.8.5. Streams synthesised in real time and processed prior to decoding.

Audio encoding technology…
   …only (no secondary processing): 1.8.3. Acoustic sources (often, but not always, encoded into streams for amplification and/or diffusion).
   …with (secondary) audio processing technology: 1.8.6. Streams encoded from acoustic sources and processed prior to decoding.

Table 2. Summary of the possible technological demands in the live performance of works of electroacoustic music.

The three primary technologies – the three main entries of Table 2 – can be used on their own, or in conjunction with some secondary form of audio processing before the decoding process takes place. This gives a total of six individual technological ‘units,’ numbered 1.8.1 to 1.8.6 in the table. A piece of electroacoustic music can incorporate any number (one or more) of these units, in any combination, and could of course feasibly incorporate all six. Since each of the six units may independently be included or omitted (and at least one must be included), it can be determined that there are a total of sixty-three (2^6 − 1) unique technological combinations, which could perhaps be regarded as electroacoustic ‘instrumentations.’ Individual explanation of
each of these permutations is clearly unnecessary; the following sections
will describe each of the six main components, giving examples from the
electroacoustic repertoire where possible.
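
The count is easily verified mechanically (a hypothetical Python sketch):

    from itertools import combinations

    units = ["1.8.1", "1.8.2", "1.8.3", "1.8.4", "1.8.5", "1.8.6"]
    # Every non-empty subset of the six technological units:
    total = sum(1 for r in range(1, len(units) + 1)
                  for _ in combinations(units, r))
    print(total)   # 63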

1.8.1. Works presented solely from a fixed medium

This category encompasses the repertoire of works stored on magnetic tape, compact disc, ADAT, hard drive and so on, and intended for playback or diffusion over loudspeakers in performance. Historically this

has been the most common category for electroacoustic works, and
although this arguably continues to be the case, it seems likely that this
primacy will be challenged in the future as technology advances. Of
course, the audio statically encoded on the medium can be composed of
any combination of acoustic sounds, synthesised sounds, and sounds
which have been further processed after initial encoding. Among the
first electroacoustic works to combine synthesised and acoustic sounds
on a fixed medium was Stockhausen’s Gesang der Jünglinge,
completed in 1956.27

1.8.2. Works consisting of sounds synthesised in real time

This category represents electroacoustic works that are not stored on a recording medium, but rather realised electrically or electronically in
real-time using synthesis technology. Again, such performances are
invariably made audible by way of amplifiers and loudspeakers. As an
example, recent work at the University of East Anglia28 involves the use (or
perhaps ‘misuse’) of synthesisers in unconventional ways for the live
real-time creation of sonic material. Here, the internal workings of early
(and presumably cheaply available!) digital synthesisers are
manipulated directly, using pieces of wire to create short-circuits, with
interesting and often unexpected sonic results. Others perform using
software-based laptop systems and various other acoustic and electronic
sounding devices. The result is an improvised ensemble of
electronically generated sounds broadcast over loudspeakers.

As with all of these technological profiles, it is common for the technology in question to be used in conjunction with other forces.

27. Stockhausen (1991). Stockhausen 3 - Elektronische Musik 1952-1960. (Compact Disc - Stockhausen-Verlag: 3).
28. UEA artists including Shigeto Wada, Phil Archer, Stef Edwards, Adam Green, Jonathon Manton, Nick Melia, Tom Simmons and Bill Vine performed at Sound Circus – the Sonic Arts Network Annual Conference – at De Montfort University, Leicester, on Sunday 13th June 2004.
Hindemith’s Concertino for Trautonium and String Orchestra29 (1931) might be considered among the first examples of this, the work
combining acoustically generated sounds with sounds generated
synthetically by the Trautonium. Messiaen’s Turangalîla Symphony30
(completed 1948) includes the Ondes Martenot among the traditional
orchestral instruments. Fête des Belles Eaux (1937), also by Messiaen,
is scored solely for six Ondes Martenots. Whether or not such works
constitute electroacoustic music, however, is debatable, their having
been heavily conditioned by instrumental traditions. One could suggest
that Messiaen’s Turangalîla Symphony makes use of the Ondes
Martenot as an orchestral instrument with a novel and interesting
timbre, but that in every other respect this particular instrument is
treated in the same way as any other. In other words, it could be argued
that the piece of music would not be subject to a fundamental change in its
essential character were the Ondes Martenot to be replaced with an
acoustic instrument. The main creative locus of the work has been in
composing a piece of orchestral music and not in any aspect specifically
brought about by the presence of new audio technologies. We will
return to this discussion in sections 1.12 and 1.13, as a means to further
refining our definition of electroacoustic music.

1.8.3. Works consisting of acoustic sound sources

Live acoustic means of sound production are frequently used in the performance of electroacoustic music. Such forces typically include the
voice and other ‘traditional’ musical instruments but are not necessarily
limited to these. Any piece of electroacoustic music utilising
acoustically generated sound, howsoever created, would fall into this
category.

29. Hindemith and Sala (1998). Elektronische Impressionen. (Compact Disc - Erdenklang Musikverlag: EK81032). This CD also includes Hindemith’s 7 Triostücke für 3 Trautonien and works for Trautonium by Oskar Sala.
30. Messiaen (2000). Turangalîla Symphony / L'Ascension. (Compact Disc - Naxos: 8.554478/79).
In this context, instrumental sounds may simply be allowed to sound
naturally, or equally may be encoded via microphones, pick-ups, and so
on. It would be fairly unusual for a piece of electroacoustic music to
invoke this criterion as its only technological prerequisite, although
Nance’s Parables of Control (2005) could – in a sense – be defined as
an electroacoustic work for acoustic sound sources only.31 It is far more
common, however, for acoustically generated sounds to be used
alongside other technologies. Smalley’s Clarinet Threads (for clarinet
and tape)32 and Harrison’s Abstracts (for large orchestra and tape)33 are
two examples of this, both of these combining playback technologies
(fixed media) with acoustically generated sound sources. In the former
case the clarinet part is usually encoded via microphones during
performance for the purposes of amplification, whereas Harrison’s piece
does not call for the orchestra to be amplified and therefore
instrumentalists do not require microphones.

1.8.4. Works consisting of fixed medium sources processed in real time before decoding

Some work consists of fixed-medium recordings that are subjected to analogue or digital signal processing at the time of performance, before
amplification and broadcast over loudspeakers. Much work in the field

31. Nance (2005). Personal email communication with the author. At the time of writing, Rick Nance is completing PhD studies in electroacoustic composition at De Montfort University, Leicester, UK. Parables of Control is a series of studies for solo cello in which a pre-composed electroacoustic ‘tape’ part (itself derived from recordings of cello sounds) – although used in performance – is not actually heard by the audience. The material from compact disc is heard only by the cellist – via headphones – with the instrumental performance being realised by the performer in continual response to it. In this way, the ‘tape’ part itself acts rather like a performance ‘score.’ The use of playback and decoding technologies is, of course, still prerequisite to the performance of the work, but as the audience hears only acoustically generated cello sounds, the piece can be regarded as something of a special case. Analogies of Control (2005) and K (2005) are pieces with tape for cello and solo trumpet, respectively. Here, electroacoustic scores – again, not heard by the audience – are used in the same way, but in this case the sound generated by the performer’s instrument is heard alongside a different ‘tape’ part. The cello pieces were first performed by Thomas Gardner in Liverpool on 29th November 2005, with further performances imminent at the time of writing.
32. Various (1990). Computer Music Currents 6. (Compact Disc - Wergo Schallplatten: WER 20262).
33. Harrison (2001). In Various (2001). Cultures Électroniques No. 15: 28e Concours International de Musique et d'Art Sonore Électroacoustiques. (Compact Disc - Mnémosyne Musique Média: LCD278074/75).
of what has become known colloquially as ‘laptronica’ involves such
practice (here, the fixed medium is a computer hard drive), Joseph
Hyde’s ‘Live Sampling’ projects of recent years being a good example.
Hyde describes these projects as follows:

Since 1999, Joseph has been performing as an experimental DJ and VJ using a wide and eclectic range of material (everything from early
Musique Concrète to German electronica; from environmental sources to
kitsch charity-shop vinyl), mixing across genres and cutting and distorting
using a mix of lo-fi techniques such as vinyl vandalism and high-tech
methods based on digital sampling […] and live image manipulation… He
performs in concert venues, clubs and unusual sites, both solo and with
other performers.34

Processing of recorded audio sources is often performed using real-time digital signal processing software such as Cycling 74’s Max/MSP35,
Miller Puckette’s Pure Data36 (more commonly abbreviated as PD), and
STEIM’s LiSa37, but hardware audio processing technology is also, of
course, a possibility.

1.8.5. Works consisting of audio streams synthesised in real time and processed before decoding

In this case, sounds that are synthesised directly in real time are
subjected to some form of intermediary processing while the streams are
still encoded. This could be achieved by hardware or software means, or a combination of the two. Again, software applications such as
Max/MSP can be used to perform both synthesis and live processing.
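
A minimal sketch of such a chain (hypothetical Python, echoing the earlier sketches) might synthesise a stream and ring-modulate it while still encoded, before any decoding takes place:

    import math

    sr = 44100
    # Primary stage: a stream synthesised directly...
    synthesised = [0.5 * math.sin(2 * math.pi * 440 * i / sr)
                   for i in range(sr)]
    # ...secondary stage: processed (ring-modulated) while still encoded.
    modulator = [math.sin(2 * math.pi * 30 * i / sr) for i in range(sr)]
    processed = [s * m for s, m in zip(synthesised, modulator)]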

34. Hyde (2004). "Projects - Live Sampling". Website available at: http://www.theperiphery.com/projects/LiveSampling.htm.
35. Zicarelli (2004). "Max/MSP for Mac and Windows". Website available at: http://www.cycling74.com/products/maxmsp.html.
36. Puckette (2004). "Software by Miller Puckette". Website available at: http://crca.ucsd.edu/~msp/software.html.
37. STEIM (2004). "STEIM Products - LiSa X". Website available at: http://www.steim.org/steim/lisa.html.
1.8.6. Works consisting of acoustically generated
sounds encoded and processed in real time
before decoding

Audio streams encoded from acoustic sources via microphones or pick-ups may be manipulated using audio processing technologies, amplified,
and broadcast in real-time, where they will be heard in addition to the
‘natural’ unprocessed sounds. Such works are often described as pieces
for acoustic forces ‘with live electronics.’ Tutschku’s Cito (2002),
SprachSchlag (2000) and Das Bleierne Klavier (2000), for example, are
all written for instrument(s) and ‘live electronics,’ and do not include a
‘tape part’ as such.38 In each case, sound from an acoustic instrument is
transduced via microphones, and the encoded audio streams are
subjected to various signal processing techniques before being decoded.

It should, of course, be remembered that electroacoustic works may utilise these six technological units in any number and in any combination. In the
future it may become necessary to invent further nomenclatures to describe
these practices, but at present it can be argued that they all fall under the
‘umbrella term’ of electroacoustic music. Emmerson states that “‘mixed’ electroacoustic music (instruments and tape), ‘live’ electronic music (using processing of sound produced by a performer) and […] ‘real-time’ computer music” are component parts of a larger ‘electroacoustic music field.’39
Indeed, it is proposed that several of the technological combinations
(particularly ‘tape only,’ ‘tape and live instrument(s),’ and any of the
combinations involving ‘live electronics’) are – almost of themselves –
culturally identifiable as characteristically electroacoustic
‘instrumentations.’ The technological plurality and eclecticism inherent in
electroacoustic music is also observed by Waters.40 Nevertheless, it seems

38. Tutschku (2004). "Hans Tutschku". Website available at: http://www.tutschku.com/.
39. Emmerson (2000). "'Losing Touch?' The Human Performer and Electronics". In Emmerson [Ed.] Music, Electronic Media and Culture: 194.
40. Waters (2000). "Beyond the Acousmatic: Hybrid Tendencies in Electroacoustic Music". In Emmerson [Ed.] Music, Electronic Media and Culture; and Waters (2000). "The Musical Process in the Age of Digital Intervention". ARiADA 1. Electronic journal available at: http://ariada.uea.ac.uk:16080/ariadatexts/ariada1/content/Musical_Process.pdf.
possible that the inclusion of all of the above within the genre of
‘electroacoustic music’ may be controversial to some. However, it is
undeniably the case that works encompassing all of these categorisations are
frequently presented side-by-side in public performance. Several of the
examples given previously shared the programme of the Sonic Arts Network conference in 2004. This being the case, it is clearly necessary that
provision be made for the juxtaposition of such varied musical forces; this
will become an important item within the criteria for the evaluation of sound
diffusion systems, which will be given later in Chapter 4.

1.9. Creative Frameworks

The previous sections have sought to approach a workable definition of electroacoustic music via an examination of the various technological means
engaged in its live performance. Although this is a good starting point, the
findings are not wholly enlightening because nowadays an enormous
volume and variety of musical material is generated and distributed by these
means. With the exception of exclusively aural traditions, it is reasonably
difficult to imagine any field of musical activity that does not make fairly
extensive use of audio technology in some aspect of its production or
dissemination. Nonetheless, it seems to be widely accepted that audio
technology, in some way, plays a particularly significant role in
electroacoustic music. Notwithstanding certain technological ‘instrumentations’ that are characteristic of the idiom (an observation that is somewhat tautologous, and does not constitute an acceptable definition on its own), it will therefore be useful to examine further the significance of audio technology as a defining characteristic of electroacoustic music, focusing in particular on how the technology is appropriated and how this differs from other applications that utilise the same technology.

From this point onwards, the expression ‘creative framework’ will be used
to connote any entity, construct, system, or paradigm – whether physical or
conceptual – that is used as a partial or total means to realising artistic
output. Western classical notation, for example, is a creative framework
frequently used in the composition of instrumental music. Traditional

musical instruments are creative frameworks that contribute towards the
performance of such musical works. ‘Real’ sound and synthesised sound are
two creative frameworks frequently engaged in the composition of
electroacoustic music. The technologies outlined in section 1.7 are all
individual creative frameworks within which electroacoustic music is both
composed and performed.

The postulation is as follows: every creative framework, due to its very nature, offers up specific affordances41 to its user and, therefore, engenders
particular ways of working. In this respect, creative frameworks are not
transparent, neutral mediators of artistic expression but, rather, exhibit
certain intrinsic ‘biases.’ A deliberately crude and exaggerated example may
be helpful. As a creative framework, a paint-brush (echoing Windsor)
affords painting or, one might say, is biased in favour of – and therefore
engenders – this particular mode of operation. A violin affords playing
violin music, and is of its nature inclined towards this way of working. The
violin is no more appropriate as a means to painting pictures than the paint-
brush is as a means to playing violin music. We could, of course, try to
‘misappropriate’ the frameworks in this way, but are unlikely to be
rewarded with any high degree of success; this is because each framework
engenders a particular way of working and is resistant, to an extent, towards
different ways of working. This does not mean, of course, that creative
frameworks can only ever be used exactly ‘as intended.’ The example given
is deliberately exaggerated as a means of illustrating that creative
frameworks each have their own particular affordances and biases, which –
in most sensible cases – are not quite so extreme.

41. The expression ‘affordances’ was devised by psychologist James Gibson, and is explained by Windsor as follows: ‘Objects and events are related to a perceiving organism by structured information, and they “afford” certain possibilities for action relative to an organism. For example, a cup affords drinking, the ground, walking. […] Gibson’s term for this kind of meaning is “affordance.”’ In the present context, the term ‘affordance’ is used as a convenient way of connoting something that is ‘made possible’ by a creative framework as a function of its unique nature.
Windsor (2000). "Through and Around the Acousmatic: The Interpretation of Electroacoustic Sounds". In Emmerson [Ed.] Music, Electronic Media and Culture: 11.
A creative process can be regarded as one in which multiple creative
frameworks are engaged interactively, eventually resulting in the finished
work. The expression is deliberately broad because this allows us to
examine, at a very abstract and generalised level, the essential
characteristics of the means by which electroacoustic music is created and
performed.

The categorisations given in Table 2 are merely technological requirements, and it is certainly not suggested that any practice eliciting these
requirements must ‘by definition’ be a piece of electroacoustic music. We
must therefore seek to identify what is missing from our definition at this
point. By examining the creative frameworks of electroacoustic music, we
can seek to gain an understanding of the specific ways in which these are
appropriated by electroacoustic musicians. In this way it is proposed that
further defining characteristics of the idiom can be uncovered. The
following sections will examine one specific creative framework that is
central to all electroacoustic music: the loudspeaker.42

1.10. Analysis of the Loudspeaker as a Creative Framework

When an auditory event is encoded, it becomes detached, in a sense, from its original causal agent. When a statically encoded (i.e. recorded) sound is
made audible by amplification through a single loudspeaker, its placement
in space – the locus from which the sound emanates – is determined not by
the spatio-temporal location of the original sounding agent, but by that of
the loudspeaker (and the source(s) from which its encoded audio streams
originate). This may seem obvious, but historically the musical possibilities
must have been exciting: the spatial location of a sound source was no
longer limited by necessity to that physically attainable by a performer. That
is to say, loudspeakers can be deployed with relative ease in positions that
would be at the very least inconvenient for human performers; on the ceiling

42. Similar examination of the other creative frameworks may well reveal further defining characteristics of the electroacoustic idiom. For the purposes of the present thesis, this is not necessary.
or up the walls of an auditorium as in (to cite a fairly early example) Varèse
and Le Corbusier’s installation of Poème Électronique in the Philips
Pavilion (1958).

Of course, loudspeakers do not have to be used ‘monophonically,’ as mere surrogate performers. If multiple loudspeakers are used collectively, various
‘phantom imaging’ techniques can be employed (with varying degrees of
success dependent on both the specific technique used and the particular
formation of loudspeakers relative to the audience) to create the illusion of
sound sources emanating from in-between individual loudspeakers. In other
words, the location of a sound source as perceived need not even be limited
to the physical location of any single loudspeaker. (Surely this must have
seemed unbelievable to composers working with new audio technology in
its formative years). This process is, of course, greatly aided by the absence
of a visual point of reference. Where loudspeakers are used
monophonically, as described in the previous paragraph, the loudspeaker
will itself tend to be perceived as the sound source. Where multiple
loudspeakers are used to create phantom images, however, individual
loudspeakers (if the phantom imaging is successful) cease to be perceived
visually as spatially rooted sound sources. Technologically, most listeners
will realise that in reality the loudspeakers are the source of what they are
hearing, but perceptually the sound sources will be invisible, because there
is no visible entity fixed at the location from which sound seems to be
emanating: the perceptual bond between sounding phenomenon and visual
referent breaks down. Clozier summarises this situation as follows:

A single loudspeaker is a sound projector and causes the listener to identify the
physical point of emission with the point of origin of the sound source… The
sound […] is located at the point within the general space at which the
loudspeaker is placed… We do not hear the sound, we hear the loudspeaker…
Two loudspeakers […] constitute […] a link between two points situated in
general space. They constitute an imaginary space-line on which may be
projected singular or particular musical space… Except for extremes at right
and left [where loudspeakers would behave ‘singly’ as visible sources] the
sound is created within its [own] space.43

43. Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 246-250.
It can therefore be seen that the ability to (re)create sound over multiple
loudspeakers, without the previous necessity of a visible, spatially-rooted
sounding agent, leads to the true emancipation of spatiality in an auditory
context. It is for this reason that electroacoustic music has been described as
‘a uniquely spatial medium.’44 Of course this, in turn, raises the highly
pertinent question of how to utilise the phenomenon of ‘disembodied space’
meaningfully in a compositional context. Unsurprisingly, the desire to invoke spatiality on a more-than-superficial level is strikingly widespread among
electroacoustic composers, as neatly summarised by Worrall:

It has become important to find more appropriate ways to explore the nature
and use of [spatiality] so as to avoid using it purely decoratively in effects such
as sounds ‘whizzing’ around the auditorium.45

The specific ways in which composers seek to attribute meaning to their compositional use of space are invariably tied to the particular aesthetic
model to which they subscribe; we shall return to this matter in Chapter 2.

1.11. Methodological Choices in the use of Loudspeakers

Of course all of the above is theoretically applicable to any sound reproduced over two or more loudspeakers: voices amplified in real-time in
public address systems; live recordings of classical music played back on
home stereo equipment; multitrack studio recordings of rock bands
projected through loudspeakers in night clubs; sounds broadcast over the
airwaves to radios and television sets, and so on. How do we determine
whether an implementation of loudspeaker technology is ‘electroacoustic’
or not? Some might indeed argue that any sound broadcast via loudspeakers
is electroacoustic. Technically, this is a justifiable assertion: we are using
electrical devices for the broadcast of sound. However, recalling section 1.2,
if we subscribe to this proposition, we run the risk of delivering a definition
of electroacoustic music that is effectively meaningless, and probably at

44. Austin (2000). "Sound Diffusion in Composition and Performance - An Interview with Denis Smalley". Computer Music Journal, 24(2): 20.
45. Worrall (1998). "Space in Sound - Sound of Space". Organised Sound, 3(2): 94.
odds with the intuitive understandings of most practitioners. A greater
degree of clarity is therefore to be sought.

In many applications, the role of loudspeakers is to simulate real and familiar sonic events. That is, loudspeakers are used simply as an alternative
means to realising auditory scenarios that could ostensibly be facilitated by
‘real’ sounding agents. For example, the effect of playing a CD of rock
music at home is essentially comparable to the sonic experience of a real
rock band performing in one’s living room. This is a function of the
immediately recognisable nature of the sounds themselves (we can identify
the sound of a drum kit without the affirmative presence of the visual
stimulus itself) and the essentially ‘realistic’ way in which the spatio-
auditory illusion is built (the way instruments have been recorded and
deployed within the stereo field). These two factors are mediated by, and
dependent upon, the manner in which the listener perceives the stimulus,
which will result from a combination of their expectations (sub-conscious)
and deliberate listening strategies (conscious). These are all aspects that are
fairly specific to the loudspeaker as a functional tool or creative framework,
and as such, engender particular ‘methodological choices’ that must be
negotiated by anyone wishing to use loudspeakers for this purpose. It is
proposed that the way in which these decisions are negotiated may be
helpful in defining electroacoustic music as a practice distinct from other
applications of the same technology. Some of the ‘framework-specific’
negotiations that must be made in the use of loudspeakers are outlined in the
following sections.

1.11.1. Choice and Treatment of Sound Material

The reproduction of sound over one or more loudspeakers is a deliberate act, and as such, choices must be made with respect to the sonic
material to be presented, and indeed the sonic material to be omitted.
Consider the process of recording a performance of an orchestral violin
concerto. We might place a stereo pair of microphones in front of the
orchestra, a further pair towards the rear of the hall to capture some of
the reverberant characteristics of the venue, and a single directional

microphone close to the soloist. The resulting recorded signals are
mixed together in order to create the notionally ‘realistic’ illusion of an
orchestral performance on reproduction over a pair of loudspeakers.
According to this objective, proactive efforts will have been made to
capture the ‘important’ auditory information (the orchestra, the soloist)
and, equally, to omit unwanted information (ambient traffic noise, or
coughing from the audience, for example). Similarly, microphones will
have been positioned so as to reinforce the ‘realistic’ aspirations of the
recording: a microphone placed too close to the soloist’s instrument will
result in an unrealistically ‘magnified’ sound for example. Accordingly,
care will have been taken in recording and subsequent mixing to adhere
to the aims and objectives of the recording. These represent very
specific choices (and omissions) of sound material, and very particular
ways in which sound material is obtained and treated. Although
recorded sound has been used as an example here, similar
considerations would inform the choice and treatment of synthesised
sounds. It is here that one must consider the ‘purpose’ for which one is
choosing, creating, or manipulating sound material.

1.11.2. Construction of Auditory and Spatial Illusion

This aspect makes reference to the fact that sound produced over
loudspeakers is inherently illusory in nature. When our orchestral
recording is reproduced over a pair of loudspeakers, we are not really
hearing the sound of an orchestra, but rather we are provided with the
illusion of it. When a sound source seems to move around a venue, the
actual sound sources – multiple loudspeakers – are static in reality, but
are again providing the audience with the illusion of a moving sound
source. Indeed the use of spatial illusion in an auditory context is a
highly important consideration in the construction of the auditory
illusion as a whole. In the orchestral example given, the relationship
between orchestra and soloist is essential to the success of the auditory
illusion. In all likelihood, the signal from the soloist’s microphone will
be positioned centrally and statically within the stereo field, while the

‘orchestral’ and ‘ambient’ signals will occupy a wider portion of the
stereo field, for instance. The soloist is likely, also, to be mixed to a
level slightly higher than that of the orchestra, for clarity. The illusion
would break down, or at very least be considered bizarre, were the
soloist to appear to move around the auditorium, or if the sound of the
solo violin was masked by an erroneously loud orchestral
accompaniment. These represent very particular decisions regarding the
use of auditory and spatial illusion and, again, reinforce the overall
intended mode of listening.
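
The centrally placed soloist described above can be sketched in terms of a standard equal-power pan law (hypothetical Python, for illustration only; pan = 0.5 yields a central phantom image between the two loudspeakers):

    import math

    def equal_power_pan(stream, pan):
        """pan in [0, 1]: 0 = hard left, 0.5 = phantom centre, 1 = hard
        right. Equal-power gains keep the perceived loudness roughly
        constant as the illusory source position moves."""
        theta = pan * math.pi / 2
        gain_l, gain_r = math.cos(theta), math.sin(theta)
        return ([s * gain_l for s in stream],
                [s * gain_r for s in stream])

    soloist = [0.2, 0.4, 0.2]
    left, right = equal_power_pan(soloist, 0.5)   # centred, equal energy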

1.11.3. Mode of Listening

This final – and arguably most important – aspect refers to the overall
intention of the broadcast sound with respect to how it should be
perceived. It is the mode of listening that completes the auditory
illusion, and ultimately determines its relative success. Obviously this is
highly dependent on the listener but can also be regarded, to some
extent, as a function of the choices made in the previous two areas.

Most listeners, on hearing a recording of an orchestral performance, will know that they are not hearing the live sound of a real orchestra.
However, if shrewd decisions have been made regarding the choice of
sound materials to be presented and omitted, and in the way that these
materials have been recomposed and presented to the listener, then the
auditory illusion should be sufficiently persuasive. In this case, the
listener will be ‘encouraged’ to suspend their disbelief, and imagine that
they are indeed listening to the sound of a real orchestra. If this happens,
then the listener has adopted a particular ‘mode of listening’ that is
congruent with the intentions of the material being presented, as a direct
result of the way it has been presented. Such modes of listening are
often adopted sub-consciously – in playing a CD of the orchestral
recording, the ‘correct’ mode of listening may automatically be assumed
by the listener – but can also be conscious acts, whereby the auditory
illusion on offer is either wilfully embraced or deliberately subverted.
For example, a listener may expect (subconsciously) that playing a CD

by the heavy metal group Metallica will present them with the auditory
illusion of the real band performing, and the results may be more
successful if the listener deliberately (consciously) chooses a
complementary listening strategy in order to ‘complete the illusion’.
These are two aspects – subconscious and conscious – of the mode of
listening. Of course this response is easy for the listener if the nature of
the recording has been deliberately engineered to engender it. This,
broadly speaking, is a function of the choice of sound materials and the
way in which the auditory (and spatial) illusion has been executed.

It is proposed that the successful presentation of sound over loudspeakers, in any context, is dependent on the choices made in areas 1.11.1 and 1.11.2
having been successfully negotiated in accordance with their own objectives
as articulated in 1.11.3. Further, it is proposed that any presentation of
sound via loudspeakers necessarily entails choices in each of these areas,
purely because it engages the loudspeaker as one of its frameworks. The
extent to which these choices are well-informed or even deliberate may in
some cases be debatable, but it is difficult to imagine a scenario in which
they do not exist. It can also be seen that the choices made in each of these
areas are mutually inter-dependent. Some examples have already been given
but a further example may be helpful.

Imagine an announcement broadcast over the public address system in a train station. Some very specific decisions regarding the choice of sound
material have been made: usually some kind of pitched tone(s), followed by
the sound of a human voice (and nowadays the voice may very well be live
or pre-recorded). The intended mode of listening is predominantly semantic
in nature, and purely functional, and the choice of sound material reflects
this: the pitched tones are designed to attract the listening attention of the
public and the words ‘spoken’ are designed to be intelligible (anecdotally
this is often not the case). Owing to the nature of the sounding material, a
listener will most probably enter the correct mode of listening more-or-less
automatically (sub-consciously) but – particularly true in this example – an
additional and deliberate (conscious) effort to establish this mode may be

necessary in order for the perceived results to be intelligible, and therefore
for the intended mode of listening to be successful. The use of spatial
illusion is also functional: ideally we should perceive an omnipresent voice,
or at least the sound of an announcer who ‘appears’ to be simultaneously
located in several places at once. In other words, several loudspeakers are
used, but without any phantom imaging as such: although the sonic illusion
of a steward moving around the station may be appealing to some, it would
jeopardise the essential aim of the exercise: intelligibility. Thus it can be illustrated that even the most seemingly benign use of loudspeakers
necessarily involves explicit decisions in each of the three stated areas: such
is the nature of this particular framework. There may, of course, be other
areas, but the three examples given are particularly relevant to
electroacoustic music, and have served to illustrate the point adequately.

1.12. Towards an Understanding of ‘Electroacoustic Music’

It is proposed that the ways in which electroacoustic composers negotiate the methodological choices engendered by the creative frameworks (audio technologies) employed might reveal some characteristic trends that will
help to define the idiom more clearly. Specifically, it is suggested that
electroacoustic music is that which directly and categorically explores (or
seeks to promote an exploration of) the artistic potentials of the particular
creative frameworks engaged in its realisation. This sets electroacoustic
music apart from applications that make use of audio technologies for
purely functional purposes, such as the train station announcement
described in the previous section.

Of course, one could argue that any artistic practice is bound to entail, at
some level, an exploration of the creative potentials of the frameworks it
engages, and there is certainly much validity to this argument.46 In
particular, this leaves the relationship between electroacoustic music and

46. There is a danger here of digressing into an argument about what constitutes ‘art.’ While this is undoubtedly an area of interest to practitioners of electroacoustic music, it is – realistically – beyond the scope of this thesis.
‘popular’ music praxes – which would not normally be regarded as purely
‘functional’ applications and can therefore be considered ‘artistic’ – open to
debate, as these make particularly extensive use of audio technologies.

Let us consider, however, the nature of a deliberately stereotypical ‘pop song.’ There will normally be a fairly small number of musical instruments:
for argument’s sake let us assume that this consists of an electric guitar, bass
guitar, keyboard/synthesiser, drum-kit, and a vocalist. The musical material
will almost always be tonal, rhythmical, and divided into eight-bar sections,
often in the standard ‘verse-chorus-verse’ structure. There will also be
lyrics, often of a narrative and/or auto-biographical nature, whose subject
matter is dependent to a certain extent on the ‘genre’ of the song. The list of
criteria could go on much further, but it is worth considering, at this point,
what the primary creative frameworks of our archetypal ‘pop’ song are. One
might suggest: each of the musical instruments employed; (simple) western
tonality; metrical rhythm; repetitious (but ‘catchy’) structuring principles;
language (usually English); and so on. These are the frameworks within
which the majority of the creative process takes place. Audio technologies
will certainly be used to record, produce, and disseminate pop songs, but it
cannot usually be argued that they represent the primary creative
frameworks of this idiom; creative attention is focussed too strongly
elsewhere. In short, it can be suggested that most ‘popular’ music styles do
not appropriate audio technologies as primary creative frameworks as such but, rather, make more functional use of their affordances.

The following sections will describe some musical works that are generally
regarded to be ‘electroacoustic’ in nature. It will be shown that for each of
the methodological choices described in the previous section – which,
recall, arise from the very nature of the loudspeaker itself as a creative
framework – ‘electroacoustic’ works can be identified that adopt the
methodological choice in question as a specific focus for creative
exploration. We would expect, then, that a characteristically
‘electroacoustic’ approach might negotiate the kinds of methodological
choices outlined in the previous sections with a creative, experimental,

artistic, or otherwise exploratory agenda, as opposed to approaching them as
a functional necessity to the realisation of artistic objectives whose creative
emphasis lies elsewhere (examples of the latter eventuality will also be
given). This particular observation is useful if we wish to differentiate
electroacoustic music from other musical praxes that make use of the same
technologies.

It is also worth remembering that comparable methodological choices could almost certainly be described for any other creative framework imaginable:
the loudspeaker is simply being used as an example as it is essential to all
electroacoustic music.

1.13. Creative Exploration of Framework-Specific Methodological Choices in Electroacoustic Music

The ‘exploratory’ nature of electroacoustic music is supported by the broader notion of ‘experimental music’ (musique expérimentale), which is
described by Chion as any music ‘conceived in a spirit of research’.47 Much
theoretical work is devoted to describing ways in which an experimental
approach to these criteria might be adopted: Wishart’s explanation of the
aural ‘landscape,’48 and Barrett’s paper entitled ‘Spatio-Musical
Compositional Strategies,’49 are two representative examples from a rich
and varied body of literature. The following sections do not seek to imply
that all electroacoustic music devotes itself to an exploration of the creative
potentials of the loudspeaker; merely that the abstract ways in which the
technological frameworks are approached as a primary creative focus might
be indicative of a characteristically ‘electroacoustic’ methodology.

1.13.1. Exploration of Choice and Treatment of Sound Material

In most applications of audio technology, sound materials recorded or synthesised are generally familiar and/or functional in nature, that is,

47. Chion (2002). "Musical Research: GRM's Words". Website available at: http://www.ina.fr/grm/presentation/mots.light.en.html.
48. Wishart (1996). On Sonic Art (Amsterdam; Harwood Academic): 127ff.
49. Barrett (2002). "Spatio-musical Composition Strategies". Organised Sound, 7(3).
they exist within cultural norms that are widely understood and
accepted. To reiterate an earlier example, the rock music recording
exhibits choices of sound material that for the most part fit within a
culturally accepted model of ‘rock music,’ which in turn fits within a
culturally accepted model of ‘music’ as a whole. While these sound
sources have undoubtedly been subjected to various studio signal processing techniques, this will have been done in such a way as not to disrupt the overall cultural congruency of the sound materials. Of
course there can be a degree of creative exploration of the sound
materials of rock music, but it must normally be subject to restriction if
the end result is to be perceived as ‘rock music.’ The choice and
treatment of sound materials in this context, therefore, cannot truly be
considered creative, experimental, or exploratory in the fullest sense,
insofar as the exploration is, at best, a secondary agenda.

In much (but not by any means all) electroacoustic music, contrastingly, creative exploration of sound materials is a primary agenda,
representing the very raison d’être of the music itself. For example, the
juxtaposition of culturally ‘recognisable’ and ‘unrecognisable’ sounds
as the basis for a creative approach in this area is strikingly common in
electroacoustic music. This process is described by McNabb, with
reference to his much-cited 1978 tape work Dreamsong50:

In computer music, one can take advantage of the unique plasticity of the
medium to produce combinations of and transformations between
arbitrarily different sounds. Again, it is the interplay of the familiar and
the unfamiliar or unexpected which I find to have the most expressive
potential. A juxtaposition of two completely unalike sounds need not be
instantaneous, but can occur over any time-span, producing a gesture of
great musical expressiveness and beauty, poignancy or tension, a concept
which was the primary source of inspiration for my work Dreamsong.
This juxtaposition of the familiar and the unfamiliar seems to me to be of
great historical significance. Greater, say, than the introduction of the
crescendo to Western orchestral music.51

Parmegiani expresses a similar agenda with reference to his fixed medium work Sonare:

50. McNabb (1993). Dreamsong. (Compact Disc - Wergo Schallplatten: WER 20202).
51. McNabb (1986). "Computer Music: Some Aesthetic Considerations". In Emmerson [Ed.] The Language of Electroacoustic Music: 145-6.

For each of the 5 movements, I have chosen a pseudo-instrumental or
synthesis sound which I sense will allow me to bring out its ‘very
essence,’ to develop it until it is within the deepest levels of the soul… I
had to imagine the most suitable interplay to bring out such ‘intrinsic
resonance.’ Now of course, no interplay can be genuine unless certain
freedoms are present within or inherent to it, the rule being that such an
interplay should remain musical whilst the sounds, ‘in a real context’
become linked to or opposed to each other. No combat, just interplay for
its own sake, for itself alone, and, at the same time, changes in contour, an
opening or closing in the tone, range, patterns of rhythm, as if the work
were a living being in itself…52

The above quote embodies the exploratory approach to sound material that is so characteristic of ‘acousmatic’ music, a sub-set of
electroacoustic music that will be discussed further in the next chapter.

1.13.2. Exploration of the Construction of Auditory and Spatial Illusion

As described earlier, in most non-electroacoustic applications of audio technology, spatial illusion is built in order to affirm the illusion of one
or more readily identifiable sounding bodies. A recording of an
interview for radio broadcast, for example, may be spatially engineered
so that the interviewer and interviewee seem to be located opposite each
other; this could be achieved by panning the former towards the left of
the stereo field, and the latter toward the right. The same can generally
be said in the case of our rock music example: spatialisation will
generally be carried out so as to reinforce the illusion of a group of real
musicians performing on stage. Here, there is no exploration of
spatiality in its own right, a practice that is common, however, in
electroacoustic music.
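
The kind of spatial engineering described above can be illustrated with the standard constant-power pan law. The following sketch is illustrative only – the sine-wave ‘voices’ and pan positions are stand-ins, not taken from any actual broadcast practice:

    import numpy as np

    def pan_stereo(mono, position):
        """Place a mono signal in the stereo field using the constant-power
        pan law; position runs from -1.0 (hard left) to +1.0 (hard right)."""
        angle = (position + 1.0) * np.pi / 4.0   # map [-1, 1] onto [0, pi/2]
        return np.column_stack((mono * np.cos(angle),   # left channel
                                mono * np.sin(angle)))  # right channel

    # Hypothetical interview: interviewer towards the left,
    # interviewee towards the right.
    sr = 44100
    t = np.linspace(0.0, 1.0, sr, endpoint=False)
    interviewer = np.sin(2 * np.pi * 220 * t)  # stand-ins for the two voices
    interviewee = np.sin(2 * np.pi * 330 * t)
    mix = pan_stereo(interviewer, -0.7) + pan_stereo(interviewee, 0.7)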

Barrett’s tape-only piece The Utility of Space is described by the composer as follows:

The Utility of Space is an exploration of spatial musical structure. This exploration is done in two ways: with poetic spatial implication, and with

52. Parmegiani (1996). Sonare. (Compact Disc - INA: E5203-275912). Quote from accompanying booklet.

carefully controlled trajectories and sound magnitudes in real spatial
places.53

Barrett describes the techniques used to explore the construction of spatial illusion in some depth in the article cited earlier, referring at
times specifically to this piece.54

Obst’s Solaris,55 for chamber ensemble and live electronics, also represents an exploration of the construction of spatio-auditory illusion,
although this work is very different, conceptually and aesthetically,
from Barrett’s The Utility of Space (aesthetic standpoints will be
discussed more fully in the next chapter). Here, live instrumental sounds
are captured by microphones. They are then subjected to various real-
time digital signal processing techniques and spatialised – using Ircam’s
Spatialisateur software running on a NeXT computer workstation –
among four loudspeakers, where the processed sounds will be heard
alongside the unprocessed instrumental sounds. The way in which
processed sounds are spatialised is determined algorithmically:

The control of different parameters during the sound treatment is based on […] mathematical principles, as is the spatialization of these sounds in the
concert hall… The sound-projection uses […] random numbers […] by
transforming them into distance values between 4 and 8 metres.56
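
Obst’s description suggests a very simple mapping. One plausible reading – an assumption, since the quotation does not specify the transformation actually used in Solaris – is a linear rescaling of uniform random numbers into the stated range:

    import random

    def random_distance(lower=4.0, upper=8.0):
        """One plausible reading of Obst's mapping: rescale a uniform
        random number in [0, 1) into a distance value in metres."""
        return lower + random.random() * (upper - lower)

    distances = [random_distance() for _ in range(8)]  # e.g. one per event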

1.13.3. Exploration of the Mode of Listening

Much electroacoustic music encourages us to listen in different ways, perhaps attempting to redress the balance of our visually dominated
culture. This trait is characterised by an acute awareness of auditory
surroundings – whether ‘natural’ or ‘artificial’ – and, unsurprisingly, is
often reflected in musical output. Much electroacoustic music whose
primary focus explores the mode of listening is descended from Pierre
Schaeffer’s school of musique concrète, which will be discussed more

53. Various (2001). Cultures Électroniques No. 15: 28e Concours International de Musique et d'Art Sonore Électroacoustiques. (Compact Disc - Mnémosyne Musique Média: LCD278074/75). Quote from accompanying booklet: 18.
54. Barrett (2002). "Spatio-musical Composition Strategies". Organised Sound, 7(3).
55. Obst (1998). "Sound Projection as a Compositional Element: The Function of the Live Electronics in the Chamber Opera 'Solaris'". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music.
56. Ibid.: 321-322.

fully in the next chapter. Examples of this can be found in Ferrari’s
Presque Rien tape pieces. With reference to Presque Rien avec Filles,
the composer states the following:

I emit [sonic] images, but more in the form of an empty frame which has
to be filled in by listening.57

The piece, like its namesakes, comprises environmental recordings, presented in a relatively unaltered form. Thus, the listener is invited to construct the piece, or ‘fill in the blanks,’ by adopting a particularly attentive mode of listening, centred on an appreciation of the evocative characteristics of these ‘naturally occurring’ sounds. As such, the piece has a lot in common with what is known as ‘soundscape’ composition, a practice predominantly developed at Simon Fraser University and characterised by the compositions and writings of R. Murray Schafer,
Barry Truax, Hildegard Westerkamp and others, and closely linked to
the practice of acoustic ecology. A recent composition by the author –
Smoo Cave (2002) – falls into this category. In Kits Beach Soundwalk,
Westerkamp ‘takes the listener by the hand’ and guides them, by way of
a spoken-word narrative, through the process of listening to the
soundscape (which is presented in a more or less unprocessed form) in
great detail:

It’s a calm morning. I’m on Kits Beach in Vancouver… The ocean is flat,
just a bit rippled in places… I’m standing among some large rocks, full of
barnacles and seaweed. The water moves calmly through crevices. The
barnacles put out their fingers to feed on the water. The tiny clicking
sounds that you hear are the meeting of the water and the barnacles. It
trickles, and clicks, and sucks, and… The city is roaring around these tiny
sounds, but it’s not masking them… But I’m trying to listen to those tiny
sounds in more detail now. Suddenly the background sound of the city
seems louder again. It interferes with my listening. It occupies all acoustic
space, and I can’t hear the barnacles in all their tinyness… Luckily we
have band pass filters and equalisers. We can just go into the studio, and
get rid of the city: pretend it’s not there; pretend we are somewhere far
away. [The sound of the city gradually fades into nothing.] These are the
tiny, the intimate, voices of nature.58

57. Ferrari (1998). Electronic Works. (Compact Disc - BV Haast: CD9009). Quote from accompanying booklet.
58. Westerkamp (1996). Transformations. (Compact Disc - Empreintes Digitales: IMED 9631). Quotation transcribed from ‘Kits Beach Soundwalk’ (1989).

The above quotation is a condensed transcription of the beginning of the
spoken-word narrative that guides the listener through Kits Beach
Soundwalk. It is evident that, in terms of studio processing, relatively
little has been done to the source recordings, and such processing as has taken place has been with the specific intention of encouraging the listener to focus their attention on the tiny details of the soundscape that might otherwise pass unnoticed. Here, as in Ferrari’s
piece, experimentation with the mode of listening is of primary
compositional importance.
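
Westerkamp’s reference to ‘band pass filters and equalisers’ has a straightforward modern equivalent. The sketch below – an illustration, not a reconstruction of her actual studio procedure – uses a high-pass filter to attenuate low-frequency city rumble, on the assumption that the rumble lies below a few hundred hertz while the ‘tiny sounds’ lie above it:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def remove_rumble(signal, sr, cutoff_hz=300.0, order=4):
        """Attenuate low-frequency content (e.g. distant traffic) with a
        Butterworth high-pass filter, leaving higher-frequency detail
        comparatively untouched. The cut-off value is an assumption."""
        sos = butter(order, cutoff_hz, btype='highpass', fs=sr, output='sos')
        return sosfilt(sos, signal)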

Listening, in the examples given, is not a passive behaviour, but an active one. Implicit here, broadly speaking, are two distinct levels of
attention: a focus on the intrinsic characteristics of the sounds regardless
of their cultural connotations; and a specific focus on the cultural
connotations, that is, on the extrinsic or ‘extra-auditory’ information
imparted by sonic events. Of course, the two need not be mutually
exclusive, that is, the listener’s attention will probably be variously
divided between the intrinsic (sound as autonomous phenomenon) and
extrinsic (sound as a ‘signifier,’ of sorts) aspects of the stimulus
material. The exception to this can be found in the original incarnation
of musique concrète (to be discussed more fully in the next chapter),
which proposed a deliberate denial of all extra-auditory understandings
of sonic material, in favour of a phenomenological appreciation of the
sounds themselves, an act now widely regarded as difficult to achieve in
practice:

It proves very difficult to hear sound only in terms of an appreciation of its shape and spectral properties as Schaeffer seemed to advocate.59

Accordingly, numerous theories of modes of listening, and compositions reflecting these, have been proposed and documented. For example,
some subscribe to the idea that our response to auditory stimulus,
familiar or otherwise, is conditioned by innate ‘survival instincts’ that
force us to attempt to identify the cause of everything we hear. Smalley

59. Emmerson (1998). "Aural Landscape - Musical Space". Organised Sound, 3(2): 136.

refers to this process as ‘source bonding,’ and further refines the theory
with the notion of ‘gestural surrogacy,’ the ability (or inability) to
associate a sonic event with an imaginary physical gesture as its causal
agent.60 In brief, ‘source bonding’ refers to the ability to extrapolate a
hypothetical source for a sonic event, and ‘gestural surrogacy’ to the
ability to imagine what kind of gesture that sounding agent may have
made to cause that sonic event to happen. The ‘ecological’ model of
response to auditory stimulus is summarised by Windsor as follows:

A growing body of research is attempting to study the perception of sounds which do not resemble traditional speech or music in a manner that
takes account of the perception of sound sources and their potential to us
as active (rather than passive) organisms. Within the field of ecological
acoustics, sounds are not viewed as being perceived as abstract entities
related only one to another, as ‘tone colours’ or timbres, nor are they
perceived as standing for concepts or things, as signs. Instead they are
seen as providing unmediated contact between listeners and significant
environmental occurrences.61

One might regard the notion of ‘transcontextuality’ as something of an extension to the ‘ecological’ model of auditory response. Whereas musique concrète proposed an interpretation of sonic stimulus that focussed on purely intrinsic characteristics, transcontextuality can be regarded as
something of the opposite: the primary object of attention is the context,
or cultural understanding, evoked by sounds. Experiments in the
relationship between text, context, and super-context are described by
Savouret:

Let us take another example, one that illustrates a text diffused within a
foreign context: traditional musicians from Auvergne performing at a
public concert geographically far away from the place where their music
[…] is [normally] practiced… I discovered with them […] that producing
their text was made easier in the context of the hall (which was simply a
volume plus spectators […] from anywhere but the Auvergne) when I
added a virtual sound ‘over-context’ to which they could relate… One of
the musicians […] said that for him the song of crickets from a valley in
Cantal guided him as to how, that particular evening, he had to sing… For
another musician, the sound of the passage of a herd of cows, far from
hindering him in a work song provided him with indications as to the
tempo of the song… Thanks to a better relationship between the
performers and the context, brought about by a trick of sound – a super-
context, so to speak – listeners who belonged to cultures that had little to

60. Smalley (1997). "Spectromorphology - Explaining Sound Shapes". Organised Sound, 2(2): 110-112.
61. Windsor (2000). "Through and Around the Acousmatic: The Interpretation of Electroacoustic Sounds". In Emmerson [Ed.] Music, Electronic Media and Culture: 10.

do with the Auvergne were able to receive the entire text/context/super-
context in a much more satisfactory way. 62

Although this quotation does not refer directly to electroacoustic music, it does demonstrate the intriguing and subtle responses that emerge from exposure to auditory ‘contexts’ (or, to rephrase, the ways in which one’s perception of ‘context’ can be profoundly affected by auditory stimulus), and hints at ways in which composers can harness this
phenomenon as a dimension to their musical discourse. One such
composer is Ambrose Field, who exemplifies transcontextuality as
follows:

When applied to electroacoustic music transcontextual working is a method by which the extrinsic meanings of a sound can have a profound
impact on their musical surroundings… [It] can be used as a tool to lend
old or existing contexts new meanings… In La Disparition, [Christian]
Calon creates a scene where an aeroplane appears to fly over a tropical
jungle. This is initially accepted by the listener as the probability of this
event occurring in reality is quite high. However, as the piece progresses,
the sound of the aeroplane continues descending in pitch until it eventually
forms the bass drone to the subsequent section… The sound of the
aeroplane is the main transcontextual agent, as it performs both the
function of ‘aeroplane’ and the musical function of a bass drone. During
this transformation from aeroplane to bass drone, this sound has changed
the way we perceived the context that surrounds it.63

It is clear that the possibility of including recognisable ‘everyday’ sounds, and perhaps contrasting these with unfamiliar sounds (either
processed or synthesised) in a musical context, gives rise to the ‘mode
of listening’ as a valid area for diverse musical exploration, as
demonstrated in the examples given above.

1.14. Back to Technology

It has been established that an understanding of electroacoustic music goes further than a simple description of the technological frameworks employed
in its production and performance. Nonetheless, it can also be seen that the
very possibility of a creative exploration in any of the areas discussed is
facilitated by the existence of recording, audio processing, and synthesis

62. Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 351.
63. Field (2000). "Simulation and Reality: The New Sonic Objects". In Emmerson [Ed.] Music, Electronic Media and Culture: 36, 50, 51.

technologies and the ability (in fact, the necessity) to reproduce the resulting
encoded audio streams over loudspeakers. In the words of Trevor Wishart:

The sophisticated control of this dimension of our sonic experience has only
become possible with the development of sound-recording and synthesis and
the control of virtual acoustic space via sound projection from loudspeakers.64

It can be argued that electroacoustic music is one of the few disciplines that
attempts to fully explore the creative possibilities uniquely offered by the
technologies discussed, and it is therefore something of a paradox that the
majority of applications of these technologies cannot be considered
‘electroacoustic.’

1.15. Summary

It has been a primary purpose of this chapter to provide a workable definition of ‘electroacoustic music’ on a mainly technological basis, and
with as little discussion of aesthetic principles as possible. From a starting
point at which ‘loudspeakers’ and ‘fixed media’ were cited as rather
ambiguous defining characteristics (see sections 1.4 and 1.5), a clearer
definition has been sought.

1.15.1. The Technological Demands of Electroacoustic Music

It can now be restated that electroacoustic music, at present, does indeed depend on performance via loudspeakers. Furthermore, because loudspeakers are effectively useless without feed signals, electroacoustic music is equally dependent on the existence of encoded
audio streams, as described in section 1.6. These two characteristics can
be persistently observed in all works of electroacoustic music.65

64. Wishart (1986). "Sound Symbols and Landscapes". In Emmerson [Ed.] The Language of Electroacoustic Music: 60.
65. Although, as noted in section 1.8.3, there are certain works that could arguably be described as electroacoustic despite the fact that they do not require any loudspeakers or encoded audio streams in order to be performed. However, it can also be argued that loudspeakers and encoded audio streams have played an absolutely central role in the compositional process in this case. The works in question are strongly ‘acousmatic’ in essence and could not realistically have been composed without the use of loudspeakers and encoded audio streams as mediators of acousmatic sound: Nance (2005). Personal communication with the author.

Encoded audio streams cannot exist without some kind of device to
produce them, and therefore electroacoustic music must also be
dependent upon other technologies. It is at this point, however, that the technological prerequisites become less uniform from work to work.
Nonetheless, the following three primary technologies have been
identified (see sections 1.7 and 1.8):

• Audio encoding technologies
• Synthesis technologies
• Recording and playback technologies

Recording and playback technologies are only useful in the context of audio encoding and/or synthesis technologies, and all three of these
technologies – if they are to be useful in a performance context – must
be used in conjunction with:

• Audio decoding technologies

It can therefore be concluded, in addition to the prerequisite audio decoding technologies (in most cases, loudspeakers) and mediation via
encoded audio streams, that any given piece of electroacoustic music
will depend on at least one other of these three primary technologies for
its realisation in performance. Two other technological categorisations,
which are secondary in the performance context, have also been
identified. These are:

• Audio processing technologies
• Software technologies

Audio processing technologies are secondary because they require at least one other of the three primary technologies in order to function
usefully within a performance context, and therefore cannot be used on
their own. In section 1.7.5 it was observed that software technologies, in
many cases, are used as a surrogate for one or more of the three primary
technologies in performance. In such cases, for the purposes of the
present discussion, it is the primary technology that takes precedence.

Other applications of software technology (as a structuring tool, or as a means of converting physical phenomena into control data, for example)
are really beyond the scope of this thesis, but have been noted.

It can now be concluded that, in technological terms, the performance of a piece of electroacoustic music can necessitate any combination (one or
more) of the six technological ‘units’ given in Table 2 (page 23). These
consist of the three primary technologies, plus each of these used in
conjunction with audio processing technology, resulting in a total of
sixty-three possible unique combinations. Of course, all of the
technologies communicate via encoded audio streams and must
therefore be used in conjunction with audio decoding technologies
(loudspeakers).
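
The arithmetic behind the figure of sixty-three is worth making explicit: each of the six units is either present or absent in a given configuration, and the empty configuration is excluded, so that

    \sum_{k=1}^{6} \binom{6}{k} = 2^{6} - 1 = 63.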

1.15.2. Particular Approach to Audio Technologies in Electroacoustic Music

It should be recalled that the motivation behind such a system of technological classification is not so much to define electroacoustic
music in its totality (indeed, this is not possible) as it is to ensure that all
electroacoustic works can be adequately catered for in the performance
context. As such, these technical requisites will become an important
criterion in the evaluation of sound diffusion systems that will be
undertaken in Chapters 4 and 5. In any case, it has been noted that the
technologies themselves are not intrinsically ‘electroacoustic’ in nature:
they have a wide variety of applications outside the field of what would
normally be considered ‘electroacoustic music.’

It can now be concluded that what differentiates electroacoustic music from other praxes that utilise the same technology is the locus of the
creative focus. In electroacoustic music, it is likely that at least one of
the objects of creative exploration will be directly attributable to the
unique characteristics or affordances of the creative frameworks (audio
technologies) employed. (This does not, of course, mean that there will
not be any other creative avenues explored.) In other applications, the object of creative exploration is more likely to exist somewhere outside
the direct sphere of influence of the technology, which is therefore often
used simply as a functional tool.

In sections 1.10 to 1.13, a specific framework – the loudspeaker – was used to exemplify this. As a functional tool or creative platform, the
loudspeaker has certain peculiarities that need to be addressed in its
operation, whatever the purpose. In most cases, the unique nature of the
platform is, to a certain extent, disregarded and it is used in an
essentially functional or conventional manner, as a means to realising
objectives that could equally well be realised in a different way. Where
this is the case, the loudspeaker is being used as a secondary, or mainly
functional framework. In electroacoustic music, by contrast, there is a
tendency to explore the possibilities that are uniquely offered by the
platform, and therefore could not be realised in any other way: examples
were given in section 1.13. It is proposed that the same model can be
applied to other audio technologies.

It can therefore be stated that a purely technological definition of electroacoustic music that excludes all aesthetic considerations is
impossible to attain. This is due in part to the fact that the frameworks
used are not exclusive to electroacoustic music and therefore have a
variety of non-electroacoustic applications. Consequently, part of what
defines electroacoustic music is the unique way in which the
frameworks are appropriated, and in this sense, aesthetics and
technology are intrinsically bound. It remains useful, however, to be
able to define the technological scope of electroacoustic music so as to
ensure that works exerting various different technical demands can be
accommodated side-by-side in live performance.

2. Top-Down and Bottom-Up Approaches to Electroacoustic Music

2.1. Introduction

The previous chapter described electroacoustic music as an artistic practice whose creative driving force derives in part (and, in some cases, wholly)
from the unique affordances of the frameworks (audio technologies)
employed in its realisation. This is in contrast to many other applications of
the same technology whose primary creative interests lie elsewhere, and
whose use of the technology is essentially as a functional means to
achieving these objectives. Consequently, it was concluded that a purely
technological account of electroacoustic music, devoid of any aesthetic
discussion whatsoever, would be difficult. Nonetheless, the main focus of
the previous chapter was the technology itself. The purpose of the present
chapter is to focus more directly on the aesthetics of electroacoustic music,
with less emphasis on the specific roles of technology.

The context of this chapter is very broad, more concerned with large-scale
‘overall philosophies’ of electroacoustic music than with specific and
individualised views (of which, of course, there are many). With this borne
in mind, it will be proposed that electroacoustic music, in terms of how it is
composed and performed, falls into two highly generalised categories,
which will be referred to as ‘top-down’ and ‘bottom-up,’ respectively. On a
superficial level it can be stated that these terminologies implicate differing
ways in which musical structure and material can be devised but, as will
become clear, the full extent and implications of the difference is much
deeper. Specifically it will be proposed that these two opposing schools of
thought have a fundamental impact on the performance of electroacoustic
music via the process of sound diffusion: this will be discussed in Chapter 3.

Musique concrète and elektronische Musik represent an often-cited historical example of the proposed top-down/bottom-up binarism and are
generally regarded as having been two concurrent but highly distinct musical praxes that, between them, eventually developed into what we now
understand to be electroacoustic music. Bukvic suggests that these ‘have
long disappeared in their pure form,’66 and this view seems to be
increasingly accepted. He goes on to note, however, that ‘for aesthetic,
analytical and historical purposes,’ they remain useful.67 In the spirit of this
observation, the present chapter introduces the notions of ‘top-down’ and ‘bottom-up’ by briefly recounting the differing attitudes, working procedures, advantages, and shortcomings of musique concrète and elektronische Musik as representative examples, further proposing that their
aesthetic influences are still strongly in evidence.

Two further creative frameworks will be analysed: ‘real’ recorded sounds and ‘artificial’ synthesised sounds. These are, of course, the archetypal
materials of choice of musique concrète and elektronische Musik,
respectively. It will be observed that these frameworks, by their very nature,
inherently lend themselves to the two differing attitudes and methods of
working in question, and indeed that any creative framework can exhibit a
‘directional bias’ in favour of the bottom-up approach or the top-down. It
will later be argued (Chapter 4) that sound diffusion systems – as creative
frameworks themselves – are no different in this tendency, and can basically
be divided into those that are essentially top-down and those that are
essentially bottom-up.

The top-down/bottom-up dichotomy is by no means an entirely new concept, and has been described by various writers within and without the
field of electroacoustic music. However it is comparatively rare, within the
boundaries of electroacoustic music, for the binarism to be invoked in as
general a sense as it is proposed here, individual writers tending, rather, to
allude to it within fairly particular and specific contexts. An important
function of this chapter, therefore, will be in citing a few choice incarnations
of this fundamental duality, and presenting these as evidence of a more
general underlying trend. The results of this survey will be collated and
66. Bukvic (2002). "RTMix - Towards a Standardised Interactive Electroacoustic Art Performance Interface". Organised Sound, 7(3): 277.
67. Ibid.

presented as a series of ‘salient features’ (Table 3, page 84). (It will be
proposed, as a slight aside, that these abstract criteria also allow for useful
applications of the top-down/bottom-up dichotomy, as an analytical tool, to
be made outside of the confines of electroacoustic music). Having clearly
defined the essential characteristics of both top-down and bottom-up
approaches to electroacoustic music, some contemporary examples of each
will be given.

In comparison with those aesthetic issues discussed in the previous chapter (concerning the appropriation of creative frameworks et cetera), it should be
noted that the top-down/bottom-up distinction exists on a different level,
hierarchically speaking. Put simply, any creative process can be undertaken
in a top-down or bottom-up manner. To continue the example from the
previous chapter, a musical exploration of the unique affordances of the
loudspeaker, as creative framework, could equally be realised according to
top-down or bottom-up criteria. It should also be noted that the terms ‘top-
down’ and ‘bottom-up’ need not necessarily be mutually exclusive. While it
will be proposed that, at a very general level, any given piece of
electroacoustic music could broadly be defined as either ‘essentially top-
down’ or ‘essentially bottom-up,’ closer inspection may well reveal a
combination of both approaches across the various aspects of the
composition’s creation.

Having established a context within which creative frameworks can be regarded as either top-down or bottom-up, it will be concluded that the
creative frameworks of electroacoustic music afford, perhaps for the first
time, equal opportunities for both top-down and bottom-up approaches to
the creative process.

2.2. Electroacoustic Aesthetics: A Case History

Electroacoustic music, in terms of its heritage, is often regarded as having been binary in nature. The praxes of musique concrète and elektronische
Musik represent the historical evidence of this. The former expression first
appeared in 1948 and is used to describe the musical activity borne out of the music and theoretical writings of Pierre Schaeffer, at what is now known
as the Groupe de Recherches Musicales (GRM) in Paris.68 The latter describes its Germanic counterpart, which emerged at roughly the same time at
Nordwestdeutscher Rundfunk (NWDR) in Cologne, guided by the musical
and intellectual activity of a group of individuals including composer
Karlheinz Stockhausen.

On an extremely superficial level it can be stated that musique concrète consisted in music built out of ‘real’ recorded sounds, while elektronische
Musik was entirely synthetic, constructed electronically in the studio.69 In
both cases it can be observed that the praxes embraced what were, at the
time, entirely new creative frameworks, with their respective emphases on
sound recording and sound synthesis. Harrison states that ‘this is too simple
a distinction, based on a reading of only the most obvious surface
features’.70 This is undoubtedly true but, notwithstanding its over-
generalising nature, this particular distinction presents a valuable in-road to
the more profound aesthetic differences associated with these two
compositional approaches. For the purposes of elaboration, we must
momentarily adhere to this simplified model, to which we will return after a
brief account of musique concrète and elektronische Musik.

2.2.1. Musique Concrète

It is widely accepted that the theory and techniques of musique concrète were more or less single-handedly pioneered by the French electronic
engineer Pierre Schaeffer. In Paris, in 1942, Schaeffer began a period of
research into the acoustics of sound, and soon became interested in
recording and playback technology (section 1.7.2), which at the time
was starting to become more readily available.71 Schaeffer was

68. Emmerson (1986). "The Relation of Language to Materials". In Emmerson [Ed.] The Language of Electroacoustic Music: 18, 217.
69. The verbs ‘build’ and ‘construct’ are deliberately contrasted here for reasons that will be clarified in section 2.5.1.
70. Harrison (1999). "Diffusion - Theories and Practices, with Particular Reference to the BEAST System". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/Beast.htm.
71. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 19-20.

particularly interested in the use of recording technology to capture
‘every day’ sounds from the real world. This process is very much taken
for granted nowadays, but until the advent of recording technology it
simply was not possible to statically encode – and thereby ‘fix’ – a sonic event in such a way: sound was an exclusively transient phenomenon with no representative physical object existing outside the incessant passing of time.

When auditory events are recorded, a static encoding that abstractly symbolises the auditory events is created and this, of course, is
represented by a physical object (an acetate disc, magnetic tape, or CD,
for example). Schaeffer later proposed the expression ‘sound object’
(objet sonore72), reflecting this state of affairs; prior to the existence of
recording and playback technologies it would not have made a great
deal of sense to refer to a sound as an ‘object.’ Physical objects can, of
course, be manipulated in ways that transient phenomena cannot. When
sound is statically encoded onto analogue tape, for instance, it is
possible to cut, splice, and rearrange the tape; it is possible to play the
tape at different speeds; it is possible to play the tape backwards; it is
possible to play only specific sections of the tape; it is possible to make
tape loops by joining opposite ends of a section of tape; and so on.
These possibilities are brought about by the physical nature of the
recording medium itself, which, as it happens, is an agent for the
abstract representation of auditory events. Manipulations to the
recording medium will have ‘knock-on’ effects on the sound when it is
eventually reproduced. Such practice is sometimes referred to as
working ‘directly’ with sound, as opposed to working ‘indirectly’ via
notational representations of pitches and durations, or with any other
form of prescriptive scheme.73 The implications of this process may

72. Schaeffer (1966). Traité des Objets Musicaux (Paris; Seuil).
73. It could, perhaps pedantically but ultimately reasonably, be suggested that this is not representative of working ‘directly’ with sound, but rather of working with an encoding of it, and that working ‘hands on’ with (e.g.) bows and strings involves more ‘direct’ contact with sound, as such. However, it cannot really be contested that Schaeffer’s methods were more direct than notational frameworks, which really have nothing to do with sound whatsoever. It is in this sense that the expression ‘working directly with sound’ is to be understood.

nowadays seem incredibly straightforward but, experienced for the first
time, they must have been exciting.

In Schaeffer’s early experiments (in which sounds were statically encoded onto acetate discs, which – like analogue tape – can be subject
to certain physical manipulations), attacks were removed from bell
sounds, thus fundamentally altering the perceived characteristics of the
sound. This was achieved, very crudely by modern standards, by
striking the bell and starting the recording shortly afterwards, when the
bell sound had just begun to decay.74 Experiments in which phonogram
recordings were played back at different speeds also yielded changes in
the qualities of the reproduced sounds. These procedures represent
among the first instances of creative audio processing as an artistic
practice in its own right.
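
The physical manipulations listed above – altered playback speed, reversal, looping, and the removal of attacks – all have simple digital analogues. The following sketch operates on a sound held as a numpy array; the crude linear-interpolation resampling is a rough stand-in for true varispeed playback, not a reconstruction of Schaeffer’s actual apparatus:

    import numpy as np

    def change_speed(samples, factor):
        """Crude varispeed: resample by linear interpolation, altering
        pitch and duration together, as playing a tape at a different
        speed would. factor > 1 speeds up; factor < 1 slows down."""
        positions = np.arange(0.0, len(samples) - 1, factor)
        return np.interp(positions, np.arange(len(samples)), samples)

    def reverse(samples):
        """Play the 'tape' backwards."""
        return samples[::-1]

    def loop(samples, times):
        """Join copies of a fragment end to end, like a tape loop."""
        return np.tile(samples, times)

    def remove_attack(samples, sr, seconds=0.1):
        """Discard the onset of a sound, as in the bell experiments
        described above; the 0.1-second default is arbitrary."""
        return samples[int(sr * seconds):]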

Schaeffer also observed that repeated listening to a short recorded fragment (again, only possible with the use of recording technology)
would reveal intrinsic qualities of the sound that would probably have
gone unnoticed on a single hearing. This process might be tangentially
likened to the experience of saying one’s own name repeatedly until,
eventually, the word seems bizarre. It was this observation that led
Schaeffer to develop the concept of ‘reduced listening’ (écoute réduite75), whereby listening attention is focused on the intrinsic
characteristics of sounds (on the particular qualities of sounds
themselves), rather than on interpreting the sounds in terms of what
might have caused them. An important part of Schaeffer’s research was
devoted to exploring the ways in which sound, thus defined as an
autonomous phenomenon with intrinsically interesting characteristics,
could be used as the basis for musical discourse. To this end he set
about devising a system by which sound objects could be described in
terms of their phenomenological characteristics, correctly observing that

74. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 20.
75. Schaeffer (1966). Traité des Objets Musicaux (Paris; Seuil): 270-272.

western classical notation could only accurately describe duration and
pitch, and even then only within the confines of metrical rhythm and the
chromatic scale. The results were published in 1952 under the title
‘Esquisse d’un Solfège Concret,’ as part of a book entitled À la
Recherche d’une Musique Concrète. The salient aspects of this
publication are summarised, in English, by Manning.76

What is regarded by many as the first piece of electroacoustic music was composed during this period of research. Schaeffer’s Étude aux
Chemins de Fer,77 completed in 1948, was fashioned from recordings of
steam trains made at the Gare des Batignolles in Paris.78 As a piece of
musique concrète of the purest calibre, the intention was that the listener
focus solely on the phenomenological characteristics of the sounds
themselves, and the musical possibilities inherent in them. In practice,
Schaeffer observed, it was difficult to deny a recognition of the ‘train-
ness’ of this material; an important observation that foreshadowed years
of subsequent research into empirical sound perception and
psychoacoustics. Schaeffer’s groundbreaking research has influenced
(among many others) Denis Smalley, whose principles of
‘spectromorphology’ represent the aesthetic foundations on which much
electroacoustic music has been, and continues to be, built.79

That Schaeffer’s new musical practice was referred to as musique concrète is to do with the nature of its source material. In one sense, as
previously stated, transient sounds could be ‘halted in their tracks,’
statically encoded onto a physical recording medium, and in this respect
‘concretised.’ Another (more important) sense in which ‘concrete’ is
invoked is with reference to the actuality of the material. The

76. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 30-40.
77. Various (2000). OHM: The Early Gurus of Electronic Music. (Compact Disc - Ellipsis Arts: CD3670).
78. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 20.
79. Smalley (1986). "Spectro-Morphology and Structuring Processes". In Emmerson [Ed.] The Language of Electroacoustic Music: 61-93; Smalley (1997). "Spectromorphology - Explaining Sound Shapes". Organised Sound, 2(2): 107-126.

characteristics of the sound have not, themselves, been ‘composed’;
they are ‘already there,’ and in that sense they are ‘concrete’; they are
‘given.’ Accordingly, and as Harrison observes, this is reflected in a
‘concrete’ method of working:

A further dimension of what was ‘concrete’ about musique concrète was the method of working and, by extension, the relationship between
composer and material: as in sculpture or painting where the artist
produces the finished product on or in a fixed medium by manipulating the
materials (paint, wood, stone) directly, so in musique concrète the
composer is working directly with sound. [Harrison’s italics].80

An important implication here is that the composer has no choice but to ‘work with’ the particular qualities of the chosen material, just as a
sculptor must work with the particular characteristics of, say, granite. It
is not normally possible to alter the fundamental characteristics of
granite. One can sculpt it into a variety of different shapes and textures,
but the essential character of the source material remains the same. Of
course, the artist is at liberty to choose another material if necessary
(sandstone offers different possibilities to granite) and combine
materials as appropriate, but ultimately the particular qualities of each
material must be considered of paramount importance. Such is the
doctrine of musique concrète, and of much subsequent and
contemporary electroacoustic music derived from its principles.

It will be noted, of course, that Schaeffer’s manipulations of sound objects took place within the abstraction layer that is brought about by
audio encoding (see section 1.6.1 and Figure 1 on page 11), that is, at a
point in between the encoding of audio streams and their subsequent
decoding. As such, these procedures demonstrate a specific exploration
of the creative possibilities uniquely afforded by new audio
technologies, and it can be demonstrated that the aesthetic of musique
concrète arose as a direct consequence of these explorations. This is, of
course, completely in keeping with the definition of ‘electroacoustic
music’ given in the previous chapter.

80. Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 117.

2.2.2. Elektronische Musik

While musique concrète can be regarded as having been an entirely new model of musical thinking, a more or less complete departure from what
was previously considered (perceptually and syntactically) to be
‘musical,’ arising from the innovation of one pioneering individual,
elektronische Musik is best described as a continuation of certain
aspects of established musical practice, brought about by a group of
like-minded individuals.81 Nonetheless, it will be observed in due
course that elektronische Musik is similar to musique concrète insofar as
both pursue creative interests brought about specifically by the new
audio technologies.

The process of setting up a dedicated studio for the production of elektronische Musik began in October 1951 after a meeting between
Fritz Enkel, technical director of North-West German Radio, and a
group of interested parties including Robert Beyer (also of NWDR),
Werner Meyer-Eppler (head of the Phonetics Department at Bonn
University), and composer Herbert Eimert, with the first compositions
appearing while the studio was still under construction.82 Work on the
studio was finally completed towards the end of 1953, by which time
several other compositions had been completed, and Stockhausen had
begun work on his two Studies.

The compositional aesthetic of elektronische Musik centred on the structured organisation of, and ergo complete control over, all aspects of
sonic material. In this respect it can be regarded as a development of
serialism. The process of serial composition can essentially be regarded
as one of compartmentalisation, or parameterisation. Firstly,
compositional material must be abstractly described in terms of a
number of independent constituents which, in combination, express its
totality. For instance, sound might thus be described in terms of pitch,

81. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 43ff.
82. Ibid.: 43-45.

dynamic level, duration, and timbre, although many parameters, some
more abstract than these, are possible. In this way, individual
constituent attributes of sound are defined and therefore
compartmentalised. Each of these independent parameters is then
further defined in terms of a number of discrete, perceptually different,
states. This procedure is described by Stockhausen with reference to one
of the earliest pieces of elektronische Musik, Studie I, which was
completed in 1953:

A ‘serial system’ for sensorially evaluated frequency differences will begin in the middle of the auditory range and extend to the limits of pitch audibility.83

Here, Stockhausen is isolating (compartmentalising) the parameter of ‘pitch’ (frequency) by specifying a point of reference from which
differences can be measured. A less esoteric example is the chromatic
scale which, conveniently for the early serialists, expresses the infinitely
variable parameter of ‘pitch’ in twelve differentiated sub-divisions. Of
course we know that between ‘C’ and ‘C-sharp’ – notionally the
‘smallest possible’ subdivision of the chromatic scale – there is in
reality an infinite number of possible intermediate pitches, but this
presents organisational and representational difficulties, for how is one to positively identify one member of such (to use a mathematical expression) a nondenumerably infinite group?84

Denumerably infinite (or ‘countably infinite’) groups, such as the chromatic scale, are more easily manageable from a serialist
perspective: we know that pitches theoretically carry on ascending and

83. Stockhausen (1991). Stockhausen 3 - Elektronische Musik 1952-1960. (Compact Disc - Stockhausen-Verlag: 3). Quotation in accompanying booklet: 102.
84. The concept of infinity implies an unlimited quantity, or a lack of definable boundaries. Within this context, however, it is possible for a measurement to be either ‘denumerably infinite’ or ‘nondenumerably infinite.’ For the former case, integer numbers represent a good example: we know that there is an infinite quantity of them, but it is hypothetically possible to positively identify every single one of them by counting (even though this would take an infinitely long time!). Floating point (or fractional) numbers, on the other hand, represent members of a nondenumerably infinite group. While we are able to enumerate the quantity of integer numbers that exist, for instance, between zero and ten, we are unable to evaluate the quantity of floating point numbers that exists between these boundaries.

descending ad infinitum, but at least we could hypothetically give a
name (or number) to any member of that infinite group. Abstractly, in
order for parameters to be sub-divided in this way, a constant unit of
measurement that arbitrarily describes the smallest interval between two
states must be defined for each parameter. This results in what Wishart
describes as a ‘lattice’ of discrete values85 and therefore renders the
number of possible values for that parameter denumerably infinite. The
composer may wish also to define upper and lower thresholds – for
example the extents of the human hearing range, to reiterate the
Stockhausen example given above.
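
The lattice idea can be made concrete with the familiar twelve-note equal-tempered scale: a continuous frequency is snapped to the nearest member of a denumerable set, clamped between chosen thresholds. A minimal sketch, assuming A4 = 440 Hz as the reference point:

    import math

    A4 = 440.0  # reference pitch; an assumption for illustration

    def quantise_to_chromatic(freq_hz, lower_hz=20.0, upper_hz=20000.0):
        """Snap a continuous frequency onto the chromatic (12-TET)
        lattice, clamped to the given audibility thresholds."""
        freq_hz = min(max(freq_hz, lower_hz), upper_hz)
        steps = round(12 * math.log2(freq_hz / A4))  # nearest lattice point
        return A4 * 2 ** (steps / 12)

    print(quantise_to_chromatic(450.0))  # -> 440.0: the lattice absorbs the detail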

Having defined a parametric lattice, the parameter in question can become subject to mathematical operations: relationships between
latticed parameters can be easily defined; a succession of pitches can be
directly mapped on to a succession of, say, amplitudes. Such a
procedure would be absurd – for what exactly is the concrete
relationship between pitches and amplitudes? – if not for the fact that
each parameter is, effectively, represented numerically. In other words,
once compartmentalised parameters have been divided into ordered sets
of differentiated values, the development of each parameter throughout
the course of the music can be deterministically calculated according to
predefined rules. Stockhausen continues:

The duration of each note will be inversely proportional to its thus-defined frequency difference, so that as the distance from the middle frequency
range increases, the duration decreases. The amplitude series is to decrease
proportionally to duration, as the frequency difference increases. Thus the
tendencies away from the middle register towards the lower and upper
limits of audibility will be perceptible from the correspondingly
decreasing duration and amplitude of the notes.86

Here, Stockhausen has defined a mathematical relationship (specifically, inverse proportionality) between frequency interval and
duration, and, in turn, between duration and amplitude. In this context,
this is a compositional act. Such a relationship would be difficult to

85. Wishart (1996). On Sonic Art (Amsterdam; Harwood Academic): 23-30.
86. Stockhausen (1991). Stockhausen 3 - Elektronische Musik 1952-1960. (Compact Disc - Stockhausen-Verlag: 3). Quotation in accompanying booklet: 102.

justify, however, in the context of musique concrète, because there
would be no such system of enumerated frequencies, durations, and
amplitudes. The elektronische composer is not, therefore, composing
directly with the sound itself (as is the case in musique concrète), so
much as with abstract mathematical constructs that happen to then be
applied to compartmentalised sonic or structural parameters.
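
The logic of such a mapping can be sketched in a few lines. The constants, and the measure of ‘frequency difference’ (here, distance in octaves from an assumed middle register), are illustrative assumptions standing in for Stockhausen’s actual series, which the quotation does not specify:

    import math

    MIDDLE_HZ = 880.0  # an assumed 'middle of the auditory range'

    def serial_mapping(freq_hz, k_dur=2.0, k_amp=0.5):
        """Duration inversely proportional to the note's distance from
        the middle register; amplitude directly proportional to the
        resulting duration, after Stockhausen's description."""
        distance = abs(math.log2(freq_hz / MIDDLE_HZ))  # octaves from middle
        duration = k_dur / (1.0 + distance)   # shorter towards the extremes
        amplitude = k_amp * duration          # quieter as notes shorten
        return duration, amplitude

Similar techniques have been documented with respect to other works composed in the formative years of elektronische Musik: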

During the first half of 1953 Beyer and Eimert composed Ostinate Figuren
und Rhythmen, and Eimert alone composed Struktur 8. These pieces are
characterised by the strict application of serial procedures to the processes
of tone selection and processing. Struktur 8, for example, is derived
entirely from a restricted set of eight intervallically related tones.87

It was previously suggested that the composer of musique concrète ‘works with’ the pre-existing characteristics of ‘real’ sound materials,
and that musical structure is therefore born out of these intrinsic
characteristics. In the case of elektronische Musik, the composer designs
the musical structure, in this respect, directly, and the intrinsic
characteristics of the resulting sounds are born out of the predetermined
relationships between their constituent parameters. The ‘directionally
opposite’ nature of these compositional approaches is neatly
encapsulated by Francis Dhomont:

[In musique concrète the] compositional method begins with the concrete
(pure sound matter) and proceeds towards the abstract (musical structures)
– hence the name musique concrète – in reverse of what takes place in
instrumental writing, where one starts with concepts (abstract) and ends
with a performance (concrete).88

The strong relationship between elektronische Musik and established traditional compositional practice (most obviously, serialism) has
already been observed, and it can therefore be assumed that within
‘instrumental writing’ Dhomont would also include the compositional
procedures of elektronische Musik.

87. Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 45.
88. Dhomont (1995). "Acousmatic Update". EContact! 8(2). Electronic journal available at: http://cec.concordia.ca/contact/contact82Dhom.html.

2.3. The Implications of ‘Real’ and ‘Artificial’ Sounds

As previously observed, it can essentially be stated that musique concrète was concerned with ‘real’ recorded sounds, while elektronische Musik was
concerned with ‘artificially’ synthesised sounds. The reasons behind this
generalisation may already be clear, but as a step towards further clarifying
some of the essential aesthetic differences that co-exist within the broad
practice of electroacoustic music, it is helpful to examine the implications of
‘real’ sounds and ‘artificial’ sounds – as two distinct creative frameworks –
more closely.

‘Real’ sound occurs naturally, in the real world: it develops according to physical laws governing the interactions between the various sounding
bodies that give rise to it. Here, there is no ‘abstraction layer,’ and the
processes of sound creation and sound reception are (to all intents and
purposes) intrinsically bound, as illustrated in Figure 1(a) on page 11.
Therefore, there are certain constraints with regard to the kinds of control a
human agent may exert on it. The nature of the physical interactions
themselves – when a violinist tilts the bow away from the bridge in order to
play more quietly, for example – may be contrived, but ultimately the
particular character of the resulting sound is the product of natural, physical,
interactions. In this respect, the way in which a ‘real’ sound develops through time can be described as an organic process.

With synthesised, or ‘artificial’ sound, exactly the opposite is true: a high degree of precision can be directly exerted on the intrinsic characteristics of a sound. Assuming that the composer is familiar with the techniques of the various synthesis algorithms, he or she notionally has absolute control
over the spectral profile of a sound at any given moment throughout the
course of its development. However, there are no physical interactions that,
directly and by virtue of the laws of physics, result in the sounding
phenomenon.89 In this respect, sound synthesis presents a direct point of

89. Even in the case of real-time synthesis (which of course was simply not available during the historical period in question), we are not dealing with ‘physical interactions’ so much as ‘data entry’. Although the ultimate goal is, perhaps, to give the impression of direct control (think of a synthesiser controlled by a MIDI keyboard or other such gestural controller) we are effectively dealing with specific ways of providing an intermediate synthesis algorithm with the correct variables. If the algorithm and data entry method are good, we may well be convinced that we are interacting with physical sounding bodies, but ultimately changes to the synthesis algorithm will have a far more radical effect on the resulting sound than changes to the input data, and there is no way we can provide the algorithm with any information that it has not been designed to receive. It is therefore true to say that the resulting sound is essentially governed by the synthesis algorithm, and not by the physical gestures.

entry into the abstract domain of encoded audio streams, offering up the
possibility of creating sound – literally – from nothing. The way in which an
‘artificial’ sound develops through time is governed by a concept – an
algorithm – which, as Wishart puts it, has ‘been generated by an electrical
procedure set up in an entirely cerebral manner.’90 The process that gives
rise to synthetic sound, as the result of predetermined abstract construction,
cannot therefore be described as ‘organic’ and would be better described as
‘architectonic.’91
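
The degree of direct spectral control described here can be illustrated with the simplest of synthesis techniques. The sketch below builds a sound ‘from nothing’ by summing sine-wave partials whose frequencies and amplitudes are specified exactly; the particular (inharmonic) partial list is an arbitrary assumption:

    import numpy as np

    def additive_synth(partials, duration, sr=44100):
        """Construct a sound from nothing: each (frequency, amplitude)
        pair specifies one sine-wave partial, so the spectral profile is
        under direct and exact control at every moment."""
        t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
        signal = np.zeros_like(t)
        for freq, amp in partials:
            signal += amp * np.sin(2 * np.pi * freq * t)
        return signal / max(1.0, np.abs(signal).max())  # normalise

    # An arbitrary spectrum specified entirely 'in the abstract'.
    tone = additive_synth([(200.0, 1.0), (431.0, 0.6), (977.0, 0.3)], 2.0)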

It can therefore be said that ‘real’ and ‘artificial’ sounds differ primarily –
or, more precisely, are opposite – in terms of the ways in which their
intrinsic sounding characteristics can be accessed. In the case of ‘real’
sounds, the nature of the sound arises from a combination of physical
gesture and the physical attributes of the sounding bodies in question. With
‘artificial’ sounds, however, there is no physical gesture as such (although
many would argue that there is an implied gesture, or gestural
‘surrogacy’92), nor sounding bodies in the same sense, and the sounding
characteristics (or a higher-level algorithm that will, in turn, generate the
sounding characteristics) are determined directly by the composer. This
reading is directly analogous to the respective working methods of musique
concrète and elektronische Musik.

It can be seen that the process of synthesising an ‘artificial’ sound, generally speaking, requires work at a ‘low’ level: it is necessary to define the ways in which basic fundamental parameters interact, and in this way work up to a

90. Wishart (1986). "Sound Symbols and Landscapes". In Emmerson [Ed.] The Language of Electroacoustic Music: 58.
91. We will return to the expressions ‘organic’ and ‘architectonic’ – originally proposed by Harrison – in section 2.5.3.
92. Smalley (1996). "The Listening Imagination - Listening in the Electroacoustic Era". Contemporary Music Review, 13(2): 77-107.

‘finished’ sound. In generating ‘real’ sounds, the laws that govern the
evolution of the sound’s characteristics are already physically in place, and
the performer essentially provides these sophisticated natural ‘algorithms’
with high-level parameters. This reading can, again, be regarded as a
microcosm of the proposed binary model of composition as a process whose
various aspects can be located at any point on a continuum that extends
from ‘top-down’ to ‘bottom-up.’ It can therefore also be said that ‘real’
sound – as a creative framework – is inherently biased towards top-down
methodologies, whereas ‘artificial’ sound is biased in favour of a bottom-up
approach. Such attributes can be regarded as intrinsic characteristics of the
frameworks themselves, and it is proposed that similar biases are potentially
to be found in all creative frameworks.

2.4. Musique Concrète versus Elektronische Musik

Having briefly examined the materials and working practices of musique concrète and elektronische Musik, it may be restated that the two are, in
many respects, opposite. It is worth examining these differences further, as
it will later become clear that, although musique concrète and elektronische
Musik are often regarded as ‘extinct,’ the underlying aesthetic discrepancies
remain in many instances central to contemporary electroacoustic musical
discourse and performance practice.

As already noted:

The Germans held the work of the Second Viennese School in high esteem, and
many became avowed disciples of the cause of furthering the principles of
serialism. An increasing desire to exercise control over every aspect of musical
composition led to a keen interest in the possibilities of electronic synthesis, for
such a domain eliminated not only the intermediate processes of performance
but also the need to accept the innate characteristics of natural sound sources...
This [was a] movement towards total determinism. [My italics.]93

93 Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 46.

It is therefore clear that the aesthetic standpoint of the Cologne school not only represented a different approach to that of its Parisian counterpart, but, by definition, sought actively to exclude it. It is logical to deduce that, in an art-form that relies so heavily on 'total determinism,' not only is it necessary
to exercise a high degree of control over one’s source materials – hence the
preference for ‘artificial’ synthesis over recordings of ‘real’ sounds – but it
is also important that the precision of the compositional process be
accurately recreated in performance. After all, there is little point in
defining intricate relationships between musical parameters if these
relationships will be lost or inadequately represented in performance. It is
for this reason that the Cologne school found human performers, in some
cases, to be inadequate for the performance of their music. The practice of
elektronische Musik thus becomes possible only as a direct consequence of the new creative opportunities offered up by synthesis technology, and can therefore be regarded as an 'electroacoustic music' practice according to the definition proposed in Chapter 1. Harrison makes
the following observation, with reference to certain aspects of
Stockhausen’s instrumental work:

One of the primary reasons for the emergence of elektronische Musik was the
need to be able to realise with absolute precision in the studio the kind of
serialised dynamics presumably vital to the structure of works like
Stockhausen’s Klavierstuck I. This piece, famously, features a simultaneously
struck nine-note chord containing five different dynamic levels – a fairly
unrealistic demand on any pianist; but if it cannot be accurately performed, the
work becomes, in a very real sense, unintelligible, as the measurements
between the five dynamics cannot be made aurally (perceptually).94

94 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 119.

This, of course, supports the suggestion that the degree of compositional accuracy demanded by certain composers is simply unattainable by non-synthetic means. Generally speaking, it is true to say that works composed
in an ‘elektronische’ manner are designed to be heard exactly as the
composer intended them, and that the Cologne school regarded the new
creative frameworks, and synthesis in particular, as an unprecedented means
of achieving this. ‘Real’ sounds – whether recorded onto a fixed medium or
otherwise – were found to be inappropriate for this way of working, as they
did not afford the composer a sufficient level of control over their intrinsic
characteristics. Stockhausen evidently discovered this in 1952 whilst
working – at the hospitality of a certain Pierre Schaeffer – on his Étude,
whose compositional procedure he describes as follows:

First, I recorded six sounds of variously prepared low piano strings struck with
an iron beater (tape speed: 76.2 cm per second)… With scissors [I] cut off the
attack of each sound. A few centimetres of the continuation, which was –
briefly – quite steady dynamically, were used. Several of these pieces were
spliced together to form a tape loop, which was then transposed to certain
pitches using a transposition machine… I then chose, according to my score,
one of the tapes having a certain sound transposition, measured the notated
length in centimetres and millimetres, [and] cut off that length… Next, I chose
another prepared tape, measured and cut off a piece, and spliced it onto the
previous piece. Whenever the score prescribed a pause, I spliced a
corresponding length of white tape onto the result tape… Upon hearing two
synchronized layers, and even more so hearing three or four layers, I became
increasingly pale and helpless: I had imagined something completely different!
[…] Anyway – on this CD released in 1992 – the world can now hear my
concrète Étude of 1952, which for many years I had presumed lost until I
finally found it again in a pile of old tapes.95

95 Stockhausen (1991). Stockhausen 3 - Elektronische Musik 1952-1960. (Compact Disc - Stockhausen-Verlag: 3). Quotation in accompanying booklet: 95-97.

This quotation is interesting in several respects. Firstly (and as a slight aside), in describing the work in question as 'my concrète Étude,'
Stockhausen appears to suggest that any composition using ‘real’ recorded
sounds as its source material qualifies as a piece of musique concrète purely
on that basis. Nowadays (although this misconception is still frequently repeated) it is widely acknowledged that the distinction rests more on the method of working than on the materials, but it is worth
considering that in 1952 research into such aesthetic matters was at a very
early stage of its development. It seems likely that many of the aesthetic
differences associated with elektronische Musik and musique concrète are
the result of years of subsequent research, and therefore care must be taken
not to invoke this knowledge anachronistically.96 It is, in part, for this reason
that alternative (and in a sense more generic) terminologies will be proposed
in section 2.5, and it may already be clear that, despite his use of an
inherently top-down framework (‘real’ sound sources), in composing his
‘concrète Étude,’ Stockhausen was actually working in a characteristically
‘bottom-up’ manner. Secondly, it is clear that Stockhausen, in constructing
this piece, was working from a predetermined score (a facsimile thereof is
provided in the source previously cited97) and therefore had in mind a
precise structure that he wished to articulate with the sonic material. Note
also the sense of ‘scientific precision’ embedded in the language that
Stockhausen uses, for example in stating (perhaps superfluously?), ‘Tape
speed: 76.2 cm per second.’ Thirdly, it is evident that the use of ‘real’
sounds for this purpose was, to a certain extent, unsuccessful, as
Stockhausen acknowledges that the results were not as he had anticipated.
This might be regarded as material evidence of the suggestion that ‘real’ and
‘artificial’ sounds, as a direct result of the processes necessary to facilitate
their existence, inherently lend themselves to differing compositional
approaches.
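
The arithmetic implicit in Stockhausen's account is worth making explicit, since it exemplifies the numerical exactness of this way of working. At a tape speed of 76.2 cm per second a notated duration maps linearly onto a physical length of tape, and – assuming transposition by simple speed change – transposing a segment alters its playback duration in inverse proportion. A minimal sketch, with hypothetical values:

TAPE_SPEED = 76.2  # cm of tape per second, as in Stockhausen's account

def tape_length(seconds: float) -> float:
    """Length of tape (cm) to measure and cut for a notated duration."""
    return TAPE_SPEED * seconds

def transposed_duration(seconds: float, semitones: float) -> float:
    """Playback duration after transposition by speed change: raising
    the pitch n semitones multiplies playback speed by 2 ** (n / 12)."""
    return seconds / (2 ** (semitones / 12))

print(tape_length(0.75))              # a 0.75 s event needs 57.15 cm of tape
print(transposed_duration(0.75, 12))  # transposed up an octave: 0.375 s

The point, for present purposes, is simply that duration, pitch and physical medium stand here in exact numerical relationships: a characteristically bottom-up discipline, even when applied to 'real' sound sources.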

96 Indeed, much of Schaeffer's work seems to have been directed toward the development of a systematic categorisation (or solfège) of 'real' sounds, a practice that, paradoxically, might nowadays be regarded as more characteristic of the Cologne school.
97 Stockhausen (1991). Stockhausen 3 - Elektronische Musik 1952-1960. (Compact Disc - Stockhausen-Verlag: 3): 97-100.

Arguments in favour of the 'elektronische' method of composition often centre on claims to 'objective truth,' and composers accordingly seek to
engage structures that can be demonstrated to transcend the subjective
interpretation of the individual, that is, things which are irrefutably ‘true.’
This position was articulated by Eimert who, in the formative years of
elektronische Musik, cited ‘scientific fact’ as a justification for the aesthetics
associated with the Cologne school:

In electronic serial music… everything to the last element of the single note is
subjected to serial permutation… Examination of the material invariably leads
one to serially ordered composition. No choice exists but the ordering of sine
tones within a note, and this cannot be done without the triple unit of the note.
A note may be said to ‘exist’ where elements of time, pitch, and intensity meet;
the fundamental process repeats itself at every level of the serial network which
organizes the other partials related to it… Today the physical magnification of
a sound is known, quite apart from any musical, expressionist psychology, as
exact scientific data.98

98 Eimert (1955). Cited in Manning (1993). Electronic & Computer Music (Oxford; Oxford University Press): 46-47.

To paraphrase: it can be scientifically proven that the physical nature of sounds demonstrates finite, indisputable, relationships between certain
fundamental parameters (time, pitch, intensity). The serial procedures by
which composers organise musical materials are based on these
scientifically observed relationships and therefore ‘cannot be wrong.’ Such
claims relate to the innate need for (perhaps all) composers to differentiate
their work from the purely arbitrary, and rigorous structuring principles such
as serialism represent an effective means of achieving this goal (despite
those that might argue that the end results are musically unsatisfying).
Ultimately, the use of objective (or perhaps ‘super-human’) structuring
principles is, quite simply, one way to justify compositional decisions: the
composition is ‘good’ because it is demonstrably based on ‘the truth.’

This method of working, however, also has its drawbacks. It forces the
composer into working with ‘known quantities,’ which can have a limiting
effect on the results attainable. As a simple example, working strictly within
the twelve semitones defined in the chromatic scale, it would be impossible
to realise the clarinet glissando at the start of Gershwin’s Rhapsody in Blue
as it is usually performed. This manner of composition also burdens the
composer with the responsibility of determining every last aspect of the
sonic development. A laborious task indeed, and one which reportedly often
yielded musically unsatisfying results. It can be argued that an appreciation
of the works cited previously depends on the listener’s ability to apprehend
the ‘super-musical’ structures articulated by the sound material. If this does
not happen, then the listener is left in a state of confusion:

There appears to be a considerable discrepancy between postulation [what is composed] and reception [what is perceived by the listener], a discrepancy
which must be of the very nature of the new art form… in that nothing
pertaining to electronic music is analogous to any natural existent phenomenon
of traditional music, associations have to be evoked from elsewhere. Instead of
being integrated, they remain an ever increasing conglomeration of mentally
indigestible matter.99

99 Stuckenschmidt (1955). Cited in Ibid.: 47.
100 There is also, of course, a sense in which the quotation implies a cognitive difference between 'new' (electroacoustic) music and traditional instrumental music.

A ‘discrepancy between postulation and reception’ implies that structures


conceptualised by the composer are not always reflected in what the
audience perceives.100 This indicates a mismatch between objective schema
and the subjective evaluation of the resulting musical (sonic) phenomena. In
this context, the following quotations are of interest:

No evaluation of the musicality of sounds can be made on the basis of […] its
spectrum. A sound may possess a haphazard spectrum lacking in meaningful
information as to how its component frequencies are ordered: this does not
necessarily mean that its presence in a composition is not the result, distinct and
removed from any physical reality, of sophisticated and highly meaningful
musical construction.101

101 Berenguer (1998). "Music-space". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 220.

A spectrogram is a type of literal spectral analysis at a chosen visual resolution: at too high a resolution detail becomes lost in a blur; at too low a resolution there is insufficient detail. But a sonogram is not a representation of the music as perceived by a human ear – in a sense it is too objective.102

102 Smalley (1997). "Spectromorphology - Explaining Sound Shapes". Organised Sound, 2(2): 108.

Here, both Berenguer and Smalley suggest that objective (or ‘measurable’)
and subjective (perceptual) aspects of sound are fundamentally different in
nature. A sound that is ‘well organised’ in terms of the parameters that
represent it, does not necessarily equate to a sound that is pleasing to the
ear. Nor will the ‘well organised’ nature of the sound’s constituent
parameters necessarily be perceived by listeners ignorant of the processes at
work. To put it another way, the arbitrary superimposition of a ‘lattice’ of
discrete values onto continuously variable and interrelated parameters is
exactly that: arbitrary. In most cases it serves a functional, and often visual,
purpose – to observe the frequency content of a sound recording or visualise
the contour of a melody line for example – but it is erroneous to assume that
systematic structuring of such representative ‘secondary’ data will be
reflected in a ‘well organised’ sound on resynthesis. In short, a piece of
music that is based too heavily on ‘objective’ structuring principles will not
necessarily be satisfying subjectively. On the other hand, there are those
who would argue that a piece of music devoid of such objective structuring
principles is not a ‘composition’ so much as an arbitrarily and whimsically
assembled collection of unrelated noises.
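
The resolution problem to which Smalley alludes can be demonstrated directly. In a short-time Fourier transform the analysis window is the 'chosen visual resolution': frequency resolution improves only at the expense of time resolution, so no single setting can claim to be the representation of the sound. A brief sketch using SciPy (the test signal is arbitrary):

import numpy as np
from scipy.signal import stft

SR = 44100
t = np.arange(0, 1.0, 1 / SR)
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 460 * t)  # partials 20 Hz apart

# A long window resolves the two frequencies but blurs temporal detail...
f_long, t_long, Z_long = stft(x, fs=SR, nperseg=16384)
# ...a short window localises events in time but merges the frequencies.
f_short, t_short, Z_short = stft(x, fs=SR, nperseg=256)

print(f_long[1] - f_long[0])    # bin width ~2.7 Hz: 440 and 460 Hz separable
print(f_short[1] - f_short[0])  # bin width ~172 Hz: the two partials merge

Each analysis is one lattice of discrete values laid over a continuous phenomenon; neither is the music as perceived.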

It can be seen that the principles of musique concrète and elektronische Musik differed considerably on a very primary, and potentially
irreconcilable, level. While devotees of musique concrète were committed
to finding a place in musical discourse for the innate characteristics of
natural sound sources, practitioners of elektronische Musik actively sought
to escape the imprecise nature of ‘real’ sounds in favour of synthesis
techniques more conducive to their agenda. As outlined in the preceding
sections, what might at first appear to have been a trivial and
inconsequential difference increasingly seems to suggest the existence of
two deeply contrasting world-views (there is no reason, after all, to suppose
that these differences manifest themselves only in matters of musical
composition) whose a priori assumptions are in many ways diametrically
opposed. Indeed this fundamental difference is embodied, to a certain
extent, in the very nature of ‘real’ and ‘artificial’ sounds, the archetypal
materials of choice for composers of musique concrète and elektronische
Musik, respectively.

Despite the undeniable – and, in many respects, fundamental – differences between musique concrète and elektronische Musik, there is an important
sense in which they were similar: both sought explicitly to explore the
creative opportunities uniquely afforded by new audio technologies
(creative frameworks). Both praxes existed within the abstraction layer
brought about by the ability to encode auditory events into static and
transitory streams, as illustrated in Figure 1 (page 11), and as such, could
not have been realised by any other means. Additionally, these praxes did
not make use of the new technologies in a merely functional manner: both
appropriated the frameworks as a means to exploring previously
inaccessible and uncharted musical territories. Elektronische Musik sought
to construct relationships between the objective parameters of sound: this, of
course, only became possible with the advent of synthesis technology. Musique
concrète adopted a more perceptual approach that, nonetheless, could not
have been achieved in the absence of sound objects statically encoded onto
physical recording media. It can therefore be stated that both musique
concrète and elektronische Musik fall into the broader category of what is
now known as ‘electroacoustic music,’ as defined in the previous chapter.

2.5. ‘Top-Down’ and ‘Bottom-Up’ Compositional


Models

At this point it is appropriate to explain the very broad notion of 'top-down' and 'bottom-up' models of musical thought in more detail. It is hoped that
these terminologies, as the respective poles of a hypothetical continuum,

may prove useful as a means of evaluating the essential nature of a wide
variety of art-music praxes, whether electroacoustic or otherwise. The
situation of musical attitudes within this continuum will be referred to as
‘aesthetic directionality.’ Taking what has already been discussed as a
starting point, it can be said that musique concrète, and those schools of
compositional thought regarded as having inherited from it, represent top-
down approaches to the creative process, while elektronische Musik and its
subsequent counterparts exist closer to the bottom-up end of the spectrum.

Musique concrète, recalling our simplistic definition, essentially takes naturally occurring sound as its source material; sound objects that might be
described as ‘individual elements, stamped with [their] own gravitation
values, possessing [their] own internal atomic cohesion.’103 The
compositional process, in this case, takes this given ‘atomic cohesion’ and
builds ‘down’ from it until a musical structure, and eventually a finished
composition, is realised. In contrast elektronische Musik, with its synthetic
means, poses the problem of the (literally) blank canvas. Faced with this
problem, composers are required to construct their own ‘atomic cohesion’
from the bottom up, and it is to this end that much (if not all) of the
compositional energy is directed.

103 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 236.

As discussed in section 2.1, the distinction between top-down and bottom-up modes of musical thought is not a new concept, in abstract terms. The
following sections seek to collate some of the various literary invocations of
dualisms that, it is proposed, are collectively indicative of a single
underlying dichotomy in the context of electroacoustic music.

2.5.1. Invention versus Creation; Building versus Construction

Savouret eloquently articulates the top-down/bottom-up dichotomy by invoking various binarisms:

Creation implies that something is made from nothing, that starting from
nothing one arrives at something [recall the ‘blank canvas’ mentioned
above]. The creator-composer brings about a musical construction, which
presupposes that he is working by self-imposed rules […] from material
which nowadays can be created […] by means of synthesis. Construction
is a project. Conception and realization are distinct, finalized phases,
pointing towards a final object yet to come… Invention (invenire in Latin)
means to come in, to find, to dis-cover something that already exists.
Georges Braque distinguishes constructors, those who fill a frame, from
builders like Cézanne. The inventor-composer picks up something on the
way, something that is already there, he enters into a relationship with an
existing entity… Conception and realization work intimately together in a
perceptual bouncing back and forth in and on a chosen terrain… In such a
compositional intention there is no object lying somewhere in the future,
[…] there is no objective… In creation-composition, it is principally
studio time that is required, whereas in the case of invention-composition
it is principally community time that has to be managed. [My italics.]104

He goes on to propose the analogy of 'figured bass' versus 'melody harmonisation':

Figured bass requires the student to construct music upon the […] lower
voice by adding three upper voices. Thus from bottom to top, the student
must fill in an empty space starting from those few sparse bass notes
whose expressive potential is often close to zero. Everything so to speak
remains to be done from scratch. The bass offers very little resistance, the
creator-to-be can create to his heart’s content… Harmonization is a very
different matter: the melody is there, and one cannot escape the upper
voice… Be it good, bad or indifferent, there is a style in the capricious
succession of notes from on high, very different to the inept mumblings of
the figured bass. In the melody, the way the notes follow upon each other
already suggest a certain possible harmony. The student, working from the
top downwards, is going to unmask what is already there, he doesn’t have
to fill in the near-void of the preceding exercise, but rather bring out,
accentuate the shape, add supplementary notes as proof of what is already
present in the song. [My italics.]105

In these quotations Savouret invokes a subtle distinction between 'building' (which is associated with the top-down approach) and
‘construction’ (bottom-up; where once there was nothing). Note that in
the previous quotation, one method works ‘from bottom to top,’ while
the other proceeds ‘from the top downwards.’ Savouret refers to the top-
down ‘concrète’ composer as ‘inventor-composer,’ and the bottom-up
‘elektronische’ composer as ‘creator-composer.’

104 Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 348-349.
105 Ibid.: 349.

2.5.2. Abstracted Syntax versus Abstract Syntax

Emmerson uses the expressions 'abstract' and 'abstracted' to indicate the differing processes by which musical dialogue is obtained in
bottom-up and top-down methods, respectively. As previously
discussed, the bottom-up composer begins with the proverbial ‘blank
canvas,’ and proceeds to define abstract principles, or algorithms,
abstract insofar as they precede the actuality of the sounds themselves
and, in and of themselves, have nothing to do with sound per se. These
abstract principles are subsequently used to define certain aspects of the
musical dialogue, from the ‘micro’ level, concerned with the intrinsic
characteristics of individual sounds, through to the ‘macro’ level, which
defines the structure of the music at a higher level. The top-down
composer, on the other hand, takes the very actuality of sounds
themselves as the starting point, performs a subjective evaluation of
their spectromorphologies, cultural implications, and so on, and uses
this as the basis for a musical dialogue. In this case the musical dialogue
has been abstracted from the observed characteristics of already existing
phenomena. In this respect the composer is, in a sense, uncovering, or
‘dis-covering’ (recalling Savouret’s words), that which is already there.
This is not composition ‘from nothing’ but, quite simply, composition
‘from something,’ with ‘something’ being concrete sounds and their
perceived qualities. Emmerson states that the use of abstract and
abstracted syntax as compositional tools are not mutually exclusive but,
rather, ‘are arbitrary subdivisions of a continuous plane of possibilities,
the outermost extremes of which are ideal states which are probably
unobtainable.’106 As described previously, it is proposed that top-down
and bottom-up ideologies demarcate a continuum, and this notion is
supported here by Emmerson.

106 Emmerson (1986). "The Relation of Language to Materials". In Emmerson [Ed.] The Language of Electroacoustic Music: 25.

2.5.3. Organic versus Architectonic

Harrison differentiates top-down and bottom-up approaches to the composition and performance of electroacoustic music by way of the
terms ‘organic’ and ‘architectonic,’ respectively, using Stockhausen’s
Kontakte as an example of how these two paradigms overlap and
interact throughout the composition/performance/reception process, and in doing so reinforces much of what has already been discussed:

The apparent need for 'objective justification' of musical utterance, exemplified by the threads of analysis and 'measurement,' is one of the
central creeds of Western art music… The high modernist agenda of
serialism (of which elektronische Musik was, interestingly, a part) was heir
to this tradition and continued the prevailing view that the ‘text’ of the
score, amenable to ‘out of time’ analysis, was the ‘true’ representation of
the composer’s thoughts because it allowed for more accurate
measurement of the distances between musical events… Musical events
have no intrinsic interest; they exist primarily to articulate the distance
between them, on the measurement of which rests the notion of ‘structure.’
This seems to be evidence of what I call ‘architectonic structure’ and is
diametrically opposed to the ‘organic structure’ generated by the materials
and compositional strategies of musique concrète… Despite the rigour and
complexity of its concept, Kontakte was evidently assembled by ear,
Stockhausen making countless experiments in the studio, testing the
appropriateness of each ‘moment,’ modifying his intentions in the light of
what he heard and selecting only those sonic results which worked
perceptually [my italics] in a structure which evolved into its present form
during the process of composition, rather than being preplanned… Notwithstanding its impeccable elektronische Musik credentials in its
synthesis method, I would argue that Kontakte can therefore be considered
a classic piece of musique concrète… How was it that such an (allegedly)
concept-oriented composer could be satisfied with an (apparently)
arbitrary process of structuring a work, and: how do we reconcile the
original compositional intent (concept, poiesis) with what we hear when
we listen to the actual work (percept, esthesis)? There is a strong
implication ([…] still to this day underpinning the very basis of much
computer music and algorithmic composition) that if the conceptual
backdrop is sufficiently strong, then a good piece is virtually guaranteed.
Yet this is contradicted by Kontakte… This is […] indicative of the (at
least equal) importance of percept alongside concept in composition.107

107 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 120.

In thus describing the distinction between 'organic' and 'architectonic' structuring processes, Harrison clarifies another aspect of 'what is top-
down’ about the top-down approach to composition. One might regard
one’s empirical (emotional or ‘involuntary’) response to an auditory
stimulus as 'given,' insofar as the response has not been preconceived;
it has occurred just as naturally as the characteristics of the sound that
evoked it. In the top-down model, this response does not require any
objective justification: it is irrefutably ‘true’ on the basis that it was
experienced; this is the justification. We might refer to this as the
experience of ‘subjective truth’ in a musical context (as opposed to the
more familiar expression ‘objective truth’), and it is this process that
Harrison cites as having been an important aspect in the composition of
Stockhausen’s Kontakte. When musical material is assembled ‘by ear’
in this way, what is ‘true’ (or, one might say, what is ‘right’) is
determined perceptually rather than conceptually, or to put it another
way, subjectively rather than objectively. That is, the subjective response
to compositional materials – as ‘given’ rather than ‘preconceived’ –
becomes the ‘top’ (the ‘actuality’) from which musical structure can be
built ‘downwards.’ Harrison refers to this kind of structuring process as
‘organic.’ Antithetically, if musical materials are organised ‘from the
bottom up’ according to an abstract preconceived structure, then
subjective responses are secondary, a mere happenstance of the
objective truth embodied in the ‘text,’ and are sometimes to be avoided
altogether.

Harrison’s suggestion that Kontakte is a piece of musique concrète (as


opposed to a piece of elektronische Musik) also articulates a sense in
which (just as ‘beauty is in the eye of the beholder’) the nature of music
is ‘in the ear of the listener.’ As an obvious disciple of the top-down
model, it is not surprising that Harrison seems to evaluate musical
works in terms of their ‘top-down’ (that is, perceptual) characteristics,
regardless of whether these played an important role in the composition
of the work. Conversely, one might equally seek to evaluate a ‘top-
down’ work in terms of its apparent ‘bottom-up’ (objective)
characteristics, irrespective of the fact that these were not of any major
concern to the composer. Thus, top-down and bottom-up approaches are
just as much acts of listening as they are compositional acts.

2.5.4. The Relationship between Text and Context

As discussed in section 2.4, bottom-up works tend to be conceived in a manner that necessitates precision performance: in a compositional
aesthetic that relies on deterministic structures, it is important that the
structural relationships be recreated clearly and accurately for the
audience, and it is equally important that no subjective ‘interpretation’
should cloud the objective nature of the music. This has strong
implications for performance practice, for the presentation of any text
(such as a piece of music) must invariably take place within a certain
context: in a particular venue, with particular performers, using
particular equipment, and for a particular audience. It follows that in the
performance of the bottom-up work, which is ‘monolithic’ and has only
one ‘correct’ performance, the context must be brought into line with
the text: everything must be engineered to minimise ‘inaccuracies’ and
‘errors’ within the performance. In performing the top-down work,
however, it is the text (the music) that must be shaped to fit the context,
the music having been realised according to subjective truths, which are
of course infinitely plural and highly context-sensitive. This idea is
proposed by Savouret:

I would now like to turn to creation-based [bottom-up] composition… The text is created from nothing: from it all things flow, and it must be
considered as imperative. Here, the context in which it is projected is
negligible, and may even be seen as totally subjugated. Contrary to
invention-based [top-down] composition, it is the context that must
submit: nothing is less certain nor more straightforward. [My italics.]108

108 Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 351.

The impact of top-down and bottom-up approaches on the performance (diffusion) of electroacoustic music, with further development of
Savouret’s observations, will be discussed more fully in Chapter 3.
Crudely speaking, in the performance of bottom-up music, the singular
truth of the musical text must be brought to bear upon a context that can
be contrived to suit, whereas in top-down music the actuality of the
(particular) performance context is brought to bear upon the (infinitely
plural) musical text. The precise means by which this is achieved will
also be discussed in Chapter 3.

It is proposed that all of the binarisms described in the previous sections – some of them more specific than others – point to the same underlying
concept: a fundamental difference between top-down and bottom-up
musical philosophies. This dual paradigm can be summarised as follows. In
top-down composition, artistic justification is sought subjectively, that is,
the ‘truth’ of the music is to be determined by the human process of
perception. In this way, subjective truth can be revealed. Contrastingly,
bottom-up composers typically seek to justify their work with reference to
objective truths, or (pseudo-) scientific facts, which are beyond the
fallibility of subjective opinion or, one could say, super-human. As
compositions realised from the bottom up tend to be exacting in nature, it
follows that their performance should take the form of a transparent
realisation, and this explains the need to eliminate the possibility of ‘errors’
being introduced by performers, or by any other intermediate factors for that
matter. There can be only one ‘correct version,’ and the context in which the
work is performed must be engineered in such a way that the essence of the
piece, as a singular architectonic object, can be perceived. On the other
hand, top-down composers, having fundamentally based their work on the
process of subjective perception, tend to positively embrace the notion of
interpretation, endorsing the idea that their work should be in some way
‘appropriated’ by the performer, or otherwise ‘submitted’ into the context.
In this respect, the top-down work can be regarded as ‘plural’ (different
according to context), while the bottom-up work is essentially intended to
be absolute, or ‘monolithic.’ Alternatively, one might describe the top-down
work as always being ‘relative’ to its context (the performance venue, the
performer, the audience, et cetera), while the bottom-up work has been
determined – ‘where once there was nothing’ – by the composer, and is
therefore absolute, and not ‘relative’ to anything. This is an analogue of the
compositional process itself (as illustrated using the examples of
elektronische Musik and musique concrète): the structure of bottom-up
works is determined by prefabricated, absolute, abstract relationships (as in
serialism for example), while the structure of top-down works unfolds
relative to the phenomenal nature of already-existing material, and in
subjective response to it.

Acousmatic music, described by Harrison as follows, represents a contemporary incarnation of the top-down model of musical thought:

Acousmatic music on the whole continues the traditions of musique concrète and has inherited many of its concerns. It admits any sound as potential
compositional material, frequently refers to acoustic phenomena and situations
from everyday life and, most fundamentally of all, relies on perceptual realities
rather than conceptual speculation to unlock the potential for musical discourse
and musical structure from the inherent properties of the sound objects
themselves – and the arbiter of this process is the ear. [Harrison's italics.]109

109 Harrison (1999). "Keynote Address: Imaginary Space - Spaces in the Imagination". Australasian Computer Music Conference (Melbourne; 4-5 December 1999).

This is clearly indicative of a top-down approach because the compositional process is based on 'perceptual realities rather than conceptual speculation.'
The trouble is that devotees of the opposing bottom-up approach would
probably argue that it is subjective perceptions that are speculative, and that
objective concepts are ‘reality.’ So what we are dealing with, really, is a
dispute regarding how, exactly, we determine musical ‘truth.’ Is it to be
found subjectively, or to put it in its most basic terms, ‘If it sounds good and
feels right,’ or is it to be found in what we consider, objectively, to be true –
in the mathematical nature of sound for example? By extension, is a piece of
music to be regarded as a singular, monolithic, entity, or as a ‘starting point’
from which plural understandings might emerge relative to the many
contextual factors that may impact on its reception? Many composers
demonstrate an awareness of the relative benefits of both possibilities:

There are two, apparently opposing, schools of thought. In the first, the choice
of music must fit in with the characteristics of the venue and must be right for
the audience… The second school of thought, however, believes that a strong
well-made musical product will surmount all obstacles and survive in all
conditions. I subscribe to both schools of thought.110

110 Barrière (1998). "Diffusion, the Final Stage of Composition". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 205-206.

Having dined at both tables, I am not denying that the two compositional
attitudes are complementary…111

This is […] indicative of the (at least equal) importance of percept alongside
concept in composition.112

Nonetheless it is also true to say that such composers will tend to express a
specific affinity with one of the two philosophies:

In preparing a concert, it is necessary, I believe, to begin by studying the hall… The hall's acoustic type must be noted: dry or reverberant; the size and volume
of the space; its geometrical form; lighting, colours, the surface materials of the
walls and the seats, the general atmosphere of the place [in other words, the
context is considered before the text].113

…however, I do now understand better why I prefer the melody harmonization [Savouret is referring to the top-down ethos].114

…and here I declare my acousmatic [and, as previously discussed, top-down] allegiance…115

The terminologies used to describe the opposition of top-down and bottom-up philosophies are varied – 'two apparently opposing schools of thought,'
‘figured bass versus melody harmonisation,’ ‘acousmatic music versus
computer music or algorithmic composition’ – but it is proposed that these
are all different interpretations of the same, fundamental, underlying
dichotomy. Clozier neatly encapsulates the notion of two opposing musical
ideals as follows:

Either the music is of a type that will not tolerate even the slightest variation
without undergoing a negative change to its very essence, every instant having
to be identical to the original model, or else it is of a type that allows itself to be
enriched through interpretation and communication.116

111 Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 349.
112 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 120.
113 Barrière (1998). "Diffusion, the Final Stage of Composition". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 205.
114 Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 349.
115 Harrison (1999). "Keynote Address: Imaginary Space - Spaces in the Imagination". Australasian Computer Music Conference (Melbourne; 4-5 December 1999).
116 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 252.

Let us return briefly now to one of the technological defining characteristics of the electroacoustic idiom – the encoded audio stream – and pose the
question, ‘What exactly is it that we are encoding?’ One answer might be as
follows: the encoded audio stream is an abstract representation of real
auditory events that contains precise information regarding the spectral
content of the original phenomenon, and how this evolves over time; it is
these objective data that are encoded. Alternatively: the encoded audio
stream is an abstract representation of real auditory events that allows their
sounding characteristics, and the ways in which we perceive them, to be
manipulated; encoded within the streams is the potential for real, perceived,
auditory experience. Clearly, both statements are true. The implication,
however, is that bottom-up and top-down practitioners each recognise
different potentials in the nature of encoded audio streams. The top-down
practitioner regards the encoded audio stream as an abstraction of the
perceptual qualities of the original auditory event, whereas the bottom-up
practitioner regards it as an abstraction of the objective parameters that
describe the auditory event.
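
The dual reading can be made concrete. An encoded audio stream is, at bottom, a sequence of numbers; the following sketch (illustrative only) treats one and the same buffer first as objective data to be measured, and then as material whose perceived character is to be reshaped:

import numpy as np

SR = 44100
t = np.arange(0, 1.0, 1 / SR)
stream = np.sin(2 * np.pi * 330 * t) * np.exp(-2.0 * t)  # an encoded stream

# Bottom-up reading: the stream encodes objective parameters.
spectrum = np.abs(np.fft.rfft(stream))
peak_hz = np.fft.rfftfreq(stream.size, 1 / SR)[spectrum.argmax()]
print(peak_hz)  # a measurable fact about the stream (approximately 330 Hz)

# Top-down reading: the stream encodes potential auditory experience, to
# be reshaped and judged by ear - here, crudely, reversed and attenuated.
variant = 0.5 * stream[::-1]

The data are identical in both cases; only the attitude taken towards them differs.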

Table 3, below, summarises the characteristics of top-down and bottom-up philosophies schematically.

Top-Down | Bottom-Up
Human | Super-human
Subjective | Objective
Composed to be interpreted/appropriated | Composed to be realised/disseminated
Realist (pragmatic) | Idealist
Plural | Absolute (monolithic)
Relativistic | Deterministic
Corporeal (physical) | Cerebral (intellectual)
Empirical/perceptual | Logical/conceptual
Qualitative | Quantitative
Phenomenological | Rational
Built/invented | Constructed/created
Organic | Architectonic
Abstracted forms | Abstract forms
Text submits to context | Context submits to text
Encoded audio streams are abstractions of the subjective (perceptual) qualities of real auditory events | Encoded audio streams are abstractions of the objective structures that define real auditory events

Table 3. A brief summary of the opposing characteristics of 'top-down' and 'bottom-up' approaches to musical discourse.

2.6. Examples of Top-Down and Bottom-Up Approaches in Contemporary Electroacoustic Music

Thus far, although passing reference has been made to more current work,
top-down and bottom-up paradigms have largely been explained with
reference to elektronische Musik and musique concrète which, as previously
suggested, are now widely accepted as extinct in their pure form. It has also
been proposed that the aesthetic standpoints engendered by both praxes
present characteristics that remain in evidence to this day. A few choice
examples will therefore be helpful.

Acousmatic music, described very briefly in the previous section, has already been cited as a contemporary example of the top-down approach to composition. Accordingly, one would expect composers within this genre to
hold an interest in the perceptual, subjective realities of sound and its use
within a musical context, and perhaps to be less interested in the abstract,
objective schema that might be used to determine the musical dialogue. The
following quotation is an extract from Simon Emmerson’s description of his
tape-only work Points of Departure (completed 1997):

The sonic resources are all from the harpsichord and kayagum, from single
pitches to clusters and resonances. A characteristic of both these instruments is
their sharp attack and decay morphology. Indeed their playing techniques both
suggest a struggle to extend this short event, to project it (diffuse it even), to
prolong its sweetness as long as possible. From the toccata tradition of the
harpsichord, the many subtle vibrato types of the kayagum and the arpeggio
and tremolo flourishes of both we see this extension of simple linearity into
denser textures – although always contrasted with the simple and often isolated
plucked sound. As a kind of ‘replacement’ of these performance techniques, all
the most obvious types of electroacoustic sound extension have been used:
granular sampling, time stretching, reverberation, fast reiteration, all in many
variants and combinations.117

117 Emmerson (1998). "Intercultural Diffusion: 'Points of Continuation' (Electroacoustic Music on Tape)". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 283.

Compare this with Michael Obst’s description, taken from the same
publication, of Solaris for ensemble and live electronics:

Solaris is based on a science fiction novel by the Polish author Stanislav Lem.
It seemed logical to include mathematical principles in both the composition
and in the live electronics in order to construct a specific musical atmosphere.
The compositional structure, sound treatment and sound spatialization were to
provide a special ambiance for the story itself. The opera consists of three parts,
each of which focuses on one mathematical principle: in the First Act stochastic
treatment, in the Second Act interpolation […] and in the Third Act
mathematical functions (like the sine or the exponential function).118

118 Obst (1998). "Sound Projection as a Compositional Element: The Function of the Live Electronics in the Chamber Opera 'Solaris'". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 321.

In Points of Departure Emmerson seeks to express something of the essential nature of two musical instruments – the harpsichord and the
kayagum – in terms of their cultural heritage, performance practice, and
perceived sounding characteristics. His source materials consist of various
recordings of each instrument being played, and descriptions are given of
the morphologies of these sound objects, ‘sharp attack and decay’ for
example. Although Emmerson mentions several audio processing
technologies (granular sampling, time stretching, et cetera), no mention is
made of how the parameters fed to these various processing algorithms were

117
Emmerson (1998). "Intercultural Diffusion: 'Points of Continuation' (Electroacoustic
Music on Tape)". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic
Music: 283.
118
Obst (1998). "Sound Projection as a Compositional Element: The Function of the Live
Electronics in the Chamber Opera 'Solaris'". In Barrière and Bennett [Eds.] Composition /
Diffusion in Electroacoustic Music: 321.

85
devised; no mathematical principles relating numbers and functional
processes to each other are described (compare this description, for
example, with Stockhausen’s account of his Studie119). On the contrary,
Emmerson’s description of his compositional approach suggests that the
methods used were grounded firmly in the perceptual characteristics of each
of the instruments’ particular sounds. Stated simply, it can be suggested that
Emmerson took various phenomenological aspects as ‘given’ (the
characteristic sounds of the harpsichord and kayagum, and the respective
cultural heritages of each, for example), and worked ‘downwards’ from
these in order to arrive at the finished work. It can be deduced that
compositional decisions were made on the basis of perceptual or subjective
realities, rather than having been made with reference to a predetermined
‘set of rules.’ Being so firmly grounded in the actuality of sound, it is
therefore possible – even only on the basis of the short quotation given – to
imagine (to a certain extent) what the perceptual (sounding) qualities of the
sound materials might be like.119

119 See section 2.2.2.

Obst’s working practices – and again, this can be deduced from only a very
short quotation – are demonstrably different. In contrast with Emmerson,
the compositional constituents of Solaris are abstract mathematical
principles. Obst describes the numerical relationships between the
parameters of the various processing algorithms in some detail (recall
Stockhausen’s ‘tape speed: 76.2 cm per second’), for example:

Here [are] the boundary conditions for the flanging and the pitch-shift:
Flange: Frequency of the sine 0.25 Hz <-> 0.33 Hz, Feedback 0.8;
Pitch-shift: Delay 5 ms, Transposition: -64 cents <-> +64 cents. Feedback
0.75.120

120 Obst (1998). "Sound Projection as a Compositional Element: The Function of the Live Electronics in the Chamber Opera 'Solaris'". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 322.

Recalling section 2.2.2, Obst's use of the expression 'boundary conditions' strongly suggests an approach to the compositional process comparable to
that adopted by practitioners of elektronische Musik, and reference to
numerical, mathematical, or parametric precision in general is a common
characteristic of the bottom-up compositional approach.
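
Indeed, Obst's 'boundary conditions' read naturally as a parametric specification. Transcribed as data – the structure below is a hypothetical rendering of the values he gives, not his actual performance patch – they might take some such form as:

# Boundary conditions as given in Obst's description of Solaris.
flange = {
    "sine_frequency_hz": (0.25, 0.33),  # lower and upper bounds
    "feedback": 0.8,
}
pitch_shift = {
    "delay_ms": 5,
    "transposition_cents": (-64, +64),  # lower and upper bounds
    "feedback": 0.75,
}

# The sounding outcome is then, in Obst's phrase, an 'acoustic result':
# a by-product of whatever process runs within these numerical bounds.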

It would be very convenient to say that Obst makes no reference whatsoever to the perceptual characteristics of the resulting sounding material, but he
does go on to say:

The acoustic result is a continuous ribbon of sound which changes constantly without any periodicity. The long strings of the piano in the low register with their many partials provide a very rich sound.121

121 Ibid.: 322.

However, from Obst’s writings it can be deduced that the


spectromorphologies of his sound materials are, if you like, a ‘by-product’
of the mathematical algorithms used to produce them (Obst implies this
himself in his use of the expression ‘acoustic result’). This is in contrast
with Emmerson’s piece, whose very starting point was based on a
perceptual evaluation of the already-existing ‘acoustic results’ of naturally
occurring physical phenomena. Note also Obst’s use of the
objective/scientific descriptor ‘many partials,’ and compare this with
Emmerson’s perceptual evaluation, ‘sharp attack and decay.’ The distinction
is subtle – indeed both expressions could plausibly be used as descriptors of
the same phenomenon – but the contrasting use of language suggests that
Obst and Emmerson each regard the encoded audio stream as an encoding
of different attributes. For Obst, the stream is an encoding of the objectively
measurable parameters of auditory events; for Emmerson, it encapsulates
the potential for a perceptual evaluation of sounds.

Taking into account everything that has already been said, it should be clear
that Emmerson’s Points of Departure represents an electroacoustic work
conceived in the spirit of top-down composition, while Obst’s Solaris shows
obvious signs of having been conceived in a bottom-up manner. It should
also be restated that it would be erroneous to assert that either piece is
‘absolutely top-down,’ or ‘absolutely bottom-up.’ For one thing, there is not
really sufficient detail as to the respective working procedures from which
to draw such an assumption. Further, it has already been noted that Obst
does indeed make reference to the perceptual qualities of his materials, and
it is therefore sensible to assume that at least part of his compositional
procedure may have been somewhat top-down in nature. Equally, it may
well be the case that in composing Points of Departure Emmerson also
made use of non-perceptual structuring schemes at some level, although
there is no indication of this in his text. Nonetheless, it is reasonably safe to
propose that, in terms of a continuum ranging from bottom-up through to
top-down, Obst’s piece is oriented toward the bottom-up, and Emmerson’s
toward the top-down.

2.7. Extending the Top-Down/Bottom-Up Paradigm

Having defined a context within which compositional practice may be evaluated in terms of its position along the top-down/bottom-up axis, it
might be proposed that this paradigm in fact transcends the boundaries of
electroacoustic music and may be manifest, albeit more abstractly, in earlier
‘classical’ music repertoire (and very probably in circumstances beyond
music altogether). One might regard certain strict classical forms, such as
fugue, as indicative of the bottom-up approach to composition, whereby an
abstract set of rules is antecedent to the realisation of the musical fabric
itself. Lévi-Strauss makes the following observation:

Bach and Stravinsky appear as musicians concerned with a 'code,' Beethoven – but, Ravel too – as concerned with a 'message'…122

122 Lévi-Strauss (1964). The Raw and the Cooked: Introduction to a Science of Mythology (London; Pimlico): 30.

Clearly, a ‘code’ is an objective, abstract construct, whereas a ‘message’ is


something that is intended to be perceived. Lévi-Strauss is effectively
suggesting that Bach and Stravinsky were bottom-up composers, whereas
Beethoven and Ravel were top-down.

Certainly, there are works within the classical repertoire that seem to have
certain ‘monolithic’ aspirations, in the sense that they point toward a single
‘correct’ performance of the work. Highly prescriptive and precise
performance directions might be regarded as indicative of this approach
(Stockhausen’s Klavierstuck I, already mentioned in section 2.4, might be
re-cited in this context). On the other hand, there are works whose scoring
clearly indicates, or indeed necessitates, a much stronger element of
interpretation, perhaps even inviting the performer to put him or herself
‘into’ the work. A fitting example might be Erik Satie’s Trois Gnossiennes,
(1890). These piano works are scored without bar-lines or time signatures,
and only one of them (the first) has a key signature. This suggests a non-
monolithic compositional approach and invites the performer to interpret (as
opposed to ‘objectify’) the score. Furthermore, there is a strong case for
suggesting that annotations such as ‘postulez en vous-même (wonder about
yourself)’ (No. 1), ‘sur la langue (on the tip of the tongue)’ (No. 1), ‘sans
orgueil’ (don’t be proud)’ (No. 2), ‘munissez-vous de clairvoyance (be
clairvoyant)’ (No. 3), and ‘de manière á obtenir un creux (so as to be a
hole)’ (No. 3), are performance directions that, realistically, can only be
evaluated through subjective interpretation.123

123 Satie (1999). Six Gnossiennes pour Piano (Paris; Éditions Salabert). Three further Gnossiennes were published posthumously, hence 'Six Gnossiennes.'

Clearly much more could be said about the top-down/bottom-up paradigm within the context of western classical music, but such discussion would be
digressive. This section does, however, serve to illustrate an important
point: aesthetic ‘directionality’ (from the top down, or from the bottom up)
in a musical context is not a foregone conclusion of the creative frameworks
employed.

As previously discussed, it can be said that the compositional use of 'real' sounds is somewhat more conducive to the top-down approach, and the use
of synthesised sounds perhaps lends itself more readily to the bottom-up
approach. This, however, does not mean that any piece of electroacoustic
music consisting mainly in synthesised sounds has ‘by definition’ been
conceived in a bottom-up manner. Similarly, a piece comprising mainly
‘real’ sounds does not ‘have’ to be top-down. The fact is that the creative
frameworks are, in a sense, ‘merely’ technological constraints. Before the
advent of the technologies described in section 1.7, composers simply did
not have the opportunity to work ‘directly’ with their materials:
compositions, by and large, had to be written down in some form of musical
notation. In this sense one might regard western classical notation as a
compositional means, a framework with certain peculiar attributes, or even
as a ‘technology’ of sorts. Some might argue that such notational syntaxes
are, of themselves, bottom-up constructs (pitch, rhythm, timbre, and
dynamic are, in the vast majority of cases, ‘latticed’) and as such would tend
to spawn bottom-up compositions; Harrison speculates that, ‘probably most
[western classical] music before electroacoustic music was architectonic
[bottom-up] almost by definition.’124 This argument certainly has some
validity, but is challenged to a certain extent by the suggestion that top-
down and bottom-up models of musical thought predate the specific
frameworks of electroacoustic music.

124 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

It is worth reiterating that every creative framework, by virtue of its particular characteristics, exerts certain unique pressures upon its user, and
that such pressures can, of course, include a bias toward either top-down or
bottom-up approaches to the use of the framework in question. Rephrasing
Harrison’s statement, it can certainly be argued that western classical
notation is a framework that is inherently biased in favour of bottom-up
approaches to the compositional process. However, we must also consider
the fact that composers can only work within the creative frameworks
available to them. In the absence of more conducive frameworks, the top-
down composer (for example) has no choice but to ‘work against’ the
inherent directionality of a less-than-ideal framework. As noted by Wishart:

It is of course possible to subvert the various systems but it is a struggle against
the design concepts of the instrument or software [read: creative framework].125

Outside of the strictly electroacoustic context, therefore, Clozier’s words
(previously cited towards the end of section 2.5) still ring true:

Either the music is of a type that will not tolerate even the slightest variation
without undergoing a negative change to its very essence, every instant having

124 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.
125 Wishart (1996). On Sonic Art (Amsterdam; Harwood Academic): 27.

to be identical to the original model, or else it is of a type that allows itself to be
enriched through interpretation and communication.126

Satie has been cited as a composer whose attitudes seem to have embodied
the top-down approach to composition: outwardly, the Gnossiennes seem to
enshrine ‘subjective realities’ rather than ‘objective realities.’ In many ways
the scores force the performer to ‘enrich through interpretation.’ It can be
deduced that in composing in this way, Satie had to work somewhat ‘against
the grain’ of the (notational) frameworks available to him, and indeed these
pieces clearly exhibit a deliberately subversive attitude towards these same
means. In this sense it can be seen that composers working within this
notational paradigm might experience more difficulty in composing ‘from
the top down’ than composers working ‘from the bottom up,’ and this
clearly illustrates the directional bias inherent within this particular
framework.

2.7.1. Electroacoustic Music as a means to the Potential Unification of Top-Down and Bottom-Up Models

It can now be stated that, in electroacoustic music, the creative
frameworks present equal opportunities for both top-down and bottom-
up methods, and this has arguably not been the case at any other time in
the history of western classical music. As a result of the technologies
described in section 1.7, composers have direct access to auditory
events in an abstract (encoded) – and, if necessary, static – form. This
allows composers to work directly with their materials, as opposed to
working via some kind of notational analogy.

Of course, the way in which composers choose to work directly with
their materials can essentially be either top-down or bottom-up, with
neither approach presenting significantly greater compositional
difficulties than the other. This fact can be attributed to the dual nature
of the encoded audio stream, which can at once be regarded as an

126 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 252.

abstraction of objective and/or perceptual information. Furthermore, and
as a direct result of this, combined approaches are possible. Some
aspects of a composition might be realised in a fundamentally
‘perceptual’ manner, while other aspects of the same composition might
be the result of more ‘objective’ schemes. Indeed we might regard this
possibility as one of the unique characteristics of electroacoustic
creative frameworks (i.e. audio technologies). The ability to statically
encode sound onto a recording medium and work directly with recorded
materials, whether real or synthesised, is a characteristic of current
compositional means that lends itself easily to top-down modes of
composition. Simultaneously, the simple fact that algorithms of any sort
(whether manifest in analogue circuitry or digitally coded software)
must usually at some level be provided with parameters (ergo
necessitating a denumerable set of discrete values) is a characteristic of
current compositional means that lends itself more readily to
composition from the bottom up. So, while it would be untrue to suggest
that the top-down/bottom-up paradigm sprang into existence only after
the advent of electroacoustic music, it would certainly be true to say that
this continuum can be more fully explored – in a musical context – than
ever before, by electroacoustic means. This observation was supported
by Harrison, in a personal interview with the author, as follows:

JM: [Do] you think that the binarism between the ‘more-or-less organic’
[top-down] and ‘more-or-less architectonic’ [bottom-up] ways of thinking,
has existed for a long time in music?

JH: Well it has. But the boundaries were moved when you became able to
capture real-world sounds that existed beyond the ‘normal’ frame in which
music happened, I think. […] That pushes the boundaries suddenly,
markedly, away from where they were. So yes, there’s always been a
range of composers operating, from the very intellectualised, to the more
improvisatory (say Chopin or Liszt). You can situate composers
somewhere along that axis, but until you get beyond having to use either
notation and/or instruments designed for the purpose to reproduce the
sounds you want – i.e. until you get recording – it seems to me that the
boundary can only go so far; it’s fixed. The minute you can record stuff
that goes beyond that, then, ‘Whoosh!’ the boundary shoots back about a
million miles!

JM: So, in a sense, the advent of recording [and, by implication, the other
technologies discussed in section 1.7] has opened up the full possibilities
of that spectrum between the ‘organic’ [top-down] and the ‘architectonic’
[bottom-up]…

JH: I would say so. […] [Before that,] you had to specify what it was you
wanted from [instrumental players], which meant that you were, of
necessity, tending towards an architectonic thing. However improvisatory
your aesthetic might be, you still have to write down that you want this
note played at this level, for this long, on this instrument [laughs]. To go
with this other note, this loud and for this long, on this other instrument!
Et cetera. […] So, yes, I think it’s always been there, it’s just that the
degree of it is much more marked, this binary thing.

JM: Absolutely. But I think that this is something that’s not often
acknowledged in the literature. In electroacoustic music it almost seems
like the implication is that now we are doing things that have never been
done before – which is obviously true, to a certain degree, in that there are
things that it is now possible to do that it wasn’t possible to do before –
but perhaps in the way people, historically, have been thinking musically,
there may be more similarities than the literature acknowledges?

JH: …But it is now very much more extreme, I think.127

To summarise: it can be proposed that the continuum between top-down
and bottom-up philosophies, in a musical context, predates the existence
of electroacoustic music. However, pre-electroacoustic creative
frameworks have, by their nature, always tended to limit the scope of
this continuum because composers have only ever had indirect access to
auditory phenomena, via performers. In the context of bottom-up works
(as described in section 2.2.2), this limits both the framework within
which architectonic relationships can be defined, and the accuracy with
which those relationships that are possible can be articulated. In the case
of top-down approaches, the creative process is restricted in a different
manner, but – paradoxically – by the very same factors. Because the
musical discourse must be realised by performers, so the performers
must be given some indication of what is to be performed. In practical
terms this means that a system of differentiated performance actions
must be defined (or, to re-cite Harrison, ‘you want this note played at
this level, for this long, on this instrument’), and this is somewhat
contrary to the top-down ideal. Such restrictions are far less in evidence
in electroacoustic music – owing to the direct access that composers
have to their materials – and therefore both top-down and bottom-up
approaches to composition can be explored more fully. This being the
case, it is important that means be made available for the performance

127 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

(diffusion) of, potentially highly contrasting, top-down and bottom-up
works. This will become a criterion in the evaluation of sound diffusion
systems in Chapter 4.

2.8. Summary

The purpose of this chapter has been to focus on the more specifically
aesthetic concerns of electroacoustic music and, in doing so, add a further
dimension to the mainly technological perspective offered in Chapter 1. The
following conclusions are apparent:

2.8.1. Top-Down and Bottom-Up Approaches to Electroacoustic Music

As a starting point, the historical praxes of musique concrète and
elektronische Musik have been examined. These contrasting praxes took
place within a common technological framework – broadly speaking,
both made use of the technologies outlined in section 1.7 – yet,
aesthetically, the two approaches were radically different. The essential
natures of these two approaches can be characterised as top-down
(musique concrète) and bottom-up (elektronische Musik), respectively,
and it is concluded that these two philosophies are still evident in
contemporary electroacoustic music.

The abstract characteristics of both top-down and bottom-up approaches
were summarised in Table 3 (page 84). In the briefest and most
generalised terms possible, top-down can be described as a perceptual
approach, while bottom-up would be best described as conceptual.

2.8.2. Creative Frameworks can be ‘Directionally Biased’

In section 2.3, the essential nature of ‘real’ and ‘artificial’ (synthesised)
sounds was briefly examined. From this it was suggested that ‘real’
sound, by its very nature, is particularly conducive to top-down working
procedures, while ‘artificial’ sound is more in-tune with the bottom-up
method of working. Accordingly, it is not surprising that musique
concrète, with its focus on the creative possibilities of ‘real’ recorded
sounds, should have developed into a top-down idiom, while
elektronische Musik, with its synthetic means, should be bottom-up.
This indicates a dialectical relationship between the nature of the music,
and the nature of the creative frameworks employed in its realisation.

In more general terms, it can be concluded that all creative frameworks
(‘real’ sounds, ‘artificial’ sounds, notational systems, technologies,
software, sound diffusion systems, et cetera) can potentially be
‘directionally biased,’ and therefore gravitate to an extent towards either
the top-down or the bottom-up end of the spectrum. Notation in western
classical music, for instance, has been cited as an example of a creative
framework that is biased toward bottom-up working methods. This can
be seen as an extension to some of the arguments proposed in the
previous chapter, whereby technologies are regarded as having
affordances and therefore engender particular creative opportunities and
ways of working. Therefore, the aspirations of composers can either be
reinforced or impeded to a certain extent by the frameworks within
which they choose to realise their work. In section 2.4 the compositional
process of Stockhausen’s Étude was briefly recounted. This represents a
good example of a composer with characteristically bottom-up
aspirations working within a framework that is inherently biased toward
top-down methods. Unsurprisingly, therefore, Stockhausen expresses a
certain degree of frustration that the results were not as he had
anticipated. This, of course, does not mean that music written within a
top-down framework is always top-down, nor that music written within
a bottom-up framework is always bottom-up, but merely that the
frameworks can have certain inherent biases and therefore exert certain
directional pressures on composers.

2.8.3. Electroacoustic Technologies allow Equally
for Top-Down and Bottom-Up Compositional
Approaches

The practice of electroacoustic music exists, to a considerable extent,
within the abstraction layer illustrated in Figure 1 (page 11), that is,
much of the practice itself exists inside the abstract domain of encoded
audio streams: all of the technologies associated with the idiom
(described schematically in section 1.7), at some level, share this
common communication protocol. Furthermore, electroacoustic means
offer composers, for the first time in a musical context, the opportunity
to work directly with their materials, as opposed to working via
symbolic representational means such as notation. The overall
framework of electroacoustic music can therefore be regarded – perhaps
uniquely – as both top-down and bottom-up, because frameworks that
are essentially top-down (such as ‘real’ recorded sound statically
encoded onto recording media) and those that are essentially more
bottom-up (such as sounds synthesised algorithmically) can be easily
integrated. Consequently, the electroacoustic idiom presents equal
opportunity for both top-down and bottom-up modes of composition. It
is therefore reasonable to propose that electroacoustic music, as a
whole, will be represented by top-down and bottom-up works in
approximately equal measure, with works that embody these two
ideologies in combination perhaps also being common.

Overall, the function of this chapter has been to present a context within
which electroacoustic music can be regarded, in aesthetic terms, as a binary
art-form. The purpose of this investigation has been, in conjunction with
Chapter 1, to ascertain the technological and aesthetic demands that may be
placed on sound diffusion systems used for the public performance of
electroacoustic music. In the next chapter it will be argued that the
underlying top-down/bottom-up binarism is carried forward into the
performance context, resulting in two contrasting performance practice
ideologies and, as will become apparent in Chapters 4 and 5, two
fundamentally different kinds of diffusion system. If, however, works of
varying technical and aesthetic demands are to be performed side-by-side in
concerts of electroacoustic music, then sound diffusion systems that can
cater for diversity in these areas are required.

3. Sound Diffusion

3.1. Introduction

This chapter focuses on the notion of ‘sound diffusion,’ that is, the public
performance of electroacoustic music to an audience. It was concluded in
Chapter 1 that all electroacoustic music is dependent, at least in part, on
mediation via loudspeakers, owing to its reliance on encoded audio streams.
Accordingly, it can be stated that the process of sound diffusion must also
involve loudspeakers and, furthermore, that the process of sound diffusion –
if for this reason alone – is a fundamentally necessary one.

We also know from Chapter 1 that any given piece of electroacoustic music
will involve certain further technical demands; these were outlined in
sections 1.7 and 1.8. The ways in which these demands are met (or, indeed,
are not met) must therefore also be considered part of the process of sound
diffusion. Furthermore, it has been proposed that, aesthetically,
electroacoustic works can broadly be sub-divided into those that are
essentially top-down and those that are essentially bottom-up. These
contrasting aesthetic demands must therefore also be catered for.

The ultimate purpose of this chapter is to set up a context within which
specific sound diffusion systems can be evaluated in terms of how
successfully they meet the technical and aesthetic demands imposed upon
them; the evaluation itself will be one of the primary objectives of Chapters
4 and 5.

Discussion will begin with a summary of some of the acoustic difficulties
inherent in presenting sound via loudspeakers in (often relatively large)
performance venues. From this it will be concluded that presentation of
encoded audio channels to the audience in a way that is congruent with
‘what the composer intended’ is, at some level, a unanimous objective of the
process of sound diffusion. It will be further proposed that this overriding
concern tends to take precedence regardless of aesthetic directionality, that
is, it applies to both bottom-up and top-down composers alike.

An additional function of the present chapter will be to propose two original
abstract concepts intended to clarify the purpose of sound diffusion from
both practical and theoretical perspectives. These interrelated concepts are
that of the coherent audio source set (CASS), and that of the coherent
loudspeaker set (CLS). The CASS is a theoretical framework that allows
multiple (one or more) monophonic encoded audio streams to be
conceptually ‘grouped together’ and treated, collectively, as a single,
homogeneous entity, thus maintaining a constant relationship between the
constituent channels. The two discrete channels – ‘left’ and ‘right’ – of a
stereophonic recording are an intuitive example. The CLS, essentially,
implies a group consisting of one or more loudspeakers arranged in an
arbitrary formation. The constituent channels of a CASS are related to each
other via a conditional coherent bond, that is to say, the set is only coherent
on the proviso that certain conditions are met. One of the conditions of the
coherent bond is that the CASS must be reproduced over an appropriate
CLS (or CLSs). In this newly-established context, it will be suggested that
the fundamental aim of sound diffusion is to present a CASS (or multiple
CASSs), via a CLS (or multiple CLSs), to an audience, in a manner that
ensures that the conditions of the coherent bond(s) are satisfied.
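
To fix ideas, the relationship between these two concepts can be expressed as a pair of simple data structures. The sketch below is purely illustrative – the class and attribute names are shorthand invented for this example, and only one condition of the coherent bond is modelled – but it captures the essential logic: a CASS remains coherent only while it is reproduced over an appropriate CLS.

from dataclasses import dataclass

@dataclass
class CLS:
    """Coherent loudspeaker set: one or more loudspeakers in an
    arbitrary formation."""
    name: str
    loudspeakers: list

@dataclass
class CASS:
    """Coherent audio source set: one or more monophonic encoded audio
    streams grouped and treated as a single, homogeneous entity."""
    name: str
    streams: list

    def bond_is_satisfied(self, cls: CLS) -> bool:
        # The sole condition modelled here: the CLS must provide a
        # loudspeaker for each constituent stream of the CASS.
        return len(cls.loudspeakers) >= len(self.streams)

# The intuitive example from the text: the 'left' and 'right' channels of a
# stereophonic recording form a CASS, reproduced over a stereo-pair CLS.
stereo_cass = CASS('stereo work', ['left', 'right'])
main_pair = CLS('main pair', ['main-L', 'main-R'])
assert stereo_cass.bond_is_satisfied(main_pair)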

This suggestion may be open to criticism, however, on the grounds that it
implies that the process of sound diffusion is one upon which all
practitioners of electroacoustic music agree. This, of course, could not be
further from the truth. In rearticulating the proposition, one might posit that
the overall credo of sound diffusion is to effectively communicate the
discourse of a piece of electroacoustic music to the public, and that – owing
to the nature of the idiom – this process must involve the presentation of
encoded audio channels (which can conceptually be grouped into CASSs)
over loudspeakers (which can conceptually be grouped into CLSs). Of
course, ‘the discourse of a piece of electroacoustic music’ will depend
enormously upon whether that piece has been conceived and realised in an
essentially top-down, or in an essentially bottom-up, manner.

As described in Chapter 2, the top-down practitioner regards the encoded
audio stream as an abstract representation of the perceptual realities of
sound, whereas the bottom-up practitioner treats it as a carrier of more
objective data. The present chapter will therefore also discuss the differing
attitudes of top-down and bottom-up practitioners with respect to the act of
sound diffusion itself, on the basis of this distinction. It will be proposed
that both top-down and bottom-up philosophies are carried forward into
diffusion practice, implying a ‘composition-diffusion continuum’ that
results in two highly contrasting performance practice ideologies. An
analysis of the impact that this has on defining top-down and bottom-up
attitudes towards sound diffusion will be offered, with particular reference
to notions of interpretation and performance ‘context.’ Ultimately it will be
concluded that the debate in question centres on the relationship between
text and context, where ‘text’ refers to the piece of music itself, and
‘context’ connotes everything else – the audience, performance venue,
diffusion system, performance methodology, and so on. Specifically it will
be argued that in the case of top-down diffusion, the text is adapted
(performed) in a manner appropriate to the context (and is, in this respect,
flexible), whereas in bottom-up diffusion the context is adapted to suit the
requirements of the text, which in this case is regarded as singular. Specific
examples of both possibilities will be given.

3.2. What is Sound Diffusion?

‘Sound diffusion’ refers to the process involved in presenting
electroacoustic music to an audience.

As previously observed, electroacoustic music is an idiom that relies
extensively on encoded audio streams, which, of themselves, are of little use
unless they are eventually to be decoded via the process of reproduction
over loudspeakers (see section 1.6). It can therefore be assumed that the
process of sound diffusion will necessarily involve at least one loudspeaker,
and probably many more. Other technological demands are less certain, but
it is likely that one or more of the technologies outlined in section 1.7 will
be implicated, and may collectively take the form of one or other of the
technological permutations summarised in section 1.8. In perhaps the
simplest possible example, playing a compact disc on a hi-fi system with
loudspeakers can be regarded as an act of sound diffusion. Sound diffusion
is, therefore and above all, absolutely necessary, simply because it
represents the process that ultimately renders electroacoustic music audible:

Electroacoustic music can only have its being in the act of diffusion.128

An important aspect of Clozier’s statement lies in his use of the phrase ‘act
of diffusion.’ This tellingly implies that we are dealing with an active
process as opposed to a passive one. While – in the crudest and most
generalised terms – sound diffusion can be regarded as the process in which
works of electroacoustic music are ‘played back’ via loudspeakers, this is
ultimately a misleading generalisation because the notion of ‘playing back’
– erroneously – implies a passive act. An important function of the present
chapter, therefore, will be in determining the nature of the active and
intentional process of sound diffusion. In what respect, exactly, is it an
active process? And what, exactly, are the considerations that this active
process entails? In short, what are the ‘acts?’ This enquiry will begin with a
summary of some of the acoustic difficulties associated with the
performance of electroacoustic music.

3.3. Acoustic Issues in the Performance of Electroacoustic Music

The performance of electroacoustic music to an audience clearly requires
some kind of venue large enough to accommodate multiple listeners, and
one that is appropriate for public performance. It can therefore be said that,
in the vast majority of cases, sound diffusion involves the presentation of
electroacoustic works in spaces that are relatively large in comparison with,
say, domestic or studio listening environments. This raises a number of

128 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 235.

issues that are unique to the performance of electroacoustic music; issues
that are further compounded by the broad technical demands and aesthetic
standpoints embodied within the idiom. There are a number of acoustic and
psychoacoustic considerations to be addressed, mostly stemming from the
fact that such performances take place in (relatively) large venues to
(relatively) large numbers of people. A number of these issues are
summarised by Doherty.129

The sheer variety of venues in terms of their size and geometry – and
therefore their acoustic characteristics – is an important consideration:

The variety of venues where electro-acoustic music is played is enormous. This
ranges from lecture hall-size spaces to large churches… This kind of
presentation creates many problems for listeners, not least that large spaces
severely interfere with stereo imaging, tending to destroy positional
information and mask detail.130

Harrison documents the same problem as follows:

If a stereo piece is played over a stereo pair of loudspeakers (even large
speakers) in a large hall, the image will be even less stable and controllable
than in a domestic space, and will certainly not be the same for everyone in the
audience… Listeners at the extreme left or right of the audience will receive a
very unbalanced image; someone on the front row will have a ‘hole in the
middle’ effect, whilst a listener on the back row is, to all intents and purposes,
hearing a mono signal!131

Harrison’s latter example (the listener at the back of the hall) is confirmed
by Doherty, who additionally notes problems of phase cancellation and loss
of stereo image integrity as a direct result of large distances between
listeners and loudspeakers:

In a large space, temperature and humidity variations and air movement create
unwanted and continually varying changes in the phase of a signal as measured
at a listening point… This results in variations in the phase relationship
between the left and right loudspeaker output wherever you sit. These cause
varying phase cancellations of different frequencies, as well as the destruction
of any meaningful spatial information. Acousticians and sound engineers
generally agree that in large spaces, particularly towards the back, frequencies

129 Doherty (1998). "Sound Diffusion of Stereo Music Over a Multi-Loudspeaker Sound System: From First Principles Onwards to a Successful Experiment". SAN Journal of Electroacoustic Music, 11.
130 Ibid.: 9.
131 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 121.

above 3000 Hz cannot carry any sense of stereo in the phase relationship
between left and right loudspeakers.132

And let us not forget the inverse-square law:

As the sound spreads out from a source it gets weaker. This is not due to it
being absorbed but due to its energy being spread more thinly… Every time the
distance from a sound source is doubled the intensity reduces by a factor of
four, that is there is an inverse square relationship between sound intensity and
the distance from the sound source.133
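
Stated formally – this is the standard free-field result for a point source, which the passage just quoted paraphrases – the intensity I at distance r from a source of acoustic power W is:

\[ I(r) = \frac{W}{4\pi r^{2}}, \qquad \frac{I(2r)}{I(r)} = \frac{1}{4} \approx -6\,\mathrm{dB} \]

Each doubling of distance therefore costs roughly 6 dB; somewhat less in practice, since the reverberant field of a real hall departs from the free-field assumption.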

In the context of sound diffusion, the intensity of sound experienced by a
listener positioned twelve metres away from a loudspeaker will be one
quarter of that experienced by a listener seated only six metres away from
the same loudspeaker, simplistically speaking. Perhaps one might address
this problem, simply, by driving the loudspeakers harder? Unfortunately, the
solution is not as simple as that, as Doherty goes on to explain:

For every octave down that you wish to reproduce you need four times the
power used on the first octave to achieve similar perceived loudness. For
example, to produce a sound as loud as that produced by one watt at 1 kHz at a
frequency of around 32 Hz you would need around one thousand watts
(assuming similar loudspeaker sensitivities). This means that the lower octaves
use the bulk of audio power. Without lots of power driving the lower regions
and moving lots of air, the loudspeaker output will tend to be bass light. Couple
a lack of power in the bass with phase cancellation of higher frequencies, and
you end up with an over-emphasis of mid frequencies, particularly noticeable in
loud passages.134
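
Doherty’s rule of thumb can be checked as a worked calculation. Taking his figure of four times the power per octave at face value, the power P required at frequency f to match the perceived loudness of power P₀ at frequency f₀ is:

\[ P(f) \approx P_{0}\,4^{\log_{2}(f_{0}/f)} = P_{0}\left(\frac{f_{0}}{f}\right)^{2}; \qquad P(32\,\mathrm{Hz}) \approx 1\,\mathrm{W} \times 4^{5} = 1024\,\mathrm{W} \]

since 1 kHz down to 32 Hz is five octaves – in agreement with the ‘around one thousand watts’ cited above.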

In summary, large distances between audience members and loudspeakers –
particularly in large and reverberant spaces – are problematic, and the
problems are compounded when this distance is increased. These problems
include loss of phantom imaging integrity and frequency-dependent
colourations of the sound, as well as additional issues relating to the amount
of power required to achieve satisfactory levels of sound intensity for
distantly located listeners. The latter of these issues in particular forces us to
deal with the non-linear relationship between power and frequency

132 Doherty (1998). "Sound Diffusion of Stereo Music Over a Multi-Loudspeaker Sound System: From First Principles Onwards to a Successful Experiment". SAN Journal of Electroacoustic Music, 11: 9.
133 Howard and Angus (1996). Acoustics and Psychoacoustics (Oxford; Focal Press): 29-30.
134 Doherty (1998). "Sound Diffusion of Stereo Music Over a Multi-Loudspeaker Sound System: From First Principles Onwards to a Successful Experiment". SAN Journal of Electroacoustic Music, 11: 9-10.

response. Aside from the problems posed by simple distance from the
loudspeakers are issues such as those raised by Harrison: it is impossible for
all of the members of an audience to be centrally located, and off-centre
listeners will experience problems. All of these observations offer up an
obvious conclusion: the accurate reproduction of encoded audio streams via
loudspeakers is very likely to be jeopardised in most performances of
electroacoustic music unless appropriate measures are taken to minimise
this. As noted by Harrison:

Whatever a composer has put on a tape is potentially at risk in a large space
unless positive steps are taken to reinstate what would otherwise be lost.135

It is in this important respect – referring back to section 3.2 – that sound
diffusion is an active process rather than a passive one.

It seems clear (and perhaps this is obvious) that the communication of a
piece of electroacoustic music through the process of sound diffusion must
be, in some sense, coherent with respect to the intentions of the composer.
Although it seems reasonable to suppose that the ‘intentions’ of composers
will be enormously variable – the ‘message’ of the music is likely to vary
dramatically from composer to composer, and from work to work – in the
context of sound diffusion (where works doubtlessly exhibiting profound
technological and aesthetic differences will, nonetheless, have to be
performed side-by-side), it is certainly worth considering the common
interests of composers working within the electroacoustic idiom. What
might these be?

In Chapter 1 it was proposed that – despite differences in specific
technological profile – the presence of encoded audio streams intended for
later decoding via loudspeakers is a defining characteristic of
electroacoustic music. In this respect, the use of encoded audio streams
represents a common interest for electroacoustic musicians, as does the
subsequent use of loudspeakers as a means to decoding them. It is proposed
that this applies regardless of whether the work in question is top-down or
135 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 124.

bottom-up. In section 1.6.2 it was further noted that the use of multiple
encoded streams, and their subsequent decoding via multiple loudspeakers –
through the phenomenon of phantom imaging – ultimately leads to the
emancipation of spatiality in an auditory context. Unsurprisingly, the
‘uniquely spatial’ potential of such audio technologies has been cited – in
section 1.13.2 – as an important avenue for creative exploration within
electroacoustic music. It is therefore proposed that the accurate
reproduction of phantom images is, amongst other things of course, a
common concern in the diffusion of electroacoustic music.
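
The sensitivity of phantom imaging to reproduction conditions can be made explicit using the classic stereophonic ‘law of sines’ – a textbook approximation, valid at low frequencies for a centrally seated, forward-facing listener. For a loudspeaker pair placed symmetrically at angle θ₀ either side of the listener’s median plane, with gains g_L and g_R, the phantom image appears at the angle θ satisfying:

\[ \sin\theta = \frac{g_{L} - g_{R}}{g_{L} + g_{R}}\,\sin\theta_{0} \]

The relation presupposes a listener equidistant from both loudspeakers – precisely the condition that fails for the off-centre listeners in Harrison’s example above.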

Because the existence of phantom images is dependent on the co-ordinated
use of multiple encoded audio streams to be decoded via multiple
loudspeakers, it seems logical that there should exist a conceptual
framework within which streams can be dealt with in a multiple, and co-
ordinated, manner. Surprisingly – excluding ‘convention’ – there does not
seem to be any such framework. It is for this reason that the concepts of the
‘coherent audio source set’ (CASS) and the ‘coherent loudspeaker set’
(CLS) are proposed; these will be described in sections 3.5 to 3.8.

3.4. Diffusion Example: Two Movements from Parmegiani’s De Natura Sonorum

The following sections are designed to indicate – by way of example – the
kinds of actions and outcomes that a performer might wish to perform in the
diffusion of a work of electroacoustic music. This is intended as a
supplement to the otherwise more abstract description of the act of sound
diffusion as presented in this chapter. A brief account of some (but by no
means all) of the logistical practicalities inherent in the performance of these
diffusion actions will also be given.

Two movements from Bernard Parmegiani’s stereophonic tape-only work
De Natura Sonorum136 (1975) will be described; these will be the second
movement – entitled ‘Accidentals/Harmonics’ – and the third – ‘A

136 Parmegiani (1991). De Natura Sonorum. (Compact Disc - INA: INAC3001CD).

Geological Sonority.’ Audio recordings of these movements can be found
on the accompanying audio CD.

These particular movements have been selected for a number of reasons.
Firstly, they are highly contrasting in nature, and therefore afford the
opportunity to describe quite different diffusion techniques in each case.
Secondly, the movements are reasonably ‘monothematic’ in terms of the
kind of diffusion actions they demand: each movement therefore presents
the opportunity to explore one specific ‘style’ of diffusion in some detail.
Finally, the first of these two movements runs segue into the second,
necessitating a description of the techniques required to perform the
movements consecutively.

As a classic work of acousmatic music, it can be stated that De Natura
Sonorum is oriented towards top-down ideologies. Accordingly, appropriate
diffusion actions for the performance of these movements have been arrived
at mainly on the basis of a perceptual evaluation of the musical materials
and their development and interaction with each other. Top-down diffusion
theory will be described more fully, in abstract terms, in section 3.12.

In the analysis that follows it will be assumed that the loudspeaker array
illustrated in Figure 2, below, is being used. This is based on the BEAST
array for the diffusion of stereophonic tape works as described by
Harrison137, who provides a detailed rationale for the placement of each
loudspeaker pair.

137 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 121-123.

[Figure 2. Loudspeaker array assumed for the diffusion of movements from Parmegiani’s De Natura Sonorum: loudspeaker pairs designated very distant, distant, main, wide, punch, roof (raised), side and rear, arranged around the audience, together with front and rear tweeter trees and bass bins.]

At present, the vast majority of diffusion concerts are carried out using
mixing desk style systems whereby the level of each loudspeaker must
be controlled with an individual fader. The BEAST system (to be
described in section 4.8) operates on this basis and for the purposes of
this discussion it will be assumed that the mixing desk configuration
illustrated in Figure 3 is being used. Again, this is closely based on a
configuration described and rationalised by Harrison.

[Figure 3. Mixer configuration assumed for the diffusion of movements from Parmegiani’s De Natura Sonorum: twenty-two faders assigned in pairs as follows – 1/2 bass bins; 3/4 tweeters (front); 5/6 tweeters (rear); 7/8 very distant; 9/10 distant; 11/12 main; 13/14 wide; 15/16 rear; 17/18 punch; 19/20 side; 21/22 roof.]

3.4.1. Movement 2: ‘Accidentals/Harmonics’

‘Accidentals/Harmonics’ is described by Parmegiani as follows:

Often very brief events of instrumental origin are brought in to modify the
harmonic timbre from the continuum that they undercut or on which they
are superposed.138

At the most basic level, the movement comprises one continuous drone
of uniform pitch, which is punctuated by sporadic and sudden gestural
interjections. With regard to sound diffusion – and, again, at the most
basic level – it would seem appropriate for the continuous drone
material to be relatively static in terms of spatial articulation, and for the
gestural interjections to be more spatially dynamic. In order to go into
more detail, the movement will be divided up into seven sections,
appropriate diffusion actions for each of which will be described in turn.

The first section introduces the pitched drone, tentatively at first, but
this quickly becomes more firmly established in discrete steps (as
opposed to smoothly or gradually) brought about by superimposed
entries that add to its mass and change its timbral character, as described
by the composer. In diffusion, the drone could begin distant, with the
abrupt introduction of sequentially closer pairs of loudspeakers

138 Parmegiani (1991). De Natura Sonorum. (Compact Disc - INA: INAC3001CD). Quotation from accompanying booklet.

coinciding with each one of these entries. At 0’29”, for example, a very
brief pitched interjection effects a change of quality in the underlying
drone, which at this point attains a distinctive spatial morphology that
oscillates rapidly from left to right within the stereo field. This would be
an appropriate point for the drone to reach the wide pair of loudspeakers
(or even the sides) as this would emphasise the intrinsic left-to-right
movement. In terms of its physical demands this progression poses no
major difficulties because individual fader pairs are introduced
sequentially, with plenty of time available to prepare for each new
entry. As we will soon see, however, when diffusion actions are to be
performed in more rapid succession – and particularly if these actions
involve the simultaneous use of more than two faders – the logistical
difficulties can become considerable.

The drone is interrupted for the first time at 0’59”, marking the
beginning of the second section. At this point, a loud and abrupt ‘drum
roll’ occurs, halting the drone – albeit very briefly – in its tracks. This is
an important moment also because it introduces a timbre – unpitched
and percussive – that has not yet been heard in the hitherto entirely
pitched sound-world of the movement. Really, this entry is unexpected
enough to be effective without any particular diffusion action, but could
be emphasised with, for example, a sudden shift to the main or (even
more emphatic) punch pair of loudspeakers. This would be achieved by
raising faders 17/18 to coincide with the sudden percussive entry. The
results would be even more effective if this was accompanied by a
simultaneous lowering of all of the other fader levels, although this
would clearly be much more difficult to achieve in practice.
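
Moves of this kind – one fader pair raised while many others are lowered at once – exceed what two hands can reliably manage on a conventional desk, and are precisely what fader grouping or ‘snapshot’ interpolation in a software-controlled system renders trivial. A minimal sketch of the idea follows; the function and variable names are hypothetical, and a linear fade law is assumed where a real system might offer alternatives:

def scene_transition(current, target, steps):
    """Interpolate every fader level towards a target scene, so that a
    single gesture moves any number of faders simultaneously."""
    faders = set(current) | set(target)
    for k in range(1, steps + 1):
        t = k / steps
        yield {f: (1 - t) * current.get(f, 0.0) + t * target.get(f, 0.0)
               for f in faders}

# The move described above: punch pair (faders 17/18) up, all others down.
before = {11: 0.8, 12: 0.8, 13: 0.6, 14: 0.6, 17: 0.0, 18: 0.0}
punch_only = {17: 1.0, 18: 1.0}
for frame in scene_transition(before, punch_only, steps=3):
    print(frame)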

The drone immediately reasserts itself, and this is followed by two short
‘sub-sections,’ the first of which is characterised by unpitched
percussive entries that are – to use the composer’s terminology –
superposed on the drone (i.e. they do not ‘interrupt’ it as such) and the
second of which is characterised by similar entries that undercut the
drone (i.e. they do interrupt it). The sub-sections are separated by two
brief pauses, occurring at 1’09” and 1’25” respectively. In diffusion
such pauses can be used to effect a ‘scene change’ if necessary, setting
up a different part of the performance space for the following section if
this is appropriate. There is no particular need to treat these sub-sections
differently in diffusion: the composed differences fixed onto the
medium will be enough to make the difference between the two
sufficiently clear. In both cases the sudden interjections can be timed to
coincide with a slight emphasis on the main or punch pair of
loudspeakers (in the case of interjections consisting of a single brief
impulse) or – in the case of longer gestures such as that occurring at
1’12” – a more spatially dynamic articulation in which the gesture is
‘thrown around the space’ among several loudspeaker pairs. The mixer
configuration is such that gestural articulation among the ‘main eight’
loudspeakers (mains, wides, distants, rears) is fairly straightforward.
Articulations incorporating other loudspeaker pairs – however effective
this might prove to be – would be logistically more difficult to achieve.
In either case, it is very important that the drone be clearly established
as spatially static, and that any sudden fader movements are carefully
timed to coincide with the percussive or gestural interjections. The main
challenge here, therefore, will be in simultaneously lowering the levels
of multiple faders in order to re-establish the drone at a static spatial
location. If this is achieved, however, the psychoacoustic effect should
be sufficiently persuasive to convince the audience that the drone
remains ‘stationary,’ while the gestural interjections articulate different
parts of the performance space.

The third section – which follows the brief pause at 1’24” and lasts until
around 1’52” – is dominated by the drone and is, accordingly, rather
more static, in terms of its spatial morphology, than the previous
section. Interjections are also less frequent, and – excluding the gesture
with which this section begins – consist only in individual percussive
hits. Two of these impulses – occurring at 1’35” and 1’44” – effect
changes in the dynamic profile of the underlying drone. The first
initiates a crescendo, the second, a subito piano followed by a
crescendo. At these points it would be effective for the drone to
gradually grow in terms of the space it occupies, closely tracking each
crescendo, starting in the distant pair and steadily expanding through
mains, wides, and perhaps even to sides. The only real inconvenience
here is the gap between faders 13/14 (wides) and 19/20 (sides), which
could make the realisation of a smooth front-to-back transition
somewhat awkward. The percussive hit at 1’44” would be effectively
diffused via the punch pair of loudspeakers, this coinciding with the
instantaneous ‘shrinking’ of the drone back into the distant pair in
preparation for the second ‘growing’ crescendo. This will be more
difficult to achieve because – once again – it involves the sudden and
simultaneous movement of multiple faders. This is one of the relatively
few passages where gradual fader movements will be necessary for any
extended length of time.

Transition into the fourth section is effected by a marcato saxophone-
like entry consisting of two rapid repetitions of the same note (1’52”).
This motif occurs twice, the implied space of the second occurrence
(1’54”) being more distant than that of the first. This can be used in
diffusion to re-establish the drone via the distant pair of loudspeakers in
readiness for the following section, the most prolonged in the
movement (lasting until 3’03”). Diffusion of this section will largely
consist in articulating the frequent gestural movements around the
performance space whilst maintaining those brief moments of stasis
where the drone dominates the musical texture. The drone is relatively
quiet (and perhaps distant) throughout this section and this should be
reflected in the diffusion. There are, of course, many points at which
more specific actions could be effective. The pizzicato double-bass
notes (2’00”) could be nicely emphasised with the subtle use of bass
bins, highlighting the fact that up to this point there has been little
pitched material with frequency content as low as this. In addition,
notable gestures such as the reversed breathy gesture at 2’35”, the
pitched saxophone-like crescendo from 2’44” to 2’46”, and the
sweeping upward glissando beginning at 2’51”, could be used to ‘open
out’ the spatial articulation or otherwise shift the spatial emphasis in
various ways. The upward glissando could be used, for instance, as a
means to introducing the ‘roof’ pair of loudspeakers into the
performance if these have not been used previously.

The fifth section begins with a deep, bassy impact at 3’11” (the use of
bass bins would be particularly effective here), which initiates the
establishment of new material that is pitched, resonant, fluttering, and
reverberant in quality. (This new material has, in fact, been pre-empted
in the previous section, very subtly at first (2’39”, 2’53”) and then more
obviously (3’03”, 3’07”). These points should be used as a means to
establishing the diffusion behaviour that will dominate the following
section and could, perhaps, be diffused to distant loudspeakers.)
Throughout the section this material alternates and interacts with the
drone, which has also become more reverberant and less intimate in
quality, but somewhat faster and more continuous in terms of gestural
movement. This section could be diffused via a rapid interplay between
reverberant/distant space (e.g. 3’11” to 3’14”) and a closer space (e.g.
immediately following 3’14”). The section is also quite ‘expansive’ in
character, and should probably therefore make use of many loudspeaker
pairs; interplay between outlying loudspeakers (e.g. very-distants,
distants, sides, loudspeakers located in the roof, rears) and those located
closer to the audience (mains, wides, and the punch pair) would be very
effective, building to a peak at 3’35”. This will be particularly tricky
because it involves the manipulation of two different groups of faders
that are not conveniently juxtaposed on the mixing desk. The outlying
loudspeakers are routed through faders 7/8, 9/10, 15/16, 19/20 and
21/22, while those located closer to the audience occupy
faders 11/12, 13/14 and 17/18. A rapid interplay between these two
distinct spaces will therefore be extremely difficult to execute, the
actions achievable in practice being seriously restricted by the physical
ergonomics of the system.

At this point, the sound should completely fill the space as the musical
dialogue becomes more continuous with the prominent establishment of
the drone at the forefront. Shortly, the drone is accompanied by much
gestural and textural activity, which should be articulated quickly
around the performance space. Because many loudspeakers will be in
use at this point, this will involve frequent but subtle ‘nudges’ to the
faders. There is rather a lot of composed left-to-right movement in this
section, and so frequent use of the wides and sides could be used to
emphasise this. There is also much high-frequency content in the rapid
textural material, which would very effectively enlarge the perceived
‘size’ of the performance space if diffused via tweeter trees above the
audience. This sixth section – running from 3’35” to 4’11” – should be
the ‘largest’ in the movement, in terms of the number of loudspeakers in
simultaneous use.

The seventh and final section, running from 4’11” to the end of the
movement at 4’46”, exists as a preparation for the subsequent
movement, which follows segue. It should therefore be used in diffusion
as a means to preparing and establishing the correct space for the
following movement, which – as will be seen shortly – is considerably
different in character. The section begins with an abrupt halting of the
previous rapid gestural activity, brought about by a distant, resonant
drum-like entry. Again, such a rapid shift to distant loudspeakers could
prove challenging. This is followed by a series of emphatic, fast-moving
statements, each abruptly terminated by a similar percussive
interruption. The implied spatial articulation is therefore one of rapid
movement, followed by sudden stasis, followed by rapid movement, and
so on. Here, the gestural material is reverberant, implying that the
spatial articulation should be large and expansive, probably employing
large numbers of loudspeakers for its diffusion. The percussive
interruptions are ‘distant’ in character, and could be effectively
articulated via sudden shifts to distant and very distant loudspeakers. As
described previously, however, the actions achievable in practice are
likely to be limited by the physical constraints imposed by the diffusion
system itself. The hiatus immediately preceding the final emphatic
impulse-like entry at 4’39” should be used to prepare the mixing desk
for the bass drone, which is introduced at this point to underpin the
beginning of the following movement. The way in which this bass drone
should be diffused will be described shortly.

In summary, ‘Accidentals/Harmonics’ is characterised by moments of
relative stasis – where the musical material is dominated by the pitched
drone – interspersed with percussive impacts and gestural statements in
which much spatial articulation is implied. Sudden changes in timbral
quality or gestural behaviour can normally be successfully diffused with
sudden fader movements, provided the timing is accurate enough.
Where timbre is static, on the other hand – as is often the case here
when the pitched drone dominates the musical dialogue – this tends also
to imply spatial stasis, and sudden fader movements run the risk of
yielding perceptually implausible and/or distracting results. The main
challenge, therefore, in the diffusion of this movement rests in
maintaining the perceptual separation between the underlying drone and
the gestures that articulate it, and in doing so mediating and clarifying
the relationship between the two throughout. This challenge is, of
course, augmented by the fact that the drone frequently changes in
character, as do the articulating gestures, as does the relationship
between the two!
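
One way of making a diffusion plan of this kind explicit – and of seeing where its logistical pressure points fall – is to set it down as a timed cue list. The fragment below transcribes, as a hypothetical data structure, a few of the cues proposed above for the opening sections; the representation is a notational convenience only, not an established practice:

# (time in seconds, musical event, suggested fader action)
cues = [
    (0,   'drone enters tentatively',           'distants (9/10) only'),
    (29,  'left-right oscillation in drone',    'reach wides (13/14), even sides (19/20)'),
    (59,  'abrupt drum roll interrupts',        'raise punch (17/18); lower all others'),
    (69,  'brief pause: possible scene change', 're-balance for superposed entries'),
    (95,  'percussive hit initiates crescendo', 'expand drone: distants, mains, wides'),
    (104, 'hit, subito piano, crescendo',       'punch hit; snap drone back to distants'),
]
for time, event, action in cues:
    print(f'{time:>4}s  {event:<36} {action}')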

3.4.2. Movement 3: ‘A Geological Sonority’

‘A Geological Sonority’ is markedly different in character from the
previous movement. Here, the tempo is considerably slower, and the
musical material develops through what is essentially an interplay
between multiple layers of broad, continuous, drone-like material,
which is conveyed in long, drawn out, sweeping gestures. The effect is
rather like being repeatedly ‘washed over’ by multiple waves of sound.
Parmegiani states that the movement ‘resembles flying over a landscape
in which the different ‘sound’ levels will emerge on the surface one
after the other.’139

The movement begins with a single low-pitched bass drone, as
established at the end of the previous movement. The intensity and
regularity of the subsequent ‘waves’ of sound gradually increases to a
climax at around 3’00”, after which the original low-pitched bass drone
is re-established. At this point, the pace of the music rapidly diminishes,
finishing more or less as it started with the solo bass drone.

The first ‘wave’ grows out of the solo bass drone with the very gradual
introduction (from 0’20” to 1’00”) of a second textural layer, which –
although not unpitched – has a noisier spectrum than the underpinning
bass drone. At 1’25” this second drone enters a slow downward
glissando whilst gradually decreasing in dynamic level, and finally
disappearing at 1’48”. The glissando is accompanied by the staggered
introduction of a further three distinct textural layers that serve to build
the dynamic level and thus bring the wave to a ‘crest’: a drone whose
pitch is in between that of the bass drone and the second layer
introduced, and two noisy drones, one of high tone-height (a hiss) and
the other much lower (a rumble). The perceptual experience of a wave –
beginning in the distance, passing over the audience, and eventually
disappearing in the opposite direction from whence it came – is the
result of two important factors. Firstly, the wave gradually builds in
dynamic level, reaches a peak of intensity, and then dies away.
Secondly, the gradual downward glissando is rather like the Doppler
effect experienced by listeners when a moving sound source passes by.
In this case, the onset of the glissando happens when the dynamic level
reaches its peak, further emphasising the effect.
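
The comparison can be made precise. For a sound source moving at speed v relative to a stationary listener, with c the speed of sound, the perceived frequency is:

\[ f' = f\,\frac{c}{c \mp v} \]

higher (upper sign) while the source approaches, and lower (lower sign) as it recedes. A downward glissando whose onset coincides with the peak of intensity therefore mimics exactly the pitch drop heard as a real moving source passes the listening position.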

One approach that could be adopted in the diffusion of this movement,
therefore, is the articulation of the wave-like behaviour of the musical
material. In order to evoke a sense of ‘flying over a landscape,’ it would

139 Ibid. Quotation from accompanying booklet.

seem appropriate to have the waves travel from front to back, thus
suggesting that the direction of ‘travel’ is forwards with respect to the
direction in which audience members would normally be facing. In this
way, the different features of the aural landscape will emerge out of the
distance in front of the audience, gradually approach, passing around
and over the audience before disappearing into the distance behind.

In the context of this interpretation, the bass drone with which the
movement begins can be regarded as the ‘horizon,’ a static and
dominating presence against which movement takes place. Accordingly,
the bass drone should be diffused in a way that establishes it as static
and omnipresent: we have not yet begun our journey over the aural
landscape, and the task of the diffuser at this point is to hint at the
imposing grandeur of what we are yet to experience. The nature of the
material inherently lends itself to this kind of treatment:
psychoacoustically, lower frequencies are more difficult to localise
spatially than higher frequencies and therefore – even diffused via
mains and wides only – the bass drone is likely to sound as though it is
coming ‘from everywhere’ to a certain extent. The impact of the bass
drone could be augmented by the use of bass bins to emphasise the very
lowest frequencies. These will be the most difficult to localise, thus
reinforcing the sense of infinite space. As mentioned earlier, the correct
fader positions for the diffusion of the bass drone should already have
been prepared in time for the final impulse that occurs at 4’39” in the
previous movement. In this way, a smooth transition – avoiding any
clumsy fader movements as the drone emerges – is assured.

When the first wave begins, the gradual emergence of the second, semi-
pitched, drone could be used very effectively to establish ‘forward
motion’ over a landscape as the dominating gestural statement of the
movement. Because this drone is somewhat noisy in spectral profile, it
will be inherently easier to localise spatially. If loudspeaker pairs are
introduced carefully to coincide with its arrival and implied trajectory,
therefore, it will be possible to establish the onset of this first wave as
spatially independent from the accompanying bass drone (which is,
recall, more difficult to localise anyway). The first wave should be
diffused first to the very-distant pair of loudspeakers, very gradually
moving forwards towards the audience via distants, mains, and wides as
it builds in dynamic level. The point at which the first wave passes the
audience is at around 1’38”, and this is quickly followed by the
emergence of a second wave on the horizon. The white-noise material
that fades in and out between 1’38” and 1’48” should coincide with the
introduction of tweeter trees, first the front pair, then the rear. As these
are located directly above the audience, this would very effectively
articulate the moment at which the first wave passes by. Use of the roofs
and sides just prior to this would also be very effective in evoking a
sense of the wave filling the space before shrinking into the distance
behind the audience.
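
The front-to-back trajectory just described is, in essence, a travelling cross-fade over an ordered sequence of loudspeaker pairs. The underlying arithmetic can be sketched as follows; an equal-power fade law between adjacent pairs is assumed purely for illustration, and the pair names follow the array assumed above, simplified to a single front-to-back ordering:

import math

PAIRS = ['very distant', 'distant', 'main', 'wide', 'rear']

def wave_gains(position):
    """position runs from 0.0 (on the horizon, in front of the audience)
    to len(PAIRS) - 1 (behind); returns one gain per loudspeaker pair,
    cross-fading with equal power between adjacent pairs."""
    gains = {p: 0.0 for p in PAIRS}
    i = min(int(position), len(PAIRS) - 2)
    t = position - i
    gains[PAIRS[i]] = math.cos(t * math.pi / 2)      # pair being left behind
    gains[PAIRS[i + 1]] = math.sin(t * math.pi / 2)  # pair being approached
    return gains

print(wave_gains(0.5))  # the wave midway between very-distants and distants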

In terms of the fader movements involved in the realisation of this
extended gesture, the progression from very-distants through to wides is
straightforward because the corresponding fader pairs (7/8, 9/10, 11/12,
13/14) are located sequentially from left to right on the mixer.
Furthermore, the onset of the wave is very gradual, and therefore only
very slow fader movements will be required. At this point, however, the
diffusion becomes logistically more complicated. As the crest of the
wave passes the audience, the consecutive use of front and then rear
tweeter trees is required, and these are located towards the extreme left
of the mixing console on fader pairs 3/4 and 5/6. This is to be
accompanied by the progression of the wave into the sides (19/20) and
roofs (21/22), which are located at the opposite end of the console.
Simultaneously, the mains and wides – 11/12 and 13/14 – should be
gradually lowered, bringing the wave into the centre of the auditorium.
When the wave passes, all of the fader levels should be gradually
lowered further, with those located furthest forward in the hall being
lowered first. This should coincide with the gradual introduction of
rears on faders 15/16. Thus, the realisation of a (perceptually) relatively
simple front-to-rear articulation is complicated to a certain extent by the
physical routing of loudspeaker pairs to mixing desk faders. This
problem is noted by Harrison:

I’ve done a lot of things in France […] where they tend to start at one end
of the mixer with the speakers that are the furthest away from you in front,
and then [gestures from left to right] the next pair, the next pair, the next
pair, and this end [gestures to the right] is at the back. That’s absolutely
tickety-boo if you want to do lots of front-to-back or back-to-front things,
but if you want to get the most significant speakers near the audience to be
going [makes antiphonal sounds and hand gestures], you’ll find that
there’s a pair there, a pair there, a pair there, and a pair there [indicates
four completely separate areas of an imaginary mixing desk]; you can’t get
that interaction, which is why we don’t do that in BEAST. We put those
main speakers in a group on the faders so that you can do it easily. But
then of course it means that if you want to go front-to-back, it’s awkward.
So, nothing’s perfect! Unless you could suddenly re-assign the faders mid-
diffusion. You [referring to the M2 Diffusion System, to be described in
Chapter 5] could do that, couldn’t you? [Laughs].
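Harrison’s closing remark points towards a concrete optimisation: if the routing between physical faders and loudspeaker pairs were software-defined, it could be re-assigned between gestures. The following Python sketch is purely hypothetical – the names FaderBank and reassign are invented for illustration and describe no existing system – but it indicates the kind of re-assignable routing layer implied.

```python
# Hypothetical sketch of a re-assignable fader-to-loudspeaker routing layer.

class FaderBank:
    """Maps physical fader pairs to named loudspeaker pairs (CLSs)."""

    def __init__(self, routing):
        # routing: {fader_pair: loudspeaker_pair_name}
        self.routing = dict(routing)

    def reassign(self, new_routing):
        # Swap in a new map between gestures, so that the loudspeakers
        # needed for a given trajectory sit on adjacent faders.
        self.routing.update(new_routing)

# Front-to-back layout, as in the French practice Harrison describes:
bank = FaderBank({(7, 8): 'very-distant', (9, 10): 'distant',
                  (11, 12): 'main', (13, 14): 'wide', (15, 16): 'rear'})

# Re-grouped for antiphonal interplay between the most significant pairs:
bank.reassign({(7, 8): 'main', (9, 10): 'wide', (11, 12): 'side'})
```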

While the approach and passing of the first wave (and, to a certain
extent, the second) is clearly a ‘solo’ event – nothing else happens
concurrently – subsequent waves arrive in an increasingly overlapping
fashion. As one wave is disappearing into the rear loudspeakers, the
second wave should already be building in the very-distant and distant
pairs and this, of course, compounds the challenges inherent in diffusing
this movement effectively. The task of the diffuser is to convey a sense
of perpetual forward motion by carefully timing the introduction of
loudspeaker pairs – sequentially from the front to the rear of the hall –
to coincide with the spectromorphology of each wave. In contrast with
the previous movement, there will be no sudden fader movements here.
Textural layers emerge progressively and seamlessly, and this should be
articulated by the subtle, fluid, and perpetual cross-fading of
loudspeaker pairs. This is relatively straightforward when the
corresponding fader pairs are adjacent on the mixing console, but where
this is not the case, difficulties arise. Clearly, a degree of compromise
and prioritisation of the perceptual results will be necessary if the wave-
like behaviour of the musical material is to be conveyed successfully. In
this respect, the points at which each ‘crest’ passes the audience will be
the most important and in most cases these can be diffused as described
previously, with the tweeter trees articulating the points of transition
from ‘in front’ to ‘behind.’

In terms of perceived dynamic level, the first three minutes of the
movement are, basically, an extended crescendo. This is accompanied
by a gradual increase in gestural motion and intensity, reaching a peak
at around 2’58”, where three consecutively more emphatic waves occur
in quick succession. The bass drone has long since disappeared (at
around 1’40”), indicating that we have lost our awareness of the horizon
and become completely immersed in our experience of the landscape.
This gradual build could be emphasised in diffusion by the progressive
use of an increasing number of loudspeaker pairs, such that we become
more and more immersed within the landscape. By 3’11”, most of the
loudspeakers will be in simultaneous use, with only very subtle shifts in
emphasis between loudspeaker pairs being used to articulate the slowly
lapping, wave-like, motions. Of course, with every additional
loudspeaker the diffuser has an additional fader to manipulate.
In effect, this means that the degree of accuracy with which diffusion
actions can be executed is inversely proportional to the number of
loudspeakers involved.
of the loudspeakers in the array are likely to suffer the most, the
physical constraints of the system and the limitations of the diffuser’s
dexterity being most obvious under such circumstances.

From 3’11” onwards, the gestures become steadily more extended and
meandering, although the dynamic level tails off less abruptly. When
the gestural movement begins to abate, this need not be immediately
accompanied by the use of fewer loudspeakers. Although the musical
material becomes somewhat more drawn-out, reverberant, and slightly
lower in dynamic level – almost as though we are beginning to regard
the landscape from an increasing distance – the landscape is still large
and dominating; it is not yet ‘disappearing’ as such. Although we are
increasing our distance from the surface of the landscape (as implied by
the blurring of spectral detail and increased reverberance), we begin to
realise that it extends infinitely around us in all directions and is
therefore, if anything, getting larger. This effect is intensified by the
reintroduction of the bass drone ‘horizon’ (at around 3’30”; this should
coincide with the reintroduction of bass bins), as we gradually become
aware once again of our static point of reference. Throughout this
passage it would probably be effective to let the material ‘play itself,’
with most if not all of the loudspeakers remaining in use. Extensive use
of the roofs and sides would prove particularly effective throughout this
passage. The perceived effect here might be something akin to vertigo,
or agoraphobia.

The final wave passes the audience at around 4’05”. This point of
departure is particularly crucial because it is so exposed. If the sensation
of perpetual motion that has been so central to the articulation of the
musical material is to be finally affirmed, it is especially important that
this final wave disappears into the distance behind the audience, leaving
behind the omni-present bass drone with which the movement began.
Because we are currently using most of the loudspeakers in the array,
this will involve careful timing in the gradual dropping out of
loudspeaker pairs to coincide with the implied trajectory of this final
wave. In terms of fader movements, 7/8 and 9/10 (very-distant and
distant) should be the first to drop out, followed by 11/12 and 13/14
(main and wide). As was the case with the introduction of the very first
wave, this should not be problematic as these faders are arranged
sequentially from left to right on the mixing console. When the final
wave passes, front-to-back movement should be emphasised by
dropping out the front pair of tweeter trees first (3/4), followed by the
rear (5/6). This could be accompanied by a slight emphasis on the sides
and roofs (19/20, 21/22), which should subsequently be lowered to
effect the transition into the rears. Meanwhile, preparations should also
have been made for the underpinning bass drone to be re-established in
the mains, wides, and bass bins. The point at which the wave actually
passes the audience should be a sufficiently distracting juncture at
which to perform these fader movements. Again, we can see that the
performance of a seemingly straightforward front-to-back articulation
is, in reality, rather convoluted in terms of the actual fader movements
involved. Given that the articulation involves the introduction (or
dropping out) of loudspeaker pairs in a very specific order, this would
seem to be an area in which optimisations could be made.

When the final wave disappears, we return once again to a position of
stasis articulated by the low-pitched bass drone. As the bass drone fades
out (4’25” to 4’34”) loudspeaker pairs can gradually be dropped out one
by one, perhaps ending the movement on mains and bass bins only (we
normally perceive the horizon, after all, as being in front of us). In the
context of this interpretation, it would be inappropriate for the bass
drone to disappear into the distance, as such, because its function has
been to act as the static background against which movement takes
place. A sense of ‘shrinking’ – or perhaps ‘focussing’ into the main pair
of loudspeakers – as the drone fades into silence would seem perfectly
legitimate, however.

3.5. Coherent Audio Source Sets (CASS)

A coherent audio source set (CASS) is a single entity consisting of a number
of discrete but mutually dependent encoded audio streams, which are linked
by a conditional coherent bond.

As discussed in section 1.6.2, multiple encoded audio streams can be used
co-operatively to encode spatio-auditory attributes with respect to sound
source directionality. Two-channel stereo is a good example. Here, two
discrete encoded audio streams are designated ‘left’ and ‘right.’ (The
designation is essentially arbitrary: there is no real reason why the two
channels could not represent ‘up’ and ‘down,’ for example). From this point
onwards, the two audio channels, although physically discrete, are
fundamentally interrelated. Collectively, they abstractly represent an
imaginary axis that extends, ‘spatially,’ from left to right. If, at any given
moment, correlated audio signals of equal amplitude are present in both
channels simultaneously, this represents a sound source positioned at the
exact centre of this imaginary axis. A signal in the ‘left’ channel only
represents a sound source positioned at the left-most extremity of the axis,
while a signal in the ‘right’ channel only represents a sound source
positioned at the right-most extremity. Intermediate positions – assuming
that the two signals are sufficiently correlated – are represented by varying
ratios of signal amplitude in each of the two channels.140 Because it is the
ratio of the amplitudes of the two component channels that abstractly
represents spatial positioning along an imaginary axis, so no representation
of this spatial attribute exists unless both channels are present. In this
respect, although both channels are technically independent, each is
meaningless without the other and this is one respect in which a coherent
bond exists between the two encoded audio channels. It therefore makes
absolute sense that the two channels, in combination, should be regarded –
and indeed treated, where appropriate – as a single entity. The concept of
the coherent audio source set exists to make this possible.
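Since the notion of an amplitude ratio abstractly representing position is central to the coherent bond, a minimal worked example may be useful. The following Python sketch implements one common equal-power panning law; like the account above it is a simplification, and the function name and normalised position parameter are illustrative assumptions rather than any standard API.

```python
import math

def stereo_pan(sample, position):
    """Equal-power pan: position 0.0 = hard left, 1.0 = hard right.

    Returns (left, right) amplitudes whose ratio abstractly encodes the
    position of the source along the imaginary left-right axis.
    """
    angle = position * math.pi / 2
    return sample * math.cos(angle), sample * math.sin(angle)

# A centred source yields correlated signals of equal amplitude...
print(stereo_pan(1.0, 0.5))   # (0.707..., 0.707...)
# ...while a hard-left source leaves the 'right' channel silent.
print(stereo_pan(1.0, 0.0))   # (1.0, 0.0)
```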

Why is the coherent bond between the constituent channels of a CASS
conditional? One condition, as discussed, is that all of the channels are
present at all times. If this condition is not met, then that which is abstractly
represented by the set as a whole (spatial location in the present example) is
lost. Another important condition, however, concerns the decoding of the
CASS channels. It is important to note (again, using stereo as a simple
example) that the amplitude ratio of the two channels of a stereophonic
CASS is an abstract representation of spatial positioning along an imaginary
axis. The real axis will not exist until such time as the encoded audio signals
are decoded via reproduction over loudspeakers. The real axis must be
facilitated by the positioning of loudspeakers: in this case, positioning one
loudspeaker to the left of the other would seem appropriate. In the
stereophonic CASS we have a hypothetical representation of ‘left’ and
‘right,’ and intermediate points. Now we have a real ‘left’ and a real ‘right,’
which are differentiated in terms of real, physical, space. If the channel
arbitrarily designated ‘left’ in the CASS is decoded via the loudspeaker that
is physically positioned to the left of the other loudspeaker, and the ‘right’
CASS channel is decoded by the other loudspeaker, which is physically
positioned to the right (this is, of course, another condition of the coherent
bond), then – and only then – will we have a real spatio-auditory
phenomenon that accurately reflects the data encoded within the CASS. In
other words, this is where the spatial (or any other) representation contained
within the CASS is concretised. To clarify, the primary condition of the
coherent bond between CASS channels is that they are decoded via an
appropriate coherent loudspeaker set; this will be discussed more fully in
section 3.7.

140 This account of amplitude-based phantom imaging techniques is clearly simplified, but is nonetheless adequate for the purposes of the present discussion.

In and of itself, however, a CASS is simply a conceptual framework that
allows multiple encoded audio streams (whether analogue or digital, static
or transitory) to be arbitrarily grouped together; nothing more. This
grouping implies that the component channels should be treated,
collectively, as a single homogeneous entity, thus maintaining a constant
relationship between them. In doing so, anything that is represented
collectively by the constituent channels (whether this is spatial information,
as in the previous example, or any other attribute) can be preserved. Some
diagrammatic examples are given in Figure 4, below.

Figure 4. Some examples of multiple discrete encoded audio streams grouped into coherent audio source sets: a stereophonic microphone pair (2-channel CASS), a DVD with 5.1 channel sound track (6-channel CASS), and an analogue mixing desk (16-channel CASS).

In the examples given above, all of the audio channels available within a
given medium are grouped together into a single CASS. This procedure is
fairly common in dealing with fixed medium electroacoustic works. It
would be usual, for instance, for the two audio channels available from a
compact disc to be conceptually grouped together and treated as a single,
two-channel, CASS. Similarly, in octaphonic works, it is common for the
eight channels encoded onto, say, an ADAT tape, to collectively represent a
single, homogeneous, octaphonic spatio-auditory image.141 It is critical to
state at this point, however, that a coherent audio source set need not simply
be defined as the total number of audio channels available on any given
medium. In a personal interview with the author142, Harrison described his
octaphonic tape piece Rock ‘n’ Roll, in which the eight discrete channels
present on the recording medium are subdivided into two coherent audio
source sets: one stereophonic and one six-channel. The piece therefore
comprises one coherent two-channel image, and one coherent six-channel
image. The integrity of the stereo image will be lost if it is not treated as a
CASS with its ‘own internal atomic cohesion’ (to reiterate the Clozier
citation given in section 2.5), and the same can be said of the second, six-
channel, image. In diffusing this piece one would expect that attempts
should be made to maintain the integrity of both coherent audio source sets,
but that the precise relationship between the two need not necessarily be so
strict.

Truax describes a comparable approach:

I would like to suggest that the multiple-channel system can be understood as
an extension of stereo practice. Eight-channel tape, for instance, can be thought
of as four contrapuntal stereo layers… This can be done […] with multiple
channel inputs where each soundtrack can be kept discrete and projected
independently of all others.143

Here, Truax is effectively suggesting that, rather than regard the eight
channels of an octaphonic fixed medium as one single CASS, one might
compose four independent two-channel sets. Each two-channel CASS
represents a homogeneous stereophonic image in its own right; the temporal
relationship between the four stereo sets is fixed onto the medium, but no
spatial relationship is defined at this stage. Harrison’s and Truax’s
respective subdivisions of the eight available fixed medium channels into
coherent audio source sets, along with some further arbitrary examples, are
illustrated in Figure 5, below.

141 Some examples of the various coherent loudspeaker sets used to decode such CASSs will be described later, in section 3.7. For the time being, we will focus only on the CASSs themselves.
142 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.
143 Truax (1998). "Composition and Diffusion - Space in Sound in Space". Organised Sound, 3(2): 145, 141.

Figure 5. Some arbitrary examples of the subdivision of eight ADAT audio channels into coherent audio source sets: Jonty Harrison’s Rock ‘n’ Roll (2-ch. CASS + 6-ch. CASS); Barry Truax’s proposed 2 + 2 + 2 + 2 ch. CASSs; all channels treated as one 8-channel CASS; 3 CASSs of 1 + 3 + 4 channels; and 8 channels treated as 8 one-channel CASSs.

These examples serve to demonstrate the true value of the expression
‘coherent audio source set,’ insofar as coherent audio source sets can be
independent of the number of discrete audio channels supported by a fixed
(or any other) medium. Nonetheless it should be restated that (at present) the
majority of fixed-medium works do indeed treat all of the available channels
as one homogeneous CASS.
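The groupings shown in Figure 5 are straightforward to express as data. The following Python sketch is entirely hypothetical in its naming – channel indices 0–7 simply stand for the eight ADAT tracks – and merely restates the subdivisions above, checking that each grouping accounts for every channel exactly once.

```python
# Hypothetical sketch: the CASS groupings of Figure 5 expressed as data.

rock_n_roll = [(0, 1), (2, 3, 4, 5, 6, 7)]        # 2-ch. CASS + 6-ch. CASS
truax_layers = [(0, 1), (2, 3), (4, 5), (6, 7)]   # four stereo CASSs
single_image = [tuple(range(8))]                  # one 8-channel CASS
point_sources = [(c,) for c in range(8)]          # eight 1-channel CASSs

def validate(grouping, total_channels=8):
    """A grouping should account for every medium channel exactly once."""
    used = sorted(c for cass in grouping for c in cass)
    assert used == list(range(total_channels))

for g in (rock_n_roll, truax_layers, single_image, point_sources):
    validate(g)
```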

Of course a coherent audio source set need not consist of audio channels
encoded onto a fixed medium at all: the concept applies to all encoded audio
streams, whether static or transitory, analogue or digital (see section 1.6).
Consider the case of an instrumentalist, whose performance is being
encoded via a near-coincident stereophonic pair of microphones. The two
transitory-analogue encoded signals – one for each microphone – can (and
probably should) be treated as a coherent audio source set.

A coherent audio source set may also comprise arbitrarily grouped encoded
audio channels from a number of different sources. Arguably, this could be
said to be the case if two transitory microphone signals are collectively
regarded as a single stereophonic CASS, as illustrated in Figure 4, above.
Alternatively, the sources of CASS channels could be more disparate. For
instance, the total of four encoded audio channels from two compact discs
could be conglomerated into a single, four-channel, CASS. In this case, the
CASS might also be said to possess one level of hierarchy: two stereophonic
(sub-)CASSs aggregated into one four-channel CASS. Such possibilities
will be described more fully in the next section.

3.6. CASS Attributes

Having introduced the concept of the CASS, we can now summarise some
of its key characteristics:

3.6.1. Consists of an Arbitrary Number of Discrete Encoded Audio Streams Treated Collectively as a Single Entity

A CASS consists of an arbitrary number of discrete but fundamentally
inter-related encoded audio streams. Very often this number will remain
constant, for any given CASS, throughout the duration of a work, but
this does not necessarily have to be the case. It would theoretically be
possible, for instance, for a CASS to be composed of, say, six fixed
medium channels and then, at some point later in the work, be
augmented to eight channels. Although the author is unaware of any
instances of this at the time of writing it is, at least, a theoretical
possibility. Notwithstanding this, in many (most) cases a CASS will
comprise a fixed number of audio channels.

3.6.2. Independent of the Number of Encoded Audio Streams provided by any Single Audio Source

In section 1.6 it was observed that electroacoustic music deals almost
exclusively in audio streams that are statically or transitorily encoded.
The two encoded signals from a stereophonic microphone could
justifiably be regarded as a single stereophonic CASS, thus defining a
constant relationship between the constituent channels; this is fairly
intuitive. What is perhaps less intuitive is that, conceptually, a CASS
need not consist of the sum total of independent audio channels from
any single ‘audio source’ (such as a microphone, or multichannel
ADAT) and may therefore consist of fewer, or more, channels than this.
Some examples of the former case were discussed in section 3.5: the
eight channels of an ADAT tape could be conceptually grouped into
four two-channel CASSs, each of course containing fewer channels than
the total available on the (in this case, fixed medium) source. For a
CASS to contain more channels than the number available from any
single source, it would clearly have to be composed of audio channels
from more than one source, and the ‘nested’ CASSs might therefore be
regarded as hierarchically subordinate to the ‘aggregated’ CASS. This
latter case will be further clarified in section 3.6.6.

3.6.3. Constituent Channels Contain Phantom Imaging

One of the most important defining characteristics of a CASS is that
phantom imaging techniques may be present (read: almost certainly will
be present) across its constituent channels. The relationship between
these channels is therefore not arbitrary, but constant. It is primarily for
this reason that, in the vast majority of cases, all of the channels of a
CASS must be present at all times during reproduction if the coherent
bond between them is to be maintained to any reasonable degree.

3.6.4. Order of Channels is Spatially Determinate

Another characteristic of the CASS – strongly related to phantom
imaging – is that, typically, it abstractly defines a hypothetical
‘sounding space’ which is delineated by the constituent audio channels.
Taking the amplitude panning method of phantom imaging as an example,
it would be true to say that signal in one CASS channel abstractly denotes
sound at a different location from that implied by signal in another channel. To
put it another way, the distribution of signals across the channels of the
CASS (and perhaps also the spectral characteristics of those signals,
which would be more relevant to a binaural stereophonic CASS for
example), has a constant relationship with the relative ‘location’ of the
sound at any given moment upon reproduction. This is equivalent to the
notion of ‘order’ in coherent loudspeaker sets, which will be described
in section 3.8.2.

3.6.5. Reliance on a Specific Coherent Loudspeaker Set

Following on from the previous item in particular it is true to say that,
ultimately, the efficacy of a CASS upon reproduction depends on the
positional relationships between the loudspeakers representing each
constituent channel being, more or less, constant and, to a certain extent,
conforming to the notionally ‘ideal’ configuration for that particular
CASS. This does not mean that the geometrical positions of the
loudspeakers must be absolute, but rather that, at any given moment
during diffusion, the loudspeakers being used to recreate the CASS
must be spatially arranged relative to each other in a formation that at
least in some way resembles the relationship assumed by the CASS.
Obviously what exactly constitutes a ‘related’ spatial formation in this
context varies from composer to composer, but (for example) for a
stereo CASS to be fully realised it would need to be broadcast over two
loudspeakers between which effective phantom imaging could take
place to some extent. This particular condition of the coherent bond
between CASS channels demonstrates the close relationship between
CASS and CLS, and will be discussed again in section 3.7.

3.6.6. Hierarchical Organisation

One aspect of the CASS model that is particularly interesting from the
point of view of diffusion, is that it is conceptually hierarchical. That is
to say, a CASS could theoretically be composed of two independent
sub-CASSs. Consider the case of an electroacoustic work for two-
channel tape and a live, stereo-miked, instrumentalist. Here, in the most
obvious case, we are dealing with two independent, two-channel,
coherent audio source sets. Each CASS exhibits phantom imaging
across its constituent channels as well as all of the other conditions of
the coherent bond; this state of affairs is represented pictorially in
Figure 6.

Figure 6. Graphical representation of two stereophonic CASSs in an electroacoustic work for processed instrument and tape: one set of channels from tape, the other from processed microphone signals.

Alternatively, the total of four channels (two tape channels plus two
microphone signals) could effectively be treated as one single CASS. In
doing so, a constant arbitrary relationship – or relative bond – between
the two homogeneous stereo images would be defined: perhaps the left
and right channels of each are mixed; perhaps the tape channels become
the ‘fronts’ and the microphone channels the ‘rears’ of a quadraphonic-
style setup; perhaps the two stereo images are mixed but the lefts and
rights of each are reversed with respect to each other; the possibilities
are numerous. In encapsulating these two CASSs within an aggregated
CASS, the relative bond between the constituents would be maintained
in diffusion, effectively treating the two notionally independent images
as a single entity. This would result in a CASS with one level of
hierarchy, as illustrated in Figure 7, below. This specific example would
allow for combined tape and live processed material to be diffused as a
single source and represents an example of a CASS containing more
channels than those provided by any single medium/source.

Figure 7. A CASS with one level of hierarchy, consisting of two nested CASSs (tape and microphone signals) linked by a constant relative bond.

Equally (although perhaps more bizarrely) the four channels in the
previous example could be treated as one three-channel CASS and an
independent monophonic CASS. Of course, this does not mean to say
that this configuration would necessarily be appropriate, but merely that
it is a theoretical possibility. Essentially, to summarise, the number and
nature of the audio signals within a CASS are arbitrary, and therefore
any collection of signals could, theoretically, be treated as a CASS. It is
the responsibility of the performer (diffuser) to make appropriate
decisions in this regard.
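The hierarchical arrangement just described (and illustrated in Figure 7) can be sketched as a simple recursive structure. In the following Python fragment – all class and function names are invented for illustration – applying a single diffusion gain to the aggregated CASS scales every constituent channel identically, thereby preserving the relative bond between the nested images.

```python
# Hypothetical sketch of a hierarchical CASS (cf. Figure 7).

class CASS:
    def __init__(self, channels, sub_sets=()):
        self.channels = list(channels)   # direct member channels
        self.sub_sets = list(sub_sets)   # nested, hierarchically subordinate CASSs

    def all_channels(self):
        chans = list(self.channels)
        for sub in self.sub_sets:
            chans.extend(sub.all_channels())
        return chans

tape = CASS(['tape-L', 'tape-R'])
mics = CASS(['mic-L', 'mic-R'])
aggregate = CASS([], sub_sets=[tape, mics])

def diffuse(cass, gain, frame):
    # Scaling every constituent channel by the same gain treats the two
    # notionally independent stereo images as a single entity.
    return {ch: frame[ch] * gain for ch in cass.all_channels()}

frame = {ch: 0.25 for ch in aggregate.all_channels()}
louder = diffuse(aggregate, 1.5, frame)   # one gesture moves both images
```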

It seems reasonable to suggest that, under any circumstances, the existence
of a coherent audio source set is the result of an intentional act, and that the
set will have certain indigenous properties that are important to its effective
realisation upon reproduction; such is the conditional bond between its
constituent channels. A stereophonic electroacoustic work, as a specific
two-channel CASS, will most likely contain subtly executed phantom
imaging that has been carefully prepared in the studio. Similarly, an
engineer recording a piano recital in the studio will take care over the
precise positioning of microphones, monitoring the signal and making
adjustments and refinements until the perfect stereo image is obtained. Such
procedures are not arbitrary acts, but precise and intentional ones and, in the
context of electroacoustic music performance, it is important that such
coherent audio source sets are handled sensitively, and in a manner that
affords respect to the craft of their creation: on this aspect most practitioners
would agree. To give a crude example, it would probably be inappropriate
to fail to reproduce the centre channel of a 5.1 channel CASS, or to
arbitrarily present images intended for the surround speakers at the front and
vice versa (of course some composers may not object as strongly as others
to such an approach). It would be equally inappropriate to reproduce a tape
piece intended for a standard 5.1 configuration over an array of
loudspeakers positioned completely differently with respect to each other:
the resulting phantom images would bear little or no resemblance to those
intended by the composer.144 Overall, it is true to say that the coherent bond
between CASS channels is conditional because in order for the desired
effect to be realised, or even approximated, the coherent audio source set
must be treated as such, that is, it must be dealt with in a way that
acknowledges the criteria on which its coherency and homogeneity rests. An
important aspect of this concerns the relative positions of the loudspeakers
used to decode the CASS. It is primarily in this respect that the notions of
CASS and CLS are fundamentally interrelated.

3.7. Coherent Loudspeaker Sets (CLS)

A coherent loudspeaker set (CLS) is an arbitrary group of loudspeakers
over which a CASS can be broadcast without unduly disrupting its
coherency.

144 Of course the notion of ‘reproducing what the composer intended’ is not quite as simple as this statement seems to imply, because – as discussed in Chapters 1 and 2 – composers have a propensity to intend very different things with respect to each other, and also tend to have very different ideas with respect to how their compositions should be realised in diffusion. We will return to this matter later.

Stereo is, once again, a simple and convenient exemplary paradigm. In a
stereophonic CASS (that is to say, in most two-channel stereophonic
recordings) there are two independent encoded audio streams. It is usual in
such cases for the two independent channels to be used collectively to
encode spatial attributes with respect to a single axis, as described in section
1.6.2: this is why, collectively, they constitute a CASS. Because the audio
streams are encoded, it is assumed that – in order for the encoding to have
served any useful purpose – they will at some point be decoded via the
process of reproduction over loudspeakers. But in order for the intended
spatial effect to be correctly recreated (i.e. for the coherency of the CASS to
be preserved), some fairly specific conditions with respect to the number,
and relative positioning, of loudspeakers, must be observed. If this is the
case, then the group of loudspeakers in question can be regarded as a CLS.
This is the crucial point of contact between CASS and CLS and represents
the most important respect in which they are related.

In composing a stereophonic work, the composer is likely to have based his
or her compositional actions, to some extent, on a certain set of assumptions
with respect to how the CASS will be reproduced over loudspeakers in
performance. In this case, two matched loudspeakers are probably assumed.
These should be subtended at an ideal angle of between sixty145 and one-
hundred-and-twenty146 degrees (depending on whose research is observed)
with respect to the ideal listening location; certainly one loudspeaker should
be positioned further to the ‘left’ than the other. Ideally the listener should
be facing directly forwards and situated at a point in front of the
loudspeakers and equidistant from each of them. The ‘left’ channel of the
CASS should be fed to the left-most loudspeaker, and the ‘right’ channel to
the right-most, and the two constituent signals of the CASS should be
subjected to equal amounts of amplification (although some diffusers –
particularly if they are top-down practitioners – might dispute this final
point; this will be discussed later). These conditions collectively represent
the criteria for a CLS that would be appropriate for the reproduction of a
typical stereophonic CASS, as illustrated in Figure 8.

145 Malham (1998). "Approaches to Spatialisation". Organised Sound, 3(2): 174.
146 Küpper (1998). "Analysis of the Spatial Parameter: Psychoacoustic Measurements in Sound Cupolas". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 190.

Figure 8. ‘Ideal’ CLS conditions with respect to maintaining typical stereophonic CASS coherency: left and right loudspeakers subtending an angle of 60º–120º at the listener.
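The subtended-angle condition is easy to verify computationally. The following Python sketch is a hypothetical helper – it assumes a simple two-dimensional coordinate system in metres – which computes the angle subtended at the listener by a loudspeaker pair and tests it against the sixty-to-one-hundred-and-twenty degree range cited above.

```python
import math

def subtended_angle(left_spk, right_spk, listener):
    """Angle (degrees) subtended at the listener by a loudspeaker pair.

    Positions are (x, y) tuples in metres (a hypothetical convention).
    """
    def bearing(p):
        return math.atan2(p[1] - listener[1], p[0] - listener[0])
    angle = abs(bearing(left_spk) - bearing(right_spk))
    return math.degrees(min(angle, 2 * math.pi - angle))

# Loudspeakers 2 m either side of centre, 3 m in front of a listener at (0, 0):
theta = subtended_angle((-2.0, 3.0), (2.0, 3.0), (0.0, 0.0))
print(round(theta), 60 <= theta <= 120)   # ~67 degrees: within the ideal range
```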

The list of CLS conditions for the successful reproduction of a stereophonic
CASS could be augmented further still, but it is worth posing the question,
‘What are the most important conditions?’ This is, of course, a matter of
opinion above all else. For certain, if all of the criteria described are met
(and, it should be noted, this will probably only be possible in the case of a
single ideally positioned listener) then the ‘best’ phantom imaging results
will be achieved. But is it really necessary to meet all of these criteria?
Before considering the answer to this question, some further CLS
configurations will be considered.

Another CLS configuration in common use is the 5.1 array. In composing
for such an array it is likely to have been assumed that five – ideally
matched – loudspeakers and a sub-woofer will be made available during
performance, that their relative positions with respect to the audience and to
each other will more-or-less conform to the accepted standard for 5.1 sound
reproduction – as illustrated in Figure 9 – and that the CASS signals will be
routed appropriately to their corresponding loudspeaker outputs.

Figure 9. Relative loudspeaker positions in a standard 5.1 CLS (excluding sub-woofer): L, C and R in front of the listener, Ls and Rs behind (indicated angles: 30º at the front, 140º at the rear).

Eight-channel (octaphonic) CLSs are also fairly common in electroacoustic
music, although, in contrast with stereophonic and 5.1 configurations, there
is less unanimity with regard to the relative loudspeaker positions in an
octaphonic array. Harrison’s octaphonic tape work Streams assumes the
‘main eight’ arrangement of loudspeakers as frequently deployed by
Birmingham Electro-Acoustic Sound Theatre (BEAST) in concerts of
electroacoustic music; this configuration is illustrated in Figure 10.

Figure 10. BEAST ‘Main Eight’ octaphonic loudspeaker configuration.

Wyatt states that ‘three different loudspeaker positionings for eight-channel
systems are in common use.’147 As well as reiterating BEAST’s ‘main eight’
octaphonic configuration, Wyatt describes two further possible
arrangements of eight loudspeakers: one in a circular formation around the
audience and one rectangular array. These configurations are shown in
Figure 11 and Figure 12 respectively. Again, in order for the spatial imaging
to remain notionally ‘true’ to what was composed in the studio (the notion
of a theoretically ‘true’ reproduction will be discussed at length later) the
loudspeaker positions and signal routings during performance must be the
same as those assumed by the CASS.

Figure 11. Circular octaphonic loudspeaker array as described by Wyatt, with adjacent loudspeakers separated by 45º around the listener.148

147 Wyatt, et al. (1999). "Investigative Studies on Sound Diffusion / Projection". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/Investigative.htm.
148 Ibid.

Figure 12. Rectangular octaphonic loudspeaker array as described by Wyatt (indicated dimensions: 212.5 cm and 376.25 cm; the diffuser’s position is marked).149

3.8. CLS Attributes

Let us now return to the question posed previously: what are the absolutely
indispensable characteristics of a CLS? One could argue that – at the most
basic level – the most important criteria are: that the number of
loudspeakers contained within the CLS at least equals the number of
channels contained within the CASS to be reproduced;150 and that each
loudspeaker is placed in a different spatial location from the others (it is, of
course, physically impossible to negate this latter criterion). Realistically,
however, such minimal criteria would probably be considered inadequate by
most electroacoustic musicians.

149 Ibid.
150 There are exceptions to this generalisation. Ambisonic B-format (assuming first-order encoding) consists of four encoded audio streams that can be transcoded into an (essentially) arbitrary number of loudspeaker feed signals. If the Ambisonic B-format is regarded as the CASS, then the number of loudspeakers will almost invariably exceed the number of channels contained within the (B-format) CASS. However, in order to be reproduced, the Ambisonic B-format must be transcoded into a number of signals suitable for decoding via a specific loudspeaker array, and in this case the number of loudspeakers must indeed be equal to the number of channels present in the (transcoded) CASS. In this respect, Ambisonic B-format can be viewed as a further abstraction of the CASS itself. Ambisonics aside, a case could (at the very least theoretically) be made for using a CLS with fewer channels than the CASS in question. This would, of course, result in the loss of a certain amount of spatial information, but could be regarded as equivalent to producing an alternative ‘mix’ of the electroacoustic work, designed for reproduction over a smaller number of loudspeakers. The author’s own Graffiti 2 (2003), for example, exists in both two-channel stereo and 5.1 versions.
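The transcoding described in footnote 150 can be sketched in outline. The following Python fragment is a much-simplified horizontal first-order decode – FuMa-style weighting of the W channel is assumed, scaling conventions vary considerably between real decoders, and the function name is invented – but it illustrates the essential point: the same B-format CASS can be transcoded into feeds for a loudspeaker ring of any size.

```python
import math

def decode_b_format(w, x, y, speaker_azimuths):
    """Transcode horizontal first-order B-format (Z omitted) into N feeds.

    A simplified decode for a regular horizontal ring; real decoders are
    considerably more sophisticated.
    """
    n = len(speaker_azimuths)
    return [(2.0 / n) * (w * math.sqrt(2) + x * math.cos(az) + y * math.sin(az))
            for az in speaker_azimuths]

# The same three signals can feed an octagon, a hexagon, or any other ring:
octagon = [math.radians(a) for a in range(0, 360, 45)]
feeds = decode_b_format(0.707, 1.0, 0.0, octagon)  # a source encoded straight ahead
```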

3.8.1. Specific Shape

It therefore seems reasonable to suggest that there should also be a
certain consistency with respect to the relative positioning of individual
loudspeakers within a CLS. In the case of a stereophonic CLS it is
proposed that the two loudspeakers should collectively delineate a
single axis. The existence of this axis may be (and usually is) reinforced
by training each of the loudspeakers on a single ‘focal point,’ normally
located at some point along a second axis that is at right-angles to the
centre of the ‘main’ axis: we will return to this notion later. In 5.1, the
five full-range loudspeakers are not usually positioned arbitrarily, but
such that they conform to a particular ‘shape’ (see Figure 9). In terms of
its essential characteristics, this particular CLS formation can be seen to
demarcate a plane (as opposed to a single axis) with a trapezium-shaped
outline. It should also be noted that, within this trapezoid architecture,
there are two ‘sub-groups’ of loudspeakers, which occupy the parallel
sides: one group of three ‘frontal’ loudspeakers, and one group of two
‘rear’ loudspeakers. In Figure 9 all five full-range loudspeakers share a
common, central, focal point, but it is also common for each of the two
sub-groups to have focal points at different places along a central ‘front-
to-back’ axis. In this case, it could be argued that the ‘frontal’ and ‘rear’
loudspeakers represent two hierarchically subordinate CLSs contained
within one, hierarchically superior, CLS.

The ‘shape’ of a coherent loudspeaker set, therefore, refers to the
physical positioning of the constituent loudspeakers, relative to each
other, within a venue. The direction in which loudspeakers are facing
can also be a criterion of the shape of a CLS, although this is less
important than the relative positions. For example, in some cases a
common ‘focal point’ will be desirable, while in other cases it may be
less necessary. In Figure 9, the five full-range loudspeakers in the 5.1
CLS share a common focal point. In Figure 13 the loudspeakers do not
share a common focal point, but they do still conform to the generally
accepted shape of the 5.1 CLS. It is also important to note that the
‘correct’ shape for a CLS is determined by certain aspects of the
coherent bond of the CASS that is to be decoded through it. It would, of
course, also be true to say that the conditions of the CASS are
formulated in response to the shape of the CLS that is used during the
compositional process. Stereo and 5.1 configurations are convenient
examples because they are conventional, but in a sense they are
misleading as they suggest that the nature of coherent loudspeaker sets
determines the nature of coherent audio source sets. In abstract (non-
conventional) terms this is not the case: the relationship between CASS
and CLS is one of co-dependency, and in this respect CASS and CLS
define each other.

Figure 13. Five loudspeakers within a CLS. The loudspeakers do not have a common focal point but the CLS still conforms to the conventional 5.1 shape.

3.8.2. CASS Channels must be Appropriately Routed to CLS Loudspeakers

Another characteristic of both CLSs described so far is that the
constituent loudspeakers exist in a spatially-differentiated order: in
stereo, we have ‘left’ and ‘right’; in 5.1 we have ‘left,’ ‘right,’ ‘centre,’
‘left-surround,’ ‘right-surround,’ and ‘LFE’ (the latter of which is not,
in fact, spatially differentiated: according to certain researchers, the
rationale behind this is questionable151). In both cases the order is

151 It is commonly stated that, below a certain threshold frequency, localisation of sound sources in terms of directionality becomes impossible. On this basis, it can be suggested that the spatial positioning of the LFE loudspeaker – because it only emits very low frequencies – is unimportant. According to certain sources, however, recent research implies that this may not, in fact, be the case: Malham (2001). "Toward Reality Equivalence in Spatial Sound Diffusion". Computer Music Journal, 25(4): 34.
rationally connected to the relative positions of the individual
loudspeakers relative to the overall CLS shape: the ‘lefts’ are positioned
further ‘to the left’ than the ‘rights’; ‘centre’ is central in relation to
‘left’ and ‘right,’ and so on.

In section 3.6.4 it was stated that the order of the constituent channels
of a CASS is spatially determinate: there is a constant relationship
between the individual encoded streams that defines the intended spatio-
auditory attributes on decoding. In order for these spatio-auditory
attributes to be decoded successfully, the loudspeakers will need to be
correctly positioned in relation to each other, and the individual CASS
channels will need to be routed to the correct CLS loudspeakers. An
exemplary description of the relationship between a stereophonic
CASS, and a stereophonic CLS, was given at the beginning of section
3.7.

3.8.3. Reliance on a Specific Coherent Audio Source Set

The previous two sections articulate an important relationship: above
all, a CLS is effectively meaningless in the absence of a CASS whose
conditions match those provided by the CLS. The inverse, as described
in section 3.6.5, is also true: a CASS is meaningless unless there is a
CLS whose characteristics will allow for it to be decoded acceptably. In
this respect the two concepts are co-defining and, therefore, depend on
each other.

3.8.4. Multiple CLSs can Exist within a Single Loudspeaker Array

It is conceptually useful to differentiate coherent loudspeaker sets from
loudspeaker arrays: the two expressions are not synonymous. A
loudspeaker array should be understood to mean the total number, and
formation, of loudspeakers present within a sound diffusion system; a
CLS can be regarded as a hierarchical subdivision of this. An
appropriate example would be the ‘main eight’ loudspeaker array
commonly utilised in the BEAST sound diffusion system; this is
illustrated in Figure 14, below, and will be described further in section
3.12.1. It is true to state that this array was originally conceived to be
appropriate for the decoding of stereophonic CASSs. Accordingly, the
total number of loudspeakers in the array (eight) is sub-divided into four
coherent loudspeaker sets consisting of two loudspeakers each: in
Figure 14 these are labelled ‘distant pair,’ ‘main pair,’ ‘wide pair,’ and
‘rear pair,’ respectively. Notice that each of the CLSs is similar in
shape, meaning that each individual CLS is appropriate for the coherent
reproduction of a two-channel stereophonic CASS. This kind of practice
will later be defined as ‘pluriphony.’

Figure 14. The BEAST ‘Main Eight’ array of loudspeakers, consisting of four two-speaker CLSs: the distant, main, and wide pairs in front of the audience, and the rear pair behind.

This does not mean that a loudspeaker array has to be subdivided into
several independent CLSs: it could, indeed, be treated as a single CLS.
If a stereophonic CASS is broadcast via a single stereophonic CLS, and

141
there are no other loudspeakers present in the array, then the array
represents a single CLS only. Similarly, if a 5.1 composition is
reproduced over a standard 5.1 array, with only one loudspeaker present
for each CASS channel, then the array, in its entirety, represents a single
CLS that conforms to the conditions of the coherent bond between the
constituent channels of a 5.1 CASS. These scenarios were illustrated in
Figure 8 and Figure 9, respectively. As mentioned previously,
Harrison’s octaphonic work Streams treats the BEAST ‘main eight’
array as one single, eight-channel, CLS, as opposed to four groups of
two.

3.8.5. A Single Loudspeaker can be a Member of Multiple CLSs

Because the coherent loudspeaker set is simply a conceptual grouping of
loudspeakers not necessarily consisting of the total number of
loudspeakers in an array, any given loudspeaker within an array can
theoretically be a member of multiple coherent loudspeaker sets. Figure
15 illustrates a circular eight-channel loudspeaker array152 in which the
front-centre loudspeaker is treated as a member of several arbitrarily
chosen stereophonic CLSs.

152 As proposed by Wyatt: see page 136.

Figure 15. Four out of the theoretical fourteen possible stereophonic CLSs involving the centre-front loudspeaker in a circular octaphonic loudspeaker array.

In this case, each individual loudspeaker can be a member of seven
different coherent loudspeaker sets, this number being doubled if
inverted left-to-right audio source routings are considered. Whether or
not all of these hypothetical CLSs would actually be appropriate for the
accurate broadcast of stereophonic CASSs is, of course, a matter of
debate, and may well vary depending on the piece being diffused and
the attitudes of the diffuser.
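The arithmetic behind this claim is easily verified. In the following Python sketch – the loudspeaker labels are hypothetical – ordered pairs drawn from the eight loudspeakers of the circular array are enumerated: seven partners for the centre-front loudspeaker, doubled to fourteen once inverted left-to-right routings are included.

```python
from itertools import permutations

# Hypothetical labels for the eight loudspeakers of a circular array:
speakers = ['CF', 'FR', 'R', 'BR', 'CB', 'BL', 'L', 'FL']

# Ordered pairs, so that each pair is counted once per left/right routing.
cls_candidates = [p for p in permutations(speakers, 2) if 'CF' in p]
print(len(cls_candidates))   # 14
```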

3.8.6. Hierarchical Organisation

Related to the previous item, coherent loudspeaker sets can be organised
hierarchically (and as described in section 3.6.6, this is also the case
with coherent audio source sets). For instance, in defining a 5.1 CLS,
one has conceptually grouped a number of independent loudspeakers
together on the basis that they are likely to be utilised for the broadcast
of several similarly grouped encoded audio channels. It would,
theoretically, be possible to further define ‘sub-groups’ within that CLS;
for example, the three frontal, and the two rear, loudspeakers. This
particular example is illustrated in Figure 16 below.

Figure 16. Five loudspeakers collectively constituting a CLS (CLS 1), within which there are two further sub-CLSs (CLS 1.a and CLS 1.b).

Such an approach could be useful, for example, if the piece in question
treats the three frontal speakers and the two rear speakers as essentially
separate multichannel images in some parts, but as a single five-channel
image in others.

3.9. Sound Diffusion: A Re-Evaluation in terms of the Concepts of CASS and CLS

Models of the coherent audio source set and the coherent loudspeaker set
have been proposed as useful tools in helping to define the relationship
between audio sources (those technologies used to encode, manipulate, and
transmit audio streams) and loudspeakers (those technologies used to
decode audio streams). In mediating this particular relationship it seems
reasonable to suggest that these concepts should be particularly useful
within the context of sound diffusion, whose basic task is also to mediate
between audio sources and audio decoders. Accordingly, it can now be
proposed with more clarity that the process of sound diffusion involves the
decoding of one or more coherent audio source sets via one or more

144
coherent loudspeaker sets, with one of the principal aims being to maintain
the coherency of the source set(s) throughout the process. As outlined in
section 3.3, this objective is often complicated by acoustic issues. The
remaining sections within the present chapter will be devoted to an
examination of the differing approaches by which top-down and bottom-up
practitioners seek to negotiate these difficulties as a means to achieving the
ultimate objective of musical communication.

3.10. ‘Diffusion’ versus ‘Panning’

It is a commonly held misconception that ‘diffusion’ and ‘panning’ (i.e. the
act of artificially applying spatial attributes to encoded audio streams) are in
some way synonymous. In an interview with the author, Harrison identified
this misconception at a very early stage:

JM: What does ‘sound diffusion’ mean to you?

JH: What it means to me, is to do with not just space. That’s the first thing to
say. I think one of the problems is that when people talk about diffusion and
spatialisation, it’s as though it’s something being added to the music, that’s not
already inherent in the music. To me, whether spatial, or any other thing that
you apply to the material coming from CD or tape, it should be in the spirit of
the music. So space, of course, comes into it, but it is not the only aspect. […]
The problem is that a lot of people think that diffusion and spatialisation are
synonymous, and I don’t think they are; I think they’re different things.153

Here, Harrison touches upon what could be one of the fundamental reasons
behind the hostility of bottom-up composers towards the top-down style of
diffusion: the misconception that something is being inappropriately and
arbitrarily ‘added’ to their carefully constructed music. In this context it
may be useful, therefore, to differentiate between the predominantly
performance-oriented practice of diffusion, and the conceptually different
practice of panning.

A brief examination of the essential nature of ‘panning’ might lead one to
propose that it represents part of the process of constructing a coherent
audio source set. For example, a monophonic sound object might be placed
and moved around within a stereophonic CASS using the pan-pots on a
mixing desk, or their software equivalents. The final stereophonic CASS
might ultimately comprise several monophonic sound objects panned across
its two constituent channels in this way. By extension, a stereophonic source
recording may be panned within an octaphonic CASS via a similar
procedure. Indeed any number of source materials may be panned within the
same CASS, in a process that basically entails placing or moving audio
sources (the individual ‘sound objects’ that constitute the CASS) within a
virtual space represented by a greater, or at least equal, number of channels.
The process of diffusion, on the other hand, is more likely to be concerned
with the presentation of one or more ‘complete’ coherent audio source sets
already containing multiple ‘panned’ sound objects across their constituent
channels. In this respect panning is a ‘lower level’ practice than diffusion.

153 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

This does not mean that the process of ‘panning,’ as such, is restricted to the
confines of the studio: in certain performances of electroacoustic music,
particularly where live audio sources are involved, an element of ‘live
panning’ (as distinct from ‘diffusion’) may be entirely appropriate.
Intuitively, and generally, it would seem to make more sense for singly
identifiable sound sources to be ‘panned,’ and coherent audio source sets to
be ‘diffused.’ As a hypothetical example, consider the case of an
electroacoustic work for quadraphonic tape and (monophonic) synthesised
sounds to be diffused over an array of twelve loudspeakers as illustrated in
Figure 17, below. The tape part – a four-channel CASS – can be diffused
over the twelve loudspeakers in such a way that its coherency is maintained
throughout the performance. In doing so, it is likely that coherent sets of
four loudspeakers will be used. Those loudspeakers highlighted in different
colours in Figure 17 are appropriate examples. The monophonic signal from
the synthesiser, however, can essentially be panned around the output
channels of what is, effectively, a twelve-channel CASS, decoded in real-
time via a single coherent loudspeaker set of twelve. The tape part is treated
as a CASS, while the synthesiser part is treated more like a ‘point source.’ A
scenario such as this demonstrates the usefulness of differentiating the
processes of ‘panning’ and ‘diffusion’ as described.

Figure 17. Hypothetical 12-speaker diffusion array, subdivided into three ‘coherent loudspeaker sets’ of four loudspeakers each. A four-channel CASS could be diffused via these three CLSs, while a monophonic source could be panned within a twelve-channel CASS represented by a single CLS consisting of all of the loudspeakers contained within the array.
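This distinction can be summarised computationally. In the following hypothetical Python sketch – the speaker groupings and all names are invented for illustration – the quadraphonic tape CASS is diffused, with whole-image gains applied per four-loudspeaker CLS, while the monophonic synthesiser signal is panned by distributing it, with per-loudspeaker weights, across what is effectively a twelve-channel CASS.

```python
NUM_SPEAKERS = 12
CLS_GROUPS = {'front': [0, 1, 2, 3], 'mid': [4, 5, 6, 7], 'rear': [8, 9, 10, 11]}

def diffuse_quad(tape_frame, cls_gains):
    """Diffusion: one gain per CLS scales the whole four-channel image."""
    out = [0.0] * NUM_SPEAKERS
    for name, spks in CLS_GROUPS.items():
        for channel, spk in enumerate(spks):
            out[spk] += tape_frame[channel] * cls_gains[name]
    return out

def pan_mono(sample, weights):
    """Panning: a point source distributed with one weight per speaker."""
    return [sample * w for w in weights]

tape_out = diffuse_quad([0.1] * 4, {'front': 1.0, 'mid': 0.5, 'rear': 0.0})
synth_out = pan_mono(0.2, [0.0] * 11 + [1.0])   # synth placed at one speaker
mix = [t + s for t, s in zip(tape_out, synth_out)]
```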

Clearly there is the potential for a certain degree of cross-over
between these two expressions. As previously mentioned, a stereophonic
source could conceivably be ‘panned’ around, say, an eight-channel CASS,
and in doing so one would expect that some attempts be made to maintain
the original coherency of the stereophonic sound source within the CASS.
This might be thought of as another example of the hierarchical nature of
the CASS as described in section 3.6.6. Additionally the possibility of
‘panning’ – perhaps more readily identifiable as a compositional practice –
as an appropriate aspect of the diffusion of certain electroacoustic works,
represents a sense in which both composition and performance exist within
a continuum: this will be more fully described in section 3.13.

3.11. Uniphony and Pluriphony

Roads et al use the expression ‘pluriphony’ to denote the act of diffusing a
stereophonic audio source via multiple stereophonic loudspeaker pairs.154 In
abstract terms, this concept can be extended and generalised as a practice in
which a coherent audio source set is diffused via multiple appropriate
coherent loudspeaker sets. The relationship between CASS and CLS – and
between individual CASS channels and individual CLS channels – can
therefore be described as ‘one to many.’

154 Roads, Kuchera-Morin and Pope (2001). "Research on Spatial and Surround Sound at CREATE". Website available at: http://www.ccmrc.ucsb.edu/wp/SpatialSnd.2.pdf.

Roads et al do not explicitly define pluriphony in terms of its opposite, and
it is for this reason that the term ‘uniphony’ is proposed. In contrast with the
pluriphonic approach, uniphony can be described as a practice in which a
coherent audio source set is presented via a single appropriate coherent
loudspeaker set. The relationship between CASS and CLS (and, again,
between individual CASS channels and individual CLS channels) is
therefore ‘one to one.’ A diagram clarifying the difference between
pluriphony and uniphony, using stereo and 5.1 as examples, is given in
Figure 18, below.

Figure 18. Diagram illustrating the difference between uniphonic and pluriphonic CASS-to-CLS relationships, using stereo (2 channels: L, R) and 5.1 (5 channels: L, C, R, Ls, Rs; LFE channel omitted) as examples.
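The one-to-one and one-to-many relationships can be expressed very compactly. The following Python sketch – the function and CLS names are illustrative assumptions – feeds a single stereophonic CASS to an arbitrary number of stereophonic CLSs, each under an independent gain; uniphony is simply the special case of a single CLS.

```python
def pluriphonic_feeds(left, right, cls_gains):
    """One-to-many: the same stereo image feeds every CLS, scaled per CLS.

    cls_gains: one gain per stereophonic CLS, e.g. {'main': 1.0, 'wide': 0.3}.
    """
    return {name: (left * g, right * g) for name, g in cls_gains.items()}

# Uniphonic: a single CLS receives the image.
print(pluriphonic_feeds(0.5, 0.4, {'main': 1.0}))
# Pluriphonic: mains dominate while wides and distants add breadth and depth.
print(pluriphonic_feeds(0.5, 0.4, {'main': 1.0, 'wide': 0.4, 'distant': 0.1}))
```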

3.12. Top-Down and Bottom-Up Approaches to Sound Diffusion

At the end of section 3.2 an as yet unresolved question was posed: If sound
diffusion is an active process whose fundamental aim is to ensure that an
electroacoustic work is accurately communicated to an audience, then what
exactly does this active process involve? What, specifically, are the acts? It
can be deduced from the previous sections that maintaining the coherency of
the CASS (in the broadest possible sense) is an issue central to the
performance of electroacoustic music in diffusion. As noted in section 3.5,
the successful broadcast of coherent audio source sets, even over the
minimum number of loudspeakers (two for stereo, four for quadraphonic, et
cetera), is dependent on a number of acoustic conditions. Most performance
venues do not fully support these conditions and some of the reasons behind
this were described in section 3.3. It is therefore clear that the public
performance scenario seems, in many cases, to conspire against this central
objective to a certain degree, and that appropriate measures must be taken
to minimise the impact of this. It is proposed that these essential problems
are generally recognised by most composers and practitioners regardless of
their aesthetic directionality (top-down or bottom-up) and that it is in
finding their ultimate solution that fundamental differences in approach
become most obvious. The question therefore becomes, ‘What manner of
coherency are we attempting to preserve, and what might constitute
appropriate measures to achieve this particular kind of coherency?’ This is,
essentially, the issue upon which the whole sound diffusion debate centres.

With certain conceptual frameworks in place, it can now be proposed that
the exact nature of the ‘act’ of sound diffusion represents a continuation of
the overall nature of the ‘act’ of composition itself. In Chapter 2 it was
suggested that the compositional process can, broadly speaking, be
characterised as either top-down or bottom-up. Accordingly, it is now
proposed that the ‘act’ of diffusion can also be either top-down or bottom-
up in nature. That is, the active measures taken to ensure the successful
communication of the work are demonstrably congruent with the overall
notions of either top-down or bottom-up practice. Much of the remainder of
this chapter, and some of the following chapter, will be directed towards
consolidating and exemplifying this assertion.

At this stage it is necessary to recall the continuum that exists between top-
down and bottom-up models of compositional thought – Table 3 on page 84
is a useful refresher should one be necessary – and to consider what might
constitute ‘appropriate measures’ from these two standpoints, and what
might be the nature of the ‘coherency’ that each seeks to preserve. A
considerable portion of electroacoustic work exists on stereophonic fixed
media alone, and this also serves as a convenient and conceptually simple
example through which to explain the basic premises of top-down and
bottom-up attitudes towards sound diffusion. It is proposed that the same basic concepts, however, can be applied to electroacoustic music of any
technological permutation.

3.12.1. The Top-Down Model

One possible solution to many of the problems outlined in section 3.3 is to use more than two loudspeakers, thereby theoretically presenting a
less compromised auditory image of the stereophonic material by
(amongst other things to be described presently) minimising the
distance between audience members and loudspeakers. The use of
multiple loudspeaker pairs also addresses the power issues described
earlier. In abstract terms this solution entails diffusing a CASS over a
number of loudspeakers that is larger than the number of audio
channels that constitute the CASS, that is, pluriphonically. This kind of
approach is advocated by the Birmingham Electro-Acoustic Sound
Theatre (BEAST), whose ‘main eight’ configuration of loudspeakers
(see Figure 14 on page 141) is described by Harrison as ‘the absolute
minimum for the playback of stereo tapes’.155

This circumstance raises the question, then, of how to present a stereo image across more than two loudspeakers. The top-down model of
sound diffusion observes the fact that, outside of the controlled studio
environment (which will most likely be relatively acoustically stable
and populated by a single ideally positioned listener), acoustic detail will
be lost. For example, dynamic subtlety that was obvious in the studio
will be far less so in a larger, less acoustically ideal, performance venue.
In response to this observation, Harrison states the following:

The composer will have indicated relatively louder and quieter events. In
performance I would, at the very least, advocate enhancing these dynamic
strata – making the loud material louder and the quiet material quieter –
and thus stretching out the dynamic range to be something nearer what the
ear expects in a concert situation.156

155 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 121.
156 Ibid.: 120-1.

Smalley concurs:

In a recorded format you can never achieve an ideal dynamic range that
will suit all spaces and contexts; maybe it is not even ideal on two
loudspeakers. And so you need to exaggerate or highlight the high end –
lift the top levels up – and possibly drop the low levels down. Extending
the dynamic range affects people's perceptions of the piece and permits an enhancing of the structural shape.157
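
In signal-processing terms, this 'stretching' amounts to an expansion of the dynamic range about some reference level. The following minimal Python sketch is offered purely as an illustration of the principle – in performance the diffuser achieves this by hand, on the faders, and the reference level and expansion ratio used here are arbitrary assumptions:

import numpy as np

def stretch_dynamics(signal, ref=0.25, ratio=1.5):
    """Expand dynamic range about a reference amplitude: material louder
    than ref is made louder still, quieter material quieter. A ratio
    greater than 1 widens the range; a ratio of 1 leaves it unchanged."""
    envelope = np.abs(signal) + 1e-12          # crude instantaneous envelope
    gain = (envelope / ref) ** (ratio - 1.0)   # > 1 above ref, < 1 below
    return signal * gain

(A practical implementation would smooth the envelope over time to avoid distortion; the point here is simply that loud material is boosted and quiet material attenuated, 'stretching out' the dynamic strata.)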

Such practice already begins to address the question of how to utilise additional pairs of loudspeakers:

For effects of distance (which on the original stereo tape are implied by
careful balancing of amplitude and reverberation characteristics, and
which are very susceptible to being swallowed by (actual) concert hall
acoustics), it is useful to be able to move the sound from close to distant in
reality, following the cue on the tape – hence the distant pair [of
loudspeakers].158

By the same logic, a sweeping pan from
the left to the right of the stereo field, although markedly obvious under
ideal listening circumstances, may well seem like a disproportionately
small gesture in a venue whose physical dimensions and acoustic
characteristics are much larger. It therefore seems reasonable to
exaggerate this gesture in diffusion, in order to properly convey the
effect intended by the composer. Indeed any gesture that is notionally
‘on the tape’ may be lost in a large venue, and should therefore be
exaggerated by the sound diffuser:

If you have something which is […] zapping around all over the stereo
stage, then it seems to me perfectly legitimate to exaggerate that erratic
behaviour over a much bigger loudspeaker system. Hence doing this kind
of thing [makes short rapid hand movements] and wiggling the faders
around in a way that will throw the sound all around the room, in an
erratic manner. That seems to be perfectly in-keeping with the musical
idea… Where you have a real sense of something going [makes a
‘whooshing’ sound and gesticulates from front to back], that’s such a
physical gesture that it’s just screaming to be exaggerated! So you do it
with the faders; you just ‘do more.’ […] So inevitably [diffusion] grows
from a need to reinstate, if you like, in a big space, that which is audible in
a small space but would get lost in a big space.159
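
The spatial analogue can be sketched just as minimally: a pan trajectory that traverses only part of the stereo stage on the tape is pushed towards the extremes before being distributed to the loudspeakers. Again, this Python fragment illustrates the principle only – the scaling factor is an arbitrary assumption, and no real diffusion is so mechanical:

def exaggerate_pan(pan, amount=2.5):
    """Exaggerate a pan position in [-1.0, +1.0] (hard left to hard right),
    clipping at the extremes, much as a diffuser might do on the faders."""
    return max(-1.0, min(1.0, pan * amount))

# A timid sweep from -0.4 to +0.4 becomes a full left-to-right gesture:
print([exaggerate_pan(p) for p in (-0.4, -0.2, 0.0, 0.2, 0.4)])
# [-1.0, -0.5, 0.0, 0.5, 1.0]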

157 Austin (2000). "Sound Diffusion in Composition and Performance - An Interview with Denis Smalley". Computer Music Journal, 24(2): 13.
158 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 121-122.
159 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

We might regard this as a process whose goal is to present musical material in a manner that is appropriate to the nature of the context, or, as Smalley puts it, 'to expand the stereo image and to project it effectively in a large space.'160 The very essence of top-down sound diffusion is that this process necessarily entails a subjective interpretation of the musical material. This is, of course, perfectly in-keeping with the ethos of top-down composition in general, which fundamentally bases its aesthetics on perceptual and subjective realities. Accordingly, proponents of the top-down model of sound diffusion tend to regard the practice, in the truest possible sense, as a continuation of the compositional process itself:

Through analysis, familiarity and understanding of the work, an informed and experienced composer/diffusor/projectionist can present the diffused
work as a continuation of the composer’s musical intent in such a way to
significantly expand the listening experience of that work.161

Certain musics have such physicality already embodied in them, embedded in them, that not to continue that process into diffusion seems to
be a travesty of the piece.162

As the compositional process in this case is effectively based on the premise of abstracting musical structure from 'concrete' materials via a
process of perceptual evaluation, so the process of diffusion consists in
a reiteration of this procedure. In diffusion, the work is treated as a
concrete material, whose subjective evaluation within a particular
context informs the way that the material is treated in performance. This
can only happen via a process of interpretation.

160 Austin (2000). "Sound Diffusion in Composition and Performance - An Interview with Denis Smalley". Computer Music Journal, 24(2): 12.
161 Wyatt, et al. (1999). "Investigative Studies on Sound Diffusion / Projection". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/Investigative.htm.
162 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

Accordingly, aspects that might increase the scope for interpretation in the diffusion of electroacoustic music tend to be embraced by top-down composers, at least to a far greater extent than by their bottom-up counterparts. Loudspeakers that 'colour' the sound, for example, are actively endorsed by certain practitioners on this basis:

Are we really all in agreement […] that the diffusion instrument should be
an ensemble of high fidelity loudspeakers, possessing linear response, and
thus rather lacking in character? […] We are under enormous pressure to
normalize so that compositions may be distributed with guaranteed
conformity. And yet it is precisely those highly original diffusion
instruments, made up of motley mixes of loudspeakers, that have given
pleasure to so many of us.163

One notices a certain rejection of the idea of an ensemble of high-fidelity rigorously homogenized loudspeakers, possessing a near-military
precision of performance and behaviour in their devotion to the common
cause of the composition. To this totalitarian concept of sound-projection,
I prefer the high-infidelity of loudspeaker pairs that allow variable shading
during the diffusion. To the autism of an ensemble of identical
loudspeakers, I prefer the multiracial accents of a disparate gathering.164

Clearly the above quotations are indicative of a broader top-down view of electroacoustic music that extends beyond the mere specifics of its
performance, and are also, characteristically, antagonistic towards the
opposing bottom-up view (use of the word ‘totalitarian’ is particularly
telling in this respect). In the broadest possible terms it can be stated
that top-down approaches to sound diffusion tend to consist in adapting
electroacoustic works to given performance spaces via a subjective
process of interpretation. Often this involves the pluriphonic
presentation of a coherent audio source set via multiple coherent
loudspeaker sets. Some diffusion systems that embody the values
discussed in this section will be described in the next chapter.

3.12.2. The Bottom-Up Model

Another approach to the issue of maintaining coherence in performance would be to tailor the performance context such that any undesirable
colourations of the musical material are minimised. For example, if an
overly reverberant venue interferes with the subtleties of the musical
material, then measures can be taken to reduce the reverberation time,
or even find a more appropriate, less reverberant, venue. If an
audience’s positioning adversely affects their perception of accurate
phantom imaging, then steps can be made towards restricting the

163
Boesch (1998). "Composition / Diffusion in Electroacoustic Music". In Barrière and
Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 221.
164
Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.]
Composition / Diffusion in Electroacoustic Music: 347.

154
auditioning area so that (ideally) all listeners experience the same effect.
(Calls for every member of the audience to be provided with an
individual pair of headphones are not unknown).

Of course, this approach to the performance of electroacoustic music is dependent on a somewhat fixed notion of what a composition 'is,' and it
should be clear that this standpoint follows on logically given what we
know about the nature of bottom-up composition. If an electroacoustic
work has been composed ‘monolithically,’ and particularly if there are
notionally quantifiable relationships within its fabric, then it is of crucial
importance that the ‘truth’ of the work be delivered to the audience in
its pure, unadulterated form, to rearticulate a point raised in section 2.4.
As discussed previously, this model of the musical composition is
conceptually resistant to the notion of interpretation: here, there is no
interpretation; the work is a closed, fixed entity; it can only be
performed ‘correctly,’ or ‘incorrectly,’ with little margin for error. It is
therefore perfectly understandable that composers inclined towards
bottom-up models of composition and diffusion tend to be dubious
about the methods of their top-down colleagues, (perhaps rightly)
suspicious that such practice might destroy the precise relationships
expressed by the musical material. This particular methodological
conflict is described anecdotally, and very effectively, by Harrison, who
quotes Jean Piché as follows:

Phase-aligned fullband systems with enough power to fill larger spaces […] neutralize positional phase-shift and offer a better rendition of the
original compositional intent in the studio. [This] is certainly not perfect
given the conditions of most [electroacoustic] concerts but at least it tries
to provide a ‘neutral’ acoustical front to the audience where the music is of
prime importance, not the ‘artistry’ of the diffuser… When I go to an
electroacoustic concert, I want to hear the music in the best conditions
possible as it was intended to be heard. I don’t go to hear someone express
himself on the sliders, when I know perfectly well that whatever is done
there (with the possible exception of the composer him/herself) will be at
best ‘inspired’ improvisation and at worst ‘Bobby is loose on the sound
system again…’ Perhaps there will come a time when sensitive diffusion
artistry can be codified, but for the time being, it seems more of a whim
than ‘sensitivity.’165

165 Piché (March 1997) cited in Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 124.

An important aspect of the above quotation is Piché’s use of the word
‘neutralize,’ as it epitomises a characteristically bottom-up attitude
towards the diffusion of electroacoustic music, whereby the aim is to
neutralise the performance context so that the truth of the music can be
perceived transparently. Similarly the notion of interpretation, which
top-down diffusers would consider an integral and essential part of
performance diffusion, is frequently regarded as ‘whim’ by those of
bottom-up persuasion. The reason for this is that interpretation, as such,
can be regarded as a search for subjective truth, a notion which is
generally not recognised by bottom-up composers with their objective
aspirations but one which is central to the ethos of top-down
composition. It therefore follows that Piché would have no quibble with
top-down diffusion if it could be (his own words) ‘codified,’ and thus
rendered in some way objective.

This argument, of course, does not solve the previously observed difficulties of accurately presenting a stereo image in a large and
acoustically unstable performance venue. If bottom-up diffusers are
unhappy with the idea of presenting their works pluriphonically (and in
doing so ‘letting Bobby loose on the sound system again’), then an
alternative solution to this very real problem is required. This is perhaps
why bottom-up composers seem to favour composition for multichannel
fixed media. Like the top-down, pluriphonic, approach, this offers a
solution to the problem of phantom images (et cetera) getting ‘lost’ in
larger performance venues by providing larger numbers of
loudspeakers. Unlike the top-down method, the multichannel solution
(notionally) offers the composer complete control over the phantom
imaging contained across the constituent channels. This uniphonic
approach contrasts with the tendency of top-down diffusers to prefer the
pluriphonic approach as a method that offers more scope for
interpretation.

There is also a case for stating that pluriphonic traditions have arisen, at
least partially, as the result of technological circumstance. Historically, electroacoustic composers have, for the most part, been limited to two
channels of audio stored on magnetic tape as the mode of delivery for
compositions. Accordingly there were finite limitations with respect to
the auditory results attainable. In the absence of fixed medium
multichannel formats, composers were perhaps forced to rely on third
party diffusers to facilitate the fully immersive sonic landscapes that
they had envisaged, as well as to compensate for the dynamic range
limitations of analogue recording media. It would perhaps be argued by
the bottom-up composer that this practice is outmoded nowadays, with the overwhelming majority of recently produced
electroacoustic works having been realised and stored entirely in the
digital domain, and given the increasing availability of multichannel
capable software and storage media. In this context it can at least be
argued that the convention of ‘emphasising the contours’ – be they
dynamic or spatial – of works realised in the studio has outlived the
technical need to do so. Even Harrison – a self-declared ally of the
acousmatic (top-down) school – admits that, from this purely
technological perspective, certain aspects of top-down diffusion are
‘less necessary’ than has previously been the case.166

Of course top-down diffusers would also argue that, while matters may
have been slightly improved with the advent of digital and multichannel
technologies, it is still not possible to provide a ‘definitive’ version of a
composition owing to the persistently variable acoustics of performance
venues:

I don’t agree with the premise that you compose on four or eight speakers
[…] and therefore all you need to do in the concert hall is replicate it.
Because the point is, you can’t. I think that’s my basic problem. You
cannot do it: it doesn’t work. It may work in theory, and if the acoustic is
sufficiently controlled, but how many halls have you been in that have that
kind of controlled acoustic? Not very many. 167

166 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.
167 Ibid.

Notwithstanding this assertion, certain practitioners of the bottom-up school of thought remain dedicated to constructing 'standardised' performance venues for the diffusion of electroacoustic music. Here,
‘transparency’ and ‘neutrality’ are often key considerations: high-
quality, matched, phase-corrected arrays of loudspeakers; dry acoustics;
carefully planned seating areas; and so on. Generally, variables that
might detract from a transparent dissemination of the work – ‘coloured’
loudspeakers or whimsical ‘interpretations’ of the musical material for
example – are to be avoided:

It is certainly true that for many composers, intimate contact with the
sound material of a work, the constant re-listening, refinement and steady
progress to the realisation of the work’s conception, together with the
possibility of its almost totally accurate reproduction, excludes more and
more the idea of an interpretation afterwards. [My italics.]168

So, in summary, bottom-up approaches to sound diffusion tend to consist in adapting the performance circumstances to suit the nature of
given electroacoustic works, most often via a process of objective
neutralisation. This can involve minimising the reverberant
characteristics of performance venues, and is often (but not always)
characterised by a preference for the uniphonic presentation of one
CASS via one CLS. Some specific systems will be discussed in the next
chapter.

3.13. The Composition-Performance Continuum

Essentially what differentiates top-down and bottom-up approaches to the performance of electroacoustic music is whether actions are taken on the
basis of the (perceptual) subjective reality of what a work sounds like when
diffused over a particular system, in a particular hall, and to a particular
audience (top-down), or on the basis of the notionally objective reality of
what the piece ‘is like’ (bottom-up). In the top-down case, the text (the
piece) is submitted to the context (the venue, audience, et cetera) via a
process of perceptual interpretation. In the bottom-up case, the context
submits to the text, and this often involves carefully setting up an ‘ideal’
listening environment in which factors that will compromise the monolithic nature of the music are minimised. Both approaches take measures to ensure that the coherency of the coherent audio source set (or sets) is communicated, but the exact nature of the measures taken differs markedly. In other words it can be argued that the ultimate goals of sound diffusion, in the most general terms possible, are 'universal,' but approaches differ according to which phenomena are regarded as 'given,' or 'absolute.' The bottom-up standpoint regards the composition as absolute, and seeks to address the issues of performance by bringing the context closer to it. The top-down view, essentially, regards the context as 'given' (at least with respect to any one specific performance) and attempts to move the composition 'towards' that context.

168 Tutschku (2002). "On the Interpretation of Multi-Channel Electroacoustic Works on Loudspeaker-Orchestras: Some Thoughts on the GRM-Acousmonium and BEAST". SAN Journal of Electroacoustic Music, 14: 14-16.

Stated simply, the bottom-up composer attempts to place the performance context within the scope of the work, while the top-down composer seeks to
place the work within the scope of the context. This conceptual difference
owes much to the writings of Savouret,169 and is expressed
diagrammatically in Figure 19.

[Diagram, two columns: in the bottom-up case the nature of the composition is static and the nature of the performance context moves towards it; in the top-down case the nature of the performance context is static and the nature of the composition moves towards it.]

Figure 19. Simple diagrammatic illustration of the difference between bottom-up and top-down approaches to the diffusion of electroacoustic works.

169 Savouret (1998). "The Natures of Diffusion". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music.

An interesting phenomenon that supports this model can be found in
differing attitudes towards the use of artificial reverberation in composition.
Denis Smalley, for example, expresses a preference for producing
compositions with very little artificial reverberation, based on the
assumption that most performance venues will provide reverberant
characteristics of their own and the work can therefore be diffused in a
manner that takes advantage of this:

In my pieces, I’ve not been keen on placing one reverberated space into another
reverberated space, which is what can happen when taking your piece to a
public space… That’s probably the reason I don’t use artificial
reverberation…170

Others might tend to favour the use of artificial (and therefore controllable)
reverberation in composition, intended for performance in a venue that is
more acoustically neutral. In this case it would seem likely that proactive
steps would need to be taken in order to ensure as little ambient
reverberation in the venue as possible: Küpper’s approach to diffusion via
sound cupolas – which will later be described in section 4.12 – is a good
example.

In both top-down and bottom-up cases the practice of sound diffusion can
be regarded as something of a continuation of the compositional process
itself. In bottom-up composition, ‘musical structure is created by a process
which is primarily the imposition of quantifiable values on fundamentally
inert sound material.’171 Similarly, in diffusion, the performance context
must also be ‘fundamentally inert’ (or, as discussed, neutral) in order to
transparently articulate the music. In top-down composition, the composer is
guided by a perceptual evaluation of the materials within the (subjectively
true) context of his or her perceptions, and this process is continued in the
context of diffusion. It should therefore be clear that, in either case, it is not
possible to define an obvious distinction between the processes of
composition and performance. In both top-down and bottom-up cases, the nature of the compositional approach taken, to a considerable extent, presupposes the nature of the performance, and vice versa. It can be proposed that sound diffusion, broadly speaking, is a process that seeks to publicly present coherent audio source sets in an effective manner via loudspeakers, such that the essential nature of the music is successfully relayed to the audience. This process is complicated because the 'essential nature' of electroacoustic works can be either top-down or bottom-up, and therefore the means of presenting them 'effectively' can also be either top-down or bottom-up. In general terms it can be seen that the conflict between these opposing attitudes is forced out into the open at the performance stage because, ultimately, attempts to reconcile these highly contrasting convictions simply cannot be deferred any further.

170 Austin (2000). "Sound Diffusion in Composition and Performance - An Interview with Denis Smalley". Computer Music Journal, 24(2): 16.
171 Harrison (1999). "Keynote Address: Imaginary Space - Spaces in the Imagination". Australasian Computer Music Conference (Melbourne; 4-5 December 1999).

3.14. Review of Top-Down and Bottom-Up Techniques

In the briefest terms possible, it can be said that the process of sound
diffusion might entail presenting a piece of electroacoustic music in a
manner that suits the context, or alternatively tailoring the context itself in a
way that suits the piece. Clozier’s use of the expressions ‘diffusion-
interpretation’ and ‘diffusion-transmission’172 represents an easily
comprehensible means of expressing the difference between the two.
‘Diffusion-transmission’ suggests a passive process involving the
‘transparent’ communication of the musical work. ‘Diffusion-
interpretation,’ contrastingly, infers that the performer plays a more active
role in expressing the discourse of the work. These contrasting methods
have here been described, respectively, as the top-down and bottom-up
approaches to sound diffusion and are, in many cases, bound within a
continuum that chronicles the development of an electroacoustic work from
initial conception, through composition, to performance. That is to say,
actions taken during the respective processes of composition and
performance are often taken with respect to the same set of underlying
beliefs. Table 4, below, reiterates the abstract defining characteristics of the
top-down and bottom-up standpoints given in the previous chapter,

additionally appending certain traits specific to the approaches to the task of sound diffusion adopted by each of these groups.

172 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 235.

Top-Down | Bottom-Up
The overall ethos is…
Human | Super-human
Subjective | Objective
Composed to be interpreted/appropriated | Composed to be realised/disseminated
Realist (pragmatic) | Idealist
Plural | Absolute (monolithic)
Relativistic | Deterministic
Corporeal (physical) | Cerebral (intellectual)
Empirical/perceptual | Logical/conceptual
Qualitative | Quantitative
Phenomenological | Rational
Built/invented | Constructed/created
Organic | Architectonic
Abstracted forms | Abstract forms
Text submits to context | Context submits to text
…which indicates that…
Encoded audio streams are regarded as abstractions of the perceptual qualities of real auditory events | Encoded audio streams are regarded as abstractions of the conceptual structures that define real auditory events
…and therefore the techniques of sound diffusion are more likely to involve…
'Coloured' loudspeakers | 'Transparent' loudspeakers
Diffusion-interpretation | Diffusion-transmission
(Sensitively) improvised diffusion | Pre-planned or automated diffusion
Variable performance contexts | Controlled performance contexts
Preference for pluriphonic approach | Preference for uniphonic approach
Maximising interpretative scope | Minimising interpretative scope

Table 4. Revised criteria of the Top-Down/Bottom-Up model, with items relevant specifically to the practice of sound diffusion appended.

Once again it should be pointed out that these criteria are, essentially, gross
generalisations, and that in reality opinions with respect to sound diffusion
are never so neatly divided. Not all top-down practitioners, for example,
would agree with all of the attributes designated as ‘top-down.’
Nonetheless, in generalising in this way it is possible to identify certain patterns that seem to have emerged in the performance practice of
electroacoustic music, and this gives an overall indication of the kind of
dichotomy we are dealing with. It would be more reasonable to suggest that
most practitioners recognise the relative advantages and disadvantages of
both standpoints, nonetheless ultimately expressing some degree of
preference for one or the other.

If works from across the technological and aesthetic spectrum of the electroacoustic idiom are to be successfully performed side-by-side in
concerts, then some kind of solution that – somehow – reconciles these two
highly contrasting philosophies and methodologies and is acceptable to both
top-down and bottom-up practitioners is required. Chapter 4 will focus, in
part, on the evaluation of existing sound diffusion systems on this basis.

3.15. Summary

The present chapter has focused mainly on two things: the definition of
‘sound diffusion’ in abstract terms; and the attitudes of electroacoustic
musicians with respect to this practice. Concepts of the ‘coherent audio
source set’ (CASS) and the ‘coherent loudspeaker set’ (CLS) have been
proposed as useful tools in defining the practice of sound diffusion from a
technological perspective; it is also proposed that these concepts will prove
useful in the design of future sound diffusion systems.

It can be concluded that sound diffusion involves the presentation of one or more coherent audio source sets, via one or more coherent loudspeaker sets,
to an audience in a public performance situation. Ideally this should be done
in such a way that the discourse of the musical work is effectively
communicated to the audience. This central objective is, basically,
unanimous among electroacoustic musicians but its realisation in practice is
complicated by the fact that, acoustically, it is difficult to facilitate a
completely ‘accurate’ decoding of encoded audio streams in most
performance venues (as described in section 3.3). It is for this reason that
sound diffusion must be an active process.

The central objective of sound diffusion is further complicated by the fact
that top-down and bottom-up practitioners fundamentally disagree with
regard to what exactly an ‘accurate reproduction’ of a work is. This is owing
to the fact that the top-down practitioner regards the encoded audio stream
as an abstraction of the perceptual qualities of real auditory events, whereas
the bottom-up practitioner regards it as an abstraction of the conceptual
structures that define real auditory events. Accordingly, the ways in which
composers/performers seek to achieve this central objective are essentially
two-fold, and can broadly be categorised as top-down and bottom-up,
respectively. Usually, top-down composers demonstrate a preference for
top-down approaches to sound diffusion, while bottom-up composers tend
to prefer bottom-up methods. Some of the general characteristics of top-
down and bottom-up approaches to electroacoustic music and sound
diffusion were summarised in Table 4. In the most basic possible terms, top-
down diffusion methods are motivated, broadly, by the subjective reality of
sound as perceived by the diffuser, while bottom-up methods are based on
the objective reality of the work in question, as conceived by the composer.
In this respect the two approaches differ fundamentally in terms of the
nature of what they are trying to communicate. In other words, top-down
and bottom-up practitioners have highly contrasting views with regard to the
nature and purpose of sound diffusion itself and, therefore, with respect to
the role of sound diffusion systems in the performance of electroacoustic
music.

In both top-down and bottom-up cases, the approach to sound diffusion represents a continuation of the essential nature of the compositional
process itself. This being the case, the hypothesis is that sound diffusion
systems will themselves be divisible into those that are broadly intended for
the communication of works in an essentially top-down manner, and those
intended for the communication of works in an essentially bottom-up
manner. This bifurcate approach is, however, problematic given that top-
down and bottom-up works are reasonably likely to be juxtaposed in concert
programmes. Ideally, a sound diffusion system should be able to facilitate
the presentation of both top-down and bottom-up works, and will therefore be required to accommodate both top-down and bottom-up attitudes and
diffusion methods. In combination with those technological observations
made in Chapter 1, this proposition will form the basis for a system of
criteria designed for the evaluation of existing sound diffusion systems,
which is to be the central focus of the next chapter.

4. Sound Diffusion Systems

4.1. Introduction

In Chapters 1 and 2 an approach was made towards an understanding of the technological and aesthetic scopes, respectively, of the electroacoustic
idiom, with a broad focus on the demands of the public performance
scenario from each of these perspectives. In Chapter 3, some of the ways in
which these issues are carried forward as contrasting approaches to the
practice of sound diffusion itself were outlined. From a technological
perspective, this has been explained in terms of the relationship between
coherent audio source sets and coherent loudspeaker sets. In aesthetic terms,
two distinct methodologies have been observed. Of course, these differing
top-down and bottom-up aesthetics, as they have been defined, will have a
fundamental impact upon the ways in which the technologies are
approached in both composition and performance. From these three
previous chapters an overall impression of the ‘range’ of electroacoustic
music has been given. It has been proposed that if any specific sound
diffusion system is to be of widespread use, then it will have to be able to
accommodate works – and attitudes, and methodologies, and so on – from
across this range side-by-side, in the context of a live performance.

The purpose of the present chapter is to evaluate the extent to which various
existing sound diffusion systems are able to satisfy this challenging demand,
and more importantly to examine the ways in which this is (or is not)
achieved. As far as the author is aware, there is no single published
document that collates, summarises, and evaluates current sound diffusion
technology in this way. Observations made in this respect will form the
basis of a proposed ‘way forward’ in the development of future systems;
this will be given in Chapter 6. In order to facilitate the realisation of this
overall objective more easily, a system of criteria for the evaluation of
sound diffusion systems will be proposed. These criteria will also be used
for the evaluation of the M2 Diffusion System (Chapter 5), which was co-
designed, co-developed, and co-implemented by the author as an integral part of this research. The review process will begin with a non-specific
explanation, in abstract terms, of what a diffusion system is.

4.2. What is a Sound Diffusion System?

A sound diffusion system is hardware, software, or a combination thereof, used to facilitate the presentation of electroacoustic works via loudspeakers
in a live concert situation.

As such, any given diffusion system is likely to be subjected to a fairly weighty set of technological and aesthetic, not to mention practical,
demands. From a technological perspective, the role of a sound diffusion
system is to mediate between audio source(s) and a loudspeaker array
consisting of multiple loudspeakers, via some kind of intermediate control
interface that allows the performer to execute the diffusion as required. The
format of the mediation is encoded audio streams (see section 1.6). An
abstract graphical representation of a generalised diffusion system
architecture is given in Figure 20, below.

Audio Source(s)

[Encoded Audio Streams]

Physical Audio Input Layer

Mix Engine [Control Data] Control Interface

Physical Audio Output Layer

[Encoded Audio Streams]

Loudspeaker Array

Figure 20. Graphical representation of a sound diffusion system in terms of four main components: audio source(s), control interface, mix engine, and loudspeaker array.
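
The data flow between these four components can be caricatured in code. The following Python sketch is illustrative only, and far removed from a real-time implementation: the control interface updates routing gains held by the mix engine, which mixes frames of input samples from the physical input layer down to output streams destined for the loudspeaker array.

class MixEngine:
    """Mixes input streams to output streams according to the control
    data most recently received from the control interface."""
    def __init__(self, n_inputs, n_outputs):
        # gains[i][o]: proportion of input i sent to physical output o
        self.gains = [[0.0] * n_outputs for _ in range(n_inputs)]

    def apply_control_data(self, i, o, gain):
        """For example, the result of a fader movement on the interface."""
        self.gains[i][o] = gain

    def process(self, input_frame):
        """Mix one frame of input samples down to the physical outputs."""
        n_outputs = len(self.gains[0])
        return [sum(sample * self.gains[i][o]
                    for i, sample in enumerate(input_frame))
                for o in range(n_outputs)]

engine = MixEngine(n_inputs=2, n_outputs=8)
engine.apply_control_data(0, 0, 1.0)    # source channel 1 -> loudspeaker 1
engine.apply_control_data(1, 1, 1.0)    # source channel 2 -> loudspeaker 2
print(engine.process([0.5, -0.5]))      # [0.5, -0.5, 0.0, 0.0, 0.0, ...]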

The point of entry for encoded audio source streams into the system is via a
physical audio input layer, which receives encoded audio streams from the
various audio sources (CD player, microphones, computer sound card
outputs, et cetera; any device capable of outputting at least one transitorily
encoded audio stream can potentially be used as an audio source). The
transmission of encoded audio streams to loudspeakers (the point of exit)
takes place via a physical audio output layer. These input and output layers
are also illustrated in Figure 20. The mix engine mediates between these two
layers, performing the necessary signal routing, mixing, and any other signal
processing tasks. Often (but not necessarily always) this will involve
outputting a number of encoded audio streams that is larger than the number
of input streams. The mix engine performs its task according to control data
received from the control interface, which is the means by which the
performer interacts with the diffusion system. The input and output layers
are necessarily physical, whereas the mix engine and control interface can
be implemented in software or hardware or a combination of both.

The exact technological means by which the abstract process of sound diffusion is facilitated are various (a number of different possibilities will be
described later in the chapter) but in many cases a hardware audio mixing
desk serves as the physical audio input layer, control interface (faders), mix
engine, and physical audio output layer. For any single performance, the
loudspeaker array is most likely to be static, that is, it will consist of a fixed
number of loudspeakers arranged in a set formation (although it is not
unknown, in the author’s experience, for loudspeakers to be added or
removed, or their positions changed, within a single performance). The
nature and number of channels contained within the audio source(s),
however, is much more likely to change within a single performance. As a
simple example, a concert programme may contain stereophonic fixed-
medium-only works, multichannel fixed medium works, and works for
instruments and tape. Consequently, the way in which the total number of
encoded audio inputs is subdivided into coherent audio source sets may vary
from work to work, and indeed the total number of audio inputs in use at
any given time may also change. This is also likely to mean that the way in which the loudspeaker array is utilised – that is, subdivided into coherent
loudspeaker sets – is also likely to change from piece to piece, even though
the total number and formation of loudspeakers within the array remains
constant. For example, a stereophonic work (representing a single two-
channel CASS) diffused via an array of eight loudspeakers is likely to treat
the loudspeaker array as four stereophonic CLSs; an octaphonic fixed
medium work (if eight source channels have been treated collectively as a
single CASS) is more likely to treat the entire loudspeaker array as a single
octaphonic CLS.
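
This regrouping can be expressed trivially in code (the clockwise numbering of the eight loudspeakers is an assumption made purely for illustration):

# One fixed array of eight loudspeakers, numbered clockwise from front-left.
array = [1, 2, 3, 4, 5, 6, 7, 8]

# A stereophonic work treats the array as four two-channel (L, R) CLSs:
four_stereo_clss = [(1, 2), (8, 3), (7, 4), (6, 5)]

# An octaphonic work treats the same array as a single eight-channel CLS:
one_octaphonic_cls = [tuple(array)]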

Having briefly described the concept of the sound diffusion system in abstract technological terms, it can now also be said that a sound diffusion
system will have to facilitate the presentation of electroacoustic works
according to both top-down and bottom-up methodologies, as it is likely that
both kinds of work will co-habit concert programmes. The following
sections describe a set of criteria by which, it is proposed, sound diffusion
systems can be evaluated in terms of their flexibility and appropriateness for
the task of performing live electroacoustic music.

4.3. Criteria for the Evaluation of Sound Diffusion Systems

The objective of sound diffusion is, in a very general sense, to facilitate the
presentation of one or more coherent audio source sets – via one or more
coherent loudspeaker sets – such that the coherency of the audio source sets
is maintained as far as possible. If this can be achieved then the diffuser will
have done everything within his or her power to ensure that the discourse of
the electroacoustic work will be effectively and faithfully communicated to
members of the audience, and this – it is suggested – should be regarded as
the ultimate aim of sound diffusion. It is clear that if this overall objective is
to be uniformly achieved – given the broad technical and aesthetic scope of
the electroacoustic idiom – a sound diffusion system will have to satisfy a
complex set of interrelated demands. Some of these demands stem directly
from the nature of electroacoustic music itself, in technological and aesthetic terms. In this particular respect, it is proposed that a sound
diffusion system can be evaluated in terms of its:

4.3.1. Ability to Cater for the Technical Demands of Live Electroacoustic Music

The various technical demands that might be exerted by any given piece
of electroacoustic music were outlined in sections 1.7 and 1.8, with
more general observations with regard to how the technologies are
appropriated by electroacoustic musicians being made in the latter
sections of Chapter 1. In the context of the diffusion system, ideally, all
of the possible permutations should be easily manageable from a
technical standpoint (it will later be argued that this is not currently the
case). It should, of course, be noted that the technological repertoire is
constantly expanding.

As an arbitrary example, this means that an ideal sound diffusion system should be able to accommodate the technology required to perform an
electroacoustic work for live instruments, live synthesis, and tape,
alongside that required for fixed-medium-only works, and alongside
that demanded by works for instruments and ‘live electronics.’ It must
be remembered that a concert programme of electroacoustic music is (at
least, in the author’s experience) reasonably likely to incorporate works
of varying technical demands in this way. The diffusion hardware (or
software if applicable) must therefore be designed in such a way that
these demands can be met without unduly disrupting the flow of the
programme as a whole. Hiatus between consecutive works within the
concert programme should be avoided if at all possible; this can arise if
works of varying technical demands necessitate the physical re-patching
of audio hardware, for example.

In section 1.6.2 it was noted that individual encoded audio streams are
often used collectively to abstractly represent spatio-auditory attributes;
in section 3.5 the coherent audio source set (CASS) was presented as a
convenient conceptual framework for representing this. In the present context it should be noted that, in addition to those technological
demands already discussed, electroacoustic works are also likely to vary
in terms of their CASS demands. The number and individual nature of
CASSs in any given electroacoustic work is, at the general level, an
unknown variable. A stereophonic fixed medium work is most likely to
consist of a single, two-channel, CASS: this consensus is practically
unanimous and therefore relatively simple to deal with. When the
number of CASSs (and the number of constituent channels within any
given CASS) increases, the situation becomes exponentially more
difficult to accommodate from a technological standpoint, particularly
given the relatively inflexible signal routing offered by most studio
mixing desks. If physical re-patching is to be avoided, and if works
comprising variable numbers of coherent audio source sets – themselves
consisting of a potentially arbitrary number of constituent channels – are
to be accommodated, then a high degree of dynamic signal routing
flexibility is clearly required, particularly if works that vary in these
respects are to be performed consecutively; this will be discussed more
fully in section 4.3.3.

4.3.2. Ability to Cater for both Top-Down and Bottom-Up Approaches to Sound Diffusion

In sections 3.12 to 3.14, some of the characteristics that differentiate top-down and bottom-up approaches to sound diffusion were proposed
and summarised. As concert programmes are fairly likely to incorporate
works conceived from both of these contrasting standpoints, it is
desirable that diffusion systems should be able to cater for them equally
and without prejudice. It has been noted, for instance, that top-down
diffusers tend to favour pluriphonic relationships between a single
CASS and multiple CLSs, while bottom-up diffusers are more inclined
to adopt a uniphonic approach whereby each channel of the CASS is
represented by one loudspeaker within a single CLS. The former
(pluriphonic) approach is reasonably flexible because appropriate CLSs
can be defined as necessary within a variety of different loudspeaker
arrays. The uniphonic approach can be more restrictive as it usually relies on a fairly specific set of loudspeakers in terms of relative
positioning. If both models are to be supported, this means that the
loudspeaker array must allow appropriate CLSs to be defined in each
case. The array shown in Figure 21, for example, would allow for both
stereophonic and 5.1 CLSs to be easily defined within it. A stereophonic
work could be diffused pluriphonically by a top-down diffuser (see
Figure 21(b)), while a 5.1 work could equally be presented
uniphonically by a bottom-up diffuser (Figure 21(a)).

[Diagram, two panels: (a) 5.1 CASS presented uniphonically via a single CLS; (b) stereo CASS presented pluriphonically via multiple CLSs.]

Figure 21. Loudspeaker array allowing for the uniphonic presentation of a 5.1 CASS in addition to the pluriphonic presentation of a stereophonic CASS.

Of course if the scenario illustrated in Figure 21 is to be facilitated in practice, this raises further technological issues, particularly with
respect to signal routing and mixing (to be discussed in section 4.3.3)
and interface ergonomics (section 4.3.4).

It has also been noted that top-down diffusers are more inclined to
favour loudspeakers that ‘colour’ the sound, while bottom-up diffusers
tend to prefer loudspeakers that are ‘transparent’ in terms of their
frequency response. Again, if both approaches are to be supported
within the context of a single performance, then some kind of
compromise is required. If the loudspeaker array consists of transparent loudspeakers, then perhaps some facility to add 'colouring' during
performance is necessary. Alternatively, if the array consists of
‘coloured’ loudspeakers, then the ability to compensate for this during
performance would seem desirable. In either case this could be achieved
with flexible equalisation capabilities.

In terms of venue acoustics, it has been suggested that bottom-up practitioners tend to favour controlled acoustics in which reverberation
(and so on) is minimised, while top-down practitioners are more
inclined to regard unpredictable performance conditions as an
interpretative challenge that, in a sense, justifies the very act of
diffusion itself. These contrasting standpoints would, indeed, seem
difficult to reconcile within the scope of a single sound diffusion
system, but nonetheless, a compromise could be reached. Again, means
to ‘correcting’ less-than-ideal listening environments could be
implemented for the benefit of bottom-up practitioners; these might
include phase correction, careful corrective filtering, calculated delays
for individual loudspeaker feed signals, and so on.
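
Of the measures just listed, the calculated delay is the most readily quantified: feeds to loudspeakers nearer the reference listening position are delayed so that all wavefronts arrive together. A minimal Python sketch, assuming a speed of sound of 343 metres per second and a sample rate of 48 kHz:

SPEED_OF_SOUND = 343.0  # metres per second, in air at roughly 20 degrees C

def alignment_delays(distances_m, sample_rate=48000):
    """Per-loudspeaker delay (in samples) such that signals from
    loudspeakers at different distances arrive at the reference position
    simultaneously; the farthest loudspeaker receives no delay."""
    farthest = max(distances_m)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances_m]

# Loudspeakers at 4 m, 7 m and 10 m from the reference position:
print(alignment_delays([4.0, 7.0, 10.0]))   # [840, 420, 0]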

Overall – as noted previously – the basis for evaluating and performing top-down works is essentially perceptual, whereas bottom-up works are
conceived, realised, performed and evaluated according to conceptual or
‘objective’ criteria. This means that top-down and bottom-up
practitioners are likely to expect different things from the sound
diffusion system. For the top-down practitioner, the means of diffusion
should perhaps offer a repertoire of real-time techniques suitable for
shaping the musical material ‘on the fly’ in a perceptually effective
manner. From this perspective, an intuitive and tactile interface
allowing the diffuser to interact immediately and ‘organically’ with the
musical material would seem appropriate. For the bottom-up
practitioner, performance of the work is likely to be altogether more
passive (perhaps taking the form of a transparent and objective
presentation in which more attention is paid to the ‘neutralising’ of the
performance context prior to diffusion than to the process of diffusion

174
itself), or else precisely realised from a carefully preconceived ‘score.’
The latter case – like the bottom-up compositional process itself –
demands a system whereby performance actions can be assessed
according to their ‘objective accuracy’ (as opposed to their ‘perceptual
efficacy’) and it is for this reason that bottom-up practitioners often
express a preference for automated diffusion routines or procedures that
can otherwise be consistently and accurately realised from a
predetermined scheme. These two – in many respects diametrically
opposed – methods collectively present additional challenges with
respect to interface ergonomics, which will be discussed further in
section 4.3.4.

It would be long-winded and impractical to individually describe all of the subtle nuances that differentiate top-down and bottom-up
approaches to sound diffusion in detail. Reference to Table 4 (page 162)
will give an indication of some of the broad philosophical and aesthetic
contrasts and from these it should be possible to extrapolate more finite
methodological differences. It is suggested that if top-down and bottom-
up ideologies are to be fully supported by a sound diffusion system,
then all of the factors described so far would have to be dynamically
assignable during performance. This is to accommodate the fact that a
concert programme could quite conceivably feature a top-down work
immediately followed by a bottom-up work, each with quite distinct
requirements from the diffusion system. Above all, this means that if a
sound diffusion system is to be equally accommodating of both top-
down and bottom-up methodologies, it must be enormously flexible,
and able to facilitate (potentially fundamental) changes as necessary
without disrupting the flow of the live concert programme.

In addition to these 'high level' considerations, there are additional criteria pertaining more to the logistics of sound diffusion itself. Sound diffusion is
not simply a transparent means by which electroacoustic works are
communicated to an audience, but an active practice in which hardware and
software platforms are engaged, by a performer, for this purpose. In other words, a sound diffusion system is a creative framework in itself, and as
such can be regarded, abstractly, as being similar to any other creative
framework. As discussed in Chapters 1 and 2, implicit in every creative
framework is a unique set of characteristics, paradigms, and working
techniques. The nature of the diffusion system itself (the processes involved
in setting up and using the system; the control interface; the way in which
hardware and software components interact; et cetera) will be a determining
factor in evaluating the efficacy of that system as a platform for the
communication of electroacoustic works. The following sections outline
some of the criteria by which sound diffusion systems can be evaluated in
these terms. Of course, the extent to which these criteria are met (or are not
met) is very likely to impact upon the degree of success attainable in those
‘higher level’ criteria described previously.

4.3.3. Flexibility of Signal Routing and Mixing Architecture

Technically speaking, the aim of sound diffusion is to relay one or more coherent audio source sets – via an intermediate control interface – to
one or more coherent loudspeaker sets. As described previously, there
are many potential variables between works: the number and nature of
audio sources; the number of channels contained within each CASS; the
number and nature of the CLSs to be defined within the loudspeaker
array; et cetera. In satisfying the two broad criteria given previously, it
is therefore proposed that one of the single most advantageous features
that a sound diffusion system can offer is flexible, dynamic signal
routing and mixing capabilities.

What is ideally required is a signal routing and mixing architecture capable of handling an arbitrary number of audio input channels, to be
mixed to an arbitrary number of output channels, via a user interface
that allows the input and output channels, and input-to-output routings,
to be conceptually grouped together (into CASSs and corresponding
CLSs) in an arbitrary manner and ergonomically controlled (interface
ergonomics will be discussed separately in section 4.3.4). Large

176
numbers of input and (particularly) output channels are desirable, so
long as this does not compromise flexibility. Such an architecture is
represented diagrammatically in Figure 22, below.

[Diagram: an arbitrary total number of audio input streams enters the physical audio input layer, and the control interface allows these input streams to be arbitrarily grouped into CASSs; the mix engine, driven by control data from the control interface, feeds the physical audio output layer, whose output streams (arbitrary in number, but less likely to change during a single performance) are arbitrarily grouped into CLSs in a manner appropriate to the source CASSs, the loudspeaker array, and the diffusion requirements.]

Figure 22. Diagrammatic representation of an ideal signal routing/mixing architecture for the purposes of live diffusion of electroacoustic music.
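
The architecture of Figure 22 might be paraphrased in code as follows. The sketch assumes six input streams and sixteen output streams, with the output groupings borrowed from the figure and its accompanying footnote; all of the names are illustrative only:

inputs = list(range(1, 7))                   # six physical input streams
outputs = list(range(1, 17))                 # sixteen physical output streams

casses = {'mics': inputs[0:2],               # stereo microphone pair
          'tape': inputs[2:6]}               # quadraphonic fixed medium

clss = {'stereo a':   [1, 3],                # two-channel CLSs...
        'stereo b':   [5, 7],
        'quad inner': [2, 4, 6, 8],          # ...and four-channel CLSs
        'quad outer': [10, 12, 14, 16]}

def route(cass, cls):
    """Pair each CASS channel with the corresponding CLS channel."""
    assert len(cass) == len(cls), 'CASS and CLS sizes must match'
    return list(zip(cass, cls))

# Groupings can be redefined from work to work, with no physical re-patching:
print(route(casses['tape'], clss['quad outer']))
# [(3, 10), (4, 12), (5, 14), (6, 16)]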

In this example, the total number of audio input streams (six) has been
conceptually subdivided into two coherent audio source sets, one with
two constituent channels (perhaps from a stereo pair of microphones)
and one with four (perhaps from a quadraphonic fixed medium). As per
section 3.8.3 (see page 140), this means that both two-channel and four-
channel coherent loudspeaker sets are likely to be required within the loudspeaker array.173 Of course the total number of input and output
channels is an unknown variable and (particularly the former) may vary
significantly from work to work within the context of a single concert
programme.

This model is beneficial from the standpoint of criterion 4.3.1, because it does not make any assumptions with regard to the number of audio
input and output channels, nor the ways in which these might be
respectively grouped into CASSs and CLSs. This means that works
demanding variable technical resources can easily be accommodated
side-by-side. It also does not make any assumptions with regard to the
relationship between the mixing architecture and the control interface,
which – as will later become clear – is a criticism that can justifiably be
directed at many existing sound diffusion systems. For the same
reasons, this model is also beneficial with respect to certain aspects of
criterion 4.3.2: CASSs and CLSs can dynamically be assigned in a
manner appropriate for both uniphonic and pluriphonic diffusion
methods (assuming, of course, that the loudspeaker array is able to
accommodate all of the CLSs required), again with no restrictions
imposed by the control interface itself.

4.3.4. Interface Ergonomics

‘Interface ergonomics’ relates to the physical logistics of interacting


with a user interface (whether software or hardware, or a combination)
as a means to achieving particular objectives in the process of sound
diffusion. As simple examples, a top-down diffuser may wish to
progressively ‘widen’ a stereo image by projecting it via consecutively
wider-spaced loudspeaker pairs, or a bottom-up diffuser may wish to
accurately articulate a carefully planned spatial orchestration. This

173
The grouping of physical audio outputs into CLSs is also shown in Figure 22. Arbitrary
groupings of the output channels into ‘twos’ and ‘fours’ (reflecting the nature of the input
CASSs) are illustrated. The grouping lowest down on the diagram, for instance, shows that
physical outputs 2, 4, 6, and 8, and 10, 12, 14, and 16, have been grouped together forming
two four-channel CLSs. Of course, the actual groupings in any real scenario will depend on
the shape of the loudspeaker array and the order in which physical audio outputs are routed
to loudspeakers.

criterion evaluates the extent to which the user interface itself – by way
of its particular architecture and paradigms – is able (or unable) to
facilitate the realisation of such objectives.
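
As a concrete illustration of the first of these objectives, the ‘widening’ gesture can be reduced to a single control value governing a crossfade across progressively wider-spaced loudspeaker pairs. The following Python fragment is a purely hypothetical sketch of such a mapping (the function and its triangular crossfade law are invented for illustration):

    # Hypothetical sketch: fade a stereo image across progressively
    # wider-spaced loudspeaker pairs with one control value.

    def widen(num_pairs, amount):
        # amount in [0.0, 1.0]: 0 = narrowest pair only, 1 = widest
        # pair only; intermediate values crossfade between adjacent
        # pairs using a simple triangular law.
        position = amount * (num_pairs - 1)
        return [max(0.0, 1.0 - abs(position - p))
                for p in range(num_pairs)]

    # Image halfway 'out' across four pairs: the two middle pairs
    # share the signal equally.
    print(widen(4, 0.5))   # [0.0, 0.5, 0.5, 0.0]

An interface that can bind such a mapping to one physical control is, in the present sense, more ergonomic than one that obliges the performer to operate every pair of faders by hand.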

If the control interface is to facilitate the complex demands outlined in
the previous section, this raises some fairly serious questions with
respect to how this can be achieved ergonomically, and in a manner that
is suitable for live, real time, performance. Architecturally, matrix
mixers (which will briefly be discussed in section 4.14) are able to fully
support the ideal model presented in Figure 22, above, insofar as any
input stream can be mixed, in any proportion, to any physical audio
output. However, the number of physical controls (often rotary
potentiometers) required for a matrix mixer with individual control over
every input-to-output crosspoint can render the interface non-ergonomic
for the purposes of live performance; that is, it would be logistically
difficult to realise certain diffusion actions. A modest matrix of six
inputs and sixteen outputs, for example, already implies ninety-six
individual crosspoint controls. This, of course, indicates the necessity
for a certain degree of compromise in the satisfaction of criteria 4.3.3
and 4.3.4.

Additionally, one must consider the fact that diffusion equipment must,
for the most part, be operated by performers in real time, and under
reasonably stressful (live performance) circumstances. When this is the
case, the control interface must ideally be fairly simple and intuitive, but
nonetheless able to provide the performer with all of the necessary
means to achieving the musical communication required in real time
(except, of course, where preconceived automated diffusion is
concerned).

4.3.5. Accessibility with respect to Operational Paradigms

This criterion is closely related to the previous, but is more concerned
with the relative ease or difficulty experienced in the appropriation of
interface paradigms by performers. To rephrase, this criterion evaluates
the ‘approachability’ of the interface used to realise the diffusion of
electroacoustic works, and the ease with which the necessary skills can
be assimilated; the ‘steepness’ of the learning curve, in essence.

Implicit here is the notion of familiarity. If an interface is familiar to its
user, then the range and quality of results attainable with that interface
are likely to be fairly high. If the interface is unfamiliar, then it will be
necessary for the performer to learn how to use it, and until this has
happened the range and quality of results attainable will be limited. The
difference between this criterion and the previous one rests in the fact
that an interface can be highly ergonomic in the hands of a familiar user,
but effectively useless in the hands of a less familiar user. Harrison
makes the following observations:

[The mixing desk fader is] an interface that practitioners of electroacoustic
music tend to know fairly well. […] Those of us who grew up making
pieces with analogue tape had to use faders. You had to use the mixer
because it was the only way of mixing stuff together, so you got to know
what a fader felt like. So that’s why it worked, historically, quite well, and
has been the dominant one. […] All these new, fancy things – gloves and
motion sensors and so on – they all have their place, I think, but it’s a
question of how much time do we have to be able to get to control them
sufficiently to know what we’re going to do? I think that’s the big issue.
[…] The big problem with all of them is that every single possible
configuration is a ‘new instrument,’ and an instrument takes time to learn
how to play. At least with the good old fader we have a reasonable history
of having learned to play it a bit. […] But in a sense what you’re looking
at with everything you’re doing is a range of values from a to b. A fader is
actually a pretty efficient way of doing that, if you think about it.174

As implied by Harrison, the importance of the present criterion is
reinforced by the fact that rehearsal time for concerts of live
electroacoustic music is often minimal and, consequently, performers
typically have very little time available to learn new paradigms. If an
interface is unfamiliar, therefore, it is – on the one hand – desirable that
performers should be able to assimilate the necessary skills quickly and
easily. On the other hand, if an interface is quick and easy to master, the
range of results attainable by more proficient users may be limited as a
direct consequence, and this is – paradoxically – a case against
interfaces that are too ‘easy.’ Kirk and Hunt stress the importance of

174
Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with
the author, 8th September 2004. See Appendix 1 for full transcription.

mediating between these two opposing demands in the design of
creative user interfaces; although they refer specifically to computer
software interfaces, their point remains very valid in the context of
sound diffusion:

Most people would expect an ideal computer system [read: diffusion
system] to be easy to use and easy to learn. However, nobody expects
learning to drive a car to be easy. No musician would ever say that a violin
or a clarinet was easy to use. Yet the subtlety of control and real-time
interaction which can be shown by car drivers and instrumental performers
is often astounding. By making the assumption that interfaces should be
easy we are in danger of undervaluing the adaptation capabilities of human
operators and thus limiting the potential human-computer interaction to a
lowest common denominator of ‘easy to use’ commands. […] A musician
who has to work hard to produce a good sound is likely to be rewarded
with appreciation from an audience. In contrast ‘easy-play’ instruments
tend to produce ‘cheap-and-easy’ musical results.175

4.3.6. Accessibility with respect to Technological System Prerequisites and Support thereof

The ‘technological prerequisites’ of a sound diffusion system refer,
basically, to the pieces of software and hardware required to build that
system. If a system is accessible in terms of this, then it should be
relatively straightforward for an institution to build the system in
question. If a system is inaccessible in terms of its technological
prerequisites, on the other hand, then the building of that system is
likely to be less feasible or, in the worst case, impossible.

A number of discrete factors contribute to the relative accessibility or
inaccessibility of diffusion systems in this context. Finance is a factor: if
a diffusion system comprises expensive software and/or hardware
components then it will obviously be inaccessible to any individual or
institution without the financial means to obtaining these. Accessibility
in terms of simple availability is also an important consideration. If a
sound diffusion system consists of components that are difficult or
impossible to obtain – perhaps they are ‘custom’ components or items
that are not commercially or otherwise easily available – then that
system will consequently be less accessible, regardless of any financial

175
Kirk and Hunt (1999). Digital Sound Processing for Music and Multimedia (Oxford;
Focal Press): 311-312.

considerations. The third sense in which a diffusion system can be
evaluated in terms of this criterion concerns the level of technical
support (access to documentation, shared knowledge, et cetera)
available for the software and hardware components and this is,
therefore, fairly closely related to the previous criterion. If system
components are poorly supported in this respect, then it will be more
difficult for performers to assimilate the necessary operational skills and
the system will consequently be less accessible. Again, this is
particularly true in the case of unconventional and/or custom-built
components.

The importance of system accessibility in this respect has already been
alluded to in the previous section. Kirk and Hunt (and, less directly,
Harrison) liken creative interaction with a system to the practice of
playing a musical instrument, noting that to become an accomplished
instrumental player often involves years of disciplined training. This
comparison is particularly valid in the context of sound diffusion, which
is indeed a form of musical performance. In terms of the various factors
described in the present section, traditional musical instruments (oboes,
pianos, guitars, violins, drum-kits, et cetera) are reasonably highly
accessible: financially, most teaching institutions (and, in some cases,
individuals) are able to afford them; certainly, these instruments are
readily available, and culturally well-supported in terms of tuition,
relevant literature, performing ensembles, and so on. So, while these
instruments may not be enormously accessible in the sense of item 4.3.5
– as observed by Kirk and Hunt, it cannot justifiably be argued that they
are ‘easy to play’ – this is offset by a high degree of accessibility in the
present sense.176

In any performance-related musical discipline, access to the instrument
(in this case, the diffusion system) is important primarily because it

176
In this respect, items 4.3.5 and 4.3.6 can be regarded as ‘inversely proportional’ to each
other, insofar as a relatively low level of accessibility in terms of the one can be
compensated for, to a certain extent, by a higher level of accessibility in terms of the other.
A relatively high degree of accessibility in both senses would, of course, be ideal.

allows performers to practise in their own time. It would be completely
unreasonable to expect a concert pianist to rehearse only on the day of
the performance itself, yet this demand is – in the author’s experience –
fairly usual in the context of sound diffusion. This could be attributable
to the fact that sound diffusion systems – as performance instruments –
are relatively inaccessible in the present sense, and therefore often
become available to performers only on the day of the concert. If
systems were more accessible then institutions – perhaps even
individuals – could build systems to rehearse with outside of the
performance context, thus greatly increasing the potential for
accomplished performance.

4.3.7. Compactness / Portability and Integration of Components

Performance venues in which equipment for the live diffusion of
electroacoustic music is permanently installed are comparatively rare
(although a few permanent systems will be described later in the
chapter). It is therefore reasonable to state that the process of staging a
performance of electroacoustic music will often entail the transportation
of all of the necessary diffusion equipment to the venue, and setting up
on location. If this is the case, it would seem advantageous for the
system to be as portable as possible, and relatively quick and simple to
set up. In addition, it is desirable that all of the equipment should be
sufficiently robust to withstand the stresses sustained in transit, and in
repeated setting-up and de-rigging throughout its life-cycle.

‘Integration’ implies the extent to which individual system components
are ‘normalised,’ where appropriate, in terms of their relationships with
one another. If a system comprises components whose relationship
within the context of the system as a whole is fixed, it would seem
logical to integrate these. As a simple example, a CD player – rather
than being transported as a separate unit – could be hard-wired into the
physical audio input layer, assuming this is appropriate for the system in
question. This would reduce the amount of time taken to set the system
up and improve system reliability (to be discussed in section 4.3.8) by
minimising the potential for signal routing errors, faulty cables, and so
on. Care must be taken, however, to ensure that the process of
integration does not jeopardise the flexibility of the system in terms of
other criteria.

Aside from the obvious advantages, a system that is compact, portable,
and well integrated might yield further benefits with respect to some of the
other evaluation criteria. If performers are to be able to practise in their
own time, then a compact system that can be set up and de-rigged
quickly and easily would be highly advantageous, for example.

The present criterion does not apply so obviously to diffusion systems
that are designed specifically for permanent installation in a
performance venue. It must be considered, however, that permanently
installed diffusion systems – by definition – can only be used inside the
venue within which they are installed. Unless a similar system can be
built elsewhere (see criterion 4.3.6), this means that only performers
with immediate access to the venue in question will have access to the
system for practice. It seems logical to propose, therefore, that for a
permanently installed diffusion system that is unavailable elsewhere to
be of widespread use, it would have at the very least to ‘score highly’ in
criterion 4.3.5 (i.e. it would have to be fairly quick and easy to learn).

4.3.8. Reliability

It should always be borne in mind that sound diffusion is a form of live
performance, and this, of course, means that a diffusion system must
take into account all of the issues inherent in live performance and real-
time operation. An important consideration, in this respect, is reliability:
if the technology is not stable in performance then it is not appropriate
for the task in hand. Robustly-built hardware systems would seem the
most likely candidates to score highly in this criterion, with systems
incorporating software components (which can, in some cases, crash
unexpectedly) potentially being less reliable.

It almost goes without saying, therefore, that software written
specifically for the purposes of sound diffusion should be optimised for
complete stability and reliability in performance. Assuming the software
itself is well-written, unpredictable problems can arise when host
computers of varying specifications are used; this can be presumed
usual in the case of multi-purpose Windows-, Macintosh-, and Linux-
based PCs, which are very rarely identical in software and hardware
specification. If the undoubted advantages of software technologies are
to be implemented in diffusion systems, therefore, the ideal in terms of
reliability would consist in rigorously tested software, running on a
system of known software and hardware profiles, and preferably
dedicated solely to the purpose of running the diffusion software.

Another consideration in evaluating the reliability of diffusion systems
(whether hardware, software, or a combination thereof) is the extent to
which any malfunctions that do occur can be rectified quickly and easily
within a performance context.

4.4. Criteria for the Evaluation of Sound Diffusion Systems: Summary

In summary, it is proposed that the following ‘high level’ criteria can be
used to assess the effectiveness of sound diffusion systems as a means to
publicly performing works from across the spectrum of electroacoustic
music:

• 4.3.1: Ability to Cater for the Technical Demands of Live
Electroacoustic Music
• 4.3.2: Ability to Cater for both Top-Down and Bottom-Up
Approaches to Sound Diffusion

In the broadest possible terms, these two criteria have been designed for the
purpose of evaluating the ability of sound diffusion systems to support
electroacoustic works on a technological and aesthetic basis, respectively.
Subordinate to these ‘high level’ criteria are several further criteria:

• 4.3.3: Flexibility of Signal Routing and Mixing Architecture
• 4.3.4: Interface Ergonomics
• 4.3.5: Accessibility with respect to Operational Paradigms
• 4.3.6: Accessibility with respect to Technological System
Prerequisites and Support thereof
• 4.3.7: Compactness / Portability and Integration of Components
• 4.3.8: Reliability

It should already be reasonably clear that most of these criteria are strongly
interrelated, and that the relationship between them is potentially complex.
It seems unlikely that any diffusion system could perfectly satisfy all of the
criteria simultaneously, because they represent the conditional factors of a
multidimensional compromise that has no ideal solution. In other words, in
satisfying the criteria, certain trade-offs will be more or less inevitable.

4.5. Description and Evaluation of Existing Sound Diffusion Systems

The following sections will seek – on the basis of information available in
the literature and the author’s own experience – to evaluate certain existing
sound diffusion systems with respect to as many of the criteria given in
section 4.3 as possible. Evaluation of existing diffusion systems cannot be
absolutely systematic, as such. This would be enormously long-winded and,
in any case, the literary information available is not always sufficient to
permit such an approach. Nonetheless, it is proposed that the criteria should
be of use to anyone wishing to undertake such a task, and with sufficient
information and experience at their disposal to do so.

The purpose of this appraisal is not by any means to discredit existing
diffusion systems in comparison with some unattainable ideal, but rather to
broadly evaluate existing systems in terms of their appropriateness for the
task of performing live electroacoustic music. Where limitations unduly
interfere with this overall objective, possible improvements in the design of
future systems will later be proposed. Equally, where advantageous features
are identified, these will be acknowledged as features that could or should
be incorporated in the design of future systems.

4.6. Generic Sound Diffusion Systems

As evidenced in the previous sections, the practice of sound diffusion
presents a unique, not to mention challenging, set of demands.
Nonetheless, systems developed specifically for this purpose are
surprisingly few, many performances of electroacoustic music (the majority,
it is reasonably safe to suggest) being realised with what will be referred to
as ‘generic’ sound diffusion systems. A generic diffusion system (GDS)
comprises what might be considered ‘every day’ studio hardware. That is to
say, it consists of an ‘ad hoc collection of equipment’177 whose functionality
is not uniquely devoted to live concert sound diffusion and was perhaps not
originally designed or obtained for that specific purpose. Typically these
systems are built around a conventional studio mixing desk that serves as
the physical audio input layer, mix engine, control interface, and physical
audio output layer. Such systems are very common, and are in use at the
University of Glasgow178 and at the Spatial Information Architecture
Laboratory (SIAL) of RMIT (Royal Melbourne Institute of Technology)
University,179 to give but two examples. Two variations on this design are in
common use.

4.6.1. Generic Diffusion System 1 (GDS 1): Description

A typical small system of this nature might consist of the following
main components:

• Audio Source – most often a compact disc player or other stereo
player.
• Studio Mixing Desk.
• Loudspeaker Array.

177
Roads, Kuchera-Morin and Pope (2001). "The Creatophone Sound Spatialisation
Project". Website available at: http://www.ccmrc.ucsb.edu/wp/SpatialSnd.2.pdf.
178
University of Glasgow (2005). "Department of Music: Diffusion System". Website
available at: http://www.gla.ac.uk/departments/music/studio/diffusion.html.
179
SIAL (2002). "Sound Diffusion System". Website available at:
http://www.sial.rmit.edu.au/Resources/Sound_Diffusion_System.php.

Figure 23 and Figure 24, below, illustrate the typical hardware
configuration and routing capabilities, respectively, of such a system.
Audio source channels are routed to mixing desk input channels, which
are typically routed internally to the mixing desk’s group buses. In a
mixing desk with eight group channels – the commonly used Soundcraft
Spirit Studio desk for example180 – the ‘left’ audio source channel (such
systems are inherently predisposed to stereophonic sound diffusion)
would be routed to input channel 1 and this, in turn would be routed to
group buses 1, 3, 5, and 7. The right audio source channel would be
connected to input channel 2 and routed to groups 2, 4, 6, and 8. Thus,
four copies of the stereophonic CASS are obtained. Group bus outputs
are then routed appropriately to active loudspeakers – in this case the
odd group channels are routed to ‘left’ loudspeakers, and the even
channels to ‘rights’ – and the audio source can be diffused among
stereophonic coherent loudspeaker sets using the group faders. This
technique is described by Harrison as follows:

An easy way to derive the multiple outputs needed is to use the ‘groups’
on a standard mixer – the stereo input from the DAT or CD is routed to all
available groups and the relative levels achieved by ‘playing’ the group
faders.181

As indicated in Figure 24, the group buses are in fact usually arranged
into four stereo pairs, with the left-right balance for each channel sent to
a stereo group bus being determined by the position of the pan pot for
that channel.
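
This routing can be summarised in a few lines of Python (a hypothetical simplification that ignores the input gain, EQ, and pan-pot stages):

    # GDS 1, simplified: a stereo source copied to the eight group
    # buses, with one group fader per loudspeaker feed. Odd-numbered
    # groups carry the left channel, even-numbered groups the right.

    def gds1_feeds(left, right, group_faders):
        return [(left if g % 2 == 0 else right) * fader
                for g, fader in enumerate(group_faders)]

    # Diffuse towards the widest pair (groups 7 and 8) only:
    feeds = gds1_feeds(0.5, 0.5, [0, 0, 0, 0, 0, 0, 1.0, 1.0])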

180
This mixing desk is used for sound diffusion at the University of Wales, Bangor and
was, until recently, also used at the University of Sheffield Sound Studios. A slightly larger
version was used in Performance Space 1 at the Sonic Arts Network Expo 966 conference
held in Scarborough, 17th – 20th June 2005.
181
Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and
'Why' of Sound Diffusion". Organised Sound, 3(2): 123.

[Figure 23. Set up and basic signal routing diagram for GDS 1: CD and DAT sources feed the mixing desk’s input channels, which are routed via the group buses to the loudspeaker array.]

[Figure 24. Schematic diagram illustrating the routing capabilities of a typical GDS 1 based on a 16/8/2 bus studio mixing desk. Sixteen channel inputs (I = channel input gain, C = channel fader) are routed to eight group channels and thence to the group outputs (G = group fader), with left/right routing determined by pan pot position; the faders constitute the diffusion controls.]
4.6.2. Generic Diffusion System 2 (GDS 2): Description

A widespread variation of the system described above incorporates a
‘splitter box’ into the system between the audio source and the mixing
desk. This system is illustrated diagrammatically in Figure 25 and
Figure 26. The splitter box receives the audio source signals and
duplicates them several times, relaying copies to its multiple outputs.
These output streams are then routed directly to the input channels of
the mixing desk, with loudspeaker feeds being taken directly from
individual channel outputs. In this case the task of CASS duplication –
particularly important in the case of pluriphonic diffusion – takes place
outside the mixing desk. Diffusion is executed using the channel faders
to control individual loudspeaker levels.
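
Again as a hypothetical sketch, the architectural contrast with GDS 1 can be seen in code: the duplication of the CASS happens outside the desk, and each channel fader then scales one hard-wired loudspeaker feed:

    # GDS 2, simplified: a splitter box duplicates the stereo CASS,
    # and each desk channel fader scales one fixed loudspeaker feed.

    def splitter(left, right, num_copies):
        # Interleaved duplicates (L, R, L, R, ...) as patched into
        # consecutive desk channels.
        return [left if c % 2 == 0 else right
                for c in range(2 * num_copies)]

    def gds2_feeds(left, right, channel_faders):
        streams = splitter(left, right, len(channel_faders) // 2)
        return [s * f for s, f in zip(streams, channel_faders)]

    feeds = gds2_feeds(0.5, 0.5, [1.0] * 16)   # sixteen loudspeakers

Note that the input-to-output mapping is fixed in the structure of the code, just as it is hard-wired in the physical system: there is no equivalent of the switchable group-bus routing available in GDS 1.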

[Figure 25. Set up and basic signal routing diagram for GDS 2: the CD player feeds a splitter box, whose duplicated outputs feed the mixing desk’s channels directly and thence the loudspeaker array.]

[Figure 26. Schematic diagram illustrating the routing architecture of GDS 2: sixteen channel inputs (I = channel input gain, C = channel fader) feed sixteen channel outputs directly, with the channel faders serving as the diffusion controls.]

4.6.3. Generic Diffusion System 1 (GDS 1): Evaluation

With respect to criterion 4.3.3, an advantage of this system is that the
audio sources routed to the group faders can be switched. In
performance situations this makes it relatively easy to deal with
consecutive compositions whose audio streams originate from different
sources: a CD player can be routed to input channels 1 and 2, a DAT
player to channels 3 and 4, a stereo microphone pair to channels 5 and
6, an ADAT player to channels 7 to 14, and so on, and performers can
alternate between audio sources reasonably quickly and easily, either by
bringing up the levels of the appropriate channel faders or by switching
between group bus routings, in both cases without the need for physical
re-patching. This represents a reasonable degree of flexibility in terms
of signal routing.

In terms of its mixing architecture (a further dimension of criterion
4.3.3), however, GDS 1 is less flexible. Although audio streams from
multiple sources can be presented simultaneously (works for live
instruments and tape, for example, could be accommodated to a certain
degree), the extent to which these sources can be independently diffused
is minimal. This is a function of the fact that the signal routing/mixing
architecture and the control interface are intrinsically linked: the faders
with which the performer controls the diffusion of audio sources to
loudspeakers are also an integral part of the system that deals with the
routing and mixing of audio source streams to physical outputs. Because
there is only one fader per physical output, the only way that (say) two
CASSs could be independently diffused would be via two separate sets
of loudspeakers and corresponding faders. As well as being fairly
unergonomic, this would also diminish the number of loudspeakers
available for the diffusion of each CASS. This is a fairly serious
problem given that the number of group channels (used as loudspeaker
feeds in this system) available is likely to be fairly small to begin with.
Indeed, a general disadvantage of GDS 1 is that the number of group
channels available on the mixing desk limits the maximum number of
loudspeakers in the array: for example, eight group channels allows for
a total of only eight loudspeakers.182 While this allows reasonable scope
for the pluriphonic diffusion of stereo works,183 CASSs with more
channels would be subject to increasing restrictions, with CASSs of five
or more channels effectively being limited to uniphonic presentation.

The fact that the amplitude levels of fixed input-to-loudspeaker routings
are each controlled individually with a single fader (in some cases
simultaneous control of a stereophonic pair with one fader may be
possible) also presents problems in terms of interface ergonomics
(criterion 4.3.4). In combination with the fact that faders are almost
invariably arranged in a horizontal row, this means that certain diffusion
actions will be physically difficult to execute. This problem would
appear to be fairly widely recognised:

The speed at which decisions can be implemented and the dexterity of
control required when operating equipment clearly influences performance
practice. Given the practicalities of very little rehearsal time in often

182
In practice, an additional pair of loudspeakers can be obtained by using the (normally
stereo) master outputs. Further channels can also be utilised via a somewhat clumsy method
of ‘copying’, by connecting channel sends to the inputs of unused channels and using the
direct outputs of these channels (assuming, of course, that these are provided) to drive
loudspeakers. Additionally, auxiliary buses can be used as independent channels, although
this is obviously far from ideal, as the levels thereof are normally controlled with rotary
potentiometers as opposed to the more familiar (and perhaps more appropriate) sliders.
Clearly, none of these techniques are particularly convenient.
183
Although, recalling the quotation given in section 3.12.1 (page 151), Harrison regards
eight loudspeakers as the minimum for the diffusion of stereo works.

inappropriate venues, this basic setup can be either extremely limiting or
(in the case of a large system) highly intimidating…184

JM: In your experience, how much is your diffusion of any given piece
restricted – or indeed enhanced – by the fact that you’re having to do it on
a ‘one fader, one loudspeaker’ basis, where the faders are arranged [from
left to right]?

JH: Well, inevitably it is influenced by that, because there are physical
constraints. You can of course re-patch or re-jig and get it to be better, and
easier to do certain things. I’ve done a lot of things in France, for example,
where they tend to start at one end of the mixer with the speakers that are
the furthest away from you in front, and then [gestures from left to right]
the next pair, the next pair, the next pair, and this end [gestures to the
right] is at the back. That’s absolutely tickety-boo if you want to do lots of
front-to-back or back-to-front things, but if you want to get the most
significant speakers near the audience to be going [makes antiphonal
sounds and hand gestures], you’ll find that there’s a pair there, a pair there,
a pair there, and a pair there [indicates four completely separate areas of an
imaginary mixing desk]; you can’t get that interaction, which is why we
don’t do that in BEAST. We put those main speakers in a group on the
faders so that you can do it easily. But then of course it means that if you
want to go front-to-back, it’s awkward. So, nothing’s perfect! Unless you
could suddenly re-assign the faders mid-diffusion. You [referring to the
M2 Diffusion System, to be described in Chapter 5] could do that,
couldn’t you? [Laughs].

JM: The bottle-neck as I see it, is the fact that most systems work on a
‘one fader, one loudspeaker’ basis. While you’ll have the advantage of
knowing exactly what everything does (that’s the positive side), the
negative side is that you essentially have an arbitrarily organised set of
controls that are fixed. So there will be things that are just going to be very
difficult to do.

JH: That is true.185

With this architecture and control interface, issues of ergonomics are
exponentially compounded in the case of works for more than two
channels.

If you’ve got an eight-channel source, and every channel of the eight has a
fader, how do you do crossfades? You haven’t got enough hands!186

This indicates that, as far as pluriphonic diffusion goes, GDS 1 is, in
practical terms, easier to use, and more flexible, for single-CASS works
with a small number of channels: given current conventions, this
basically equates to stereophonic fixed medium works. Works

184
Moore, Moore and Mooney (2004). "M2 Diffusion - The Live Diffusion of Sound in
Space". Proceedings of the International Computer Music Conference (ICMC) 2004.
185
Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with
the author, 8th September 2004. See Appendix 1 for full transcription.
186
Ibid.

comprising a single CASS of more than two channels are, essentially,
limited to uniphonic presentation, as are works incorporating more than
one CASS.

As noted, many of the problems described can be directly attributed to
the fact that the amplitude of each loudspeaker is controlled on a one-
fader-to-one-loudspeaker basis, and therefore actions involving multiple
loudspeakers will necessarily involve potentially complex (or, at worst,
impossible) interactions with multiple faders. Nonetheless, this does
afford the diffuser a relatively large degree of ‘low level’ control and
therefore (relating to section 4.3.5) offers the potential for accomplished
and virtuosic performance (particularly in the case of stereophonic
works), notwithstanding certain issues relating to interface
ergonomics.187 The potential for virtuosity is augmented by the fact that
the mixing desk fader – as observed by Harrison in section 4.3.5 – is an
interface familiar to most practitioners of electroacoustic music (and
also in other areas) and is therefore accessible as an operational
paradigm.

There are a number of further issues arising from the appropriation of
conventional mixing desks (which were not originally intended for this
purpose) in GDS 1. The effective presence of two input gain controls
per channel (namely the input gain itself, and the channel fader – this is
best observed in Figure 24) is clumsy, and can lead to unpredictable
signal levels in concert, with performers potentially adjusting the overall
input gain with two separate controls. Additionally, because audio
mixing desks almost invariably provide more channel strips than group
buses, diffusion systems utilising mixing desks in this way can exhibit
an over-abundance of audio source inputs, as well as an inadequate
number of group bus outputs for use as loudspeaker feeds.

187
The keys of a piano, after all, are also arranged horizontally in a row, meaning, of
course, that certain combinations of notes will be physically impossible to play, and
comparable ‘limitations’ can be observed in all creative frameworks. It is rarely suggested
(in the author’s experience) that these limitations render the instrument ineffective; on the
contrary, the potential for subtle and virtuosic performance on such instruments is widely
acknowledged, perhaps in part because of the difficulty associated with playing them.

Overall, GDS 1 benefits from a high degree of accessibility in terms of
both operational paradigms and technological system prerequisites (this
will be discussed more fully in section 4.6.5). A reasonable degree of
flexibility with regard to signal routing is offered, but this is let down by
an overly restrictive mixing architecture, which is too closely integrated
with the control interface. Consequently, GDS 1 presents fairly serious
problems with regard to interface ergonomics. The various technical
demands of electroacoustic music are reasonably well-supported
(criterion 4.3.1) insofar as multiple technologies can be accommodated
and the signal routing architecture allows for these to be dynamically
assigned. They are less well supported in respect of multiple and/or
variable CASS demands, the system only really offering any degree of
flexibility in the pluriphonic diffusion of single-CASS stereophonic
works. Accordingly – with respect to criterion 4.3.2 – it could perhaps
be argued that GDS 1 exhibits a slight bias towards top-down works and
diffusion methods. On the other hand, the system presents no major
difficulties for the uniphonic presentation of works, which, in any case,
tends to be favoured by bottom-up practitioners. It can therefore be
proposed that GDS 1 does not exhibit any substantial top-down or
bottom-up bias, but – at the same time – neither does it offer an
enormous degree of flexibility or creative scope with respect to either of
these contrasting performance practice methodologies.

4.6.4. Generic Diffusion System 2 (GDS 2): Evaluation

GDS 2 has the considerable advantage of being able to utilise all of the
channels on the mixing desk as speaker feeds (provided the splitter box
has a sufficient number of output channels, that is) and can therefore
potentially drive a substantially larger loudspeaker array than GDS 1.
This is simply due to the fact that normal studio mixing desks almost
invariably provide more input channel strips than group buses. The
system also has the advantage of offering independent EQ on every
effective output channel, which might be useful in compensating for
loudspeakers with non-linear frequency responses, or indeed for
‘colouring’ loudspeakers with uniform frequency response, should this
be desired. This adds a dimension of flexibility with respect to both top-
down and bottom-up methodologies that cannot be obtained with GDS 1
(which only offers EQ on the audio source inputs assuming a fairly
standard mixing desk is being used). Additionally, this configuration
has only one input gain control per channel (see Figure 26, comparing
this with the two effective input gain controls superfluously featured in
GDS 1, illustrated Figure 24), thus reducing the possibility of
unpredictable signal levels during performance and improving overall
system reliability.

In these respects a generic diffusion system incorporating a splitter box
can potentially make more efficient use of the resources available than
one without. However, GDS 2 does still exhibit a certain degree of
hardware redundancy insofar as the group and master sections of the
mixing desk will normally not be utilised. Consequently, the pan-pots
on each channel strip are also effectively useless. Additionally, as
Harrison observes, this setup is only possible ‘if the mixer has direct
outputs on all channels, [and] lower-cost desks tend not to have this
feature.’188

Although it yields a number of advantages, GDS 2 also presents some
considerable disadvantages, most notably in that (mainly owing to its
reliance on a splitter box) the signal routing and mixing is far less
flexible. In addition to the difficulties arising from a control interface
that is integral to the signal routing/mixing architecture (these were
described in the previous section and GDS 2 presents exactly the same
problems), in GDS 2 the input-to-output routings must effectively be
hard-wired when the system is set up (in GDS 1 the input-to-output
routings are at least switchable via the group bus architecture). Each
channel fader is part of a simple series circuit comprising audio input,
in-line attenuator (fader), and audio output (again, see Figure 26), and

188
Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and
'Why' of Sound Diffusion". Organised Sound, 3(2): 123.

owing to this direct and physical routing of inputs to outputs (there is no
abstraction whatsoever), the use of multiple audio sources in such a
system is highly problematic. In practical terms (unless a particularly
sophisticated splitter box with switchable input-to-output routings is
being used) the system is limited to a single audio source if physical re-
patching during performance is to be avoided.

Ergonomically, the control interface offered in GDS 2 is practically
identical to that of GDS 1, because both systems utilise normal studio
mixing desks. Although GDS 2 can potentially host a loudspeaker array
large enough for the pluriphonic diffusion of multichannel works (its
architecture allows for this; the same cannot realistically be said of GDS
1), the fact that each source-channel-to-loudspeaker amplitude must be
controlled with an individual fader makes this logistically extremely
difficult, as noted in the previous section. Stereophonic works, however,
can be pluriphonically diffused reasonably easily and – like GDS 1 – the
system presents no real problems for the uniphonic presentation of
multichannel works, except that this is likely to necessitate physical re-
patching unless all of the programme items utilise the same audio
source technology. What this implies is that the benefits of GDS 2 –
mainly, the fact that a larger loudspeaker array can be utilised – are only
really advantageous in the pluriphonic diffusion of stereophonic works,
meaning that GDS 2 is fairly biased in favour of top-down diffusion
techniques.

4.6.5. Generic Diffusion Systems in General: Evaluation

An important advantage of both of the generic sound diffusion systems
is that they utilise commonplace studio hardware, and therefore should
be easily and conveniently within the capabilities of even fairly
modestly equipped studios with comparatively little additional outlay.
Generic systems are therefore accessible as per criterion 4.3.6, and
indeed it seems very likely that this is one of the main reasons for their
existence and continued use. These systems are fairly ‘modular’ in
nature, insofar as additional components can be added with relative
ease. This means that various different technologies can be
accommodated (in accordance with criterion 4.3.1) but this is not so
much a result of innovative design as a convenient characteristic of
communication via the standardised protocols of encoded audio. Such
systems are often, therefore, neither particularly portable nor integrated
(criterion 4.3.7), with individual components having to be transported
separately and time-consumingly assembled on location. This often
necessitates an enormous amount of cabling (particularly for GDS 2,
where a cable will be required to connect each splitter-box output to a
channel input: this represents an obvious aspect in which optimisations
could be made) as well as increasing the possibility of routing errors,
forgotten components, and so on.

In both GDS 1 and GDS 2 there is the potential for a relatively high
level of redundancy with respect to the hardware resources used. The
root of this problem can be traced to the fact that mixing desks are
designed and built with a view to mixing multiple audio streams into
(usually) a two-channel stereo mix, whereas with sound diffusion, the
aim is (in most cases) to take a finite number of audio channels and
project these into a greater, or at least equal, number of audio channels.
In this respect, the use of studio mixing desks for sound diffusion can be
viewed as something of a misappropriation. As noted by Harrison:

The drawback with […] [GDS 1] is that even 8-group desks normally have
16 or more input channels, of which only 2 are really needed for the stereo
input. If the mixer has direct outputs on all channels […], then input
channel faders can be used for diffusion, but this necessitates splitting the
stereo signal out from the source (via parallel boxes, for example) and
running several left/right signal cable pairs into successive pairs of input
channels [GDS 2]. BEAST has developed an elegant way of achieving
what is effectively a ‘mixer in reverse’ (2 in/many out) by using a
switching matrix through which any incoming stereo signal can be routed
to any pair of outputs. [This will be described later, in section 4.8].189

In addition, mixing desks can be heavy and cumbersome to transport –
further diminishing portability – and this seems somewhat inefficient

189
Ibid.

given that many of the hardware features will not, in fact, be used
during performance.

Although the mixing desk fader is highly accessible as an operational
paradigm (this is a definite advantage), the paradigm itself is not
particularly ergonomic, owing largely to the fact that the control
interface is also an integral part of the signal routing and mixing
architecture, which (as already observed) is not designed specifically for
the purpose of sound diffusion and accordingly falls short of meeting
certain demands. Works comprising multiple CASSs and/or
multichannel CASSs both present particular difficulties in this respect.

Overall, the single greatest advantage of generic diffusion systems rests
in the very fact that they are generic. Put simply, most institutions will
own all of the necessary hardware anyway, and this can be assembled in
a manner that is (more or less) appropriate for the concert programme to
be staged. Ultimately, however, the technology has not been designed
specifically for sound diffusion, and is therefore not as well suited for
this purpose as might be desirable.

4.7. The Acousmonium

4.7.1. Description

The Acousmonium was devised by François Bayle and Jean-Claude
Lallemand at the Groupe de Recherches Musicales (GRM), hosting its
first public performance in 1973.190 It is this system that first gave rise
to the expression ‘orchestra of loudspeakers,’ using up to (and sometimes
over) eighty loudspeakers in any single performance. Relatively little
published literature relating directly and specifically to the
Acousmonium exists in English. The following description is, therefore,
based mainly on secondary sources and personal communications with
the author.

190
The first piece performed was François Bayle’s L’Experience Acoustique:
GRM (2004). "INA-GRM: Key Dates". Website available at:
http://www.ina.fr/grm/presentation/dates.en.html.

One notable characteristic of the Acousmonium is its use of
asymmetrical loudspeaker arrays; many other systems tend to favour
strictly symmetrical arrangements. The Acousmonium also incorporates
loudspeakers of different timbral characteristics as ‘soloists,’ an effect
that is augmented by the physical arrangement of loudspeakers in a
manner comparable to the instrumental sections in an orchestra.
Harrison further observes that the loudspeakers are ‘positioned in a
mainly frontal array on the large stage of the Salle Olivier Messiaen,’191
further reinforcing the orchestral analogy. Accordingly, it has been
observed that the Acousmonium specifically draws attention to the
loudspeakers as ‘performers,’ whereas many other systems, including
the Cybernéphone (which will be described in section 4.11), attempt
draw the audience’s attention away from the loudspeakers during
performance.192

Notwithstanding this (fairly subtle) distinction, others liken the
Acousmonium to the Cybernéphone. According to Stevenson, both of
these systems are similar insofar as:

Intensity panning, reverberation, equalisation and loudspeaker design or
‘registers’ [are used] to modify the perception of individual sources and
pre-mixed tracks. Loudspeakers are distributed throughout the auditorium.
The unique acoustic properties of these systems are exploited in much the
same way as a musical instrumental ensemble. It is [Stevenson’s]
impression that adapting a piece for these ensembles would be considered
similar to the process of re-orchestration of instrumental music. [My
italics.]193

Stevenson’s observations (particularly the italicised sections) appear to
indicate that the Acousmonium and Cybernéphone are similar insofar as
both seem to engender top-down approaches to the diffusion of sound. It
will later be proposed that – certainly in the case of the Acousmonium,
at least – this is indeed true.

191
Harrison (1999). "Diffusion - Theories and Practices, with Particular Reference to the
BEAST System". EContact! 2(4). Electronic journal available at:
http://cec.concordia.ca/econtact/Diffusion/Beast.htm.
192
NTMusic UK (2004). "Magnetic Tape Distribution". Website available at:
http://www.ntmusic.org/proj_dmu12.htm.
193
Stevenson (2002). "Spatialisation, Method and Madness - Learning from Commercial
Systems". Australasian Computer Music Conference (Melbourne; 6th - 8th July 2002).

Diffusion is executed using a mixing-desk-style interface with a fader-
based control paradigm. With up to eighty loudspeakers, control of the
diffusion on a one-fader-to-one-loudspeaker basis would clearly be
difficult. Therefore, according to Garro, the Acousmonium allows input
signals to be routed to loudspeakers on a ‘one-to-many’ basis.194 In this
way, one fader can be used to control the amplitude of an input channel
routed to multiple loudspeakers. It is safe to assume that, in the system’s
original 1973 incarnation, this signal routing architecture – if indeed it
was available at this time – must have been facilitated in hardware; it is
unclear as to whether or not subsequent software additions have been
made.
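
A hypothetical sketch of this one-to-many arrangement (the function below is invented for illustration and is not a description of the GRM’s actual implementation) shows how a single fader value can govern a whole group of loudspeaker feeds:

    # One fader scales one input across a whole loudspeaker group,
    # as reported for the Acousmonium's one-to-many routing.

    def apply_fader(sample, fader_level, speaker_group, feeds):
        # speaker_group: indices of the loudspeakers this fader drives
        for s in speaker_group:
            feeds[s] += sample * fader_level
        return feeds

    # One gesture moves the source onto, say, a dozen 'soloist'
    # loudspeakers within an eighty-strong array:
    feeds = apply_fader(0.7, 0.9, range(12), [0.0] * 80)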

4.7.2. Evaluation

Although there is no reason to believe that it is not technically capable
of accommodating a reasonably wide variety of electroacoustic works,
the Acousmonium – more so than either of the systems described
previously – demonstrates a clear and considerable bias towards top-
down methods, and this is due mainly to the nature of the loudspeaker
array. The use of loudspeakers that ‘colour’ the sound is normally
contrary to the bottom-up ethos of transparent reproduction of the
electroacoustic work, and is also objectionable to some top-down
practitioners. The asymmetrical arrangement of loudspeakers potentially
makes it difficult to accommodate works that demand a specific or
symmetrical loudspeaker array, for example multichannel fixed media.
Because bottom-up works are more likely to make use of multichannel
media, this represents another respect in which bottom-up
methodologies are marginalised by this system. This is, of course,
completely in-keeping with the top-down tradition, which – as described
in Chapter 2 – is descended from musique concrète and has its roots in
the GRM itself.

194
Diego Garro (2005). Personal communication with the author.

Nonetheless, the ability to control the amplitude levels of multiple
loudspeakers with a single fader is most certainly an advantage from a
number of perspectives. Firstly, and most obviously, interface
ergonomics are greatly improved. Secondly, support for multichannel
works is improved: Garro reports that audio source channels can be
individually assigned to different groups of loudspeakers,195 meaning
that multichannel fixed-medium works (which will normally demand a
fairly specific CLS shape) can make selective use of the asymmetrical
loudspeaker array. Given the large size of the loudspeaker array, this
architecture might also permit simple pluriphonic diffusions of
multichannel works, although in this case interface ergonomics may
again become an issue. The system does have the advantage of being
based around the familiar fader, and this compensates, to a certain
extent, for the fact that it is not particularly portable.

Overall, the Acousmonium appears to be technically more flexible than
either of the generic systems, but has a markedly distinct aesthetic bias.
This bias is far more favourable to the top-down ethos than the bottom-
up, but even within the scope of top-down practice the use of
asymmetrical loudspeaker arrays with non-linear frequency response
can be subject to strong criticism. Therefore, although apparently fairly
flexible, the Acousmonium is only ideally suited to those practitioners
who subscribe to the reasonably specific set of aesthetic values that it
engenders.

4.8. The BEAST Diffusion System

4.8.1. Description

The diffusion system used by Birmingham Electro-Acoustic Sound
Theatre (BEAST) includes a custom-built DACS 3D diffusion console,
which Harrison describes as ‘a mixer in reverse (two in, many out)’.196

195
Diego Garro (2005). Personal communication with the author.
196
Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and
'Why' of Sound Diffusion". Organised Sound, 3(2): 123.

A similar console is also in use at Keele University.197 This console
offers twelve channel inputs and thirty-two outputs, connected via a
matrix of input-to-output routing buttons which can be switched on or
off in stereo pairs. Additionally, the left-right orientation of each stereo
routing can optionally be reversed, resulting in the routing schematic
given in Figure 27, below. Each channel has an individual input gain
control and insert point, and diffusion is realised using the thirty-two
output channel faders. A more detailed specification for the 3D console
is available via the DACS website.198

[Figure 27. Simplified schematic diagram illustrating the signal routing capabilities of the DACS 3D Sound Diffusion Console: twelve channel inputs (I = channel input gain) can be routed via the switching matrix to any of the thirty-two channel outputs (C = channel fader).]
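
The routing model of Figure 27 might be sketched as follows (hypothetical code, with invented names; it is not derived from the console’s actual internals). The essential points are that routings are switched rather than continuously mixed, are made in stereo pairs with optional left-right reversal, and that diffusion itself is performed on the thirty-two output faders:

    # Hypothetical model of the DACS 3D routing: stereo input pairs
    # are switched onto output pairs (optionally reversed), and the
    # thirty-two output faders perform the diffusion.

    class DACS3DModel:
        def __init__(self, num_inputs=12, num_outputs=32):
            self.routes = []                 # (in_pair, out_pair, reversed)
            self.output_faders = [0.0] * num_outputs

        def route_pair(self, in_pair, out_pair, reverse=False):
            # in_pair and out_pair are (left, right) index tuples.
            self.routes.append((in_pair, out_pair, reverse))

        def feeds(self, inputs):
            pre = [0.0] * len(self.output_faders)
            for (il, ir), (ol, outr), rev in self.routes:
                l, r = (inputs[ir], inputs[il]) if rev else (inputs[il], inputs[ir])
                pre[ol] += l
                pre[outr] += r
            return [p * f for p, f in zip(pre, self.output_faders)]

    desk = DACS3DModel()
    desk.route_pair(in_pair=(0, 1), out_pair=(0, 1))
    desk.route_pair(in_pair=(0, 1), out_pair=(2, 3), reverse=True)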

4.8.2. Evaluation

As confirmed by Harrison (see quotation given in section 4.6.5), the
DACS 3D diffusion console was developed in response to practical

197
Diego Garro (2005). Personal communication with the author.
198
DACS (Unknown). "3D Sound Diffusion Console". Website available at:
http://www.dacs-audio.com/Custom/3d_console.htm.

experience with the generic systems described in section 4.6,
incorporating the beneficial features of each of the two variants and
addressing some (but not all) of the problems. Input channels can be
routed to output buses with the same degree of flexibility offered in
GDS 1 (and lacking in GDS 2). Additionally, the console allows for
diffusion across a large loudspeaker array, as in GDS 2, but without
compromising the flexibility of the signal routing. Incorporating
hardware designed specifically for the purpose of sound diffusion, the
BEAST system represents an enormous improvement with respect to
criterion 4.3.3 in comparison with the generic setups, effectively
dispensing with the hardware redundancy previously cited as a flaw of
the latter. Different audio sources can be selected reasonably quickly
and easily with the input-to-output routing buttons, and the ample
provision of audio inputs ensures that this is unlikely to necessitate
physical re-patching except in fairly extreme circumstances.

However, the system presents precisely the same difficulties with
respect to interface ergonomics that were observed in relation to the
generic setups, insofar as the diffusion console operates on a strictly
one-fader-to-one-loudspeaker basis. To reiterate, this can make the
pluriphonic diffusion of multichannel works, in particular, logistically
difficult. Indeed, the system would certainly appear to have been
designed specifically with the diffusion of stereophonic fixed medium
works in mind: routings are made in stereo pairs, and the typical
BEAST loudspeaker array consists of multiple loudspeakers distributed
throughout the venue in stereo pairs in the manner illustrated in Figure
14 (page 141). Accordingly, the diffusion of works with different
technical demands – a greater number of fixed medium channels or live
microphone feeds for instance – can become problematic to a certain
extent. Although the BEAST system offers a certain degree of input-to-
output routing abstraction (the routings are not ‘hard-wired’ as they are
in GDS 2), problems can still be attributed to the fact that the
routing/mixing architecture and control interface are too closely
interrelated. Nonetheless, it can always be argued that the mixing desk
fader is an accessible paradigm with respect to criterion 4.3.5 and this
is, of course, an extremely important consideration.

In terms of portability, the BEAST system is broadly similar to the
generic systems. Although the diffusion console undoubtedly represents
a far more elegant solution (particularly in comparison with the
cumbersome splitter box incorporated in GDS 2), audio sources – for
example – are still provided by external units that must be integrated
into the system in the performance venue. However, the DACS 3D is
ruggedly built, regular transportation between venues presumably
having been an explicit design consideration, and this cannot always be
said for conventional studio mixing desks. In this respect, the BEAST
system is potentially more reliable than either of the generic systems.

The BEAST system also benefits, to a certain extent, from the fact that
audio source technologies can be integrated into the system in a manner
comparable with normal studio mixing hardware, thus making the
system a broadly familiar – and therefore accessible – platform to work
with. The only ‘non-standard’ component is the mixing console itself,
which currently must be built to order by DACS Limited, Gateshead. At
the time of writing this can be done for a total cost of £5915, the order
taking around two months to complete.199 Although this is not, perhaps,
as readily accessible as standard commercial mixing hardware, there are
a number of advantages. Firstly, the desk has been designed specifically
for the purpose of sound diffusion and is therefore generally more
appropriate for the task. Secondly, the exact specification – in terms of
the number of inputs and outputs – can be tailored to suit the particular
requirements of the individual or institution in question.

Overall, the BEAST diffusion system is far more appropriate than the
generic systems in a number of respects, as a platform for the live
presentation of electroacoustic music. The technical demands of the

199
Douglas Doherty, Managing Director of DACS Limited (2005). Personal
communication with the author.

idiom are fairly well accommodated, mainly by virtue of the signal
routing architecture, which is simple and effective and therefore well-
suited to live performance. Like GDS 1, the BEAST system does not
exhibit any overly prohibitive bias towards top-down or bottom-up
techniques (in comparison with, for example, the Acousmonium, which
is enormously biased in favour of the top-down), although it can
certainly be argued that the top-down ethos is favoured to a certain
extent. Like the generic systems, however, the biggest criticism is
relatively poor interface ergonomics, stemming from the fact that
diffusion must be executed on a one-fader-to-one-loudspeaker basis.
This means that the capabilities of the system can only really be fully
exploited in the case of pluriphonic diffusions of stereophonic fixed
medium works, hence the slight top-down bias. Although bottom-up
techniques are by no means institutionally excluded (this argument
could, tentatively, be made with reference to the Acousmonium), neither
are there any features that are of particular and specific use to bottom-up
practitioners.

4.9. BEAST’s Octaphonic Diffusion Experiments

4.9.1. Description

It has been noted that the standard BEAST diffusion system – in common with the generic systems – presents certain difficulties in the
pluriphonic diffusion of more-than-stereo works. In response to this,
experimental research into the pluriphonic diffusion of octaphonic fixed
medium works (that is, the presentation of an eight-channel CASS via
multiple eight-channel CLSs) has been carried out by BEAST. At the
time of writing, these experiments have not been documented
elsewhere.

In an interview with the author,200 Harrison described an array comprising eighty loudspeakers arranged in ten eight-channel CLSs throughout the hall, roughly as illustrated in Figure 28, below. Six of the CLSs – ‘main,’ ‘close,’ ‘floor,’ ‘higher,’ ‘gallery,’ and ‘ceiling’ – were arranged in a fairly standard circular/ovoid octaphonic shape, with
the speakers inclined inwards towards the centre. In the ‘stage’ array, all
eight loudspeakers pointed towards the audience and were slightly raked
so that those further away were positioned higher than those closest to
the audience. The ‘central’ array comprised eight small loudspeakers
arranged in a circle and pointing outwards. The ‘ferris’ array formed a
vertically-oriented circle of eight loudspeakers around the audience on a
left-to-right axis, like a Ferris-wheel, with each of the speakers pointing
inwards. A second vertical array – ‘rose’ – was set up on the far wall of
the venue resembling a rose window in a church, with each of the
speakers facing towards the audience.201

200 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

201 As an aside it is interesting to note that, although each CLS is slightly different in formation, and some are more different than others, they all conform to the appropriate ‘shape’ for this kind of coherent loudspeaker set. That is, they all meet the criteria – in Harrison’s estimation, which is oriented toward the top-down end of the spectrum – for the accurate reproduction of the coherent audio source set(s) in question. This clarifies and reinforces certain points that were made in sections 3.5 to 3.8.

Figure 28. Approximate loudspeaker positions for the BEAST octaphonic sound diffusion experiments. (Key: Main; Close; Floor (lowered); Higher (raised); Central; Gallery (raised); Ceiling; Stage (raked); Ferris (vertical); Rose (vertical); audience seated centrally.)

Such a system would obviously be completely unmanageable with a fader controlling the level of each individual loudspeaker. Accordingly,
diffusion was controlled using a MIDI fader box in conjunction with an
intermediate software interface programmed in Max/MSP, where each
MIDI fader controlled the amplitude level of a group of eight
loudspeakers. This, of course, supports the notion of coherent audio
source sets and coherent loudspeaker sets as useful concepts in the
context of sound diffusion. The result was a ten-fader interface
controlling diffusion to an eighty-strong loudspeaker rig. In addition, an
eleventh ‘virtual’ fader was created that inverted the octaphonic image
in the ‘main’ array, that is, input 1 was routed to output 8, input 2 to
output 7, input 3 to output 6, and so on. This demonstrates a degree of
input-to-output routing flexibility that is extremely difficult to achieve
with standard audio mixing hardware, with the exception perhaps of
matrix mixers, which will be discussed later.
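
To illustrate the mechanism, the following short Python sketch indicates how eleven fader values might be resolved into a gain matrix for ten eight-channel CLSs. It is an illustrative reconstruction only – the actual system was implemented in Max/MSP, and all names, data structures and the default channel-to-loudspeaker routing shown here are assumptions on the author’s part:

    # Illustrative sketch only: eleven fader values mapped onto an
    # 8-input, 80-output gain matrix. CLS 0 is taken to be the 'main'
    # array; the default routing sends CASS channel n to loudspeaker n
    # of each CLS, and fader 10 is the 'virtual' image-inverting fader.
    NUM_INPUTS = 8   # octaphonic CASS
    NUM_CLS = 10     # ten eight-channel coherent loudspeaker sets

    def gains_from_faders(faders):
        """faders: eleven values in the range 0.0 to 1.0."""
        # gain[ch][spk] is the level of CASS channel ch at loudspeaker spk
        gain = [[0.0] * (NUM_CLS * NUM_INPUTS) for _ in range(NUM_INPUTS)]
        for cls in range(NUM_CLS):
            for ch in range(NUM_INPUTS):
                gain[ch][cls * NUM_INPUTS + ch] += faders[cls]
        # 'virtual' fader: the 'main' array again, but with the image
        # inverted (input 1 to output 8, input 2 to output 7, and so on)
        for ch in range(NUM_INPUTS):
            gain[ch][(NUM_INPUTS - 1) - ch] += faders[10]
        return gain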

4.9.2. Evaluation

It must first of all be noted that the system described in the previous
section is very much experimental and has not (at the time of writing,
and as far as the author is aware) been used for any public performances
or further experiments to date. The system was not presented to the
author as a ‘finished product,’ and can therefore not be justifiably
evaluated as such. Nonetheless, an evaluation in terms of the criteria
given in section 4.3 – as far as this is reasonable – will prove useful.

The main advantage that this system offers is that the signal
routing/mixing architecture and control interface are highly abstracted
from each other. The same cannot truly be said of any of the systems
described previously, where the control interface, at some level, is an
integral part of the routing and mixing architecture itself. Here, the
control interface ‘merely’ provides data relating to the position of each
fader, and this data can ostensibly be interpreted in any way by the
software mix engine. In this case, data relating to the position of each
individual fader is used to simultaneously control the output levels of an
eight-channel coherent loudspeaker set, with appropriate CASS-
channel-to-loudspeaker routings also being facilitated in software. This
is conceptually interesting as it foreshadows and supports the notions of
CASS and CLS in sound diffusion, insofar as decisions must be made
with regard to the grouping of audio source streams and corresponding
loudspeakers. The benefits of this are particularly evident in the ‘virtual’
array in which input-to-output routings are dynamically inverted in
software. This particular specification is, of course, exclusively suited to
the pluriphonic diffusion of eight-channel works and can therefore be
regarded, on the whole, as an extension of the top-down ethos broadly
advocated by BEAST.

Clearly this setup presents vastly improved interface ergonomics: a cross-fade between two CLSs can be achieved with the manipulation of
only two faders as opposed to the practically impossible sixteen that
would be required to realise the same results with many other systems.
This system also has the advantage of presenting the performer with a
simple and (usually) familiar control interface – a bank of faders –
whose operation is basically similar to that of a conventional studio
mixing desk. Accordingly, the system meets criteria 4.3.4 and 4.3.5
reasonably well, mediating the potential ‘trade-offs’ inherent in these
criteria in an effective manner.

It could also, of course, be argued that – in comparison with the potentials of abstracted software control – the system does relatively
little, adhering quite closely to the somewhat limited model presented
by conventional studio mixing hardware. However, it must be recalled
that this is a basic experimental system designed only to explore fairly
specific creative objectives. Indeed, this approach – using Max/MSP as
a platform for the implementation of experimental software mix engines
with abstracted physical control – could prove to be an exceedingly
effective (not to mention relatively quick and straightforward) way of
prototyping new sound diffusion paradigms, and much research could
be undertaken in this way. Because the system incorporates fairly
standard components (many institutions with an interest in
electroacoustic music will have easy access to Max/MSP, MIDI fader
modules, multichannel audio interfaces, et cetera) and is reasonably
loosely specified, it is therefore accessible according to criterion 4.3.6,
and such research can potentially be widely engaged with.

Accordingly, it can be said that the approach adopted by BEAST in these experiments is one that could be taken much further, with potentially enormous benefits in the field of sound diffusion. As it
stands, the system is not, perhaps, as reliable as might be desirable in a
live context: there is no indication, for instance, that the software has
been extensively tested. Neither is the system as compact and integrated
as it could be. Additionally, in this particular instance, there is a
demonstrable bias in favour of the top-down agenda, to the practical
exclusion of bottom-up techniques. However, such limitations are surely
to be expected in an experimental research context, and the model
adopted would certainly allow for these issues to be addressed in the
future.

4.10. The Creatophone

4.10.1. Description

The Creatophone project is based at the Center for Research in Electronic Art Technology (CREATE) at the University of California,
Santa Barbara, in the USA. Research in progress at CREATE expresses
an interest in surpassing the ‘one-fader-to-one-loudspeaker’ paradigm
with grouped loudspeakers, thus addressing issues similar to those
tackled by the Acousmonium and by BEAST’s experimental octaphonic
system:

The first full realization of the Creatophone […] would be designed from a
number of loudspeaker, amplifier, and mixer components especially
selected for this use. A set of 16 pairs of loudspeakers and 16 stereo
amplifiers would be deployed. In order to keep costs down and make the
task of spatialisation manageable in concert, the mixer would be limited to
8 output channels. This means that certain loudspeaker pairs would be
grouped, that is, assigned to a single potentiometer. [Note the references to
pairs of loudspeakers and stereo amplifiers: we will return to this later.]202

The Creatophone also ‘take[s] advantage of the particularities of certain loudspeakers,’203 and as such is comparable to the Acousmonium (described above) and the Cybernéphone (to be described below).

202 Roads, Kuchera-Morin and Pope (2001). "The Creatophone Sound Spatialisation Project". Website available at: http://www.ccmrc.ucsb.edu/wp/SpatialSnd.2.pdf.
203 Ibid.

Roads et al also plan to develop a system for the simultaneous diffusion of multiple independent coherent audio source sets, additionally (and
correctly) observing that ‘one-fader-to-one-loudspeaker’ control would
be insufficient for this kind of performance:

As the project evolves the Creatophone can be directed towards research in control of spatialisation in concert from multiple input sources. This
[…] requires computer-controlled digital mixing technology.204

204 Ibid.

It is unclear – owing largely to a lack of detailed published material – exactly how many of the development objectives for this system have
actually been realised at this time. When used for the diffusion of
electroacoustic works at the Sound in Space Symposium – held in
March 2000 at CREATE – the system was described by Harley as
follows:

The Creatophone is a spatial sound projection system of flexible configuration. For this symposium, the various inputs (ADAT, DAT, CD,
computer with 8-channel output) were fed into a 16-bus Soundcraft mixer
and then out through four Threshold stereo power amplifiers via Horizon,
AudioQuest, and Tara interconnects and MIT and AudioQuest speaker
cables to eight B&W Matrix 801 loudspeakers. […] The loudspeakers
were placed in a standard octagonal configuration with two in front, two in
rear, and two each on either side. [It can be deduced that this array is
somewhere in between the two configurations illustrated in Figure 11 and
Figure 12 (page 136). Such arrays are – as far as the author understands –
fairly standard in the United States.] The best listening was found in the
center, as one would expect, but many spatialized effects could be heard
quite effectively from other locations as well. The superior clarity and
power of the Creatophone sound system certainly enhanced the listening
experience, regardless of seating.205

205 Harley (2000). "Sound in Space 2000 Symposium". Computer Music Journal, 24(4).

Clearly this description falls short of the developers’ ‘first full realization of the Creatophone’ (as described in the quotation given
previously) and is presumably not, therefore, representative of the
complete system. As little additional information has been forthcoming,
evaluation of this system (as far as this is possible) will have to be on
the basis of the proposed developments described in the literature.

4.10.2. Evaluation

The Creatophone system as described by Harley is, in fact, no different from a generic sound diffusion system and, according to the quotation given above, would appear to fall into the category of GDS 1 (see section 4.6.1), as no splitter box is mentioned and the system comprises
only a small number of loudspeakers. Having said that, this information
is – at the time of writing – more than five years old, and it can be
presumed that subsequent developments may have taken place during
this time.

The developers express an interest in ‘the particularities of certain loudspeakers,’ which strongly suggests that loudspeakers that ‘colour’
the sound are to be utilised. The design objectives also seem to favour
stereophonic audio sources, insofar as there is no mention of
multichannel sources but much reference to ‘loudspeaker pairs,’ ‘stereo
amplifiers,’ and so on. Furthermore, simultaneous control of multiple
loudspeakers with a single fader has been described as a possibility.
This would, of course, be beneficial from the standpoint of interface
ergonomics. On this basis, this system appears to be rather similar in its
objectives to the Acousmonium, although almost certainly less biased in
favour of top-down methods.

Perhaps most interestingly, the CREATE team express an interest in developing tools for the simultaneous independent diffusion of multiple
CASSs. This is an area of research that the author believes would be of
tremendous benefit within the field of sound diffusion in terms of
accommodating works from across the creative and technological
spectrum of the electroacoustic idiom, not least because none of the
systems reviewed easily facilitate the real-time realisation of such
practice. Similar interests are expressed by Harrison and Truax (see
section 3.5), although fully developed systems are not yet in evidence.

With these proposed developments borne in mind, it would appear that the Creatophone might be of considerable interest and benefit to top-
down practitioners, and accordingly perhaps less useful to those with
more characteristically bottom-up concerns.

4.11. The Cybernéphone

4.11.1. Description

The Cybernéphone – known up until 1997 as the Gmebaphone – has been under continuous development at the Institut de Musique
Électroacoustique de Bourges (IMEB) since 1973. The sixth and latest
version of the system was first used in performance at the 24th Concours
International de Musique et d’Art Sonore Électroacoustique, in
Bourges, in 1997, and is described at length by Clozier.206

206 Clozier (1998). "Presentation of the Gmebaphone Concept and the Cybernephone Instrument". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 266-281.

Fixed medium audio sources are transferred onto the hard drive of a
computer. In performance the computer mixes the audio source signals,
played back directly from the hard drive, to fifty loudspeaker output
channels, according to input obtained from an intermediate user
interface. According to Clozier, there are two primary user interfacing
methods: ‘manual mode’ and ‘computer-assisted diffusion mode.’207 In
‘manual mode,’ the performer effectively has direct one-to-one control
over individual loudspeaker output levels via faders, in much the same
way as with BEAST and the ‘generic’ diffusion systems. ‘Computer-
assisted diffusion mode’ essentially allows predetermined fader
movements to be triggered during performance. Such fader movements
can be pre-programmed in two ways: either they can be ‘recorded’
directly from the faders into a sequencer; or parameters can be specified
offline in ‘configuration tables.’ Pre-programmed actions can then be
triggered manually during performance, or automatically by a
sequencer.

207 Ibid.: 269.
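
The two pre-programming routes can be illustrated with a minimal Python sketch. This is the author’s own illustration, assuming nothing about IMEB’s actual implementation; all names here are hypothetical:

    # Illustrative sketch of the two ways of pre-programming fader
    # movements: 'recording' them live, or specifying them offline in a
    # configuration table. Sequences can then be triggered in performance.
    import time

    class FaderSequence:
        def __init__(self):
            self.events = []  # (time_offset_seconds, fader_index, level)

        def record(self, fader_index, level, start_time):
            """Capture a live fader movement ('recorded' directly)."""
            self.events.append((time.time() - start_time, fader_index, level))

        def add_offline(self, time_offset, fader_index, level):
            """Specify a movement offline, as in a 'configuration table'."""
            self.events.append((time_offset, fader_index, level))

        def play(self, apply_level):
            """Trigger the pre-programmed actions during performance."""
            start = time.time()
            for offset, fader, level in sorted(self.events):
                time.sleep(max(0.0, offset - (time.time() - start)))
                apply_level(fader, level)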

In addition to simple mixing between output channels in real time or semi-automatically, the Cybernéphone also allows the diffuser to
perform various treatments on the sonic material: these include ‘time, timbre, space, phase and detuning.’208 Although Clozier does not
describe exactly what these processes involve, one might deduce that
the system allows the performer to perform time-delay, filtering,
amplitude diffusion, phase adjustments and pitch shifting on the audio
signals. Again, these actions can be performed in real time or
sequenced.

An important aspect of the Cybernéphone that differentiates it from many other diffusion systems is the loudspeaker configuration used.
Notably, many of the loudspeakers used have limited frequency
response bands. Consisting of a number of such loudspeakers, these ‘V-
systems,’ as Clozier describes them, ‘analyze and select the timbres and
redistribute them in six sound color registers for each of the left and
right channels (2 basses, 2 mediums, 2 trebles) which are sent to special
loudspeakers. When the latter are set up, from the very lowest bass to
the very highest treble, they effect an acoustic resynthesis of the
sounds.’209 In other words, discrete bands of the audio spectrum are
reproduced by loudspeakers with different frequency responses
(essentially ‘filter characteristics’) in different locations, thus
distributing the sound in space differently according to its frequency
content. The ‘bandpass filter’ characteristics of each V-system are such
that, across the group of loudspeakers, the entire frequency spectrum is
represented, but not by any one, single, loudspeaker. In addition to these
‘registered’ groups of loudspeakers, ‘reference’ loudspeakers are also
provided. These exhibit a linear frequency response across the audio
spectrum but can be subject to the user-specified signal processing
procedures mentioned in the previous paragraph. Loudspeakers are
arranged in pairs, symmetrically about the front-to-back axis of the hall,
but are conceptually sub-grouped in various combinations into ‘planes’
and ‘diagonals,’ again foreshadowing the concept of the coherent
loudspeaker set.
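
The underlying principle of the V-system can be sketched as follows. This is a minimal illustration assuming the scipy library; the band edges, filter order and three-register division are the author’s assumptions, not IMEB’s published specification:

    # Illustrative sketch: one input split into 'registered' feeds for
    # bass, medium and treble loudspeakers, which together resynthesise
    # the full spectrum acoustically.
    from scipy.signal import butter, sosfilt

    FS = 44100                     # sample rate (assumed)
    EDGES = [250.0, 2000.0]        # crossover frequencies in Hz (assumed)

    def split_registers(signal):
        low = sosfilt(butter(4, EDGES[0], 'low', fs=FS, output='sos'), signal)
        mid = sosfilt(butter(4, EDGES, 'bandpass', fs=FS, output='sos'), signal)
        high = sosfilt(butter(4, EDGES[1], 'high', fs=FS, output='sos'), signal)
        return low, mid, high      # one feed per 'register' of loudspeakers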

208 Ibid.: 268.
209 Ibid.: 268.
4.11.2. Evaluation

Clozier describes the Cybernéphone as follows:

The Cybernéphone may be defined as a huge acoustic synthesizer, an interpretation instrument that the composer plays in concert, an instrument
that serves to express his composition, to enhance its structure for the
benefit of the audience, to bring it to sonic concretization.210

Referring back to section 3.12.1 in the previous chapter, it could be deduced that the Cybernéphone – at least, as conceived by Clozier – is very much oriented towards top-down diffusion techniques, in keeping
with the idea that a diffusion system should offer the performer as much
interpretative scope as possible, rather than being focused on providing
an essentially ‘neutral’ reproduction of the electroacoustic work.
Nonetheless, in terms of the techniques and processes offered to the
performer, the Cybernéphone in fact caters for both top-down and
bottom-up methodologies extremely well and without obviously
exhibiting any bias toward one or the other.

The two parallel modes of operation – ‘manual mode’ and ‘computer-assisted diffusion mode’ – would seem immediately and obviously
conducive to top-down and bottom-up approaches to sound diffusion,
respectively. In manual mode, diffusion can be executed in real-time, in
a manner that is perceptually effective and improvisatory as far as this is
appropriate to the piece and the performance context; this is firmly in keeping with the top-down aesthetic. In computer-assisted diffusion
mode, on the other hand, actions that have been pre-programmed offline
can be executed automatically. This is more in tune with the bottom-up
aesthetic because it allows the parameters of diffusion actions to be
precisely specified outside of the performance context (this can,
accordingly, be done with respect to any kind of deterministic scheme
as required, as opposed to being ‘whimsically improvised’ at the
moment of performance) and triggered at an objectively defined and
mathematically accurate time.

210 Ibid.: 268.

Another advantage is that there would seem to be significant facility for
cross-over between these two fairly ‘polar’ modes of operation.
Predetermined diffusion actions, for example, can either be specified
offline as described above, or else recorded in real-time via the control
interface. Although the offline option would seem more immediately
affiliated with the bottom-up approach, it could certainly be useful to
the top-down performer as a means to facilitating actions that would be
difficult or impossible to attain in real time, for example. Equally, a
slightly more ‘liberal’ bottom-up diffuser may wish to define diffusion
actions in real time, but have those real time actions consistently and
accurately reproduced throughout the performance (perhaps also
wishing to achieve a degree of uniformity across several different
performances of the same work). In addition, these predetermined
actions – howsoever obtained – can either be triggered manually in real-
time, or automatically at a specified time, facilitating a further
dimension of ‘crosstalk’ between top-down and bottom-up techniques.

In comparison with those systems described earlier, it can also be observed that the Cybernéphone has rather more scope in terms of the
range of diffusion actions available. By and large, the systems described previously only allow the performer to manipulate the
relative amplitudes of various input-to-output signal routings,211
whereas the Cybernéphone – in addition to the usual amplitude
diffusion – offers time-delay, filtering, phase adjustment and pitch
shifting capabilities. Needless to say, each of these could be utilised in
either an essentially top-down (‘creative,’ in the sense of continuing the
compositional process) or bottom-up (perhaps more ‘corrective,’
directed toward achieving a more transparent presentation of the
finished work) manner and their inclusion is therefore extremely
beneficial from both perspectives. The benefits are augmented in that
these processes can be recorded and automated in each of the ways
described previously.

211 Although certain systems – particularly those based around conventional studio mixing desks – may offer certain techniques including EQ and phase inversion, for example, in most cases these cannot be ergonomically controlled to the same extent possible with the Cybernéphone. Of course, such things could be implemented in experimental approaches such as those described in section 4.9.

In terms of signal routing and interface ergonomics, it is implied that loudspeakers can be grouped together to a certain extent and controlled
with a single ‘fader.’ This suggests a higher degree of fulfilment of
criteria 4.3.3 and 4.3.4 than that attained by those systems operating on
a one-fader-to-one-loudspeaker basis, although the literature does not go
into much detail in these respects. It is not particularly clear, for
example, exactly how flexible the signal routing and mixing architecture
is, nor to what extent the grouping of loudspeakers is fixed or dynamic.
Certainly, the presence of both ‘registered’ (coloured) and ‘reference’
(transparent) loudspeakers – particularly in conjunction with the ability
to dynamically apply equalisation – would be beneficial to both top-
down and bottom-up practitioners, depending of course on the extent to
which the diffuser has control over which loudspeakers are used at any
given time. Again, this is not particularly clear in the sources consulted.

Like all of the systems described so far, the Cybernéphone is based around a control interface that is essentially comparable to a
conventional studio mixing desk and – like BEAST’s experimental
system – this is abstracted from the mix engine via the MIDI protocol,
adding a further dimension of flexibility. Personal communications with
the author suggest that this interface is accessible as per criterion 4.3.5
(according to my sources, it does not ‘feel’ enormously different from a
standard mixing desk) although the touch-sensitive strips are reported to
respond less well with sweaty hands, which can be problematic in the
stressful (and often rather warm) performance context.212

212 Adrian Moore (2002) and Nikos Stavropoulos (2002). Personal communications with the author.

Moore et al observe that the Cybernéphone ‘is only suited to the diffusion of stereo works from CD or DAT and is very difficult to adapt
to other formats.’213 This seriously restricts the fulfilment of criterion
4.3.1, as works with different technological demands cannot be
accommodated. However, unlike any of the other systems to be
described, the Cybernéphone addresses the issue of diffusion ‘practice’
outside the performance context:

The entire program [software] is made available on CD-ROM, allowing the composer to prepare his diffusion and record it in sequence. The
diffusion can then be played as is, adapted to the conditions in the concert
hall and to the audience, or it may be viewed as a temporary configuration,
a guide sequence that can be interpreted in terms of dynamic timbres and
spaces, all in real-time, by a musician at the console. Second, the
composer can send his diffusion program on CD-ROM to be used in
concert in Bourges or elsewhere, or he can carry out the diffusion himself
via Internet.214

It is obvious that both top-down and bottom-up methodologies have been considered, and that in allowing performances to be prepared
elsewhere the important issue of rehearsal time and familiarity with the
instrument has been addressed.

In summary, the Cybernéphone caters very well for both top-down and
bottom-up approaches to sound diffusion by offering an extended range
of diffusion actions (in addition to the standard amplitude diffusion) that
can have useful applications from both perspectives. It also benefits
from a capability that has not been observed thus far: automation.
Although this facility would, at first, appear more beneficial from the
bottom-up perspective, it seems that this particular
implementation also has demonstrable benefits for the top-down
diffuser – as described previously – although the literature available
does not confirm or negate this deduction as assuredly as would be
ideal.

213 Moore, Moore and Mooney (2004). "M2 Diffusion - The Live Diffusion of Sound in Space". Proceedings of the International Computer Music Conference (ICMC) 2004.
214 Clozier (1998). "Presentation of the Gmebaphone Concept and the Cybernephone Instrument". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 267.

4.12. Sound Cupolas and The Kinephone

4.12.1. Description

It is interesting to compare the stipulation, as proposed above by Clozier and by many others, that an electroacoustic work is ‘concretised’ at the
performance stage, via a process that essentially continues the top-down
method of working, with the following, highly contrasting, assertion:

The spatial projection of a composition is realized first and foremost in the studio, where the composition progressively takes shape according to a
precise working method… If the serious listening public is placed in a
manner unfavourable to the perception of the intended spatial
composition, then the spatial effect will be lost, and the composer, who
has spent many weeks at the difficult task of creating clear and precise
spatial forms will have wasted his time and energy… The studio in which
composition takes place, and of necessity the concert hall, must be as
anechoic as possible if there is not to be a serious risk of confusion of
parameters.215

It is fundamental musical beliefs such as these that have informed the development of ‘sound cupolas’ – purpose-built standardised listening
spaces – from 1977 onwards, and sound diffusion systems (in some
respects) radically different from those described so far.

A sound cupola essentially consists of a hemispherical or hemi-ovoid frame – erected within a performance venue – to which a large number
of loudspeakers are attached. All of the loudspeakers are angled towards
the central, focal, point of the symmetrical frame ‘and then very exactly
adjusted so as to obtain equal intensity, phase and spectrum.’216 The
natural acoustic characteristics of the venue are typically suppressed by
covering the top of the frame structure with cloth, or some other
acoustically absorbent material:

The structures and bridges of the scaffolding conceal all the cables, and
the cloth covering insulates the hall and renders it nearly anechoic.217

215 Küpper (1998). "Analysis of the Spatial Parameter: Psychoacoustic Measurements in Sound Cupolas". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 289.
216 Ibid.: 296.
217 Ibid.: 296.

The number and positioning of loudspeakers in sound cupolas is
determined by research into the acoustics and psychoacoustics of
directional sound localisation. Küpper describes a series of experiments
undertaken in order to determine ‘the spatial interval,’ that is, a
notionally ‘universal’ subdivision of three-dimensional auditory space
into discrete intervals measured in degrees. Comparison with the
processes of ‘compartmentalisation’ described in section 2.2.2 should
identify such practice as a characteristically bottom-up approach.
Further to this research, Küpper proposes that the number of spatially
discriminable points on a notional ‘auditory sphere’ surrounding the
listener, is somewhere in the region of six-thousand; that is, the
supposed ‘resolution’ of the human hearing system, in terms of spatial
localisation, allows us to perceptually differentiate between
approximately six-thousand discrete sound source locations. In between
these points, differentiation is less reliable or impossible. Küpper also
notes that, ‘only 88 out of a maximum of 1400 determinable pitches
were chosen for the Western tempered scale (12 pitches per octave) for
a ratio of 1 out of 15.9.’ Using the same ratio, Küpper eventually states
that the spatial equivalent of a semitone:

…varies between 13.3 degrees and 15.8 degrees. We therefore propose 15 degrees as the tempered spatial interval, since 360 degrees may be divided
easily by this figure.218

Accordingly, loudspeakers in sound cupolas are, as far as possible, positioned a ‘spatial semitone’ (15 degrees) apart, ultimately resulting in
around 102 equally spaced loudspeakers for a hemispherical or similarly
shaped array. A sound cupola installed at Linz in 1984 consisted of 104
loudspeakers attached to a hemi-ovoid frame. This configuration was
used for the performance of fixed medium electroacoustic works, as
well as pieces featuring live sound encoded via microphones.
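
The figures quoted above can be loosely sanity-checked; the following fragment is a plausible reconstruction by the present author, not Küpper’s own published calculation:

    # Rough arithmetic behind the 'tempered spatial interval'.
    import math

    print(1400 / 88)   # ~15.9: the pitch 'selection ratio' Küpper cites
    print(360 / 15)    # 24: loudspeakers in one horizontal ring at 15 degrees

    # Packing a hemisphere with rings of loudspeakers 15 degrees apart,
    # a ring at elevation e holds roughly 24 * cos(e) loudspeakers:
    total = sum(round(24 * math.cos(math.radians(e)))
                for e in range(0, 90, 15)) + 1   # '+ 1' for the zenith
    print(total)       # 104: of the same order as the 102-104 cited above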

218 Ibid.: 294.

In line with the bottom-up ethos, Küpper implies a strong distaste for
overly improvised, whimsical, or ‘intuitive’ diffusion, preferring more
‘objective’ schemes, or even automated computer realisations:

Four interwoven spirals […] enable the performers (one, two or four in
number) to set up spatial play more easily. Based on these four very
symmetrical spirals, the performers play tape compositions as well as live
concerts. In the light of previous experience, these four spirals are a safety
factor in helping to avoid clumsy, improvised or chance spatial
movements. Spatial performers do not yet exist, and in the future it would
be good if composers themselves were able to construct the spatial
structures and movements on computer.219

219 Ibid.: 296.

When Küpper states that ‘spatial performers do not yet exist,’ what he
means is that there is no universally accepted objective ‘scale’ for the
spatialisation of sound within which a performer can exhibit virtuosity.
In this context, any diffusion that is based ‘merely’ on subjective
judgements – without the need for, say, a ‘spatial scale,’ or a ‘score’ – is
completely discredited on the grounds that there is no objective benchmark against which to judge the quality or ‘correctness’ of the
performance, an attitude to which many top-down diffusers would
surely (and quite justifiably) object. It is also interesting to note that
Küpper describes the four predetermined spatial trajectories as ‘a safety
factor in helping to avoid clumsy, improvised, or chance spatial
movements.’ This implies that the spiral trajectories could – in Küpper’s
opinion – be meaningfully applied to any musical material, regardless of
the perceived nature of that material, an assertion that is indicative of an
absolutely characteristic bottom-up attitude. Firstly, it suggests that
deterministic musical structures can be (and, indeed, should be)
imposed upon ‘fundamentally inert’ musical material (recalling
Harrison’s quotation, cited in section 3.13): it therefore ‘does not
matter’ what the perceived relationship between the abstract structure
and the musical material is; this is simply not a consideration. Secondly,
in accordance with this, the proposition is that the spatialisation (via
carefully planned diffusion) adds a previously non-existent dimension
of interest and structural rigour to the musical work, ‘where once there
was nothing’ (recalling observations made in section 2.5.1, further to
Savouret’s writings). Again, this position simply does not recognise the
belief that musical materials might have spatial characteristics
perceptually ‘embedded.’ Top-down practitioners would tend to
disagree strongly with this position, as anecdotally accounted by
Harrison:

I went to a conference in the States, and the guy on the desk ‘diffused’
[…] all the pieces. And what he actually did was, he had choreographed,
on the mixer, certain fader movements, and he did them irrespective of
what material was coming off the medium. So he just went through the
same routine: he just went from these faders, to these faders, from these
faders to these faders, to the next, the next, and then he started again. And
he did it at the same tempo, he did the same pattern, the same sequence.
He was just arbitrarily moving the sound, no matter how fast it was
coming, no matter whether it was lots of movement implied or no
movement implied, he was moving it round the room arbitrarily, in the
same sequence, piece after piece after piece. That is not diffusion: that’s
ridiculous; that’s just nonsense.220

220 Harrison (2004). "An Interview with Professor Jonty Harrison". Personal interview with the author, 8th September 2004. See Appendix 1 for full transcription.

Whether or not this is in fact ‘nonsense’ ultimately depends on one’s musical beliefs. In the context of top-down ideas the situation does,
indeed, appear to be completely nonsensical. To a bottom-up
practitioner, however, the same approach may seem perfectly
legitimate, particularly in comparison with a (perhaps badly) improvised
performance. In fairness, the example scenario is almost certainly not
representative of a high quality performance (neither top-down nor
bottom-up), and is probably fully deserving of criticism, but it has
nonetheless served to articulate an important point.

4.12.1.1. The Kinephone

Perhaps developed in response to the belief that ‘spatial performers do not yet exist,’ the Kinephone is a performance
interface, based on the piano keyboard, that allows a diffuser to
control sound diffusion via a large array of loudspeakers. The
specific instrument described by Küpper has fifty keys, each one
of which, when pressed, opens a particular loudspeaker channel.
It was used in conjunction with a cupola installed at the Palazzo
Sagredo, Venice, in 1986.221 Such an interface would seem
particularly appropriate for a diffusion system whose
loudspeakers are positioned according to the theory of the
‘spatial semitone.’ As Küpper also points out, it allows
diffusions to be scored using traditional western notation, thus
ensuring that the performance is not improvised. (Recall that
Piché, cited in section 3.12.2, also called for diffusion practice to
be ‘codified’).

221 Küpper (1998). "Analysis of the Spatial Parameter: Psychoacoustic Measurements in Sound Cupolas". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 296-297.
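
The interfacing principle is simple enough to sketch in a few lines of Python (an illustration only; the MIDI note range and the gating behaviour are assumptions on the author’s part):

    # Illustrative sketch: each of the fifty keys gates one loudspeaker
    # channel, opening it on key-down and closing it on key-up.
    NUM_KEYS = 50
    LOWEST_NOTE = 36                    # assumed MIDI note of the lowest key
    channel_open = [False] * NUM_KEYS   # one flag per loudspeaker channel

    def note_on(note, velocity):
        key = note - LOWEST_NOTE
        if 0 <= key < NUM_KEYS:
            channel_open[key] = True

    def note_off(note):
        key = note - LOWEST_NOTE
        if 0 <= key < NUM_KEYS:
            channel_open[key] = False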

4.12.2. Evaluation

It should already be evident from the previous section that sound cupolas and the Kinephone have been devised according to a fairly
puritanical bottom-up agenda. It is therefore reasonable to suppose that
the system will be intrinsically biased towards bottom-up diffusion
methods. Of course most systems can be appropriated in a broadly top-
down or bottom-up manner depending on the specific approach of the
diffuser in question, but it is nonetheless clear that this particular system
has been designed almost exclusively with respect to bottom-up criteria.

One fairly unusual aspect is that this system offers three-dimensional sound source diffusion, with height. Although loudspeakers positioned
above (and perhaps even below) the audience are not entirely
uncommon,222 it is generally true to say that the most frequently
employed loudspeaker arrays exhibit an enormous bias in favour of the
horizontal plane. As such, diffusion in sound cupolas engenders certain
creative possibilities that are rarely available elsewhere.

222 The system installed at the Sonic Arts Research Centre (SARC) in Belfast, for instance, has loudspeakers positioned above and below the audience. This system will very briefly be described in section 4.15.

Use of the Kinephone to control signal routings to large numbers of loudspeakers raises the very important issue of interface ergonomics. In
the earlier sound cupolas, control of signal levels sent to loudspeakers was – as in several of the systems described previously – via a fader-
based mixing desk control interface, with levels being controlled on a
one-fader-to-one-loudspeaker basis. With up to 104 loudspeakers, four
diffusers were required. The Kinephone undoubtedly represents a more
ergonomic design for this particular purpose, but only if the performer is
a reasonably accomplished pianist! In this respect, the Kinephone – as a
control interface – is not particularly accessible as per criterion 4.3.5
(particularly if an accurate ‘bottom-up-style’ diffusion is sought; this is,
after all, the forte of this particular system) and it could be argued that
much practice would be required to master it. Nonetheless it could also
be argued that the piano keyboard is, at least, a comparatively familiar
interfacing paradigm, although its suitability for the purpose of sound
diffusion would remain open to debate.

The evidently non-portable nature (criterion 4.3.7) of sound cupolas also raises issues, particularly in conjunction with the unusual nature of
the Kinephone interface. With more than one hundred loudspeakers it
seems extremely unlikely that a cupola could be installed in anything
other than a sizeable venue. In addition to this, there are quite
considerable financial implications and practical constraints regarding
the construction of such systems. These observations indicate that the
present diffusion system is also highly inaccessible as per criterion
4.3.6, and therefore only likely to be of any real use to those fortunate
enough to have regular and easy access to it.

Overall, sound cupolas and the Kinephone are effectively completely biased towards bottom-up diffusion methodologies, and are –
realistically – highly impractical and inaccessible. However, such
systems do give an excellent indication of the kinds of requirements
demanded by classically bottom-up practitioners, and this will surely aid
the design of future sound diffusion systems.

4.13. The ACAT Dome

4.13.1. Description

Comparable in some ways to Küpper’s sound cupolas, the ACAT Dome is a scaffold geodesic dome measuring fourteen metres in diameter.
Sixteen loudspeakers are symmetrically arranged on the frame, as well
as five cinema screens for visual image projection. Sound is distributed
among the sixteen loudspeakers using the Ambisonic system via a
software application running on a Macintosh computer.

Some highly relevant advantages of the Ambisonic system will be cited later, and on this basis an explanation is necessary. Ambisonics allows
any location in three-dimensional auditory space to be represented via
what is effectively an abstraction of a coherent audio source set,
comprising four discrete encoded audio streams.223 These signals –
collectively known as Ambisonic B-Format and individually referred to
as W, X, Y, and Z – are rather like the axes on a three-dimensional
Cartesian graph, to use an intuitive visual analogy. The amplitude of signal X in relation to the others, at any given time, denotes the position of the sound source on a ‘front-to-back’ axis, with the relative amplitudes of Y and Z respectively denoting the position on ‘left-to-right’ and ‘up-to-down’ axes. The amplitude of W does not vary with the position of the source; this omnidirectional component serves basically to improve the quality of the perceived effect upon reproduction. Monophonic signals can be positioned within an
Ambisonic ‘sound field’ by substituting polar co-ordinates (azimuth,
elevation) relating to the desired three-dimensional location into a set of
four mathematical equations (which, when resolved, result in each of
the four constituent B-Format channels), or else recorded with a
Soundfield microphone comprising four capsules, and an encoding
process that derives the B-Format signals from the microphone capsule
signals. Very importantly, it should be noted that unlike a coherent
audio source set, Ambisonic B-Format is not composed of encoded
audio streams that can be fed directly to loudspeakers. The B-Format
signals must first be decoded into loudspeaker feed signals – again,
mathematically – in order for the desired effect to be achieved. Because
B-Format is an abstract representation of a three-dimensional sound
field, it can be decoded for a variety of different loudspeaker arrays
(using formulae incorporating the relative positions of the loudspeakers)
ostensibly with the same spatio-auditory results. Obviously, the quality
of the final results depends on the number of loudspeakers, and it is
generally accepted that three-dimensional auditory results are not
convincingly attainable with fewer than eight loudspeakers arranged in a
cubic formation.

223 First-order Ambisonics comprises four signals. Higher-order equivalents involve more channels but the principles remain the same, and in any case, it is not necessary to discuss these in this context.
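
The encoding process described above can be expressed concisely; the following Python fragment implements the standard first-order B-Format equations (a textbook formulation, not the specific code used in the ACAT system):

    # Encode one monophonic sample at the given polar co-ordinates into
    # the four B-Format signals W, X, Y and Z.
    import math

    def encode_bformat(sample, azimuth_deg, elevation_deg):
        az = math.radians(azimuth_deg)   # anticlockwise from straight ahead
        el = math.radians(elevation_deg)
        w = sample * (1 / math.sqrt(2))           # constant-gain omni component
        x = sample * math.cos(az) * math.cos(el)  # front-to-back
        y = sample * math.sin(az) * math.cos(el)  # left-to-right
        z = sample * math.sin(el)                 # up-to-down
        return w, x, y, z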

Returning to the diffusion system in question, mouse movements are used to control real-time sound source positioning on the horizontal
plane, while a variable height foot pedal provides the vertical positional
information. Data is communicated to the software via the MIDI
protocol. The system, developed at the Australian Centre for the Arts
and Technology (ACAT), is described more fully by Vennonen.224

4.13.2. Evaluation

A distinctive characteristic of this system is that it uses
the Ambisonic system, which has an altogether different set of
operational affordances and applications in comparison with those
systems described previously. Abstracted control is a particularly useful
characteristic; this will be discussed more fully in Chapter 6. A detailed
discussion of Ambisonics is beyond the scope of the present thesis, but
the salient aspects of the system are summarised by its inventor,
Michael Gerzon,225 and also by the author in a previous thesis.226

224 Vennonen (1994). "A Practical System for Three-Dimensional Sound Projection". Website available at: http://online.anu.edu.au/ITA/ACAT/Ambisonic/3S.94.abstract.html.
225 The standard primary source for Ambisonics is Gerzon (1974). "Surround-Sound Psychoacoustics". Wireless World, 80(12). A useful bibliography of other relevant sources, compiled by Gerzon, is Gerzon (1995). "Papers on Ambisonics and Related Topics". Website available at: http://members.tripod.com/martin_leese/Ambisonic/biblio.txt.
226 Mooney (2001). "Ambipan and Ambidec: Towards a Suite of VST Plugins with Graphical User Interface for Positioning Sound Sources within an Ambisonic Sound-Field in Real Time". M.Sc. thesis (University of York).

The computer mouse is a control paradigm that is familiar to most
people nowadays, and in this case it is used in relation to spatial
positioning in a two-dimensional plane, which is not too far removed
from its normal application. It seems likely, therefore, that this
operational paradigm could be assimilated by performers with relative
ease (in comparison with, say, the piano-style interface of the
Kinephone, which is potentially far less approachable). The efficacy of
the foot-pedal is slightly more difficult to evaluate as this control
paradigm – although not by any means uncommon – has a wider variety
of potential applications.227 Certainly, the simultaneous operation of a
mouse and foot-pedal presents certain challenges with respect to motor
coordination, and on this basis it seems reasonable to suppose that the
control interface might take some time to get to grips with. Therefore,
the overall efficacy of the system will be determined to a certain extent
by its widespread availability for practice purposes (relating to criterion
4.3.6), and in this respect, the ACAT Dome is open to some of the same
criticisms directed towards Küpper’s system, described in the previous
section.

227 Nonetheless, in abstract terms the foot-pedal is – in the author’s opinion – a useful addition to the diffusion repertoire because it adds a further dimension of interactive control. It can therefore be argued that time spent assimilating the necessary skills is, at least potentially, worthwhile.

Additionally, it is not immediately apparent (in terms of interface ergonomics) how a single mouse and foot-pedal could be used to
simultaneously diffuse multiple sources, nor as a means to realising
pluriphonic diffusion. This system effectively allows one CASS at a
time to be positioned as a ‘point source’ and it can therefore be argued
that the system is perhaps more suited to ‘panning’ than sound diffusion
(see section 3.10). It might therefore be suggested that this system is
biased in favour of the bottom-up ethos whereby ‘space’ is performed
live as an additional and autonomous dimension to the electroacoustic
work (in this context the distinction between ‘panning’ and ‘diffusion’

226
Mooney (2001). "Ambipan and Ambidec: Towards a Suite of VST Plugins with
Graphical User Interface for Positioning Sound Sources within an Ambisonic Sound-Field
in Real Time". M.Sc. thesis (University of York).
227
Nonetheless, in abstract terms the foot-pedal is – in the author’s opinion – a useful
addition to the diffusion repertoire because it adds a further dimension of interactive
control. It can therefore be argued that time spent assimilating the necessary skills is, at
least potentially, worthwhile.

228
is less obvious, in any case). In this respect, the ACAT Dome could be
considered similar – in terms of its ethos – to Küpper’s system,
described in the previous section. Indeed, the Ambisonic system – with
its emphasis on the accurate and realistic reproduction of three-
dimensional auditory phenomena – could itself be considered biased in
favour of bottom-up ideologies.

4.14. The DM-8 and the Richmond AudioBox

4.14.1. Description

The DM-8 and its various subsequent developments represent diffusion systems built upon the concept of a mix matrix, which is described by
Rolfe as follows:

A matrix mixer differs from a conventional mixer in that the usual on/off
buttons in the assign-to-buss section are replaced by level controls. An 8
input, 8 output mixer has, for example, 64 (8 times 8) independently
adjustable levels commonly called crosspoints and usually conceived of as
a square, or matrix.228

The 8x8 matrix to which Rolfe refers is illustrated in Figure 29, below.

                      Signal Outputs

                  1   2   3   4   5   6   7   8
              1   C   C   C   C   C   C   C   C
              2   C   C   C   C   C   C   C   C
              3   C   C   C   C   C   C   C   C
   Signal     4   C   C   C   C   C   C   C   C
   Inputs     5   C   C   C   C   C   C   C   C
              6   C   C   C   C   C   C   C   C
              7   C   C   C   C   C   C   C   C
              8   C   C   C   C   C   C   C   C

KEY: C = Matrix Crosspoint Level Control

Figure 29. Illustration of an 8x8 signal routing matrix with a level control at each crosspoint.

228 Rolfe (1999). "A Practical Guide to Diffusion". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/pracdiff.htm.

Crosspoints can be referred to like grid references on a map: a
horizontal reference followed by a vertical reference. For example, if we
turn up the level on crosspoint 1-1, we are mixing some of the audio at
signal input 1 to signal output 1. If the controls at crosspoints 1-1, 1-2,
1-3 and 1-4 are all turned up, then the signals present at inputs 1, 2, 3
and 4 will be mixed together at signal output 1, with the balance of the
mix proportionate to the relative positions of each of the crosspoint
controls in question.
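
The crosspoint model can be expressed directly in code; the following Python sketch is the author’s illustration of the general principle, not the DM-8’s implementation:

    # An 8x8 mix matrix: each output sums every input scaled by the
    # corresponding crosspoint level.
    NUM_IN, NUM_OUT = 8, 8
    crosspoint = [[0.0] * NUM_OUT for _ in range(NUM_IN)]  # [input][output]

    def mix(inputs):
        """inputs: one sample per input channel; returns one per output."""
        return [sum(inputs[i] * crosspoint[i][o] for i in range(NUM_IN))
                for o in range(NUM_OUT)]

    # e.g. mixing inputs 1 to 4 equally to output 1, as described above:
    for i in range(4):
        crosspoint[i][0] = 0.25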

It should be immediately apparent that this model is potentially far more flexible than, say, the routing matrix implemented in BEAST’s DACS
3D mixing console, which only allows routings to be switched on or off.
Further, unlike most switch-based audio mixing hardware, the mix
matrix paradigm is not inherently biased towards stereophonic coherent
audio source sets insofar as it allows inputs to be mixed to outputs
arbitrarily. However, performing live sound diffusion using a bank of
sixty-four rotary potentiometers as illustrated in Figure 29 is,
realistically, impossible. This brings us back to the issue of interface
ergonomics. Accordingly – like the Cybernéphone – the DM-8 and
those systems developed from it represent diffusion systems based on
both software and hardware components. In other words, control of the
matrix crosspoints is not on a physical, one-by-one basis, but achieved
via an intermediate software interface that allows crosspoint level
controls to be indirectly manipulated in a way that the limits of human
dexterity would otherwise restrict.

In the DM-8 system, the mix matrix itself is realised in hardware, and is
controlled via a software interface written in Max/MSP. The interface is
described in some detail by Rolfe and is ‘speaker-centric,’229 that is,
predetermined diffusions are specified in terms of which loudspeakers
they utilise at which times. This means that the loudspeaker array used
for the performance of the predetermined diffusion must be identical to
the array used to create the diffusion. This problem was addressed in a
subsequent development of the system comprising a 16x16 hardware
matrix mixer (namely the Richmond AudioBox, marketed by Harmonic
Functions230) and a Max/MSP interface, written by Rolfe and others,
named ABControl.231 With this system – again, described in more detail
by Rolfe – crosspoint levels themselves are determined ‘on the fly’ via
an intermediate vector-based user interface:

A rotation [for example] mapped onto vector angles can then be played
back on a quad, 8-channel, or other configuration.232

Nonetheless, one assumes that the loudspeakers would still have to be arranged in a circular formation around the audience for the desired
effect to be achieved.
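
The kind of abstraction ABControl provides can be sketched as follows. This is a hypothetical illustration using a simple cosine panning law; Rolfe’s actual vector interface and panning behaviour are not documented at this level of detail:

    # An abstract source angle resolved into gains for an equally spaced
    # ring of loudspeakers, whatever its size - the same 'rotation' can
    # thus be played back on quad, eight channels, or any other ring.
    import math

    def ring_gains(angle_deg, num_speakers):
        gains = [0.0] * num_speakers
        pos = (angle_deg % 360.0) / (360.0 / num_speakers)
        lower = int(pos) % num_speakers
        frac = pos - int(pos)
        gains[lower] = math.cos(frac * math.pi / 2)
        gains[(lower + 1) % num_speakers] = math.sin(frac * math.pi / 2)
        return gains

    quad = ring_gains(90.0, 4)   # the same abstract rotation position...
    octo = ring_gains(90.0, 8)   # ...realised on two different rings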

4.14.2. Evaluation

This system benefits enormously by offering performers the ability to express diffusion actions abstractly. For example, a rotation around the
loudspeaker array can be expressed in terms of the parameters of the
rotation itself (rate of rotation, direction of rotation, et cetera), as
opposed to the sequence of mix matrix parameter adjustments required
to achieve the same effect. Of the systems described thus far, only the
ACAT Dome offers this kind of functionality, the other systems all
effectively operating directly on loudspeaker amplitude settings. This is
because the control interface (in this case realised entirely in software)
and mix engine (in this case the hardware AudioBox matrix mixer) are
not directly linked, but rather separated by an intermediate abstraction
layer in which control data can be algorithmically transformed into the
data required by the mix engine.233

229 Ibid.
230 Harmonic Functions (2001). "The AudioBox Disk Playback Matrix Mixer". Website available at: http://www.hfi.com/dm16.htm.
231 Rolfe (1999). "A Practical Guide to Diffusion". EContact! 2(4). Electronic journal available at: http://cec.concordia.ca/econtact/Diffusion/pracdiff.htm.
232 Ibid.
233 Although several of the systems described previously (namely, the Acousmonium, BEAST’s experimental system, Cybernéphone) do, in fact, operate on this basis, these systems opt for control paradigms that closely emulate direct parameter control.

A fundamental characteristic of the DM-8, and of those systems developed
from it, is that these are completely offline, automated, systems. In other
words, the diffusion must be planned and programmed in advance for
strictly automatic playback during performance. In terms of their ethos,
this essentially limits the application of these systems to use by bottom-
up practitioners. Indeed, the DM-8 might not even be regarded as
a diffusion system at all by top-down practitioners, as it does not allow
for the real-time shaping of musical material in a way that is
perceptually effective for the audience – a criterion upon which sound
diffusion, from a top-down perspective, is fundamentally based. In order
to be appropriate for top-down use, the diffusion would have to be
realised in the performance venue itself, and this will not normally be
possible due to insufficient time being available. In any case, the
acoustic characteristics of the venue are likely to be considerably
different when the audience is present, and this can – of course –
completely change the diffusion requirements, effectively rendering a
pre-planned performance useless. This system is therefore not
particularly useful for the performance of top-down works, and
consequently falls short of fully satisfying criterion 4.3.2.

Because the DM-8 is an offline system, considerations with respect to its interface ergonomics will have a rather different emphasis. In this
case the interface is realised entirely in software and based around
standard keyboard and mouse input devices. As observed in section
4.13.2, these are not tools best suited to real-time sound diffusion; they are, however, extremely well supported in the context of offline human-
computer interaction. Accordingly, systems based around the AudioBox
have been reasonably widely employed as compositional tools, the
offline editing paradigm being particularly well-suited for multichannel
panning, which can be difficult using more mainstream audio
sequencers. Such a system is currently in use, for example, in the
electroacoustic music studios at Leeds University.234 The degree of
cross-over between ‘compositional’ tools and ‘performance’ tools
apparent in electroacoustic music has already been noted and is
something that should be considered in the design of new sound
diffusion systems.

234 Ewan Stefani, University of Leeds (2004). Personal communication with the author.

4.15. Other Systems

The Sonic Arts Research Centre (SARC) at Queen’s University, Belfast, has
a forty-eight channel sound diffusion system permanently installed in its
sonic laboratory.235 Loudspeakers are deployed at various heights in four
‘tiers,’ one of which is underneath the acoustically transparent grid floor of
the listening auditorium. Diffusion is controlled with a Digidesign Control-
24 mixing surface, meaning that the SARC system effectively operates in
the same way as a generic diffusion system.

The elBo, devised by Colby Lieder at Princeton University, is described as follows:

The elBo is an instrument for live spatialization and diffusion of sound in space. The instrument translates gestural actions into spatial location, point
source velocity, pan width (spread), effects parameters, and other diffusion
data. The guts fit in a single rack unit chassis.236

What Lieder describes is not a full ‘diffusion system’ so much as a control interface that could conceivably be incorporated into one. Very little other
published information is available.

Along the same lines is a device known as the aXi0, which was ‘designed
and built by Brad Cariou at the University of Calgary and conceived to
provide digital artists with the intimate control and flexibility needed to
express themselves clearly in various new media.’237 It consists of a
velocity-sensitive MIDI keyboard, a joystick, and an array of buttons;
collectively, these facilitate multidimensional control over parameters. The
precise nature of the control is determined by the user in a Max/MSP
interface, so the device is not limited solely to sound diffusion applications.
235
SARC (2005). "Sonic Lab". Website available at:
http://www.sarc.qub.ac.uk/main.php?page=building&bID=1.
236
Lieder (2001). "Colby Lieder - Instruments". Website available at:
http://silvertone.princeton.edu/~colby/Instruments.html.
237
Eagle (2001). "The aXi0". Website available at: http://www.ucalgary.ca/~eagle/.

Again, this is a control interface as opposed to a complete sound diffusion
system, but represents interesting research into control paradigms that
surpass the simple mixing desk fader. However, ‘novel’ devices such as the
elBo and aXi0 – with particular reference to criterion 4.3.5 – present
practitioners with the difficulty of having to appropriate new performance
skills. As such, these are more representative of experimental approaches.

There are, of course, many other systems currently in development or in use
for the live performance of electroacoustic music. A detailed account of all
of them is far beyond the scope of the present thesis, and in any case, there
are surely many systems that remain either under-documented or completely
undocumented, further complicating the task of achieving anything even
close to a comprehensive review.

4.16. Summary

Observations specific to each of the systems evaluated have been given
previously. Clearly, the sheer number of different approaches to the task of
sound diffusion – as evidenced in the systems described – makes their
evaluation on a general level somewhat problematic. The broadest
conclusion that can be drawn is that, at present, no single sound diffusion
system that is in regular use for the performance of electroacoustic works is
able to adequately satisfy all of the given criteria, and this circumstance can
be attributed to a great many factors. The relative successes and failings of
each system are variable, but nonetheless, certain general observations can
be made.

In technical terms, problems arising from the inflexibilities inherent in
mixing hardware – particularly that which has not been designed
specifically for the purpose of sound diffusion – are strikingly apparent. If
the mix architecture itself limits the ways in which audio input streams can
be diffused to audio outputs, this is likely to have a negative impact on both
the technological range of works that can be accommodated, and on the
range of ways in which diffusion can be realised. This is particularly true if
the control interface and mixing architecture are intrinsically linked, a

234
circumstance that effectively compounds the already considerable
challenges presented in each of these areas. Such limitations are particularly
in evidence in the generic setups, and indeed in all of the systems employing
hardware mixing desks (BEAST, Creatophone, Acousmonium). For largely
the same reasons, bias in favour of stereo, to the effective exclusion of
larger CASSs, is also a problem.

Many of the technical problems can potentially be overcome by the use of
software mix engines, as implemented in some of the systems described
(BEAST experimental, DM-8, ACAT Dome, Cybernéphone). However, the
Cybernéphone and BEAST’s experimental system adhere fairly closely to
traditional hardware paradigms and, while this is advantageous from the
perspective of familiarity, it does not exploit the benefits of software
abstraction to their full potential. The DM-8 and ACAT Dome, on the other
hand, deviate from familiar paradigms to such an extent that they seriously
limit the scope of their applicability. This is particularly unfortunate in the
case of the DM-8, because the mix-matrix architecture it employs
potentially represents an effective solution to many of the problems
discussed. This demonstrates the importance of careful user interface
design: a flexible internal architecture is effectively useless without an
appropriate control interface. It would be useful to find a ‘middle ground’
between the relatively conservative emulation of familiar hardware
architectures and those paradigms that deviate radically from these.

The subject of interface ergonomics presents interesting challenges,
particularly if interfaces that are ‘easy to learn’ are to be sought. Most of the
systems reviewed implement faders or fader-like controls as their primary
control interface. In brief, these are familiar (this is an advantage), but
present fairly serious problems in terms of ergonomics. Those systems that
deviate from this paradigm (Kinephone, ACAT Dome, DM-8), again, limit
their applicability fairly seriously. Furthermore, any interface that strays too
far from the familiar runs the risk of being inaccessible in the limited
amount of rehearsal time available. Again, an approach that mediates
between these poles (familiar versus unfamiliar) would be beneficial.

Clearly, none of the systems are as portable as might be desirable. Several
of the systems are, realistically, permanent installations, and all have
considerable transportation and setup overheads. Again, software systems
offer a potentially elegant solution to this problem, but problems arise where
systems are operationally dependent on large loudspeaker arrays. A system
that would allow similar results to be obtained on a wide variety of arrays
for practice purposes would be beneficial from this perspective. A system
combining this feature with the availability of diffusion software for
independent practice and/or preparation of performances – as demonstrated
by the Cybernéphone – would be ideal.

In terms of technological prerequisites, the most readily available systems
are the generic setups, and BEAST (whose system has, in line with this
observation, been adopted elsewhere). The DM-8 is also fairly easily
available, and has likewise been adopted elsewhere. Other systems (the
Cybernéphone, for example) rely heavily on custom and/or bespoke
components or present considerable logistical implications (ACAT Dome,
Sound Cupolas). Unfortunately, the systems that are the most accessible in
terms of the technologies required to build them are not the most flexible.

On a grand scale, it can be observed that, as creative frameworks, the
systems described each tend to exhibit a reasonably strong bias in terms of
their aesthetic directionality. In other words, they are clearly and easily
divisible into those that are bottom-up (Cupolas/Kinephone, ACAT Dome,
DM-8) and those that are top-down (BEAST, Acousmonium, Creatophone).
This is often a function of the nature of the control interface in particular:
interfaces that are amenable to objectively precise control tend to be suited
to bottom-up applications, for example. The systems in most common use –
the generic setups – are slightly biased in favour of top-down methods (they
are best suited to pluriphonic stereo performances and are not particularly
conducive to ‘accurate’ control of the diffusion in objective terms) but, in
actual fact, are not particularly flexible from either perspective. The only
system that makes clear and obvious efforts to cater for both top-down and
bottom-up approaches to diffusion is the Cybernéphone, and much can be
learned from this system in this respect.

5. The M2 Sound Diffusion System

5.1. Introduction

The M2 Sound Diffusion System was conceived, designed, developed, and
built jointly by Dr. David Moore and the author at the University of
Sheffield Sound Studios between 2002 and 2004, and gave its first public
performance of electroacoustic works as part of the Electroacoustic Wales
concert series at the University of Wales, Bangor, in spring 2004.

While it would be slightly misleading to suggest that the M2 was conceived
strictly ‘in response’ to the research documented in the present thesis (for
one thing this is not exactly the case from a purely chronological
perspective), it is certainly true to say that the development of the system
has played an extremely important role in the research process itself. The
prototype design presents many significant improvements in comparison
with other diffusion systems and also raises further important issues to be
considered in the design of new systems; both of these are highly positive
outcomes.

In building a new system one must obviously pose questions relating to the
proposed capabilities of that system, and give careful thought to the
considerations to be made in its design. It was through posing such
questions in the early stages of designing the M2 that the criteria for the
evaluation of sound diffusion systems (presented in the previous chapter)
began to take shape. The design of a system that would satisfy such criteria
to a greater extent than existing diffusion technology was, therefore, an
important objective in the development of the M2.

Accordingly, there are several areas in which the M2 clearly exceeds the
capabilities of many of the systems reviewed previously. The compact and
portable nature of the system, for instance, and the relative speed and ease
with which it can be set up, cannot realistically be rivalled by any currently
available sound diffusion technology. Careful consideration has also been
given to the demands of live sound diffusion from the perspective of signal
routing and mixing, an area identified as a deficiency in many other
systems. The M2 allows the routing and mixing architecture to be
determined by the diffuser, thus offering a degree of flexibility unavailable
elsewhere. Additionally, the specific actions made available by the custom-
designed control interface (or, in very simplistic terms, ‘what each fader
does’) can also be user-defined, a further improvement on the (for the most
part) fixed functionalities offered by other systems. Conceived from the
outset as a modular system, the M2 is also extremely flexible in terms of the
technologies required to build it and (with the exception of the software
components, to be described presently) does not depend upon specific
technologies. This has a number of advantages that will be described in due
course.

Such improvements have, of course, been the result of careful planning
throughout the design of the system. The development of the M2 has also,
however, raised further issues that could not have been predicted by any
preliminary research, issues that will clearly be of significance in the design
of future systems. In this respect, the relative limitations of the M2, as
identified since the completion of the prototype in early 2004, are just as
important as its strengths in formulating a useful way forward for the design
of new sound diffusion systems. The purpose of the present chapter is,
therefore, to present a description and evaluation of the M2 Sound Diffusion
System, giving an objective overview of its various strengths and
weaknesses. Evaluation will be – as with all of the systems described in the
previous chapter – according to the set of criteria outlined in section 4.3.

5.2. System Description

The M2 comprises software and hardware components, the former taking
the shape of Moore’s SuperDiffuse applications.238 The control interface
was designed and assembled by the author, with certain bespoke
components being manufactured by the Department of Electrical and
Electronic Engineering at the University of Sheffield to the author’s
specification.

238. Moore (2004). "Real-Time Sound Spatialization: Software Design and Implementation". Ph.D. thesis (University of Sheffield).

The control interface consists of a bank of thirty-two high quality ALPS
100-millimetre throw faders arranged in two horizontally-oriented rows of
sixteen and housed in a custom designed aluminium box. This is shown in
Figure 30. In terms of its functionality, the control interface is completely
abstracted from the mix engine, which is realised in software, and
communication with the latter is via MIDI controller messages. As such, the
control interface does not directly process any audio. Conversion from
simple control voltages into MIDI controller messages is achieved via an
Ircam AtoMIC Pro interface that has been pre-programmed appropriately.
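
As a minimal sketch of this control path, the following C++ fragment decodes incoming MIDI controller messages into fader positions. The mapping of faders to controller numbers (CC 0 to 31 here) is an assumption for illustration only; the actual AtoMIC Pro programming used by the M2 is not reproduced here.

    // Hypothetical decoding of the thirty-two MIDI controller streams into
    // fader positions. CC numbers 0-31 are an assumed mapping, not the M2's own.
    #include <array>
    #include <cstdint>

    struct FaderBank {
        std::array<std::uint8_t, 32> position{};  // 7-bit values, 0-127

        // Handle one three-byte MIDI message; returns true if a fader moved.
        bool handleMidi(std::uint8_t status, std::uint8_t data1, std::uint8_t data2) {
            const bool isControlChange = (status & 0xF0) == 0xB0;
            if (isControlChange && data1 < position.size()) {
                position[data1] = data2;  // controller number indexes the fader
                return true;
            }
            return false;
        }
    };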

Figure 30. The M2 Sound Diffusion System control surface.
Dimensions: 373w x 370d x 55h mm approx.

Audio input and output streams are respectively communicated to and from
the system via a MOTU 24 I/O audio interface239 and this currently
determines the number of analogue audio inputs and outputs made available
by the system: twenty-four in each case. Again, this device is completely
abstracted from the control interface and mix engine and therefore dually
serves only as a point-of-entry and point-of-exit for transitorily encoded
audio streams. In terms of audio sources, the system includes a Denon DN-
C635 professional rack-mounting compact disc player that is permanently
connected to two of the audio interface analogue inputs. External audio
sources can be connected as required to the remaining twenty-two analogue
inputs.

239. MOTU (2005). "MOTU 24I/O Overview". Website available at: http://www.motu.com/products/pciaudio/24IO/body.html/en.

The software component of the system is represented by two distinct
programs: the SuperDiffuse Client application, and the SuperDiffuse Server
application. The client application effectively allows the performer to
specify what each of the thirty-two faders ‘does’ in terms of diffusion: a
number of highly flexible possibilities are available, and these will be
described later. Essentially, this information (control data) is communicated
in real time to the server application, which performs the function of mixing
audio input streams (obtained from the analogue inputs of the audio
interface) into audio output streams. The server does this in real time
according to data from the control interface received via the client
application. Audio output streams are then communicated by the server to
the audio interface physical outputs, each of which can be connected
directly to an active loudspeaker.

In its present incarnation both client and server applications run on a single
computer under the Windows XP operating system. This PC acts, therefore,
as a central ‘communications hub,’ insofar as it facilitates communication
between all of the discrete components of the system. It is also possible to
run the client and server applications on two separate PCs.

Figure 31. The M2 Sound
Diffusion System flight case,
containing integrated core
system components. Dimensions:
530w x 730d x 540h mm approx.

The core components of the system – this excludes loudspeakers, signal
cables, a 15-inch TFT monitor for use with the PC, and any external audio
sources required – are flight-cased (see Figure 31) inside a single compact
and robust 8U 19-inch rackmount casing. When the detachable front panel
of the flight case is removed, the frontal rackmount system – which houses
the CD player, audio interface, and 4U rack-mounting PC – can be accessed
(see Figure 32). In transit, the detachable rear panel of the flight case houses
a wireless keyboard and mouse for use with the rack-mount PC. The rear
rackmount system (Figure 33) houses a custom-designed analogue audio I/O
breakout panel comprising eight analogue inputs (provided on balanced
quarter-inch jack sockets) and twenty-four analogue outputs (balanced XLR
connections); these are internally connected to the corresponding inputs and
outputs on the audio interface. Access to the back of the PC is also via the
rear rackmount system but should not normally be necessary. The rear
rackmount system also has a compartment in which the control interface is
stored when not in use. A small MIDI interface, wireless keyboard and
mouse receiver, and the Ircam AtoMIC are housed internally. Access to
these should not be necessary as the design allows for them to remain
permanently inter-connected in the appropriate way; however, easy access is
available via the front rack-mount system if required. All of the core
components obtain mains power via a single standard mains cable, which, in
transit, is coiled up inside the flight case and can easily be pulled out via the
rear rack-mount system. Cables for connecting the control surface to the
Ircam AtoMIC and the monitor to the PC are also provided in this way.

Figure 32. The front rack-mount system, housing (from top-to-bottom/left-to-right):
Denon DN-C635 CD player, Midiman USB Midisport 2x2 interface,
Ircam AtoMIC Pro v2.0 interface, wireless keyboard and mouse receiver,
MOTU 24 I/O audio interface, 4U rack-mounting PC running the Windows XP
operating system and the SuperDiffuse client and server applications.

Figure 33. The rear rack-mount
system, housing (from top-to-bottom):
custom-built analogue
audio input/output break-out
panel with eight analogue inputs
and twenty-four analogue
outputs, M2 control surface
inside transit compartment, back
of rack-mounting PC. Cables
shown are (left-to-right) power
and video signal for computer
monitor, two cables for
connection to the control
surface, mains power for all of
the equipment in the flight case.
Wireless keyboard and mouse
are also pictured as stored in the
detachable rear panel.

5.3. Physical System Architecture

A schematic representation of the M2 Diffusion System, showing the
relationship between the core devices, is given in Figure 34, below.
Referring also to Figure 20 (page 168) it can be seen that each of the four
main diffusion system components (audio source, control interface, mix
engine, loudspeaker array) is completely abstracted from the others via
various device interfaces. This effectively allows each device to function
independently of the others and allows for devices to be interchanged if
necessary, without affecting the overall functionality of the system. In other
words, the system is modular: we will return to this concept in section 6.2.7.
The function of physical audio input layer and physical audio output layer
(not labelled in Figure 34) is dually performed here by the audio interface.

It will be noted that the two software components of the system are
completely separate entities, communicating via the standard TCP/IP
network protocol. This allows for two possible computer configurations, labelled (a)
and (b) in Figure 34. In scenario (a) the client and server run on two
separate machines. In this case, both machines would obviously require an
Ethernet interface to enable communication between them: these are
standard on the majority of modern PCs, and in any case can be obtained
very cheaply. The client machine would additionally require a USB (or
similar, PS/2 for instance) interface for keyboard and mouse, a VGA socket
for connection with a monitor (VDU) – both of these are also standard on all
modern PCs at the time of writing – and some kind of MIDI interface for
communication with the control surface via the Ircam AtoMIC (the current
prototype has a Midiman USB Midisport 2x2 MIDI interface240). For the
server machine, the only other hardware prerequisite – in addition to the
Ethernet connection – is the audio interface itself. The current system, as
mentioned previously, uses a MOTU 24 I/O, but any audio interface can be
used as long as ASIO compliant drivers are available for it; this has become
a widely accepted standard in recent years. No keyboard, mouse, or monitor
is required for the server machine as all user interfacing is via the client.

In scenario (b) both the server and client run on a single PC; this
configuration is used in the current prototype system. This does not affect
the communication between client and server applications, which still
happens internally via TCP/IP. When a single PC is used, it must
obviously provide all of the interface devices described. In terms of system
functionality, there is absolutely no difference between scenarios (a) and
(b).
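
To make the division of labour concrete, the hypothetical fragment below illustrates the kind of compact parameter-update message a client of this sort might stream to the server over TCP/IP in either configuration. The actual SuperDiffuse wire format is specified in Moore's thesis and is not reproduced here, so all names and field layouts are assumptions.

    // Hypothetical framing of client-to-server parameter updates; not the
    // actual SuperDiffuse protocol.
    #include <cstdint>
    #include <vector>

    struct ParamUpdate {
        std::uint16_t parameterId;  // index into inputs/outputs/crosspoints
        std::uint8_t value;         // 0-127, mirroring the MIDI value range
    };

    // Serialise updates into a byte buffer ready to be written to a TCP socket.
    std::vector<std::uint8_t> serialise(const std::vector<ParamUpdate>& updates) {
        std::vector<std::uint8_t> bytes;
        for (const ParamUpdate& u : updates) {
            bytes.push_back(static_cast<std::uint8_t>(u.parameterId >> 8));    // high byte
            bytes.push_back(static_cast<std::uint8_t>(u.parameterId & 0xFF));  // low byte
            bytes.push_back(u.value);
        }
        return bytes;
    }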

240. Midiman no longer exists: the company is now known as M-Audio.

Figure 34. Diagram illustrating the general-level architecture of the M2
Sound Diffusion System. [Diagram content: audio sources (CD player and
other audio sources) and the control interface (control surface, keyboard,
and mouse) form the system input; the control surface sends thirty-two
control voltages to the Ircam AtoMIC, which sends thirty-two MIDI
controller data streams, via MIDI and USB, to the SuperDiffuse client
application; the client communicates mix matrix user settings to the
SuperDiffuse server application over Ethernet TCP/IP; the server drives
the audio interface, whose outputs feed the loudspeaker array, with user
feedback provided via the VDU (VGA). Either (a) the server and client
run on two separate PCs, or (b) both run on a single PC.]

5.4. The SuperDiffuse Software

An explanation of the capabilities of the M2 system, in terms of actual
sound diffusion, is best approached via an explanation of the capabilities of
the software. A comparatively brief explanation of the client and server
applications will be given in the following sections. SuperDiffuse was
designed and coded by Moore, who gives a detailed and comprehensive
account of its design and functionality.241

5.4.1. SuperDiffuse Server Application

In terms of the system as a whole, the SuperDiffuse server application
is, effectively, the mix engine. This takes the shape of a software model
of an audio mixing matrix, as illustrated in Figure 35 (matrix mixers
were described, conceptually, in section 4.14). The size of the matrix is
determined solely by the number of hardware inputs and outputs
available on the sound card and is neither defined nor restricted by the
application itself. Each of the matrix crosspoints, inputs, and outputs
has an individual signal attenuator. At present the SuperDiffuse server
does not support audio ‘gain’ as such, owing to the risk of digital signal
clipping. Audio source signals are routed directly to sound card inputs,
and the diffusion data sent from the client application is used by the
server to determine the position of each attenuator in real time. In this
way, the client and server collaboratively allow any sound card input
signal to be dynamically mixed to any sound card output, in any
magnitude. In terms of terminology, each mix matrix attenuator (inputs,
outputs, and cross-points) is considered to be a parameter within the
SuperDiffuse software; there are other parameters that will be discussed
in due course.
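
The core mixing operation such a matrix performs can be sketched as follows: each output is the sum of every input scaled by its input attenuator and the relevant crosspoint attenuator, the sum then being scaled by the output attenuator. The fragment below is an illustrative sketch only, with attenuator values normalised to the range 0.0 to 1.0; the actual SuperDiffuse implementation is detailed in Moore's thesis.

    // Illustrative per-sample matrix mix:
    // out[j] = outAtt[j] * sum_i(in[i] * inAtt[i] * crossAtt[i][j]).
    // Not the actual SuperDiffuse code.
    #include <vector>

    struct MixMatrix {
        int numIns = 0, numOuts = 0;
        std::vector<float> inAtt, outAtt;  // per-input and per-output attenuators
        std::vector<float> crossAtt;       // numIns * numOuts crosspoint attenuators

        void process(const std::vector<float>& in, std::vector<float>& out) const {
            for (int j = 0; j < numOuts; ++j) {
                float sum = 0.0f;
                for (int i = 0; i < numIns; ++i)
                    sum += in[i] * inAtt[i] * crossAtt[i * numOuts + j];
                out[j] = sum * outAtt[j];  // attenuation only: no gain above unity
            }
        }
    };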

241. Moore (2004). "Real-Time Sound Spatialization: Software Design and Implementation". Ph.D. thesis (University of Sheffield).

Figure 35. Schematic of the audio mixing matrix emulated by the
SuperDiffuse server application. [Diagram content: an n-by-n grid in which
each sound card input (rows 1 to n) meets each sound card output (columns
1 to n) at a signal attenuator (A); n is the number of input or output
channels available on the sound card.]

5.4.2. SuperDiffuse Client Application

The SuperDiffuse client application mediates between the control
surface and the SuperDiffuse server application and, effectively,
represents an abstraction layer between the control interface and mix
engine. The ‘Performance Window’ of the client application (see Figure
36) consists of thirty-two virtual faders, which are directly analogous to
the thirty-two physical faders of the control surface. Any movements in
the control surface faders will be immediately and exactly reflected on-
screen in the Performance Window. Accordingly, the term ‘physical
fader’ can normally be taken to mean both the physical fader itself and its
on-screen counterpart.

Figure 36. The Performance
Window of the SuperDiffuse
client application.

5.4.2.1. Assigning Mix Matrix Parameters Directly to Physical Faders

In the simplest instance, parameters may be assigned directly to
physical faders. This is achieved by clicking one of the ‘Assign’
buttons in the Performance Window, and then choosing which
parameter to assign to the physical fader with the subsequent
dialog box (Figure 37). Thus, any one of the physical faders can
be used as a direct control for any one of the mix matrix inputs,
outputs, or crosspoints. This method alone can be used to
emulate the operation of a generic diffusion system. As Figure
36 illustrates, each physical fader has two ‘Assign’ buttons.
Accordingly, each physical fader can in fact have two parameters
simultaneously assigned to it. This allows, for example, a single
fader to control the output levels of a pair of loudspeakers
without the need for an intermediate group fader (these will be
described presently). The two independent assignable paths of
each physical fader can each be associated with any parameter
within the SuperDiffuse software, so their flexibility extends
beyond simple control of loudspeaker pairs.

Figure 37. Dialog box with which
physical faders are assigned to
parameters within the
SuperDiffuse client application.

5.4.2.2. Group Faders

In addition to the thirty-two physical faders, the SuperDiffuse
client maintains a bank of 256 virtual ‘group’ faders. Using the
Group Window (the screen-shot shown in Figure 38 shows two
audio inputs and six audio outputs, reflecting the profile of the
audio interface in use at the time) each of these virtual faders can
be configured to define the respective values of a collection of
parameters. To this end, the Group Window displays a diagram
of the mix matrix in which each element can be associated with a
value ranging from -127 to 127 inclusive.

Figure 38. The Group Window of the SuperDiffuse client application.

Once defined in this way, the value of each virtual group fader
becomes a parameter within the software, and can therefore be
associated with a physical fader assign path. When this is done,
the parameter values contained within that group fader will be
reflected when the corresponding physical fader is at the
‘maximum’ position, with intermediate physical fader positions
resulting in grouped parameter values being scaled linearly. Note
that, within groups, parameters can also be given negative
values.
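
A minimal sketch of this scaling behaviour, assuming the linear law just described (stored group values from -127 to 127, scaled by the physical fader's position treated as a proportion of its travel); the names and types are illustrative rather than SuperDiffuse's own:

    // Illustrative group-fader scaling; not SuperDiffuse's internal code.
    #include <utility>
    #include <vector>

    struct GroupFader {
        // (parameterId, storedValue) pairs; storedValue may be negative.
        std::vector<std::pair<int, int>> members;

        // Contribution of each member parameter for a fader position of 0-127.
        std::vector<std::pair<int, float>> contributions(int faderPos) const {
            const float scale = faderPos / 127.0f;  // linear scaling of the group
            std::vector<std::pair<int, float>> result;
            for (const auto& member : members)
                result.emplace_back(member.first, member.second * scale);
            return result;
        }
    };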

5.4.2.3. Effect Faders and Effect Types

The SuperDiffuse client also maintains a bank of 256 virtual
effect faders, which can be configured via the Effects Window.
Using this interface, an effect fader is associated with an effect
type. At present there are three different effect types – Wave
(pictured in Figure 39), Chase (Figure 40), and Randomise
(Figure 41) – each with its own set of user-configurable
characteristics. Each effect type has its own unique behaviour,
directly affecting two or more parameters.

The Wave Effect takes one or two parameters, and sinusoidally
modulates their values at user-defined frequencies, phases, and
amplitudes. As with all assignments within the client application,
the parameters assigned to the Wave Effect can be mix-matrix
parameters, or indeed any of the 256 virtual group or other
effects faders.
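
As a sketch of the modulation involved, the value of one such parameter at time t might be computed as below, assuming oscillation about a centre value and clamping to the 0 to 127 parameter range; the exact modulation law used by SuperDiffuse is not reproduced here.

    // Illustrative sinusoidal parameter modulation for a Wave-style effect.
    #include <cmath>

    // Returns the modulated parameter value at time t (seconds), clamped 0-127.
    float waveValue(float centre, float amplitude, float freqHz,
                    float phaseRad, float t) {
        const float twoPi = 6.28318530718f;
        const float v = centre + amplitude * std::sin(twoPi * freqHz * t + phaseRad);
        return v < 0.0f ? 0.0f : (v > 127.0f ? 127.0f : v);
    }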

Figure 39. The Effects Window
of the SuperDiffuse client
application, showing a Wave
Effect assigned to virtual Effect
Fader 1.

The Chase Effect is a basic twenty-four-step sequencer and, as
such, can be given up to twenty-four parameters. Each parameter
is in turn associated with a value between 0 and 127 (inclusive),
and the Chase Effect steps through the sequence of parameter-
value settings at a user-defined rate, with cross-fades
automatically implemented between steps. This effect can be
used to sequence mix-matrix parameters and, more interestingly,
group and effect faders, or any combination thereof.
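
A sketch of the stepping behaviour, assuming a simple linear cross-fade between the outgoing and incoming steps (the fade law actually used by SuperDiffuse is not documented here):

    // Illustrative Chase-style stepping with a linear cross-fade between the
    // current and next step; names and the fade law are assumptions.
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct ChaseStep { int parameterId; int value; };  // value: 0-127

    // For a fade position of 0.0 (fully on step i) to 1.0 (fully on the next
    // step), returns the scaled values of the outgoing and incoming steps.
    std::pair<float, float> chaseCrossfade(const std::vector<ChaseStep>& steps,
                                           std::size_t i, float fade) {
        const ChaseStep& current = steps[i];
        const ChaseStep& next = steps[(i + 1) % steps.size()];  // wrap at the end
        return { current.value * (1.0f - fade), next.value * fade };
    }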

Figure 40. The Effects Window
of the SuperDiffuse client
application, showing a Chase
Effect assigned to virtual Effect
Fader 2.

The Randomise Effect takes one or two parameters and
randomises their values independently – additively,
subtractively, or both – within user-specified ranges and at user-
defined intervals. Again, parameters can be obtained directly
from the mix-matrix, or they can take the form of group or effect
faders.
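
A sketch of one plausible reading of this behaviour, randomly offsetting a parameter within a user-specified range at each interval (the exact scheme used by SuperDiffuse is not documented here):

    // Illustrative Randomise-style behaviour: at each interval, offset the
    // parameter by a random amount within [-range, +range] (additive and
    // subtractive), clamped to the 0-127 parameter range. Names are assumed.
    #include <algorithm>
    #include <random>

    float randomise(float current, float range, std::mt19937& rng) {
        std::uniform_real_distribution<float> offset(-range, range);
        return std::clamp(current + offset(rng), 0.0f, 127.0f);
    }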

Figure 41. The Effects Window
of the SuperDiffuse client
application, showing a
Randomise Effect assigned to
virtual Effect Fader 3.

Once an Effect Fader has been associated with an Effect Type,
the value of the Effect Fader becomes a parameter, which can
then be assigned to a Physical Fader, or to any of the other
assignable paths within the system.

5.4.2.4. The Monitor Window

Levels for each parameter of the mix matrix can be observed at
any time by means of the Monitor Window. Note that the
Monitor Window does not provide signal levels for sound card
inputs and outputs: this is a physical impossibility because the
client application does not have any direct means of accessing
audio processing, this being carried out by the server.
Additionally, mix-matrix parameters can be muted or ‘locked’
from the Monitor Window. The value of a muted parameter will
always be zero, while the value of a ‘locked’ parameter will
always be maximum (127), regardless of any assignments made
to that parameter. Some reasons why this might be necessary will
be described later, in section 5.5.3. Suffice it to say, at this point,
that the ability to lock parameters of the mix matrix allows for
sound diffusion to be executed in a number of different ways by
allowing the user to define the routing limitations (of course,
there need not be any at all) in a manner that will be
advantageous to the type of diffusion required.

Figure 42. The Monitor Window
of the SuperDiffuse client
application. In this case, a sound
card with two audio inputs and
six audio outputs is being used.

5.4.2.5. Summing of Parameters

It should be clear that, owing to the flexible nature of parameter
assignments within the system, it is possible that any given
parameter could conceivably have its value modified from
several ‘places’ simultaneously; indeed this is common practice
in the operation of the M2 system. Consider the following
scenario. Matrix inputs 1 and 2, and crosspoints 1-1 and 2-2 are
locked. Matrix outputs 1 and 2 are assigned to a Physical Fader
A. Matrix outputs 1 and 2 are also assigned as the two
parameters of a Wave Effect, associated with one of the 256
virtual Effect Faders. This Effect Fader is, in turn, assigned to
Physical Fader B. On the Control Surface, we now have two
Physical Faders that are able to exert control over matrix outputs
1 and 2: Physical Fader A can be used simply to control the level
of inputs 1 and 2 that gets sent to outputs 1 and 2; Physical Fader

255
B will modulate the ‘levels’ of matrix outputs 1 and 2
sinusoidally.

So, what happens if we raise the levels of both Physical Faders A
and B on the Control Surface? The target parameter values are
summed. If we raise only fader B (and fader A is at ‘zero’), the
‘levels’ of outputs 1 and 2 will modulate sinusoidally between
zero and the value of fader B. If fader A is already raised, then
the ‘levels’ of outputs 1 and 2 will modulate sinusoidally
between the value of fader A and the value of fader A plus B. The
software ensures that no parameter ever exceeds its maximum
value of 127: if the result of summing for any given parameter is
greater than this, then the value of that parameter will be
‘clipped’ at 127. This does not mean that the audio signal clips
(in fact, this safety feature is included to ensure that the audio
signal does not clip), but merely that the parameter will not
exceed its maximum value.
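
Pulling the muting, locking, and summing rules together, the resolution of a single parameter's effective value might be sketched as follows; the structure is illustrative and does not reproduce SuperDiffuse's internals.

    // Illustrative resolution of a parameter's effective value: muted wins,
    // then locked, then the sum of all contributions clipped at the maximum.
    #include <algorithm>
    #include <vector>

    struct Parameter {
        bool muted = false;
        bool locked = false;
        std::vector<float> contributions;  // one per assignment source; may be negative

        float resolve() const {
            if (muted)  return 0.0f;    // muted parameters are always zero
            if (locked) return 127.0f;  // locked parameters are always maximum
            float sum = 0.0f;
            for (float c : contributions) sum += c;
            // Parameter clipping at 127 (this is not audio clipping).
            return std::clamp(sum, 0.0f, 127.0f);
        }
    };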

5.4.2.6. Summary

To summarise, the SuperDiffuse client application presents a
number of paradigms by which the server’s mix matrix can be
controlled via the M2 Control Surface interface. Mix matrix
parameters may be controlled on a simple one-to-one, or two-to-
one, basis with Physical Faders. Alternatively, virtual Group
Faders allow multiple parameters to be grouped and controlled
proportionally, while Effect Faders allow groups of parameters
to be automated in various different ways. Generally speaking,
any parameter can be assigned to any entity that requires
parameters, as illustrated in Figure 43. The exception to this is
with Effect Types; these may only be assigned to Effect Faders,
at which point the Effect becomes the (only) means for the Effect
Fader to accept parameters.

Figure 43. Summary of
assignable paths for parameters
within the SuperDiffuse client
application.

5.5. Using the System

The following sections describe the typical process involved prior to staging
a performance of electroacoustic music using the M2 Sound Diffusion
System.

5.5.1. Transportation

The M2 has been specifically designed for easy transportation to
venues. The flight case measures 53cm wide by 73cm deep by 54cm tall
and can fit into the back of most cars if this is necessary. The flight case
is heavy but has wheels and so can very easily be moved on the ground
by one person, but two people are realistically needed to get the unit
into and out of a vehicle, or if stairs are involved. The wheels of the
flight case can lock for safety during transit. Aside from the flight case,

257
it will also be necessary to transport loudspeakers, stands, at least as
many signal cables and standard IEC mains connectors as there are
loudspeakers, a number of mains power splitters to power the
loudspeakers, and a second very small flight case containing a flat-
screen TFT monitor. If only diffusion from compact disc is needed
(relatively often the case), this is all that is required.
Reference to Figure 34 should indicate what other components will be
needed for larger performances; it should be noted that the current
prototype does not include microphone pre-amplifiers.

It will be clear that, in the vast majority of cases, the bulk of the
equipment for transportation will be represented by loudspeakers (in the
author’s experience this is also the case with most commonly employed
diffusion systems), and purely on this basis a van (or several car
journeys) is often required except where small loudspeaker arrays are
used. There are also likely to be a number of small sundry items such as
desk lamps and so on.

5.5.2. Physical Setup

The flight case is normally the first piece of equipment to be installed,
and is placed close to the desired location of the diffusion console. In
performance all of the core system components remain housed inside
the flight case, and on several occasions the flight case itself has been
used as a table for the control surface. Although amply large enough for
the control interface alone, this is not, however, ideal, as it is difficult to
accommodate the computer monitor, despite the fact that this is flat-
screen, and on this basis a table is normally required.

When in place, the front and back detachable panels of the flight case
are removed. The keyboard and mouse are removed from inside the rear
detachable panel, and the batteries for each installed. The front
detachable panel has been used in the past as a stand for the control
surface, neatly providing a raked angle towards the performer if this is
preferred. The control surface slides out from its compartment inside the
flight case. The mains cable for the flight case is uncoiled via the open
back panel. As described previously, the flight case components are all
powered from a single mains connection, and so only one mains socket
is required. Cables to connect the control surface and computer monitor
are also uncoiled from inside the flight case and connected. The small
flight case used to house the monitor in transit doubles as a stand for the
monitor. The keyboard and mouse are wireless, so it is not necessary to
physically connect these.

The placement of loudspeakers on stands throughout the venue
normally takes place next, followed by the provision of mains power for
each of these. The time taken to achieve this invariably outweighs the
set up time for the flight-cased components by a considerable margin.
With the loudspeakers in place, each of these is connected via a
standard balanced XLR signal cable to the audio interface via the break-
out panel in the rear rack-mount system of the flight case, a note being
made of which loudspeakers are connected to which audio interface
outputs. When this is completed, the audio interface is switched on,
computer booted up, and the SuperDiffuse server and client applications
launched in that order. Loudspeakers are then switched on, completing
the physical setup of the system.

5.5.3. Software Configuration

With the system in place, the software is then configured according to
the requirements of the performance. If a preset has been saved from a
performance using an identical audio interface and loudspeaker array,
this is simply a matter of loading the corresponding file. If not, the
system must be configured.

The SuperDiffuse server application does not require any configuration
as this is all done via the client. In configuring the client application we
are ultimately determining the way in which the control surface can be
used to execute diffusion and, therefore, this stage of the setup process
is very important. Two procedures are common: either the same
configuration is used for every piece in the concert programme, or else
different pieces have different configurations. The former case is only
really possible if all of the pieces have the same technological profile in
terms of physical audio sources and CASS specifications. This scenario
would be possible, for example, if all of the pieces were to be diffused
from stereophonic CD. For the purposes of explanation it will be
assumed that we are dealing with this scenario. In any case, where
different configurations are required, the same process has to be iterated
again for each new configuration.

Firstly, a decision must be made with regard to how the signal routing
within the system will operate. In terms of the mix matrix (see Figure
35, above), here we are effectively deciding which parameters will be
controlled via the control surface, and which (if any) are to be locked to
maximum amplitude. Two different routing examples for the
pluriphonic diffusion of stereophonic works will be given, as this is
conceptually simple. The same basic principles would, of course, apply
for larger CASSs and different kinds of diffusion. The examples given
will assume, again for simplicity, that the BEAST ‘main eight’ array is
being used, and that audio interface outputs are connected to
loudspeakers as represented in Figure 44.

Figure 44. Loudspeaker array and audio-output-to-loudspeaker connections
assumed for the two SuperDiffuse routing configuration examples given
below. A stereophonic audio source from CD is also assumed. [Diagram
content: outputs 1 and 2 feed the distant pair, 3 and 4 the main pair, 5
and 6 the wide pair, and 7 and 8 the rear pair.]

In the first example, matrix inputs 1 and 2, and cross-points 1-1, 2-2, 1-
3, 2-4, 1-5, 2-6, 1-7, and 2-8 are locked, as shown in Figure 45, below.
This means that audio inputs 1 and 2 are effectively ‘hard wired’ to
audio outputs 1 and 2, 3 and 4, 5 and 6, and 7 and 8, respectively. In
order to achieve diffusion to loudspeakers, therefore, audio outputs 1 to
8 will have to be assigned to physical control surface faders. In the
simplest instance, this could be done by assigning each of the eight mix
matrix output parameters to an individual fader. Referring to Figure 44,
the fader to which output parameter 1 is assigned would afford control
over the output level of the left-distant loudspeaker, the fader
controlling matrix output 2 would afford control over the right-distant
loudspeaker, and so on. The kind of diffusion control made available by
this configuration (one-fader-to-one-loudspeaker) will be familiar to
most practitioners.
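
Expressed in configuration terms, this first example might look something like the sketch below; the identifiers are illustrative, and SuperDiffuse's actual preset format is not reproduced here.

    // Illustrative encoding of the first routing example; not SuperDiffuse's
    // actual preset format.
    #include <utility>
    #include <vector>

    struct RoutingExampleOne {
        std::vector<int> lockedInputs{1, 2};
        // Locked crosspoints (input, output): input 1 hard-wired to the odd
        // outputs, input 2 to the even outputs, as in Figure 45.
        std::vector<std::pair<int, int>> lockedCrosspoints{
            {1, 1}, {2, 2}, {1, 3}, {2, 4}, {1, 5}, {2, 6}, {1, 7}, {2, 8}};
        // One-fader-to-one-loudspeaker: fader n controls matrix output n.
        std::vector<std::pair<int, int>> faderToOutput{
            {1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {6, 6}, {7, 7}, {8, 8}};
    };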

Figure 45. First routing example.
Matrix inputs 1 and 2, and cross-
points 1-1, 2-2, 1-3, 2-4, 1-5, 2-6,
1-7, and 2-8 are locked, leaving
matrix outputs 1 to 8 available
for assignment to physical
faders.

In the second example, matrix inputs 1 and 2, and matrix outputs 1 to 8
are locked, as illustrated in Figure 46, below. In this case, no audio
inputs are ‘hard wired’ to outputs as such, and in order to hear anything
we will need to be able to control the mix matrix cross-points. The same
kind of control attained in the first example could be achieved in this
case by assigning cross-points 1-1, 2-2, 1-3, 2-4, 1-5, 2-6, 1-7, and 2-8
each to an individual fader.

Figure 46. Second routing
example. Matrix inputs 1 and 2, and
matrix outputs 1 to 8 are locked,
leaving matrix cross-points 1-1,
2-2, 1-3, 2-4, 1-5, 2-6, 1-7, and 2-
8 available for assignment to
physical faders.

Clearly, the nature of the routing configuration in the Monitor Window
will depend on the technological profile of the work to be performed,
and the kind of diffusion required. Once this is completed (and, indeed,
it may be decided that no mix matrix parameters are to be locked), it is a
matter of defining the ways in which those parameters that are not
locked are to be controlled via the faders. It should be clear, referring
back to section 5.4.2, that there are infinite possibilities ranging from direct
one-to-one control of loudspeakers via faders (as described in the
previous examples), through to configurations where the relationship
between faders and mix matrix parameters is highly abstracted.

Once the software has been set up, the configuration can be saved as a
file. A SuperDiffuse performance file can contain up to 128 preset
programs, each of which can be configured differently via the process
described above. During performance, programs can be selected from a
drop-down menu (this can even be done while the server is processing
audio, without any glitches). This method would be adopted in cases
where different configurations are required from one piece to the next.

5.6. Evaluation

As stated previously, evaluation of the M2 Sound Diffusion System – as
with all of the systems described in Chapter 4 – will be according to those
criteria defined in section 4.3. Because the author has a comprehensive
knowledge of this system, the evaluation in this case will be systematic:
this has not been possible in most other cases.

5.6.1. Flexibility of Signal Routing and Mixing Architecture

At its heart, the M2 Sound Diffusion System operates on a matrix-based
mixing architecture. As explained in the previous chapter, this is the
most flexible mixing architecture available at the present time, because
it allows any input to be mixed to any output in any quantity. It is also,
however, reasonably convoluted, potentially confusing, and extremely
difficult to control directly in the context of sound diffusion, owing to
the number of parameters and the fact that these do not always
correspond in obvious ways to the desired outcomes. In short, the
matrix mixing paradigm has the potential to be enormously non-
ergonomic. The extent to which this has been successfully addressed in
the M2 system will be evaluated in the next section.

In terms of signal routing, the M2 is certainly among the most flexible
of the systems discussed, because this can be defined by the user, in
software. In this way, well-established paradigms – such as bussing in
stereo pairs – can easily be configured, but the system is not limited to
these. The client application also allows for signal routing presets to be
stored and recalled instantly, meaning that each item within a concert
programme can use an appropriate routing. This facility is not available
in any other system as far as the author is aware, and therefore
represents one of the considerable strengths of the M2.

This means that physical re-patching during a performance should never
be necessary, so long as the total number of audio input channels used
throughout the programme does not exceed twenty-four (of which two
channels must be from CD). Put simply, it does not matter which audio
inputs the audio sources are connected to, because the routing is all
determined in software. However, the use of more than eight non-CD
audio input channels – although perfectly possible – is not as easy as it
could be, owing to the fact that only eight inputs are externally
accessible on the rear rack-mount system; this can increase set-up time
slightly for concert programmes requiring more than eight non-CD
inputs. In addition, any audio sources that cannot be routed to standard
XLR connectors will require adapters, although this criticism can
probably be filed against any current diffusion system.

The total number of audio outputs presently offered by the system –
twenty-four – allows for a reasonably generous loudspeaker array to be
used in concerts. The system has been used to its full capacity in this
respect in at least three public performances and has performed
flawlessly. A twenty-four channel array, however, is not by any means
large in comparison with several of the systems described previously,
and does not rival the Acousmonium, Cybernéphone, BEAST, or sound
cupola systems.

The number of audio inputs and outputs is, however, flexible, because
further modules can be added to the MOTU PCI-424 PCI card. At
present, further 24 I/O analogue modules are available from MOTU, as
well as the MOTU 2408 Mark 3, which adds eight analogue inputs and
outputs and twenty-four digital ins and outs via ADAT optical and
Tascam T-DIF interfaces.
added, allowing a number of possible permutations. The maximum
number of input and output channels attainable with this technology is
ninety-six of each, with various combinations of analogue and digital
possible. This exceeds the number of inputs and outputs offered by all
but one of the systems described in the previous chapter, the exception
being certain of the sound cupolas (see section 4.12). In addition, the
system will function correctly with any sound card that can be
accommodated by the PC’s motherboard (normally these will have the
standard PCI interface) provided that the sound card in question has
ASIO drivers, as many PCI sound cards nowadays do. This will prove
to be advantageous in a number of ways to be explained later.

Another advantage is that the control interface is abstracted from the
signal routing and mixing architecture. Therefore, the design of the
architecture itself has not been limited in any way by control
considerations. This results in a higher degree of flexibility in the
present context, and is also beneficial in relation to interface
ergonomics.

5.6.2. Interface Ergonomics

On the surface, the M2 system would appear to be based around a rather
unoriginal bank of thirty-two standard faders. Issues relating to this, in
terms of interface ergonomics, have been described previously.
However, most of these issues arise when the faders are themselves an
integral part of the signal routing and mixing architecture, and this is not
the case with the present system. This means that more ergonomic use
(or more ‘efficient’ use) can be made of the faders, which also have the
considerable advantage of being an interface familiar to many
practitioners (this latter point will be discussed more fully in the next
section).

Among the many apparently straightforward advantages offered by this
design is – quite simply – the ability to control the loudness of a pair
(or, indeed, a group of arbitrary denomination) of loudspeakers using
only a single fader. This can be achieved by assigning the appropriate
mix-matrix parameters to a virtual group fader, which is, in turn,
assigned to a single physical fader. With some of the more widespread
systems (including both the generic systems and the BEAST system) it
is necessary to interact with two faders (one for each loudspeaker) to
obtain this result. In cases where both faders will be manipulated
identically – and in many cases preserving the (e.g.) stereo balance in
this way will indeed be a concern – the ability to carry out this action
with a single fader most certainly represents an ergonomic
improvement.

A unique feature of the M2 system – along the same lines – is the
possibility of realising proportional amplitude control over a group of
loudspeakers; neither the Acousmonium nor the BEAST experimental
system offers this possibility. In this way, raising a fader to its
‘maximum’ position could – for instance – simultaneously bring one
pair of loudspeakers to maximum output level, a further pair to 50%
maximum level, a further pair to 30%, and so on. Of course, the exact
nature of the proportions is completely arbitrary (and loudspeakers need
not be grouped in pairs if this is not appropriate), and far more abstract
groups (comprising further groups and effect faders in addition to direct
mix-matrix parameters, for example) can be defined if required.
Assigning grouped parameters with negative values can yield interesting
results. Let us imagine that the system is in a state whereby every
loudspeaker in the array is raised to maximum output level. If a virtual
group fader (assigned to a physical fader) has each of the matrix output
channels assigned to it with a negative maximum value, raising that
fader to its maximum position will lower the output levels of all the
loudspeakers to zero. This technique has been implemented in many
performances with the M2, a good example being the use of a single
fader that brings the diffusion ‘all to mains’ (that is, only a frontal pair
of loudspeakers remain active). Such techniques are only available
owing to the fact that mix-matrix parameter values are summed, as
described previously, and this feature is not available on any widely-
used diffusion system. In all of the examples given, to achieve the same
result by manipulating one fader per loudspeaker would be considerably
less ergonomic, and in many cases infinitely less ergonomic (i.e.
impossible).

Overall, the nature of the assignments that can be made, and the effects
these have on the behaviour of the control interface with respect to the
diffusion, is extremely open-ended, and the user has direct control over
this. To this end, the user can tailor the controls to be ergonomic with
respect to the specific diffusion actions required. This would appear to
be a rare commodity indeed, with most existing systems offering only a
fixed repertoire of techniques, and is certainly one of the strongest
benefits of the M2 system in the present context. Additionally, the
configuration can be instantaneously changed from one piece to the next
(or, indeed, at any time) by recalling a previously defined preset.

An exploration of the ways in which the use of faders has been
ergonomically improved aside, it could be argued that because the
system adopts the fader as its only interfacing paradigm (the keyboard
and mouse are only used to configure the system and not for actual
diffusion), it is therefore – on a physical level – still subject to the
ergonomic limitations thereof, progressive control over one dimension
only being a good example. To put it another way, while the M2 may
offer extended functionality, the physical mode of operation remains the
same. As described in Chapter 1, every creative framework has its own
particular characteristics and limitations, and the degree of restriction
imposed by the interface as a whole could be lessened with the inclusion
of alternative paradigms (joysticks, buttons, foot-pedals, etc.) alongside
more conventional means. A fuller exploration of such possibilities
would certainly be beneficial, and is not evidenced in the system as it
stands.

The most obvious ergonomic limitations of the M2, however, reside in
the way in which the system itself must be configured by the user, via
the client application’s graphical user interface (GUI). The way in
which assignments are made, for instance, is relatively inefficient in
comparison with what it could be, and many optimisations could be
made in this respect. As a representative example, consider the process
of configuring a physical fader to control the output levels of a pair of
loudspeakers in equal proportion. This procedure would involve
(depending, of course, on the configuration of the Monitor Window –
see Figure 42; one possibility will be described here) clicking on the
physical fader’s ‘Assign’ button, selecting ‘Output Level’ from a drop-
down menu, selecting which output level (i.e. which mix-matrix output
attenuator) from a further drop-down menu, and then clicking ‘OK’ to
confirm the assignment. This process would then have to be repeated for
the physical fader’s second ‘Assign’ button (recalling that we are
assigning two output levels to a single control), resulting in a total of
twelve mouse-clicks to complete the assignment. For the most part, all
assignments must be made in this way, and the example given is a
relatively simple one. With more complex scenarios involving the
assignment of parameters to multiple virtual group and effect faders
prior to physical fader assignment, it is clear that configuring the system
has the potential to become a tedious and time-consuming process. This
is not owing to the nature of the task being complex, as such, and
therefore represents poor interface ergonomics. With available rehearsal
time usually limited to begin with, this could be a serious problem.

It can therefore be seen that, in making enormous ergonomic
improvements in certain areas, the M2 system consequently poses a
different set of ergonomic challenges in other areas, and some of these
are less well dealt with. Nonetheless, the problems are, for the most
part, in relation to configuring the system prior to performance, with
ergonomics during performance substantially improved in comparison
with other systems.

5.6.3. Accessibility with respect to Operational Paradigms

Although cited – in one sense – as a potential limitation in the previous
section, the M2’s use of faders represents a considerable advantage
under the present heading. As described in the previous chapter,
familiarity is the single greatest aid to the realisation of an accessible
interface, and the mixing desk fader is most certainly familiar in the
context of sound diffusion. This was, for this very reason, an explicit
choice in the design of the control surface, and 100-millimetre throw
ALPS faders – frequently used in commercially available audio mixing
desks – were specifically selected to enhance the sense of familiarity,
despite the fact that the control surface does not have any direct contact
with encoded audio streams. This particular design choice was a good
one, feedback from users frequently indicating that the interface does
not ‘feel’ substantially different from a conventional mixing desk.242 In
conjunction with the increased functionality of this already familiar
interface (it is possible to ‘do more’ with the faders than in certain other
systems employing the same physical interface), this approach has been
very accurately described by Lewis as ‘supplementing existing diffusion
techniques rather than replacing them,’243 and this has proven to be
useful in the present context. From a familiar starting-point,
practitioners are able to progressively – over the course of several
performances and rehearsals, say – extend their repertoire of diffusion
actions through more and more complex assignments to the physical
faders. This is not true of other systems, which either tend to provide the
performer with a relatively limited selection of potential techniques and
fixed functionalities, or with paradigms that are less familiar, or both.

242. Although Robert Normandeau is reported to have expressed a distaste for these faders, on the grounds that they do not offer enough resistance. Nikos Stavropoulos (2005). Personal communication with the author.
243. Dr. Andrew Lewis, University of Wales, Bangor (2004). Personal email communication with the author.

The visual and tactile familiarity of the interface has also, however,
proved problematic in other respects, certain users – for instance –
finding it difficult to comprehend the fundamental difference between
the M2 control surface and a conventional audio mixing desk, as
implemented in a generic diffusion system. This is due at least partly to
the fact that the system can be configured (and quite often is) to behave
in this way (faders eliciting one-to-one control over loudspeakers, et
cetera), and when this is the case, the results are highly convincing.
When presented with an interface that looks, feels, and behaves in more
or less the same way as a conventional mixing desk, it is hardly
surprising that practitioners get confused from time to time.

This problem is also due, albeit more indirectly, to certain of the
ergonomic inefficiencies described in the previous section. Because the
process of configuring the software is somewhat cumbersome, and
because there is usually little time available to train all of the performers
in the necessary procedures (this raises further questions with respect to
accessibility, which will be discussed more fully in the next section),
many concerts have, in the past, been rehearsed and performed with the
system pre-configured by an expert user. This does, in certain respects,
defeat the object of having such a highly configurable system, and has
fairly often resulted in practitioners taking the ‘easy option’ and using
the M2 as if it were a generic system, effectively bypassing many of its
advantageous features.

This circumstance is both unbeneficial and self-perpetuating insofar as
practitioners are only relatively rarely able to gain first-hand experience
of the more configurable aspects of the system, and therefore have little
opportunity to assimilate the skills necessary to take advantage of these
for their own individual purposes. It is for this reason that the abstract,
open-ended, nature of the system is often remains ill understood by new
users, which is unfortunate because this is, without a doubt, one of the
system’s strongest features.

For these reasons – apparent similarity to conventional mixing hardware,
and a lack of knowledge with respect to the fundamental concepts and
potential capabilities of the system – it has been noted that certain users
become disorientated because the system does not behave as they
expected. When, for instance, raising a fader results in the reduction of
loudspeaker output levels, and it is not immediately apparent to the
performer why this has been the case, the tendency – particularly during
performance – might be to panic. From this perspective, the behaviour
of the system can appear highly erratic and unpredictable, and it is
therefore not surprising that certain practitioners opt out of the more
unconventional capabilities. This does make it somewhat difficult to
engender a culture of ‘supplementing existing diffusion techniques
rather than replacing them,’ and in this respect it could be argued that
‘throwing performers in at the deep-end’ with completely unfamiliar
paradigms – and without a safe, familiar, option – might be more
successful.

Overall, however, the M2 exhibits the potential for a high degree of
accessibility with respect to the nature of its operational paradigms, and
positive feedback has been received on this basis. Some suggestions as
to how problems noted in the present context can be indirectly
addressed will be described in the next section.

5.6.4. Accessibility with respect to Technological System Prerequisites and Support thereof

The constituent parts of the M2 system can be observed in Figure 34
(page 247). This summarises the software and hardware components
required to build the current M2 prototype, excluding flight casing. It
can be seen that many of the components are very easily, readily, and
comparatively cheaply available. These include the CD player, audio
interface, computer and computer-related hardware components, and the
loudspeaker array (which in any case would cost the same for any sound
diffusion system). It will also be noted, however, that the system
comprises certain bespoke components, including both software
applications and the control surface. It also incorporates an Ircam
AtoMIC interface, which, at a cost of around US$695,244 is excessively
expensive in this context, given that its sole function is to perform
control voltage to MIDI conversion. This means that the current M2
prototype is not as accessible as might be desirable.

However, the requirements described represent only one potential
configuration. The M2 deliberately offers a high degree of modularity,
and this is strongly beneficial from the point of view of accessibility in
the present context. All of the devices are completely abstracted from
each other, and communication between them is achieved via standardised
protocols, effectively meaning that every component of the system
(realistically excluding the software, which is essential to maintaining
the core system functionality; this will be discussed presently) is
interchangeable, so long as the appropriate communication protocols are
observed. This means that the system can be tailored in accordance with
a number of important criteria – including cost, potential application,
and personal preference – without affecting the fundamental
functionalities of the system as a whole. The expensive Ircam AtoMIC
can be dispensed with, for example, if an alternative MIDI device is
used to control the diffusion. Indeed, before the development of the
control surface, the system was extensively tested using a Kenton
Control Freak 16-fader MIDI interface, and many other alternative
devices are available commercially. The same interchangeability can be

244. Retail price in 2000, as cited in:
Cutler, Robair and Bean (2000). "The Outer Limits - A Survey of Unconventional Musical
Input Devices". Electronic Musician. Electronic journal available at:
http://emusician.com/mag/emusic_outer_limits/.

observed in many of the other system components, and this has been a
conscious design feature.

Another important advantage rests in the fact that the system is highly
scalable. Owing to its modularity, the system can be scaled down (or
up) very effectively, and this is highly beneficial insofar as a scaled
down version of the system can be ‘taken home’ for practice
purposes. However, at present – owing in part to the nature of the
software interface – system operation does depend considerably on the
loudspeaker array being used, in terms of number of loudspeakers and
their relative positioning. In this respect, much could be learned from,
for instance, the DM-8 and ACAT Dome systems, which, to a certain
extent, allow arrays of various sizes to be dealt with in essentially the
same way.

Use of the M2 for rehearsal outside the performance context is also
hindered by a lack of access to the software components, which – at the
time of writing – are neither available commercially nor for free. At
present this precludes the use of the M2 system in a wider context and
this is a problem that urgently needs to be addressed. Some possible
solutions will be proposed in the next chapter. Overall, however,
although the components of the M2 (the software in particular) may not
be as immediately accessible as those required to build, say, a generic
diffusion system, the functionalities offered are considerably extended,
and the system has the notable advantage of being highly scalable
without unduly affecting this.

5.6.5. Compactness / Portability and Integration of Components

A great deal of attention has been paid to the efficient integration of
components within the M2 system. Because many of the core elements
can remain permanently inter-connected and housed within the flight
case, the system can be transported with comparative ease and set up
very quickly. Once the flight case is in place in the venue, the core of
the system can be physically set up by one person and fully functioning
in well under five minutes,245 provided the integrated CD player is the
only audio source required. This does not take into account, however,
the time taken to deploy and power up loudspeakers and connect these
via signal cables to the audio interface outputs, which – it can be argued
– represents a considerable portion of the setup overhead for any
portable diffusion system. If any audio sources other than compact disc
are required, then the appropriate hardware must be integrated, and it
has also been noted that software configuration of the system can be
time-consuming.

The portability of the system is also greatly improved by its scalability,
as described previously. While there are doubtlessly further
optimisations that could be made to decrease setup time, in comparison
with other systems the M2 performs extremely well in this respect, and
– importantly – the portability of the system does not compromise its
overall flexibility.

5.6.6. Reliability

Having hosted over a dozen public concerts (at the time of writing),
and numerous live demonstrations and workshops since its inaugural
performance in March 2004, the software component of the M2 system
has not crashed once during performance or rehearsal. On one occasion
the system was left set up and running continuously for around sixty
hours, in which time three full days of rehearsal and three public
evening performances took place, involving multiple performers.246 In
this case, all twenty-four output channels were in use, and no problems
were experienced.

245. This assumes that the user is familiar with the procedure. The system has been set up in
tests at the University of Sheffield Sound Studios – from fully flight-cased to up-and-
running – in just over two minutes.
246. USSS Sound Junction III was staged at the University of Sheffield Drama Studio
between Thursday 24th and Saturday 26th February 2005, and featured – amongst others –
composers Francis Dhomont and Andrew Lewis performing a selection of their own works
using the M2 system.

If, at any point, TCP/IP communication between the client and server
applications is lost, the server application will maintain the state of the
mix matrix and this will therefore not result in the loss of sound
projection, but only a loss of control over the mix matrix. The work in
question can therefore continue to play until the end, at which point the
client application can be restarted. To date, this has never happened,
except deliberately during testing. Testing has also proved that if
communication with the client is lost, the client can be restarted, and
communication with the server re-established without any disruption to
the DSP engine; diffusion can then be continued as normal. Various
other tests have also been carried out in which – for example – multiple
copies of the client application have been loaded, software settings
altered, saved, and re-loaded, complex signal routing configurations
defined in real time, and so on, and none of these has resulted in any
disruption whatsoever to the audio playback.
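
The robustness described stems from a simple design principle: the state
of the mix matrix belongs to the server, not to the control connection.
The following minimal sketch – which assumes a line-based text protocol
and uses purely illustrative names, not the actual SuperDiffuse source –
shows how a server loop of this shape survives any number of client
disconnections:

import socket

def control_server(mix_matrix: dict, port: int = 9000) -> None:
    # The mix matrix dictionary outlives every connection: a dropped
    # client changes nothing until a new client reconnects.
    with socket.socket() as srv:
        srv.bind(("", port))
        srv.listen(1)
        while True:
            conn, _ = srv.accept()
            try:
                for line in conn.makefile():
                    # Hypothetical message format: "input output gain"
                    in_ch, out_ch, gain = line.split()
                    mix_matrix[(int(in_ch), int(out_ch))] = float(gain)
            except (OSError, ValueError):
                pass  # malformed data or a lost link; state persists
            finally:
                conn.close()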

However, the software has not been tested on as wide a variety of
different hardware configurations (different audio interfaces, different
PC software and hardware configurations, et cetera) as would be
desirable and this partly explains the continued unavailability of the
software components. Preliminary testing has, for instance, indicated
potential problems concerning certain graphics cards. Nonetheless, the
software has been proven to perform reliably with the MOTU 24 I/O,
MOTU 2408, RME Hammerfall, and a few other ASIO-compliant audio
interfaces.

Furthermore, although the software has never crashed to date, measures
in place to deal with this eventuality should it ever happen are not,
perhaps, as extensive as they might be. Although client-side problems
are relatively inconsequential, a serious server-side software failure
could potentially result in high-amplitude digital noise from all of the
audio interface outputs. If this were to happen, and the server machine
had stopped responding completely, the only option would be to switch
the power off on the audio interface. This is clearly not ideal and further
work is therefore needed in this respect. Another minor problem has
been identified, and this concerns the cold-booting of the system.
Communication with the MOTU 24 I/O is not always successfully
established in the first instance, and a re-boot is sometimes needed.
However, as the system functions perfectly once it is ‘up and running,’
it seems most likely that this is an issue relating to the audio interface
drivers, and MOTU has been contacted in relation to this.

Software aside, the integration of hardware components – as well as
decreasing setup times – improves reliability by minimising the
likelihood of equipment being incorrectly patched. With normalised
components in place, the system can be rigorously tested once and left
in a stable state. In terms of the hardware, there has been one small
problem with the control surface, in which one of the 25-pin D-
connector sockets (used to connect to the Ircam AtoMIC) was found to
be slightly loose. This can almost certainly be attributed to the nuts that
secure the connectors to the aluminium casing rattling themselves loose
during transit, and could be easily rectified by replacing the nuts with
Nylock equivalents.

The modularity of the system – mentioned in most of the criteria given
– is also beneficial from the perspective of reliability, because no single
component necessarily depends on all of the others to function correctly
(obviously this is truer in some cases than in others). This is beneficial
in two ways. Firstly, any damaged or faulty components can be replaced
individually. Secondly, if a malfunction should occur and it is not
possible to repair or replace the component in question, there is a good
chance that the system will be able to function in a depleted state (again,
this obviously depends on the component in question). Explicit
considerations in the design of the control surface, for example, have
been made in this respect. Internally, its circuitry is modular, and has
been designed so that any defective module can be accessed and
replaced quickly and easily. If this is not possible, all of the remaining
modules will function perfectly. Because each module accommodates
four faders, if a module is faulty during a performance, twenty-eight
fully functioning faders will still be available. The internal design of the
Control Surface is described in Appendix 2.

Overall, it can be concluded that considerable measures have been taken
to ensure that the M2 performs reliably. The system may not
theoretically be as reliable as some of the other systems described,
simply because it is more complex, and incorporates software
components. However, there is actually very little material evidence to
support this supposition. As with all of the criteria, certain aspects in
which improvements could be made have been identified.

5.6.7. Ability to Cater for the Technical Demands of Live Electroacoustic Music

The ability to cater for the technological demands of a wide variety of
electroacoustic works is largely attributable to the M2’s flexible signal
routing and mixing architecture. Audio sources can be connected more
or less arbitrarily to the audio interface inputs and routed in software as
required during the concert programme. Each piece can have its own
system configuration if necessary, and these can be preset and recalled.
In this way, all of the audio sources required throughout the programme
can be connected to the audio interface at all times, circumventing the
need to physically re-patch. The exception to this under the present
prototype of the system is if the total number of different audio input
source channels in the concert programme exceeds twenty-four.
Nonetheless, this is not an insurmountable limitation of the system, as
the software natively supports ASIO compliant audio interfaces of an
arbitrary number of channels. As described in section 5.6.1, the MOTU
PCI card can in fact host up to four 24 I/Os simultaneously, giving a
maximum of ninety-six analogue input and output channels. This
should be ample for all but the most varied and ambitious concert
programmes. Nonetheless it should be recalled that as new technologies
and capabilities become readily available, the very nature of the
electroacoustic idiom itself – as described in Chapter 2 – is to embrace
the new creative possibilities. If ninety-six inputs and outputs are
available at present, this may be insufficient in future.
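
By way of illustration, the per-piece system configurations mentioned
above might be stored along the following lines; the data and names are
hypothetical, as the actual M2 preset format is not reproduced here.

# Hypothetical per-piece presets: each work names the routing it needs,
# so consecutive items on a programme require no physical re-patching.
presets = {
    "stereo work": {0: [0, 2, 4, 6],   # left input feeds four loudspeakers
                    1: [1, 3, 5, 7]},  # right input feeds the other four
    "octophonic work": {ch: [ch] for ch in range(8)},  # one-to-one
}

def recall(piece: str) -> dict:
    # Recalling a preset replaces the live input-to-output routing
    return presets[piece]

print(recall("octophonic work")[3])  # input 3 feeds loudspeaker 3 only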

The M2 is also rather more accommodating than some of the other
systems described, in terms of its tolerance for works of varying
numbers of channels. Again, this is largely attributable to the fact that
the signal routing and functionality of the faders can be different for
each piece. A stereophonic work can make use of the interface to
diffuse pluriphonically to a number of stereophonic coherent
loudspeaker sets within the array. Using a different pre-configuration,
an octaphonic work could be performed uniphonically with one fader
assigned as a level control for each of the eight loudspeakers being
used. The next piece again might call for a quadraphonic tape part to be
diffused alongside a stereophonically encoded acoustic instrument, and
this could also be accommodated, with independent diffusion of the two
coherent audio source sets attainable. Technically, the system is able to
accommodate each of these distinct demands as consecutive items on a
concert programme, with no physical re-patching in between works. If
this is to be realised in practice, a carefully planned loudspeaker array
will clearly be required, that is able to cater for all of the necessary CLS
shapes (see section 3.8.1). This will be discussed further in the next
chapter. It will also be noted that the third hypothetical example given
(quadraphonic tape with stereophonically amplified acoustic instrument)
could not be accommodated by the present system without external
microphone pre-amps, and this should certainly be considered as a
possible addition to the rack-mount system (even generic systems – the
simplest and arguably crudest of them all – normally cater for this).

5.6.8. Ability to Cater for both Top-Down and Bottom-Up Approaches to Sound Diffusion

The ability to pluriphonically diffuse a wide variety of multichannel
works, making ergonomic use of the interface in various different ways,
is sure to be of considerable benefit to top-down practitioners. The
‘wave,’ ‘chase,’ and ‘randomise’ effects may also prove useful as a
means to attaining certain actions that would be difficult to achieve
manually.

Although some processes are, to a degree, automated, there is in fact
little scope for accurately controlled automation within the system;
those actions that do occur automatically are not controllable to the
extent that certain practitioners might wish. All of the ‘effects’ are
continuously iterating processes and the performer will not know –
when fading in a chase sequence, for instance – at what part of the cycle
the effect will be when it becomes audible. This has been observed by a
number of users of the system and is clearly something that needs to be
addressed.

Indeed, few – if any – of the paradigms on offer are ‘accurate’ in terms
of the parameters they manipulate, the system having very much been
geared towards perceptually-evaluated real-time performance. While
this is likely to be of enormous benefit to certain practitioners, what it
ultimately amounts to, in respect of the present criterion, is a
considerable top-down bias. This is not entirely surprising as the system
was developed from within a strongly acousmatic culture at USSS,
whose overall preferences tend to swing toward top-down
methodologies. If the system is to be widely adopted, however, more
attention will need to be paid to the potential requirements of bottom-up
practitioners.

It is, nonetheless, true to say that the limitations cited in the present
section are merely software constraints and, as such, none are directly
attributable to the physical design of the system itself. With even a few
relatively minor adjustments to the client software, the M2 has the
potential to be equally beneficial to both top-down and bottom-up
practitioners alike.

There are additionally a few aspects of the M2 system that do not fall under
the given evaluation criteria quite so easily. The ability to run the client and
server applications on separate computers is a good example, as this offers a
number of distinct benefits. It is not always convenient, for example,
for the physical signal routing to take place at the diffusion console. This
can often result in an unmanageable tangle of cabling around the
performance instrument. Because the SuperDiffuse server machine performs
all of the audio mixing, this can be placed in a location that is convenient for
easy and neat cabling to all of the loudspeakers. The exact location is likely
to vary from venue to venue, but this does not matter because the two
machines can be placed separately. Alternatively, the server machine can be
located at a distance from the audience, minimising the impact of noise from
cooling fans that has occasionally been raised as an issue. It will be noted
that, because communication between the client and server machines is via
TCP/IP, wireless Ethernet is a possibility. Furthermore, because the
client machine does not handle any audio directly, it can quite feasibly
run on a machine of relatively low specification, a laptop for instance. This
reduces clutter around the diffusion console, which can very frequently be
observed in performances staged using other systems.

5.7. Summary

In comparison with those systems evaluated in Chapter 4, the M2 presents
improvements in some important areas. The system is very portable, and
therefore comparatively easy to transport between venues. Setup times are
also significantly reduced, owing to the integration of key components
within the flight case. Setting up the loudspeaker array, however, remains
time-consuming, and optimisations (perhaps in the form of rationalised
systems for the distribution of mains power and audio signals) could be
made in this respect.

In terms of its signal routing and mixing architecture, the M2 is completely
configurable, and the settings can be quickly and easily changed from piece
to piece by loading preconfigured files. The mix matrix architecture allows
any audio input to be dynamically mixed to any audio output, and the
internal architecture is governed by the requirements of individual works
rather than by the system itself. In this respect, the M2 is more flexible than
any of the systems described previously. However, the time taken to
configure the system in the first place has been identified as an area in
which further optimisations could be made.

The issue of interface ergonomics has been addressed by providing
extended functionality via the familiar mixing desk fader paradigm. Again,
this can be configured differently for each piece. In this way, improved
ergonomics and increased functionality are not at the expense of ease of use,
and the system does not deviate radically from familiar conventions.
However, as described previously, this has proved at times to be a
problematic issue in itself.

Every effort has been made to ensure that the M2 is widely accessible in
terms of the technologies required to build it. The modular nature of the
system, to a considerable extent, ensures that this is indeed the case: a wide
variety of different components (control interfaces, audio interfaces, et
cetera) can be integrated without affecting the overall functionality of the
system. The only real problem in this respect rests with the software, which
is currently unavailable outside the University of Sheffield Sound Studios.
Availability online would be a fairly straightforward solution to this.

In terms of its ability to accommodate both top-down and bottom-up
approaches to sound diffusion, the M2 demonstrates a considerable bias in
favour of the former. This is primarily because the nature of control and
diffusion techniques offered is more amenable to a perceptually evaluated
and ‘improvisatory’ approach than it is to the precise and consistent
articulation of predetermined actions. Accordingly, increased creative scope
is offered to the top-down practitioner, with considerably fewer apparent
advantages with respect to the bottom-up approach. This would appear to be
a difficult problem to address, because it depends so fundamentally upon the
holistic nature of the system as a creative framework. Nonetheless, the
problem could be addressed in the first instance with relatively minor
adjustments to the client software.

In summary, the main advantages of the M2 system are: portability and
quick setup times; configurability in terms of mix architecture and diffusion
control; configurability in terms of the hardware resources required;
increased creative scope and improved ergonomics from a mainly top-down
perspective. The main disadvantages are: inadequate support for bottom-up
methodologies; time taken to configure the system prior to performance;
unavailability of the software. The M2 demonstrates considerable
technological and logistical improvements in comparison with many
existing systems, with slightly more attention needing to be paid to aesthetic
issues (particularly with respect to bottom-up techniques).

Much has been learned through the design, development, and
implementation of the M2 Sound Diffusion System, and its subsequent use
in staging public performances of electroacoustic music. Many significant
improvements in comparison with other diffusion systems have been
realised, and the M2 is, in various ways, unique in terms of the creative
possibilities it offers to performers of electroacoustic music. There are, of
course, issues that remain unresolved, and additional issues have arisen
from the unique nature of the system itself. This is only to be expected given
what has been observed about the nature of creative frameworks in general.
Strictly, the M2 cannot therefore be presented as a comprehensive solution
to the demands of live sound diffusion, although it can indeed be argued
that, in certain important respects, the system comes closer to attaining this
ideal than any of the other systems reviewed. If a truly comprehensive
solution is ever to be realised (and there is no guarantee that this is even a
realistic possibility given the complex and multifaceted demands of live
electroacoustic music) then further research is needed. Some specific areas
in which further research would be beneficial will be identified in the sixth,
and final, chapter.

6. A Way Forward for the Design of New Sound Diffusion Systems

6.1. Introduction

This final chapter will present some overall conclusions and propose some
important considerations to be made in the design of future sound diffusion
systems. It should be recalled that areas for future research are proposed on
the basis of the following (very broad) questions, as outlined in the
introduction:

• What can we learn from the technological demands of the
electroacoustic idiom?
• What can we learn from the aesthetic nature of electroacoustic
music?
• What can we learn from the technical nature of sound diffusion
itself, and from differing approaches to and aesthetic attitudes
towards it?
• What can be learned from existing sound diffusion systems?
• What can we learn from the design and implementation of the
M2 Sound Diffusion System and from the feedback obtained
from practitioners who have performed with it?

By adopting an inclusive approach in attempting to answer these questions,
and with due consideration given to the evaluation criteria proposed in
section 4.3, it is proposed that new systems appropriate for the performance
of electroacoustic works from across the technological and aesthetic
spectrum of the idiom can be devised. It is hoped that such systems might
gain more widespread acceptance throughout (and perhaps even beyond) the
electroacoustic community.

What follows is not a design specification, as such, but rather a collection of
observations and suggestions that could, conceivably, be developed into
one; further refinement of the ideas would certainly be required. This review
will begin with a summary of some of the general issues that have been
raised, referring back to previous chapters. This will be followed by some
more specific suggestions regarding the kind of system that might be
required to address these issues and, more specifically still, a few particular
ideas and concepts that could potentially be beneficial.

6.2. Some Advantageous Paradigms and Architectures

Through research into existing diffusion systems, and in response to the
criteria described in Chapter 4 (as informed by the technological and
aesthetic demands of the electroacoustic idiom), the following paradigms
and architectures are proposed as useful in the development of future
systems.

6.2.1. Matrix Mixing Architecture

Many of the problems associated with existing diffusion systems arise
from the inflexibilities often inherent in mixing hardware, particularly
that which has not been designed specifically for the purpose of sound
diffusion. If the architecture itself limits the ways in which audio input
streams can be diffused to audio outputs, this is likely to have a negative
impact on both the technological range of works that can be
accommodated (evaluation criterion 4.3.1), and on the range of ways in
which diffusion can be realised (of particular importance to criterion
4.3.2). Such limitations are in evidence in both of the generic setups,
because the architectures of the most common studio mixing desks tend
only to easily support two-channel (stereophonic) CASSs. The same can
certainly be said of the BEAST, Creatophone, and Cybernéphone
systems.

The advantages of the matrix mixing architecture as a solution to this
problem have been described previously. The matrix itself can be
effectively implemented in hardware or software but, for ergonomic
reasons, it is important that software-based control of the matrix be
made available. Hardware mix matrices (such as the Richmond
AudioBox, cited in section 4.14) are advantageous insofar as they do
not depend on the processing power of a host computer, and potentially
offer increased reliability. However, once built, the specification
(number of inputs and outputs, nature of DSP performed, et cetera) of a
hardware matrix cannot be modified as easily as a software
implementation. With computer processing power increasingly less of
an issue in recent times, this would appear to be a strong argument in
favour of host-based software-emulated matrices, from the perspective
of continuously improving developments in functionality at least.

In practical terms, however, there are always constraints. Paradoxically,
in its hardware incarnations the matrix mixer is almost certainly among
the least ergonomic control interfaces for the live diffusion of sound,
primarily because of the number of parameters (individual controls) the
user has to directly interact with. Unsurprisingly, not one of the systems
reviewed features a hardware matrix mixer that must be operated in this
way and, as far as the author is aware, no such system is in regular use
for the live diffusion of electroacoustic music. It is for this reason that
the mix matrix paradigm is best controlled indirectly, via software
abstraction.
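
As a concrete illustration of the paradigm, a software mix matrix
reduces to a two-dimensional array of gains applied to each audio
block. The sketch below is minimal and illustrative; its names are not
taken from any existing system.

import numpy as np

class MixMatrix:
    def __init__(self, num_inputs: int, num_outputs: int):
        # gains[i, o] is the level at which input i feeds output o
        self.gains = np.zeros((num_inputs, num_outputs))

    def set_gain(self, input_ch: int, output_ch: int, gain: float) -> None:
        # Any input can be routed to any output at any level
        self.gains[input_ch, output_ch] = gain

    def process(self, block: np.ndarray) -> np.ndarray:
        # block: (num_samples, num_inputs) -> (num_samples, num_outputs)
        return block @ self.gains

# Usage: route a stereophonic source to outputs 4 and 5 at full level
matrix = MixMatrix(num_inputs=24, num_outputs=24)
matrix.set_gain(0, 4, 1.0)
matrix.set_gain(1, 5, 1.0)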

6.2.2. Software-Based Systems

One of the greatest advantages of software-based diffusion systems is
that they easily afford the possibility of highly abstracted control
paradigms, as simply exemplified in Figure 47. Using the basic example
of four loudspeakers, if the system is based on direct control of output
parameters, then one control will be needed for the output level of each
loudspeaker. In an abstracted control system, the output level
parameters are calculated algorithmically from input (user) parameters
in any of an infinite number of possible ways. This is in contrast with
strictly hardware-based systems, which are far more likely to necessitate
direct one-to-one control over output parameters.

Figure 47. Simple diagrammatic example showing the difference between
direct control of parameters versus abstracted control: under direct
control, input parameters equal output parameters; under abstracted
control, output parameter values are determined algorithmically from
the input parameters.
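
To make the distinction in Figure 47 concrete, the sketch below expands
a single user parameter into four loudspeaker output levels
algorithmically; the linear front/back crossfade shown is merely one of
the infinite number of possible mappings mentioned above.

def abstracted_levels(position: float) -> list:
    # position: 0.0 (front pair only) to 1.0 (rear pair only)
    front = 1.0 - position
    rear = position
    # Output levels: [front-left, front-right, rear-left, rear-right]
    return [front, front, rear, rear]

# Direct control would instead require four separate user parameters
print(abstracted_levels(0.25))  # [0.75, 0.75, 0.25, 0.25]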

Software systems are also advantageous in that developments and
improvements can be implemented fairly quickly and economically in
comparison with hardware systems, and can be installed on any
computer (licensing issues permitting) for increased portability.

6.2.3. Flexible and Accessible Operational Paradigms

Perhaps the most important factor to consider in the present context is
the fact that diffusion has traditionally been – and in most cases
continues to be – executed with mixing-desk-style faders. Thus, the
fader is generally regarded as an approachable paradigm in the context
of sound diffusion and, as observed by Harrison in section 4.3.5, many
sound projectionists are highly experienced and skilful in its use. This
cultural familiarity is extremely beneficial because, on an individual
basis, it augments the range and quality of diffusion actions
that can be realised and, on a more general level, it allows for the raising
of performance standards by way of teaching and so on.

As previously discussed, however, the fader – particularly when
implemented in conjunction with an overly restrictive mixing and
routing architecture – can be fairly non-ergonomic, no matter how
familiar it is. Several of the systems evaluated – most notably the
Creatophone and BEAST experimental system – indicate an interest in
developing and improving this paradigm; however, as noted
previously, such explorations are fairly limited in scope. It can also be
seen that those systems incorporating non-fader-based control interfaces
nonetheless adhere to paradigms that are ostensibly ‘familiar’ within a
musical and/or electroacoustic context. The ACAT Dome and DM-8,
for example, incorporate standard keyboard and mouse interfaces that
will nowadays be absolutely familiar to every composer of
electroacoustic music; the piano-keyboard-style interface of the
Kinephone (although in certain respects more specific) is familiar in an
even broader musical context.

The use of familiar interfaces in sound diffusion has tangible benefits
with respect to this criterion and, it is proposed, such practice should be
continued to a certain extent in the development of new systems.
However, it is also proposed that more consideration needs to be given
to the appropriateness of such interfaces for the specific task of sound
diffusion. Familiarity in one context does not guarantee familiarity in
another.

New systems that enhance creative scope and improve interface
ergonomics, while at the same time supplementing existing diffusion
paradigms rather than completely replacing them, are required. In this
way, a transition towards more flexible interfaces could be made
progressively, without alienating practitioners used to well-established
paradigms. Some more specific suggestions will be given in section
6.3.4.

6.2.4. Client/Server Architecture

There is much to recommend the development of further client-and-server
based systems. This architecture has a number of considerable
benefits, not least including the fact that new client applications could
be devised without having to re-design the underlying mix engine
(whether this be hardware or software). This would be a time-saver for
software developers, but also useful in that various client options could
be made available to performers, and potentially opened up to third-
party development. Specific suggestions, in this respect, will be made in
section 6.3.

6.2.5. Scalable Systems

The benefits of a compact and portable system, whose individual
components are well integrated, were outlined in section 4.3.7. This
raises important questions with regard to the degree of portability that
can realistically be achieved in a sound diffusion system. Certainly the
number of loudspeakers required for even a relatively modest array
would seem to conspire against this objective. It therefore goes without
saying that a sound diffusion system is very unlikely to achieve the
same degree of portability as, say, a laptop computer! Working within
the bounds of what is reasonable, however, it is clear that certain
optimisations could be made with respect to this criterion that do not
seem to have been fully addressed in all of the systems described
previously.

A system that is highly scalable is to be recommended, partly because
this allows for easy adaptation to a number of different performance
scenarios (ranging from full-scale performances with large loudspeaker
arrays to smaller performances in intimate venues) and also because it
potentially allows performers to practise their art outside the
performance situation using the correct ‘instrument.’ This latter point
would seem, intuitively, to be an absolute prerequisite for any musical
performance practice, but – as observed previously – is inadequately
catered for in the case of sound diffusion. This is largely owing to
potentially impractical technical demands, large loudspeaker arrays
being a good example, and some specific suggestions as to how this
problem can be addressed (learning from existing diffusion systems)
will be proposed in section 6.3.2.

6.2.6. Accessible System Prerequisites

If the skills required to operate non-familiar diffusion paradigms are to
be assimilated by practitioners, then this requires a relatively high
degree of access to the systems in question. This is doubly true given
that the rehearsal time available in concert situations is almost
invariably severely limited. On this basis a case can be made for new
systems that incorporate standard or otherwise easily available and well-
supported components, and perhaps systems that can be built to a
variety of specifications without affecting the overall functionality.

6.2.7. Modular Systems

Several of the attributes described above indicate that a modular system
is needed. The concept of modularity can apply to both software and
hardware systems, and is based around the idea of reducing a system
into its component parts in terms of their discrete functionalities.

In abstract terms, a non-modular system effectively comprises a single
entity or ‘black box.’ The internal functionalities of that system are
therefore closed, and input data (including audio streams and control
data, in the case of sound diffusion) are manipulated into output data
within a single unitary process. This is illustrated in Figure 48(a),
below.

A first-order modular system addresses that unitary process and breaks
it down into several constituent processes. In Figure 48(b), we
effectively have three ‘black boxes’ instead of one. The overall
functionality is the same but, provided the appropriate communication
protocols between modules are observed, modules A, B, and C can
be changed. An insert channel on a hardware mixing desk is a good
example: the function of the insert can change (e.g. it could be a reverb
or a compressor) but the abstract structure of input-process-output
remains the same; the communication protocol is, in this example,
transitory-analogue encoded audio streams provided via balanced
quarter-inch jack sockets.

A second-order modular system provides a further level of abstraction
(an ‘abstraction layer’) in between each module of the system, as
illustrated in Figure 48(c). Basically, an abstraction layer can be
regarded as a ‘converter’ between communication protocols. Say
process A outputs digital audio data in 16-bit 44.1 kHz format, and
process B is designed to receive input in 32-bit 48 kHz format. Under
scenario (b), the two modules would be incompatible. In scenario (c),
however, the abstraction layer between processes A and B could be used
to convert between the two formats and therefore establish correct
communication between the modules. This is, basically, how
Steinberg’s ASIO abstraction layer allows audio interfaces of differing
specification to communicate with software.

Figure 48. Diagrammatic models of: (a) non-modular system architecture
(input, a single process, output); (b) modular system architecture
(input, processes A, B, and C, output); (c) modular system architecture
incorporating abstraction layers between the processes.

Scenarios (b) and – even more so – (c) are advantageous in that we have
access to data at various transitory stages between (overall) ‘input’ and
‘output.’ This means that various interventions can be performed before
passing data on to the next module. (Comparison with 1.6.1 on page 11
will confirm that this is a very appropriate analogy for the
compositional practice of electroacoustic music in general). Modular
architectures (the client-server paradigm itself being a good example)
therefore have many advantages, most notably that they are potentially
hardware independent, that is, the ‘same’ system can be built with a
wide variety of different hardware components. The same applies to
modular software designs. Some argue that modular designs can
complicate and prolong the design process in certain respects, but it is
proposed that the advantages with respect to the practice of sound
diffusion amply compensate for this.
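
The sample-format example above can be sketched as follows; the crude
linear-interpolation resampler is for illustration only, standing in
for whatever conversion a real abstraction layer (such as ASIO)
performs between modules.

import numpy as np

def abstraction_layer(block: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    # Convert sample rate (e.g. 44.1 kHz to 48 kHz) so that module B
    # receives audio in the format it was designed to expect.
    src_times = np.arange(len(block)) / src_rate
    dst_times = np.arange(int(len(block) * dst_rate / src_rate)) / dst_rate
    return np.interp(dst_times, src_times, block)

# Module A emits one second of 44.1 kHz audio; module B expects 48 kHz
a_out = np.random.uniform(-1.0, 1.0, 44100)
b_in = abstraction_layer(a_out, src_rate=44100, dst_rate=48000)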

Overall, it is concluded that due consideration of the criteria summarised in
section 4.3 in the design process should result in sound diffusion systems
that are more broadly appropriate for the task in hand. Sound diffusion is a
unique creative practice with its own particular demands, and flexible
systems designed and built specifically for the purpose are required. It
should also be noted that a sound diffusion system consists of all four of the
components identified in Figure 20 (page 168). The flexibility of a system,
as observed, can be fundamentally compromised by poor design
considerations in any of the four main component areas, and therefore a
design approach that applies the evaluation criteria to each individual
component, in turn, is to be recommended.

6.3. Some more Specific Possibilities

The following sections will describe some more specific functionalities that
will be useful in the development of diffusion systems that are more flexible
in terms of the criteria given. Some of these will take the form of potential
developments to the M2 system (simply because this is a logical perspective
for the author to adopt), while others would require a full re-design in order
to be implemented, but these suggestions are broadly intended to be useful
for the design of any system.

6.3.1. Increased Functionality via CASS and CLS Concepts

It is suggested that a practical implementation of the CASS and CLS
concepts (outlined in Chapter 3) would be useful. This would allow
loudspeakers within the array to be specifically grouped by the system
in a manner that reflects their prospective use for the diffusion of a
particular electroacoustic work. Multiple group membership for
individual loudspeakers is possible (see section 3.8.5), meaning that a
loudspeaker array can potentially contain many more coherent sets than
there are individual loudspeakers. Of course, the CLSs defined within
the array are very likely to be determined by the number of channels in
the CASS (or CASSs) for that piece (see sections 3.6.5 and 3.8.3).
Therefore, a means allowing the user to make this information known
(i.e. how audio input channels are to be grouped) to the client
application would also be necessary.

The CASS/CLS paradigm has many potentially useful applications,
ranging from simple control over multiple signals with one fader (this is
effectively already possible in the M2 and some other systems), to more
creative and experimental applications. Consider, for example, the
possibility of a ‘point source panner’ designed specifically for panning
CASSs to CLSs. In a ‘normal’ point source panner (such as the one
included in Steinberg’s Nuendo for the purposes of 5.1 panning247) one
would expect that the ‘position’ of the audio source relative to each
loudspeaker in the array should determine the relative amplitude of that
signal to each loudspeaker. In principle, a CASS-to-CLS panner is the
same, only there are restrictions in terms of which audio source (CASS)
channels can be sent to which loudspeakers. Specifically, a CASS
channel can only be sent to a loudspeaker if that particular input-to-
output routing is appropriate (whether or not this is the case could be, to
a greater or lesser extent, user defined). As an example, consider the
scenario illustrated in Figure 49 (below). Here, we have a ‘main eight’
247. Steinberg (2005). "Steinberg". Website available at:
http://www.steinberg.de/Steinberg/defaultb0e4.html.

array; we will assume that we are dealing with a two-channel source
(i.e. one stereophonic CASS consisting of ‘left’ and ‘right’) and have
therefore decided to treat the loudspeaker array as four stereophonic
CLSs. If we have a user interface control that allows us to define the
‘position’ of the CASS along an imaginary axis running from the front
to the back of the venue, then that CASS can be dynamically ‘cross
faded’ between the four CLSs with a single control. This paradigm
could be described as a ‘multi-point cross-fader’: with the fader at
‘maximum’ position, the CASS emanates from the ‘distant’ CLS; with
the fader at ‘minimum’ position, the CASS emanates from the ‘rear’
CLS; at intermediate points, interpolation between CLSs occurs. The
control would not necessarily have to operate on a simple linear basis,
but could have custom interpolation functions applied if necessary.

Figure 49. Diagram illustrating the operation of a ‘multi-point
cross-fader’ as a means to dynamically diffusing a CASS to multiple
CLSs: as a single fader or other control moves from its maximum to its
minimum position, the CASS is diffused successively to the Distant,
Main, Wide, and Rear CLSs.
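
A minimal sketch of how such a multi-point cross-fader might compute
its CLS gains is given below, assuming four CLSs ordered rear, wide,
main, distant along the fader's travel and simple linear interpolation
(a custom interpolation function could be substituted).

def multipoint_crossfade(fader: float, num_sets: int = 4) -> list:
    # fader: 0.0 (rear CLS) to 1.0 (distant CLS); returns one gain per CLS
    gains = [0.0] * num_sets
    scaled = fader * (num_sets - 1)         # position along the chain of CLSs
    lower = min(int(scaled), num_sets - 2)  # index of the nearer CLS
    frac = scaled - lower                   # progress towards the next CLS
    gains[lower] = 1.0 - frac
    gains[lower + 1] = frac
    return gains

# Fader halfway: the CASS sits between the 'wide' and 'main' CLSs
print(multipoint_crossfade(0.5))  # [0.0, 0.5, 0.5, 0.0]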

Of course, the scenario described above would not be sufficient on its
own, but rather could form part of a repertoire of useful control
paradigms, or diffusion ‘actions.’ Overall, a practical implementation of
CASS and CLS concepts would allow performers to deal with their
audio source channels in a more coordinated and efficient way (rather
than individually) and would also represent a step towards establishing
interface/loudspeaker-array independence in diffusion systems.

6.3.2. Interface/Loudspeaker-Array Independence

Any diffusion system that operationally depends on our knowing the
exact nature of the loudspeaker configuration to be used will present
difficulties in the use of smaller arrays for practice purposes. This is a
tricky problem to address, but much can be learned, in abstract terms,
from the Ambisonic system (described in section 4.13), which – given
identical input data – is capable of producing (ostensibly) the same
spatio-auditory results over a variety of different loudspeaker arrays.
Although Ambisonics is of questionable widespread suitability for the
purposes of sound diffusion, conceptually, this particular attribute
would appear to be very useful. The DM-8 system (section 4.14) also
offers this functionality.

One problem with both of these systems, however, is that they impose
specific restrictions with regard to the shape of the loudspeaker array.
Ambisonics, for example, requires loudspeakers to be arranged in
opposite pairs on the surface of an imaginary sphere, and focused on the
central point. Similarly, the DM-8 will allow for the automated
performance of pre-planned diffusions to be realised via a variety of
loudspeaker arrays, provided that they are arranged in a circular
formation and, again, equidistant from the central point. It is true to say
that, in both cases, the circular/spherical formation is primarily a
function of mathematical convenience, and is also due to the fact that
both systems are designed with a degree of ‘precision’ in mind (in this
respect, they are both bottom-up systems). There is, of course, no reason
why such capabilities should not be incorporated into future systems,
but it is argued that any system offering only this kind of functionality
would only be of use to a fraction of the electroacoustic community.

Such an approach could be supplemented by a parallel system in which
loudspeakers are arranged graphically in the client application,
reflecting their actual deployment within the venue. Loudspeakers
would then be grouped into coherent sets (as suggested in the previous
section) according to the work in question, and then further defined in
terms of ‘higher level’ criteria. A system of common categories, some
of them arranged hierarchically, would be devised. Further research
would be needed for this, but as a suggestion, categories relating to the
‘higher level’ positioning of loudspeakers (attributes other than position
might also be considered) could include ‘mains,’ ‘wides,’ and so on, to
use BEAST terminology. A sub-category of ‘wide’ might be ‘extra-
wide,’ for instance. User-defined categories might also be a useful
possibility. In this way, the diffusion client would ‘know’ about the
positions, groupings, and prospective uses of the different loudspeakers
in the array,248 and this information could be used to ‘map’ between
different loudspeaker arrays according to the nearest appropriate match.
Clearly, this would only be possible using a software-based system. A
hypothetical example is given in Figure 50. Loudspeaker arrays (a) and
(b) are clearly related – either (a) is regarded as a ‘scaled-down’ version
of (b), or (b) is regarded as an ‘extended version’ of (a) – but this
conclusion is most easily reached on the basis of perceptual evaluation.
Loudspeakers are all arranged into stereophonic CLSs. CLSs that can be
regarded as categorically ‘similar’ between the two arrays are colour
coded accordingly. Of course, this could be done in a number of
different ways according to the requirements of the piece and the

248. In most systems – including the M2 – the diffusion client or equivalent does not ‘know’
anything (or ‘care,’ if this is a better personification) about the loudspeakers, and refers to
them only in terms of the audio output to which they are connected. This direct approach is
a ‘bottle-neck’ in the present context, resulting in the array dependence that we are
presently trying to evade. Note that the Ambisonic and DM-8 systems circumvent this by
applying a further level of abstraction. The presently suggested approach is essentially
similar but less geared towards ‘mathematical precision’ and more towards ‘perceived
results.’ In this respect it is presented as potentially useful to top-down practitioners in
particular, and should probably be implemented alongside more bottom-up techniques.

preferences of the performer. The groupings might be substantially
different for a 5.1 CASS for instance. If such judgements are to be
useful in a software context, mechanisms must be in place for the
software (which, under present technological constraints, only
understands objective parameters) to have access to this information,
and clearly this must be provided by the user. In short, it would be
useful to be able to refer to loudspeakers – and groups of loudspeakers –
without referring directly to the audio interface outputs to which they
are connected. It is proposed that such an approach could potentially be
developed into a solution to the problem of scalable loudspeaker arrays
that is more useful to top-down practitioners in particular.

Figure 50. Two loudspeaker arrays that can be recognised
(perceptually) as related. Array (a) comprises stereophonic CLSs
labelled Distant, Main, Wide, Side, and Rear; the larger array (b) adds
Extra Distant, Extra-Wide, Close, Side Front, Side Rear, and Rear
Distant, with categorically similar CLSs colour coded across the two
arrays.
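
One illustrative realisation of the ‘nearest appropriate match’ idea is
sketched below; the category labels and the fallback table are
hypothetical and would, as suggested above, be user defined.

# Hypothetical fallbacks: each category names the next-best category to
# try when the target array does not contain an exact match.
FALLBACKS = {"extra distant": "distant", "extra-wide": "wide",
             "close": "main", "side front": "side", "side rear": "side",
             "rear distant": "rear"}

def map_cls(category, target_array):
    # Exact match first, then follow the fallback chain until one of the
    # target array's categories (or nothing) is reached
    while category is not None and category not in target_array:
        category = FALLBACKS.get(category)
    return category

array_a = {"distant", "main", "wide", "side", "rear"}
print(map_cls("extra-wide", array_a))  # 'wide': the nearest appropriate match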

6.3.3. Diffusion ‘Actions’

Implementation of items 6.3.1 and 6.3.2 in particular would be a step
towards establishing platforms within which performers could diffuse
works more directly in terms of the specific diffusion ‘actions’
involved. Present systems mainly afford the user direct control over
loudspeaker amplitudes, or else slightly more abstract control over the
‘position’ of sound sources (recalling the ACAT Dome and DM-8
examples). However, these are not always obviously amenable to the
kinds of actions that practitioners wish to perform, which are often more
easily associated with ‘higher level’ descriptors. The ‘effects’ provided
by the SuperDiffuse client application (see section 5.4.2.3) represent
initial steps towards remedying this, and this could be developed more
thoroughly. A set of highly configurable and intuitively controllable
(see section 6.3.4) diffusion actions based loosely around Smalley’s
taxonomies (‘divergence,’ ‘convergence,’ ‘contraction,’ ‘parabola,’ et
cetera)249 would surely be of interest to top-down practitioners in
particular. For the bottom-up practitioner, more objectively satisfactory
actions would be desirable, perhaps with more of a focus on quantifiable
processes, and with accurately controllable parameters. Both
approaches, however, could obviously be incorporated into the same
overall software framework. Of course, diffusion actions could be taken
to include extended DSP functionality in addition to simple amplitude
manipulation.
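
By way of example only, one such action might be rendered along the
following lines; this is a loose, illustrative reading of ‘divergence’
(a sound spreading outwards from a single CLS to the whole array), not
a definitive interpretation of Smalley's taxonomy.

def divergence(amount: float, num_sets: int) -> list:
    # amount: 0.0 (the 'main' CLS only) to 1.0 (all CLSs at equal level)
    gains = [amount] * num_sets
    gains[0] = 1.0  # CLS 0 is taken, arbitrarily, to be the 'main' pair
    return gains

print(divergence(0.5, 4))  # [1.0, 0.5, 0.5, 0.5]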

6.3.4. Support for Multiple Physical Control Paradigms

Control over the output level of a single loudspeaker can easily and
ergonomically be achieved with a single fader. Simultaneous control
over the levels of a group of loudspeakers can also be ergonomically
achieved with a single fader, so long as real-time control over the
relative amplitudes is not required. This method is effective because the
control paradigm is appropriate to the action being performed. When
diffusion actions become more abstract than this (and the previous three
sections have largely been devoted towards suggesting that this should
at least be an option), it is certainly time to begin questioning the
appropriateness of the fader as a control paradigm. For this reason, an

249. Smalley (1986). "Spectro-Morphology and Structuring Processes". In Emmerson [Ed.]
The Language of Electroacoustic Music: 74.

exploration of alternative paradigms in addition to familiar ones is
suggested. These might include joysticks, foot-pedals, buttons, switches,
track-balls, and any number of other input devices. The idea is that the
performer, or performers, can choose how to map these controls onto
the diffusion actions they wish to perform. Ideally, an architecture that
abstracts the specific control method from the means of communication
of control data to the diffusion client is required. Technically the Ircam
AtoMIC is already used by the M2 system for this purpose, but this is
neither the most elegant, nor economic, nor user-friendly solution. What
might be better is a control surface designed specifically with a modular
approach to physical control paradigms in mind. A device in which
‘control modules’ are interchangeable might be desirable, and a sketch
is given in Figure 51, below. Internally, the current M2 control surface
is modular in architecture (see Appendix 2) and could therefore be
adapted to suit relatively easily.

Figure 51. Sketch of a fully modular Control Surface design with
interchangeable control units. Modules shown are (left-to-right, top
row first): vertical faders, horizontal faders, joysticks, momentary
switches, switches, and a dummy panel for use when no module is
installed.

6.3.5. Extended DSP Functionality

Many diffusion systems – including the M2 – only really allow us to
perform amplitude manipulation on the signals. In some cases this may
be sufficient, but there are situations in which extended functionality
may be necessary or desirable. The Cybernéphone (section 4.11), for
example, affords performers access to time-delay, filtering (EQ), phase
adjustment and pitch shifting functionalities. With the possible
exception of the latter, all of these would be of definite use to bottom-up
practitioners as means to ‘correcting’ less-than-ideal loudspeaker arrays.
Applications for top-down performers would probably adopt a more
creative guise. This gives an indication of some of the factors that must
be considered in the design and implementation of high-level diffusion
actions incorporating extended DSP functionality.

6.3.6. External and/or Automated Control via Third-Party Software Development

An advantage of modular software-based diffusion technologies rests in
the fact that the appropriate interfacing methods and communication
protocols can be released to third-party developers (or directly to
practitioners) in the form of software development kits (SDKs). The
benefits of this approach are well-proven in the commercial sector.
Steinberg’s VST (Virtual Studio Technology) and ASIO (Audio
Streaming Input and Output) software development kits – both freely
available to third-party developers on request – have resulted in the
development of a wide variety of compatible applications (some of them
taking the shape of plugins), many of which are freely downloadable.

A software development kit for the purposes of sound diffusion could
take any of several possible forms. Using the client-server model as an
example, it would be possible to allow third-party software to
communicate directly with the server (in the case of the M2, this would
allow direct control over the mix matrix, effectively allowing
developers to write alternative client software), or with the client (third
parties would essentially be developing a ‘front end for the front end,’
and would be able to automate the client software for instance). Perhaps
both approaches could be adopted.
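
The following minimal sketch suggests what one such ‘point of entry’ might look like: a third-party tool setting a single mix-matrix coefficient on the server over TCP. The line-based protocol, host and port are invented for illustration; the M2’s actual protocol is not reproduced here.

import socket

def set_matrix_gain(host: str, port: int,
                    input_ch: int, output_ch: int, gain: float) -> None:
    """Send one 'set gain' command, e.g. 'GAIN 3 12 0.500'."""
    message = f"GAIN {input_ch} {output_ch} {gain:.3f}\n"
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(message.encode("ascii"))

# A third-party tool could then automate a slow crossfade, for example:
# for step in range(101):
#     set_matrix_gain("127.0.0.1", 9000, 0, 4, step / 100.0)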

One approach that can already be realised with the current M2 system is
the provision of MIDI control data from a source other than the control
surface. The use of Max/MSP for this purpose would open up an
entirely different range of possibilities for algorithmic control of the
diffusion (as opposed to real-time control via a physical interface), and
would provide a framework more suitable perhaps for bottom-up
practitioners. Another approach still – suggested in a personal email to
the author from Charlie Richmond, Managing Director of Richmond
Sound Design – would be the use of the SuperDiffuse client application
in conjunction with the Richmond AudioBox hardware matrix mixer.
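
As a minimal illustration of such algorithmic control – written here in Python with the third-party mido MIDI library, although Max/MSP would serve the same role – a slow sine function can be streamed as controller data, automating a diffusion parameter with no physical interface involved:

import math
import time
import mido

port = mido.open_output()    # default system MIDI output
for step in range(200):
    level = 0.5 + 0.5 * math.sin(step / 20.0)    # slow cycle, 0.0-1.0
    port.send(mido.Message('control_change', channel=0,
                           control=7, value=int(level * 127)))
    time.sleep(0.05)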

The possibilities are numerous, because any system that exhibits
second-order modularity (see Figure 48(c)), will – by its very design –
provide several potential ‘points of entry’ for external intervention, any
or all of which could be exploited by making the relevant methods and
protocols known to third parties.

6.3.7. Diffusion System Software Plugins

A system for high-level diffusion ‘actions’ (whether based around
amplitude-only diffusion, extended DSP, or a combination) could be
implemented as plugins. Much like VST plugins, this would allow third-
party developers to define their own routines within a standardised
framework. This would clearly be advantageous because practitioners
would be able to devise routines that are appropriate for their own
particular purposes, and in doing so, extend the overall repertoire of
diffusion actions on a more general level.
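
The following minimal Python sketch – in which all names are hypothetical – indicates how such a framework might be standardised: the host defines an abstract diffusion action, and third-party routines implement it.

from abc import ABC, abstractmethod
from typing import List

class DiffusionAction(ABC):
    """Base class a host would load, much as a DAW loads a VST plugin."""

    @abstractmethod
    def gains(self, position: float, num_speakers: int) -> List[float]:
        """Map one control value (0.0-1.0) to per-loudspeaker gains."""

class LinearSweep(DiffusionAction):
    """Third-party routine: sweep a source across the loudspeaker set."""

    def gains(self, position: float, num_speakers: int) -> List[float]:
        # Pan between the two loudspeakers adjacent to 'position'.
        idx = position * (num_speakers - 1)
        lower = int(idx)
        frac = idx - lower
        out = [0.0] * num_speakers
        out[lower] = 1.0 - frac
        if lower + 1 < num_speakers:
            out[lower + 1] = frac
        return out

# Host side: load the routine and apply its gains to one matrix row.
action = LinearSweep()
row = action.gains(0.25, 8)    # [0.0, 0.25, 0.75, 0.0, ...]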

6.3.8. Logistical Functionalities

As described in previous chapters, the task of staging public
performances of electroacoustic music can be a substantial undertaking.
Research has indicated a few mechanisms that could be put in place to
make the task quicker, easier, or otherwise more reliable.

In section 5.6.6 it was noted that contingencies to deal with the event of
software failure are not as thoroughly in place in the M2 system as
would be ideal. Where control of signals sent to loudspeakers is
governed entirely by software, this is always likely to be a problem
unless some kind of in-line attenuation system is provided in hardware.
This would purely be an emergency measure, with no diffusion function
as such, and could therefore be highly ergonomic. No individual
attenuation of loudspeaker levels would be required, and so a design
incorporating a single attenuator would be ideal.

An in-built test signal facility would also be beneficial. In conjunction
with abstracted signal routings this would allow loudspeakers to be
connected to audio interface outputs – literally – arbitrarily, and the
relevant mappings made in software, via test signal calibration.
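
A minimal sketch of such a calibration routine might read as follows, assuming a hypothetical interface in which a test tone is played at each output in turn and the operator identifies the loudspeaker heard:

from typing import Callable, Dict

def calibrate(num_outputs: int,
              play_test_tone: Callable[[int], None],
              ask_which_speaker: Callable[[], int]) -> Dict[int, int]:
    """Return {logical_speaker: physical_output}, discovered by test tones."""
    mapping: Dict[int, int] = {}
    for output in range(num_outputs):
        play_test_tone(output)          # e.g. a short burst of pink noise
        speaker = ask_which_speaker()   # operator names the speaker heard
        mapping[speaker] = output
    return mapping

Thereafter the software addresses ‘loudspeaker 3’, say, and the mapping resolves it to whichever physical output it happens to be plugged into.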

Also in relation to loudspeakers, it has previously been noted that
setting these up frequently represents the majority of the setup time
overhead in the staging of electroacoustic music concerts. This is often
because venues provide relatively few mains outlets, which, in addition,
are not always conveniently located. An efficient power distribution
system to deal with this would certainly be extremely useful. Because
active loudspeakers invariably need to be provided with mains power
and signals, there is perhaps also a case for attempting to integrate these
two distribution systems in some way, although care would obviously
have to be taken to avoid unwanted electrical interference.

6.3.9. Increased Accessibility via Online Resources

If a system partially based around software components is to be
implemented, it is very important that these are made readily available,
either commercially or freely, and are well supported in terms of
documentation. The internet would seem to be an extremely useful
mediator in this respect. Furthermore, if TCP/IP is to be implemented as
a means of communication between components in future systems (as it
is in the M2), this also means that network facilities would be natively
supported by the system, offering the potential for direct software
and/or firmware updates and so on. Networking technology – for both
creative and purely functional purposes – is becoming increasingly
common in electroacoustic music,250 and this would appear to be an
argument in favour of its inclusion in future diffusion systems.

6.4. Areas for Further Research: an Inclusive Approach…

General architectures and specific suggestions aside, this research has
additionally indicated various areas in which further research would be
beneficial. It seems apparent that the global development of sound diffusion
systems is still very much engaged in a research process that is somewhat
specific to the particular concerns of the researchers in question. If the
approach advocated by this thesis is to be widely implemented, then further
research is needed. Research into the ways in which specific top-down and
bottom-up techniques can be accommodated into the design of future
diffusion systems – in addition to an exploration of how these needs can be
accommodated by control interfaces – is particularly important. Overall, this
thesis proposes an inclusive approach to the research process, which can be
described in terms of the following aspects:

250 Nicholas Melia and Matt Rogalsky’s Treatise on the Economy and Containment of Despair (which was performed at the Sonic Arts Network’s Expo966 conference in 2005) is described in programme notes as ‘a piece for guitar, violin, and networked electronics.’ Sounds encoded via microphones in real-time from the acoustic instruments are streamed across a network to a remote location (in this case the network connection was between Scarborough, UK and Kingston, Ontario, Canada) where they are processed and streamed back to the performance space for diffusion.

6.4.1. …in terms of Technology

In Chapter 1, an approach was made towards gaining a basic holistic
understanding of the scope of electroacoustic music in technological
terms. This was summarised in Table 2 (page 23), which comprises six
‘base unit’ technological profiles. A work of electroacoustic music
might make use of one or more of these profiles, in any combination,
possibly incorporating all of them. This will, obviously, have an impact
on the kinds of demands that will be exerted on a diffusion system if it
is to be used for the performance of the work in question. In adopting an
inclusive approach it is proposed that a diffusion system should be able
to cater for all of these eventualities in the context of live performance.

It could, of course, be argued that this is already the case, and to a
certain extent this is indeed true. Most diffusion systems are able, at
some level, to accommodate all of the technologies discussed. However,
it can be counter-argued that this is merely a fortunate by-product of the
fact that widely accepted standards exist for the communication and
storage of encoded audio streams. Studio mixing desks tend to have
analogue audio inputs: it is therefore possible to connect any device
capable of outputting transitorily encoded audio streams. Problems very
quickly become apparent, however, when works that vary in
technological profile are juxtaposed in concerts; such problems indicate
that the diffusion system is not as well-suited to the task in
hand as might be desirable. This is because most applications of mixing
desks do not exert these kinds of demands, and it is therefore not usually
necessary to cater for them in the design of technologies. A more
focussed approach, geared towards the specific requirements of
electroacoustic music and sound diffusion – bearing fully in mind that
electroacoustic music tends to involve the appropriation of technologies
in fairly unique and detailed ways – is therefore to be recommended.

6.4.2. …in terms of Aesthetics and the Aesthetic Affordances of Creative Frameworks

Top-down and bottom-up attitudes were introduced, and their salient
characteristics abstractly defined in Chapter 2. In this chapter it was also
noted that creative frameworks – and in the context of electroacoustic
music this includes both the means of composition and the means of
diffusion – can be ‘directionally biased,’ that is, owing to their very
nature, they can individually exert certain pressures towards either top-
down or bottom-up approaches to their use. Some frameworks are, of
course, broadly conducive to both approaches, encoded audio streams –
and, by extension, the electroacoustic idiom itself – being excellent
examples.

A diffusion system should, equally and without prejudice, be able to
accommodate both top-down and bottom-up approaches to sound
diffusion. This is a tricky issue to approach because it is inherently
paradoxical. The most effective means to composing works of a bottom-
up nature are likely to be those means that are intrinsically suited to that
particular way of working. Conversely, those frameworks that exert a
bias in favour of top-down approaches to their use are likely to be more
appropriate for the composition of top-down works. The very relevant
example of real sounds ‘versus’ synthesised sounds, as those
frameworks that respectively engendered the praxes of musique
concrète (top-down) and elektronische Musik (bottom-up), was given in
section 2.3. In the context of performing works, the same can be said of
sound diffusion systems: top-down works will be most effectively
diffused via a top-down framework, and bottom-up works are likely to
be more faithfully communicated via a bottom-up framework. Viewed
from the perspective of diffusion system design, what this seems to
imply is that any design features that are favourable to bottom-up
practitioners (those that will result in the better communication of their
works) will be correspondingly unfavourable to top-down practitioners,
and vice versa. However – and herein lies the paradox – it is very likely
that a diffusion system will have to accommodate both kinds of work.
Accordingly, the most commonly employed diffusion systems – the
generic templates described in section 4.6 – are relatively neutral in
terms of their directional bias, but as a consequence are far from ideal
from both top-down and bottom-up standpoints.

Two ways in which this frustrating and paradoxical problem could be
approached seem evident. Either ‘top-down sound diffusion systems’
and ‘bottom-up sound diffusion systems’ are to be regarded as two
separate and mutually exclusive frameworks, each with their own
unique characteristics and design considerations, or else attempts to
develop systems that are thoughtfully designed, highly flexible – and
therefore able to cater for both of these contrasting attitudes – are to be
made. This thesis opts for the latter choice because it engenders both
diversity and an inclusive attitude towards the electroacoustic idiom.
The former approach, however, is frequently evidenced and has been
observed, and criticised, by Landy:

One part of the hypothesis introduced above concerns the amount of
individual/small group effort going into development and scholarship (as
well as composition, in fact) in isolation. […] Individuals [stake] their
claim to an idea, an approach or some such often without adequate
contextualisation, but more importantly here without adequate or any
feedback or consistent correlation, using methodologies that are often self-
referential. Even if we accept that after fifty years we continue to be
participants in an experimental phase or process, the implications of
continued pockets of isolation are dissatisfying. Our […] enclosed, self-
supporting structure is as endangered as the opera companies and
orchestras of the world waiting for their arts […] subsidies to diminish if
not pass away. In other words, if the work of many musicians and […]
researchers remains marginalised, how can people expect all forms of
support to continue? The island mentality represents an archaic aspect of
academe which hopefully will evolve a little in the coming decades…251

Landy’s observations raise further important issues with regard to the
continued relevance of the electroacoustic idiom in an altogether
broader cultural context; we will return to this shortly. In terms of sound
diffusion (which is, of course, central to the appreciation of
electroacoustic music in a wider context anyway), systems providing
frameworks that are conducive to both top-down and bottom-up
approaches are needed, and in order to achieve this, further research into
the various directional biases of prospective techniques and paradigms
will be useful.

251 Landy (1999). "Reviewing the Musicology of Electroacoustic Music - A Plea for Greater Triangulation". Organised Sound, 4(1): 63, 68.

At present, it can be argued that practitioners can be so strongly biased
in favour of their own particular methods, techniques, systems, and so
on, that they fail (or even flatly refuse) to acknowledge the validity of
other approaches. This is clearly counter-productive in a broader
context. It is not by any means suggested that all practitioners should
adopt a single mutual and unanimous perspective, but rather that a wider
variety of approaches should be regarded as equally valid despite their
inevitable differences.

6.4.3. …in terms of Specific Methodologies and Techniques

In Chapter 3 it was observed that top-down and bottom-up attitudes tend
to foster markedly different approaches to the task of sound diffusion.
Again, this poses difficult challenges in the design of diffusion systems
and – once again – an inclusive approach to this problem is proposed.
More research into the specific techniques favoured by top-down and
bottom-up practitioners, and with respect to how these can be
accommodated into the design of future diffusion systems is needed,
and it would appear that a more holistic and inclusive ethos needs to be
adopted in order for this to happen. This proposal might seem very
similar to that made in the previous section. The distinction is this: in
the previous section, research with a view to ascertaining which creative
frameworks are top-down, and which are bottom-up, was advocated; in
the present section, a more in-depth assessment of the different
techniques (actions) carried out by top-down and bottom-up
practitioners during the process of sound diffusion is recommended.
These two distinct areas of research would appear to be reciprocally
beneficial. If we know, broadly, what top-down and bottom-up
practitioners ‘want to do’ (Table 4 on page 162 – it is hoped – will be a
good starting point) and which frameworks will be most appropriate for
the facilitation of these actions, then we will be in a position to
implement designs that cater both equally and more flexibly and
efficiently for both parties.

6.4.4. …in terms of Potential Applications Outside the Electroacoustic Idiom

Of course, there is no need for advanced sound diffusion systems to be
devoted exclusively to electroacoustic music. There are other situations
in which the live broadcast of sound via multiple loudspeakers may be
necessary or useful. Theatre applications are a definite possibility, and
these represent a distinct target market for the Richmond AudioBox.252
The director of the University of Sheffield Drama Studio has expressed
an interest in the M2 system, stating that it would provide tangible
benefits in the field of theatre sound.253 Installation in night-clubs and
other music venues might also be a possibility. If such applications are
to be considered, this will obviously need to be taken into account at the
design stage. Applications with commercial prospects (electroacoustic
music is, after all, a fairly specialised area) may be an effective means
to bringing much-needed development funds into the design of new
systems – as well as offering the potential to raise the profile of
electroacoustic music – and should therefore be embraced.

6.5. Summary

What these suggestions ultimately amount to is the endorsement of an
altogether more holistic approach to the task of sound diffusion that takes
into account all of the technologies involved (both at the present time and in
the foreseeable future), different attitudes towards electroacoustic music and
the appropriation of creative (technological) frameworks (which, for
simplicity, can broadly be characterised as top-down and bottom-up), and
differing approaches to sound diffusion in terms of the specific actions taken
(again, top-down and bottom-up). There have additionally been suggestions
made with a view to making the logistical task of staging electroacoustic
music concerts more efficient.

252 Harmonic Functions (2001). "The AudioBox Disk Playback Matrix Mixer". Website available at: http://www.hfi.com/dm16.htm.
253 David Moore, co-developer of the M2 system. Personal communication with the author.

In technological terms, a more abstract approach to the diffusion process
would be beneficial. Rather than conceptualising diffusion as a practice in
which ‘audio inputs’ are mixed to ‘audio outputs’ by moving faders, it
would be better to imagine the process as one in which ‘sources’ are
diffused to groups of loudspeakers via a repertoire of ‘actions.’ The CASS
and CLS will be useful concepts if this approach is to be adopted.
Alternative control interfaces should also be considered, bearing in mind
that each of these is a creative framework in itself, and can therefore have a
directional bias.

In aesthetic terms, we must consider the fact that top-down and bottom-up
attitudes exist within the electroacoustic idiom. Attempts to reconcile these
two opposing philosophies have thus far proved unsuccessful and therefore
– for the time being, at least – their respective methodologies must co-exist.
This is problematic because the two ideals are, in many respects,
antithetical. Nonetheless, through research into the directional biases of
creative frameworks, it will be possible to ascertain which means are
appropriate for which ends, and how these can be integrated within a system
such that top-down and bottom-up techniques are equally catered for.

In short, the design of improved sound diffusion systems assumes an
awareness of the kinds of actions performers wish to carry out, and an
understanding of the underlying musical beliefs that engender these
methodologies. We need to know what kind of musical communication is
sought, and what functionalities will be required to facilitate this. Once this
has been ascertained, attempts can be made to discover the best means to
carrying out those actions in an appropriate way. This thesis has attempted
to address these issues, and much ground has been covered, but no single
study can be comprehensive. A collaborative approach is therefore needed if
new sound diffusion systems are to be beneficial to both individual
practitioners and to the electroacoustic community as a whole.

Conclusion

Electroacoustic music is a diverse and multifaceted idiom that is
characterised by the appropriation of certain creative frameworks (see
section 1.9) in unique and specific ways. The creative frameworks
themselves were summarised in section 1.7; these include (but are not
necessarily limited to): audio encoding technologies; recording and
playback technologies; processing technologies; synthesis technologies; and
audio decoding technologies. Software technologies are often used to
emulate one or more of these, and can also be appropriated as creative
frameworks in their own right. All of these frameworks, in some capacity,
deal with encoded audio streams (section 1.6), which can therefore be
regarded as a partial defining characteristic of the electroacoustic idiom. The
loudspeaker (or, more abstractly speaking, the audio decoding technology)
can also be regarded as a defining characteristic because all of the other
frameworks rely on it as a means to realising the final auditory result. As far
as the performance of electroacoustic music is concerned, there are certain
‘instrumentations’ (combinations of creative frameworks) that are culturally
identifiable as characteristically ‘electroacoustic.’ These were summarised
in Table 2 (page 23), and sound diffusion systems should ideally
accommodate all the technological permutations.

Typically, the creative process in electroacoustic music focuses directly, and
as the primary object of artistic exploration, on one or more of the
affordances uniquely offered by the creative frameworks employed. This is
as opposed to using the frameworks as a functional means to realising
and/or communicating objectives whose primary creative drive has its
origins in the affordances of other frameworks. This differentiates
electroacoustic music from artistic praxes (such as ‘popular’ music, for
example) that engage the same frameworks in a secondary manner, and also
from purely functional applications of the same technologies, although in
the former case there is clearly the potential for ambiguity (certain popular
music traditions such as ‘intelligent dance music’ for example). While the
creative frameworks in general cannot be regarded as defining
characteristics of the electroacoustic idiom of themselves (they have too
many other applications), the combination of the frameworks and the unique
ways in which they are appropriated can be regarded as a defining
characteristic.

Within the scope of this definition, approaches to the composition and
performance of electroacoustic music can be regarded as either top-down or
bottom-up. These terminologies do not connote specific techniques but,
rather, broad philosophies with respect to the nature and ‘purpose’ of
electroacoustic music. Of course, specific techniques are invariably born out
of these underlying convictions, but the exact nature of these can vary
considerably. It is therefore helpful to focus on the general as opposed to the
specific: a summary of the opposing traits of top-down and bottom-up
philosophies is given in Table 4 (page 162). Historically, this binarism can
be observed in the praxes of musique concrète and elektronische Musik,
respectively, and is still strongly in evidence. These contrasting ideologies
fundamentally impact upon the nature of compositions (as described in
Chapter 2) and the approaches taken in their performance (as described in
Chapter 3).

Sound diffusion is the active process by which works of electroacoustic
music are performed to an audience. As such, this process necessarily
involves the use of loudspeakers to decode the encoded audio streams and
render the real auditory results. Because multiple encoded streams are very
often (in fact, almost invariably) used collectively to encode spatio-auditory
attributes (see section 1.6.2), this process can involve the spatialisation of
encoded material that already contains spatial information. It seems logical
that attempts should be made to preserve this spatial information during
diffusion, and it is primarily for this reason that the concepts of the coherent
audio source set (CASS) and coherent loudspeaker set (CLS) have been
proposed; these were explained in sections 3.5 and 3.7, respectively.

The essential nature and ultimate objectives of the process of sound
diffusion vary depending on aesthetic directionality, that is, ‘top-down
diffusion’ and ‘bottom-up diffusion’ can be regarded as two fairly distinct
performance practice conventions. Stemming from the fact that top-down
practice is, essentially, a perceptually motivated exercise, the ultimate
purpose of ‘top-down diffusion’ is to present the electroacoustic work in a
manner that is perceptually effective in the given performance context (this
can include the venue, audience, system, the piece itself, and so on). The
efficacy of the results is evaluated via a continuous process of subjective
judgement at the time of performance. For the bottom-up diffuser – who is
more objectively motivated – the aim is to faithfully communicate the
abstract concepts and structures embodied within the given electroacoustic
work. In this case, the efficacy of the results is ensured via careful
preparation of the performance context (which, again, can include the
venue, audience, system, the piece itself, and so on) prior to performance. In
terms of technique, top-down diffusers tend to favour pluriphonic
interpretations of the electroacoustic work, whereas bottom-up diffusers are
more inclined to prefer uniphonic presentations in controlled acoustic
environments.

For the top-down practitioner, the encoded audio stream (a defining
characteristic of the idiom) is an abstract representation of the perceptual
experience of a real auditory event. For the bottom-up practitioner, the
stream is a carrier of objective data that collectively express abstract micro-
and/or macro-structures. In both cases, it is a fundamental aim of sound
diffusion to present the CASS (which, of course, simply consists of one or
more encoded audio streams) in a manner that makes its ‘contents’ apparent
to the audience: in this respect, top-down and bottom-up practitioners differ
only in terms of what they consider those ‘contents’ to be (perceptual or
conceptual). It can therefore be said that maintaining the coherency of the
CASS (or CASSs) is a unanimous objective of the process of sound
diffusion.

Sound diffusion systems are the means by which the process of sound
diffusion is carried out, and are therefore also the means by which
electroacoustic music is performed. As such, diffusion systems should
ideally accommodate the full range of technological and aesthetic diversity
embodied within the electroacoustic idiom. This includes the purely
technological requirements, conditions determined by top-down and
bottom-up philosophies, consideration of the practical and logistical nature
of the process of sound diffusion itself and, of course, the unique
requirements of individual works and practitioners if these deviate from the
trends discussed. This is a complex and challenging set of demands indeed,
and in Chapter 4, a system of criteria was described, taking all of these
factors into account. These are as follows:

• Ability to Cater for the Technical Demands of Live Electroacoustic Music
• Ability to Cater for both Top-Down and Bottom-Up Approaches
to Sound Diffusion
• Flexibility of Signal Routing and Mixing Architecture
• Interface Ergonomics
• Accessibility with respect to Operational Paradigms
• Accessibility with respect to Technological System Prerequisites
and Support thereof
• Compactness / Portability and Integration of Components
• Reliability

All of the criteria are strongly interrelated, and were explained fully in
section 4.3.

On the basis of these criteria it can be stated that current sound diffusion
system technology – as described in Chapters 4 and 5 – does not fully cater
for the requirements of the electroacoustic idiom. In many cases this is
because technological restrictions are imposed, but it is equally often
because the systems are inherently biased in favour of either top-down or
bottom-up performance practice methodologies. The design of improved
systems is therefore necessary.

In designing new systems, reference to the set of criteria outlined in Chapter
4 will be helpful. Consideration of the relative advantages and
disadvantages of existing systems is also to be recommended. Modular
systems with a high level of abstraction between components are
advantageous because they offer technological flexibility and can be
appropriated in various different ways. Software-based systems are also
worth considering as these offer a high degree of dynamic configuration. At
present, the concepts of the CASS and the CLS are maintained mainly by
convention: when a diffuser raises two mixing desk faders at once, for
example, he or she is effectively taking measures to ensure that the
coherency of the stereo CASS is maintained. Logistically, the practice of
sound diffusion could be made much more efficient if new systems allowed
for the explicit specification of CASSs and CLSs by performers. In this
way, the process of sound diffusion could deal with logical groupings of
encoded audio streams – as is conventionally the case anyway – as opposed
to manipulating encoded audio streams individually. This paradigm would
also allow for the development of a performance practice that is more
oriented towards high level diffusion ‘actions’ (see section 6.3.3) and less
focussed on the individual mixing of audio inputs to audio outputs. This will
be particularly advantageous in implementations of the mix-matrix
architecture, which – as discussed in section 6.2.1 – is the most appropriate
paradigm for sound diffusion but, paradoxically, also the most difficult to
control.
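
By way of illustration, the following minimal Python sketch – in which the names and the simple one-to-one pairing convention are assumptions rather than a specification – shows how explicitly declared CASSs and CLSs would allow a single diffusion action to address a logical grouping within the mix matrix:

from dataclasses import dataclass
from typing import List

@dataclass
class CASS:                 # coherent audio source set
    name: str
    inputs: List[int]       # indices of the encoded audio streams

@dataclass
class CLS:                  # coherent loudspeaker set
    name: str
    outputs: List[int]      # indices of the loudspeaker feeds

def diffuse(matrix: List[List[float]], cass: CASS, cls: CLS,
            level: float) -> None:
    """Set every coefficient routing this CASS to this CLS in one action."""
    for i, inp in enumerate(cass.inputs):
        # Streams are paired with loudspeakers in order; a real system
        # would support richer mappings than this one-to-one convention.
        out = cls.outputs[i % len(cls.outputs)]
        matrix[inp][out] = level

# e.g. an 8-in, 16-out matrix; a stereo CASS raised onto a stereo CLS:
matrix = [[0.0] * 16 for _ in range(8)]
diffuse(matrix, CASS("stereo work", [0, 1]), CLS("main pair", [4, 5]), 0.8)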

Numerous further suggestions were given in Chapter 6; many, many
different approaches are possible – and it has not been the purpose of this
thesis to state which specific approach is ‘best’ – but one must always be
aware of the fact that sound diffusion systems are creative frameworks in
themselves. Because every creative framework can exert a directional bias
in favour of top-down or bottom-up approaches to its use, particular care
must be taken to ensure that new systems are acceptable to both parties.
Clearly, this will have an impact on all aspects of the design, but particular
attention should be paid to the operational capabilities of the system and the
nature of the control interface. Above all, the design of diffusion systems
should always afford careful consideration to the very broad technological
and aesthetic nature of the electroacoustic idiom itself.

Appendix 1: An Interview with Professor Jonty Harrison

This appendix presents a verbatim transcription of an interview conducted
by the author with Professor Jonty Harrison. The interview took place at the
University of Birmingham on 8th September 2004, and has been reasonably
often cited in the present thesis. At the time of writing, this interview has
not been published elsewhere.

Professor Harrison is among the best established practitioners of live sound
diffusion in Europe, if not world-wide, and is the founder of the
Birmingham Electro-Acoustic Sound Theatre (BEAST), whose current
diffusion system and recent experiments in the field of sound diffusion have
been documented in sections 4.8 and 4.9 respectively. The former can
effectively be regarded as having set the standard for sound diffusion
systems (certainly in the United Kingdom) and is – at the time of writing –
involved in a European tour. Harrison is, equally, an
internationally acclaimed composer of acousmatic works, including the
frequently performed and much-cited Klang (realised in 1982).

The present thesis owes a great deal to Harrison’s published writings on
electroacoustic music and sound diffusion, his writings on ‘organic’ and
‘architectonic’ structuring principles (see section 2.5.3) having been
particularly influential. With reference to Table 4 (page 162), it should be
reasonably obvious that Harrison’s ethos falls more obviously into the top-
down (or, to use his own terminology, ‘organic’) category than the bottom-
up.

Obviously, an enormous debt of gratitude is owed to Professor Harrison for
providing this interview, which has been much appreciated by the author. It
is hoped that the inclusion of this transcription will both clarify and
supplement some of the salient aspects discussed in the main body of the
thesis.

James Mooney: As one of the best established practitioners of the art, what
does ‘sound diffusion’ mean to you?

Jonty Harrison: Thank you, you’re very flattering! What it means to me, is
to do with not just space. That’s the first thing to say. I think one of the
problems is that when people talk about diffusion and spatialisation, it’s as
though it’s something being added to the music, that’s not already inherent
in the music. To me, whether spatial or any other thing that you apply to the
material coming from CD or tape, it should be in the spirit of the music. So
space, of course, comes into it, but it is not the only aspect. So, for example,
you might argue that one of the limitations of analogue tape – which is
where all of this started – was noise. The rather restrictive signal-to-noise
ratio of analogue tape without noise reduction is about 60 dB. So, the first
thing you need to do is make the quiet bits quieter and the loud bits louder,
because the composer will in those days have manually compressed the
signal anyway. So the quiet bits are quieter than the loud bits, but they’re
not necessarily as quiet, relative to the loud bits, as they should be, and
consequently you need to reinstate that. So that’s one thing: you need to
reinstate dynamic profiles. Now of course if you think about that, you’re
going to work off something that already has a relative amplitude curve
[draws hypothetical amplitude curve] from ‘quieter’ to ‘louder’. What you
are actually going to do is expand it; exaggerate it [draws the same
amplitude curve exaggerated].

JM: You mention this in your article in Organised Sound.254 To what extent
do you think that specific aspect is still necessary with the increased
dynamic range of digital recording media?

254 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 117-127.

JH: It is less necessary, clearly. The thing is, how many of us really use
even the theoretical maximum of sixteen bits, which is 96 dB? Probably not
a lot. And a lot of composers I know apply compression anyway. I did a talk
recently where I used lots of CD examples, snippets. I was editing them
together and looking at them in the editor thinking, ‘Why are Normandeau’s
levels so much hotter than mine all the time?’ The answer is, he’s
compressed it. A lot of people do that as a matter of course for CD release,
learning from the pop industry of course. So that’s one thing. The [other]
point I was going to make is yes, it is still necessary, and the reason it is still
necessary is because you can’t predict what the audience conditions will be
like. So that’s why you need to be able to intervene in real-time. That’s
another reason why I’m not a big fan of completely automated diffusion,
which we’ll come back to, no doubt, later on. If you accept that, in terms of
dynamic profile, what you’re doing is exaggerating the implied profile that’s
already on the medium, then the same would apply to spatial control as well.
In other words, if you have something which is going: [makes erratic
‘pointillist’ textural sound], and zapping around all over the stereo stage,
then it seems to me perfectly legitimate to exaggerate that erratic behaviour
over a much bigger loudspeaker system. Hence doing this kind of thing
[makes short rapid hand movements] and wiggling the faders around in a
way that will throw the sound all around the room, in an erratic manner.
That seems to be perfectly in-keeping with the musical idea. On the other
hand, if you had a big, solid, steady drone going on in a piece, it seems to
me that you can’t just arbitrarily add random spatialisation to that, unless the
piece actually already has random spatialisation within whatever medium it’s
in, like stereo. So if it’s sitting there, a big fat drone, and you want it to be a
‘bigger, fatter drone’ in a big space, but you don’t go ‘wiggly-wiggly-
wiggly-wiggly’ with it! Because that’s not what’s on the tape. So that’s
what I mean about being in-keeping, and being appropriate. The problem is
that a lot of people think that diffusion and spatialisation are synonymous,
and I don’t think they are; I think they’re different things. They relate, and
very often they are connected, very strongly, in a one-to-one relationship.
But very often they’re not. In some pieces, it seems to me completely
inappropriate to throw it around the space, just because one happens to have
a big system, if the music doesn’t demand or suggest it.

JM: It seems fairly clear that presentation over one or more loudspeakers is
an inescapable fact of electroacoustic music, and that this fact precedes any
aesthetic concerns. Which aspects of sound diffusion practice do you see as
being applicable to all electroacoustic music, and which do you think might
be more directly related to a particular aesthetic standpoint?

JH: Actually, I don’t agree with that: I don’t think it precedes aesthetic
concerns. I think it comes from the aesthetic concerns. That’s precisely what
I’ve just been trying to say. You would find that there are some means of
doing things that are appropriate in some cases – to some pieces, or to some
sections of some pieces – which are inappropriate to other sections, of even
the same piece, in terms of the kind of movements you would use. For me,
that is an aesthetic thing because it actually relates to the way that the music
articulates itself. The same goes for dynamics. If there is a big structural
moment – a climax if you like – it seems to me that the diffuser ignores that
at his or her peril. If you want to play the piece, and that is the climax, you
have to make that climax. You can’t just arbitrarily play it a bit quieter:
you’ve got to go with what’s there. And that all means an aesthetic
understanding, and I think that this aesthetic, actually, is part of the
compositional process in a way. You can somehow encode this on the tape,
by ‘implying’ it with the kind of movement that you have.

JM: I see what you mean. Let me rephrase the previous question. On a ‘pre-
aesthetic’ level, if you’re going to write electroacoustic music – not
necessarily ‘acousmatic’ music in particular but possibly something you
would describe as more ‘architectonic’ in nature – the fact of the matter is
that it will still have to be played…

JH: …over loudspeakers. I understand what you mean now. I agree, it will
have to be played over loudspeakers. In which case, the answer to your
question – which aspects of sound diffusion practice do I see as being
applicable to all electroacoustic music – I would say probably sensitivity to
the musical material, as the first thing. That determines how you diffuse.
Because there are certain pieces that I diffuse very lightly. I hardly do
anything: slow movements, slow shifts, just ‘pointing’ certain things at
certain places. There are other pieces that I could really ‘busk’ in a huge
way and get away with it, because the material in the piece lends itself to
that kind of openness of approach. I think that’s the first thing; you have to
be sensitive to that. Beyond that, I think it’s the dynamic profile, and the
spatial profile. Anything that is implied on the tape as ‘this-must-be-heard’
– this movement from right to left, or this sudden surge of dynamic – then
those are the things, it seems to me, that you would exaggerate. You would
ensure that they are heard. The problem is of course, when you sit in a
controlled environment like a studio, you can hear them. When you sit in an
uncontrolled environment, like your average performance space, surrounded
by a hundred other people, you don’t necessarily hear them: somebody has
to ‘play’ them for you. That’s the key thing. But there are a lot of composers
who would disagree with that. They say, ‘The movements and levels I want
are all on the tape. Just press play!’

JM: This is when you start to get into arguments about wanting to have
anechoic halls and standardised speaker rigs.

JH: Yes, very nice ideas, but hardly the most practical. But the problem is,
everybody who has an experimental space, including places like SARC at
Belfast,255 just because they get something right in there, does not mean that
they then have a definitive version of the piece. They have a version of the
piece that works in that hall. If you take it to another space, you have to do
something else, and I think that’s the key thing. When I came over to
Sheffield to look at the M2 system, I think it’s fascinating and the control is
fantastic – and I want one – but it still has to be programmed afresh for
every space you do a piece in. You can’t, I think, just take the same way you
did a piece, let’s say in Sheffield, you couldn’t just ‘press play’ as it were,
and then let the M2 do the diffusion in the CBSO Centre [in Birmingham].

255 Sonic Arts Research Centre, Queen’s University, Belfast. See section 4.15.

JM: Yes, and that’s actually not one of the main aspects of our research.

JH: But a lot of people want standardised arrays. I had a furious email
argument with – a very nice guy actually – Jean Piché of the University of
Montréal. He wrote, on CEC-Conference256 about five years ago, that all he
needed was four very high quality, high powered, speakers, at a sufficient
distance from the audience, so that everybody was sitting in what amounted
to the ‘sweet spot’, and everything would ‘work out’. But I just do not
agree.

JM: So perhaps what you would suggest in that situation is that maybe you
would do very little [in terms of diffusion], but you would still have to do
something?

JH: Well, I don’t agree with the premise that you compose on four or eight
speakers, or whatever it is, and therefore all you need to do in the concert
hall is replicate it. Because the point is, you can’t. I think that’s my basic
problem. You cannot do it: it doesn’t work. It may work in theory, and if the
acoustic is sufficiently controlled. But how many halls have you been in that
have that kind of controlled acoustic? Not very many.

JM: I find that you always have to do something. This is reminding me of
one of my own pieces – Graffiti 2 – which is written in 5.1 formation. So, in
that sense I guess you could say the spatialisation is all ostensibly ‘in there’
[Harrison looks sceptical]. But it does use a very broad dynamic range, and I
have found that wherever I’ve played it, I have had to do certain things,
whether it’s making the front slightly louder at certain points just to
emphasise…

JH: Yes, exactly. And, if you go to the average cinema complex – for
example, there’s one in town I go to fairly regularly – and basically their
surround speakers are just ‘echoed’ all the way down the sides of the hall.
But actually that doesn’t really work because basically for most people
they’re just a side image; that’s the net result. For very few people are they
actually where they’re supposed to be, in terms of the surround.

256 Canadian Electroacoustic Community "CEC Conference". Website available at: http://www.concordia.ca.

JM: So you won’t get any particularly effective phantom imaging if the
speakers are directly to either side.

JH: Exactly, and also the point is it seems to me that, well, I think this is
one of the reasons that the specification for 5.1 as I understand it, is very
demarcated in terms of what you’re supposed to put on each component in
the 5.1 array. The LFE is supposed to have that and nothing else, and the
centre speaker is supposed to have the dialogue and nothing else, and the
frontal stereo is supposed to have the soundtrack, the image, the music, et
cetera, and very little else. The surrounds kind of vaguely interact with that
but mostly for effects. So in Back to the Future the car zooms across the
screen diagonally, it kind of goes from front-right to left-rear or whatever it
does. That’s about the extent of it. So they’re kept fairly distinct in terms of
their functions. But most composition usage doesn’t want to keep them that
distinct. It wants actually to blur those things. It wants to have something
that swirls around the space and then it wants something that’s very precise
and in a very particular location; those kind of things. And I think that’s
actually quite difficult to do with 5.1, especially when, very often, the centre
speaker will – I don’t know if you’ve noticed – you get a stereo pair of
speakers, the same model of speaker – and even Genelec have done this I’ve
noticed – the same model of speaker, ostensibly, has a different number of
drive units from the left-right pair. So it can’t match the sound. And very
often, the surround speakers are a smaller unit altogether than the main
stereo. So, actually, it’s very difficult to get a fully balanced equal surround,
for example, in the average 5.1 array.

JM: Yes, if only simply because there are three speakers in front and only
two behind.

JH: Actually, that’s an interesting question that we’ll probably get on to
later. Just put that on the back-burner: ‘more in-front than behind’. I will
come back to that.

JM: Let’s turn to the issue of what Clozier describes as ‘diffusion-
transmission’ and ‘diffusion-interpretation.’257 Do you think that there could
be a case for a ‘diffusion-transmission’, where the idea is essentially to
attempt – as far as possible – to recreate what the composer might have
intended you to hear in the studio (and I’m thinking perhaps more
specifically of multichannel works with ‘fixed’ panning), and a ‘diffusion-
interpretation’, which is rather more for pieces of music that seem to invite
the performer to be more creative in the way that they diffuse it?

257 Clozier (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière and Bennett [Eds.] Composition / Diffusion in Electroacoustic Music: 235.

JH: If I understand the distinction you’re making between those two terms,
you’re saying that diffusion-transmission is, in a sense, this supposed
‘replication of what the composer composed’, and diffusion-interpretation is
something where somebody comes along and ‘plays it differently’. If we
can just take an instrumental analogy for a moment, the diffusion-
interpretation is equivalent to somebody playing a well-known piece at a
tempo that is completely abnormal, for example. Somebody could take
liberties with it. Is that the distinction you’re making? Or are you saying that
there’s some pieces that are intended to be diffused: they fall into diffusion-
interpretation; and there are some that the composer, particularly using
multichannel, has a fixed notion of what should happen, and he wants
everybody to hear it like that?

JM: It’s all of these senses, but if I can just draw back in a point that we
mentioned earlier on. Stemming from the simple fact that all of these pieces
of music will have to be played through loudspeakers, there’s always going
to have to be a ‘diffusion’ of sorts, even if that’s all you understand it to
mean. And so it occurs to me that we may need different ‘degrees of
diffusion’ if you like.

JH: Well I think there are certain ‘degrees of diffusion’. That’s in a sense
what I’ve been saying right from the beginning. It’s not appropriate to do,
necessarily, the same kinds of things in any given circumstance, to a whole
bunch of different pieces. I went to a conference in the States, and the guy
on the desk ‘diffused’ the pieces; all the pieces. And what he actually did
was, he had choreographed, on the mixer, certain fader movements, and he
did them irrespective of what material was coming off the medium. So he
just went through the same routine: he just went from these faders, to these
faders, from these faders to these faders, to the next, the next, and then he
started again. And he did it at the same tempo; he did the same pattern, the
same sequence. He was just arbitrarily moving the sound, no matter how
fast it was coming, no matter whether it was lots of movement implied or no
movement implied, he was moving it round the room arbitrarily, in the same
sequence, piece after piece after piece. That is not diffusion: that’s
ridiculous; that’s just nonsense. How did I get onto that? The point about
that is that, again, it goes back to this thing about you’ve got to find what’s
appropriate for the given piece. If you say, ‘Let’s take this piece; it’s
stereo so all we need is two loudspeakers,’ well I think most people – even in the
USA, where this has taken a long time to catch on at all – would actually
tend to say, maybe that isn’t sufficient: it’s fine if you’re in the sweet-spot
but not at the back or the front or the side or whatever. So you need more
speakers. Well, then you’ve got to work out how to deploy certain sections
of the piece across the four speakers, or the eight, or six, or twelve, or
whatever it is you’re using. And then you’re into diffusion. So inevitably it
grows from a need to reinstate, if you like, in a big space, that which is
audible in a small space but would get lost in a big space. So that’s the
basis. I think the diffusion-transmission thing – in the sense of taking that
principle and saying ‘Let’s assume a big space, let’s assume a lot more
loudspeakers but now, let’s fix it’ – I can see where they’re coming from.
But, all I can say is that my attempts at doing it like that have proved that
it’s virtually impossible to do it. My piece Streams, which is eight-channel,
was written for the ‘BEAST main eight’ configuration.258 The problem, first
of all, is that if you take it anywhere where they don’t understand what that
means, you have a problem. Again, at the same conference in the States,
they had a typical eight-array for North America, which is to say quad, plus
the ‘diamond’.259 So one speaker was smack in front of you, and one
directly behind. I couldn’t use those so I had to end up using the fold-back
speakers, turning them to point at the audience, as the mains. And of course
they were the worst speakers in the house, and it showed. So there were real
problems. Also – a different space – I played it in a cinema in Edinburgh,
and because it was a cinema acoustic it was very dry, and all the sounds
stayed ‘in the boxes’. There was no ‘gluing together’ of the different
locations by the acoustic, which I have had happen in other places. It kind of
requires a little bit of ‘mixing in the air’. So what you end up doing is
putting in extra speakers, to try and have it more fluid. Before you know it,
you’re diffusing again. And then you’re into a different set of problems,
because the problem there is that, if you’ve got an eight-channel source, and
every channel of the eight has a fader, how do you do crossfades? You
haven’t got enough hands! That’s where the M2 comes in, because you can
do that. Anyway, I’m starting to repeat myself now, so let’s move on.

258 See Figure 10 (page 135).
259 See Figure 11 (page 136).

JM: Ideally, what kind of results should be attainable in the process of
sound diffusion, and to what extent can these be fully achieved at present?
What changes might improve the situation, and what issues may need to be
addressed?

JH: I think what one is trying to achieve in terms of results is that the
audience hears the piece, in one sense, as close to the original as possible.
Taking into account the acoustic of the room, the number of people in the
hall, how far away they’re sitting, these will all contribute and cause that
experience to deteriorate. So what you’re trying to do is compensate for
that, so that they hear it, in a sense, as clearly as they would sitting in a
studio. They can’t hear the composer’s original intention unless you
intervene to help them do it, and that’s why you have to understand ‘what
the composer was about’. That is what should be attainable. But that’s an
aesthetic; an artistic answer. If you’re asking me technically, then I would
say that there are all kinds of things one might want to do. For example –
and this is another take on multichannel sources – you might want to have
certain things that you can diffuse in one manner, at the same time as other
things you can diffuse differently. [This, of course, strongly supports the
notion of CASS and CLS groupings.] At the moment, with stereo, if you
have a drone, and something else going ‘skitty-skitty-skitty’ all over the
room, you’re relying on the fact that you can ‘nudge’ it just enough to make
the high frequency stuff move around, whereas the lower frequency drone
doesn’t appear to move around. That’s just a ‘trick’ because of course we all
know its all coming out of the different speakers.

JM: You’re always diffusing ‘the whole thing’.

JH: Exactly. What you could do with multichannel, is put your drone in
some tracks and the ‘skitty-skitty’ in some other tracks, and deal with them
independently. Which is exactly what I’ve been trying to do. In my last
piece – Rock ‘n’ Roll – I used a six-channel hexagonal array as a ‘surround’,
and I had two solo speakers – the ‘mains’, if you like, in the BEAST setup –
for ‘close up’ stuff. What I would like to do, in a bigger diffusion rig than
we had in Leicester,260 is be able to diffuse the ‘twos’ and the ‘sixes’
completely independently.

JM: That reminds me of Truax’s DM-8 system.261

JH: Is that the AudioBox262 essentially?

JM: Essentially yes, I believe.

JH: I’ve not used it. I’ve heard it, and I know one or two people who have
used it. Again, it’s a little bit like the M2: it requires an awful lot of time,
because what you really need to do is do a diffusion afresh for each space
that you install in. And I’m not sure how flexible DM-8 really is, because
my understanding is that – at its largest – it’s only eight-in, sixteen-out. The
M2 is way beyond that, because you’re only limited by the number of
physical outputs you can attach to your computer, aren’t you?

260 Annual Sonic Arts Network conference, Sound Circus, was held at De Montfort University, Leicester, 11-14/6/04.
261 See section 4.14 (page 229).
262 Harmonic Functions (2001). "The AudioBox Disk Playback Matrix Mixer". Website available at: http://www.hfi.com/dm16.htm.

JM: That’s right.

JH: So you’re well ahead of the game there, I think!

JM: Well… Nonetheless, stereo is still the dominating standard in
electroacoustic music. What do you think are the advantages and limitations
of this?

JH: The advantages are it’s portable and everybody’s got it. The limitations
are those we’ve already discussed [laughs]. If you want to publish, if you
want to release stuff commercially, that’s what you’ve got to use. Except
now we’re starting to get a bit of the possibilities of 5.1, DVD-A, and that
kind of thing.

JM: DVD-A, which seems to have been ‘hovering on the horizon’ for about
five years now…!

JH: No, don’t say that! I’m trying to put out a DVD album.

JM: Well, good luck – I would love to see it happen.

JH: Well, some of the new DVD players, I think, can play the
uncompressed audio on the DVD-A and Super Audio CD formats. I saw one
the other day on a website that claimed to be able to do all this. That would
be good.

JM: As we’ve already touched on, in stereo we only have two discrete
audio channels, yet a good diffusion can give the impression of multiple
spatially independent streams. How does this happen? What techniques
might we use to achieve this?

JH: I think the reason it happens is psychoacoustics, in the broadest sense.
You can ‘fool’ people. For example, if you have some material going on,
and you can establish that material in the ears of the listener as relating to a
particular part of the space – a particular set of speakers, a particular array, a
particular image, or whatever way you want to think of it – and then new
material enters, if you can get the entry of the new material, if you can hit it
exactly with introducing some different loudspeakers, on top of the image
you’ve already established, you can persuade people that the original image
is staying where you already had it, and the new material is coming in on
different speakers. That can be done quite successfully, but you have to be
exact. It really needs to be exact in every respect. You need to get the right
level, and it helps of course if it’s spectrally differentiated and frequency
differentiated. There’s a piece by Vaggione called Octuor, which is all
textural. It’s very busy, and eventually all these layers build up – of course
it’s the layers that enable you to do this – so you can set up something, and
there’s certain particular moments where you’re able to introduce new
material that’s kind of trumpet-like, and you could swear it’s over there on
those other speakers. But in fact it’s not: everything’s everywhere, because
it’s stereo! So it’s psychoacoustics and that’s how you do it. But you use
what the composer has put there in order to do it. You couldn’t just add
some other speakers ‘willy-nilly’ and have that happen. It has to be matched
to the appearance of new material which, in turn, suggests ‘new entry: new
location; new image’. And that’s where you get the thing to work; where
you marry all these things together. It just can’t be done by arbitrary means.

JM: That moves quite nicely on to one of the main things I wanted to talk to
you about. I believe you have performed some octaphonic diffusion
experiments, which I’ve heard so much about but have not been able to find
very much information on.

JH: That’s because we didn’t document them. But we should do, you’re
right. Shall I tell you what our thinking has been so far?

JM: Absolutely.

JH: We have various ‘prongs’ of our thinking, some of which are still very
much just ‘our thinking’ and haven’t happened yet. One thing that did
happen was at BEAST’s twentieth anniversary weekend in Spring 2003: we
actually put up eighty loudspeakers. The thinking here goes back to my
notion about diffusion. The assumption about diffusion on stereo is that, in a
big public space, you can’t deliver the image correctly. And the image of the
piece is what we’re trying to get over, in all its totality: its dynamic range,
its spatial articulation, its full spectrum; the image of the piece as it
fluctuates and develops and morphs throughout itself; we’re delivering that
image. Now, if you do that in stereo using several stereo pairs of
loudspeakers because one stereo pair cannot deliver it alone (hence the
multispeaker system), then why on earth would we assume that if you’re
working in eight channels, for example, you can do it with eight speakers?
Does the logic of the first experience not imply that what you actually need
is multiple arrays of eight speakers, not just one? That was the premise. We
were going to take an eight-channel piece, and try and diffuse it over
multiple eight-channel arrays.263 And the way we did that was we set up
these multiple arrays – and I can draw some of them for you if that would be
helpful – a main array; then we put in some that are very close; then we put
some that were on the floor; we had a stand in the middle here with eight
tiny speakers pointing out like this; then we had eight on the galleries – this
was in the CBSO Centre… I don’t know if you’ve ever noticed this, but if
you go to the opera, the stage is not flat, and probably it’s not flat in the
theatre most of the time either. It’s raked, so that people sitting in the house
see something that appears to be flat, because they’re higher up, but in fact
it’s a rake. So what we did was – and I did this with a dance project – I had
the stage with the dancers on it, moving around. They were being tracked by
camera, and the movements were translated into sound in space. So we did
it like this: these [two speakers] were on the floor; these [two more
speakers] were up a bit; these were higher; and these were higher still. So, if
you look at it from the side, it looks like this [draws diagram]. The dancers
were moving around in this space, but of course to the audience looking

263 See Figure 28 (page 208).

from the front, they could hear the sound following the dancers around. But
if they’d [the speakers] all been flat, it would have maybe just been doing
this a bit [makes level left-to-right movements]. But the way we did it, it
was more three-dimensional. So, [returning to the octaphonic diffusion rig]
we had some on the floor, some higher ones, higher ones, and then some up
on the top gantry. We had some right up in the roof, I don’t quite remember
where. There’s a central gantry across the CBSO, so we had two on the
floor, two on the first level, two on the next level, and two in the ceiling,
like a Ferris-wheel.

JM: Ah, so tilted upwards through ninety degrees.

JH: Absolutely: a vertical array. And we also had a vertical array on the
back wall, like a rose-window in a church. The idea of what we did was
very simple: Max/MSP, and an eight-channel sound file, with one fader
controlling each eight-channel array. That’s it; piece of cake. And the
other thing, of course, is that in Max you can mix signals just by
attaching two signals to one input, so in fact we had eleven ‘virtual arrays’
over eighty speakers. So these eight, on the floor, were also part of a
‘reversed eight’. They were ‘1, 2, 3, 4, 5, 6, 7, 8,’ but they were also ‘8, 7, 6,
5, 4, 3, 2, 1’. We sub-mixed all the sub-woofers as well, because we didn’t
have eight of those, or we were running out of channels, or something. So,
eighty-eight virtual channels. We were running the MOTU 24 I/O – we had
two of them – and we had a 2408 Mark 3, and we were also using the digital
optical outputs to get more channels. That [the eight channels of audio] was
all coming from an 8-to-8, which was feeding the computer optically, and
then we also had eight channels out of the [MOTU] 828 which fed this little
array in the middle here, because that’s where the 828 was. So that’s how
we did it. And then, just a little Peavey box with eleven faders. Now of
course the problem with that is, if you’ve got certain kinds of movements on
the piece – and the piece we did was a piece I threw together the night
before, literally, so it was no great shakes, and it had a lot of circular motion
in it, which I wouldn’t normally do in quite that way. The point was that I
wanted to find out whether circular motion, through crossfading of arrays,

could be turned into, for example, spirals. So you go from ground, through
to main, through to higher, through to roof, and back again. So instead of
just going round and round and round, it actually climbed up, around the
space, and back down again. Kind of like a ‘Slinky’ – those metal coils that
go down stairs – that kind of effect in sound. So that something could come
round, through the Ferris-wheel array and then ‘stick’ up on the far wall. All
that kind of stuff.

JM: You’re almost pre-empting my next question. To what extent were the
perceptual effects of the octaphonic diffusion as effective as those we’ve
talked about in stereo, where you get this kind of ‘phantom separation’ of
sources?

JH: To some extent, it worked. The basic mechanics of it worked, so that
was one thing to find out. The basic practicalities of it were addressed,
which is to say, it’s a hell of a lot of work because of the number of
loudspeakers and so on! The big problem was that we only tested it with one
piece, and it wasn’t really a proper piece; it was a kind of étude. That is
problematic because in a sense the ‘database’ isn’t large enough to draw any
conclusions from. We need to do it again, with more pieces.

JM: We’re going back to the question, ‘Why does stereo dominate?’
‘Because it dominates!’

JH: Yes, partly. A lot of people did say to me that they really were surprised
at how effective the sound was, but you can’t draw much from that. We
needed to do more; we needed more objective tests; we needed to try
different kinds of material. But I’m still convinced that it could work.
However, I have to say that I’m no longer working within a regular array of
eight, so now I’m thinking more about this ‘two plus six’. So, for every one
of these ‘eights’, you could think of a ‘six’, but you could also think of
additional ‘twos’, or ‘twos’ that are part of other ‘sixes’! And of course in
Max/MSP you just draw cables to connect them: it’s not a problem. All you
need to do is have the fader controlling the right things, and you could do

this. You could have an array of six speakers, in one instance, where also
two of them are the stereo. You’ve got independent control over the ‘two’
and the ‘six’. That’s what I’m thinking at the moment. Musically and
aesthetically I want the control over things that are close and intimate in
quality, and things that are more general and ‘surround’ in quality, and I
want to exploit that. You can certainly think of all kinds of ways of doing
things: it all depends on how you set up the hall, and how you configure the
software. But it’s actually relatively easy to do. If we were to get an M2
[Harrison makes a suggestive glance], we would be looking to do this kind
of thing, because I think it would lend itself to it very well. Still, though, the
big problem is going to be the controller interface: what is controlling what,
and how?

JM: Keeping that thought in mind, it’s probably true to say that throughout
the history of sound diffusion, the mixing desk fader has been the primary
physical interface. What do you see as the advantages and disadvantages of
this, and to what extent do you think alternative control methods might be
successfully employed?

JH: The advantage, first of all, is that it’s an interface that practitioners of
electroacoustic music tend to know fairly well. Ironically, they know it less
well if they’ve been brought up entirely in the digital domain, because it
hasn’t been as necessary. Whereas those of us who grew up making pieces
with analogue tape, you had to use faders. You had to use the mixer because
it was the only way of mixing stuff together, so you got to know what a
fader felt like. So that’s why it worked, historically, quite well, and has been
the dominant one. The disadvantages are, of course, that it only allows
certain kinds of access. If you have physical faders, they have to be a certain
distance apart. They are, therefore, going to be sitting next to some other
fader, but not next to that fader over there at the other end of the desk. So if
you happen to want to do something with that fader and this fader, and
they’re half a metre apart, you’re stuffed! You haven’t got enough hands.
That’s where other kinds of options might come in. One option, of course,
would be automation, or using presets with ‘slews’ between them to smooth

out the transitions. That, of course, is all possible. To me, you see, what you
would need would be some kind of data entry thing to call up the preset,
some kind of fader or other variable control to set the slew rate, which can
be controlled in real time. Because you won’t necessarily want the change
always to be at the same rate from one preset to another preset. So you need
real-time control over a slew, not a signal.
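
A minimal sketch of the slewed preset recall described here, in C++ (hypothetical names; a sketch assuming a bank of thirty-two fader values, not an existing implementation): calling up a preset sets the target values, while a separate real-time control varies the rate at which the current values glide towards them.

    #include <array>
    #include <algorithm>

    constexpr int kFaders = 32;

    struct SlewedPresetBank {
        std::array<float, kFaders> current{};   // values actually applied
        std::array<float, kFaders> target{};    // values of the recalled preset
        float slewPerSecond = 1.0f;             // varied live by a fader or pedal

        // Calling up a preset only changes the targets...
        void recall(const std::array<float, kFaders>& preset) { target = preset; }

        // ...while each control-rate tick moves the current values towards
        // them, at a rate the performer can change mid-transition.
        void tick(float dtSeconds) {
            const float maxStep = slewPerSecond * dtSeconds;
            for (int i = 0; i < kFaders; ++i)
                current[i] += std::clamp(target[i] - current[i], -maxStep, maxStep);
        }
    };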

JM: One of the things we’ve been looking at for the M2…

JH: When will it be on the Mac? That’s what I want to know!

JM: That’s in the pipeline as well, actually. With the new version we’re
working on fully platform-independent code, which can be compiled for
Windows, Macintosh, and Linux. One of the other things we’re working on
is a bit like a ‘multi-band crossfader’.264 Fader is perhaps a little misleading
though: it could be on a fader. You can define points along that fader that
equate to specific presets if you like.

JH: Like the GRM Tools ‘super slider’?

JM: Very much like that. We’ve also been talking about implementing
alternative methods of control. Specifically at the moment, we’re thinking
of using joysticks, which of course give you two dimensions to work with at
once; maybe also foot pedals, and we’ve also talked about some things that
might be done with buttons.
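
A minimal C++ sketch of the ‘multi-band crossfader’ idea mentioned above (hypothetical names; not the actual M2 implementation, which is described in section 6.3.1): presets are pinned at points along a single fader running from 0.0 to 1.0, and intermediate positions crossfade linearly between the two adjacent presets.

    #include <array>
    #include <vector>
    #include <cstddef>

    constexpr int kFaders = 32;
    using Preset = std::array<float, kFaders>;

    // A preset pinned at a normalised position (0.0 to 1.0) along the fader.
    struct BandPoint { float position; Preset preset; };

    // Given band points sorted by ascending position (at least one), return
    // the fader bank state for the current fader position.
    Preset evaluate(const std::vector<BandPoint>& points, float fader)
    {
        if (fader <= points.front().position) return points.front().preset;
        if (fader >= points.back().position)  return points.back().preset;
        for (std::size_t i = 1; i < points.size(); ++i) {
            if (fader <= points[i].position) {
                const float t = (fader - points[i - 1].position)
                              / (points[i].position - points[i - 1].position);
                Preset out;
                for (int f = 0; f < kFaders; ++f)   // linear crossfade between
                    out[f] = (1.0f - t) * points[i - 1].preset[f]  // the two
                           + t * points[i].preset[f];  // adjacent presets
                return out;
            }
        }
        return points.back().preset;  // not reached when positions are sorted
    }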

JH: I would see all of those as being potentially useful. I’ve always
wondered why we don’t have a ‘gas pedal’ for volume, for example. It’s
such a simple idea. Whatever preset you’ve got, whatever state everything
else is in, you’ve got an overriding thing that can give a boost to all the
volumes on the desk. That would seem to me to be relatively easy to
implement. Taking the ‘driving’ analogy further, the notion of ‘shifting
gear’ or, as it were, pedal-shift, something like that. That’s a kind of preset

264 See section 6.3.1 (page 292).

thing, I guess. How quickly the clutch kicks in is the ‘slew’ [laughs]. Do
you see what I’m saying? There’s all kinds of ways of thinking about it. If
you’re talking about real-time control, we’ve tried some things with game
pads. The big problem with them is that they’re terribly imprecise. There’s
not a lot of resolution, and even when the thing supposedly flicks back to
the centre, if you actually map the numbers, what it’s throwing out every
time it hits the centre is within a quite large range. It’s not very precise. So I
think there are a lot of problems there. Joysticks certainly are a good idea.
The thing about joysticks, though, is they pre-condition your thinking to
assume that what they’re going to be controlling is spatial location. Which is
fine, if that’s what you want to do. And it is entirely possible to flip images
and cross them over, all that kind of stuff. And of course they’re smooth:
they operate at the rate you set. But in a sense what you’re looking at with
everything you’re doing is a range of values from a to b. A fader is actually
a pretty efficient way of doing that, if you think about it.

JM: There are a lot of possibilities. I think the restriction we’re looking at is
not necessarily the fader in isolation, but the fader as a single attenuator for
a single audio signal. That becomes a different issue.

JH: But like I say, if you got a fader to control the speed of a crossfade from
one preset to another, for example, that’s slightly beyond that paradigm
already, isn’t it? So there are other options. One of the reasons we got quite
excited about the video thing – the dance project I mentioned earlier – was
the idea that maybe one could sit in one’s studio with a video camera above
your hands, and you could actually do things just in space; certain kinds of
motions which could be mapped. Really, it’s just mapping: data streams that
can be mapped however you want. The big problem with all of them is that
every single possible configuration is a ‘new instrument,’ and an instrument
takes time to learn how to play. At least with the good old fader we have a
reasonable history of having learned to play it a bit. All these new, fancy
things – gloves and motion sensors and so on – they all have their place, I
think, but I think it’s a question of how much time do we have to be able to

get to control them sufficiently to know what we’re going to do. I think
that’s the big issue.

JM: One of the things I’ve observed, relating to that, is that a lot of new
diffusion systems and tools seem to propose too many, and too radical,
changes. It’s almost as if they say, ‘This is the old way of doing things,
we’re going to throw that out the window and present you with something
completely new.’ This has the effect of alienating people. One of the main
things we’re trying to do with the M2 is to expand on the practice of
diffusion, rather than suggest an entirely new paradigm.

JH: I think that’s wise, actually. That’s very much more helpful for people
like me. The other thing that you and I discussed before, a propos the M2,
was that not only do you need the system to be responsive to the input
you’re giving it, but you also need it to report back what it’s doing, or what
state it’s in. For example, if you’re going from one preset to another preset,
you need a sensible and efficient way of getting that information to the
person driving. That’s always a problem, I think. Obviously one way is
motorising the faders; that would be an obvious way to go. So that, when
the preset is called – as in any automated desk – then the faders zip to that
position, or they move to that position at the rate determined by the
crossfade rate. So it recalls the position. But you need motorised faders for
that, and of course they have other problems. Every time a motorised fader
array is reset, you hear the ‘clunk’ and ‘thump’ and all the rest of it. On the
other hand, one of the things I hate about MIDI faders is, if you have some
other thing that is interacting with it, and you send a message to it saying,
‘Set the fader value to x,’ the next time you touch the fader, it’s at the
previous value, and it will glitch back to the previous value, and only then
will it increment. So that’s why the motorised faders, I would say, are
possibly the way forward. The trouble is, it also bumps the cost up.

JM: Also, as soon as you get into slightly more complex assignments of
‘what’s controlling what,’ then it’s not necessarily always going to be the
case that a movement of one control will have an obvious effect on all of the

others. That paradigm works better if you’re working on a ‘one fader one
speaker’ basis. The problem is, as soon as you start doing things in software,
you’re increasingly bound to continue doing them in software.

JH: And then you’ve still got the reporting problem.

JM: Then you have to deal with reporting on-screen, really.

JH: Exactly, which takes time to read unless you’ve got some clever
graphics or something…

JM: On a slightly different note, you have previously described ‘organic’
and ‘architectonic’ compositional philosophies,265 I suppose they are. To
what extent do you think this dichotomy predates the context of
electroacoustic music within which you mention it?

JH: Hmm [smiles]. Well that is an interesting question, because I wasn’t
really thinking about it except in the context of electroacoustic music. I
suppose my first response might well be to say that probably most music
before electroacoustic music was architectonic almost by definition, in the
sense of western art music anyway. Obviously there are folk and other
musics, which are completely different. The assumption behind the question
may be that there is some sort of ‘rational’ decision-making process going
on – to decide that ‘this will be this’ and ‘that will be that’ – and the minute
you do that, you have some way of deconstructing that into intervallic
relationships. The point about intervallic relationships – whether they’re
pitch-to-pitch or duration-to-duration or location-to-somewhere-else – is
that that becomes the thing that you’re measuring, the ‘interval’. The thing
that is at ‘this location’ or ‘that location,’ is secondary to the interval. So
what you’re doing is you’re structuring things by intervals, rather than
structuring them by the nature of ‘what the thing is’ in itself. That’s why I
call it ‘quantitative’. The architectonic is quantitative; it’s to do with
intervals. Whereas ‘qualitative’ is more to do with ‘what this thing is’ in
265 Harrison (1998). "Sound, Space, Sculpture - Some Thoughts on the 'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound, 3(2): 117-127.

itself, and then what something else is in itself, but not necessarily with the
measurement of the difference between them. Does that make sense
[laughs]?

JM: It makes perfect sense, and I think the binarism is a very real one. I got
to thinking, before the advent of electroacoustic music and any of the
technology that made it a possibility, if you take, say, a piece by Erik Satie,
and a fugue by Bach… Actually Bach is probably not the best example. In
fact, you have mentioned a piece by Stockhausen, one of the Klavierstücke,
where you’ve got extremely precise directions to the performer – you give
the anecdote of the nine-note chord with five different dynamic levels – to
the point where you really do end up thinking, ‘You’d be better to get a
machine to do this!’ because this really isn’t a particularly ‘human’ process
at all. On the other hand, you have something maybe a bit more like Satie
where, if you’re looking at the score – obviously he was working within that
framework – you don’t have a tempo indication in beats per minute, you
don’t have any bar lines, you don’t have any key signature. You’ve got
directions in the score such as ‘on the tip of the tongue,’ and so on. It’s
almost as if the system itself is being used in a way that actually promotes
interpretation rather than ‘reiteration,’ if you see what I mean.

JH: I take the point. Obviously, yes, there are people at different degrees
along that ‘axis of precision,’ if you like. Stockhausen, as you mention, is a
case in point. In his own work, one of the reasons that he gives for going
into electronic music, was precisely to get the accuracy you’re talking about.
Of course he also wanted to serialise timbre as well. At some point he said
something like, ‘Give to electronic music what is electronic music; give to
instrumental music what is instrumental music.’ In other words, let’s exploit
the different characteristics. If you want absolute precision, do it in the
studio; get a machine to do it, effectively. Whereas if you write instrumental
music, then allow the player more freedom, or allow the player to have more
of an influence. Don’t try to write that kind of music [machine-like] for
players. From about that time in Stockhausen’s work, you see a divergence.
The studio work is the studio work, and obviously it’s fixed so it’s precise.

Whereas he was moving, from the fifties and certainly through the sixties,
towards a freer and freer instrumental mindset, where things were
interchangeable and you can do this or that, or miss them out, et cetera, and
eventually to texts. He would say, ‘Play a note in the rhythm of the
universe’ and so on and so forth; Aus den sieben Tagen (1968). Of course
the ultimate thing is, if you think about it, what he’s also on record as saying
is that the principle of serialism is actually to do with mediating between
extremes. You take a situation and another, different, situation, which could
be perceived as its opposite, and you create between them a set of values.
Let’s say, ‘black and white’. You have different shades of grey in between,
and that creates a scale. If you then re-order them, that’s a series. So what
you can actually do, he says – this is his philosophical point – is that what
you’re actually doing with serialist composition, is mediating between
extremes, so you don’t see black as an antithesis of white; you see it as a
degree of white. Therefore what happens is, black and white are no longer
in binary opposition, they are ‘drawn upward into a higher unity,’ as he put
it. We all now know he thinks he was born on Sirius, and that’s slightly
worrying! The fact is, this was his philosophical approach: whatever the
situation was, he could unify it by dint of applying a serial process. So if he
says to some players in an improvising group, ‘Play a vibration in the
rhythm of the universe,’ and then later, ‘Play a vibration in the rhythm of
your smallest particle,’ they are opposites. Let’s face it: what could be
smaller than your smallest particle, in all practical senses? What could be
bigger than the universe, in all practical senses? And then he says, ‘Play all
the rhythms you can distinguish between the two.’ It’s serial! It’s still serial,
even though he’s allowing a certain kind of freedom. He was not saying,
‘This is an F-sharp; play it at this dynamic level,’ and so on, but it’s still
coming from the same serialist mindset. Even Aus den sieben Tagen can be
traced back to serialism, in my view. Anyway, I picked up on the
Stockhausen link because you said, ‘Get a machine to do it.’ At that stage in
the fifties Stockhausen was saying, let’s do this in the studios, because
we’ve reached an impasse here, as to what you can achieve with real
players: they can’t actually play five different dynamic levels in a nine-note
chord and have them heard, and be accurate; it’s just impossible. You can

do it in the studio: no problem. So let’s now let players do something
slightly different. And so he wrote slightly different music.

JM: Of course what players can do is put some of their own interpretation
into the music. That’s why I asked to what extent you think that the
binarism between the ‘more-or-less organic’ and ‘more-or-less
architectonic’ ways of thinking, has existed for a long time in music.

JH: Well it has. But the boundaries were moved when you became able to
capture real-world sounds that existed beyond the ‘normal’ frame in which
music happened, I think. Because everything prior to that, which existed
within the ‘normal’ bounds within which music happened – which was
notatable – therefore conformed to certain kinds of norms, and therefore
could be measured against something else that also inhabited that same
world. What breaks the barrier, is where you can go out and capture the
sound of a locomotive (or pick another Schaefferian example) and actually
incorporate that into what has previously been ‘the musical domain’. That
pushes the boundaries suddenly, markedly, away from where they were. So
yes, there’s always been a range of composers operating, from the very
intellectualised to the more improvisatory (say Chopin or Liszt). You can
situate composers somewhere along that axis, but until you get beyond
having to use notation and/or instruments designed for the purpose to
reproduce the sounds you want – i.e. until you get recording – it seems to
me that the boundary can only go so far; it’s fixed. The minute you can
record stuff that goes beyond that, then, ‘Whoosh!’ the boundary shoots
back about a million miles!

JM: So, in a sense, the advent of recording has opened up the full
possibilities of that spectrum between the ‘organic’ and the
‘architectonic’…

JH: I would say so.

JM: …Whereas before it was always necessarily confined to, usually,
‘people making sounds’.

JH: And therefore you had to specify what it was you wanted from them,
which meant that you were, of necessity, tending towards an architectonic
thing. However improvisatory your aesthetic might be, you still have to
write down that you want this note played at this level, for this long, on this
instrument [laughs]. To go with this other note, this loud and for this long,
on this other instrument! Et cetera. Does that make sense? So, yes, I think
it’s always been there, it’s just that the degree of it is much more marked,
this binary thing.

JM: Absolutely. But I think that this is something that’s not often
acknowledged in the literature. In electroacoustic music it almost seems like
the implication is that now we are doing things that have never been done
before – which is obviously true, to a certain degree, in that there are things
that it is now possible to do that it wasn’t possible to do before – but
perhaps in the way people, historically, have been thinking musically, there
may be more similarities than the literature acknowledges.

JH: …But it is now very much more extreme, I think.

JM: So what implications do these different compositional perspectives
have for sound diffusion?

JH: I think that the more architectonic a composer is in his or her mind-set,
then the less likely they’re going to be overjoyed if somebody comes along
and re-spatialises or re-shapes their piece in diffusion. Or as Jean Piché put
it, he ‘Doesn’t want to hear Bobby expressing himself on the faders!’ Which
is how he characterised diffusion. He wants to hear the piece!

JM: Although that might be just as symptomatic of Bobby
misunderstanding what diffusion is.

JH: Like our friend in America, doing his little choreographed routine on
the faders, yes absolutely.

JM: So that might be an issue as well, that people don’t have a clear
impression of what diffusion should be. It’s very easy to think – when you
first come across the practice – ‘Why do you do that if you can have an
eight-channel piece with it all in?’

JH: I understand entirely, but I think the answer is, ‘To make the music
sound better.’ That should be the answer: that’s why you do it! Irrespective
of what the format of the piece is. To make the music sound better for more
people, in a less-than-perfect listening situation. If you put everybody in a
studio, you perhaps wouldn’t need to do it! [Laughs]. But you can’t do that.
It is because the venue is considerably larger and less predictable [than the
studio] that you need to do it at all.

JM: Might there be a case for distinct organic and architectonic ‘styles’ of
diffusion? What would these involve?

JH: The ideal for the architectonic would be no diffusion. That would be the
ideal. For the organic, all I would say is that I think that certain musics have
such physicality already embodied in them, embedded in them, that not to
continue that process into diffusion seems to be a travesty of the piece.
Where you have a real sense of something going [makes a ‘whooshing’
gesture from front to back], that’s such a physical gesture that it’s just
screaming to be exaggerated! So you do it with the faders; you just ‘do
more.’ It’s not ‘cut and dried,’ for me, but I would say that ‘architectonic
people’ don’t want diffusion, and everybody else is happy with some or
other degree of it. That’s the problem!

JM: To what extent do you think diffusion measures might need to be taken
in the performance of multichannel works?

JH: Of course, the big problem is how many fingers you’ve got! And
therefore, as I was saying, our solution was one fader per eight
[loudspeakers] in that particular instance. But there are plenty of other ways;
in a flexible software environment like Max, or in the M2, it’s not a
problem. You can have one fader to control as many channels as you want.
You just set up a Group, in your system;266 in Max you just draw in the
patch cords. That’s one way you could do it, anyway.

JM: Multichannel media have been widely available for over ten years now,
yet stereo continues to dominate. What factors do you believe to be involved
in this?

JH: The simple answer to that is the domestic market. That’s why stereo
dominates. It’s commercial: sales. Purely and simply. I’m sure that 5.1 is
catching up, particularly in North America and Japan, but it’s still not the
norm here.

JM: Do you think that there might also be an issue of difficulty in
composing in more-than-stereo formats?

JH: Well there’s certainly a difficulty in composing in anything that isn’t
standard. It’s very interesting that you and Nikos [Stavropoulos], and others
at Sheffield, have done 5.1, and we’ve all been doing other things like eight,
or ‘two plus six,’ because of course it’s impossible to get those things out
[released commercially]. If I want my DVD-A out, all my pieces are going
to have to be converted somehow; remixed into 5.1, because that’s what the
format is.

JM: DVD-A will only do 5.1?

JH: Well, let me put it another way: although DVD-A will support eight
channels of audio, who has eight channels of audio? That’s the problem! I
looked at the Eventide eight-channel DSP outboard unit: ‘Fantastic! Eight

266 See section 5.4.2.2 (page 251).

channels!’ Look closely: it doesn’t output eight channels! It can, but it’s
preset to do things in 5.1 and 7.1. And if you look at the panners in Pro
Tools or Digital Performer – where you have a mono signal and you want
an output routing that is multichannel – you can do it, but only in 5.1, 6.1,
7.1, et cetera. You can’t do it in eight equal channels. You can pan between
seven of the eight in Pro Tools, but not the ‘point-one.’ Nuendo is the only
one, probably, but even that still has these presets.

JM: Then you have the difficulty – and it’s something that you’ve
mentioned in the past – what kind of source material do you use? If you’ve
got mono sources then you’ll get amplitude-only panning with no phase
differences. If you use stereo source signals, how do you pan them?

JH: The way we’ve been doing that here is that most of us have been busy
doing all of our spatialisation in Max, and then just importing it to our
sequencer of choice. I do still tend to work with stereo sources, but we’re
about to buy a Soundfield microphone so we’re hoping to be able to
incorporate ‘real’ surround sound images (as it were) as well as the
‘constructed’ ones.

JM: Barry Truax267 describes a multichannel diffusion strategy with
‘multiple channel inputs where each soundtrack can be kept discrete and
projected independently of all the others.’ Do you have any experience of
such systems? What could be the possibilities of such a system?

JH: That’s similar to my ‘two plus six’ idea; it’s very much along those
lines. It’s just that you may not want to do eight independent tracks. You
may want to have a stereo and a six, or you may suddenly want to have two
quads. It doesn’t really matter. If it’s set up in software you can just recall
the state for that piece – a patch for each piece, for example: you could have
one for doing ‘two plus six’ and a different one for doing eight-channel. For

267 Truax (1998). "Composition and Diffusion - Space in Sound in Space". Organised Sound, 3(2).

every piece in the concert you just run a new patch and it automatically
configures.

JM: How important (in any kind of diffusion) is image integrity? And what
issues might arise when we increase the number of channels on the fixed
medium to beyond stereo?

JH: This is really interesting, because it depends what we mean by ‘image.’
Do we mean by the ‘image’ of the piece, the image that you retain of the
piece as it unfolded in time; the image of the piece that you have in your
mind; the notion of the piece? Or do you mean the actual ‘sonic image’; the
‘spatial image?’ Because it could be both: they’re not necessarily mutually
exclusive. But they’re not necessarily the same thing. Image integrity seems
to be everything, except that as I said earlier, what we’re trying to do is give
the audience the image of the piece: so image integrity is everything. Except
that, I think the issue is more: is ‘the integrity of the image’ to be taken to
mean there is only one ‘version’ of it? And that’s, I think, where, again, the
architectonic-organic divide might show up: the ‘architectonic brigade’
would say, ‘Yes, there is only one [image] – it’s the one I composed,’
whereas I might be more amenable to saying, ‘Actually, it depends on the
hall.’ I would prefer this thing to be ‘close and frontal,’ but maybe for this,
it’s not so crucial where it is, exactly. So you get the relative thing: it’s a
little bit more malleable; it’s organic; it’s still developing in the course of
performance, just as it’s been developing in the course of composition. So I
would say that image integrity is everything, so long as, for example, if you
have hierarchical things of distance, you retain those, but exactly where you
place them to be ‘close’ and ‘distant’ is less important than the distinction
between them. It’s not that [assumes deliberately surprised and irritated
tone], ‘Hey! This sound came from there, it should have come from here!’; I
don’t worry about that so much. But I do worry about whether I’m getting
the kind of relationships in the piece, globally, that I want. That is what I
would characterise as ‘image integrity’ every bit as much as some people
might think of it as ‘fixed locations’ and so on.

JM: In terms of the relative benefits and disadvantages, how does the
BEAST system compare to other diffusion systems you might have used?

JH: Like the GRM, like Bourges, like all the French ones really, in BEAST
we have lots of different loudspeakers. But, of course, if you have a stereo
pair, they’re a matched pair. So if you’re going to have an eight-channel
array, they need to be the same, I think, because of the eight-channel image.
So what we’re looking at doing is we’re buying multiple eights [of
loudspeakers]. So, if someone’s going out to do a stereo gig, fine, they can
take two of these, four of those, six of those, two of those, two of those, and
a couple of sub-woofers; no problem. On the other hand, if you want to do
an eight-channel gig, then you take out eight of these, eight of these, eight of
these, sixteen of these, thank you very much! You wouldn’t dream of
putting a stereo pair on an ATC and a Genelec 1029 – that’s not a stereo
pair because they aren’t matched. So if you’ve got an eight-channel image,
you need eight matched speakers. The big problem, it seems to me, is where
you only have eight, or where the whole system is made up of the same
speakers, but that’s not that common.

JM: Many electroacoustic concert programmes are likely to contain works
that are highly diverse, both in terms of their technological demands and the
aesthetic principles on which they are based. To what extent can current
diffusion systems cope with this diversity, and how might they be
improved?

JH: Well exactly. That’s where software, and presets and recalls come in.
The way we’ve solved it with the [DACS] 3D and BEAST is that you can
have preset buttons. When you do the track assignments – the channel
assignments for the outputs – the way it works, very simply, is that if you’ve
got something coming into inputs one and two, then inputs one and two can
all be assigned to stereo pairs [of outputs] for example, so everything gets
its input from one and two. If you’ve got eight channels from an eight-
channel device, you would put them into eight different inputs and then just
assign the eight, and then immediately you put the faders up, that device is

running and you get the assignment; it works. That’s just like your patching
thing where you just hook them up; the matrix page on the M2, in
essence.268

JM: In your experience (with BEAST and with other diffusion systems), to
what extent is the potential diffusion determined by the system itself, and
how much of an issue do you think this might be? To what extent do
existing diffusion systems embody specific aesthetic approaches to
composition and diffusion (perhaps at the expense of others) and how might
this be addressed?

JH: It is determined by the system itself, by the speakers and so on, but it’s
mostly determined by the combination of that with the room. My phrase that
I’ve used a few times is, ‘You sculpt the sound in the space, but you sculpt
the space with the sound.’ So it’s a reciprocal thing between the sound on
the tape, the sound of the system and the particular sound in that space.
They’re not independent; you can’t treat any one of those things as
independent of any of the others. A fixed medium piece for eight channels
might sound one way on a set of Genelecs in a given space, but on a set of
ATCs in a different space, it’ll sound completely different. You can’t have
the same conditions all the time in all respects, and those three things
interact: the piece, the system [which includes] the loudspeakers, and the
room. They’re three variables.

JM: I was also trying to come at this particular question from an interfacing
point of view. So, in your experience, how much is your diffusion of any
given piece restricted – or indeed enhanced – by the fact that you’re having
to do it on a ‘one fader, one loudspeaker’ basis, where the faders are
arranged, say, [from left to right] mains, wides, et cetera?

JH: Well, inevitably it is influenced by that, because there are physical
constraints. You can of course re-patch or re-jig and get it to be better, and
easier to do certain things. I’ve done a lot of things in France, for example,

268 See section 5.4.2.4 (page 254).

where they tend to start at one end of the mixer with the speakers that are
the furthest away from you in front, and then [gestures from left to right] the
next pair, the next pair, the next pair, and this end [gestures to the right] is at
the back. That’s absolutely tickety-boo if you want to do lots of front-to-
back or back-to-front things, but if you want to get the most significant
speakers near the audience to be going [makes antiphonal sounds and hand
gestures], you’ll find that there’s a pair there, a pair there, a pair there, and a
pair there [indicates four completely separate areas of an imaginary mixing
desk]; you can’t get that interaction, which is why we don’t do that in
BEAST. We put those main speakers in a group on the faders so that you
can do it easily. But then of course it means that if you want to go front-to-
back, it’s awkward. So, nothing’s perfect! Unless you could suddenly re-
assign the faders mid-diffusion. You could do that, couldn’t you? [Laughs].

JM: The bottle-neck as I see it, is the fact that most systems work on a ‘one
fader, one loudspeaker’ basis. While you’ll have the advantage of knowing
exactly what everything does (that’s the positive side), the negative side is
that you essentially have an arbitrarily organised set of controls that are
fixed. So there will be things that are just going to be very difficult to do.

JH: That is true.

JM: If you could imagine an ‘ideal’ diffusion system, what would it be like?
What does the future hold for sound diffusion?

JH: An ideal diffusion system would be one that makes the music sound
good. And what the future holds for diffusion is that I hope to God we carry
on with it!

JM: Any developments you see as somewhat inevitable?

JH: I think the move will be, more and more, to multichannel, and therefore
an attempt to ‘fix’ everything. And people will therefore think that diffusion
really isn’t necessary after all. On the other hand, even in America now they

do a lot more diffusion, so in one sense I don’t think it’s going to die. I think
it’s seen as a necessary thing and that there is still some need for live
interaction, at some level, in performance.

JM: So you could say it’s as necessary as having a sound engineer – you
need to have someone ‘on hand’ to make sure that it all goes according to plan.

JH: I’m not sure about the ‘sound engineer’ analogy. Do you mean
somebody who does sound at a gig?

JM: Just the fact that you have to have someone there, controlling it, rather
than it being ‘automatic’.

JH: Yes, yes. I think so. Otherwise you would just go into a room full of
loudspeakers and sound would suddenly emanate from them for no
particular reason. I think somebody needs to be interacting with that sound
in real-time, in that space. And the best place to do that, is in the middle of
the audience, or at least as part of the audience somewhere, not from some
control booth off to the side or at the back. Your ears need to be in the same
physical space as the audience’s ears.

JM: Professor Harrison, thank you very much.

Appendix 2: Design of the M2 Sound
Diffusion System Control Surface

The M2 Sound Diffusion System Control Surface was briefly described in
Chapter 5 and is pictured in Figure 30 (page 241). It was designed and built
by the author specifically for the purpose, taking into consideration many of
the issues discussed previously. This appendix briefly accounts for and
documents the design.

A Modular Control Surface

A key objective in the conception of the control surface was to obtain a
generic design that would allow performers to select control modules
(faders, joysticks, buttons, foot pedals, et cetera) appropriate to their needs.
Accordingly, the M2 Control Surface is, internally, modular in design. The
present prototype consists of two ‘M2 Atomic Splitter’ modules and eight
‘M2 Four-Vertical-Fader’ modules. These will be described shortly.

Voltage to MIDI Conversion

As described in Chapter 5, communication between the control surface and
the SuperDiffuse client application is via the MIDI protocol. An Ircam AtoMIC
is used to derive the MIDI controller messages required by the client
application. This allowed for the development of the control surface
prototype to take place relatively quickly, as the AtoMIC was already
available at the University of Sheffield Sound Studios.

At present, the SuperDiffuse client application expects to receive MIDI CC
(continuous controller) numbers 7 and 10 (‘volume’ and ‘pan’) on all
sixteen MIDI channels as a means of externally controlling each of the
thirty-two Physical Faders on-screen. The AtoMIC has been configured
such that the control voltages received at each of its thirty-two analogue
inputs (reflecting the positions of each of the thirty-two faders) are
converted appropriately: analogue inputs 1 to 16 become MIDI CC 7,

channels 1 to 16, while analogue inputs 17 to 32 become MIDI CC 10,
channels 1 to 16. This AtoMIC configuration is saved as a ‘patch’ that
automatically loads when the AtoMIC is powered on.
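
To make the mapping concrete, the following C++ sketch reproduces it in code (illustrative only: the conversion actually happens inside the AtoMIC, and the type and function names here are hypothetical).

    #include <cmath>

    // Channel is 1-16, controller is 7 (volume) or 10 (pan), value is 0-127.
    struct MidiCC { int channel; int controller; int value; };

    // Map one of the thirty-two analogue inputs (numbered 1-32, carrying a
    // control voltage between 0v and +5v) to the CC message SuperDiffuse
    // expects for the corresponding Physical Fader.
    MidiCC toMidi(int analogueInput, float volts)
    {
        MidiCC m;
        m.controller = (analogueInput <= 16) ? 7 : 10;  // inputs 1-16 -> CC 7,
                                                        // inputs 17-32 -> CC 10
        m.channel = ((analogueInput - 1) % 16) + 1;     // both banks span MIDI
                                                        // channels 1-16
        m.value = static_cast<int>(std::lround((volts / 5.0f) * 127.0f));
        return m;
    }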

Power Supply

Power is also supplied to the Control Surface via the AtoMIC, which
provides the necessary +5v DC reference voltage that is scaled by each of
the thirty-two faders. The two-fold relationship between the Ircam AtoMIC
and the Control Surface (i.e. provision of reference voltage and conversion
of scaled control voltages to MIDI) is illustrated in Figure 52.

Figure 52. Diagram illustrating the relationship between Ircam AtoMIC and Control Surface modules.

Physical Interfacing

The Ircam AtoMIC has thirty-two analogue inputs, each of which can
receive a control voltage (in the range of 0v to +5v DC) for conversion into
MIDI CC data. The AtoMIC is also able to provide DC reference voltages,
for use if required. The analogue inputs and reference voltages are provided
via two 25-pin sub-D connectors, each providing sixteen analogue inputs,

reference voltages of +5v, +12v and -12v, and a path to ground. This is
illustrated in Figure 53 and Table 5, below.269

Figure 53. Illustration of the 25-pin sub-D connector, of which the Ircam AtoMIC Pro has two. Pin numberings are shown.

Pin Number   Function
1 – 16       Analogue inputs 1 to 16 / 17 to 32
17           Not connected
18 – 21      Ground
22 – 23      +5v
24           +12v
25           -12v

Table 5. Function of each of the pins of the two 25-pin sub-D connectors provided by the Ircam AtoMIC Pro 2.0.

Generic Module Interface

In order to achieve a modular design, it was determined that a generic
module interface (GMI) would be useful. This is a low-level physical
connection that is able to function regardless of the specific control devices
that are connected to it. Effectively, it is an abstraction layer, of sorts. The
GMI was devised with the Ircam AtoMIC in mind, but the same physical
interface could be used in other scenarios.

It was decided that the GMI should have four control voltage input paths.
This would allow four single-axis control devices (e.g. faders) to be hosted
by a single module. If dual-axis devices (e.g. potentiometric joysticks) were

269 Ircam (2002). AtoMIC Pro Sensors/MIDI Interface User's Manual (Paris; Ircam): 22.

required, the GMI could host two. The GMI would obviously also need to
provide a reference voltage and a path to ground, so six pins in total would
be required. For several reasons (economy, availability, increased reliability,
scope for future expansion), a 10-pin boxed header was chosen as the
physical interfacing method for the GMI. A pin-out diagram for the GMI is
given in Figure 54 and Table 6, below.

Figure 54. Pin numberings for the 10-pin header used for the M2 Generic Module Interface.

Pin Number   Function
1            CV input 1
2 & 7        CV input 3
3 & 8        CV input 4
4 & 9        Ground
5 & 10       +5v
6            CV input 2

Table 6. Function of each of the pins in the M2 Generic Module Interface.

M2 AtoMIC Splitter (M2AS) Module

The purpose of the M2 AtoMIC Splitter (M2AS) module is to ‘convert’
between the connectors on the AtoMIC and the M2 Generic Module
Interface connectors. This printed circuit board is the abstraction layer that
makes the GMI generic, as opposed to an interface exclusively designed for
the AtoMIC. Figure 55 is a circuit diagram for the M2AS module; reference
to Figure 53 and Figure 54 will be helpful.

Figure 55. Circuit diagram for M2 AtoMIC Splitter Module (M2AS) v1.0. (Components: CN1, 25-pin sub-D, female; CN2 to CN5, 10-pin boxed headers, male; R1, 1.1k; LED1.)

It can be seen that the M2AS circuit ‘splits’ the 25-pin sub-D connector into
four Generic Module Interface headers. Because the AtoMIC has two 25-pin
connectors, two M2AS modules are required to access the full thirty-two
analogue inputs. An LED is included for the purposes of testing. A facsimile
of the M2AS circuit board itself is given in Figure 56; the 25-pin connector
(at the bottom-right of the PCB) and four Generic Module Interface
connectors (above, arranged horizontally) can clearly be observed.

Figure 56. Printed circuit board
design for the M2 AtoMIC
Splitter module v1.01, as viewed
from component side (actual
size). Dimensions: 70 x 45 mm.

M2 Four-Vertical-Fader (M2FVF) Module

The M2 Four-Vertical-Fader (M2FVF) module is a printed circuit board that
hosts four ALPS 100mm-throw slide potentiometers (faders) aligned
vertically; eight such modules are included in the current Control Surface
prototype, giving a total of thirty-two faders. The circuit diagram and a
facsimile of the PCB itself are shown in Figure 57 and Figure 58,
respectively. In Figure 58, the Generic Module Interface can be seen at the
top of the PCB, with housings for each of the four faders positioned below.

Figure 57. Circuit diagram for M2 Four-Vertical-Fader Module (M2FVF) v1.0. (Components: CN6, 10-pin boxed header, male; VR1 to VR4, 10k log. slide potentiometers; R2, 1.1k; LED2.)

Figure 58. Printed circuit board
design for the M2 4-Fader
Module v1.1, as viewed from
component side (actual size).
Dimensions: 80 x 150 mm.

Internal Control Surface Architecture

Figure 59 describes the relationship between the two M2AS and eight
M2FVF modules and the Ircam AtoMIC interface. Internally, the individual
PCBs are connected with 10- or 25-core ribbon cable. It can clearly be seen
that the control modules themselves are abstracted from the AtoMIC via the
M2AS modules. This means that alternative control paradigms can easily be
implemented in place of the vertical faders, so long as the Generic Module
Interface is observed. Equally, the AtoMIC itself could be dispensed with in

a future design (it is neither the most elegant nor cost-effective solution, but
was convenient for the design of a prototype) without necessitating a re-
design of the control modules. These are some of the advantages of the
modular design.

Figure 59. Use of M2AS modules to realise modular hardware architecture using the Ircam AtoMIC voltage-to-MIDI converter, as implemented in M2 Control Surface v1.0.

External Casing

All of the Control Surface components are housed inside an aluminium box
designed by the author and built at the Department of Electrical and
Electronic Engineering at the University of Sheffield. The box consists of
two separate sheets of aluminium, cut to size, and bent to form the top and
bottom halves. The top of the box provides thirty-two slots – one for each of
the faders – and holes for the two 25-pin sub-D connectors. The bottom of
the box simply completes the enclosure. The original CAD designs provided
to the workshop are shown in Figure 60 and Figure 61.

Figure 60. Design for the top half
of the M2 Control Surface box.
Dimensions in millimetres.

Figure 61. Design for the bottom
half of the M2 Control Surface
box. Dimensions in millimetres.

Bibliography

Austin, L. (2000). "Sound Diffusion in Composition and Performance - An Interview with Denis Smalley". Computer Music Journal, 24(2): 10-21.

Barrett, N. (2002). "Spatio-musical Composition Strategies". Organised Sound, 7(3): 313-323.

Barrière, F. (1998). "Diffusion, the Final Stage of Composition". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 204-209.

Berenguer, J. (1998). "Music-space". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 219-220.

Blumlein, A. (1931). "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems". Pat. App. No. 34657/31, International Pat. No. 394325. Alan Blumlein Official Website. Available online at: http://www.doramusic.com/patents/394325.htm.

Boesch, R. (1998). "Composition / Diffusion in Electroacoustic Music". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 221-225.

Bukvic, I. (2002). "RTMix - Towards a Standardised Interactive Electroacoustic Art Performance Interface". Organised Sound, 7(3): 275-286.

Canadian Electroacoustic Community. "CEC Conference" Mailing List. (Montreal QC.; CEC). Website available at: http://www.concordia.ca.

CDP (2005). "CDP Home Page". Website available at: http://www.bath.ac.uk/~masjpf/CDP/CDP.htm.

Chion, M. (2002). "Musical Research: GRM's Words". (Paris; INA). Website available at: http://www.ina.fr/grm/presentation/mots.light.en.html.

Clozier, C. (1998). "Composition, Diffusion and Interpretation in Electroacoustic Music". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 233-265.

Clozier, C. (1998). "Presentation of the Gmebaphone Concept and the Cybernephone Instrument". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 266-281.

Cutler, M., Robair, G. & Bean (2000). "The Outer Limits - A Survey of Unconventional Musical Input Devices". Electronic Musician. Electronic journal available at: http://emusician.com/mag/emusic_outer_limits/.

DACS (Unknown). "3D Sound Diffusion Console". Website available at: http://www.dacs-audio.com/Custom/3d_console.htm.

Decker, S. (2005). "Shawn Decker - Installations". Website available at: http://www.artic.edu/~sdecker/Finstallations.html.

Dhomont, F. (1995). "Acousmatic Update". EContact! 8(2). (CEC). Electronic journal available at: http://cec.concordia.ca/contact/contact82Dhom.html.

Doherty, D. (1998). "Sound Diffusion of Stereo Music Over a Multi-Loudspeaker Sound System: From First Principles Onwards to a Successful Experiment". SAN Journal of Electroacoustic Music, 11: 9-11.

Dolby Laboratories (1999). "Some Guidelines for Producing Music in 5.1 Channel Surround". Website available at: http://www.beussery.com/pdf/buessery.dolby.5.1.pdf.

Dolby Laboratories (1999). "Surround Sound: Past, Present and Future". Website available at: http://www.dolby.com/assets/pdf/tech_library/2_Surround_Sound_Past.Present.pdf.

Eagle, D. (2001). "The aXi0". Website available at: http://www.ucalgary.ca/~eagle/.

Emmerson, S. (1986). "The Relation of Language to Materials". In Emmerson, S. [Ed.] The Language of Electroacoustic Music (London; Macmillan Press): 17-39.

Emmerson, S. (1998). "Aural Landscape - Musical Space". Organised Sound, 3(2): 135-140.

Emmerson, S. (1998). "Intercultural Diffusion: 'Points of Continuation' (Electroacoustic Music on Tape)". In Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in Electroacoustic Music (Bourges; Editions Mnemosyne): 282-284.

Emmerson, S. (2000). "'Losing Touch?' The Human Performer and Electronics". In Emmerson, S. [Ed.] Music, Electronic Media and Culture (Aldershot; Ashgate): 194-216.

Ferrari, L. (1998). Electronic Works. (Compact Disc - BV Haast: CD9009).

Field, A. (2000). "Simulation and Reality: The New Sonic Objects". In
Emmerson, S. [Ed.] Music, Electronic Media and Culture
(Aldershot; Ashgate): 36-55.

Gerzon, M. (1974). "Surround-Sound Psychoacoustics". Wireless World,


80(12): 483-486.

Gerzon, M. (1995). "Papers on Ambisonics and Related Topics". Website


available at:
http://members.tripod.com/martin_leese/Ambisonic/biblio.txt.

GRM (2004). "INA-GRM: Key Dates". Website available at:


http://www.ina.fr/grm/presentation/dates.en.html.

Harley, J. (2000). "Sound in Space 2000 Symposium". Computer Music
Journal, 24(4).

Harmonic Functions (2001). "The AudioBox Disk Playback Matrix Mixer".
(Burnaby BC.; Richmond Audio Design). Website available at:
http://www.hfi.com/dm16.htm.

Harrison, J. (1998). "Sound, Space, Sculpture - Some Thoughts on the
'What,' 'How' and 'Why' of Sound Diffusion". Organised Sound,
3(2): 117-127.

Harrison, J. (1999). "Diffusion - Theories and Practices, with Particular
Reference to the BEAST System". EContact! 2(4). (CEC).
Electronic journal available at:
http://cec.concordia.ca/econtact/Diffusion/Beast.htm.

Harrison, J. (1999). "Keynote Address: Imaginary Space - Spaces in the
Imagination". Australasian Computer Music Conference
(Melbourne; 4-5 December 1999). Available at:
http://cec.concordia.ca/econtact/ACMA/ACMConference.htm.

Harrison, J. (2004). "An Interview with Professor Jonty Harrison". Personal
interview with the author, 8th September 2004. See Appendix 1 for
full transcription.

Harvey, J. (1986). "The Mirror of Ambiguity". In Emmerson, S. [Ed.] The
Language of Electroacoustic Music (London; Macmillan Press):
175-190.

Hindemith, P. & Sala, O. (1998). Elektronische Impressionen. (Compact
Disc - Erdenklang Musikverlag: EK81032).

Howard, D. & Angus, J. (1996). Acoustics and Psychoacoustics (Oxford;
Focal Press).

Hyde, J. (2004). "Projects - Live Sampling". Website available at:
http://www.theperiphery.com/projects/LiveSampling.htm.

Ircam (2002). AtoMIC Pro Sensors/MIDI Interface User's Manual (Paris;
Ircam).

Kirk, R. & Hunt, A. (1999). Digital Sound Processing for Music and
Multimedia (Oxford; Focal Press).

Küpper, L. (1998). "Analysis of the Spatial Parameter: Psychoacoustic
Measurements in Sound Cupolas". In Barrière, F. & Bennett, G.
[Eds.] Composition / Diffusion in Electroacoustic Music (Bourges;
Editions Mnemosyne): 289-314.

Landy, L. (1999). "Reviewing the Musicology of Electroacoustic Music - A
Plea for Greater Triangulation". Organised Sound, 4(1): 61-70.

Lévi-Strauss, C. (1964). The Raw and the Cooked: Introduction to a Science
of Mythology (London; Pimlico).

Lieder, C. (2001). "Colby Lieder - Instruments". Website available at:
http://silvertone.princeton.edu/~colby/Instruments.html.

Malham, D. (1998). "Approaches to Spatialisation". Organised Sound, 3(2):
167-177.

Malham, D. (2001). "Toward Reality Equivalence in Spatial Sound
Diffusion". Computer Music Journal, 25(4): 31-38.

Manning, P. (1993). Electronic & Computer Music (Oxford; Oxford
University Press).

Marinoni, M. (2004). "Marco Marinoni's Home Page". Website available at:
http://www.geocities.com/marco_marinoni.

McNabb, M. (1986). "Computer Music: Some Aesthetic Considerations". In
Emmerson, S. [Ed.] The Language of Electroacoustic Music
(London; Macmillan Press): 141-153.

McNabb, M. (1993). Dreamsong. (Compact Disc - Wergo Schallplatten:
WER 20202).

Messiaen, O. (2000). Turangalîla Symphony / L'Ascension. (Compact Disc -
Naxos: 8.554478/79).

Mooney, J. (2001). "Ambipan and Ambidec: Towards a Suite of VST
Plugins with Graphical User Interface for Positioning Sound Sources
within an Ambisonic Sound-Field in Real Time". M.Sc. thesis
(Music Technology Research Group; University of York).

Moore, A., Moore, D. & Mooney, J. (2004). "M2 Diffusion - The Live
Diffusion of Sound in Space". Proceedings of the International
Computer Music Conference (ICMC) 2004 (1-6 November 2004,
Miami FL.): 317-320.

Moore, D. (2004). "Real-Time Sound Spatialization: Software Design and
Implementation". Ph.D. thesis (Department of Music; University of
Sheffield).

MOTU (2005). "MOTU 24I/O Overview". Website available at:
http://www.motu.com/products/pciaudio/24IO/body.html/en.

NTMusic UK (2004). "Magnetic Tape Distribution". Website available at:
http://www.ntmusic.org/proj_dmu12.htm.

Obst, M. (1998). "Sound Projection as a Compositional Element: The
Function of the Live Electronics in the Chamber Opera 'Solaris'". In
Barrière, F. & Bennett, G. [Eds.] Composition / Diffusion in
Electroacoustic Music (Bourges; Editions Mnemosyne): 321-329.

Parmegiani, B. (1991). De Natura Sonorum. (Compact Disc - INA:
INAC3001CD).

Parmegiani, B. (1996). Sonare. (Compact Disc - INA: E5203-275912).

Puckette, M. (2004). "Software by Miller Puckette". (La Jolla, CA.;
University of California, San Diego). Website available at:
http://crca.ucsd.edu/~msp/software.html.

Roads, C., Kuchera-Morin, J. & Pope, S. (2001). "The Creatophone Sound
Spatialisation Project". (Santa Barbara CA.; University of
California, Santa Barbara). Website available at:
http://www.ccmrc.ucsb.edu/wp/SpatialSnd.2.pdf.

Roads, C., Kuchera-Morin, J. & Pope, S. (2001). "Research on Spatial and
Surround Sound at CREATE". (Santa Barbara CA.; University of
California, Santa Barbara). Website available at:
http://www.ccmrc.ucsb.edu/wp/SpatialSnd.2.pdf.

Roland-Mieszkowski, M. (1989). "Introduction to Digital Recording
Techniques". (Halifax NS.; Digital Recordings). Website available
at: http://www.digital-recordings.com/publ/pubrec.html.

Rolfe, C. (1999). "A Practical Guide to Diffusion". EContact! 2(4). (CEC).
Electronic journal available at:
http://cec.concordia.ca/econtact/Diffusion/pracdiff.htm.

Roth, C. (1997). Being Happier (Sheffield; Kingfield Press).

SARC (2005). "Sonic Lab". Website available at:
http://www.sarc.qub.ac.uk/main.php?page=building&bID=1.

Satie, E. (1999). Six Gnossiennes pour Piano (Paris; Éditions Salabert).

Savouret, A. (1998). "The Natures of Diffusion". In Barrière, F. & Bennett,
G. [Eds.] Composition / Diffusion in Electroacoustic Music
(Bourges; Editions Mnemosyne): 346-354.

Schaeffer, P. (1966). Traité des Objets Musicaux (Paris; Seuil).

SIAL (2002). "Sound Diffusion System". Website available at:
http://www.sial.rmit.edu.au/Resources/Sound_Diffusion_System.php.

Smalley, D. (1986). "Spectro-Morphology and Structuring Processes". In
Emmerson, S. [Ed.] The Language of Electroacoustic Music
(London; Macmillan Press): 61-93.

Smalley, D. (1996). "The Listening Imagination - Listening in the
Electroacoustic Era". Contemporary Music Review, 13(2): 77-107.

Smalley, D. (1997). "Spectromorphology - Explaining Sound Shapes".
Organised Sound, 2(2): 107-126.

STEIM (2004). "STEIM Products - LiSa X". (Amsterdam; STEIM).
Website available at: http://www.steim.org/steim/lisa.html.

Steinberg (2005). "Steinberg". Website available at:
http://www.steinberg.de/Steinberg/defaultb0e4.html.

Stevenson, I. (2002). "Spatialisation, Method and Madness - Learning from
Commercial Systems". Australasian Computer Music Conference
(Melbourne; 6th - 8th July 2002). Available at:
http://www.iii.rmit.edu.au/sonology/ACMC2002/ACMC2002_Web_Proceedings/103-112_Stevenson.pdf.

Stockhausen, K. (1991). Stockhausen 3 - Elektronische Musik 1952-1960.
(Compact Disc - Stockhausen-Verlag: 3).

Tarsitani, S. (1999). "Musica Scienza 1999". Computer Music Journal,
24(2): 88-89.

Truax, B. (1998). "Composition and Diffusion - Space in Sound in Space".
Organised Sound, 3(2): 141-146.

Tutschku, H. (2002). "On the Interpretation of Multi-Channel
Electroacoustic Works on Loudspeaker-Orchestras: Some Thoughts
on the GRM-Acousmonium and BEAST". SAN Journal of
Electroacoustic Music, 14: 14-16.

Tutschku, H. (2004). "Hans Tutschku". Website available at:
http://www.tutschku.com/.

University of Glasgow (2005). "Department of Music: Diffusion System".
Website available at:
http://www.gla.ac.uk/departments/music/studio/diffusion.html.

Various (1990). Computer Music Currents 6. (Compact Disc - Wergo
Schallplatten: WER 20262).

Various (2000). Métamorphoses 2000. (Compact Disc - Musique &
Recherches: MR2000).

Various (2000). OHM: The Early Gurus of Electronic Music. (Compact
Disc - Ellipsis Arts: CD3670).

Various (2001). Cultures Électroniques No. 15: 28e Concours International
de Musique et d'Art Sonore Électroacoustiques. (Compact Disc -
Mnémosyne Musique Média: LCD278074/75).

Vennonen, K. (1994). "A Practical System for Three-Dimensional Sound
Projection". Website available at:
http://online.anu.edu.au/ITA/ACAT/Ambisonic/3S.94.abstract.html.

Waters, S. (2000). "Beyond the Acousmatic: Hybrid Tendencies in
Electroacoustic Music". In Emmerson, S. [Ed.] Music, Electronic
Media and Culture (Aldershot; Ashgate): 56-83.

Waters, S. (2000). "The Musical Process in the Age of Digital Intervention".
ARiADA 1. (University of East Anglia). Electronic journal available
at:
http://ariada.uea.ac.uk:16080/ariadatexts/ariada1/content/Musical_Process.pdf.

Westerkamp, H. (1996). Transformations. (Compact Disc - Empreintes
Digitales: IMED 9631).

Windsor, L. (2000). "Through and Around the Acousmatic: The
Interpretation of Electroacoustic Sounds". In Emmerson, S. [Ed.]
Music, Electronic Media and Culture (Aldershot; Ashgate): 7-35.

Wishart, T. (1986). "Sound Symbols and Landscapes". In Emmerson, S.
[Ed.] The Language of Electroacoustic Music (London; Macmillan
Press): 41-60.

Wishart, T. (1996). On Sonic Art (Amsterdam; Harwood Academic).

Worrall, D. (1998). "Space in Sound - Sound of Space". Organised Sound,
3(2): 93-99.

Worrall, D. (2002). "David Worrall Home Page". Website available at:
http://www.avatar.com.au/worrall/.

Wyatt, S., Ewing, C., Kilbourne, J. C., Oehlers, P., Pounds, M. & Warde, A.
(1999). "Investigative Studies on Sound Diffusion / Projection".
EContact! 2(4). (CEC). Electronic journal available at:
http://cec.concordia.ca/econtact/Diffusion/Investigative.htm.

Zicarelli, D. (2004). "Max/MSP for Mac and Windows". (San Francisco
CA.; Cycling '74). Website available at:
http://www.cycling74.com/products/maxmsp.html.
