

TA225 The Technology of Music

Sound Processes

Chapter 1 Desktop Sound page 3
Chapter 2 Notation and Representation page 107
Chapter 3 Carillon to MIDI page 127
Index page 229

This publication forms part of an Open University course, TA225 The Technology of
Music. Details of this and other Open University courses can be obtained from the
Course Information and Advice Centre, PO Box 724, The Open University, Milton Keynes
MK7 6ZS, United Kingdom: tel. +44 (0)1908 653231, email general-enquiries@open.ac.uk
Alternatively, you may visit the Open University website at http://www.open.ac.uk
where you can learn more about the wide range of courses and packs offered at all
levels by The Open University.
To purchase a selection of Open University course materials visit the webshop at
www.ouw.co.uk, or contact Open University Worldwide, Michael Young Building,
Walton Hall, Milton Keynes MK7 6AA, United Kingdom for a brochure. tel. +44 (0)1908
858785; fax +44 (0)1908 858787; email ouwenq@open.ac.uk

The Open University

Walton Hall, Milton Keynes

MK7 6AA

First published 2004

Copyright © 2004 The Open University

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted or utilized in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the publisher or a licence from the Copyright Licensing Agency Ltd. Details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd of 90 Tottenham Court Road, London W1T 4LP.

Open University course materials may also be made available in electronic formats for use by students of the University. All rights, including copyright and related rights and database rights, in electronic course materials and their contents are owned by or licensed to The Open University, or otherwise used by The Open University as permitted by applicable law.

In using electronic course materials and their contents you agree that your use will be solely for the purposes of following an Open University course of study or otherwise as licensed by The Open University or its assigns.

Except as permitted above you undertake not to copy, store in any medium (including electronic storage or use in a website), distribute, transmit or re-transmit, broadcast, modify or show in public such electronic materials in whole or in part without the prior written consent of The Open University or in accordance with the Copyright, Designs and Patents Act 1988.

Edited, designed and typeset by The Open University.

Printed in the United Kingdom by The Burlington Press, Foxton, Cambridge CB2 6SW.

ISBN 0 7492 5896 9

1.1

TA225 Block 3 Sound processes

Chapter 1

Desktop Sound

CONTENTS
Aims of Chapter 1 4

1 Introduction 7

1.1 The emergence of desktop sound 7

1.2 What’s involved in making a master recording? 8

1.2.1 Assembling the sound elements 9

1.2.2 Editing and mixing 10

1.2.3 Adding effects 11

1.3 Introduction to the Yamaha AW16G Professional Audio

Workstation 12

2 Getting sound in and out 14

2.1 Analogue inputs 14

2.1.1 Peak-to-peak and r.m.s. amplitudes 14

2.1.2 dBm, dBV and dBu 15

2.1.3 Analogue input sensitivities 16

2.1.4 Metering 18

2.1.5 Balanced inputs 19

2.1.6 Phantom power 20

2.2 Analogue outputs 20

2.3 Digital inputs and outputs 22

2.3.1 AES/EBU digital interface standard 23

2.3.2 S/PDIF 28

2.3.3 Multi-channel audio digital interface 29

2.3.4 Other digital methods 30

2.4 Cables and connectors 31

2.4.1 Cables 31

2.4.2 Connectors 32

2.5 Inputs and outputs provided on the AW16G 34

2.5.1 Inputs 34

2.5.2 Outputs 35

2.5.3 Digital I/O 35

3 Storing sound 36

3.1 Current digital storage systems 37

3.2 Hard disk audio recorders 38

3.2.1 Hard disk recorders versus desktop computers 40


3.3 Solid state memory 41

3.3.1 Random access memory 41

3.3.2 Flash memory 42

3.4 Audio file formats 43

3.4.1 AU 44

3.4.2 AIFF 44

3.4.3 RIFF WAVE 48

3.5 Storage facilities in the AW16G 51

4 Editing 52

4.1 Getting the levels right 52

4.1.1 Compression and limiting 53

4.1.2 Expansion and gating 55

4.1.3 Normalisation 56

4.2 Editing processes 57

4.2.1 Analogue techniques 57

4.2.2 Digital techniques 58

4.3 Editing facilities in the AW16G 60

5 Mixing 61

5.1 Analogue mixing techniques 61

5.2 Digital mixing techniques 62

5.3 Controlling the level 63

5.4 Mixing facilities in the AW16G 64

6 Adding effects 70

6.1 Equalisation 70

6.2 Echo and reverberation 74

6.2.1 Echo 74

6.2.2 Reverberation 75

6.3 Flanging and chorus 78

6.3.1 Flanging 78

6.3.2 Chorus 80

6.4 Pitch and tempo 81

6.4.1 Changing pitch 81

6.4.2 Changing tempo 82

6.5 Other effects 84

6.5.1 Invert 84

6.5.2 Stereo imaging 85

6.5.3 Vocoder 86

6.5.4 Envelope follower 86

6.6 Effects provided in the AW16G 87

6.6.1 Input channel processing 87

6.6.2 Effects units 88

6.6.3 Non-real-time effects 89

6.6.4 Summary 89

7 External control 89

7.1 External control connections 90

7.2 External control facilities on the AW16G 91


8 Summing up 91

8.1 The TA225 Course Tune 92

Summary of Chapter 1 93

Appendices 97

Appendix 1 AW16G Input library list 97

Appendix 2 AW16G Preset equalisation library list 98

Appendix 3 AW16G Effects library list 99

Appendix 4 AW16G Reverberation parameters 100

Answers to self-assessment activities 101

Learning outcomes 104

Acknowledgements 106


AIMS OF CHAPTER 1

• To outline the processes involved in producing a master recording of a musical performance using equipment that conveniently fits on a desktop.
• To describe the types of analogue and digital signals audio devices commonly require and/or produce.
• To describe the main methods of transferring sound signals between audio equipment, and to introduce the basic requirements for the cables and connectors used.
• To outline the various methods of storing sound – both analogue and digital – and in particular to describe digital sound storage on hard disks, solid state memory and in computer files.
• To describe how sound sources can be mixed and edited to produce a master recording.
• To introduce the various common effects that can be applied to sound in general and to music in particular, and to show how they can be achieved using both analogue and digital techniques.
• To outline some methods and advantages of remote control of audio devices.
• To give practical experience of a commercial sound recording, editing and mixing software package, and to outline the features and facilities offered by a commercial desktop sound device.

1 INTRODUCTION

This chapter will look at the technology behind the processes involved
in recording musical performances. A case study approach using a
commercial desktop sound device will be used so that you can see how
the technology described is used in practice. In addition, through the
computer activities associated with this chapter, you will get practical
experience of making recordings.
The chapter contains specific details of a number of real devices and
also details of a number of different audio standards. These details are
given for comparative/illustrative purposes and you are not expected
to learn the details of any of them. However, if presented with the
same or similar details in an assessment question, you should be able
to understand and answer questions involving them.
The ‘output’ of this chapter is a master recording of a musical
performance. Chapter 4 in this block will take this master recording as
its starting point and show the processes and technologies involved in
the mass distribution of such a master recording. I shall consider a
‘master’ recording to be a single stereo recording, in other words a
master recording will always comprise two sound channels (the left
hand channel and the right hand channel) which are separate in terms
of the sound they carry, but linked in that they are always stored and
played back together.

1.1 The emergence of desktop sound


In the late 1980s, desktop computers and their programs were becoming
sufficiently advanced to permit the production of typeset material,
which until then had only been possible using traditional labour-
intensive methods (manually setting each line of text using lead blocks) or
mechanical methods (e.g. typesetting machines made by companies
such as Linotype and Monotype). The Apple Macintosh range of
computers together with powerful text layout software packages such
as QuarkXpress were the first to allow users to produce text layouts
that were equal to, if not generally superior to, the quality that was
available using traditional methods. In the twenty-first century, of
course, typesetting features such as fonts, line spacing, layout etc.
are commonly incorporated into word processing packages that are
available to be used with most general-purpose desktop computers.
As prices dropped and consumer-oriented general-purpose computers
became readily available, it became economically viable for individuals to
set up their own typesetting business using equipment that easily
fitted onto a normal desk – hence the birth of desktop publishing. The
same situation has now occurred with sound ‘publishing’.
Until the 1990s, whenever a professional quality recording needed to
be produced, the only way to do this was to use expensive and space-
consuming equipment. This usually meant hiring a studio – either a
large one designed for recording live performances, or, where all the
elements of the recording had been previously recorded ‘on location’, a
special studio where the master recording could be produced. If the
recording was destined for publication, the master would then have to
be reproduced onto records (vinyl), compact cassette, compact disc (CD) or DVD.
However, like desktop publishing, technology now makes it possible
for an individual to produce high quality master recordings using
equipment that will fit on a desktop. ‘Desktop sound’ can now be
achieved with little more equipment than a desktop computer with
perhaps some special audio interface cards. However, dedicated
'desktop sound' devices (often called audio workstations or audio
processors) are now available that are designed to integrate all
the functions required to produce a master recording.
Of course, I am only talking here about the technical side of making a
recording, I am not considering the musical aspects. Musically, it may
still be necessary to hire a studio because of the number of musicians
involved, the complexity of the recording or because of the special
acoustic properties that studios can offer.

ACTIVITY 1 (EXPLORATORY) ................................................................

Can you think of any technical developments that might have had an
influence on bringing about desktop sound?

Comment
Some possible technical developments are:
• developments in technology that allow more electronic circuitry to
be concentrated into a smaller space, therefore permitting
more complex operations to be performed;
• the development of desktop computers that can operate fast
enough to be able to process digital sound data in real time;
• the development of sophisticated techniques for processing and
manipulating digital sound;
• the development of digital storage devices (computer hard disks,
CDs, DVDs, etc.) to enable practical lengths of sound to be stored
in digital form. I

In this chapter we will be using desktop sound as a means to explain


the technical processes that are involved when producing a master
recording of a musical performance. To illustrate the processes, at each
stage real examples will be introduced – via practical work on your
computer using a commercial sound package, and/or by explaining
how the process is achieved in a real dedicated audio workstation.
The emphasis in this chapter will be on today’s digital techniques, but
analogue techniques will be mentioned briefly where appropriate for
completeness and for comparison purposes.

1.2 What’s involved in making a master recording?


As I mentioned above, desktop sound means carrying out the whole
process of making a master sound recording using
equipment that conveniently fits on a desktop. This could involve use
of a general-purpose computer (Windows or Apple Macintosh etc.)
with suitable interfaces and software. However, dedicated audio
processors are now being used since they are available at a price
that is not beyond the individual with a need for, or a serious
interest in, making professional quality recordings.
Whichever desktop system is used (computer or dedicated audio processor), the basic processes involved in making a master recording are the same. These processes are: assembling/recording the sounds that are to be used, editing, mixing and adding effects, as illustrated in Figure 1. In this section I will outline these processes, and in later sections I will expand on them and introduce the technologies that are involved. Then, in Chapter 4 of this block some of the more commercial aspects of the production sequence will be described.

Figure 1 Producing a master recording (analogue inputs and a digital input feed the sound recording device, where the sounds are assembled, edited, mixed and have effects added to produce the final stereo master recording)

1.2.1 Assembling the sound elements


The first step is to get all the raw sound material or sound elements
available in electronic form (digital or analogue). A sound element is
any section of sound that is (or may be) used in creating the master
recording. This could be anything from a single drum hit that is to be
looped to create a rhythm track to a complete piece of music (which
could be of just one instrument or a complete orchestra). Making these
sound elements could involve using microphones to record ‘live’ sound,
or could simply involve plugging an electronic keyboard or other
electronic instrument into the desktop sound device. There may be
some parts of the recording that are already in electronic form – perhaps
on a CD, or stored in a computer sound file for example. For live
recordings, more than one recording or ‘take’ of a whole piece or of
just one section may be made so that the best version can be chosen later.
This also allows the possibility of correcting any small imperfections in
an otherwise good recording by using sections from the other takes.
If there is more than one sound source an important decision may have
to be made right from the start: should the final blend or mix of the
various sources be achieved as the recording is being made, or should
the individual sources be recorded separately (even possibly at
different times) so that they can be mixed together at a later stage? Each
separately recorded sound element is known as a track. Note that a
track usually refers to a single sound source, whereas a channel is
used for a linked set of tracks, for example two tracks forming a stereo
pair. However, the two terms are often used interchangeably.

ACTIVITY 2 (EXPLORATORY) ................................................................

Can you think of any considerations that might influence whether to


produce the final mix during the recording or at a later stage?

Comment
My list of the main considerations is given below, but you may have
thought of some others.
• Does the desktop sound device have facilities to record multiple
tracks?
• If the device can record multiple tracks, are there sufficient
available to record all the sound elements of the piece of music?
• How easy (or expensive) is it to recreate the original recording set-
up if a re-recording becomes necessary at a later date?
• Are all the elements of the recording available at the same time?
• Can any ‘effects’ be applied to individual sound element(s) as the
recording is being made?
You may not have thought of this last point, but as you will see as you
study this chapter, some effects can involve a great deal of processing
and they may not be able to be done fast enough to keep up with the
progress of the recording. I

For all but the simplest recordings (such as where a stereo
recording is being made using just a pair of microphones), and
assuming suitable equipment is available, it is better to record
the elements separately and then combine them later. This is because
the blending can be done at leisure and away from the pressures of
time that a recording session brings, and so hopefully produce a better
result. In addition, if there is a musical or technical problem with one
sound element that cannot be corrected later, it might be possible to re-
record this one element on its own and therefore save the cost of
having to re-record the whole piece.
Assembling the sound sources involves input and output connections,
signal levels, cables and connectors, and this is the subject of Section 2. In
addition, the sound elements have to be recorded (or stored)
somewhere, and this aspect is studied in Section 3.

1.2.2 Editing and mixing


Once the raw material has been recorded, the various sound elements
have to be combined together in the correct proportions to form a
stereo master. This process is called mix-down. Sometimes it is
necessary to carry out an intermediate stage whereby a few of the
elements are combined together, possibly with the addition of some
new material, before the final mix-down is carried out. Such a
procedure is called bounce or ping-pong recording.
Some editing either of individual sound elements or the final mixed
version may also be needed. In its simplest form this could just be
fading in the signal at the start of the piece, and fading it out at the
end. However, it may also involve cutting out, copying or swapping
sections between takes because of imperfections in the original
recording or the particular take that is deemed to be the best overall.

You have already carried out some simple editing operations using
your computer earlier in the course.
This stage may also incorporate some alteration of the overall levels or
dynamic range of individual tracks or groups of tracks to make best use
of the available dynamic range and so to maximise the signal-to-noise
ratio.
Increasingly common now is the ability to create a scene list or edit list
which contains a list of the editing actions that are to be carried out in
real time as the recording is being played. For example, the edit list
may contain instructions to switch on and switch off different tracks at
predefined points – a process known as punch in and punch out
respectively.
An advantage of using an edit list is that the original tracks are usually
not altered, and so re-editing can be carried out as many times as
required without the original material being changed or destroyed – a
process known as non-destructive editing. Another advantage is that
re-editing can be done very quickly, and the new edit can easily be
compared with the previous one. Edit lists also allow effects to be
switched on and off and other operations to be carried out in real time
as playback proceeds.

1.2.3 Adding effects


At various points in the editing and mixing process, effects may need
to be applied. Common effects that can be applied are equalisation
(treble, mid, bass etc.), reverberation and chorus, but there are many
other special effects that can be used. These may even include
changing the pitch of a piece without changing its duration (or vice
versa). These effects may need to be applied to individual tracks, to
groups of tracks or to the final stereo master.
As I mentioned earlier, in the following sections, I will describe the
processes outlined above in more detail, and I will explain the
technological principles, practices and standards which are used.
The basic principles of microphones and electrical representations of
sound in both analogue and digital form were introduced in Chapter
6 of Block 1, so I will start at the point where the signal representing
the sound source(s) (microphone, guitar pickup, electronic instrument,
etc.) is available in electrical form. My finishing point in this chapter
will be a master stereo recording which can then be mass produced in
various forms, or transmitted (e.g. broadcast by radio or sent over the
Internet). This is the subject of Chapter 4 in this block.
Although digital techniques have largely taken over from analogue
techniques in many of the stages, for completeness, I will mention
analogue methods where appropriate. Many of the topics I will be
introducing are just as applicable to a general-purpose desktop
computer with suitable sound processing software, and to other sound
processing and recording devices as they are to a dedicated desktop
sound device.
At each stage, you will carry out a number of practical activities to illustrate the principles and procedures involved. In addition, the features and facilities provided by an actual audio processor will be described to show how the technology is used in a real dedicated device. So, before we delve into the details of desktop sound, in the next section I will give you a short introduction to this device.

ACTIVITY 3 (SELF-ASSESSMENT) ...........................................................

What is meant by non-destructive editing and how can it be achieved? I

1.3 Introduction to the Yamaha AW16G Professional


Audio Workstation
To give the following sections on desktop sound some context, I will
use a real dedicated desktop sound device as a case study at each stage
of the discussion. The device I have chosen to use is the Yamaha
AW16G Professional Audio Workstation, which was first produced
in 2002. This unit is a development of the small portable multitrack
sound recording devices that came out in the 1980s. The most common
of these was the Tascam ‘Portastudio’ which contained a simple
analogue audio mixer and used high quality compact cassettes to
record up to 4 tracks. For a time these devices were quite popular with
amateur musicians, but their use of analogue recording on compact
cassettes limited the quality of the recordings that could be made.
The AW16G contains features of all the processes involved in
producing a master audio recording, from sound input right through to
producing a master CD. Although it has a ‘professional’ specification it
costs (in 2003) about the same as a general-purpose desktop computer
system, and so it is not beyond the means of an amateur musician.
Figure 2(a) is a photograph of the unit, and Figure 2(b) is a
diagrammatic view of the front panel controls with the various
sections marked. Since the unit incorporates many sophisticated
features, I will not be explaining the function of all the front panel
controls and will cover only those concerned with the basic
procedures in producing a master recording that I have outlined
earlier.
The AW16G contains a 16-track hard disk digital recorder. Apart from
the input and output stages, all processing of the sound signals is done
digitally at the standard sample rate of 44.1 kHz (as used in compact
discs). Analogue sound inputs are quantised using 24 binary bits and
internally the device works with 32 bit sample values, but the hard
disk recorder section only uses 16 bits. This variation in the number of
bits per sample may seem a little odd, but the reasons for this will
become clear as you study this chapter. Frequency response is quoted
as 20 Hz to 20 kHz, with a dynamic range of 103 dB and less than
0.03% distortion. The standard hard disk capacity is 20 Gbytes
(1 Gbyte = 1 thousand million bytes).
In essence, you might like to think of the workstation as a notebook
computer running a dedicated sound processing program with an
integrated multitrack sound mixer, but of course the whole design of
the ‘computer’ part and its program is optimised to the sole task of
recording and processing sound signals.

Figure 2 (a) Photograph of the AW16G Professional Audio Workstation; (b) diagrammatic view of the front panel controls showing the various sections (input/output, selected channel, work navigate, display, data entry/control, quick navigate, quick loop sampler, locate, mixer and transport sections)

There is a set of ‘cursor’ controls that allow selections to be made and


operations to be controlled. A small display is provided through
which most of the user operations are carried out.
As you study the following sections, you will be introduced to more of
the features of this device. However, before going on, this is a good
point at which to get familiarised with the course’s music recording
and editing software which is provided as part of the course.

ACTIVITY 4 (COMPUTER) ....................................................................

Follow the steps associated with this activity which you will find in
the Block 3 Companion. These will give you a basic familiarisation with
the course’s music recording and editing software. I

2 GETTING SOUND IN AND OUT

In our overall look at the recording and mastering process, I mentioned


that the first stage is to assemble the sound elements. In this section,
therefore, I will look at the various analogue and digital inputs and, for
completeness, the outputs that are commonly provided in an audio
processing system.

2.1 Analogue inputs


The term sensitivity is often used to indicate the sort of signal
amplitude that the input is able to accept. There are two main types of
analogue audio inputs – a high sensitivity input designed for low-level
signals, e.g. microphones; and a less sensitive input for signals from
synthesisers and other audio devices. This less sensitive input is often
called a line input (as it is the sort of level that can successfully be
sent down long ‘lines’ of electrical cables without signal loss becoming
a problem). Sometimes these inputs are combined into a single ‘dual-
purpose’ input and either the input is designed to cope with the whole
range of signal amplitudes, or there is an associated selector switch to
select the input type. How is the ‘sensitivity’ of an analogue input
specified? To look at this, we need to return to the discussion of
amplitudes and decibels that were first introduced in Sections 6 and
10 of Chapter 1 in Block 1.

2.1.1 Peak-to-peak and r.m.s. amplitudes


In Section 6 of Chapter 1 in Block 1, two ways of specifying the
amplitude of an electrical sine wave signal were introduced. The first is
the difference in voltage between the positive and negative peaks of the
signal – the peak-to-peak amplitude; the second is the root-mean-square
(r.m.s.) value. The r.m.s. value of a sine wave is defined as the peak
amplitude (or half the peak-to-peak amplitude – see Figure 3(a))
multiplied by a factor of approximately 0.71 (the actual value is 1/√2).
This r.m.s. value is the amplitude of the constant source that would
be needed to deliver the same energy as the sine wave delivers
averaged over time.

Although the formula for the r.m.s. value is only exact for sine wave
signals, it is sufficiently accurate to be used as a means of specifying
the 'volume' of a real sound signal.

Figure 3 (a) Peak and peak-to-peak values of a sine wave; (b) r.m.s. value of a general sound signal (r.m.s. amplitude = 0.71 Vp, where Vp is the peak value)

Using the term volume implies some sort of average level, e.g. the
volume setting on your hi-fi when listening to a piece of music – there
will be loud passages and soft passages, but there is an overall volume
control setting that gives a comfortable listening level. Thus input
sensitivities are usually specified in terms of r.m.s. values as
illustrated in Figure 3(b).
However, as you will see later, peak amplitudes are also important, as
they give a measure of the largest instantaneous signal level that needs
to be accommodated.
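If you have access to a programming language such as Python, you can check these relationships numerically. The short sketch below (purely illustrative – it is not part of the course software) generates one cycle of a sine wave and computes its peak, peak-to-peak and r.m.s. values, confirming that the r.m.s. value comes out at roughly 0.71 times the peak value.

```python
import math

# Generate roughly one cycle of a 1 kHz sine wave with a 0.5 V peak amplitude,
# sampled at 44.1 kHz (the CD sample rate used later in this chapter).
sample_rate = 44100
frequency = 1000.0
peak = 0.5
n = sample_rate // 1000  # approximately one cycle's worth of samples
samples = [peak * math.sin(2 * math.pi * frequency * i / sample_rate)
           for i in range(n)]

peak_to_peak = max(samples) - min(samples)                    # difference between extremes
rms = math.sqrt(sum(s * s for s in samples) / len(samples))   # root-mean-square value

print(f"peak         : {max(samples):.3f} V")
print(f"peak-to-peak : {peak_to_peak:.3f} V")
print(f"r.m.s.       : {rms:.3f} V  (peak x 1/sqrt(2) = {peak / math.sqrt(2):.3f} V)")
```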

ACTIVITY 5 (REVISION) .......................................................................

(a) A sound signal has a peak-to-peak amplitude of 20 mV (1 mV is


one thousandth of a volt). What is the r.m.s. value of this signal?
(b) A sine wave sound signal has an r.m.s. value of 71 mV. What is
the peak value of this signal?

Assume 1/√2 is approximately 0.71. I

2.1.2 dBm, dBV and dBu


In Section 10 of Chapter 1 in Block 1 you saw that because of how we
hear, it is more convenient to represent sound levels using a scale
based on equal multiplications. You saw that sound pressure levels
are often specified using the decibel (dB) scale where, if 0 dB is
defined as the sound pressure level (SPL) of the quietest sound, an
SPL of 130 dB represents the threshold of pain. You should also
remember that the decibel scale is a relative scale, and that it is
commonly used with other quantities that have a wide range.
This being so, if SPL is measured in decibels, then it is likely that any
representation of sound in another form will also more conveniently
be specified using a decibel scale (with an appropriate 0 dB reference
value). This is indeed the case for analogue electrical representations
of sound as you will see below.

ACTIVITY 6 (REVISION) .......................................................................

A sound has an SPL of +40 dB (with 0 dB SPL being defined as in the


text above). What is the SPL of:
(a) a sound which has half the sound pressure level;
(b) a sound that has 100 times the sound pressure level?
Note you may need to refer to Table 1 in Block 1 Chapter 1. I

If you had trouble answering Activity 6, have a look at Box 1 which


gives some handy tips on working out decibel values.

Box 1 Tips on remembering decibel values


You are not expected to learn any decibel values and their equivalent ratios, and relevant tables
will always be available for any assessment question. However, you might find it helpful to remember
the two common values of 6 dB and 20 dB: +6 dB is a doubling, –6 dB is a halving, +20 dB is ten
times and –20 dB is one tenth. With these basic values, a number of other factors can be determined
by adding or subtracting decibel values. For example, a factor of 200 is the same as 10 × 10 × 2
and can therefore be thought of as a 10 multiplied by another 10 followed by a doubling, i.e. 20 +
20 + 6 = 46 dB. Similarly, a factor of 5 is 10 ÷ 2 or 20 – 6 = 14 dB.

As I mentioned above, when sound is converted into its analogue


electrical equivalent, the range of voltage levels needed to represent the
sound has the same wide range as the original sound pressure waves.
Thus sound voltage levels are also commonly measured in decibels.
Like the SPL scale for sound pressures, in electrical terms it would
clearly be helpful to have standard voltage levels for sound signals, so
there are three common standard 0 dB references – dBm, dBV and dBu.
The dBm scale is different from the others because it refers to ratios of
powers rather than amplitudes. However, it can often be approximated
to the dBu scale where 0 dBu is defined as an r.m.s. signal amplitude
of 0.775 V as explained in Box 2. Because these scales are very similar,
just ‘dB’ is often used with the ‘m’ or ‘u’ being omitted. The dBV scale
is similar to the dBu scale, but is referenced to an r.m.s. amplitude of 1
volt rather than 0.775 V (which makes 0 dBu = –2.2 dBV).

Box 2 The dBm and dBu scales


The dBm scale is a measure of the power of an electrical signal with 0 dBm
being defined as 1 mW of power. This amount of power is dissipated when a
constant voltage of √0.6 volts (or approximately 0.775 V) is placed across a
600 ohm resistance (since voltage² = power × resistance).
Although 600 ohms is a common impedance in audio connections, it would
be impractical for all audio inputs or outputs to use this impedance. The dBu
scale overcomes this limitation and 0 dBu is defined simply as an r.m.s.
voltage level of 0.775 volts.

These reference levels are all around the level of a line signal (see
under Section 2.1 above) which means that low-level signals from
devices such as microphones have negative decibel figures (i.e. they
have a lower value than the reference level) and signals with
amplitudes greater than ‘line’ level (e.g. the sort of levels required by
loudspeakers) have positive decibel values. In contrast, for the SPL
scale where 0 dB is specified as the quietest sound pressure level, any
audible sound will always have a positive decibel value.
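These scales are straightforward to calculate: an r.m.s. voltage V corresponds to 20 log₁₀(V/0.775) dBu or 20 log₁₀(V/1) dBV. The following Python sketch (again purely illustrative, not part of the course software) converts between r.m.s. voltages and the dBu and dBV scales, using levels that appear elsewhere in this chapter.

```python
import math

DBU_REF = 0.775  # 0 dBu reference level, volts r.m.s.
DBV_REF = 1.0    # 0 dBV reference level, volts r.m.s.

def volts_to_db(volts, ref):
    """Convert an r.m.s. voltage to decibels relative to the given reference."""
    return 20.0 * math.log10(volts / ref)

def db_to_volts(db, ref):
    """Convert a decibel value back to an r.m.s. voltage."""
    return ref * 10.0 ** (db / 20.0)

# A few levels that appear elsewhere in this chapter.
print(f"+4 dBu      = {db_to_volts(4, DBU_REF):.2f} V r.m.s.")    # professional line level
print(f"-10 dBV     = {db_to_volts(-10, DBV_REF):.3f} V r.m.s.")  # consumer line level
print(f"1 mV r.m.s. = {volts_to_db(0.001, DBV_REF):.0f} dBV")     # typical microphone level
print(f"0 dBu       = {volts_to_db(DBU_REF, DBV_REF):.1f} dBV")   # approximately -2.2 dBV
```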

ACTIVITY 7 (SELF-ASSESSMENT) ...........................................................

What are the r.m.s. voltages of the following decibel values?


(a) +6 dBV
(b) –20 dBu
(c) –26 dBV
(d) –60 dBV
As with Activity 6 above, you may need to refer to Table 1 in Block 1
Chapter 1, but you may also find you can work out the answers just by
using the techniques given in Box 1. I

2.1.3 Analogue input sensitivities


As I mentioned at the start of this section, there are two main types of
analogue input in audio equipment – a high sensitivity input for the
connection of low-level signals and a less sensitive ‘line’ input for
higher level signals. Now we have looked at r.m.s. values and absolute
decibel scales we can put some specific detail on these types of input.

A typical input for low-level signals has a nominal sensitivity of


around 1 mV r.m.s. (1 thousandth of a volt) which, as your answer to
Activity 7 should have told you, is the equivalent of –60 dBV. Such an
input is designed for direct connection to microphones, vinyl record
pickups and other sound transducers. A line-level input has a
sensitivity of perhaps a twentieth of a microphone input, say 20 mV
r.m.s. or –34 dBV, and is used for general audio connections between
devices.

ACTIVITY 8 (SELF-ASSESSMENT) ...........................................................

Explain why –34 dBV represents a signal level of 20 mV. I

In practice input sensitivities will vary widely so most inputs have an


associated level control that allows the user to adjust the input level.
This does not mean that the most sensitive input can be used for all
levels of signal since there is often some electronic circuitry before the
level control, and if this circuitry is overloaded, then no amount of
subsequent adjustment of level will rectify the distortion that has
already been introduced.
Sensitivity values are usually given in terms of the input level that
will provide the device’s specified output level with all volume or
level controls set to maximum. In addition the maximum sound level
that an input can handle before distortion occurs within the audio
equipment is also often specified.
Another parameter that is often specified is the electrical impedance of
the input (see Box 3), although there is rarely any indication of how
this impedance varies with frequency, or of the frequency at which it is
measured. Using low impedance inputs means that they are less
sensitive to external interference. However, some types of microphone
and other sound transducers themselves have a high impedance, and if
connected to too low an impedance input, their signal may become
distorted or the frequency response altered by their outputs being
loaded.

Box 3 Electrical impedance


The idea of the impedance of an input or an output has been mentioned earlier
in Block 1, but the following may help to remind you just what electrical
impedance is.
Consider water flowing in a plastic pipe as being the equivalent of electricity
flowing in a wire. If the diameter of the pipe is reduced along a small section of
the pipe by squeezing it, then as long as the water pressure at one end of
the pipe is constant, the rate of flow of water at the other end will reduce.
The more the pipe is squeezed, the less water will flow out of the other end.
In electrical terms, the rate of flow of water is the current flowing in the wire,
the water pressure at the end is the signal voltage applied and the amount of
restriction in the water pipe diameter is called the resistance, and its unit of
measurement is the ohm (Ω). However, the resistance of many electronic
devices varies with signal frequency. When this is the case, the term
impedance is used instead of resistance. Using the term impedance therefore
implies there may be some change in its value depending on the signal
frequency. It is though still measured in ohms.

2.1.4 Metering
Audio devices, then, usually provide a number of inputs with different
sensitivities and each usually has an associated input level control for
fine adjustment of the level. I mentioned that one of the reasons to
provide inputs with different sensitivities is to prevent overloading of
the input circuitry and causing distortion of the signal. Overload must
therefore be avoided at all costs – particularly with digital signals
where even small amounts of overload are very noticeable.
However, in order to maximise the signal-to-noise ratio the input level
control should be set as high as possible so as to use as much of the
dynamic range of the device as possible. How then can the optimum
input level be set? The answer is to provide some sort of sound level
indication.
To monitor the audio level two basic metering systems are used: VU
(volume unit) and PPM (peak programme meter).
VU (volume unit) is a metering system which gives an indication of
the signal level which is roughly proportional to the perceived volume.
However, because the meter does not show short large transients in the
signal (short loud passages), these may cause audio distortion without
there being any indication on the meter. This makes the VU meter of
limited value, especially for use on digital systems.
Sometimes a large signal transient can be so short that no distortion is
audible. PPMs (peak programme meters) therefore are designed to give an
indication only if the transient is large and long enough to be likely to
produce audible distortion. PPMs were originally designed for analogue
signals, but for digital signals a much simpler peak value indication
(however short) is usually all that is needed. Any type of peak level
indication though is a much more useful indication of sound level from
which to set an input level control, particularly where the signal is or will
be digitised. Several different PPM scales exist, but two of the more
common are the British Broadcasting Corporation’s (PPM (BBC)) which is
scaled 0 – 7 and the European Broadcasting Union’s (PPM (EBU)).
There are many different ways of displaying sound levels, from analogue meters to strips of lights with varying numbers of lights in the strip, but most are based on average (VU) or peak (PPM) measurement of the audio signal. Sometimes a meter might indicate both types of measurement by having the peak sound level retained on the display either for a short time or until reset manually. Figure 4 shows some of the more common metering scales for line-level signals compared with a dBu scale.

Figure 4 Some common metering scales for line-level signals compared with the dBu scale: (a) VU, (b) PPM (BBC) and (c) PPM (EBU)
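The difference between average and peak metering can also be illustrated numerically. The sketch below (a simplified illustration of the principle only, not of any particular meter standard) computes an r.m.s.-based 'average' level and a peak level, both in dBu, for a block of samples; adding a very short transient raises the peak reading markedly while changing the average reading by only about 1 dB.

```python
import math

DBU_REF = 0.775  # volts r.m.s. corresponding to 0 dBu

def level_dbu(value):
    return 20.0 * math.log10(value / DBU_REF)

def average_and_peak(samples):
    """Return (r.m.s. level, peak level) of a block of samples, both in dBu."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return level_dbu(rms), level_dbu(peak)

# 100 ms of a steady 0.5 V peak sine wave at 44.1 kHz ...
sine = [0.5 * math.sin(2 * math.pi * 440 * i / 44100) for i in range(4410)]
# ... and the same signal with a brief 4 V spike (a transient) added.
with_spike = list(sine)
for i in range(10):              # the spike lasts only 10 samples (about 0.2 ms)
    with_spike[2000 + i] = 4.0

for name, block in [("steady sine", sine), ("with transient", with_spike)]:
    avg, peak = average_and_peak(block)
    print(f"{name:15s}  average {avg:6.1f} dBu   peak {peak:6.1f} dBu")
```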

2.1.5 Balanced inputs


Audio signals can be affected by external electrical signals that exist all around us. Some of these interferences may come along the power supply lead (such as from the compressor in a refrigerator), but others are due to electrical 'noise' in the atmosphere (such as a long wire acting as an aerial for radio signals). To minimise the second effect, wires that carry audio signals – especially if they are carrying low-level signals – are constructed such that the 'live' signal wire is surrounded or screened by a ground wire as illustrated in Figure 5(a).

To minimise the effects of unwanted noise and interference even further, professional sound systems often provide balanced inputs (and outputs). In a balanced system, the audio signal is carried on two wires which are not only enclosed in a separate metal screen, but are also twisted together as illustrated in Figure 5(b).

Figure 5 Screened leads for audio signals: (a) a single wire and (b) a twin wire for balanced signals

The screen is connected to the audio device's ground connection to provide the protection from interference and the other two wires carry the analogue signal. However, the signals in the two wires are out-of-phase. This means that if they were added they would cancel each other out and the result would be no signal at all (see Figure 6(a)). So, at the receiving end, the phase of one of these signals is reversed, and the result is added to the signal on the other wire. Reinforcement therefore occurs to produce the 'wanted' signal (Figure 6(b)).

What is the advantage of doing this? Both of the signal wires will pick up noise and interference, and as they are twisted together, most of the time this will affect the signals on both wires equally. At the receiving end then, when the phase of one of the signals is reversed and added to the signal on the other wire, any noise and interference which has affected both wires will be cancelled out, as shown in Figure 6(c).

Figure 6 (a) Signals with reversed phase cancel when added together; (b) inverting one signal before addition produces the wanted signal; (c) noise and interference appearing on the signal wires will be cancelled out, leaving only the wanted signal

In addition to their use with low-level signals, balanced connections are particularly useful for long cable runs where noise and interference pickup is more likely.
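The cancellation described above can be modelled in a few lines of code. In the sketch below (a simplified model that assumes perfectly matched wires and ignores cable losses), the wanted signal is sent normally on one wire and phase-inverted on the other, the same interference is added to both, and the receiver inverts one wire before adding; the interference cancels while the wanted signal is reinforced.

```python
import math

# Wanted audio signal: a 1 kHz sine wave, 0.1 V peak, sampled at 44.1 kHz.
signal = [0.1 * math.sin(2 * math.pi * 1000 * i / 44100) for i in range(441)]

# Interference picked up along the cable: 50 Hz hum, 0.05 V peak.
# Because the two wires are twisted together, both pick up the same hum.
hum = [0.05 * math.sin(2 * math.pi * 50 * i / 44100) for i in range(441)]

# What arrives on the two wires of the balanced cable.
wire_a = [s + n for s, n in zip(signal, hum)]    # signal + interference
wire_b = [-s + n for s, n in zip(signal, hum)]   # inverted signal + interference

# Receiver: invert wire B and add it to wire A.
received = [a - b for a, b in zip(wire_a, wire_b)]   # = 2 x signal, hum cancelled

peak_hum_before = max(abs(n) for n in hum)
residual_hum = max(abs(r - 2 * s) for r, s in zip(received, signal))
print(f"hum on each wire : {peak_hum_before:.3f} V peak")
print(f"hum after receive: {residual_hum:.6f} V peak (cancelled)")
print(f"wanted signal is doubled: {max(received):.3f} V peak")
```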

2.1.6 Phantom power


Section 2.2 in Chapter 6 of Block 1 introduced the condenser or capacitor
microphone and explained that this type of sound transducer
needs a constant voltage across the capacitor plates in order to
operate. In addition, due to their very high impedance and low
output, such microphones often incorporate a small amplifier to
boost the signal and lower the impedance. Condenser microphones are
widely used particularly in the professional area because of their
high quality. So, microphone inputs on audio devices often provide a
constant voltage to power these types of microphone.
How is this voltage supplied? Clearly a battery, a separate power source or
an additional set of connecting wires between the microphone and the
audio equipment could be used, but these solutions are not particularly
convenient or practical. So, a constant voltage, called phantom power, is
provided between the ground (screen) wire and the signal wire (or both
signal wires in the case of a balanced system) to supply the microphone
with power (Figure 7). The usual voltage is +48 volts.

Figure 7 Phantom power provides an offset to the audio signal input (the audio signal sits on a constant +48 V level)

This constant voltage has no effect on the audio signal as it simply gives
it a constant offset which is removed in the audio device before the
signal is processed. (Of course the voltage of the phantom power must
be absolutely constant and not be contaminated with noise or other
electrical interference or these will be fed straight into the input along
with the wanted microphone signal.)
Care must be taken to turn the phantom power off when using a microphone that does not require it, otherwise the microphone might be damaged.
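As a simple illustration of the idea that a constant offset can be separated from the wanted audio (in a real input stage this is done with capacitors or a transformer rather than in software, so treat this only as an analogy), the sketch below subtracts the average value of a block of samples and recovers the audio unchanged.

```python
import math

# An audio signal (10 mV peak sine wave) sitting on a constant 48 V phantom supply.
offset = 48.0
audio = [0.01 * math.sin(2 * math.pi * 1000 * i / 44100) for i in range(441)]
at_input = [offset + a for a in audio]

# Remove the constant offset by subtracting the average value of the block.
mean = sum(at_input) / len(at_input)
recovered = [v - mean for v in at_input]

print(f"offset estimated as {mean:.2f} V")
print(f"largest difference from original audio: "
      f"{max(abs(r - a) for r, a in zip(recovered, audio)) * 1000:.3f} mV")
```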

2.2 Analogue outputs


Audio equipment usually provides standard ‘line’ level outputs, but in
addition there might be some special outputs, e.g. for headphones.
The signal level of a line output has to be sufficiently high to ensure that
noise and interference will not become a problem when using long lead
lengths, but not too high to cause difficulties with generating such high
level signals in the audio equipment (or so high as to cause a shock
hazard!). A nominal level of +4 dBm r.m.s. (1.23 volts) with an impedance
of 600 ohms is a common specification for professional audio equipment,
and –10 dBV r.m.s. (0.316 volts) with an impedance of 10 000 ohms is typical for consumer equipment. However, as with analogue inputs, output levels and impedances vary widely, and although the nominal value may be quoted as, say, +4 dBm, the output may be able to supply up to perhaps +20 dBm r.m.s. (7.75 volts) before clipping of the signal occurs, as shown in Figure 8. Clipping can also occur for signals below the specified maximum amplitudes if the output is fed to a low impedance input that 'loads' the output too much (i.e. tries to take more power from the output than it can provide).

Figure 8 Clipping of a signal occurring when an audio device cannot provide a sufficiently high voltage on its output (the waveform is flattened at the maximum positive and negative output voltages)

Headphone and loudspeaker outputs have special requirements since


these devices have very low impedances (say 32 ohms for headphones
and 8–16 ohms for loudspeakers). Loudspeakers in particular can need
substantial signal voltages in order to produce loud sound levels – you
may remember a self-assessment activity on this topic in Chapter 6 of
Block 1 (Activity 15), but see Box 4 for a reminder.

Box 4 Loudspeaker power example


Audio amplifiers are often specified in terms of the output power they can
supply. 30 watts is a common value for a consumer device. How does this
relate to the signal voltage that the amplifier must be able to provide to give
this power output, assuming the loudspeaker has an impedance of 16 ohms?
Power (P) is related to impedance (R) and voltage (V) by the equation:
P = V²/R, or V = √(P × R)

For the loudspeaker example, P = 30 watts and R = 16 ohms, therefore:

V = √(30 × 16) ≈ 22 volts

However, this is an r.m.s. value, so in order to provide this voltage, the
amplifier has to be able to supply a peak voltage of

22/0.7 ≈ 32 volts (see Section 2.1.1)

or a peak-to-peak voltage of twice this value (64 volts).
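The same calculation can be written out directly in a few lines of Python (note that the box rounds 22 ÷ 0.7 up to 32 volts; using the more exact factor of √2 gives a peak of about 31 volts):

```python
import math

power = 30.0      # amplifier output power in watts
impedance = 16.0  # loudspeaker impedance in ohms

v_rms = math.sqrt(power * impedance)   # V = sqrt(P x R)
v_peak = v_rms * math.sqrt(2)          # peak = r.m.s. x sqrt(2)
v_peak_to_peak = 2 * v_peak

print(f"r.m.s. voltage       : {v_rms:.1f} V")    # about 22 V
print(f"peak voltage         : {v_peak:.1f} V")   # about 31 V
print(f"peak-to-peak voltage : {v_peak_to_peak:.1f} V")
```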

ACTIVITY 9 (COMPUTER) .....................................................

In this activity you will use the course’s sound editing software to
investigate what a signal sounds like when it becomes clipped. As this
involves a feature of the editor you have not used before, you will find
detailed steps for this activity in the Block 3 Companion.

Run the course’s sound editing software, load the sound file associated
with this activity and then follow the steps associated with this
activity in the Block 3 Companion.

Comment
I hope you heard how clipping of the signal produces quite audible
distortion in the form of additional harmonics. I
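Clipping is easy to model in software: any sample whose value exceeds the maximum the output can supply is simply limited to that maximum. The sketch below (a simple hard-clipping model, not the algorithm used by the course editor) clips a sine wave and measures a few individual harmonics by correlation; the clipped version shows energy appearing at odd harmonics of the original frequency – the additional harmonics you should have heard in Activity 9.

```python
import math

SAMPLE_RATE = 44100
FREQ = 1000.0          # fundamental frequency of the test tone, Hz
N = 4410               # analyse 100 ms (an exact number of cycles of 1 kHz)

def harmonic_level(samples, harmonic):
    """Magnitude of the component at `harmonic` x FREQ, found by correlating
    the samples with sine and cosine at that frequency (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * harmonic * FREQ * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * harmonic * FREQ * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    return 2.0 * math.sqrt(re * re + im * im) / len(samples)

# A sine wave that is too big for an output stage which clips at +/-1.0 V.
clip_level = 1.0
sine = [1.5 * math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE) for i in range(N)]
clipped = [max(-clip_level, min(clip_level, s)) for s in sine]

for h in (1, 2, 3, 5):
    print(f"harmonic {h} ({int(h * FREQ)} Hz): "
          f"clean {harmonic_level(sine, h):.3f}   clipped {harmonic_level(clipped, h):.3f}")
```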

2.3 Digital inputs and outputs


Unlike analogue audio signals, one cannot necessarily connect a digital
sound source to a digital sound receiver and expect the transfer to
work, even if the physical connections and electrical voltages used for
the digital signal are compatible. This is because as well as physical
connections and signal voltages being compatible, the digital data
being sent by the source must also be in the form that the receiver is
expecting. In other words there must be a protocol to which both
source and receiver adhere before the transfer will work.

ACTIVITY 10 (EXPLORATORY) ................................................................

Section 5.6 of Chapter 6 in Block 1 introduced the idea of serial


transmission of a digital sound signal. Using such a serial signal, can
you think of any problems which might occur which would prevent a
digital transfer working when there is no protocol?

Comment
There are a number of problems that can occur, my initial list is given
below, but you may have thought of other ones:
• the rate at which the individual bits of the serial data are sent is
not known;
• the ‘sense’ of the digital data is not known (i.e. the receiver might
interpret a binary 1 as a 0 and vice versa);
• the receiver does not know how to detect when the bits for one
sound sample end and bits for the next begin. I

Since the advent of digital audio, various manufacturers have developed


different proprietary methods for digital sound connections. In the
mid-1980s, the Audio Engineering Society (AES), the British Broadcasting
Corporation (BBC) and the European Broadcasting Union (EBU) produced
a standard interface called the AES/EBU digital audio interface that
embraced most of these proprietary methods, and was also applicable
to both professional and domestic use.
Alongside the development of this standard, the Sony and Philips
companies jointly developed a similar standard for consumer use.
This is called the Sony/Philips Digital Interface or S/PDIF (often
pronounced “spif-dif” or “s-p dif” and the slash is sometimes
omitted). Also, Sony, Mitsubishi, Neve and SSL jointly developed a
multi-channel audio digital interface (MADI) for professional use in
studios when transferring digital audio between mixing consoles and
multitrack recorders etc. In terms of the basic format and transmission method, both of these standards are identical to the AES/EBU standard, but the connectors and some of the control data that the signal incorporates are different.
All these interfaces have now been incorporated into the IEC958
standard, but I will refer to them by their original names as these are
still in common use (in 2003).

2.3.1 AES/EBU digital interface standard


The AES/EBU digital interface standard is a specification for sending
digital audio signals from one device to another. This physical
connection can be either an electrical connecting cable or, as you will
see later, an optical fibre connection using light.
From Section 5 in Chapter 6 of Block 1, you will remember that
digitising sound involves sampling the electrical analogue
representation of the sound at fixed time intervals, and then coding the
value of this sample as a set of binary 1s and 0s. In order to transfer
these samples to another device, it is necessary not only to transfer
these binary 1s and 0s, but to do so in such a way that the samples and
their values are not mixed up. In addition, the transfer must be able to
be done in real time otherwise it would not be possible to listen to the
sound as it was being transferred, and the samples would have to be
stored and the sound played from these stored samples once the
transfer had been completed.
At first thought, it might be considered reasonable to send the sound
data one sample at a time, with all the bits of each sample being
transferred at the same time in parallel using multiple connections as
Section 5.6 of Chapter 6 in Block 1 mentioned. However, although this
method would be fast, it would be very cumbersome and inconvenient.
It would be much better if a system could be devised whereby all the
data was sent serially down a single connection.
In an AES/EBU connection, the individual bits of each sound sample
are sent one after the other (serially) down a single wire as one of two
voltage levels – one indicating a binary 1, the other a binary 0; the bits
for any other sound track (the second channel for stereo) are then
similarly sent before the bits for the next sample of the first channel
are sent. The specification states that the rate at which the bits are sent
(the data rate) must be such that a single sample from each (or all)
channel(s) can be sent within one sampling interval (i.e. the time
between samples) so that the transfer can operate in real time.
Adopting such a scheme means that:
• the bandwidth (frequency response) of the connection must be able
to cope with the data rate of the serial signal;
• the receiver must be able to recreate the original binary stream of 1s
and 0s even in the presence of noise and a degraded signal, i.e. it
must be able to synchronise itself to the serial stream;
• the stream of data bits must contain information to enable the
receiver not only to identify the starts and ends of each sample, but
also to know how many sound channels there are and to which
channel a sample belongs.

ACTIVITY 11 (EXPLORATORY) ................................................................

What is the data rate of a single digital sound signal that has been
sampled at 44.1 kHz (44 100 Hz) and quantised using 16 binary bits?
Ignore any additional synchronisation or control data.

Comment
If the sample rate is 44.1 kHz, then a new sample from both sound
channels must be sent every 1/(44.1 × 10³) s ≈ 23 µs (23 millionths of a
second).
Each sample contains 16 bits and there are two sound channels, so
within the above time, 32 bits of data need to be sent along the serial
connection. Thus, in order to send this number of bits within a 23 µs
period, each bit must be sent within 23/32 ≈ 0.72 µs. Therefore the
data rate is approximately 1/(0.72 × 10⁻⁶) ≈ 1.4 million bits per second
(more precisely, 44 100 × 32 = 1 411 200 bits per second). I

As you should be able to appreciate from the above activity, the data
rate required means that the bandwidth of the connection must be
much higher than for a single analogue audio channel (i.e. 20 kHz).
This means that special cable and connectors may need to be used
instead of ordinary analogue audio ones.
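The arithmetic in Activity 11 generalises directly: the raw audio data rate is simply the sample rate multiplied by the number of bits per sample and the number of channels (any synchronisation and control data the interface adds comes on top of this). A minimal sketch:

```python
def raw_audio_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed to carry the audio samples alone, with no
    framing, synchronisation or status information added."""
    return sample_rate_hz * bits_per_sample * channels

# The case worked through in Activity 11: stereo, 16-bit, 44.1 kHz.
print(raw_audio_data_rate(44100, 16, 2), "bits per second")   # 1 411 200

# A higher-resolution example: stereo, 24-bit, 48 kHz.
print(raw_audio_data_rate(48000, 24, 2), "bits per second")   # 2 304 000
```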
The second aspect mentioned above is sometimes called bit
synchronisation as it refers to making sure each individual binary bit
is received correctly. This is addressed in the AES/EBU specification
by specifying that the digital signal contains at least one transition
between the logic 0 voltage level and the logic 1 voltage level for every
bit of the sound data as illustrated in Figure 9. By doing this, even if
the data stream contains a long stream of all ones or all zeros, and/or
has been degraded through a long connecting lead, the receiver can
recreate the original stream of binary digits. How does it do this?

Figure 9 (a) Original stream of binary ones and zeros that form the digital sound data; (b) the signal that is transmitted; (c) the degraded signal that is received by the receiver, showing the crossing points the receiver uses to recreate the original data stream (the clock rate of the transmitted signal is twice the data rate of the sound data)

The receiver uses timing rather than detecting voltage levels to recreate
the original digital signal. The receiver simply has to measure the time
between each crossing of the signal between the two voltage levels.
When it does this, it should find there are two different times – one
being roughly twice the time of the other. From Figure 9(c) you should see that two short times in succession will indicate a binary 1 and a single double-length time a binary 0.
Using this system also means that the data rate does not have to be
fixed and can be set depending on the number of channels and the
number of bits per sample – remember that the data must be sent in
real time, so the more channels and/or bits per sample, the more data
that has to be sent within the time for a single sample. Such a system
as this is known as a self-clocking system as it does not require any
separate synchronising signal to be able to detect the bits correctly.
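The channel coding that behaves in this way is known as biphase-mark coding. The sketch below is a much simplified model of the idea just described (it ignores preambles and real voltage levels): every bit period begins with a transition, a binary 1 adds an extra transition in the middle of the period, and the decoder recovers the bits purely by comparing the lengths of the intervals between transitions – two short intervals in succession give a 1, a single double-length interval gives a 0.

```python
def encode_biphase_mark(bits):
    """Encode bits as a sequence of levels, two half-periods per bit.
    Every bit starts with a transition; a 1 adds a mid-bit transition."""
    level = 0
    out = []
    for bit in bits:
        level ^= 1            # transition at the start of every bit
        out.append(level)
        if bit == 1:
            level ^= 1        # extra transition in the middle for a 1
        out.append(level)
    return out

def decode_by_timing(levels):
    """Recover the bits by measuring the time between transitions:
    two short intervals in succession mean 1, one long interval means 0."""
    # Lengths of runs of identical level = times between transitions.
    runs, count = [], 1
    for prev, cur in zip(levels, levels[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    bits, i = [], 0
    while i < len(runs):
        if runs[i] == 1:          # short interval: first half of a 1
            bits.append(1)
            i += 2                # skip the second short interval
        else:                     # long (double-length) interval: a 0
            bits.append(0)
            i += 1
    return bits

data = [1, 0, 0, 1, 1, 0]                 # the bit pattern shown in Figure 9
encoded = encode_biphase_mark(data)
print("levels :", encoded)
print("decoded:", decode_by_timing(encoded))
```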
The third aspect mentioned above – that of determining where the bits
for one sound sample end and those for the next begin – is achieved by
dividing the data into different sections called sub-frames, frames and
blocks, and ensuring that each of these is identified by a unique
pattern of bits. Such a procedure is sometimes referred to as word
synchronisation or frame synchronisation, as it refers to the ability of
the receiver to detect specific sections of the data – this of course is in
addition to bit synchronisation which permits the receiver to detect
each individual bit correctly in the first place.
In the AES/EBU specification a sub-frame contains the data for a
single digital audio sample for one sound channel plus some
associated synchronisation and auxiliary data (Box 5). A frame
consists of one sub-frame from each of the sound channels strung
together one after another – normally there are only two channels
(for stereo), so a frame would contain two sub-frames (Box 6). Frames
are sent at the rate of the original sample rate (i.e. one frame is sent
within the time interval between each sound sample) so the data
transfer can operate in real time.
So that the receiver can obtain information about the form of the sound
signal (e.g. how many channels, how many bits per sample), there
must be some further information contained in the data stream. Each
sub-frame contains a few spare bits which can be used for this
purpose; however, the number of these bits is insufficient to carry all
the information that the receiver requires. Unlike the sound data, this
type of information does not change from sample to sample (i.e.
between sub-frames), so the spare bits in each of 192 consecutive
frames are collected together to form a special set of data called the
channel status block (see Box 7). The amount of data that can be
carried in this block is then sufficient to provide not only information
about the form of the audio data, but also some additional user-defined
information as well. Each sound channel has its own channel status
block, but of course much of the information between channels will be
the same and will be duplicated in each block.

ACTIVITY 12 (SELF-ASSESSMENT) ...........................................................

Using the information given in Boxes 5 and 6 on the number of bits
that are contained in a frame of data in the AES/EBU serial interface,
work out the data rate of the serial stream that the interface transmits
when it is sending data from a (stereo) CD. Note, CDs use a 44.1 kHz
sample rate. Explain your answer. I

Box 5 The AES/EBU sub-frame format


A single sub-frame contains 32 consecutive bits of data from the serial stream; these are numbered
0 to 31 as shown in Figure 10.

Figure 10 The AES/EBU sub-frame format (bits 0–3: preamble; bits 4–27: audio sample word;
bit 28: validity flag V; bit 29: user data U; bit 30: channel status C; bit 31: parity bit P)

The first four bits of the sub-frame are called the preamble and they contain a unique
sequence of bits that identifies not only the start of a sub-frame, but also its type. There are
three different preambles, usually denoted X, Y and Z. The X preamble indicates the start of a
sub-frame containing a sample from sound channel A (the left-hand channel in a stereo
system), the Y preamble indicates the start of a sub-frame containing a sound sample from
channel B (the right-hand channel) and the Z preamble indicates not only the start of a
sub-frame containing a sample from sound channel A, but also the start of a 192-frame
block. Since the first frame in a block always contains a sample from channel A, once in
every 192 frames the X preamble is replaced by a Z preamble to indicate the start of a
block of data.
In order to separate the data stream into sound samples, therefore, the receiver simply looks
for these preambles and from these it can identify not only the start of each sub-frame,
but also the start of each frame and of each block. How is this done? The answer is that
each of the three preambles has a unique data stream with special longer and shorter zero
crossing times than occur during normal data, as illustrated in Figure 11. Notice though that
the overall length of each unique preamble section still occupies four bit times.

Figure 11 Data streams for the X, Y and Z preambles

After the preamble, the next 24 bits in the serial stream contain the actual binary data for
one sound sample. This allows sound data quantised with up to 24 bits to be sent, but it is
more usual to reserve the first 4 bits for an auxiliary sound channel and the remaining
20 bits for the main sound signal. However many bits are used for the sound data, there
must always be a total of 24 bits in this section (unused bits are padded out with 0s).
The auxiliary sound channel is a low-quality voice-grade sound channel that can be used in
a studio situation to provide voice communication without the need for any additional
cables. This channel consists of a digital sound stream quantised with up to 12 bits at a
sample rate of one third of the sample rate of the main sound channel (the 12 bits of each
sample are separated into three 4-bit sections and sent using the 4 auxiliary channel bits
in three consecutive sub-frames of the main channel – hence the sample rate needs to be
one third that of the main signal).
The last 4 bits of the 32-bit sub-frame are used individually as follows:
• bit 28, the V bit, is a validity flag which indicates whether the sound data is reliable and
is suitable for conversion to an analogue signal or not. If this flag is set, then the data is
either erroneous, or is not sound data at all (for example it could be computer or textual
data for use in CD-I players);
• bit 29 is the U user data bit – the U bits from each sub-frame are collected together and
can be used for auxiliary user data;
• bit 30 is the C channel status data bit – the C bits from each sub-frame are collected
together to form the information that tells the receiver the form of the sound data that is
being sent;
• bit 31 is the parity bit – this bit provides a simple method of checking for errors in the
data stream. Parity will be considered further in Chapter 4 of this block. For reasons
beyond the scope of this course, inclusion of the parity bit also improves the receiver’s
ability to detect the preambles.
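To see how little work is involved in pulling a sub-frame apart once bit synchronisation has been achieved, here is a minimal Python sketch that splits a list of 32 received bits into the fields described in Box 5. It is only an illustration of the layout – the field names, and the assumption that the auxiliary bits come first in the 24-bit section, follow the description above, and everything else (error handling, preamble recognition and so on) is left out.

def parse_subframe(bits):
    # bits is a list of 32 values (0 or 1); bit 0 is the first bit received
    assert len(bits) == 32
    return {
        'preamble': bits[0:4],    # X, Y or Z pattern
        'aux':      bits[4:8],    # auxiliary voice-grade channel (if used)
        'audio':    bits[8:28],   # main 20-bit audio sample word
        'validity': bits[28],     # V bit
        'user':     bits[29],     # U bit
        'status':   bits[30],     # C bit
        'parity':   bits[31],     # P bit
    }

fields = parse_subframe([0] * 32)
print(len(fields['audio']))       # 20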

Box 6 The AES/EBU frame format


Figure 12 shows the frame format for a two-channel (stereo) transmission. As
the diagram illustrates, a frame simply consists of one sub-frame from each of
the sound channels. Frames are sent at the sample rate of the main sound
channels, so that the system operates in real time. Notice how the preambles
and their types determine the starts of the individual samples and the channels
from which they come. Also shown in the figure is the situation that occurs
between the end of one block of 192 frames and the start of the next block.
Figure 12 The AES/EBU frame format (each frame contains an X or Z sub-frame carrying a
sample from channel A followed by a Y sub-frame carrying a sample from channel B; the Z
preamble in frame 0 marks the start of a new channel status block, and frames are sent at the
sample rate of the original sound signal)

Box 7 The AES/EBU block format


A channel status block consists of 192 consecutive frames. The main purpose of
dividing the data stream into blocks of frames is to provide sufficient control and
status information to the receiver. Clearly there needs to be some information sent
with the main sound data to indicate to the receiver the form of the data – in
particular the number of channels and the number of bits per sample, but there are
a number of other items of information that could be useful to the receiving device.
The channel status bits from each sub-frame in a block are collected together to
provide 192 bits or 24 bytes of data (1 byte is 8 bits). There is channel status
information for each channel, so for a stereo system there are two sets of channel
status information data – which would normally contain almost identical information.
A detailed discussion of the uses to which these 24 bytes of data are put is beyond
the scope of this course, but the list below outlines the more important items of
information that the channel status data contains.
• Basic control data
– information on whether the sound data is for professional or consumer use
(S/PDIF – see later);
– audio or non-audio mode (this allows the connection to be used for general
computer data such as text files etc. instead of audio samples);
– the sample rate;
– whether the data rate is sufficiently stable to be used to generate a
reference clock signal in the receiver when converting the samples back
to analogue (if not, then the receiver will have to generate its own
reference clock signal for this operation).
• Mode
– information on the number and type of sound channels (e.g. mono/stereo
or perhaps two separate tracks);
– the form/use of the user bit data.
• Number of quantisation bits (maximum 24).

• Source and destination identification


– information about where and what type of equipment is supplying the
serial data stream, and its intended recipient. This is used in professional
applications where the sound data might be fed to a number of items of
equipment and/or studios and can therefore be used to allow only the
intended recipient to receive the data.
• Block count and time-code information. This data gives each block an
individual time stamp, and allows the receiver to make sure the sound
samples are received and kept in the correct order.
• A check word to ensure there are no errors in the channel status data.
If an error is found, the channel status data for the particular block can
simply be discarded – there is no need to try to correct the error as the
channel status data between blocks is likely to be the same for each block
(except for the block count and time code information).
The final item of information defined within a block is the user data which is
compiled from the user bit in each sub-frame. These user bits are rarely used
and are normally set to zero, but if they are used, they can be collected together
over a variable number of frames (not the fixed number of 192 frames as for
the channel status data). The channel status data contains the relevant
information about the use or non-use of these user bits.
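As an illustration of how small the channel status block really is, the Python sketch below gathers the 192 C bits collected from one channel over a block into 24 bytes. The bit ordering within each byte is an assumption made purely for this example – the real specification defines the ordering precisely, but that level of detail is beyond the scope of this course.

def assemble_channel_status(c_bits):
    # Pack 192 channel status bits into 24 bytes, first bit first
    assert len(c_bits) == 192
    out = bytearray()
    for i in range(0, 192, 8):
        byte = 0
        for bit in c_bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

print(len(assemble_channel_status([0] * 192)))   # 24 bytes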

2.3.2 S/PDIF
The Sony/Philips digital interface (S/PDIF) standard is fully
compatible with the AES/EBU standard described in Section 2.3.1.
The form of the digital sound data and the sub-frame, frame and block
structure is exactly as described in the AES/EBU specification, the
only difference is in the interpretation of the channel status
information that is accumulated over one block of 192 sound samples.
The first bit in the channel status information (i.e. the channel status
bit from the first sub-frame in a block) is used to specify whether the
sound data is for consumer or professional use. If this bit is set, then
the channel status data is to be interpreted as described in the AES/
EBU specification. If this bit is zero, the channel status bits are to be
interpreted as described in the S/PDIF specification (see Box 8).
Important points to note about this specification are:
• serial copy management system (SCMS) is incorporated (this is a
method of preventing multiple copies of a recording being made
which will be described in Chapter 4 of this block);
• the specification caters for 2 or 4 channel sound;
• the source of the sound can be specified (e.g. CD, DAT);
• CD subcode data can be incorporated;
• there is much scope for future expansion.
As mentioned in Box 8, there is a significant amount of redundancy
built into the specification of the channel status data to allow for
future expansion in the capabilities of the interconnection. The uses to
which this interconnection is put, and the types of devices which need
to be interconnected is continually evolving, and the specification can
therefore be enhanced to cater for new applications, whilst still
retaining compatibility with all previous uses.

Box 8 The channel status data in the S/PDIF interface


The physical form and the sub-frame, frame and block structure of the serial
data stream is the same as that of the AES/EBU interface. However, apart
from the first bit of the channel status data (i.e. the ‘C’ bit in the first sub-
frame of every block, which specifies the consumer or professional mode), the
remainder of the channel status data is interpreted differently in the S/PDIF
system. Much of the channel status information is at present unspecified, and
thus there is a great deal of scope for future extension in the specification.
Some of the more important items of information contained in the S/PDIF
channel status bits are given below:
• a bit indicating if the data is normal audio data, or non-audio (e.g.
computer data);
• a bit indicating if the sound data is copyrighted or not;
• information about the number of channels;
• information on the type of equipment transmitting the data stream. A
wide range of sources can be specified such as laser optical devices (CD,
MiniDisc), musical instruments (synthesiser, microphone) and broadcast
reception of digital audio (including the country of origin);
• a bit that indicates if the sound is an original recording or whether it is
a copy;
• a number to identify the source of the sound data and a channel number
within this source (e.g. the left channel in source 1). This allows a number of
sources to use the same interconnection in the same way as the AES/EBU
specification does;
• details about the sampling frequency of the sound data and the stability
and accuracy of the clock signal that generated the serial data stream –
to enable the receiver to know if it can use the regenerated clock to
convert the sound to analogue form or not – again as in the AES/EBU
system.
As with the AES/EBU specification, the user data is generally unused. However,
when the source is sound from a CD, the user data can contain subcode data
from the CD. This is information about the start times and durations of the
sound tracks within the CD.
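In other words, the very first channel status bit acts as a switch that tells the receiver which rule book to use for the rest of the block. A tiny Python sketch (my own wording, purely for illustration) captures the idea:

def channel_status_flavour(first_c_bit):
    # First channel status bit set -> professional (AES/EBU) interpretation
    return 'professional (AES/EBU)' if first_c_bit else 'consumer (S/PDIF)'

print(channel_status_flavour(0))   # consumer (S/PDIF)
print(channel_status_flavour(1))   # professional (AES/EBU)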

2.3.3 Multi-channel audio digital interface


The basic AES/EBU interface system is fine for the interconnection of
stereo equipment, but is at a disadvantage when many channels of
sound have to be transferred. Although the specification allows more
than two channels to be incorporated (by adding more sub-frames
within a frame, and adjusting the channel status bits appropriately),
the more channels that are incorporated, the faster the basic bit rate of
the serial data has to be.

ACTIVITY 13 (SELF-ASSESSMENT) ...........................................................

Why is the bit rate of a serial AES/EBU data stream related to the
number of sound channels it contains? I

The problem with increasing the bit rate too much is that signal loss
and interference on the interconnecting lead increase, and the
maximum length of lead that can be used reduces eventually to an
impractical length.

To solve this problem, Sony, Mitsubishi, Neve and SSL have jointly
produced a specification based on the AES/EBU specification called
the multi-channel audio digital interface (MADI) for transferring up to
56 simultaneous digital sound channels in real time on a single
connecting wire.
The details of MADI are beyond the scope of this course, but in
essence, a sample from each sound channel is formed into an AES/EBU
sub-frame, and 56 of these sub-frames are strung together to form a
frame that is transmitted within a single sample period. However, to
allow the receiver to decode the data stream correctly and to enable the
required amount of data to be sent using a practical bit rate:
• the bit rate is set to a constant value (100 Mbits per second);
• a different coding scheme is used for the individual bits which
means that a separate timing signal is necessary to ensure the data
can be decoded correctly by the receiver.
To enable a constant data rate to be achieved, sample data for all 56
channels is always sent, but for unused channels it is set to zero. The
channel status data indicates how many of the channels are in use.
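A rough calculation shows why a fixed rate of 100 Mbits per second is comfortable for this purpose. The Python fragment below assumes a 48 kHz sample rate (a common studio rate – this figure is my assumption, not part of the MADI description above) and uses the 32-bit sub-frames and 56 channels quoted in the text.

channels = 56
bits_per_subframe = 32
sample_rate = 48000                   # assumed sample rate
payload = channels * bits_per_subframe * sample_rate
print(payload)                        # 86016000, i.e. about 86 Mbits per second
print(payload < 100_000_000)          # True: fits within the fixed 100 Mbit/s rate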

2.3.4 Other digital methods


The AES/EBU and related digital sound interfaces are designed
specifically for transferring digital sound between audio devices.
However, computers are now commonly used for sound processing,
and because of this, the interfaces used for transferring data to or from
a computer are now being used to carry real-time sound data – most
often between a computer and an audio device. Notice that I have
specifically mentioned real-time transfers here, since if the sound data
does not need to be transferred in real time, then it can be treated as
any other computer data and transferred at a rate that suits both the
computer and the audio device. This rate of course could be faster as
well as slower than the actual playing time of the digital sound data
that is being transferred.
The Universal Serial Bus (USB) and IEEE 1394 FireWire interfaces in
desktop computer systems are now becoming commonly used instead of
the AES/EBU or S/PDIF interfaces to transfer digital sound data. Both of
these interfaces easily have sufficient capability to enable at least one
stereo channel to be transferred in real time and FireWire in particular has
considerably more capacity. Sometimes these transfer methods are used
simply as a ‘communication medium’ for AES/EBU or S/PDIF serial data.
In other words, the sound data is divided up into AES/EBU sub-frames,
frames and blocks, and the resulting data stream then becomes the raw
data for the USB or FireWire connection – both of which have their
own methods of packing up and sending the data (i.e. their own
protocols). In other cases, the sample data is sent directly in raw USB
or FireWire form without first being encoded into AES/EBU format.
Whichever way the data is sent, as long as each sound sample can be
transferred at a rate equal to or faster than the original sample rate,
then it can be used in real time. If the data arrives too fast, then it is
simply stored temporarily at the receiving end until needed. A simple
method of doing this will be considered later in this chapter.

2.4 Cables and connectors


2.4.1 Cables
For audio signals, the main job of a cable and its connectors is of
course to transfer the signal, be it analogue or digital, from one audio
device or computer to another. There are three effects that occur in any
cable that carries signals from one point to another:
• loss of signal level (amplitude) along the cable’s length;
• reflections of the signal at each end;
• corruption of the signal by external interference.
The first two of these effects are dependent on the impedance of the
connecting cable. Box 3 mentioned that electrical impedance is a
measure of the resistance to the flow of current in an electrical circuit
and it can also change with frequency. Different designs of cable have
different characteristic impedances, all of which in general have
reducing impedance with increasing frequency.
For the frequency range of analogue audio connections (20 Hz to
20 kHz), the impedance of a cable is only significant in reducing the
signal level for long cable lengths (many metres for a well designed
audio cable). However, in a digital connection which uses the AES/EBU
system and where the bit rate may be around 3 Mbits per second,
the impedance of the cable can be a significant factor in the
reduction of signal level even for quite short cable lengths (a few
metres or so).
From your work on reflections and standing waves in Block 2 you
should be able to see that the second of the effects mentioned above
might also occur at the ends of electrical cables, and these could
disrupt the signal that is being transferred. This is indeed the case,
and in order to minimise this effect, the impedance of the cable should
be matched as closely as possible to the impedance of the source and
the receiver. Again, at audio frequencies these effects are minimal, but
they are significant at the higher frequencies used in digital connections.
The third effect is minimised in two ways:
• by screening the signal wire with the common or ground connection,
i.e. enclosing the signal wire within a conducting shield formed by
the ground connection;
• by providing a 2-wire and ground balanced connection.
Both of these were mentioned in Section 2.1.5.
For short analogue audio connections, and for loudspeaker
connections where the signal voltages and currents are very high (as
Box 4 on loudspeaker powers showed you), unscreened cables will
give acceptable results. Otherwise, almost always some form of
screened cable needs to be used. Screened cables vary in diameter, in
the method of screening, and in the construction of the insulating layer
between the signal wire and the screen. Cables designed for balanced
signals have two inner signal wires twisted together to minimise the
effect of interference by ensuring that any interference affects both
signal wires equally. Each type of cable has a different impedance, but
as mentioned above, this is only important for long lead lengths and/or
high frequency signals such as those used in digital audio signals,
where cables with a specific impedance and designed for such uses are
available. Figure 13 shows in more detail than Figure 5 the
construction of a typical screened cable that can be used for both
analogue and digital audio.
Figure 13 Construction of a typical screened cable (inner conductor surrounded by
insulation, a conducting shield and an outer plastic covering)

The digital systems mentioned in Section 2.3 can all be used with an
optical connection, and such inputs and outputs are increasingly being
provided. In this type of connection, electrical signals and wires are
replaced by light and a fibre-optic cable. By a process called total
internal reflection, light can be transmitted down a flexible
transparent tube – see Figure 14.
Figure 14 Light travelling down a flexible transparent tube (fibre-optic cable)

This effect is now used in many applications from decorative lights to


devices that enable doctors to look inside our tummies. There are three
main advantages of using light as opposed to electrical signals:
• the transmitter and the receiver are electrically isolated from each
other;
• light is not susceptible to external interference;
• light signals can easily cope with very high frequencies.
Now that sufficiently robust fibre-optic cable with an acceptable light
loss per unit length can be manufactured for a reasonable cost, optical
inputs and outputs are increasingly being provided for digital
connections because of the advantages listed above.

2.4.2 Connectors
One of the most common connectors for consumer audio is the phono
plug (Figure 15(a)), but although cheap, these are not very robust, and
do not provide a very secure electrical or mechanical connection (there
is no locking mechanism, for example, that prevents the plug being
accidentally pulled out of its socket). Also very common is the jack plug
which is available in a variety of different sizes. For professional audio,
the 1/4 inch jack plug is the most common connector (Figure 15(b)).

Figure 15 Common audio connectors: (a) phono; (b) 1/4" jack; (c) TRS jack;
(d) XLR; (e) BNC; (f) DNP optical

For connections where there are two signal wires and the common
ground connection (for balanced connections and where a stereo pair
of audio channels is combined in one connection), a jack plug with an
additional sleeve connection is used (Figure 15(c)). In this case, the
two signal wires are connected to the tip and the ring, and the ground
connection to the sleeve (where there is only one signal wire it is
always connected to the tip, and for a stereo connection the left
channel is always connected to the tip). This type of jack plug is
sometimes referred to as a TRS (Tip, Ring, Sleeve) connector.
For professional applications another common connector is the
three pin XLR connector (Figure 15(d)). These connectors are very
robust, and lock together so that they cannot accidentally be pulled
apart. Pin 1 is always the earth or ground connection, and pins 2
and 3 are the balanced signal connections (pin 2 only is used where
there is a single signal connection). XLR connectors are always used
for microphones that require a +48 volt phantom power supply (see
Section 2.1.6).
At present there is no standard connector for digital links (AES/EBU etc.)
although one of the above types is usually used. In consumer devices a
phono plug is commonly used, whereas an XLR connector is common
with professional devices. For high frequency digital connections
(e.g. MADI), a BNC connector is used to minimise losses and
reflections caused by impedance mismatching (Figure 15(e)).
For optical digital links, a special connector must be used that efficiently
transfers the light between the end of the optical fibre and the light
source/receiver. There are a variety of different connectors in use, but a
common one in the consumer field is the DNP connector shown in
Figure 15(f). DNP stands for ‘dry non-polish’ and these connectors are
designed for easy attachment of the optical fibre without glue (‘dry’) or
the need to polish the end of the fibre after cutting. Of course the
consequence of this easy connection system is more light loss at the
connection, but this is not significant given the intended use of these
connectors for short cables in consumer applications.

ACTIVITY 14 (SELF-ASSESSMENT) ...........................................................

What type of cable and connector is suitable to use for the connection
between a microphone that supplies a balanced output and the
microphone input on an audio device, assuming the microphone is
being used ‘on location’? Explain your answer.

2.5 Inputs and outputs provided on the AW16G


The AW16G professional audio workstation has a range of audio
inputs and outputs designed to suit the intended range of applications.
Figure 16 is a photograph of the back panel of the unit that shows the
various connectors it provides.

Figure 16 Back panel view of the AW16G

2.5.1 Inputs
There are eight analogue combined microphone and line inputs, which
all offer balanced connections. Since they are dual purpose inputs
(they can be used for both low-level and line-level signals), they must be
able to cope with a wide range of signal levels without causing
distortion. To do this, each input has an associated level control
that can be adjusted to provide a suitable signal level for the main
recording and mixing sections of the device – indeed these controls are
marked ‘LINE’ and ‘MIC’ to indicate the setting for each type of input
(Figure 17). In other devices there may be a switch to select the input
sensitivity, or sometimes two sets of inputs are provided, one for low-
level and one for line-level signals.

Figure 17 AW16G input level controls

Inputs 1 and 2, shown on the right in Figure 16, are designed
specifically for microphones and use XLR connectors with +48 volt
phantom power available if needed.
Inputs 3 to 8 all use 1/4 inch jack plugs with the tip–ring–sleeve (TRS)
arrangement to provide a balanced 2-wire plus ground connection.
Input 8 has an additional jack connector with a high impedance input.
This can be used for high impedance audio transducers such as some
types of guitar pickup and piezoelectric microphones as mentioned in
Section 2.1.3.
All eight inputs have a specified input level range from –46 dBu to
+4 dBu and can therefore cater for a wide range of signal sources from
a balanced condenser microphone that requires phantom power
through to a line-level signal from a synthesiser. Remember though
that none of these are stereo inputs, so two channels must be used if a
stereo source is to be connected. This means that the device is limited
to just four stereo signal sources at any one time.
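If you want to relate these dBu figures back to actual voltages, the small Python function below applies the definition used earlier in this chapter (0 dBu corresponds to 0.775 V r.m.s.); the function name and the example level are my own.

def dbu_to_volts(level_dbu, reference=0.775):
    # Convert a level in dBu to an r.m.s. voltage (0 dBu = 0.775 V)
    return reference * 10 ** (level_dbu / 20)

print(round(dbu_to_volts(4), 3))   # +4 dBu is about 1.228 V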

ACTIVITY 15 (REVISION) .......................................................................

If 0 dBu is defined as a voltage level of 0.775 V, what voltage
does –46 dBu represent? Explain your answer. I

2.5.2 Outputs
The AW16G provides three stereo analogue unbalanced outputs:
• a main line-level output (nominal level –10 dBV) on separate left
and right jack connectors;
• an auxiliary line-level output (nominal level –10 dBV) on separate
left and right jack connectors designed for monitoring purposes or
for adding effects;
• a headphone output, with a nominal power output of 100 mW,
provided by a single TRS jack connector.
This is a fairly basic range of outputs, but again provided to satisfy the
intended uses of the unit. A more sophisticated mixer unit would
probably provide additional outputs such as a line-level output for
each input – or more probably a combined input–output connection
using a TRS jack plug that can be used to add or punch in effects to a
particular input channel.

2.5.3 Digital I/O


The AW16G provides two digital S/PDIF connections: one provides a
stereo digital output and the other a stereo digital input.

ACTIVITY 16 (COMPUTER, PRACTICAL) ....................................................

In this activity you will use the course’s music recording and editing
software to make a recording and so learn a little more about the
program’s features and operation. You will find the steps associated
with this activity in the Block 3 Companion. I

3 STORING SOUND

Once the basic sound signals for a recording arrive in a desktop
sound device, they must somehow be stored. The mix-down,
editing and effects stages then require the sound to be replayed,
processed and re-recorded again before the final master recording is
produced. The requirements for a suitable method of storage are
different depending on the stage in this process. Table 1 lists the
main recording activities, and the requirements that the situation
places on the recording method.

Table 1 Storage requirements for various stages in producing a recording

Recording activity Storage requirements


Recording a live performance • May require many tracks
• At least one hour’s continuous recording
for all tracks in real time
• High quality
• Could use write-once technology
Editing/mixing • Many tracks
• Instant access to any track/section of a recording
• High quality with (ideally) no degradation
of quality through re-recording
• Recording and playback in real time
although this may not be needed for the
whole recording in one go
Final master recording • Can use write-once media
• Recording must not degrade over time
• Probably only require stereo (2-track) capability
• Playback must be able to be done in real time,
but recording may be able to be done over a
longer time
Distribution of a recording • Cheap
• Playback only
• Lower quality may be acceptable
• Probably only stereo capability needed

In this section I shall ignore the distribution stage as the various
analogue and digital storage methods used here will be considered in
Chapters 4 and 5 of this block.
Since analogue sound recording has now almost completely been
superseded by digital storage methods, I will not consider analogue
methods any further here. However this topic will be revisited
in Chapter 5 when the history of sound recording is described.
Note in passing though that one of the emerging questions with
regard to any recording system – analogue or digital – is how well it
stands up to the ravages of time without degrading to such an extent
that the recording becomes lost forever. Chapter 5 in this block
contains a section on audio restoration that will look at this in
more detail.

3.1 Current digital storage systems


At the present time (2004), the main choices for the digital storage of
sound are:
• compact disc (CD) – in all its various forms (CD, CD-R, CD-RW)
• MiniDisc (MD)
• super audio CD (SACD) and the audio digital versatile disc (DVD-A)
• digital magnetic tape (DAT and ADAT)
• hard disk
• electronic ‘solid state’ memory (like computer memory).
All of these have advantages and disadvantages in terms of cost, ease
of use, quality (in terms of the maximum number of bits per sample
and sample rate that can be accommodated), portability and storage
capacity (which limits the continuous recording time and/or the
maximum number of simultaneous tracks).

ACTIVITY 17 (EXPLORATORY) ................................................................

The recording of a live musical performance lasting 45 minutes
requires the use of 6 microphones. For each of the first three recording
stages mentioned in Table 1 which of the above recording systems do
you think might be able to be used? Assume the mix-down stage is
carried out after the performance has been recorded.

Comment
First of all, for reasons of quality as will be explained in Chapter 4, the
MiniDisc is not recommended for use in any of the mastering stages
(although, many people do use MiniDiscs in such situations, and do
obtain good results).
For the initial recording of the performance, some form of multitrack
recorder is needed. This limits the choice to DVD-audio, ADAT and
hard disk since, apart from solid state memory, all the others can only
record one stereo channel. Solid state memory is a definite possibility
for the future, but at present the practical capacity of a solid state
digital recorder is not sufficient to record 45 minutes of high
quality 6-track sound. In addition, some types of solid state memories
lose their contents when the power is switched off which makes
recordings rather vulnerable.
For the mix-down stage, the type of recorder used to record the
multitrack originals would have to be used again, but the mixed-down
version could be recorded on any of the above systems. However, this
assumes the mix-down stage does not involve any editing of the
material. If this is needed as well, then a hard disk or solid state
recorder would be the ideal choice as they both allow instant access to
any part of the recording. Again though, fully solid state recorders are
not at present a practical proposition.
The final master stereo recording needs to be stored in a permanent
form where there is no loss of quality, either through the recording
system, or over time. The choices here are any writeable version of CD
(including SACD or DVD-A if writeable versions exist), digital
magnetic tape and possibly hard disk (or any type of backup storage
designed for computer data). I

From the above activity, it is clear that hard disk recording is one of
the most useful formats to use in all the stages of the production of a
stereo master recording. In addition, although use of electronic
memory for the storage of complete recordings is not practical at
present (in 2004), such memory is used in conjunction with hard disk
stores (and indeed with most other digital storage methods) for the
temporary storage of digital audio data during processing. So it is
worth looking at both of these types in a little more detail here.
Discussion of the other systems will be left until Chapters 4 and 5.

3.2 Hard disk audio recorders


Digital sound recording using a hard disk – particularly in conjunction
with some solid state memory – is now one of the most useful and
versatile storage formats to use in all the stages of the mastering process.
A hard disk audio recorder uses the same or a similar device to that
used in a desktop computer. However, it was not until the late 1990s
that the capacity of such hard disks became sufficient to enable a
practical amount of digital sound data to be stored in a conveniently-
sized unit which thus opened the way for desktop hard disk sound
recorders to be produced. A quick example will illustrate this.
The capacity of a normal audio CD is around 640 Mbytes of memory.
This is sufficient to store a little over 70 minutes of stereo sound.
Clearly then to be a practical proposition for sound storage a hard disk
must be able to store at least this amount of data. However, in 2004,
hard disks commonly have capacities of 50–200 Gbytes, and this is
increasing all the time.

ACTIVITY 18 (SELF-ASSESSMENT) ...........................................................

How many ‘CDs’ worth of stereo sound can be stored in a 64 Gbyte
hard disk? I

As your answer to the above activity should have told you, a 64 Gbyte
hard disk can store well over 100 hours of stereo sound, or 50 hours
of 4-track sound, 25 hours of 8-track sound, etc. (all assuming the
standard CD format of 44 100 samples per second for each channel,
each quantised using 16 bits).
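If you want to check figures like these for other disk sizes, the arithmetic is easy to reproduce. The Python fragment below uses the standard CD format just mentioned and takes a ‘Gbyte’ to be 10⁹ bytes (manufacturers usually quote capacities this way; taking it as 2³⁰ bytes would give a slightly larger answer).

bytes_per_second = 44100 * 2 * 2      # 16 bits (2 bytes) x 2 channels
disk_bytes = 64 * 10**9               # a 64 Gbyte disk
hours = disk_bytes / bytes_per_second / 3600
print(round(hours, 1))                # roughly 100.8 hours of stereo sound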
Box 9 gives some brief details about the construction and operation of
a hard disk unit often called a hard disk drive (HDD).
One of the major advantages of hard disks, particularly for editing, is
that the time needed for the hard disk to retrieve any section of a
recording is roughly the same, and this can dramatically reduce editing
time (this is known as a random access device). This is different from
a tape system where it clearly takes longer to access, say, the end
section of a recording when the tape is positioned at the start than it
would do to access a point half way through.
The other aspect of hard disks that has to be considered is their read/
write speed and, coupled with this, the time between instructing the
disk to read or write a particular section, and the time when the data
actually starts to be stored or read. There are two components to this

Box 9 Hard disk drives


A hard disk drive (HDD) contains one or more rotating disks on a single spindle,
each disk is coated on both sides with a magnetic material.
A series of very small read/write heads, one for each disk surface, is mounted on
an arm such that the heads can move radially across the surface of the disks as
they rotate (see Figure 18). Electrical pulses representing the digital data to be
stored are applied to a head and produce a magnetic field that causes the small
section of the disk currently under the head to become magnetised. The digital
data is thus stored on each side of each disk as sets of concentric circles or tracks
(not to be confused with audio tracks in a recording) of small magnetic areas.

Figure 18 Hard disk storage unit

To aid location of the required data, the data is divided into radial sections called
sectors as illustrated in Figure 19. To read the data, the read/write head is
positioned at the correct track, and the magnetic areas on the disk under the
head cause small electrical signals to be generated in the head as the disk rotates
under it. Re-recording data is done simply by overwriting old data with the new
data. All the read/write heads are connected to a single actuator, so they move in
and out of the disk together. They can be used simultaneously, but to reduce the
amount of electrical circuitry required, they are usually only used one at a time so
that the read/write circuitry can be shared between all the heads. This means
that data is recorded only in a single serial stream of bits.

Figure 19 Layout of data on the surface of a hard disk (each recording surface is
divided into concentric tracks and radial sectors)

To minimise the access time for data, the disks rotate continuously as long as
power is applied – this means that there is no delay while the disk(s) reach the
correct rotational speed. To reduce wear on the disk, the heads are kept out of
direct contact with the disk, and they float on a cushion of air typically 20 µm
(20 millionths of a metre) thick. If a head should touch the surface of a disk
through wear, a fault or excessive shock, then this can damage both the head
and the disk causing a disk crash to occur. Often this is fatal, and the whole
disk unit will have to be discarded. Although it is sometimes possible to recover
some data, inevitably some will be lost.
time – the time needed for the read/write heads to move to the correct
track, called the seek time, and the subsequent time spent waiting for
the required sector to appear under the read/write head as the disk
rotates, called the access time. Sometimes, the access and seek times
can cause problems, particularly for simultaneous multitrack recording.
This is because data is not necessarily stored in consecutive sectors or
tracks on each disk, and so during recording the heads will need to be
repositioned a number of times.
To overcome this problem, and as long as the disk write speed is
sufficiently fast, a hard disk audio recorder will contain some
temporary memory for data in transit consisting of computer-type
solid state memory. This memory is often made quite large so that it
can store temporarily a significant amount of sound (minutes rather
than seconds). For recording, this allows the recorder to take in the
regularly occurring sound samples, store them temporarily in the
temporary memory, and then send the samples to the hard disk in
bursts whenever a reasonably sized block of data has been accumulated
as illustrated in Figure 20. This provides spare time to cater for the
hard disk’s seek and access times, but it is only possible if the write
data rate is sufficiently fast to allow this to happen. The reverse
occurs on playback.
Figure 20 Using temporary solid state memory to cater for hard disk seek and access times
(regular digital samples from the analogue-to-digital converter are held in temporary memory
and written to the hard disk in bursts, leaving spare time for the disk’s seek and access
operations)
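The Python sketch below is a toy version of the arrangement in Figure 20: incoming samples are simply held in a fast temporary store and handed to the disk a whole block at a time. It is only meant to illustrate the buffering idea – the class name, the block size and the way the ‘disk’ is represented are all my own inventions.

from collections import deque

class RecordBuffer:
    def __init__(self, block_size, write_block_to_disk):
        self.block_size = block_size
        self.write_block_to_disk = write_block_to_disk
        self.pending = deque()               # the temporary solid state memory

    def add_sample(self, sample):
        # Called once per sample, at the regular sample rate
        self.pending.append(sample)
        if len(self.pending) >= self.block_size:
            block = [self.pending.popleft() for _ in range(self.block_size)]
            self.write_block_to_disk(block)  # one burst; the disk can then seek

blocks_written = []
buf = RecordBuffer(4, blocks_written.append)
for s in range(10):
    buf.add_sample(s)
print(blocks_written)   # [[0, 1, 2, 3], [4, 5, 6, 7]] - the last two samples are still pending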

Now that hard disk units have become so compact and robust and are
able to store practical lengths of multichannel sound in real time, they
are now being used throughout the recording, editing and mastering
processes. There is however the disadvantage that the disk units in
audio recorders are usually permanently installed in the device and
are not designed to be changed by the user. So, once a disk is full, the
data must be transferred elsewhere before more sound can be recorded
(assuming the original recordings cannot just be overwritten).

3.2.1 Hard disk recorders versus desktop computers


One question that you might be asking at this point is: why use a
dedicated hard disk audio recorder rather than simply a general-
purpose computer with suitable software? For many situations, a
desktop computer may prove perfectly adequate, and may well be
able to carry out all of the recording, editing and mastering
processes – as you will see in the practical work. However, a
dedicated hard disk recorder is designed with the recording and
editing of digital sound in mind, and as such would provide:
• multiple inputs and outputs
• multi-channel recording
• a wide range of easier to use facilities
• a physically more compact and convenient device to transport to
recording locations
• better real time operation.
Even the first two of the above features can often be added to a desktop
computer by purchasing a special peripheral device that provides a
number of audio inputs and outputs. Such devices might simply
comprise a set of inputs and outputs and associated analogue-to-digital
and digital-to-analogue converters, whereas more sophisticated devices
might contain mixing and equalisation controls for each input and
other features such as level metering. Indeed at the time of writing
(2004) the trend is for manufacturers to produce audio controllers that
are really only sophisticated computer peripherals, and which use the
computer for storage and control purposes. Such devices contain
analogue inputs and outputs and various associated controls and are
connected to the computer by a high speed connection such as IEEE
1394 FireWire. The computer’s hard disk is used for storage of the
digital audio. The control software on the desktop computer has a
comprehensive user interface as well as the ability to read and
configure the settings on the peripheral device. Often, these devices
are even designed to be controlled by standard audio recording/mixing
software packages.
In technical terms, the last point in the list above is probably the most
important advantage of a dedicated hard disk recorder. A general-
purpose computer carries out a myriad of tasks in the background such
as updating the time of day, checking the inputs (network connection,
modem etc.) for new data. These tasks may introduce processing
delays that at best could slow down the post-recording non-real time
processes such as editing and mixing, but at worst could cause sound
samples to be missed in a live recording.

3.3 Solid state memory


Electronic or solid state memory has no moving parts and the data
storage is achieved purely by electronic means. There are two main
types that are relevant to the storage of digital sound data – random
access memory (RAM) and flash memory.

3.3.1 Random access memory


In the previous section, I made mention of a quantity of electronic
memory used to store temporarily digital sound data in transit to or
from a hard disk unit. For this purpose, random access memory
(RAM) is likely to be the type of memory used because of its very fast
operation and the fact that the access time for any part of the memory
is the same – see Box 10.

Box 10 Random access memory

Random access memory (RAM) is electronic solid state memory that usually
loses its contents when the power is switched off and so cannot be used for
long-term data storage. Such memory is called volatile memory and is in contrast
to hard disk storage which retains its memory when the power is turned off
and is therefore called non-volatile memory. However, the advantage of RAM
is that not only, as its name implies, is it random access (i.e. the access time
for any part of the memory is the same) but the read and write speed is
extremely fast – much faster than for a hard disk – and there are no additional
access or seek times.
At the time of writing (2004), memories are available in sizes up to a few
gigabytes, but bit-for-bit, RAM is much more expensive than hard disk storage.

In fact, some RAM memory will be present in all digital recorders and
audio workstations and it is just the same sort of memory found in
desktop computers. In audio workstations and when computers and
their hard disks are used to record or playback sound it is simply used
as short term intermediate storage as described in Section 3.2 above. In
audio workstations (and desktop computers), because of its very high
read/write speed, RAM is also used as the working memory for editing
operations, with sections of sound being swapped between the main
permanent storage medium (hard disk, digital tape etc.) and RAM as
and when required.
The solution to the problem of RAM being volatile is either to use a
battery back-up supply, or to use special non-volatile RAM, but this is
more expensive than volatile RAM, and may also take longer to write
data although read times are usually similar.
However, a relatively new type of solid-state memory called flash
memory is now becoming very popular.

3.3.2 Flash memory


The second type of solid state memory that is used in audio
applications is flash memory. This type of memory is semi-permanent
memory in that once data is written to it, it remains stored even when
the power is switched off, but by applying special electronic signals to
the memory, some or all of it can be erased and then subsequently
reprogrammed – see Box 11.
Flash memory is now being used in many areas in the audio field, and
is available in a number of forms. The Multi-media card (MMC) is a
small removable flash memory card that is used in portable
battery-powered audio players and other hand-held devices such as
organisers, palmtops and electronic books. The ATA flash card is
around the size of a credit card, but in this case it contains
electronic circuitry to make the flash memory appear like a normal
hard disk. Thus it can be used in portable computers as an additional
removable memory.
Although solid state memory is expensive compared with hard disk
or digital tape, the amount of solid state
Box 11 Flash memory


Flash memory is sometimes called flash read only memory (flash ROM) and
this is more descriptive of the way this type of memory operates.
Data can be stored in and read from flash memory in just the same way as
RAM – access times are the same for all parts of the memory (they are random
access devices) and the actual access times are similar to those of normal
RAM (although write times are often a little longer).
The difference here is that flash memory is non-volatile. However, unlike RAM,
once data has been stored it cannot be altered unless it is first erased.
Erasure of the data stored in a flash memory is carried out in blocks by applying
special electrical signals. A single operation will erase a complete block of the
memory at once (i.e. in a ‘flash’, hence the name).
The disadvantage of this scheme is that small parts of the memory cannot be
erased in isolation, and so it is not particularly suitable for use in editing
operations. A further disadvantage is that the number of write/erase cycles is
not limitless, and at some point after many such cycles the device will fail.
Typical guaranteed write/erase cycles at the time of writing (2004) are 300 000.
Typical maximum sizes of flash memory cards at the time of writing (2004)
are up to a few gigabytes, but this is increasing rapidly.

memory (RAM and/or flash) that can conveniently be incorporated in
audio devices is now sufficient to store significant quantities of
multichannel sound.

ACTIVITY 19 (SELF-ASSESSMENT) ...........................................................

Why is RAM not suitable for use in a portable solid state audio player? I

3.4 Audio file formats


No discussion of the storage of digitised sound would be complete
without mention of the common computer file formats which are used
to store sound. Such file formats are used not only to store digital
sound data in computers (on hard disk, CD-ROMs etc.), but are also
used when sound is transferred between computers or transmitted
over the Internet. Audio workstations may also use such formats (or
will certainly be able to input and output in such formats even if they
are not used internally by the device).
There is a large range of sound file formats; some are proprietary and
are used with particular sound devices or particular computer
programs. The ones I want to look at here are three of the most
common formats. Note that I am not considering here MIDI file
formats that store music codes (rather than digitised sounds); these
will be considered in Chapter 3 of this block.
However many bits are used to quantise sound samples, the vast
majority of file systems store digital audio data as a series of bytes
(8 bits), and so these sound formats are described in terms of the
meaning of these data bytes as they are read from the file. In addition
to the actual sound samples any file that stores sound data must also
contain information on the form of the data, in particular the sample
rate, quantisation size and number of channels (or tracks). Without
this knowledge, when the file is read (as a string of bytes) there is no
means of knowing where the data for one sound sample ends and the
next begins, neither is there any way of telling to which channel it
belongs (or indeed how many individual sound channels there are) or
how fast to ‘play back’ the samples if the sound is subsequently to be
heard.
Thus each file format has at its start a header section that first
indicates the file type and then defines the above parameters (and a
number of others), before the sound samples themselves are stored.

3.4.1 AU
The AU (AUdio) file format is probably the simplest of formats for the
storage of digital sound samples. It was originally developed by Sun
Microsystems for use on computers using the Unix operating system
and it is commonly used for storing sound in a compressed form
suitable for transmission over the Internet as the compression results
in a very compact file.
An AU file is split into three sections:
• a header section that contains information about the form of the
digital sound samples – Box 12 gives brief details about the data
contained in this section;
• an optional comment section that can contain general textual
information (e.g. name, copyright details and so on);
• the sound data itself, stored in the form indicated by the parameters
in the header section.
If there is more than one track (e.g. two tracks forming a stereo
channel) the sound data is interleaved between each. So, for example,
if there are 2 tracks, sample 1 from track A will be stored first, then
sample 1 from track B, this is followed by sample 2 from track A and
then sample 2 from track B and so on.

ACTIVITY 20 (EXPLORATORY) ................................................................

Can you think of a reason why the data from each track should be
interleaved rather than storing them as separate blocks of data?

Comment
Interleaving the track data allows the sound stored in the file to be
played as it is being read. If the tracks were separated, then the reading
device would have to continually jump between sections of the file to
read individual samples. This may slow down the reading process so
much that playback in real time is not possible unless the data from all
the tracks is first read into temporary RAM memory. I
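A couple of lines of Python show what interleaving means in practice (the function name and sample values are my own, and real samples would of course be binary audio data rather than small integers):

def interleave(track_a, track_b):
    # A1, B1, A2, B2, ... as stored in a stereo AU file
    out = []
    for a, b in zip(track_a, track_b):
        out.extend((a, b))
    return out

print(interleave([1, 2, 3], [10, 20, 30]))   # [1, 10, 2, 20, 3, 30]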

3.4.2 AIFF
The Audio Interchange File Format (AIFF) is based on a standard file
format called the Interchange File Format (IFF) originally developed
by a company called Electronic Arts. AIFF is the version of this format
developed in the late 1980s for audio data. In Chapter 3 of this block you
will be introduced to the MIDI or music code version of the IFF format.

Box 12 AU header section


The header section in an AU format file contains the information listed below.
• The 4-character text string ‘.snd’ to indicate that it is a sound file in the
AU format.
• The size of the header and comment sections given in terms of the number
of bytes they occupy. This parameter provides an ‘offset’ to the start of
the actual sound data. So, for example, if this parameter is, say, 60 this
will indicate that the first 60 bytes of the file comprise the header and
comment sections and the 61st byte is the first byte of the actual sound
data. If this value is zero, there is no comment section, and the sound data
starts at the 25th byte since the header has a fixed length of 24 bytes.
• The size of the sound data in terms of the number of bytes (this is not
usually the same as the number of sound samples). When the file is created,
sometimes it is not possible to know in advance how large the sound data
is going to be, e.g. in a live recording that is being stored directly in AU
format on a hard disk. In such instances the header has to be stored before
it is known how long the recording is going to be. So, to cater for this, a
special value of −1 is used to indicate that this parameter is unknown.
When the file is read, the reading device can simply go on reading samples
from the file until the end of the file is reached, or it can work out the data
size by subtracting the header size (or 24 if this is 0) from the file size.
• The encoding method for the sound data. There are a number of alternatives
in terms of sample bit size (8, 16 etc. bits per sample) and whether the
data is compressed or not, and if so the compression method. Digital data
compression is explained in Chapter 4 of this block.
• The sample rate in terms of the number of samples per second.
• The number of individual sound tracks.
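To make the header layout in Box 12 more concrete, here is a minimal Python sketch that builds a 24-byte AU header (assuming there is no comment section). The fields are written as big-endian 32-bit values; the encoding value 3 is the code conventionally used for 16-bit linear PCM samples, but you should treat the details here as illustrative rather than as a complete implementation of the format.

import struct

def au_header(sample_rate, channels, data_size, encoding=3):
    header_size = 24                       # no comment section
    if data_size < 0:
        data_size = 0xFFFFFFFF             # the 'length unknown' marker (-1)
    return b'.snd' + struct.pack('>IIIII', header_size, data_size,
                                 encoding, sample_rate, channels)

hdr = au_header(44100, 2, data_size=-1)    # stereo CD-rate sound, length not yet known
print(len(hdr))                            # 24
print(hdr[:4])                             # b'.snd'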

The basic building block of the IFF is a chunk, which is an identifiable
separate block of data. This structure allows other types of data (e.g.
MIDI data) to be included in a file of audio data and also caters for
future expansion by allowing new chunk types to be specified. Any
device not able to recognise a particular chunk type simply ignores the
chunk and moves on to the next. Chunks can be nested in that one
chunk may contain a number of smaller chunks within it.
Every chunk contains three sections as illustrated in Figure 21:
• the chunk identification (a four-character word);
• the chunk size in terms of the number of bytes of data that follow;
• the actual chunk data (which sometimes starts with a chunk type).
For AIFF, there are a number of different chunks
that can be specified, but only three of these are
required for a file containing some sound data:
• a header chunk;
• a common chunk;
• a data chunk.

Figure 21 Basic IFF chunk format (chunk ID, chunk size, chunk data)

The header chunk (Box 13) acts as a sort of
‘container’ chunk for the other ‘local’ chunks.
The common chunk (Box 14) contains information about the number
of sound tracks, the number of bits per sample and the sample rate etc.
Finally, the data chunk (Box 15) contains the actual sound samples.

Box 13 AIFF header chunk


The AIFF header chunk acts as a container chunk for all the other chunks that
contain the sound samples and associated data. In other words, this chunk
contains a number of other subsidiary chunks (called local chunks).
The 4-character header chunk identification is always ‘FORM’.
The ‘size’ part of this chunk contains the number of bytes of data in the chunk
(excluding the bytes used for the chunk type and the chunk size). In this case
this is the sum of the sizes of all the local chunks that follow plus 4 bytes for
the FORM type specification (see below).
The ‘data’ part of this chunk firstly contains the FORM chunk type which is the
4-character word ‘AIFF’ to indicate an AIFF file type. Following this are the
complete local chunks containing the audio parameter data and actual sound
samples.

Box 14 AIFF common chunk


The AIFF common chunk contains details about the fundamental parameters

of the sampled sound. The 4-character common chunk identification is ‘COMM’.

The chunk size follows which for this chunk is always 18.

The ‘data’ part of this chunk contains the following information about the

digital sound data stored in the subsequent data chunk:

• the number of individual tracks of sound;


• the number of sample frames (see below);
• the number of bits used to quantise each sample (from 1 to 32);
• the sample rate in sample frames per second.
A sample frame is defined as the data required for one sample point, which
means one sound sample value from each track. So, if there is only one track,
the number of samples will be the same as the number of frames; if there are
two tracks, there will be twice as many samples as frames. In general therefore
the total number of samples is the number of frames times the number of
tracks. Thus the sample rate parameter will in fact just be the basic sample
rate for each track.

Box 15 AIFF sound data chunk


The AIFF sound data chunk contains the actual sound samples. The 4-character
sound data chunk identification is ‘SSND’.
The chunk size which follows specifies the number of bytes contained in the
chunk (excluding the 8 bytes required by the chunk type and size).
The main part of the ‘data’ portion of the chunk contains the actual sound
samples. Before this, there are two additional parameters – the offset and the
block size – that can be used to create fixed-size blocks of sound data. This
can be used in some instances to ensure real-time recording and playback of the
sound. If this feature is not needed, these parameters are both set to zero.
The actual sound samples are then stored interleaved between tracks like the
AU format.
Each sample uses an integral number of bytes, with unused bits being padded
out with zeros. So, if the number of bits per sample is between 1 and 8, then
1 byte will be used for each sample; if the number of bits is between 9 and 16
then 2 bytes will be used; between 17 and 24 bits will use 3 bytes; and 25 to
32 bits will use 4 bytes.

If there is more than one sound track, the data for each is interleaved
within this data chunk. Figure 22 shows the form of an AIFF sound
file and illustrates how the common and data chunks are contained
within the header chunk.

Figure 22 Basic AIFF sound file format (the ‘FORM’ chunk, with its chunk
size and ‘AIFF’ type, contains the ‘COMM’ chunk holding the sound
parameter data and the ‘SSND’ chunk holding the sound sample data)
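
For illustration only, here is a Python sketch of how a program might step through the local chunks inside an AIFF ‘FORM’ chunk using just the ID/size/data structure described above. The multi-byte values are assumed to be stored most-significant byte first, and a chunk whose data occupies an odd number of bytes is assumed to be padded with one extra byte; the function name is invented for the example.

import struct

def list_chunks(filename):
    """Print the ID and size of each local chunk inside a FORM container."""
    with open(filename, "rb") as f:
        form_id, form_size = struct.unpack(">4sI", f.read(8))
        form_type = f.read(4)               # e.g. b'AIFF'
        print(form_id, form_size, form_type)
        end = 8 + form_size                 # position of the end of the FORM chunk
        while f.tell() < end:
            chunk_id, chunk_size = struct.unpack(">4sI", f.read(8))
            print(chunk_id, chunk_size)
            if chunk_size % 2:
                chunk_size += 1             # pad odd-sized data to an even length
            f.seek(chunk_size, 1)           # skip over the chunk data

A device that did not recognise a particular chunk ID would behave in exactly this way – read the size, skip the data and move on to the next chunk.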

In addition to the above basic three chunk types, there are a number of
optional chunks that can give more information about the form and use
of the sound data. These are outlined below.
• Marker chunk. This chunk allows one or more particular points in
the sound data to be marked and given a name. This information
can be used for any purpose, but a common use is for specifying
loop points where the sound data is to be used as wave data for a
synthesiser (see Chapter 8 in Block 2).
• Instrument chunk. This chunk is used in conjunction with the
marker chunk to specify further details about how a synthesiser
should play back the sound – of course assuming the sound data
chunk contains wave data. Details such as tuning, note range, key
velocity and volume can be specified with this chunk.
• MIDI chunk. This chunk contains MIDI data, but not usually actual
note data. More usually this is used to store control (system
exclusive) data for an electronic instrument. (Note, the MIDI system
will be explained in detail in Chapter 3 of this block, so do not
worry if you do not understand anything about this chunk now –
the details are given here just for completeness.)
• Audio recording chunk. This chunk allows the channel status
information and user data contained in an AES/EBU or S/PDIF
signal to be included in the sound file (see Section 2.3 earlier).
• Application specific chunk. This chunk can be used for any
purpose – specific to one type of device or more general
information. For example this chunk could be used by a sound
editor program to store an edit list (see Section 1.2.2 above) and
other program-specific data.

• Comments chunk. The comment chunk can be used for general


comments about the sound data contained in the AIFF chunk. In
addition, comments are given a creation time stamp and can also be
linked to markers to allow more detailed information to be given
about one or more marker points.
• Name, Author, Copyright and Annotation text chunks. These four
chunks consist solely of textual data and as their names imply are
designed to contain information about the name of the sampled
sound, its creator, any copyright details and any further related
comments about the sound data.

3.4.3 RIFF WAVE


The RIFF WAVE file format, often shortened to WAVE or WAV, is a
very common sound file format originally developed for use with
Windows-based personal computers. Like AIFF it uses chunks of data
and is structured in a very similar way to this file format. In this brief
study of the RIFF WAVE format I will look only at the basic chunk
types and make passing mention of some of the other more common
optional extras.
Like AIFF, only three chunks are required for a file containing some
digital audio samples:
• a header chunk;
• a format chunk (the equivalent of the AIFF common chunk);
• a data chunk.
Also like AIFF, the header chunk (Box 16) acts as a ‘container’ chunk
for the other ‘local’ chunks. The format chunk (Box 17) contains
information about the form of the digital sound data, and the data
chunk (Box 18) contains the actual sound samples. Again, if there is
more than one track, the data from each one is interleaved.
Figure 23 shows a RIFF WAVE chunk containing a format chunk and a
DATA chunk.

Figure 23 Basic RIFF WAVE sound file format (the ‘RIFF’ chunk, with its
chunk size and ‘WAVE’ type, contains the ‘FMT ’ chunk holding the sound
parameter data and the ‘DATA’ chunk holding the sound sample data)



Box 16 RIFF WAVE header chunk


The RIFF WAVE header chunk acts as a container chunk for all the other chunks
that contain the sound samples and associated data. The 4-character header
chunk identification is always ‘RIFF’ which stands for Resource Interchange
File Format.
The ‘size’ part of this chunk contains the number of bytes of data in the rest
of the file.
The ‘data’ part of this chunk firstly contains the RIFF chunk type which is the
4-character word ‘WAVE’ and following this are the local chunks containing the
sound data, each with its own chunk type, size and data.

Box 17 RIFF WAVE format chunk


The RIFF WAVE format chunk contains details about the fundamental parameters

of the sampled sound. The 4-character format chunk identification is ‘FMT ’

(the last of the 4 characters is a space).

Following this is the chunk size which gives the number of bytes in the chunk,

excluding the 8 bytes used by the chunk type and size data.

The ‘data’ part of this chunk contains the following information about the

sound data that is stored in the subsequent data chunk:

• the format of the sound data;


• the number of individual tracks of sound;
• the sample rate in frames per second (a frame being the sound samples
from each channel for one sample point as in the AIFF format);
• the number of bytes of sound data (which is not necessarily the same as
the number of sound samples) that needs to be read from the file each
second if the sound is to be played in real time without breaks (see below);
• the number of bytes needed to store one frame of sound data (this is
calculated by rounding up the result of the calculation (no. of channels) ×
(no. of bits per sample) ÷ 8);
• the number of bits per sample.
The parameter giving the number of bytes of sound data that must be read
from the file each second in order that the sound can be played directly as it is
read and without breaks is a useful figure to enable the reading device to
determine how much temporary memory needs to be allocated to cope with
the file access time (see Figure 20 earlier).
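
For illustration only, the two rate-related fields just described can be calculated as shown in the Python sketch below; the function name is invented for the example.

import math

def wave_rate_fields(tracks, bits_per_sample, sample_rate):
    """Work out the two rate-related fields of the format chunk."""
    # bytes needed to store one frame (one sample from every track)
    bytes_per_frame = math.ceil(tracks * bits_per_sample / 8)
    # bytes that must be read each second for unbroken real-time playback
    bytes_per_second = sample_rate * bytes_per_frame
    return bytes_per_frame, bytes_per_second

print(wave_rate_fields(1, 8, 22050))    # a mono 8-bit file: (1, 22050)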

Box 18 RIFF WAVE sound data chunk


The RIFF WAVE sound data chunk contains the actual sound samples.
The 4-character sound data chunk identification is ‘DATA’.

The chunk size which follows specifies the number of bytes contained in the

chunk (excluding the 8 bytes required by the chunk type and size).

The ‘data’ portion of the chunk contains the actual sound data. As with AIFF,

the data is stored in track order within sample point order.

Each sample uses an integral number of bytes, with unused bits being padded

out with zeros.

Now this is where the situation gets a little bit more complicated!


Instead of using one data chunk, the sound samples can be divided up
into a number of chunks within another type of chunk called a ‘LIST’
chunk. There are two possibilities for audio data chunks within a LIST
chunk – a DATA chunk as described in Box 18 and a ‘SLNT’ or silent
chunk. This chunk indicates a section of silence and it contains a single
parameter indicating how many ‘sample times’ of silence are to occur.
The idea behind this is to reduce the file size by not having to incorporate
a large string of zero samples for long periods of silence.
To complicate the situation even further, other chunk types are allowed
inside a LIST chunk, but any further discussion of this is beyond the
scope of the course.
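
For illustration only, the following Python sketch shows the idea behind the SLNT chunk: a run of silence is stored as a single count and only expanded back into zero samples when the file is read. The chunk list used here is just an invented in-memory stand-in, not the actual file layout.

def expand_chunks(chunks):
    """Rebuild one continuous sample stream from DATA and SLNT chunks.

    chunks -- a list of ('DATA', [samples...]) or ('SLNT', sample_count)
              tuples, standing in for the contents of a LIST chunk.
    """
    samples = []
    for kind, payload in chunks:
        if kind == "DATA":
            samples.extend(payload)
        elif kind == "SLNT":
            samples.extend([0] * payload)   # silence stored only as a count
    return samples

# Ten sample times of silence take one stored number rather than ten zeros
print(expand_chunks([("DATA", [3, 5, 2]), ("SLNT", 10), ("DATA", [7])]))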

ACTIVITY 21 (SELF-ASSESSMENT) ...........................................................

A RIFF WAVE file contains a stereo channel of 16-bit sound samples,


sampled at 44.1 kHz. What is the value of the RIFF WAVE header
parameter indicating the number of bytes per second that must be read
from the file if it is to be played back without breaks as it is read? I

As I mentioned above, there are a number of optional additional


chunks that can be incorporated into a WAVE file. A number of these
have equivalents in the AIFF format, but of course are slightly
different in their format and content. For illustration only, a selection
of some of the common additional chunks is given below.
• Fact chunk. This can be used to store important information about
the contents of the file. It is a required chunk if the audio data is
divided into a number of separate chunks within a LIST chunk.
• Cue chunk. Contains cue points similar to the AIFF marker chunk.
• Playlist chunk. Contains a play order for a series of cue points
contained in the cue chunk.
• Associated data chunk. Contains names or labels and other
descriptive text for cue points defined in the cue chunk.
• Text chunk. Contains arbitrary text associated with the audio data.
• Instrument chunk. Contains the equivalent information to that outlined
in the AIFF instrument chunk.

ACTIVITY 22 (COMPUTER) ....................................................................

In this activity you will use the course’s sound editing software to
create a short AIFF file and a short RIFF WAVE file. You will then
examine these to see if you can identify the various chunks that have
been described above.
Run the course’s sound editing software and follow the steps
associated with this activity in the Block 3 Companion.

Comment
Unfortunately, without the use of a program that displays each file’s
data in an understandable form (and a detailed knowledge of the file
formats which is not given here), you cannot confirm that each file
conforms exactly to its specification. Such an investigation is beyond
the scope of this course, but I hope that this activity has demonstrated
that actual sound files do seem to use the various chunks of data
described above. I

3.5 Storage facilities in the AW16G


The AW16G audio workstation contains a standard hard disk drive
that is commonly available for use in computers (2.5" IDE as shown in
Figure 24). The standard disk size is 20 Gbyte, but as there is a standard
connection for the hard disk, this can easily be replaced by larger capacity
disks.
There is also RAM memory which is used both as a buffer for the hard
disk, and as a working store for editing operations as described earlier.
As mentioned in the introduction to the AW16G, there are 16 separate
sound tracks available plus a stereo channel. The read/write speed of
the disk allows all 16 tracks plus the additional stereo track to be
replayed simultaneously, but only 8 tracks plus the stereo channel
to be recorded simultaneously. Notice therefore the implied difference
in read and write speeds of the disk and its control circuitry.
However, associated with each track are eight ‘virtual tracks’. These
can be thought of as alternatives for each track that the user can select
when reading from or writing to the hard disk. For each track only
one of these can be selected at a time, but this facility allows the
user to record different takes on different virtual tracks. This gives
the device a maximum capacity of 144 tracks (8 virtual tracks for
each of the 16 separate tracks and for each of the two tracks of the
stereo output channel, i.e. 8 × 18).
The workstation also has a slot for an optional CD drive that can be
used with both standard CDs and writeable CDs. As well as being able
to read and create audio CDs, the workstation can also read and write
RIFF WAVE files to CD-ROMs. This means that individual tracks can be
saved onto a CD as separate sound files and the files then transferred to a
computer for subsequent editing and processing if this is more convenient
than using the facilities offered by the AW16G. Finally, the CD drive can
be used to create a backup copy of the files associated with the songs
recorded on the hard disk (including play lists and set-up information
etc.). This also means that when the hard disk becomes full, the songs
it contains can be archived onto CDs and then the hard disk cleared to
enable more material to be recorded. If previous material is needed again,
it can simply be loaded into the hard disk again from an archive CD.
Figure 24 is a photograph of the inside of the AW16G with the hard
disk, the CD drive and two RAM memory devices marked.

Figure 24 Photograph of the inside of the AW16G audio workstation showing the
hard disk drive, the optional CD drive and two RAM memory devices

4 EDITING

We have now looked at the basics of getting sound into and out of an
audio recording device (or computer), and how it can be stored within
it. In most cases, once the raw material is stored inside the device, it
will need to be processed in some way before a master recording can be
produced. At its simplest this might just be tidying up the lead in and
lead out sections (fading in and fading out), but most likely the
following processes will need to be carried out between making the
recording of a performance and the subsequent production of a final
master recording.
• some editing of the various ‘takes’ will need to be done;
• for multitrack recordings, the tracks will have to be combined to
form a stereo channel;
• there may be a need to add various effects to the sound (e.g.
reverberation).
This section and the next two sections therefore will look at each of
these three processes, but do note that often the processes may be
combined, or may be carried out in a different order.
An important point to keep in mind during the discussion is that
sometimes a task has to be done in real time (i.e. as the sound is playing),
and at other times the task can be done at leisure, or in small sections.
This has a bearing on the speed at which each task must operate.
As in previous sections, the facilities provided in the AW16G Audio
Workstation will be used as an example, and you will get practical
experience of some of the processes with the course’s music recording
and editing software.
Although most of the discussion will be on how the processing is
done using digital audio, for completeness and comparison, analogue
methods will be noted where appropriate.

4.1 Getting the levels right


The raw material that has been recorded is likely to have come from a
number of different sources – microphone, electronic instrument,
another audio recorder, etc. When these sources are recorded, their
sound levels are likely to vary widely. In order to start from a ‘level
playing field’, it is helpful to standardise these levels so that not only
are they roughly equivalent, but also that they make full use of the
available dynamic range of the audio device and so maximise the
signal-to-noise ratio. There are two considerations here: the dynamic
range of the sound and its peak level.
The dynamic range of a sound can be altered by two processes:
compression/limiting, which reduces the dynamic range, and
expansion/gating, which increases it.
The process of adjusting the level of a sound is called normalisation,
whereby the whole sound is (usually) increased in level until the highest
peak is just below the maximum level the device can cope with.
Changes to the dynamic range should be made before changes to the
overall level (normalisation).

4.1.1 Compression and limiting


Compression, or more precisely audio compression, in relation to the
mixing and editing of sound is the process of reducing the dynamic
range of the signal. This is done by feeding the signal to a special
amplifier that has the property of varying its amount of amplification
(or gain) in response to the level of the sound – the louder the sound, the
less the amplification. This should not be confused with digital data
compression that will be introduced in Chapter 4 which is a technique
for reducing the amount of digital data a sound signal contains. Audio
limiting is a similar technique to compression, but rather than the
effect working over all signal levels, the reduction in dynamic range
only comes into effect at high sound levels. This is usually used in
order to prevent overload distortion occurring.
One of the common situations where compression is useful is when
recording a singer who is close to the microphone. If the singer moves
towards or away from the microphone during the performance, even by
a small distance, the signal level can vary by a significant amount.
Compression will help to smooth these variations, but should be used
with care or it will also smooth out the singer’s own performance
dynamics as well.
Radio broadcasters regularly use compression to reduce the dynamic
range of live music (which can be over 100 dB) to the restricted
dynamic range of the broadcast medium (which as you will see in
Chapter 4 can be as little as 50 dB for analogue radio broadcasts). In
addition, some broadcasters compress the signal further and then
normalise it so that the soft parts appear louder – much to the
annoyance of hi-fi enthusiasts! This is done though to help the
audibility of soft passages to those listening in a moving vehicle where
there is a large amount of background noise. Unfortunately there is no
standard for this, and broadcasters do what they think will maximise
the listening experience for the majority of their listeners – which is
why some radio stations appear louder than others.
As hinted above, great care must be taken when using compression to
ensure that the dynamics of the performance are not ruined and that
soft passages are not boosted so much that any background noise on
the recording becomes obtrusive. Also, the attack and decay time (see
below) must be carefully set so as not to make the compression obvious.
There are a number of parameters that may be adjusted when
compressing a sound. The main ones are mentioned below.
• The level of sound above which the amplifier starts to reduce its
amplification (known as the threshold level). If the threshold is set
near the loudest sound level, the compressor becomes a limiter as it
only has effect for very loud sounds.
• The compression ratio that is introduced for signal levels above the
threshold, i.e. the allowable range of variation in amplification of
the signal within the compressor. Ratios of between 1:1 and 20:1 are
common. High compression ratios give a near constant output for a
wide range of input levels above the threshold level, and are used to
prevent distortion occurring through overload. Lower ratios are
used to effect compression over a wider dynamic range and might
typically be used in the broadcasting example mentioned above.

• The attack time, i.e. the speed at which compression is introduced


in response to the start of a loud passage. If it is introduced very
quickly, then the sound is softened and the ‘edge’ or attack is taken
out. Not only can this make the sound unnatural, it can also make it
seem softer and can even cause instruments to become
unrecognisable as much of their character is determined by their
starting transients. On the other hand, a very slow attack time can
make the compression ineffective – the loud passage might have
finished before the compression has taken effect. (Note that a ‘loud’
passage here might consist of a very short section of sound – a
singer’s consonant or the starting transient of an instrument for
example – as well as a sustained loud sound.) Thus when setting
this parameter care must be taken to obtain the required
compression whilst not audibly changing the loud transients. A
typical value for the attack time is 50 ms.
• The decay time, i.e. the time over which the amount of compression
is reduced to a new value after a loud passage has ended. If this time is
set too short, then sounds that have fast variations between loud and
soft (e.g. speech) will cause any background sounds or noise to become
audible during the soft passages. This can sometimes be heard in
interviews that are conducted outside on a windy day, or in the
presence of a lot of background noise such as traffic. If the decay
time is set too long, then at the end of a loud passage, the sound
can appear to disappear completely and then only slowly return.
This effect used to be very common with early video recorders
where a knock on the recorder’s microphone or a close loud sound
would suddenly reduce the sound level making the wanted sound
more or less inaudible for a period of seconds. Thankfully modern
audio devices have a much shorter decay time and this effect is now
not so common. A typical decay time is around 0.5 s.
Figure 25 is a graph that shows the transfer characteristics of a
compressor/limiter.
Figure 25 Transfer characteristics of a typical compressor/limiter
(output level plotted against input level, showing ratios of 1:1, 2:1,
3:1, 4:1 and 20:1 (limiting) above the threshold)

In an analogue implementation, there would be a special variable gain


electronic amplifier circuit whose gain is determined by the amplitude
of the input signal.
For digital audio, it is simply a question of number manipulation. For
example, for a constant compression setting, ignoring the attack and
decay parts, each sample is first tested to see if its magnitude is greater
or less than the threshold sample value. If it is below, then the sample
is unaltered. If it is above the threshold, then the difference between
the sample value and the threshold value is simply divided by the
compression ratio. For example, a compression ratio of 2:1 will result
in the difference between the sample and threshold values being
halved. Notice that I have described the process in terms of the
magnitude of the sound sample (i.e. ignoring the sign). This is because
a positive value has to be made less positive and a negative value less
negative, and so the use of the term magnitude covers both situations.
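
For illustration only, here is a Python sketch of this ‘number manipulation’ for a constant compression setting (ignoring the attack and decay behaviour); the threshold and ratio values used in the example are invented.

def compress(samples, threshold, ratio):
    """Reduce the dynamic range above a threshold (attack/decay ignored).

    Any part of a sample's magnitude above the threshold is divided by
    the compression ratio, e.g. a ratio of 2 halves the excess.
    """
    out = []
    for s in samples:
        magnitude = abs(s)                  # work with the magnitude...
        if magnitude > threshold:
            magnitude = threshold + (magnitude - threshold) / ratio
        out.append(magnitude if s >= 0 else -magnitude)   # ...restore the sign
    return out

print(compress([1000, -30000, 15000], threshold=20000, ratio=2))
# [1000, -25000.0, 15000]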
An analogue compressor/limiter may only provide a limited range of
settings for the above parameters whereas a digital device or the
equivalent software version found in audio processing programs will
often allow the user an almost infinite variation in the basic
parameters.
If the sound comes from a stereo source, then some devices will allow
the amount of compression to be controlled by the sound levels from
both channels, and the same amount of compression then affects both
channels at the same time. This prevents the compression causing
violent swings in the apparent position of the sound between the left
and right loudspeakers.

4.1.2 Expansion/gating
Audio expansion and audio gating are the opposite processes to audio
compression and audio limiting respectively and are similarly
implemented in both the analogue and digital domains. In audio
expansion, the sound is amplified less as the sound level is reduced.
In other words as the sound level is reduced, it appears to get even
softer, and thus the overall dynamic range is increased. Audio gating
occurs when below a particular sound level the signal is switched off
completely. Again there are the same parameters associated with these
forms of processing as for compression/limiting, i.e. threshold, ratio,
attack time and decay time.
Unlike compression/limiting which is primarily used to reduce the
dynamic range and make the recording as audible as possible,
expansion/gating is normally used to produce special effects.
Sometimes gating is used to cut out unwanted background noise
during, for example, speech, but this can give a very unnatural effect
when there is complete silence in the pauses. In the music world,
expansion and gating are commonly used for drums, with the bass drum
of a drum kit, in particular, being gated to provide the effect of
damping.

ACTIVITY 23 (SELF-ASSESSMENT) ...........................................................

Below is a description of how each sample in a digital sound signal is


processed. Determine whether the signal is undergoing compression,
limiting, expansion or gating, explaining the reasons for your choice.
“The magnitude of each sample value is compared with a threshold
value. If the sample has a greater magnitude than the threshold value
then it is left unchanged. If the sample magnitude is less than the
threshold value, then the sample value is set to 0.” I

4.1.3 Normalisation
Normalisation is the process whereby the whole sound is changed in
level until the highest peak is at a predefined ‘normal’ level. Unlike the
processes described above, all signal levels are altered by the same
amount. Usually normalisation involves amplifying the sound by a
constant factor such that the highest amplitude peak is raised to just
below the maximum level the device can cope with (but this might not
always be the case). Thus, the amount of normalisation applied must
be determined from the peak value of the signal, not the average level;
otherwise normalisation might cause high-level peaks to be clipped
and introduce distortion.
Figure 26(a) shows the waveform of a section of a sound waveform
before normalisation, and Figure 26(b) shows the same signal after
normalisation.

Figure 26 (a) The waveform of a section of a sound waveform before
normalisation, with its peak value below the maximum positive and
negative signal levels; (b) the same signal after normalisation

Normalisation in an analogue system is usually simply a matter of


adjusting the input level control so that the loud peaks of the material
just do not produce distortion in the audio system. This is usually
done as the material is recorded in the recorder or replayed to the
audio processor rather than as a separate stage. The problem with
normalisation in an analogue system is how to determine what the
peak level is. Doing this may require rehearsing or replaying the whole
piece to find the peak level. Fortunately, analogue systems tend to
distort rather ‘gracefully’ which means that the effect of overload
becomes progressively worse as the overload becomes greater. This
means that the odd short overloaded peak will probably not be noticed,
and so setting the normalisation level is not too critical.
In a digital system, overload is much more noticeable, and should be
avoided. Overload in a digital signal will result in the maximum
sample value being used for all samples that should require a value
higher than this maximum, i.e. the signal will be clipped. However, it
is an easy matter for a digital audio processor to scan through a
recording to find the peak value, and then to work out a multiplication
factor which will result in this peak value just reaching the maximum
quantisation value – or more usually a headroom value which is just
below this. This multiplication factor can then be applied to every
sample and the result stored as the normalised version. As before, this
is a purely mathematical process and usually does not need to be done
in real time.
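
For illustration only, here is a Python sketch of this two-stage process – scan the recording for its peak, then apply one multiplication factor to every sample. The headroom value used is an invented example for 16-bit samples.

def normalise(samples, target_peak=32000):
    """Scale a recording so its largest peak just reaches target_peak."""
    peak = max(abs(s) for s in samples)     # scan the whole recording first
    if peak == 0:
        return list(samples)                # silence: nothing to scale
    factor = target_peak / peak             # one factor for every sample
    return [round(s * factor) for s in samples]

print(normalise([4000, -9000, 2500]))       # [14222, -32000, 8889]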
Remember that this normalisation process works according to the
highest level of the sound over a whole recording (or a section of a
recording). Most of the time the average level will be much less than
this value, and in the extreme, a recording might contain a single large
peak which may even have come from an unwanted click which causes
the normalisation process to have little effect. In such cases, some
editing of these peaks may need to be done first to remove or reduce
them so that normalisation will have the desired effect.

ACTIVITY 24 (COMPUTER) ....................................................................

In this activity you will apply normalisation to the recording you made
in Activity 16. Follow the steps associated with this activity which
you will find in the Block 3 Companion. I

4.2 Editing processes


Simple editing of sound recordings involves adding, replacing or
deleting sections of a recording, or perhaps switching between two or
more different takes during the piece (known as cross-fading). The
procedure is just like the cutting and pasting process that occurs in
word processing when sections of text need to be replaced, added or
deleted – or for larger additions, a whole new text file is read in.
Editing may be carried out on individual tracks of a multitrack
recording, or on the final assembled recording. Some important factors
in the editing of sound recordings which will be considered in the
following sections are:
• How accurately can the edit point be determined?
• Can the cross-fading process be done in real time, and if so how are
the various sound sources synchronised?
• Can an edit be undone if it is found not to work as intended?
• Does the editing process degrade the original recording in any way?

4.2.1 Analogue techniques


The problem with any form of editing using analogue techniques is
that, unlike digital audio, if it involves producing a second generation
recording the quality will be reduced. Another problem is that it is
difficult to stop and restart a new recording to select a different take
without introducing a click, so switching may have to be done ‘on the
fly’ without stopping the recording. This of course would be
impossible to do if the edit is between two takes that are recorded on
the same reel of magnetic tape.

All in all, it is much better to work with the original recordings, but of
course in doing so the original will be permanently changed
(destructive editing), and if an edit goes wrong, then restoring the
original will be at best difficult or at worst impossible.
In the very early days of sound recording, editing was not possible – if
a recording went wrong, it had to be discarded and a whole new take
done. When analogue magnetic tape systems appeared, editing could
be accomplished by splicing (physically cutting and joining sections
using thin sticky-backed plastic tape called splicing tape). To obtain a
smoother result, the tape was cut at an angle rather than at right angles
across the tape (Figure 27). However, splicing only provides a straight
switch between the two recordings, cross-fades or fading in and fading
out cannot be done by this method.

Figure 27 Magnetic tapes are usually spliced at an angle, a diagonal
cut (a), rather than straight across the tape (b)

4.2.2 Digital techniques


The advent of digital recording and in particular hard disk recording
has revolutionised editing (and mixing). Some might say that the
flexibility and ease of editing that digital recording offers has been a
bad thing for music recordings since all recordings are now expected
to be perfect. In trying to achieve this, the recording engineer may
therefore require many ‘takes’, and carry out so many edits and cross-
fades that the result, although ‘perfect’, has lost its spontaneity and
freshness.
The advantage of digital recordings, however, is that however many
generations of copies are made, the quality is unaffected. This means
that the original recordings can remain unaltered, and edits can be
done on copies and in stages, with a new and more refined ‘master’
recording being produced at each stage. Such a process is called non-
destructive editing as it is always possible to retrace one’s steps if an
edit goes wrong.
In the digital domain, the basic editing processes are the same as in the
analogue world, except that all the editing is done electronically or by
a computer program. However the use of edit lists containing a list of
editing instructions is a much more practical proposition with digital
techniques. Also, once editing is complete, the edit list can be used to
generate new master copies from the original sound sources at any time.
More importantly, if the edit is not quite right, then not only does the
edit list contain a complete record of the editing operations that were
carried out, it also provides an easy method of ‘tweaking’ the edits to
create a revised master from the original sources, rather than having
either to start again or to edit the existing master to produce a new one.

Using multitrack hard disk recording, editing is even easier. The


various sources and/or ‘takes’ are assembled onto a hard disk if
they are not there already, and then it is simply a matter of ‘number
crunching’ – the list of sample values from each of the required
sections is collected together and stored as a new sound file on the
disk. If an edit list is produced, this list can be used to automate the
process. In addition, the access speed of hard disks is such that this
can usually be done in real time (or faster) as well which means
there is no need even to create a new master recording unless it is
needed for creating a CD or transferring elsewhere. There is also no
problem about synchronisation of the sources, although if the
process is being done in real time (so that the edited result can be
heard as it is created), then the input and output buffers of the hard
disk (see Section 3.2) must be of a sufficient size and be carefully
managed to enable the source material to be read off the disk as it is
needed, and the edited sound to be available as a series of sound
samples appearing regularly at the required sample rate.
Location of an editing point can be done by listening to the section of
the recording around the edit point – audio processors will usually
provide a waveform display of the sound from which the required
section can be selected and heard, some even simulate the tape reel
shuttling process that is used to find an edit point when splicing
analogue magnetic tape. Digital editing allows a fine degree of selection
of the edit point right down to an individual sample.
If the edit requires cross-fading between two digital sources (which
includes fading in and fading out since this is simply the process of
cross-fading to or from silence), then again this is simply achieved by
number crunching of the samples. In the case of a fade in, the number
of sound samples occurring over the desired fade in period is first
determined, then each sample within this fade in period is multiplied
by a factor between 0 and 1 such that at the start of the fade in, the
samples are all zero (multiplication factor of 0) and at the end of the
fade in the samples are unaltered (multiplication factor of 1). Usually
the change in multiplication factor from 0 to 1 over the fade in period
will be such as to give a smooth-sounding fade in, but other variations
are possible to give special effects. Fading out is achieved by the
reverse process, and a cross-fade is achieved by fading out the outgoing
source at the same time as fading in the incoming source and adding
individual samples from each set together.
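
For illustration only, the following Python sketch shows a linear fade in and a simple cross-fade carried out by multiplying and adding sample values in the way just described; the function names and parameter values are invented for the example.

def fade_in(samples, sample_rate, fade_seconds):
    """Apply a linear fade in over the first fade_seconds of the sound."""
    fade_samples = int(sample_rate * fade_seconds)
    out = list(samples)
    for n in range(min(fade_samples, len(out))):
        out[n] = round(out[n] * n / fade_samples)   # factor rises from 0 to 1
    return out

def cross_fade(outgoing, incoming, sample_rate, fade_seconds):
    """Fade out one source while fading in another and add the results.

    Only the samples within the cross-fade period are returned here.
    """
    fade_samples = int(sample_rate * fade_seconds)
    return [round(outgoing[n] * (1 - n / fade_samples) +
                  incoming[n] * (n / fade_samples))
            for n in range(fade_samples)]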

ACTIVITY 25 (SELF-ASSESSMENT) ...........................................................

A performance has been recorded using a sampling rate of 48 000


samples per second. A two second linear fade-in period is to be
created. If the sound samples are numbered from zero at the start of the
fade in period:
(a) what is the sample number at a point one quarter of the way into
the fade in period, and
(b) what factor should this sound sample be multiplied by? I

ACTIVITY 26 (COMPUTER) ....................................................................

In this activity you will carry out some simple editing operations and
learn more about the course’s music recording and editing software.
You will find the steps for this activity in the Block 3 Companion. I

4.3 Editing facilities in the AW16G


All editing operations on the AW16G are carried out using the small
screen display and the controls in the data entry section on the front
panel (see Figure 2(b)). The AW16G allows all of the following
common editing operations to be carried out:
• erase (clear a section of a track);
• delete (remove a section of a track and move the subsequent section
up to fill the gap);
• insert (insert a blank section at a particular point);
• copy (copy one section to another section on the same or a different
track, overwriting the current contents);
• move (the same as copy, but the material is erased from the source);
• exchange (exchanges sections between tracks).
Most of the above operations can be carried out on individual hard
disk tracks and also on the stereo and pad tracks.
The editing operations require setting start and end points for the
section to be edited and this can be done in a number of ways:
• using the nudge function – the front panel DATA/JOG control is
used to create the digital equivalent of shuttling the tape reels of an
analogue tape recorder backwards and forwards to determine the
exact edit point;
• using the waveform display – the waveform of a track at any point
can be displayed and a particular point identified;
• using the location counter – this counter gives the precise position
of any part of a track in relation to the start.
The lack of a mouse or other pointing device and a ‘computer’ type
display screen can make selecting editing sections and carrying out
editing operations rather tedious and this is one of the drawbacks of
this type of device. This is also one of the reasons why manufacturers
are now tending to produce audio processing add-ons for desktop
computers rather than completely stand-alone units so that users
can benefit from the display and selection facilities a computer
offers.

ACTIVITY 27 (COMPUTER) ....................................................................

In this activity you will carry out two simple editing operations;
one is destructive (i.e. the sound is permanently altered) and one is
non-destructive and uses the ‘edit list’ facilities of the course’s
music recording and editing software. You will find the steps for
this activity in the Block 3 Companion. I

5 MIXING

Mixing of audio signals is the process that occurs when two or more
sources need to be combined in various and possibly varying
proportions to produce a new combined recording. The audio sources
could come from a number of microphones recording a single
performance – either live or from a multitrack recorder, or from a
variety of different sources linked to a single performance (e.g. a
combination of signals from microphones and electronic instruments),
or sources from more than one performance (e.g. adding background
effects such as atmospheric sounds etc.). In fact the fading out and fading
in mentioned above is just a special form of mixing over a
short period of time, where one of the ‘signals’ to be mixed is silence.
As I mentioned in Section 1.2.2, mix-down or simply the mix is the
term usually applied to this stage in the mastering process.
When recording a live performance, often the recording engineer will
try to record each performer or group of performers separately using
individual microphones recorded to individual tracks on a multitrack
recorder. This means that a master stereo recording can be produced
with the best balance of the various sources at leisure at a later date. If
the mixing is done at the time of the recording there is little chance of
changing the balance afterwards. Sometimes the multi-track ideal is
not practical or possible, and in such cases it is very important that the
mix of the various sources is set up with care during rehearsals.
In the discussion below it does not matter whether the sound is
coming from live or from recorded sources or a mixture of both except
you should bear in mind that it is vital all the sources are
synchronised – a topic that will be considered further in Chapter 3 of
this block.
Any sound mixing unit will have a number of inputs and a number of
outputs. The individual sound inputs are mixed in the required
configuration and proportions and the results sent to the required
outputs. Within a mixer, the actual ‘mixing’ is done with the use of
one or more buses. In this context an audio bus is the place where the
signals to be mixed are fed, and from where the result is sent to the
mixer output. In analogue mixers a bus would physically comprise an
electrical wire to which the various inputs were connected. In digital
mixers there is no physical equivalent component, but it is still useful
to retain the idea of buses.

5.1 Analogue mixing techniques


Mixing analogue audio signals is simply a matter of taking the required
proportion of each sound source and adding them together. The correct
proportion of each signal is obtained via the individual level controls
(usually called faders) that provide the required signal level. These
level controls may act directly on the analogue signal, or they may
provide a control signal to a special electronic circuit which changes
the level of the analogue signal. The latter process, although more
expensive to implement, is to be preferred as over time, mechanical
level controls become dirty and tend to add crackles to the sound when
they are moved – something that must not be allowed to happen where
levels need to be altered during a recording. If the level control only


provides a control signal, any ‘noise’ produced when moving the slider
can be filtered out before being fed to the level control circuit. Also,
having electronic control over the signal level means that the settings
can be stored and recalled at a later date if necessary.
Note that the control signal could be an analogue voltage, or a digital
signal (e.g. 0 = zero level, 64 = full level, 32 = half level, etc.), but
remember that this does not make the actual mixing stage into a digital
operation, only the control of the mix is digital.
The analogue mixing process is summarised in Figure 28, where for
illustrative purposes sound sources 1 and 2 are controlled directly and
sources 3 and 4 are controlled indirectly (normally a mixer unit would
control all the sources in the same way).

Figure 28 Mixing analogue signals (sources 1 and 2 pass through level
controls acting directly on the signal; sources 3 and 4 pass through
variable control circuits driven by ‘remote’ control signals, which may
be analogue or digital; the outputs are added to give the mixed sound)

5.2 Digital mixing techniques


As with digital editing, digital mixing boils down to a mass of ‘number
crunching’. Each sample value from each sound source is multiplied
by the required factor (between 0 and 1) to give the sample value that
represents the sound level required for the mix, and then all the
resulting sample values are added together. Remember that digital
sound samples usually have a range that includes negative as well as
positive values, and so the sign of each sample value must be included
in the calculation.
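
For illustration only, here is a Python sketch of this multiply-and-add process for any number of sources; the factors and sample values used in the example are invented.

def mix(sources, factors):
    """Mix equal-length digital sources, each scaled by a factor of 0 to 1."""
    mixed = []
    for frame in zip(*sources):             # one sample from every source
        value = sum(s * f for s, f in zip(frame, factors))
        mixed.append(round(value))          # signs are carried through the sum
    return mixed

# Two sources mixed at half and quarter level
print(mix([[200, -400, 600], [1000, 1000, -1000]], [0.5, 0.25]))
# [350, 50, 50]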
If the mixing operation is to be done in real time, which is usually the
case, then the individual multiplications for all the channels and the
subsequent addition operation must be completed within the period
between samples. If real time is not needed, then these operations can
be done at leisure. Modern electronic circuits, called digital signal
processors (DSPs) that may be incorporated into dedicated audio
processors can carry out individual multiplications so fast that, to save
cost, it is often possible to share one such circuit between a number of


sound channels (i.e. carry out the multiplications one after the other
rather than all together) and still have time to do this and add the
results – all within a single sample time! A process such as this where
one component is shared on a time basis is known as multiplexing.
The digital mixing process using multiplexing of the multiplication
circuit is shown in Figure 29. Note that in this figure and some later
figures in this chapter, the wide data paths are used to indicate that all
the bits of one sample are transferred and processed together (in parallel).
Figure 29 Mixing digital signals by multiplexing the multiplication
circuit (synchronised electronic selector switches feed each of the
digital sound sources and its multiplication factor in turn to a shared
multiplication circuit; the results are held in temporary 1-sample
stores and added to give the mixed digital sample output stream)

Today’s desktop computers operate so fast that they also are able to
carry out the mixing process in real time, although of course here the
mixing is done by the computer program rather than by a dedicated
electronic device (although the computer itself may contain additional
circuits to speed up mathematical processes such as multiplication).

5.3 Controlling the level


One important consideration, which affects both analogue and digital
mixing, is that of ensuring that the mixed signal level is not so loud
as to cause distortion in the subsequent audio stages.
Consider the situation where two analogue sources both with peak
levels of 1 volt are added together. At some time it is likely that
peaks from both sources will occur together, which will result in a
peak mixed output level of 2 volts. Extending this to 8 channels
means that the resulting signal might have a peak level of 8 volts.
If the circuits following the mixing stage have a maximum peak
input level of, say, 2 volts, then serious distortion is going to be
heard at certain points in the recording.
For digital mixing, you should recall from Block 1, that each sound
sample is quantised into a finite number of levels. The number of bits
used to store the sample determines the number of amplitude levels that
are available (e.g. 16 bits are required for 65 536 different sound levels, or
sample values from –32 768 to +32 767). Adding sample values from a
number of digital sources could result in a sample value outside this
range – which would normally cause the maximum allowed sample
value to be substituted, but this would cause distortion to be heard.

The solution in both cases is to make sure the levels of the individual
sound sources are such that their sum is within the maximum signal
level. Sometimes the dynamic range of the mixing stage and subsequent
stages is increased to address this problem. This is especially true in
digital mixers where often the internal processing in a digital mixer is
carried out using extra bits to allow a larger range of sample values to
be accommodated. Even if this is done within the digital mixer, the
number of bits must be reduced to the original number by a process
called downsizing when the master recording is made.

ACTIVITY 28 (EXPLORATORY) ................................................................

How do you think the downsizing process is carried out?

Comment
Downsizing involves adjusting each sample so that it can be stored using
fewer bits. However, before any bits are removed, the samples must be
rescaled so that the bits to be discarded no longer carry any of the
signal. This is done by multiplying the value of each sample by the
ratio of the new number of levels to the old number of levels, and then
storing the resulting value using only the reduced number of bits. This
ratio turns out to be the equivalent of dividing the sample by 2^n,
where n is the difference between the numbers of bits used before and
after the downsize operation.
So, for example, if a 16 bit digital sound signal needs to be downsized
to 14 bits, each sample value needs to be divided by 2^(16−14) = 2^2 = 4. I
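
For illustration only, here is a Python sketch of the downsizing calculation just described; the sample values used in the example are invented.

def downsize(samples, old_bits, new_bits):
    """Rescale samples so they fit into new_bits rather than old_bits."""
    divisor = 2 ** (old_bits - new_bits)    # 2 to the power of the bits removed
    return [int(s / divisor) for s in samples]   # int() truncates towards zero

print(downsize([32000, -32000, 1000], 16, 14))   # [8000, -8000, 250]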

ACTIVITY 29 (SELF-ASSESSMENT) ...........................................................

A digitised sound needs to be downsized from 24 bits to 16 bits. What


number must each sample’s value be divided by to ensure all samples
can be stored using 16 bits? I

5.4 Mixing facilities in the AW16G


The mixing facilities form the heart of the AW16G audio processor, so
I will look at these in a little detail. Operationally, the basic structure
of the device in these respects is very similar to a conventional stand-
alone sound mixing desk (analogue or digital).
Figure 30 shows a realisation of the main components of the device
and the signal flows between them. I say this is a realisation because,
as this is a computer-controlled device, the individual sections and
connections may not physically exist within the unit. However, such a
diagram helps us to visualise how it operates. In fact quite a lot of the
discussion below about the mixer section is a virtual realisation of
how the unit operates rather than a physical description of
components and connections within the unit.
In the diagram, the quick loop samplers (sometimes referred to as
pads) are short ‘tracks’ that can be used to create bursts of sound
when a key is pressed or continuous sound loops (e.g. repetitive
percussion rhythms) just as was explained in Chapter 8 of Block 2.
The sound clip can be used to make short temporary recordings.
As both of these features are peripheral to the main purpose of the
AW16G, I will not consider them further.

Figure 30 Block diagram realisation of the AW16G audio processor
(showing the input patching, the mixer section with its input, track,
return and pad channels and its buses, the two internal effect units,
the quick loop sampler and the hard disk/CD-RW recorder section)

The effect units provide
effects such as reverberation and I will look at these in more detail
in the next section on adding effects.
So, in the AW16G, ignoring the pad tracks etc., the main inputs to the
mixer section are:
• 8 inputs from the device’s microphone/line input connectors;
• one stereo input from the digital S/PDIF connector;
• 16 track inputs from the hard disk (these are arranged as 8 separate
tracks and 4 paired stereo channels);
• a further stereo input from the hard disk’s stereo channel (as
mentioned in Section 3.5);
• two stereo channels from the effects units;
• a stereo input from the CD drive (if fitted).
Associated with each of these inputs are a number of controls and a
number of level monitoring facilities. The main controls are a level
or volume control that adjusts the amount of the signal that is added to
the mix and a pan control that allows a single track to be sent to the
left and right channels of a stereo bus in varying proportions.
The mixer section contains a number of audio buses to which each of
the inputs can be assigned (i.e. connected to):
• a stereo bus (i.e. separate left and right buses);
• two auxiliary buses that can be used when an external effect unit is
required, or when a special monitor mix is required for a performer;
• two buses that are used to supply mixes of sound to the effects
units – there is one bus for each unit;
• a general-purpose stereo bus that can be used for partial mixes, or
for intermediate mixes.

The outputs from the mixer section are:


• a stereo output from the stereo bus which is fed to the stereo input
of the hard disk and also to the back panel connectors;
• an auxiliary output from the auxiliary bus – there is a selector switch
that allows the user to choose whether this signal or the stereo bus signal
above is fed to the STEREO/AUX OUT back panel connectors;
• two outputs for the effects units;
• 8 direct track outputs from the 8 input channels.
Also, the general-purpose bus and the direct track outputs can be fed to
any of the hard disk recorder’s 16 track inputs.
How are all these inputs, buses and outputs connected together, since
clearly each setup will require a different set of connections, which
must therefore be under the control of the user? The answer is to
use a patch system to connect inputs to buses and buses to outputs.
In the case of analogue mixers, this process would be achieved with selector
switches, but for digital mixers (and desktop computers running a suitable
program), the selection is done via a display screen – as is the case for the
AW16G.
In the case of this device, some of the buses are permanently connected to
particular outputs as mentioned above, but this leaves a lot of flexibility
about how the various signals can be combined. To ease this situation for the
user, the AW16G has four basic recording/mixing modes – direct, mixed,
bounce and mixdown. As these are not peculiar to the AW16G, but are
commonly used mixing/recording modes, I will briefly look at each of these.

Direct mode
This mode is used to record inputs individually on separate hard disk tracks
so that they can be mixed down to a master stereo channel at a later time. The
relevant mixer connections are set up by selecting the DIRECT mode record
screen on the display and then pressing input channel and track channel
buttons on the front panel to make the connections which then appear on the
screen. Figure 31 shows a typical direct mode setup where 2 microphones
and a synthesiser with a stereo output are being recorded on 4 separate hard
disk tracks and Figure 32 shows the direct mode display for this set up.

Mixed mode
This mode uses the general-purpose bus to allow an intermediate mix to be
made in order to save track usage. Here, the inputs are assigned to the bus,
and the bus output is assigned to 2 hard disk tracks (one for the left channel
and one for the right). This saves tracks, but of course the mix of the inputs
cannot be altered at a later date (neither can any effects be later added to an
individual input). Figure 33 shows the MIXED mode set up for the same
inputs as described in the DIRECT mode example above, and Figure 34
shows the corresponding MIXED mode display where the bus outputs are
connected to hard disk tracks 1 and 2. Note that the two microphone inputs
are connected to both the left and right buses (so that the pan controls can be
used to position the sound from each microphone anywhere between the left
and right ends of the stereo sound field), but the synthesiser left and right
inputs are only connected to their appropriate bus lines.
Figure 31 A setup using the AW16G’s DIRECT recording mode
Figure 32 The AW16G’s screen display for the setup in Figure 31
Figure 33 A setup using the AW16G’s MIXED recording mode
Figure 34 The AW16G’s screen display for the setup in Figure 33

Bounce mode
Bounce or ping-pong mixing or recording is a generally used term that
describes the situation where a number of individual, already-recorded
tracks are combined and stored as two new tracks, with or without
new live material being added. This is an intermediate stage that can
again be used to reduce the number of tracks, but with the same
provisos about getting the mix right and adding effects to
individual channels as mentioned above for the mixed mode.
Figure 35 shows a BOUNCE set-up where 8 previously recorded tracks
are combined to form a single stereo channel which is then stored in
two unused hard disk tracks (tracks 9 and 10). Figure 36 shows the
bounce screen display for this set-up.

Figure 35 A setup using the AW16G’s BOUNCE recording mode
Figure 36 The AW16G’s screen display for the setup in Figure 35

Mixdown mode
This mode is where the final mix is made to the stereo bus which is
then recorded onto the hard disk’s stereo channel. The mixdown can
involve live inputs, tracks previously recorded on the hard disk and
other extras such as sounds from the quick loop sample pads.
Figure 37 shows a typical MIXDOWN set-up which mixes 4 live inputs,
4 previously recorded hard disk tracks and sounds from the four quick
loop samplers to create a final single stereo channel. Figure 38 shows
the mixdown screen display for this set-up.
Figure 37 A setup using the AW16G’s MIXDOWN recording mode
Figure 38 The screen display for the mixdown setup in Figure 37

For all of these four modes, adjustments can be made to some of the
parameters as the recording progresses. In particular the front panel
slider level controls can be used to alter the mix during recording and
channels can be switched on and off (punch in and punch out). Using
an edit list (called a scene memory in the AW16G), all the settings for
each channel and any/all adjustments that need to be made to the
configuration and level settings as mixing proceeds can be stored and
referenced to particular points in the recording. This means that a
complete mix can be replayed time and time again from the original
tracks simply by recalling and running the appropriate scenes from the
scene memory as the song proceeds.
As I said earlier, the AW16G is a very sophisticated device with far too
many features and facilities to be able to explain them all here. However, I
hope the description above will give you a flavour for the sort of mixing
operations that are available in such a desktop sound device.

ACTIVITY 30 (COMPUTER) ....................................................................

In this activity you will experiment with some mixing operations


using recordings you have made earlier in this chapter. You will find
the steps for this activity in the Block 3 Companion. I

6 ADDING EFFECTS

There is a huge range of effects that can be applied to a sound signal


and the number has mushroomed since the advent of digital
techniques. Some are designed to be used with individual sound
sources, and some are used to affect the overall sound in the final mix.
There are a number of effects that are widely used and commonly
available, some of these were mentioned in Chapter 8 of Block 2.
In this section we will look at a number of these in a little more detail.
A detailed discussion of the implementation of these effects is beyond
the scope of this course, but an outline of how they are achieved using
analogue and digital techniques is given where appropriate.

6.1 Equalisation
This is one of the most common and most basic effects, and is concerned
with altering the frequency spectrum of the signal in some way.
Equalisation, or EQ as it is sometimes called, obtains its name from the
process of boosting frequency ranges that have been attenuated (reduced)
through a transmission medium or audio recorder in order to ‘equalise’
signal frequencies in this range back to their original levels.
Today, equalisation covers not only boosting certain frequency ranges,
but also cutting them, and doing this with many different, and
sometimes very small, frequency ranges.
Sometimes equalisation is fixed and is inbuilt into a system. For
example a type of equalisation called pre-emphasis is used in vinyl
records, and Dolby B is commonly used in compact cassettes to
increase the signal-to-noise ratio. You will come across these and other
examples of fixed equalisation in Chapter 5 of this block.
However, equalisation is often used subjectively to affect the character
of a sound: for example, boosting the upper frequencies a little can
make speech more intelligible by emphasising the consonants, and mains
hum can be reduced by turning down the lower frequencies.
At its most basic, equalisation consists of simple treble and bass
controls (Box 19). More sophisticated is the use of one or more mid-range
controls that cut or boost frequencies around the middle of the audio
range (Box 20). For fuller control of not only the frequency around
which the boost or cut occurs, but also the frequency range that is
affected, a parametric equaliser can be used (Box 21).
Finally, a graphic equaliser contains a number of fixed mid-range
equalisers spaced across the audio frequency range (Box 22).
Graphic equalisers are often used to reduce the chance of feedback
from a loudspeaker to the microphone in a public address system
by cutting the frequency or frequencies at which howl starts to appear.
They can also be used to compensate for the acoustics of a room or
the deficiencies of a loudspeaker.

Box 19 Treble and bass equalisation


Treble equalisation boosts or cuts the frequencies in the upper hearing
range, and the bass equalisation control does the same for the frequencies
at the lower end of the audio spectrum as shown in Figure 39. These basic
controls provide a very crude adjustment to the upper or lower ranges and
do not allow any adjustment of the frequencies at which the boost or cut
starts to have effect – they are sometimes called fixed equalisation.
Note that the response graphs in Figure 39 flatten out at the ends,
indicating that above or below a certain frequency the effect does not
produce any further boost (or cut).

Figure 39 Typical variation in frequency response from simple treble and bass equalisation controls

Box 20 Mid-range equalisation


Mid-range equalisation boosts or cuts a range of frequencies in the middle of
the audible frequency range as shown in Figure 40. As with treble and bass this
works over a fixed frequency range, although sometimes the centre frequency
can be varied.
Figure 40 Mid-range equalisation



Box 21 Parametric equalisation


Parametric equalisation provides adjustment of not only the amount of
cut or boost (Figure 41(a)) and the centre frequency (Figure 41(b)), but
it also allows the bandwidth of the frequencies that are affected to be
varied (the bandwidth setting is sometimes called the ‘Q’) as shown in
Figure 41(c).

Figure 41 Parametric equalisation: (a) varying the amount of boost/cut; (b) varying the centre frequency; (c) varying the bandwidth or Q
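For readers who would like to see how such a filter might be realised, the sketch below uses one widely published recipe for a peaking (parametric) filter – the ‘Audio EQ Cookbook’ formulas attributed to Robert Bristow-Johnson – written in Python. This recipe is not taken from the course or from the AW16G; the gain, centre frequency and Q parameters simply correspond to the three adjustments shown in Figure 41, and the function names are invented.

import math

def peaking_eq_coefficients(sample_rate, centre_hz, gain_db, q):
    """Return normalised biquad coefficients (b0, b1, b2, a1, a2) for a
    peaking equaliser, following the commonly used RBJ cookbook formulas."""
    a = 10.0 ** (gain_db / 40.0)              # amplitude factor from the dB boost/cut
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)          # bandwidth term derived from Q
    b0 = 1.0 + alpha * a
    b1 = -2.0 * math.cos(w0)
    b2 = 1.0 - alpha * a
    a0 = 1.0 + alpha / a
    a1 = -2.0 * math.cos(w0)
    a2 = 1.0 - alpha / a
    return b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0

def peaking_eq(samples, sample_rate, centre_hz, gain_db, q):
    """Apply the peaking filter to a list of samples (direct form I)."""
    b0, b1, b2, a1, a2 = peaking_eq_coefficients(sample_rate, centre_hz, gain_db, q)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Example: boost a 1 kHz band by 6 dB with a fairly narrow Q of 4.
boosted = peaking_eq([0.0, 1.0, 0.0, -1.0] * 100, 44100, 1000.0, 6.0, 4.0)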

Box 22 Graphic equalisers


A graphic equaliser contains a number of fixed mid-range overlapping
equalisers spaced across the audio spectrum as shown in Figure 42.
The term ‘graphic’ is derived from the fact that the slider settings of
the boost/cut controls, when set, give a crude visual image of the
frequency response of the whole equaliser over the audible range as
illustrated in Figure 43.

Figure 42 Graphic equalisation
Figure 43 (a) The slider settings in a graphic equaliser give a visual indication of the overall frequency response shown in (b)

All types of equalisation use one of three types of electronic filter – high
pass, low pass and band pass which were introduced in Chapter 5 of
Block 1. The creation of effective electronic filters whether implemented
using analogue or digital techniques is complex and is beyond the scope
of this course. However, it is instructive to note that digital filters cannot
work with single sound samples in isolation, since one sample does not
give any information about the instantaneous frequency content of
the signal. To create a digital filter requires proportions of a number
of consecutive samples to be added to produce a single new ‘filtered’
sample as outlined in Box 23.

Box 23 Digital filter basics


A digital filter can be created by sending the sound samples along a series of
processing stages. The samples are moved from one stage to the next whenever
a new sample appears at the input. In each processing stage, a proportion of
the sample value is derived by multiplying the value by a number between –1
and +1 (a negative multiplicand causes a reversal of the sample’s phase – the
idea of phase was introduced in Chapter 1 of Block 1) and the resulting values
from all the processing blocks are then added together to produce one new
‘filtered’ sample.
The multiplication factors (which are different for each processing block) are
set to achieve the required filter type and shape, and the more stages there
are, the better the quality of the filter. Also the higher the sample rate the
more stages that are needed to obtain a reasonable result. High quality digital
filters will use in excess of 100 stages.
Figure 44 shows the block diagram of a simple 7-stage digital low-pass filter.
The regularly occurring clock signal at the sampling rate causes each sound
sample to be transferred from one temporary storage element to the next. As
in Figure 29 earlier, the wide data paths indicate that all the bits of one sample
are transferred and processed together (in parallel).
Figure 44 Simplified structure of a digital low pass filter: seven 1-sample storage units clocked at the sampling rate, with multiplication factors –0.21, 0, 0.64, 1.0, 0.64, 0 and –0.21 (in this particular case of a low-pass filter, two of the factors are zero)
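A minimal Python sketch of this structure is given below, assuming the seven multiplication factors shown in Figure 44; the function name and the test signals are invented for illustration.

from collections import deque

# Multiplication factors taken from the 7-stage low-pass example in Figure 44.
COEFFICIENTS = [-0.21, 0.0, 0.64, 1.0, 0.64, 0.0, -0.21]

def fir_filter(samples, coefficients=COEFFICIENTS):
    """Apply a simple FIR (finite impulse response) filter.

    Each output sample is the sum of the most recent input samples, each
    multiplied by its own factor - the add-the-proportions idea of Box 23.
    """
    # The 1-sample storage units, initially holding silence.
    delay_line = deque([0.0] * len(coefficients), maxlen=len(coefficients))
    filtered = []
    for x in samples:
        delay_line.appendleft(x)              # shift everything along one stage
        y = sum(c * s for c, s in zip(coefficients, delay_line))
        filtered.append(y)
    return filtered

# Example: a rapidly alternating (high-frequency) signal comes out much smaller
# than a constant (low-frequency) signal, as expected from a low-pass filter.
print(fir_filter([1.0, -1.0] * 8)[-4:])
print(fir_filter([1.0] * 16)[-4:])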

ACTIVITY 31 (COMPUTER) ....................................................................

In this activity you are supplied with a piece of music that has an
annoying continuous mains-frequency hum. Your task is to introduce
suitable parametric equalisation to remove as much as possible of this
hum without affecting the music. You will find the steps associated
with this activity in the Block 3 Companion. I

6.2 Echo and reverberation


Echo and in particular reverberation are two of the most useful effects in
music recordings. Echo occurs when the listener is aware of one or more
separate, distinct, fading repeats of a sound, whereas reverberation is
where the echoes are so close together that they merge to give the effect
of the resonance in a large hall or room.

6.2.1 Echo
Adding echo to a recording is simply a matter of delaying the sound by a
fixed amount and then adding a proportion of this delayed sound to the
original sound. This only gives one echo, so if multiple echoes are needed,
the delayed sound is also fed back to the input of the delay again. This
is fairly simple to achieve using both analogue techniques (see Box 24)
and digital techniques, although digital techniques give a better result
and offer much more control over the echo as explained in Box 25.

Box 24 Generating echo using analogue techniques


Traditionally, echo using analogue techniques has been obtained using a magnetic
tape loop. A special tape recorder is fitted with one record head and a number of
playback heads. The original sound is fed to the recording head, and a loop of tape,
which travels round and round, causes the replay heads to produce delayed versions
of the sound which can then be mixed in proportion with the original sound fed to
the recording head to produce further, decaying, echoes as shown in Figure 45.
Varying the speed of the tape will vary the times of the echoes. However, the main
problem with the tape loop system is that the tape soon wears out.

Figure 45 Analogue tape loop echo generation

Shortly before the advent of digital sound, devices called bucket brigade delay lines
were introduced which enabled analogue signals to be delayed using purely electronic
techniques. These devices contain a large number of electrical charge storage units
connected in a chain, where each storage unit could retain an analogue voltage, as an
amount of electrical charge – but only for a short time. The analogue sound signal is
sampled at regular intervals, but is not subsequently digitised. Each analogue sample
is fed to the start of the chain. At a certain time later, the original sound voltages
appear at the last storage element and a delayed version of the original sound can be
generated. Often these devices provided a number of outputs along the chain to produce
intermediate echoes as illustrated in Figure 46.
The major problem with these devices was the limited delay that could be obtained,
because the charge losses that occurred along the chain caused the signal to become
too weak. The advent of digital techniques quickly made these types of devices
redundant, but for a few years they were quite popular.

Figure 46 Analogue echo generation using a bucket brigade delay line



Box 25 Generating echo using digital techniques


Producing echoes in the digital domain is very straightforward. Like the bucket brigade delay line
described in Box 24, each sound sample is sent along a chain of storage elements at the sample
rate of the sound. At the end of the chain, the sound sample value is multiplied by a factor less
than 1 to provide a proportion of the sound and the resulting number added to the sound sample
just entering the first storage element. Intermediate echoes can easily be generated by tapping
off the samples at a number of places along the chain.
The actual storage elements can either be provided by a dedicated electronic component, or, for
computer-controlled devices, a section of the controlling computer’s main random access memory
(RAM) is used to simulate the shifting action using an arrangement called a first in, first out
(FIFO) buffer. With this technique, the sound samples are not actually shifted between memory
locations but are stored in consecutive memory locations in an area of memory set aside for this
purpose (called the buffer). Once this area of memory is full, samples are stored from the start
again, overwriting the previously stored samples. Special variables called pointers are used to
indicate where the next sound sample should be stored and where the ‘echo’ sample is to be read
from as shown in Figure 47. The more memory locations in the buffer, the more sound that can be
stored and the longer the echo that can be generated, and since this is digital data, the length can
be as long as is required (or as long as the amount of available memory allows). For even longer
delays, the sound samples can be stored temporarily on the controlling computer’s hard disk.
Multiple delays can be achieved by having a number of ‘out’ pointers.
Figure 47 Digital sound sample delay using a FIFO buffer (an ‘in’ pointer marks where the next sample is stored and an ‘out’ pointer marks where the next ‘echo’ sample is read; both pointers move on to the next memory location at the sample rate)

Figure 48 shows the block diagram of a typical digital delay unit. The unit could
be implemented using actual hardware delay devices, or it could be simulated
with a computer program where the delay is formed using a FIFO. Note again
the use of wide data paths to indicate that all the bits of each sample are
processed in parallel.

Figure 48 Digital echo generation
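The following Python sketch illustrates the same idea as Figure 48: a buffer stands in for the delay, and a proportion of the delayed output (the echo control, between 0 and 1) is fed back and added to the incoming samples. The names and values are illustrative only.

def add_echo(samples, delay_samples, echo_level):
    """Add a repeating echo to a list of samples.

    `delay_samples` is the echo delay expressed in samples and `echo_level`
    (0-1) is the proportion of the delayed signal fed back and mixed in.
    """
    # Circular (FIFO) buffer holding the last `delay_samples` output values.
    buffer = [0.0] * delay_samples
    pointer = 0                               # acts as both 'in' and 'out' pointer
    output = []
    for x in samples:
        delayed = buffer[pointer]             # value written delay_samples ago
        y = x + echo_level * delayed          # add a proportion of the delayed sound
        buffer[pointer] = y                   # feed the result back into the delay
        pointer = (pointer + 1) % delay_samples
        output.append(y)
    return output

# Example: a single click followed by silence produces a train of decaying echoes.
click = [1.0] + [0.0] * 20
print([round(v, 3) for v in add_echo(click, delay_samples=5, echo_level=0.5)])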

6.2.2 Reverberation
Simulating reverberation is more difficult to achieve effectively than
echo. Why might this be so? Hopefully the following revision activity
will give you a clue.

ACTIVITY 32 (REVISION) .......................................................................

Figure 49 shows an incomplete amplitude–time graph for the sound of


a single hand clap in a reverberant room. Using what you have learnt
from Chapter 4 of Block 1, copy down the graph and fill in the likely
reflections and reverberations that might occur.

Comment
If you had difficulty answering this activity, you may like to have a
quick look back at the material on reverberation in Sections 3 and 4 of
Chapter 4 in Block 1 to refresh your memory before carrying on. I
Figure 49 Graph for Activity 32: response plotted against time, showing only the direct sound of the hand clap

The main parameters that are associated with reverberation are as


follows:
• depth (sometimes called the dry/wet balance) – this is the ratio of
the levels of reverberated and original sound;
• decay time – how long the reverberation lasts which is usually
given as the time for the level of reverberation to reduce by 60 dB;
• reverberation type (reverberation algorithm or room size) – this
defines the type of acoustic that is being simulated, e.g. large hall,
small hall, domestic room;
• early reflection delay – the time between the original sound and
when the first single-reflection sounds occur;
• pre-delay – the time between the original sound and when the
multi-reflection sounds occur;
• diffusion and density – the time between individual echoes in the
multi-reflected sound, and how distinct each one sounds;
• colour or equalisation – the equalisation applied to the reflected
sounds – high frequency reflections tend to fade more quickly than
low frequency ones, so equalisation can be applied to the
reverberant sound to simulate this.
The problem with reverberation is how to generate the many and varied
echoes that occur (both in terms of time delay and amplitude). Although
reverberation is achievable using analogue techniques as explained in
Box 26, digital techniques offer much better results and provide much
more control over the reverberation parameters (see Box 27).

Box 26 Generating reverberation using analogue techniques


In the early days of analogue sound, the only way of producing realistic
reverberation was by using a special room in which was placed a loudspeaker
and a microphone. If an empty room is made sufficiently large and of a sound
reflecting material (e.g. concrete), a reasonable reverberation effect can be
obtained. However, only one type of reverberation is available, and the set-up
is of course sensitive to extraneous outside noises.
An alternative approach was to use a large metal plate, or a set of metal
springs. In the case of a plate, a sound transducer is fixed to the plate which
causes it to vibrate in sympathy with the sound and thus to simulate
reverberation. Another transducer picks up the vibrations as shown in
Figure 50. The other method, shown in Figure 51, is similar but the plate is
replaced by coiled springs. As with the reverberant room, very little control
can be achieved over the type or length of the reverberations, although the
metal plate can be damped to reduce the decay time.

Figure 50 Generation of reverberation using a metal plate
Figure 51 Generation of reverberation using coiled springs

Box 27 Generating reverberation using digital techniques


Reverberation using digital techniques is more complicated than producing a
simple echo, but the principles are the same. The digital samples are fed into
a delay unit (hardware device or FIFO buffer). When the delayed signal appears
at the end of the delay period, a proportion of it is added to the undelayed
signal which is then fed back into the delay again. In order to simulate the
early reflections and then the multi-reflections, a number of these delay units
operate both in parallel and in series. Each one has a different delay time, and
the configuration of numbers of delay units and connection structure is altered
to simulate different acoustics. Figure 52 shows a simple digital delay unit,
and Figure 53 shows how reverberation might be created using a number of these units.
In practice, each of the individual delays may be composed of different
configurations of delay and feedback from that shown in Figure 52, and the
structure of an actual reverberation device (again either a real hardware device,
or a FIFO simulation) will probably not be recognisable as being made up from a
number of individual delay units. However, the principle on which it operates
should be as described above.

Figure 52 A basic digital delay unit
Figure 53 Creating reverberation by connecting together a number of individual delay units
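In the same spirit as Figure 53, the very rough Python sketch below runs a handful of Figure 52-style feedback delays in parallel, averages their outputs and mixes the result with the direct sound. The delay times, feedback amounts and wet/dry balance are illustrative guesses, not values taken from any real reverberation device.

def feedback_delay(samples, delay_samples, feedback):
    """One delay unit as in Figure 52: the output is the delayed input, and a
    proportion of the output is fed back into the delay."""
    buffer = [0.0] * delay_samples
    pointer = 0
    out = []
    for x in samples:
        delayed = buffer[pointer]
        buffer[pointer] = x + feedback * delayed
        pointer = (pointer + 1) % delay_samples
        out.append(delayed)
    return out

def simple_reverb(samples, sample_rate=44100, wet=0.3):
    """Very rough reverberation: several parallel delay units plus the direct sound."""
    # Slightly 'unrelated' delay times (in ms) help the echoes to smear together.
    delays_ms = [29.7, 37.1, 41.1, 43.7]
    feedbacks = [0.71, 0.68, 0.66, 0.63]
    wet_sum = [0.0] * len(samples)
    for ms, fb in zip(delays_ms, feedbacks):
        delayed = feedback_delay(samples, int(sample_rate * ms / 1000.0), fb)
        for i, v in enumerate(delayed):
            wet_sum[i] += v / len(delays_ms)
    # Mix the reverberant (wet) signal with the direct (dry) signal.
    return [(1.0 - wet) * dry + wet * w for dry, w in zip(samples, wet_sum)]

# Example: the reverberant 'tail' produced by one second containing a single click.
impulse = [1.0] + [0.0] * 44099
tail = simple_reverb(impulse)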

6.3 Flanging and chorus


Although the effects of flanging and chorus sound quite different, they can be
achieved using similar techniques as you will see in the discussion below.

6.3.1 Flanging
Flanging or phasing was first mentioned in Chapter 1 of Block 1, and
is a synthetic effect that occurs when the frequency response of the
system contains a number of equally spaced notches (a notch is a small
frequency range where the signal is reduced) and where the frequency
of these notches varies slowly up and down the audible spectrum.
This introduces a characteristic sweeping effect to the sound.

ACTIVITY 33 (LISTENING) .....................................................................

To remind you what flanging sounds like, listen to the audio track
associated with this activity which is a repeat of the track used in
Activity 25 in Chapter 1 of Block 1 that demonstrates flanging. I

The notches are produced by delaying the signal by a small amount and
adding it to the original signal. This has the effect of cancelling some
frequencies and boosting others. Figure 54 illustrates this effect for two
special cases where the delay is exactly one half of the cycle time of the
sound frequency (Figure 54(a), (b) and (c)) and exactly equal to the cycle
time of the sound frequency (Figure 54(d), (e) and (f)). These special
cases where the resultant signal is either completely cancelled or exactly
doubled will occur at frequencies of 3, 5, 7, etc. times or 2, 4, 6, etc.
times the original frequencies respectively (i.e. the odd harmonics and
the even harmonics respectively). Viewed as a frequency spectrum, these
special cases appear as a set of equally spaced notches (where the delay
causes cancellation) and peaks (where the delay causes reinforcement)
in the frequency spectrum. A device that has this effect on a signal is
known as a comb filter.

Box 28 Generating flanging using analogue techniques

Flanging was originally produced using two identical magnetic tape recorders
with separate record and playback heads. The sound is fed to both record heads
and the signal from each of the replay heads is mixed together as shown in
Figure 55. Any small difference in the speed of each machine causes a very small
delay between the two replay signals that has the effect of cancelling some
frequencies and boosting others. By altering the speed of one of the tape
recorders, the delay will change and the resultant phase changes will affect
different sets of frequencies. To make the speed change variable and so create
the sweeping effect, the operator used to place a finger on the flange of the
feed spool to slow it down slightly, and so increase the delay between the two
tape recorders’ signals. The more the pressure on the spool, the more it slowed
down – hence the name ‘flanging’.
This effect can be recreated with purely analogue electronic circuitry, but it
is a fairly complicated process and the actual implementation is beyond the
scope of this course.

Figure 55 Flanging produced using two identical tape recorders

Intermediate frequencies will be boosted or cut by


different amounts between these extremes. The flanging sound is produced
by altering the delay and so causing the frequencies where reinforcement or
cancellation occurs to sweep up and down the audio spectrum.
Box 28 outlines how the flanging effect was originally produced using
analogue techniques, and Box 29 mentions how the effect can be
achieved using digital techniques.

Figure 54 Adding phase-shifted signals can cause cancellation or boosting: in (a)–(c) a sinusoid of frequency f1 is added to a copy delayed by half its cycle time (delay = 1/2 × 1/f1) and the result is no signal; in (d)–(f) a sinusoid of frequency f2 (= 2 × f1) is added to a copy delayed by a whole cycle (delay = 1/f2) and the result is f2 at twice the amplitude

Box 29 Generating flanging using digital techniques


Digital generation of flanging is a fairly straightforward procedure since it
simply involves delaying the sound samples by a small but variable amount and
then adding the result to the undelayed samples as shown in Figure 56. This
directly recreates the change in delay between the two analogue tape recorders
as the speed of one of them is varied, and therefore creates the phase change
that provides the cancellation/boosting effect at various sets of frequencies.
The actual sweeping effect is created by altering the delay.

Figure 56 Flanging produced using digital techniques
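A Python sketch of this arrangement follows. The delay is swept slowly up and down with a low-frequency sine wave; because the delay must be a whole number of samples in this simple version it is rounded (real flangers interpolate between samples for a smoother sweep), and all the names and parameter values are illustrative.

import math

def flanger(samples, sample_rate=44100, max_delay_ms=3.0, sweep_hz=0.5, depth=1.0):
    """Flanging: add the signal to a copy of itself delayed by a small,
    slowly varying amount (see Figure 56)."""
    max_delay = int(sample_rate * max_delay_ms / 1000.0)
    buffer = [0.0] * (max_delay + 1)          # holds recent input samples
    write = 0
    out = []
    for n, x in enumerate(samples):
        buffer[write] = x
        # Sweep the delay between 0 and max_delay with a slow sine wave.
        sweep = 0.5 * (1.0 + math.sin(2.0 * math.pi * sweep_hz * n / sample_rate))
        delay = int(round(sweep * max_delay))
        read = (write - delay) % len(buffer)
        out.append(x + depth * buffer[read])  # undelayed plus delayed copy
        write = (write + 1) % len(buffer)
    return out

# Example: one second of a 440 Hz tone with the characteristic sweep applied.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(44100)]
swept = flanger(tone)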

ACTIVITY 34 (SELF-ASSESSMENT) ...........................................................

A sinusoidal signal is fed to a delay unit that has a delay of 1 ms


(1 thousandth of a second). The output of the delay is then added to
the original signal. For what signal frequencies will there be complete
cancellation of the signal? I

6.3.2 Chorus
Chorus is an effect that is designed to simulate the sound of a number of
players (or singers) from just one or a few musicians. Its effect is really
only useful for music and in particular for strings and singers as you have
heard in Chapters 6 and 7 in Block 2. As explained in Block 2, when a
number of musicians play the same part (or singers sing in unison) small
varying differences in their pitch cause multiple and changing phase
differences between the individual notes and overall these combine to
sound as random small changes in both volume and pitch and result in a
full, rich sound. Box 30 outlines how this effect may be synthesised.

Box 30 Generating chorus


Like flanging, analogue generation of chorus is achievable, but requires
complicated electronic circuitry, and so will not be considered further here.
Digital techniques, however, are not only much simpler and provide much more
control over the chorus effect, they also produce a better overall result.
To generate chorus, the sound samples are delayed by a number of randomly
varying delays which are added back to the original signal in varying proportions.
This is very similar to how flanging is generated except that:
• a number of different varying delays are used in parallel;
• in general the delays used are longer;
• no feedback is used – sometimes flanging uses feedback whereby the output
of the flanger is fed back into its input again to produce a range of more
elaborate effects.
As you will see later in the discussion of pitch changing, varying delays cause
changes in pitch, and the delays themselves cause the changes in amplitude
through the phase changes that they introduce.
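Building on the flanging sketch above, a chorus can be sketched by running a few longer, independently wandering delays in parallel and mixing them back in without feedback. Again, every name and number below is an illustrative assumption rather than a description of a real unit.

import math
import random

def chorus(samples, sample_rate=44100, voices=3, base_delay_ms=20.0,
           sweep_ms=5.0, mix=0.5):
    """Chorus: several longer, randomly offset, slowly varying delays are
    added back to the original signal (no feedback is used)."""
    base = int(sample_rate * base_delay_ms / 1000.0)
    sweep = int(sample_rate * sweep_ms / 1000.0)
    max_delay = base + sweep
    buffer = [0.0] * (max_delay + 1)
    write = 0
    # Each simulated 'voice' sweeps at its own slow rate from its own phase.
    rates = [0.1 + 0.2 * random.random() for _ in range(voices)]
    phases = [2.0 * math.pi * random.random() for _ in range(voices)]
    out = []
    for n, x in enumerate(samples):
        buffer[write] = x
        wet = 0.0
        for rate, phase in zip(rates, phases):
            lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate * n / sample_rate + phase))
            delay = base + int(round(lfo * sweep))
            wet += buffer[(write - delay) % len(buffer)]
        out.append(x + mix * wet / voices)
        write = (write + 1) % len(buffer)
    return out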

ACTIVITY 35 (LISTENING) .....................................................................

Listen to the audio track associated with this activity. It is a short piece of
music played using a ‘strings’ patch on a synthesiser. The music is played
three times, first without chorus, second with chorus and finally with
both chorus and reverberation added. I

ACTIVITY 36 (COMPUTER) ....................................................................

In this activity you will use the course’s music recording and editing
software to add effects to some music. You will find the steps
associated with this activity in the Block 3 Companion. I

6.4 Pitch and tempo


For recordings of musical performances, the ability to change either
the pitch or the tempo (i.e. duration) of a piece without altering the
other seems likely to be a useful tool.
When might such effects be useful? Tempo changing is particularly
useful when a piece of music has to be an exact length. This is often
needed in film music or television/radio adverts, where the music has
to last an exact amount of time. Pitch changing can be useful to tune
two separate recordings together, to effect transposition (changing the
key of a piece of music) or to correct notes that are slightly out of tune
(either generally or for odd notes or phrases). However, the larger the
change in pitch or tempo, the less effective are the results.
Analogue devices have been produced that effect pitch changes by the
process of varying delays, but they were never very successful and will
not be considered further here. Even digital implementations are only
really effective for small changes in pitch or tempo.

6.4.1 Changing pitch


When the batteries of a portable compact cassette recorder start to give
up during playback, the tape gradually goes slower which reduces the
tempo of a music recording, but it also reduces the pitch. What is
really needed then is to be able to do one without affecting the other.
How might this be done?
At first thought it might seem impossible to create each of these effects
in isolation, but consider the sound from the siren of a police car as it
travels towards, past and away from you. As the car approaches the
pitch appears a little higher than the actual pitch (although you may
not be aware of this), as it passes the pitch falls sharply, and remains
at this lower pitch as the car speeds away. However, the pitch of the
siren itself does not vary, and if the car stopped, the pitch of the siren
would be heard unaltered. This change in perceived pitch is known as
the Doppler effect, but the important point to note is that to the
stationary listener, the pitch appears to change only whilst the car is
moving.
What is happening here is that because the speed of the car is not
insignificant compared to the speed of sound in the atmosphere, as the
car approaches, the sound waves from the siren get ‘squashed’
resulting in a raising of the pitch. As the car passes and speeds away,
the sound waves change from being ‘squashed’ to be being ‘stretched’,
resulting in a sudden lowering of the pitch.
What we can see from this example is that if the pitch of a sound
signal (the police siren) is compared with a delayed version of itself
(i.e. the listener at a distance who hears the sound a short while later),

the two pitches will be found to be the same (assuming the car and
listener are stationary). However, if the delay is varied (the car moves
towards or away from the listener) then the pitch of the delayed signal
will appear to the listener to change – a reducing delay (the car is moving
towards the listener) raises the pitch (the sound waves are ‘squashed’)
and a lengthening delay (the car is moving away from the listener) lowers
the pitch (the sound waves are ‘stretched’). BUT this only occurs as long
as the delay is changing (i.e. the car is moving). Immediately the delay is
constant (the car stops), the pitches of the delayed and undelayed sounds
become the same again. Box 31 explains how this can be achieved without
having to resort to moving either the microphone or the performer!

Box 31 Changing pitch by changing the delay


In a digital pitch changing device, the sound samples are fed into a circular
store exactly like the FIFO buffer mentioned in Box 25 on generating echo.
However, in the echo case, a constant delay was required, and so the sound
samples were read out of the buffer at the same rate as they were read in. To
change the pitch, all that needs to be done is to read the samples out of the
buffer at a different rate from the rate at which they are put in (or vice
versa). A faster rate will increase the pitch and a slower rate will decrease it.
This is shown diagrammatically in Figure 57, where the read and write pointers
are rotating at different speeds.
Clearly there is a problem here: at some point one pointer is going to catch up
with the other – either the read pointer is going to need more samples than the
write pointer can provide, or the read pointer is taking samples out of the
buffer at such a slow rate that the write pointer cannot put any more data in
the buffer without overwriting data not yet read by the read pointer.
The solution here is to add or remove samples when necessary to ensure the
pointers never catch up with each other, and so be able to sustain the pitch
change. What makes a good pitch changer is how and when these samples are
created or removed so as to minimise any audible glitch or click. Chapter 8 in
Block 2 mentioned some techniques for changing the pitch of a sound created
using wavetable synthesis, and similar techniques to these can be used here.

Figure 57 Circular buffer store acting as a pitch changer
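The essential idea of the circular buffer pitch changer can be sketched in Python as follows. Samples are written into the buffer at the input rate and read back at a different rate (with simple linear interpolation between samples); the sketch deliberately ignores the pointer-collision problem – it simply lets the read pointer wrap round, which is exactly where a real device would add or remove samples – and all names and numbers are illustrative.

def crude_pitch_shift(samples, ratio, buffer_size=2048):
    """Very crude pitch shifter: write at the input rate, read at `ratio`
    times that rate from a circular buffer (ratio > 1 raises the pitch)."""
    buffer = [0.0] * buffer_size
    read = 0.0                                # fractional read position
    out = []
    for n, x in enumerate(samples):
        buffer[n % buffer_size] = x           # write pointer: one slot per sample
        i = int(read) % buffer_size
        j = (i + 1) % buffer_size
        frac = read - int(read)
        out.append((1.0 - frac) * buffer[i] + frac * buffer[j])  # linear interpolation
        read += ratio                         # read pointer advances at a different rate
    return out

# Example: raise the pitch of a simple ramp by a factor of 1.25 (roughly a major third).
ramp = [n / 100.0 for n in range(400)]
higher = crude_pitch_shift(ramp, 1.25)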

6.4.2 Changing tempo


Pitch changing was explained above, but how can tempo be changed
without affecting the pitch? One answer is to alter the pitch first and
then do the ‘flattening battery’ trick to reduce the tempo but at the
same time lower the pitch back to where it originally was (or the
opposite if the tempo needs to be increased).
However, another method is to chop up the sound into small sound
slices, and then either spread out these time slices for decreasing the
tempo or overlap them for increasing the tempo as outlined in Box 32.

Box 32 Changing tempo using time slices


In this method of changing the tempo, the digital sound samples are divided
into a number of small time slices – say between 10 and 100 each second. The
time slices are overlapped such that the end section of one slice is repeated at
the start of the next slice and so on as demonstrated in Figure 58(a).
To decrease the tempo (slow down the speed), these time slices are simply slid
apart (Figure 58(b)) and to increase the tempo they are shifted further
together (Figure 58(c)).

Figure 58 (a) dividing a sound signal into overlapping time slices; (b) sliding the slices further apart in time to slow down the tempo; (c) shifting the slices further together in time to increase the tempo

The problem is what to do at the joins between time slices. A straight cut
between one time slice and the next causes a severe warbling noise to be heard
because of the sudden transitions that inevitably occur.
A simple solution is to carry out a fast cross-fade between the two time slices, but
even this does not give particularly good results. Ideally, the time slices should be
adjusted for an integral number of cycles of the waveform, and so the time
stretching or compressing is always done by adding or removing an integral number
of cycles. This of course only works well for single frequency sources (e.g. a voice
or single instrument). For general sound sources more elaborate techniques need
to be used to obtain a satisfactory result.
One of the common techniques is to analyse the frequency components of each
time slice in real time using a fast Fourier transform. Within each time slice, the
phase of each frequency component is adjusted so that when the ‘frequency
representation’ of the time slice is transformed back into digital sound samples,
each slice always starts and ends with a zero level. This means that time slices
can be joined together without causing any distortion or warbling sound.
Well, that’s the theory, but in practice this technique does not work particularly
well for sounds that contain the full range of audible frequencies. So sometimes
the above transform process is carried out in parallel for a number of frequency
ranges. In this case the low frequency parts of the sound are processed using
long time slices (e.g. 10 per second) and the high frequencies are processed
using short time slices (e.g. 100 per second) and frequency ranges in between
are processed with time slices between these two extremes. A device or computer
program that carries out this process is sometimes called a phase vocoder.
This transformation of a sound signal into its frequency components instant by
instant can also be used to alter the pitch by changing the frequency of each
component and then making up a new set of sound samples from these new
frequencies. Working with the original sound samples one after the other and
simply altering their values in some way to achieve the desired result is sometimes
referred to as working in the time domain. Working with the instantaneous
frequencies that the sound contains and generating a new set of samples from
this frequency analysis is known as working in the frequency domain.
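A bare-bones Python version of the time-slicing idea (without any of the phase vocoder refinements described above) is sketched below: the sound is cut into overlapping slices, each slice is faded in and out so that the joins cross-fade, and the slices are laid back down with a wider or narrower spacing. The slice length, overlap and names are arbitrary choices for illustration.

def time_stretch(samples, stretch, slice_len=2048, overlap=512):
    """Change tempo without changing pitch by re-spacing overlapping slices.

    `stretch` > 1 slows the sound down (slices slid apart); < 1 speeds it up.
    A simple linear cross-fade over `overlap` samples smooths each join.
    """
    hop_in = slice_len - overlap                  # spacing of slices in the input
    hop_out = int(hop_in * stretch)               # spacing of slices in the output
    out_len = int(len(samples) * stretch) + slice_len
    out = [0.0] * out_len
    # Fade each slice in over `overlap` samples and out over the last `overlap`.
    window = [min(1.0, i / overlap, (slice_len - i) / overlap) for i in range(slice_len)]
    pos_in, pos_out = 0, 0
    while pos_in + slice_len <= len(samples):
        for i in range(slice_len):
            out[pos_out + i] += window[i] * samples[pos_in + i]
        pos_in += hop_in
        pos_out += hop_out
    return out[:int(len(samples) * stretch)]

# Example: make a two-second clip last three seconds (stretch = 1.5).
clip = [0.0] * (2 * 44100)
longer = time_stretch(clip, 1.5)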

As you can see from Box 32, the whole process of pitch and time
shifting can become very complicated, particularly if good results are
to be obtained for sound sources that contain frequencies over the
whole audible range, and where large variations in tempo and/or pitch
are required. Any further discussion of the techniques that are
employed is beyond the scope of this course; however, I hope this
section has given you an overall idea of how pitch and tempo changing
can be achieved and has shown you the problems that have to be
overcome when trying to implement such effects.

ACTIVITY 37 (COMPUTER) ....................................................................

In this activity you are supplied with a piece of music where one part
is played in the wrong key. Your task is to alter the pitch of the music
so that it is in the correct key. Carry out the steps associated with this
activity which you will find in the Block 3 Companion. I

6.5 Other effects


As mentioned at the start of this section, there are now a large number
of different effects available in audio processors. So far we have looked
at the most common of these in a little detail. For completeness this
section will mention a few more that are often provided, but their
operation will not be considered in detail.

6.5.1 Invert
This is a simple effect that inverts the sound waveform – positive
sound values (either analogue voltages or digital sample values)
become negative and negative ones become positive. This is also
known as changing the phase, and if the inverted signal is added to the
original signal they will cancel out leaving no signal at all. Note that
this is not the same as delaying the signal as described in Section 6.3.1:
in that situation, cancellation only occurs at the specific frequency whose
cycle time is twice the delay, and at its odd harmonics.
Inversion can be used to simulate stereo images from a mono signal (as
described in the next section), and it can also be used to change the
phase of a microphone if it is found that it has been connected the
wrong way round (see Box 33).

Box 33 Microphone phase


Microphones should always be connected so that they are in phase. This means
that the increasing air pressure section of a sound wave should produce the
same direction of increasing voltage signal (either in the positive or negative
direction) from all the microphones, and a reducing sound pressure should
produce the opposite change of voltage.
If two identical microphones are placed close together, but their outputs are
connected out of phase, when mixed, their signals will tend to cancel out.

6.5.2 Stereo imaging


In all the previous discussion, not much specific mention has been
made of stereophonic (stereo) sound, where there are two separate
sound signals, one feeding a loudspeaker placed on the left and one
feeding a loudspeaker on the right. In just about every sound recording
these days, a stereo signal is assumed, so it is instructive to consider
how sound sources can be placed in this stereo sound field (Box 34).

Box 34 The stereo sound field


The ‘sound stage’ between, and sometimes beyond, the left and right
loudspeakers is known as the stereo field as illustrated in Figure 59.
There are two aspects to placing sounds in the stereo field: position and
image. The position or localisation of a sound is the place within the sound
field that the sound appears to come from. A sound sent equally to both left
and right loudspeakers will appear to come from the centre, a sound sent only
to the left loudspeaker will appear to come from the left side of the sound
field, and so on. The action of localising a sound source to a certain point in
the sound field is called panning, and the control that achieves this (i.e.
determines how much of the signal goes to the left and how much to the right)
is called the pan control.
The image of a sound is how wide or how much space in the sound field the
sound appears to occupy. At the two extremes, a single channel or monophonic
(mono) sound can only be localised to one point in the sound field and has no
image (zero width), whereas a stereo source where the left signal is sent only
to the left loudspeaker and the right signal only to the right loudspeaker will
have an image which consists of the full stereo field. However, if, for example,
the right-hand signal of the source was sent to the right and left loudspeakers
equally, but the source’s left signal was only sent to the left loudspeaker, then
the image width would be reduced to the half of the sound field between the
centre and the left loudspeaker.

Figure 59 Stereo sound field

For a monophonic sound source, there is only one channel, and so


when this is fed into a stereo system, it can only appear at (can only be
localised to) one point in the sound field. However, by mixing in some
out-of-phase signal with the original source, and feeding different
proportions of each to the left and right loudspeakers, not only can
the signal be apparently localised outside the sound field between
(i.e. it appears to come from outside the space between the loudspeakers),
but the sound can be given some image as well. Box 35 explains this
rather surprising idea by showing that a stereo sound can be considered
not only as two separate left and right signals, but as a sum signal
(the mono component) and a difference signal (the stereo information).
The difference signal is obtained by changing the phase (inverting) one
signal, say the right one, and then adding the result to the non-inverted
left signal. It is not surprising therefore that adding some out of phase
signal will introduce some apparently ‘stereo’ sound.

Box 35 Left and right or sum and difference?


As well as representing a stereo sound by separate left and right tracks, the same
sound can be represented by separate sum and difference signals as shown below.
If the left signal is represented by L and the right signal by R, then the sum of
these will be (L+R) and the difference will be (L–R). The (L–R) signal can be
obtained by inverting the R signal to give –R and then adding it to the L signal.
If these sum and difference signals are created, then the original left and right
signals can be regenerated again by adding and subtracting them since:
(L+R) + (L–R) = 2L (sum plus difference) and (L+R) – (L–R) = 2R (sum minus difference)
The advantage of treating a stereo sound channel as separate sum and difference
signals is that the sum signal is simply the mono signal, and the difference
signal contains just the stereo information.
As long as there are always two independent sound signals, it does not matter
whether a stereo sound is represented by left and right signals or sum and
difference signals (although the sum and difference signals must be converted
to left and right before they are sent to the loudspeakers). Using digital
processing this conversion can be done as much as required without affecting
the signal, but conversion using analogue techniques will always result in a
small amount of crosstalk (one track’s signal appearing on the other and vice
versa). This will reduce the stereo field width, and this must be taken into
account when working in the analogue domain.
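The sum-and-difference idea is easy to express directly. The short Python sketch below (illustrative names only) converts a stereo pair to sum and difference form and back again, dividing by two on the way back so that the original levels are recovered rather than the doubled values shown in the expressions above.

def to_sum_difference(left, right):
    """Convert left/right samples to sum (mono) and difference (stereo information)."""
    total = [l + r for l, r in zip(left, right)]          # L + R
    difference = [l - r for l, r in zip(left, right)]     # L - R
    return total, difference

def to_left_right(total, difference):
    """Recover left/right: (sum + difference)/2 = L and (sum - difference)/2 = R."""
    left = [(s + d) / 2.0 for s, d in zip(total, difference)]
    right = [(s - d) / 2.0 for s, d in zip(total, difference)]
    return left, right

# A round trip leaves the signals unchanged.
l, r = to_left_right(*to_sum_difference([0.5, 0.25], [-0.75, 0.125]))
assert l == [0.5, 0.25] and r == [-0.75, 0.125]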

Many effects boxes and synthesisers provide stereo outputs from a


mono input (or mono sound) using such phasing effects. In fact some
simple reverberation and chorus units, even if they have stereo inputs
will actually form the sum signal first, generate the effect using this
signal only, and then simulate a stereo output. This is because only
one set of electronic circuits is needed to produce the effect rather
than two, which reduces the cost.

6.5.3 Vocoder
A vocoder produces an effect that makes a non-musical sound appear
to speak or sing. This is an example of a multi-track effect where the
effect is produced by the interaction of one sound track with another.
Analogue implementations of the vocoder effect have been in existence
for some time, and this is where the term phase vocoder originates (see
Box 32) – although the effects produced and processes involved are not
closely related.
In the case of the vocoder, one track is used to amplitude modulate the
other (amplitude modulation or AM was explained in Chapter 8 of
Block 2). So, for example, the sound of a telephone ringing can be
simulated by using a vocoder on two sound tracks, one containing the
continuous sound of a bell and the other containing a person saying
“ring ring”.

6.5.4 Envelope follower


An envelope follower is again an effect that uses amplitude modulation
between two sound signals. However, in this case it is not the actual
signal that modulates the other, but its amplitude (or envelope). In other
words, the volume of the signal that is being modulated varies according
to the volume of the control signal.

This effect is created in much the same way as compression/limiting


(see Section 4.1.2). However, in this case, the amplitude of one signal
is controlled by the amplitude of another rather than by its own
amplitude.
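A Python sketch of this idea: the envelope of a control signal is estimated by smoothing its absolute value, and that envelope is then used to scale the amplitude of the other signal. The smoothing constant and the function names are illustrative assumptions.

import math

def envelope(samples, smoothing=0.995):
    """Estimate the amplitude envelope of a signal by smoothing its absolute value."""
    env = []
    level = 0.0
    for x in samples:
        level = max(abs(x), smoothing * level)    # fast attack, slow decay
        env.append(level)
    return env

def envelope_follower(signal, control):
    """Make `signal` follow the loudness of `control` (amplitude modulation by
    the control signal's envelope rather than by the control signal itself)."""
    return [s * e for s, e in zip(signal, envelope(control))]

# Example: a steady tone shaped by the envelope of a short percussive control signal.
tone = [math.sin(2.0 * math.pi * 220.0 * n / 44100.0) for n in range(44100)]
perc = [1.0 if n < 2000 else 0.0 for n in range(44100)]
shaped = envelope_follower(tone, perc)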

ACTIVITY 38 (COMPUTER) ....................................................................

In this activity you will investigate the effects on the stereo sound field of
feeding the left and right signals with different proportions of in-phase
and out-of-phase signals from a monophonic source. You will find the
steps associated with this activity in the Block 3 Companion. I

6.6 Effects provided in the AW16G


The AW16G audio processor contains a large array of effects. As well as
the common ones – reverberation, equalisation, normalisation etc. – the
unit also has a number of effects libraries that contain effects settings for
different types of sound sources (e.g. electric guitar, acoustic guitar, vocal).
There is also a library of effects to simulate a variety of different
loudspeaker types. In addition, effect settings stored in a library can
be recalled on the fly by a scene memory (edit list).
There are two effects processors that can be set up individually to
produce different effects (as shown in Figure 30 earlier). Each effect
unit has its own bus in the mixer section, so it can be fed with a
mixture of sounds from various tracks. However, as an alternative, an
effect unit can be allocated to a single input channel (in which case it
is not available for use by any other source/channel).

6.6.1 Input channel processing


Each of the eight input channels has its own equalisation and
dynamics processing sections, and also an alternative loudspeaker
simulation which is used instead of the equalisation to simulate
different loudspeaker set-ups. In addition, as explained above, one of
the effects processors can be used with an individual channel – in
which case there is a control that allows the balance between the
original sound and the output of the effects unit to be controlled.
Figure 60 shows a block diagram for a single input channel. I will look
at the effects section later.

Figure 60 Block diagram of an input channel in the AW16G



As I mentioned above, there is a library of overall settings that covers


equalisation, dynamics and effects settings for particular applications,
and there are also separate libraries for both the equalisation and
dynamics sections that can further assist the user. For illustration only,
Appendix 1 lists the preset ‘overall’ input settings, and Appendix 2 lists
the presets in the equalisation library. In addition, the user does have full
control of the individual parameters, and can store in the libraries
bespoke settings for a particular recording. Figure 61 shows the screen
display which is accessed when individual equalisation (EQ)
parameters require adjusting. As you can see from the display, the
equalisation is divided into four frequency bands – low frequencies,
low-middle frequencies, high-middle frequencies and high frequencies.
All the parameters I mentioned in Section 6.1 above can be adjusted
individually.

Figure 61 AW16G equalisation editing screen

Similarly, the dynamics processing (compression/expansion etc.) has a


library of preset effects, and the parameters can also be manually adjusted.
Figure 62 shows an example dynamics editing screen – notice the
small graph on the left which changes as the parameters are altered.
On the graph, the x-axis represents the input level and the y-axis
the output level.

Figure 62 AW16G dynamics editing screen

ACTIVITY 39 (SELF-ASSESSMENT) ...........................................................

In Figure 62, what type of dynamics processing is being carried out?


Explain your answer. I

6.6.2 Effects units


Each of the two effects units can be set to create a wide range of different
effects from standard reverberation to specialised effects such as a
guitar amplifier simulator. In addition, there are some combined
effects (e.g. reverberation and chorus together – applied either in
parallel or in series). Again for illustration purposes only, Appendix 3
lists the effect types that are available. Remember also that for each of
these effects, the individual parameters can be fine tuned for a particular
input. As an illustration of this, Appendix 4 shows the parameters and
adjustment ranges for just the reverberation effect. A similar set of
parameters is available for each of the other effect types.

6.6.3 Non-real-time effects


Finally, the AW16G has two effects that cannot be used in real time –
that is to say, the effects have to be applied after a recording has been
made; they cannot be applied during recording or playback. For this
reason, the AW16G manual classes these as editing operations.
The two effects are time compression or expansion and pitch change.
Complete tracks, or just sections of tracks can be time compressed or
expanded or changed in pitch. Time compression/expansion can be
adjusted from 50% to 200% of the original time, and pitch can be
adjusted up or down by up to one octave in semitone and cent steps
(one cent is 1/100th of a semitone). Both of these effects can take some
considerable time to process.
Note that the two effects units do have a real-time pitch shift effect, but
clearly time compression/expansion cannot be done in real time!
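
As an aside on the arithmetic involved, a pitch change expressed in semitones and cents
corresponds to a frequency (and playback-speed) ratio of 2 raised to the power
(semitones/12 + cents/1200). The short sketch below is purely illustrative – it is not the
AW16G's own code, and the function name is my own:

    def pitch_ratio(semitones, cents=0):
        """Frequency ratio for a pitch change given in semitones and cents.
        One cent is 1/100th of a semitone; 12 semitones make an octave."""
        return 2 ** (semitones / 12 + cents / 1200)

    # An octave up doubles the frequency; an octave down halves it.
    print(pitch_ratio(12))                 # 2.0
    print(pitch_ratio(-12))                # 0.5
    print(round(pitch_ratio(1, 50), 4))    # +1 semitone and 50 cents: about 1.0905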

6.6.4 Summary
As you can see, the AW16G offers a wide range of different effects,
with a large number of preset settings held in libraries. In most
situations there will be a suitable preset setting. However, individual
parameters can be adjusted if a special setting needs to be created, or
if a preset setting needs some fine tuning. Unfortunately, as with the
AW16G's editing operations, setting up these bespoke settings can be
a little tedious because of the small display and the lack of a pointing
device (mouse) such as would be available when using a sound
processing package on a desktop computer.

7 EXTERNAL CONTROL

The ability to control an audio processor remotely has a number of
advantages:
• it allows the control panel to be located at a place where it may not
be convenient or possible to locate the actual audio processor – this
may be because of space restrictions or problems with getting all
the various sound source connections to the required control point;
• it allows the possibility of computer control;
• it allows the possibility of storing the processor settings for recall
at a later time.
It is not easy to provide external control of a purely analogue audio
processor as the controls on the front panel directly control the audio
signals. However, if the audio device uses a digital control system, even
though the audio signals are processed using analogue techniques, then
such external control is possible. Of course, a fully digital audio
processor can always, in principle, be controlled remotely.
Simple external control of an audio processor may not offer much
more than control of the various volume controls (faders), whereas full
external control will allow every aspect of the device to be controlled.
Sometimes audio processors are supplied with specialised computer
programs that can be run on a desktop computer. These will provide
graphical and other facilities to enable the audio settings and effects to

be configured and stored easily and even dynamically controlled
during recording or playback.

7.1 External control connections


There are a number of common connection methods to enable the audio
processor and the control unit (special device or computer) to communi-
cate with each other. Some of the possible methods of control are:
• a specialised proprietary connection – either wired or wireless
(infra red or radio);
• the now common universal serial bus (USB) or FireWire computer
interconnect methods;
• the musical instrument digital interface (MIDI).
The advantage of using a widely adopted connection method is that it
allows the processor to be controlled by a wide range of devices, and
therefore helps sales of the audio processor itself. In addition, some of
the interconnection methods are sufficiently fast to allow one or more
digital sound signals to be sent between the processor and its controller
using perhaps the AES/EBU or S/PDIF protocols described in Section 2.3.
As I mentioned earlier, with the speed capabilities of FireWire, the
trend at the moment (in 2004) is to integrate the hardware of the audio
processor with the desktop computer even more so that not only is
control information sent between the computer and audio unit, but the
actual sound samples are as well.
An example of this is the Yamaha 01X Digital Mixing Studio which was
released in 2003. This device can be used as a stand-alone digital audio
mixer, but it has no sound storage capability, and for recording, needs
to be connected to a desktop computer where the sound data is stored.
When connected to a computer, the computer has control over most of
the audio unit’s functions, including altering the main channel faders
which are motorised and physically move to positions set by the
computer. In addition the computer can ‘read’ and store the settings of
the various controls the user has set up on the 01X for recall and/or
modifying later. Software for both a Windows-based computer and a
Macintosh computer is supplied that gives a comprehensive graphical
interface for convenient control of the mixer. The device can also be
controlled from within other commercial audio editing programs.
The MIDI system will be explained in Chapter 3 of this block, but if
you are familiar with MIDI, the use of this as a control method might
at first seem rather strange, as MIDI is a system for transmitting a
representation of the music itself, and not control information.
However, as you will see in Chapter 3, the MIDI system has a facility
to include ‘system exclusive’ data, and it is these types of data that are
used to control the audio processors. An additional advantage of using
MIDI is that many such devices already have MIDI connections as they
are able to manipulate MIDI as well as audio data. Therefore using the
MIDI connection to control the device means there is no need to add
an additional control interface. The disadvantage of MIDI is that the
interface is quite slow compared with modern computer interfaces and
far too slow to send digital audio sound samples in real time, and so
this connection system is limited to carrying control information only.
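
To give a rough idea of what such control data looks like at the byte level, a system
exclusive message is simply a string of bytes framed by a start byte (F0 in hexadecimal)
and an end byte (F7), containing a manufacturer identifier followed by 7-bit data bytes.
The sketch below is illustrative only: the framing is standard MIDI, and 43 (hex) is
Yamaha's published manufacturer identifier, but the 'parameter' and 'value' bytes are
invented for the example and do not correspond to any real device:

    SYSEX_START = 0xF0
    SYSEX_END = 0xF7

    def build_sysex(manufacturer_id, payload):
        """Frame a list of 7-bit data bytes as a MIDI system exclusive message."""
        assert all(0 <= b <= 0x7F for b in payload), "data bytes must be 7-bit"
        return bytes([SYSEX_START, manufacturer_id] + payload + [SYSEX_END])

    # A hypothetical 'set parameter 5 to value 100' message:
    message = build_sysex(0x43, [0x05, 0x64])
    print(message.hex(' '))    # f0 43 05 64 f7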

7.2 External control facilities on the AW16G


In the AW16G audio workstation, external control is provided by a
standard MIDI interface. This interface allows the user to carry out the
following procedures:
• synchronise the AW16G’s operation with external devices (the
AW16G can act as a master or a slave device for synchronisation);
• remote control of the transport buttons (play, stop, fast forward etc.
can be controlled remotely);
• external control of ‘scene’ changes (scenes can be recalled by signals
from a remote device);
• bulk dump/reload (all of the AW16G’s internal settings can be
backed up to a computer and reloaded at a later date if necessary);
• remote controller (the AW16G’s sliders and track select buttons can
be used as physical controllers for another audio device or
computer program).
All of these operations are carried out through the two MIDI connectors
on the back panel. The various operations are controlled by a number of
MIDI set-up displays. Figure 63 shows the main MIDI set up screen
display. Note though that no actual music codes (as will be explained
in Chapter 3 of this block) are transmitted or received through the MIDI
interface. The interface is used solely for transferring control and
parameter data.

Figure 63 AW16G MIDI set up screen

8 SUMMING UP
In this chapter you have looked at the whole process of making a
master recording, from the forms of audio signal that may need to
be recorded, through the cables, connectors and inputs by which
the signals are input to the recording device to the methods of
recording the sound and the subsequent stages of editing, mixing
and adding effects. Later in this block you will continue on from
where this chapter left off and look at the processes, systems and
equipment that are involved with the copying and distribution of a
master recording.
Throughout the chapter, you have seen how the processes of producing
a master recording are achieved in a real digital hardware device –
the Yamaha AW16G – as well as getting practical experience of
them through the course’s music recording and editing software.
However, before leaving this chapter, I want you to have a go at
producing a real master recording that is more substantial than the
ones you have tackled so far.

8.1 The TA225 Course Tune


The TA225 Course Tune is a short tune that has been specially written
for the course. You have heard the first part of this tune a number of
times already during your study of the course, from the listening
activity at the very start of the course to the sample song that you
have been working with in this chapter. The complete
tune is given in Figure 64.

Figure 64 The TA225 Course Tune Copyright © 2004 The Open University

The TA225 Course Tune has also been harmonised and there are some
specially written words for it as well. In the final activity of this
chapter, you will have the chance to work with this tune to produce
your own master recording.
The TA225 Course Tune will be used again in Chapter 3 of this block
where you will work with the MIDI version of the tune and you will
also see what a professional musician and composer can do when
given just the tune to work with.

ACTIVITY 40 (COMPUTER, LISTENING) .....................................................

This is an open-ended activity and you can spend as much or as little
time on it as you wish. The Course Team has supplied you
with a number of resources for the TA225 Course Tune, and in this
activity you will work with these to produce your own master
recording. You may even like to add or use your own material in your
recording. The activity involves assembling the musical elements,
editing and mixing them and adding effects. The Block 3 Companion
contains some more information about this activity and an outline of
the procedure you should follow. I

SUMMARY OF CHAPTER 1

Desktop sound is the equivalent in sound terms of desktop publishing of textual
material. Desktop sound is the process of producing a fully edited and mixed master
recording to professional standards using equipment that conveniently fits onto a
desktop. Desktop sound became a reality in the 1990s because of advances in digital
audio and storage technologies. (Section 1.1)

Producing a master recording first involves assembling all the raw elements of the
recording. This may include making acoustic recordings of live performances,
recording directly from electronic instruments, or obtaining already recorded
material either in analogue or digital form. (Section 1.2.1)

For all but the simplest of recordings, some editing and mixing of the various sound
elements will then be needed. Mixing involves adding proportions of the various
sound sources to create the required overall sound. Sometimes mixing needs to be done
in stages. Editing involves cutting out, inserting, swapping or moving sections of
sound. Editing of individual sound sources or of the overall mixed sound may need to
be carried out, even if it only involves fading in and fading out at the start/end of
the recording. An edit list is a list containing a record of the editing/mixing
operations and can be used to speed up the editing/mixing process; it can also allow
non-destructive editing to be carried out whereby the original sound elements are not
altered. (Section 1.2.2)

Finally there is a large range of effects that can be applied to individual sound
sources or to the mixed sound, from commonly used ones like equalisation, reverberation
and chorus to more specialised effects. (Section 1.2.3)

The Yamaha AW16G Professional Audio Workstation is a desktop sound device that
incorporates a digital sound mixer and a multitrack hard disk recorder. It includes
all the features and facilities needed to be able to create a professional-quality
master stereo recording including editing and effects facilities. (Section 1.3)

There are two main types of analogue sound inputs – high sensitivity for microphones
and other sound sources that have a low electrical output and low sensitivity or line
level for devices that have a higher level output. Input sensitivities are usually given
in terms of r.m.s. values and are often specified in decibels. (Sections 2.1 and 2.1.1)

Three decibel scales are used for specifying input sensitivities or amplitudes of sound
signals in their analogue electronic form – dBV, dBu and dBm where 0 dBV is 1 volt
r.m.s., 0 dBu is 0.775 V r.m.s. The dBm scale is a power ratio scale, but can often
be approximated to the dBu scale. (Section 2.1.2)

Input sensitivities vary widely, but a typical value for a high sensitivity analogue
audio input is 1 mV r.m.s. and perhaps 20 mV r.m.s. for a line level input.
Sometimes the impedance of the input can have an effect on signal levels, interference
and noise. Electrical impedance is a frequency dependent quantity that represents the
resistance to flow of electricity. It is measured in ohms. (Section 2.1.3)

In order to monitor signal levels, various metering systems are used, the common ones
being an indication of the average signal level and an indication of the peak level.
(Section 2.1.4)

A balanced input uses two wires for the sound signal – one wire contains an
inverted version of the signal on the other wire. Such an arrangement makes the signal
less susceptible to external interferences because, as any such interference is likely
to affect both signal wires equally, it will be cancelled out in the receiving audio
device. (Section 2.1.5)

In order to power condenser microphones, high sensitivity analogue inputs sometimes
allow a steady state voltage called phantom power to be added to the signal wire.
(Section 2.1.6)

Analogue sound outputs on audio equipment are usually at line level unless
they are special outputs such as those designed for headphones or loudspeakers.
Loudspeakers can require quite high voltage signal levels. (Section 2.2)

In order successfully to implement digital inputs and outputs, there must not only be
a specification for the physical form of the data, but there must also be an agreed
protocol that determines how the data is to be interpreted. The AES/EBU, S/PDIF and
MADI specifications are all related and contain protocols to specify how digital
sound samples can be sent in real time as a serial stream of digital data. (Section 2.3)

In the AES/EBU specification, the digital sound samples are sent serially along the
connection. Samples for each track (if there is more than one) are sent interleaved. The
data rate is variable, but is set such that one sample for each and every track is sent
within the sample period. The data is sent in a form that enables the receiver to recreate
the original bit stream by measuring the time between zero level crossings of the signal.
This allows the receiver to recognise individual bits correctly and more accurately than
if amplitude levels were measured. In order for the receiver to decode individual sound
samples from the stream of bits, synchronisation data is added. This is done by
assembling the sound samples into sub-frames, frames and blocks. A sub-frame contains a
special preamble set of bits followed by the bits for one sample from one track and some
additional control/status bits called channel status information. A frame consists of
one sub-frame for each sound track. A block consists of a set of 192 frames and provides
a means whereby the few status/control bits in each sub-frame can be collected together
to provide important information about the form of the sound samples. Such information
includes the sample rate, the number of bits per sample and other control data.
(Section 2.3.1)

The S/PDIF standard is fully compatible with the AES/EBU standard in terms of the
physical form of the signal and the sub-frame/frame/block format. A bit in the
channel status information indicates whether the signal conforms to the S/PDIF
standard or the AES/EBU standard. In the S/PDIF format, the channel status bits have
slightly different interpretations, and there is a substantial amount of unspecified data
that can be used for future enhancements to the specification. (Section 2.3.2)

The multi-channel audio digital interface (MADI) is a multi-track version of the AES/EBU
system that allows up to 56 tracks to be transferred simultaneously. It uses a
much higher data rate, a slightly different coding scheme, and a separate timing signal
to ensure individual bits are decoded correctly. (Section 2.3.3)

Increasingly now, digital sound data is being transferred using the connection
systems used in computers, particularly USB and FireWire. Sometimes these
connection systems are just used as a transport means for AES/EBU or S/PDIF
formatted sound data, but they can also be used to transfer sound samples in their raw
form without forming them into sub-frames and frames, etc. (Section 2.3.4)

The loss of signal level, reflections of the signal at each end, and the addition of
interference and noise can all be affected by the type and construction of the cables
used to carry sound signals. Loss of signal depends mainly on the length of the cable,
but can also be affected by its characteristic impedance, as can the effect of any
reflections that occur at the ends. Interference can be reduced by using a
screened cable whereby the signal wire is enclosed in a sheath that is connected to
the equipment's ground or earth connection. Digital connections can use an
optical method whereby the signal is sent down an optical fibre as light. This type of
connection can easily cope with the much higher frequencies required by a digital
sound signal and also provides electrical isolation between the sending device and
the receiver. (Section 2.4.1)

There are a variety of different connectors that are commonly used for audio signals.
Some provide a locking mechanism to prevent accidental disconnection and some
provide connections for two signals (together with a common earth or ground
connection) that enable them to be used with a balanced signal or a stereo signal.
Connectors for optical connections must be constructed so as to minimise light loss at
the connection. (Section 2.4.2)

The AW16G offers eight combined mic/line balanced analogue inputs and a S/PDIF
optical digital input. The device provides three analogue stereo outputs – a main
output, an auxiliary output and a headphone output – and a S/PDIF optical
digital output. (Section 2.5)

There are a number of different methods of digital sound recording; at present the most
useful of these for desktop sound is the multitrack hard disk recorder. (Section 3.1)

A hard disk unit contains one or more rotating disks which are coated on both
sides with a magnetic material. The data is stored/read by a set of read/write heads that
move radially across the disk. The disks rotate continuously and are sealed during
manufacture to minimise the effects of dust. The data is stored in concentric circles
called tracks which are divided radially into sectors. The access time of an item of data
on a hard disk has two components – the time for the read/write head to move to the
required track and the time for the required sector to appear under the head. The major
advantages of using hard disks in the recording and mastering processes are the
small access time and the multitrack capability. However, even though hard
disks have a very small access time, some solid state memory needs to be used as a
temporary store as well. Modern desktop computers with suitable software are able
to carry out most if not all of the operations required for desktop sound; however, a
dedicated device will often be more compact and portable and will be designed
specifically for sound recording and processing, and so may well have a better
overall performance than a desktop computer. (Section 3.2)

Solid state memory has no moving parts and has a very short access time. Random access
memory (RAM) is usually volatile – it loses its contents when the power is removed –
and so can only be used for temporary storage of sound data. It is also more
expensive and not available in such large storage sizes as hard disk units. Flash
memory is a form of non-volatile solid state memory. However, once data has been
stored, it has to be erased in blocks before new data can be stored. Also flash memory
can only be erased and reprogrammed a finite number of times. (Section 3.3)

Sound data is stored in a computer in a number of different file formats. Three
common formats are AU, AIFF and RIFF WAVE. In all the formats, additional data has
to be included to indicate the sample rate, the number of quantisation levels of the sound
data and the number of tracks as well as other information about the sound data that is
stored. Where there is more than one track, the samples from each track are interleaved
to enable the sound to be replayed as it is read from the file. The AU format is the
simplest format and consists of a header section, an optional comment section and the
sound data section. The AIFF format uses a number of self-contained sections called
chunks – the basic chunks used are the header, common and data chunks. Similarly, the
WAVE format uses chunks, the basic ones being the header, format and data chunks.
There are a number of additional optional chunks available with both the AIFF and
WAVE formats. (Section 3.4)

The AW16G workstation contains a hard disk unit that allows simultaneous
recording of 8 tracks and a stereo channel and simultaneous playback of 16 tracks and
a stereo channel. In addition there can be up to 8 virtual tracks associated with each
track that can be used for different 'takes' of the same recording. The unit also
contains solid state memory for use as temporary storage and an optional CD drive
is available. (Section 3.5)

Normalisation is the process of adjusting the level of a digital sound signal to use the
full dynamic range. Audio compression is used to reduce the dynamic range, and
audio limiting is used to prevent distortion occurring from overload. Expansion
increases the dynamic range and gating switches off a signal when it reduces below
a certain level. There are a number of parameters associated with compression,
limiting, expansion and gating – threshold, amount, attack time and decay time.
(Section 4.1)

Editing is the process of adding, deleting, merging or swapping sections of a recording.
Analogue techniques involved cutting and joining sections of magnetic tape to avoid
reducing the quality through making multiple generations of copies. Digital
techniques do not have this problem, and carrying out editing is usually just a matter
of reorganising the sound samples. Fading in, fading out or cross-fading between
sections involves carrying out a large number of simple numerical calculations on
sound samples. Non-destructive editing occurs when the original material is not
altered. An edit list is sometimes used to automate the editing process. (Section 4.2)

The AW16G has facilities to edit tracks and all the common editing operations are
available. However, the lack of a large display screen and pointing device can
make editing quite tedious and time-consuming. (Section 4.3)

Mixing is the process of combining individual sound sources to create the final required
overall sound. Mixing using analogue techniques involves adding proportions of
each analogue source. For digital sources, mixing is done by calculating fractions of
each sample from each source and adding the results together. For both analogue and
digital mixing, care must be taken to ensure the amplitude of the mix is not so large
that it causes distortion. To cater for this, mixer units often have a larger dynamic
range than that of the individual sources. (Sections 5.1, 5.2 and 5.3)

The mixer section of the AW16G contains inputs from the audio inputs and the hard
disk track recorder as well as from two effects sections. The mixed sound is output from
the mixer on one or more audio buses that can be fed to the device's outputs as well as
to the hard disk recorder and the effects units. Various mixing modes are available that
are designed to be used at different stages of the mixdown process. (Section 5.4)

There is a large range of effects that can be applied to both individual tracks and to
final mixes. Equalisation is one of the most common effects and this involves adjusting
the level of the sound components in one or more frequency bands. Simple
equalisation consists of treble and bass controls, but more elaborate equalisation can
involve one or more mid-range frequency bands with full control over the centre
frequencies, the width of the frequency bands (the Q) and the amount of boost or
cut. (Section 6.1)

Echo and in particular reverberation are widely used effects to simulate the
acoustics of a room or building. Creation of echo involves adding a delayed proportion
of the signal to the original signal. Generating a delayed signal with analogue
techniques is not easy if quality is not to be compromised; however, with digital
signals either a temporary buffer called a FIFO can be used to delay the sound
samples, or a special device that contains a string of storage elements can be used.
Reverberation is more complicated to create as it involves creating not only the early
reflections, but also the multi-reflections that form the actual reverberation. Analogue
reverberation can be simulated using a special room, a metal plate or coiled
springs, but the results are never particularly good, and only minimal adjustment of the
parameters is possible. Digital techniques for reverberation involve combining a
number of individual delay units each with differing delays. (Section 6.2)

Flanging is an effect that is created by adding a delayed version of a signal to the
original signal, which has the effect of reinforcing some frequencies and cancelling
out others. By making the delay variable, the characteristic flanging sound is
produced. Flanging was originally produced using two identical but not
synchronised tape recorders to create slightly different and varying delays. Digital
simulation is created by using a delay unit with a varying delay. Chorus is an effect
that is designed to simulate the sound of an ensemble of the same types of instrument
playing together. Analogue generation is complicated, but digital techniques are
more straightforward and involve the use of a number of varying delays connected in
parallel. (Section 6.3)

Changing pitch without varying the tempo and vice versa are useful effects to have
available. Pitch can be changed by creating a reducing or increasing delay. Tempo can
be changed by slicing the sound into short overlapping sections and then sliding these
sections together or apart in time. Both effects need great care to implement
satisfactorily, particularly for large variations. (Section 6.4)

Another common effect is invert where the sound waveform is inverted. This can be
used to correct for incorrect signal connections and to provide stereo imaging
effects. A stereo channel can be thought of as being composed of a sum and difference
signal rather than a left and right signal. Individual sound sources have a
localisation and an image within the stereo sound field. A vocoder is an effect created
by amplitude modulating one sound source by another. An envelope follower also uses
amplitude modulation, but here it is the amplitude of one signal that changes the
amplitude of the other. (Section 6.5)

The AW16G contains a large range of effects that can be applied to individual channels
and to mixes of sources. There are a number of effects libraries that allow common
settings to be set up quickly. Bespoke settings can also be created and stored in
the libraries. Some effects like pitch and time changing cannot be carried out in real
time. (Section 6.6)

External control of audio devices can allow the device to be remotely controlled, to be
controlled by a computer or to enable the settings to be stored. There are a number of
common interconnection methods; commonly USB, FireWire and MIDI are used.
(Section 7.1)

The AW16G uses a MIDI interface to give remote access. This provides facilities to
enable the AW16G to synchronise its operations with other devices (in both
master and slave modes), to allow 'scenes' to be recalled, to allow recording/playback
to be started and stopped remotely, to enable the workstation's settings to be backed up
and to allow the device's front panel slider controls to be used to control a remote
device. (Section 7.2)

APPENDICES

The tables in these appendices are given for information only as an illustration of the
types of settings and adjustment parameters that a typical sound processing device (or
computer program) might provide.

Appendix 1 – AW16G Input library list



Appendix 2 – AW16G Preset equalisation library list



Appendix 3 – AW16G Effects library list

(continued ...)

Appendix 3 – AW16G Effects library list (continued)

Appendix 4 – AW16G Reverberation parameters


ANSWERS TO SELF-ASSESSMENT ACTIVITIES

Activity 3
Non-destructive editing is the process whereby a sound is edited in some
way without the original sound being altered. This can be done by
using an edit list. (It may also be done by making copies of the original
sound and editing the copy rather than the original; however, this is
not possible with analogue devices unless quality is compromised, and
may not be possible with dedicated desktop sound devices.)

Activity 5
(a) If the peak-to-peak amplitude is 20 mV, then the peak amplitude
will be 20/2 = 10 mV. Thus the r.m.s. voltage will be 10 × 0.71
= 7.1 mV.
(b) If the r.m.s. amplitude is 71 mV, then the peak value will be 71 ÷ 0.71
= 100 mV.
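
The same working can be expressed as a short calculation (a sketch only; 0.71 is the
approximation to 1/√2 used in the main text):

    # Peak-to-peak, peak and r.m.s. amplitudes of a sine wave.
    peak_to_peak_mV = 20
    peak_mV = peak_to_peak_mV / 2        # 10 mV
    rms_mV = peak_mV * 0.71              # about 7.1 mV

    rms2_mV = 71
    peak2_mV = rms2_mV / 0.71            # about 100 mV
    print(rms_mV, round(peak2_mV))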

Activity 6
(a) –6 dB represents a halving of a quantity, so the sound will have a
level of 40 – 6 = 34 dB.
(b) +40 dB represents a multiplication of 100 times, so the sound will
have a level of 40 + 40 = 80 dB.

Activity 7
(a) 2 V (+6 dB is a doubling and 0 dBV is 1 V)
(b) 0.0775 V or 77.5 mV (–20 dB is a tenth and 0 dBu is 0.775 V)
(c) 0.05 V or 50 mV (–26 dB can be thought of as –20 dB, or a tenth,
followed by –6 dB or a halving with 0 dBV being 1 V)
(d) 0.001 V or 1 mV (–60 dB can be thought of as (–20) + (–20) + (–20)
or one tenth times one tenth times one tenth or one thousandth
with 0 dBV being 1 V)

Activity 8
If 0 dBV represents a signal level of 1 V r.m.s., then 20 mV can be
thought of as being composed of a doubling of this reference level (2 V)
followed by a hundredth of this value (2 V ÷ 100 = 0.02 V or 20 mV).
Since +6 dB represents a doubling and –40 dB represents one
hundredth, so 20 mV is represented by 0 + 6 – 40 = –34 dBV.
There are other ways of explaining the result, for example taking one
hundredth first and then doubling, or dividing up the decibel value into
–40 and +6 first and then seeing what this means in terms of voltages.
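
The doubling/halving and factor-of-ten reasoning used in Activities 6 to 8 is equivalent
to the general relationships dB = 20 log10(V/Vref) and V = Vref × 10^(dB/20), where Vref
is 1 V for dBV and 0.775 V for dBu. A short illustrative sketch applying them to the
values above:

    import math

    def db_to_volts(db, v_ref=1.0):
        """Voltage for a level in dB relative to v_ref (1.0 V for dBV, 0.775 V for dBu)."""
        return v_ref * 10 ** (db / 20)

    def volts_to_db(volts, v_ref=1.0):
        """Level in dB relative to v_ref."""
        return 20 * math.log10(volts / v_ref)

    print(round(db_to_volts(6), 2))             # about 2.0 V   (+6 dBV)
    print(round(db_to_volts(-20, 0.775), 4))    # 0.0775 V      (-20 dBu)
    print(round(db_to_volts(-26), 3))           # about 0.05 V  (-26 dBV)
    print(round(volts_to_db(0.02), 1))          # about -34 dBV (20 mV)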

Activity 12
In the AES/EBU system, a single digital sound sample from one sound
channel is sent in one sub-frame. For a stereo system, two sub-frames
will be transmitted for every sound sample. Since a sub-frame consists of
32 bits, between each sound sample the transmitter must send 32 × 2 =
64 bits of data along the serial interface. The data rate of the serial
interface must therefore be 64 times the original audio sample rate, or
64 × 44.1 × 10³ = 2 822.4 × 10³ bits per second, or about 2.8 Mbits per second.
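
The same calculation, written out as a short illustrative check:

    # Serial data rate for two-channel AES/EBU at a 44.1 kHz sample rate:
    # one 32-bit sub-frame per channel must be sent in every sample period.
    sample_rate = 44_100          # samples per second
    bits_per_subframe = 32
    channels = 2
    bit_rate = sample_rate * bits_per_subframe * channels
    print(bit_rate)               # 2 822 400 bits per second, about 2.8 Mbit/s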

Activity 13
The system works in real time such that one sample from each sound
channel is transmitted within the sampling interval of the original
digital sound data. Thus, the more channels, the more digital data that
has to be sent within a sample period, and so the higher the bit rate
needs to be to incorporate this data.

Activity 14
If the microphone has a balanced output, a screened cable with a
twisted pair of signal wires needs to be used. The screened
construction is needed to minimise interference particularly since
low-level microphone signals are being carried, and there needs to be
two signal wires to carry the balanced microphone signal.
The connectors need to have 3 connections – 2 for the signal wires and
one for the screen of the cable. Both TRS jack connectors and XLR
connectors could therefore be used, but because the microphones are
being used on location, XLR types are to be preferred as they are robust
and have a locking mechanism to prevent accidental disconnection.

Activity 15
–46 dBu can be thought of as –40 dBu and –6 dBu. –40 dB represents
one hundredth and –6 dB is a halving. Therefore, if 0 dBu is 0.775 V,
–40 dBu is 0.775/100 = 0.00775 V, and –46 dBu is 0.00775/2 = 0.003875 V,
or about 3.9 mV.

Activity 18
One Gbyte is the same as 1024 Mbytes, so if one CD can store 640 Mbytes
of audio data, then 64 Gbytes can store 64 × 1024 ÷ 640 ≈ 102 CDs
worth of stereo digital sound.

Activity 19
RAM is not suitable because it is volatile which means that if the
battery in the audio player runs flat or needs to be changed, any sound
that is stored will be lost (but note that the device will contain some
RAM that is used as temporary storage during playing or recording).

Activity 21
Each sample requires 16 bits which uses 2 bytes. There are 2 tracks
(the left and right stereo tracks), so four bytes are needed to specify the
amplitude values for each sample point. If there are 44 100 samples
every second (sample rate is 44.1 kHz), then the total number of bytes
that must be read every second is 44 100 × 4 = 176 400 bytes.
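
Expressed as a calculation (illustrative only):

    # Bytes per second that must be read for 16-bit stereo sound at 44.1 kHz.
    sample_rate = 44_100       # sample points per second
    bytes_per_sample = 2       # 16 bits = 2 bytes
    tracks = 2                 # stereo: left and right
    byte_rate = sample_rate * bytes_per_sample * tracks
    print(byte_rate)           # 176 400 bytes per second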

Activity 23
The description is of the process of gating a digital sound signal, since
if the sound level (sample magnitude) is below a certain point (the
threshold magnitude), then the sound level is set to zero.
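
The gating operation described here amounts to a simple test applied to every sample.
A minimal sketch (not the AW16G's implementation) is:

    def gate(samples, threshold):
        """Set any sample whose magnitude is below the threshold to zero."""
        return [s if abs(s) >= threshold else 0 for s in samples]

    print(gate([0.02, -0.3, 0.5, -0.01], threshold=0.1))   # [0, -0.3, 0.5, 0]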

Activity 25
(a) If the sample rate is 48 000 samples per second, then within the
fade-in period there will be 48 000 × 2 = 96 000 sound samples.
At a point one quarter of the way into this period (i.e. after one half
a second), the sample number will be 96 000 ÷ 4 = 24 000.
(b) If the fade in period is linear, then after one quarter of the fade in
period, the sound amplitude should be one quarter of its final
level. Thus the multiplication factor for this sound sample should
be 0.25.
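
A linear fade-in simply scales each sample by its position within the fade. The
illustrative sketch below reproduces the factor of 0.25 for sample 24 000 of a
96 000-sample fade:

    def fade_in_factor(sample_number, fade_length_samples):
        """Multiplication factor for a sample during a linear fade-in."""
        return sample_number / fade_length_samples

    samples_in_fade = 48_000 * 2                       # 2 s fade at 48 kHz
    print(fade_in_factor(24_000, samples_in_fade))     # 0.25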

Activity 29
The difference in the number of bits is 24 – 16 = 8. Hence the number
that each sample has to be divided by is 2⁸ or 256.
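
In other words, each 24-bit sample is divided by 2⁸, which is the same as shifting its
binary representation right by 8 bits. A sketch (illustrative only):

    def reduce_24_to_16(sample_24bit):
        """Scale a 24-bit integer sample down to 16 bits by dividing by 2**8."""
        return sample_24bit >> 8      # arithmetic shift right = divide by 256

    print(reduce_24_to_16(8_388_607))   # largest positive 24-bit value gives 32767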

Activity 32
From your study of Section 3 in Chapter 4 of Block 1, you should
remember that the reverberation consists of the direct sound, followed
by the early reflections and then the multiple reverberations. Figure 65
shows the typical form of this for the hand clap in a reverberant room.
Figure 65 Answer to Activity 32: amplitude plotted against time for a hand clap, showing
the direct sound, the early reflections and the reverberation

Activity 34
Cancellation first occurs at a frequency where the delay time is equal
to one half the cycle time of the signal. In this case, if the delay time is
1 ms, the cycle time of the signal must be 2 ms for cancellation to first
occur. This corresponds to a frequency of 1/0.002 = 500 Hz.
The same situation occurs for every odd harmonic, i.e. 1.5 kHz, 2.5 kHz,
3.5 kHz, etc.
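
The cancellation (notch) frequencies for a fixed delay follow the pattern
(2k + 1)/(2 × delay) for k = 0, 1, 2, .... A short illustrative check of the 1 ms case:

    def notch_frequencies(delay_seconds, count=4):
        """First few cancellation frequencies when a signal is mixed with a delayed copy."""
        return [(2 * k + 1) / (2 * delay_seconds) for k in range(count)]

    print(notch_frequencies(0.001))    # [500.0, 1500.0, 2500.0, 3500.0]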

Activity 39
Compression is being carried out. The easy explanation is to say that
the dynamics type is ‘comp’ standing for compression! However,
without this indication, the graph indicates that for low input levels,
the output level rises linearly with input level (the graph is a straight
line at 45° to the axes). However, above a certain point, the output level
rises less than the equivalent input level rise. This has the effect of
compressing the dynamic range of the input signal. (Compare this
graph to the one shown in Figure 25.)
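
The shape of the graph can be summed up as a simple input/output rule: below the
threshold the level passes through unchanged, above it the excess is reduced by the
compression ratio. The sketch below is a generic static compressor curve working in dB,
not the AW16G's own algorithm, and the threshold and ratio values are chosen arbitrarily:

    def compress_db(input_db, threshold_db=-20.0, ratio=4.0):
        """Output level in dB for a given input level in dB (static compressor curve)."""
        if input_db <= threshold_db:
            return input_db            # below threshold: unchanged (the 45-degree line)
        return threshold_db + (input_db - threshold_db) / ratio   # above: rise reduced

    print(compress_db(-30))    # -30.0 : unchanged
    print(compress_db(-10))    # -17.5 : 10 dB above threshold becomes 2.5 dB above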

LEARNING OUTCOMES

After studying this chapter you should be able to:


1 Explain correctly the meaning of the emboldened terms in the main
text and use them correctly in context.
2 Describe what is meant by desktop sound, what it involves and
why it has come about.
3 Outline the stages involved in producing a master recording that
uses a range of different sound sources.
4 Work with signal amplitudes expressed in terms of r.m.s. and
decibel values and describe the main types of metering systems
and their uses. (Activities 5, 6, 7, 8 and 15)
5 Describe the types, technical features and typical uses of the audio
input and output connections – both analogue and digital –
commonly found in audio processors and suggest, with reasons, a
particular input or output connection that would be the most
suitable one to use in a given situation. (Activities 12 and 13)
6 Outline the types and construction of the cables and connectors
commonly used for audio signals, and make informed choices of
the cables/connectors to use in a particular given situation.
(Activity 14)
7 Suggest the characteristics of and give reasons for a suitable storage
method for use during a given stage in the mastering process.
8 Outline the construction, operation and important features of a
hard disk recording unit and make informed deductions given
relevant details about a specific device. (Activity 18)
9 Outline the characteristics and common uses of RAM and flash
memory for the storage of digital sound data. (Activity 19)
10 Describe the common characteristics of computer file formats for
storing audio information and suggest the most suitable format to
use in a given situation. (Activity 21)
11 Describe the function of and processes involved in the editing of
audio signals, including normalisation, compression, limiting,
expansion and gating and outline how each might be accomplished
using both analogue or digital techniques. (Activities 3, 23 and 25)
12 Describe the mixdown process, the possible signal-level problem
that arises when adding signals together and its solution, and
outline how the mixing of audio signals might be achieved using
both analogue or digital techniques. (Activity 29)
13 Describe each of the various sound effects that have been
introduced in the main text, outline how each might be
implemented using, where appropriate, both analogue or digital
techniques, and mention some common situations where each
effect might be used. (Activities 32, 34 and 39)
14 List some of the common interface methods that are used to
provide computer control of audio equipment, mentioning the
uses, advantages and disadvantages of each.

15 Outline the purpose and operation and/or features of the various
systems and specifications introduced in the main text and carry
out appropriate calculations, descriptions and deductions given
any required details. (Activities 12 and 13)
16 Apply the principles and ideas introduced in the main text to new,
given, situations in order to carry out appropriate calculations,
descriptions and deductions concerning the new situation.

Acknowledgements
Grateful acknowledgement is made to the following sources for
permission to reproduce material in this chapter:
Alistair Jones and Peter Peck of Yamaha-Kemble Music (UK) Ltd for
help with the AW16G case study material.
Figures 2(b), 30–38 and 60–63: ‘Yamaha AW16G Manual’ Yamaha-
Kemble Music (UK) Ltd.

TA225
Block 3 Sound processes

Chapter 2
Notation and

Representation

CONTENTS
Aims of Chapter 2 108

1 Introduction 109

2 The functions of music notation 110

3 A brief history of notation 111

4 Printing 116

4.1 Introduction 116

4.2 Letterpress 117

4.3 Engraving 118

4.4 Photolithography and the Halstan process 119

4.5 Computer setting: Sibelius 121

Summary of Chapter 2 123

Answers to self-assessment activities 124

Learning outcomes 125

Acknowledgements 126


AIMS OF CHAPTER 2

I To describe briefly the evolution and functions of Western musical
notation.
I To show through video sequences the development of music
typesetting from letterpress to computerised music setting.
I To outline briefly the principal technologies of print.

1 INTRODUCTION

One of the topics in Chapter 1 of this block was the storage of sound and,
in particular, the storage of music. Storage in that case depended on
representing the pressure wave that the listener hears in a permanent
form. The pressure wave in the air was picked up by microphones and
an electrical representation created. The electrical representation was
used to create a record in a permanent medium. In this instance, a digital
format was used, although analogue representation could have been used.
Another method of storing music is in terms of ‘codes’ or ‘instructions’
that represent how the music should sound or be played. Such methods
are not concerned with accurately recording a pressure wave, but with
storing information that will enable the pressure wave to be re-created
by an instrument. There are several methods by which this can be
done, from the simple example of the pins on a rotating cylinder in
a musical box to today’s MIDI system. Conventional music notation
too can be regarded as a system of codes or instructions for recording
and recreating music – with the proviso that there is an element of
approximation in the way that notation represents music, and a
degree of latitude in how it should be interpreted.
In this chapter we will be looking exclusively at conventional music
notation, and in Chapter 3 of this block you will find out about MIDI
and other forms of ‘coded music’. The purpose of this chapter, though,
is not to teach you how to use notation (that is, how to read it or how
to transcribe music into notation) but to look briefly at some of the
interactions between technology and notation. Much of the material in
this chapter will be presented using a number of video sequences,
rather than printed text.
A striking characteristic of Western art music is the way the word
‘music’ has almost become synonymous with music notation. This
dual meaning of the word is nicely demonstrated by the story of a
young music student who was about to perform to some examiners.
Before beginning, the student asked, ‘Do you mind if I play without the
music?’, to which an examiner is alleged to have answered, ‘By all
means dispense with the notation, but please let us have the music.’
This dual meaning of the term ‘music’ is understandable: it is almost
impossible to imagine how a Wagner opera or a Mahler symphony
could be performed without notation, or how they could have been
composed without the use of notation. This is not to say, however, that
Western art music has always been accurately notated in all respects,
nor that Western art music is the only kind that uses notation. In
earlier times, many of the details of rhythm, ornamentation and
dynamics in Western music were often left unnotated because it
was understood that the performer would supply what was missing.
Outside art music, in jazz, folk and popular music, where improvisation
is usual, notation is quite often used, though generally in a simplified
form as a reminder – a skeleton of the tune or the chord progression –
rather than as an encapsulation of the finished work. Indeed, Western
notation appears to have begun as just such an aide-mémoire, intended
for people who were already familiar with the music. In non-Western
music, notation is found in the musics of (for example) China, India,

Indonesia and Japan, although, once again, it is often more of a reminder
than a representation of the work as performed. Despite the diversity in
types and applications of notation that exist and have existed, we shall
concentrate in this chapter on Western art-music notation.
Even though we shall limit our view to Western notation, we are not
dealing with a single, unchanging system. There have been various
systems for notating Western music, and many variations on
them. Part of a modern performer’s task in coming to grips with
notation from the past consists of understanding what a particular
notation is trying to represent, and relating it to the performing
conventions at the time when the music was notated.
Probably no development has revolutionised the use of notation more
than the development of music printing, which spread notated music
faster and further than ever before. The technological history of music
printing in some ways parallels that of the printing of verbal texts, but
music posed a particular set of problems. Printers’ solutions to them in
turn affected the development of notation. In Section 4 we shall look at
some of the methods of printing music that have been used, but before
that (in Sections 2 and 3) we shall survey briefly the functions and
history of notation.

2 THE FUNCTIONS OF MUSIC NOTATION

Music notation has developed a range of uses beyond the obvious one of
transmitting a musical work from a composer to a performer and,
ultimately, to an audience, and in this section I want to look briefly at
some of these other functions.
One consequence of the development of notation has been the development
of a body of works (or canon) that are thought to be of special standing
historically or aesthetically (or both). For instance, the repertory of Western
music in the Mediaeval period is largely dominated by beautifully written
volumes of Latin church music, which were the repositories of specially
valued pieces. There are hints that notation played a part in regulation of
liturgical practices – the copies promoted the ‘correct’ musical forms for
church services. Similarly, composers of the nineteenth and early twentieth
centuries attempted to control performances of their music by making the
notation as detailed as possible.
Although many Mediaeval volumes were no doubt put to practical
use, some of the more elaborate ones were more probably library,
archive or presentation copies. Here notation was a way of ‘keeping’
music, an ephemeral art in performance, for eternity – or at least for
the next generations. At nearly all periods in Western history there seem to
have been collectors of music, a few of them probably not even able to make
sense of the notation themselves. (This was the case even at times when the
composers themselves had no thought of their music being performed by
posterity.) The possession of notated music was sometimes an end in itself
– a sign of culture and status – but more often it was specifically tied to the
owner’s affection for, or feeling of duty towards, a particular repertory.
Before the advent of musical recording, a volume of notation fulfilled much
the same function as a record – it was music waiting to be brought to life.

Another function of notation relates to the way music is composed.


Although composition is often thought to precede notation, as though
the work were composed first and notated afterwards, for some
composers the processes of composing and notating are not completely
separable, rather as many writers find that developing their literary
ideas is not separable from the act of creating a text. Notation gives
music a tangible form, and for many composers the manipulation of
the notation is a way of developing their musical ideas.
The final function of notation I want to introduce is analysis. In a lot
of musical research, musical analysis is intimately bound up with a
notated representation of the music. Scores are studied in order to
establish how the music works, what the main features are, and how
the parts function in relation to the whole. A conductor, for instance,
when learning a new piece, will spend a long time silently studying
and analysing the score of the piece, rather as an engineer studies a
chart or a circuit diagram to see how a complex device works. The use
of notation for analytical purposes is so common that it is usual when
analysing unnotated styles of music (for example a jazz solo or a piece
of non-Western music) to transcribe a recording of the piece into
standard notation and to use the transcription as the basis of the
analysis. This procedure can be controversial, as Western notation
embodies certain assumptions about music that may not be valid for
music of other cultures. Nevertheless, the procedure is well
established, and can be seen at work in many academic journals
relating to jazz, popular music and non-Western music.

3 A BRIEF HISTORY OF NOTATION

The beginnings of Western musical notation lie with the notation of
sacred music in the ninth century. The music that was notated was
plainsong and plainchant (the two terms are generally held to be
synonymous), and these terms refer to the repertory of monophonic
unison (or octave) Latin chants that were the descendants of the
liturgical intoning of religious texts. The notation used for plainsong
was very different from modern notation, and notation as we now
know it gradually evolved over many centuries, arriving at something
recognisably like modern notation around 1600.

ACTIVITY 1 (LISTENING) .....................................................................

Listen to the audio track associated with this activity which is an
example of plainchant. It is the hymn Bellator armis inclitus (literally
‘Warrior glorious in arms’.) I

Plainsong was notated using neumes (pronounced ‘newms’). This
word originally referred to signs that were placed above or below the
liturgical text in order to give some indication of whether a melody
moved up or down. The signs were based on upward or downward
sloping lines, showing the direction of pitch change. These signs gave
no indication of rhythm or pitch (except for indicating whether the
pitch went up or down).

The origins of neumes are much disputed; their most likely provenance
was the grammatical signs that were used in Classical Greek, derivations
of which are found in many European languages (for example the acute,
grave and circumflex accents in French and Portuguese). Figure 1 shows
an example of a text with neumes marked on it and an enlargement.

Figure 1 A gradual notated in Breton neumes from the late ninth century

This kind of neumatic writing was eventually recognised as
unsatisfactory because it was impossible to work out the exact
intervals between one note and the next. Figure 2 shows an example of
a later method that used an alphabetic system in conjunction with a set
of lines, not unlike that of staff notation.

Figure 2 From the treatise Musica enchiriadis

The most influential system of notation – a revolutionary proposal in
its time – is popularly associated with the most famous Mediaeval
theorist, Guido d’Arezzo (c.991 – after 1033), although he is not

thought to have invented it himself. This was the use of a four-line
stave (rather than the five-line stave now used). Guido d’Arezzo also
introduced a number of other ingenious and effective ideas. One of
these was the identification of the notes of a scale by mnemonic
syllables: ut, re, mi, fa, sol, la. These syllables were taken from a well
known (at the time) hymn to St. John the Baptist in which each
successive phrase or section of the tune began one note higher than the
one before. The words of each phrase (Ut queant laxis, Resonare fibris,
Mira gestorum, Famuli tuorum, Solve polluti, and Labii reatum)
supplied the mnemonic syllables. This system became the tonic solfa
system, which uses the syllables doh, ray, me, etc. In some countries
‘ut’ is still used to represent the first note of the scale.

ACTIVITY 2 (LISTENING) .....................................................................

Listen to the audio track associated with this activity where you can
hear the hymn that supplied the syllables ut, re, mi, etc. for successive
notes of the scale. Listen for the rising scale created by the syllables
ut, re, mi, etc. I

Another of Guido’s innovations was a teaching aid known as the
Guidonian hand. In this system, pitches were associated with parts
of the hand and fingers. By pointing to parts of his hand, Guido
could teach melodies to singers.
The Guidonian hand was often
represented diagrammatically.
Figure 3 is an example.
Plainsong did not need rhythmic indications, because the words
dictated the rhythm. Some early plainsong manuscripts added an
occasional letter to a neume indicating ‘hold’ or ‘quicker’;
occasionally, a line, known as episema, might be added to a neume to
indicate some sort of lengthening. Some authorities question the
importance of these episemae, asserting that the rhythm was
essentially free, or mirrored by the metre of the text.

Figure 3 Guidonian hand, from a manuscript in Mantua dating from
the last quarter of the fifteenth century

Organum was a major development in music of the early Middle Ages
(c.1000–1300). The earliest type of organum consisted of plainsong
plus an extra part doubling the melody a fourth or fifth above.
More developed organum writing added the octave, third and sixth
in various combinations. With the development of organum came
refinements of its notation, notably the adoption of a stave of lines to
represent the pitches and the use of symbols to indicate rhythm. The
first composers to make consistent use of an unambiguous rhythmic

notation were Léonin and Pérotin, two names associated with the
Notre Dame school. Léonin (c. 1163–90) composed only two-part
organa, whilst Pérotin (c. 1160–1240) included parts for a third and
fourth voice. Both composers based their rhythmic notation on the
‘long’ and the ‘breve’ (‘breve’ meaning ‘short’), shown in Figure 4.
These were the only two rhythmic units used at the time. Notice that
the notes are ‘filled’ rather than white; also, in this notation the stem
indicated a longer note, which is the opposite of modern usage.

Figure 4 Long and short notes in organum

ACTIVITY 3 (LISTENING) .....................................................................

Listen to the audio track for this activity which is an excerpt from
Pérotin’s organum Viderunt Omnes. I

Subsequent musical developments (Ars Antiqua in the thirteenth century
and Ars Nova in the fourteenth century) fostered some sophistication in
the uses of notation. However, it was only in the fifteenth century that
some of the other core aspects of staff notation, as we currently know it,
emerged. In particular, time signatures appeared, but were notated as
symbols until the end of the seventeenth century. (The symbol C for 4/4
remains in use, and is often regarded as standing for ‘common’ time
although the symbol did not originally stand for a particular word.)
The subsequent history of notation is largely a history of adding more
detail. In music of the Baroque period, dynamic markings were hardly
used, and the performer was often expected to add ornaments that were
not shown in the notation. Even the notes of chords for accompanying
instruments, such as keyboards, lutes, harps etc. were sometimes not
fully shown, but indicated in a shorthand way using a system known
as figured bass, which consisted of sets of numbers below the stave
lines. Performers would work out suitable chords from these figures,
and improvise suitable accompanying textures, such as scale passages,
arpeggios (chords where the notes were played separately) and so on.
An entirely different form of notation from the staff system flourished
in the Renaissance and Baroque periods for certain instruments.
This was tablature, and it was used especially for the lute and related
instruments, and to a limited extent for the keyboard. The essence of
tablature is the notation of how a particular pitch should be produced
on an instrument, rather than the notation of the pitch itself. Tablature
is therefore specific to a particular instrument tuned in a particular
way. Figure 5 shows an example of lute tablature.


Figure 5 Lute tablature

Each horizontal line represents a string, or course if the strings were in
pairs (as they usually were on the lute). The top line represents the
highest pitched of the courses, and the bottom line represents the
lowest. The letters a, b, c, d, etc. indicate the fret where the course

below the letter was to be stopped. The letter a means an open


(unfretted) course, b represents the first fret, c represents the
second, and so on. The flags over the top represent rhythmic notation.
They indicate relative duration, rather as the hooks attached to notes
in conventional notation do. In this form of tablature, the note value
indicated by a flag applies to all notes and chords following the flag
until it is contradicted by a later one.
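To make the fret-letter mapping concrete, here is a small sketch in Python (my own illustration, not part of the course materials; the six-course tuning, given as MIDI note numbers, is an assumption of a common Renaissance lute tuning in G). It simply turns a letter into a fret number and, for a chosen course, into a pitch.

FRET_LETTERS = "abcdefghiklmnop"   # 'a' = open course; 'j' was traditionally omitted

# Assumed tuning, highest course first, as MIDI note numbers (an illustrative
# choice -- real lutes were tuned in several different ways).
ASSUMED_TUNING = {1: 67, 2: 62, 3: 57, 4: 53, 5: 48, 6: 43}  # g' d' a f c G

def fret_number(letter):
    """Convert a tablature letter into a fret number: a=0 (open), b=1, c=2, ..."""
    return FRET_LETTERS.index(letter.lower())

def pitch(course, letter):
    """MIDI note number sounded by stopping the given course at the given letter."""
    return ASSUMED_TUNING[course] + fret_number(letter)

print(fret_number("c"))   # 2  (second fret)
print(pitch(1, "c"))      # 69 (two semitones above the assumed open top course)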
Tablature came in many forms, depending on the instrument and also
the originating country. There were, for instance, Italian, French and
German versions of tablature. A version of tablature is still widely
used for popular- or folk-style guitar music; classical guitar music uses
conventional staff notation.
By the late nineteenth and early twentieth century, conventional staff
notation had become much more precise than it had been in earlier
periods, with (generally) less requirement on the performer to improvise.
The amount of detail included in a printed piece of music was much
greater than in earlier periods, with consequent problems for legibility.
The ability to lay out a piece of music in order to maximise its
legibility therefore became a highly developed skill among music
copyists (people employed to copy music by hand) and printers.
Indeed, many musicians would contend that the quality of music
copying and printing achieved in the nineteenth century has not been
surpassed by more modern methods.
The twentieth century saw many experiments in music that broke
away from traditional ways of creating music, and these have affected
music notation – though often more in specialised areas of music than
in the general run of music. For instance, several composers have
experimented with microtones, which are musical intervals smaller
than a semitone, or quarter tones, which are intervals of half a semitone.
Various notational innovations have been devised for music that uses
such intervals, for instance the use of incomplete flat and sharp signs
to indicate a quarter-tone flat or sharp. More significant than this
particular example, though, is the fact that many composers have
incorporated novel techniques as ‘one-offs’ in particular compositions,
and ‘one-off’ notations have correspondingly been devised. Many
modern scores have a page or two at the start listing the new symbols
that have been used, with a few words of explanation of what the
symbol means. These symbols might not be used in any other work.
The use of a non-standard symbol in printed music is more of a problem
than in handwritten music, as nearly all systems for printing music
use a pre-determined set of symbols. Creating a new symbol for use in
only one work is an awkward business, even in modern computerised
systems of music printing.
Computer systems have revolutionised the business of laying out and
printing music, but their effect on the notation itself is relatively small.
Notation produced on a computer is often indistinguishable from that
produced by other mechanical methods. Rather, the effect of computer
technology has been to take a lot of the drudgery out of notating and
copying music. For instance, a composer can write for all instruments
of an orchestra or band at sounding pitch, and have the computer
make the appropriate adjustments to the notation for the transposing

instruments. Computer systems can often also create notation from a


performance played on a keyboard connected to the system, although
the result generally needs to be edited to make it into an acceptable
piece of notation.
Despite the versatility of conventional staff notation for representing
many styles and periods of music, there are areas of music where it
is hardly usable. Electro-acoustic and electronic music, for instance,
often use musical procedures that conventional notation is not
equipped to represent, such as the sound of an instrumental timbre
being progressively modified into a different timbre. Systems of
notation have been devised, and are being devised, for such styles of
music, but they are beyond our scope in this course.

4 PRINTING

4.1 Introduction
Following the development of methods for printing text and pictures in
the fifteenth century, ways of printing music notation were developed.
These methods aimed initially at emulating handwritten notation, but
many features of handwritten music notation did not lend themselves
to printed reproduction. Consequently printed notation developed
features that were specially adapted to printing, and in some cases
these found their way back into handwritten notation.
Printing certainly did not make handwritten music obsolete.
Until well into the twentieth century it was quite common for
professional performers to use handwritten parts under certain
circumstances. For instance, at a session for recording film music
it would not be economic to use anything other than handwritten (and
possibly photocopied) parts. Once the recording was made, the notated
music would have little further interest, and in many cases would be
discarded or lost. However, these are rather specialist circumstances,
and for most musicians nowadays, notated music nearly always means
printed music.
Printing is essentially the bulk reproduction of an image or text.
For many centuries an image or text for printing had to be created in a
special way so that it could be used as a means for applying ink to paper.
In looking at printing, therefore, we generally need to be concerned
with two aspects of the process:
1 The creation of the ‘master’ copy, which will be reproduced
identically in bulk.
2 The techniques by which an image of the master copy is
transferred to paper.
In the early days of printing, these two aspects were closely connected.
However, techniques of printing developed in the twentieth century
led to the separation of these two aspects, to the extent that, in modern
printing, the creation of the master image is completely separate from
the business of applying ink to paper.

In this section of the chapter you will be looking at some video sequences
under the title ‘Music printing’ which are concerned almost entirely with
the first of the two aspects given above, that is, the creation of the musical
master copy. These video sequences are introduced in the sections below
by very brief descriptions of the associated methods of transferring
images to paper – the second of the two aspects listed above.

4.2 Letterpress
One of the earliest methods of printing used the letterpress technique,
in which the image or text to be printed is created in relief, that is, as a
raised surface. Figure 6 shows a single letterpress character, or type,
for the letter n. Pieces of type such as this were almost invariably made
of metal, and created by casting (pouring molten metal into a mould).
A complete piece of text for printing is created by combining such
pieces of type into words (and spaces), and locking them solidly into a
frame. By passing an inked roller over the top of an assembly of such
characters, the printing surface gains a layer of ink. When a piece of
paper is pressed onto the top, it receives an impression (in reverse) of
the text, or image.
Figure 6 A letterpress character

When it came to adapting this technique to music, there were several
problems to overcome. One was that the characters of music notation sit
on continuous stave lines, rather than being isolated by white space, as
happens with individual letters of text. In addition, whereas in text all
the letters in a line sit on a common base line, the characters in music
can be on any line or space of the staff, or on ledger lines above or
below the staff.
One solution is to treat the piece of music as a single image and to
create a printing surface in relief by carving away extraneous material,
as happens in the creation of a woodcut or linocut. Another possibility
is to use printing to create the stave lines only, and to add the notes by
hand. Yet another possibility is double-impression printing, in which
the stave lines are printed first, and the notes printed afterwards. All
these systems were used for music, but none was entirely satisfactory,
and they all missed the benefit that comes from using separate pieces
of type, which is that the pieces of type can be disassembled after the
printing and re-used to print something else. The video section in the
following activity shows how these problems were overcome to allow
music printing by separate pieces of type.

ACTIVITY 4 (WATCHING) .....................................................................


Watch the DVD video sequence 1 ‘Letterpress’. I

ACTIVITY 5 (SELF-ASSESSMENT) ...........................................................


(a) Why is the letterpress method not suitable for runs of notes
beamed together, that is, joined by beam lines? (Such runs of notes
are referred to as ‘fast’ music in the video section.)
(b) Runs of notes such as semiquavers (sixteenth notes) have to be
printed as separate notes in the letterpress system. Why is this a
drawback? I

Assembling separate pieces of printing type is referred to as


typesetting (or just setting) in the context of text printing. By
extension, the assembly of musical type is known as music setting.
These terms typesetting and music setting continue to be used for
methods that do not use letterpress.

4.3 Engraving
Engraving is a way of creating a printing surface that uses the intaglio
printing technique. Intaglio is the reverse of relief printing: instead
of using a raised image, intaglio uses a sunken image (Figure 7).
The image is created by engraving it into a metal plate with an engraving
tool. Alternatively, in etching, a wax-coated metal plate has the image
incised into the wax with a sharp tool. This exposes the underlying
metal, and when the plate is immersed in acid, metal is removed in the
places where the wax has been removed.

Figure 7 Characters gouged into a plate for intaglio printing

The image is printed by first wiping an inky cloth or roller over the
plate, which fills the recesses of the image with ink, and then wiping a
clean cloth over the plate, which removes ink from the non-engraved
area. When a piece of paper is applied to the plate, it takes the ink from
the engraved image, leaving a printed impression (again as a mirror
image) on the paper.

ACTIVITY 6 (WATCHING) .....................................................................


Watch the DVD video sequence 2 ‘Engraving’. At various points ‘slurs’
are referred to. These are curved lines added over or under sequences
of notes to group them into musical phrases. This video sequence does
not show etching, though etching was sometimes used in music
printing. I

ACTIVITY 7 (SELF-ASSESSMENT) ...........................................................


What are the main distinctions between ‘old style’ and ‘new style’
engraving, as described in the video sequence of Activity 6? I

A single engraved plate would normally not carry just a single page of
music, but several pages (typically a multiple of four). During printing,
a piece of paper would be printed on both sides (one after the other),
using a different plate for each side. By folding and cutting, the piece
of paper becomes a series of pages, which are stitched or glued with
others to create the final book.

The term ‘engraved’ in a musical context has come to mean notation that
has the appearance of printed notation rather than handwritten notation,
irrespective of whether engraving was actually used to produce it.
Engraving was widely used from the Baroque era through to the mid-
twentieth century for the printing of music.

4.4 Photolithography and the Halstan process


Photolithography is a printing technique that results from the bringing
together of photographic techniques with the older technique of
lithography. It is still a very common method of printing for books,
newspapers, packaging and music, although laser technology has
superseded some of the photographic parts of the process.
In photolithography, the image to be reproduced is photographed onto
large sheets of light-sensitive film. (Text for books or magazines,
however, is run out from a phototypesetting machine, but the end
result is the same: a large piece of photographic film containing an
image of the text.) When the film is developed, it has the required
image in black, and the non-image area is transparent. (An alternative
process uses a negative version of the image on the film, and there are
several advantages to ‘negative working’, as it is called, but I will
confine my explanation to positive working.)
The film containing the desired image is laid over a thin metal plate (a
litho plate) which has a light-sensitive coating. The plate-plus-film is
exposed to a strong ultra-violet light source. Where the film is
transparent (that is, the non-image area), the light is transmitted to
the coating on the plate. These parts of the coating become chemically
soluble as a result. When the whole plate is passed through a chemical
bath, these non-image areas are dissolved away, leaving the desired
image as a residual coating on the plate.
The coating on the plate has the property of repelling water. It is also
attractive to the greasy ink that is used for this type of printing. On the
other hand, those parts of the plate where the coating has been removed
(the non-image area) have the property of being attractive to water, and,
when wet, of repelling ink. To print an image from the plate, the whole
surface is wiped with a wet cloth. The non-image areas retain the
moisture. Then an ink roller is passed over the plate. The damp parts
of the plate (that is, the non-image parts) repel the ink; the dry parts
(the image areas) take a coating of ink. A piece of paper applied to the
plate receives an impression of the image. In practice the litho plate
(which is thin and flexible) is usually wrapped around a roller, and the
ink image transferred from the plate to a rubber roller, called a blanket,
and then from the blanket roller to the paper. This type of photolitho-
graphic printing, which uses a blanket roller to transfer the image
from the litho plate to the paper, is called offset photolithography.
Although the removal of some of the coating from the litho plate
changes its thickness in places, the change is microscopically small.
Essentially lithography uses an even printing surface – unlike the
raised surface used in letterpress or the recessed surface used in
engraving and other intaglio methods.

Because the litho plate is created from a photograph of the master


image, almost anything that can be photographed can be reproduced
by this method. Thus, for instance, the composer’s manuscript (or a
neat version produced by a copyist) could be photographed and printed
in this way. Alternatively, a piece of music printed by another
method, such as engraving, could be photographed and reprinted by
photolithography. This has often happened when nineteenth-century
engraved editions of music, for instance, have passed out of copyright.
Publishers specialising in producing cheap editions have taken a clean
printed copy and reprinted it by photolithography.
When original music is to be printed by photolithography, some way
of creating a master image is needed. Various methods are used.
One method uses dry-transfer characters (similar to Letraset) which
are rubbed down onto ruled stave lines on paper. Other methods use
stencils to create inked characters and symbols, again on paper that
has been ruled with stave lines. In the video sequence in Activity 8
you will see one such stencil-based ink process, the Halstan process.
Because it is easy to enlarge and reduce images photographically, there
is no necessity for the master image of the music to be the same size as
the printed reproduction. In fact, it is often more convenient to create a
much larger master image, and to reduce it photographically during the
transfer of the image to the litho plate. This is also referred to in the
video sequence in Activity 8.

ACTIVITY 8 (WATCHING) .....................................................................

Watch the DVD video sequence 3 ‘The Halstan Process’. The Halstan
process was a proprietary stencil process used by the printer Halstan,
based in Buckinghamshire, which specialises in music printing. The
video section shows you how the master image was created, but does
not show any of the subsequent processes by which the litho plate is
produced. I

ACTIVITY 9 (SELF-ASSESSMENT) ...........................................................

The following questions relate to the Halstan process shown in the last
activity.
(a) How were corrections made?
(b) How were words added for vocal items?
(c) Why was the master image larger than the printed size? I

Developments in photolithography in the 1980s and 1990s removed a


lot of the photographic processing that was needed in the preparation
of litho plates. Nowadays it is possible, using laser technology, to read
or ‘burn’ an image straight from a computer file (for instance a pdf file
or a graphics file) onto the coating of a litho plate. Thus, for instance, if
one is creating a music score on a computer (as is usually the case
nowadays), a computer file can be sent to the printer from which
the litho plates can be produced without the need for photographic
copying of the image.

For colour printing, a separate master image is created for each


colour, from which a separate litho plate is created. The colour image
is built up on the paper by overprinting the paper with successive
colours.

4.5 Computer setting: Sibelius


The advent of cheap, powerful computers has led to the rise of
computerised music-setting. This has now displaced almost all former
methods (except for small jobs). Depending on the system used,
computer music setting systems can do more than simply lay out the
notation on the screen. Typically they can give an audio playback, so
that the piece can be heard, albeit in a synthesised form, and transpose
the notation into any key. Also, they can allow instrumental parts to be
extracted from a score. This used to be a very tedious job, in which a
copyist would take a composer’s manuscript of the score and copy
from it the parts for each of the instruments. One of the most powerful
facilities offered by these systems, however, is simply the ability to
edit and correct the notation with the ease associated with using a
word processor.
Many systems have been developed for computerised music setting.
They range from relatively simple, cheap programs that amateurs can
use, through to professional-standard systems that are expensive but
very flexible and capable of handling many different notational
conventions (for instance, those associated with particular instruments,
brass bands, avant-garde, rock, jazz and choral music), as well as
mainstream instrumental and orchestral music. Two of the most
popular professional-standard systems in 2004 are Finale and
Sibelius, and in the next activity you will see the Sibelius system.

ACTIVITY 10 (WATCHING) .....................................................................

Watch the DVD video sequence 4 ‘Sibelius’. In this sequence you


will see Jonathan and Ben Finn, who devised the Sibelius system,
discussing its advantages over other methods of music setting.
The name ‘Sibelius’, incidentally, is a punning reference to another
famous musical Finn. I

ACTIVITY 11 (SELF-ASSESSMENT) ...........................................................

What drawbacks do the Finn brothers see with the use of systems such
as Sibelius? I

There are several ways of getting the music notation from a computer
program such as Sibelius onto a piece of paper that a performer can
play from. One way is simply to output the music to a printer attached
to the computer. This is suitable if only one or a few copies are
required. For longer print runs, it becomes more economic to use print
technology. High-quality output from a good laser printer can be used
as the master image for photocopying or, if still longer print runs are
needed, for photolithographic printing. Alternatively, a computer file

from the computer (stored on a CD-ROM or sent over a network) can


drive a machine that uses a laser to transfer the image directly to a
litho plate. This flexibility in the way the notation can be output and
reproduced is a further advantage to the use of computerised music
setting.
One possibility for the future, often suggested, is dispensing with
printed paper altogether and having the musician play from an
electronic display such as that on a computer screen. This idea crops
up in the video section in the next activity, which is the final video
section for this chapter.

ACTIVITY 12 (WATCHING) .....................................................................

Watch the DVD video sequence 5 ‘The Future’. Jonathan and Ben Finn
speculate about the future of computer systems such as Sibelius. I

SUMMARY OF CHAPTER 2

Notation has served many functions in addition to that of conveying a
composition from a composer to a performer. Other important functions
include a mnemonic function, a regulating function, an organising
function during the compositional process, and an aid to analysis.
(Section 2)

Western musical notation evolved from a relatively imprecise method
(using neumes) for indicating pitch changes in plainchant. Neumatic
notation consisted of small lines placed over a written text. It did not
indicate rhythm. (Section 3)

The use of stave lines is associated with Guido d’Arezzo (though he may
not have been their originator). Guido d’Arezzo introduced other
innovations such as an early form of tonic solfa and the use of the
Guidonian hand as a teaching aid. Rhythmic symbols were first
consistently used in the notation of organum (which also used stave
lines). (Section 3)

Tablatures are systems of notation that indicate how notes should be
produced on particular instruments rather than what the notes sound
like. In lute tablatures, a set of horizontal lines represent the courses of
the instrument, and letters above each line indicate the fretting
positions. Rhythm is indicated by flags above notes to show their
relative duration. (Section 3)

Letterpress printing uses a raised printing surface. The term letterpress
is also associated with the use of separate pieces of type, which are
combined to make the printing surface. In music printing by letterpress,
separate pieces of music type had short sections of stave lines and a
musical character on a line or space. When pieces of type were
combined, neighbouring sections of stave line joined up, giving the
appearance of continuous stave lines. (Section 4.2)

In letterpress printing, it was not possible to beam notes together.
(Section 4.2)

Engraving is an intaglio process in which the image is recessed into a
metal plate. Early forms of engraving were freehand processes (apart
from the ruling of stave lines). Later forms used die punches which were
hammered into the plate to create a recessed impression of each
character. Notes could be beamed together in engraving. Corrections
were made by hammering on the back of the plate to remove the
recessed characters. (Section 4.3)

Etching, like engraving, is an intaglio process. A waxed metal plate had
wax selectively removed using a sharp tool, thereby exposing parts of
the underlying metal. When the plate was immersed in an acid bath,
exposed metal was etched away, creating a recessed version of the
image. (Section 4.3)

Photolithography uses a plane litho plate as a printing surface (not a
raised or recessed surface). The master image is transferred
photographically to a light-sensitive coating on the plate. This
chemically changes parts of the coating, so that non-image areas become
soluble. During processing, the coating in the non-image areas is
removed. The residual coating on the litho plate, representing the master
image, retains ink, whereas non-image areas repel ink. (Section 4.4)

The Halstan process is a method of creating a master image of a piece of
music intended for printing by photolithography. Stencils are used to
create an inked image of the music on paper (after the paper has first
had stave lines ruled and the intended positions of musical characters
faintly marked). The image is created larger than the printed size, and
reduced photographically. Correction in the Halstan process is much
simpler than in engraving. (Section 4.4)

Computerised music setting systems (such as Sibelius) have turned
music setting into a desk-top process. Music notation can easily be
created, edited, transposed and played back. Parts can easily be
extracted from a score. (Section 4.5)

The final, corrected version of the music can be printed out on a
high-quality laser printer and used as the master for photolithographic
printing. Alternatively, a computer file can be used for laser-processing
of a litho plate. (Section 4.5)

ANSWERS TO SELF-ASSESSMENT ACTIVITIES

Activity 5
(a) Because each note is a separate piece of type, beams cannot be
created that will join all the separate pieces of type in a run.
(b) Beaming enables notes to be grouped into beats, which makes
them easier to read. When the notes are separated, as in
letterpress, it is not so easy to sort them into beats at a glance.
For instance, in Figure 8, (b) is much easier to interpret than (a),
although the two pieces of notation represent the same thing.

(a)

(b)

Figure 8 (a) Unbeamed notes, typical of letterpress, are harder to read than the beamed notes in (b)

Activity 7
In the ‘old style’, all characters except the stave lines were engraved
freehand. In the ‘new style’, die punches were used for all characters
except beams, ledger lines and long slurs.

Activity 9
(a) Corrections were made by simply painting over the errors with
typewriter correction fluid and re-creating the character.
(b) Words for vocal items were typeset separately as strings of text,
which were cut up and stuck down beneath the notes as required.
(c) Two principal advantages were claimed. Reducing the size
photographically gave a sharper image and made any corrections
less conspicuous.

Activity 11
They say these systems are not as flexible as pen-and-ink when it
comes to very old or very new music, which often use non-standard
notations.
The brothers point out that traditional music engraving was a highly
skilled job, and engravers had to serve a long apprenticeship. Too
many computer users think high-quality music setting is easy, or that
the computer can do everything.

LEARNING OUTCOMES

After studying this chapter, and the associated DVD video sections,
you should be able to:
1 Explain correctly the meaning of the emboldened terms in the main
text and use them correctly in context.
2 Discuss some of the functions that music notation serves.
3 Summarise briefly the evolution of music notation from neumes to
the modern Western system (including tablature).
4 Outline the basic principles of letterpress printing.
5 Describe briefly the process of music printing by letterpress,
explaining how it relates to text printing by letterpress and what
problems music presents for the letterpress process. (Activity 5)
6 Describe briefly the process of music printing by engraving, and
discuss its advantages over letterpress printing. (Activity 7)
7 Outline briefly the photolithographic method of printing and
how a master music image may be created (and edited) for litho-
graphic printing by stencil methods and by computer programs.
(Activities 9 and 11)
8 Discuss some of the benefits and problems of using computer-based
music setting systems. (Activity 11)

Acknowledgements
Grateful acknowledgement is made to the following sources for
permission to reproduce material within this chapter.
Figure 1: MSS47, folio 34 verso by courtesy of the Bibliothèque
Municipale de Chartres; Figure 2: Copyright © Bibliothèque
Municipale de Valenciennes; Figure 3: The Bodleian Library,
University of Oxford, MS. Canon. Liturg. 216, fol.168r.

TA225
Block 3 Sound processes

Chapter 3
Carillon to MIDI

CONTENTS
Aims of Chapter 3 130

1 Introduction 131

2 Instructions over a barrel 131

2.1 Barrel orchestrions 133

2.2 Music in the street 134

2.3 Cylinder musical boxes 138

2.4 Disc musical boxes 139
3 Cardboard books and paper rolls 140

3.1 Cardboard books 142

3.2 Perforated paper 143

3.3 Piano Rolls 144

4 How far can you go? 146

4.1 Electromechanical violins 146

4.2 Banjo orchestras 147

4.3 The Grand Electric Orchestra 147

4.4 Composition for mechanical instruments 148

5 What does mechanical music tell us about

music in code? 149

6 Introduction to MIDI and its development 151

6.1 The development of MIDI 152

6.1.1 The need for a control interface 152

6.1.2 The origins and acceptance of MIDI 153

6.2 A word of caution 154

7 MIDI basics 155

7.1 A simple MIDI set-up 155

7.2 MIDI channels 156

7.3 Real-time operation 156

7.4 MIDI messages 157

7.5 Specification components 157

8 MIDI hardware 158

8.1 MIDI cable 158

8.2 MIDI ports 158

8.3 Computer connections 159

9 MIDI electrical specification 160

9.1 Electrical signals 160

9.2 Serial-to-parallel conversion 163

10 MIDI messages 164

10.1 Channel messages 165

10.2 System messages 169

10.3 Running status 172

10.4 Message coding 174
11 More MIDI features 178

11.1 Sample dump 178

11.2 General MIDI 180

11.2.1 General MIDI 2, MIDI Lite and SPMIDI 182

11.3 MIDI time code 183

11.3.1 The problem of synchronisation 183

11.3.2 SMPTE time code 183

11.3.3 MIDI time code 185

11.4 Standard MIDI files 187

11.5 MIDI machine control and MIDI show control 192

11.6 MIDI downloadable sounds 194

11.7 Summary of Section 11 196

12 MIDI in action 196

12.1 MIDI equipment 196

12.1.1 MIDI generators 197

12.1.2 MIDI manipulators 199

12.1.3 MIDI sound generators 201

12.1.4 MIDI implementation chart 202

12.2 MIDI in computers 205

12.2.1 Hardware 205

12.2.2 Software 206

12.2.3 Latency 207

12.2.4 Other aspects 208

12.3 MIDI in film and TV music 208

12.4 MIDI limitations and improvements 210

12.5 MIDI and the TA225 Course Tune 211
Summary of Chapter 3 214

Appendices
Appendix 1 – Table of General MIDI pitched sounds 218

Appendix 2 – Table of General MIDI percussion sounds 220

Appendix 3 – MIDI show control devices and commands 221

Answers to self-assessment activities 222

Learning outcomes 226

Acknowledgements 227


AIMS OF CHAPTER 3

I To introduce the concept of using a set of instructions to make music.

I To describe the various methods of storing the instructions.

I To describe the classes of mechanical musical instruments and

highlight the differences in performance between them.


I To watch and listen to various classes of mechanical musical
instruments and in so doing to appreciate the ingenuity of the
engineers who made such instruments possible.
I To explain the importance, development and uses of MIDI.
I To describe the physical, electrical and data format characteristics
of the MIDI system.
I To introduce some MIDI devices and provide some practical
experience of working with MIDI.

1 INTRODUCTION

In Chapter 2 you read how the instructions from composers for


musical performances may be represented as conventional printed
scores using traditional musical notation. However, the traditional
score is not the only way in which instructions for playing the tune
may be presented. Experimental scores, using no conventional
notation but with instructions in the form, say, of colours to notate
musical events, may be used. Furthermore, instructions which do
not represent the aural experience directly but are used to control
machines can create musical performance without the need for human
intervention.
This chapter opens by introducing you to some musical machines that
have been developed over the centuries to create autonomous musical
performances. In all cases the instructions to play the tunes are stored
as codes which are interpreted by the machines. These codes may be
stored in various ways but typically pins stuck into a cylinder or
barrel, holes in a roll of paper, or bits in a computer’s memory, are
used. Whatever the method, as long as the instructions are accurate
and correctly interpreted by the machine, the result will be a musical
performance – sometimes of spectacular proportions.
Sections 2 to 4 introduce the concepts of music performance by
mechanical machines, and much of this is supported by a set of
watching activities from the set of video sequences of mechanical
musical instruments collectively entitled ‘Music in Code’. This allows
you to see and hear the mechanical musical instruments described in
the text. There are also listening activities to give you the opportunity
to listen to complete performances of the instruments shown in the
video sequences. The remainder of the chapter introduces the Musical
Instrument Digital Interface (MIDI). This will involve practical work
with the course’s music recording and editing software and some more
watching and listening activities.

2 INSTRUCTIONS OVER A BARREL

Today the simple pleasure of listening to a musical performance is


available to most of us at the turn of a switch or the press of a button.
Modern audio recordings allow us to listen to the very best
performances of musical works whenever we like and wherever we are.
But what if these recordings were not available to us, as was the case
only a hundred years ago? Think about this during Activity 1.

ACTIVITY 1 (WATCHING) .....................................................................

Watch the DVD video sequence 1 ‘Introduction – the Decap Café


Organ’. In this first sequence the idea of using mechanical instruments
to play music is introduced. The sequence ends with the amazing
Decap Blue Angel café organ, perhaps the ultimate expression of the
art of mechanical music. I

Figure 1 The Decap café organ at Ashorne Hall, Warwickshire

The Decap café organ, illustrated in Figure 1, was manufactured towards


the end of the story of mechanical musical instruments, a story which
started 500 years earlier with an instrument called a carillon.
A carillon is a set of bells, tuned so as to play a musical scale, usually
situated in a bell tower or on the side of a tall building. Carillons have
been used in many parts of Northern Europe for centuries and are
amongst the earliest mechanical musical instruments known. Records
show a mechanically controlled carillon was playing at Mechlin, Antwerp
in 1583, but it was unlikely to be the earliest one ever built*. The bells
which form the carillon were struck by hammers controlled either auto-
matically from a set of instructions, or manually using a special
type of keyboard which was played by pounding the keys with the fists to
get sufficient force. Automatic carillons often formed part of the
chiming mechanism of clocks found in towers of churches or town halls.
They provided a musical tune as part of the chime action, most
often playing on the hour. In the United Kingdom there is a strong
tradition of hand-change bell ringing (campanology) where each bell
is played by a different person, and so carillons proved less popular.
Figure 2 illustrates a modern
18-bell automatic carillon built
by the Eisbouts Company of
Holland in 1990 for the Arndale
Centre in Manchester, UK.
At the time of writing it is
installed at Ashorne Hall,
Warwickshire, UK, where it
was filmed for the next
activity.

Figure 2 The carillon installed on a wall at Ashorne Hall, Warwickshire

*Ord-Hume, A.W.J.G. (1978) Barrel Organ, George Allen & Unwin, London, p. 407.

ACTIVITY 2 (WATCHING) .....................................................................

Watch the DVD video sequence 2 ‘Eisbout’s Carillon’ which shows the
operation of the Eisbout’s Carillon which forms part of a clock chiming
mechanism. The hammers are operated by electric solenoids which
receive their instructions from a MIDI program running on a personal
computer. These instructions are stored on a floppy disk. I

The hammers of early carillons could be


controlled from instructions contained on a
pin barrel similar to the one illustrated in
Figure 3. Each pin on the barrel caused a
particular bell to be struck. A mechanical
amplifier (really a set of levers between the pin
and the hammer) provided the force necessary
to sound the bell. To operate the carillon the
barrel was turned by the handle attached to it.

Figure 3 A carillon controlled from a pin barrel

Pin barrels have been used to provide instructions to a wide range of


mechanical instruments and in the next section I will describe how
some of them have been used to produce mechanical musical
performances.

2.1 Barrel orchestrions


Pin barrels were the instruction source not only for carillons but for
several different categories of mechanical instruments including a
clockwork café orchestrion. An orchestrion is a mechanical
contrivance that contains several different instruments all of which are
controlled by the one instruction source. It gives the listener the
impression of hearing a small orchestra playing music as may be
experienced in Activity 3.

ACTIVITY 3 (WATCHING) .....................................................................

Watch the DVD video sequence 3.1 ‘Pin barrel orchestrion’ which
shows a café pin barrel orchestrion chosen to demonstrate how a range
of different instruments can be controlled from a single pin barrel. As
the barrel rotates you can observe how the pins on the barrel engage
with the levers on the key frame to operate the various instruments.
During the sequence the operator shows how a different tune can be
selected. The barrel is turned by a substantial clockwork motor.
You can watch and listen to a complete performance of a piece of
music played on this pin barrel orchestrion in the performance section
of the video sequences in sequence 6.1 ‘Clockwork barrel orchestrion’.

Comment
Although this orchestrion is over 100 years old and in need of some
restoration it nevertheless ably demonstrates a pin barrel in action. I

Figure 4 shows the pin barrel from the clockwork café orchestrion. The
positions of the pins on the barrel represent, on the horizontal scale, the
note to be played on a particular instrument and, on the vertical scale,
the duration of the note. The time taken for the barrel to make one
revolution determines the playing-time of the musical work and its
tempo. The positioning and securing of the pins on the barrel are
described in Box 1 ‘Pinning barrels’.

Figure 4 The pin barrel from the clockwork café orchestrion showing the varying positions and widths of the pins
Mechanical instruments have often been used for performances in public
places such as cafés and bars where they provided entertainment more
cheaply (and probably more reliably in some cases!) than musicians.
Because mechanical instruments were expensive to own, proprietors
would recoup their investment by charging customers for each tune
played. Usually these instruments were started using ‘coin-in-the-slot’
mechanisms. Several of the video sequences show instruments being
started in this way. However, other players had to take their mechanical
instruments to public places, such as city streets or recreation parks,
and rely on donations from passers-by for their income.

2.2 Music in the street


One of the most common instruments to use a pin barrel was the street
piano, illustrated in Figure 6. It is often given the misnomers of barrel
organ, piano-organ or even hurdy-gurdy.
Figure 6 A street piano that was played in Warwick, UK, for many years

Box 1 Pinning barrels


Pinning barrels was a highly skilled job calling for both artistic and craft
abilities. Pins differ in size from a small diameter one used for the
shortest note, usually a demisemiquaver, to an elongated staple used for
a sustained note or special effect, such as controlling dampers on piano
strings in the café orchestrion. Positioning and securing pins into the
surface of a barrel was done in two stages:

Stage 1 was known as barrel noting. Two basic methods were used,
noting by scale and noting by dial.

Noting by scale used a length of paper cut to fit around the
circumference of the barrel. The paper was marked along its length with
divisions for each bar of the music to be played. Each division was then
subdivided into four parts or crotchets, and each part further divided
down into demisemiquavers. The width of the paper was marked with
the position of the notes on the key frame of the instrument. Figure 5
shows an example of how part of the TA225 Course Tune could be scale
noted onto paper.

When the scale noting was completed the tunes were transferred by
attaching the paper to the barrel and marking the position of the pins
through the paper. This method of barrel noting only allowed tunes with
common-time signatures to be transferred onto the same barrel, as the
same basic marking had to be used (remember that barrels usually
contained several tunes).

Noting by dial used a special frame into which the barrel to be pinned
was fitted. The barrel could be rotated by a handle fitted to the frame.
A dial, which resembled a clock face, was calibrated both for the number
of turns of the handle needed to rotate the barrel one complete
revolution and for the particular piece of music to be pinned. After
calibration each division on the dial equalled a demisemiquaver in that
particular tune. As the barrel was slowly turned the position of the pins
could then be determined from the dial. This method allowed any time
signature to be noted directly onto the barrel once the dial was
calibrated.

Stage 2 was barrel pinning. Pins or staples were fitted to the barrel using
the note marks. The size of each pin or staple was selected to sound the
correct length of note or effect. Barrel pinning was a precision process
using specially devised hand tools and gauges. Great care was necessary
to ensure each pin or staple sounded the note as demanded by the
composer of the music for the performance.

[Figure 5 labels: the Course Tune marked along a length of paper cut to fit the barrel; the position and type of each pin (crotchet pin, quaver pin); the notes of the instrument shown as a piano keyboard representation from C4 to C5]

Figure 5 An example of noting by scale using the TA225 Course Tune
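‘Noting by scale’, as described in Box 1, is in effect an exercise in dividing one revolution of the barrel into equal rhythmic units and converting a note’s position in the music into a position around the barrel. The short sketch below (in Python; it is an illustration of mine rather than part of the course materials, and the figures such as 16 bars per revolution and a 600 mm circumference are invented) shows that arithmetic for a barrel carrying a common-time tune divided down to demisemiquavers.

DEMISEMIQUAVERS_PER_BAR = 32   # a 4/4 bar: 4 crotchets x 8 demisemiquavers each

def pin_angle(bar, demisemiquaver, bars_per_revolution):
    """Angle in degrees around the barrel at which a pin should be placed.
    bar and demisemiquaver are counted from 0; bars_per_revolution is the
    number of bars the barrel plays in one complete turn."""
    units_per_revolution = bars_per_revolution * DEMISEMIQUAVERS_PER_BAR
    unit = bar * DEMISEMIQUAVERS_PER_BAR + demisemiquaver
    return 360.0 * unit / units_per_revolution

def pin_offset_mm(bar, demisemiquaver, bars_per_revolution, circumference_mm):
    """Distance along the paper strip (cut to the barrel's circumference)."""
    return circumference_mm * pin_angle(bar, demisemiquaver, bars_per_revolution) / 360.0

# Example: a 16-bar tune on a barrel of 600 mm circumference; a note starting
# on the third crotchet of bar 5 (bar index 4, demisemiquaver index 16).
print(pin_angle(4, 16, 16))           # 101.25 degrees around the barrel
print(pin_offset_mm(4, 16, 16, 600))  # 168.75 mm along the paper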

The street piano consists of a strung piano frame (usually made of


wood) with hammers, activated by the barrel pins, striking the strings.
It was operated by turning a handle, as demonstrated in the next activity.

ACTIVITY 4 (WATCHING) .....................................................................

Watch the DVD video sequence 3.2 ‘Street piano’ which shows a hand
operated street piano mounted on a carriage to wheel about the streets
of, in the case of this piano, Warwick. Observe the crude nature of the
mechanism which controls the hammers. See also how different tunes
are selected by using a lever to move the barrel horizontally. I

By the fifteenth century all the principal characteristics of present-day
pipe organs had been developed, and it is therefore not surprising that
barrel organs feature amongst the earliest instruments to be operated
mechanically. As the barrel rotates, the pins catch against a set of levers
which are fitted in a key frame. Each pin causes a particular valve
to open, allowing air under pressure from the bellows to enter the pipe
and sound the note. Whilst elaborate power sources using electricity,
steam and even water were used to turn the barrel and work the
bellows, often they would be powered by hand, as demonstrated in
Activity 5.

ACTIVITY 5 (WATCHING) .....................................................................

Watch the DVD video sequence 3.3 ‘Barrel organ’ which shows a small
hand powered barrel organ. The compressed air comes from bellows
situated in the base of the instrument and pumped by a crank attached
to the barrel. Note particularly how the different effects of sustain,
vibrato and trill can be generated from different shaped pins. I

Pin barrel instruments vary both in size and in operation, from portable
20-note street organs, such as that featured in Activity 5, to large
orchestrions similar to the one in Activity 3.
The world of the hurdy-gurdy man, the name commonly given to the
operator of street organs in Europe, appears a sad and lonely one as
may be seen from the contemporary illustration in Figure 7, where the
man is shown operating a portable street-organ which is covered to
protect it from the weather. This loneliness is reinforced in the poem
Der Leiermann by Wilhelm Müller. Activity 6 gives you the opportunity
to listen to this poem.

Figure 7 A contemporary
drawing of a 19th century
hurdy-gurdy man

ACTIVITY 6 (LISTENING) .....................................................................

Listen to the audio track associated with this activity. It contains a


recording of ‘Der Leiermann’, Franz Schubert’s setting of Wilhelm
Müller’s poem, from the song cycle Winterreise. The haunting sound portrays a
lonely, isolated existence. I

Winterreise Winter Journey


Der Leiermann The Hurdy-Gurdy Man
Drüben hinterm Dorfe Yonder, behind the village
Steht ein Leiermann Stands a hurdy-gurdy man
Und mit starren Fingern And with numbed fingers
Dreht er was er kann. He turns what he can
Barfuß auf dem Eise Barefoot on the ice
Wankt er hin und her He dodders here and there
Und sein kleiner Teller And his little plate
Bleibt ihm immer leer. Remains ever empty
Keiner mag ihn hören, No-one hears him
Keiner sieht ihn an, No-one looks at him
Und die Hunde knurren And the dogs snarl
Um den alten Mann. Around the old man
Und er läßt es gehen, And he lets it go by
Alles wie es will, Things are what they are
Dreht, und seine Leier Plays, and his hurdy-gurdy
Steht ihm nimmer still. Stands never still
Wunderlicher Alter ! Wonderful old man!
Soll ich mit dir geh’n ? Should I go with you?
Willst zu meinen Liedern Will you play your hurdy-gurdy
Deine Leier dreh’n ? to my songs?
English translation by Janet Seaton

In Great Britain the person operating the street piano was known
traditionally as an organ-grinder and would often be accompanied by a
pet monkey who would shake a coin tin at passers-by to attract their
attention and more importantly their money. Street pianos often played
out of tune largely because of damp weather affecting the wooden piano
frame which held the strings. The action of the barrel pins operating a
simple mechanism to strike the strings afforded a very crude sound.
This gave the street piano a bad reputation, with the unsubtle ill-
tuned music becoming a curse to many city-dwellers who were daily
subjected to this ‘entertainment’. It is said that many organ-grinders
were paid to go away rather than as a reward for their performance!
In the middle of the 19th century the English mathematician and pioneer
of the modern computer Charles Babbage went as far as to appeal to the
House of Lords in the British Parliament to get organ-grinders and their
instruments banned from London streets. Although quantity rather
than quality was offered, street pianos remained popular and were still
to be found in the streets of large cities in Great Britain up to the
outbreak of the Second World War in 1939.

2.3 Cylinder musical boxes


By virtue of their profession, clock and watch makers were naturally
interested in mechanical devices. During the sixteenth and seventeenth
centuries clock mechanisms became increasingly intricate, striking the
hours with ever more complex musical chimes. However, these clock
makers found a problem with the mechanics of the musical movements.
The size and quantity of organ pipes and bells necessary to provide the
elaborate tunes were simply too large to fit into clocks made for domestic
use. Think about how the size of an organ pipe affects the sound it
makes in the next activity.

ACTIVITY 7 (REVISION) .......................................................................

What happens to a note produced by a pipe as the length of the pipe is


reduced? I

A watchmaker from Switzerland, Anton Favre, is credited with
inventing ‘the means of establishing carillons without hammers’*. By
using a specially prepared steel ‘tooth’ attached to a wooden frame, a
musical note of acceptable pitch and volume for its size was obtained.
The tooth could be plucked by a pin on a rotating cylinder, as shown
in Figure 8 (the pin barrel is referred to as a cylinder in musical box
terminology), making the familiar sound of the musical box.

Figure 8 A musical box cylinder and toothed comb

The teeth form a comb, with each tooth cut to a slightly different length
to create a musical scale. Usually the cylinder was turned by a clockwork
motor. The pins plucked the teeth directly; there was no intermediate
mechanism except to operate the bells and small drums which were used
to augment the tunes. This may be seen in Activity 8.

* Ord-Hume, A.W.J.G. (1973) Clockwork Music, George Allen & Unwin, London, p. 63.

ACTIVITY 8 (WATCHING) .....................................................................

Watch the DVD video sequence 3.4 ‘Musical box’ which shows a
musical box in operation. Note how automata* are used to ring the
bells under the control of the six pins at the right-hand-side of the
cylinder. These pins are highlighted in the video sequence. Automata
appeared only in the finest musical boxes. I

Fine musical boxes were very expensive to manufacture. A cylinder
might need ten thousand pins to be fitted to play a complex tune, and
each pin had to be inserted and secured into the brass cylinder by
hand. They were also very ornate, being housed in boxes made of very
expensive woods and augmented with automata. Just as with barrel
organs, musical boxes could play several tunes from one cylinder
simply by moving the cylinder horizontally with a lever mechanism.
However, the inability of the pin barrel or cylinder to be exchanged
easily to provide additional tunes was always a limitation of this type
of mechanism.

2.4 Disc musical boxes


Towards the end of the nineteenth century a new type of musical box
appeared. This used a thin metal pin disc punched with tabs or pins in
place of the cylinder, illustrated in Figure 9. The advantage of this
mechanism was that the disc could be easily replaced by the user,
allowing any number of tunes to be played.

Figure 9 A Polyphon disc made of sheet metal with punched tabs or pins

Disc music boxes were manufactured mainly by two rival companies,


Symphonion and Polyphon, the latter becoming a generic name for this
type of box. Discs were much cheaper to manufacture than cylinders.
The title and composer of the tune could be printed onto the top surface
of the disc just like today’s compact disc. Discs and boxes were made
in a variety of sizes for use both in private houses and public places,
the latter being operated by coin-in-the-slot mechanisms. Both types
are shown in the next activity.

ACTIVITY 9 (WATCHING) .....................................................................

Watch the DVD video sequence 3.5 ‘Pin disc musical box and
polyphon’. The video opens with a domestic disc musical box and
then shows a large Polyphon. Note how the mechanism differs from
the cylinder musical box with an intermediate ‘star-wheel’ operating
the teeth. I

*Automata are moving figures of humans or animals that function while the
mechanism is playing and indeed may ‘play’ instruments such as bells.

ACTIVITY 10 (EXPLORATORY) ................................................................

I am sure you can appreciate the ease in which the disc can be fitted.
It occurs to me that this offers advantages to both the user and the
manufacturer. Can you suggest what they might be?

Comment
The advantage to the user is that new tunes could easily be purchased
or borrowed making the pin disc musical box much more flexible than
the cylinder box. They were also less expensive to buy.
The advantage to the manufacturer is that there are future sales in discs
offering the latest tunes. I

Whilst pin discs overcame many of the limitations of cylinders, they were
developed too late. Phonograph cylinders and gramophone records
were making inroads into the mechanical music businesses of the early
20th century. Eventually they were to take the majority of sales and
put the mechanical music industry into permanent decline. The next
section introduces an alternative instruction system which superseded
the pin barrel mechanism and, for a time, even withstood the onslaught
of the gramophone.

ACTIVITY 11 (SELF-ASSESSMENT) ...........................................................

From what you have read and observed in the preceding section make
a list of any drawbacks you think that the use of pin barrels for storing
music may have. To think about this you might find it useful to replay
the Section 3 video sequences on pin barrel instruments. I

3 CARDBOARD BOOKS AND PAPER ROLLS

Playing musical instruments to a high standard requires considerable
skill which is not given to everyone, even after years of practice. And to
make matters worse, what if a musical instrument becomes a fashion
item? Well, just over a hundred years ago, certainly throughout much of
the western world, this was just the case. One item many people wanted
in their homes was a piano. Table 1 shows the growth in piano ownership
in the UK and the USA between 1890 and 1910.
In the best room of the house stood a piano, bought often on easy
payments over many weeks and years, for they were not cheap items.
Unfortunately, as a fashion item, unless someone in the household
played, the lid over the keys was only lifted to dust the keyboard!

Table 1 A comparison of the number of pianos in the UK and USA around 1900

Year    Total estimate of pianos in UK    UK homes with a piano    Total estimate of pianos in USA    USA homes with a piano
1890 170,000 2.8% 200,000 3.0%
1900 205,000 2.8% 460,000 3.3%
1905 270,000 3.2% 760,000 4.8%
1910 335,000 3.8% 1,050,000 5.6%

Source: After Ord-Hume, A.W.J.G. (1984) Pianola, George Allen & Unwin, London, p. 124.

In that case it was only when friends or relatives who played came to
visit that an enjoyable evening was spent listening to the latest tunes
and singing along to old favourites.
One way to get more use out of the
piano was to purchase Edwin
Votey’s Pianola, or ‘pushup’ as it
became known (for you had to
push it up to the piano keyboard
in order for it to work), shown in
Figure 10.

Figure 10 The Pianola manufactured by the Aeolian Company of New York, USA, was pushed up to the piano, thereby getting the name ‘pushup’

Designed in 1896 by the Aeolian Company of Detroit, USA, the


pianola was a large machine with sixty-five wooden ‘fingers’
which, when set up in front of an ordinary piano, pressed the keys
‘in a way similar to that of a pianist’. A roll of paper perforated
with holes contained the instructions to play the music. When a
hole was sensed by the mechanism the appropriate note was struck.
The pianola was manually powered by two foot pedals which
provided the vacuum (suction) necessary for the instrument to
work. This mechanism will be discussed in greater detail later in
this section.
Whilst the pianola took over the responsibility for striking the keys,
musical interpretation was still provided by the operator, or performer
as they liked to be known. Expression could be added by operating
levers which operated the pedals on the piano. The tempo was
controlled by the rate of pedalling. So in a way the operator still
‘played’ the instrument to produce the musical performance and the
results were so good that classical music concerts were given using
pianolas. But the method of controlling the pianola was derived
from an instruction method developed for a completely different
purpose.
At the beginning of the nineteenth
century a Frenchman, Joseph Marie
Jacquard, developed an instruction
mechanism for automating the weaving
of silk cloth. An endless loop of cards
with holes punched into them provided
the instructions to the silk loom, as
illustrated in Figure 11. By sensing the
position of the hole in the card a
particular action on the weaving loom
took place. Once programmed the loom
operated quite autonomously with
complex patterns and pictures being woven into the silk.

Figure 11 A Jacquard loom with the punched card mechanism

Subsequently, the Jacquard system became a common way of


programming machinery as diverse as industrial metal cutting lathes,
digital computers and, not surprisingly, mechanical musical
instruments. The programming system may be regarded as supporting
the binary (i.e. two state) code whereby one state is represented by the
presence of a hole, the other by its absence.
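
To make the two-state idea concrete, here is a minimal sketch in Python (the note names and the assignment of notes to hole positions are invented purely for illustration) showing how one row of a punched card can be read as a binary pattern and turned into the notes to sound at that instant.

# One row across the card: True = hole punched, False = no hole.
# The mapping of positions to notes is an invented example.
NOTE_FOR_POSITION = ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4', 'C5']

def notes_in_row(row):
    """Return the notes sounded by one row of holes."""
    return [note for note, hole in zip(NOTE_FOR_POSITION, row) if hole]

row = [True, False, True, False, True, False, False, False]
print(notes_in_row(row))   # ['C4', 'E4', 'G4'] - a three-note chord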
Two approaches using the Jacquard system have been used for playing
instruments; a folded cardboard book and a paper roll.

3.1 Cardboard books


The cardboard book, similar to the one shown in Figure 12, is closest to
Jacquard's original idea. A length of cardboard is perforated with
the information to play the music and then folded into a convenient
'book' format for easy handling, making it similar in form to
Jacquard's loop of cards. The cardboard book is shown operating in
Activity 12.

Figure 12 A cardboard book used on a boudoir player piano

ACTIVITY 12 (WATCHING) .....................................................................

Watch the DVD video sequence 4.1 ‘Reiterating piano’ which shows a
small Italian boudoir card book player piano. Playing instructions are
contained on a cardboard book system. Note particularly how the
mechanism is restrained by the card and how the operator is able to
modulate the sound by use of a lever. I

The boudoir piano uses a reiterating mechanism, i.e. a mechanism


that makes the sound by a series of repeated hammering actions on
the strings. The mechanism is restrained by the presence of the
cardboard (i.e. no hole) and released when a hole is detected,
allowing the mechanism to sound a string and play a note.
Cardboard books are to be found in a variety of mechanical musical
instruments including fairground organs, and the street and café
organs so popular in mainland Europe. Indeed the Decap café organ
seen in Activity 1 took its instructions from a cardboard book, and
you may even have noticed a street organ at the start of the video
sequence ‘Voice synthesis at IRCAM’ from Activity 37 of Chapter 7
in Block 2.

3.2 Perforated paper


Paper could also be used to control a mechanism but the action had to
be less harsh than the reiterating mechanism used by the boudoir
piano in Activity 12. Quite simply paper would not have the physical
strength to restrain the reiterating hammer action and would tear.
Paper could be used for playing small organettes such as the one
illustrated in Figure 13.

Figure 13 A small reed organ being played by Paul Camps assisted by Course
Team member Richard Seaton

Here the paper acted as a valve to restrict the flow of air to the reed
which made the sound. (This is an example of a reed organ where the
sound is created by the vibrations induced into the reed by the flow of
air.) When the handle on the organette was turned, bellows in the base
created a vacuum. Any hole in the paper would cause air
to be drawn across the reed due to the suction created by the bellows.
Acting as an air valve put far less stress on the paper than directly
operating a mechanical system would have done. As the paper
was moved forward, by the same motion that operated the bellows, the
tune was played. The organette may be seen playing in Activity 13.

ACTIVITY 13 (WATCHING) .....................................................................

Watch the DVD video sequence 4.2 ‘Organette’ which shows a small
domestic organette. This instrument has only fourteen notes but is
still capable of producing a good tune. The paper would normally be
stored as a roll. I

ACTIVITY 14 (EXPLORATORY) ................................................................

Can you think of any advantages or disadvantages of paper over


cardboard for controlling instruments?

Comment
Advantages include less storage space being necessary, as paper is thinner and
can be rolled rather than just folded. Also, paper is easier to perforate
than card, so smaller holes are possible.

The main disadvantage is that paper is more fragile and tears easily.
Mechanisms must thus be made to put as little stress as possible on
the paper, possibly making them more complicated, as you will see in
the next section. I

3.3 Piano rolls


The Pianola, described at the beginning of this section, used a
perforated paper roll, or piano roll as it became known, to provide the
musical instructions to play the tune. Early piano rolls only played 65
notes (as was the case with the pianola) but this was eventually
increased to 88, so matching the number of notes on a standard piano
keyboard. The piano roll system operated by suction in a way similar
to the organette described in the previous section. This method had
two advantages. Firstly, it was relatively quiet in operation, which was
important when used in living rooms. Secondly, it caused minimal
wear to the piano roll as there was no direct mechanical contact
between the paper and the hammer mechanism. Remember that this
was not the case with the cardboard book piano mechanism.
Aeolian's pianola was an immediate success and companies such as
Duo-Art and Ampico in America and Welte in Germany soon
developed similar mechanisms. Piano manufacturers also saw the
advantage of building player mechanisms into their pianos. Soon
pushups were superseded by reproducing or player pianos as they
were known, although ‘pianola’ is often incorrectly used as a generic
term to refer to any type of player piano. The mechanism of a typical
player piano is described in Box 2 and is illustrated in Activity 15.

Box 2 The player piano mechanism


Each of the 88 keys of the piano action has a mechanism similar to that shown
in the diagram of Figure 14. The paper runs over a tracker bar which has holes
in it. If the hole is covered by paper then an equal vacuum is maintained either
side of the valve pouch by the bleed valve, which has a tiny hole in it – much
smaller than the hole in the tracker bar. When a hole in the paper matches a
hole in the tracker bar the vacuum is lost and the pouch balloons up causing
the bellows to close and the key to play the note. Once the hole is re-covered
the pouch returns to its normal position and the bellows open. The vacuum is
generated either from foot pedals or by an electric motor.

Figure 14 A diagram of a typical player piano mechanism



ACTIVITY 15 (COMPUTER) ....................................................................

Run the computer animation for this activity which shows how a note
operates in the reproducing piano. I

Global sales of player pianos reached their peak in 1923 but by this
time both radio broadcasting and gramophone records were becoming
rival sources of entertainment in the home. By 1940 nearly all
manufacturers had either ceased production or had gone out of
business despite the fact that the player piano was, and still is, capable
of giving very fine musical performances as may be enjoyed in the next
activity.

ACTIVITY 16 (WATCHING) .....................................................................

Watch the DVD video sequence 4.3 ‘Reproducing player piano’ which
shows such an instrument in action. Notice the mechanism, but also
enjoy the superb sound of a truly fine musical instrument.
You can watch and listen to a complete performance of Thurlow
Lieurance's By the Waters of Minnetonka played on this reproducing
player piano in the performance section of the video sequences in
sequence 6.2 ‘Reproducing player piano’. I

The manufacturers of player pianos received endorsements from many


notable musicians of the day. Indeed many well known concert pianists
and composers ‘recorded’ performances for player pianos. Box 3
‘Recording piano rolls’ describes the recording process for making a
piano roll.

Box 3 Recording piano rolls


Piano rolls were ‘recorded’ using a special recording piano. Under each of the
piano keys was an electrical contact. Each contact was connected to a corresponding
electromagnetic punch on a perforating machine installed in a nearby sound-proofed
room. As the pianist performed the punches made a series of holes into paper,
thus ‘recording’ the performance onto a master paper roll. Later copies could be
made from this master roll (cf. master recordings discussed later in Chapter 4).
The punches worked at high speed, ensuring the most rapid of staccato notes
would be recorded, resulting in a hole size for the briefest note of under 1 mm in
diameter. The paper moved at 2 m per minute. Some systems allowed the pianist’s
touch to be recorded so that the dynamics of the performance could be reproduced.
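
As a rough, back-of-envelope check (a sketch using only the figures quoted in Box 3: a paper speed of 2 m per minute and a hole of under 1 mm for the briefest note), the shortest note the recording piano needed to capture works out at around 30 ms:

# Figures quoted in Box 3
paper_speed_mm_per_s = 2000 / 60   # 2 m per minute is roughly 33.3 mm/s
briefest_hole_mm = 1.0             # hole length for the briefest note

# Time taken for the paper to travel the length of one such hole
briefest_note_s = briefest_hole_mm / paper_speed_mm_per_s
print(f"{briefest_note_s * 1000:.0f} ms")   # about 30 ms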

ACTIVITY 17 (EXPLORATORY) ................................................................

Can you think what might have happened if a pianist played a wrong
note whilst recording a piano roll?

Comment
If a wrong note was evident when the master piano roll was replayed
the offending hole was covered with sticky paper and a new hole cut in
the correct place using a hand punch. In reality, as with recordings
today, every blemish could be covered up and a perfect recording
would result. Even touch and tempo could be reworked. The composer
Percy Grainger was reported as saying that the piano roll reproduced
him not merely as he did play but as he “would like to play”*. I

*Ibid, p. 35.

ACTIVITY 18 (LISTENING) .....................................................................

Listen to the two audio tracks associated with this activity. On the first
track you will hear an excerpt of George Gershwin's Rhapsody in Blue
made for the reproducing piano by the composer in 1925. In addition
to playing the solo music Gershwin added a piano reduction of the
accompaniment passages normally played by an orchestra. As it would
not have been possible to play both the solo and accompaniment
passages at the same time when recording the original piano roll a
second pass was made to add the additional notes in a manner similar
to ‘over-dubbing’ on a tape recorder.
On the second track you will hear a similar excerpt from Rhapsody in
Blue, again played by George Gershwin but this time accompanied by
the Columbia Jazz Band conducted by Michael Tilson Thomas. This
recording was made in 1976 even though George Gershwin died in
1937! For this recording the accompaniment in the piano roll was
painstakingly removed by covering the holes corresponding to each
note of the reduction leaving just the solo piano passages. I

The piano roll mechanism was capable of providing very fine


performances. The mechanism was not subject to the constraints of
barrels or cardboard books, as the use of vacuum control allowed for
very subtle changes in both tempo and tone, nearly equalling those of
live pianists.

4 HOW FAR CAN YOU GO?

Any instrument can automatically play a tune as long as a mechanical


method can be devised to operate it. With instruments such as pianos,
organs and carillons the notes are already available; the mechanism
simply operates the machine to sound the note. But what of
instruments where the note has to be formed before playing?

ACTIVITY 19 (REVISION) ........................................................................

Name some instruments where the note has to be formed before it can
be played. I

4.1 Electromechanical violins


The ingenuity of the mechanical instrument manufacturers has been
well tested over many years, and combining different instruments
to build what have come to be termed orchestrions, or
mechanical orchestras, has always been popular. Nearly all
instruments of the orchestra have at one time or another been
mechanised.

ACTIVITY 20 (WATCHING) .....................................................................

Watch the DVD video sequence 5.1 ‘Electromechanical violin player’


which shows a mechanical instrument which, in 1910 when it was
originally made, was said to be the eighth wonder of the decade! It
consists of a solo violin with piano accompaniment. The violin is

played by four rosin-coated wheels which recreate the action of the


bow. The strings are stopped by mechanical ‘fingers’ and kept in tune
by weights ensuring relatively stable tuning, at least within itself!
Unusually, this instrument has an electro-mechanical mechanism
rather than the more usual pneumatic system. Electrical contacts,
which sense the holes in the paper roll, energise electromagnets to
operate the various functions. (So much electrical interference is
generated by this instrument that during the filming the camera’s
electronic counter was reset!) I

Unfortunately, despite being a wonder of its age, the electric café violin
is really an example of ambition over achievement, for the violin is not
really best suited to mechanised operation.

4.2 Banjo orchestras


Believe it or not the banjo was much better suited to mechanisation
than the violin. This was probably because the action of plucking
rather than bowing the strings was more easily mechanised. Also the
string was stopped against conventional finger-board frets making the
note easier to form.

ACTIVITY 21 (WATCHING) .....................................................................

Watch the DVD video sequence 5.2 ‘Banjo orchestrion’ which shows
the mechanical banjo orchestrion built in the early 1990s by Ramey &
Co of Detroit to a much earlier design. The banjo is accompanied by a
range of percussive instruments. The instrument is controlled by a
conventional piano roll which was prepared using the MIDI
technology that will be discussed later in this chapter.
You can watch and listen to a complete performance of a piece of
music played on this banjo orchestrion in the performance section of
the video sequences in sequence 6.3 'Banjo orchestrion'. I

Manufacturers of orchestrions strove to provide an all-round


entertainment. By placing the mechanism on show, not hidden behind
closed doors, and offering special effects, the instruments were
visually attractive as well as aurally stimulating. Interestingly, juke-
box manufacturers followed this practice nearly half a century later, as
will be seen in Chapter 5 of this block.

4.3 The Grand Electric Orchestra


The Grand Electric Orchestra, shown performing in Activity 22, is an
orchestrion that was reverse engineered in the mid 1980s from ‘a pile
of bits and a few piano rolls’ by Graham Whitehead, the late owner of
the Ashorne Hall Nickelodeon Collection, and his colleague Paul Camps.
It brings together a collection of various instruments and effects all
controlled by a piano roll. It demonstrates how the information contained
on the piano roll may be used to play more than just musical notes.
MIDI is now being used in just the same way not only to play music,
but also to control theatrical lighting and other effects during live
performances.

ACTIVITY 22 (WATCHING) .....................................................................

Watch the DVD video sequence 5.3 ‘Electric orchestrion’ which shows
the Grand Electric Orchestra. It is a true tour de force of mechanical
entertainment incorporating a wide range of musical instruments and
special lighting effects – all controlled from instructions contained on
a roll of paper. I

4.4 Composition for mechanical instruments


All the mechanical instruments mentioned above have so far played
music originally written for live performances. However, some
composers have written music especially for mechanical instruments
in order to exploit their particular characteristics. Conlon Nancarrow
(1912–1997) was a composer who, frustrated by the shortcomings of
performers of the time, turned to mechanical instruments to achieve
satisfactory performances of the complex music he wrote. It took him
two years to hand punch the holes in the piano roll for his first work
but later he used a modified piano roll perforating machine. The
resulting sounds were often described as ‘superhuman’, for the player
piano offered a sound more akin to that of people playing rather than
the ‘inhuman’ electronically generated sounds often associated with
this type of music.

ACTIVITY 23 (LISTENING) .....................................................................

Listen to the audio track associated with this activity. This is Conlon
Nancarrow’s Study for Player Piano No 49a played on a 1927 modified
Ampico player piano. Notice that the opening sounds like a normal
piano piece that could be played by any competent pianist. After about
20 seconds you will begin to realise that something else is going on
and after a few more bars I hope you are left with the impression that
either several pianists or multitrack recording is being employed.
However the finale is so fast that only a mechanical piano could cope.
Speeds of up to 50 notes a second are quite normal in works written
by Nancarrow. I

Of course pianists can play Nancarrow’s works but they do have to


resort to multitrack technology as demonstrated in the next activity.

ACTIVITY 24 (LISTENING) .....................................................................

Listen to the audio track associated with this activity. You will hear an
excerpt from Conlon Nancarrow’s Player Piano Study No. 11
beautifully played on a conventional Steinway grand piano by Joanna
MacGregor. It is a multitrack recording as the piece requires eight
hands to play it! I

A pianist playing a composition written for a mechanical musical


instrument is offered here as a paradox to the very existence of the
mechanical instrument. For us to listen to mechanically reproduced
music demonstrated a basic need for music wherever and whenever

it was wanted. Most importantly it allowed music to be heard at any


time in the days before recordings. Mechanical music, or the rather
more basic idea of storing the instructions to reproduce music in the
form of a code, continues to flourish through the use of the MIDI
system.

ACTIVITY 25 (SELF-ASSESSMENT) ...........................................................

What is the major difference in performance between the pin barrel


system and the piano roll system? I

5 WHAT DOES MECHANICAL MUSIC TELL


US ABOUT MUSIC IN CODE?
We have now come to the end of our brief look at mechanical music.
Before we move on to look at the MIDI system, let us stop for a
moment and think what mechanical music instruments can tell us
about music stored as codes rather than representations of actual
sound waveforms. This will help you during your study of the
MIDI system.
The first point to note is that music codes only contain information
about what note should be played and perhaps how it is to be
played; they do not determine the actual sound. Three examples
will illustrate the main consequences of this system of storing
music.
• If one was to load a pin barrel into a pin barrel orchestrion the
wrong way round, the music would not simply be played
backwards; it would sound completely different since pins
meant for high notes would activate low notes and vice versa.
• Consider the case of two different player pianos from different
manufacturers. Unless there is agreement about the number of
tracks on the paper roll and the function of each track, it is quite
likely that a roll for one piano would sound wrong when played
on the other manufacturer’s instrument.
• Even if the music is played on two compatible instruments, it
will not necessarily sound the same – the tuning on one might
be slightly different from that of the other, or one might have a
different tone from the other (or one might even use a different
instrument to play the notes).
These examples lead to some fundamental factors about storing
music as codes:
• there is no guarantee that the intended notes will be reproduced
correctly,
• there is a need for standards,
• the timbre of the notes produced cannot be absolutely
determined.

Other important points to note from the discussion of mechanical


music are:
• the music is polyphonic with chords (two or more notes played at
the same time) being stored in a parallel form (this is achieved by
placing a number of pins or holes at the same point across the pin
barrel or paper roll/card respectively);
• only the more sophisticated devices like the player piano have any
means of varying the dynamics of the music (let alone any other
forms of musical expression), and even here the facilities provided
are quite basic;
• the system is essentially a discrete system (mostly a two-state
binary system) and cannot cope with any continuous variation of
parameters such as pitch and dynamics;
• the codes must be stored in a form where their temporal relation to
each other is retained otherwise the music might sound as a set of
randomly occurring notes/chords not linked to the intended tempo
of the piece;
• unlike storing music as representations of the actual sound,
speeding up the playback of a coded piece of music only increases
the tempo, it does not increase the pitch as well;
• changing pitch without changing tempo can easily be achieved in
coded music simply by changing each note code to the code for the
new pitch and playing back these new codes without altering the
playback rate.
There could, however, be another way of achieving pitch change as the
next activity illustrates.

ACTIVITY 26 (EXPLORATORY) ................................................................

A simple pin barrel musical box contains only one tune. Can you think
of two possible methods of altering the pitch of the tune without
altering its tempo?

Comment
The easiest method of altering the pitch is to simply slide the pin
barrel to the left or right by the required number of notes – assuming
the mechanics of the musical box permit this.
Another more time-consuming method would be to reposition every
pin on the barrel individually by the required number of notes. I

The first method in Activity 26 is an example of a global change


whereby the codes themselves are not changed; it is only their
interpretation that is altered. In the second method the actual codes
themselves are changed. This is another feature about coded music –
global changes may be able to be made simply by interpreting the codes
in a different way.
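
The distinction can be illustrated with a small Python sketch (the numerical note codes are invented for the example and are not any real instrument's coding): transposing by rewriting every code corresponds to repositioning every pin, while 'transposing' by changing only the interpretation of unchanged codes corresponds to sliding the whole barrel sideways. In both cases the tempo is unaffected, because the codes are read out at the original rate.

# A tune stored as codes; each code is simply an invented note number.
tune = [60, 62, 64, 65, 67]   # five ascending notes

# Method 2: change every code individually (repositioning the 'pins').
rewritten = [code + 2 for code in tune]

# Method 1 (the global change): leave the codes untouched and alter
# only how they are interpreted on playback.
def play(codes, offset=0):
    return [code + offset for code in codes]

print(play(rewritten))        # [62, 64, 66, 67, 69]
print(play(tune, offset=2))   # the same notes, from unchanged codes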
As we now move on to look at the MIDI system, you should keep all
the above points in mind as they will help you to understand what
MIDI can and cannot do.

6 INTRODUCTION TO MIDI AND ITS


DEVELOPMENT
The remainder of this chapter is dedicated to a study of the MIDI
interconnection system which in essence is just another method of
storing music in code, albeit in a slightly more complicated form than
the various methods we have so far looked at.
The Musical Instrument Digital Interface (MIDI) is a method for
storing and communicating music messages between compatible
electronic instruments. These messages can indicate notes that are to
be sounded, they can contain information about the type of sound that
is to be produced, and they can also be used to transfer general control
and timing data from one device to another.
It is important to get clear from the start that like the carillon, musical
box and piano roll systems, MIDI uses codes to represent music; it
does not use sound – either as an analogue or a digital signal. This is
in contrast to the AES/EBU and S/PDIF connection systems that were
introduced in Chapter 1 of this block, where the sound itself (as a
serial digital stream) is transferred.
You might like to think of MIDI codes as instructions such as ‘start
playing middle C’, ‘stop playing A3’, or ‘select your string sound’. If
you keep this in mind it should help you during your study of the
MIDI system.
If MIDI is just a storage and interconnection system for music codes,
you may well be asking why I am allocating most of a whole
chapter to its study. I hope you will realise fully the answer to this
question after you have studied the remainder of this chapter. In
essence though, arranging for synthesisers and other electronic
musical instruments to understand and respond to a common set of
control messages has not only allowed the whole electronic music
instrument industry to blossom, it has also enabled music
performances to be stored in a compact form and to be transferred
easily and quickly between devices and over the Internet. It has thus
become a universal interconnection system that is now being used for
a wide variety of purposes way beyond its original basic purpose as
you will see later.
MIDI then is a method for controlling musical instruments. Some
would say that it stifles creativity and is therefore a bad thing, but
others find it absolutely essential for their music making. On balance
though, the advantages that the MIDI system provides to both amateur
and professional musicians, and the music industry in general, far
outweigh the concerns of those who believe that it has limited
creativity.
Before we get into the details of the MIDI system, I would like to give
you a short background to the development of MIDI since its
introduction in 1983, from which you should appreciate why MIDI has
become so important in music making today.

6.1 The development of MIDI


Most conventional musical instruments produce a single sound – that
is not to say they necessarily produce only one note at a time, just that
all the notes have roughly the same timbre. A pipe organ, though, is
different in that it combines a number of sets (or ranks) of pipes, each
of which produces a different sound. However, there is still only one
player, so the organ can thus be thought of as a collection of separate
instruments all controlled by a single ‘user’ interface, i.e. the keyboard
(or keyboards if the organ has more than one manual). In addition, the
player has control over which of these separate 'instruments' sounds
by means of stops, which control whether each individual rank of pipes is
switched on or off. There may also be buttons called pistons that select
preset combinations of stops to save the player having to adjust all the
stops individually.
This example from the conventional music field demonstrates how in
this situation a single player has not only to play the notes, but has
also to control a number of different instruments which may or may
not be sounding together.

6.1.1 The need for a control interface


When electronic synthesisers first appeared, they were monophonic
(one note at a time) analogue devices that occupied a great deal of space
and required complex and time-consuming setting up to produce the
required sound. In addition to only producing one note at a time, they
could only produce one type of sound at a time. As a result, they were
rarely used in live performance and their use was mostly limited to
recordings where the set-up time was not a problem, and multitrack
recording techniques could be used to combine a number of different
sounds.

ACTIVITY 27 (LISTENING) .....................................................................

Listen to the audio track associated with this activity. This is J.S. Bach’s
‘Sinfonia’ from Cantata No. 29 played by Wendy Carlos on the Moog
synthesiser. This is another track from the Switched-on Bach album
that was featured in the Moog synthesiser video sequence in
Activity 22 in Chapter 8 of Block 2. As you listen to the music
remember that this was created part by part and sound by sound
using a monophonic synthesiser and multitrack recording techniques,
and was certainly not something that was able to be done in a live
performance. I

In the 1970s, more compact synthesisers started to become available


that could be used in live performances. However, in order to provide
a more interesting overall sound, musicians liked to combine sounds
from two or more synthesisers. Thus a single player might play two
keyboards at the same time (one with the left hand and one with the
right). Indeed a player might be surrounded by four or more keyboards
and transfer from keyboard to keyboard even during the performance
of a single piece of music.

These early synthesisers were still monophonic devices, but they were
all controlled by analogue electronic voltages which were supplied
from the keyboard. So by designing the individual synthesisers to
work with the same values of control voltage, it was possible to
connect one keyboard to a number of separate synthesisers – just like
the organ example above.
However, when polyphonic synthesisers appeared, this method of
control no longer worked as a single control voltage cannot easily be
used to indicate more than one key being pressed at a time. Within
these polyphonic devices, the keyboard would be scanned
electronically, and key presses would be communicated to the sound
generating circuitry by numerical codes.
So, theoretically even with these early polyphonic synthesisers it
would have been possible to separate the keyboard from the sound
generating circuitry. Conversely, the sound generating section could
thus be remotely controlled by codes which may not have
come from a keyboard – they might have come from an electronic store
that sent the correct codes at the right time to produce the required
music. Indeed, manufacturers did sometimes use this feature to
produce devices called sequencers that were able to store the key codes
for one or more pieces of music.
The problem is that manufacturers kept their key codes secret, and in
any case one manufacturer’s set of codes was incompatible with
another’s. Of course this is quite understandable since if
manufacturers provided a remote control connection on a synthesiser
they did not want customers to buy a competitor’s keyboard or
sequencer to control it with.
However, as long as there was no standard method of connecting
different types and makes of synthesiser and keyboard together, the
problem of producing multiple different sounds at the same time
without multiple keyboards would remain.

6.1.2 The origins and acceptance of MIDI


In 1981 Dave Smith of the Sequential Circuits company (a now defunct
electronic musical instrument manufacturer) proposed a common
control interface for synthesisers called the Universal Synthesiser
Interface (USI). Surprisingly this idea was taken up by several
synthesiser manufacturers, renamed and developed into the first MIDI
specification. The first public demonstration of a MIDI connection
between a keyboard from one manufacturer and a synthesiser from
another was given in 1983 at the Frankfurt Music Fair.

ACTIVITY 28 (EXPLORATORY) ................................................................

Why did I describe the idea of the manufacturers getting together to


define a standard as surprising?

Comment
I said this because each manufacturer risked a loss of sales since, by
agreeing on a common interface, they were opening up the possibility

of allowing users to purchase other manufacturers' products to use


with their own. However, they had the foresight to think that in the
long term the flexibility that a common interface would offer would
lead to a growth in the market and thus increased sales – and they have
certainly been proved correct! I

By 1985, most manufacturers of electronic musical instruments were


including a MIDI interface on their products as customers were
starting to avoid buying products that did not have MIDI.
By 1987, the popularity of MIDI was assured, but at the same time
its limitations were becoming more evident, and so the original
specification was enhanced to cater for some of these limitations.
The result was even more reason to buy MIDI-equipped equipment
and so an even wider range of MIDI products were produced.
The more recent introduction of General MIDI and downloadable
sounds has further increased the range of music applications, the
range of devices available and the popularity and flexibility of
MIDI. As you will see later, MIDI has now gone beyond the
musical field into the realms of theatre, although in this chapter
I will restrict my discussion mainly to MIDI’s musical roots and
applications.
MIDI has enabled both professional and amateur musicians to
compose and record their own music at home. Before MIDI,
recording studios were the only place where musicians could
combine instruments and experiment with sounds to produce a
final recording – an expensive and time-consuming task since
hiring a recording studio was, and still is, very costly. In the MIDI
era, the musician can carry out much of the time-consuming
experimentation and mixing at home, and then take a fully edited
MIDI file to the studio for the final addition of vocals or any other
sounds that need to be recorded ‘live’.
Nowadays MIDI can be found in a large range of different and
varied products from simple electronic keyboards, through
synthesisers that work only via MIDI (i.e. they do not even have a
keyboard), to computers incorporating a MIDI interface and full
audio processors such as the Yamaha AW16G device you met in
Chapter 1 of this block.
Most desktop computers incorporate MIDI in some form or other –
either they contain an integrated MIDI-controlled hardware
synthesiser in the sound card, or this is simulated in the operating
system software. There are also devices and computer programs that
just manipulate and store MIDI codes (e.g. MIDI sequencers and
librarians); they do not in themselves use the codes to produce
sounds. So you can see even now that MIDI is more than just an
interconnection system.

6.2 A word of caution


Before I sing the praises of MIDI too much, you should note that MIDI
does have its limitations, and is certainly not a universal solution for
every situation.

The main limitations of the MIDI system that you should keep in mind
during the following discussion are:
• MIDI only contains codes for musical sounds (i.e. sounds with a
definite pitch), although there is provision for percussion and a few
other sounds. However, it cannot be used (yet!) for any random
sound.
• MIDI does not contain the actual sound you hear, it only contains
codes that instruct a MIDI device to produce musical sounds.
• Only the discrete pitches of one of the 12 pitch classes on a
standard keyboard can be transmitted – intermediate frequencies
cannot be represented or transmitted (yet!) although MIDI does
cater for the use of pitch benders which can address this situation
somewhat.
• Only a small number of the many nuances of playing an instrument
can be accommodated.
• MIDI does not define the exact sound that should be heard,
although this aspect is now being addressed with General MIDI and
the new downloadable sounds specification.
Note my bracketed comments in the first and third items above; future
enhancements to the MIDI specification are likely to address these
limitations – indeed at the time of writing (2004) manufacturers are
starting to use special MIDI code sequences to address the pitch
limitation mentioned in item 3.

7 MIDI BASICS

In this section, I will outline the basics of the MIDI system, and this
will serve as an introduction to the more detailed examination in later
sections.

7.1 A simple MIDI set-up


In a basic MIDI system, there is one master device that is in control,
and a number of slave devices that respond to the MIDI signal sent by
the master. For example, in a system comprising a keyboard with a
MIDI interface and two separate synthesisers, the master device is the
keyboard and the synthesisers are slave devices. This is depicted in
Figure 15.

Figure 15 A basic MIDI set-up
As notes are played on the keyboard, codes known as MIDI messages,
which contain information about the notes being played, are sent one
by one along the interconnecting lead. These messages are interpreted
by the synthesisers and used to generate the corresponding pitches.

7.2 MIDI channels


MIDI allows up to 16 different channels to be used whereby messages
can be directed only to devices that are programmed to respond to a
particular MIDI channel. In our example set-up shown in Figure 15,
this means that if both synthesisers are programmed to respond to the
same MIDI channel, both will sound together. However, if synthesiser
1 is programmed to respond to one MIDI channel, and synthesiser 2 to
another, then by selecting the required channel on the keyboard, the
player can play just one of the synthesisers without having to
disconnect the other.
It is important to note here that MIDI messages are only sent one at a
time serially down a single connecting cable; MIDI does not allow
multiple messages to be sent simultaneously (in parallel). However,
the incorporation of MIDI channels means that by interleaving the
messages from each channel, completely different sets of notes can be
transmitted at the same time. So, for example, one synthesiser could be
instructed to play a completely different tune from another even
though they are both connected to the same MIDI lead.
This particular scenario will not work though using two MIDI
keyboards and two synthesisers with a common MIDI connection,
even if the keyboards are set for different channels. This is because
there are now two master devices which might generate MIDI messages
at the same time on the same connection, and when this happens the
messages will get corrupted.
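
A minimal sketch of the channel idea in Python (the message representation used here is invented for illustration and is not the real MIDI byte format, which is covered later): each slave device simply ignores any message that is not addressed to the channel it has been set to receive.

# Simplified messages as (channel, action, note) tuples - not real MIDI bytes.
messages = [
    (1, 'note on', 'C4'),    # intended for synthesiser 1
    (2, 'note on', 'E4'),    # intended for synthesiser 2
    (1, 'note off', 'C4'),
]

def respond(device_channel, stream):
    """Return only the messages a device set to device_channel acts upon."""
    return [m for m in stream if m[0] == device_channel]

print(respond(1, messages))   # synthesiser 1 plays one part...
print(respond(2, messages))   # ...synthesiser 2 plays quite another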

7.3 Real time operation


If messages are only sent one at a time, you may be wondering if MIDI
can work in real time – that is to say can a MIDI interface cope with all
the messages that need to be transmitted when the player is playing a fast
musical passage that contains chords? Also, what about synchronisation
between different tunes transmitted on separate MIDI channels?
The answer to these questions is all a matter of scale. Yes, even though
only one MIDI message can be sent along the interconnecting cable at a
time, the speed of transmission of each is such that 500–700 messages
can be sent every second. Thus, if for example the player of the setup
of Figure 15 plays a chord containing three notes, the keyboard will
generate three messages one after the other in quick succession – so
quick in fact that when the messages are interpreted by the
synthesiser, the resulting sound appears to the listener as a chord
rather than three separate notes played one after the other.
The same applies to the interleaved tunes example, where different channels
are used and the individual MIDI messages are interleaved. Even if two
notes on different channels need to be sounded at exactly the same
time, the messages will still be transmitted one after the other, but in
such a short time as to be perceived as sounding simultaneously.
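
To put some rough numbers on this, the sketch below uses only the 500–700 messages-per-second figure quoted above (the per-message time is derived purely from that range, so treat it as an estimate): even a three-note chord sent strictly one message after another is complete within a few milliseconds, far too short a spread to be heard as anything other than a chord.

# Using the message rate quoted in the text (500-700 messages per second)
for rate in (500, 700):
    per_message_ms = 1000 / rate     # time occupied by one message
    chord_ms = 3 * per_message_ms    # three note messages sent in a row
    print(f"at {rate} messages/s a three-note chord takes about "
          f"{chord_ms:.1f} ms to send")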
However, even with a capability of 500–700 MIDI messages each
second, this message rate can become a problem in a large system with
many sound devices, controlled perhaps by a computer that can deliver
many channels of MIDI data at the same time, and can lead to audible
delays. In such situations, more than one MIDI connection is often
used to spread the messages over the multiple connections.
The MIDI system also caters for individual devices to be able to play
their own stored songs or sequences of music under control of the
master MIDI device – both in terms of starting and stopping at the
same time, and for keeping the devices in synchronisation whilst the
music is playing.

7.4 MIDI messages


As I mentioned above, codes representing notes played are made up
of one or more MIDI messages. A complete MIDI message is made
up of one or more MIDI bytes. A MIDI byte is the smallest unit of
information that is transmitted (it is called a byte because it occupies
one 8-bit binary data word). There are two classes of message –
channel and system. Channel messages are any instructions that apply
to a particular MIDI channel (e.g. play a note on a particular channel),
and system messages are general control messages.
A status byte is used to identify the type of message that is being sent,
and a data byte is used to specify the numerical data value that is
needed for the particular MIDI message (if one is needed).
Note that MIDI messages are sometimes referred to as MIDI codes –
particularly when they are held in a stored form.
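
As a foretaste of the message formats described later, the following Python sketch shows the general shape of a channel message: one status byte identifying the message type and MIDI channel, followed by the data bytes that message needs. The 'note on' layout used here (a status byte, then a note number and a loudness value) is the standard MIDI arrangement, but at this stage take the details as illustrative rather than something derived from this section.

def note_on(channel, note, velocity):
    """Assemble the three bytes of a 'note on' channel message.

    channel is 1-16 as printed on equipment; note and velocity are 0-127.
    """
    status = 0x90 | (channel - 1)   # status byte: message type plus channel
    return bytes([status, note, velocity])

msg = note_on(channel=1, note=60, velocity=100)   # middle C, fairly loud
print(msg.hex(' '))   # 90 3c 64 - one status byte then two data bytes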

7.5 Specification components


The detailed MIDI specification that formalises the above is controlled
by a body called the MIDI Manufacturers Association (MMA). The
‘Standard MIDI’ specification contains three basic components:
• a hardware specification (connectors, cables, voltage levels, etc.);
• an electrical specification for how the music messages are
transmitted;
• a specification for how the messages are to be interpreted.
Since the introduction of the original version 1 specification in 1983,
the MMA have added a number of small enhancements and I will
incorporate these in my discussion where appropriate. However, they
have also specified some major new additions, notably:
• General MIDI;
• MIDI time code;
• a standard for MIDI computer files;
• MIDI machine control and MIDI show control;
• a system for downloadable sounds (DLS).
In the following sections I will be expanding on the above components
of the specification, and the workings of the MIDI system will be
reinforced by a number of practical activities.

8 MIDI HARDWARE

As I mentioned above, the MIDI specification incorporates details


of the cable and connectors that should be used with MIDI devices.
In this section I will introduce the main components of this part of the
specification.

8.1 MIDI cable


The MIDI signal is carried by a single balanced one-way communication
path using a pair of wires. The cable that is used consists of a twisted
pair of wires surrounded by a metal screen. At each end, there is a
standard plug of the type that used to be commonly used for audio
connections in consumer devices (see Box 4).
The use of a ‘consumer’ type connector indicates the thinking of the
designers of the MIDI specification – that MIDI is definitely designed
towards consumer use.

Box 4 MIDI cable


A MIDI cable uses a 5-pin DIN plug at each end. Only three of the 5 pins are
used – pins 4 and 5 are used for the signal wires and pin 2 for the metal
screen, as illustrated in Figure 16(a).
The maximum length of a MIDI cable is 15 m; Figure 16(b) shows a typical MIDI
cable.

Figure 16 (a) MIDI cable wiring diagram; (b) a MIDI cable

ACTIVITY 29 (REVISION) .......................................................................

Suggest a reason why a twisted pair, screened cable is specified for a


MIDI connection. I

8.2 MIDI ports


The set of connectors provided on a device that has a MIDI interface is
called a MIDI port. There are three connectors and they all use a 5-pin
DIN socket that is compatible with the plug at the end of a MIDI cable:
• MIDI OUT – this connector is an output that sends MIDI signals to
other devices;
• MIDI IN – this connector receives MIDI signals;
• MIDI THRU – this connector (which need not always be present)
provides a relay of the MIDI signals received on the MIDI IN connector
so that a number of MIDI devices can be chained together.

Figure 17 shows the form of these three connectors.

Figure 17 Physical layout of the three MIDI connectors (MIDI IN, MIDI OUT and MIDI THRU)

The use of a one-way signal and separate input and output connectors
avoids any connection problems that might result in two outputs being
connected together. Figure 18 shows the MIDI set-up from Figure 15 but
with the connections labelled.

Figure 18 Two synthesisers and a MIDI keyboard connected in a chain

With the connections as shown in Figure 18, the MIDI signal sent by
the keyboard when it is played is received by the first synthesiser.
Within this synthesiser, the keyboard signal is relayed to the second
synthesiser via the MIDI THRU connection.
If, on the other hand, the second synthesiser’s MIDI IN connector was
connected to the MIDI OUT connector of the first synthesiser, then the
MIDI signals from the keyboard would not be received by the second
synthesiser. But now if the first synthesiser also incorporated a
keyboard, then playing this keyboard would cause the notes to be
sounded on the second synthesiser (as well as the first synthesiser).
To confuse the situation even further, some synthesisers that
incorporate a keyboard have a ‘remote’ or ‘local off’ setting whereby the
keyboard is effectively disconnected from the sound generating section
of the device. If this is done, and the alternative connection scheme
mentioned above is used, then the second synthesiser will be played
by the keyboard of the first synthesiser and the first synthesiser will
be played by the separate MIDI keyboard!
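
These routing possibilities can be summarised in a small toy model (Python; purely a sketch of the behaviour described above, not how real devices are built): whatever arrives at MIDI IN reaches the sound generator and is relayed on MIDI THRU, MIDI OUT carries only what the device's own keyboard generates, and 'local off' breaks the link between a device's keyboard and its own sound generator.

class ToySynth:
    """A toy model of a synthesiser's MIDI IN/OUT/THRU behaviour."""
    def __init__(self, name, local_off=False):
        self.name = name
        self.local_off = local_off
        self.sounded = []                # notes this device actually plays

    def midi_in(self, note):
        """A note arriving at MIDI IN always reaches the sound generator..."""
        self.sounded.append(note)
        return note                      # ...and is relayed on MIDI THRU

    def keyboard(self, note):
        """A key pressed on the device's own keyboard."""
        if not self.local_off:
            self.sounded.append(note)    # 'local off' suppresses this
        return note                      # what appears on MIDI OUT

# The alternative connection scheme described above:
synth3 = ToySynth('synth 3 (keyboard, local off)', local_off=True)
synth1 = ToySynth('synth 1')

synth1.midi_in(synth3.keyboard('E4'))    # synth 3's OUT feeds synth 1's IN
synth3.midi_in('C4')                     # the separate keyboard feeds synth 3
print(synth1.sounded, synth3.sounded)    # ['E4'] ['C4']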

ACTIVITY 30 (SELF-ASSESSMENT) ...........................................................

A MIDI set-up consists of a separate MIDI keyboard, two keyboardless


MIDI-controlled synthesisers (synthesiser 1 and synthesiser 2) and a
third synthesiser that has both a keyboard and a MIDI interface
(synthesiser 3), and is set in ‘local off’ mode.
Draw an interconnection diagram that shows the MIDI connections
needed so that the MIDI keyboard will play synthesiser 3 only, and the
keyboard on synthesiser 3 will play both synthesiser 1 and synthesiser 2.
I

8.3 Computer connections


The 5-pin DIN connector mentioned above is the only one that is
approved by the MMA under the MIDI 1.0 specification. However,
many desktop computers now have MIDI capability, but the width and
length of the back plate of a standard plug-in computer card are not

sufficient to accommodate three of


these types of connector as well as
analogue audio connectors.
To solve this problem, many computer manufacturers use the standard
USB or the older serial or joystick connectors to provide the MIDI
input and output. Adapter leads that provide the standard 5-pin DIN
sockets need to be used, but these are readily available (Figure 19).

Figure 19 MIDI computer adapter lead

Clearly the continued use of three comparatively large connectors is


not a very satisfactory situation, particularly with the increasing use of
MIDI with computers and the Internet and the reducing size of
computers and their connectors. So it is likely that the MMA will
specify a new connector in any new MIDI specification.

9 MIDI ELECTRICAL SPECIFICATION

This part of the specification details the values of electrical voltages


and currents that are to be used in a MIDI connection, and also the
form of the signals that go to make up the MIDI messages.
Note that MIDI uses a digital signal, and so there are only two
electrical states on the connection corresponding to either a binary 1 or
a binary 0. Remember also that MIDI data is transmitted serially as a
sequence of these binary 1s and 0s, each one of these being termed a bit
of data.

9.1 Electrical signals


A common problem with connecting audio equipments together is the
generation of earth loops which can cause mains hum to be introduced
into the audio signal as explained in Box 5. So, to prevent a MIDI
connection being the possible cause of hum from an earth loop, at the
receiving end (i.e. at the MIDI IN connection) the MIDI signal is fed
through a component called an optoisolator which provides electrical
isolation of the MIDI signal from the receiving device. An optoisolator
is an electronic component that uses light to transmit a digital signal
from its input to its output, and there is thus no direct electrical
connection between the input and the output (see Box 6).
Given the use of an optoisolator, the electrical specification is based
around the need for the MIDI transmitting unit to drive the input of
the optoisolator located in the receiver. Thus the specification is given
in terms of whether a current (of around 5 mA) is flowing or not. No
current flowing indicates a binary 1 bit and a current flowing indicates
a binary 0. If you are interested, Box 7, which is non-assessable, gives
some outline details of a standard MIDI interface.

Box 5 Earth loops


An earth loop occurs when two items of audio equipment are connected via an audio signal
path as explained below.
If both equipments have earth connections in their mains supplies, and the common
connection of the signal connection(s) is also connected at each end to the equipments'
mains earth, then an earth loop is formed, as illustrated in Figure 20.

Figure 20 Multiple earth connections can create earth loops
Why should this cause a
problem? There are two effects that can occur: first, the loop acts as an aerial and picks
up mains and radio interference. This, however, is usually secondary to the effect caused
by imbalances in the mains earth connections. Because the mains earth connections are
not perfect and have some (small) resistance, this can cause there to be a slight difference
in voltage between the common or ‘earth’ signal connections of each of the two items of
equipment. Thus, when the signal lead is connected and the two common connections
become connected, small mains frequency currents start to flow which can produce an
audible hum. This is particularly likely to be noticeable for low-level signals which are
subjected to a high degree of amplification in the receiving device.
There are a number of solutions to this problem, but one of them is NOT to disconnect
any of the mains lead earths as this could cause a shock/safety hazard. A simple solution
is to try connecting all the equipments to the same plug via a multiway mains adapter
(assuming the mains plug will not be overloaded) – rather than plugging them into sockets
on opposite sides of the room. This ensures that both the equipments are grounded to
the same earth point.
A second solution is to disconnect the common wire of the signal cable at one end only. In this
case, the signal is still screened from interference, but there is no earth loop.
A final solution is to use some sort of electrical isolator where there is no direct electrical
connection between the input and the output, but audio signals are nevertheless allowed
to pass.
Of course, the fact that a MIDI connection does not carry audio signals does not necessarily
make any difference when it comes to introducing hum. Any earth loop, whether via an
audio signal common connection or by any other path can be the source of hum.

Box 6 Optoisolator
An optoisolator is a small electronic component that contains a light source in close optical
contact to a light sensor (Figure 21). Both devices are enclosed in a light-proof moulding
so that they are not affected by external light. The light source is usually a component
called a light emitting diode (LED) which requires a current of about 5 mA to produce
sufficient light. There is no electrical connection between the LED and the light sensor.
Optoisolators work best with digital signals, and there are various types of devices
designed to operate at different maximum data rates (the higher the data rate the
more expensive the device).

Figure 21 Diagram of an optoisolator



Box 7 Standard MIDI interface circuits (non-assessable)


For the MIDI OUT connector, the serial digital data is fed through an electronic
driver (amplifier) which is a device designed to be able to provide a current of at
least 5 mA through its output (i.e. a current sufficient to be able to drive the LED
in the optoisolator at the receiving end). The driver’s output is fed to pin 5 of the
MIDI OUT connector. Pin 4 of this connector is fed from a +5 V DC supply, and Pin 2
is connected to the MIDI device’s earth connection. To protect the outputs from
damage due to accidental shorting, resistors are placed in the signal leads.
For the MIDI IN connection, pins 4 and 5 of the connector are fed to the optoisolator
– again there are some protection components to prevent damage due to incorrect
signals or connection. The output from the optoisolator is the serial MIDI data
stream which is fed to the decoding circuits.
If a MIDI THRU connector is provided, the serial output signal from the optoisolator
is also fed to another electronic driver circuit and then on to the MIDI THRU
connector just like the MIDI OUT signal.
Figure 22 is the electrical circuit diagram for a standard MIDI interface.

Figure 22 Circuit diagram of a MIDI interface

Note here that because no current flowing indicates a binary 1 data bit,
the idle state (i.e. the binary state when no data is being transmitted, or
when the MIDI input is disconnected) must also be interpreted as a
binary 1. This sounds a little odd, but this approach makes the MIDI
signal compatible with the serial system that has been used for many
years in computers, where the convention is that the idle state is a
binary 1. This also allows standard readily available computer
components to be used to convert the serial data into MIDI bytes as
you will see in the next section.

9.2 Serial-to-parallel conversion


As I mentioned earlier, MIDI messages are composed of one or more
bytes, and a single byte contains 8 bits of binary data. Since the MIDI
interface only allows one bit to be transferred at a time, on transmission,
each MIDI message must be converted into a serial stream of single
bits, and on receipt, the reverse process must be carried out so that the
receiving device can interpret the MIDI messages.
So why can’t the 8 data bits just be sent one after the other? Well, yes
they can (as shown in Figure 23), but think about the situation where a
whole stream of MIDI data is being sent, and the user plugs in the receiver
in the middle of the stream. How does the receiver know the point in the
serial stream where one message ends and the next begins? Also, although
there is a standard time for each bit to last, what if the timing mechanism
the receiver uses to decode the data is slightly different from that used
in the transmitter (think about the case where there is a whole string of
binary 1s or binary 0s)? (This is a different situation from the AES/EBU
digital audio connection system described in Chapter 1 of this block
where the signal always contains at least one 0 to 1 or 1 to 0 transition
for each bit of the data whether the data is a 1 or a 0.)
Figure 23 Parallel to serial conversion where the least significant bit is sent first
(the example data byte 0110 0111 is transmitted over time as the bit sequence 1110 0110,
i.e. with the least significant bit (l.s.b.) first and the most significant bit (m.s.b.) last)

The answer to these questions lies in the use of a standard system of
serial transmission that is universally used in computer serial
communications, called asynchronous transfer.
In essence this method involves adding an extra bit called a start bit
before the eight MIDI data bits, and another bit called a stop bit afterwards
giving a total of 10 bits for each MIDI byte. The start bit always has the
opposite binary sense to the idle state, and the stop bit always has the
same sense as the idle state. Thus, since the idle state is 1, the start bit
must be 0 and the stop bit must be 1.
Using this system means that not only can the receiver determine the state
of each individual bit correctly, it can also recover the 8 data bits with
no chance of them being mixed up with the bits of an adjacent MIDI byte.
In addition, these 10-bit ‘packets’ can either be sent individually with
any length of time between them or continuously with no intervening
time gap.
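
As an illustration only (the course does not require any programming), here is a minimal
Python sketch of this asynchronous framing; the function names are my own. It wraps an
8-bit MIDI byte in a start bit and a stop bit, with the data bits ordered least significant
bit first as described above, and then recovers the byte again.

def frame_byte(value):
    """Wrap an 8-bit value in a start bit (0) and a stop bit (1).

    The list returned is the 10-bit sequence in the order it would be
    sent: start bit, the 8 data bits least significant bit first, stop bit.
    """
    data_bits = [(value >> i) & 1 for i in range(8)]    # l.s.b. first
    return [0] + data_bits + [1]                        # start + data + stop

def unframe_byte(bits):
    """Recover the 8-bit value from a 10-bit frame, checking the framing."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError('invalid frame')
    value = 0
    for i, bit in enumerate(bits[1:9]):                 # data bits, l.s.b. first
        value |= bit << i
    return value

# The byte 0110 0111 (denary 103) is sent as 0 1110 0110 1
print(frame_byte(0b01100111))                  # [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
print(unframe_byte(frame_byte(0b01100111)))    # 103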
The MIDI specification defines the data rate to be 31 250 bits per second.

ACTIVITY 31 (SELF-ASSESSMENT) ...........................................................

If the data rate of a MIDI signal is 31 250 bits/s, what is the maximum
number of MIDI bytes that can be sent in a second? I

ACTIVITY 32 (COMPUTER) ....................................................................

Run the computer animation for this activity. This is a simple animation
that shows how a MIDI byte is converted to serial form with the two extra
bits added, and how this data stream is decoded at the receiving end.
The simulation shows how the receiver looks for the start of a binary 0
(i.e. a 1 to 0 transition) and when it finds this it waits for a period
equal to one half of the time for a complete bit (1/31 250 s × ½ = 16 µs)
before sampling the data stream again. If the result is still a binary 0,
the receiver assumes this is a valid start bit (and not just some
noise or interference), and it proceeds to sample the data stream
every 32 µs (the time for one bit at a bit rate of 31 250 bits per
second). Each time it samples the data stream it notes the binary
state of the signal, and from this forms the 8 bits that form the
MIDI byte (the least significant bit of the byte is always sent first).
As a final check, the receiver checks that the signal is a binary 1 at one
bit time after the most significant bit has been received (i.e. detection
of the stop bit, although often receivers do not bother to do this).
Finally the start and stop bits are removed leaving the original eight
data bits. The receiver then waits for the next binary 0 and the
process starts again for the next MIDI byte.

Comment
You may wonder how the receiver detects the start bit of a MIDI byte
if the MIDI connection is made in the middle of a transmission.
In this case, it is possible that some bytes will be wrongly received, but
at some point the receiver will not receive either a start bit or a stop bit
when it is expecting one, and so eventually synchronisation will be
achieved. I
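
The receiver logic that the animation illustrates can also be sketched in code. The
following Python fragment is for illustration only (the constant and function name are
my own, and the oversampling rate is an assumption made purely for the sketch): it looks
for a 1 to 0 transition in an oversampled logic-level stream, confirms the start bit half
a bit time later, samples the eight data bits at one-bit intervals and finally checks the
stop bit.

SAMPLES_PER_BIT = 16     # assumed oversampling rate, purely for this sketch

def decode_first_byte(samples):
    """Decode one MIDI byte from an oversampled logic-level stream.

    samples is a list of 0s and 1s taken SAMPLES_PER_BIT times per bit
    period (32 microseconds at 31 250 bits per second).  Returns the
    decoded byte, or None if no valid frame is found.
    """
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:      # possible start bit
            mid = i + SAMPLES_PER_BIT // 2               # wait half a bit time
            if mid >= len(samples) or samples[mid] != 0:
                continue                                 # just noise, keep looking
            value = 0
            for bit in range(8):                         # sample every bit time,
                pos = mid + (bit + 1) * SAMPLES_PER_BIT  # l.s.b. first
                value |= samples[pos] << bit
            stop = mid + 9 * SAMPLES_PER_BIT             # stop-bit check
            if samples[stop] != 1:
                return None                              # framing error
            return value
    return None

# An idle line (binary 1s) followed by the framed byte 0110 0111
frame = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
line = [1] * 40 + [b for b in frame for _ in range(SAMPLES_PER_BIT)] + [1] * 40
print(decode_first_byte(line))    # 103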

10 MIDI MESSAGES

Having looked at the physical aspects of how MIDI bytes are transmitted
between two or more MIDI-equipped devices, we can now look at how
MIDI messages are formed from these bytes.
As I mentioned earlier, MIDI messages are composed of one or more
MIDI bytes and there are two types of byte – status and data. The
MIDI status byte is an instruction to do something and a MIDI data
byte provides any data that is needed before the instruction can be
carried out.
In this section, I will be introducing the features of the original MIDI
specification, and in later sections I will describe the major enhancements
that have been made to this specification over the years, that I mentioned
in Section 7.5.
There are two basic classes of MIDI message – channel and system.
Channel messages are the main set of messages that are used to
communicate music instructions, so I will look at these first.

10.1 Channel messages


The MIDI specification provides for up to 16 different musical sounds
to be controlled at the same time. Channel messages (i.e. messages
designed for a particular channel) therefore contain a channel designation
that indicates which particular musical sound is to be controlled. A MIDI
status byte is used to indicate the start of a particular channel message and
this byte not only contains the message type, but also the channel number.
The seven possible channel message types are listed below, and an outline
of the function of each is given in the following text.
• Note On
• Note Off
• Aftertouch
• Control Change
• Channel Mode
• Program Change
• Pitch Bend.
Note that in the discussion below, the numerical value ranges given for
some of the parameters may seem a little odd, but the reasons for these
will become clear in Section 10.4 when the coding of the messages is
explained.

Note On
This is the most commonly used MIDI message and is sent whenever a
note is to be played (e.g. when a note is pressed on the keyboard). The
Note On message requires three MIDI bytes – the ‘Note On’ status byte
and two data bytes which specify the pitch of the note to be sounded,
and the velocity with which the note has been played.
The pitch is specified as a number in the range 0 to 127 for each semitone
on a keyboard. Middle C (C4) is pitch number 60, so the C# above this
is 61, the next D is 62, and an octave above middle C (C5) is 72 (since
there are 12 semitones in an octave).
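
If you want to experiment with these numbers, a small Python helper (for illustration
only; the names are my own) captures the rule that MIDI note numbers rise by one per
semitone with middle C at 60:

# MIDI note numbers rise by 1 per semitone, with middle C (C4) at 60
SEMITONE = {'C': 0, 'C#': 1, 'D': 2, 'D#': 3, 'E': 4, 'F': 5,
            'F#': 6, 'G': 7, 'G#': 8, 'A': 9, 'A#': 10, 'B': 11}

def note_number(name, octave):
    """Return the MIDI note number for a note name and octave (C4 = 60)."""
    return SEMITONE[name] + 12 * (octave + 1)

print(note_number('C', 4))    # 60  (middle C)
print(note_number('C#', 4))   # 61
print(note_number('C', 5))    # 72  (an octave above middle C)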
The velocity is a measure of how hard the note has been played, and
therefore indicates how loud the note should sound. On a piano, if a
loud note is required, the player will strike the key with a lot of force,
and the amount of force can be determined by measuring the speed at
which the key is pressed. Thus in a MIDI keyboard there is circuitry
for each key that measures the speed at which it is pressed, and from this
the required MIDI velocity value can be determined. This value, like
pitch, is a number in the range 0 to 127. High velocity values indicate
hard key presses and therefore loud notes.
Even with instruments like a pipe organ, and some of the cheaper
electronic keyboards, that are not ‘touch sensitive’, a velocity value
must still be included, so in these situations a constant mid-range value of
64 is usually used.
Note that a velocity value of 0 is interpreted as zero velocity, or a note
played with a zero volume level. This is usually interpreted as the
equivalent to a Note Off message (see below), and this fact is put to
good use when running status is used (as will be explained later).

ACTIVITY 33 (SELF-ASSESSMENT) ...........................................................

What pitch is represented by each of the following MIDI note
numbers?
(a) 69
(b) 48
I

Note Off
This MIDI status message turns a particular note off. As with the Note On
message, this message consists of a status byte (containing the message
type and MIDI channel) and two data bytes – the pitch of the note to be
turned off (0–127 as for the note on message), and a ‘release velocity’
(again a number in the range 0 to 127). Of course, this message will
simply be ignored if the specified note has not previously been turned
on, but the converse is not true, i.e. sending a Note On message when
the note is already playing will cause the note to be sounded again
with (possibly) a new velocity value.
The release speed of a note seems a rather odd parameter to send, and
indeed very few keyboards bother to measure this speed. In any case, it
is not clear what the release speed is supposed to indicate in terms of
the sound heard. In fact, the note off message is not used very much, as
more often notes are turned off by turning them on with a velocity of
zero as mentioned above.
Remember also, that, depending on the particular instrument allocated
to the channel, the actual sound of a note may have disappeared long
before the MIDI signal contains an instruction to turn the note off. For
example, a piano-sounding note will decay over a fairly short time.
Conversely, if a note is played using an instrument that does not decay
(e.g. an organ sound), then this note will continue to sound until it is
turned off. This can sometimes lead to problems of ‘stuck’ notes in the
event of a fault, or if the receiver of the MIDI signal cannot process the
MIDI commands sufficiently quickly.

Aftertouch
Some electronic keyboards have a pressure sensor under the keys so
that in addition to measuring the speed at which a key is pressed, any
additional pressure that the player gives to the key whilst it is
depressed can be determined. This pressure is called aftertouch.
Aftertouch can be used to modify some aspect of the note being played
such as its volume, vibrato or timbre. There are two types of
aftertouch, one that affects each note individually, and one that affects
all notes currently being played. The former is usually only found in
the more expensive synthesisers, but it is common for a device that
does not have this facility on its keyboard nevertheless to respond in
some way to aftertouch MIDI messages.

The polyphonic aftertouch or polyphonic key pressure MIDI message
(the one that applies to individual notes) consists of a status byte
(which includes the channel number), a data byte representing the
pitch of the note affected (0–127) and a data byte giving the aftertouch
value which again is a number in the range 0 to 127 (i.e. the amount of
additional pressure the player has exerted).
The note-independent aftertouch message, called the channel
aftertouch or channel pressure, contains the channel number and the
aftertouch value, but no pitch specification.
Note that the MIDI signal does not contain any details about how this
aftertouch value is to be interpreted. It is up to the musician to ensure
the devices that receive the MIDI signal respond to aftertouch
messages in a suitable manner.

Control change
Control change messages are used to indicate some sort of
modification to the sound through the use of a controller such as a
piano sustain pedal.
These messages contain the control change status byte (which includes
the channel number) and two data bytes representing the controller
type and the controller value. There is provision for a large number of
controller types, such as a modulation wheel, foot controller (a pedal
that controls the volume or some other parameter), master volume,
sustain pedal, reverberation level and stereo panning, and there are many
undefined controllers that can be used for future enhancements.
Indeed there have already been a number of new controller types
added since the original specification was prepared.
Some controllers require a simple on/off indication, but others are
continuous and may need a large number of different data values to
indicate a particular setting. Where the data value only requires two
states (e.g. on or off as for a sustain pedal), then in general all values
between 0 and 63 indicate the control should be switched off, and values
between 64 and 127 are treated as indicating the control should be
turned on.
The specification also caters for a ‘double precision’ data value to be
used for a continuous controller that requires more than 128 different
steps. If this is necessary, then two complete MIDI messages (two sets
of three MIDI bytes) are sent, the first contains the most significant
part of the controller value, and the second contains the least
significant part. In this situation, both status messages are identical
control change messages, but the controller type data values are
different (but related) to indicate which part of the controller’s value is
being sent. In this way, controller values of between 0 and 16 383 can
be communicated.

Channel mode
Channel mode messages affect how the receiving MIDI device is to be
configured. They comprise the status byte (which includes the channel
number) and two data bytes – the mode type and the mode data.

The channel mode types are as follows.


• Polyphonic/monophonic mode – switch between playing many
notes at once (polyphonic) or just one note at a time (monophonic).
In monophonic mode, the associated data message can contain a
value that indicates the number of channels over which the
monophonic sound messages will be sent. This could be used for
example for a violin, where each string is allocated its own MIDI
channel.
• Local control on/off – whether or not to disconnect the MIDI
receiving device’s keyboard from its sound generating circuitry as
mentioned in Section 8.2. If local control is turned on (usually the
default situation) the device receiving the MIDI signal can also be
played from its own keyboard (if it has one); if local control is off,
then the keyboard is disconnected from the sound generating
circuits. However, remember that although playing the keyboard
will not produce any sound from the device itself, MIDI messages
may still be produced at the device’s MIDI OUT connector.
• Omni mode on/off – when omni mode is switched on, a receiving
device should ignore the channel number and respond to messages
on any MIDI channel; when off, only messages on any assigned
channels should be processed.
• All notes off – switch all notes off, just in case things go wrong!
The specification recommends that this message is sent
occasionally at appropriate times to ensure notes are not stuck on.
But this message should never be used instead of Note Off
messages.

Program change
This MIDI message is used to select different sounds sometimes called
patches or programs for a particular MIDI channel. This is the
equivalent of changing the stops on a pipe organ. On most electronic
keyboards or synthesisers, there is a set of pre-assigned sounds, and
there is often a facility for having additional sets of user-defined
sounds. Take care here not to confuse MIDI channels and synthesiser
programs. Programs are the particular sounds that a synthesiser can
produce; any of these can be assigned to one or more MIDI channels
with the MIDI Program change message.
In addition to the status byte the Program change message contains one
data byte that defines the new program number (0–127). When MIDI
was first specified in 1983, it was thought unlikely that any electronic
keyboard would have more than 128 different sounds, but this has
proved to be a false assumption, and today keyboards can contain
many hundreds of sounds.
Note that the program number says nothing about the actual sound that the synthesiser
will produce (e.g. piano, strings, woodwind, etc.); this depends on what
happens to be stored in the relevant program when the MIDI message
is received. The General MIDI enhancement to the original
specification, which I will discuss later, has attempted to address this
problem.

Pitch bend
Many synthesisers and electronic keyboards have a pitch bend wheel
that allows the player to raise or lower the pitch of all the notes
currently being played. When released, the pitch bend generally
returns to its central, no-bend, position.
The MIDI Pitch bend channel message is used to indicate this action to
other MIDI devices, so when the player moves the pitch bend control,
a whole stream of MIDI Pitch bend messages is generated. As well as
the status byte (which includes the channel number), the pitch bend
message contains two data bytes that allow a maximum of 16 384 pitch
steps to be specified, where 0 = maximum pitch lowering, 8192 = no
pitch change, 16 383 = maximum pitch raising.
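
For illustration, the following Python sketch (the names are my own) shows how a 14-bit
pitch bend value is split across the two 7-bit data bytes and recombined; in the MIDI
specification the least significant 7 bits are carried in the first data byte and the most
significant 7 bits in the second.

CENTRE = 8192          # no pitch change

def bend_data_bytes(value):
    """Split a 14-bit pitch bend value (0-16383) into two 7-bit data bytes."""
    lsb = value & 0x7F            # lower 7 bits, sent in the first data byte
    msb = (value >> 7) & 0x7F     # upper 7 bits, sent in the second data byte
    return lsb, msb

def bend_value(lsb, msb):
    """Recombine the two data bytes into the 14-bit value."""
    return (msb << 7) | lsb

print(bend_data_bytes(CENTRE))    # (0, 64) - wheel at rest
print(bend_value(127, 127))       # 16383  - maximum pitch raising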
The problem here is that the message contains no detail as to the size
of the pitch change each individual value represents (although it is
possible to specify this beforehand using special control change
messages). For example the maximum bend might be a semitone or an
octave or more depending on the receiver’s setting. Another problem is
that this affects all notes on the specified channel. So, trying to
simulate a sound such as a violinist sliding up one string while
playing a constant note on another is not possible unless each string is
assigned its own MIDI channel.

ACTIVITY 34 (COMPUTER) ....................................................................

In this activity you will use a ‘MIDI Demonstrator’ program that the
Course Team has produced to experiment with some simple MIDI
channel messages. Carry out the steps associated with this activity
which you will find in the Block 3 Companion. I

10.2 System messages


The second class of MIDI message is called a system message as it is
not channel dependent and should be received and acted upon by all
connected MIDI devices, whatever their channel designations.
Like channel messages, system messages are introduced by a status
byte, but unlike channel messages this byte does not contain any
channel information. There are three basic types of system messages –
common, real time and exclusive.

System common messages


System common messages are used to allow all the devices to agree on
some particular musical aspects. They are described below.
• Song position pointer – this message requires a data value consisting
of two data bytes which define the position in the current song that
the receiver should go to (e.g. ‘Let’s go to bar 20’). The song position
is defined as the number of beats since the beginning of the song,
where for the purposes of the MIDI specification, a beat is defined
as 6 MIDI timing clocks (see below). The two data values allow
any position within a song to be specified up to a maximum of
16 383 beats.

• Song select – this specifies which song or sequence is to be played
next. It requires one data value that specifies the song number in
the range 0 to 127. Note that the word ‘song’ is used here, and later
in the chapter, as a general term to cover any complete piece of
music, whether it actually be a song, or just a few bars of music, or
even a whole movement of a symphony.
• Tune request – this message is used to request the receiver to start
its internal tuning routine, if it has one. It was originally intended
for use by analogue synthesisers.

System real time messages


Real time messages allow the various devices to synchronise together.
For example, a synthesiser could get a drum machine not only to start
and stop at the same time, but also to keep in time with the synthesiser.
There are six real time system messages, and they all require just a
single status byte (no additional data bytes are required). In addition,
because they involve real time functions and thus must be acted upon
immediately, they can be sent at any time, even in the middle of other
MIDI messages.
• Timing clock – this message is sent to allow synchronisation
between the various MIDI devices. It is important to note that the
MIDI Timing clock is related to the current tempo (or speed) of the
music, and varies with it – it does not provide a constant measure
of time. This single-byte system message is sent at a rate of between
24 and 128 times within the time for a single crotchet beat at the
current tempo, although higher rates are now more commonly used.
This rate is often referred to as the number of pulses per quarter
note or ppq.
• Start – set the Song position pointer to 0 and start playing (i.e. start
at beginning of the song).
• Continue – resume playing from the current value of the Song
position pointer.
• Stop – stop playing the current song.
• Active sensing – this is an optional (and largely unused) message
that allows receivers to be informed that there is still a MIDI
transmission in progress. Once one of these messages has been
sent by the transmitter it must continue to send them at least every
300 ms whenever there is no other MIDI activity. It basically says
to the receivers “I’m still here, don’t go away”. If a receiver does
not receive this message within 300 ms and there is no other
MIDI activity during this period, then it will reset its parameters to
normal operation. However, this mode of operation is not activated
unless or until a first Active sensing message has been received.
• System reset – this message causes all the receivers to reset their
status to their power on conditions – which at a minimum is
usually channel 1 mode 1, local control on and all notes off. It is
designed to be used manually (i.e. under the direct control of the
operator and not as part of a MIDI song sequence) only when things
have gone very wrong.

System Exclusive messages


System Exclusive messages (commonly shortened to SysEx) are used
to transmit private data of any length to a particular device. These
messages are widely used for loading or backing up synthesiser patch
data (the parameters needed to set up a particular sound), or for
dumping or reloading a synthesiser’s complete set of data. Indeed
they are commonly used to duplicate patch data between compatible
instruments, or to back up a synthesiser’s data to a general-purpose
computer – particularly valuable if the synthesiser does not have its
own backup system such as some sort of removable disk.
Files of patch data for particular electronic instruments are now widely
available over the Internet, and the data they contain can easily be
downloaded into a computer and then transferred to a compatible
device using a MIDI SysEx message. This gives synthesiser users
access to a whole range of new sounds that might have taken them
many hours to set up if they had had to do it themselves.
SysEx messages are also used to configure MIDI processing devices such
as routers. Routers are devices that combine two or more MIDI streams, or
take a MIDI stream and divide it into its constituent channels – sending
the data for each MIDI channel to a different MIDI OUT connector.
A SysEx message consists of the system exclusive status byte followed
by a manufacturer’s identification data byte. The MIDI specification
lays down that each manufacturer wishing to use this MIDI feature
must register with the MMA and be given an identification (ID) number.
Of course there are now many more manufacturers/MIDI devices
registered with the MMA than the maximum number that one byte
allows, so as this became apparent, the specification was quickly
modified to incorporate additional IDs. This was done by specifying
that if the ID data byte is 0, there would be two more data bytes that
would contain the actual ID number. This allows for a further 16 384
possible ID numbers to be used.
After the ID data bytes, there can then be sent any number of data bytes
until the End of exclusive status byte (EOX) is sent, at which point the
receiving device returns to normal operation.
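
As an illustration of the overall shape of a SysEx message, here is a small Python sketch
(the function name and the example ID and payload values are my own, purely for
illustration) that assembles the byte sequence for either a single-byte manufacturer ID
or the extended three-byte form:

SYSEX_START = 240    # start of system exclusive status byte
EOX = 247            # end of system exclusive status byte

def sysex_message(manufacturer_id, data):
    """Assemble a SysEx message as a list of byte values.

    manufacturer_id is either a single ID byte (1-127) or, for the
    extended form, a sequence starting with 0 followed by two further
    ID bytes.  data is the manufacturer-specific payload (values 0-127).
    """
    if isinstance(manufacturer_id, (list, tuple)):
        ids = list(manufacturer_id)
    else:
        ids = [manufacturer_id]
    return [SYSEX_START] + ids + list(data) + [EOX]

# The ID and payload values below are arbitrary illustrations
print(sysex_message(65, [16, 22, 18]))
print(sysex_message((0, 32, 41), [1, 2, 3]))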

Extended System Exclusive messages


The popularity of the SysEx message as a general data transfer method
has led to a number of extensions to the SysEx specification, using
additional specific ID data values.
• An ID value of 125 indicates a non-commercial, unregistered ID
that can be used by anyone for experimentation or one-off
applications or products. If there is to be any commercial
exploitation of the application or device, then it must be registered
with the MMA and have its own ID number.
• An ID value of 126 indicates non-real time data. This means that the
data can be processed at leisure by the receiving device. The data
byte that follows this is interpreted as a SysEx channel that can
specify up to 127 different destinations for the data (with one further
special code to indicate that all channels should receive the data).

• An ID value of 127 indicates real time data. The format of this


message is identical to the non-real time message above except that
the data it contains must be acted on immediately.
Because the latter two messages above are not specific to any
manufacturer or instrument, they are often referred to as Universal
SysEx messages.
Further special values have now been allocated for more additions to
the original MIDI specification and these will be mentioned later in
this chapter.

10.3 Running status


Earlier I mentioned that the MIDI interconnection system allowed
some 500–700 messages to be transmitted every second, and that even
at this rate it is easy to start fully loading the system, particularly
when many channels are being used. Also, continuous controllers will
provide large numbers of MIDI messages as the control is varied, and
of course the faster they are varied the greater the number of messages
that need to be sent.
The original inventors of the MIDI system had the foresight to realise
that the message transmission speed might be a limiting factor for
complicated set-ups, and so they incorporated what is known as
running status into the specification to try to maximise the throughput
of messages by removing redundant information.
When running status is used, if there is a need to send a series of
MIDI messages that require the same status byte (e.g. a series of note
ons, or a series of pitch bend messages), then the status byte need only
be sent once at the beginning of the sequence.
For example, without running status, the following MIDI messages
would need to be sent in order to play a C major triad on middle C (the
C major triad consists of the notes C, E and G).
Status byte: ‘Note On’ on channel 1
Data byte: 60 (pitch C4)
Data byte: 64 (mid-range velocity)
Status byte: ‘Note On’ on channel 1
Data byte: 64 (pitch E4)
Data byte: 64 (mid-range velocity)
Status byte: ‘Note On’ on channel 1
Data byte: 67 (pitch G4)
Data byte: 64 (mid-range velocity)
Also the same number of MIDI bytes will be needed to turn off these
three notes when the keys are released, giving a total of 9 + 9 = 18 bytes.
However, since the three status bytes used to turn the notes on are
identical, running status can be used.
Using running status then, only one status byte is sent at the start of
the sequence to turn on the three notes, as below.

Status byte: ‘Note On’ on channel 1


Data byte: 60 (pitch C4)
Data byte: 64 (mid-range velocity)
Data byte: 64 (pitch E4)
Data byte: 64 (mid-range velocity)
Data byte: 67 (pitch G4)
Data byte: 64 (mid-range velocity)
Another single Note Off status byte is sent at the start of the key
release sequence, so this gives a total of 7 + 7 or 14 MIDI bytes overall
– a reduction of 4 bytes.
In this case, the number of MIDI bytes can be reduced even further by
switching notes off using a Note On message with a velocity of 0, as I
mentioned in my discussion of the Note On message earlier.
So, playing and releasing the triad chord could thus be achieved using
the following complete sequence of MIDI messages:

Status byte: ‘Note On’ on channel 1
Data byte: 60 (pitch C4)
Data byte: 64 (mid-range velocity)
Data byte: 64 (pitch E4)
Data byte: 64 (mid-range velocity)
Data byte: 67 (pitch G4)
Data byte: 64 (mid-range velocity)
Data byte: 60 (pitch C4)
Data byte: 0 (switch on with zero velocity – i.e. silence the note)
Data byte: 64 (pitch E4)
Data byte: 0 (switch on with zero velocity – i.e. silence the note)
Data byte: 67 (pitch G4)
Data byte: 0 (switch on with zero velocity – i.e. silence the note)

This is now 13 MIDI bytes, a further reduction of 1 byte.
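
The byte counts above can be checked with a short Python sketch (for illustration only;
the function name is my own). It encodes a list of Note On events for channel 1,
repeating the status byte for every event or, with running status, sending it only once;
the status byte value 144 used here is explained in Section 10.4 below.

NOTE_ON_CH1 = 144    # 'Note On' status byte for channel 1 (see Section 10.4)

def encode(events, running_status=False):
    """Encode (pitch, velocity) Note On events for channel 1 as MIDI bytes."""
    out = []
    for pitch, velocity in events:
        if not running_status or not out:    # first event always needs a status byte
            out.append(NOTE_ON_CH1)
        out.append(pitch)
        out.append(velocity)
    return out

# Play then silence a C major triad, switching notes off with velocity 0
triad = [(60, 64), (64, 64), (67, 64), (60, 0), (64, 0), (67, 0)]
print(len(encode(triad)))                         # 18 bytes without running status
print(len(encode(triad, running_status=True)))    # 13 bytes with running status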

As you can imagine, where there are large numbers of notes being
turned on and off, which is the situation when a player is performing a
piece of music, running status in conjunction with switching notes off
using a note on velocity of 0 can allow a whole piece of music to be
transmitted using just one note on status byte at the start. Even real
time system messages can be interleaved without the need for a new
status byte.

However, remember that this situation will only occur if only one
MIDI channel is being used and there is no other performance data
such as data from a sustain pedal or a pitch bend control. If a number
of MIDI channels are in use, or other performance data is introduced,
then a new status byte will have to be sent each time such a change
occurs. Running status is also particularly useful for continuous
controllers (e.g. pitch bend) where a great deal of data is generated
in a short time.


ACTIVITY 35 (COMPUTER) ....................................................................

In this activity you will use the MIDI Demonstrator program to
experiment with the use of running status. Carry out the steps
associated with this activity which you will find in the Block 3
Companion. I

10.4 Message coding


So far, I have talked about MIDI messages without complicating the
discussion by going into any details about how the various status and
data bytes are coded. In this section I want to outline the MIDI coding
scheme. Note, this section and some subsequent sections contain a
number of tables of MIDI status/data values. You are not expected to
learn any of these tables, but, given such tables, you will be expected to
be able to use the information they contain in an assessment question.
As you are aware, all MIDI messages are composed of one or more
bytes of information that are sent serially, bit by bit and byte by byte,
along the MIDI cable.
As you may recall from Chapter 6 in Block 1, a byte or an 8-bit binary
word has 2⁸ or 256 different patterns of 0 and 1, from eight 0s to eight
1s. If we treat these eight bits as having weightings of 2⁰, 2¹, 2², …, 2⁷,
then the 256 patterns of 1s and 0s represent the numbers in the range 0
to 255, where a byte with all 0s has a value of 0 and a byte with all 1s a
value of 255. Intermediate values are worked out by adding the weighting
of each bit that is a binary 1. Figure 24 shows the weightings of the bits
in an 8-bit binary word.
Figure 24 Weightings of the bits in an 8-bit binary word (from the most significant bit
(m.s.b.) to the least significant bit (l.s.b.): 2⁷, 2⁶, 2⁵, 2⁴, 2³, 2², 2¹, 2⁰, i.e.
128, 64, 32, 16, 8, 4, 2, 1)

ACTIVITY 36 (REVISION) .......................................................................

Determine the denary value of the following binary numbers (the left-
most bit is always the most significant bit).
(a) 0000 1010
(b) 0011 0000
(c) 1100 1001

Comment
If you had difficulty in answering this activity, you should revise
Section 5.5.2 of Chapter 6 in Block 1 before continuing. I

So, in the MIDI system how are these 256 denary values or MIDI codes
allocated? In fact the bits are used individually or in groups of two,
three or four as you will see, but in each case the resulting MIDI byte
can be represented by one number within the range 0 to 255.
The MIDI specification states that any byte that has its most significant
bit set to 1 (the left-most bit when written down, or the bit that is sent
last when it is transmitted over the serial connection) should be
interpreted as a MIDI status byte. In denary this means any code


between 128 and 255 inclusive. Conversely, any byte that has its most
significant bit as 0 is a data byte, and this means denary codes 0 to 127.
This of course immediately explains why data values for such
parameters as pitch and velocity have a range of between 0 and 127.
Also, in a status byte that has a channel designation, the four least
significant bits are specified to hold the MIDI channel number.
You should now also be able to see why the MIDI system has a
maximum of 16 channels since four bits have only 16 different
patterns of 0 and 1. Note that MIDI channels are designated 1 to 16, but
the equivalent binary values are 0 to 15, So for example any MIDI
status byte that refers to MIDI channel 1 will have all of its lowest four
bits set to 0, and any byte referring to channel 16 will have all four of
these bits set to 1. You should bear this in mind when working out
MIDI codes in the following discussion.

ACTIVITY 37 (SELF-ASSESSMENT) ...........................................................

Ignoring the upper 4 bits, which MIDI channel is each of the following
two MIDI status bytes referring to?
(a) 1011 1100
(b) 1001 0110
What makes these bytes MIDI status bytes as opposed to MIDI data
bytes? I

At this point it is convenient if I start to refer to the bits in a byte by
their positional weighting, so the least significant bit is bit 0 and the
most significant bit is bit 7 as illustrated in Figure 25.
Figure 25 Bit numbering for an 8-bit binary number (bit 7, the m.s.b., down to bit 0, the l.s.b.)

So, in any MIDI status byte bits 0 to 3 are used to specify the channel
and bit 7 is used to indicate a status byte. This leaves three bits
(bits 4–6), or up to 8 different combinations, in which to specify a
particular MIDI message. Table 2 relates the eight possible
states of these three bits to the MIDI message that is represented.

Table 2 MIDI messages for bits 4–6 of a status byte

Bit 6 Bit 5 Bit 4 MIDI message


0 0 0 Note Off
0 0 1 Note On
0 1 0 Polyphonic aftertouch
0 1 1 Control change and channel mode
1 0 0 Program change
1 0 1 Channel aftertouch
1 1 0 Pitch bend
1 1 1 System

When these bits are incorporated with the most significant bit and bits
0 to 3, a range of denary values for each MIDI message can be
determined, as shown in Table 3.
Table 3 MIDI message value ranges

MIDI message                        Value range
Note Off                            128–143
Note On                             144–159
Polyphonic aftertouch               160–175
Control change and Channel mode     176–191
Program change                      192–207
Channel aftertouch                  208–223
Pitch bend                          224–239
System                              240–255

Notice that each range covers 16 numbers, and of course this is where
the MIDI channel comes in. So a MIDI status byte of 144 will indicate
a Note On on MIDI channel 1, 145 a Note On on channel 2, and so on
up to 159 which will indicate a Note On on channel 16.
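
The coding rules above translate directly into a few lines of Python (an illustration
only; the names are my own): bit 7 marks a status byte, bits 4–6 select the message type
from Table 2, and bits 0–3 hold the channel number minus one (except for System messages,
where they are not a channel number).

MESSAGE_TYPES = ['Note Off', 'Note On', 'Polyphonic aftertouch',
                 'Control change and channel mode', 'Program change',
                 'Channel aftertouch', 'Pitch bend', 'System']

def decode_status(byte):
    """Interpret a MIDI status byte (128-255): message type and channel."""
    if byte < 128:
        raise ValueError('not a status byte: most significant bit is 0')
    message = MESSAGE_TYPES[(byte >> 4) & 0x07]   # bits 4-6 select the message
    if message == 'System':
        return message, None                      # bits 0-3 are not a channel here
    return message, (byte & 0x0F) + 1             # bits 0-3 hold channel - 1

print(decode_status(144))   # ('Note On', 1)
print(decode_status(159))   # ('Note On', 16)
print(decode_status(176))   # ('Control change and channel mode', 1)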

ACTIVITY 38 (SELF-ASSESSMENT) ...........................................................

(a) Work out the MIDI message and channel number represented by
each of the following MIDI status byte values.
(i) 195 (ii) 137
(b) What status byte value is required for the following MIDI messages?
(i) Pitch bend on channel 3
(ii) Channel aftertouch on channel 5 I

ACTIVITY 39 (EXPLORATORY) ................................................................

In Table 3 you may have noticed that the Control change and Channel
mode messages both use the same set of status values. How are the two
differentiated?

Comment
The difference occurs in the data byte that follows. Data values of
0–119 are used for Control change messages and values between 120
and 127 are used for Channel mode messages as shown in Table 4.
Remember that data bytes must have their most significant bit (bit 7)
set to 0, so the maximum data value is 127. I

Finally, you will remember that system messages do not have a


channel designation (see Section 10.2). Thus the 16 status values
between 240 and 255 do not select different MIDI channels, instead
they are used as follows:
240 Start of system exclusive message (SysEx)
241–246 System common messages
247 End of system exclusive message (EOX)
248–255 System real time messages
For completeness, Table 5 gives the complete set of byte values from 0
to 255 as detailed in the original MIDI specification.

Table 4 Interpretation of the data values for a Control change/Channel mode MIDI message
(status byte range 176–191)

Data value or range    MIDI message
0–101 Control change parameter
102–119 Undefined
120 All sounds off
121 Reset all controllers
122 Local on/off
123 All notes off
124 Omni mode off (and all notes off)
125 Omni mode on (and all notes off)
126 Mono mode on (and all notes off)
127 Poly mode on (and all notes off)

Table 5 Complete list of MIDI status byte values

Value or range    MIDI message
0–127 Used for data values
128–143 Note Off
144–159 Note On
160–175 Polyphonic aftertouch
176–191 Control change and channel mode
(see Table 4)
192–207 Program change
208–223 Channel aftertouch
224–239 Pitch bend
240 Start of system exclusive (SysEx)
241 Undefined
242 Song position pointer
243 Song select
244 Undefined
245 Undefined
246 Tune request
247 End of system exclusive (EOX)
248 Timing clock
249 Undefined
250 Start
251 Continue
252 Stop
253 Undefined
254 Active sensing
255 System reset

ACTIVITY 40 (COMPUTER) ....................................................................

Follow the steps associated with this activity which you will find in
the Block 3 Companion. These contain details on setting up the
course’s music recording and editing software for MIDI operation and
some simple introductory MIDI exercises using the program. I

11 MORE MIDI FEATURES

Having looked at the basic way in which MIDI operates, I want now to
introduce you to some of the major enhancements to the original
specification that have been introduced over the years. As you will see
some of these enhancements go well beyond the original intention of
MIDI.
Since the details of many of these enhancements are quite complicated,
I will not be looking at them in as much detail as I have done for the
basic MIDI specification. However, for illustration and reference only I
will sometimes include tables of the relevant byte values that are used.

11.1 Sample dump


You own a wavetable synthesiser which allows you to create
complex sounds and you’ve just spent hours fiddling around with
a sample waveform to create a new sound for a specific purpose. Of
course you will then want to make sure you do not lose the settings
for this sound. But your synthesiser doesn’t have an integral disk
drive and you are not really happy relying just on a RAM card.
What you’d really like to do is use your desktop computer as a
backup store.
More frustratingly, having spent hours fiddling around with buttons
on the synthesiser and trying to read the rather small display to
create your new sound, you find that the synthesiser manufacturer
has produced a program for your desktop computer that allows you
to edit waveforms for your synthesiser and so create new sounds
with ease. But how do you get the synthesiser settings to and from
your computer?

To cover these types of situations, the MIDI sample dump standard
was developed. This standard uses SysEx MIDI messages to send
wavetable data (in the form of actual sound samples from a sampled
sound) to another device over a MIDI connection. But do remember
that this transfer cannot be done at the sample rate, so MIDI sample
dump cannot be used for real-time transfer of sound data.
There are two modes of operation – with or without handshaking. The
without handshaking mode is a one-way data transfer where the
transmitter simply sends the data on its MIDI OUT connector and
hopes that the receiver gets it correctly. A more secure way of sending
the data, particularly if the receiver is a little slow, the MIDI lead is
long and prone to interference or there is other MIDI activity occurring
at the same time, is to use the with handshaking mode.
This mode requires a two-way data path between the transmitter and
the receiver, and only allows one device to receive the data. So, the
transmitter’s MIDI OUT connector must be connected to the receiver’s
MIDI IN connector, and to provide the return data path, the receiver’s
MIDI OUT connector is connected to the transmitter’s MIDI IN
connector as illustrated in Figure 26.
In both modes the data is split into separate 120-byte blocks called
packets, and the data transfer works as follows. The transmitter sends
a packet of data and waits up to 2 seconds for a response from the
receiver.

Figure 26 MIDI connections for a sample dump using the with handshaking mode (the
computer’s MIDI OUT feeds the synthesiser’s MIDI IN, and the synthesiser’s MIDI OUT feeds
the computer’s MIDI IN)

If no response has been received, the without handshaking
mode is assumed and the remaining data packets are sent one after the
other. If a response is received within 2 seconds, then the handshake
mode is used. In this mode, if the receiver receives a packet correctly,
it sends a message to the transmitter to this effect. If the data is not
received correctly, the receiver sends an error message to the
transmitter and the transmitter resends the packet of data. This
process continues until all the data has been successfully received.
Each packet contains an identification number so that the receiver can
ensure not only that it has received all of the packets, but also that they
have been re-assembled in the correct order.
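
A very schematic Python sketch of this handshaking logic is given below, for illustration
only: send_packet and wait_for_reply are stand-ins for real MIDI input/output routines,
and all of the names are my own.

def send_dump(packets, send_packet, wait_for_reply, timeout=2.0):
    """Very schematic sketch of the sample dump handshaking described above.

    packets is a list of data packets (up to 120 bytes each);
    send_packet(n, data) transmits packet n, and wait_for_reply(timeout)
    returns 'ACK', 'NAK' or None - both are stand-ins for real MIDI I/O.
    """
    send_packet(0, packets[0])
    reply = wait_for_reply(timeout)
    if reply is None:
        # No response within 2 s: assume the 'without handshaking' mode
        for n in range(1, len(packets)):
            send_packet(n, packets[n])
        return
    # 'With handshaking' mode: resend any packet that is not acknowledged
    n = 1 if reply == 'ACK' else 0
    while n < len(packets):
        send_packet(n, packets[n])
        if wait_for_reply(timeout) == 'ACK':
            n += 1          # packet received correctly, move on to the next
        # on 'NAK' the same packet is simply sent again on the next pass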
The sample dump system is implemented using a number of SysEx
messages as outlined in Box 8 which is given for reference only.
Remember that each of the messages is a self-contained SysEx message,
and in between these, other real-time MIDI sound messages can occur.

Box 8 Sample dump SysEx messages (for reference only)


The sample dump uses MIDI SysEx messages to implement a data transfer.
Since the data is not MIDI sound data and the receiver does not need to act on the
MIDI message straight away, a non-real time universal SysEx message is used.
The basic form of the message is:
1 start of SysEx status byte (code = 240)
2 non-real time universal message data byte (code = 126)
3 a data byte defining the SysEx channel (codes between 0 and 127)
– this enables different devices to have an ‘address’ for SysEx messages
4 the sample dump message identifier code (see Table 6)
5 the message data (if required)
6 an End of SysEx status byte (EOX) (code = 247)
The main sample dump message types are given in Table 6 (overpage). Note
that the loop point request/transmit message is used to save time when editing
a small part of a wavetable when using a computer or separate editing device.

Since a wavetable can contain a large number of data values, having to
continually reload the data for the whole waveform just to try out an edit to
a small part can be time-consuming. This feature allows just the part of the
waveform that needs to be edited to be transferred, and so reduces the amount
of data that is transferred.

Table 6 Sample dump message identifier bytes

Code value    Sample dump message    Purpose
1 Dump header Contains information about the form of the
wavetable data e.g. the number of sound
samples, the sample period, the number of bits
per sample
2 Data packet Contains 120 bytes of wavetable data (the last
data packet must be padded out to 120 bytes
with zero values if necessary)
3 Dump request A message from the receiver to the transmitter
asking the transmitter to start sending its
sample data
5 Loop point Allows a small part of a sample waveform to be
request/transmit transferred
12 Wait Pause the dump operation until a NAK, ACK or
Cancel message is received
13 Cancel Stop the transfer
14 NAK stands for not acknowledge i.e. a data packet
has not been received correctly
15 ACK stands for acknowledge i.e. a data packet has
been received correctly

Note also that some synthesisers and sound generators just use their
manufacturer’s ID (and sometimes a sub-ID) instead of the second and third
data bytes listed above. They do, however, use the same protocol for
transferring data, but the data itself may not be wavetable data but the device’s
own particular configuration and patch data.

11.2 General MIDI


OK, so you’ve developed some exciting new piano and string sounds
on your synthesiser, you’ve backed up the data for these sounds to
your computer using MIDI sample dump messages, and now you’ve
spent ages composing a new song using these sounds. You are
anxious to play your new creation to your friend, so you use a
small portable MIDI sequencer box (a device that can record and
play back MIDI messages) to record the MIDI message sequence for
your song which you then take over to your friend’s house. You
play the song from your sequencer into your friend’s synthesiser
and … out comes a trumpet and some weird bird noises!

What’s gone wrong? Well, as I mentioned earlier, and as you should now
be able to appreciate, there is nothing in the basic MIDI specification that
indicates what actual sound is heard when a particular sound or patch
is selected. So for example in my scenario above, the synthesiser on
which the song was composed may have had the new piano sound
stored in its patch 60 and the new string sound in patch 124 – because
they happened to be spare patches at the time – and thus the MIDI
codes stored in the sequencer would have allocated these patches to
the MIDI channels used for the notes of the song using program change
messages. However, the synthesiser used for playback had a trumpet
sound in patch 60 and the bird sounds in patch 124 (even though that
synthesiser may well have perfectly decent piano and string sounds
available on other patches).
Given this situation, and as manufacturers started to adopt standard
sets of patches, the MMA stepped in and defined a set of sounds called
General MIDI (GM). Whilst doing this, the MMA took the opportunity
also to include some other components in the GM specification to
ensure compatibility, as detailed for reference only in Box 9.

Box 9 General MIDI features (for reference only)


A GM-compliant device must have the following features/capabilities:
• a full set of the specified 128 different pitched sounds (or patches);
• a full set of the specified 47 unpitched percussion sounds;
• be able to respond to messages on all 16 MIDI channels simultaneously;
• a capability of at least 24-note polyphony (i.e. can play up to 24 notes at
the same time), and these should be able to be dynamically allocated
between the 16 MIDI channels as necessary;
• be able to respond to the velocity value in MIDI ‘Note On’ messages;
• be able to respond to MIDI ‘Channel pressure’ messages;
• be able to respond to the following MIDI controller messages: pitch
wheel (with a default pitch bend of +/– 2 semitones), channel volume,
pan, expression, sustain, reset all controllers and all notes off.

The GM specification lays down that MIDI channel 10 should be reserved
for unpitched percussion sounds and the other channels (1–9 and 11–16)
for pitched sounds. For reference only, Appendices 1 and 2 give complete
lists of the sets of pitched and unpitched General MIDI sounds.
The pitched sounds are arranged in 16 ‘families’ of instruments, each
containing 8 different sounds. To enable a number of the unpitched
percussion sounds to be sounded simultaneously, these are selected
using particular ‘Note On’ ‘pitch’ values rather than different patches –
for example, a MIDI ‘Note On’ message on channel 10 with a pitch
value of 60 (normally middle C) should produce a hi bongo sound.
Note that only the names of the sounds are defined, how each one
actually sounds is up to the manufacturer. So, the sounds from an
electronic keyboard at the low-cost end of the market may sound only
vaguely like the actual instruments they are supposed to represent,
whereas a more expensive device may reproduce the sounds much
more closely.
As I said earlier, do not get confused between patches and channels.
A patch is simply a particular sound, and any patch can be assigned to
any MIDI channel (with the above proviso concerning percussion
sounds on channel 10 when using General MIDI). Remember also that
when patches are selected using the MIDI ‘Program change’ message,
the actual message must contain a data byte value of one less than the
required patch number since the patches are numbered from 1 to 128
whereas the program change data byte has a range from 0 to 127.
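
This off-by-one relationship is trivial but a frequent source of confusion, so here it is
as a tiny Python helper (for illustration only; the name is my own):

def program_change_data(patch_number):
    """Convert a patch number (1-128) to the Program change data byte (0-127)."""
    return patch_number - 1

print(program_change_data(1))     # 0   - the first patch
print(program_change_data(128))   # 127 - the last patch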
Many synthesisers contain additional banks of sound patches beyond
the basic 128 GM set. To cater for this, there is an additional SysEx
message to switch GM sounds on or off. There is also an additional
master volume SysEx message which controls the overall volume – not
just the volume of the sounds from one MIDI channel.

11.2.1 General MIDI 2, MIDI Lite and SPMIDI


Although General MIDI has greatly helped the music industry in
obtaining some sort of compatibility, some manufacturers felt there
needed to be further functionality embodied in the GM specification to
cater for the additional operational capability that advances in
technology had enabled the manufacturers to incorporate in their devices.
Hence the MMA produced a General MIDI 2 (GM2) specification which,
as well as being fully compatible with the original GM specification,
specifies that a GM2-compliant device must:
• have a minimum of 32-note polyphony;
• be able to produce up to 16 music patches and 2 percussion kit
sounds simultaneously;
• respond to a number of additional control change messages;
• respond to a number of new SysEx messages which concern tuning,
reverberation and chorus effects and octave scaling (transposition).
In parallel with the development of more sophisticated synthesisers,
mobile phone handsets were also becoming ever more sophisticated,
and simple music synthesisers were starting to be incorporated in
handsets to produce fancy ring tones. In order to have a compatible
system to download ring tones (or should this be called ring tunes or
ring music?) into a mobile phone handset, the MMA has defined two
simplified versions of GM called General MIDI Lite (GML) and
Scaleable Polyphony General MIDI (SPMIDI or SP-GM). In
essence, both of these define a subset of the full GM specification
and with a reduced polyphony capability. In the case of GML, there
is a fixed polyphony requirement of 16 simultaneous notes,
whereas SPMIDI defines a variable number of notes. A novel feature
of SPMIDI is that MIDI channels can be allocated a priority, to
enable less-capable mobile phones to ignore some of the lower
priority channels. Assuming the ring ‘tune’ and basic harmony are
allocated high priority channels, this means that the less-capable
device will still play an acceptable version of the tune even if it
cannot respond to all the MIDI channels in the ring tone.
At the time of writing (2004), this is an emerging application for MIDI,
and it is likely that a number of other performance levels and
variations will be introduced in the future.

ACTIVITY 41 (COMPUTER) ....................................................................

Follow the steps associated with this activity which you will find in
the Block 3 Companion. These contain some simple experiments with
General MIDI sounds. I

11.3 MIDI time code


Right, so you’ve now realised that your friend’s synthesiser uses
General MIDI. So you go back to the song on your own synthesiser
and re-allocate the piano sound to patch 2 and the string sound to
patch 47, which you think are the General MIDI sound descriptions
that are the nearest match for your new piano and string sounds.
You then play the song on your friend’s synthesiser again. Well, it
doesn’t sound quite as good as it did on your own synthesiser using
your ‘bespoke’ piano and string sounds. Nevertheless your friend
says “Brilliant. Can I add a really cool bass part?”. Knowing your
friend is an excellent bass player, you readily agree. So you use
MIDI to transfer your song to your friend’s sequencer, and a few
days later a CD of the bass part arrives. You start the MIDI sequence
for your song and the CD recording of the bass part playing together
and then sit back to enjoy your friend’s virtuoso performance. But
wait a bit, why is the bass part getting behind the beat, perhaps
your friend is not as good a player as you thought?!

Well I am sure you have guessed what is going on here – there is no
synchronisation between the MIDI playback and the CD player.

11.3.1 The problem of synchronisation


The problem of synchronising sounds, or sounds and moving pictures,
has been around since long before the advent of MIDI – indeed, long before the
digital revolution. Synchronisation first became a problem when trying to
synchronise a film to its soundtrack where the soundtrack is recorded on
a separate magnetic tape. Synchronisation was also a problem when two
or more sound recorders needed to be synchronised – for example when a
mixdown required 32 tracks of individual parts stored on two 16-track
tape recorders – and became even more relevant with the advent of video.
Synchronisation therefore may be needed whenever there are two or
more independent devices that are contributing to the overall sound
(and/or vision). To achieve synchronisation, there must always be one
and just one device that is acting as a master time reference and to
which all the other devices are synchronised.
Probably the most common method of synchronisation (but not the
only one) that has been used since the very early days of television is
the SMPTE time code, and this is still the standard method of locking
together (synchronising) professional audio and video devices.

11.3.2 SMPTE time code


The SMPTE time code (often pronounced ‘Simptee’) was developed by
the Society of Motion Picture and Television Engineers as a method of
defining the position of any particular part of an audio or video recording
in relation to the time it occurs from the start of the recording (see
Box 10). You should remember that this code was defined in the days
of analogue recordings using magnetic tape, and because of its wide-
spread use and acceptance, its use has continued into the digital world,
even though in the digital domain there are much better methods of
achieving synchronisation. Another important point to note is that the
code is not just a simple clock tick. Although it does occur regularly, it
contains an absolute time reference, and so use of it does not rely on
measuring the time interval between each code.

Box 10 SMPTE time code


The SMPTE code relates to the scanning rate of a video picture. Each complete
scan of the screen comprises one frame (the word originally taken from the
name for a single picture from a motion picture film), so the basic time code
unit is a frame. A complete SMPTE time code therefore contains the number of
hours, minutes, seconds and frames that have occurred since the start. This
means that, for video, every individual frame has a unique time code stamp.
Now this is where problems start to occur. The original frame rate of the
monochrome television system in America was 30 frames per second (as it was
locked to the mains frequency of 60 Hz), however when they moved to colour
transmissions, they separated the frame rate from the mains frequency and
altered the frame rate to 29.97 frames per second (in order to prevent patterning
occurring from the embedded colour information). Meanwhile over in the UK,
the frame rate was standardised at 25 frames per second, and to cap it all,
films in the cinema had long since used a rate of 24 frames per second.
In each case, synchronisation with other audio and video devices was required,
and so a number of different SMPTE frame rates emerged. These are known as
30, 30 drop, 25 or 24 frames per second (fps). These clearly correspond to the
four different frame rates mentioned above, but the 30 drop rate needs a little
explanation.
The 30 drop rate is a method of representing the 29.97 frames per second rate of American colour television by missing out 108 frame numbers over the course of an hour (this is done by missing out 2 counts in every minute except in the minutes 0, 10, 20, 30, 40 and 50). What this means is that if the frames are
counted and the dropped frames omitted, the minutes and hours values will be
correct in terms of absolute time. Even more confusingly, in video production,
29.97 non-drop code is commonly used, i.e. no frame counts are missed. This
means that every frame is counted, so that the frame count will not represent
actual time, but it does make calculations easier during the video editing process.
All of the above means that it is important to know which standard of SMPTE
code is being used.
In addition to the time code itself, the SMPTE code has provision for some
additional information called user bits which can be used to record such
information as the date of recording, the take number etc. A full SMPTE code
requires 80 bits of information.
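To make the drop-frame arithmetic concrete, here is a minimal sketch in Python (which is not part of the course software; the function name, constants and output format are my own) that converts a raw frame count into a 30 drop-frame time code of the kind described above.

def frames_to_drop_frame_timecode(frame_count):
    """Convert a raw frame count into an SMPTE 30 drop-frame time code.

    Frame numbers 0 and 1 are skipped at the start of every minute except
    minutes 0, 10, 20, 30, 40 and 50 (108 skipped numbers per hour)."""
    frames_per_10_minutes = 17982   # ten minutes contain 10*1800 - 9*2 counted numbers
    frames_per_drop_minute = 1798   # a 'drop' minute contains 1800 - 2 counted numbers

    tens, rest = divmod(frame_count, frames_per_10_minutes)
    adjusted = frame_count + 18 * tens
    if rest > 1:
        adjusted += 2 * ((rest - 2) // frames_per_drop_minute)

    frames = adjusted % 30
    seconds = (adjusted // 30) % 60
    minutes = (adjusted // 1800) % 60
    hours = (adjusted // 108000) % 24
    return f'{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}'

# The frame after time code 00:00:59;29 is numbered 00:01:00;02
print(frames_to_drop_frame_timecode(1800))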

How is the SMPTE code used in audio? Clearly, when the code is used it must in some way be locked to an audio recording so that from then on the code always represents the absolute time interval from the start of the material. If the code and the audio material were to get out of lock, the code would become useless as a means of identifying individual parts of the recording. This locking means that on replay, not only can the replay speed be checked for accuracy, but where the replay needs to be synchronised to another sound, its speed can be dynamically adjusted so that the time codes keep in step with the master time code generator.
The problem here is that audio is continuous whereas video and film are discrete, i.e. composed of individual frames. Sound does not have any such discrete ‘frames’. The answer is simply to ‘mark’ the audio signal at the regular SMPTE frame rate intervals.
So, for an original audio recording, a separate audio track is used to
store the SMPTE time code. This can be done before, during or after

the actual audio recording is made and is a process known as striping.


To do this with an analogue device, the time code is converted into a
signal within the audio range, and can therefore be treated just as
another audio track. Once striping has been done, the time code and
associated audio are permanently locked together, so if the tape
stretches or shrinks, then it affects the audio and the time code
equally. Striping can be used for digital devices, but usually there is a
special facility for recording the time code. Therefore, when a number
of recordings have to be synchronised and mixed together, the master
device provides a master SMPTE code, and all the slave devices adjust
their playback speeds so that their own SMPTE codes keep in step
with the master.

11.3.3 MIDI Time Code


When MIDI started to become popular, a need arose to synchronise devices in a MIDI chain, and as SMPTE time code was already in widespread use, methods were sought to enable MIDI messages to carry SMPTE time code information.
As I mentioned in Section 10.2, the original MIDI specification did
incorporate some timing facilities in terms of the Song pointer and the
Timing clock messages. However, both of these are related to the music
and not to absolute time.

ACTIVITY 42 (REVISION) .......................................................................

How are the MIDI Song pointer and Timing clock related to the music
being played? (You may need to refer to the information given on system
real-time MIDI messages in Section 10.2 to answer this question.) I

It is possible to use the MIDI clock/song pointer for synchronisation, but


it is not easy to combine these with SMPTE time code in a set-up where
some devices incorporate the former method and others use the latter.
So, to provide compatibility with SMPTE, the MIDI specification was
extended to include absolute time-referenced MIDI messages called
MIDI Time Code (MTC).
There are five components of MTC which I will briefly describe below.

Quarter frame messages


An SMPTE frame occurs between 24 and 30 times a second (depending
on the frame rate as explained in Box 10). To be able to achieve the same
positional accuracy as that embodied in this code, a MIDI message
containing the full 80-bit SMPTE code (hours, minutes, seconds,
frames and user bits) must be sent at the same rate. However, a real
time SysEx MIDI message containing this amount of information sent
at the required rate would take up too much time and thus disrupt the
actual MIDI music messages. So, rather than do this, the timing data in
the SMPTE code is divided up into a number of very short MIDI
messages that are sent at a rate of four times the SMPTE frame rate
(between 96 and 120 times each second depending on the frame rate).
Even at this rate, it takes 8 MIDI messages or two SMPTE frames to
send all the information in one SMPTE code. You may think that this halves the accuracy of MTC compared with SMPTE. However, because MTC messages are sent at four times the SMPTE frame rate, the frame counter in the MIDI device can be incremented after every four MTC messages, even though the full timing code has not yet been received.
In order to achieve very short MIDI messages, rather than using a real time SysEx message, one of the previously undefined MIDI status bytes (code 241 – see Table 5) together with a single data byte is used.
The details of the interpretation of the data byte are quite complicated
and beyond the scope of this course.
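For the curious, the sketch below (Python; it goes a little beyond what you need for this course, and the helper name is my own) shows how the eight quarter frame messages for one SMPTE time could be assembled. Each message is the status byte 241 (F1 in hexadecimal) followed by a single data byte whose top three bits say which piece of the time code is being carried and whose bottom four bits carry that piece.

def quarter_frame_messages(hours, minutes, seconds, frames, rate_code=3):
    """Build the eight MTC quarter frame messages for one SMPTE time.

    rate_code: 0 = 24 fps, 1 = 25 fps, 2 = 30 drop, 3 = 30 fps."""
    pieces = [
        frames & 0x0F,                             # 0: frames, low nibble
        (frames >> 4) & 0x01,                      # 1: frames, high bit
        seconds & 0x0F,                            # 2: seconds, low nibble
        (seconds >> 4) & 0x03,                     # 3: seconds, high bits
        minutes & 0x0F,                            # 4: minutes, low nibble
        (minutes >> 4) & 0x03,                     # 5: minutes, high bits
        hours & 0x0F,                              # 6: hours, low nibble
        ((hours >> 4) & 0x01) | (rate_code << 1),  # 7: hours high bit plus frame rate
    ]
    return [bytes([0xF1, (piece_number << 4) | value])
            for piece_number, value in enumerate(pieces)]

# Eight two-byte messages, so a receiver only has the complete time code
# after two SMPTE frames' worth of quarter frame messages.
for message in quarter_frame_messages(1, 23, 45, 10):
    print(message.hex())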

Full messages
For cueing purposes, it is convenient to be able to send a complete
SMPTE time code so that a slave device can move directly to a
particular position in a song.
To cater for this, MTC defines a universal real time SysEx sequence
that contains the complete SMPTE time code. Also included in this
message is a SysEx channel number so that individual slaves can be
addressed. One of these channels is reserved to indicate that all slaves
in the MIDI chain should respond.
When this message is received, the only action that a slave device
should take is to move to the defined place in the current song. Playing
should not start until a MIDI start message is received, or MTC quarter
frame messages start.
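As a rough sketch of what such a full message looks like (Python again, for illustration only; the two sub-identifier values 1 and 1 are those commonly documented for an MTC full message, and the frame rate coding matches the quarter frame sketch above):

def mtc_full_message(hours, minutes, seconds, frames, rate_code=3, channel=0x7F):
    """Build an MTC 'full message' universal real time SysEx.

    channel 0x7F (127) is the value reserved for 'all slaves should respond'."""
    return bytes([
        0xF0,                      # start of SysEx
        0x7F,                      # universal real time SysEx
        channel,                   # SysEx channel / device identification
        0x01, 0x01,                # sub-identifiers: MIDI time code, full message
        (rate_code << 5) | hours,  # hours byte also carries the frame rate code
        minutes, seconds, frames,
        0xF7,                      # end of exclusive (EOX)
    ])

# Ask every slave to locate to 01:23:45 frame 10 (30 fps); playing should not
# start until a MIDI start message or quarter frame messages follow.
print(mtc_full_message(1, 23, 45, 10).hex())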

User bits
As mentioned earlier, the SMPTE time code contains some user bits
that can be used for date/take details etc. Since this data tends not to
vary with each individual time code, MTC does not code it with the
individual quarter frame or full messages, but provides another SysEx
message for this purpose.

Notation information
Sometimes a device needs specific musical notation information; this is particularly the case where a musician needs to be informed of the number of beats in a bar and when each bar starts. So MTC has another
SysEx message that allows the master device to send details about the
current time signature and also a bar marker that indicates the start of
a bar. Such information allows a slave device to produce a click or
illuminate a set of lights in sequence and in time with the MIDI
messages so that a musician can play in time.

Cueing messages
Finally, MTC provides a further range of SysEx messages that can
inform the slave devices of a number of different ‘events’, and when
these events should be carried out. There are messages that set up
these ‘events’, and create a list of actions that should be carried out by
the slave as the song is played. These messages can be generated from
the master device’s edit list and used to inform a slave device of the
editing processes the slave needs to carry out as the song is played.

As well as a number of parameters that can be set up with MTC messages, there is, importantly, a message that gives a slave device an SMPTE offset. This offset provides a ‘time code shift’ value used to
synchronise the starts of the various music elements. For example, it
is unlikely that the SMPTE time code will be exactly 0 (or the same
value) in all of the musical elements of a piece. Rather than re-striping
all these elements (see Section 11.3.2 above) so that their SMPTE time
codes are set to exactly 0 at the start, an individual offset can be set up
for each one. The slave simply adds this offset to its own SMPTE code
when determining when to execute actions stored in its event list.
As you can see from the above brief discussion, the whole process of
synchronisation can become rather involved, and there are many
factors to be considered when deciding on a configuration that will
provide the simplest and best means of synchronisation.
Of course, these synchronisation problems are mainly confined to
studios and professional audio engineers. In the desktop sound
environment, it is likely that a desktop computer or audio sound
processor will be the central device, and if all the elements of a
musical performance are already stored as sound or MIDI files, then
the computer may be the only device involved and will take care of
synchronisation by itself. So in this situation the user does not have to
get involved in the nitty gritty of MTC or SMPTE codes and just needs
to set up an edit list within the computer’s sound processing
programme, press play and the computer does the rest!

11.4 Standard MIDI files


Oh dear, although your MIDI sequencer can generate MTC, your CD
player does not have a MIDI interface at all. So what do you do to
synchronise the bass part to your song? Well, wouldn’t it be nice if
you could use your desktop computer since you know you can get
a MIDI interface for it and it has got a CD drive and can play audio
CD tracks. But what about the MIDI sequence? This is a real time
sequence of MIDI messages, surely they can’t be transferred to the
computer and stored there for later execution … or can they?

Enter Standard MIDI files (SMF). These are special computer files that not only store the MIDI messages but also retain the timing information that is needed to enable each message to be sent at the
required time. On a desktop computer, MIDI files usually have the
extension .MID or .SMF.

Basic format
SMF uses the Interchange File Format (IFF) that is used for storing
digital sound samples using the AIFF and RIFF WAVE formats.
These were introduced in Chapter 1 of this block.
As with the AIFF and RIFF WAVE formats, the basic building block of
an SMF is the chunk which, as explained in Chapter 1, allows other
types of data (e.g. digital sound samples) to be included in a file of
MIDI data and also caters for future expansion by allowing new chunk
types to be specified.

As you may recall from Chapter 1, every chunk contains three sections:
• the chunk identification (a four-character word);
• the chunk size in terms of the number of bytes of data that follow;
• the actual chunk data (which sometimes starts with a chunk type).

Chunk types
For files containing MIDI data, there are two defined chunk types –
header (chunk identification ‘MThd’) and track (chunk identification
‘MTrk’). The file must start with a header chunk and this is followed
by one or more track chunks.
The interpretation of the word ‘track’ in this context needs some
explanation. A track describes any set of MIDI messages held in a
single track chunk. It could apply to the MIDI messages for all the
instruments, channels and other information used in a song (i.e. there
is only one track chunk), it could comprise the MIDI messages for just
one or a small number of MIDI channels (in which case there would be
one track chunk for each channel or channel set used), or it might
comprise a number of different songs or other MIDI data that are
conveniently stored in the same file – or any combination of these.
Box 11 gives some brief details about the information contained in the header chunk that must be included at the start of an SMF.

Box 11 The SMF header chunk


The data in the header chunk contains details about the format of the MIDI
data contained in the track chunk(s) that follow (often referred to as the
mode or type), the number of tracks stored in the file and a specification
for the unit of time that is to be used when the MIDI messages are replayed.
The mode and track number specifications indicate how the tracks are to be
used, and how many track chunks the file contains:
• mode 0 indicates that there is only one track and all the MIDI messages
are held in one track chunk;
• mode 1 indicates there are a number of tracks (more than one track chunk),
but they are to be played simultaneously (this could be used where each
track contains the messages for only one MIDI channel);
• mode 2 indicates there are a number of independent tracks (again more
than one track chunk) but each should be played independently from the
others.
The specification for the unit of time gives details on how the timing
information contained in the track chunk(s) is to be interpreted. There are
two types – a timing based on the length of the musical crotchet (quarter
note) or an absolute time in relation to SMPTE frames. Thus the value
given defines the number of time units (or ‘ticks’) that occur between
either consecutive crotchets or between consecutive SMPTE time frames.
In the SMPTE case, the SMPTE frame type is also included (see Section
11.3.2 on the SMPTE code). So in the former case, the timing information
varies with the musical tempo, and tempo change information has to be
included in the track chunks whenever the tempo varies, whereas in the
latter case the time unit is fixed to a real time interval and is independent
of tempo.
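A minimal sketch of reading this header chunk in Python (the variable names are my own, and ‘example.mid’ is just a placeholder file name) shows how the three sections of the chunk fit together:

import struct

def read_smf_header(midi_bytes):
    """Read the MThd chunk at the start of a Standard MIDI File."""
    identification, length = struct.unpack('>4sI', midi_bytes[:8])
    if identification != b'MThd':
        raise ValueError('not a Standard MIDI File')
    mode, number_of_tracks, division = struct.unpack('>HHH', midi_bytes[8:14])

    if division & 0x8000:
        # SMPTE-based timing: a frame rate and the number of 'ticks' per frame
        frames_per_second = 256 - (division >> 8)
        ticks_per_frame = division & 0x00FF
        timing = ('SMPTE', frames_per_second, ticks_per_frame)
    else:
        # musical timing: the number of 'ticks' per crotchet (quarter note)
        timing = ('crotchet', division)
    return mode, number_of_tracks, timing

with open('example.mid', 'rb') as midi_file:
    print(read_smf_header(midi_file.read()))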

A track chunk contains actual MIDI data, so how is the timing information stored so that on replay the MIDI messages can be sent out at the correct time? The file contains timing information in the form of the time interval that is to elapse between the execution of one MIDI message (known as an event) and the next. This
time interval is called the delta time, and represents the relative time
interval between MIDI events, not the absolute time from the start of
the song. For events that need to occur together (e.g. playing a number
of notes together) the delta time is 0, but for events that occur after a
time interval, the delta time is the number of timing units (‘ticks’) that
are to elapse before the next MIDI event should be executed – the
timing unit being that specified in the header chunk (or subsequently
modified by tempo changes in the case of the crotchet-related timing
specification) – see Box 11.
Running status (see Section 10.3) can (and should) be used between
events where possible to reduce the amount of data to a minimum.
Box 12 gives an outline of how the MIDI data is stored in a track chunk
together with its timing information – note particularly the comments
about the variable length specification for the delta time and the
omission of any MIDI message data length or terminator.

Box 12 The SMF track chunk


The data that is contained in a track chunk is divided into a number of individual
events each of which has the basic form:
• delta time – the time interval that should be allowed before executing the
event following;
• the event data (a MIDI message or other event that should occur at the
specified delta time).
An interesting point to note about the delta time specification (and also the length
specification at the start of a chunk and a number of other length specifications)
is that it uses a variable number of data bytes. This is done because delta (and
chunk length) values can vary widely from very small numbers, which may only
require one byte to specify, to very large numbers which may require many bytes.
Thus, rather than using a fixed number of bytes, which is wasteful of storage
space for small values and puts a limit on the maximum value, only the minimum
number of bytes necessary to specify the length is used. This is done by using the
most significant bit of each data byte to indicate if there are further bytes to
come or if it is the last (or only) of the bytes used to define the value (1 for a
continuation, 0 for the last or only byte). This means that only 7 bits can be used
in each byte to specify the actual value. So one byte can only define values between 0 and 127 (2⁷ – 1), two bytes values between 0 and 16 383 (2¹⁴ – 1), and so on.
Notice how important it is to specify the delta time using this variable number
of bytes as it would be hugely wasteful of storage space and increase file sizes
substantially if every event (every individual MIDI message) had to be introduced
by a delta time composed of many bytes.
The other point to notice about this chunk format is that it contains no event
length or terminator (i.e. some special indication that the event data has ended
and the next byte is part of the next event’s delta time). The reason for this is
that all MIDI messages have an implicit length. For example, a ‘note on’ message
always uses 2 or 3 bytes – three bytes if there is a status byte (in which case the
first byte will have its most significant bit set), or two bytes if running status is
used. Therefore there is no need to add additional superfluous information
that will result in larger file sizes.
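The variable-length scheme is easy to express in code. The following Python sketch (illustrative only) encodes and decodes a value using seven data bits per byte, with the most significant bit acting as the ‘more bytes follow’ flag described in Box 12.

def encode_variable_length(value):
    """Encode a delta time (or chunk length) as a variable number of bytes."""
    encoded = [value & 0x7F]                      # last byte: most significant bit clear
    value >>= 7
    while value:
        encoded.append((value & 0x7F) | 0x80)     # continuation byte: bit set
        value >>= 7
    return bytes(reversed(encoded))

def decode_variable_length(data, offset=0):
    """Decode a variable-length value; return it and the offset of the next byte."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                       # most significant bit clear: finished
            return value, offset

print(encode_variable_length(127).hex())                # 7f    (one byte)
print(encode_variable_length(128).hex())                # 8100  (two bytes)
print(decode_variable_length(bytes.fromhex('8100')))    # (128, 2)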

ACTIVITY 43 (SELF-ASSESSMENT) ...........................................................

Why are SMF file sizes likely to be much larger if a fixed number of
bytes is used to specify the delta time? I

SysEx and Meta Events


SysEx messages can be included in the track chunk data just as another
event, but so that they can be split up between events, there is a length
specification after the SysEx status byte (code 240) giving the number
of bytes in the SysEx event before the next delta time specification.
The end of the SysEx message, in whichever event it occurs, is indicated by the normal end of exclusive (EOX) status byte value (247).
The length is given as a variable number of bytes just like the delta
time.
As well as SysEx messages, there is a special set of events called
meta events that allow additional information about the file and its
data to be included. Box 13 gives brief details about how these meta
events are coded, and what sort of information they can contain.
Many of these meta events you will recognise from earlier as
pertaining either to MIDI time code (MTC) and other MIDI
enhancements, or to creator/copyright details about the file itself.
However, an interesting meta event is the lyric, which contains a word or syllable that may be associated with a particular MIDI note. This allows a file to contain ‘karaoke’ data so that, whilst playing the MIDI file, a suitably equipped device can also display the song lyrics at the correct points.
Note that a device reading an SMF does not have to be able to recognise
all the meta events. Like chunks that it does not recognise, meta events
that are unfamiliar can just be ignored. This also allows for future
backwards compatible expansion of the list of meta events.
As can be seen from the discussion above, the SMF has the ability
to store not only basic MIDI messages and the associated timing
information, but also all the other more recent additions to the
specification. The use of chunks also means that it is possible to
interleave MIDI data with other types of data such as digital sound
samples, and devices reading the file simply ignore the parts that
they are unable to decode.

ACTIVITY 44 (COMPUTER) ....................................................................

This is a short activity to look at the contents of a Standard MIDI File,


just as you did in Chapter 1 of this block with AIFF and RIFF WAVE
sound files. You will find the steps for this activity in the
Block 3 Companion. I

Box 13 Meta events


If the status byte after the delta time at the start of an SMF event has a value of
255 (which in the original MIDI specification indicates a system reset), then this is
an indication that a special event follows called a meta event. A meta event
contains textual and other information about the MIDI file and/or the song that is
playing. There are a number of different meta events, and the particular one is
defined by the value of the byte that follows the 255 status byte. This is followed
by the data length (in variable format), and finally the meta event data.
For reference only, Table 7 contains a list of the currently used meta events
together with their event identification value.

Table 7 SMF meta events

Event identification value – Meta event – Notes
0 – Sequence number – The sequence number used when a song is divided into a number of ‘cued’ track chunks
1 – Text – Any undefined textual information
2 – Copyright – Copyright details
3 – Sequence/track name – Textual information about the name of the sequence or track
4 – Instrument – The name of the instrument that should be used to play the MIDI track
5 – Lyric – A word or syllable that is associated with the lyrics of a song at a particular point
6 – Marker – Textual information marking a particular point
7 – Cue point – Textual information defining a particular cue point
8 – Program name – The name of the patch that should be used to play the MIDI track
9 – Device name – Textual information about where the MIDI data should be routed where a device contains more than one MIDI output
32 – MIDI channel prefix – Associates all the following meta and SysEx events with a particular channel
47 – End of track – This is the only non-optional meta event and it must occur right at the end of an SMF track
81 – Tempo – Indicates a tempo change and defines the new tempo (i.e. redefines how long each crotchet, and hence each ‘tick’, lasts)
84 – SMPTE offset – Defines the absolute SMPTE time code that occurs at the start of the track (allows an SMPTE offset to be incorporated as described earlier)
88 – Time signature – Defines a new time signature
89 – Key signature – Defines a new key signature
127 – Proprietary event – Can be used to store any data specific to the file or the application or the manufacturer (the SMF equivalent of a MIDI SysEx message)
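To show how these pieces fit together, here is a small Python sketch (illustrative only) that reads a single meta event and interprets a tempo event; the three data bytes of a tempo event conventionally give the number of microseconds per crotchet, which is how ‘redefining the tempo’ is expressed in the file.

def read_meta_event(data, offset):
    """Read one meta event starting at 'offset' (the 255 status byte).

    Returns the event identification value, its data and the offset of the next byte."""
    assert data[offset] == 0xFF                  # meta events are introduced by 255
    event_type = data[offset + 1]
    offset += 2
    length = 0
    while True:                                  # the length uses the variable-length format
        byte = data[offset]
        offset += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:
            break
    return event_type, data[offset:offset + length], offset + length

# A tempo meta event (identification value 81, i.e. 0x51) with three data bytes
example = bytes([0xFF, 0x51, 0x03, 0x07, 0xA1, 0x20])    # 500 000 microseconds per crotchet
event_type, event_data, _ = read_meta_event(example, 0)
if event_type == 0x51:
    microseconds_per_crotchet = int.from_bytes(event_data, 'big')
    print(60_000_000 // microseconds_per_crotchet, 'crotchets per minute')   # prints 120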

11.5 MIDI machine control and MIDI show control


Now you are really getting into this MIDI thing. You have been
able to use your computer to synchronise the bass part your friend
recorded with your original song by transferring the MIDI sequence
to your computer and storing it as an SMF. Meanwhile, whilst
surfing the Internet you have found a huge array of MIDI files just waiting to be downloaded and played, and this gives you an idea for making a bit of extra cash. Why don’t you go into the karaoke business with your computer and synthesiser? You use your computer’s monitor to display the song lyrics to the singer whilst your synthesiser plays the backing track. Great! So off you go and buy some special MIDI files that contain the lyrics and you even set up a light show to improve the atmosphere at your ‘karaoke nights’.
The lights are OK, but wouldn’t it be great if they altered in step
with the music. Wait a bit, what’s this MIDI interface doing on
your light control box?

Well I’m sure you guessed it, MIDI has now branched out well beyond
the vision of the original designers into a more general control system
that can be used to control not only sound devices, but also lights,
staging and effects such as smoke machines etc.
MIDI machine control (MMC) was the first step in this direction, and it used MIDI SysEx messages specifically to control the then emerging hard disk digital recorders (see Box 14).

Box 14 MIDI machine control


MIDI machine control (MMC) is designed to control hard disk audio recorders
and similar audio equipment via MIDI SysEx messages. Commands such as stop, play, rewind, etc. are specified, as well as some special MIDI messages such as Goto, which details a particular SMPTE time that the audio equipment
should jump to. There is also an identity request message that gets the audio
equipment to identify itself via another SysEx MIDI message in the reverse
direction.
The general format of an MMC message is:
• Start of system exclusive (SysEx, value 240)
• Universal real time SysEx (value 127)
• Device identification or SysEx channel (a value in the range 0 to 126, with a value of 127 reserved for ‘all devices to respond’)
• MMC message identifier (value 6)
• MMC command data (variable length)
• End of exclusive code (EOX, value 247)
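A sketch of assembling such a message in Python (for illustration; the command value 2 used below is the commonly documented MMC ‘play’ command, not something defined in this chapter, so check the documentation of a real device):

def mmc_message(command_bytes, device_id=0x7F):
    """Build a MIDI machine control SysEx message using the format in Box 14.

    device_id 0x7F (127) means 'all devices should respond'."""
    return bytes([0xF0,            # start of SysEx
                  0x7F,            # universal real time SysEx
                  device_id,       # device identification / SysEx channel
                  0x06,            # MMC message identifier
                  *command_bytes,  # MMC command data (variable length)
                  0xF7])           # end of exclusive (EOX)

print(mmc_message([0x02]).hex())   # a 'play, all devices' message: f07f7f0602f7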

MMC has now largely been superseded and incorporated into a much
wider control specification called MIDI show control (MSC), which
specifies commands to control a wide range of devices that can be
associated with and/or may have a need to be synchronised with music.
Such devices include lights, stage machinery, video and film equipment, and special effects. The specification is aimed principally at the professional user
in a theatrical or similar environment, but simpler implementations
can be used by smaller concerns and amateurs in applications such as
disco lights.

Like MMC, MSC uses MIDI SysEx messages as the method of


controlling the various devices (see Box 15), and the real time aspect of
MSC (MIDI time code etc.) is entirely consistent with all the previous
MIDI enhancements. There is a large range of possible commands that
can be implemented, and there is a great deal of flexibility for
manufacturers to incorporate their own device-specific commands.
Note, though, that some messages have specifically been allocated for future extensions rather than being left undefined, which means that the specification can be extended at a later date without affecting current systems.
Appendix 3 contains a list of the currently defined command format
device types, and also lists some of the commands that might be used
with these devices. This is given for reference only and to illustrate
the sort of devices and actions that MSC covers.

Box 15 MIDI show control


MIDI show control (MSC) is designed to control a wide range of audio and
audio-related equipment via MIDI SysEx messages thus providing the means
whereby a complete performance in all its aspects can be controlled by a single
control device (e.g. computer or MIDI sequencer).
Each type of device (sound device, lighting device, stage machinery, stage
effect, etc.) clearly has a completely different set of control requirements,
and MSC has been made flexible enough to allow manufacturers to incorporate
their own device-specific MIDI control messages. However, most if not all of
the devices will need to be controlled in specific ways at different times. For
example, in a theatrical situation, at one instant a smoke machine might need to be activated, whereas at another time the lighting board might need to start a cross-fade to a different set-up, but the cross-fade needs to be kept in step with some music provided by an audio device.
MSC therefore incorporates a range of cueing commands that allow ‘action
lists’ to be set up and played back such that appropriate actions are executed
at the correct times.
The general format of an MSC message is similar to that of an MMC message:
• Start of system exclusive (SysEx, value 240)
• Universal real time SysEx (value 127)
• Device identification or SysEx channel (individual devices can be allocated
a value in the range 0 to 111, groups of similar devices can be allocated a
value in the range 112 to 126 and a value of 127 is reserved for ‘all devices
to respond’)
• MSC message identifier (value 2)
• MSC command format (a number representing the particular device type
the message is intended for – see Appendix 3)
• MSC command (a value between 1 and 127 indicating the particular action
that the device(s) should carry out – a value of 0 is reserved for future
extensions)
• Command data (any data that the command requires with the restriction
that the total number of bytes in the whole SysEx message must not exceed
128)
• End of exclusive code (EOX, value 247)
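The same approach can be sketched for MSC (again Python and for illustration; the command format and command values passed in the example are placeholders – the real values come from Appendix 3 and the MSC specification):

def msc_message(command_format, command, data=b'', device_id=0x7F):
    """Build a MIDI show control SysEx message using the format in Box 15."""
    message = (bytes([0xF0,            # start of SysEx
                      0x7F,            # universal real time SysEx
                      device_id,       # device identification / SysEx channel
                      0x02,            # MSC message identifier
                      command_format,  # the device type the message is intended for
                      command])        # the action the device(s) should carry out
               + bytes(data)
               + bytes([0xF7]))        # end of exclusive (EOX)
    if len(message) > 128:
        raise ValueError('an MSC SysEx message must not exceed 128 bytes')
    return message

# Hypothetical cue: command format 1, command 1, with the cue number '5' as data
print(msc_message(command_format=1, command=1, data=b'5').hex())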

Through this brief look at MIDI show control, I hope you can envisage
how MSC can be used to control the technical aspects of a complete
performance from a single control device, which may not be much

more than a standard desktop computer. Indeed in some situations, it


is possible to program everything beforehand and then just set it
running and leave the whole performance to be controlled
automatically without any human interaction (of course if this
involves singers or actors, they must keep up!).
The applications for MIDI have now clearly moved on from being
specifically music-based, but remember that all these control functions
are available in addition to providing standard MIDI music messages.
This means for example that a synthesiser connected as part of a large
theatrical set-up could play music supplied via the MIDI connection
whilst lights and other devices are also being controlled along the
same connection.
The limitation to this of course is the capacity of a single MIDI
connection which is limited to just over 3000 bytes per second. To
help solve this problem, multiple MIDI connections are now often
used where one connection might contain control data whilst another
contains music messages. Sometimes data in MIDI format is sent via
standard computer interface protocols such as USB or Ethernet which
vastly increases the possible data rate, but of course many keyboards
and music devices in particular only incorporate a single standard
MIDI interface. In the future it is likely that a new, faster, MIDI
interface specification will emerge.

11.6 MIDI downloadable sounds


One day you get an email from someone who has heard your original MIDI song, which you’ve now got on your personal website. They think they can add some really brilliant additional accompaniment
melodies to enhance it. You explain that really the song is designed
to use some special string and piano sounds that you have
developed, and you would really like them to hear the song as it
should sound before they add any extra music.
However, if you record the audio output of your synthesiser on
your computer and create a sound file, you realise that because of
its size, it might take hours to email to your correspondent using
your slow Internet connection. Even if you decide you can afford
the time and telephone charge, although the other person can hear
the song as you intended, they can’t transfer the individual bespoke
piano and string sounds to their own synthesiser and so use them
when composing the additional material …

The final major addition to the original MIDI specification has been
developed for just the above sort of situation.
The MIDI Downloadable Sounds (DLS) enhancement to the original specification defines an industry-standard approach to storing sound sets or patches that use sample-based wavetable synthesis (as described in Chapter 8 of Block 2) and transferring them between synthesisers.
By using DLS, precise specifications for how a particular patch should
sound can now not only be stored on a computer, they can also be
transferred between DLS-compatible synthesisers even over the
Internet and even if the synthesisers have been manufactured by
different companies.

General MIDI was the first attempt to provide a common listening experience, but even using GM there is a restricted number of sounds, and a wide variation in the quality of the sounds and their likeness to the instruments they are supposed to emulate. DLS continues this process by enabling different synthesisers not only to produce exact replicas of sounds, but also to allow an almost unlimited number of
different sounds to be replicated. At the time of writing (2004), the
main use of DLS is with software synthesisers executed in a desktop
computer, and the main thrust of DLS has been its use as a means of
transferring and using sound patches over the web. But that’s not to
say that separate DLS keyboard synthesisers do not or will not exist.
In essence, for a particular patch, DLS provides information on the
form of individual cycles of the sound waveform as well as how the
sound should start and finish when a note is played (i.e. its envelope
and transients).
Universal non-real time SysEx MIDI messages are used to load or
retrieve sound data from a synthesiser or MIDI device, and additional
chunk types have been added to the MIDI file format specification
(SMF) to enable this data to be stored within MIDI files.
There are many complicated aspects to DLS which are beyond the
scope of this course, but Box 16 gives a very brief outline as to the
form of the data and how MIDI messages are used.

Box 16 MIDI downloadable sounds


Clearly any specification that is too difficult or technologically impossible to
implement straight away is not likely to be a success. Surprisingly though, the MIDI
Manufacturers’ Association managed to obtain agreement on a standard fairly
painlessly, and this is based on sounds created using the following basic components:
• digital samples forming the basic waveform(s) of the sound source;
• details about any loop points in the sampled sound source which are used
to create a continuous sound (if required) by looping the sampled waveform
in between the loop points;
• a low frequency signal that controls vibrato and tremolo;
• descriptions of how the volume and pitch of the sound are to vary from
when the note is first sounded through the steady state situation to what
should happen when the note is released;
• the behaviour of the sound in response to control changes such as pitch bend.
The MIDI downloadable sounds (DLS) specification therefore incorporates the
data needed to define each of the above basic components. In addition, to
allow sounds to vary in timbre over the full pitch range as happens in conventional
instruments, the pitch range can be divided into a number of regions, and
some of the parameters controlling the above components can be varied between
each region.
The DLS specification contains a detailed description of a DLS file format,
which should be used to store the DLS patch data. This format is similar to the
MIDI file format described earlier. It uses chunks to divide the data up into the
individual components, and a standard WAV file format as described in Chapter 1
of this block is used to store the sound sample data.
In order to download the sample data to a separate synthesiser via MIDI, sample
dump SysEx messages are used either in handshaking or non-handshaking mode
(see Section 11.1). There are also a small number of additional SysEx messages
defined to turn on or off DLS sounds.

11.7 Summary of Section 11


I hope my perhaps slightly unrealistic (or now outdated!) scenario
running through this section has helped to show you the sort of
problems people found with using MIDI, and how the MMA set about
solving them – and continues to do so today.
Clearly MIDI has now progressed way beyond its original purpose as
envisaged in 1983 – only 20 years ago at the time of writing. The advent of
the web has substantially helped to widen the use of MIDI, and this stems
from the fact that reproducing music from codes is much more efficient
in terms of the amount of information that has to be communicated
than sending the music as actual digital sound samples.
What about the future of MIDI? There is such a huge range of MIDI-
based equipment available now that it is unlikely that the basic MIDI
protocol will alter fundamentally, but of course enhancements will
continue to be added by using the codes ‘reserved for future
expansion’. The main sticking point with MIDI now is the limited data
speed over a MIDI cable. This of course is not a problem when all the
MIDI processing is contained in a single device (desktop computer or
sequencer for example), but with the increasing use of MIDI show
control, a single MIDI connection is often unworkable because of the
amount of data that needs to be sent.
Multiple MIDI ports are now common to help solve this, but it is
likely that the MMA will soon come up with a proposal for a new
system that allows the MIDI data to flow much faster, and this is likely
to include a new interconnection system as well. As I mentioned
earlier, this may well be based on common computer interconnection
systems like USB, Ethernet or FireWire – indeed transmitting MIDI
over standard USB or Ethernet connections is now happening
particularly where computers are involved, but not many synthesisers and MIDI processing devices currently have this facility. In the short term a new, faster, interconnection system will of course necessitate the use of converter boxes with devices that have the current 5-pin DIN connectors, but this is a small inconvenience compared with the alternative of making all current MIDI-equipped devices obsolete by bringing in a new, incompatible standard.

12 MIDI IN ACTION

12.1 MIDI equipment


There is a huge range of MIDI-equipped devices, but generally they fall
into one of three categories – those that generate MIDI messages, those
that manipulate MIDI messages and those that interpret MIDI messages to
generate music. Of course, desktop computers can usually carry out all
three tasks, so I will look at MIDI on general-purpose computers
separately. I will also restrict my discussion to music devices rather
than try to include the large range of audio recording and non-musical
devices that can be controlled by MIDI show control messages.

To give the discussion some context, this section incorporates a look at


the work of Simon Whiteside, a professional musician and film/TV
composer, who uses MIDI extensively. Simon works from a small
MIDI-based studio in the North of London. His work is varied and
includes playing live music as well as composing music for films
and television programmes. In the video sequences collectively
entitled ‘MIDI in action’ Simon first introduces himself, gives us a
tour of his studio and talks about MIDI and General MIDI; he then
takes us through the stages of composing background music for
films and television programmes; and finally you can see the
results when he is let loose with the TA225 Course Tune.

ACTIVITY 45 (WATCHING) .....................................................................

Watch the DVD video sequence 1 ‘MIDI equipment’. This first sequence
in the ‘MIDI in action’ set of sequences contains four short sections.
In Section 1.1 ‘Simon’s background’, Simon Whiteside introduces
himself and tells us a bit about his background. In Section 1.2 ‘Simon’s
studio’ Simon gives us a tour of his small studio, and explains the
various items of equipment he has and their uses.
In the final two sections 1.3 ‘MIDI basics’ and 1.4 ‘General MIDI’ Simon
gives us a broad overview of the MIDI system and General MIDI from
his own perspective.
As you watch the video sequence, make a few notes about the important
points Simon makes about his equipment, and his use of MIDI and
General MIDI. I

ACTIVITY 46 (SELF-ASSESSMENT) ...........................................................

Simon mentioned that he regularly uses more than 32 MIDI channels


in his music. Describe how the 16-channel limitation of the MIDI
system can be overcome. I

12.1.1 MIDI generators


The most common and most useful device that a musician can use to
generate MIDI messages is the MIDI keyboard (Figure 27). These
devices do not have any sound generation circuits and are designed to
allow all the basic MIDI functions to be controlled. Like the keyboard
in Simon Whiteside’s studio, they are usually ‘performance’ style keyboards, which means they have the full compass of piano notes, with full-size keys that are weighted and touch sensitive. The keyboards also
have facilities for mapping MIDI channels to various sections of the
keyboard; they have pitch bend, program change, transposition and other controls, and the keys usually include aftertouch.

Figure 27 A MIDI keyboard

Quite often
though, many of these MIDI functions can be generated using a
synthesiser or electronic keyboard, where the device’s MIDI OUT
signal is used as the ‘MIDI keyboard’. In this situation, if some
processing of the MIDI signal is required before it is fed to the device’s
sound generating section, then this will not be possible unless the
synthesiser has a local on/off mode.

ACTIVITY 47 (SELF-ASSESSMENT) ...........................................................

What is meant by a local on/off mode, and how can a local off mode be
used to solve the situation outlined above? I

As useful as a MIDI keyboard is, it is not the only device that a musician
can use to generate MIDI.
Another common MIDI generator is the MIDI drum controller. Drum
sounds have always been popular in MIDI as they allow people who
are unable to play drums or other percussion instruments the chance
to add drum sounds, and therefore rhythm, to their music using a
MIDI keyboard to generate the required MIDI messages.
On the other hand, percussionists also like to use MIDI to enhance the range of sounds they can produce. So a large range of MIDI drum kits is available that can be played just like a conventional kit but, instead of producing drum sounds, just produce MIDI messages.
Some of these devices are designed to be played with the fingers
rather than using sticks, but others are full sized drum kits that can be
played just like their conventional equivalents – but of course they
make little noise (Figure 28). This is a distinct advantage to a percussionist wishing to practise without disturbing the neighbours!
Another possibility is to place a microphone near
a real drum and feed the microphone’s signal
to a device called a MIDI trigger. This device
causes MIDI messages to be sent in response to
the sound signal coming from the microphone.
Of course once the drum ‘sounds’ are in the form
of MIDI messages, they can be used in any way
the musician wishes – for example to change the
sound from a snare drum to a timpani or even to
control pitched sounds such as strings or piano.
It’s not only percussionists that like to get in on
the MIDI act. Players of other instruments would
sometimes like to be able to use the facilities
that MIDI provides – even if it’s only to be able to practise at home with headphones on at two
in the morning! Thus manufacturers have come
up with a number of novel ‘MIDI instruments’
that mimic their conventional counterparts, but
provide a MIDI output signal. Sometimes these
are actual instruments that produce sound but
have a special pickup, but others produce no
sound.
Figure 28 A MIDI drum set

ACTIVITY 48 (WATCHING) .....................................................................

Adding a MIDI interface to real instruments is not a new idea.


Watch the DVD video sequence ‘MIDI melodeon’ which is an item
from the BBC television series Tomorrow’s World originally
broadcast in October 1993. I

Today, popular types of MIDI instrument are MIDI guitars and MIDI wind controllers, as shown in Figure 29. The problem for the designers of such MIDI instruments is first how to measure the various elements that go to make up the sound – pitch, dynamics, vibrato, glissando etc. – and then how to convert these elements into MIDI messages when the MIDI system was not originally designed for such applications. However, some give very good results and these instruments are quite popular.

Figure 29 A MIDI wind controller

12.1.2 MIDI manipulators


There is a whole range of devices that simply manipulate MIDI messages; they do not have any means of generating original MIDI sound data and have no sound generating circuits. Most prominent among these are MIDI sequencers (often just called sequencers), as illustrated in Figure 30. These are devices whose main function is to record MIDI message streams and then play them back at a later date. They can also perform simple ‘editing’ operations on the MIDI messages, such as editing individual notes, control change messages and program changes. Remember that not only must a sequencer be able to store the MIDI messages it receives, it must also be able to remember their relative timing.

Figure 30 A MIDI sequencer
Although popular in the early days of MIDI, these devices now have
a limited application as the facilities they provide are generally far
inferior to those that can be found in software sequencers such as the
Course’s sound recording and editing software. However, their small
size, ease of use and portability make them ideal for using at live
venues.

There are a number of other categories of device that process MIDI messages; the main ones are as follows.
• MIDI merger. A device that combines two or more MIDI streams
into a single output. This is not just a matter of adding the serial
message streams together as there would be conflicts if two
messages occurred simultaneously. Thus a MIDI merger must first
convert each MIDI stream into individual messages and then store
these temporarily. It must then create a new serial message stream,
picking messages from each input to try to ensure that messages
from all streams are delayed by the minimum amount.
• MIDI expander. This device performs the opposite function to
the MIDI merger in that it divides a MIDI stream into two or more
separate streams based usually on the MIDI channel number.
For example such a device might take a MIDI stream containing
messages for both MIDI channels 1 and 2 and separate these so that
channel 1 messages are fed to one output and channel 2 messages
to another.
• MIDI filter. MIDI filters are devices that can be programmed to
recognise specific MIDI messages or ranges of messages and either
block them or allow their passage through the device. Such devices
might be used for example to remove SysEx messages, or to block
program change messages so that they do not cause a synthesiser to
change its sound patch.
• MIDI mapper. This device changes one or more data bytes that are
associated with specific MIDI message types. A MIDI mapper
might be used to reassign channel numbers, transpose notes or
change controller numbers etc.
• MIDI patchbay. For a simple system, if there is a need to alter a
MIDI cabling configuration, then it is an easy matter to unplug and
reconnect a few cables. However for a more complicated system,
this would become tedious and prone to error (e.g. connecting two
MIDI outputs together). A MIDI patchbay then is a device that
provides a number of MIDI interfaces and enables any interface to
be routed to one or more of the others. Control of this routing may
be achieved by switches on the unit, or may be from a controlling
computer.
• Multiport MIDI interface. Multiport interfaces are used in
conjunction with a computer. They provide a number of MIDI
interfaces and, in addition to providing patchbay facilities, they
can often carry out the functions of MIDI merger, expander, filter and mapper as described above. As you saw in the video sequences
1.1 and 1.3 in Activity 45, Simon Whiteside uses just such a device
in his studio. An important additional facility that is usually
provided is that of synchronisation. Multiport interfaces with
synchronisation facilities will be able to input and output SMPTE
time code, convert between SMPTE and MTC (MIDI time code) and
may even have device-specific synchronisation facilities (e.g. to
control a particular digital recorder).
• Patch editors and librarians. A patch editor is a device that can
allow the user to set up a particular synthesiser’s sound patch data
in an easy and user-friendly way, and a librarian is a device for

storing patch data. Since each synthesiser works in a slightly


different way and requires different set-up data, patch editors are
usually designed to work with a number of different synthesisers –
they are often produced by a manufacturer for use with their own range of devices. Neither of these types of device strictly manipulates MIDI messages, but both use MIDI SysEx messages to load and store patch data. These functions are now most often
performed by software programs running on a computer as many
more sophisticated operations can be included and in a more easy-
to-use way. But like hardware sequencers, they do still have a use
in live, on location, situations.
• MIDI diagnostic tools. The final category of MIDI processing devices comprises those that help to diagnose problems. The simplest of
these is a single light emitting diode (LED) connected to a MIDI
OUT signal. You will recall that the standard MIDI interface
contains an optoisolator that is driven by the MIDI OUT signal.
Since this device contains an LED, there is no reason why a separate
visible LED should not be used in order to see the MIDI data as a
flickering light. More sophisticated diagnostic devices are available
that can interpret the MIDI messages and indicate which channels
are being used and what type of MIDI messages are occurring.

12.1.3 MIDI sound generators


The first types of MIDI device that spring to mind when considering devices that can interpret MIDI messages and generate the sound they represent are synthesisers and the huge range of electronic keyboards that have MIDI interfaces. Almost all of these will be able to respond
to basic MIDI note messages, although the cheaper devices may only be
able to produce one sound at a time even though they may be able to
respond to messages on more than one MIDI channel. The more
sophisticated ones will have General MIDI in addition to providing
banks of other sounds and allowing users to create their own. They
may even be able to implement downloadable sounds (DLS).
Drum machines are the percussion equivalent of electronic keyboards.
These devices not only provide percussive sounds; most will provide simple pitched sounds as well, e.g. a glockenspiel sound where the drum pads are used to create the notes.
There are however, many devices that do not have an integral keyboard or
other user interface and have only a MIDI interface. Thus, these devices
can only produce sounds in response to MIDI messages (see Figure 31).

Figure 31 A keyboardless
MIDI sound generator

Finally there are what one might call speciality musical MIDI devices, like the MIDI melodeon in Activity 48, and it is here that we come full circle back to the original code-operated musical instruments that I introduced in Section 2. I would like to mention just two of these
‘speciality’ devices – one is the MIDI-driven carillon that I have
already talked about, and the other is a MIDI controlled piano which is
the modern equivalent of the barrel and player pianos.
Piano manufacturers are now producing modern ‘player pianos’. These
can not only be played just like a normal piano, but can be controlled
not by a paper roll, but by MIDI messages, complete with the keys
moving just like the original paper roll instruments. Of course they
also provide a MIDI OUT signal when played, and will incorporate
some form of storage device and/or computer interface. Interestingly
the top-of-range models are purchased mainly by the rich and famous
and used with MIDI files to provide player-less background music at
parties etc. rather than being used for serious music or MIDI work.
This is just the same situation as when pianos became a fashion item in the late 1890s and early 1900s, as I mentioned earlier in Section 3.

ACTIVITY 49 (COMPUTER) ....................................................................

Play the MIDI file associated with this activity using either your
normal MIDI-playing software or the course’s music recording and
editing software. This is just a few bars of Thurlow Lieurance’s Indian
love song By the waters of Minnetonka. This is the piece of music that
you heard the player piano playing from a piano roll in Activity 16.
Unfortunately an exact MIDI copy of the piano roll could not be found,
so this version has been generated specially for the course by the
Course Team. I

12.1.4 MIDI implementation chart


At a number of points in my discussion of MIDI and MIDI devices,
I have mentioned that not all MIDI devices have to respond to or be
able to generate all the possible MIDI messages – even those contained
in the original 1983 specification. For example, a keyboardless MIDI
sound generator only needs to respond to note on and note off
messages, it does not have to generate them; a simple synthesiser
might only be capable of responding to one MIDI channel at a time.
So that purchasers and users of MIDI equipment can easily find out
the MIDI capabilities of an instrument or device, manufacturers
usually include a MIDI implementation chart with their device.
A MIDI implementation chart is a standardised table that details
how the device responds to MIDI messages, and what messages it
can generate.
The form of the table is laid down by the MMA and consists of 4 columns
and 12 rows. The columns define
• the MIDI function,
• whether the device can transmit MIDI messages associated with
this function,

• whether the device can respond to MIDI messages associated with


this function, and
• any additional notes concerning the implementation of the function.
Each of the rows of the table is allocated to one of the main MIDI
message types:
• basic channel (the MIDI channels the device responds to);
• mode (what modes are implemented, and how many notes of
polyphony are available);
• note number (the range of MIDI note numbers or pitches that the
device can transmit or respond to);
• velocity (how the device handles the note on and note off velocity values);
• aftertouch (whether the device can respond to or transmit key and/
or channel aftertouch);
• pitch bend;
• control change (what control change messages the device can
respond to or transmit);
• program change (whether the device can respond to program
change messages and, if relevant, how many programs or patches
are available);
• system exclusive (how the device manages SysEx messages,
including the device’s SysEx identification number);
• system common (how the device handles MIDI time code, song
select etc. messages);
• system real time (how the device handles MIDI clock, start, stop
etc. commands);
• auxiliary messages (how the device handles local control, all notes off, system reset etc. messages).
Finally, there is a section for any additional notes.
For reference only, Figure 32 shows a typical MIDI implementation
chart for the Roland D-50 synthesiser. This is an early instrument
produced at the end of the 1980s which I have chosen because the
MIDI implementation chart is quite simple and so you should find it
straightforward to understand.
Unfortunately, because of the limited space in the table and the
complexity of modern MIDI equipment, it is now often of limited use,
and manufacturers usually still have to include additional information
on how their device implements MIDI.

ACTIVITY 50 (SELF-ASSESSMENT) ...........................................................

Study the MIDI implementation chart in Figure 32 and then answer


the following questions.
(a) What range of pitches can the D-50 synthesiser transmit from its
MIDI output?
(b) Does the D-50 implement either polyphonic aftertouch and/or
channel aftertouch? I

Figure 32 MIDI implementation chart for the Roland D-50 synthesiser



12.2 MIDI in computers


More and more, people are now turning to the desktop computer when
working with MIDI. Almost all the MIDI operations needed to create a
master song can be achieved using a computer with suitable software
and perhaps a small amount of additional hardware. The one situation
that cannot be achieved with the computer alone is when a musician
needs to play an instrument – most often this is a keyboard, but it
could be a MIDI drum set, guitar or wind controller, etc. Although it is
possible to input simple tunes using the computer’s keyboard, for
anything more complicated a ‘real’ instrument needs to be used.

12.2.1 Hardware
The hardware of a computer refers to the physical bits and pieces that
make up the computer and include not only the electronics inside the
main box, but also the extra devices or peripherals that are needed
such as the computer’s keyboard, display and mouse. All desktop
computers have some sort of hardware ‘sound card’ which acts as a
sound interface and provides analogue audio inputs and outputs and
which contains analogue-to-digital and digital-to-analogue converters
that enable sound input to and output from the computer. The ‘sound
card’ may be integrated into the computer’s main circuitry or it may be
contained on a separate electronic circuit board (but still within the
main computer box).
Today most of these sound interfaces also contain a MIDI input and a
MIDI output (although as mentioned earlier in this chapter, an adapter
lead is often needed if the standard 5-pin DIN sockets are required – see
Figure 19). Some interfaces even contain sophisticated sound generators
and can interpret MIDI messages themselves to produce an analogue
music output directly. Indeed such is the progress of technology these
days that it is possible to integrate a complete synthesiser on a computer
sound card that provides General MIDI as well as user programmable
and downloadable patch capabilities.
An example of such a sound card
is shown in Figure 33.

Figure 33 The Yamaha SW1000XG PCI sound card which contains a complete
synthesiser with full MIDI facilities. (The connections labelled in the figure
are an analogue audio input, left and right analogue audio outputs, an S/PDIF
digital output and a MIDI interface.)

With the addition of a MIDI keyboard therefore, a fairly standard desktop computer can be used for MIDI work without the need for any additional
hardware. In fact because MIDI messages flow at a much slower rate
than actual digital sound samples, the performance of a computer
which is to be used solely for MIDI is not nearly as critical as it would
be when working with digital sound samples – particularly if the sound
interface contains its own hardware synthesiser.
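To put the ‘much slower rate’ into perspective, the short Python sketch below compares the maximum MIDI byte rate with the data rate of digital audio. The MIDI figures (31 250 bits per second, with 10 transmitted bits per MIDI byte) come from the serial connection described in this chapter; the audio figures assume CD-quality stereo (44 100 samples per second, 16 bits, two channels), which is my own illustrative choice rather than anything specified here.

```python
# Rough comparison of MIDI and digital-audio data rates.
# Assumes CD-quality stereo audio (44 100 Hz, 16 bit, 2 channels).

MIDI_BIT_RATE = 31_250          # bits per second on a MIDI connection
BITS_PER_MIDI_BYTE = 10         # 8 data bits plus start and stop bits

midi_bytes_per_second = MIDI_BIT_RATE // BITS_PER_MIDI_BYTE   # 3125
note_on_messages_per_second = midi_bytes_per_second // 3      # 3-byte Note On messages

audio_bytes_per_second = 44_100 * 2 * 2   # samples/s * bytes per sample * channels

print(f"MIDI:  {midi_bytes_per_second} bytes/s "
      f"(about {note_on_messages_per_second} Note On messages/s)")
print(f"Audio: {audio_bytes_per_second} bytes/s "
      f"({audio_bytes_per_second / midi_bytes_per_second:.0f} times the MIDI rate)")
```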
However, in a more sophisticated set-up, there will be a need for some
additional MIDI interfaces, either provided by an internal card, or an
external box such as the multiport MIDI interface described earlier.
In addition, if audio samples are also involved, not only will the
performance of the computer itself be more critical, but there may be a
need for additional analogue and/or digital audio inputs and outputs
which can again usually be supplied by another internal card or an
external box connected to the computer by one of the standard computer
interfaces such as USB or FireWire.
In a professional environment, there will also be a number of special
purpose interfaces to control specific devices such as recorders,
mixers etc., or to provide synchronisation signals.

ACTIVITY 51 (REVISION) .......................................................................

From your study of Chapter 1 of this block, suggest what particular aspects of a computer’s ‘performance’ might be critical in a situation
where actual digital sound samples are being processed? I

12.2.2 Software
Software refers to the programs that, when executed (or run), cause a
computer to carry out some function. Some software is provided to
help execute more complex programs – the Windows® operating
system is an example of this type of program, and is often called
system software. In fact Windows itself is composed of many smaller
programs that do specific jobs, e.g. looking after the reading and
writing to secondary memory (hard disk), and organising the loading and
execution of the user’s programs, which are known as applications.
All desktop computers will have the necessary programs to provide basic
sound input and output, and to interface to the MIDI connections.
However, these are of limited use without applications that can facilitate
making music with MIDI.
Many of these application programs provide a complete MIDI
environment that includes a multitrack sequencer, MIDI editing, time
code generation, and may even provide a General MIDI synthesiser.
More sophisticated programs will integrate MIDI tracks with sound tracks
and provide a complete sound creation and editing environment.
The course’s sound recording and editing software is an example of
this type of program.
An important point to note is the idea of a software MIDI synthesiser.
It is quite possible to implement General MIDI using a program rather
than an actual electronic device such as those found in synthesiser
keyboards and the sort of sophisticated sound cards shown in Figure 33. Apple’s QuickTime® is an example of a program that is not
only able to read and process digital sound files of various types,
but it is also able to read and interpret MIDI messages and produce
music directly from these messages. However, the processing overhead
required to implement a software MIDI synthesiser is substantial,
particularly if General MIDI is needed. In addition the sounds that
software MIDI sound generators produce are not generally as good
as their hardware equivalents. Thus, if possible it is often better to try
to get a MIDI song played by a hardware device – either on the sound
card, or by an external synthesiser. This is particularly the case when
sound data is being processed at the same time. Of course, whether a
software synthesiser works in a particular case depends on the number of
sound and MIDI tracks being processed and the computer’s processing
speed. Today’s computers (in 2004) can cope remarkably well with lots
of simultaneous audio and MIDI channels, so for the amateur musician
the lack of processing power rarely causes a problem.

12.2.3 Latency
One problem area when using a computer simultaneously for both MIDI
and audio is that of making sure the sounds from the MIDI data are
heard at the correct time in relation to the analogue sound created
from the digital audio data. Any delay of either the digital sound or
the MIDI sound is known as the latency. Why might this occur?
If you think about the situation of a computer playing back a song in real
time that uses both digital sound and MIDI data, then it is reasonable
to suppose that the following situations might occur:
• the sound samples get delayed because perhaps they have to be
sent to an external sound unit;
• the MIDI sound is delayed because the computer is using a
software MIDI sound generator that takes some time to respond to
each MIDI message and generate the required sound.
There are many factors that can affect the synchronisation between digital
sound and MIDI sound, and you may have found in the practical work
associated with this chapter that you had to adjust the settings of the
course’s sound recording and editing software to compensate for the
latency on your computer.
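To make the idea of a latency setting concrete, the sketch below shows one very simple way a program might compensate: shift the newly recorded samples earlier by the measured delay. The function, the 20 ms figure and the sample rate are purely illustrative assumptions and are not how the course’s software actually does it.

```python
# Illustrative latency compensation: shift a recorded track earlier by a
# fixed, measured delay so that it lines up with the track it was played against.

SAMPLE_RATE = 44_100            # samples per second (assumed)
MEASURED_LATENCY_MS = 20        # assumed round-trip latency in milliseconds

def compensate(recorded_samples, latency_ms, sample_rate=SAMPLE_RATE):
    """Remove the first latency_ms worth of samples so that the recording
    starts where the performer actually intended it to start."""
    offset = int(sample_rate * latency_ms / 1000)
    return recorded_samples[offset:]

# Example: a one-second silent 'recording' of zeros.
recording = [0.0] * SAMPLE_RATE
aligned = compensate(recording, MEASURED_LATENCY_MS)
print(len(recording) - len(aligned), "samples removed")   # 882 samples at 20 ms
```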
Latency can also occur in a recording where the performer is playing
along to a previously recorded sound and there is a delay between the
performer’s recording and the existing recording. This means that on
playback the two sounds do not sound together as they should.
However, it is important not to confuse latency with inadequate
performance. Latency just means that a sound or a MIDI signal
gets delayed in relation to the other signals it is supposed to be
synchronised with. It does not mean that the computer cannot cope
with the speed or amount of data that it has to process in order to
provide the sounds in real time. If performance is the problem, then
this is likely to manifest itself in unpredictable effects such as a
variable delay or corruption of the audio and/or MIDI sounds.

Performance problems might be caused by the number of perhaps unrelated background tasks that the computer is having to do, for
example looking after a network connection, rather than a fundamental
lack of processing power.

12.2.4 Other aspects


One major aspect of MIDI that is missing from programs such as the
course’s sound recording and editing software is a MIDI patch editor
and a librarian. Such programs certainly exist and many manufacturers
provide these types of programs with their MIDI devices, but remember
that patch editors are synthesiser-specific and certainly do not exist for all
the different makes of device. That said, the emphasis is now on wavetable
synthesis, and with the introduction of MIDI downloadable sounds, audio
editors such as the course’s sound editor can be used to edit sound samples,
so a specialised patch editor is not required.
Finally, you will find that MIDI is used quite often without you knowing
it. Background sounds on a web page for instance will often use MIDI,
and when MIDI downloadable sounds become common, the sound
data will be included as well, so that the sound heard will be exactly
as the composer intended.

12.3 MIDI IN FILM AND TV MUSIC

In this section, we return to Simon Whiteside to see how he uses MIDI in his work as a composer of film and television music.
When he is asked to provide some music for a scene from a film or
television programme, Simon is usually just supplied with a video of
the section and given some ideas from the production company as to
the sort of music they feel is required. From this material Simon
carries out a series of both artistic and technical processes in order to
produce the music that goes with the pictures. As you will see, Simon
needs both musical and technical skills as he not only composes the
music, but he also generates the sounds fully synchronised to the
pictures using MIDI and sometimes live musicians as well.
In the watching activity below, Simon explains how he went about
composing a piece of music for a short scene in a television programme.
The programme is one in the BBC’s Imagine series of arts programmes
introduced by Alan Yentob. The particular programme was about the
artist and inventor Leonardo da Vinci and was programme 2 in the series.
Subtitled ‘Dangerous Liaisons’, it was originally broadcast on 27 April
2003. The programme used both historical enactments to show Leonardo
in his own time and also sequences showing some of his inventions
being built and tested in modern times from Leonardo’s original
drawings.

ACTIVITY 52 (WATCHING) .....................................................................

Watch the DVD video sequence 2 ‘MIDI in film and TV music’ from
the set of video sequences collectively entitled ‘MIDI in Action’.
This sequence contains six short sections.

In Section 2.1 ‘Capturing the pictures’ Simon explains how he sets up his studio to be able to synchronise the pictures to the sound that he is
to compose and in the next two sections, ‘Capturing the mood’ and
‘Capturing the pace’, he explains how he goes about setting a mood and
pace for the music that fits the scene. In Section 2.4 ‘Capturing the notes’,
Simon demonstrates how he gets the music into the computer, and in
Section 2.5 ‘Putting it all together’ we see how it all comes together to
produce the finished sound.
Finally in Section 2.6 ‘Delivering the results’ Simon comments on the
ways in which he supplies the music to the production company, and
the advantages and disadvantages of incorporating live music.
As before you should note the important points Simon makes about how
he goes about composing the music and how he uses the technology
available to him and in particular his use of MIDI when composing for
film and television. I

There are a number of general points that come out of the video sequence
in Activity 52 – some of the more important ones are mentioned below,
but you will probably have noted a number of other points as well.
• The original video footage has the SMPTE time code ‘burnt in’ to it
to provide the means of synchronisation.
• The mood and pace of the music should be such as to enhance and
not detract from the viewing experience.
• There are a number of ways of entering the notes of the music into
the computer – playing them directly on a music keyboard, placing
the notes on the score display or entering them individually into a
MIDI list or ‘piano roll’ display.
• The finished music is usually supplied as sound (rather than MIDI
codes) on digital audio tape (DAT) or CD. This is because General
MIDI cannot be relied upon to produce the sounds sufficiently
closely, and also there are many better sounds available that are not
covered by General MIDI. However, the use of DLS may mean that MIDI files
supplied with DLS wavetable data, rather than digital audio files, can be
delivered in such instances in the future.
• The stability of the playback of today’s digital sound and video
recording systems is such that continuous synchronisation is
not needed for short pieces of music, particularly as here where
very precise synchronisation (such as lip-sync) is not required.
Only a few specific synchronisation points are needed to keep the sound synchronised with the picture.
• Using live musicians brings a number of problems to the process
in terms of having to manually adjust the scores the computer
generates and also the legal problems and costs of recording
rights.

ACTIVITY 53 (SELF-ASSESSMENT) ...........................................................

What are the advantages of using MIDI in the initial stages of composing music that is to go with a piece of video footage? I

ACTIVITY 54 (LISTENING) .....................................................................

Listen to the two audio tracks associated with this activity. Both tracks
contain the final full piece of music that Simon Whiteside composed
for the section of the ‘Leonardo’ programme that was the subject of the
video sequences in Activity 52. One of the tracks is the version using
live musicians and the other uses only sounds from Simon’s various
synthesisers and samplers, and is the version used for the ‘international’
version of the programme.
Which do you think is the one that uses live musicians?

Comment
You should not have found it too difficult to detect that the first
version is the one that uses live musicians. However, I hope you will
agree with me that the synthesised version is extremely convincing,
and that in context it is unlikely that a viewer who is concentrating on
the action on screen would be aware that the music is in fact not
generated with live musicians. I

ACTIVITY 55 (COMPUTER) ....................................................................

In this activity you will experiment with the score, list and ‘piano roll’
MIDI editors in the course’s sound recording and editing software.
You will find the steps for this activity in the Block 3 Companion. I

12.4 MIDI LIMITATIONS AND IMPROVEMENTS

As you have seen in this chapter, MIDI is certainly a very useful tool
in the generation and distribution of music. But even though the MIDI
specification has been and is still being enhanced to incorporate more
features, it is after all basically a method for coding music and as such
will always have its limitations – just as conventional music notation
and piano rolls do.
There are two areas where MIDI has its main problems – functional
and physical. The functional problems include musical considerations
such as catering for different temperaments, timbres and the nuances
of live players, and the physical problems include the speed of MIDI
messages and the connection system.
However, as you will see in the next activity, Simon Whiteside sees a
further limitation of MIDI as being how to get certain nuances of
performance into MIDI codes even though the MIDI system does
already have facilities to record such features.

ACTIVITY 56 (WATCHING) .....................................................................

Watch the DVD video sequence 3 ‘MIDI limitations and improvements’. I

Clearly then MIDI has been and still is a very robust system that works
well, and has certainly brought music creation to many people who
cannot read music or play an instrument.

As I mentioned before, MIDI has expanded way beyond its originally envisaged uses and is being used in many diverse applications, some of
which have little connection with music. In the future we are likely to
see the MIDI system being further expanded both in its functionality
and in its physical aspects, and perhaps even Simon’s ‘organic’ MIDI
interface will one day become a reality!

ACTIVITY 57 (SELF-ASSESSMENT) ...........................................................

Why should the speed of MIDI messages cause a problem with dance
music where there is a lot of percussion? I

12.5 MIDI AND THE TA225 COURSE TUNE

In this final section of the chapter, you will work again with the
TA225 Course Tune, but this time you will incorporate MIDI sounds.
In fact, the MIDI version of the tune was produced before any of the
live versions were recorded since all the performers in the live versions
played along to the MIDI version with a click added. They used head-
phones to listen to the tune so that it and the click didn’t get recorded
with their performance. You will see Simon Whiteside doing just this
in the final set of video sequences in this chapter.
The Course Team gave Simon a copy of just the melody line of the
TA225 Course Tune with no indication of harmony or tempo and
asked him to produce three arrangements of the tune – one that used
General MIDI, one that used synthesised sounds and one free choice
version which we hoped would include some live music. In the next
activity you will see what Simon did with the tune.

ACTIVITY 58 (WATCHING) .....................................................................

Watch the DVD video sequence 4 ‘MIDI and the course tune’. This
sequence contains five short sections.
As before you should make some brief notes about the points Simon
makes concerning the sounds he used, and the way he went about
producing the three course tune versions.

Comment
In the score displays shown during the General MIDI version, notice
again the crude layout of the notes. As Simon mentioned in an earlier
sequence, if the score was needed for live musicians, then there would
need to be some human intervention to get the score in a form that
musicians would comfortably be able to play from.
Make sure you understand the distinction between sampled sounds
and synthesised sounds. Both are controlled by MIDI codes, but
sampled sounds are generated originally from live instruments using
looping techniques as explained in Chapter 8 of Block 2 whereas
synthesised sounds are sounds produced totally electronically. A set
of General MIDI sounds may well use both sampled and synthesised
sounds – the method of production of the sound is not laid down, only
that the sound produced should be like the instrument named in the
General MIDI patch name. It is interesting to note that some instruments just do not lend themselves to being synthesised well, and so sampled sounds are most often used for them. As you saw in the video,
Simon used sampled sounds for the drums and double bass in the jazz
version of the course tune, and I hope you will agree with me that the
result is quite convincing and that it sounds as though it could well
have been created by live musicians. I

ACTIVITY 59 (SELF-ASSESSMENT) ...........................................................

Why was it important for Simon to wear ‘enclosed’ headphones (i.e. headphones that restrict the sound that emanates into the space
around the wearer) when he recorded the melody horn in the jazz
version of the course tune? I

ACTIVITY 60 (LISTENING) .....................................................................

Before you decide which of Simon’s versions of the TA225 Course Tune you prefer, listen to the four audio tracks associated with this
activity. These are full versions of all three of Simon’s course tune
arrangements. The first track contains the General MIDI version, the
second track the dance version and the last two tracks contain two
versions of the jazz arrangement – the first using just MIDI sounds and
the second incorporating the melody horn as Simon is seen doing in
the video sequence.

Comment
Whichever version you personally prefer, I hope you will appreciate
first of all how different all three are from the original versions of the
TA225 Course Tune that you have worked with. The reasons for this
are not just in the slower speed of the tune that Simon decided to use,
but also in the harmonies and counterpoint he used and in the
‘embellishments’ he incorporated (the rhythmic patterns and the
glissandi in the harp part for example). All of these aspects combine to
create interest and variation in the pieces.
Notice also how each version presents its own technical challenges.
The General MIDI version uses a large number of MIDI channels and
has lots of different ‘instruments’ playing. The dance version requires
the use of synthesised sounds which need to be carefully selected, and
the music itself is difficult to input to the computer as some of the
parts cannot be played directly and need to be input note by note.
From a musical point of view the jazz version is quite straightforward
for a musician who is used to playing jazz music (as Simon is).
However, as well as the complication of adding the live melody horn
sound, in order for the result to sound convincing, sampled sounds
had to be used. Notice in particular here that the drum sound Simon
used was created not from the sample of a single brush hit (i.e. using
brushes as a side drum beater as mentioned in Activity 1 of Chapter 5
in Block 2), but from a sample of the complete basic rhythmic pattern
of the piece. This sample lasts a full bar of the music, and at the start
of every bar there is a simple MIDI message to start the sample playing
(two MIDI messages are in fact used as you will see in Activity 61). I

ACTIVITY 61 (COMPUTER) ....................................................................

Simon has provided the Course Team with MIDI files for all three of
his arrangements of the TA225 Course Tune. In this activity you will
use the course’s recording and editing software to examine and play
these three files, and in the case of the dance and jazz versions, to try
to edit them to produce acceptable results. You will find the steps for
this activity in the Block 3 Companion.

Comment
I hope this activity has demonstrated the advantages of General MIDI
in enabling people to create MIDI files that they can be sure will
always sound similar to how they intended – of course as long as
General MIDI is used when they are played! I

ACTIVITY 62 (COMPUTER) ....................................................................

In this final activity of the chapter, you will return to the original
version of the TA225 Course Tune, this time incorporating MIDI into
the mix. Like the last activity in Chapter 1, this is an open-ended
activity, and you can spend as much or as little time on it as you wish.
The Block 3 Companion contains some more information about this
activity and an outline of the procedure you should follow. I

In this chapter and Chapter 1 of this block you have learned about the
technology behind the recording, editing and mixing of both digital
audio and MIDI. In addition I hope you feel you have gained some
valuable practical skills in these areas and that these will spur you on
to experiment with creating your own music using these skills and the
software provided with the course. Good luck!

SUMMARY OF CHAPTER 3

Pin barrels or cylinders store music as pins around the circumference. When the barrel is rotated the pins engage with the mechanism of the instrument to play the tune. The position and size of the pin determines the note. The speed of rotation then determines the tempo. Instruments include carillons, organs, pianos and orchestrions. Drawbacks to using a barrel include the expense of manufacture and limitations in the number of tunes offered. Barrels are also expensive to replace. The pin-disc, used in Polyphons and the like, overcame many of these drawbacks but was superseded by the gramophone record. (Section 2)

Paper rolls and to a lesser extent cardboard books overcame the drawbacks of cylinders. These use the Jacquard concept whereby holes in the paper or cardboard carry the instructions for the instrument. The hole causes a note to be played and the length of the hole determines the length of the note. The speed of the paper affects the tempo. Used mainly to control pianos, the original Pianola or pushup involved adding a mechanism to the front to play the keys of a standard piano. Later the player piano was produced that incorporated the mechanism in the piano. (Section 3)

Many novel mechanical devices have been produced to play a variety of conventional musical instruments such as the violin and banjo. Some of these even controlled lighting and other effects, all from instructions on a perforated paper roll. Many well-known pianists recorded performances on piano rolls, and some composers composed music specifically for mechanical instruments, and in particular the player piano. The popularity of the player piano was affected both by the improvements in the gramophone and by radio broadcasting. (Section 4)

A look at mechanical music shows that coded music can only contain information about a limited number of aspects of the music. Continuous variations in quantities such as pitch, timbre and dynamics are difficult if not impossible to code satisfactorily. To ensure the music is reproduced from the music codes as intended, there is a need for standards; even so, there is no guarantee that the music will sound exactly as intended. (Section 5)

The Musical Instrument Digital Interface (MIDI) is a system for communicating music as a series of codes and was originally designed to enable one or more sound generators to be controlled from a single keyboard. In the early days of electronic music, sounds were produced using analogue techniques. They were monophonic and were controlled by varying voltages. When polyphonic digitally controlled synthesisers appeared, variable voltages could not be used to control the sound generating devices, and this led to the development of the MIDI system which allowed one keyboard to control a number of synthesisers, independently and simultaneously. MIDI has become so successful that it is now being used in applications which have little to do with the original concept of music codes. (Section 6.1)

As useful and popular as MIDI is, it is not a universal solution for all music applications. There are many facets of music that MIDI cannot cater for. However, it is likely that future enhancements to the MIDI system will become ever closer to this universal goal. (Section 6.2)

In a basic MIDI system, there is one master controller that generates MIDI messages. The controller is connected to one or more devices that respond to these messages. (Section 7.1)

Each one of the basic set of MIDI music codes can be associated with up to 16 separate channels. Receiving devices can be set up to respond to one or more of these channels thus allowing the controller to control the devices individually if required. (Section 7.2)

MIDI messages are sent along a single cable serially, i.e. one after the other. The speed of transfer of individual messages is such that usually there is no perceptible relative delay in the music that is generated from these messages. However, the finite speed of transfer can produce problems in situations where there is a lot of MIDI activity. For such situations an additional MIDI connection is often used. (Section 7.3)

A MIDI byte consists of eight bits of data, and is the smallest unit of MIDI data. MIDI messages are made up of one or more MIDI bytes. (Section 7.4)

The MIDI specification comprises details on the hardware and electrical signals to be used and how the data is to be interpreted. (Section 7.5)

MIDI messages are carried along a one-way communication path using a balanced cable. In the original specification the connector at each end is a standard 5-pin DIN plug. A full MIDI port consists of three 5-pin DIN socket connectors – MIDI IN, MIDI OUT and (optionally) MIDI THRU. The MIDI IN connector is used to receive MIDI messages, the MIDI OUT connector is used to output MIDI messages, and the MIDI THRU connector outputs a relay of the messages received on the MIDI IN connector. Because of the physical size of the 5-pin DIN connector, other types of connection are often used with computers, and this necessitates the use of an adapter unit or cable to provide the standard MIDI connectors. (Section 8)

A MIDI connection uses an optoisolator in the receiver to electrically isolate the MIDI equipment. An optoisolator is an electronic component that uses light to transfer a digital signal from its input to its output. The MIDI specification therefore is given in terms of the current needed to drive an optoisolator device – current flowing indicates a binary 0, no current flowing a binary 1. (Section 9.1)

MIDI bytes are sent using asynchronous serial transfers, which involve adding a start and a stop bit to the eight data bits. The addition of these bits allows the receiver to receive first the individual bits and then each MIDI byte correctly. (Section 9.2)

There are two classes of MIDI bytes – status and data. MIDI status bytes are instructions to do something and MIDI data bytes are the data that the instruction requires (if any). MIDI status bytes are of two types, channel and system. (Section 10)

Channel messages contain a channel designation and are the messages that carry the main sets of music codes. There are seven possible channel messages – Note On, Note Off, Aftertouch, Control Change, Program Change and Pitch Bend. Note On and Note Off messages are the basic instructions to play notes. Each requires two MIDI data bytes – a pitch specification and a velocity value. The data values range between 0 and 127 with 60 being the pitch value for middle C (C4), and 127 being the highest velocity value. Aftertouch refers to additional pressure that the player applies to the keys after pressing the notes. Aftertouch can affect individual notes (polyphonic aftertouch), or all notes (channel aftertouch). Control Change messages refer to a modification of the sound either through the use of a controller such as a sustain pedal, or through effects such as overall volume and reverberation. There are many MIDI Control Change codes unallocated that can be used for future enhancements. Channel mode messages indicate how the receiver should be configured – polyphonic or monophonic, local control on or off, omni mode on or off (i.e. respond to all MIDI channels or not) and switch all notes off. Program change messages allow the selection of different patches (different sounds) for note messages on a particular MIDI channel. Pitch bend allows the pitch of a note to be varied as it sounds. (Section 10.1)

MIDI system messages are of three types – common, real time and exclusive. System common messages contain a number of instructions concerned with playing a pre-recorded set of MIDI codes that form a song. System real-time messages are instructions that must be acted upon straight away. As well as transport controls (start, stop etc.) there is a MIDI clock message to aid synchronisation, a message to indicate that the transmitter is still connected and operating and a system reset command. System exclusive (SysEx) messages contain arbitrary device- or manufacturer-specific data. Manufacturers have to register with the MIDI Manufacturers Association (MMA) to obtain a system exclusive identification code. System exclusive messages can be designated as being real-time (i.e. they must be acted upon immediately), or non-real-time. (Section 10.2)

In order to reduce the amount of data that needs to be transmitted, running status may be used whereby a string of MIDI messages which have the same status byte can be sent with the status byte being sent only once at the start of the string. (Section 10.3)

MIDI status bytes are indicated by having their most significant bit (bit 7) set to 1. This means that MIDI data values must be in the range 0 to 127. Bits 4, 5 and 6 of a status byte are used to determine the status type, and bits 0 to 3 are used to indicate the MIDI channel. (Section 10.4)

Over the years since the original MIDI specification was published, a number of enhancements have been added. Sample dump is an enhancement that allows wavetable or patch data to be transferred to and from a synthesiser. This is achieved using special SysEx messages and there are two methods of sending the data: one uses a one-way connection with no handshaking and
the other uses handshaking and requires a two-way connection using two MIDI leads. The data is transferred in packets of 120 bytes. (Section 11.1)

The General MIDI enhancement was brought in in response to the problems people were having because there was no common set of sounds. General MIDI (GM) details a set of 128 different musical sound patches and 47 percussion sounds that a compliant device must be able to produce. In addition GM specifies at least a 24-note polyphony capability and that the GM device must be able to respond to Note On velocity values, Channel Aftertouch and a number of specific controller status messages. General MIDI 2 (GM2) further enhances GM to include 32-note polyphony, the ability to produce 16 different musical sounds and 2 different percussion sounds simultaneously and the ability to respond to a number of additional control change messages and some new SysEx messages concerned with overall effects such as reverberation and chorus. General MIDI Lite and Scaleable Polyphony General MIDI are subsets of GM designed for use primarily in mobile telephone ring tones. Scaleable polyphony GM places a priority on MIDI channels so that a less-capable device can ignore the lower priority channels it is unable to cope with. (Section 11.2)

MIDI time code (MTC) is an enhancement that provides a constant time interval reference for synchronisation purposes. This is in contrast to the MIDI timing clock message that is related to the tempo of the music. MTC is compatible with the SMPTE time code that has been in use in the film and television industries for many years. The SMPTE time code was originally based on the number of picture frames that had occurred since the start of the film or video. Different frame rates are catered for with different variations in the SMPTE time code. When applied to sound, the SMPTE time code indicates the time that a particular part of a recording should occur relative to the start of the piece. MTC uses a previously undefined MIDI status byte to provide a special time code message that is sent at a rate of between 96 and 120 times each second (depending on the SMPTE frame rate variation used). There are also a number of additional SysEx messages to give additional timing and associated data information. (Section 11.3)

Standard MIDI files (SMF) are used to store sequences of MIDI messages. These files use the same basic Interchange File Format (IFF) as used for AIFF and RIFF WAVE digital audio files. SMF uses two chunk types – a header chunk that contains information on the form of the MIDI data, and one or more track chunks that contain the MIDI data itself. The time relationship of the MIDI messages is retained by adding an item of data called the delta time before each MIDI message (or event) that indicates how long the next MIDI message is to occur after the previous one. A delta time of 0 means there should be no time interval between messages and that they should be sent one directly after the other. To cater for very long and very short delta times, and to minimise file sizes, the delta time is specified using a variable number of bytes. SysEx messages can be incorporated, and there are a number of other items of information called meta events that can also be stored. One of these is a lyric that can allow words to be attached to associated MIDI music messages. (Section 11.4)

MIDI machine control (MMC) and MIDI show control (MSC) are enhancements that in the main use special SysEx messages to control a wide range of devices that may be associated with or may need to be synchronised with MIDI music codes, audio signals and video. MMC is designed for controlling hard disk recorders and similar audio equipment in order to synchronise audio and MIDI. MSC incorporates and extends MMC to include control of a whole host of different devices including stage machinery, video and film lighting and special effects. Through the use of MSC, all aspects of a complete stage performance are able to be controlled using MIDI messages sent along standard MIDI connections. (Section 11.5)

MIDI Downloadable Sounds (DLS) allows wavetable data to be sent along a MIDI connection or stored in a MIDI file. This means that the precise sounds a song requires can be sent or stored along with the music data so that when the song is played on a DLS-equipped wavetable synthesiser the sound should be heard exactly as was originally intended. DLS provides for communication/storage of the basic digital samples forming the wavetable data, details about any loop points that should be used for the steady-state sound, on any vibrato or tremolo that should be added, descriptions of how the volume and pitch of the sound is to vary through the start up, steady-state and release phases of the notes and details of the behaviour of the sound in response to pitch bend and MIDI Control Change messages. (Section 11.6)

The uses of MIDI now stretch far beyond those originally envisaged, and its shortcomings in terms of the physical connections, the data rate and the limited number of channels are becoming more evident. In the future it is likely that a new specification will be produced that will address these shortcomings whilst still providing backwards compatibility with the vast numbers of existing MIDI devices. (Section 11.7)

There are three main categories of MIDI equipment – those that generate MIDI messages, those that manipulate them and those that interpret MIDI messages to produce sound. The main MIDI generator is the MIDI keyboard, but there are other common devices such as MIDI drum sets and wind controllers. There are also a number of novel MIDI instruments – either new devices or conventional instruments with a MIDI interface attached. Devices that manipulate MIDI messages include sequencers, expanders, filters, mappers, patchbays, multiport interfaces, patch editors and librarians and diagnostic devices. MIDI sound generators include electronic keyboards, synthesisers – with or without keyboards, drum machines and conventional instruments fitted with a MIDI interface. A MIDI implementation chart is a chart with a standard layout that indicates the MIDI capabilities of a device. (Section 12.1)

The functions of MIDI generator, manipulator and sound generator are all available in today’s desktop computer, although generating MIDI music codes may be easier with the use of an attached MIDI music keyboard, and better sounds may be obtained by using an external synthesiser. For a more sophisticated set-up, a desktop computer may be augmented with a number of peripherals such as a multiport interface, mixing units and sound cards incorporating a hardware synthesiser. Sophisticated software for desktop computers is available that provides a complete integrated MIDI and audio environment. Latency is the term used to describe any delay in an audio or MIDI path that causes two sounds to become out of synchronism by a fixed amount of time. The delays most commonly occur in desktop computers either due to delays in the computer processing the sound data (MIDI or audio) or through delays in generation of the sounds. Most computer sound programs have facilities to cater for latency. Latency should not be confused with delays arising from inadequate performance that can produce variable synchronisation problems as well as problems with the sounds themselves. (Section 12.2)

The study of a professional musician/composer at work shows that MIDI is an invaluable tool in the creation of background music for film and television programmes. MIDI offers great flexibility in terms of being able to try out ideas, the fine adjustment of tempo, the ease of entering and editing the music, and even, if necessary, in producing the final music without the need to use live musicians. Synchronisation between music and pictures is easily maintained by using a desktop computer as a central controller. The computer controls the playback/recording of the sounds (either as MIDI codes or digital audio) as it replays the pictures which have previously been transferred to the computer from video tape. The final sounds are sent to the production company as digital audio, and the stability of digital audio and video playback is such that continuous synchronisation between the two is not necessary, and only a small number of specific synchronisation points need to be specified. (Section 12.3)

Although MIDI is a robust system that works well and is now an essential tool in music making, it does have its limitations. Some of these are fundamental in that using a system of music codes can never exactly represent all aspects of music. MIDI also has some physical/electrical problems in terms of the data speed and connection system. Another problem is that of capturing the nuances of live performers even when the MIDI system is capable of representing such nuances. In the future the MIDI system is likely to be further enhanced in terms of both its functionality and physical/electrical aspects. (Section 12.4)

The creation of different types of music using MIDI codes requires different approaches. Sampled sounds are used when sounds that mimic those of real instruments as closely as possible are needed. Sometimes samples of whole phrases or rhythm patterns rather than just of a single note are the best way to obtain a realistic effect. Synthesised sounds are more often used when ‘new’ sounds are required. The use of MIDI enables passages of music to be created that would not otherwise be able to be played in real time. (Section 12.5)

APPENDICES

The tables in these appendices are given for reference only and should
be used to obtain a general idea of the range of sounds that General MIDI
provides for and the types of equipment and operations that MIDI
Show Control includes.

Appendix 1 – Table of General MIDI pitched sounds


Patch Instrument Patch Instrument
PIANO CHROMATIC PERCUSSION
1 Acoustic Grand 9 Celesta
2 Bright Acoustic 10 Glockenspiel
3 Electric Grand 11 Music Box
4 Honky-Tonk 12 Vibraphone
5 Electric Piano 1 13 Marimba
6 Electric Piano 2 14 Xylophone
7 Harpsichord 15 Tubular Bells
8 Clavinet 16 Dulcimer
ORGAN GUITAR
17 Drawbar Organ 25 Nylon String Guitar
18 Percussive Organ 26 Steel String Guitar
19 Rock Organ 27 Electric Jazz Guitar
20 Church Organ 28 Electric Clean Guitar
21 Reed Organ 29 Electric Muted Guitar
22 Accordion 30 Overdriven Guitar
23 Harmonica 31 Distortion Guitar
24 Tango Accordion 32 Guitar Harmonics
BASS STRINGS
33 Acoustic Bass 41 Violin
34 Electric Bass (finger) 42 Viola
35 Electric Bass (pick) 43 Cello
36 Fretless Bass 44 Contrabass
37 Slap Bass 1 45 Tremolo Strings
38 Slap Bass 2 46 Pizzicato Strings
39 Synth Bass 1 47 Orchestral Strings
40 Synth Bass 2 48 Timpani
ENSEMBLE BRASS
49 String Ensemble 1 57 Trumpet
50 String Ensemble 2 58 Trombone
51 Synth Strings 1 59 Tuba
52 Synth Strings 2 60 Muted Trumpet
53 Choir Aahs 61 French Horn
54 Voice Oohs 62 Brass Section
55 Synth Voice 63 Synth Brass 1
56 Orchestra Hit 64 Synth Brass 2

Patch Instrument Patch Instrument


REED PIPE
65 Soprano Sax 73 Piccolo
66 Alto Sax 74 Flute
67 Tenor Sax 75 Recorder
68 Baritone Sax 76 Pan Flute
69 Oboe 77 Blown Bottle
70 English Horn 78 Shakuhachi
71 Bassoon 79 Whistle
72 Clarinet 80 Ocarina
SYNTH LEAD SYNTH PAD
81 Lead 1 (square) 89 Pad 1 (new age)
82 Lead 2 (sawtooth) 90 Pad 2 (warm)
83 Lead 3 (calliope) 91 Pad 3 (polysynth)
84 Lead 4 (chiff) 92 Pad 4 (choir)
85 Lead 5 (charang) 93 Pad 5 (bowed)
86 Lead 6 (voice) 94 Pad 6 (metallic)
87 Lead 7 (fifths) 95 Pad 7 (halo)
88 Lead 8 (bass+lead) 96 Pad 8 (sweep)
SYNTH EFFECTS ETHNIC
97 FX 1 (rain) 105 Sitar
98 FX 2 (soundtrack) 106 Banjo
99 FX 3 (crystal) 107 Shamisen
100 FX 4 (atmosphere) 108 Koto
101 FX 5 (brightness) 109 Kalimba
102 FX 6 (goblins) 110 Bagpipe
103 FX 7 (echoes) 111 Fiddle
104 FX 8 (sci-fi) 112 Shanai
PERCUSSIVE SOUND EFFECTS
113 Tinkle Bell 121 Guitar Fret Noise
114 Agogo 122 Breath Noise
115 Steel Drums 123 Seashore
116 Woodblock 124 Bird Tweet
117 Taiko Drum 125 Telephone Ring
118 Melodic Tom 126 Helicopter
119 Synth Drum 127 Applause
120 Reverse Cymbal 128 Gunshot

Appendix 2 – Table of General MIDI percussion sounds


In a GM-compliant device, MIDI percussion sounds are accessed using
‘note on’ messages on MIDI channel 10. The Note On pitch value or
‘note number’ selects the individual sound.

MIDI note number Percussion sound MIDI note number Percussion sound
35 Acoustic Bass Drum 59 Ride Cymbal 2
36 Bass Drum 1 60 Hi Bongo
37 Side Stick 61 Low Bongo
38 Acoustic Snare 62 Mute Hi Conga
39 Hand Clap 63 Open Hi Conga
40 Electric Snare 64 Low Conga
41 Low Floor Tom 65 High Timbale
42 Closed Hi-Hat 66 Low Timbale
43 High Floor Tom 67 High Agogo
44 Pedal Hi-Hat 68 Low Agogo
45 Low Tom 69 Cabasa
46 Open Hi-Hat 70 Maracas
47 Low-Mid Tom 71 Short Whistle
48 Hi-Mid Tom 72 Long Whistle
49 Crash Cymbal 1 73 Short Guiro
50 High Tom 74 Long Guiro
51 Ride Cymbal 1 75 Claves
52 Chinese Cymbal 76 Hi Wood Block
53 Ride Bell 77 Low Wood Block
54 Tambourine 78 Mute Cuica
55 Splash Cymbal 79 Open Cuica
56 Cowbell 80 Mute Triangle
57 Crash Cymbal 2 81 Open Triangle
58 Vibraslap
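As an illustration of the point that the note number selects the percussion sound, the hypothetical helper below assembles the three bytes of a Note On message on MIDI channel 10 (coded as channel value 9 in the status byte, since the sixteen channels are numbered 0–15 internally); the function name and the velocity value are my own illustrative choices.

```python
# Build a Note On message on MIDI channel 10 to trigger a GM percussion sound.
# Channel 10 is coded as 9 in the status byte (channels are numbered 0-15 internally).

def percussion_note_on(note_number, velocity=100):
    status = 0x90 | 9            # Note On (upper nibble 1001) on MIDI channel 10
    return bytes([status, note_number, velocity])

acoustic_snare = percussion_note_on(38)   # note number 38 = Acoustic Snare (table above)
print([hex(b) for b in acoustic_snare])   # ['0x99', '0x26', '0x64']
```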

Appendix 3 – MIDI show control devices and commands


The table below shows the currently specified (in 2004) MIDI show control command
format devices.

Value Device Value Device


0 Reserved for future extensions 48 Video (general device category)
1 Lighting (general device category) 49 Video tape machines
2 Moving lights 50 Video cassette machines
3 Colour changers 51 Video disc players
4 Strobe lights 52 Video switchers
5 Laser lights 53 Video effects
6 Follow spotlights 54 Video character generators
16 Sound (general device category) 55 Video still stores
17 Music 56 Video monitors
18 CD players 64 Projection (general device category)
19 Solid state memory playback 65 Film projectors
20 Audio tape machines 66 Slide projectors
21 Intercoms 67 Video projectors
22 Amplifiers 68 Dissolvers
23 Audio effects devices 69 Shutter controls
24 Equalisers 80 Process control (general device category)
32 Machinery (general device category) 81 Hydraulic oil
33 Rigging 82 Water
34 Flys 83 Carbon dioxide
35 Lifts 84 Compressed air
36 Turntables 85 Natural gas
37 Trusses 86 Fog
38 Robots 87 Smoke
39 Animation 88 Cracked haze
40 Floats 96 Pyrotechnic effects (general device category)
41 Breakaways 97 Fireworks
42 Barges 98 Explosions
99 Flame
100 Smoke pots
127 All devices

The table below shows some of the MIDI show control commands that may be used with the devices listed above.

Value Command
0 Reserved for future extensions
1 Go
2 Stop
3 Resume
4 Timed go
5 Load
6 Set
7 Fire
8 All off
9 Restore settings
10 Reset
11 Timed stop

ANSWERS TO SELF-ASSESSMENT ACTIVITIES

Activity 7
As you will recall from Chapters 1 and 2 in Block 2 as the length
of a pipe is reduced the pitch of the note produced is increased.
Miniaturisation meant that pitches of notes produced by the
mechanisms would be unrealistically high.

Activity 11
There are at least three major drawbacks. Firstly, no tune can be longer
than the time it takes for the barrel to complete a single revolution.
Tunes had to be arranged to fit this time which of course varied
between instrument models and makers. Secondly, the barrel is
difficult, although not impossible, to change so the range of tunes
offered was limited and difficult to up-date. Finally, each barrel
had to be hand-made and so was very expensive to buy.
Other problems that you may have considered include the size of the
barrels, the fact they are quite fragile (a single broken pin would spoil
the music and be difficult to repair), and the poor action that led to
mediocre piano performance.

Activity 19
Most stringed instruments other than pianos and harpsichords have to
form the note. Violins, cellos, double-basses, guitars, banjos, etc., all
rely on fingering to make the note on the string before playing it.

Activity 25
The pin barrel system reproduced the music as written, but the
performance was dependent upon the skill of the artisan who made
the pin barrel. The selection and placement of the pins would
directly affect the resulting performance.
The piano roll was a faithful reproduction of the performance of the
original artist (with minor corrections!). Naturally the quality of the
player piano could affect the sound but the performance was that of
the artist. So, in Activities 23 and 24, when you listened to Rhapsody
in Blue you were hearing George Gershwin himself play perhaps his
most famous work.

Activity 29
Chapter 1 of this block mentioned that the use of a twisted pair
balanced connection reduced the effects of interference from other
electrical sources. A screened cable also helps this as well.
The greater the length of lead, the more chance there is of interference
becoming a problem. So for cable lengths of 15 m it is reasonable to
suppose that some protective measures like using a balanced signal
via a twisted pair of wires and screening might need to be used.

Activity 30
The interconnection diagram is shown in Figure 34.

[Diagram: a MIDI keyboard (MIDI OUT) connected in a chain to synthesiser 1 (MIDI IN, MIDI THRU), synthesiser 2 (MIDI IN, MIDI OUT) and synthesiser 3 (MIDI IN).]

Figure 34 Interconnection diagram for Activity 30

Activity 31
The serial conversion process involves the addition of two extra
bits to the eight MIDI bits giving a total of 10 bits for each byte.
The maximum number of bytes is achieved by sending each
byte straight after the previous one with no intervening time gap, so if
31 250 bits can be sent each second, a maximum of 31 250 ÷ 10 = 3125
MIDI bytes can be sent each second.

Activity 33
(a) The A above middle C (A4).
(b) The C an octave below middle C (C3).

Activity 36
(a) Starting on the right, the weightings of the bits that are 1 are:
4 + 8 = 12
(b) Starting on the right, the weightings of the bits that are 1 are:
16 + 32 = 48
(c) Starting on the right, the weightings of the bits that are 1 are:
1 + 8 + 64 + 128 = 201

Activity 37
Looking at the weightings of the lowest four significant bits only:
(a) 4 + 8 = 12, so MIDI channel 13 is being referred to.
(b) 2 + 4 = 6, so MIDI channel 7 is being referred to.
Both bytes have their most significant bits as one, which makes them
both MIDI status bytes.

Activity 38
(a) (i) Program change on MIDI channel 4
(ii) Note Off on MIDI channel 10
(b) (i) 226
(ii) 212
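The bit manipulation behind answers like these can be expressed in a few lines of Python. This is only an illustrative sketch built on the bit layout given in the main text (bit 7 set for a status byte, bits 4–6 giving the message type, bits 0–3 the channel); the dictionary of type names is my own shorthand.

```python
# Encode and decode MIDI channel status bytes using the bit layout from the text:
# bit 7 = 1 marks a status byte, bits 4-6 give the message type, bits 0-3 the channel.

TYPE_NAMES = {0x8: "Note Off", 0x9: "Note On", 0xA: "Aftertouch",
              0xB: "Control Change", 0xC: "Program Change",
              0xD: "Channel Aftertouch", 0xE: "Pitch Bend"}
TYPE_CODES = {name: code for code, name in TYPE_NAMES.items()}

def make_status(message_type, channel):
    """Channel is the musician's channel number, 1-16."""
    return (TYPE_CODES[message_type] << 4) | (channel - 1)

def describe_status(byte):
    return f"{TYPE_NAMES[byte >> 4]} on MIDI channel {(byte & 0x0F) + 1}"

print(make_status("Program Change", 4))    # 195, i.e. 1100 0011
print(make_status("Note Off", 10))         # 137, i.e. 1000 1001
print(describe_status(226))                # Pitch Bend on MIDI channel 3
```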

Activity 42
The MIDI clock is sent at a rate dependent on the crotchet (quarter note)
speed of the music being played. In turn, the song pointer is related to
the number of MIDI clocks. So both of these will vary with the tempo
and content of the music, and are not fixed to absolute time.

Activity 43
If a fixed number of bytes were to be used, the actual number must
cater for the largest possible delta time, which could be a long time and
therefore require many bytes. However, it is likely that many
individual MIDI messages will need to occur at the same time, or
within a very short time interval, i.e. have a delta time of 0 or a small
number. Having to use up many bytes of data in every such case is
clearly going to add a large overhead to the amount of data that needs
to be stored, and therefore will increase SMF file sizes.
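For the curious, the sketch below shows the variable-length scheme that Standard MIDI Files use for delta times: each byte carries seven bits of the value, and the most significant bit is set on every byte except the last, so small delta times need only a single byte. The example values are my own.

```python
# Encode a delta time as a MIDI-file variable-length quantity:
# 7 bits of the value per byte, most significant group first,
# with bit 7 set on every byte except the last.

def encode_delta_time(value):
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # continuation flag on earlier bytes
        value >>= 7
    return bytes(reversed(groups))

for delta in (0, 96, 200, 100_000):
    print(delta, [hex(b) for b in encode_delta_time(delta)])
# 0      -> ['0x0']
# 96     -> ['0x60']
# 200    -> ['0x81', '0x48']
# 100000 -> ['0x86', '0x8d', '0x20']
```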

Activity 46
A single MIDI port can only carry 16 channels, so if more than this
number are needed, then additional MIDI ports need to be used (i.e.
additional sets of connections). Modern samplers/synthesisers now
often incorporate two or more MIDI ports (i.e. more than one set of
DIN connectors – see Figure 17). In addition, a separate MIDI interface
box can be used to allow MIDI data to be routed from and to specific devices.

Activity 47
Local control on/off mode is where the synthesiser or MIDI device has
the facility to disconnect its keyboard circuitry from its sound
generation circuitry. Thus in local off mode, when the keyboard is
played, the MIDI OUT signal contains the MIDI messages
corresponding to the key presses, but no sound is generated. However,
in this mode, MIDI messages fed to the MIDI IN input do get
interpreted by the sound production circuits.
So, in the situation where some processing of the MIDI signal needs to
be done before the signal is used to generate any sound, the
synthesiser is put in local off mode, the device’s MIDI OUT signal is
fed to the MIDI manipulation device, and this device’s MIDI OUT
signal is fed to the synthesiser’s MIDI IN.

Activity 50
(a) The D-50 transmits MIDI note numbers from 24 to 108. If 60
represents the pitch C4, then 24 represents C1 and 108 represents
C8. Thus the pitch range is C1 to C8.
(b) In the MIDI implementation chart, key aftertouch (polyphonic
aftertouch) has a cross in both transmit and receive columns
indicating it does not implement this feature. However, the
channel aftertouch row has an asterisk in both columns indicating
that this feature can be switched on or off under user control, and
that the setting is memorised. Thus the D-50 does implement
channel aftertouch.
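A quick way of checking conversions like this is to turn note numbers back into pitch names. The small helper below assumes the convention used in this chapter (note number 60 is middle C, C4), so the octave number is simply the note number divided by 12, minus 1.

```python
# Convert a MIDI note number to a pitch name, taking 60 = middle C (C4).

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_name(note_number):
    octave = note_number // 12 - 1
    return f"{NOTE_NAMES[note_number % 12]}{octave}"

print(pitch_name(24), pitch_name(60), pitch_name(108))   # C1 C4 C8
```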

Activity 51
For processing digital sound data the main requirement of the
computer is to be able to store large quantities of data (i.e. the digital
sound samples) and to move them around very quickly. Thus the
critical aspects will be the speed of operation of the computer, the
amount of main memory and possibly the size of the secondary
memory (hard disk).

Activity 53
The main advantages of using MIDI are:
• the music can be entered in a number of ways to suit the composer’s
ability and choice; some of these do not require the music to be played
in real time or require the composer to be able to read conventional
music notation;
• the timing of the music can be continuously adjusted to fit the video;
• the music can easily be cut up, copied and pasted;
• a good idea of how the final music will sound can be obtained by
using General MIDI sounds even though eventually special sounds
and/or live musicians will be used.

Activity 57
There are two aspects to dance music where there is a lot of percussion
that contribute to causing a problem with the speed of MIDI messages.
First the General MIDI specification allocates channel 10 to percussion
sounds, and since a MIDI device often sends MIDI note messages in
channel order, the data for channels 1–9 must be sent before the
percussion data is sent. This means that there may be a variable delay in
the percussion sounds sounding. Since these sounds usually contain the
pulse of the music, any delay, and particularly a varying delay, may cause
the beat of the music to appear to vary and also cause the percussion to
perhaps not sound in synchronism with the other instruments.
The other problem is the sheer quantity of messages that percussion
requires. Consider a drum roll: each individual hit of the drum has to
be indicated by a separate MIDI message, so a roll produces a huge rush
of MIDI messages in a very short time. This can sometimes overload the
MIDI transmission system, resulting in delays to other MIDI messages
and/or an uneven drum roll.
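To see why a flurry of ‘simultaneous’ hits smears out in time, recall that MIDI is a 31,250 bits-per-second serial link with ten bits sent per byte (a start bit, eight data bits and a stop bit). The figures below are only a rough illustration; the 20-note total is an assumed example, not taken from the activity:

bit_rate = 31_250                       # MIDI serial rate, bits per second
bits_per_byte = 10                      # start bit + 8 data bits + stop bit
ms_per_byte = 1000 * bits_per_byte / bit_rate     # 0.32 ms per byte

note_on_ms = 3 * ms_per_byte            # a 3-byte Note On takes ~0.96 ms
simultaneous_notes = 20                 # assumed: drum hits plus other parts
print(simultaneous_notes * note_on_ms)  # ~19.2 ms before the last note is sent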

Activity 59
In order to synchronise the melody horn with the MIDI version of the
jazz arrangement, Simon needed to hear the MIDI sounds, and also the
pulse of the music as an audible click, while he played.
The MIDI version and click must not themselves be recorded along with
the melody horn, so Simon had to wear headphones. However, unless the
headphones are of the ‘enclosed’ type, it is quite possible that the
unwanted sounds will still be heard in the background of the melody
horn recording. (In fact, many of the cheaper unenclosed headphones
are very poor at containing the sound that leaks into the area around
the listener – think how often you can hear quite clearly what the
person sitting next to you on the train is listening to on their
personal music player!)

LEARNING OUTCOMES
After studying this chapter you should be able to:

1 Explain correctly the meaning of the emboldened terms in the main
text and use them correctly in context.
2 Explain how music may be stored as a set of instructions which,
when applied appropriately, will play an instrument or
instruments.
3 Describe the ways in which instructions to play mechanical
musical instruments may be stored, including mention of their
limitations and disadvantages. (Activity 11)
4 Outline the operation and development of the mechanical musical
instruments introduced in the main text. (Activity 25)
5 Use mechanical musical instruments to draw conclusions about
the general features and limitations of music in code.
6 Outline the background to and the development and limitations of
the basic MIDI system.
7 Describe a basic MIDI set-up that covers the physical, electrical,
operational and functional components.
8 Outline the hardware and electrical elements of a MIDI connection.
(Activities 30 and 31)
9 Describe the types of MIDI bytes and the basic forms of MIDI
messages, and, given appropriate specification details, produce or
interpret short sections of MIDI codes. (Activities 33, 37 and 38)
10 Describe the purpose and outline the operation and implementation of
each of the various enhancements to the basic MIDI specification
that have been introduced in the main text, and, given appropriate
specification details, produce or interpret short sections of MIDI
codes which use these enhancements. (Activity 43)
11 Describe the problems that synchronisation between audio, video
and MIDI devices presents, and outline the methods and standards
used to achieve such synchronisation, linking their operation to
their historical basis.
12 Identify and outline the function of devices that generate, store,
manipulate or interpret MIDI codes, and be able to extract
information about a device from its MIDI implementation chart.
(Activities 47 and 50)
13 Outline the hardware and software requirements of a desktop
computer that is to be used in the creation of music using digital
audio and MIDI, mentioning some of the advantages and
disadvantages/problems of using such a computer for this purpose.
14 Outline the processes and equipment involved and describe the
advantages, disadvantages and limitations of using MIDI in the
creation of music. (Activities 46, 53, 57 and 59)
15 Apply the principles and ideas of the coded music systems that
have been introduced in the main text to new, given, situations in
order to carry out appropriate calculations, descriptions and
deductions concerning the new situation.

Acknowledgements
Grateful acknowledgement is made to the following sources for
permission to reproduce material in this chapter:
Figures 28 and 29: Yamaha-Kemble Music (UK) Ltd; Figure 32:
Roland (UK) Ltd; ‘Music in Code’ video sequences: Paul Camps,
Janet Whitehead and the late Graham Whitehead, founder of the
nickelodeon collection at Ashorne Hall, Warwickshire; ‘Music in Action’
video sequences: Simon Whiteside; translation of Der Leiermann in
Activity 6: Janet Seaton.

INDEX

Notes
1 This index covers Block 3, Chapters 1–3.
2 Where a term is referenced in two or more places, the page number is given in only one place;
cross-references are given for the other entries.
3 Page numbers in bold refer to places where the term appears emboldened in the main text.
4 The index does not cover the aims, chapter summaries, answers to self-assessment activities or
learning outcomes.

access time audio connector 32, 91


hard disk see hard disk access time see also BNC connector, DNP connector,
ADAT see Alesis digital audio tape jack connector, phono connector, TRS
recorder connector, XLR connector
Aeolean Company 141 audio digital versatile disc (DVD-A) 37
AES/EBU digital interface 23, 30, 31, 90, 151 audio expansion/gating 55
block 25, 27 audio file format 43
channel status 25, 26, 27 header see header
frame 25, 27 interleaving tracks see interleaving
parity bit 26 see also AUdio file format, audio
sub-frame 25, 26, 30 interchange file format, resource
user data 26, 28 interchange file format
validity flag 26 audio interchange file format (AIFF) 44, 187
aftertouch see MIDI channel aftertouch comment chunk 48
AIFF see audio interchange file format common chunk 46
Alesis digital audio tape recorder 37 header chunk 46
AM see amplitude modulation instrument chunk 47
amplifier simulation 88 marker chunk 47
amplitude MIDI chunk 47
peak-to-peak see peak-to-peak amplitude name, author etc. chunks 48
r.m.s. see r.m.s. amplitude sound data chunk 46
amplitude modulation (AM) 86 audio limiting 53
audio normalisation see normalisation
analogue audio cable see cable
analogue input 14 audio processor 8, 84, 154
external control 89
impedance 17
line see line input audio recorder 206
sensitivity 14, 16 hard disk see hard disk audio recorder
analogue output 20 audio restoration 36
headphone/loudspeaker see loudspeaker audio workstation 8, 42, 43
power 21 see also Yamaha AW16G
loading see loading of an analogue output automata 139
analogue-to-digital converter 42, 205 AW16G see Yamaha AW16G
application (computer) 206
arpeggio 114
Babbage, C. 137
Ars Antiqua/Ars Nova 114
balanced cable 31, 158
art music 109
balanced input 34
asynchronous transfer 163
balanced system 19
attack time 54
band-pass filter 73
attenuation (in cables) see loss of signal in
cables bandwidth
of a digital audio input/output 23
AU see AUdio file format
banjo 147
AUdio file format (AU) 44
chunk see chunk Baroque period 114
header 45 barrel
audio bus 65 noting 135
audio compression/limiting 53, 87, 88 pin see pin barrel 131
attack time see attack time barrel organ 133, 136, 139
compression ratio see compression ratio bass
decay time see decay time in figured see figured bass
compression/limiting/expansion/gating bass equalisation 71
in broadcasting 53 bell 138
threshold level see threshold level bit 160
transfer characteristics 54 bit synchronisation 24

blanket (in printing) 119


comb filter 79

BNC connector 33
COMM see AUdio file format (AU)

bounce (audio mixing mode) 10, 68


common chunk

bowing 147
compact disc (CD) 37, 209

breve 114
drive 51, 65

broadcasting 145
player 183

brush 212
sub-code data 28

bucket brigade delay line 74


compression

audio see audio compression/limiting

buffer (temporary digital storage) 75

compression ratio 53

bus (audio) see audio bus 61

computer

byte 43
application see application
bit numbering 174
desktop see desktop computer
weightings of bits 173
file see audio file format, standard MIDI
file

hardware see hardware

cable 19, 23, 31, 91


latency see latency

balanced see balanced cable


memory see memory

impedance see cable impedance


MIDI interface see MIDI interface

interference see interference in cables


performance 207

MIDI see MIDI cable


peripheral see peripheral

noise see noise in cables


software see software

optical see optical fibre/cable


software synthesiser see software

screened see screened cable


synthesiser
cable impedance 31
sound card see sound card
campanology 132
system software see system software
canon 110
connector

capacitor microphone 20, 35


audio see audio connector

cardboard book 142


container chunk 48

carillon 132, 146, 151, 202


copyist (music) 115, 120

CD see compact disc


course (string/keyboard instrument) 114

cent 89
crash (hard disk) see hard disk crash

channel
cross-fade 57, 59, 83, 193

audio 9
crosstalk (in a stereo channel) 86

MIDI see MIDI channel


cueing commands 193

channel aftertouch (MIDI) see MIDI channel


cylinder

aftertouch
musical box 138

channel message (MIDI) see MIDI channel


pin see pin barrel

message

channel mode (MIDI) 167

channel pressure see channel aftertouch


d’Arezzo, G. 112

chant 111
D-50 synthesiser 203

chorus 11
MIDI implementation chart 204

chorus effect 80, 86, 182


DAT see digital audio tape

generating 80

data byte (MIDI) see MIDI data byte

chunk 45, 187

container see container chunk


data rate 23

local see local chunk


dB see decibel

see also audio file format, standard MIDI


dBm 16

file
dBu 16, 18

church music 110


Decap Blue Angel café organ 131

clipping 21
decay time (in reverberation) 76

clock chiming mechanism 132


decay time in compression/limiting/

code
expansion/gating 54

MIDI see MIDI code decibel (dB) 15

music see coded music dBm scale see dBm

coded music 109, 131, 149


dBu scale see dBu

and dynamics 150


tips on remembering values 15

and pitch 150


delay

and tempo 150


analogue device see bucket brigade delay

and timbre 149


line

MIDI see MIDI


digital 75, 77, 78

need for standards 149


delaying audio signals 74, 84

coin-in-the-slot mechanisms 134


delta time 189

colour (in reverberation) 76


density (in reverberation) 76


depth (in reverberation) 76


Eisbouts Company 132

desktop computer 7, 89
electroacoustic music 116

see also computer


electronic instrument 9, 52, 61

desktop publishing 7
electronic keyboard 154, 181, 201

desktop sound 8
electronic memory 38, 40, 41

destructive editing 58
flash see flash memory
diffusion (in reverberation) 76
non-volatile see non-volatile memory
digital audio tape (DAT) 209
RAM see random access memory
recorder 42, 37
volatile see volatile memory
digital filter 73
electronic music 116

digital input/output 22
end of exclusive (EOX) 171, 176, 179

AES/EBU see AES/EBU digital interface engraving 118

MADI see MADI digital interface


S/PDIF see S/PDIF digital interface envelope (of a sound signal) 86, 195

digital signal processor (DSP) 62


envelope follower 86

digital-to-analogue converter 42, 205


EOX see end of exclusive

episema 113

disc musical box 139

EQ see equalisation

pin disc see pin disc

Polyphon see Polyphon


equalisation (EQ) 11, 70, 87, 88

Symphonion see Symphonion


bass see bass equalisation

DLS see MIDI downloadable sounds


controls 42

DNP connector 33
fixed 70, 71

graphic see graphic equalisation

Doppler effect 81
in reverberation 76

double-impression printing 117


mid-range see mid-range equalisation

downloadable sounds see MIDI parametric see parametric equalisation

downloadable sounds
treble see treble equalisation

downsizing 64
see also Q

drop frame (SMPTE time code) 184


etching 118

drum (in a musical box) 138


Ethernet 194, 196

drum kit (MIDI) see drum machine


expansion (audio) see audio expansion/

gating

drum machine 198, 201, 205


expression (in Pianola) 141

dry-transfer character 120

dry/wet balance (in reverberation) 76

DSP see digital signal processor


fading in/out 52, 59, 61

DVD-A see audio digital versatile disc


fader 17, 61, 89, 90

dynamic range 11, 52, 64


fast Fourier transform 83

dynamics 88, 109


Favre, A. 138

dynamics processing 87
FFT see fast Fourier transform

FIFO see first in first out buffer

early reflection delay (in reverberation) 76


figured bass 114

file format (audio) see audio file format

earth loop 161

film 182

echo 74

analogue techniques 74
film music 116

digital techniques 74
filter

band-pass see band-pass filter

edit list 11, 58, 69, 87


digital see digital filter

editing (audio) 10, 36, 57, 65, 66, 70, 91


high-pass see high-pass filter

analogue techniques 57
low-pass see low-pass filter

cross-fading see cross-fading


see also Q

destructive see destructive editing


Finale (music setting program) 121

edit list see edit list


FireWire 30, 41, 90, 196, 206

fading in/out see fading in/out

first in first out buffer (FIFO) 75, 77, 82

location counter 60

non-destructive see non-destructive


flanging 78

editing
analogue techniques 78

nudge function 60
digital techniques 80

splicing see splicing


flash memory 41, 42, 43

tape reel shuttling 59


FMT see Resource Interchange File Format

format chunk

effect (audio) 10, 11, 52, 61, 88, 89, 91

non-real time 89
FORM see AUdio file format header

see also chorus, echo, envelope follower,


format (file) see audio file format

equalisation, flanging, invert, pitch Fourier transform (fast) 83

changing, stereo imaging, tempo frame 49

changing, reverberation, vocoder frame rates (video/film systems) 184


frame synchronisation see word/frame intaglio 118, 119

synchronisation
interchange file format (IFF) 44, 187

free reed 143


variable length specification see variable

frequency domain (working in) 83


length specification

fret 114, 147


interference (in cables) 19, 31, 161

interleaving (tracks in an audio file) 44, 46,

47

gain (of an amplifier) 53


Internet 43, 194

gating (audio) see audio expansion/gating


inversion 84

General MIDI (GM) 154, 157, 168, 181, 195,

201, 206, 211, 212

feature list 181


jack connector 32, 34, 35

patch list 218


Jacquard system 142

percussion sound list 220


Jacquard, J.M. 141

General MIDI 2 (GM2) 182

feature list 182

General MIDI Lite (GML) 182


karioki 190

GM see General MIDI


key pressure see MIDI polyphonic

aftertouch

GM2 see General MIDI 2

keyboard
GML see General MIDI Lite
electronic see electronic keyboard
Grainger, P. 145
MIDI see MIDI keyboard
gramophone 140, 145

Grand Electric Orchestra 147

graphic equalisation 71
latency 207

Guido’s hymn 113


LED see light emitting diode

Guidonian hand 113


Léonin 114

guitar 87
letterpress 117

MIDI see MIDI guitar light emitting diode (LED) 161, 201

limiting (audio) see audio compression/


limiting

Halstan process 120


line input 14, 34

hard disk 37, 65, 66


line output 35

access time 39
line-level signal 18

audio recorder 38

Linotype 7

disk crash 39

LIST see Resource Interchange File Format

sector 39
LIST chunk

seek time see seek time

track 39
litho plate 119, 121

hard disk drive (HDD) 39


lithography 119

hard disk recorder 12


liturgical music see church music

versus desktop computer 40


loading

hardware (computer) 205


of an analogue output 17, 21

see also clipping

HDD see hard disk drive

local chunk 46, 48

header (audio file format) 44

local on/off (MIDI) see MIDI local on/off

headroom 57
localisation (in stereo sound field) 85

high-pass filter 73
long (note value) 114

hum 161
long-playing disc see vinyl LP

hurdy-gurdy man 136


looping 195

hurdy-gurdy see street piano


loss of signal (in cables) 31

loudspeaker 87

IEEE 1394 see FireWire


power 21

simulation 87

IFF see interchange file format

low-pass filter 73

image (in a stereo sound field) 85

lute 114

impedance 17

analogue input see analogue input

impedance
MADI digital interface 30

implementation chart (MIDI) see MIDI


Mahler, G. 109

implementation chart
manual (keyboard instrument) 152

improvisation 109
MD see MiniDisc

input
mechanical musical instrument 132

analogue see analogue input


Mediaeval period 110

digital see digital input


memory (computer) 131


meta event (standard MIDI file) 190


merger 200

list of events 191


message 155, 157, 163, 164

metering systems 18, 41


message value ranges 175

see also volume unit, peak programme


multiport interface see multiport MIDI

meter
interface

microphone 9, 14, 16, 52, 61, 65, 66, 109


notation information (IN MTC) 185

capacitor/condenser see capacitor


Note Off see Note Off

microphone
Note On see Note On

phase 84
origins and acceptance 153

vocal 53
patch editor/ librarian 200, 208

microtone 115
patchbay 200

mid-range equalisation 71
piano 202

Middle Ages 113


‘piano roll’ editor 209

MIDI 44, 90, 109, 131, 149, 151


pitch bend 169, 195

active sensing 170


pitch value 174

aftertouch see aftertouch


polyphonic aftertouch/key pressure 167

and computers 159


port 158

and synchronisation 157


program change 168, 199

applications 194
quarter frame message 184

basics 155
router 171

byte 157
running status 172

cable 158
sample dump 178

channel 156, 181


sequencer see sequencer

channel aftertouch/channel
serial-to-parallel conversion see serial-to-

pressure 166, 167


parallel conversion

channel message 157, 164, 165


show control see MIDI show control

channel mode see channel mode


software synthesiser see software

code 157, 173


synthesiser

computer file see standard MIDI file


song position pointer 169

connections 158, 159


song select 170

control change 167, 199


sound generator 201

control change/channel mode

values 177
start/continue/stop 170

controller 167, 173


status byte 157, 164, 174

cueing message (in MTC) 185


status byte values 174, 177

data byte 157, 164


stuck note 166

data rate 163


system common message 169, 176

development 152
system exclusive message see SysEx
diagnostic tools 201
system exclusive see system exclusive
downloadable sounds see MIDI MIDI message

downloadable sounds system message 157, 164, 169

drum controller/kit/machine see drum system real time message 170, 176

machine
system reset 170

electrical specification 160


time code see MIDI time code

end of exclusive see end of exclusive


timing clock 170

expander 200
trigger 198

file format see standard MIDI file


tune request 170

filter 200
universal SysEx message see SysEx

full message (in MTC) 185


user bit (in MTC) 185

General MIDI see General MIDI


velocity value 174

generator 197
wind controller 198, 205

guitar 198
MIDI downloadable sounds (DLS) 154, 157,

implementation chart see MIDI


194, 201, 208, 209

implementation chart
general form 195

interface 206
MIDI implementation chart 202

interface circuits 162


description of contents 203

introduction 151
MIDI IN/OUT/THRU connectors 158, 162,

keyboard 197, 205


168, 171, 178

librarian 154
MIDI machine control (MMC) 157, 192

limitations and improvements 155, 210


general form 192

list 209
MIDI Manufacturers Association

local on/off 159, 168


(MMA) 157, 159

machine control see MIDI machine


MIDI sample dump 178

control

MIDI show control (MSC) 157, 192

manipulator 199
device and command list 221

mapper 200
general form 193


MIDI time code (MTC) 157, 184, 190, 200


organ-grinder 137

MiniDisc (MD) 37
organum 113

mix-down (mix) 10, 36, 61, 182


ornament 109, 114

mixer 206
output

mixing (audio) 10, 42, 61, 91


analogue see analogue output

bounce/ping-pong see bounce


digital see digital output

digital techniques 62

direct 66

mixdown 69

mixed mode 66
packet (MIDI sample dump) 178

MMA see MIDI Manufacturers Association


pad 64

modulation wheel 167


pan 85, 167

monophonic (mono) 85, 152, 168


paper roll 131, 141

monophonic unison 111


parallel digital audio processing 63, 73, 75

Monotype 7
parametric equalisation 71, 72

MTC see MIDI time code


patch 66, 168, 171, 180

multi-channel audio digital interface see


peak programme meter (PPM) 18

MADI digital interface


peak-to-peak amplitude 14

multiplexing 63
peripheral (computer) 205

multiport MIDI interface 200


Pérotin 114

multitrack recorder/recording 52, 57, 59, 61


phantom power 20, 34

multitrack tape recorder 182


phase 85

music notation 109, 131


changing see inversion

analysis 111
phase vocoder see vocoder

and composition 111


phasing effect 86

functions of 110
phasing see flanging

handwriting of parts 116

history 111
phono connector 32

printing see printing music


phonograph 140

staff see staff notation


photolithography 119

music printing 110, 116


offset see offset photolithography

music setting 118


piano 146

by computer 121
damper 135

musical box 109, 138, 151


hammer 135

MIDI controlled see MIDI piano

disc see disc musical box


reproducing see player piano

musical instrument digital interface see MIDI


table of sales around 1900 140

piano roll 144, 148, 151

Nancarrow, C. 148
MIDI editor see MIDI piano roll editor

neume 111
recording 145

noise
piano-organ see street piano

in cables 19
Pianola 141, 144

non-destructive editing 11, 58


pin barrel 133, 134

non-volatile memory 42
dial noting 135

pin 134

non-Western music 109

pinning 135

normalisation 52, 56
scale noting 135

notation see music notation


pin disc 139

notch 78
ping-pong (audio mixing mode) see bounce

Note Off (MIDI message) 166, 168


pipe organ 136, 138, 152, 165, 168

pitch 166

piston see piston

velocity 166

rank see rank

Note On (MIDI message) 165


stop see stop

pitch 165

velocity 165
piston 152

pitch changing 11, 81, 82, 150

variation of timbre 195

offset photolithography 119


plainsong/plainchant 111, 113

omni on/off 168


plate (in reverberation) 77

optical fibre/cable 23, 32, 33


player piano 144, 150

optoisolator 160, 161


mechanism 144

orchestrion 133, 136, 146


plucking 147

organ 146
pointer (in FIFO) 75, 82

barrel see barrel organ


Polyphon 139

pipe see pipe organ


polyphonic 150, 168


polyphonic key pressure see MIDI


reverberant room 77

polyphonic aftertouch
reverberation 11, 52, 65, 74, 75, 86, 88, 167,

polyphonic synthesiser 153


182

port
algorithm see reverberation type

MIDI see MIDI port


analogue techniques 77

Portastudio 12
colour/equalisation see colour

power (electrical) 16
decay time see decay time

of a loudspeaker output see loudspeaker depth see depth

power
diffusion see diffusion

PPM see peak programme meter


diffusion/density see diffusion

ppq see pulses per quarter note


digital techniques 77

pre-delay (in reverberation) 76


dry/wet balance see dry/wet balance

pressure wave 109


early reflection delay see early reflection

delay

printing

double-impression see double


- pre-delay see pre-delay

type (algorithm, room size) 76

impression printing

music 116
rhythm

program (synthesiser patch) see patch


and notation 109, 113

RIFF WAVE see resource interchange file

protocol 22

format
pulses per quarter note (ppq) 170

roll (paper) see paper roll


punch in/out 11, 35, 69

room size (in reverberation) see


pushup see Pianola
reverberation type
root-mean-square amplitude see r.m.s.
amplitude
Q 72

QuarkXpress 7

quarter tone 115


S/PDIF digital interface 28, 30, 35, 65, 90,

Quicktime 207
151

channel status 28, 29

see also serial copy management system

SACD see super audio compact disc

r.m.s. amplitude 14

sample frame 46

RAM see random access memory

sampled sound 211

random access 38

sampling 74

random access memory (RAM) 41, 42, 51, 75

interval 23

rank 152
interval see also sampling rate

real-time 8, 11, 36, 40, 41, 44, 49, 52, 57,


Scaleable Polyphony General MIDI

59, 62, 63, 83, 207


(SPMIDI, SP-GM) 182

and MIDI 156

scene list/memory see edit list

digital audio transfers 23, 25, 26, 27, 30

SCMS see serial copy management system

recording musical performances 7


screened cable 19, 31

reflections
sector (hard disk) see hard disk sector

in cables 31

seek time (hard disk) 39

reiterating mechanism 142

self-clocking system 25

relief (in printing) 117

semitone 89

Renaissance period 114


sensitivity

reproducing piano see player piano


of an analogue input see analogue input

resistance (electrical) 16, 17, 74


sensitivity

resource interchange file format (RIFF


sequencer 153, 154, 199

WAVE) 49, 51, 187


serial copy management system (SCMS) 28

associated data chunk 50


serial data 23, 156

cue chunk 50
see also asynchronous transfers, bit/byte

data chunk 48
synchronisation, self-clocking system

fact chunk 50
serial-to-parallel conversion 163

format chunk 48
setting see typesetting

header chunk 48, 49

Sibelius (music setting program) 121

instrument chunk 50

LIST chunk 50
side drum 212

playlist chunk 50
signal-to-noise ratio 11, 18, 52, 70

SLNT chunk 50
slice (sound) 83

sound data chunk 49


SLNT see Resource Interchange File Format

text chunk 50
SLNT chunk

restoration of recordings see audio slur 118

restoration SMF see standard MIDI file


SMPTE time code 183, 200, 209


Symphonion 139

offset 187
synchronisation 183, 200, 206, 209,

outline description 184


bit see bit synchronisation

Society of Motion Picture and Television


of a serial data stream 23

Engineers (SMPTE) 183


word/frame see word/frame

software (computer) 206


synchronisation

software synthesiser 206


synthesised sound 211, 212

solid state memory see electronic memory


synthesiser 35, 66, 152, 154, 168, 201, 205

control voltage 153

song pointer 184

SysEx (MIDI) 90, 171, 176, 178, 182

sound card 205

channel 171

sound loop 64

sample dump message 179

sound pressure level (SPL) 15, 16


universal SysEx message 172

sound stage 85

system exclusive message (MIDI) see SysEx

SP-GM see Scaleable Polyphony General

system software (computer) 206

MIDI

SPL see sound pressure level

splicing (tape) 58, 59

SPMIDI see Scaleable Polyphony General


TA225 Course Tune 92, 197, 211

MIDI
tablature 114

spring
take 9, 10, 51, 52, 57, 59

in reverberation 77
tempo (in Pianola) 141

SSND see AUdio file format (AU) sound


tempo changing 81, 82, 150

data chunk
using time slices 83

staff notation 114


temporary storage (digital) 73

standard MIDI file (SMF) 157, 187


threshold level 53

delta time see delta time


time code

header chunk 188


MIDI see MIDI time code

meta event see meta event


see also SMPTE time code

mode/type 188
time domain (working in) 83

SysEx 190

time signature 114

ticks 189

track 188
timing clock 184

track chunk 189


tonic solfa 113

start bit 163


total internal reflection (in an optical fibre/

status byte (MIDI) see MIDI status byte


cable) 32

stave 113
track

audio 9

stave line 117

hard disk see hard disk track

stereophonic (stereo) 85
transducer (sound) 77

field 85
transient 54, 195

image see image


transposing/non-transposing

localisation see localisation


instrument 116

pan see pan


transposition 81

sum and difference signal see sum and


treble equalisation 71

difference signal
tremolo 195

stop (pipe organ) 152, 168


triad (chord) 172

stop bit 163

storage (of sound) 109


trill 136

storage of music with codes see coded


TRS connector 33, 34, 35

music
tuning 182

storing audio 36
type (in printing) 117

table of requirements 36
typesetting 7, 118

see also Linotype, Monotype

street piano 133, 137

striping (SMPTE time code) 184

sub-code data 29

sum and difference signal (in a stereo


Universal Synthesiser Interface (USI) 153

channel) 85, 86
universal serial bus (USB) 30, 90, 160, 194,

super audio compact disc (SACD) 37


196, 206

sustain 136
USB see universal serial bus

sustain pedal 167


USI see Universal Synthesiser Interface


variable length specification (in IFF WAV(E) see Resource Interchange File

chunks) 189
Format

vibrato 136, 195


wavetable (synthesis) 82, 178, 194, 208

vinyl disk see vinyl LP


word/frame synchronisation 25

vinyl LP

pickup 17
XLR connector 33, 34

violin 168

electromechanical 146
Yamaha 01X Digital Mixing Studio 90

virtual track 51
Yamaha AW16G 12

vocoder 83, 86
dynamic range 12

volatile memory 42
editing facilities 60

volume 14
effects provided 87

see also r.m.s. amplitude


external control facilities 91

volume unit (VU) 18


frequency response 12

VU see volume unit


front panel controls 13

inputs and outputs 34

mixing facilities 64

Wagner, R. 109
storage facilities 51




Acknowledgement
Cover image: © 1997 Photodisc, Inc.
