THE COMPUTER MUSIC AND DIGITAL AUDIO SERIES
John Strawn, Founding Editor
James Zychowicz, Series Editor

Digital Audio Signal Processing
  Edited by John Strawn
Composers and the Computer
  Edited by Curtis Roads
Digital Audio Engineering
  Edited by John Strawn
Computer Applications in Music: A Bibliography
  Deta S. Davis
The Compact Disc Handbook
  Ken C. Pohlman
Computers and Musical Style
  David Cope
MIDI: A Comprehensive Introduction
  Joseph Rothstein; William Eldridge, Volume Editor
Synthesizer Performance and Real-Time Techniques
  Jeff Pressing; Chris Meyer, Volume Editor
Music Processing
  Edited by Goffredo Haus
Computer Applications in Music: A Bibliography, Supplement I
  Deta S. Davis; Garrett Bowles, Volume Editor
General MIDI
  Stanley Jungleib
Experiments in Musical Intelligence
  David Cope
Knowledge-Based Programming for Music Research
  John W. Schaffer and Deron McGee
Fundamentals of Digital Audio
  Alan P. Kefauver
The Digital Audio Music List: A Critical Guide to Listening
  Howard W. Ferstler
The Algorithmic Composer
  David Cope
The Audio Recording Handbook
  Alan P. Kefauver
Cooking with Csound Part I: Woodwind and Brass Recipes
  Andrew Horner and Lydia Ayers
Hyperimprovisation: Computer-Interactive Sound Improvisation
  Roger T. Dean
Introduction to Audio
  Peter Utz
New Digital Musical Instruments: Control and Interaction Beyond the Keyboard
  Eduardo R. Miranda and Marcelo M. Wanderley, with a Foreword by Ross Kirk
Fundamentals of Digital Audio, New Edition
  Alan P. Kefauver and David Patschke

Volume 22


Alan P. Kefauver and David Patschke


A-R Editions, Inc.
Middleton, Wisconsin

Library of Congress Cataloging-in-Publication Data

Kefauver, Alan P.
Fundamentals of digital audio / by Alan P. Kefauver and David Patschke. — New ed.
p. cm. — (Computer music and digital audio series)
ISBN 978-0-89579-611-0
1. Sound—Recording and reproducing—Digital techniques. 2. Music—Data processing. 3. Music—Computer programs. I. Patschke, David. II. Title.
TK7881.4 .K4323 2007
621.389'3—dc22
2007012264

A-R Editions, Inc.
Middleton, Wisconsin 53562
© 2007 All rights reserved.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Contents

List of Figures  ix
Preface to the New Edition  xiii

Chapter One  The Basics  1
  Sound and Vibration  1
  The Decibel  9
  The Analog Signal  16
  Synchronization  19

Chapter Two  The Digital PCM Encoding Process  23
  Sampling  23
  Quantization  27
  Analog-to-Digital Conversion  32

Chapter Three  The Digital Decoding Process  45
  Data Recovery  45
  Error Detection and Correction  47
  Demultiplexing  50
  Sample and Hold  50
  Digital-to-Analog Converter  53
  Reconstruction Filter  53
  Output Amplifier  56

Chapter Four  Other Encoding/Decoding Systems  57
  Higher-Bit-Level Digital-to-Analog Converters  57
  Oversampling Digital-to-Analog Converters  57
  Oversampling Analog-to-Digital Converters  59
  One-Bit Analog-to-Digital Converters  59
  Direct Stream Digital (DSD)  61
  High Definition Compact Digital (HDCD)  62

Chapter Five  Data Compression Formats  63
  Lossless Compression  64
  Lossy Compression  65

Chapter Six  Tape-Based Storage and Retrieval Systems  75
  Rotary Head Tape Systems  76
  Record Modulation Systems  81
  Digital Audiotape (DAT) Systems  84
  Multitrack Rotary Head Systems  86
  4mm, 8mm, and Digital Linear Tape (DLT) Storage Systems  91
  Fixed-Head Tape-Based Systems  92

Chapter Seven  Disk-Based Storage Systems  101
  Optical Disk Storage Systems  101
  Magnetic Disk Storage Systems (Hard Disk Drives)  120
  Solid State (Flash) Memory  125

Chapter Eight  Digital Audio Editing  127
  Tape-Based Editing Systems  127
  Disk-Based Editing Systems  130
  Personal Computers and DAWs  133

Chapter Nine  The Digital Editing and Mastering Session  147
  Input/Output, Channels, and Tracks  149
  The Editing Session  152
  The Multitrack Hard-Disk Recording Session  161

Chapter Ten  Signal Interconnection and Transmission  167
  Electrical Interconnection  167
  Optical Interconnection  170
  Digital Audio Broadcast  174

Glossary  177
Further Reading/Bibliography  185
Index  187


List of Figures

Chapter 1
Figure 1.1  A sound source radiating into free space.
Figure 1.2  A musical scale.
Figure 1.3  The musical overtone series.
Figure 1.4  The envelope of an audio signal.
Figure 1.5  A professional sound level meter (courtesy B and K Corporation).
Figure 1.6  Typical sound pressure levels in decibels.
Figure 1.7  The inverse square law.
Figure 1.8  The Robinson-Dadson equal loudness contours.

Chapter 2
Figure 2.1  Block diagram of a PCM analog-to-digital converter.
Figure 2.2  Waveform sampling and the Nyquist frequency.
Figure 2.3  Waveform sampling and aliasing.
Figure 2.4  Waveform sampling at a faster rate.
Figure 2.5  Various filter slopes for anti-aliasing.
Figure 2.6  Filter schematics.
Figure 2.7  The sample-and-hold process.
Figure 2.8  Voltage assignments to a wave amplitude.
Figure 2.9  Comparison of quantization numbering systems.
Figure 2.10  Offset binary and two's complement methods.
Figure 2.11  A waveform and different types of digital encoding.
Figure 2.12  A multiplexer block schematic.
Figure 2.13  Interleaving.

Chapter 3
Figure 3.1  Reshaping the bit stream.
Figure 3.2  Three error correction possibilities.
Figure 3.3  Block diagram of a digital-to-analog converter.
Figure 3.4  An 8-bit weighted resistor network converter.
Figure 3.5  A dual slope-integrating converter.
Figure 3.6  The effects of hold time on the digital-to-analog process.
Figure 3.7  Reconstruction of the audio signal.

Chapter 4
Figure 4.1  The effect of oversampling digital-to-analog converters.
Figure 4.2  PCM and PWM (1-bit) conversion.

Chapter 5
Figure 5.1  Compression/decompression in the digital audio chain.
Figure 5.2  A block diagram of lossless encoding and decoding.
Figure 5.3  A block diagram of an MPEG-1 Layer I (MP1) encoder.
Figure 5.4  A block diagram of an MPEG-1 Layer III (MP3) encoder.
Figure 5.5  A block diagram of an AAC encoder.
Figure 5.6  ATRAC codec overview.
Figure 5.7  A block diagram of an AC-3 encoder.

Chapter 6
Figure 6.1  a. Perspective view of tape wrap around a video head drum. b. Top view of tape wrap around a video head drum showing details.
Figure 6.2  Track layout on an analog 3/4″ helical scan videotape recorder.
Figure 6.3  Channel coding for record modulation.
Figure 6.4  PCM-1630 processor and associated DMR-4000 U-matic® recorder (courtesy Sony Corp.).
Figure 6.5  a. Tape wrap on a DAT recorder's head drum. b. Track layout on a DAT recorder.
Figure 6.6  Frequency allocation for a. an 8mm and b. a Hi8 video recorder.
Figure 6.7  A Hi8mm-based eight-channel multitrack recorder (courtesy Tascam) showing a. the main unit and b. the controller.
Figure 6.8  An S-VHS-based eight-channel multitrack recorder (courtesy Alesis).
Figure 6.9  a. Transport layout on a Sony PCM-3324A digital tape recorder. b. A 48-track multitrack digital tape recorder (photo courtesy Sony Corp.).
Figure 6.10  Track layout on a DASH 48- and 24-track digital tape recorder.
Figure 6.11  Cross-fading between the read and write heads on a digital tape recorder.
Figure 6.12  Cross-fading between the two write heads on a DASH digital tape recorder.
Figure 6.13  Cross-fading between the read and write heads on a ProDigi digital multitrack recorder.

Chapter 7
Figure 7.1  The compact disc pressing process.
Figure 7.2  Compact disc specifications.
Figure 7.3  Pit spacing, length, and width for the compact disc. Note how pit length defines the repeated zeroes.
Figure 7.4  Light reflected directly back maintains the 0 series while the transition scatters the beam, denoting a change. Dust particles on the substrate are out of focus.
Figure 7.5  Compact Disc Interactive (CD-i) audio formats.
Figure 7.6  Super CD.
Figure 7.7  a. The front panel of a CD-R machine. b. The rear of the same machine showing the RS-232 interface as well as the AES/EBU, S/PDIF, and analog inputs and outputs (courtesy Tascam).
Figure 7.8  A professional magneto-optical recorder (courtesy Sony Corp.).
Figure 7.9  The recording system for the MiniDisc.
Figure 7.10  The playback system for the MiniDisc.
Figure 7.11  MiniDisc specifications.
Figure 7.12  A computer hard disk showing sectors and tracks.
Figure 7.13  A computer hard disk stack showing the concept of cylinders.
Figure 7.14  a. A 1-Terabyte RAID Level-0 configuration diagram. b. A 500-Gb RAID Level-1 configuration diagram.
Figure 7.15  Various flash-memory products, from left to right: Secure Digital, CompactFlash, and Memory Stick (courtesy SanDisk).

Chapter 8
Figure 8.1  A magnetic tape splice with a linear slope.
Figure 8.2  The cross-fade and -3dB down point of a digital edit.
Figure 8.3  A professional digital editor for tape-based digital recorders (photo courtesy Sony Corp.).
Figure 8.4  The assemble edit process.
Figure 8.5  Diagram of a computer-based digital audio workstation.
Figure 8.6  An example of a stand-alone DAW unit (photo courtesy Tascam).
Figure 8.7  a. The front panel of an 8-channel A-to-D (and D-to-A) converter. b. The back panel of the same unit. Notice the FireWire connections (photos courtesy Tascam).
Figure 8.8  A multi-channel A-to-D (and D-to-A) converter that connects directly to a processor card located in a computer-based DAW (photo courtesy Avid/Digidesign).
Figure 8.9  A DAW processor card for a PCI slot in a personal computer (photo courtesy Avid/Digidesign).
Figure 8.10  An example of a control surface to manually adjust DAW software (courtesy Tascam).
Figure 8.11  A software instrument plug-in (specifically, a Hammond B3 emulator/virtual instrument) for use with a MIDI controller on a computer-based digital audio workstation.

Chapter 9
Figure 9.1  A multitrack project opened in Digidesign Pro Tools software.
Figure 9.2  An example of an input matrix for Digidesign Pro Tools software. Note that although there are 18 tracks (some of them stereo, even), in this example the hardware supports only up to 8 analog outputs and inputs.
Figure 9.3  An output matrix example for the same.
Figure 9.4  Editing screen from Digidesign Pro Tools software.
Figure 9.5  Editing screen from Apple Logic Pro software.
Figure 9.6  Editing screen from Steinberg Nuendo software.
Figure 9.7  An edit window in Digidesign Pro Tools software showing a. nominal resolution of a waveform, b. "zooming in" on a portion of the waveform, and c. maximum magnification of the waveform for precise selection.
Figure 9.8  a. A portion of 2-track (stereo) material selected in Digidesign Pro Tools software. b. That same portion being duplicated and dragged into a new stereo track.
Figure 9.9  The crossfade editing process in Digidesign Pro Tools software: a. selecting the portion of material to crossfade, b. determining the precise options for the fade, and c. the resulting waveform after the fade.
Figure 9.10  The crossfade editor in Apple Logic Pro software.
Figure 9.11  A compressor software plug-in for a computer-based digital audio workstation.
Figure 9.12  An equalization software plug-in for a software-based DAW.
Figure 9.13  A reverb software plug-in for a software-based DAW.
Figure 9.14  Mixing window from Digidesign Pro Tools software.
Figure 9.15  Mixing window from Apple Logic Pro software.

Chapter 10
Figure 10.1  Frequencies and their uses.

Preface to the New Edition

Back when I was in school—and it wasn't so long ago—being in a recording studio was an immense (and immersive) experience. Humongous mixers filled whole rooms. Another wall was filled with large multitrack tape recorders, with just enough space left for a couple floor-to-ceiling racks full of graphic equalizers, reverb units, noise reduction, gates, and all manner of sound modification devices. Today, I'm remembering the good old days as I sit on my deck, typing this on my laptop computer that has built into it a humongous mixer, equalizers, reverb units, gates and all manner of sound modification devices, and the capability to record or export my finished audio programs in any number of professional formats.

Audio and music today exist almost exclusively in digital form for three reasons: reduced manufacturing costs, Moore's Law, and the Internet. Moore's Law goes something like this: computers double in speed for the same cost every 18 months. I remember spending $200 for 256 kilobytes of memory for my first computer (an 8 MHz model). Today, that same $200 would buy about two gigabytes—almost an 8,000x increase. In addition, the price-per-byte of memory continues to fall similarly from year to year. What all this means is that even off-the-shelf computers today are more than adequate to record, edit, process, and listen to multiple channels of digital audio. The Internet you know about: a sprawling, worldwide network of computers through which increasing numbers of people get their information and entertainment and do increasing amounts of their business.

Whether because of the demand for quality and portability or because of its leaner data requirements, audio was at the forefront of the digital entertainment revolution. Once available only to the most elite recording institutions, high-quality digital audio recording is now accessible to almost everyone. Consequently, unlike years ago, when high-quality equipment and restricted distribution dictated what would be recorded and how it would be heard, today anyone can be an artist, engineer, and label. Some people do well with this accessibility and others do not. Learning the fundamentals won't give you insight for making subjective production decisions, but it will give you a solid basis for making them. And that is what this book hopes to accomplish.

Don't get too comfortable with your new high-tech equipment and skills, though, because the one fact about this industry is that everything will continue to change, like it or not, and faster than you would prefer. That makes understanding the basic concepts and principles underlying your equipment all the more important.

ONE
The Basics

SOUND AND VIBRATION

Sound has been discussed many times and in many ways in almost every book written about recording and the associated arts. Sound, by its nature, is elusive, and defining it in concise terms is not easy. Yet, without a thorough knowledge of the nature of sound, understanding how analog sound is handled and converted to and from the digital environment is impossible.

It is probably easiest to visualize the creation and movement of sound waves by imagining a large drum being struck: the head (or skin) of the drum moves in and out after it is struck, causing subsequent air compressions and rarefactions emanating away from the drum. The air itself does not move, but the movement of individual air particles is transferred from one particle to the next in much the same way that waves move on the surface of the sea. Because we discuss sound as a wave later in this book, now may be a good time to visualize a sound wave in the air. Figure 1.1 shows a sound wave traveling through free space. At the source of the sound (point a), a sound wave radiates omnidirectionally away from it. The sound energy from the sound source is transferred to the carrying medium (in this case air) as air compressions (point b) and rarefactions (point c).

The Basic Characteristics of Sound

The major components of sound are frequency, amplitude, wavelength, and harmonic structure. Other components are velocity, envelope, and phase.

Velocity

The speed of energy transfer equals the velocity of sound in the described medium. The velocity of sound depends on the medium through which the sound travels. The velocity of sound in air at sea level at 70 degrees Fahrenheit is 1,130 feet per second (expressed metrically as 344 meters per second). For example, sound travels

through steel at a velocity of about 16,500 feet per second. In fact, the velocity of sound depends on the density of the medium. Even in air, as air temperature increases, air density drops, causing sound to travel faster; the velocity of sound in air rises about 1 foot per second for every degree the temperature rises. The formula for the velocity of sound in air is:

V = 49 √(459.4 + °F)

FIGURE 1.1 A sound source radiating into free space. a. Sound source; b. Compression; c. Rarefaction.
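As a quick numerical check, the velocity formula above can be sketched in a few lines of Python (the function name is ours, for illustration only):

```python
import math

def speed_of_sound_fps(temp_f):
    # V = 49 * sqrt(459.4 + degrees Fahrenheit), in feet per second
    return 49 * math.sqrt(459.4 + temp_f)

# At 70 degrees F this is close to the chapter's 1,130 ft/s figure.
print(round(speed_of_sound_fps(70)))  # 1127
# One degree warmer adds about 1 ft/s, as the text states.
print(round(speed_of_sound_fps(71) - speed_of_sound_fps(70), 2))  # 1.06
```

(The small difference from the rounded 1,130 ft/s figure is expected: 49 √529.4 is closer to 1,127 ft/s.)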

Note in Figure 1.1 that as sound moves farther from its source, its waves become less spherical and more planar (longitudinal).

Wavelength

The distance between successive compressions or rarefactions (i.e., the sound pressure level over and under the reference pressure) is the wavelength of the sound that is produced:

Wavelength = Velocity / Frequency

However, the length of the sound wave is itself not that useful for our purposes; the frequency of the sound is important. Because we know the velocity of sound, it is easy to determine the frequency when the wavelength is known. The simple formula can be changed to read:

Frequency = Velocity / Wavelength

Thus, if the distance from compression to compression is 2.58 feet, the frequency is 440 cycles per second, or 440 hertz (Hz). The period of the wave, or the time it takes for one complete wave to pass a given point, can be defined by the formula

Period = 1 / Frequency

Therefore, the period of this wave, with a frequency of 440 Hz, is 1/440 of a second. As the period becomes shorter, the frequency becomes higher. Although sounds exist both higher and lower and the range varies from person to person, we'll assume generally that the range of human hearing is approximately 20 Hz to 20,000 Hz (20 kHz).
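The three formulas above can be sketched together in Python (constant and function names are ours, assuming the chapter's 1,130 ft/s velocity in air):

```python
V_AIR_FPS = 1130.0  # velocity of sound in air at sea level, 70 degrees F

def wavelength_ft(freq_hz):
    return V_AIR_FPS / freq_hz        # Wavelength = Velocity / Frequency

def frequency_hz(wl_ft):
    return V_AIR_FPS / wl_ft          # Frequency = Velocity / Wavelength

def period_s(freq_hz):
    return 1.0 / freq_hz              # Period = 1 / Frequency

print(round(wavelength_ft(440), 2))   # 2.57 ft, compression to compression
print(round(frequency_hz(2.57)))      # 440 Hz
print(round(period_s(440), 5))        # 0.00227 s, i.e., 1/440 of a second
```

(The text's 2.58-foot figure reflects rounding of the velocity value.)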

Musical Notation and Frequency

It is beyond the scope of this book to teach the ability to read musical notation. However, this skill is essential to becoming a competent recording engineer, and the student is strongly advised to pursue the study of music if preparing for a career in the recording arts. As mentioned above, the note A has a frequency of 440 Hz; this is the note occupying the second space of the treble-clef staff, referred to in musical terms as "A," that is, the A above middle C on the piano. The A that is located on the top line of the bass-clef staff is an octave below 440 Hz and has a frequency of 220 Hz. An octave relationship is a doubling or halving of frequency. Figure 1.2 shows a musical scale with the corresponding frequencies.

FIGURE 1.2 A musical scale (C 130 Hz, A 220 Hz, c 260 Hz, a 440 Hz, c1 520 Hz).

Harmonic Content

Very few waves are a pure tone (i.e., a tone with no additional harmonic content); in fact, only a sine wave is a true pure tone. When an object (e.g., a bell) is struck, several tones at different frequencies are produced. The fundamental resonance tone of the bell is heard first, followed by other frequencies at varying amplitudes. For example, a bell with a fundamental frequency of 64Hz produces harmonics of 128Hz and 192Hz, and many other harmonics, or overtones, at varying amplitudes are produced, depending on the metallic composition of the bell. Each of these harmonics changes amplitude slightly over time, adding to the individual characteristics of the sound.

These harmonics are arranged in a relationship called the overtone series. The first overtone is a doubling of the fundamental frequency (an octave). The next tone is a doubling above that, followed by another interval that is heard as a musical fifth but actually corresponds to the addition of the fundamental frequency, and so on. For example, an overtone series based on the note C is, in hertz: 65, 130, 195, 260, 325, 390, 455, 520, 585, 650, 715, and so on. In musical terms this is a series of the fundamental followed by an octave, a fifth above the second C, a perfect fourth, a major third, a minor third, another minor third, three major seconds, and a minor second (see Figure 1.3). The overtone series is most often notated in the musical terminology of octaves, fifths, fourths, and so on. The combination of these harmonics gives sound its specific timbre, or tone coloration. Although an oboe and a clarinet produce the same note with the same fundamental frequency, the number of overtones and the amplitude of each differ. The difference in harmonic content makes an oboe sound different from a clarinet.
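Because each member of the overtone series corresponds to the addition of the fundamental frequency, the series is simply the integer multiples of the fundamental. A short Python sketch (function name ours) reproduces the series on C given above:

```python
def overtone_series(fundamental_hz, n_partials):
    # Each partial adds the fundamental once more: f, 2f, 3f, ...
    return [fundamental_hz * k for k in range(1, n_partials + 1)]

print(overtone_series(65, 11))
# [65, 130, 195, 260, 325, 390, 455, 520, 585, 650, 715]
# Adjacent ratios give the musical intervals named in the text:
# 130/65 = 2 (octave), 195/130 = 1.5 (fifth), 260/195 = 4/3 (fourth), ...
```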

The third harmonic is the first non-octave relationship above the fundamental (it is the fifth). The frequency that is twice the frequency of the fundamental is called the second harmonic even though it is the first overtone; confusion often exists concerning this difference in terminology. Because the third harmonic is the most audible, any distortion in this harmonic tone is often detected before distortion is heard in the tone a fifth below (the second harmonic). Many analog audio products list the percentage of third harmonic distortion found as part of their specifications.

FIGURE 1.3 The musical overtone series (fundamental, 2nd harmonic, 3rd harmonic, 4th harmonic).

The Sound Envelope

All sounds have a start and an end. A drop in amplitude, called a decay, often occurs after the start of the sound and is followed by a sustain and a final drop in amplitude before the event ends completely. The four main parts of the sound envelope are (1) the attack, (2) the initial decay, (3) the sustain (or internal dynamic), and (4) the final decay (often referred to as the release). These are shown in Figure 1.4.

The attack is the time that it takes for the sound generator to respond to whatever has set it into vibration. Different instruments, with their different masses, have different attack times. In general, struck instruments have an attack time that is much faster (in the 1- to 20-millisecond range) than wind instruments (in the 60- to 200-millisecond range). How an object is set into vibration also affects its attack time. A softly blown flute has a longer attack time than a sharply struck snare drum. String attack times vary, depending on whether the instrument is bowed or plucked.

The attack time of an instrument can be represented by an equivalent frequency based on the following formula:

ΔT = 1 / Frequency

Rearranging this formula gives:

Frequency = 1 / ΔT

This means that if an instrument has an attack time of 1 millisecond, the equivalent frequency is 1 kilohertz (kHz). This fact is important to remember when trying to emphasize the attack of an instrument; the prudent engineer remembers this when applying equalization, or tonal change, to a signal.

FIGURE 1.4 The envelope of an audio signal: initial attack, initial decay, sustain, final decay.

The initial decay, which occurs immediately after the attack on most instruments, is the change in amplitude between the peak of the attack and the eventual leveling-off of the sound, known as the sustain. The length of the sustain

even at half-stick (i. Most often. and the amplitude decays exponentially until it is no longer audible.g. When you consider the fact that a piano.” types of office space are equipped to send constant low-level random wide-band noise through loudspeakers in their ceilings... The greatest amount of masking occurs above the frequency of the masking signal rather than below it. loud sounds mask soft ones. Simply stated. the air column in the instrument ceases to vibrate. These curves relate our perception of loudness at varying frequencies and amplitudes and apply principally to single tones. This masking occurs within the basilar structure of the ear itself. and the greater the amplitude of the masking signal. How many times have you sat in a concert hall listening to a recital in which one instrumentalist drowned out another? Probably more than once. produces 6 milliwatts. when a trumpet player holds a whole note) or on how long the medium continues to vibrate (medium resonance) before beginning the final decay.. it is easy to understand why the violinist cannot be heard all the time. not all frequencies or harmonics decay at the same rate. with the lid only partially raised). the high-frequency components of the sound decay faster than the low-frequency ones.THE BASICS 7 varies. many newer buildings that use modular.e. In fact. The louder tone causes a loss in sensitivity in neighboring sections of the basilar membrane. the wider the frequency range masked. or “carrel. dense sound spectrum (i. However. but what if one instrument is playing a C and another an A? Studies by Zwicker and Feldtkeller have shown that even a narrow band of noise can mask a tone that is not included in the spectrum of the noise itself.e. Masking Many references can be found in the literature about the equal loudness contours (discussed later in this chapter) developed by Fletcher and Munson and later updated by Robinson and Dadson. 
depending on whether the note is held by the player for a specific period of time (e. whereas a violinist. a low-frequency sine wave can easily mask. a higher sinusoidal note that is being sounded at the same time. . This causes a change in the sound’s timbre and helps define the overall sound of the instrument (or other sound-producing device). depending on the vibrating medium. For example. However. Final decays vary from as short as 250 milliseconds to as long as 100 seconds. at best. It makes perfect sense that a loud instrument will cover a softer one if they are playing the same note or tone. The masking sound needs to be only about 6 decibels higher than the sound we want to hear to mask the desired sound. or apparently reduce the level of. can produce a level of around 6 watts. a note with a rich harmonic structure). that sound will usually mask sounds that are less complicated (less dense). which occurs when the sound is no longer produced by the player or by the resonance of the vibrating medium. when a note is produced by an instrument that has a complicated. As the trumpet player releases a held note.

This masking signal keeps the conversations in one office carrel from intruding into adjacent office space. The level of the masking signal is usually around 45 decibels, and this (plus the inverse square law, which is discussed later) effectively provides speech privacy among adjacent spaces. This effect is also used by some noise reduction systems: as the signal rises above a set threshold, processing action is reduced or eliminated. In fact, the masking effect is a critical part of the data-reduction systems used in some of the digital audio storage systems discussed later in this book.

Localization

A person with one ear can perceive pitch, amplitude, envelope, and harmonic content but cannot determine the direction from which the sound originates. The ability to localize a sound source depends on using both ears, often referred to as binaural hearing. Several factors are involved in binaural hearing, depending on the frequency of the sound being perceived. You may have noticed that it is easier to locate high-frequency sounds than low-frequency ones. The ears are separated by a distance of about 6 1/2 or 7 inches, so sound waves diffract around the head to reach both ears. When the wavelength of sound is long, the diffraction effect is minimal and the comparative amplitude at each ear is about the same. However, when the wavelength is short, the diffraction effect is greater and the sound is attenuated at the farther ear, because the sound has to travel farther relative to wavelength. In fact, low-frequency signals often appear to be omnidirectional because of this effect.

How, then, do we localize low frequencies? In addition to amplitude, there is a perceptual time difference factor as well. With all sound there is a measurable time-of-arrival difference between the two ears when the sound is to one side or the other. This creates phase differences between the two ears, allowing the brain to compute the relative direction of the sound. As the sound moves to a point where it is equidistant from both ears, these time differences are minimized. With longer wavelengths the time-of-arrival differences are less noticeable because the ratio of the time difference to the period of the wave is small. Therefore, we can say that high frequencies (above 1kHz) are localized principally by amplitude and time differences between the two ears, whereas low frequencies are located by intensity and phase differences.

The Haas Effect and the Inverse Square Law

Although much has been said and written about the Haas Effect, also known as the precedence effect, we can say that the sound we hear first defines for us the apparent source of the sound. Consider the following example. Two trumpet players stand in front of you, one slightly to the right at a distance of 5 feet and another slightly to the left at a distance of 10 feet. If they play

the same note at the same amplitude, you will localize the sound to the right because it is louder. This amplitude difference is due to the inverse square law, which states that for every doubling of distance there is a 6-decibel loss in amplitude in free space (i.e., where there are no nearby reflecting surfaces). If player A (on the left) increases his amplitude by 6 decibels, the sound levels will balance. Now suppose that both players sustain their notes. You would think that as long as the levels are identical at both ears, the players would appear to be equidistant from you, but your ear-brain combination will insist that player B (on the right) is closer to you than player A. Although the levels have been equalized, you perceive the nearer player to be closer.

Haas used two sound sources and, while delaying one, asked a subject to vary the loudness of the other source until it matched the sound level of the delayed sound. He found that where the delay was greater than 5 but less than 30 milliseconds, the amplitude of the delayed source had to be 10 decibels louder than the signal from the nondelayed source for the two sounds to be perceived as equal. Beyond a delay of 30 milliseconds a discrete echo was perceived, and prior to 5 milliseconds the level needed to be increased incrementally as the delay lengthened. According to Haas, "Our hearing mechanism integrates the sound intensities over short time intervals similar, as it were, to a ballistic measuring instrument." This means that the ear-brain combination integrates the very short time differences between the two ears, causing the sound with the shortest timing differences to appear louder and therefore closer.

THE DECIBEL

Earlier in this chapter we discussed the basic characteristics of sound, but one that was conspicuously absent was amplitude. The unit of measure that is normally used to define the amplitude of sound levels is the decibel (dB). A Bel is a large, rather cumbersome unit of measure, so for convenience it is divided into 10 equal parts and prefaced with deci to signify the one-tenth relationship; a decibel is one tenth of a Bel.

A sound level meter measures sound in the environment. Most good sound level meters are capable of measuring in the range of 0dB to 140dB. A typical professional sound level meter is shown in Figure 1.5. You might think that the 0dB level is the total absence of sound, but it is not. Actually, 0dB is the lowest sound pressure level that an average listener with normal hearing can perceive. This is called the threshold of hearing. The 0dB reference level corresponds to a sound pressure level of 0.00002 dynes per square centimeter (dynes/cm²), which, referenced to power or intensity in watts, equals 0.000000000001 watts per square meter (W/m²). Also referred to in this book is the threshold of feeling, which is typically measured as an intensity of 1W/m². A large commercial aircraft taking off can easily exceed 130dB, whereas a quiet spot in the summer woods can be a tranquil 30dB.
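The two-trumpet example above can be checked numerically. The 6-decibel loss per doubling of distance follows from 20 times the common logarithm of the distance ratio, sketched here in Python under free-space assumptions (the function name is ours):

```python
import math

def level_difference_db(near_ft, far_ft):
    # Free-space (inverse square) level difference between two
    # equal sources heard at different distances.
    return 20 * math.log10(far_ft / near_ft)

# Trumpeters at 5 ft and 10 ft: the nearer one arrives about 6 dB louder.
print(round(level_difference_db(5, 10), 2))  # 6.02
```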

FIGURE 1.5 A professional sound level meter (courtesy B and K Corporation).

Therefore, when we discuss sound levels in an acoustic environment, the decibel must have a reference. In fact, the decibel is defined as 10 times the logarithmic relationship between two powers. The formula for deriving the decibel is:

dB = 10 log (PowerA / PowerB)

where PowerA is the measured power and PowerB the reference power. We can use this formula to define the amplitude range of human hearing by substituting the threshold of hearing (0.000000000001 W/m²) for the reference power and using the threshold of feeling (1 W/m²) as the measured power. As you can see, the formula, with the proper values inserted, looks like this:

10 log (1 W/m² / 1 × 10⁻¹² W/m²) = 120dB

Therefore, the average dynamic range of the human ear is 120dB. It is interesting to note that other values can be obtained using the power formula. For example, if we use a value of 2W in the measured power spot and a value of 1W in the reference power spot, we find that the result is 3dB. Whether the increase in power is from 100W to 200W or from 2,000W to 4,000W, the increase in level is still 3dB. That is, any 2-to-1 power relationship can be defined simply as an increase in level of 3dB.

The inverse square law was mentioned earlier in this chapter. You might surmise that doubling distance would cause a sound level loss of one half, or -3dB. However, you must remember that sound radiates omnidirectionally from the source of the sound. Recall from your high school physics class that the area of a sphere is determined by the formula a = 4πr². It follows that when a source radiates to a point that is double the distance from the first, it radiates into four times the area instead of twice the area. This causes a 6dB loss of level instead of a 3dB loss. Figure 1.7 shows this phenomenon. The formula for the inverse square law is:

Level drop = 10 log (r2/r1)² = 20 log (r2/r1) = 6dB

where r1 equals 2 feet and r2 equals 4 feet. Note that at a distance of 2 feet, the sound pressure level is 100dB. When the listener moves to a distance of 4 feet (i.e., twice the original distance), the level drops to 94dB.

Equal Loudness Contours

Ambient, or background, noise is all around us. Even in a quiet concert hall, the movement of air in the room can be 30dB or more above the threshold of hearing at one or more frequencies. Figure 1.6 shows typical sound pressure levels, related to the threshold of hearing, found in our everyday environment. Note also the level called the threshold of pain. However, the ear is not equally sensitive to sound pressure at all frequencies. The ear is most sensitive to the frequency range between 3,000Hz and 5,000Hz. Figure 1.8 shows the equal loudness contours developed by Fletcher and Munson and updated later by Robinson and Dadson. A tone that is heard just at the threshold of hearing (1 × 10⁻¹² W/m²) at a frequency of 4,000Hz corresponds to a sound pressure level of 0dB. For the tone to be perceived by the same listener at the same loudness level when the frequency is lowered to 400Hz, the amplitude of the tone must be raised about 12dB. If we lower the frequency to 40Hz, the level must be raised nearly 45dB to be perceived at the same volume.
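The calculations in this section are easy to verify numerically. The following Python sketch (the helper names are our own, not from the text) reproduces the 120dB dynamic range of the ear, the 3dB meaning of a doubling of power, and the 6dB inverse-square loss for a doubling of distance:

```python
import math

def db_power(p_measured, p_reference):
    # Decibel difference between two powers: dB = 10 * log10(PA / PB)
    return 10 * math.log10(p_measured / p_reference)

def level_drop(r1, r2):
    # Free-field inverse-square loss moving from distance r1 to r2:
    # 10 * log10((r2/r1)^2) = 20 * log10(r2/r1)
    return 20 * math.log10(r2 / r1)

# Threshold of feeling (1 W/m^2) against threshold of hearing (1e-12 W/m^2)
print(round(db_power(1.0, 1e-12)))     # 120 dB: the ear's dynamic range

# Any 2-to-1 power ratio is an increase of about 3 dB
print(round(db_power(200, 100), 2))    # 3.01
print(round(db_power(4000, 2000), 2))  # 3.01

# Doubling the distance (2 ft to 4 ft) loses about 6 dB: 100 dB becomes ~94 dB
print(round(100 - level_drop(2, 4)))   # 94
```

The 6dB figure in the text is the rounded value of 20 log 2 ≈ 6.02.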

FIGURE 1.6 Typical sound pressure levels in decibels, ranging from the threshold of hearing (0dB), dripping water, a quiet recording studio, and subdued conversation, through average conversation, an average factory, and a full symphony orchestra at fff, up to the thresholds of feeling and pain, jet engines close up, and rocket engines (approaching 180dB); levels near the top of the scale are marked as danger levels.

FIGURE 1.7 The inverse square law: a source radiating to r1 = 2 feet and r2 = 4 feet.

The contours in Figure 1.8 are labeled phons, which range from 10 to 120; the phon is a measure of equal loudness. At 1,000Hz the phon level is the same as the sound pressure level in decibels. Figure 1.8 also has a curve labeled "MAF," which stands for "minimum audible field." This curve, equal to 0 phons, defines the threshold of hearing. Note that at low sound pressure levels, low frequencies must be raised substantially in level to be perceived at the same loudness as 1kHz. Frequencies above 5kHz (5,000Hz) need to be raised as well, although not as much. Looking at Figure 1.8, you can see that the discrepancies between lows, highs, and mid tones are reduced as the level of loudness increases: the 90-phon curve shows a variation of 40dB, whereas the 20-phon curve shows one of nearly 70dB. The sound level meter in Figure 1.5 has several weighting networks so that at different sound pressure levels the meter can better approximate the response of the human ear. The A, B, and C weighting networks correspond to the 40-, 70-, and 100-phon curves of the equal loudness contours.

FIGURE 1.8 The Robinson-Dadson equal loudness contours: loudness level in phons plotted as sound pressure level in dB against frequency in Hz.

Logarithms

As you may have noticed, the formula used to define the decibel applied logarithms, abbreviated "log." Anyone involved in the audio engineering process needs to understand logarithms. In brief, a logarithm of a number is that power to which 10 must be raised (not multiplied, but raised) to equal that number. The shorthand

notation for this is 10ˣ, where x, the exponent, indicates how many times the number is to be multiplied by 10. Therefore, for large numbers, 10³ is 10 raised to the third power, or simply "10 to the third":

10¹ = 10
10² = 100
10³ = 1,000
10⁹ = 1,000,000,000

Numbers whose value is less than 1 can be represented with negative exponents. For small numbers:

0.1 = 10⁻¹
0.01 = 10⁻²
0.001 = 10⁻³
0.000001 = 10⁻⁶

Because we are talking about very large and very small numbers, prefix names can be added to terms such as hertz (frequency) and ampere (a measure of current flow) to indicate these exponents:

1,000 cycles per second = 10³ hertz (or 1 kilohertz [1kHz])
10⁶ hertz = 1 megahertz (1MHz)
10⁹ hertz = 1 gigahertz (1GHz)
10¹² hertz = 1 terahertz (1THz)

0.001 ampere (A) = 10⁻³ A (or 1 milliamp [1mA])
10⁻⁶ A = 1 microamp (1µA)
10⁻⁹ A = 1 nanoamp (1nA)
10⁻¹² A = 1 picoamp (1pA)

Now you can see that the threshold of hearing, defined earlier as 0.000000000001 W/m², appears in the formula as 1 × 10⁻¹² W/m². It is also helpful to know that when powers of 10 are multiplied or divided, you simply add or subtract the exponents: 10⁶ × 10³ = 10⁶⁺³ = 10⁹, and 10¹² ÷ 10⁹ = 10¹²⁻⁹ = 10³. Logarithms of numbers also exist between, for example, 1 and 10 and 10 and 100, but we will leave those problems to the mathematicians. Today, the logarithm of any number can easily be found by either looking it up in a table or pushing the log button on a calculator.
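The exponent rules above translate directly into code. A small Python check (the prefix table is our own shorthand for the SI prefixes listed in the text):

```python
# Multiplying powers of ten adds exponents; dividing subtracts them.
print(10**6 * 10**3 == 10**9)    # True
print(10**12 // 10**9 == 10**3)  # True

# SI prefixes expressed as powers of ten
prefix = {"kilo": 1e3, "mega": 1e6, "giga": 1e9, "tera": 1e12,
          "milli": 1e-3, "micro": 1e-6, "nano": 1e-9, "pico": 1e-12}

# The threshold of hearing is 1 picowatt per square meter
threshold_of_hearing = 1 * prefix["pico"]
print(threshold_of_hearing)  # 1e-12
```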

THE ANALOG SIGNAL

In order to record, reproduce, or transmit sound, it first needs to be transduced into an electrical signal. The beginning of this process requires a microphone. A microphone has a thin diaphragm that is suspended in or attached (in some fashion, depending on the type of microphone) to a magnetic field. The diaphragm moves back and forth in reaction to the sound waves that pass through it, and that movement within the magnetic field creates a small electrical signal, which is an electrical representation of the compressions and rarefactions of the sound wave. Microphones generate only a tiny amount of signal (measured in volts), which needs ample amplification before it can be used for any recording or reproduction. The signal is transmitted from the microphone along its cable to be amplified. It should be noted that entire volumes (and, indeed, careers) exist that detail microphones and capturing sound.

Reference Levels and Metering

When we discuss reference levels, we are talking about the values of the current and voltage that pass through a cable between two pieces of equipment. Ohm's law establishes some fundamental relationships that we should be aware of, which have been generalized for our purposes here. Power, expressed in watts, is equal to either the square of the voltage divided by the resistance or the square of the current multiplied by the resistance in the circuit; that is, P = E²/R and P = I²R, respectively. In professional audio circuits we work with voltage levels (or their corresponding digital values) instead of power levels, and because the resistance in the circuit is constant, we do not need to be concerned with R at this time. To calculate the difference between two powers, we use the power formula:

dB = 10 log (PowerA / PowerB)

Because we really want to know the decibel level referenced to volts, the formula should read:

dB = 10 log (EA² / EB²)

To remove the squares from the voltages in the formula, we can rewrite the expression as:

dB = 20 log (EA / EB)
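The difference between the 10 log and 20 log forms is easy to demonstrate. In this Python sketch (the helper names are our own), a 2-to-1 ratio yields about 3dB for power but about 6dB for voltage:

```python
import math

def db_from_power_ratio(pa, pb):
    # Power form of the decibel formula
    return 10 * math.log10(pa / pb)

def db_from_voltage_ratio(ea, eb):
    # Voltage is squared in the power formula, so the factor becomes 20
    return 20 * math.log10(ea / eb)

print(round(db_from_power_ratio(2, 1), 2))    # 3.01
print(round(db_from_voltage_ratio(2, 1), 2))  # 6.02
```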

Note now that any 2-to-1 voltage relationship will yield a 6dB change in level instead of the 3dB change of the straight power formula.

The dBm

The standard professional audio reference level is +4dBm. The dBm is a reference level in decibels relative to 1 milliwatt (mW). There is always a standard reference level in audio circuits, an electrical zero reference, and we can compare levels by noting how many decibels the signal is above or below the reference level. Zero dBm is actually the voltage drop across a 600-ohm resistor in which 1mW of power is dissipated. (This was derived from the early telephone company standards; many of our audio standards originated with "Ma Bell." The value is merely a convenient reference point and has no intrinsic significance.) Because the dBm is referenced to an impedance of 600 ohms, using Ohm's law (P = E²/R) we find that the voltage is 0.775 volts RMS. The notation "dBu" is often found in the specifications in manuals that come with digital equipment. Without going deeper into the subject of matching and bridging circuits, most circuits today are bridging circuits instead of the older style matching circuits, and the dBu is used as the unit of measure; we can say that the dBu is equal to the dBm in most cases.

The Volume Unit

If the meters on the recording devices that we use to store digital audio displayed signal level in dBm, they would be very large and difficult to read. Considering the variety of system operating levels found on recording devices today, a comprehensive meter would have to range from -40dBm or -50dBm to +16dBm. This would be a range of around 76dB. To make this easier, we use the volume unit (VU) to measure signal level through the device. The meters that were used on the original audio circuits when this standard was enacted were vacuum tube volt meters (VTVMs). As the demand for more meters grew (as stereo moved to multitrack), a less expensive meter was needed. A less expensive meter, which has a low impedance, as many consumer-type ones do, would load down the circuit it was measuring and thereby cause false readings. To compensate for the loading effect of the meter, a 3.6KΩ resistor was inserted in the meter path. Now the meter no longer affected the circuit it was measuring, although it read 4dBm lower: when the meter was raised to 0VU (volume units), the actual line level was +4dBm. In most professional applications, this meter is calibrated so that an input level of +4dBm equals 0VU; likewise, if the device operates at -10dBm, then -10dBm equals 0VU.
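The 0.775-volt figure follows directly from P = E²/R. This Python sketch also shows the voltage at the +4dBm professional reference (the `dbm_to_volts` helper is our own, not a standard API):

```python
import math

# 0 dBm: 1 mW dissipated in 600 ohms, so E = sqrt(P * R)
print(round(math.sqrt(0.001 * 600), 3))  # 0.775 volts RMS

def dbm_to_volts(level_dbm, impedance=600):
    # RMS voltage across `impedance` for a power level given in dBm
    power_watts = 0.001 * 10 ** (level_dbm / 10)
    return math.sqrt(power_watts * impedance)

print(round(dbm_to_volts(4), 3))  # 1.228 volts RMS at +4 dBm
```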

Consider an experiment where a calibration tone of 1kHz at 0VU is played back from an analog consumer device operating at a level of -10dBm and from a professional analog device operating at +4dBm. If those signals were both sent to an analog-to-digital converter, the digitized signal level would be the same from both machines. As this signal is played back from the storage medium, the digital-to-analog converter outputs the calibration tone and produces a level of -10dBm or +4dBm at the device's output, depending on the device. The same signal played back on either a consumer or professional device will produce the reference level output at the device's specified line level. If we were to compare the output of the two devices, the +4dBm machine would play back 14dB louder, but this difference is due to the output gain of the playback section amplifier.

Traditionally, a volume indicator is calibrated so that there is some headroom above 0VU. The VU meter is designed to present the average content of the signal that passes through it. The classic VU meter is calibrated to conform to the impulse response of the human ear: the ear has an impulse response, or reaction time, that is defined as a response time and a decay time of about 300 milliseconds. This corresponds to the reaction time of the ear's stapedius muscle, which connects the three small bones of the middle ear to the eardrum. Therefore, any peak information shorter than 300 milliseconds will not be fully recognized by the meter. This scaling also accommodates peak information, as fast peaks tend to be about 9dB or 10dB higher than the average level. On the other hand, certain types of music have shorter attack and decay times than others. Analog magnetic tape saturates slowly as levels rise, and typically 9dB of headroom above zero is allowed before reaching critical saturation levels. However, as you will see in Chapter 2 during the discussions of digital recording, once full level is reached there is no margin for error.

The Peak Meter

A peak meter is a device that is designed to respond to all signals (no matter how short) that pass through it and is much more suitable for digital recording. A peak meter will reach full scale in 10 milliseconds. If the peak meter reacting to the input signal were allowed to fall back at the same rate, the eye would be unable to follow its rapid movement. Instead, the metered value is held in a capacitor for a specified amount of time and allowed to fall back using a logarithmic response. The ballistics, or response time, of a standard DIN peak meter are designed to fulfill these requirements. Because the peak meter is not a volume indicator, it is correct to read it in decibels instead of volume units. Analog recording uses several standards for what is called the zero level for a peak meter. The International Electrotechnical Commission (IEC) has defined these meters as the BBC meter, where Mark 4 equals 0dBm; the Type 1 IEC meter, where 0dB equals +6dBm; and the Type 2 IEC meter, where 0dB equals 0dBm. These meters also allow for headroom above their zero reference. Meters designed for digital audio, however, are calibrated differently.
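The practical difference between an averaging (VU-style) indicator and a peak indicator can be sketched in a few lines of Python. This is a toy model, not any standard's actual ballistics: a 5-millisecond full-scale burst barely moves a 300-millisecond average but registers fully on a peak reading:

```python
SAMPLES_PER_MS = 48  # assume 48 kHz sampling for this illustration

# One second of silence with a 5 ms full-scale burst in the middle
signal = [0.0] * (1000 * SAMPLES_PER_MS)
for i in range(500 * SAMPLES_PER_MS, 505 * SAMPLES_PER_MS):
    signal[i] = 1.0

# A ~300 ms averaging window, roughly the ear's (and the VU meter's) response
window = 300 * SAMPLES_PER_MS
averages = [sum(signal[i:i + window]) / window
            for i in range(0, len(signal) - window + 1, window)]

print(max(abs(s) for s in signal))  # 1.0: a peak meter registers the burst
print(round(max(averages), 2))      # 0.02: the averaging meter barely moves
```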

Zero on a digital meter means full quantization level, that is, all bits at full value (there is no headroom). To differentiate between peak meters and digital meters, the term "dBfs" is used ("fs" stands for "full scale"). This implies that the meter is on some kind of digital machine where 0dBfs equals full quantization level (all bits are used). Therefore, it is important to note that 0VU on the console (whether it is +4dBm, -10dBm, or another level) does not equal 0dB on the digital meter. Manufacturers have used a variety of standards, and some allow several decibels of level above zero as a hidden headroom protection factor. Originally, -12dB was chosen as the calibration point for digital metering, but with the advent of systems with higher quantization levels, -18dB is more often used. Most professional equipment in production today uses the standard of 0VU = -18dBfs. Perhaps in the future a digital metering standard will be adopted that everyone can adhere to. In the meantime the prudent engineer will read the manual for the piece of equipment in use and be aware of its metering characteristics.

SYNCHRONIZATION

Although not a basic characteristic or function of sound, time code is an important part of digital systems. Without time code, position locating and synchronization in the tape-based digital domain would be extremely difficult. Time code, as we know it today, was developed by the video industry to help with editing. In 1956, when videotape made its debut, the industry realized that the film process of cut-and-splice would not work in video. The images that were visible on film were not so on videotape. Certain techniques (e.g., magnetic ink that allowed you to see the recorded magnetic pulses) were developed, but these did not prove satisfactory. Another technique was to edit at the frame pulse or control track pulse located at the bottom edge of the videotape; this pulse tells the head how fast to switch in a rotating-head system. In early cases, the splice point was still found by trial and error. In the 1960s, electronic machine-to-machine editing was introduced.

Time Code

A system was needed that would uniquely number each frame so that it could be precisely located electronically. Several manufacturers introduced electronic codes to fulfill this task, but the codes were not compatible with one another. In 1969 the Society of Motion Picture and Television Engineers (SMPTE) developed a standard code that became recognized for its accuracy, providing precise machine control and excellent frame-to-frame match-up. That standard was also adopted by the European Broadcasting Union (EBU), making the code an international standard. The SMPTE/EBU code is the basis for all of today's professional video- and audiotape editing and synchronization systems.

The SMPTE time code is an addressable, reproducible, permanent code that stores location data in hours, minutes, seconds, and frames. A typical time code number might be 18:23:45:28. This code number indicates a position 18 hours, 23 minutes, 45 seconds, and 28 frames into the event. Time code does not need to start over on each reel; this could be on the fortieth reel of tape. The advantages of this are (1) precise time reference, (2) interchange abilities among editing systems, and (3) synchronization between machines.

The data consist of a binary pulse code that is recorded on the video- or audiotape along with the corresponding video and audio signals. Each 1-second-long piece of code is divided into 2,400 equal parts when used in the NTSC standard of 30 frames per second or into 2,000 equal parts when used with the PAL/SECAM system of 25 frames per second. Notice how each system generates a code word that is 80 bits long:

NTSC: 2,400 bits per second divided by 30 frames per second equals 80 bits per frame.
PAL/SECAM: 2,000 bits per second divided by 25 frames per second equals 80 bits per frame.

Following is one frame's 80-bit code:

Bits                  Function
0-3, 8, 9             Frame count
16-19, 24-26          Second count
32-35, 40-42          Minute count
48-51, 56, 57         Hour count
64-79                 Sync word
10                    Drop frame
11                    Color frame
27                    Field mark

Three bits are unused. The remaining eight groups of 4 bits are called user bits, which can be used to store data such as the date and reel number. Most of these bits have specific values that are counted only if the time code signal changes from one voltage to another in the middle of a bit period, forming a half-bit pulse, which represents a digital 1; a full-bit pulse represents a digital 0.

On analog-based tape machines, the code can be stored in two different ways: longitudinal time code (LTC), which is an audio signal that is stored on a separate audio track of the video or audio machine, and vertical interval time code (VITC), which is small bursts of video integrated into the main video signal and stored in the vertical blanking interval (i.e., between the fields and frames of the video picture).

LTC

Longitudinal time code is an electronic signal that switches from one voltage to another, forming a string of pulses, which can be heard as an audible warbling sound if amplified.
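The 80-bit arithmetic and the addressing scheme can be checked in code. A short Python sketch (the converter is our own and assumes simple non-drop-frame counting):

```python
# Each system's bit rate divided by its frame rate yields 80 bits per frame
print(2400 // 30)  # 80 (NTSC)
print(2000 // 25)  # 80 (PAL/SECAM)

def to_frames(hours, minutes, seconds, frames, fps=30):
    # Absolute frame number for an address such as 18:23:45:28
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

print(to_frames(18, 23, 45, 28))  # 1986778 frames into the event
```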

Bit 10, the drop-frame bit, tells the time code reader whether the code was recorded in drop-frame or non-drop-frame format. Black-and-white television has a carrier frequency of 3.6MHz, whereas color uses 3.58MHz. This translates to 30 frames per second as opposed to 29.97 frames per second. To compensate for this difference, a defined number of frames are dropped from the time code every hour. Two frames are dropped every minute of every hour except in the tenth, twentieth, thirtieth, fortieth, and fiftieth minute; frame dropping occurs at the changeover point from minute to minute. The offset is 108 frames (3.6 seconds). Bit 11 is the color frame bit, which tells the system whether the color frame identification has been applied intentionally. Color frames are often locked as AB pairs to prevent color shift in the picture. As mentioned earlier, user bits can accommodate data for reel numbers, recording date, or any other information that can be encoded into eight groups of 4 bits.

Time code appears similar to a 400Hz square wave with many odd harmonics. Normally, LTC is recorded on one of the videotape's longitudinal audio tracks or on a spare track of the audiotape recorder. Some specialized two-track recorders have a dedicated time code track between the two standard audio tracks. Playback and record levels should be between -10dB and +3dB (-3dB is recommended). This allows 12dB of headroom on high-output audiotape operating at a reference level of 370 nanowebers per meter (nw/m), where +4dBm equals 370nw/m of magnetic fluxivity. Time code is difficult to read at low speeds and during fast wind or rewind; during these functions, LTC is impossible to read.

VITC

Vertical interval time code is similar in format to LTC. It has a few more bits, and each of the 9 data-carrying bits is preceded by 2 sync bits. At the end of each frame there are eight cyclic redundancy check (CRC) codes, which are similar to the codes used in digital recording systems. This generates a total of 90 bits per frame. The main difference between LTC and VITC is how they are recorded on tape. Vertical interval time code was developed for use with 1-inch-tape-width helical scan SMPTE type-C video recorders, which were capable of slow-motion and freeze-frame techniques. VITC is recorded on two nonadjacent vertical blanking interval lines in both fields of each frame. Recording the code four times in each frame provides a redundancy that lowers the probability of reading errors due to dropouts. When viewed on a monitor that permits viewing of the full video signal, VITC can be seen as a series of small white pulsating squares at the top of the video field. Because VITC is recorded as an integrated part of the analog videotape recorder's video track, it can be read by the rotating video heads of a helical scan recorder.
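The 108-frame figure follows from the dropping rule just described. A minimal Python check (our own arithmetic, not a time code implementation):

```python
# Two frames are dropped at every minute except minutes 0, 10, 20, 30, 40, 50
dropped_per_hour = sum(2 for minute in range(60) if minute % 10 != 0)
print(dropped_per_hour)       # 108 frames dropped per hour
print(dropped_per_hour / 30)  # 3.6 seconds of offset at 30 frames per second
```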

The rotating heads can read VITC at all times, even in freeze-frame or fast-wind modes, when LTC cannot be read, because the indexing information for each field/frame is recorded in the video signal during the vertical blanking interval and is readable as long as the video is visible on the screen.

In today's digital video acquisition and editing environment, audio and video are now being handled digitally on the same computer system, intertwining all three elements, so the need for mechanical synchronization of separate tape machines is lost. The need to identify each field of video is addressed by embedding the time code reference during acquisition into the data for each frame. The time code information is part of the data stream that gets transferred concurrently with the audio and video, whether it is recorded onto tape or on a data storage device, so time code is still a valuable reference. Video technology is explained more fully in Chapter 6.

Word Clock

SMPTE time code can only help synchronize devices up to its finite resolution of 1/30th of a second, though on some equipment it can be used up to 1/100th of a frame. When connecting digital devices that divide time into slices of 1/96,000th of a second (or smaller), a higher-resolution reference is needed to ensure that all the data are being sent and received, and that they are correctly interpreted by the destination device; the importance of internal synchronization is still present. Word clock differs from time code in that it doesn't "stamp" each sample point with another tag of data; rather, it is a constant signal that sets the reference sampling rate of the source device in order to avoid data errors and maximize performance in the digital domain. The word clock signal is integrated in the S/PDIF and AES/EBU digital audio signals (discussed in Chapter 10), which allows the destination device to correctly process the digital signal. If no master word clock is present, the source device provides the reference clock to its destination. Some facilities also use a separate master word clock device to dictate the word clock to all studio devices from a single source.

This ends our discussion of the basic characteristics of sound. We have touched only briefly on some very important areas, so you should consult the excellent general texts on sound and recording cited in the list of suggested readings at the end of this book. An understanding of these concepts will help you with the information yet to come.
