
SCHOOL OF AUDIO ENGINEERING

Diploma in Audio Engineering

RA 101: INTRODUCTION TO STUDIO STUDIES

RA 101.1: INTRODUCTION TO AUDIO


RT 101.1
INTRODUCTION TO AUDIO
Identifying the Characteristics of Sound
Sound and music are parts of our everyday sensory experience. Just as
humans have eyes for the detection of light and color, so we are
equipped with ears for the detection of sound. We seldom take the time
to ponder the characteristics and behaviors of sound and the
mechanisms by which sounds are produced, propagated, and detected.
The basis for the understanding of sound, music and hearing is the
physics of waves. Sound is a wave which is created by vibrating objects
and propagated through a medium from one location to another. In this
subject, we will investigate the nature, properties and behaviors of
sound waves and apply basic wave principles towards an
understanding of music.

The Elements of Communication


Communication: transfer of information from a source or stimulus
through a medium to a reception point. The medium through which
the information travels can be air, water, space or solid objects.
Information that is carried through all natural media takes the form of
waves - repeating patterns that oscillate back and forth, e.g. light,
sound, electricity, radio and TV waves.

Stimulus: A medium must be stimulated in order for waves of


information to be generated in it. A stimulus produces energy, which
radiates outwards from the source in all directions. The sun and an
electric light bulb produce light energy. A speaker, a vibrating guitar
string or tuning fork and the voice are sound sources, which produce
sound energy waves.

Medium: A medium is something intermediate or in the middle. In an


exchange of communication the medium lies between the stimulus and
the receptor. The medium transmits the waves generated by the
stimulus and delivers these waves to the receptor. In acoustic sound
transmission, the primary medium is air. In electronic sound
transmission the medium is an electric circuit. Sound waves will not
travel through the vacuum of space, although light will - in space,
no-one can hear you scream.

Reception/Perception: A receptor must be capable of responding to the


waves being transmitted through the medium in order for information

to be perceived. The receptor must be physically configured to
sympathetically tune in to the types of waves it receives. An ear or a
microphone is tuned in to sound waves. An eye or a camera is tuned in
to light waves. Our senses respond to the properties or characteristics
of waves such as frequency, amplitude and type of waveform.

What is Sound?
Sound is a disturbance of the atmosphere that human beings can sense
with their hearing systems. Such disturbances are produced by
practically everything that moves, especially if it moves quickly or in a
rapid and repetitive manner. The movement could be initiated by:
hammering (rods), plucking (strings), bowing (strings), or forced air flow
(vibration of an air column - organ, voice). Vibrations from any of these
sources cause a series of pressure fluctuations of the medium
surrounding the object to travel outwards through the air from the
source.

You should be aware that the air is made up of molecules. All matter is
made out of particles and air is no exception. As you sit there in the
room, you are surrounded by molecules of oxygen, nitrogen, carbon
dioxide and some pollutants like carbon monoxide. Normally these
particles are all moving around the room, randomly dispersed and all
roughly the same distance from adjacent particles as all of the others.
The distance between these particles is determined by the air pressure
in the room. This air pressure is mostly a result of the barometric
pressure of the particular day. If it’s raining outside, you’re likely to be
in a low pressure system, so the air particles are further apart from each
other than usual. On sunny days you’re in a high pressure system so
the particles are squeezed together.

Most of the characteristics we expect of air are a result of the fact that
these particular molecules are very light and are in extremely rapid but
disorganized motion. This motion spreads the molecules out evenly, so
that any part of an enclosed space has just as many molecules as any
other. If a little extra volume were to be suddenly added to the enclosed
space (say by moving a piston into a box), the molecules nearest the
new volume would move into the recently created void, and all the
others would move a little farther apart to keep the distribution even or
in ‘equilibrium’.

Because the motion of the molecules is so disorganized, this filling of
the void takes more time than you might think, and the redistribution
of the rest of the air molecules in the room takes even longer. If the
room were ten feet across, the whole process might take 1/100 of a
second or so.

If the piston were to move out


suddenly, the volume of the
room would be reduced and
the reverse process would
take place, again taking a
hundredth of a second until
everything was settled down.
No matter how far or how
quickly the piston is moved, it
always takes the same time for the molecules to even out. In other
words, the disturbance caused by the piston moves at a constant rate
through the air. If you could make the disturbance visible somehow,
you would see it spreading spherically from the piston, like an
expanding balloon. Because the process is so similar to what happens
when you drop an apple into a bucket, we call the disturbance line the
wavefront. It is important to note that the particles of the medium, in
this case molecules of air, do not travel from the source to the receiver,
but vibrate in a direction parallel to the direction of travel of the sound
wave. The transmission of sound energy via molecular collision is
termed propagation.

If the piston were to move in and out repetitively at a rate between 20


and 20,000 times a second, a series of evenly spaced wave fronts would
be produced, and we would hear a steady tone. The distance between
wave fronts is called wavelength.

Wave Theory
Waves are everywhere. Whether we recognize it or not, we encounter
waves on a daily basis. Sound waves, visible light waves, radio waves,
microwaves, water waves, sine waves, waves on a string, and slinky
waves are just a few examples of our daily encounters with
waves. In addition to waves, there are a variety of phenomena in our
physical world which resemble waves so closely that we can describe
such phenomena as being wavelike. The motion of a pendulum, the

motion of a mass suspended by a spring, the motion of a child on a
swing, and the “Hello, Good Morning!” wave of the hand can be
thought of as wavelike phenomena.

We study the physics of waves because it provides a rich glimpse into


the physical world which we seek to understand and describe as
physicists. Before beginning a formal discussion of the nature of waves,
it is often useful to ponder the various encounters and exposures which
we have of waves. Where do we see waves or examples of wavelike
motion? What experiences do we already have which will help us in
understanding the physics of waves?

Waves, as we will learn, carry energy from one location to another. And if
the frequency of those waves can be changed, then we can also carry a
complex signal which is capable of transmitting an idea or thought
from one location to another. Perhaps this is one of the most important
aspects of waves and will become a focus of our study in later units.

Waves are everywhere in nature. Our understanding of the physical


world is not complete until we understand the nature, properties and
behaviors of waves. The goal of this unit is to develop mental models of
waves and ultimately apply those models in understanding one of the
most common types of waves - sound waves.

A wave can be described as a repeating and periodic disturbance that


travels through a medium from one location to another location.

But what is meant by the word medium? A medium is a substance or


material which carries the wave. A wave medium is the substance which
carries a wave (or disturbance) from one location to another. The wave
medium is not the wave and it doesn’t make the wave; it merely carries
or transports the wave from its source to other locations. In the case of a
water wave in the ocean, the medium through which the wave travels
is the ocean water. In the case of a sound wave moving from the
orchestral choir to the seats in the house, the medium through which
the sound wave travels is the air in the room.

To fully understand the nature of a wave, it is important to consider the


medium as a series of interconnected or merely interacting particles. In
other words, the medium is composed of parts which are capable of

interacting with each other. The interactions of one particle of the
medium with the next adjacent particle allow the disturbance to travel
through the medium. In the case of the slinky wave, the particles or
interacting parts of the medium are the individual coils of the slinky. In
the case of a sound wave in air, the particles or interacting parts of the
medium are the individual molecules of air. And in the case of a
stadium wave, the particles or interacting parts of the medium are the
fans in the stadium.

Consider the presence of a wave in a slinky. The first coil becomes


disturbed and begins to push or pull on the second coil; this push or
pull on the second coil will displace the second coil from its equilibrium
position. As the second coil becomes displaced, it begins to push or pull
on the third coil; the push or pull on the third coil displaces it from its
equilibrium position. As the third coil becomes displaced, it begins to
push or pull on the fourth coil. This process continues in consecutive
fashion, each individual particle acting to displace the adjacent particle;
subsequently the disturbance travels through the medium. The medium
can be pictured as a series of particles connected by springs. As one
particle moves, the spring connecting it to the next particle begins to
stretch and apply a force to its adjacent neighbor. As this neighbor
begins to move, the spring attaching the neighbor to its neighbor begins
to stretch and apply a force on its adjacent neighbor.

Sound propagates through air as a longitudinal wave. The speed of


sound is determined by the properties of the air (or, more generally, of
the medium), and not by the frequency or amplitude of the sound.
Waves, including sound waves, can be classified into the following two
basic types:
(i) Transverse waves
(ii) Longitudinal waves

Transverse Waves
For transverse waves the displacement of the medium is perpendicular
to the direction of propagation of the wave. A ripple on a pond is an
easily visualized transverse wave.

Elasticity and a source of energy are the preconditions for periodic


motion. A pond has an equilibrium level, and gravity serves as a
restoring force. When work is done on the surface to disturb its level, a
transverse wave is produced.

Waves on a Pond
A pebble thrown into a pond will produce concentric circular ripples
which move outward from the point of impact. If a fishing float is in the
water, the float will bob up and down as the wave moves by. This is a
characteristic of transverse waves.

Longitudinal Waves
In longitudinal waves the
displacement of the medium is
parallel to the propagation of
the wave. A wave in a “slinky”
is a good visualization. Sound
waves in air are longitudinal
waves.


A sound wave is a classic example of a longitudinal wave. As a sound


wave moves from the lips of a speaker to the ear of a listener, particles
of air vibrate back and forth in the same direction and the opposite
direction of energy transport. Each individual particle pushes on its
neighboring particle so as to push it forward. The collision of particle
No.1 with its neighbor serves to restore particle No.1 to its original
position and displace particle No.2 in a forwards direction. This back
and forth motion of particles in the direction of energy transport creates
regions within the medium where the particles are pressed together and
other regions where the particles are spread apart. Longitudinal waves
can always be quickly identified by the presence of such regions. This
process continues along the chain of particles until the sound wave
reaches the ear of the listener.

Sound is a Mechanical Wave


Mechanical waves are waves which require a medium in order to
transport their energy from one location to another. Because
mechanical waves rely on particle interaction in order to transport their
energy, they cannot travel through regions of space which are devoid of
particles. That is, mechanical waves—like sound waves, cannot travel
through a vacuum.

Sound is a mechanical wave which results from the longitudinal


motion of the particles of the medium through which the sound wave
is moving.

Explanation
First, there is a medium which carries the disturbance from one location
to another. Typically, this medium is air; though it could be any
material such as water or steel. The medium is simply a series of
interconnected and interacting particles. Second, there is an original
source of the wave, some vibrating object capable of disturbing the first
particle of the medium. The vibrating object which creates the
disturbance could be the vocal cords of a person, the vibrating string
and sound board of a guitar or violin, the vibrating tines of a tuning
fork, or the vibrating diaphragm of a radio speaker. Third, the sound
wave is transported from one location to another by means of the
particle interaction. If the sound wave is moving through air, then as
one air particle is displaced from its equilibrium position, it exerts a
push or pull on its nearest neighbors, causing them to be displaced
from their equilibrium position. This particle interaction continues
throughout the entire medium, with each particle interacting and
causing a disturbance of its nearest neighbors. Since a sound wave is a
disturbance which is transported through a medium via the mechanism
of particle interaction, a sound wave is characterized as a mechanical
wave.

Sound is a Pressure Wave


Sound is a mechanical wave which results from the longitudinal motion
of the particles of the medium through which the sound wave is moving. If
a sound wave is moving from left to right through air, then particles of
air will be displaced both rightward and leftward as the energy of the
sound wave passes through it. The motion of the particles parallel (and
anti-parallel) to the direction of the energy transport is what
characterizes sound as a longitudinal wave.

A vibrating tuning fork is capable of creating such a longitudinal wave.


As the tines of the fork vibrate back and forth, they push on

neighboring air particles. The forward motion of a tine pushes air
molecules horizontally to the right and the backward retraction of the
tine creates a low pressure area allowing the air particles to move back
to the left. Because of the longitudinal motion of the air particles, there

are regions in the air where the air particles are compressed together and
other regions where the air particles are spread apart. These regions are
known as compressions and rarefactions respectively. The
compressions are regions of high air pressure while the rarefactions
are regions of low air pressure. The diagram below depicts a sound
wave created by a tuning fork and propagated through the air in an
open tube. The compressions and rarefactions are labeled.

Since a sound wave consists of a repeating pattern of high pressure


and low pressure regions moving through a medium, it is sometimes
referred to as a pressure wave. The crests of the sine curve correspond to
compressions; the troughs correspond to rarefactions; and the “zero
point” corresponds to the pressure which the air would have if there
were no disturbance moving through it. The diagram above depicts the
correspondence between the longitudinal nature of a sound wave and
the pressure-time fluctuations which it creates.

The above diagram can be somewhat misleading if you are not careful.
The representation of sound by a sine wave is merely an attempt to
illustrate the sinusoidal nature of the pressure-time fluctuations. Do not
conclude that sound is a transverse wave which has crests and troughs.
Sound is indeed a longitudinal wave with compressions and
rarefactions. As sound passes through a medium, the particles of that
medium do not vibrate in a transverse manner. Do not be misled -
sound is a longitudinal wave.

Simple Harmonic Motion


When a mass is acted upon by
an elastic force which tends to
bring it back to its equilibrium

configuration, and when that force is proportional to the distance from
equilibrium, then the object will undergo simple harmonic motion when
released.

A mass on a spring is the standard example of such periodic motion. If


the displacement of the mass is plotted as a function of time, it will trace
out a pure sine wave. It turns out that the motion of the medium in a
traveling wave is also simple harmonic motion as the wave passes a
given point in the medium.

Wave Graphs
Waves may be graphed as a function of time or distance. A single
frequency wave will appear as a sine wave in either case. From the
distance graph the wavelength may be determined. From the time graph,
the period and frequency can be obtained. From both together, the wave
speed can be determined.
Amplitude / Loudness / Volume / Gain

When describing the energy of a sound wave the term amplitude is


used. It is the distance above or below the centre line of a waveform
(such as a pure sine wave). The greater the displacement of the
molecule from its centre position, the more intense the pressure
variation or physical displacement of the particles within the medium.
When the medium is air, amplitude represents the pressure change in the
air as it deviates from the normal state at each instant.

The amplitude of a sound wave in air is measured in pascals (Pa), a unit
of pressure. However, for audio purposes ratios between air pressures are
more meaningful, and these are expressed by the logarithmic power
ratio called the bel or, more commonly, the decibel (dB).

Waveform amplitudes are measured using various standards. Peak


Amplitude refers to the positive and negative maximums of the wave.
Root Mean Square (RMS) amplitude gives a meaningful average of
the waveform's level and more closely approximates the signal level
perceived by our ears. For a sine wave, RMS amplitude is equal to 0.707
times the peak value.
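The relationship between peak and RMS amplitude can be illustrated with a
short Python sketch. This is only an illustration of the 0.707 rule for a
sine wave; the 1 kHz test frequency, the 48 kHz sample rate and the peak
value of 1 are arbitrary assumptions, not values taken from this text.

import numpy as np

sample_rate = 48000                  # samples per second (arbitrary choice)
frequency = 1000                     # 1 kHz test tone (arbitrary choice)
peak = 1.0                           # peak amplitude of the test sine wave

t = np.arange(0, 1.0, 1.0 / sample_rate)
sine = peak * np.sin(2 * np.pi * frequency * t)

peak_measured = np.max(np.abs(sine))           # peak amplitude
rms_measured = np.sqrt(np.mean(sine ** 2))     # root mean square amplitude

print(round(peak_measured, 3))                 # close to 1.0
print(round(rms_measured, 3))                  # about 0.707, i.e. 0.707 x peak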

Our perception of loudness is not proportional to the energy of the
sound wave; this means that the human ear does not perceive all
frequencies at the same intensity. We are most sensitive to tones in the
middle frequencies (3 kHz to 4 kHz), with decreasing sensitivity to those
having relatively lower or higher frequencies. This phenomenon will be
discussed in detail later in this text using the Fletcher and Munson curves.
Loudness and Volume are not the same: Hi-fi systems have both a
loudness switch and a volume control. A volume control is used to
adjust the overall sound level over the entire frequency range of the
audio spectrum (20Hz to 20 kHz). A volume control is not frequency or
tone sensitive; when you advance the volume control, all tones are
increased in level. A loudness switch boosts the low frequency and
high frequency ranges of the spectrum while not affecting the mid
range tones.

Frequency
The rate of repetition of the cycles of a periodic quantity, such as a
sound wave, is called frequency. Frequency can be defined as the number
of cycles that a periodic waveform completes in the time of one second.


Frequency is denoted by the symbol ƒ, and is measured in Hertz (Hz) -
formerly called cycles per second (cps or c/s) - kilohertz (kHz), or
megahertz (MHz). The three diagrams (Figs. 1-3) illustrate this. The first
diagram shows a sine wave completing 1 cycle in 1 second; its frequency
is therefore 1Hz. The second diagram shows a sine wave completing 5
cycles in 1 second; its frequency is thus 5Hz. The last diagram shows the
same sine wave completing 10 cycles in the time frame of 1 second; its
frequency is 10Hz. Likewise a 1000Hz tone will complete 1000 cycles in a
second and a 20,000Hz tone completes 20,000 cycles in 1 second.

The only sound which consists of a single frequency is the pure sine
tone, such as that produced by a sine wave oscillator or approximated by
a tuning fork. All other sounds are complex, consisting of a number of
frequencies of greater or lesser intensity. The frequency content of a
sound is commonly referred to as its spectrum. The subjective sense of
frequency is called pitch. That is, frequency is an acoustic variable,
whereas pitch is a psychoacoustic one.

Wavelength
The wavelength of a wave is merely the distance (in meters) which a
disturbance travels along the medium in one complete wave cycle.
Since a wave repeats its pattern once every wave cycle, the wavelength

is referred to as the length of one complete wave. For a transverse wave,
this length is commonly measured from one wave crest to the next
adjacent wave crest, or from one wave trough to the next adjacent wave
trough. Since a longitudinal wave does not contain crests and troughs,
its wavelength must be measured differently. A longitudinal wave
consists of a repeating pattern of compressions and rarefactions. Thus,
the wavelength is commonly measured as the distance from one
compression to the next adjacent compression or the distance from
one rarefaction to the next adjacent rarefaction. The Greek letter lambda
(λ) is used as the symbol for wavelength.

Below are some examples of various frequencies and their corresponding
wavelengths, obtained using the formula λ = c ÷ f (taking c ≈ 344 m/s):
20Hz = 17.2m
100Hz = 3.44m
500Hz = 0.688m (68.8cm)
1000Hz = 0.344m (34.4cm)
10000Hz = 0.0344m (3.44cm)
20000Hz = 0.0172m (1.72cm)
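The table above can be reproduced with a small Python sketch of the
λ = c ÷ f relationship; the value of 344 m/s for the speed of sound is an
assumption (roughly its value at 20ºC, as discussed in the next section).

SPEED_OF_SOUND = 344.0   # m/s, approximate speed of sound in air at about 20 C

def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    # Wavelength in metres = speed of sound / frequency
    return c / frequency_hz

for f in (20, 100, 500, 1000, 10000, 20000):
    print(f, "Hz ->", round(wavelength(f), 4), "m")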

Traveling Wave Relationship
A single frequency traveling wave will take the form of a sine wave. A
snapshot of the wave in space at an instant of time can be used to show
the relationship of the wave properties frequency, wavelength and
propagation speed.

The motion relationship "distance = speed of sound x time" is the key to
the basic wave relationship λ = c x T. Since the period T is the reciprocal
of the frequency (T = 1 ÷ f), the relationship can be expressed in the
standard form:

c = f x λ
Speed of Sound
We have seen that in an elastic medium such as air, the pressure wave
of alternating condensations and rarefactions moves away from the
source of the disturbance. Since the movement of sound wave is not
confined to a specific direction, it should be referred to in terms of
speed. The lowercase letter c is usually used as the specific abbreviation
for speed of sound in air.
The speed of sound in dry air is given approximately by

c = 331.4 + (0.6 x T) metres/sec

where T is the air temperature in degrees Celsius (equivalent to roughly
1087 + (2 x T) feet/sec).

The speed of sound at 0ºC in air at sea level is 331.4 m/s. The speed of
sound increases 0.6 m/s for every 1ºC increase in the temperature.
Likewise, the speed of sound decreases 0.6m/s for every 1ºC decrease in
the temperature.

The speed of sound is slower at higher altitudes, mainly because the air
there is colder. In the vacuum of space, sound cannot propagate. Factors
like pollution and humidity in the air can alter the speed of sound slightly.

The speed of sound is generally much faster in liquids and solids than in
gases; the accompanying table compares the speed of sound in different mediums.

Example 1
What would the speed of sound be if the temperature was 20ºC?
331.4 + (0.6 x 20) = 331.4 + 12
= 343.4 m/s
Example 2
How far can a 100Hz sine wave travel in 2 seconds at 20ºC, sea level?
From d = t x c, where t = 2 sec and speed of sound = 343.4
d = 2 x 343.4
= 686.8 meters
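The two worked examples above translate directly into a short Python
sketch; the 331.4 m/s base value and the 0.6 m/s per ºC correction are the
figures quoted in this text.

def speed_of_sound(temp_c):
    # Approximate speed of sound in dry air (m/s) at a temperature in degrees C
    return 331.4 + 0.6 * temp_c

def distance_travelled(seconds, temp_c):
    # Distance (m) covered by a sound wave: d = t x c
    return seconds * speed_of_sound(temp_c)

print(round(speed_of_sound(20), 1))          # Example 1: 343.4 m/s
print(round(distance_travelled(2, 20), 1))   # Example 2: 686.8 m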

Particle velocity
The term particle velocity refers to the
velocity at which a particle in the path of a
sound wave is moved (displaced) by the
wave as it passes. It should not be confused
with the velocity at which the sound wave
travels through the medium, which is constant unless the sound wave
encounters a different medium in which case the sound wave will
refract. If the sound wave is sinusoidal (sine wave shape), particle
velocity will be zero at the peaks of displacement and will reach a
maximum when passing through its normal rest position.

Phase
If a mass on a rod is rotated at constant
speed and the resulting circular path
illuminated from the edge, its shadow
will trace out simple harmonic motion. If
the shadow vertical position is traced as a
function of time, it will trace out a sine
wave. A full period of the sine wave will
correspond to a complete circle or 360

degrees. The idea of phase follows this parallel, with any fraction of a
period related to the corresponding fraction of a circle in degrees.
The phase of any point on a sine wave measures that point’s distance
from the most recent positive-going zero crossing of the waveform. The
phase shift between two points on the same sine wave is simply the
difference between their two
phases.

However, phase shift is more often used to describe the instantaneous


relationship between two measured sine waves in relation to time.
The repetitive nature of sound waves allows them to be divided into
equal intervals and measured in degrees, with 360º in one full cycle.
If another sine wave of identical frequency is delayed 90º, its time
relationship to the first sine wave is said to be a quarter wave late. A
half-wave late would be 180º, and so on.

Waves that are of the same frequency which commence at the same
time are said to be in phase, phase coherent or phase correlated. In such
a case, the phase angle between the two sine waves would be 0º.
Those commencing at different times are said to be out of phase, phase
incoherent or uncorrelated.

We know that two waves of identical frequency and amplitude
cancel each other out completely when they are 180º out of phase. When
the phase difference between them is something other than 180° (and
there is also an amplitude difference between them), you will get a
certain amount of cancellation and also addition in the amplitude of the
fundamental of the waveform. As more waveforms with different phase and
amplitude levels are added on, the waveform is slowly transformed.

This is one of the factors which play a part in how complex waveforms
are shaped.
Most phase problems result in the loss of low frequency (bass) and
coloration of high frequencies. You can try this at home by listening to a
song you know well on your home hi-fi system. Then reverse the
positive and negative leads on one speaker and listen to the song again.
You will find that the sound seems ‘wider’ than before but has less bass
content and lacks high frequency definition.
The formula to calculate phase angle between waveforms is given by;

Phase (ø) = Time (in seconds) x Frequency x 360°

Example 1
For instance, a 250Hz wave, delayed by 1ms will be shifted in phase by
90°. How do we get this value?
Phase (ø) = Time (in seconds) x Frequency x 360°
Phase (ø) = 0.001s x 250x 360°= 90°

Example 2
What would be the phase shift if the same wave was delayed by
0.25ms?
Phase (ø) = Time (in seconds) x Frequency x 360°
Phase (ø) = 0.00025s x 250 x 360°= 22.5°

Example 3
If you want to shift two identical 500Hz waveforms with a phase
relationship of 180°, how much delay would you apply to either
waveform to be phase-shifted by that amount?
Phase (ø) = Time (in seconds) x Frequency x 360°
180° = (X sec) x 500 x 360°
X sec = 180° ÷ (500 x 360°) = 180 ÷ 180,000 = 0.001 seconds or 1ms
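The phase formula and the three worked examples above can be checked with
a minimal Python sketch (a sketch only - the function names are ours, not
standard audio terminology).

def phase_shift_degrees(delay_seconds, frequency_hz):
    # Phase (deg) = time (s) x frequency (Hz) x 360
    return delay_seconds * frequency_hz * 360.0

def delay_for_phase(phase_degrees, frequency_hz):
    # Rearranged: delay (s) = phase / (frequency x 360)
    return phase_degrees / (frequency_hz * 360.0)

print(phase_shift_degrees(0.001, 250))     # Example 1: 90.0 degrees
print(phase_shift_degrees(0.00025, 250))   # Example 2: 22.5 degrees
print(delay_for_phase(180, 500))           # Example 3: 0.001 s, i.e. 1 ms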

Comb filter or Combing


Alteration of the frequency response of a system as a result of
constructive and destructive interference between a signal and a
delayed version of the signal is called Comb filtering. This is often
created acoustically in small rooms because of the interaction between
the direct sound and the reflected sound, resulting in a series of peaks and
nulls in the frequency response. Plotted on a linear frequency scale,
such a comb-filter response looks like the teeth of a comb.


The Behavior of Waves


Although it is convenient to discuss sound transmission theory in terms
of a free space (also known as free field) environment, in practice
sound pressure waves encounter all sorts of obstacles in their path. In
addition to regular room surfaces – walls, floor, and ceiling – there may
be any number of additional objects that act as acoustic barriers. Each of
these surfaces has an effect on the sound that is eventually heard and the
effects usually vary considerably over the audio frequency range.
Reflection of Sound

The reflection of sound follows the law “angle of incidence equals


angle of reflection” as does light and other waves, or the bounce of a
billiard ball off the bank of a table. The main item of note about sound
reflections off of hard surfaces is the fact that the particle displacement
undergoes a 180 degree phase change upon reflection (the pressure, by
contrast, reflects in phase). This can lead to resonances such as
standing waves in rooms. It also means that the sound intensity near a
hard surface is enhanced because the reflected pressure wave adds to the
incident wave, giving a pressure amplitude that is twice as great in a thin
“pressure zone” near the surface. This is used in pressure zone

microphones to increase sensitivity. The doubling of pressure gives a 6
decibel increase in the signal picked
up by the microphone. Reflection of
waves in strings and air columns are
essential to the production of resonant
standing waves in those systems.

Phase Change upon Reflection


An important aspect of the reflection
of sound waves from hard surfaces
and the reflection of waves on strings from their
fixed ends is the turning over of the wave
when it reflects. This reversal (180 º change in phase) is an important
part of producing resonance in strings. Since the reflected wave and the
incident wave add to each other while moving in opposite directions,
the appearance of propagation is lost and the resulting vibration is
called a standing wave.

Plane Wave Reflection


“The angle of incidence (θi) is equal to the angle of reflection(θr)” is one way
of stating the law of reflection for light in a plane mirror. Sound obeys
the same law of reflection.

Point source reflecting from a plane surface


When sound waves from a point source strike a
plane wall, they produce reflected circular
wavefronts as if there were an “image” of the sound
source at the same distance on the other side of the
wall. If something obstructs the direct sound from
the source from reaching your ear, then it may
sound as if the entire sound is coming from the
position of the “image” behind the wall. This kind

of sound imaging follows the same laws of reflection as your image in a
plane mirror.

Reflection from Concave Surface


Any concave surface will tend to focus the
sound waves which reflect from it. This is
generally undesirable in auditorium
acoustics because it produces a “hot spot”
and takes sound energy away from
surrounding areas. Even dispersion of sound
is desirable in auditorium design, and a
surface which spreads sound is preferable to
one which focuses it.

Absorption of Sound
Different materials absorb specific
frequencies, with high frequencies being
the most susceptible to absorption.
Concrete, glass, wood and plywood
reflect sound waves, while draperies,
carpet and acoustical tiles absorb more
sound waves, especially the higher
frequencies. The more humid the air is, the
greater its potential for high frequency absorption. That is why the
humidity levels must be controlled in a concert hall. When the air is
extremely cold, there is little humidity. As a result sounds from a jet
plane sound closer and more piercing on a cold brisk day as compared
to the identical sound on a warm summer day when more high
frequencies are absorbed in the humid air.

Refraction of Sound
Refraction is the bending of waves when they enter a medium where
their speed is different. Refraction is not so important a phenomenon
with sound as it is with light, where it is responsible for image
formation by lenses, the eye, cameras, etc. But bending of sound waves
does occur and is an interesting phenomenon in sound.


If the air above the earth is warmer than that at the surface, sound will
be bent back downward toward the surface by refraction.

Sound propagates in all directions from a point source. Normally, only
the sound that is initially directed toward the listener can be heard, but
refraction can bend sound travelling upward back down toward the listener,
adding to the direct sound and effectively amplifying it. Such natural
amplification can occur over cool lakes, where the air just above the water
is cooler than the air higher up.

Diffraction of Sound
Diffraction is the term used to describe the bending of waves around
obstacles (i.e. walls, barriers, etc.) and the spreading out of waves
beyond small openings whose size is small relative to the wavelength
of the sound. Diffraction is most pronounced at low frequencies

(bass). The long wavelength sounds of the bass drum will diffract
around the corner more efficiently than the more directional, short
wavelength sounds of the higher pitched instruments.


Loudspeaker Sound Contours


One consequence of diffraction is that sound from a loudspeaker will
spread out rather than just going straight ahead. Since the bass
frequencies have longer wavelengths compared to the size of the
loudspeaker, they will spread out more than the high frequencies. The
curves at left represent equal intensity contours at 90 decibels for sound
produced by a small enclosed loudspeaker. It is evident that the high
frequency sound spreads out less than the low frequency sound.

Frequency perception and the Human Ear


Waveform Types
Waveforms are the building blocks of sound. By combining raw
waveforms one can simulate any acoustic instrument. This is how
synthesizers make sound. They have waveform generators, which
create all types of waves, which they combine to form composite
waveforms, which approximate real instruments.
Generally, waveforms can be divided into two categories:
(i) Simple tone
(ii) Complex tone

Simple Tone
A simple tone is a sound having a single frequency. A sine wave is an
example of a pure tone, in that there is only a single frequency and no
harmonics. The sine tone is the fundamental or ‘backbone’ of every
complex tone there is in existence.


The French scientist Jean Baptiste Joseph Fourier (1768-1830) showed
that any acoustic event, no matter how complex, can be broken down into
individual sine waves. Any periodic waveform can be reduced to its
constituent harmonic content: a combination of sine waves with specific
frequencies, amplitudes and relative phases. Conversely, it is
theoretically possible to artificially or synthetically re-generate any
complex sound by layering multiple sine waves together in the right
proportion.
Therefore, we can plot the square wave (shown above as an amplitude-
time graph) in a ‘harmonic analysis’ or ‘Fourier analysis’ shown below.

Complex Tone
A tone having more than a single frequency component is called a
complex tone. For instance, a tone consisting of a fundamental and
overtones or harmonics may be said to be complex. Two waveforms combine in a
manner which simply adds their respective amplitudes linearly at every
point in time. Thus, a complex tone or waveform can be built by mixing
together different sine waves of various amplitudes.

The following is a summary of the resultant complex waveforms that


can be synthesized by the addition and subtraction of harmonics to a
sine wave.

Square Wave
An audio waveform theoretically comprised of an infinite set of odd
harmonic sine waves.
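As an illustration of the Fourier idea described earlier, the following
Python sketch builds an approximation to a square wave by summing odd
harmonic sine waves whose amplitudes fall off as 1/n (the standard
square-wave series). The 100 Hz fundamental, the number of harmonics and
the sample rate are arbitrary assumptions.

import numpy as np

sample_rate = 48000                          # samples per second (assumption)
fundamental = 100.0                          # Hz, arbitrary fundamental
t = np.arange(0, 0.02, 1.0 / sample_rate)    # 20 ms of signal

# Sum odd harmonics 1, 3, 5, ... each with amplitude 1/n.
square_approx = np.zeros_like(t)
for n in range(1, 40, 2):
    square_approx += (1.0 / n) * np.sin(2 * np.pi * n * fundamental * t)

# The more odd harmonics are included, the closer the sum gets to a square wave.
print(len(square_approx), float(square_approx.max()))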


Sawtooth Wave


An audio waveform theoretically comprised of an infinite set of
harmonically related sine waves.

Triangle Wave
An audio waveform theoretically comprised of an infinite set of odd
harmonic sine waves. It is often used in sound synthesis because it is
less harsh than the square wave, since the amplitude of its upper
harmonics falls off more rapidly.


Difference between musical sound and noise


The word sound carries an implication that it is something we can hear,
or that is audible. However, sound also exists above 20 kHz (called
ultrasound) and below our hearing range of 20Hz (called infrasound).
Regular vibrations produce musical tones: the constituent
tones/frequencies vibrate in tune with one another, and because of their
orderliness we find them pleasing.

Noise consists of irregular vibrations; its air pressure variations are
random and we perceive them as unpleasant. A musical note
consists of a fundamental wave and a number of overtones called
harmonics. These harmonics are known as the harmonic series.

Fundamental Frequency and Harmonics


The lowest resonant frequency of a
vibrating object is called its
fundamental frequency. Most
vibrating objects have more than one
resonant frequency and those used in
musical instruments typically vibrate
at harmonics of the fundamental. The
term harmonic has a precise meaning -
that of an integer (whole number)
multiple of the fundamental frequency
of a vibrating object. When the
frequency of an overtone is not an
integer multiple of the fundamental, the overtone is said to be
inharmonic and is called a partial or overtone; its waveform is
aperiodic.

Instruments of the percussion group, bells, and gongs in particular,


have overtones which are largely inharmonic. As a result, such sounds
may have indefinite or multiple pitches associated with them. The term
overtones or partials are used to refer to any resonant frequency above
the fundamental frequency - an overtone may or may not be a
harmonic. To obtain the frequency of the nth harmonic, the formula below
is used: fn = n x f1, where n is the harmonic number and f1 is the
fundamental frequency.
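A tiny Python sketch of this formula, listing the first few harmonics of a
hypothetical 110 Hz fundamental (the 110 Hz value is an arbitrary example,
not taken from this text).

def harmonic(fundamental_hz, n):
    # nth harmonic = n x fundamental (an integer multiple of the fundamental)
    return n * fundamental_hz

fundamental = 110.0   # Hz, hypothetical fundamental
for n in range(1, 6):
    print(n, harmonic(fundamental, n))   # 110, 220, 330, 440, 550 Hz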


Overtones and Harmonics


Many of the instruments of the orchestra, those utilizing strings or air
columns, produce the fundamental frequency and harmonics. Other
sound sources, such as membranes or other percussive sources, may
have resonant frequencies that are not whole-number multiples of the
fundamental. They are said to have non-harmonic overtones.

Cylinders with one end closed will vibrate with only odd harmonics of
the fundamental. Vibrating membranes typically produce vibrations at
harmonics, but also have some resonant frequencies which are not
whole number multiples of their fundamental frequencies. It is for this
class of vibrators that the term overtone becomes useful - they are said
to have some non-harmonic overtones.

Timbre (Quality of Tone)


Timbre or tone quality is determined by the behavior in time of the
frequency content or spectrum of a sound, including its transients
which are extremely important for the identification of timbre.
The presence and distribution of these frequency components, whether
harmonic or inharmonic, and their onset, growth and decay in time
together with phase relations between them, combine to give every
sound its distinctive tonal quality or timbre.

Often qualities of timbre are described by analogy to color or texture


(e.g. bright, dark, rough, smooth), since timbre is perceived and

understood as a ‘subjective’ impression reflective of the entire sound,
seldom as a function of its analytic components.
With musical instruments, timbre is a function of the range in which the
sound has its pitch as well as its loudness, duration, and manner of
articulation and performance. The same applies with speech, where
timbre is the basic quality which allows one to distinguish between
different voices, just as between different instruments or other sounds.

Pitch
In the same sense that loudness is the subjective sense of the objective
parameters of intensity or amplitude of a sound, the subjective
impression of frequency is pitch. As such, pitch is a psychoacoustic
variable, and the degree of sensitivity shown to it varies widely with
people. Some individuals have a sense of remembered pitch, that is, a
pitch once heard can be remembered and compared to others for some
length of time; others have a sense of absolute pitch called perfect pitch.
The term ‘pitch’ is used to describe the frequency of sound. For
example, middle C in equal temperament = 261.6 Hz

Sounds may be generally characterized by pitch, loudness, and quality.


The perceived pitch of a sound is just the ear’s response to frequency,
i.e., for most practical purposes the pitch is just the frequency. The pitch
perception of the human ear is understood to operate basically by the
Place Theory, with some sharpening mechanism necessary to explain
the remarkably high resolution of human pitch perception.

The Place Theory and its refinements provide plausible models for the
perception of the relative pitch of two tones, but do not explain the
phenomenon of perfect pitch. The just noticeable difference in pitch is

conveniently expressed in cents, and the standard figure for the human
ear is 5 cents.
Cents
Musical intervals are often expressed in cents, a unit of pitch based
upon the equal tempered octave such that one equal tempered semitone
is equal to 100 cents. An octave is then 1200¢, and the other equal
tempered intervals can be obtained by adding semitones.
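Because the equal tempered semitone is defined as 100 cents, the interval
between any two frequencies can be computed with the standard formula
1200 x log2(f2 ÷ f1). A minimal Python sketch:

import math

def cents(f1_hz, f2_hz):
    # Interval from f1 to f2 in cents: 1200 cents per octave, 100 per semitone
    return 1200.0 * math.log2(f2_hz / f1_hz)

print(round(cents(440.0, 441.0), 2))   # about 3.93 cents, near the 5-cent JND quoted above
print(round(cents(261.6, 523.2), 2))   # one octave = 1200 cents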

Effect of Loudness Changes on Perceived Pitch


A high pitch (>2 kHz) will be perceived to be getting higher if its
loudness is increased, whereas a low pitch (<2 kHz) will be perceived to
be going lower with increased loudness. Sometimes called “Stevens’s
rule” after an early investigator, this psychoacoustic effect has been
extensively investigated.

With an increase of sound intensity from 60 to 90 decibels, Ernst


Terhardt found that the pitch of a 6 kHz pure tone was perceived to rise
over 30 cents. A 200 Hz tone was found to drop about 20 cents in
perceived pitch over the same intensity change.

Studies with the sounds of musical instruments show less perceived


pitch change with increasing intensity. Rossing reports a perceived
pitch change of around 17 cents for a change from 65 dB to 95 dB. This
perceived change can be upward or downward, depending upon which
harmonics are predominant. For example, if the majority of the intensity
comes from harmonics which are above 2 kHz, the perceived pitch shift
will be upward.

Octave
The musician often refers to the octave, a logarithmic concept that is
firmly embedded in musical scales and terminology because of its
relationship to the ear's characteristics. Whereas harmonics of a
fundamental frequency ascend in a linear manner (i.e. 100, 200, 300,
400...), octaves of a fundamental frequency ascend or descend in a
logarithmic manner. An octave interval between frequencies is
characterized by the frequency ratio 2:1, such as that produced when
the length of a vibrating string is halved.

Example
If we have a sine tone whose frequency is 100Hz, its 2nd and 3rd
octaves are:

2nd octave = 2 x 100 = 200Hz

3rd octave = 2 x 200 = 400Hz

A note whose pitch is an octave above or below a given note gives the
subjective impression of duplicating that note, such as when men and
women sing in unison, but actually are producing notes an octave
apart.
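The contrast drawn above between linearly spaced harmonics and
logarithmically spaced octaves can be seen in a two-line Python sketch
(starting from the 100 Hz tone of the example):

fundamental = 100.0   # Hz, as in the example above

harmonics = [n * fundamental for n in range(1, 6)]   # 100, 200, 300, 400, 500 Hz (linear)
octaves = [fundamental * 2 ** k for k in range(5)]   # 100, 200, 400, 800, 1600 Hz (doubling)

print(harmonics)
print(octaves)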

Envelope of Sound
Defining Envelope of Sound
The variation of (maximum) amplitude over time is called the envelope
of the sound. An envelope of sound is composed of a sound's attack,
decay, sustain, and release. Graphical representation of the envelope of a sound
object may show distinctive features in its attack or onset transients,
stationary state, internal dynamics and Release. The envelope of sounds
is its macro-level amplitude behavior in time, whereas its micro-level
pattern of sound pressure variation is its waveform.

Variables of a Sound Envelope

Attack
Sound begins at A and reaches its peak at level B.

Decay
Sound level falls from B to an intermediate steady level C
Sustain
It drops slightly in level and remains steady until D.

Release
When the sound source is removed at D, the sound dies down to a point of silence E.

Attack
The way a sound is initiated is called attack. There are two types of
attack:
(i) Fast
(ii) Slow
Fast attack
The closer the attack of a sound A is to the peak B of a sound, the faster
its attack is. Sounds that have a fast attack are...
Gunshots
Slaps
A snare drum or kick drum
Door slams

Slow attack
Sounds that have a slow attack take longer to build to the sustain level.
Sounds that have a slow attack are…

A dog’s short warning growl prior to bark


Stepping on a dry leaf
Slowly tearing a sheet of paper
Closing a door slowly
An entire thunderclap

Sustain
Once a sound has reached its peak, the length of time that the sound
will sustain is dependent upon the energy from the source vibrations.
When the source sound stops, the sound will begin to decay.
Manipulating the sustain time of a sound is yet another way of either
modifying a sound or creating a totally new one.

Release
The decrease in amplitude when a vibrating force has been removed is
called release. The actual time it takes for a sound to diminish to silence
is the release time. Listening to a sound's release tells you whether it was made:
Indoors (small enclosed area with a great deal of absorbency)
- little release and with very little or no reverberation
Outdoors (open unconfined area)
- long decay and release with an echo
The end of a sound is often referred to as the “tail” of a sound and,
conversely, the beginning of a sound is its “head”.
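A rough Python sketch of the attack/decay/sustain/release envelope
described above, built as a piecewise linear gain curve. All of the times,
levels and the sample rate are arbitrary assumptions chosen only to
illustrate the shape.

import numpy as np

sample_rate = 48000   # samples per second (assumption)

def adsr_envelope(attack_s, decay_s, sustain_level, sustain_s, release_s):
    # Rise to the peak, fall to the sustain level, hold, then die away to silence.
    attack = np.linspace(0.0, 1.0, int(attack_s * sample_rate))
    decay = np.linspace(1.0, sustain_level, int(decay_s * sample_rate))
    sustain = np.full(int(sustain_s * sample_rate), sustain_level)
    release = np.linspace(sustain_level, 0.0, int(release_s * sample_rate))
    return np.concatenate([attack, decay, sustain, release])

# A fast attack (snare-like hit) versus a slow attack (a sound that builds up).
fast = adsr_envelope(0.002, 0.05, 0.3, 0.10, 0.2)
slow = adsr_envelope(0.500, 0.10, 0.8, 1.00, 0.5)
print(len(fast), len(slow))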

The Human Ear and Hearing
The Outer Ear
Hearing and Perception
The operation of the ear has two facets: the behavior of the mechanical
apparatus and the neurological processing of the information acquired.
The mechanics of hearing are straightforward and well understood, but
the action of the brain in interpreting sounds is still a matter of dispute

among researchers.

1. Auditory canal
2. Ear drum
3. Hammer
4. Anvil
5. Stirrup
6. Round window
7. Oval window
8. Semicircular canals
9. Cochlea
10. Eustachian tube


Sound energy spreads out from its source. For a point source of sound,
it spreads out spherically, its intensity falling off with distance. For a
given sound intensity, a larger ear captures more
of the wave and hence more sound energy. The outer ear structures act
as part of the ear’s preamplifier to enhance the sensitivity of hearing.

The auditory canal acts as a closed tube resonator, enhancing sounds in


the range 2-5 kHz. This is a very important frequency range for
intelligibility of human speech.

Auditory Canal Resonance


The maximum sensitivity regions of
human hearing can be modeled as closed
tube resonances of the auditory canal. The
observed peak at about 3700 Hz at body
temperature corresponds to a tube length
of 2.4 cm. The higher frequency
sensitivity peak is at about 13 kHz which
is somewhat above the calculated 3rd
harmonic of a closed cylinder.
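The 3700 Hz figure quoted above can be checked with the standard
quarter-wave formula for a tube closed at one end, f = c ÷ (4 x L), using a
speed of sound of about 353 m/s at body temperature (an assumption derived
from the 0.6 m/s per ºC rule given earlier in this text).

def closed_tube_resonance(length_m, temp_c=37.0):
    # Fundamental resonance of a tube closed at one end: f = c / (4 * L)
    c = 331.4 + 0.6 * temp_c          # speed of sound at the given temperature
    return c / (4.0 * length_m)

print(round(closed_tube_resonance(0.024)))   # roughly 3700 Hz for a 2.4 cm canal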

The outer ear (pinna) collects more sound energy than the ear canal
would receive without it and thus contributes some area amplification.
The outer and middle ears contribute something like a factor of 100 or
about 20 decibels of amplification under optimum conditions.


The numbers here are just representative ... not precise data.

The Middle Ear


The Tympanic Membrane (Eardrum)
The tympanic membrane or “eardrum” receives vibrations traveling
up the auditory canal and transfers them through the tiny ossicles to
the oval window, the port into the inner ear. The eardrum is some
fifteen times larger than the oval window, giving an amplification
factor of about fifteen compared to the oval window alone.

The Ossicles
The three tiniest bones in the body form the coupling between the
vibration of the eardrum and the forces exerted on the oval window of
the inner ear. The ossicles can be thought of as a compound lever which
achieves a multiplication of force. This lever action is thought to achieve
amplification by a factor of about three under optimum conditions, but
can be adjusted by muscle action to actually attenuate the sound signal
for protection against loud sounds.


The Inner Ear


The inner ear can be thought of
as two organs: the semicircular
canals which serve as the body’s
balance organ and the cochlea
which serves as the body’s
microphone, converting sound
pressure impulses from the outer
ear into electrical impulses which
are passed on to the brain via the
auditory nerve.

The basilar membrane of the inner ear plays a critical role in the
perception of pitch according to the Place Theory.

The Semicircular Canals


The semicircular canals are the body’s balance
organs, detecting acceleration in the three
perpendicular planes. These accelerometers make
use of hair cells similar to those on the organ of
Corti, but these hair cells detect movements of the
fluid in the canals caused by angular acceleration
about an axis perpendicular to the plane of the
canal. Tiny floating particles aid the process of
stimulating the hair cells as they move with the

fluid. The canals are connected to the auditory nerve.

The Cochlea
The inner ear structure called the cochlea is a snail-shell like structure
divided into three fluid-filled parts. Two are canals for the transmission
of pressure and in the third is the sensitive organ of Corti, which detects
pressure impulses and responds with electrical impulses which travel
along the auditory nerve to the brain.

Section of Cochlea
The cochlea has three fluid filled sections.
The perilymph fluid in the canals differs
from the endolymph fluid in the cochlear
duct. The Organ of Corti is the sensor of
pressure variations.

The Fluid Filled Cochlea


The pressure changes in the cochlea caused by
sound entering the ear travel down the fluid
filled tympanic and vestibular canals which are
filled with a fluid called perilymph. This
perilymph is almost identical to spinal fluid and
differs significantly from the endolymph which
fills the cochlear duct and surrounds the
sensitive Organ of Corti. The fluids differ in
terms of their electrolytes and if the membranes
are ruptured so that there is mixing of the fluids,
the hearing is impaired.


Organ of Corti
The organ of Corti is the
sensitive element in the
inner ear and can be
thought of as the body’s
microphone. It is situated
on the basilar membrane
in one of the three
compartments of the
Cochlea. It contains four
rows of hair cells which
protrude from its surface.
Above them is the tectoral
membrane which can
move in response to
pressure variations in the
fluid- filled tympanic and
vestibular canals. There
are some 16,000 -20,000 of
the hair cells distributed
along the basilar
membrane which follows
the spiral of the cochlea.

The place along the basilar membrane where


maximum excitation of the hair cells occurs
determines the perception of pitch according to
the Place Theory. The perception of loudness is
also connected with this organ.

Arrangement of Hair Cells


Individual hair cells have multiple strands called
stereocilia. There may be 16,000 - 20,000 such cells.
The place theory of pitch perception suggests that
pitch is determined by the place along this
collection at which excitation occurs. The pitch
resolution of the ear suggests a collection of hair
cells like this associated with each distinguishable
pitch.


Single Hair Cell Structure


The sensitive hair cells of the organ of Corti may have about 100 tiny
stereocilia which in the resting state are leaning on each other in a
conical bundle. In response to the pressure variations in the cochlea
produced by sound, the stereocilia may dance about wildly and send
electrical impulses to the brain.

Place Theory
High frequency sounds selectively vibrate
the basilar membrane of the inner ear near
the entrance port (the oval window). Lower
frequencies travel further along the
membrane before causing appreciable
excitation of the membrane. The basic pitch
determining mechanism is based on the
location along the membrane where the hair
cells are stimulated. A schematic view of the Place Theory unrolls the
cochlea and represents the distribution of sensitive hair cells on the
organ of Corti. Pressure waves are sent through the fluid of the inner
ear by force from the stirrup.

The cochlea is coiled into roughly 2¾ turns, is about 3.2 cm in
length, and can resolve about 1500 separate pitches with its
16,000 to 20,000 hair cells.

Pitch Resolution
The normal human ear can detect the difference between 440 Hz and
441 Hz. The high pitch resolution of the ear suggests that only about a
dozen hair cells or about three tiers from the four banks of cells are
associated with each distinguishable pitch. It is hard to conceive of a

mechanical resonance of the basilar membrane that sharp. There must
be some mechanism which sharpens the response curve of the organ of
Corti, as suggested schematically in the diagram.

Beats
When two sound waves of different
frequency approach your ear, the
alternating constructive and destructive
interference causes the sound to be
alternatively soft and loud - a
phenomenon which is called ‘beating’ or
producing beats. The beat frequency is
equal to the absolute value of the
difference in frequency of the two
waves.

Following this, you would hear beats at a rate equal to the lower
frequency subtracted from the higher frequency. For example, if the
two tones are at 440 and 444 Hz, you’ll hear the two notes beating 4
times per second (i.e. |f1 - f2| = 4 Hz).
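The beating described above can be demonstrated with a short Python sketch
that sums a 440 Hz and a 444 Hz sine wave; the sample rate is an arbitrary
assumption.

import numpy as np

sample_rate = 48000
t = np.arange(0, 1.0, 1.0 / sample_rate)

f1, f2 = 440.0, 444.0
combined = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

print(abs(f1 - f2))                        # beat frequency: 4 beats per second
# When the two waves line up in phase the summed amplitude approaches 2;
# a quarter of a beat period later they cancel and it falls towards 0.
print(round(float(np.max(np.abs(combined))), 2))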

Envelope of Beat Production


Beats are caused by the interference
of two waves at the same point in
space. This plot of the variation of
resultant amplitude with time shows
the periodic increase and decrease
for two sine waves.

If the frequencies are far apart:

First and foremost, you're going to hear two separate sine tones at the two
frequencies.

Secondly, you'll hear a note whose frequency is equal to the difference
between the two frequencies being played, |ƒ1 - ƒ2|.

Critical Bandwidth
There is no exact frequency difference at which a listener stops hearing
beats and starts hearing two separate tones. However, the approximate
frequencies and the order in which these changes occur are common to all
listeners, and, in common with all psychoacoustic effects, average values
are quoted which are based on measurements made on a large number of
listeners.

The point where the two tones are heard as separate as opposed to
fused when the frequency difference is increased can be thought of as
the point where two peak displacements on the basilar membrane begin
to emerge from a single maximum displacement on the membrane.
However, at this point the underlying motion of the membrane which
gives rise to the two peaks causes them to interfere with each other
giving the rough sensation, and it is only when the rough sensation
becomes smooth that the separation of the places on the membrane is
sufficient to fully resolve the two tones. The frequency difference
between the pure tones at the point where a listener’s perception
changes from a rough and separate to smooth and separate is known as
critical bandwidth. A more formal definition is given by Scharf (1970),
‘the critical bandwidth is that bandwidth at which subjective responses rather
abruptly change.’

The critical bandwidth changes with frequency. In practice, critical bandwidth is usually measured by means of an effect known as masking, in which the rather abrupt change is more clearly perceived by listeners. Masking occurs when one frequency cannot be heard as a result of another frequency that is louder and close to it.

The Auditory Nerve


Taking electrical impulses from the cochlea and the semicircular canals,
the auditory nerve makes connections with both auditory areas of the
brain.


Auditory Area of Brain


This schematic view of some of the auditory areas of the brain shows
that information from both ears goes to both sides of the brain - in fact,
binaural information is present in all of the major relay stations
illustrated here.

Attributes of the Human Hearing System


Audible Sound
The human ear can respond to minute pressure variations in the air if they are in the audible frequency range. The term 'sound' is used to describe a sensation which can be perceived by the human hearing system. The ear can hear frequencies from approximately 20 Hz to 20,000 Hz, this range often being called the audible range, or range of hearing.

This is true if the frequency within the range above is sounded with
intensity above the standard threshold of audibility commonly referred
to as the threshold of hearing.
Sensitivity of Human Ear
The human hearing system is capable of detecting pressure variations
of less than one billionth of atmospheric pressure. The threshold of
hearing corresponds to air vibrations on the order of a tenth of an

atomic diameter. This incredible sensitivity is enhanced by an effective


amplification of the sound signal by the outer and middle ear
structures.

Dynamic Range of Hearing


In addition to its remarkable sensitivity, the human ear is capable of
responding to the widest range of stimuli of any of the senses. Similar
to the frequency range, the dynamic range represents the area between
the softest and loudest sound the ear can handle. The practical dynamic
range could be said to be from the threshold of hearing to the threshold
of pain. The softest sound we can just perceive is about 0.00002 pascals (Pa) at 1 kHz. The loudest sound that we can handle is approximately 20 Pa at 1 kHz. The range between 0.00002 Pa and 20 Pa is unwieldy if we express it in the physical unit of pressure (the pascal). Hence, in order to simplify this range, we use the decibel sound pressure level, dB SPL, as the unit of measurement. The threshold of hearing (0.00002 Pa) is then equal to 0 dB SPL and the 20 Pa limit to 120 dB SPL. In this way it is easier for us to deal with whole numbers such as 0, 10, 20, and so on, rather than with inconvenient decimals. The human hearing system has the following thresholds:

Threshold of Hearing: 0 dB SPL
Threshold of Feeling: 120 dB SPL
Threshold of Pain: 130 dB SPL
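As a quick check of these figures, the minimal Python sketch below converts a pressure in pascals to dB SPL, assuming the standard reference pressure of 20 micropascals (the threshold of hearing):

import math

P_REF = 20e-6  # reference pressure in pascals (threshold of hearing)

def pascals_to_db_spl(pressure_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF)

print(pascals_to_db_spl(20e-6))  # threshold of hearing -> 0 dB SPL
print(pascals_to_db_spl(20.0))   # 20 Pa -> 120 dB SPL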

This remarkable dynamic range is enhanced by an effective
amplification structure which extends its low end and by a protective
mechanism which extends the high end.

Threshold of Pain
The nominal dynamic range of human hearing is from the standard
threshold of hearing to the threshold of pain. A nominal figure for the
threshold of pain is 130 decibels, but that which may be considered
painful for one may be welcomed as entertainment by others.
Generally, younger persons are more tolerant of loud sounds than older
persons because their protective mechanisms are more effective. This
tolerance does not make them immune to the damage that loud sounds
can produce.

The table gives ballpark values for the different types of sounds we hear every day, showing their levels in dB SPL alongside the equivalent pressures in pascals. Notice that the dB SPL reading is far easier to work with than the equivalent pressure in pascals.

Loudness
The loudness of a sound depends on the intensity of the sound stimulus. A dynamite explosion is louder than a cap pistol because of the greater number of air molecules the dynamite is capable of displacing. Loudness becomes meaningful only if we are able to compare it with something. The sound of a gunshot may be deafening in a small room, but may go virtually unnoticed if fired in a subway station while a train is roaring past.

Loudness is not simply sound intensity!


Sound loudness is a subjective term describing the strength of the ear’s
perception of a sound. It is intimately related to sound intensity but can
by no means be considered identical to intensity. The sound intensity
must be factored by the ear’s sensitivity to the particular frequencies
contained in the sound. This is the kind of information contained in
equal loudness curves for the human ear. It must also be considered that
the ear’s response to increasing sound intensity is a “power of ten” or
logarithmic relationship. This is one of the motivations for using the
decibel scale to measure sound intensity. A general “rule of thumb” for
loudness is that the power must be increased by about a factor of ten to
sound twice as loud.
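A rough illustration of that rule of thumb in plain Python (treat it purely as the 10-dB-per-doubling approximation quoted above; the exact loudness-power relationship varies with level and frequency):

import math

def level_increase_db(power_ratio):
    """Level change in dB for a given ratio of acoustic powers."""
    return 10 * math.log10(power_ratio)

# Ten times the power corresponds to +10 dB, which sounds roughly twice as loud.
print(level_increase_db(10))   # 10.0 dB
print(level_increase_db(100))  # 20.0 dB -> roughly four times as loud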

Equal Loudness Contours


The threshold of hearing is the point at which the average listener can
just barely hear a sound. In 1933, Fletcher and Munson conducted a
series of tests which demonstrated that this threshold is very much a
function of frequency. To put it another way, the tests conducted by these two scientists produced results which showed that the ear is not equally sensitive to all frequencies, particularly in the low and high
frequency ranges. The curves are plotted for each 10 dB rise in level
with the reference tone being at 1 kHz. These curves are called loudness
level contours or the Fletcher-Munson curves.


From these curves, we can deduce that:-


(i) We are less sensitive to high frequencies and low frequencies than
to mid-range frequencies.

(ii) The curves are lowest in the range from 1 to 5 kHz, with a dip at 4 kHz, indicating that the ear is most sensitive to frequencies in this range. This is an interesting area because the bulk of our speech (specifically consonant sounds) relies on information in this frequency range (although it is likely that speech evolved to capitalize on this sensitive frequency range).

The intensity level of higher or lower tones must be raised


substantially in order to create the same impression of loudness. Things
sound better when they’re louder. This is because there’s a “better
balance” in your hearing perception than when they’re at a lower level.
This is why the salesperson at the stereo store will crank up the volume
when you’re buying speakers... they sound good that way... everything
does. If the level is low, then you’ll think that you hear less bass.

The phon scale was devised to express this subjective impression of


loudness, since the decibel scale alone refers to actual sound pressure or
sound intensity levels. Historically, the A, B, and C weighting networks

on a sound level meter were derived as the inverse of the 40, 70 and 100
dB Fletcher-Munson curves and used to determine sound level. The
lowest curve represents the threshold of hearing, the highest the
threshold of pain.

The Importance of Audio Engineers Understanding Loudness Contours
The practical consequence of the equal loudness contours is that as the overall listening level is decreased, the listener perceives a falling-off of bass response and, to a lesser extent, of high-frequency response.
Therefore, a master tape that was mixed at a high listening level (>
85dBSPL) will sound weaker in bass and high end later on when played
back at lower levels. Conversely, a program mixed at low listening level
will seem to have more bass and treble when played back at a louder
level. Why does this happen? Human beings have an inherent tendency to compensate for the perceived differences in bass and treble content and almost always want more of both. That is why more and more home consumer and car audio systems incorporate frequency-dependent amplifiers (or graphic equalizers) to cater to individual consumer preferences. From the perspective of audio engineering, mixing audio with the misconception that 'more bass or treble is better' can be detrimental to the overall integrity of the mix.

Given these facts of auditory life, there is something to be said for keeping listening levels within reason during a mixdown session. If that level gets too loud (as often happens), the final product will probably suffer as a result when heard by the listener at a normal listening level. Furthermore, listening at loud volumes subjects the hearing system to fatigue, causing the thresholds of hearing and pain to shift somewhat, so that the engineer misinterprets what is being reproduced by the speakers. It is also worth mentioning that studio-grade professional speakers can reproduce large amounts of bass and treble without distortion or other erroneous sonic artifacts. The same mix, however, played back on a normal system with average speakers may distort, sound muddy (clouding of the frequency spectrum by bass frequencies) or thin (excessive high-frequency content).


The Ear’s Protective Mechanism


In response to sustained loud sounds, muscle tension tightens the tympanic
membrane and, acting through the tendon connecting the hammer and anvil,
repositions the ossicles to pull the stirrup back, lessening the transfer of
force to the oval window of the inner ear. This contributes to the ear’s wide
dynamic range. In short, the dynamic range refers to the span between the softest and loudest sounds a person can hear without any sort of discomfort.

Effect of Extreme Noise and Frequencies on the Ear
On exposure to noise (which may as well be very loud music levels),
the ear’s sensitivity level will decrease as a measure of protection. This
process is referred to as a shift in the threshold of hearing, meaning that
only sounds louder than a certain level will be heard. The shift may be
temporary, chronic or permanent. Threshold shifts can be categorized
in three primary divisions:-

Temporary Threshold Shift (TTS)


During short exposure to noise, most people experience a rise in the
auditory threshold which normally disappears in 24 hours, but may last
as long as a week.

Permanent Threshold Shift (PTS) or Noise Induced Permanent


Threshold Shift (NIPTS)
After prolonged exposure to noise, permanent hearing damage may
result in the inner ear.

Chronic Threshold Shift or Compound Threshold Shift


If exposure to noise occurs repeatedly without sufficient time between
exposures to allow recovery of normal hearing, TS may become chronic,
and eventually permanent. This is a particular danger when people
who work in noisy environments are exposed to further noise
afterwards in driving, at home and at places of entertainment.

Susceptibility to threshold shifts (TS) varies greatly from person to


person, men generally being more sensitive to low frequency sounds,
and women more susceptible to high frequencies. Sounds in the 2 - 6
kHz range seem to induce greater temporary threshold shift (TTS) than
other frequencies. This is also called aural fatigue.

One of the body’s reactions to loud sounds is a constriction of the blood


vessels (vasoconstriction) which reduces the blood supply reaching the
hair cells of the Organ of Corti. The outer rows of hair cells respond
mainly to low intensity sound levels and thus are easily saturated by
loud sounds, particularly when their source of blood is diminished.
This leaves only the inner rows of hair cells working since they need a
higher intensity for stimulation.

Thus, TTS implies a temporary hearing loss for low level sounds
(somewhat analogously to the protective closing of the iris in bright
light and the resulting temporary desensitization to low light levels). If
the outer hair cells are not allowed to recover through periods of quiet,
they gradually lose their ability to respond and eventually die. TTS may
also be accompanied by tinnitus, a ringing in the ears.

So if you seriously want to be an audio engineer, think twice before you


go into a club which is playing back music at extremely high intensities.
As you intoxicate yourself with all those wild cocktail concoctions, the
brain’s sensitivity is gradually diminished and the physical hearing
organ – your ear – undergoes damage. Take care of your ears.

The graph below shows the duration of high intensity sound exposure in
minutes, hours, days, and weeks and how it affects the threshold of
hearing.


Important things to remember about the ear

The ear's response to frequency is logarithmic
Non-flat frequency response
Dynamic range is about 120 dB (at 3-4 kHz)
Frequency discrimination of about 2 Hz (at 1 kHz)
An intensity change of 1 dB to 3 dB can be detected, depending on the frequency range
Small sensitivity to phase

Structure of the ear


The outer ear (pinna)
The middle ear (ossicles)
The inner ear (cochlea)

Pinna
Gathers the sound
Differentiates (to some extent) sounds from front and rear
Aids sound localization due to interference at the pinna

The auditory canal

About 3 cm long, 0.7 cm in diameter
Protects the thin eardrum from foreign objects
Acts as a resonator for specific frequencies in the range of 2 kHz to 4 kHz
Auditory canal resonance (at the frequency where the canal is a quarter wavelength long) causes acoustical amplification at the eardrum of about 10 dB in the 2-4 kHz region (see the worked example below)
Further amplification is due to the diffraction of sound waves around the head (total amplification about 20 dB)
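A minimal worked example of that quarter-wave resonance, assuming a canal length of 3 cm and a speed of sound of 343 m/s (plain Python, for illustration only):

SPEED_OF_SOUND = 343.0   # m/s, in air at about 20 degrees C
CANAL_LENGTH = 0.03      # m, approximate length of the auditory canal

# A tube closed at one end resonates where its length is a quarter wavelength.
resonant_frequency = SPEED_OF_SOUND / (4 * CANAL_LENGTH)
print(round(resonant_frequency))  # ~2858 Hz, i.e. in the 2-4 kHz region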

The middle ear


The eardrum vibrates as a result of acoustic variations in pressure
Converts acoustical energy into mechanical energy (the acoustic wave itself stops here)
The mechanical action of the bones transmits the vibrations of the air to the liquid in the inner ear
Protects the delicate inner ear against excessively intense sounds

The inner ear


About 4 cm long
Filled with fluid and sensory membranes
Vibrations of the oval window cause traveling waves in the fluid of cochlea.

The traveling waves stimulate the hair cells in the membranes that transmit impulses
to the brain via the auditory nerve.
Cochlea is a sound-analyzing mechanism capable of amazing pitch and frequency
discrimination.
Ear can differentiate between a tone of 1000 Hz and 1003 Hz (discrimination of 0.3
%).

Noise Spectra
If we have all frequencies with random relative phase, the result is
noise in its various incarnations, the two most common of which are
white and pink noise.

White Noise
White noise is a type of
noise that is produced by
combining sounds of all
different frequencies
together. If you took all of
the imaginable tones that a
human can hear and
combined them together,
you would have white
noise.

The adjective “white” is


used to describe this type
of noise because of the
way white light works. White light is light that is made up of all of the
different colors (frequencies) of light combined together (a prism or a
rainbow separates white light back into its component colors). In the
same way, white noise is a combination of all of the different frequencies of
sound. You can think of white noise as 20,000 tones all playing at the
same time.

White noise is defined as a noise that has equal amount of energy per
frequency. This means that if you could measure the amount of energy
between 100 Hz and 200 Hz it would equal the amount of energy
between 1000 Hz and 1100 Hz. This sounds “bright” (hence “white”) to
us because we hear pitch in octaves. 1 octave is a doubling of frequency,
therefore 100 Hz - 200 Hz is an octave, but 1000 Hz - 2000 Hz is also an

octave. Since white noise contains equal energy per hertz, there is ten times as much energy in the 1-2 kHz octave as in the 100-200 Hz octave. Because each successive octave band is twice as wide as the one below it, the energy measured per octave of white noise rises by about 3 dB per octave.
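That per-octave rise can be seen numerically. The sketch below (Python with NumPy, purely illustrative) generates white noise and sums its power spectrum in octave bands:

import numpy as np

rng = np.random.default_rng(0)
fs = 48000                                   # sample rate in Hz
x = rng.standard_normal(fs * 10)             # 10 seconds of white noise

spectrum = np.abs(np.fft.rfft(x)) ** 2       # power spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

for low in (125, 250, 500, 1000, 2000, 4000):
    band = (freqs >= low) & (freqs < 2 * low)   # one octave band
    energy = spectrum[band].sum()
    print(f"{low:>5} Hz octave: {10 * np.log10(energy):.1f} dB")
# Each octave band comes out roughly 3 dB higher than the one below it.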

Because white noise contains all frequencies, it is frequently used to


mask other sounds. If you are in a hotel and voices from the room next-
door are leaking into your room, you might turn on a fan to drown out
the voices. The fan produces a good approximation of white noise. Why
does that work? Why does white noise drown out voices?

Here is one way to think about it. Let’s say two people are talking at the
same time. Your brain can normally “pick out” one of the two voices
and actually listen to it and understand it. If three people are talking
simultaneously, your brain can probably still pick out one voice.
However, if 1,000 people are talking simultaneously, there is no way
that your brain can pick out one voice. It turns out that 1,000 people
talking together sounds a lot like white noise. So when you turn on a
fan to create white noise, you are essentially creating a source of 1,000
voices. The voice next-door makes it 1,001 voices, and your brain can’t
pick it out any more.

Pink Noise
Pink noise is noise that
has an equal amount of
energy per octave. This
means that there is less
energy per Hz as you
go up in frequency.
Pink noise is achieved
by running white noise
through a pinking
filter.

A pinking filter is no more than a 3 dB per octave roll-off filter. Since the per-octave energy of white noise rises by about 3 dB per octave, the filter removes 3 dB each time you go up an octave. To say it another way, the roll-off filter negates and equalizes the amount of energy per octave -

neutralizing the rising response. So, if you were to listen to pink noise,
it would sound relatively equal in all frequencies compared to white
noise which will be a bit brighter.

Pink noise is commonly used for evaluating and calibrating sound


reproduction systems like speakers.

Psychoacoustics
Psychoacoustics may be broadly defined as the study of the complex relationship between physical sounds and the brain's reaction to and interpretation of them in a sound field. Until recently, psychoacoustics has devoted more attention to the behavior of the peripheral auditory system than to the details of cognitive processing.

The discipline is a branch of psychophysics in that it is interested in the


relation between sensory input stimuli and the behavioral or
psychological response that they provoke. Because of individual
variations in observed responses, statistical results are most often
achieved. Some of the traditional psychoacoustic concerns involve the
perception of pitch, loudness, volume and timbre.

Response
The response of a device or system is the motion (or other output)
resulting from excitation (by a stimulus) under specified conditions. A
qualifying adjective is usually prefixed to the term (e.g., frequency
response, amplitude response, transient response, etc.) to indicate the
type of response under consideration.

(Figure: frequency response curves for (a) two violin strings, showing characteristic resonance regions, and (b) a loudspeaker.)


Binaural Localization
Humans, like most vertebrates, have two ears that are positioned at
about equal height at the two sides of the head. Physically, the two ears
and the head form an antenna system, mounted on a mobile base. This
antenna system receives acoustic waves of the medium in which it is
immersed, usually air. The two waves received and transmitted by the
two ears are the physiologically adequate input to a specific sensory
system, the auditory system.

The ears-and-head array is an antenna system with complex and


specific transmission characteristics. Since it is a physical structure and
sound propagation is a linear process, the array can be considered to be
a linear system. By taking an incoming sound wave as the input and the
sound pressure signals at the two eardrums as the output, it is correct
to describe the system as a set of two self-adjusting filters connected to
the same input. Self-adjusting, in the sense used here, means that the
filters automatically provide transfer functions that are specific with
regard to the geometrical orientation of the wave front relative to the
ears-and-head array.

Physically, this behavior is explained by resonances in the open cavity


formed from pinna, ear canal and eardrum, and by diffraction and
reflection by head and torso. These various phenomena are excited
differently when a sound wave impinges from different directions
and/or with different curvatures of the wave front. The resulting
transfer functions are generally different for the two filters, thus
causing ‘interaural’ differences of the sound-pressure signals at the two
eardrums. Since the linear distortions superimposed upon the sound
wave by the two ‘ear filters’ are very specific with respect to the
geometric parameters of the sound wave, it is not far from the mark to
say that the ears-and-head system encodes information about the
position of sound sources in space, relative to this antenna system, into
temporal and spectral attributes of the signals at the eardrums and into
their interaural differences. All manipulations applied to the sound
signals by the ears-and-head array are purely physical and linear. It is
obvious, therefore, that they can be simulated.

Although humans can hear with one ear only - so called monaural
hearing - hearing with two functioning ears is clearly superior. This fact

can best be appreciated by considering the biological role of hearing.
Specifically, it is the biological role of hearing to gather information
about the environment, particularly about the spatial positions and
trajectories of sound sources and about their state of activity. Further, it
should be recalled in this context that inter individual communication is
predominantly performed acoustically, with brains deciphering
meanings as encoded into acoustic signals by other brains.

In view of this generic role of hearing, the advantage of binaural as compared to monaural hearing stands out clearly in terms of performance, particularly in the following areas:

(i) Localization of single or multiple sound sources and, consequently,


formation of an auditory perspective and/or an auditory room impression.

(ii) Separation of signals coming from multiple incoherent sound sources


spread out spatially or, with some restrictions, coherent ones.

(iii) Enhancement of the signals from a chosen source with respect to further
signals from incoherent sources, as well as enhancement of the direct
(unreflected) signals from sources in a reverberant environment.

Definitions
A few frequently encountered terms are given brief definitions here, for
the sake of the discussion which follows:-

Image localization. - The term localization refers to the perception of the point at
which a sound source, or image, seems to be situated with respect to the listener’s
own position.

Arrival angle. - The angle from which an original sound source arrives, with zero
degrees understood to indicate a source that is directly in front of the listener.

Interaural. - Refers to any comparison between an audio signal measured at one


ear, and the same signal measured at the other ear.

Reproduced source - Any sound recorded earlier and played back over one or
two speakers.


Localization
Localization refers to our ability
to make judgments as to the
direction and distance of a sound
source in the real world
environment. We use various
cues to help us localize the
direction of a sound. When
considering the sound source
and the environment, the fact
that sound waves travel in all
directions from a particular
sound forces the listener to
cope with direct and indirect
sounds. Direct sound is the sound that takes the most direct path, that is,
from the object creating the sound to the actual perceiver of the sound.
Indirect sound incorporates all of the reflections of the sound that the
perceiver hears at a delayed interval from the direct sound and
provides the listener with information as to the space, location and
distance of the sound within the environment.

Is the sound emitting from a large or small room? Is it outside or inside?


Is it in a reverberant space or a non-reverberant space? These are a few
of the multitude of factors that influence our judgment of a sound’s
location.

Sound enters the ear canal through direct paths, and indirect paths that
reflect from the complex folds of the pinna. When the reflections of the
indirect sounds combine in the ear with the direct sounds, pinna
filtering occurs, changing the received sound’s frequency response. The
ear/brain duo interprets this equalization, producing cues (assisting
zenith localization, for example) from the filtering effect. To provide
still more directional cues, small head movements allow the ear/brain to
judge relative differences in the sound field perspective. With our
marvelous acuity, we can hear sounds coming from all around us,
whether they are naturally created or coming from the speakers of a
stereo or surround sound system.

We can perceive where sounds come from - above, below, behind, to
the side—this is called spatial localization. In particular, our
stereophonic ears can discern azimuth or horizontal (left-right)
directionality, and zenith or vertical (up-down) directionality. We
perceive directionality using localization cues such as Interaural Time
Difference (ITD), Interaural Intensity Difference (IID), and pinna
filtering (Comb filtering).

The median plane is the region where the sound sources are equidistant
from the two ears. The horizontal plane is level with the listener’s ears.
The frontal or lateral plane divides the head vertically between the front
and the back. The position of the sound source relative to the center of
the listener’s head is expressed in terms of azimuth (0-360 degrees, from
in front of the head all the way around the head), elevation (angle
between the horizontal plane up 90 degrees or below -90 degrees) and
distance.


Localization Parameter
Using the listener’s own position as a reference point, the localization of
a sound source may be conveniently described by two parameters:
distance and arrival angle.
Distance cues
The perception of the distance from which a sound arrives is itself a
function of four variables, each of which is discussed below. The variables are:
(i) Loudness
(ii) Ratio of direct to reflected sound
(iii) Frequency response (high frequency attenuation)
(iv) Time delay

Loudness
All else being equal, it is obvious that the closer a listener is to the
sound source, the louder it will be. However, all else is rarely ever
equal, and loudness by itself is relatively uninformative. For example, turning a volume control up or down does little to vary the impression of distance unless the level change is accompanied by one or more of the other important distance cues.

Direct – to – Reflected Sound


A far more important distance cue is the ratio of direct to reflected sound that reaches the listener. As an obvious example, consider a sound very close to the listener, and another at a great distance. In either case there is one direct path to the listener and many reflected paths, as the sound bounces off various surfaces in the listening area. The difference between the two examples is that the closer sound reaches the listener via a shorter path. Its intensity is largely preserved, with minimal energy loss as governed by the Inverse Square Law, and the listener hears the direct sound almost entirely, with little or no reflected sound. By contrast, sound arriving from the distant source

is accompanied by many reflections of itself, some of which arrive just after the direct sound. Again the Inverse Square Law is at work: with little difference in path length, there is far less difference between the amplitudes of the direct and reflected sounds. By judging the direct-to-reflected ratio of a sound source, the listener is able to perceive the distance of the source itself.

High Frequency Attenuation


As a pressure wave travels through the surrounding air, there is
gradual loss of high frequency information due to atmospheric
absorption. For example, at a temperature of 20°C, a 10 kHz signal is
attenuated some 0.15dB to 0.30dB per meter, depending on the relative
humidity. This high frequency attenuation may help convey a feeling of
distance, provided the listener already has some frame of reference; that
is the same source has been heard close-up, or is so familiar that the
listener knows from experience what it is supposed to sound like when
nearby.
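A back-of-the-envelope illustration of that loss, in plain Python (the 0.2 dB/m coefficient is simply an assumed mid-range value from the 0.15-0.30 dB/m figure quoted above):

def hf_loss_db(distance_m, atten_db_per_m=0.2):
    """Approximate 10 kHz loss over a path, using ~0.15-0.30 dB/m at 20 degrees C."""
    return distance_m * atten_db_per_m

print(hf_loss_db(10))   # ~2 dB of 10 kHz loss over 10 m
print(hf_loss_db(100))  # ~20 dB over 100 m - a clearly audible dulling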

Time Delay
As a final distance cue, it takes a certain
amount of time for any sound to reach the
listener. For example, the sound produced by
a musician in the last row of a large ensemble
may arrive some 20 or more milliseconds
later than the sound of a front and center
placed soloist. With the earlier sound serving
as a frame of reference, the later arrival of a
more distant source becomes a subtle yet
powerful distance cue.
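A minimal illustration of that cue, assuming a speed of sound of 343 m/s and a hypothetical 7-metre depth between the soloist and the back row (plain Python):

SPEED_OF_SOUND = 343.0  # m/s

def extra_delay_ms(extra_distance_m):
    """Additional arrival time, in milliseconds, for a more distant source."""
    return 1000 * extra_distance_m / SPEED_OF_SOUND

print(round(extra_delay_ms(7.0), 1))  # ~20.4 ms later than the front-and-centre soloist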

When a sound reaches our ears, the ipsilateral


ear, which is closest to the sound, perceives it first and the sound is
louder than at the contralateral ear, which is further from the sound
source. Most sounds are not equidistant from each ear and, as
mentioned earlier, some frequencies bounce off the body while others
curve around the body. Variations in sound reaching each ear created
by these effects may be measured as an (IID) interaural intensity
difference and an (ITD) interaural time difference.


The interaural time difference is frequency


independent, relying on the fact that lower
frequency wave forms bend around the
head. There is a difference in the time it takes
the sound to reach the ipsilateral ear and the
time it takes the sound to bend around the
body and reach the contralateral ear.

Distinctions can be made with time lags between the ears as small as 800 microseconds, or less than one millisecond.
IID and ITD cues only enable us to identify
the sound images as left, right or inside the
head. The term lateralization is used to describe the apparent location
of sound within the head, as occurs when sound is perceived through
headphones.

Haas Effect
Also called the Precedence Effect, or Law of the First Wave front, describes
the human psychoacoustic phenomena of correctly identifying the
direction of a sound source heard in both ears but arriving at different
times. Due to the head’s geometry (two ears spaced apart, separated by
a barrier) the direct sound from any source first enters the ear closest to
the source, then the ear farthest away.

The Haas Effect tells us that humans localize a sound source based
upon the first arriving sound, if the subsequent arrivals are within 25-30
milliseconds. If the later arrivals are longer than this, then two distinct
sounds are heard. The Haas Effect is true even when the second arrival
is louder than the first (even by as much as 10 dB!). In essence we do
not “hear” the delayed sound. This is the hearing example of human
sensory inhibition that applies to all our senses. Sensory inhibition
describes the phenomena where the response to a first stimulus causes
the response to a second stimulus to be inhibited, i.e., sound first
entering one ear causes us to "not hear" the delayed sound entering the other ear (within the 25-30 millisecond window mentioned above). Sound arriving
at both ears simultaneously is heard as coming from straight ahead, or
behind, or within the head.

The listener’s perception of angle from which a sound arrives is
determined by subtle differences between the ways each ear hears the
same signal. These are:-

relative loudness between the two ears or Interaural Intensity Differences (IID)

time of arrival difference between the two ears or Interaural Time Difference
(ITD)

frequency response differences between the two ears or Interaural Spectral


Difference (ISD)

Interaural Intensity Differences (IID)


In theory, there will be an interaural intensity difference when an
original sound source arrives from an off-center location. In practice
however, that difference is sometimes so slight as to be imperceptible.
For example, consider a sound source originating 90º off-axis, but
located 2 meters away. Given an ear spacing of say 21 cm, the ratio of
the distances to each ear is 2 / 2.21 = 0.90. This means the interaural difference attributable to distance alone is 20 log (0.90), or only about 0.87 dB.
(For the moment, we ignore the effect of the head itself as a barrier
between the source and the distant ear.)

In the more normal listening environment shown in the second


diagram, the arrival angle is 30º and the source distance is 4 meters. Under these conditions, the additional path length to the distant ear is only 0.1 meter, so the interaural difference is reduced to 20 log (4/4.1) = -0.2 dB.
From these examples, it can be seen that under normal listening
conditions the interaural level differences from a slight additional path
length to the more distant (or contralateral) ear is not much use as a
localization cue.
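A minimal sketch of those two distance-only calculations (plain Python; like the text above, it deliberately ignores the shadowing effect of the head):

import math

def iid_distance_only_db(near_m, far_m):
    """Interaural level difference due to path length alone, ignoring the head."""
    return 20 * math.log10(near_m / far_m)

print(iid_distance_only_db(2.0, 2.21))  # ~ -0.87 dB (source 90 degrees off-axis, 2 m away)
print(iid_distance_only_db(4.0, 4.1))   # ~ -0.21 dB (source 30 degrees off-axis, 4 m away)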

Interaural Time Differences (ITD)
Within a certain frequency band, time becomes a very important
localization cue. For example, consider a plane pressure waveform
arriving from some off-center location. As noted above, if the sound
pressure were measured at each ear, there would be almost no
difference in level. However, the sound would arrive a bit later at the
contralateral ear relative to the ipsilateral ear.

In the following example, a distance of 21 cm is used as the spacing


between the ears. To determine the additional time taken for a sound to
reach the contralateral ear, the calculation below can be used.
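As a minimal stand-in for that calculation, the sketch below assumes the extra path is simply the full 21 cm ear spacing (the worst case, for a source directly to one side) and a speed of sound of 343 m/s:

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.21      # m, the spacing assumed in the text

itd_seconds = EAR_SPACING / SPEED_OF_SOUND
print(round(itd_seconds * 1e6))  # ~612 microseconds of interaural time difference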

Interaural Spectral Difference (ISD)


When a signal arrives from some off-axis location, the effect of the
listeners own head cannot be ignored; it becomes an acoustic obstacle in
the path of the contralateral ear. At low frequencies, the resultant
attenuation is minimal, due to the diffraction of low frequencies
around the head. However, as the frequency increases, its wavelength decreases and the head becomes more of a barrier to the arriving sound. Therefore there is considerable high-frequency attenuation at the distant ear as the sound source moves farther off centre.


Head Related Transfer Functions


Each sound source creates a
frequency spectrum that
includes the range of
frequencies that make up a
particular sound. When wave
forms approach the human
body, they are affected by interaction with the listener’s torso, head,
pinnae (outer ears) and ear canals. The total of these properties are
captured as HRTFs.

They are dependent on the direction of the sound. For example, before
a sound wave gets to the eardrum it first passes through the outer ear
structure, called the pinna. The pinna acts as a variable filter;
accentuating or suppressing mid and high frequency energy of a sound
wave to various degrees, depending on the angle at which the sound
wave hits the pinna.

Sounds that are above 1500 Hz are reflected off the body, while
waveforms of lower frequency actually bend around the body. The
1500 Hz wave form is significant because one cycle is roughly the same
size as the diameter of our head.
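A quick check of that 1500 Hz figure, assuming a speed of sound of 343 m/s (plain Python):

SPEED_OF_SOUND = 343.0  # m/s

wavelength = SPEED_OF_SOUND / 1500.0
print(round(wavelength * 100, 1))  # ~22.9 cm - roughly the diameter of the head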

HRTFs can be thought of as two audio filters, one for each ear, that capture the listening cues applied to the sound as it travels through the environment to the eardrum. The filters change depending on the direction of the sound source.

Early Reflections & Reverberation


Reverberation is the multiple
repetition of an audio signal that
becomes more closely spaced
with time. Direct sound
represents sounds reaching the
listener by a direct path, while
indirect sound results from the
reflection of audio signals
reaching the listener at a delayed
time. Reverberation is the sum

total of all of the reflections of a particular sound, even though they
arrive at different times, to a given point.

(Diagram: a listener sitting in an auditorium hears not only the direct sound but also multiple reflections off the boundary surfaces.)

Our perception of sound reflections helps us to determine the distance of the sound source. Moreover, the interaction of the sound with the room has a large effect on how close the sound seems to the ears. When a person is close to a sound source, the reflections take an acute angle, creating a greater relative difference between the arrival times of the direct sound and the indirect sound.

At farther distances, reflected


sounds make an obtuse angle,
consequently there is less of a
difference in the arrival time
between direct and indirect sounds
from the sound source to the listener. It is easy to localize a sound that
is less than two meters from the listener. Beyond two meters it is
difficult to determine distance without receiving reinforcement from
reverberation. At distances greater than two meters the intensity of the direct sound falls off with the square of the distance (the Inverse Square Law), while the intensity of the reverberant field stays roughly constant. The human ear
compares the ratio of direct sound to the reverberation level in
determining its distance from the sound source.

Reverberation and reflections of the sound can be described


graphically. The graph below shows the direct sound, followed by reflections off one wall (the 1st-order reflections), then reflections off two walls (the 2nd-order reflections), and so on.

The reverberant sound in an enclosed space, like an auditorium dies


away with time as the sound energy is absorbed by multiple interactions
with the surfaces of the room. In a more reflective room, it will take longer
for the sound to die away and the room is said to be ‘live’. However, if
it is excessive, it makes the sounds run together with loss of articulation
- the sound becomes muddy, garbled. In a very absorbent room, the

sound will die away quickly and the room will be described as
acoustically ‘dead’. But the time for reverberation to completely die
away will depend upon how loud the sound was to begin with, and
will also depend upon the acuity of the hearing of the observer. To
quantitatively characterize the reverberation, the parameter called the
reverberation time is used. A standard reverberation time has been
defined as the time for the sound to die away to a level 60 decibels
below its original level. The reverberation time can be modeled to
permit an approximate calculation.
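One common approximation (not given in the text above) is the Sabine equation, RT60 ≈ 0.161 V / A, where V is the room volume in cubic metres and A the total absorption in metric sabins. The sketch below applies it to a hypothetical room, purely for illustration:

def rt60_sabine(volume_m3, total_absorption_sabins):
    """Sabine estimate of the time for sound to decay by 60 dB."""
    return 0.161 * volume_m3 / total_absorption_sabins

# Hypothetical 10 m x 8 m x 4 m room with 60 metric sabins of total absorption.
print(round(rt60_sabine(10 * 8 * 4, 60.0), 2))  # ~0.86 seconds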

Early Reflections
Those reflections reaching a
listener after the arrival of the
direct sound, but before the
arrival of reverberation sound
resulting from late reflections.
The early reflections give rise to a
feeling of spaciousness in the
music hall, but in the typical listening room they tend to confuse the
stereo image, giving rise to coloration of the sound due to comb filtering.

Rationale for 60dB Reverberation Time


The reverberation time is
perceived as the time for the
sound to die away after the
sound source ceases, but that of
course depends upon the
intensity of the sound. To have
a reproducible parameter to
characterize an auditorium
which is independent of the
intensity of the test sound, it is
necessary to define a standard
reverberation time in terms of
the drop in intensity from the
original level, i.e., to define it in
terms of relative intensity.

The choice of the relative intensity to use is of course arbitrary, but


there is a good rationale for using 60 dB since the loudest crescendo for

most orchestral music is about 100 dB and a typical room background
level for a good music-making area is about 40 dB. Thus the standard
reverberation time is seen to be about the time for the loudest crescendo
of the orchestra to die away to the level of the room background. The 60
dB range is about the range of dynamic levels for orchestral music.

Psychoacoustic Effects

Cocktail Party Effect


Imagine yourself listening to a conversation at a cocktail party; you hear
the sound which travels directly from the speaker’s mouth to your ear,
and also the combination of an enormous number of echoes from
different surfaces.

The ability of the human hearing system to selectively attend to a single


talker or stream of audio from a background cacophony of ambient
noise heard at the same time is referred to as the ‘cocktail party effect’.
Someone with normal hearing can, remarkably, concentrate on the
original sound despite the presence of unwanted echoes and
background music.

The separation of auditory targets from a similar background, required


when listening to a speaker at a cocktail party, is a cognitive task
involving binding of acoustic components that belong to the same
object, segregation of simultaneously present components that belong
to non-target objects, and tracking of homologue objects (same speaker)
over time.

Figure 1: During conversation, we hear a combination of the original sound, a series of echoes, and interfering noises such as other conversations.

Masking
When we listen to music, it is very rare that it consists of just a single
pure tone. Whilst it is possible and relatively simple to arrange to listen
to a pure tone of a particular frequency in a laboratory, such a sound
would not sustain any prolonged musical interest. Almost every sound
that one hears in music consists of at least two frequency components.

When two or more pure tones are heard together, an effect known as
‘masking’ can occur, where each individual tone can become more
difficult or impossible to perceive, or is partially or completely
‘masked’, due to the presence of another relatively louder tone.

In such a case, the tone which causes the masking is known as the
‘masker’ and the tone which is masked is called ‘maskee’. Given the rarity
of pure tone being heard in the music that we hear everyday, these
tones are more likely to be individual frequency component of a note
played on one instrument which either masks other components in that
note, or frequency components of another note. The extent to which
masking occurs depends on the frequencies of the masker and masker
and their relative amplitudes.

The example above shows a 1 kHz tone


played by both a trumpet (a loud
instrument) and a flute (a soft
instrument). The result shows that the loud trumpet has masked the flute completely, because their respective frequency components are so alike.


Adding Loudness & Critical Band

When two sounds of equal loudness when sounded separately are close
together in pitch, their combined loudness when sounded together will
be only slightly louder than one of them alone. They may be said to be
in the same critical band where they are competing for the same nerve
endings on the basilar membrane of the inner ear. According to the place
theory of pitch perception, sounds of a given frequency will excite the
nerve cells of the organ of Corti only at a specific place. The available
receptors show saturation effects which lead to the general rule of
thumb for loudness by limiting the increase in neural response.

If the two sounds are widely separated in pitch, the perceived loudness
of the combined tones will be considerably greater because they do not
overlap on the basilar membrane and compete for the same hair cells.
The phenomenon of the critical band has been widely investigated.

It has been found that this critical band is about 90 Hz wide for sounds
below 200 Hz and increases to about 900 Hz for frequencies around
5000 Hz. It is suggested that this corresponds to a roughly constant length of about 1.2 mm on the basilar membrane, involving
some 1300 hair cells. If the tones are far apart in frequency, not within a
critical band, the combined sound may be perceived as twice as loud as
one alone.
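The figures quoted above are consistent with a widely used analytic approximation to the critical bandwidth (Zwicker and Terhardt's formula, added here as a reference point rather than taken from this text):

def critical_bandwidth_hz(freq_hz):
    """Zwicker & Terhardt approximation of critical bandwidth in Hz."""
    f_khz = freq_hz / 1000.0
    return 25 + 75 * (1 + 1.4 * f_khz ** 2) ** 0.69

print(round(critical_bandwidth_hz(150)))   # ~102 Hz, close to the "about 90 Hz" figure
print(round(critical_bandwidth_hz(5000)))  # ~914 Hz, close to the "about 900 Hz" figure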


Missing Fundamental Effect


The subjective tones which are
produced by the beating of the various
harmonics of the sound of a musical
instrument help to reinforce the pitch of
the fundamental frequency. Most
musical instruments produce a
fundamental frequency plus several
higher tones which are whole-number
multiples of the fundamental. The beat
frequencies between the successive
harmonics constitute subjective tones
which are at the same frequency as the
fundamental and therefore reinforce the
sense of pitch of the fundamental note being played.

If the lower harmonics are not produced because of the poor fidelity or
filtering of the sound reproduction equipment, you still hear the tone as
having the pitch of the non-existent fundamental because of the
presence of these beat frequencies. This is called the missing fundamental
effect. It plays an important role in sound reproduction by preserving
the sense of pitch (including the perception of melody), when
reproduced sound loses some of its lower frequencies especially in
radio and television broadcast.

The presence of the beat frequencies between the harmonics gives a


strong sense of pitch for instruments such as the brass and woodwind
instruments. For percussion instruments such as the cymbal, the sense
of pitch is less definite because there are non-harmonic overtones
present in the sound.

Doppler Effect
You may have heard the changed pitch of a train whistle or a car horn
as the train or car approached and then receded from you. As the train
approaches the pitch of the blast is higher and it becomes lower as the
train recedes from you. This implies that the frequency of the sound waves changes depending on the velocity of the source with respect to you: as the train approaches, the pitch is higher, indicating a higher frequency and a smaller wavelength; as the train recedes, the
pitch is lower, corresponding to a lower frequency and a correspondingly larger wavelength.

Whenever relative motion exists between a source of sound and a


listener, the frequency of the sound as heard by the listener is different
compared to the frequency when there is no relative motion. This
phenomenon is known as the Doppler Effect. The formula for Doppler
Effect is

f' = f (V ± VD) ÷ (V ∓ VS)

where f' is the frequency heard by the listener, f is the original frequency of the sound wave, V is the speed of sound, VD is the speed of the detector (listener) and VS is the speed of the source. The upper signs apply when the motion is towards the other party (raising the pitch) and the lower signs when the motion is away from it (lowering the pitch).
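A minimal sketch of that formula in plain Python (speeds are taken as positive when source and listener move towards each other, matching the sign convention above; 343 m/s is assumed for the speed of sound):

SPEED_OF_SOUND = 343.0  # m/s

def doppler_frequency(f_source, v_detector=0.0, v_source=0.0, c=SPEED_OF_SOUND):
    """Observed frequency; positive speeds mean motion towards the other party."""
    return f_source * (c + v_detector) / (c - v_source)

print(round(doppler_frequency(440, v_source=30)))   # horn approaching at 30 m/s -> ~482 Hz
print(round(doppler_frequency(440, v_source=-30)))  # receding at 30 m/s -> ~405 Hz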

Ear Training
The basic requirement of a creative sound engineer is to be able to listen
well and analyze what they hear. There are no golden ears, just
educated ears. A person develops his or her awareness of sound
through years of education and practice. We have to constantly work at
training our ears by developing good listening habits. As an engineer,
we can concentrate our ear training around three basic practices -
music, microphones and mixing.

Listening to Music
Try and dedicate at least half an hour per day to listening to well
recorded and mixed acoustic and electric music. Listen to direct-to-two
track mixes and compare with heavily produced mixes. Listen to
different styles of music, including complex musical forms. Note the
basic ensembles used, production trends and mix set-ups. Also attend
live music concerts. The engineer must learn the true timbral sound of
an instrument and its timbral balances. The engineer must be able to

identify the timbral nuances and the characteristic of particular
instruments.

Learn the structuring of orchestral balance. There can be an ensemble


chord created by the string section, the reeds and the brass all working
together. Listen to an orchestra live, stand in front of each section and
hear its overall balance and how it layers with other sections.

For small ensemble work, listen to how a rhythm section works


together and how bass, drums, percussion, guitar and piano interlock.
Learn the structure of various song forms such as verse, chorus, break
etc. Learn how lead instrument and lead vocals interact with this song
structure. Notice how instrumentals differ from vocal tracks. Listen to
sound design in a movie or TV show. Notice how the music
underscores the action and the choice of sound effects builds a mood
and a soundscape. Notice how tension is built up and how different
characters are supported by the sound design. Notice the conventions
for scoring for different genres of film and different types of TV.

For heavily produced music, listen for production tricks. Identify the
use of different signal processing FX. Listen for panning tricks,
doubling of instruments and voices. Analyze a musical mix into the
various components of the sound stage. Notice the spread of
instruments from left to right, front to back up and down. Notice how
different stereo systems and listening rooms influence the sound of the
same piece of music.

Listening with Microphones


Mic placement relative to the instrument can provide totally different
timbral color, e.g. the proximity boost on closely placed cardioid mics. A
mic can be positioned to capture just a portion of the frequency
spectrum of an instrument to be conducive with a particular “sound” or
genre. E.g. rock acoustic piano may favor the piano’s high end and
require close miking near the hammers to accent percussive attack, a
sax may be miked near the top to accent higher notes or an acoustic
guitar across the sound hole for more bass.
The way an engineer mics an instrument is influenced by:
(i) Type of music

(ii) Type of instrument

(iii) Creative Production

(iv) Acoustics of the hall or studio

(v) Type of mic

(vi) Leakage considerations


Always make A/B comparisons between different mics and different positions. The ear can only make good judgments by making
comparisons. In the studio reflections from stands, baffles, walls, floor
and ceiling can affect the timbre of instruments. This can cause timbre
changes, which can be problematic or used to capture an “artistic”
modified spectrum. When miking sections, improper miking can cause
timbre changes due to instrument leakage. The minimum 3:1 mic
spacing rule helps control cross-leakage. Diffuser walls placed around
acoustic instruments can provide openness and a blend of the direct
/reflected sound field. Mic placement and the number of diffusers and
their placement can greatly enhance the “air” of the instrument.

Listening in Foldback and Mixdown


A balanced cue mix captures the natural blend of the musicians. Good
foldback makes musicians play with each other instead of fighting to be
heard. If reed and brass players can’t hear themselves and the rest of
the group they tend to overplay. A singer will back off from the mic if
their voice in the headphone mix is too loud, or swallow the mic if the
mix is too soft. They will not stay in tune if they cannot hear backing
instruments. Musicians will aim their instruments at music stands or
walls for added reflection to help them overcome a hearing problem.

Frequency Balance
An Engineer should be able to recognize characteristics of the main
frequency bands with their ears.

16-160 Hz (Extreme lows): felt more than heard. Positive: warmth. Negative: muddiness.

160-250 Hz (Bass): carries no stereo information. Positive: fatness. Negative: boominess, boxiness.

250-2000 Hz (Low mid-range): harmonics start to occur. Positive: body. Negative: horn-like quality (500-1000 Hz), ear fatigue (1-2 kHz).

2000-4000 Hz (High mid-range): vocal intelligibility. Positive: gives definition. Negative: tinny, thin.

4000-6000 Hz (Presence): loudness and closeness, spatial information. Positive: definition, energy, closeness. Negative: brash.

6000-20,000 Hz (Highs): depth of field. Positive: air, crispness; boosting or cutting helps create a sense of distance. Negative: noise.

Dimensional Mixing:
The final 10% of a mix picture is spatial placement and layering of
instruments or sounds. Dimensional mixing encompasses timbral
balancing and layering of spectral content and effects with the basic
instrumentation. For this, always think sound in dimensional space:
left/right, front/back, up/down. Think of a mix in Three Levels:
Level A: 0 to 1 meter
Level B: 1 to 6 meters
Level C: 6 meters and further


Instruments which are tracked in the studio are all recorded at roughly
the same level (SOL) and are often close miked. If an instrument is to
stand further back in the mix it has to change in volume and frequency.
Most instruments remain on level B so you can hear them all the time.
Their dynamics must be kept relatively stable so their position does not
change. Level A instruments will be lead and solo instruments. Level C can hold background instruments, loud instruments drifting in the background, sounds that are felt rather than heard, and reverb.

Summary
Stereophonic sound (3D sonic view) is based on our amazing
process of encoding sounds with our bodies and decoding it
with our brains.

Sound localization is based on time/intensity cues or changes in the spectrum of the sound.

Listening is a highly personal, unique experience; no two of us hear a given sound in exactly the same way.

Our hearing system is very effective. It has:
(i) Wide dynamic range
(ii) Frequency and timing (phase) coding

We do not perceive physical terms, we perceive subjective terms:
(i) Frequency → pitch
(ii) Magnitude → loudness
(iii) Spectrum → timbre
(iv) Time → subjective duration
(v) Frequency scale → hearing-equivalent scale (critical-band-rate scale), masking

The human auditory system is the final receiver judging the sound quality of the audio system.

Audio systems should be adapted to the characteristics of the auditory system.


References
http://hyperphysics.phy-astr.gsu.edu
http://arts.ucsc.edu/ems/music/tech_background/tech_background.html
http://www.glenbrook.k12.il.us/gbssci/phys/Class/waves/u10l2d.html
“Am I Too Loud?” Jour Audio Engineering Soc, V25, p126, Mar 1977.
Beranek, Music, Acoustics and Architecture, Wiley, 1962
Acoustics and Perception of Speech, 2nd Ed, Williams and Wilkins, 1984
Cohen, Abraham B, Hi-Fi Loudspeakers and Enclosures, Hayden, 1968
Halliday & Resnick, Fundamentals of Physics, 3E, Wiley 1988
Huber, David and Runstein, Robert, Modern Recording Techniques, 4th
Ed., Boston: Focal Press, 1997.
Terhardt, E., “Calculating Virtual Pitch”, Hearing Research 1, 155 (1979)
Tipler, Paul, Elementary Modern Physics, Worth, 1992,
Tipler, Paul, Physics for Scientists and Engineers, 2nd Ed Ext, Worth, 1982.
White, Harvey E. and White, Donald H., Physics and Music, Saunders
College, 1980
E. Zwicker, G. Flottorp, S S Stevens, Critical Bandwidth in Loudness
Summation, JASA, 29 (1957) pp548-57.
