What is Sound?
Sound is a disturbance of the atmosphere that human beings can sense
with their hearing systems. Such disturbances are produced by
practically everything that moves, especially if it moves quickly or in a
rapid and repetitive manner. The movement could be initiated by
hammering (rods), plucking (strings), bowing (strings), or forced air
flow (the vibration of an air column, as in an organ or the voice).
Vibrations from any of these sources cause a series of pressure
fluctuations in the medium surrounding the object to travel outwards
through the air from the source.
You should be aware that the air is made up of molecules. All matter is
made out of particles and air is no exception. As you sit there in the
room, you are surrounded by molecules of oxygen, nitrogen, carbon
dioxide and some pollutants like carbon monoxide. Normally these
particles are all moving around the room, randomly dispersed, with
each particle roughly the same distance from its neighbours as any other.
The distance between these particles is determined by the air pressure
in the room. This air pressure is mostly a result of the barometric
pressure of the particular day. If it’s raining outside, you’re likely to be
in a low pressure system, so the air particles are further apart from each
other than usual. On sunny days you’re in a high pressure system so
the particles are squeezed together.
Most of the characteristics we expect of air are a result of the fact that
these particular molecules are very light and are in extremely rapid but
disorganized motion. This motion spreads the molecules out evenly, so
that any part of an enclosed space has just as many molecules as any
other. If a little extra volume were to be suddenly added to the enclosed
space (say by moving a piston into a box), the molecules nearest the
new volume would move into the recently created void, and all the
others would move a little farther apart to keep the distribution even or
in ‘equilibrium’.
Wave Theory
Waves are everywhere. Whether we recognize it or not, we encounter
waves on a daily basis. Sound waves, visible light waves, radio waves,
microwaves, water waves, sine waves, waves on a string, and slinky
waves are just a few examples of our daily encounters with waves. In
addition, there is a variety of phenomena in our physical world which
resemble waves so closely that we can describe them as being wavelike;
the motion of a pendulum is one such example.
Waves, as we will learn, carry energy from one location to another. And if
the frequency of those waves can be changed, then we can also carry a
complex signal which is capable of transmitting an idea or thought
from one location to another. Perhaps this is one of the most important
aspects of waves, and it will become a focus of our study in later units.
Waves on a Pond
A pebble thrown into a pond will produce concentric circular ripples
which move outward from the point of impact. If a fishing float is in the
water, the float will bob up and down as the wave moves by. This is a
characteristic of transverse waves.
Longitudinal Waves
In longitudinal waves the
displacement of the medium is
parallel to the propagation of
the wave. A wave in a “slinky”
is a good visualization. Sound
waves in air are longitudinal
waves.
As a sound wave moves through the air, there are regions where the air particles are compressed together and
other regions where the air particles are spread apart. These regions are
known as compressions and rarefactions respectively. The
compressions are regions of high air pressure while the rarefactions
are regions of low air pressure. The diagram below depicts a sound
wave created by a tuning fork and propagated through the air in an
open tube. The compressions and rarefactions are labeled.
The above diagram can be somewhat misleading if you are not careful.
The representation of sound by a sine wave is merely an attempt to
illustrate the sinusoidal nature of the pressure-time fluctuations. Do not
conclude that sound is a transverse wave which has crests and troughs.
Sound is indeed a longitudinal wave with compressions and
rarefactions. As sound passes through a medium, the particles of that
medium do not vibrate in a transverse manner. Do not be misled -
sound is a longitudinal wave.
Wave Graphs
Waves may be graphed as a function of time or distance. A single
frequency wave will appear as a sine wave in either case. From the
distance graph the wavelength may be determined. From the time graph,
the period and frequency can be obtained. From both together, the wave
speed can be determined.
Amplitude / Loudness / Volume / Gain
Frequency
The rate of repetition of the cycles of a periodic quantity, such as a
sound wave, is called frequency. Frequency can be defined as the
number of cycles that a periodic waveform completes in the time of
one (1) second. Frequency is denoted by the symbol f, and is measured
in hertz (Hz), formerly called cycles per second (cps or c/s), kilohertz
(kHz), or megahertz (MHz).
The three diagrams on the right illustrate this. The first diagram (Fig. 1)
shows a sine wave completing 1 cycle in 1 second; its frequency is
therefore 1 Hz. The second diagram (Fig. 2) shows a sine wave
completing 5 cycles in 1 second; its frequency is thus 5 Hz. The last
diagram (Fig. 3) shows the same sine wave completing 10 cycles in the
time frame of 1 second; its frequency is 10 Hz. Likewise, a 1000 Hz tone
will complete 1000 cycles in a second, and a 20,000 Hz tone completes
20,000 cycles in 1 second.
The only sound which consists of a single frequency is the pure sine
tone, such as that produced by a sine wave oscillator or approximated
by a tuning fork. All other sounds are complex, consisting of a number
of frequencies of greater or lesser intensity. The frequency content of a
sound is commonly referred to as its spectrum. The subjective sense of
frequency is called pitch. That is, frequency is an acoustic variable,
whereas pitch is a psychoacoustic one.
Wavelength
The wavelength of a wave is merely the distance (in meters) which a
disturbance travels along the medium in one complete wave cycle.
Since a wave repeats its pattern once every wave cycle, the wavelength
λ is found by dividing the speed of sound c by the frequency f: λ = c / f.
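The relation λ = c / f can be sketched in a few lines; c = 343 m/s (the approximate speed of sound in air at 20 °C) is assumed here:

```python
# Wavelength of a sound wave: lambda = c / f.
# Assumes c = 343 m/s, the approximate speed of sound in air at 20 deg C.
def wavelength(frequency_hz, c=343.0):
    """Return the wavelength in metres for a given frequency in hertz."""
    return c / frequency_hz

# The audible range spans roughly 20 Hz to 20 kHz:
low = wavelength(20)       # about 17.15 m
high = wavelength(20000)   # about 0.017 m (1.7 cm)
```

Note how enormously wavelength varies across the audible range: a factor of 1000 between the lowest and highest audible frequencies.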
Speed of Sound
We have seen that in an elastic medium such as air, the pressure wave
of alternating condensations and rarefactions moves away from the
source of the disturbance. Since the movement of sound wave is not
confined to a specific direction, it should be referred to in terms of
speed. The lowercase letter c is usually used as the specific abbreviation
for speed of sound in air.
The speed of sound in dry air is given approximately by:
c = 331.4 + 0.6T m/s, where T is the temperature in °C.
The speed of sound at 0ºC in air at sea level is 331.4 m/s. The speed of
sound increases 0.6 m/s for every 1ºC increase in the temperature.
Likewise, the speed of sound decreases 0.6m/s for every 1ºC decrease in
the temperature.
The speed of sound is faster in liquids and solids than in gases;
although these media are denser, their much greater stiffness more
than compensates. The table to your right compares the speed of
sound in different media.
Example 1
What would the speed of sound be if the temperature was 20ºC?
331.4 + (0.6 x 20) = 331.4 + 12
= 343.4 m/s
Example 2
How far can a 100Hz sine wave travel in 2 seconds at 20ºC, sea level?
From d = t x c, where t = 2 sec and speed of sound = 343.4
d = 2 x 343.4
= 686.8 meters
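The two worked examples can be expressed as a small sketch, assuming the 331.4 + 0.6T approximation given above:

```python
# Sketch of the worked examples: speed of sound as a function of
# temperature, and distance travelled in a given time.
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees C."""
    return 331.4 + 0.6 * temp_c

def distance_travelled(seconds, temp_c):
    """Distance (m) a sound wave covers in `seconds` at that temperature."""
    return seconds * speed_of_sound(temp_c)

print(speed_of_sound(20))         # 343.4 m/s  (Example 1)
print(distance_travelled(2, 20))  # 686.8 m    (Example 2)
```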
Particle velocity
The term particle velocity refers to the
velocity at which a particle in the path of a
sound wave is moved (displaced) by the
wave as it passes. It should not be confused
with the velocity at which the sound wave
travels through the medium, which is constant unless the sound wave
encounters a different medium in which case the sound wave will
refract. If the sound wave is sinusoidal (sine wave shape), particle
velocity will be zero at the peaks of displacement and will reach a
maximum when passing through its normal rest position.
Phase
If a mass on a rod is rotated at constant
speed and the resulting circular path
illuminated from the edge, its shadow
will trace out simple harmonic motion. If
the shadow vertical position is traced as a
function of time, it will trace out a sine
wave. A full period of the sine wave will
correspond to a complete circle or 360°.
Waves of the same frequency which commence at the same time are
said to be in phase, phase coherent, or phase correlated. In such a case,
the phase angle between the two sine waves would be 0°. Those
commencing at different times are said to be out of phase, phase
incoherent, or uncorrelated.
Example 1
For instance, a 250Hz wave, delayed by 1ms will be shifted in phase by
90°. How do we get this value?
Phase (ø) = Time (in seconds) x Frequency x 360°
Phase (ø) = 0.001s x 250x 360°= 90°
Example 2
What would be the phase shift if the same wave was delayed by
0.25ms?
Phase (ø) = Time (in seconds) x Frequency x 360°
Phase (ø) = 0.00025s x 250 x 360°= 22.5°
Example 3
If you want to shift two identical 500Hz waveforms with a phase
relationship of 180°, how much delay would you apply to either
waveform to be phase-shifted by that amount?
Phase (ø) = Time (in seconds) x Frequency x 360°
180° = (X sec) x 500 x 360°
X sec = 180° ÷ (500 × 360°) = 180 ÷ 180,000 = 0.001 seconds, or 1 ms
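All three phase examples follow from the same formula; a minimal sketch:

```python
# Phase shift produced by a time delay, and the inverse calculation.
def phase_shift_deg(delay_s, freq_hz):
    """Phase shift (degrees) from delaying a tone of freq_hz by delay_s."""
    return delay_s * freq_hz * 360.0

def delay_for_phase(phase_deg, freq_hz):
    """Delay (seconds) needed to shift a tone of freq_hz by phase_deg."""
    return phase_deg / (freq_hz * 360.0)

print(phase_shift_deg(0.001, 250))    # 90.0   (Example 1)
print(phase_shift_deg(0.00025, 250))  # 22.5   (Example 2)
print(delay_for_phase(180, 500))      # 0.001  (Example 3: 1 ms)
```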
Absorption of Sound
Different materials absorb specific
frequencies, with high frequencies being
the most susceptible to absorption.
Concrete, glass, wood and plywood
reflect sound waves, while draperies,
carpet and acoustical tiles absorb more
sound waves, especially the higher
frequencies. The more humid air is, the
greater it’s potential for high frequency absorption. That is why the
humidity levels must be controlled in a concert hall. When the air is
extremely cold, there is little humidity. As a result sounds from a jet
plane sound closer and more piercing on a cold brisk day as compared
to the identical sound on a warm summer day when more high
frequencies are absorbed in the humid air.
Refraction of Sound
Refraction is the bending of waves when they enter a medium where
their speed is different. Refraction is not so important a phenomenon
with sound as it is with light where it is responsible for image
formation by lenses, the eye, cameras, etc. But bending of sound waves
does occur and is an interesting phenomenon in sound.
If the air above the earth is warmer than that at the surface, sound will
be bent back downward toward the surface by refraction. This
downward-refracted sound reinforces the sound travelling near the
surface, effectively amplifying the sound. Natural amplifiers can occur
over cool lakes.
Diffraction of Sound
Diffraction is a term used to describe the bending of waves around
obstacles (i.e. walls, barriers, etc.) and the spreading out of waves
beyond openings that are small relative to the wavelength. Diffraction
is most pronounced at low frequencies, whose long wavelengths are
comparable to, or larger than, everyday obstacles.
Simple Tone
A simple tone is a sound having a single frequency. A sine wave is an
example of a pure tone, in that there is only a single frequency and no
harmonics. The sine tone is the fundamental building block, or
‘backbone’, of every complex tone in existence.
Complex Tone
A tone having more than a single frequency component is called a
complex tone. For instance, a tone consisting of a fundamental and overtones
or harmonics, may be said to be complex. Two waveforms combine in a
manner which simply adds their respective amplitudes linearly at every
point in time. Thus, a complex tone or waveform can be built by mixing
together different sine waves of various amplitudes.
Square Wave
An audio waveform theoretically comprised of an infinite set of odd
harmonic sine waves.
Triangle Wave
An audio WAVEFORM theoretically comprised of an infinite set of odd
harmonic SINE WAVES. It is often used in sound synthesis because it is
less harsh than the square wave. The amplitude of its upper harmonics
falls off more rapidly.
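The odd-harmonic recipes above can be sketched as partial Fourier sums. The 1/n amplitude law for the square wave and the faster-falling 1/n² law for the triangle are the standard Fourier coefficients; a 1 Hz fundamental is assumed here for simplicity:

```python
import math

# Partial Fourier sums of the odd-harmonic series described above.
# Square wave: odd harmonics with amplitude 1/n.
# Triangle wave: odd harmonics with amplitude 1/n^2 (alternating sign),
# which is why its upper harmonics fall off more rapidly.
def square_partial(t, n_harmonics):
    """Partial sum of the square-wave series at time t (1 Hz fundamental)."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1                      # odd harmonics only: 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * t) / n
    return 4 / math.pi * total

def triangle_partial(t, n_harmonics):
    """Partial sum of the triangle-wave series at time t (1 Hz fundamental)."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1
        total += (-1) ** k * math.sin(2 * math.pi * n * t) / n ** 2
    return 8 / math.pi ** 2 * total
```

With enough harmonics, both sums approach 1.0 at the quarter-period point (t = 0.25) where each waveform peaks; truncating the series early leaves the square wave visibly rippled (the Gibbs effect) while the triangle converges much faster.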
Cylinders with one end closed will vibrate with only odd harmonics of
the fundamental. Vibrating membranes typically produce vibrations at
harmonics, but also have some resonant frequencies which are not
whole number multiples of their fundamental frequencies. It is for this
class of vibrators that the term overtone becomes useful - they are said
to have some non-harmonic overtones.
Pitch
In the same sense that loudness is the subjective sense of the objective
parameters of intensity or amplitude of a sound, the subjective
impression of frequency is pitch. As such, pitch is a psychoacoustic
variable, and the degree of sensitivity shown to it varies widely with
people. Some individuals have a sense of remembered pitch, that is, a
pitch once heard can be remembered and compared to others for some
length of time; others have a sense of absolute pitch called perfect pitch.
The term ‘pitch’ is used to describe the frequency of sound. For
example, middle C in equal temperament = 261.6 Hz
The Place Theory and its refinements provide plausible models for the
perception of the relative pitch of two tones, but do not explain the
phenomenon of perfect pitch. The just noticeable difference in pitch is
roughly 0.5% of the frequency for tones in the mid-range.
Octave
The musician often refers to the octave, a logarithmic concept that is
firmly embedded in musical scales and terminology because of its
relationship to the ear’s characteristics. Whereas harmonics of a
fundamental frequency ascend in a linear manner (i.e. 100, 200, 300,
400…), octaves of a fundamental frequency ascend or descend in a
logarithmic manner. An octave interval between frequencies is
characterized by the frequency ratio 2:1, such as that produced when
the length of a vibrating string is halved.
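The contrast between the linear harmonic series and the logarithmic (ratio 2:1) octave series can be sketched directly:

```python
# Harmonics ascend linearly; octaves ascend geometrically (ratio 2:1).
def harmonics(fundamental, count):
    """First `count` harmonics of a fundamental: f, 2f, 3f, ..."""
    return [fundamental * n for n in range(1, count + 1)]

def octaves_up(fundamental, count):
    """`count` successive octaves above a fundamental: f, 2f, 4f, 8f, ..."""
    return [fundamental * 2 ** n for n in range(count + 1)]

print(harmonics(100, 4))   # [100, 200, 300, 400]
print(octaves_up(100, 3))  # [100, 200, 400, 800]
```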
A note whose pitch is an octave above or below a given note gives the
subjective impression of duplicating that note, such as when men and
women sing in unison, but actually are producing notes an octave
apart.
Envelope of Sound
Defining Envelope of Sound
The variation of (maximum) amplitude over time is called the envelope
of the sound. An envelope of sound is composed of a sound’s attack,
sustain, and decay. Graphical representation of the envelope of a sound
object may show distinctive features in its attack or onset transients,
stationary state, internal dynamics, and release. The envelope of a sound
is its macro-level amplitude behavior in time, whereas its micro-level
pattern of sound pressure variation is its waveform.
Attack
Sound begins at A and reaches its peak at level B.
Decay
Sound level falls from B to an intermediate steady level C
Sustain
It drops slightly in level and remains steady until D.
Release
When the sound source is removed at D, the sound dies down to a point of silence E.
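The four stages can be sketched as a minimal linear envelope function; the particular segment times and levels used here are illustrative assumptions, not standard values:

```python
# A minimal linear ADSR envelope sketch matching the stages above.
# The specific segment times and levels are illustrative assumptions.
def adsr_level(t, attack=0.1, decay=0.2, sustain_level=0.6,
               sustain_time=0.5, release=0.3):
    """Amplitude (0..1) of a linear ADSR envelope at time t (seconds)."""
    if t < 0:
        return 0.0
    if t < attack:                          # A -> B: rise to peak
        return t / attack
    t -= attack
    if t < decay:                           # B -> C: fall to sustain level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:                    # C -> D: hold steady
        return sustain_level
    t -= sustain_time
    if t < release:                         # D -> E: die away to silence
        return sustain_level * (1.0 - t / release)
    return 0.0                              # after E: silence
```

Multiplying a waveform sample-by-sample with such an envelope shapes its macro-level amplitude without changing its micro-level waveform.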
Attack
The way a sound is initiated is called attack. There are two types of
attack:
(i) Fast
(ii) Slow
Fast attack
The closer the attack of a sound A is to the peak B of a sound, the faster
its attack is. Sounds that have a fast attack are...
Gunshots
Slaps
A snare drum or kick drum
Door slams
Sustain
Once a sound has reached its peak, the length of time that the sound
will sustain is dependent upon the energy from the source vibrations.
When the source sound stops, the sound will begin to decay.
Manipulating the sustain time of a sound is yet another way of either
modifying a sound or creating a totally new one.
Release
The decrease in amplitude when a vibrating force has been removed is
called release. The actual time it takes for a sound to diminish to silence
is the release time. Listening to a sound's release tells you whether it was made:
Indoors (small enclosed area with a great deal of absorbency)
- little release and with very little or no reverberation
Outdoors (open unconfined area)
- long decay and release with an echo
The end of a sound is often referred to as the “tail” of a sound, and
conversely, the beginning of a sound is its “head”
Sound energy spreads out from its sources. For a point source of sound,
the energy spreads out in all directions, its intensity falling with the
square of the distance. For a given sound intensity, a larger ear captures more
of the wave and hence more sound energy. The outer ear structures act
as part of the ear’s preamplifier to enhance the sensitivity of hearing.
The outer ear (pinna) collects more sound energy than the ear canal
would receive without it and thus contributes some area amplification.
The outer and middle ears contribute something like a factor of 100 or
about 20 decibels of amplification under optimum conditions.
The numbers here are just representative ... not precise data.
The Ossicles
The three tiniest bones in the body form the coupling between the
vibration of the eardrum and the forces exerted on the oval window of
the inner ear. The ossicles can be thought of as a compound lever which
achieves a multiplication of force. This lever action is thought to achieve
amplification by a factor of about three under optimum conditions, but
can be adjusted by muscle action to actually attenuate the sound signal
for protection against loud sounds.
The basilar membrane of the inner ear plays a critical role in the
perception of pitch according to the Place Theory.
The Cochlea
The inner ear structure called the cochlea is a snail-shell like structure
divided into three fluid-filled parts. Two are canals for the transmission
of pressure and in the third is the sensitive organ of Corti, which detects
pressure impulses and responds with electrical impulses which travel
along the auditory nerve to the brain.
Section of Cochlea
The cochlea has three fluid filled sections.
The perilymph fluid in the canals differs
from the endolymph fluid in the cochlear
duct. The Organ of Corti is the sensor of
pressure variations.
Organ of Corti
The organ of Corti is the
sensitive element in the
inner ear and can be
thought of as the body’s
microphone. It is situated
on the basilar membrane
in one of the three
compartments of the
Cochlea. It contains four
rows of hair cells which
protrude from its surface.
Above them is the tectoral
membrane which can
move in response to
pressure variations in the
fluid-filled tympanic and
vestibular canals. There
are some 16,000 -20,000 of
the hair cells distributed
along the basilar
membrane which follows
the spiral of the cochlea.
Place Theory
High frequency sounds selectively vibrate
the basilar membrane of the inner ear near
the entrance port (the oval window). Lower
frequencies travel further along the
membrane before causing appreciable
excitation of the membrane. The basic pitch
determining mechanism is based on the
location along the membrane where the hair
cells are stimulated. A schematic view of the Place Theory unrolls the
cochlea and represents the distribution of sensitive hair cells on the
organ of Corti. Pressure waves are sent through the fluid of the inner
ear by force from the stirrup.
Pitch Resolution
The normal human ear can detect the difference between 440 Hz and
441 Hz. The high pitch resolution of the ear suggests that only about a
dozen hair cells, or about three tiers from the four banks of cells, are
associated with each distinguishable pitch. It is hard to conceive of a
mechanism with such fine resolution.
Beats
When two sound waves of different
frequency approach your ear, the
alternating constructive and destructive
interference causes the sound to be
alternatively soft and loud - a
phenomenon which is called ‘beating’ or
producing beats. The beat frequency is
equal to the absolute value of the
difference in frequency of the two
waves.
First and foremost, you will hear the two sine tones themselves. In
addition, you will hear beats at a rate equal to the lower frequency
subtracted from the higher frequency, |f1 − f2|. For example, if the two
tones are at 440 and 444 Hz, you will hear the two notes beating 4
times per second.
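The beat-rate calculation is just the absolute difference of the two frequencies:

```python
# Beat frequency: the absolute difference of the two tone frequencies.
def beat_frequency(f1, f2):
    """Beats per second heard when tones f1 and f2 (Hz) sound together."""
    return abs(f1 - f2)

print(beat_frequency(440, 444))  # 4 beats per second
```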
Critical Bandwidth
There is no exact frequency difference at which beats give way to
separate tones; it varies from listener to listener. However, the
approximate frequency and order in which the changes occur are
common to all listeners, and, in common with all psychoacoustic
effects, average values are quoted which are based on measurements
made for a large number of listeners.
The point where the two tones are heard as separate as opposed to
fused when the frequency difference is increased can be thought of as
the point where two peak displacements on the basilar membrane begin
to emerge from a single maximum displacement on the membrane.
However, at this point the underlying motion of the membrane which
gives rise to the two peaks causes them to interfere with each other
giving the rough sensation, and it is only when the rough sensation
becomes smooth that the separation of the places on the membrane is
sufficient to fully resolve the two tones. The frequency difference
between the pure tones at the point where a listener's perception
changes from rough and separate to smooth and separate is known as
the critical bandwidth. A more formal definition is given by Scharf (1970):
‘the critical bandwidth is that bandwidth at which subjective responses rather
abruptly change.’
Threshold of Pain
The nominal dynamic range of human hearing is from the standard
threshold of hearing to the threshold of pain. A nominal figure for the
threshold of pain is 130 decibels, but that which may be considered
painful for one may be welcomed as entertainment by others.
Generally, younger persons are more tolerant of loud sounds than older
persons because their protective mechanisms are more effective. This
tolerance does not make them immune to the damage that loud sounds
can produce.
(ii) The curves are lowest in the range from 1 to 5 kHz, with a dip at 4 kHz,
indicating that the ear is most sensitive to frequencies in this range. This
is an interesting area for the reason that the bulk of our speech
(specifically consonant sounds) relies on information in this frequency
range (although it is likely that speech evolved to capitalize on this
sensitive frequency range).
Given these facts of auditory life, there is something to be said for
keeping listening levels within reason during a mixdown session. If that
level gets too loud (as often happens), the final product will probably
suffer as a result when heard by the listener at a normal listening level.
Furthermore, listening at loud volumes subjects the hearing system to
fatigue, causing the thresholds of hearing and pain to shift somewhat,
so that what is being reproduced by the speakers is misinterpreted. It is also
worthwhile to mention that studio grade professional speakers can
reproduce larger quantities of bass and treble without distortion or
other erroneous sonic artifacts. However, the same mix if played back
on a normal system with average speakers may distort, sound muddy
(clouding of the frequency spectrum by bass frequencies) or thin
(excessive high frequency content).
The graph below shows the duration of high intensity sound exposure in
minutes, hours, days, and weeks and how it affects the threshold of
hearing.
Pinna
Gathers the sound
Differentiates (to some extent) sounds from front and rear
Aids sound localization due to the interference on the pinna
Noise Spectra
If we have all frequencies with random relative phase, the result is
noise in its various incarnations, the two most common of which are
white and pink noise.
White Noise
White noise is a type of
noise that is produced by
combining sounds of all
different frequencies
together. If you took all of
the imaginable tones that a
human can hear and
combined them together,
you would have white
noise.
White noise is defined as a noise that has an equal amount of energy per
frequency. This means that if you could measure the amount of energy
between 100 Hz and 200 Hz, it would equal the amount of energy
between 1000 Hz and 1100 Hz. This sounds “bright” (hence “white”) to
us because we hear pitch in octaves. One octave is a doubling of frequency,
therefore 100 Hz - 200 Hz is an octave, but 1000 Hz - 2000 Hz is also an
octave; the higher octave spans more hertz and therefore contains more energy.
Here is one way to think about it. Let's say two people are talking at the
same time. Your brain can normally “pick out” one of the two voices
and actually listen to it and understand it. If three people are talking
simultaneously, your brain can probably still pick out one voice.
However, if 1,000 people are talking simultaneously, there is no way
that your brain can pick out one voice. It turns out that 1,000 people
talking together sounds a lot like white noise. So when you turn on a
fan to create white noise, you are essentially creating a source of 1,000
voices. The voice next-door makes it 1,001 voices, and your brain can’t
pick it out any more.
Pink Noise
Pink noise is noise that
has an equal amount of
energy per octave. This
means that there is less
energy per Hz as you
go up in frequency.
Pink noise is achieved
by running white noise
through a pinking
filter.
A pinking filter is no more than a 3 dB per octave roll-off filter. Since
the energy of white noise rises 3 dB per octave, the roll-off filter cancels
that rise, equalizing the amount of energy per octave.
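The defining difference between the two noise colours can be checked with idealized band-energy arithmetic; no signal is actually generated here, and the "energy units" are arbitrary:

```python
import math

# Idealized comparison (not a signal simulation): white noise carries the
# same energy in every 1 Hz slice, so a band's energy is proportional to
# its width; pink noise carries the same energy in every octave.
def white_energy(f_low, f_high):
    """Relative energy of ideal white noise in a band (1 unit per Hz)."""
    return f_high - f_low

def pink_energy(f_low, f_high):
    """Relative energy of ideal pink noise in a band (1 unit per octave)."""
    return math.log2(f_high / f_low)

print(white_energy(100, 200), white_energy(1000, 1100))  # equal widths, equal energy
print(white_energy(100, 200), white_energy(1000, 2000))  # higher octave has 10x energy
print(pink_energy(100, 200), pink_energy(1000, 2000))    # octaves carry equal pink energy
```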
Psychoacoustics
Psychoacoustics may be broadly defined as the study of the complex
relationship between physical sounds and the brain's reaction to, and
interpretation of, them in a sound field. Until recently, psychoacoustics
has devoted more attention to the behavior of the peripheral auditory
system than to the details of cognitive processing.
Response
The response of a device or system is the motion (or other output)
resulting from excitation (by a stimulus) under specified conditions. A
qualifying adjective is usually prefixed to the term (e.g., frequency
response, amplitude response, transient response, etc.) to indicate the
type of response under consideration.
Frequency response curves for (a) two violin strings, showing characteristic
resonance regions, and (b) a loudspeaker which reproduces frequencies
Binaural Localization
Humans, like most vertebrates, have two ears that are positioned at
about equal height at the two sides of the head. Physically, the two ears
and the head form an antenna system, mounted on a mobile base. This
antenna system receives acoustic waves of the medium in which it is
immersed, usually air. The two waves received and transmitted by the
two ears are the physiologically adequate input to a specific sensory
system, the auditory system.
Although humans can hear with one ear only - so called monaural
hearing - hearing with two functioning ears is clearly superior. Among
the advantages this fact brings is:
(iii) Enhancement of the signals from a chosen source with respect to further
signals from incoherent sources, as well as enhancement of the direct
(unreflected) signals from sources in a reverberant environment.
Definitions
A few frequently encountered terms are given brief definitions here, for
the sake of the discussion which follows:-
Image localization. - The term localization refers to the perception of the point at
which a sound source, or image, seems to be situated with respect to the listener’s
own position.
Arrival angle. - The angle from which an original sound source arrives, with zero
degrees understood to indicate a source that is directly in front of the listener.
Reproduced source - Any sound recorded earlier and played back over one or
two speakers.
Localization
Localization refers to our ability
to make judgments as to the
direction and distance of a sound
source in the real world
environment. We use various
cues to help us localize the
direction of a sound. When
considering the sound source
and the environment, the fact
that sound waves travel in all
directions from a particular
sound forces the listener to
cope with direct and indirect
sounds. Direct sound takes the most direct path, that is, from the object
creating the sound to the actual perceiver of the sound.
Indirect sound incorporates all of the reflections of the sound that the
perceiver hears at a delayed interval from the direct sound and
provides the listener with information as to the space, location and
distance of the sound within the environment.
Sound enters the ear canal through direct paths, and indirect paths that
reflect from the complex folds of the pinna. When the reflections of the
indirect sounds combine in the ear with the direct sounds, pinna
filtering occurs, changing the received sound’s frequency response. The
ear/brain duo interprets this equalization, producing cues (assisting
zenith localization, for example) from the filtering effect. To provide
still more directional cues, small head movements allow the ear/brain to
judge relative differences in the sound field perspective. With our
marvelous acuity, we can hear sounds coming from all around us,
whether they are naturally created or coming from the speakers of a
stereo or surround sound system.
The median plane is the region where the sound sources are equidistant
from the two ears. The horizontal plane is level with the listener’s ears.
The frontal or lateral plane divides the head vertically between the front
and the back. The position of the sound source relative to the center of
the listener’s head is expressed in terms of azimuth (0-360 degrees, from
in front of the head all the way around the head), elevation (angle
between the horizontal plane up 90 degrees or below -90 degrees) and
distance.
Localization Parameter
Using the listener’s own position as a reference point, the localization of
a sound source may be conveniently described by two parameters:
distance and arrival angle.
Distance cues
The perception of the distance from which a sound arrives is itself a
function of four (4) variables, each of which is discussed below. The
variables are
(i) Loudness
(ii) Ratio of direct to reflected sound
(iii) Frequency response (high frequency attenuation)
(iv) Time delay
Loudness
All else being equal, it is obvious that the closer a listener is to the
sound source, the louder it will be. However, all else is rarely ever
equal, and loudness by itself is relatively uninformative. For example,
turning a volume control up or down does nothing to vary the
impression of distance, unless the level change is accompanied by one
or more other important distance cues.
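As a rough quantitative anchor, a point source in a free field loses about 6 dB of level per doubling of distance (the inverse square law); real rooms with reflections will deviate from this idealized sketch:

```python
import math

# Free-field level change for a point source (inverse square law).
# An idealized sketch: reflections and absorption are ignored.
def level_change_db(d_ref, d):
    """Change in sound pressure level (dB) moving from d_ref to d metres."""
    return -20 * math.log10(d / d_ref)

print(level_change_db(1, 2))   # about -6 dB: doubling the distance
print(level_change_db(1, 10))  # -20 dB at ten times the distance
```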
Time Delay
As a final distance cue, it takes a certain
amount of time for any sound to reach the
listener. For example, the sound produced by
a musician in the last row of a large ensemble
may arrive some 20 or more milliseconds
later than the sound of a front and center
placed soloist. With the earlier sound serving
as a frame of reference, the later arrival of a
more distant source becomes a subtle yet
powerful distance cue.
Haas Effect
Also called the Precedence Effect, or Law of the First Wavefront, this
describes the human psychoacoustic phenomenon of correctly identifying
the direction of a sound source heard in both ears but arriving at different
times. Due to the head’s geometry (two ears spaced apart, separated by
a barrier) the direct sound from any source first enters the ear closest to
the source, then the ear farthest away.
The Haas Effect tells us that humans localize a sound source based
upon the first arriving sound, if the subsequent arrivals are within 25-30
milliseconds. If the later arrivals are longer than this, then two distinct
sounds are heard. The Haas Effect is true even when the second arrival
is louder than the first (even by as much as 10 dB!). In essence we do
not “hear” the delayed sound. This is the auditory example of human
sensory inhibition that applies to all our senses. Sensory inhibition
describes the phenomenon where the response to a first stimulus causes
the response to a second stimulus to be inhibited, i.e., sound first
entering one ear causes us to “not hear” the delayed sound entering
the other ear (within the 25-30 millisecond time window). Sound arriving
at both ears simultaneously is heard as coming from straight ahead, or
behind, or within the head.
Two principal cues are used: the relative loudness between the two
ears, or Interaural Intensity Difference (IID), and the time of arrival
difference between the two ears, or Interaural Time Difference (ITD).
They are dependent on the direction of the sound. For example, before
a sound wave gets to the eardrum it first passes through the outer ear
structure, called the pinna. The pinna acts as a variable filter;
accentuating or suppressing mid and high frequency energy of a sound
wave to various degrees, depending on the angle at which the sound
wave hits the pinna.
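The ITD cue can be estimated with Woodworth's classic spherical-head approximation, ITD = (r/c)(theta + sin theta). This is a textbook formula rather than anything derived in this document, and the 8.75 cm head radius is an assumed average:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, an assumed average head radius

def itd_seconds(azimuth_rad: float) -> float:
    """Woodworth's spherical-head approximation of the Interaural
    Time Difference: ITD = (r/c) * (theta + sin(theta)), where theta
    is the source azimuth measured from straight ahead."""
    return HEAD_RADIUS / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))

print(itd_seconds(0.0))          # 0 s: a source dead ahead arrives together
print(itd_seconds(math.pi / 2))  # ~0.66 ms for a source at 90 degrees
```

The maximum delay of roughly two-thirds of a millisecond is tiny, yet the auditory system resolves it reliably.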
Sounds above about 1500 Hz are reflected off the body, while
lower-frequency waveforms actually bend (diffract) around it. The
1500 Hz figure is significant because its wavelength is roughly the same
size as the diameter of the head.
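That claim is easy to check with the relation wavelength = c / f (the 343 m/s speed of sound is an assumed room-temperature value):

```python
SPEED_OF_SOUND = 343.0  # m/s at ~20 degrees C

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in meters: speed of sound divided by frequency."""
    return SPEED_OF_SOUND / frequency_hz

# ~0.23 m at 1500 Hz -- on the order of a head's diameter, which is
# why this region marks the boundary between reflection and diffraction.
print(round(wavelength_m(1500.0), 3))
```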
Head-Related Transfer Functions (HRTFs) can be thought of as two audio
filters, one for each ear, that capture the listening cues applied to a
sound as it travels through the environment to the eardrum. The filters
change depending on the direction of the sound source.
Early Reflections
Those reflections reaching a
listener after the arrival of the
direct sound, but before the
arrival of the reverberant sound
resulting from late reflections.
The early reflections give rise to a
feeling of spaciousness in the
concert hall, but in the typical listening room they tend to confuse the
stereo image, giving rise to coloration of sound due to comb filtering.
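The comb-filter coloration mentioned above follows directly from the delay: when a single equal-level reflection delayed by t seconds sums with the direct sound, cancellations fall at odd multiples of 1/(2t). A minimal sketch (the 1 ms example delay is an illustrative assumption):

```python
def comb_null_frequencies(delay_s: float, count: int = 4) -> list:
    """First few null frequencies (Hz) when one reflection delayed by
    delay_s sums with the direct sound at equal level: cancellation
    occurs at odd multiples of 1/(2 * delay)."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

# A 1 ms path-length difference places nulls near 500, 1500, 2500, 3500 Hz,
# spaced evenly across the spectrum like the teeth of a comb.
print(comb_null_frequencies(0.001))
```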
Psychoacoustic Effects
When two or more pure tones are heard together, an effect known as
‘masking’ can occur, where each individual tone can become more
difficult or impossible to perceive, or is partially or completely
‘masked’, due to the presence of another relatively louder tone.
In such a case, the tone which causes the masking is known as the
'masker' and the tone which is masked is called the 'maskee'. Since pure
tones are rarely heard in everyday music, these tones are more likely to
be individual frequency components of a note played on one instrument,
which either mask other components of that note or frequency
components of another note. The extent to which masking occurs
depends on the frequencies of the masker and maskee and on their
relative amplitudes.
When two sounds that are of equal loudness when sounded separately are
close together in pitch, their combined loudness when sounded together
will be only slightly louder than one of them alone. They are said to be
in the same critical band, where they compete for the same nerve
endings on the basilar membrane of the inner ear. According to the place
theory of pitch perception, sounds of a given frequency excite the
nerve cells of the organ of Corti only at a specific place. The available
receptors show saturation effects which limit the increase in neural
response, leading to this general rule of thumb for loudness.
If the two sounds are widely separated in pitch, the perceived loudness
of the combined tones will be considerably greater because they do not
overlap on the basilar membrane and compete for the same hair cells.
The phenomenon of the critical band has been widely investigated.
It has been found that this critical band is about 90 Hz wide for sounds
below 200 Hz and increases to about 900 Hz for frequencies around
5000 Hz. It is suggested that this corresponds to a roughly constant
length of about 1.2 mm on the basilar membrane, involving
some 1300 hair cells. If the tones are far apart in frequency, i.e., not
within a critical band, the combined sound may be perceived as twice
as loud as one alone.
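A commonly quoted empirical fit for critical bandwidth (Zwicker's formula, not derived in this text) reproduces both figures above reasonably well:

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker's empirical fit for critical bandwidth:
    CB = 25 + 75 * (1 + 1.4 * (f/1000)^2)^0.69, with f in Hz.
    A rough engineering approximation, not an exact model."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

print(round(critical_bandwidth_hz(100.0)))   # ~101 Hz, near the ~90 Hz figure
print(round(critical_bandwidth_hz(5000.0)))  # ~914 Hz, near the ~900 Hz figure
```

Tones separated by more than this bandwidth excite different regions of the basilar membrane, which is why their loudnesses add rather than compete.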
If the lower harmonics are not produced because of poor fidelity or
filtering in the sound reproduction equipment, you still hear the tone as
having the pitch of the non-existent fundamental, because the remaining
upper harmonics beat at the fundamental frequency. This is called the
missing fundamental effect. It plays an important role in sound
reproduction by preserving the sense of pitch (including the perception
of melody) when reproduced sound loses some of its lower frequencies,
especially in radio and television broadcast.
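The effect can be demonstrated numerically: a sum of upper harmonics of 200 Hz, with the 200 Hz component entirely absent, still repeats every 1/200 s, and it is this repetition period that the ear reads as pitch. (The 200 Hz fundamental and the choice of harmonics 2-5 are illustrative assumptions.)

```python
import math

def harmonic_sum(t: float, f0: float, harmonics=(2, 3, 4, 5)) -> float:
    """Sum of upper harmonics of f0 with the fundamental itself removed."""
    return sum(math.sin(2 * math.pi * k * f0 * t) for k in harmonics)

f0 = 200.0
period = 1.0 / f0
# Even without any 200 Hz component, the waveform repeats every 1/200 s,
# which the ear interprets as a 200 Hz pitch (the missing fundamental).
for t in (0.0001, 0.0007, 0.0023):
    assert abs(harmonic_sum(t, f0) - harmonic_sum(t + period, f0)) < 1e-9
print("waveform period matches the missing fundamental")
```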
Doppler Effect
You may have heard the changed pitch of a train whistle or a car horn
as the train or car approached and then receded from you. As the train
approaches, the pitch of the blast is higher, and it becomes lower as the
train recedes. This implies that the frequency of the sound
waves changes depending on the velocity of the source with respect to
you: as the train approaches, the pitch is higher, indicating a higher
frequency and smaller wavelength; as the train recedes, the pitch is
lower, indicating a lower frequency and longer wavelength.
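For a moving source and a stationary listener, the standard Doppler relation is f' = f * c / (c - v) while approaching and f' = f * c / (c + v) while receding. A sketch with an assumed 500 Hz horn and 30 m/s train speed:

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air

def doppler_frequency(f_source: float, source_speed: float,
                      approaching: bool) -> float:
    """Observed frequency for a moving source, stationary listener:
    f' = f * c / (c - v) while approaching, f * c / (c + v) receding."""
    v = -source_speed if approaching else source_speed
    return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND + v)

# A 500 Hz horn on a train moving at 30 m/s:
print(round(doppler_frequency(500.0, 30.0, approaching=True), 1))   # ~547.9 Hz
print(round(doppler_frequency(500.0, 30.0, approaching=False), 1))  # ~459.8 Hz
```

The pitch jump heard as the train passes is the sudden switch between these two values.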
Ear Training
The basic requirement of a creative sound engineer is to be able to listen
well and analyze what they hear. There are no golden ears, just
educated ears. A person develops his or her awareness of sound
through years of education and practice. We have to constantly work at
training our ears by developing good listening habits. As engineers, we
can concentrate our ear training on three basic practices -
music, microphones and mixing.
Listening to Music
Try and dedicate at least half an hour per day to listening to well
recorded and mixed acoustic and electric music. Listen to direct-to-two
track mixes and compare with heavily produced mixes. Listen to
different styles of music, including complex musical forms. Note the
basic ensembles used, production trends and mix set-ups. Also attend
live music concerts. The engineer must learn the true timbral sound of
an instrument and its timbral balances.
For heavily produced music, listen for production tricks. Identify the
use of different signal processing FX. Listen for panning tricks,
doubling of instruments and voices. Analyze a musical mix into the
various components of the sound stage. Notice the spread of
instruments from left to right, front to back, up and down. Notice how
different stereo systems and listening rooms influence the sound of the
same piece of music.
Dimensional Mixing:
The final 10% of a mix picture is spatial placement and layering of
instruments or sounds. Dimensional mixing encompasses timbral
balancing and layering of spectral content and effects with the basic
instrumentation. For this, always think sound in dimensional space:
left/right, front/back, up/down. Think of a mix in Three Levels:
Level A: 0 to 1 meter
Level B: 1 to 6 meters
Level C: 6 meters and farther
Instruments which are tracked in the studio are all recorded at roughly
the same level (SOL) and are often close miked. If an instrument is to
stand further back in the mix it has to change in volume and frequency.
Most instruments remain on Level B so you can hear them all the time.
Their dynamics must be kept relatively stable so their position does not
change. Level A instruments will be lead and solo instruments. Level C
can hold background instruments, loud instruments drifting in the
background, sounds which are felt rather than heard, and reverb.
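One way to reason about pushing a close-miked track back to Level B or C is the inverse-distance law: level falls by 20*log10(reference/distance), i.e., about 6 dB per doubling of distance. A hedged sketch; the representative distances for each level are illustrative picks from the document's bands, and real depth also involves high-frequency roll-off and reverb, not gain alone:

```python
import math

def depth_gain_db(distance_m: float, reference_m: float = 1.0) -> float:
    """Level change implied by the inverse-distance law when a track
    referenced at ~1 m is pushed back to distance_m:
    gain = 20 * log10(reference / distance)."""
    return 20.0 * math.log10(reference_m / distance_m)

# Representative (assumed) distances for the three mixing levels:
for label, d in (("Level A", 1.0), ("Level B", 3.0), ("Level C", 8.0)):
    print(label, round(depth_gain_db(d), 1), "dB")
```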
Summary
Stereophonic sound (a 3D sonic view) is based on the remarkable
process of encoding sounds with our bodies and decoding them
with our brains.