
Interactive Audio

Sound, Waves, the Ear


3D audio

Apr 15, Spring 2002 CS 7497


Overview

 Fundamentals of Sound
 Psychoacoustics
 Interactive Audio
 Applications



What is sound?

 Sound is the sensation perceived by the sense of hearing
 Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves



Dual Nature of Sound

 Transfer of sound and physical stimulation of the ear
 Physiological and psychological processing in the ear and brain (psychoacoustics)



Transmission of Sound

 Requires a medium with elasticity and inertia (air, water, steel, etc.)
 Movements of air molecules result in the propagation of a sound wave



Particle Motion



Longitudinal Motion of Air



Wavefronts and Rays



Reflection of Sound



Absorption of Sound

 Some materials readily absorb the energy of a sound wave
 Example: carpet, curtains at a movie theater



Refraction of Sound





Diffusion of Sound

 Not analogous to diffusion of light
 Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies
 Nearly full diffusion of sound requires a reflection phase grating (Schroeder diffuser)



The Inverse-Square Law
(Attenuation)

I = W / (4πr²)

I is the sound intensity in W/cm²
W is the sound power of the source in W
r is the distance from the source in cm
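A minimal numeric sketch of the relationship (function and variable names are mine, not from the slides); it shows that doubling the distance quarters the intensity, a ~6 dB drop:

import math

def intensity(power_w, distance_cm):
    """Inverse-square law: intensity in W/cm^2 at a given distance."""
    return power_w / (4.0 * math.pi * distance_cm ** 2)

i1 = intensity(1.0, 100.0)
i2 = intensity(1.0, 200.0)
print(i2 / i1)                   # -> 0.25
print(10 * math.log10(i2 / i1))  # -> about -6.02 dB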



Psychoacoustics

 Physiological interactions with audio
 Psychological processing



Ear Anatomy



“Idealized” Ear



Mechanical Model of
Middle Ear



The Skull

 Occludes wavelengths that are “small” relative to the skull
 Causes diffraction around the head (helps amplify sounds)
 Wavelengths much larger than the skull are not affected (explains why low frequencies are not directional)



The Pinna



The Pinna

 Directs sound into the ear
 Provides cues which indicate sound direction



Importance of the Pinna



Ear Canal

 ~0.7 cm diameter and ~3 cm long
 Amplifies sound at the quarter-wavelength resonant frequency (~3 kHz; see the check below)
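As a quick check (using c ≈ 343 m/s for the speed of sound in air, a value supplied here, and the ~3 cm length above), the quarter-wavelength resonance of a tube closed at one end is

f = c / (4L) ≈ 343 / (4 × 0.03) ≈ 2.9 kHz

which matches the ~3 kHz figure on the slide.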



Ear Canal and Skull

 (A) Dark line – ear canal only
 (B) Dashed line – ear canal and skull diffraction



Middle Ear

 Eardrum vibrates from sound pressure changes
 Ossicles transfer the vibration to the oval window
 The impedance difference between air and the inner-ear fluid is matched by the ratio of the eardrum’s surface area to the oval window’s surface area



Inner Ear



The Cochlea

 Mechanical-to-electrical transducer
 Frequency-selective analyzer
 Tectorial and Basilar membranes rub
together to stimulate Hair Cells



Place Theory



Place Theory

 The position of maximum vibration of the basilar membrane corresponds to the perceived pitch of pure tones
 Each hair cell and each nerve fiber has very sharp bandpass characteristics



Auditory Area (20Hz-
20kHz)



Spatial Hearing

 Ability to determine the direction and distance of a sound source
 Not a fully understood process
 However, some cues have been identified as useful



The “Duplex” Theory of
Localization

 Interaural Intensity Differences (IIDs)
 Interaural Arrival-Time Differences (ITDs)



Interaural Intensity
Difference
 The skull produces a sound shadow
 Intensity difference results from one ear being shadowed and the other not
 The IID does not apply to frequencies below 1000 Hz (wavelengths similar to or larger than the size of the head)
 Sound shadowing can result in drops of up to ~20 dB for frequencies >= 6000 Hz
 The Inverse-Square Law can also affect intensity



Interaural Intensity
Difference



Interaural Arrival-Time
Difference

 Perception of a phase difference between the ears caused by an arrival-time delay (ITD)
 The ear closest to the sound source hears the sound before the other ear (a common approximation for the delay is sketched below)
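A frequently used approximation for the delay (Woodworth’s spherical-head formula; supplied here as a supplement, not taken from the slides) for a source at azimuth θ from straight ahead and a head radius a ≈ 8.75 cm:

ITD ≈ (a / c) · (θ + sin θ)

which gives roughly 0.65 ms for a source directly to one side (θ = π/2).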



Interaural Arrival-Time
Difference



Cones of Confusion

 Binaural difference cues (IIDs and ITDs) result in a locus of points for which measurements will be the same
 Results in ambiguity in the determination of sound source position



Cones of Confusion



How do humans resolve
the “Cones of Confusion”
problem?

 Cues used for localization are embodied in the transformation of the free-field sound to the eardrum
 The free-field sound is affected by sound shadowing from the head and torso as well as diffraction from the pinna



Pinna’s Effect On The Free-
field



Head-related Transfer
Function (HRTF)

 The acoustic transfer function between a point in space and the eardrum of the listener
 Encompasses all free-field effects



HRTF effect on IID



Monaural and Dynamic
Cues

 Spectral cues
 Distance cues
 Direct-to-reverberant energy ratio
 High-to-low frequency energy ratio
 Head rotation or tilt



Spectral Cues

 Comparison of a known source spectrum with the received spectrum
 If the spectrum is not known, cues can still be obtained by assuming the spectrum is locally flat (or of constant slope)



Pinna’s Effect on Spectrum



Distance Cues

 Variation of signal level with distance (attenuation)
 Useful only with regard to changes in distance, or if the sound source has a known signal level



Direct-to-Reverberant
Energy Ratio

 Results from the observation that the reverberation level is roughly constant over position in an enclosed space
 But the direct sound energy level decreases with increasing source-to-listener distance



High-to-Low Frequency
Energy Ratio

 Observation that air attenuates high frequencies more rapidly than low frequencies over distance



Head Rotation or Tilt

 Rotation or tilt can alter the interaural spectrum in a predictable manner
 Can resolve positional ambiguities on a cone of confusion



The Haas (or Precedence)
Effect
 The perceptual weighting of binaural cues
of the first arriving sound over reflections
of the same sound
 Generally reveals true location of sound
source while filtering out contradictory
reflections
 Hypothesized to be important from an evolutionary standpoint



Interactive Audio

 Virtual Sound Space
 Facilitate the perception of monaural, binaural, and dynamic cues within the virtual environment
 Model the virtual sound space in real time



Digital Recording

 Audio can be digitized and processed on a computer
 Digital formats have frequency and dynamic range limitations



Digital Problems

 Current formats do not completely cover the full frequency range of human hearing (especially low frequencies)
 Representing 0–120 dB would require too many bits! (see the arithmetic below)
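The arithmetic behind that claim (standard linear PCM quantization figures, supplied here rather than taken from the slides): each bit adds about 6.02 dB of dynamic range,

dynamic range ≈ 6.02 · N + 1.76 dB

so 16 bits gives ≈ 98 dB, and covering ~120 dB would need about 20 bits per sample.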



Review

 Distance cues (attenuation)
 Direct-to-reverberant energy ratio
 High-to-low frequency energy ratio
 Doppler Effect
 IID, ITD
 Spectral cues (effects from head, pinna)



Attenuation

 The Inverse-Square Law alone is “overkill”: sounds fall off too fast
 Solution: add an ambient term (just like in graphics); a minimal sketch follows
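A minimal sketch of distance attenuation with an ambient floor (function and parameter names are mine, not from the slides):

def distance_gain(distance, ref_distance=1.0, ambient=0.1):
    """Inverse-square falloff that never drops below an ambient level.

    ref_distance: distance at which the gain is 1.0 (full volume).
    ambient:      floor that keeps distant sounds audible.
    """
    if distance <= ref_distance:
        return 1.0
    falloff = (ref_distance / distance) ** 2   # inverse-square law
    return max(ambient, falloff)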



Static Attenuation

 Set sample volume based on distance
 Volume level is only calculated at the beginning of the sample
 Low CPU usage
 Best for short-duration samples
 Bad for long-duration samples



Dynamic Attenuation

 Sample volume based on distance
 Volume level recalculated every frame
 Good for long-duration samples
 Higher CPU usage (3 multiplies every frame, per sample)
 Temporal aliasing



Temporal Aliasing

 You will hear “stair stepping” or discrete volume levels as attenuation is recalculated
 Solution: increase the update rate (another common mitigation, gain ramping, is sketched below)
 Rule of thumb: at least 20 Hz (twice as much as needed for VR graphics)
 Some cues are more susceptible than others
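Besides raising the update rate, a common mitigation (my addition, not from the slides) is to ramp the gain linearly across each audio block instead of jumping to the new value:

def apply_gain_ramp(block, gain_start, gain_end):
    """Fade the gain across one block of samples to hide discrete steps."""
    n = len(block)
    return [s * (gain_start + (gain_end - gain_start) * i / n)
            for i, s in enumerate(block)]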



Stereo Attenuation

 Set sample volume based on distance, per channel (left and right); see the sketch below
 Even higher CPU usage (3 multiplies + trigonometry per frame, per sample)
 Gross approximation of IID
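One way to sketch this: constant-power panning by source azimuth combined with the distance gain above (the formulation and names are mine, not the course’s):

import math

def stereo_gains(azimuth_rad, distance, ref_distance=1.0, ambient=0.1):
    """Per-channel gains: distance attenuation times a constant-power pan.

    azimuth_rad: source angle, 0 = straight ahead, +pi/2 = hard right.
    """
    d_gain = max(ambient, min(1.0, (ref_distance / max(distance, 1e-6)) ** 2))
    pan = (azimuth_rad + math.pi / 2) / math.pi   # map [-pi/2, pi/2] -> [0, 1]
    pan = min(1.0, max(0.0, pan))
    left = d_gain * math.cos(pan * math.pi / 2)
    right = d_gain * math.sin(pan * math.pi / 2)
    return left, right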



Stereo Attenuation



Multiple Channel Audio
 More than 2 speakers
 Typically oriented in a horizontal plane around
the user
 Usually 4 or 5 directional speakers (Surround
Sound™ or Dolby Digital™)
 Good for directional cues
 Expensive to calculate (probably need hardware
support—especially for Surround Sound or Dolby
Digital)



Stereo Extenders

 Processing techniques for increasing stereo spread
 Processed after stereo attenuation is calculated (usually on a DSP inside the speakers)
 Example: QSound



Stereo Extenders



“Solution” to the Dynamic
Range Problem

 Assume that an individual sound will not have much dynamic range
 Scale the attenuation function to fit a minimum and maximum distance (see the sketch below)
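A sketch of that scaling (parameter names are mine; the clamping behavior mirrors the min/max distance settings of typical 3D audio APIs of this era):

def scaled_gain(distance, min_dist, max_dist, min_gain=0.05):
    """Full volume inside min_dist, min_gain beyond max_dist,
    and an inverse-distance rolloff rescaled to span that range in between."""
    if distance <= min_dist:
        return 1.0
    if distance >= max_dist:
        return min_gain
    rolloff = min_dist / distance            # 1.0 at min_dist
    floor = min_dist / max_dist              # rolloff value at max_dist
    t = (rolloff - floor) / (1.0 - floor)    # rescale to [0, 1]
    return min_gain + (1.0 - min_gain) * t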



“Solution” to the Dynamic
Range Problem



Sound Source With Limited
Dynamic Range



Modeling Interaural Arrival-
Time Difference

 Want to introduce a phase difference between the left and right ears
 PROBLEM: the left ear must only hear what was meant for the left ear; same for the right ear!



How to control what ears
hear?

 Easy solution: headphones
 Hard solution: cross-talk cancellation



Headphone Solution
 Precise control of what each ear hears
 Good for VR (immersive)
 Not good for multi-user VR (CAVE)
 Cumbersome
 Need to track user’s head for proper HRTF
calculations
 If using HRTFs, earbuds are ideal (they remove the effect of the listener’s own pinna)
Cross-talk Cancellation

 The left speaker plays the left channel plus a cancellation signal for the right channel (and likewise for the right speaker)
 Results in a sweet spot where the left ear hears only the left channel and the right ear hears only the right channel (one common formulation is sketched below)
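One common way to write the problem down (the notation is mine, not from the slides): let H_xy be the acoustic transfer function from speaker x to ear y, b_L, b_R the desired binaural signals at the ears, and s_L, s_R the speaker feeds. In the frequency domain the ears receive

  [b_L]   [H_LL  H_RL] [s_L]                  [s_L]   [H_LL  H_RL]^-1 [b_L]
  [b_R] = [H_LR  H_RR] [s_R]    so we play    [s_R] = [H_LR  H_RR]    [b_R]

i.e., the cross-talk canceller is the inverse of the 2x2 matrix of head transfer functions, which is why accurate speaker placement and per-frequency (Fourier-domain) processing matter.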



Cross-talk Cancellation



Problems with Cross-Talk
Cancellation

 Sweet spot is a single-user experience
 Implementation requires intimate knowledge of advanced calculus and Fourier analysis
 Speakers must be accurately placed and oriented
 Needs dedicated DSP hardware



Calculating ITD Effects

 Determine the distance from the sound source to each ear
 Simple physics to determine the arrival time of the sound at each ear (see the sketch below)
 Heavy Duty™ math required to smoothly interpolate phase changes
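A minimal sketch of the “simple physics” step (positions and names are mine; the speed of sound c ≈ 343 m/s is a supplied constant):

SPEED_OF_SOUND = 343.0   # m/s, in air at room temperature

def ear_delays(source, left_ear, right_ear, sample_rate=44100):
    """Arrival time (and delay in whole samples) of a sound at each ear.

    source, left_ear, right_ear: (x, y, z) positions in meters.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    t_left = dist(source, left_ear) / SPEED_OF_SOUND
    t_right = dist(source, right_ear) / SPEED_OF_SOUND
    return (t_left, t_right,
            round(t_left * sample_rate), round(t_right * sample_rate))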



Pinna, Head, and
Shoulders

 Determine the HRTF from spectral analysis of the head-related impulse response (HRIR)
 Filter sounds by scaling intensities at each frequency (a convolution sketch follows)
 Definitely need dedicated hardware
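In practice, filtering by the HRTF amounts to convolving the source with measured left/right HRIRs. A sketch using FFT-based convolution (NumPy; the HRIR arrays themselves are assumed to come from measurements like those on the following slides):

import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair to get a binaural signal.

    All inputs are 1-D float arrays at the same sample rate.
    Returns an (N, 2) stereo array.
    """
    n = len(mono) + len(hrir_left) - 1                  # full convolution length
    spec = np.fft.rfft(mono, n)
    left = np.fft.irfft(spec * np.fft.rfft(hrir_left, n), n)
    right = np.fft.irfft(spec * np.fft.rfft(hrir_right, n), n)
    return np.stack([left, right], axis=1)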



Determining the HRTF from
head-related impulse
response (HRIR)

Microphone for recording HRIRs


HRIRs





Generic HRTF

 Use of an “average” HRIR to determine the HRTF
 Works fairly well for ~80% of people
 Custom HRTFs are quite often impractical



Environmental Effects

 Obstruction/Occlusion
 Reverberation
 Doppler Shift
 Atmospheric Effects



Obstruction

 Same as sound shadowing
 Generally approximated by a ray test and a low-pass filter (a minimal filter sketch follows)
 High frequencies should get shadowed while low frequencies diffract
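A sketch of the crudest possible version of that low-pass filter (a first-order IIR section; the cutoff you would actually use depends on the obstruction and is not specified by the slides):

import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass: muffles a sound blocked by an obstacle."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)   # feedback coefficient
    y, out = 0.0, []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out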



Obstruction



Occlusion

 A completely blocked sound
 Example: a sound that penetrates a closed door or a wall
 The sound will be muffled (low-pass filter)



Reverberation

 Effects from sound reflection
 Similar to echo
 Static reverberation
 Dynamic reverberation



Static Reverberation

 Relies on the “closed container” assumption
 Parameters are used to specify approximate environment conditions (decay, room size, etc.)
 Example: EAX (Creative’s environmental audio extensions to Microsoft DirectSound3D); a toy decay sketch follows
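A toy illustration of “static” reverberation driven by decay and room-size style parameters (a single feedback comb filter; real engines such as EAX use many such stages plus all-pass filters, so treat this purely as a sketch):

def comb_reverb(dry, delay_samples, decay_seconds, sample_rate):
    """Single feedback comb filter: a crude stand-in for room reverberation.

    delay_samples plays the role of 'room size', decay_seconds of 'decay time'.
    """
    # feedback gain chosen so the tail has fallen 60 dB after decay_seconds
    gain = 10.0 ** (-3.0 * delay_samples / (decay_seconds * sample_rate))
    buf = [0.0] * delay_samples
    out, i = [], 0
    for x in dry:
        wet = buf[i]
        buf[i] = x + gain * wet
        out.append(x + wet)
        i = (i + 1) % delay_samples
    return out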



Static Reverberation



Dynamic Reverberation

 Calculation of reflections off of surfaces, taking surface properties into account
 Diffusion and diffraction are typically ignored
 “Wave Tracing”
 Example: Aureal A3D 2.0 or the Beam Tracing paper



Dynamic Reverberation



Comparison

 Static reverberation: less expensive computationally, simple to implement
 Dynamic reverberation: very expensive computationally, difficult to implement, but potentially superior results



Doppler Shift

 Change in perceived frequency due to the relative velocity of source and listener (formula below)
 Very susceptible to temporal aliasing
 The faster the update rate, the better
 Requires dedicated hardware
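The standard acoustic Doppler formula (supplied here for reference; c ≈ 343 m/s, with speeds measured along the source–listener line and taken as positive when the two are closing):

f' = f · (c + v_listener) / (c − v_source)

For example, a source approaching a stationary listener at 34.3 m/s is heard about 11% higher in pitch: 343 / (343 − 34.3) ≈ 1.11.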



Atmospheric Effects

 Attenuate high frequencies faster than low frequencies
 Moisture in the air increases this effect



Applications and Current
Research

 Beam Tracing
 NAVE
 Effect of Audio on visual quality
 Audio Spotlight



Beam Tracing

 Video! (from the SIGGRAPH ’98 Conference Proceedings video tape)
 From the paper: “A Beam Tracing Approach to Acoustic Modeling for Interactive Virtual Environments”, Thomas Funkhouser et al.



NAVE

 HRTF (ITD, IID) via cross-talk cancellation of the two front speakers (SBLive! DS3D)
 Two rear speakers provide directional and intensity cues
 Discrete bass channel (2nd sound card)
 Static reverberation (EAX)



Effect of Audio on Visual
Quality
 GT study shows that ambient sounds enhance the sense of presence, as well as the subjective quality of 3D graphics
 Enhanced recall and recognition of visual objects
 Dr. Russell Storms’s study showed enhanced subjective quality of 2D graphics



Audio Spotlight

 Produces an audio beam (like a flashlight)
 Makes use of interference from ultrasonic waves
 Potentially great dynamic range (better than speaker cones)



Audio Spotlight





Audio Spotlight Compared
to Speaker



Audio Spotlight Beam
Dimensions



Audio Spotlight Distortion



Audio Spotlight

 Holy Grail of interactive audio?
 Avoids cross-talk cancellation
 Track the user’s ears and aim the spotlight at the head
 AR: aim it at objects



Open Research

 Diffusion (some work with radiosity)
 Diffraction
 HRTFs with audio spotlights
 Integration of graphics and audio hardware for wave tracing (Nvidia?)

