
Interactive Audio

Sound, Waves, the Ear


3D audio

Apr 15, Spring 2002 CS 7497


Overview

 Fundamentals of Sound
 Psychoacoustics
 Interactive Audio
 Applications



What is sound?

 Sound is the sensation perceived by the sense of hearing
 Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves



Dual Nature of Sound

 Transfer of sound and physical stimulation of the ear
 Physiological and psychological processing in the ear and brain (psychoacoustics)



Transmission of Sound

 Requires a medium with elasticity and inertia (air, water, steel, etc.)
 Movements of air molecules result in the propagation of a sound wave



Particle Motion



Longitudinal Motion of Air



Wavefronts and Rays



Reflection of Sound



Absorption of Sound

 Some materials readily absorb the energy of a sound wave
 Example: carpet, curtains at a movie theater



Refraction of Sound





Diffusion of Sound

 Not analogous to diffusion of light
 Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies
 Nearly full diffusion of sound requires a reflection phase grating (Schroeder diffuser)



The Inverse-Square Law
(Attenuation)

I = W / (4πr²)

I is the sound intensity in W/cm²
W is the sound power of the source in W
r is the distance from the source in cm
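A minimal numeric sketch of the relationship (function and variable names are mine, not from the slides); it shows that doubling the distance quarters the intensity, a ~6 dB drop:

import math

def intensity(power_w, distance_cm):
    """Inverse-square law: intensity in W/cm^2 at a given distance."""
    return power_w / (4.0 * math.pi * distance_cm ** 2)

i1 = intensity(1.0, 100.0)
i2 = intensity(1.0, 200.0)
print(i2 / i1)                   # -> 0.25
print(10 * math.log10(i2 / i1))  # -> about -6.02 dB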



Psychoacoustics

 Physiological interactions with audio
 Psychological processing



Ear Anatomy



“Idealized” Ear



Mechanical Model of
Middle Ear



The Skull

 Occludes wavelengths that are “small” relative to the skull
 Causes diffraction around the head (helps amplify sounds)
 Wavelengths much larger than the skull are not affected (explains why low frequencies are not directional)



The Pinna



The Pinna

 Directs sound into the ear
 Provides cues which indicate sound direction



Importance of the Pinna



Ear Canal

 ~0.7 cm diameter and ~3 cm long
 Amplifies sound at the quarter-wavelength resonant frequency (~3 kHz; see the check below)
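As a quick check (using c ≈ 343 m/s for the speed of sound in air, a value supplied here, and the ~3 cm length above), the quarter-wavelength resonance of a tube closed at one end is

f = c / (4L) ≈ 343 / (4 × 0.03) ≈ 2.9 kHz

which matches the ~3 kHz figure on the slide.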



Ear Canal and Skull

 (A) Dark line – ear canal only
 (B) Dashed line – ear canal and skull diffraction



Middle Ear

 Eardrum vibrates from sound pressure changes
 Ossicles transfer the vibration to the oval window
 The impedance difference between air and the inner-ear fluid is matched by the ratio of the eardrum’s surface area to the oval window’s surface area



Inner Ear



The Cochlea

 Mechanical-to-electrical transducer
 Frequency-selective analyzer
 Tectorial and Basilar membranes rub
together to stimulate Hair Cells



Place Theory



Place Theory

 The position of maximum vibration of the basilar membrane corresponds to the perceived pitch of pure tones
 Each hair cell and each nerve fiber has very sharp bandpass characteristics



Auditory Area (20Hz-
20kHz)



Spatial Hearing

 Ability to determine the direction and distance of a sound source
 Not a fully understood process
 However, some cues have been identified as useful



The “Duplex” Theory of
Localization

 Interaural Intensity Differences (IIDs)
 Interaural Arrival-Time Differences (ITDs)



Interaural Intensity
Difference
 The skull produces a sound shadow
 Intensity difference results from one ear being shadowed and the other not
 The IID does not apply to frequencies below 1000 Hz (wavelengths similar to or larger than the size of the head)
 Sound shadowing can result in drops of up to ~20 dB for frequencies >= 6000 Hz
 The Inverse-Square Law can also affect intensity



Interaural Intensity
Difference



Interaural Arrival-Time
Difference

 Perception of a phase difference between the ears caused by an arrival-time delay (ITD)
 The ear closest to the sound source hears the sound before the other ear (a common approximation for the delay is sketched below)
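A frequently used approximation for the delay (Woodworth’s spherical-head formula; supplied here as a supplement, not taken from the slides) for a source at azimuth θ from straight ahead and a head radius a ≈ 8.75 cm:

ITD ≈ (a / c) · (θ + sin θ)

which gives roughly 0.65 ms for a source directly to one side (θ = π/2).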



Interaural Arrival-Time
Difference



Cones of Confusion

 Binaural difference cues (IIDs and ITDs) result in a locus of points for which measurements will be the same
 Results in ambiguity in the determination of sound source position



Cones of Confusion



How do humans resolve
the “Cones of Confusion”
problem?

 Cues used for localization are embodied in the transformation of the free-field sound to the eardrum
 The free-field sound is affected by sound shadowing from the head and torso as well as diffraction from the pinna



Pinna’s Effect On The Free-
field



Head-related Transfer
Function (HRTF)

 The acoustic transfer function between a point in space and the eardrum of the listener
 Encompasses all free-field effects



HRTF effect on IID



Monaural and Dynamic
Cues

 Spectral cues
 Distance cues
 Direct-to-reverberant energy ratio
 High-to-low frequency energy ratio
 Head rotation or tilt



Spectral Cues

 Comparison of a known source spectrum with the received spectrum
 If the spectrum is not known, cues can still be obtained by assuming the spectrum is locally flat (or of constant slope)



Pinna’s Effect on Spectrum



Distance Cues

 Variation of signal level with distance (attenuation)
 Useful only with regard to changes in distance, or if the sound source has a known signal level



Direct-to-Reverberant
Energy Ratio

 Results from the observation that the reverberation level is roughly constant over position in an enclosed space
 But the direct sound energy level decreases with increasing source-to-listener distance



High-to-Low Frequency
Energy Ratio

 Observation that air attenuates high frequencies more rapidly than low frequencies over distance



Head Rotation or Tilt

 Rotation or tilt can alter the interaural spectrum in a predictable manner
 Can resolve positional ambiguities on a cone of confusion



The Haas (or Precedence)
Effect
 The perceptual weighting of binaural cues
of the first arriving sound over reflections
of the same sound
 Generally reveals true location of sound
source while filtering out contradictory
reflections
 Hypothesized to be important from an evolutionary standpoint



Interactive Audio

 Virtual Sound Space
 Facilitate the perception of monaural, binaural, and dynamic cues within the virtual environment
 Model the virtual sound space in real time



Digital Recording

 Audio can be digitized and processed on a computer
 Digital formats have frequency and dynamic range limitations



Digital Problems

 Current formats do not completely cover the full frequency range of human hearing (especially low frequencies)
 Representing 0–120 dB would require too many bits! (see the arithmetic below)
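The arithmetic behind that claim (standard linear PCM quantization figures, supplied here rather than taken from the slides): each bit adds about 6.02 dB of dynamic range,

dynamic range ≈ 6.02 · N + 1.76 dB

so 16 bits gives ≈ 98 dB, and covering ~120 dB would need about 20 bits per sample.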



Review

 Distance cues (attenuation)
 Direct-to-reverberant energy ratio
 High-to-low frequency energy ratio
 Doppler Effect
 IID, ITD
 Spectral cues (effects from head, pinna)



Attenuation

 The Inverse-Square Law alone is “overkill”: sounds fall off too fast
 Solution: add an ambient term (just like in graphics); a minimal sketch follows
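A minimal sketch of distance attenuation with an ambient floor (function and parameter names are mine, not from the slides):

def distance_gain(distance, ref_distance=1.0, ambient=0.1):
    """Inverse-square falloff that never drops below an ambient level.

    ref_distance: distance at which the gain is 1.0 (full volume).
    ambient:      floor that keeps distant sounds audible.
    """
    if distance <= ref_distance:
        return 1.0
    falloff = (ref_distance / distance) ** 2   # inverse-square law
    return max(ambient, falloff)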



Static Attenuation

 Set sample volume based on distance
 Volume level is only calculated at the beginning of the sample
 Low CPU usage
 Best for short-duration samples
 Bad for long-duration samples



Dynamic Attenuation

 Sample volume based on distance
 Volume level recalculated every frame
 Good for long-duration samples
 Higher CPU usage (3 multiplies every frame, per sample)
 Temporal aliasing



Temporal Aliasing

 You will hear “stair stepping” or discrete volume levels as attenuation is recalculated
 Solution: increase the update rate (another common mitigation, gain ramping, is sketched below)
 Rule of thumb: at least 20 Hz (twice as much as needed for VR graphics)
 Some cues are more susceptible than others
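Besides raising the update rate, a common mitigation (my addition, not from the slides) is to ramp the gain linearly across each audio block instead of jumping to the new value:

def apply_gain_ramp(block, gain_start, gain_end):
    """Fade the gain across one block of samples to hide discrete steps."""
    n = len(block)
    return [s * (gain_start + (gain_end - gain_start) * i / n)
            for i, s in enumerate(block)]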



Stereo Attenuation

 Set sample volume based on distance, per channel (left and right); see the sketch below
 Even higher CPU usage (3 multiplies + trigonometry per frame, per sample)
 Gross approximation of IID
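One way to sketch this: constant-power panning by source azimuth combined with the distance gain above (the formulation and names are mine, not the course’s):

import math

def stereo_gains(azimuth_rad, distance, ref_distance=1.0, ambient=0.1):
    """Per-channel gains: distance attenuation times a constant-power pan.

    azimuth_rad: source angle, 0 = straight ahead, +pi/2 = hard right.
    """
    d_gain = max(ambient, min(1.0, (ref_distance / max(distance, 1e-6)) ** 2))
    pan = (azimuth_rad + math.pi / 2) / math.pi   # map [-pi/2, pi/2] -> [0, 1]
    pan = min(1.0, max(0.0, pan))
    left = d_gain * math.cos(pan * math.pi / 2)
    right = d_gain * math.sin(pan * math.pi / 2)
    return left, right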



Stereo Attenuation



Multiple Channel Audio
 More than 2 speakers
 Typically oriented in a horizontal plane around
the user
 Usually 4 or 5 directional speakers (Surround
Sound™ or Dolby Digital™)
 Good for directional cues
 Expensive to calculate (probably need hardware
support—especially for Surround Sound or Dolby
Digital)



Stereo Extenders

 Processing techniques for increasing stereo spread
 Processed after stereo attenuation is calculated (usually on a DSP inside the speakers)
 Example: QSound



Stereo Extenders



“Solution” to the Dynamic
Range Problem

 Assume that an individual sound will not have much dynamic range
 Scale the attenuation function to fit a minimum and maximum distance (see the sketch below)
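A sketch of that scaling (parameter names are mine; the clamping behavior mirrors the min/max distance settings of typical 3D audio APIs of this era):

def scaled_gain(distance, min_dist, max_dist, min_gain=0.05):
    """Full volume inside min_dist, min_gain beyond max_dist,
    and an inverse-distance rolloff rescaled to span that range in between."""
    if distance <= min_dist:
        return 1.0
    if distance >= max_dist:
        return min_gain
    rolloff = min_dist / distance            # 1.0 at min_dist
    floor = min_dist / max_dist              # rolloff value at max_dist
    t = (rolloff - floor) / (1.0 - floor)    # rescale to [0, 1]
    return min_gain + (1.0 - min_gain) * t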



“Solution” to the Dynamic
Range Problem



Sound Source With Limited
Dynamic Range



Modeling Interaural Arrival-
Time Difference

 Want to introduce a phase difference between the left and right ears
 PROBLEM: the left ear must only hear what was meant for the left ear; same for the right ear!



How to control what ears
hear?

 Easy solution: headphones
 Hard solution: cross-talk cancellation



Headphone Solution
 Precise control of what each ear hears
 Good for VR (immersive)
 Not good for multi-user VR (CAVE)
 Cumbersome
 Need to track user’s head for proper HRTF
calculations
 If using HRTFs, earbuds are ideal (they remove the effect of the listener’s own pinna)
Cross-talk Cancellation

 The left speaker plays the left channel plus a cancellation signal for the right channel (and likewise for the right speaker)
 Results in a sweet spot where the left ear hears only the left channel and the right ear hears only the right channel (one common formulation is sketched below)
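One common way to write the problem down (the notation is mine, not from the slides): let H_xy be the acoustic transfer function from speaker x to ear y, b_L, b_R the desired binaural signals at the ears, and s_L, s_R the speaker feeds. In the frequency domain the ears receive

  [b_L]   [H_LL  H_RL] [s_L]                  [s_L]   [H_LL  H_RL]^-1 [b_L]
  [b_R] = [H_LR  H_RR] [s_R]    so we play    [s_R] = [H_LR  H_RR]    [b_R]

i.e., the cross-talk canceller is the inverse of the 2x2 matrix of head transfer functions, which is why accurate speaker placement and per-frequency (Fourier-domain) processing matter.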



Cross-talk Cancellation



Problems with Cross-Talk
Cancellation

 Sweet spot is a single-user experience
 Implementation requires intimate knowledge of advanced calculus and Fourier analysis
 Speakers must be accurately placed and oriented
 Needs dedicated DSP hardware



Calculating ITD Effects

 Determine the distance from the sound source to each ear
 Simple physics to determine the arrival time of the sound at each ear (see the sketch below)
 Heavy Duty™ math required to smoothly interpolate phase changes
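A minimal sketch of the “simple physics” step (positions and names are mine; the speed of sound c ≈ 343 m/s is a supplied constant):

SPEED_OF_SOUND = 343.0   # m/s, in air at room temperature

def ear_delays(source, left_ear, right_ear, sample_rate=44100):
    """Arrival time (and delay in whole samples) of a sound at each ear.

    source, left_ear, right_ear: (x, y, z) positions in meters.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    t_left = dist(source, left_ear) / SPEED_OF_SOUND
    t_right = dist(source, right_ear) / SPEED_OF_SOUND
    return (t_left, t_right,
            round(t_left * sample_rate), round(t_right * sample_rate))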



Pinna, Head, and
Shoulders

 Determine the HRTF from spectral analysis of the head-related impulse response (HRIR)
 Filter sounds by scaling intensities at each frequency (a convolution sketch follows)
 Definitely need dedicated hardware
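In practice, filtering by the HRTF amounts to convolving the source with measured left/right HRIRs. A sketch using FFT-based convolution (NumPy; the HRIR arrays themselves are assumed to come from measurements like those on the following slides):

import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair to get a binaural signal.

    All inputs are 1-D float arrays at the same sample rate.
    Returns an (N, 2) stereo array.
    """
    n = len(mono) + len(hrir_left) - 1                  # full convolution length
    spec = np.fft.rfft(mono, n)
    left = np.fft.irfft(spec * np.fft.rfft(hrir_left, n), n)
    right = np.fft.irfft(spec * np.fft.rfft(hrir_right, n), n)
    return np.stack([left, right], axis=1)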



Determining the HRTF from
head-related impulse
response (HRIR)

Microphone for recording HRIRs


HRIRs





Generic HRTF

 Use of an “average” HRIR to determine the HRTF
 Works fairly well for ~80% of people
 Custom HRTFs are quite often impractical



Environmental Effects

 Obstruction/Occlusion
 Reverberation
 Doppler Shift
 Atmospheric Effects



Obstruction

 Same as sound shadowing
 Generally approximated by a ray test and a low-pass filter (a minimal filter sketch follows)
 High frequencies should get shadowed while low frequencies diffract
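A sketch of the crudest possible version of that low-pass filter (a first-order IIR section; the cutoff you would actually use depends on the obstruction and is not specified by the slides):

import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass: muffles a sound blocked by an obstacle."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)   # feedback coefficient
    y, out = 0.0, []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out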



Obstruction



Occlusion

 A completely blocked sound
 Example: a sound that penetrates a closed door or a wall
 The sound will be muffled (low-pass filter)



Reverberation

 Effects from sound reflection
 Similar to echo
 Static reverberation
 Dynamic reverberation



Static Reverberation

 Relies on the “closed container” assumption
 Parameters are used to specify approximate environment conditions (decay, room size, etc.)
 Example: EAX (Creative’s environmental audio extensions to Microsoft DirectSound3D); a toy decay sketch follows
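A toy illustration of “static” reverberation driven by decay and room-size style parameters (a single feedback comb filter; real engines such as EAX use many such stages plus all-pass filters, so treat this purely as a sketch):

def comb_reverb(dry, delay_samples, decay_seconds, sample_rate):
    """Single feedback comb filter: a crude stand-in for room reverberation.

    delay_samples plays the role of 'room size', decay_seconds of 'decay time'.
    """
    # feedback gain chosen so the tail has fallen 60 dB after decay_seconds
    gain = 10.0 ** (-3.0 * delay_samples / (decay_seconds * sample_rate))
    buf = [0.0] * delay_samples
    out, i = [], 0
    for x in dry:
        wet = buf[i]
        buf[i] = x + gain * wet
        out.append(x + wet)
        i = (i + 1) % delay_samples
    return out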



Static Reverberation



Dynamic Reverberation

 Calculation of reflections off of surfaces, taking surface properties into account
 Diffusion and diffraction are typically ignored
 “Wave Tracing”
 Example: Aureal A3D 2.0 or the Beam Tracing paper



Dynamic Reverberation



Comparison

 Static reverberation: less expensive computationally, simple to implement
 Dynamic reverberation: very expensive computationally, difficult to implement, but potentially superior results



Doppler Shift

 Change in perceived frequency due to the relative velocity of source and listener (formula below)
 Very susceptible to temporal aliasing
 The faster the update rate, the better
 Requires dedicated hardware
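The standard acoustic Doppler formula (supplied here for reference; c ≈ 343 m/s, with speeds measured along the source–listener line and taken as positive when the two are closing):

f' = f · (c + v_listener) / (c − v_source)

For example, a source approaching a stationary listener at 34.3 m/s is heard about 11% higher in pitch: 343 / (343 − 34.3) ≈ 1.11.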



Atmospheric Effects

 Attenuate high frequencies faster than low frequencies
 Moisture in the air increases this effect



Applications and Current
Research

 Beam Tracing
 NAVE
 Effect of Audio on visual quality
 Audio Spotlight



Beam Tracing

 Video! (from the SIGGRAPH ’98 Conference Proceedings video tape)
 From the paper: “A Beam Tracing Approach to Acoustic Modeling for Interactive Virtual Environments”, Thomas Funkhouser et al.



NAVE

 HRTF (ITD, IID) via cross-talk cancellation of the two front speakers (SBLive! DS3D)
 Two rear speakers provide directional and intensity cues
 Discrete bass channel (2nd sound card)
 Static reverberation (EAX)



Effect of Audio on Visual
Quality
 GT study shows that ambient sounds enhance the sense of presence, as well as the subjective quality of 3D graphics
 Enhanced recall and recognition of visual objects
 Dr. Russell Storms’s study showed enhanced subjective quality of 2D graphics



Audio Spotlight

 Produces an audio beam (like a flashlight)
 Makes use of interference from ultrasonic waves
 Potentially great dynamic range (better than speaker cones)



Audio Spotlight





Audio Spotlight Compared
to Speaker



Audio Spotlight Beam
Dimensions



Audio Spotlight Distortion



Audio Spotlight

 Holy Grail of interactive audio?
 Avoids cross-talk cancellation
 Track the user’s ears and aim the spotlight at the head
 AR: aim it at objects



Open Research

 Diffusion (some work with radiosity)
 Diffraction
 HRTFs with audio spotlights
 Integration of graphics and audio hardware for wave tracing (Nvidia?)

