
Voice analysis and resynthesis for Psychologists
Summer 2012

Lecture 1
Introduction
Essential definitions
Basic acoustics

Course schedule
Lecture 1: voice production
Lecture 2: voice structure
Workshop 1: voice analysis
Workshop 2: speech analysis
Lecture 3: Voice & Psychology
Workshop 3: voice synthesis

Course Assessment

Exercise (100%, 1000 words)
Produce a spectrogram of your voice, analyse/report values and discuss their relevance.
See instructions for more details.

Deadline: week 10 (check SD)

Language / Natural language

Language:
Open communication system that uses a set of written, gestural, or spoken
symbols that refer to people, objects or ideas.
Open-ended system of communication in which the grammatical structure allows
information of great cognitive complexity to be passed from one individual (the
speaker) to another (the listener).

Natural Language:
Spoken or signed language (as opposed to written languages or computer
programming languages). ASL, Spanish, English and French are natural
languages.

Linguistics:
The study of human language.

Speech:
Human spoken language (as opposed to sign language).

Voice

The sounds made by a person using the vocal folds for talking,
singing, screaming or crying.

The voice results from the act of phonation = the use of the
laryngeal system to generate an audible source of acoustic
energy.

Not just speech - but more generally vocal communication,
including animal vocal communication.

The term voice refers to the form and to the quality of the vocal
signal rather than to its content.

Phonetics & Phonology

Phonetics is about the physical production and
perception of speech sounds.
How vowels and consonants are produced, their acoustic structure, and
how they are perceived.

Textbooks:
A Course in Phonetics, Ladefoged, 2000
Speech Physiology, Speech Perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988
Principles of Voice Production, Titze, 1994

Phonology describes the way sounds function - within a
given language or across languages.

Bioacoustics

Bioacoustics: how animals use sound for communication
and echolocation.

How animals produce sounds, the physical structure of these sounds, how
animals perceive them, what their function is and how they evolved.

Textbooks:
The Evolution of Communication, Hauser, 1996
The Principles of Animal Communication, Bradbury & Vehrencamp, 1998
Animal Signals, Maynard Smith & Harper, 2003

Psychoacoustics

Psychoacoustics: the psychology of acoustical perception.

The study of subjective human perception of sounds.

Study of the relations between sound stimuli and their auditory
perception in terms of hearing sensations.

These relationships are neither simple nor linear.

Different people will hear different things when they listen to the
same sound.

Textbooks:
Speech Physiology, Speech Perception, and Acoustic Phonetics,
Lieberman & Blumstein, 1988

Introduction to Acoustics:
What is sound?

Vibration as perceived by the sense of hearing (Wikipedia -
Psychoacoustics definition)

A disturbance of the equilibrium of density (or pressure of a gas,
liquid or solid) (Titze - Physics definition)

A local pressure disturbance in a continuous medium that contains
frequencies in the range of 20 to 20,000 Hz (the audible
range) (Titze, a compromise between physics and psychophysics)

Small variations in air pressure that occur very rapidly one after
another (Ladefoged)

A sound wave is caused by an increase in pressure at a certain point which
causes a "domino effect" outward.

If the perturbation is repeated periodically, then it
generates a series of sound waves:

[Figure: pressure plotted against space for the waves emitted by a vibrating source]

The crests correspond to the high pressure points and the troughs
correspond to the low pressure points.

Propagation and speed of Sound

In a homogeneous medium (~ the atmosphere), sound
propagates from the source at equal speed in all three
dimensions, therefore sound waves are spherical waves.

The speed at which sound propagates depends on the type,
temperature and pressure of the medium through which it
propagates.

In dry air at 20 °C the speed of sound is approximately 343 m/s.
That is approximately 1 metre every 2.9 milliseconds.

In the human vocal tract, which is more humid and warmer,
the speed of sound is higher at 355 m/s.
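The figures quoted for the speed of sound can be checked with a short calculation. The sketch below is only illustrative: the linear temperature formula for dry air is a common textbook approximation that does not appear in the slides, and the function names are hypothetical.

```python
def speed_of_sound_dry_air(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius.

    Assumption: linear textbook approximation, not taken from the slides.
    It ignores humidity, so it will not reproduce the vocal tract value exactly.
    """
    return 331.3 + 0.606 * temp_c


def travel_time_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Time in milliseconds for sound to cover distance_m at the given speed."""
    return 1000.0 * distance_m / speed_m_per_s


if __name__ == "__main__":
    # Dry air at 20 degrees Celsius: close to the 343 m/s quoted on the slide.
    print(round(speed_of_sound_dry_air(20.0), 1))
    # One metre at 343 m/s: about 2.9 ms.
    print(round(travel_time_ms(1.0, 343.0), 2))
    # One metre at the vocal tract value of 355 m/s: slightly faster.
    print(round(travel_time_ms(1.0, 355.0), 2))
```

Warmer air gives a higher speed in this approximation, which is consistent with the higher value quoted for the vocal tract.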

Waveform / Oscillogram

Sound waves can be represented as the temporal variation of sound
pressure at a fixed point in space - for example the membrane of
a microphone.

When we record a sound - we record (in analogue or digital form) this
temporal variation.

This can be represented as a waveform or oscillogram.

Periodic Sounds

Most sounds are generated by oscillators (strings, vocal folds,
resonators, etc.)

Why oscillators?

Therefore most natural sounds are periodic (or quasi-periodic).

The pressure variation of a periodic sound is an oscillation with a
given period and a given amplitude.

Period

The period of a sound wave is the duration of
an oscillation cycle.
Can be measured as the time between two peaks.

[Figure: waveform with the period marked, 7.5 ms]

Frequency

The frequency of a sound is the number of air
pressure oscillation cycles per second. It is
the multiplicative inverse (or reciprocal) of the
period: F = 1/T

T = 7.5 ms,
F = 1/0.0075 = 133 Hz

One single oscillatory cycle per second corresponds to 1 Hz. This is not audible.
125 oscillations per second (the fundamental frequency in a male voice) is 125 Hz,
200 oscillations per second (the fundamental frequency in a female voice) is 200 Hz,
2000 oscillations per second (some bird calls) is 2000 Hz,
15000 oscillations per second (some bat calls) is 15000 Hz, etc.
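The reciprocal relation F = 1/T is easy to get wrong by a factor of 1000 when the period is read off a waveform in milliseconds. A minimal sketch of the conversion in both directions (the function names are hypothetical):

```python
def frequency_hz(period_s: float) -> float:
    """Frequency in Hz from a period given in seconds: F = 1/T."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_ms(freq_hz: float) -> float:
    """Period in milliseconds from a frequency in Hz: T = 1/F."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_hz


if __name__ == "__main__":
    # A period of 7.5 ms must be converted to seconds (0.0075 s) first.
    print(round(frequency_hz(0.0075)))  # about 133 Hz
    # Typical fundamental frequencies quoted on the slide.
    print(period_ms(125.0))  # male voice: 8 ms per cycle
    print(period_ms(200.0))  # female voice: 5 ms per cycle
```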

Wavelength

The wavelength of a periodical sound is the distance (in
space) between two successive crests (and is the
distance that a wave travels in the time of one
oscillatory cycle).

It is a function of the frequency of the sound and of the speed of
sound in the medium in which the sound is propagated.

The wavelength of a sound of frequency F traveling at speed c is
given by d = c/F.

For c = 343 m/s (speed of sound in the atmosphere):
a 20 kHz sound wave has a wavelength of 343/20000 = 17 mm,
a 440 Hz wave has a wavelength of 343/440 = 78 cm,
a 20 Hz wave (an elephant rumble) has a wavelength of 343/20 = 17 m.
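The three worked wavelengths follow directly from d = c/F. A small sketch that reproduces them, assuming the slide's value of c = 343 m/s as the default (the function name is hypothetical):

```python
def wavelength_m(freq_hz: float, speed_m_per_s: float = 343.0) -> float:
    """Wavelength in metres of a sound of frequency freq_hz: d = c / F."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / freq_hz


if __name__ == "__main__":
    for freq in (20000.0, 440.0, 20.0):
        print(f"{freq:>8.0f} Hz -> {wavelength_m(freq):.3f} m")
```

Passing a different speed, for example 355 m/s for the vocal tract, gives the wavelength in that medium.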

Amplitude, SPL and loudness

The amplitude is the magnitude of the change in sound pressure
within the wave. It corresponds to the maximum amount of
pressure at any point in the sound wave.

It is expressed as the Sound Pressure Level, measured in decibels,
dB(SPL), a logarithmic (perceptual) scale.

Examples of dB levels: ambient speech in an office/restaurant:
60 dB, vacuum cleaner at 1 m: 80 dB, red deer roar at 1 m: 104 dB,
jet aircraft at 100 m: 120 dB, blue whale at 1 m: 180 dB.

Loudness is the perceptual correlate of amplitude: it is a
subjective, non-linear perceptual attribute of sound (varies with
people, frequency, distance).

The intensity contour

The amplitude envelope of a sound is the
smooth curve that passes through the
peaks of the amplitude.

It is also called the intensity contour.

Determines the temporal structure of
animal calls / speech.
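To make the logarithmic nature of the dB(SPL) scale concrete, here is a minimal sketch of the usual conversion between sound pressure and level. The reference pressure of 20 micropascals is the standard convention for sound in air; it is not stated in the slides, and the function names are hypothetical.

```python
import math

# Assumption: standard reference pressure for dB(SPL) in air, 20 micropascals.
P_REF_PA = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB(SPL) for an RMS pressure in pascals."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def pressure_pa_from_spl(level_db: float) -> float:
    """RMS pressure in pascals corresponding to a level in dB(SPL)."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Each 20 dB step corresponds to a tenfold increase in pressure.
    for level in (60.0, 80.0, 120.0):
        print(f"{level:.0f} dB(SPL) -> {pressure_pa_from_spl(level):.3f} Pa")
    # Doubling the pressure adds about 6 dB.
    print(round(spl_db(0.04) - spl_db(0.02), 2))
```

Because the scale is logarithmic, the 20 dB gap between ambient speech (60 dB) and a vacuum cleaner (80 dB) corresponds to a tenfold difference in pressure, not a one-third increase.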

How can we study the frequency structure
of sounds?

Spectra: frequency / amplitude representation. The time
dimension is removed.

Spectrograms:
Make it possible to visualise the distribution
of the energy (amplitude) in two
dimensions: time (s) and
frequency (Hz).

Time is on the x axis, frequency on
the y axis, and the energy is
represented by different shades
of grey.

[Figure: waveform, spectrogram and spectrum of a recording labelled "H U M A N"; time axis 0 to 0.5 s, frequency axis 0 to 5, spectrum amplitude in dB]
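A spectrum, as described above, gives amplitude as a function of frequency with the time dimension removed. The sketch below computes one with a naive discrete Fourier transform in plain Python. It is meant only to illustrate the idea: the test signal, the sample rate and the function name are assumptions, and real analysis software uses much faster algorithms.

```python
import cmath
import math


def amplitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, amplitude) pairs from 0 Hz up to just below Nyquist.

    Naive discrete Fourier transform; amplitudes are scaled so that a sine of
    amplitude A that fits the window exactly shows up as A (the 0 Hz bin is
    doubled by this scaling).
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, 2.0 * abs(coeff) / n))
    return spectrum


# Hypothetical test signal: 200 Hz at amplitude 1.0 plus 400 Hz at amplitude 0.5.
SAMPLE_RATE = 8000
N_SAMPLES = 400  # 50 ms, so frequency bins are 20 Hz apart
signal = [
    math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
    for i in range(N_SAMPLES)
]

spectrum = amplitude_spectrum(signal, SAMPLE_RATE)
# The two strongest bins recover the two components of the signal.
peaks = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:2]

if __name__ == "__main__":
    for freq, amp in peaks:
        print(f"{freq:.0f} Hz, amplitude {amp:.2f}")
```

A spectrogram repeats this kind of analysis on successive short windows of the recording and stacks the results along the time axis.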

Simple sound-waves: pure tones

Pure tones are single frequency tones with no harmonic content (no
overtones). This corresponds to a sine wave.

[Figure: spectrogram of a 1.5 kHz pure tone; time (s) against frequency (kHz)]

Examples of pure tones: whistles, scops owl hoots, most electronic beeps.

Complex sounds

Animals, humans and most
musical instruments usually
generate periodic sounds
which have energy at more
than one frequency.

These sounds are called
complex sounds.

These sounds are composed of
more than one pure tone
(more than one sinusoidal
wave).

[Figure: spectrogram of a complex sound; time (s) against frequency (kHz), energy in dB]

Examples:
Red deer roar, herring gull call.
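The difference between a pure tone and a complex sound can be sketched by synthesising both as lists of samples. This is an illustrative sketch only: the sample rate, the component frequencies and amplitudes, and the function names are all assumptions chosen for the example.

```python
import math


def pure_tone(freq_hz, duration_s, sample_rate=16000, amplitude=1.0):
    """Samples of a single sine wave: one frequency, no overtones."""
    n = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def complex_tone(components, duration_s, sample_rate=16000):
    """Sum of several pure tones.

    components maps each frequency in Hz to the amplitude of that sine wave.
    """
    n = int(round(duration_s * sample_rate))
    tone = [0.0] * n
    for freq_hz, amplitude in components.items():
        partial = pure_tone(freq_hz, duration_s, sample_rate, amplitude)
        for i, sample in enumerate(partial):
            tone[i] += sample
    return tone


# A 1.5 kHz beep lasting 0.5 s, like the pure tone shown on the slide.
beep = pure_tone(1500.0, 0.5)
# A hypothetical complex sound with energy at three frequencies.
complex_example = complex_tone({100.0: 1.0, 200.0: 0.6, 300.0: 0.3}, 0.5)
```

Either list can be written to a sound file or plotted as a waveform; only the complex sound will show more than one frequency component on a spectrogram.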

Fundamental frequency and harmonics in
complex periodic sounds

Typical vocal sounds are composed of several sinusoidal waves which appear
on spectrograms as evenly spaced, parallel, narrow frequency components.

The lowest of these parallel
frequency components is called
the fundamental frequency
(F0).

The harmonics are integer
multiples of the fundamental
frequency: H1 = 2F0, H2 = 3F0,
etc.

The fundamental frequency
determines the pitch of the tone
(how high or low it is perceived
to be).

The variation of F0 with time
determines the fundamental
frequency contour. In speech it
affects the intonation.

[Figure: spectrogram labelled "H U M A N" with F0 and the harmonics marked; time axis 0 to 0.5 s]

The pitch

What is the Pitch of a voice?

The pitch is the perceived height of a voice (Titze)

It is mainly determined by the fundamental frequency of the sound.

60 Hz: very low (early morning)
140 Hz: male
200 Hz: female
500 Hz: child

Pitch goes up to 1.4 kHz (whistle register - female singers only).
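The harmonic series described on the fundamental frequency slide can be listed for any F0. The sketch below follows the slide's naming, in which H1 is the first component above F0 (H1 = 2F0, H2 = 3F0); the function name is hypothetical.

```python
def harmonic_series(f0_hz, n_harmonics):
    """Frequencies of F0 and its first n_harmonics harmonics.

    Uses the slide's labels: H1 = 2 * F0, H2 = 3 * F0, and so on.
    """
    series = {"F0": f0_hz}
    for n in range(1, n_harmonics + 1):
        series[f"H{n}"] = (n + 1) * f0_hz
    return series


if __name__ == "__main__":
    # A male voice at 125 Hz and a female voice at 200 Hz.
    print(harmonic_series(125.0, 3))
    print(harmonic_series(200.0, 3))
```

The components are spaced F0 apart, which is why they appear as evenly spaced parallel bands on a spectrogram, more widely spaced for a higher-pitched voice.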

Noise

Noise is sound that is made of an aperiodic series of waves, corresponding
to irregular and disordered vibrations that include all possible
frequencies (e.g. waves breaking on shore, wind).

Can play a role in speech: e.g. whisper - see next lecture.

White noise:

[Figure: spectrogram of white noise; time axis 0 to 0.5 s]

The spectral envelope: resonance
frequencies (formants)

The distribution of the energy is not
uniform across frequencies:

[Figure: spectrogram labelled "H U M A N" with the formants marked; time axis 0 to 0.5 s]

The peaks and valleys represent the
resonances that take place in the
cavities of the vocal tract.

Called formants (from the Latin formare
= to shape) because they shape
the spectral structure of the
speech signal.

Formants are central to human
speech as they provide the
acoustic variation at the basis of
vowels and consonants
(see next lectures!).

Summary

A vocal sound can be
described in terms of
its:

- intensity contour

- periodicity
pure tone?
F0+harmonics ?
F0 contour ?
noise ?

- spectral envelope
formants ?

[Figure: spectrogram labelled "H U M A N"; time axis 0 to 0.5 s]

How vertebrates make sounds

Anurans use their
larynx; they often use
two sets of folds (AM).

Birds use their syrinx,
located at the base of
the trachea.

Mammals use their
larynx, at the top of the
trachea.

From Fitch & Hauser 2002

What is the vocal apparatus

The lungs (generate
the power)

The trachea
The larynx

The supralaryngeal
vocal tract:
the pharynx
the mouth
the nasal cavity

From Titze, 1994

The two functional components: the
source and the filter

Speech (and most mammal) sounds result from a two-stage process:

- a periodic wave (called the glottal wave) is generated
in the larynx (= the source). Its fundamental frequency
determines the pitch of the voice.

- this wave is then filtered in the supralaryngeal
cavities of the vocal tract (= the filter), creating broad
bands of energy called vocal tract resonances or
formants.

Defined by Fant, G. (1960). Acoustic Theory of Speech Production.

Illustration of the source filter theory

Beheaded speaker:
Glottal wave only

Speaker with an anesthetized vocal tract:
Glottal wave filtered by a uniform tube

Speaker with a normal vocal tract:
Glottal wave filtered by a non uniform, changing vocal tract
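The uniform tube case in this illustration can be made concrete with a rough calculation. The sketch below assumes the standard textbook idealisation of a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/4L. The formula, the 17 cm tube length and the function name are assumptions not given in the slides; only the speed of sound of 355 m/s is taken from them.

```python
def uniform_tube_resonances(length_m, n_resonances=3, speed_m_per_s=355.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength model (an assumption, not from the slides):
    F_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [
        (2 * n - 1) * speed_m_per_s / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


if __name__ == "__main__":
    # Hypothetical vocal tract length of 17 cm.
    for i, freq in enumerate(uniform_tube_resonances(0.17), start=1):
        print(f"resonance {i}: {freq:.0f} Hz")
```

In this idealised model the resonances are evenly spaced and a shorter tube raises all of them together. A normal vocal tract is non uniform and keeps changing shape, so its formants move independently of one another.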
