14EC3029 SPEECH AND AUDIO SIGNAL PROCESSING
Credits: 3:0:0
Prerequisites: Basic Digital Signal Processing,
Good knowledge of MATLAB and Simulink
D.Sugumar
Asst.Prof (AGP 8000)/ECE
Karunya University
Coimbatore

Course objective
To study the analysis of various M-band filter
banks for audio coding
To learn various transform coders for audio
coding.
To study the speech processing methods in
time and frequency domain
To study the basic concepts of speech and
audio.

Course outcome
On successful completion you should be able to:
1. Express the speech signal in terms of its time domain and frequency
domain representations and the different ways in which it can be
modelled.
2. Express the simple features used in speech and audio applications;
3. Understand the operation of algorithms, and the effects of varying
parameter values within these;
4. Synthesise block diagrams for speech applications, know the purpose of
the various blocks, and detail algorithms that could be used to implement
them;
5. Implement components of speech processing systems, including
speech recognition and speaker recognition, in MATLAB;
6. Understand the behaviour of previously unseen speech processing
systems and hypothesise about their merits.

Course Contents
Mechanics of speech and audio
Nature of Speech signal - Discrete time modelling of
Speech production - Classification of Speech sounds -
Absolute Threshold of Hearing - Critical Bands - Masking -
Perceptual Entropy - The perceptual audio quality measure
(PAQM) - Cognitive effects in judging audio quality.

Time-frequency analysis - filter banks and transforms.


Audio coding and transform coders.
Time and frequency domain methods for speech
processing, Homomorphic speech analysis.
Linear predictive analysis of speech, Application of LPC
parameters. Formant analysis.

Audio Engineering-related Professions

Musician
Audio/electronics technician
Recording Engineer
Architectural Acoustician
Psychoacoustician
Electroacoustic device designer
Electronics designer
Computer programmer
Audio engineer: One who devises creative solutions
to difficult problems in the field of audio.

Audio-related Organizations,
Conferences and Publications
Acoustical Society of America (ASA) Journal: JASA
(since 1929) meets twice yearly in various locations.
Most members work at universities or scientific labs.
http://asa.aip.org
Audio Engineering Society (AES) Journal: JAES
(since 1953) meets several times a year in various
locations. Most members work in the audio industry.
http://www.aes.org
IEEE Signal Processing Society
Conferences: Int. Conf. on Acoustics, Speech and Signal
Processing (ICASSP) and Workshop on Applications of
Signal Processing to Audio and Acoustics (WASPAA).
http://www.ieee.org/organizations/society/sp/

Audio-related Organizations,
Conferences and Publications
International Computer Music Association (since 1980)
Publication: Computer Music Journal (quarterly since 1977)
Yearly International Computer Music Conference (ICMC) at American, European,
and Asian locations (since 1974).
http://computermusic.org
Society for Music Perception and Cognition (SMPC)
Yearly conferences alternate between U.S./Canada and Europe/Asia.
http://www.musicperception.org/

International Conference on Digital Audio Effects (DAFx)
Yearly international (since 1997).
http://dafx.labri.fr
International Conference on Music Information Retrieval
Yearly international (since 2000).
http://ismir2007.ismir.net/
International Symposium on Musical Acoustics
Roughly biennial international.
http://iwk.mdw.ac.at/ma/

Reference Books

Books

1. Digital Audio Signal Processing, Second Edition, Udo Zölzer, A John Wiley & Sons Ltd
Publication, 2008.
This book ... covers noise shaping, gives you formulas for peaking and shelving
filters used in mixing consoles, tells you how to implement a state-of-the-art
reverb or dynamic compression algorithm, and explains how audio compression
using psychoacoustic effects works. The mathematics is not that complicated, but
you should already know what FFT and IIR stand for and how they work to be
able to use the book.

2. Applications of Digital Signal Processing to Audio and Acoustics, Mark Kahrs,
Karlheinz Brandenburg, Kluwer Academic Publishers, New York, Boston, Dordrecht,
London, Moscow, 2002

Good One

3. Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer, Prentice
Hall, 1978

Perfect Choice for this course

4. Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan
This is a book much needed in the speech and audio community because of its
unique perspective on these topics. By their very nature, speech, music and other
audio signals are only fully understood if one takes into account their perception,
production, and the context within which they exist (language, symphony). To
appreciate what to process about such signals, the scientist must have a broad
appreciation of linguistics, hearing, vocal tract models, and the brain in general,
in addition to the standard engineering tools and approaches. This is why this
book is valuable. It indeed attempts to reach out to all these fields with just
enough details to inspire the reader, and to provide links to existing more
detailed literature. The book is well written, full of excellent illustrations, and
it was the perfect choice for this class.

5. T. F. Quatieri, Principles of Discrete Time Speech Processing, Prentice Hall
Inc, 2002
1. Express the speech signal in terms of its time
domain and frequency domain representations
and the different ways in which it can be
modelled
2. Derive expressions for simple features used in
speech classification applications;
3. Explain the operation of example algorithms
covered in lectures, and discuss the effects of
varying parameter values within these;
4. Synthesise block diagrams for speech
applications, explain the purpose of the
various blocks, and describe in detail algorithms
that could be used to implement them;
5. Implement components of speech processing
systems, including speech recognition and
speaker recognition, in MATLAB.
6. Deduce the behaviour of previously unseen
speech processing systems and hypothesise
about their merits.

I hear... and I forget (study)
I see... and I remember (learn)
I do... and I understand (practice)

Chinese Proverb

Hardware & Software for QA

My collection @
Google drive
E-Books

E-books
Videos

For DSP & solutions: NPTEL

YouTube
For MATLAB
For LabVIEW
For Scilab

Websites
JDSP
Mathwork center
Dspguru
etc

A Brief History of Audio (Analog)


Electromagnetic microphone (Ernst Siemens 1874)
Telephone (Alexander Graham Bell 1876)
Phonograph (wax cylinders) (Thomas Edison 1877)
Gramophone 78 record (Emile Berliner 1888)
Telegraphone magnetic wire recorder (Valdemar Poulsen 1898)
Telharmonium first electrical synthesizer (Thaddeus Cahill 1900)
AM radio (Reginald Fessenden 1905)
First radio broadcast (Lee DeForest (Met Opera) 1910)
Vacuum tube amplifier (Edwin Armstrong 1912)
Electrostatic microphone (E. Wente, Bell Labs 1916)
Electromagnetic loudspeaker (Chester Rice & Edward Kellogg 1924)
FM radio (Edwin Armstrong 1933; common use began in 1950s)
Wire recorder (consumer heyday 1947-1952)
Magnetic tape (reel-to-reel Ampex 1948; cassette Philips 1962)
33 rpm (LP) record (Columbia Records 1948)
multitrack recording (Ampex 1954)
stereo LP (Westrex 1958) and FM (GE/Zenith 1961, based on Armstrong)

A Brief History of Audio (Digital)


12-bit digital recording on computer tape (Bell Labs, 1957)
13-bit digital recording on computer tape (Illiac II, UIUC, 1964)
12-bit digital recording on computer disk (DEC/Stanford U., 1965)
16-bit 2/4 channel digital recorder (Soundstream, 1976/77)
Sony PCM (16-bit stereo recorded on video tape 1978)
compact disc (CD) (16-bit stereo format) (Philips/Sony 1983)
stereo digital audio tape (DAT) (Philips/Sony 1986)
recordable CD: CD-R (Philips/Sony 1988)
sound compression (MP3 standard 1989)
audio record/playback from home computer (NeXT 1989)
8-track digital audio tape (ADAT) (Alesis 1991)
8-track digital audio tape (DA-88) (Tascam 1992)
MiniDisc MD (Sony 1992)

Introduction to Speech Signal Processing
Speech: Fundamental and effortless mode of communication
among humans.
Speech communication: Talker, listener and channel
Speech Production Process: Message formulation, language
coding, neuro-muscular commands, movement of speech
production organs, acoustic pressure variations
Speech Perception Process: acoustic pressure variations,
movement of speech perception organs, neuro-muscular
commands, message comprehension

Applications (Project Areas)


1 Speech Modification:
time-scale manipulations:
Fitting the speech waveform in radio and TV commercials into an allocated time slot, and synchronizing audio and video
presentation.

Speeding up speech
Message playback
Voice mail
Reading machines and books for the blind

Slowing down speech


Learning a foreign language

Voice transformations using Pitch and spectral changes of speech signal:


Voice disguise
Entertainment
Speech synthesis

Spectral change of frequency compression and expansion:
may be useful in transforming speech as an aid to the partially deaf.

Many methods can be applied to music and special effects.
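
As a rough illustration of the speeding-up and slowing-down applications listed above, the following is a minimal sketch of time-scale modification by plain overlap-add (OLA). It is not taken from the course material: the function name `ola_time_scale` and the parameters `rate` and `frame_len` are hypothetical, and plain OLA ignores pitch periods, so it produces audible artifacts on real speech. Pitch-aware variants such as WSOLA or PSOLA are normally preferred in practice.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def ola_time_scale(x, rate, frame_len=256):
    """Time-scale a signal by plain overlap-add.

    rate > 1 speeds the signal up (shorter output),
    rate < 1 slows it down (longer output).
    Frames are read every `ana_hop` samples and written every
    `synth_hop` samples, so duration changes by about 1/rate.
    """
    synth_hop = frame_len // 2
    ana_hop = max(1, int(round(synth_hop * rate)))
    window = hann(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len  # running sum of window weights
    for k in range(n_frames):
        a = k * ana_hop     # read position in the input
        s = k * synth_hop   # write position in the output
        for i in range(frame_len):
            if a + i < len(x):
                y[s + i] += window[i] * x[a + i]
                norm[s + i] += window[i]
    # Divide by the accumulated window weight to keep the amplitude.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]

# Example: a 1-second, 200 Hz tone sampled at 8 kHz (assumed values).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
faster = ola_time_scale(tone, 2.0)   # roughly half as long
slower = ola_time_scale(tone, 0.5)   # roughly twice as long
```

The ratio of the analysis hop to the synthesis hop sets the speed change, which is why the same routine covers both message playback (speeding up) and language learning (slowing down).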

Applications (Project Areas)


2.Speech Coding
Goal is to reduce the information rate measured in bits per second
while maintaining the quality of the original waveform.
Waveform coders:
Represent the speech waveform directly and do not rely on a speech
production model.
Operate in a high range of 16-64 kbps

Vocoders:
Largely are speech model-based and rely on a small set of model parameters.
Operate at the low bit-rate range of 1.2-4.8 kbps
Lower quality than waveform coders.

Hybrid coders:
Partly waveform based and partly speech model-based
Operate in the 4.8-16 kbps range
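
To make the waveform-coder bit rates concrete, here is a small sketch using mu-law companding, one common way of coding the waveform directly without a speech production model. The slides do not name this technique, so it is an assumed example. The function names and the constant `MU = 255` are illustrative choices, and this continuous formula is not the segmented table used in deployed telephony codecs.

```python
import math

MU = 255  # assumed companding constant, typical for 8-bit mu-law

def mu_law_encode(x, bits=8):
    """Map a sample in [-1, 1] to an integer code in [0, 2**bits - 1]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    levels = 2 ** bits - 1
    return int(round((compressed + 1) / 2 * levels))

def mu_law_decode(code, bits=8):
    """Invert mu_law_encode, returning a sample in [-1, 1]."""
    levels = 2 ** bits - 1
    compressed = 2 * code / levels - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(MU)) / MU,
                         compressed)

def bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a sample-by-sample waveform coder, in kbps."""
    return sample_rate_hz * bits_per_sample / 1000

# 8 kHz sampling at 8 bits per sample gives 64 kbps, the top of the
# waveform-coder range quoted above.
rate = bit_rate_kbps(8000, 8)
restored = mu_law_decode(mu_law_encode(0.5))
```

Against the 1.2-4.8 kbps quoted for vocoders, the same signal coded this way needs roughly 13 to 53 times as many bits per second, which is the trade-off between bit rate and quality that the three coder classes span.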

Applications (Project Areas)


Applications of speech coders include:
Digital telephony over constrained bandwidth channels

Cellular
Satellite
Voice over IP (Internet)
Video phones
Storage of Voice messages for computer voice mail
applications.

Applications (Project Areas)


3 Speech Enhancement
Goal is to improve the quality of degraded speech.
Preprocessing speech before it is degraded:
Increasing the broadcast range of transmitters constrained by a peak power
transmission limit (e.g., AM radio and TV transmissions).

Enhancing the speech waveform after it is degraded.


Reduction of additive noise in
(Digital) telephony
Vehicle and aircraft communications
Reduction of interfering backgrounds and speakers for the hearing impaired,
Removal of unwanted convolutional channel distortion and reverberation
Restoration of old phonograph recordings degraded by:
Acoustic horns
Impulse-like scratches from age and wear
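
The removal of impulse-like scratches can be sketched with a simple median-based detector. This is only an illustrative approach, not a method given in the course: the function name `median_declick` and the parameters `half_width` and `threshold` are hypothetical, and practical restoration systems typically use model-based detection and interpolation instead.

```python
import math
import statistics

def median_declick(x, half_width=2, threshold=0.5):
    """Replace isolated impulsive samples with the local median.

    A sample is treated as a click when it differs from the median of
    its neighbourhood by more than `threshold`; all other samples are
    left untouched so the underlying waveform is preserved.
    """
    y = list(x)
    for n in range(len(x)):
        lo = max(0, n - half_width)
        hi = min(len(x), n + half_width + 1)
        med = statistics.median(x[lo:hi])
        if abs(x[n] - med) > threshold:
            y[n] = med
    return y

# Synthetic example: a low-level sine with one added click at sample 50.
clean = [0.3 * math.sin(2 * math.pi * n / 100) for n in range(200)]
noisy = list(clean)
noisy[50] += 1.0
repaired = median_declick(noisy)
```

Only samples flagged as outliers are altered, which avoids the smoothing of the whole signal that a plain median filter would cause.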

Applications (Project Areas)


4 Speaker Recognition
Speech signal processing exploits the variability of speech model
parameters across speakers.
Verifying a person's identity (Biometrics)
Voice identification in forensic investigation.

Understanding of the speech model features that cue a person's
identity is also important in speech modification, where model
parameters can be transformed for the study of specific voice
characteristics:
Speech modification and speaker recognition can be developed synergistically.
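
The idea of exploiting how model parameters vary across speakers can be shown with a toy nearest-mean classifier. Everything here is an assumption made for illustration: the two-element feature vectors, their numeric values, the speaker labels, and the distance threshold are synthetic, and real systems use far richer features and statistical models.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def enrol(training):
    """Build one mean-vector model per speaker."""
    return {name: mean_vector(vecs) for name, vecs in training.items()}

def identify(models, features):
    """Identification: return the enrolled speaker closest to `features`."""
    return min(models, key=lambda name: distance(models[name], features))

def verify(models, claimed, features, threshold):
    """Verification: accept the claimed identity if it is close enough."""
    return distance(models[claimed], features) <= threshold

# Synthetic feature vectors (made-up numbers, for illustration only).
training = {
    "speaker_a": [[120.0, 700.0], [125.0, 690.0], [118.0, 710.0]],
    "speaker_b": [[210.0, 850.0], [205.0, 860.0], [215.0, 845.0]],
}
models = enrol(training)
who = identify(models, [122.0, 705.0])
accepted = verify(models, "speaker_a", [122.0, 705.0], threshold=20.0)
rejected = verify(models, "speaker_a", [208.0, 850.0], threshold=20.0)
```

`identify` corresponds to the forensic voice identification task (choose among known speakers), while `verify` corresponds to the biometric task (accept or reject a claimed identity).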

Office
Room No 206 (2nd Floor of ECE)

E-Mail: sugumar@karunya.edu