
Lecture: 1-2

Course Overview
Dr. Shikha Tripathi, PES, Blr
¡  Review of Signal Processing basics
§  Transforms and their relation
§  Digital filters
¡  Important terminology
§  Concept of negative frequency
§  Analog/Digital frequency
§  Stationary / Non-stationary signals
§  Linear time invariant / variant systems
§  Minimum/Maximum/Mixed phase systems
¡  Overview of Course
14/08/2019 Dr.Shikha Tripathi@PES Blr 2
¡  Negative Frequency

¡  Analog / Digital frequency

¡  Stationary / Non-stationary signals

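The extracted slides keep only the headings for negative frequency and analog/digital frequency. As a minimal numerical sketch of both ideas (not the lecture's own example; the sampling rate Fs = 64 Hz and the 10 Hz test tone are assumptions chosen for illustration): a real cosine is half of e^{+jΩt} plus half of e^{-jΩt}, so its DFT shows a second peak that is the negative-frequency image, and sampling at Fs maps an analog frequency f in Hz to the digital frequency ω = 2πf/Fs in radians per sample.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def digital_frequency(f_hz, fs_hz):
    """Analog frequency f (Hz) -> digital frequency omega = 2*pi*f/Fs (rad/sample)."""
    return 2 * math.pi * f_hz / fs_hz


FS = 64.0   # assumed sampling rate (Hz); with N = 64 samples each DFT bin is 1 Hz wide
N = 64
F0 = 10.0   # analog frequency of the test cosine (Hz)

# cos(2*pi*F0*t) = (exp(+j*2*pi*F0*t) + exp(-j*2*pi*F0*t)) / 2, sampled at t = n / FS
x = [math.cos(2 * math.pi * F0 * n / FS) for n in range(N)]
X = dft(x)

# DFT bins whose magnitude is clearly non-zero
peaks = [k for k in range(N) if abs(X[k]) > 1e-6 * N]
print(peaks)                                           # [10, 54]
print(round(digital_frequency(F0, FS) / math.pi, 6))   # 0.3125, i.e. omega = 0.3125*pi rad/sample
```

Bin 54 corresponds to 54 − 64 = −10 Hz, the negative-frequency half of the cosine, and each peak carries half of the amplitude (|X[k]| = N/2). A direct DFT is used instead of an FFT library only to keep the sketch dependency-free.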
Consider
x(t) = cos(2π10t) + cos(2π25t) + cos(2π50t) + cos(2π100t)
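Every short segment of this x(t) contains the same four tones, which is what makes it stationary. The slide's accompanying figure is not in the extracted text, so the contrast used below is an assumption: a hypothetical non-stationary signal that plays the same four tones one after another. A minimal sketch (sampling rate 1000 Hz and 0.2 s frames assumed) that lists which frequencies are present in each frame:

```python
import cmath
import math

FS = 1000                   # assumed sampling rate (Hz)
TONES = [10, 25, 50, 100]   # tone frequencies of x(t) on the slide (Hz)
SEG = 200                   # assumed frame length: 0.2 s -> DFT bin spacing of 5 Hz


def tone(f, n):
    return math.cos(2 * math.pi * f * n / FS)


# Stationary: all four tones present for the whole 0.8 s (the slide's x(t), sampled)
stationary = [sum(tone(f, n) for f in TONES) for n in range(4 * SEG)]
# Hypothetical non-stationary contrast: the same tones played one after another
nonstationary = [tone(TONES[n // SEG], n) for n in range(4 * SEG)]


def frame_frequencies(frame):
    """Frequencies (Hz) in [0, FS/2] whose DFT magnitude is clearly non-zero."""
    N = len(frame)
    found = []
    for k in range(N // 2 + 1):
        Xk = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        if abs(Xk) > 1e-6 * N:
            found.append(k * FS // N)
    return found


def short_time_content(x):
    """Frequency content of consecutive, non-overlapping frames of length SEG."""
    return [frame_frequencies(x[i:i + SEG]) for i in range(0, len(x), SEG)]


print(short_time_content(stationary))
# [[10, 25, 50, 100], [10, 25, 50, 100], [10, 25, 50, 100], [10, 25, 50, 100]]
print(short_time_content(nonstationary))
# [[10], [25], [50], [100]]
```

With 0.2 s frames every tone completes a whole number of cycles and falls exactly on a DFT bin, so a plain magnitude threshold is enough here; real speech frames would need windowing and a proper peak picker.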

¡  Linear (Time / Shift) Invariant / Variant
systems
§  LTI / LSI systems are completely characterized by
their impulse/unit sample response
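A minimal sketch of what "completely characterized" means in practice: once h[n] is known, the output for any input follows from the convolution sum y[n] = Σ_k x[k] h[n − k]. The impulse response and input below are hypothetical values chosen for illustration.

```python
def convolve(x, h):
    """Full linear convolution y[n] = sum_k x[k] * h[n - k] of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


h = [1.0, 0.5, 0.25]   # hypothetical impulse response h[n] of an LTI system
x = [1.0, 2.0, 3.0]    # arbitrary input x[n]

print(convolve(x, h))          # [1.0, 2.5, 4.25, 2.0, 0.75]
print(convolve([1.0], h))      # unit sample in -> h[n] out: [1.0, 0.5, 0.25]
print(convolve([0.0] + x, h))  # input delayed by one -> output delayed by one
```

Feeding in the unit sample δ[n] returns h[n] itself, and delaying the input by one sample only delays the output by one sample, which is the shift-invariance property.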

¡  Minimum phase: the system and its inverse are both causal and stable
§  Poles and zeros inside the unit circle
¡  Minimum-phase sequence: the corresponding impulse response
¡  A causal, stable all-pole system is minimum phase
¡  Maximum phase : All poles/zeros outside the
unit circle
¡  A causal, stable H(z) is generally mixed phase, consisting of minimum-phase and maximum-phase components:
H(z) = Hmin(z) Hmax(z)
where Hmin(z) has its poles and zeros inside the unit circle and Hmax(z) has all its zeros outside the unit circle
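A minimal sketch that applies these definitions to a causal, stable H(z) described by its zeros and poles. Polynomial root finding is left out, so the zero and pole locations are assumed to be known already; the function names are hypothetical, and the gain factor of H(z) is ignored in the split.

```python
import math


def classify_phase(zeros, poles):
    """Classify a causal H(z) from its zero and pole locations (complex or real numbers)."""
    if any(abs(p) >= 1 for p in poles):
        raise ValueError("a causal system with a pole on or outside |z| = 1 is not stable")
    if any(math.isclose(abs(z), 1.0) for z in zeros):
        raise ValueError("zeros on the unit circle are neither inside nor outside")
    inside = [z for z in zeros if abs(z) < 1]
    if len(inside) == len(zeros):
        return "minimum phase"   # also covers all-pole systems (no zeros listed)
    if not inside:
        return "maximum phase"   # every zero lies outside the unit circle
    return "mixed phase"


def split_min_max(zeros, poles):
    """H(z) = Hmin(z) Hmax(z): Hmin keeps all poles and the zeros inside |z| = 1,
    Hmax keeps only the zeros outside |z| = 1 (gain factor omitted)."""
    hmin = {"zeros": [z for z in zeros if abs(z) < 1], "poles": list(poles)}
    hmax = {"zeros": [z for z in zeros if abs(z) > 1], "poles": []}
    return hmin, hmax


def inverse_is_causal_stable(zeros):
    """The poles of 1/H(z) are the zeros of H(z), so a causal inverse is stable
    exactly when every zero of H(z) lies inside the unit circle."""
    return all(abs(z) < 1 for z in zeros)


print(classify_phase([0.5, 2.0], [0.9]))      # mixed phase
print(split_min_max([0.5, 2.0], [0.9]))       # ({'zeros': [0.5], 'poles': [0.9]}, {'zeros': [2.0], 'poles': []})
print(classify_phase([], [0.9, -0.5]))        # minimum phase (all-pole system)
print(inverse_is_causal_stable([0.5, 2.0]))   # False
```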
¡  Modules: 5
§  Mechanics of speech
§  Time Domain Models for Speech Processing
§  Frequency Domain Methods for Speech Processing
§  Homomorphic Speech Processing
§  Linear Predictive Coding of Speech

UE16EC426 – Speech Processing
Faculty: Dr. Shikha Tripathi (ST)
Credits: 4    No. of Hours: 56

LESSON PLAN

UNIT-I: Mechanics of Speech
Reference: R1: Ch3, Ch5
Classes: 1-8 (8 hrs)
Portion covered: 15% (cumulative 15%)
Topics to be covered: Speech production: Mechanism of speech production, Acoustic phonetics; Digital models for speech signals; Representations of speech waveform: Sampling speech signals, basics of quantization, delta modulation, and Differential PCM; Auditory perception: psycho acoustics.

UNIT-II: Time Domain Models for Speech Processing
Reference: R1: Ch4
Classes: 9-22 (14 hrs)
Portion covered: 25% (cumulative 40%)
Topics to be covered: Time dependent processing of speech, Short time energy and average magnitude, Short time average zero crossing rate, Speech vs silence discrimination using energy & zero crossings, Pitch period estimation, Short time autocorrelation function, Short time average magnitude difference function, Pitch period estimation using autocorrelation function.

UNIT-III: Frequency Domain Methods for Speech Processing
Reference: R1: Ch6, R2: Ch6
Classes: 23-36 (14 hrs)
Portion covered: 25% (cumulative 65%)
Topics to be covered: Short Time Fourier Analysis: Linear filtering interpretation, Filter bank summation method, Overlap addition method, Design of digital filter banks, Implementation using FFT, Spectrographic displays, Pitch detection, Analysis by synthesis, Analysis synthesis systems.

UNIT-IV: Homomorphic Speech Processing
Reference: R1: Ch7, R2: Ch6
Classes: 37-44 (8 hrs)
Portion covered: 15% (cumulative 80%)
Topics to be covered: Homomorphic systems for convolution, Complex cepstrum, Mel Frequency Cepstral Coefficients, Pitch detection, Formant estimation, Homomorphic vocoder.

UNIT-V: Linear Predictive Coding of Speech
Reference: R1: Ch8, Ch9
Classes: 45-56 (12 hrs)
Portion covered: 20% (cumulative 100%)
Topics to be covered: Basic principles of linear predictive analysis, Solution of LPC equations, Prediction error signal, Frequency domain interpretation; Speech Recognition: Introduction, Speech recognition, Signal processing and analysis methods, Pattern comparison techniques, Hidden Markov Models, Isolated digit recognizer.


References:
R1 (Text Book): L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, 1st ed., Pearson Education (Asia) Pte. Ltd., 2004.
R2 (Reference Book - 1): Thomas F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 1st ed., Pearson Education (Singapore) Pvt. Ltd., 2008.
R3 (Reference Book - 2): D. O’Shaughnessy, Speech Communications: Human and Machine, 2nd ed., Universities Press, 2001.
R4 (Reference Book - 3): L. R. Rabiner and B. Juang, Fundamentals of Speech Recognition, 2nd ed., Pearson Education (Asia) Pvt. Ltd., 2004.
R5 (Reference Book - 4): J. R. Deller, Jr., J. H. L. Hansen and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd ed., IEEE Press, 2000.

Activity               Marks
Test 1                 20
Test 2                 20
Project / Assignment   20
ESA                    40
Total                  100 (60 + 40)

¡  Sound
§  Air in motion — pushed, pulled, beaten, blown, plucked, talked, or sung into
motion
¡  Audio (audible frequencies)
§  Range: 20 Hz – 20 kHz
¡  Speech
§  Has evolved as a primary form of communication between humans
§  Speech is the oral communication of meaningful information through the rules
of a specific language
§  Range: 300 Hz – 3400 Hz (telephone band)
¡  Music:
§  Music is sound's highest achievement, a wonderfully varied mixture of
patterned vibrations sent into the air by all kinds of instruments, from a
cricket's hind legs to a massive pipe organ
§  Range: 20 Hz – 1 kHz (bass), 1 – 8 kHz (mid frequencies), 8 – 16 kHz (treble)

¡  Importance of digital techniques in speech
communication systems:
¡  Speech in digital form can be stored for long periods of time and
transmitted over noisy channels relatively uncorrupted.
¡  Speech in digital form is indistinguishable from other forms of digital data, so it can be stored and transmitted in the same way
¡  Digital signals can be encrypted by scrambling the bits, which are
then unscrambled at the receiver (security)
¡  Digital speech can be encoded and compressed for efficient
transmission and storage

Ø  Processing of speech has moved almost entirely into the
discrete-time domain (amplitude not quantized)
Ø  The acoustic wave produced in human speech is a continuously
varying pattern, represented as xa(t)
Ø  Speech is initially a variation in air pressure, which is
converted into a continuous voltage by a microphone
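A minimal sketch of the distinction in the first point: sampling gives a discrete-time x[n] = xa(nT) whose amplitudes are still real numbers, and only a separate quantization step makes it digital. The 440 Hz tone standing in for xa(t), the 8 kHz sampling rate and the 8-bit uniform quantizer are all assumptions for illustration.

```python
import math

FS = 8000.0      # assumed sampling rate (Hz)
T = 1.0 / FS     # sampling period (s)


def xa(t):
    """Hypothetical stand-in for the microphone voltage xa(t): a 440 Hz tone."""
    return 0.8 * math.sin(2 * math.pi * 440.0 * t)


def sample(signal, n_samples):
    """Discrete-time signal x[n] = xa(nT): time is discrete, amplitude stays real-valued."""
    return [signal(n * T) for n in range(n_samples)]


def quantize(x, bits):
    """Uniform quantizer for amplitudes in [-1, 1): rounds each sample to a multiple of
    step = 2 / 2**bits and clips to the 2**bits available levels (x[n] becomes digital)."""
    step = 2.0 / 2 ** bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [min(hi, max(lo, round(v / step))) * step for v in x]


x = sample(xa, 80)          # 10 ms of the tone
xq = quantize(x, bits=8)    # 8-bit version of the same samples
half_step = (2.0 / 2 ** 8) / 2
max_error = max(abs(a - b) for a, b in zip(x, xq))
print(len(x), max_error <= half_step + 1e-12)   # 80 True
```

The quantization error stays within half a step because the tone never reaches the clipping levels; the discrete-time samples in x keep their full floating-point amplitude.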

¡  Speech as a Single Input Multiple Output (SIMO)
system
§  Sampling xa(t) results in x[n]
§  Speech processing is the transformation of x[n]:
▪  Estimating several time-varying parameters (multiple outputs)
from the samples of the speech wave (single input)
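A minimal sketch of the SIMO view: one sampled input x[n] yields several parameter tracks, here short-time energy and zero-crossing count per frame (both appear among the Unit II topics). The 8 kHz rate, 20 ms non-overlapping frames and the synthetic silence/tone input are assumptions; practical speech analysis typically uses overlapping, windowed frames.

```python
import math

FS = 8000      # assumed sampling rate (Hz)
FRAME = 160    # assumed 20 ms frames, no overlap


def short_time_parameters(x, frame_len, hop):
    """One input x[n] -> several time-varying outputs, one value per frame:
    short-time energy sum(x[n]^2) and the number of zero crossings."""
    energy, crossings = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy.append(sum(v * v for v in frame))
        crossings.append(sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)))
    return energy, crossings


def tone(freq, amp, n_samples):
    # The pi/16 phase offset keeps every sample away from exactly zero.
    return [amp * math.sin(2 * math.pi * freq * n / FS + math.pi / 16)
            for n in range(n_samples)]


# Hypothetical input: silence, then a 500 Hz tone, then a quieter 2 kHz tone
x = [0.0] * FRAME + tone(500, 1.0, FRAME) + tone(2000, 0.5, FRAME)
energy, crossings = short_time_parameters(x, FRAME, FRAME)
print([round(e, 6) for e in energy])   # [0.0, 80.0, 20.0]
print(crossings)                       # [0, 19, 79]
```

Energy separates the silent frame from the others, while the zero-crossing count rises with frequency even though the 2 kHz frame has less energy, which is why the two parameters are used together.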

¡  Applications of speech processing
¡  Speech Communication pathway
¡  Speech representation
¡  Speech Production model
¡  Model for glottal flow
