Lec - 1-2 Speech Processing

Lecture: 1-2
Course Overview
Dr. Shikha Tripathi, PES, Blr
¡  Review of Signal Processing basics
§  Transforms and their relation
§  Digital filters
¡  Important terminologies
§  Concept of negative frequency
§  Analog/Digital frequency
§  Stationary / Non stationary signals
§  Linear time invariant / variant systems
§  Minimum/Maximum/Mixed phase systems
¡  Overview of Course
14/08/2019 Dr.Shikha Tripathi@PES Blr 2
¡  Negative Frequency
¡  Analog / Digital frequency
¡  Stationary / Non Stationary signals

Consider
x(t) = cos(2π·10t) + cos(2π·25t) + cos(2π·50t) + cos(2π·100t)
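The signal above can be checked numerically. The sketch below is only an illustration, assuming a sampling rate of 400 Hz and a two-second record (neither value comes from the slides). It samples x(t), takes a plain DFT of each one-second half, and lists the frequencies that carry energy. If the signal is stationary, both halves should show the same four components.

```python
import cmath
import math

FS = 400                      # assumed sampling rate in Hz (Nyquist = 200 Hz)
TONES_HZ = (10, 25, 50, 100)  # frequencies taken from x(t) above


def x(t):
    """Sum of the four cosines from the slide."""
    return sum(math.cos(2 * math.pi * f * t) for f in TONES_HZ)


def spectrum_peaks(samples, fs, threshold=0.25):
    """Return the frequencies (Hz) whose normalised DFT magnitude exceeds threshold."""
    n = len(samples)
    peaks = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)) / n
        if abs(coeff) > threshold:
            peaks.append(k * fs / n)
    return peaks


samples = [x(i / FS) for i in range(2 * FS)]   # two seconds of signal
first_half = spectrum_peaks(samples[:FS], FS)
second_half = spectrum_peaks(samples[FS:], FS)
print(first_half, second_half)
```

Each cosine of unit amplitude shows up with a normalised magnitude of 0.5 at +f; the other half of its amplitude sits at the negative frequency -f, which this one-sided listing does not print.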

¡  Linear (Time / Shift) Invariant / Variant
systems
§  LTI / LSI systems are completely characterized by
their impulse/unit sample response
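A small sketch of what "completely characterized" means in practice. The three-tap system below is hypothetical, chosen only for illustration. Its unit sample response h[n] is measured once, and the output for another input is then obtained by convolution with h[n] alone, without looking inside the system again.

```python
def system(x):
    """Hypothetical LTI system: y[n] = x[n] + 0.5 x[n-1] + 0.25 x[n-2]."""
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(x[n] + 0.5 * x1 + 0.25 * x2)
    return y


def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


unit_sample = [1.0, 0.0, 0.0, 0.0, 0.0]
h = system(unit_sample)            # unit sample response of the system

x = [2.0, -1.0, 3.0, 0.0, 0.0]     # arbitrary test input
direct = system(x)                 # output computed by the system itself
via_h = convolve(x, h)[:len(x)]    # output predicted from h[n] only
print(h, direct, via_h)
```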

¡  Minimum phase: the system and its inverse are both causal and stable
§  All poles and zeros lie inside the unit circle
¡  Minimum phase sequence: the impulse response of a minimum phase system
¡  A stable, causal all-pole system is minimum phase
¡  Maximum phase: all poles/zeros lie outside the unit circle
¡  A causal and stable H(z) is generally mixed phase, consisting of minimum phase and maximum phase components:
H(z) = Hmin(z) Hmax(z)
§  Hmin(z): poles and zeros inside the unit circle
§  Hmax(z): all zeros outside the unit circle
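The classification above can be sketched as a check on pole and zero magnitudes. The helper names `classify` and `magnitude` are hypothetical, and the sketch follows the causal and stable convention of the factorisation above (poles inside the unit circle, phase type decided by the zeros). It also compares a first-order zero at z = 0.5 with its reflection at z = 2 to show that the two share a magnitude response.

```python
import cmath


def classify(zeros, poles):
    """Classify a causal system from its zero and pole locations."""
    if any(abs(p) >= 1 for p in poles):
        return "not causal and stable"
    if all(abs(z) < 1 for z in zeros):      # also true when there are no zeros
        return "minimum phase"
    if all(abs(z) > 1 for z in zeros):
        return "maximum phase"
    return "mixed phase"


def magnitude(b, w):
    """|H(e^jw)| for an FIR filter with coefficients b[0], b[1], ..."""
    return abs(sum(bk * cmath.exp(-1j * w * k) for k, bk in enumerate(b)))


print(classify([0.5], [0.9]))                      # zero inside
print(classify([2.0], [0.9]))                      # zero outside
print(classify([0.5, 2.0], [0.9]))                 # one of each
print(classify([], [0.5 + 0.5j, 0.5 - 0.5j]))      # all-pole system

h_min = [1.0, -0.5]    # 1 - 0.5 z^-1, zero at z = 0.5
h_max = [-0.5, 1.0]    # -0.5 + z^-1, zero at z = 2
print(magnitude(h_min, 1.0), magnitude(h_max, 1.0))
```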
¡  Modules: 5
§  Mechanics of speech
§  Time Domain Models for Speech Processing
§  Frequency Domain Methods for Speech Processing
§  Homomorphic Speech Processing
§  Linear Predictive Coding of Speech

UE16EC426 – Speech Processing
Credits: 4    No. of Hours: 56
Faculty: Dr. Shikha Tripathi (ST)

LESSON PLAN

UNIT-I: Mechanics of Speech and Speech Production (R1: Ch3, Ch5)
Class #: 1-8 (8 hrs)
% of portion covered: 15% (cumulative 15%)
Topics to be covered: Speech production: Mechanism of speech production, Acoustic phonetics - Digital models for speech signals - Representations of speech waveform: Sampling speech signals, basics of quantization, delta modulation, and Differential PCM - Auditory perception: psychoacoustics.

UNIT-II: Time Domain Models for Speech Processing (R1: Ch4)
Class #: 9-22 (14 hrs)
% of portion covered: 25% (cumulative 40%)
Topics to be covered: Time dependent processing of speech, Short time energy and average magnitude, Short time average zero crossing rate, Speech vs silence discrimination using energy & zero crossings, Pitch period estimation, Short time autocorrelation function, Short time average magnitude difference function, Pitch period estimation using autocorrelation function.

UNIT-III: Frequency Domain Methods for Speech Processing (R1: Ch6, R2: Ch6)
Class #: 23-36 (14 hrs)
% of portion covered: 25% (cumulative 65%)
Topics to be covered: Short Time Fourier Analysis: Linear Filtering interpretation, Filter bank summation method, Overlap addition method, Design of digital filter banks, Implementation using FFT, Spectrographic displays, Pitch detection, Analysis by synthesis, Analysis synthesis systems.

UNIT-IV: Homomorphic Speech Processing (R1: Ch7, R2: Ch6)
Class #: 37-44 (8 hrs)
% of portion covered: 15% (cumulative 80%)
Topics to be covered: Homomorphic systems for convolution, Complex cepstrum, Mel Frequency Cepstral Coefficients, Pitch detection, Formant estimation, Homomorphic vocoder.

UNIT-V: Linear Predictive Coding of Speech (R1: Ch8, Ch9)
Class #: 45-56 (12 hrs)
% of portion covered: 20% (cumulative 100%)
Topics to be covered: Basic principles of linear predictive analysis, Solution of LPC equations, Prediction error signal, Frequency domain interpretation; Speech Recognition: Introduction, Speech recognition, Signal processing and analysis methods, Pattern comparison techniques, Hidden Markov Models, Isolated digit recognizer.

References:
R1 (Text Book): Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer. 1st edition, Pearson Education (Asia) Pte. Ltd., 2004.
R2 (Reference Book - 1): Discrete-time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri. 1st edition, Pearson Education (Singapore) Pvt. Ltd., 2008.
R3 (Reference Book - 2): Speech Communications: Human and Machine, D. O'Shaughnessy. 2nd edition, Universities Press, 2001.
R4 (Reference Book - 3): Fundamentals of Speech Recognition, L. R. Rabiner and B. Juang. 2nd edition, Pearson Education (Asia) Pvt. Ltd., 2004.
R5 (Reference Book - 4): Discrete-Time Processing of Speech Signals, J. R. Deller, Jr., J. H. L. Hansen and J. G. Proakis. 2nd edition, IEEE Press, 2000.

Activity Marks
Test 1 20
Test 2 20
Project Assignment 20
ESA 40
Total 100 (60+40)

¡  Sound
§  Air in motion — pushed, pulled, beaten, blown, plucked, talked, or sung into
motion
¡  Audio (Audible frequency)
§  Range: 20 Hz to 20 kHz
¡  Speech
§  Has evolved as a primary form of communication between humans
§  Speech is the oral communication of meaningful information through the rules
of a specific language
§  Range: 300 Hz to 3400 Hz (telephonic)
¡  Music:
§  Music is sound's highest achievement, a wonderfully varied mixture of
patterned vibrations sent into the air by all kinds of instruments, from a
cricket's hind legs to a massive pipe organ
§  Range: 20 Hz to 1 kHz (bass), 1 to 8 kHz (mid frequency), 8 to 16 kHz (treble)

¡  Importance of digital techniques in speech
communication systems:
§  Speech in digital form can be stored for long periods and transmitted over noisy channels relatively uncorrupted
§  A speech signal in digital form is identical to data of other forms, so the same network can carry both
§  Digital signals can be encrypted by scrambling the bits, which are then unscrambled at the receiver (security)
§  Digital speech can be encoded and compressed for efficient transmission and storage
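The scrambling idea in the list above can be illustrated with a toy example. This is only a sketch: the byte values, the seed and the helper names `keystream` and `scramble` are made up for illustration, and Python's `random` module is not a cryptographically secure generator. The transmitter XORs each PCM byte with a pseudo-random keystream, and the receiver repeats the same XOR with the same seed to recover the samples.

```python
import random


def keystream(seed, n):
    """Pseudo-random bytes shared by both ends (illustration only, not secure)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def scramble(data, seed):
    """XOR every byte with the keystream; applying it twice restores the data."""
    ks = keystream(seed, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))


pcm = bytes([12, 200, 37, 99, 0, 255, 128, 64])   # made-up 8-bit speech samples
sent = scramble(pcm, seed=2019)                   # what travels over the channel
received = scramble(sent, seed=2019)              # receiver unscrambles
print(list(sent), list(received))
```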

Ø  Processing of speech has moved almost entirely into the discrete-time domain (time is sampled; magnitude is not quantized)
Ø  The acoustic wave produced in human speech is a continuously varying pattern, represented as xa(t)
Ø  Speech is initially a variation in air pressure, which a microphone converts into a continuous voltage

¡  Speech as a Single Input Multiple Output
system (SIMO)
§  Sampling xa(t) results in x[n]
§  Transformation of x[n] is speech processing:
▪  Estimating several time varying parameters (multiple)
from samples of speech wave (Single)
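A rough sketch of this single-input, multiple-output view. Everything numeric here is an assumption for illustration: an 8 kHz sampling rate, 10 ms frames, and a stand-in xa(t) made of a 200 Hz tone followed by silence. One sampled sequence x[n] goes in, and two time-varying parameters (short-time energy and zero-crossing count) come out, one value of each per frame.

```python
import math

FS = 8000     # assumed sampling rate in Hz
FRAME = 80    # assumed frame length: 10 ms at 8 kHz


def xa(t):
    """Hypothetical analog waveform: 200 Hz tone for 50 ms, then silence."""
    return math.sin(2 * math.pi * 200 * t) if t < 0.05 else 0.0


# Sampling xa(t) every 1/FS seconds gives x[n] (100 ms in total).
x = [xa(n / FS) for n in range(800)]


def short_time_energy(frame):
    """Sum of squared samples within one frame."""
    return sum(s * s for s in frame)


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


frames = [x[i:i + FRAME] for i in range(0, len(x), FRAME)]
energy = [short_time_energy(f) for f in frames]
zcr = [zero_crossings(f) for f in frames]
print(energy)
print(zcr)
```

Frames that contain the tone show high energy and a non-zero crossing count, while the silent frames show zero for both, which is the basis of the speech vs silence discrimination listed in the lesson plan.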

¡  Applications of speech processing
¡  Speech Communication pathway
¡  Speech representation
¡  Speech Production model
¡  Model for glottal flow

