Guided by Prof. Uday Rote

Speech Production
Speech consists of acoustic sound pressure waves, created by air exiting the vocal tract and by the voluntary movement of anatomical structures.

Fundamental excitation types:
Voiced
Unvoiced


Feature Extraction
Important characteristics of speech:
Pitch frequency
Formants

The speech signal s(n) is the convolution of the excitation signal with the vocal tract impulse response.

Cepstral Analysis
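Time domain analysis
Separation of two convolved signals

The output signal of the speech production system, s(n), is as follows:
$$s(n) = e(n) * \theta(n)$$
where e(n) is the excitation signal and $\theta(n)$ is the vocal tract impulse response.
1. Using the Fourier transform: $S(\omega) = E(\omega)\,\Theta(\omega)$
2. Taking the logarithm: $\log S(\omega) = \log E(\omega) + \log \Theta(\omega)$
3. This equation can be expressed as: $C_s(\omega) = C_e(\omega) + C_\theta(\omega)$
Using the IDFT, the cepstral coefficients are obtained: $c_s(n) = c_e(n) + c_\theta(n)$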

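Cepstral coefficients: $c_s(n) = \mathcal{F}^{-1}\left(\log\left(\mathcal{F}(s(n))\right)\right)$, where $\mathcal{F}$ is the DFT and $\mathcal{F}^{-1}$ is the IDFT.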
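
The separation property above can be checked numerically. The following is a minimal sketch, assuming NumPy and the real cepstrum (the logarithm is taken of the magnitude spectrum, which is a common simplification and an assumption here). The function name `real_cepstrum`, the test signals, and the small floor value are all illustrative choices.

```python
import numpy as np

def real_cepstrum(signal):
    """IDFT of the log magnitude spectrum of a real signal."""
    spectrum = np.fft.fft(signal)
    # Floor the magnitude to avoid log(0); the floor value is an assumption.
    log_magnitude = np.log(np.maximum(np.abs(spectrum), 1e-12))
    return np.fft.ifft(log_magnitude).real

# Hypothetical excitation and vocal tract response, for illustration only.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(256)
vocal_tract = np.exp(-0.05 * np.arange(256))  # decaying impulse response

# Circular convolution, computed as a product of spectra.
speech = np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(vocal_tract)).real

# Convolution in time becomes addition in the cepstral domain.
difference = real_cepstrum(speech) - (
    real_cepstrum(excitation) + real_cepstrum(vocal_tract)
)
print(np.max(np.abs(difference)))  # expected to be close to zero
```

The example uses circular convolution so that the identity holds exactly up to rounding; with linear convolution of finite frames it holds only approximately.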


Mel-frequency Scaling
The human auditory system does not perceive frequency on a linear scale. For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the mel scale.

The mel-frequency scale has:
Linear frequency spacing below 1000 Hz
Logarithmic spacing above 1000 Hz

To simulate the subjective spectrum, a triangular filter bank is used. The relation between linear frequency and mel frequency is as follows:
$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
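
As a quick check of the formula, here is a small sketch of the conversion in Python. The inverse function `mel_to_hz` is not given in the text; it is derived here by solving the formula for f, and both function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel, obtained by solving the formula for f."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for f in (100.0, 500.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

With these constants, 1000 Hz maps to approximately 1000 mel, and frequencies above that are compressed progressively more strongly.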

MFCC Computation
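
A rough sketch of one conventional MFCC pipeline is given below, assuming NumPy. The sequence of stages (frame blocking, Hamming window, FFT, mel filter bank, logarithm, DCT) is the commonly used one and is an assumption here, as are all parameter values (frame length 256, hop 128, 20 filters, 12 coefficients) and all function names.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    # 1-2. Frame blocking with overlap, then a Hamming window per frame.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # 3. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # 4. Mel filter bank energies, then logarithm (floored to avoid log(0)).
    energies = power @ mel_filter_bank(n_filters, frame_len, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-12))
    # 5. DCT-II of the log energies; the 0th coefficient is skipped.
    n = np.arange(n_filters)
    k = np.arange(1, n_coeffs + 1)[:, None]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energies @ dct_basis.T

# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(coeffs.shape)  # one row of coefficients per frame
```

Each row of the result is the feature vector for one frame, which is the form of input that a vector quantizer operates on.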

Vector Quantization
Vector quantization (VQ) is used for command identification.

VQ maps a large vector space into a smaller one:
Each region is called a cluster
Each cluster is represented by a centroid
The collection of centroids forms the codebook

Merits of using VQ:
The amount of data is significantly reduced
The amount of computation is reduced

The codebook is smaller than the original data, yet it still accurately represents the command characteristics.
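
To make the mapping concrete, the sketch below quantizes a few vectors against a tiny codebook, assuming NumPy and squared Euclidean distance (the distance measure is an assumption). The codebook, the data, and the function name `quantize` are made up for illustration.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each vector to its nearest centroid; return indices and mean distortion."""
    # Squared Euclidean distance from every vector to every centroid.
    distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    indices = distances.argmin(axis=1)
    return indices, distances.min(axis=1).mean()

# Hypothetical 2-centroid codebook and three feature vectors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[0.5, -0.2], [9.0, 11.0], [1.0, 1.0]])

indices, distortion = quantize(vectors, codebook)
print(indices, distortion)
```

Each vector is replaced by a single centroid index, which is where the reduction in data comes from.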

Codebook Generation
The Linde-Buzo-Gray (LBG) algorithm is used to generate the codebook recursively.

Step 1: Design a 1-vector codebook
Step 2: Doubling by splitting
Step 3: Nearest-neighbor search
Step 4: Centroid update
Step 5: Repeat steps 3 and 4
Step 6: Repeat steps 2, 3 and 4
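
The six steps can be sketched as follows, assuming NumPy and squared Euclidean distance. The stopping rules are assumptions: the inner loop (Step 5) stops when the relative drop in average distortion falls below `tol`, and the outer loop (Step 6) stops when the codebook reaches `codebook_size`, which should be a power of two. The splitting factor `epsilon`, the function name `lbg`, and the training data are illustrative.

```python
import numpy as np

def lbg(vectors, codebook_size, epsilon=0.01, tol=1e-4, max_iter=100):
    # Step 1: the 1-vector codebook is the centroid of all training vectors.
    codebook = vectors.mean(axis=0, keepdims=True)
    # Step 6: repeat steps 2, 3 and 4 until the codebook is large enough.
    while len(codebook) < codebook_size:
        # Step 2: double the codebook by splitting each centroid in two.
        codebook = np.concatenate(
            [codebook * (1 + epsilon), codebook * (1 - epsilon)]
        )
        previous = np.inf
        # Step 5: repeat steps 3 and 4 until the distortion stops improving.
        for _ in range(max_iter):
            # Step 3: nearest-neighbor search.
            distances = (
                (vectors[:, None, :] - codebook[None, :, :]) ** 2
            ).sum(axis=2)
            nearest = distances.argmin(axis=1)
            distortion = distances.min(axis=1).mean()
            # Step 4: move each centroid to the mean of its cluster.
            for j in range(len(codebook)):
                members = vectors[nearest == j]
                if len(members) > 0:
                    codebook[j] = members.mean(axis=0)
            if previous - distortion <= tol * distortion:
                break
            previous = distortion
    return codebook

# Hypothetical training data: two well-separated clusters.
rng = np.random.default_rng(1)
training = np.concatenate(
    [rng.normal(1.0, 0.1, (50, 2)), rng.normal(10.0, 0.1, (50, 2))]
)
print(lbg(training, codebook_size=2))
```

Multiplicative splitting does not separate a centroid that sits exactly at the origin; an additive perturbation is a common alternative in that case.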

Command Matching
Feature extraction is performed, and the result is stored as a sequence of vectors.

1. The vector sequence is compared with the vectors in each codebook.
2. For each codebook, a distance measure is computed.
3. The command with the lowest distance is chosen.
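
The matching procedure can be sketched as below, assuming NumPy, one codebook per command, and average squared Euclidean distance to the nearest centroid as the distance measure (the choice of measure is an assumption). The command names "open" and "close", the codebooks, and the feature vectors are hypothetical.

```python
import numpy as np

def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest centroid."""
    distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return distances.min(axis=1).mean()

def match_command(vectors, codebooks):
    """Return the command whose codebook gives the lowest distortion."""
    scores = {
        name: average_distortion(vectors, codebook)
        for name, codebook in codebooks.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical codebooks, one per trained command.
codebooks = {
    "open": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "close": np.array([[5.0, 5.0], [6.0, 6.0]]),
}
# Hypothetical feature vectors extracted from an input utterance.
features = np.array([[0.1, 0.0], [0.9, 1.1]])

best, scores = match_command(features, codebooks)
print(best, scores)
```

In practice a rejection threshold on the lowest distortion may also be useful, so that utterances matching no trained command are not forced onto the nearest one.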


Applications
PC control for physically handicapped people
Pronunciation evaluation in computer-aided language learning applications
Video games, with Tom Clancy's EndWar and Lifeline as working examples
Home automation
NASA's Mars Polar Lander used speech recognition technology from Sensory, Inc. in the Mars Microphone on the lander