
Final Year Project Proposal on


Under supervision of:
Dr. Y N Singh
Associate Professor
Department of Computer Science and Engineering
Institute of Engineering and Technology, Lucknow

Proposed by:
Aditya Sharma
Computer Science and Engineering, Final Year
Roll No: 1005210005

Problem Statement
Given a speech sample uttered by a user, the system will detect voice activity, extract the significant voice segment, and convert it into a text message according to the language specification and model.
This text message can then be used to send commands to the system or as input to an expert system.

Basic Challenges
Robustness: graceful degradation, not catastrophic failure.
Portability: independence of computing platform.
Adaptability: to changing conditions (different mic, background noise, new speaker, new task domain, even a new language).
Language Modelling: is there a role for linguistics in improving the language models?
Confidence Measures: better methods to evaluate the absolute correctness of hypotheses.
Out-of-Vocabulary (OOV) Words: systems must have some method of detecting OOV words and dealing with them in a sensible way.
Spontaneous Speech: disfluencies (filled pauses, false starts, hesitations, ungrammatical constructions, etc.) remain a problem.
Prosody: stress, intonation, and rhythm convey important information for word recognition and the user's intentions (e.g., sarcasm, anger).
Accent, dialect and mixed language: non-native speech is a huge problem, especially where code-switching is commonplace.

Speech Recognition Process

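Figure (block diagram): Input Speech -> Feature Analysis (Spectral Analysis) -> Pattern Classification (Decode/Search) -> Utterance Verification -> recognized text ("Hello World").
Knowledge sources shown in the diagram: Acoustic Model (HMM), Word Lexicon, Language Model.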

Milestones in Speech Recognition Research

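Figure (timeline by year; surviving axis ticks: 1962, 1982, 1987, 1992). Each stage lists the recognition tasks and the techniques associated with them:

Stage 1. Tasks: Isolated Words. Techniques: Filter-bank analysis; Time-normalization; Dynamic programming.
Stage 2. Tasks: Isolated Words; Connected Digits; Continuous Speech. Techniques: Pattern Recognition; LPC analysis; Clustering algorithms; Level Building.
Stage 3. Tasks: Connected Words; Continuous Speech. Techniques: Hidden Markov models; Stochastic language modeling.
Stage 4. Tasks: Continuous Speech; Speech understanding. Techniques: Stochastic language understanding; Finite-state machines; Statistical learning.
Stage 5. Tasks: Spoken Dialog; Multiple modalities. Techniques: Concatenative synthesis; Machine learning; Mixed-initiative dialog.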

Future of Speech Recognition Technologies

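Figure (timeline by year; surviving axis tick: 2005). Each stage lists the expected capability and the kind of system it enables:

Stage 1. Very large vocabulary, limited tasks, controlled environment: Dialog Systems.
Stage 2. Very large vocabulary, limited tasks, arbitrary environment: Robust Systems.
Stage 3. Unlimited vocabulary, unlimited tasks, many languages: Multilingual Systems; Multimodal Speech-Enabled Devices.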



Software Modules
Preprocessing: Voice Activity Detection, Input Noise Cancelling, Pre-emphasis

Frame Blocking and Windowing

Feature Extraction

Post processing: Weight Function, Normalization
Observations for HMM-based classification: O = {O1, O2, O3, ..., On}
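
The front-end modules listed above can be illustrated with a minimal sketch. It is not the project's implementation: the pre-emphasis coefficient (0.97), the frame and hop lengths, the energy threshold, and all function names are assumptions chosen for illustration. Noise cancelling and feature extraction are omitted, and voice activity is decided by a crude per-frame energy test.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high frequencies."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_blocks(signal, frame_len, hop_len):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop_len)]

def hamming(frame):
    """Taper a frame with a Hamming window to reduce spectral leakage."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def is_speech(frame, threshold):
    """Crude energy-based voice activity decision (threshold is an assumption)."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold

# Toy input: 10 ms of silence followed by 20 ms of a 440 Hz tone at 8 kHz sampling.
rate = 8000
samples = [0.0] * 80 + [math.sin(2 * math.pi * 440 * t / rate) for t in range(160)]

emphasized = pre_emphasis(samples)
frames = frame_blocks(emphasized, frame_len=80, hop_len=40)  # 10 ms frames, 5 ms hop
voiced = [is_speech(f, threshold=1e-3) for f in frames]      # VAD flag per frame
windowed = [hamming(f) for f in frames]                      # input to feature extraction
```

In this toy run the first frame contains only silence and is flagged as non-speech, while frames that overlap the tone are flagged as speech. A real front end would typically use a more robust detector than a fixed energy threshold.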

HMMs in Automatic Speech Recognition

HMMs can be used to classify feature sequences into known classes by building one HMM for each class.

By computing the probability of a sequence under each HMM, we can decide which HMM most probably generated the sequence.
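
As a rough illustration of this decision rule, the sketch below scores an observation sequence under each class HMM with the forward algorithm and picks the most likely class. It assumes discrete observation symbols for brevity, whereas a speech system would normally use continuous feature vectors with Gaussian mixture emissions. The two word models, their parameters, and the function names are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm for a discrete-output HMM."""
    n_states = len(start)
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)

def classify(obs, models):
    """Return the name of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Two hypothetical 2-state left-to-right word models over the symbols {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
models = {
    "yes": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no":  ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

best = classify([0, 0, 1, 1], models)
```

For long sequences the products underflow, so practical decoders work with log probabilities or scaled forward variables instead of raw likelihoods.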

There are several ideas about what to model:
Isolated word recognition (one HMM for each word)
Monophone acoustic model (one HMM for each phone)
Triphone acoustic model (one HMM for each phone in its left and right context, i.e. for each three-phone sequence)
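
To make the triphone unit concrete, the following sketch expands a word's phone sequence into (left context, phone, right context) triples, each of which would get its own HMM in a triphone model. The pronunciation, the "sil" boundary symbol, and the function name are assumptions made for illustration only.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into (left, centre, right) context triples."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

# Hypothetical pronunciation of "hello" in an ARPAbet-like phone set.
hello = ["hh", "ah", "l", "ow"]
triphones = to_triphones(hello)
for left, centre, right in triphones:
    print(f"{left}-{centre}+{right}")
```

A monophone model would use only the centre phone of each triple, which is why it needs far fewer HMMs but cannot capture coarticulation with neighbouring phones.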

Hierarchical System of HMMs

Figure (hierarchy of HMMs). Labels: HMM of a triphone (three instances), higher-level HMM of a word, language model.

HMM Limitations
Data intensive
Computationally intensive

50 phones = 125,000 possible triphones
3 states per triphone
3 Gaussian mixtures for each state

64k word vocabulary
262 trillion trigrams
2-20 phonemes per word in 64k vocabulary

39-dimensional feature vector sampled every 10 ms
100 frames per second
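
The figures above follow from simple counting, which the short sketch below reproduces. It reads "64k" as 64,000 words and "3 Gaussian mixtures" as three mixture components per state; both readings are assumptions. The state and Gaussian totals are derived upper bounds that ignore any parameter sharing between triphones.

```python
phones = 50
vocabulary = 64_000            # "64k" read as 64,000 words (assumption)
states_per_triphone = 3
gaussians_per_state = 3        # read as mixture components per state (assumption)
feature_dim = 39
frame_shift_ms = 10

triphones = phones ** 3                          # every ordered three-phone sequence
trigrams = vocabulary ** 3                       # every ordered three-word sequence
hmm_states = triphones * states_per_triphone     # upper bound, no state sharing
gaussians = hmm_states * gaussians_per_state     # upper bound, no sharing
frames_per_second = 1000 // frame_shift_ms
values_per_second = frames_per_second * feature_dim

print(f"triphones:         {triphones:,}")
print(f"trigrams:          {trigrams:,}")
print(f"HMM states:        {hmm_states:,}")
print(f"Gaussians:         {gaussians:,}")
print(f"frames per second: {frames_per_second}")
print(f"feature values/s:  {values_per_second:,}")
```

Most of these combinations never occur in any realistic training corpus, which is why both the acoustic and the language model are so data hungry.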

Reading and References

M. Narasimha Murty and V. Susheela Devi, Pattern Recognition: An Algorithmic Approach, Springer / Universities Press.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer.
Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall International, Inc.
Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE.