
Minor Project On

Speech Recognition

Guided by: Chinmaya Ku. Swain, Asst. Prof., Comp. Sci. Engg.

Presented by: Ashish Kumar (0811012097), 4th year (Comp. Sci. Engg.), College: ITER, Bhubaneswar, India

• Introduction to Speech Recognition
• Steps of Speech Recognition
• Speech Recognition Algorithm
• Speech Recognition Software
• Problems in Speech Recognition
• Techniques
• Future Scope of Speech Recognition
• Conclusion

• Speech recognition is the technique of enabling a computer to understand human language.
• Developed in the late 1960s by AT&T Bell Labs.
• Follows a fundamental database approach: the identified input is compared against pre-defined commands stored in a database.
• Speech recognition proceeds in the following phases:
  Phase I: Converting the speech input into digital signals.
  Phase II: Identifying the various digital signals and their meaning.
  Phase III: Providing an appropriate response to the input speech.
• Practical usage: it is used both by private users for everyday convenience and by commercial users to increase productivity and save time.
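Phase I (converting speech into a digital signal) amounts to sampling the waveform at regular intervals and quantizing each sample to an integer. The sketch below is a minimal illustration, not the project's own code: it assumes a synthetic 440 Hz tone in place of a microphone signal, an 8 kHz sampling rate and 16-bit samples, and the helper names `digitize` and `tone` are hypothetical.

```python
import math

def digitize(signal, duration_s, sample_rate=8000, bits=16):
    """Sample a continuous-time signal and quantize it to signed integers.

    `signal` is a function of time (in seconds) returning values in [-1.0, 1.0].
    """
    max_level = 2 ** (bits - 1) - 1             # 32767 for 16-bit samples
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                     # sampling: evaluate at t = n / fs
        value = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        samples.append(round(value * max_level))  # quantization to an integer
    return samples

def tone(t):
    """Hypothetical 440 Hz test tone at half amplitude, standing in for a microphone."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:4])
```

Under these assumptions, 10 ms of audio yields 80 integer samples; a real system would read the samples from an audio device or file instead of a formula.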

The Hidden Markov Model (HMM) is the most important and most commonly used speech recognition algorithm. It is a statistical model that can be viewed as a simple dynamic Bayesian network, and it is widely used in temporal pattern recognition for speech, handwriting, gestures, part-of-speech tagging and similar tasks. An HMM used in speech recognition treats the speech signal as piecewise stationary (short-time stationary): over short intervals of about 10 ms, its properties are assumed not to change.
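To make the short-time stationarity assumption concrete, the digitized signal is usually cut into short frames, each treated as stationary and used as one observation for the HMM. A minimal sketch, assuming 10 ms frames and, by default, no overlap (real front ends commonly use overlapping, windowed frames); `frame_signal` is a hypothetical helper.

```python
def frame_signal(samples, sample_rate, frame_ms=10, hop_ms=None):
    """Split samples into fixed-length frames of `frame_ms` milliseconds.

    Consecutive frames start `hop_ms` apart (default: frame_ms, i.e. no overlap).
    Trailing samples that do not fill a whole frame are dropped.
    """
    hop_ms = frame_ms if hop_ms is None else hop_ms
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

# 0.25 s of audio at 8 kHz = 2000 samples -> 25 frames of 80 samples each
frames = frame_signal(list(range(2000)), sample_rate=8000)
print(len(frames), len(frames[0]))   # 25 80
```

Passing `hop_ms=5` gives 50% overlap between consecutive 10 ms frames.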

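MARKOV ASSUMPTIONS.

1. Limited Horizon Assumption: the state at time t depends only on the state at time t-1.
2. Stationary Process Assumption: the transition probabilities do not change over time.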
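Written out formally, with z_t denoting the state at time t, s_i the possible states and A the state transition matrix (notation assumed here, following common HMM tutorials), the two assumptions read:

```latex
\begin{aligned}
&\text{Limited horizon:} && P(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1) = P(z_t \mid z_{t-1}) \\
&\text{Stationary process:} && P(z_t \mid z_{t-1}) = P(z_2 \mid z_1), \quad t \in 2 \ldots T \\
&\text{Transition matrix:} && A_{ij} = P(z_t = s_j \mid z_{t-1} = s_i)
\end{aligned}
```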

QUESTIONS OF A MARKOV PROCESS.

1. What is the probability of a particular sequence of states z?
2. How do we estimate the parameters of the model?

PROBABILITY OF A STATE SEQUENCE.
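Under the Markov assumptions, the probability of a state sequence factorizes into a chain of transition probabilities: P(z) = π(z_1) × A[z_1][z_2] × ... × A[z_T-1][z_T]. The sketch below assumes an explicit initial-state distribution `pi` (some treatments instead start every sequence from a fixed initial state) and a hypothetical two-state silence/speech model with made-up numbers.

```python
def sequence_probability(states, pi, A):
    """P(z_1..z_T) = pi[z_1] * prod over t of A[z_t-1][z_t] (Markov assumptions)."""
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]          # one transition probability per step
    return prob

# Hypothetical model: state 0 = silence, state 1 = speech (illustrative numbers)
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
print(sequence_probability([0, 0, 1], pi, A))  # 0.6 * 0.7 * 0.3, about 0.126
```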

MAXIMUM LIKELIHOOD PARAMETER ASSIGNMENT.
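When the state sequence is fully observed, the maximum likelihood estimate of each transition probability is a normalized count: A_ij = (number of transitions from s_i to s_j) / (number of transitions out of s_i). A minimal sketch with made-up labeled sequences; `estimate_transitions` is a hypothetical helper. When the states are hidden, as in speech, these counts are replaced by expected counts computed with EM.

```python
from collections import Counter

def estimate_transitions(sequences, n_states):
    """Maximum likelihood estimate of A from fully observed state sequences.

    A[i][j] = count(i -> j) / count(i -> any). Rows for states that are never
    left stay all-zero, since there is no data to estimate them from.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))     # count each (previous, next) pair
    A = []
    for i in range(n_states):
        total = sum(counts[(i, j)] for j in range(n_states))
        A.append([counts[(i, j)] / total if total else 0.0
                  for j in range(n_states)])
    return A

# Toy observed state sequences (0 = silence, 1 = speech), illustrative only
A = estimate_transitions([[0, 0, 1, 1, 1, 0], [0, 1, 1]], n_states=2)
print(A)   # row 0: [1/3, 2/3], row 1: [0.25, 0.75]
```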

We do not get to observe the actual sequence of states; we can only observe some outcome generated by each state. A Hidden Markov Model is a Markov model for which we have a series of observed outputs x = {x1, x2, ..., xT} drawn from an output alphabet V = {v1, v2, ..., vN}.

OUTPUT INDEPENDENCE ASSUMPTION

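Under this assumption, the output at time t depends only on the hidden state at time t. A new parameter, the matrix B, encodes the probability of the hidden state generating output vk given that the state at the corresponding time was sj: Bjk = P(xt = vk | zt = sj).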
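Formally (notation assumed: π is an initial state distribution, A the transition matrix, and x_t is used as the index of the emitted symbol), the output independence assumption and the resulting joint probability of an output sequence x and a state sequence z are:

```latex
\begin{aligned}
P(x_t = v_k \mid z_t = s_j) &= P(x_t = v_k \mid x_1, \ldots, x_T, z_1, \ldots, z_T) = B_{jk} \\
P(x, z) &= \pi_{z_1} B_{z_1 x_1} \prod_{t=2}^{T} A_{z_{t-1} z_t} B_{z_t x_t}
\end{aligned}
```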

QUESTIONS OF HMM.

1. What is the probability of an observed sequence?
2. What is the most likely series of states to generate the observations?
3. How can the values of the HMM parameters (A, B) be learned?

FORWARD PROCEDURE: computes the probability of an observed sequence.
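The forward procedure sums over all hidden state paths without enumerating them: alpha[t][j] = P(x_1..x_t, z_t = s_j) is built from the previous column, so the cost is O(n²T) instead of nⁿ... rather than exponential in T. A minimal sketch, assuming an initial distribution `pi`, observations given as symbol indices, and a hypothetical two-state, two-symbol model; `brute_force` enumerates every state sequence only to check the result.

```python
import itertools

def forward(obs, pi, A, B):
    """Forward procedure: P(obs), summed over all hidden state sequences.

    alpha[t][j] = P(x_1..x_t, z_t = s_j); obs holds output-symbol indices.
    """
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][x] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return sum(alpha[-1]), alpha

def brute_force(obs, pi, A, B):
    """Same quantity by enumerating all n**T state sequences (exponential cost)."""
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[z[0]] * B[z[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[z[t - 1]][z[t]] * B[z[t]][obs[t]]
        total += p
    return total

# Hypothetical model: states 0 = silence, 1 = speech;
# symbols 0 = low-energy frame, 1 = high-energy frame
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]
p, _ = forward(obs, pi, A, B)
print(p, brute_force(obs, pi, A, B))   # the two values agree up to rounding
```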

VITERBI ALGORITHM: maximum likelihood state assignment. It works like the forward procedure, except that instead of summing the probabilities of all paths, it tracks only the maximum probability and records the corresponding state sequence.
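The "max instead of sum" idea can be sketched directly: the recursion mirrors the forward procedure, and back-pointers recover the best path at the end. A minimal sketch, assuming the same kind of hypothetical two-state silence/speech model with made-up parameters and observations given as symbol indices.

```python
def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for obs and its joint probability.

    Same recursion as the forward procedure, but with max instead of sum,
    plus back-pointers to recover the best path.
    """
    n = len(pi)
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]  # best score ending in s_j
    backpointers = []
    for x in obs[1:]:
        ptrs, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptrs.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][x])
        backpointers.append(ptrs)
        delta = new_delta
    state = max(range(n), key=lambda j: delta[j])     # best final state
    best_prob = delta[state]
    path = [state]
    for ptrs in reversed(backpointers):               # follow back-pointers
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, best_prob

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, prob = viterbi([0, 1, 1], pi, A, B)
print(path)   # [0, 1, 1]: silence, then speech
```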

EXPECTATION MAXIMISATION ALGORITHM: parameter learning for the HMM, that is, determining the values of matrices A and B.
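For HMMs, EM is known as the Baum-Welch algorithm: the E-step uses forward and backward probabilities to compute expected state occupancies and transitions, and the M-step turns those expected counts into new values of pi, A and B. A minimal sketch for a single short observation sequence, assuming a hypothetical starting guess; it uses unscaled probabilities, which underflow on realistic utterance lengths (real systems use scaling or log space).

```python
def baum_welch_step(obs, pi, A, B):
    """One EM (Baum-Welch) iteration on a single observation sequence.

    Returns re-estimated (pi, A, B) and the likelihood P(obs) under the
    input parameters.
    """
    n, m, T = len(pi), len(B[0]), len(obs)
    # E-step, part 1: forward (alpha) and backward (beta) probabilities
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
                      for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    likelihood = sum(alpha[T - 1])
    # E-step, part 2: gamma[t][i] = P(z_t = s_i | obs),
    # xi[t][i][j] = P(z_t = s_i, z_t+1 = s_j | obs)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: expected counts normalized into probabilities
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(m)]
             for j in range(n)]
    return new_pi, new_A, new_B, likelihood

# Hypothetical starting guess and toy observation sequence
pi = [0.5, 0.5]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.2, 0.8]]
obs = [0, 0, 1, 1, 1, 0, 1, 1]
history = []
for _ in range(10):
    pi, A, B, likelihood = baum_welch_step(obs, pi, A, B)
    history.append(likelihood)
print(history[0] <= history[-1])   # EM never decreases the likelihood
```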

• After identifying the speech input, the system must provide a valid response.
• The identified speech command is dynamically compared to a list of pre-defined commands stored in the software database.
• The software libraries must be updated regularly to maintain optimal performance.
• To improve the performance of the recognition algorithm, adaptive modification may be applied: each pre-defined command is adapted to the accent, dialect and speech mannerisms of the user during the testing phase.
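Comparing the identified command against a list of pre-defined commands can be sketched with fuzzy string matching from Python's standard `difflib` module. This is only an illustration: the command list and the 0.6 similarity cutoff are assumptions, and `match_command` is a hypothetical helper rather than how any particular product works.

```python
import difflib

# Hypothetical database of pre-defined commands
COMMANDS = ["open browser", "close window", "play music", "stop music", "shut down"]

def match_command(recognized_text, commands=COMMANDS, cutoff=0.6):
    """Return the closest pre-defined command, or None if nothing is close enough."""
    matches = difflib.get_close_matches(recognized_text.lower().strip(),
                                        commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("Play musik"))   # play music
print(match_command("hello"))        # None: no command is similar enough
```

Tolerating small differences lets a slightly misrecognized word still trigger the intended command, while the cutoff rejects unrelated input.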

• Definition: Natural Language Processing (NLP) is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.
• It enables a computer to process human speech and produce a corresponding human-like output without needing an additional database of responses.
• As a field within Artificial Intelligence, it aims to give computers the ability to produce near-human responses.

1. Paraphrase the input text: convert a given command string into a simpler, structured form. In speech recognition, the speech input, once identified by the recognition algorithm, is simplified into a structured format that the computer system understands.
2. Process the input text: process the simplified command to produce a set of output results.
3. Determine an appropriate response: draw inferences from the output and produce a number of response statements.
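The three steps can be illustrated with a deliberately tiny rule-based pipeline. Everything here is hypothetical (the filler-word list, the verb table and the response wording); it only shows how free text is reduced to a structured form, processed, and turned into a response.

```python
import re

# Hypothetical filler words dropped when simplifying the command
FILLERS = {"please", "could", "you", "can", "the", "a", "me", "kindly"}

def paraphrase(text):
    """Step 1: reduce free text to a structured (action, object) form."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in FILLERS]
    return {"action": words[0] if words else None, "object": " ".join(words[1:])}

def process(command):
    """Step 2: turn the structured command into a list of output results."""
    known = {"open": "launching", "close": "closing", "play": "playing"}
    verb = known.get(command["action"])
    return [verb, command["object"]] if verb else []

def respond(results):
    """Step 3: draw an inference from the results and choose a response."""
    if not results:
        return "Sorry, I did not understand that."
    return f"OK, {results[0]} {results[1]}."

print(respond(process(paraphrase("Could you please open the browser?"))))
# OK, launching browser.
```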

Due to the growing popularity of, and need for, better interfaces to computer systems, a number of speech recognition software packages are on the market. The most common among them are:
• Windows Speech Recognition by Microsoft
• Nuance's Dragon NaturallySpeaking

• Development began in 1993 under Xuedong Huang.
• Included in the Windows Vista and Windows 7 operating systems.
• Integrates the Microsoft text-to-speech API with Microsoft Speech API 5.3.
• Includes a prototype Microsoft dictation package.
• Latest development: speech input in the Microsoft Bing search engine, using Microsoft Speech Recognition 8.0.

• Developed and sold by Nuance Communications for Windows.
• Preceded by DragonDictate for Mac OS.
• Released in 1982 for DOS, implementing HMM.
• Evolved from a discrete-utterance speech recognition engine into one that understands continuous speech.
• Requires high processing speed and heavy CPU usage.
• Includes an extensive training phase during initial installation.

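The performance of speech recognition can be measured in terms of accuracy and speed.
Accuracy can be measured in terms of the Word Error Rate (WER):

WER = (S + D + I) / N

where S is the number of substituted words, D the number of deleted words, I the number of inserted words, and N the number of words in the reference transcript.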
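S, D and I come from the minimum edit distance (Levenshtein) alignment between the reference and the recognized word sequences; because insertions are counted, WER can exceed 100%. A minimal sketch, assuming whitespace tokenization and no text normalization; `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N using a word-level edit-distance alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,     # match or substitution
                          d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1)           # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("it is not easy to recognize speech",
                      "it is not easy to wreck a nice beach"))
# 4/7: 2 substitutions + 2 insertions over 7 reference words
```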
Speed can be measured in terms of the Real Time Factor (RTF):

RTF = P / I

where P is the time taken to process an input and I is the duration of that input.
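An RTF below 1 means the recognizer runs faster than real time. A minimal sketch, assuming the input duration is derived from a sample count and sample rate; `measure_rtf` times an arbitrary stand-in function, since no real decoder is involved here.

```python
import time

def real_time_factor(processing_s, input_s):
    """RTF = P / I: processing time divided by the duration of the input audio."""
    return processing_s / input_s

def measure_rtf(decode, samples, sample_rate):
    """Time a (hypothetical) decoder on `samples` and return its RTF."""
    start = time.perf_counter()
    decode(samples)
    processing_s = time.perf_counter() - start
    return real_time_factor(processing_s, len(samples) / sample_rate)

print(real_time_factor(6.0, 12.0))   # 0.5: 12 s of audio processed in 6 s
print(measure_rtf(lambda s: sum(x * x for x in s), [0.1] * 16000, 16000))
```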

Requires high processor speed.

• Noise and Distortion Problem.
• Ambiguity Problem.
• Portability Problem.
• Out-of-Vocabulary Problem.

Ambiguity problems are of two types:

• Homophones

ONE ANALYSIS           ALTERNATIVE ANALYSIS
The tail of a dog      The tale of a dog
The sail of a boat     The sale of a boat

• Word boundary ambiguity

It's not easy to wreck a nice beach.
It's not easy to recognize speech.
It's not easy to wreck an ice beach.

HYBRID LANGUAGE MODEL


Used for detecting out-of-vocabulary words. The recognition vocabulary is partitioned into two sets: the first set contains the N most frequent words, and the second set contains the rest of the vocabulary. Each word that belongs to the second set is mapped to its pronunciation. For example, if the word "outfit" is in the second set, the sentence "What type of outfit do you have?" is transformed into "What type of /aw/ /t/ /f/ /ih/ /t/ do you have?", and the final dictionary contains the entry outfit → /aw/ /t/ /f/ /ih/ /t/ together with the individual phone units (such as /aw/ and /t/) as entries of their own.
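The partition-and-replace step can be sketched in a few lines. This is a toy illustration under stated assumptions: set 1 is chosen by word frequency in a small made-up training text, the one-entry pronunciation lexicon uses the phones /aw/ /t/ /f/ /ih/ /t/ for "outfit", and words that are neither frequent nor in the lexicon are simply kept. Both helper names are hypothetical.

```python
from collections import Counter

def split_vocabulary(corpus_words, n_frequent):
    """Set 1: the N most frequent words; set 2: the rest of the vocabulary."""
    ranked = [word for word, _ in Counter(corpus_words).most_common()]
    return set(ranked[:n_frequent]), set(ranked[n_frequent:])

def to_hybrid(sentence, frequent, lexicon):
    """Keep set-1 words; replace set-2 words by their phone sequence."""
    tokens = []
    for word in sentence.lower().split():
        if word in frequent or word not in lexicon:
            tokens.append(word)            # frequent (or unknown) word kept as is
        else:
            tokens.extend(f"/{phone}/" for phone in lexicon[word])
    return " ".join(tokens)

# Toy training text and pronunciation lexicon (illustrative only)
corpus = ("what type of shirt do you have what type of shoes do you have "
          "what type of outfit do you have").split()
frequent, rare = split_vocabulary(corpus, n_frequent=6)
lexicon = {"outfit": ["aw", "t", "f", "ih", "t"]}
print(to_hybrid("What type of outfit do you have", frequent, lexicon))
# what type of /aw/ /t/ /f/ /ih/ /t/ do you have
```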

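• Noise Power Subtraction Algorithm

Clean speech spectrum = Corrupted speech spectrum - Noise spectrum

The power spectrum of the noise is estimated and subtracted from the power spectrum of the corrupted speech to obtain an estimate of the clean speech.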
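This is commonly called power spectral subtraction. The sketch assumes the noise is stationary and that noise-only frames are available to estimate it (in practice they come from pauses in the speech); the 1% spectral floor, the toy signals and the helper names are hypothetical choices, and resynthesizing a waveform is omitted.

```python
import cmath
import math

def power_spectrum(frame):
    """|X[k]|^2 for k = 0..n-1 using a naive O(n^2) DFT (fine for a sketch)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]

def spectral_subtract(noisy_power, noise_frames_power, floor=0.01):
    """Power spectral subtraction with a spectral floor.

    The noise power is averaged over noise-only frames and subtracted bin by
    bin; results are floored at floor * noisy power so no bin goes negative.
    """
    n_frames = len(noise_frames_power)
    noise = [sum(frame[k] for frame in noise_frames_power) / n_frames
             for k in range(len(noisy_power))]
    return [max(p - q, floor * p) for p, q in zip(noisy_power, noise)]

# Toy frame: a "speech" tone in DFT bin 2 plus a stationary hum in bin 5
n = 16
tone = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
hum = [0.3 * math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [a + b for a, b in zip(tone, hum)]
clean = spectral_subtract(power_spectrum(noisy), [power_spectrum(hum)])
print(round(clean[2], 3), round(clean[5], 4))   # tone kept, hum reduced to the floor
```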


• Cepstral Mean Normalization Approach

# This approach is used here for larger vocabularies.
# The speech is parameterized using cepstral coefficients, and the mean of each cepstral coefficient over the utterance is subtracted from that coefficient.
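Mean subtraction helps because a fixed channel (microphone, telephone line) multiplies the speech spectrum, which becomes a constant additive offset in the cepstral domain, and subtracting the per-coefficient mean removes that offset. A minimal sketch, assuming the channel is constant over the utterance and using made-up two-coefficient cepstral vectors; `cepstral_mean_normalize` is a hypothetical helper.

```python
def cepstral_mean_normalize(frames):
    """Subtract each coefficient's mean (over all frames) from that coefficient."""
    n_frames, n_coeffs = len(frames), len(frames[0])
    means = [sum(frame[c] for frame in frames) / n_frames for c in range(n_coeffs)]
    return [[frame[c] - means[c] for c in range(n_coeffs)] for frame in frames]

# Toy cepstral vectors for three frames (illustrative numbers)
clean = [[1.0, 0.5], [3.0, -0.5], [2.0, 0.0]]
channel_offset = [0.8, -0.2]          # hypothetical constant channel effect
recorded = [[x + o for x, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_recorded = cepstral_mean_normalize(recorded)
print(normalized_clean)   # [[-1.0, 0.5], [1.0, -0.5], [0.0, 0.0]]
print(max(abs(a - b) for fa, fb in zip(normalized_clean, normalized_recorded)
          for a, b in zip(fa, fb)) < 1e-9)   # True: the channel offset is removed
```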

• UNIVERSAL TRANSLATOR:

Created by combining automatic translation with voice activation.

• It can also be used in fighter aircraft and helicopters.
• It can also be used in air traffic control.

Despite its sophistication and the high level of research behind it, speech recognition is still at a basic level. The main path forward for this technology is its merger with Natural Language Processing. In this age of automation, the prospects of speech processing are limitless.

