Speech Recognition
Guided by: Chinmaya Ku. Swain, Asst. Prof., Comp. Sci. Engg.
Presented by: Ashish Kumar (0811012097), 4th year (Comp. Sci. Engg.)
College: ITER (Bhubaneswar), India
Introduction to Speech Recognition
Steps of Speech Recognition
Speech Recognition Algorithm
Speech Recognition Software
Problems in Speech Recognition Techniques
Future Scope of Speech Recognition
Conclusion
Speech recognition is the technique of enabling a computer to understand human language. It was developed in the late 1960s by AT&T Bell Labs and follows a fundamental database approach.
The following phases make up speech recognition:
Phase I: Converting speech input into digital signals.
Phase II: Identifying the various digital signals and their meaning.
Phase III: Providing an appropriate response to the input speech.
Practical usage: It is used both by private users for daily ease of use and by commercial users to increase productivity and save time.
The Hidden Markov Model (HMM) is the most important and most commonly used algorithm. It is a statistical model, a simple form of dynamic Bayesian network, widely used in temporal pattern recognition for speech, handwriting, gestures, part-of-speech tagging, etc. The HMM used in speech recognition views the speech signal as a piecewise stationary (short-time stationary) signal over intervals of about 10 ms.
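The short-time stationarity idea above can be sketched by cutting a sampled signal into 10 ms frames. This is only an illustration: the function name `frame_signal`, the 16 kHz sample rate, and the use of non-overlapping frames are assumptions made here (practical front ends commonly use overlapping windows).

```python
def frame_signal(samples, sample_rate, frame_ms=10):
    """Split a list of samples into consecutive, non-overlapping frames.

    Each frame covers frame_ms milliseconds, the interval over which the
    signal is treated as stationary. Trailing samples that do not fill a
    whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# One second of dummy audio at an assumed 16 kHz sample rate.
signal = [0.0] * 16000
frames = frame_signal(signal, sample_rate=16000)
print(len(frames), len(frames[0]))
```

Each frame would then be converted to a feature vector, and the HMM treats the sequence of those vectors as its observations.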
MARKOV ASSUMPTION.
We don't get to observe the actual sequence of states; we can only observe some outcome generated by each state. It is a Markov model for which we have a series of observed outputs x = {x1, x2, ..., xT} drawn from an output alphabet V = {v1, v2, ..., vN}.
OUTPUT INDEPENDENCE ASSUMPTION
A new parameter, the matrix B, encodes the probability of the hidden state generating output vk given that the state at the corresponding time was sj.
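The two assumptions named above, and the meaning of B, can be written out as follows. The symbol z_t for the hidden state at time t is notation assumed here; x_t, v_k, s_j, and B follow the text.

```latex
\begin{align*}
% Markov assumption: the next state depends only on the current state
P(z_t \mid z_{t-1}, z_{t-2}, \dots, z_1) &= P(z_t \mid z_{t-1}) \\
% Output independence assumption: an output depends only on the state that emitted it
P(x_t \mid x_1, \dots, x_T, z_1, \dots, z_T) &= P(x_t \mid z_t) \\
% Entry (j, k) of the emission matrix B
B_{jk} &= P(x_t = v_k \mid z_t = s_j)
\end{align*}
```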
QUESTIONS OF HMM.
1. Probability of an observed sequence.
2. Most likely series of states to generate the observation.
3. Learning the values of the HMM parameters (A, B).
FORWARD PROCEDURE: Computes the probability of an observed sequence.
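A minimal sketch of the forward procedure follows. The initial state distribution `pi` and the two-state toy model are assumptions introduced for illustration; only the matrices A (transitions) and B (emissions) come from the text.

```python
def forward(obs, pi, A, B):
    """Return the probability of the observation sequence obs.

    obs -- list of output indices (k for symbol v_k)
    pi  -- pi[j]: probability of starting in state s_j (assumed here)
    A   -- A[i][j]: probability of moving from s_i to s_j
    B   -- B[j][k]: probability that s_j emits v_k
    """
    n = len(pi)
    # alpha[j] = P(x_1..x_t, state at time t is s_j)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][x]
                 for j in range(n)]
    # Summing over the final state gives the sequence probability.
    return sum(alpha)


# Hypothetical two-state model with three output symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward([0, 1, 2], pi, A, B))
```

The cost grows linearly with the sequence length, instead of exponentially as it would if every possible state sequence were enumerated.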
VITERBI ALGORITHM: Maximum-likelihood state assignment. It works just like the forward procedure, except that instead of tracking the total probability we track only the maximum probability and record its corresponding state sequence.
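The Viterbi idea can be sketched in the same style: the sum of the forward procedure becomes a max, and a back-pointer remembers which predecessor achieved it. As before, `pi` and the toy model are illustrative assumptions.

```python
def viterbi(obs, pi, A, B):
    """Return (most likely state sequence, its joint probability with obs)."""
    n = len(pi)
    # delta[j] = probability of the best path that ends in s_j at time t
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]
    backptrs = []
    for x in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            # Best predecessor of s_j: max over i replaces the forward sum.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][x])
        backptrs.append(step)
        delta = new_delta
    # Start from the best final state and follow the back-pointers.
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for step in reversed(backptrs):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state model with three output symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi([0, 1, 2], pi, A, B)
print(path, prob)
```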
EXPECTATION MAXIMISATION ALGORITHM: Parameter learning for the HMM, that is, determining the values of the matrices A and B.
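For HMMs, expectation maximisation is usually carried out with the Baum-Welch (forward-backward) procedure. The sketch below performs one re-estimation step for a single observation sequence. The initial distribution `pi` is an assumption and is held fixed, and no numerical scaling is applied, so it is only suitable for short sequences.

```python
def forward_backward(obs, pi, A, B):
    """Return the forward (alpha) and backward (beta) probability tables."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for j in range(n):
        alpha[0][j] = pi[j] * B[j][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta


def baum_welch_step(obs, pi, A, B):
    """One EM step: return (new_A, new_B, likelihood under the old A and B)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(obs, pi, A, B)
    likelihood = sum(alpha[T - 1])
    # E-step. gamma[t][i]: probability of being in s_i at time t given obs.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition s_i -> s_j at time t given obs.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step. Expected counts are normalised into probabilities.
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, likelihood


# Hypothetical starting guess and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0, 0, 1]
for _ in range(5):
    A, B, likelihood = baum_welch_step(obs, pi, A, B)
    print(likelihood)
```

Repeating the step never lowers the likelihood of the training sequence, which is the usual stopping signal: iterate until the improvement becomes negligible.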
valid response. The identified speech command is dynamically compared with the list of pre-defined commands present in the software database. The software libraries must be kept up to date in order to provide optimal performance. For better performance of the recognition algorithm, adaptive modification may be done, where each of the pre-defined commands is modified to match the accent, dialect, and speech mannerisms of the user during the testing phase.
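The comparison against a list of pre-defined commands can be sketched with approximate string matching from Python's standard `difflib` module. The command list, the function name `match_command`, and the 0.6 similarity cutoff are all hypothetical; a real system would match against its own database.

```python
import difflib

# Hypothetical command list standing in for the software database.
COMMANDS = ["open file", "close file", "save file", "print document"]


def match_command(recognized, commands=COMMANDS, cutoff=0.6):
    """Return the pre-defined command closest to the recognized text.

    Returns None when no command is similar enough, so the caller can
    ask the user to repeat the command instead of guessing.
    """
    matches = difflib.get_close_matches(recognized.lower(), commands,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("opin file"))
print(match_command("xyzzy"))
```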
Natural Language Processing (NLP) is a range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications. It enables a computer to process human speech and provide a corresponding human-like output without implementing any additional database of responses. Being a concept within the scope of Artificial Intelligence, it aims to give computers the ability to produce near-human responses.
Paraphrase an input text: Convert a given command string in text to a simpler, structured form. In the case of speech recognition, the speech input, after being identified by the recognition algorithms, is simplified into a structured format that the computer system understands.
Process the input text: Process the simplified text command to form a set of output results.
Determine an appropriate response: Draw inferences from the text output and produce a number of response statements.
Owing to the growing popularity of, and need for, a better interface to computer systems, a number of speech recognition software packages are on the market. The most common among them are:
Windows Speech Recognition by Microsoft
Dragon NaturallySpeaking by Nuance
Developed by Microsoft. Included in the Windows Vista/7 operating systems. Integrates the Microsoft text-to-speech API with Microsoft Speech API 5.3. Includes a prototype Microsoft Dictation package. The latest development is its use in the Microsoft Bing search engine through Microsoft Speech Recognition 8.0.
Developed and sold by Nuance Communications for Windows OS. Preceded by DragonDictate for Mac OS. Released in 1982 for DOS, implementing HMM. Based on a Discrete Utterance Speech Recognition Engine to understand continuous speech. Requires high processing speed and heavy CPU usage. Integrates a heavy training phase into the initial installation stage.
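Performance of speech recognition can be measured in terms of accuracy and speed.
Accuracy can be measured in terms of Word Error Rate (WER).
WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference.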
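A short sketch of computing WER follows. It assumes whitespace-separated words and a non-empty reference, and it obtains the minimum S + D + I as the word-level edit distance between the reference and the recognized text; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N for a reference and a recognized sentence."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    # N is the number of words in the reference (assumed non-empty).
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors in 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1 when the recognizer outputs many more words than the reference contains.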
Speed can be measured in terms of Real Time Factor (RTF).
RTF = P / I, where P is the time taken to process an input of duration I.
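As a small illustration, the real time factor can be measured by timing a recognizer call. The `recognize` callable and the helper names are hypothetical stand-ins; any function that processes the audio could be passed in.

```python
import time


def real_time_factor(processing_seconds, input_seconds):
    """RTF = P / I. A value below 1 means faster than real time."""
    return processing_seconds / input_seconds


def timed_rtf(recognize, audio, input_seconds):
    """Time one call to a recognizer (hypothetical) and return its RTF."""
    start = time.perf_counter()
    recognize(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, input_seconds)


# Processing 4 seconds of speech in 2 seconds gives an RTF of 0.5.
print(real_time_factor(2.0, 4.0))
```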
Noise Problem and Distortion. Ambiguity Problem. Portability Problem. Out-of-Vocabulary Problem.
This is used in the case of a larger vocabulary.
The speech is parameterized using cepstral parameters, in which the mean is subtracted from each cepstral element.
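The mean subtraction described above can be sketched as follows, assuming the mean of each cepstral coefficient is taken over all frames of one utterance (the usual form of cepstral mean subtraction). The function name and the tiny input are illustrative.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean from every frame.

    frames -- list of frames, each a list of cepstral coefficients of
              equal length (assumed already computed elsewhere).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across the whole utterance.
    means = [sum(frame[c] for frame in frames) / n_frames
             for c in range(n_coeffs)]
    return [[frame[c] - means[c] for c in range(n_coeffs)]
            for frame in frames]


# Two frames with two coefficients each; the means are 2.0 and 4.0.
print(cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]]))
```

Removing the mean cancels constant offsets in the cepstral domain, such as those introduced by a fixed microphone or channel, which is one way of addressing the noise and distortion problem.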
UNIVERSAL TRANSLATOR
Even with considerable sophistication and a high level of research, Speech Recognition is still at its base level. The main implementation of this technology is through its merger with Natural Language Processing. In this Age of Automation, the prospects of Speech Processing are limitless.
Speech and Language Processing by Jurafsky and Martin.
Spoken Language Processing by Xuedong Huang.
Hidden Markov Models Fundamentals by Daniel Ramage.
Speech Recognition Technology: A Critique by Stephen E. Levinson.