
Speech Recognition Using MATLAB

Project leader: Anurag Kumar Dwivedi

Team members: Gaurav Srivastava

Nitin Banarasi

Ram Prakash Mishra

Objective:
This project aims to develop a system that can recognize a person on the
basis of the words they speak.

Technical details:
Speech recognition is a multilevel pattern recognition task in which
acoustic signals are examined and structured into a hierarchy of subword
units (e.g., phonemes), words, phrases, and sentences. Each level may
provide additional temporal constraints, e.g., known word pronunciations or
legal word sequences, which can compensate for errors or uncertainties at
lower levels. This hierarchy of constraints is best exploited by
combining decisions probabilistically at all lower levels and making discrete
decisions only at the highest level. Three basic parameters govern speech
recognition:
• Energy (amplitude)
• Spectral parameters
• Fundamental frequency
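Of the three parameters listed above, energy and fundamental frequency are the simplest to illustrate. The following is a minimal sketch in Python (the project itself used MATLAB), assuming a single frame of samples, an 8 kHz sampling rate, and a 50 to 400 Hz pitch search range; the function names and the autocorrelation method are our own illustrative choices, not necessarily what the project implemented. Spectral parameters are not shown here.

```python
import math

def short_time_energy(frame):
    """Energy of one frame: the sum of its squared samples."""
    return sum(s * s for s in frame)

def estimate_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    f_min and f_max are assumed search limits, not values from the project.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic test frame: a 200 Hz sine sampled at 8 kHz (assumed values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
energy = short_time_energy(frame)
f0 = estimate_f0(frame, sample_rate)
```

On real speech the autocorrelation peak is far less clean than for a pure sine, so a practical system would add voicing detection and smoothing on top of this.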
Speech contains voiced, unvoiced, and transient excitation, so a speech
recognition system first separates the vocal tract response from the
excitation. For this we use MFCCs (mel-frequency cepstral coefficients). The
speech fed into the system is first sampled and digitized. The digitized
signal is then passed through a pre-emphasis filter, which removes the
unwanted spectral slope caused by the excitation. The filtered signal is
then windowed so that only a short segment is analyzed at a time. Because
the window is moved along the signal at each step, successive segments
overlap, and these overlapping segments are used for recognition. Cepstral
liftering, which is analogous to filtering a time signal by manipulating it
in the frequency domain, produces a smooth spectrum without ripple. Because
the power spectrum is real-valued and symmetric, we use the cosine
transform.

To recognize speech, we present an input to the system and compute the
difference between the actual input and the desired (reference) input. The
smaller this difference, the more accurate the recognition. Several
distance measures can be used for this purpose. For pattern matching we use
dynamic programming (DP), known in this context as dynamic time warping
(DTW). DP finds the optimal time warping function needed to compare two
vector sequences X and M. At the same time, the distance D(X,M) between the
vector sequences is computed. The vector sequence with the smallest
distance is selected for further analysis.
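The pre-emphasis, windowing, and DTW steps described above can be sketched as follows. This is an illustrative Python sketch rather than the project's MATLAB code. The pre-emphasis coefficient 0.97, the Hamming window, the Euclidean local distance, and the three-way step pattern are common choices that we assume here; the source does not specify them. The mel filter bank and cosine transform stages of MFCC extraction are omitted, and all function names and the toy templates are hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming_frames(signal, frame_len, hop):
    """Split a signal into frames scaled by a Hamming window.

    Frames overlap whenever hop < frame_len.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def euclidean(u, v):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(X, M):
    """Accumulated distance D(X, M) along the optimal warping path."""
    inf = float("inf")
    D = [[inf] * (len(M) + 1) for _ in range(len(X) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(X) + 1):
        for j in range(1, len(M) + 1):
            cost = euclidean(X[i - 1], M[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(X)][len(M)]

# Toy one-dimensional feature sequences standing in for MFCC vectors.
templates = {
    "slow": [[0.0], [0.0], [1.0], [1.0], [2.0]],
    "other": [[2.0], [1.0], [0.0]],
}
unknown = [[0.0], [1.0], [2.0]]
best = min(templates, key=lambda name: dtw_distance(unknown, templates[name]))
```

Here the unknown sequence matches the "slow" template despite the difference in length, because the warping path is allowed to stretch the time axis.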
The high variability of speech patterns requires a large number of
prototypes to adequately model a given class, and this variability is
extremely high for speaker-independent tasks. Hence we use statistical
modeling of the speech production process, with HMMs (hidden Markov models)
for the stochastic modeling. First, all objects are divided into classes,
and a stochastic model is constructed for each class. For a given unknown
vector sequence, the model-specific emission probability density is
computed, and the sequence is assigned to the stochastic model with the
highest probability density.

Each whole-word model requires a huge amount of training data, and changing
the vocabulary requires recording new speech samples. Hence we build the
word models from subword units, so that a limited number of subword units
can be trained using a limited amount of speech data. Candidate word
sequences often differ only in the permutations of the most ambiguous
words. To handle this, we use word graphs.
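The classification rule described above, in which one stochastic model is built per class and the unknown sequence is assigned to the best-scoring model, can be sketched as follows. This Python sketch makes a simplifying assumption: observations are discrete symbols with emission probabilities, whereas the text refers to emission probability densities over feature vectors. The two word models and all their numbers are hypothetical.

```python
def forward_likelihood(observations, start, trans, emit):
    """P(observations | model), computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

def classify(observations, models):
    """Return the name of the highest-scoring model and all scores."""
    scores = {name: forward_likelihood(observations, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical two-state word models over the symbols {0, 1}.
models = {
    "word_a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}
label, scores = classify([0, 0, 1, 1], models)
```

For long observation sequences the raw probabilities underflow, so a practical implementation would work with scaled or logarithmic values.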

Innovativeness & Usefulness:


In this project we have used HMMs. HMMs have a variety of
applications. When an HMM is applied to speech recognition, the states are
interpreted as acoustic models indicating what sounds are likely to be heard
during their corresponding segments of speech; while the transitions provide
temporal constraints, indicating how the states may follow each other in
sequence. Because speech always goes forward in time, transitions in a
speech application always go forward (or make a self-loop, allowing a state
to have arbitrary duration).
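The forward-only transition structure described above can be sketched as a transition matrix in which each state may only loop on itself or advance to the next state. This is an illustrative Python sketch; the function name and the self-loop probability of 0.6 are assumptions made for the example, since in practice these values are estimated from training data.

```python
def left_to_right_transitions(n_states, self_loop=0.6):
    """Transition matrix allowing only a self-loop or a step to the next state."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        if s == n_states - 1:
            trans[s][s] = 1.0  # the final state can only loop on itself
        else:
            trans[s][s] = self_loop            # stay in the same state
            trans[s][s + 1] = 1.0 - self_loop  # move forward in time
    return trans

A = left_to_right_transitions(4)
```

All entries below the diagonal are zero, so the model can never move backward in time, while the self-loop lets a state last for any number of frames.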

Tools used:
We have used MATLAB R2008b.
