1812082 1812088
Abstract—Speech recognition is the process of enabling computers to identify and respond to human speech sounds. This project examines the feature extraction techniques applied in speech recognition; although many techniques exist, recognition accuracy remains a central concern. In this article we present some well-known feature extraction techniques, namely LPC, MFCC, RASTA, PCA, LDA, and PLP, which are among the most widely used in the speech recognition process.
Index Terms—Speech Recognition, Feature Extraction, LPC, RASTA, MFCC, PCA, LDA, PLP
I. INTRODUCTION
Speech is one of the oldest ways to express ourselves. Today, speech signals are also used in biometric recognition technologies and for communicating with machines. Speech signals are slowly time-varying (quasi-stationary) signals. When examined over a sufficiently short period of time (5-100 msec), their characteristics are fairly stationary. Over longer periods, however, the signal characteristics change, reflecting the different speech sounds being spoken. The information in a speech signal is represented by the short-term amplitude spectrum of the speech waveform, which allows us to extract features based on that short-term amplitude spectrum. Speech recognition is the ability of a machine or program to identify words and phrases from spoken language and convert them into a machine-readable format. The main intention of the speech recognition field is to develop techniques and systems for speech input to machines. Speech is the primary means of communication among human beings, and it is the dominance of this medium that motivates research efforts to make speech a viable means of human-computer interaction. Feature extraction is the process of removing unwanted and redundant information and retaining only the information that is useful for speaker-independent automatic speech recognition.
Fig. 1. Feature Extraction
II. LITERATURE SURVEY
Ganga Banavath and Sreedhar Potla aimed to describe an efficient technique for effective speech processing applications. The performance of a classifier depends on factors such as the length of the speech sample and the environment. They carried out their work using Mel frequency cepstral coefficients (MFCC) along with a Gaussian Mixture Model (GMM) classifier.
Kishori R. Ghule and R. R. Deshmukh noted that speech recognition follows two steps: feature extraction and word recognition. After feature extraction, feature matching is performed for word recognition. Their paper describes different feature extraction techniques such as MFCC, LPC, LPCC, and DWT.
P. V. Naresh examined the feature extraction techniques applied in speech recognition; although many techniques exist, the accuracy percentage is a key issue in speech recognition. The article presents some well-known extraction techniques such as LPC, MFCC, RASTA, PCA, LDA, and PLP, and identifies the most widely used feature extraction techniques in the speech recognition process.
Urmila Shrawankar observed that features should describe each segment in such a characteristic way that other similar segments can be grouped together by comparing their features. There are many interesting and exceptional ways to describe the speech signal in terms of parameters.
Anup Vibhute focused on a survey of various feature extraction techniques in speech processing, such as Fast Fourier Transforms, Linear Predictive Coding, Mel Frequency Cepstral Coefficients, Discrete Wavelet Transforms, Wavelet Packet Transforms, and the hybrid algorithm DWPD, and their applications in speech processing.
III. FEATURE EXTRACTION TECHNIQUES
A. Linear Prediction Coding (LPC)
LPC is a well-established signal analysis method based on linear prediction in the speech recognition process. Feature extraction techniques find the basic parameters of speech, and LPC is a powerful method for determining these basic parameters and a computational model of speech. The idea behind LPC is that a speech sample can be approximated as a linear combination of past speech samples [1].
B. Mel Frequency Cepstral Coefficients (MFCC)
MFCC is the most popular feature extraction technique. Its frequency bands are spaced logarithmically, so it approximates the response of the human auditory system more closely than linearly spaced frequency bands do. Because of the low implementation complexity of the feature extraction algorithm, only sixteen MFCC coefficients corresponding to the Mel-scale frequencies of the speech cepstrum are extracted from the spoken word samples in the database [2][3].
E. Wavelet Packet Decomposition (WPD)
Wavelet packet decomposition is a wavelet transform in which the signal is passed through more filters than in the discrete wavelet transform. Wavelet packets are linear combinations of wavelets. The coefficients in the linear combinations are computed by a recursive algorithm, making each newly computed wavelet packet coefficient sequence the root of its own analysis tree [5].
F. Perceptual Linear Prediction (PLP)
The Perceptual Linear Prediction (PLP) technique was developed by Hermansky. PLP removes unwanted information from the speech and thus improves the speech recognition rate. PLP is identical to LPC except that its spectral characteristics have been transformed to match the characteristics of the human auditory system.
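The linear-prediction idea from Section III-A, where a sample s[n] is approximated as a_1*s[n-1] + ... + a_p*s[n-p], can be illustrated with a minimal sketch. It assumes the autocorrelation method and NumPy; the function name lpc_coefficients, the prediction order, and the synthetic test signal are illustrative choices and are not taken from the surveyed papers.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate a_1..a_p such that s[n] ~ sum_k a_k * s[n-k].

    Uses the autocorrelation method: solve the Toeplitz normal
    equations R a = r, where r[k] is the lag-k autocorrelation.
    """
    frame = np.asarray(frame, dtype=np.float64)
    n = frame.size
    # Autocorrelation values r[0], ..., r[order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz matrix built from r[0..order-1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthetic signal (an assumption for illustration): a stable
# second-order autoregressive process driven by white noise.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.standard_normal(4000)
s = np.zeros(4000)
for i in range(2, s.size):
    s[i] = true_a[0] * s[i - 1] + true_a[1] * s[i - 2] + noise[i]

est = lpc_coefficients(s, order=2)
print("true coefficients:     ", true_a)
print("estimated coefficients:", est)
```

On real speech the same estimate would normally be computed per short frame, since the signal is only quasi-stationary, and the Levinson-Durbin recursion is the usual, more efficient way to solve the same equations.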
IV. SPEECH RECOGNITION TECHNIQUES
Fig. 4. Audio waveforms with their labels
8) We convert the waveform into a spectrogram, which shows frequency changes over time and can be represented as a 2D image. This can be done by applying the short-time Fourier transform (STFT) to convert the audio into the time-frequency domain.
11) We now run the model on the test set and check its performance. We observe an accuracy of 90%.
12) Finally, we verify the model's prediction output using an input audio file of someone saying "no." We can see that our model clearly recognized the audio command as "no." Because the 'no' and 'go' audio commands sound very similar, the rate at which 'no' is predicted as 'go' is high. Even humans can make mistakes differentiating between the two.
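The waveform-to-spectrogram conversion described in step 8 can be sketched as follows. This is a NumPy-only illustration, not the code used in the project. The frame length of 256 samples, the step of 128 samples, the Hann window, and the 16 kHz synthetic tone are all assumptions made for the example.

```python
import numpy as np

def spectrogram(waveform, frame_length=256, frame_step=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (num_frames, frame_length // 2 + 1):
    one row per time frame, one column per frequency bin.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    # Zero-pad very short clips so at least one full frame exists
    if waveform.size < frame_length:
        waveform = np.pad(waveform, (0, frame_length - waveform.size))
    num_frames = 1 + (waveform.size - frame_length) // frame_step
    window = np.hanning(frame_length)
    # Slice overlapping frames and taper each one with the window
    frames = np.stack([
        waveform[i * frame_step : i * frame_step + frame_length] * window
        for i in range(num_frames)
    ])
    # FFT of each frame; keep magnitudes only
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical input: one second of a 1 kHz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_bin, peak_hz)
```

The resulting 2D array can be treated as a single-channel image and passed to an image-style classifier, which is the role the spectrogram plays in the pipeline described above.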
Fig. 7. Confusion Matrix
VII. CONCLUSION
We have successfully designed a speech recognition model that has been trained to identify eight different words spoken by the user. We observe that the model achieves an accuracy of about 90% when given around 8000 audio files. This model will eventually give better results when provided with more audio files and more time to train the model.
REFERENCES
[1] Leena R. Mehta, S. P. Mahajan, and Amol S. Dabhade, "Comparative Study of MFCC and LPC for Marathi Isolated Word Recognition System," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 2, no. 6, June 2013.
[2] Vimal Krishnan V. R. and Babu Anto P., "Features of Wavelet Packet Decomposition and Discrete Wavelet Transform for Malayalam Speech Recognition," International Journal of Recent Trends in Engineering, vol. 1, no. 2, May 2009.
[3] Hazrat Ali, Nasir Ahmad, Xianwei Zhou, Khalid Iqbal, and Sahibzada Muhammad Ali, "DWT features performance analysis for automatic speech recognition of Urdu," SpringerPlus, 2014.
[4] Nidhi Desai, Kinnal Dhameliya, and Vijayendra Desai, "Feature Extraction and Classification Techniques for Speech Recognition: A Review," International Journal of Emerging Technology and Advanced Engineering, vol. 3, no. 12, December 2013.
[5] M. A. Anusuya and S. K. Katti, "Comparison of Different Speech Feature Extraction Techniques with and without Wavelet Transform to Kannada Speech Recognition," International Journal of Computer Science and Information Security, vol. 6, no. 3, 2010.
[6] Santosh K. Gaikwad and Bharti W. Gawali, "A Review on Speech Recognition Technique," International Journal of Computer Applications, vol. 10, no. 3, November 2010.
[7] Shivanker Dev Dhingra, Geeta Nijhawan, and Poonam Pandit, "Isolated Speech Recognition Using MFCC and DTW," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 2, no. 8, August 2013.
[8] M. A. Anusuya, "Speech Recognition by Machine," International Journal of Computer Science and Information Security, vol. 6, no. 3, 2009.
[9] Nidhi Srivastava and Harsh Dev, "Speech Recognition using MFCC and Neural Networks," International Journal of Modern Engineering Research (IJMER), March 2007.