
Perceptual Linear Prediction Feature

as an Indicator of Dysphonia

Jennifer C. Saldanha and Malini Suvarna

Abstract Voice is the most widely used form of human communication, and the analysis of the human voice has long been an area of great significance, with applications in both medicine and engineering. Such analysis generally involves extracting parameters of the voice signal for investigation and processing. This work proposes a system that diagnoses patients from their speech signals. It includes the detection and classification of certain common medical conditions which affect the voice patterns of patients. The feature vectors of the speech samples are obtained using the perceptual linear prediction (PLP) and relative spectral transform perceptual linear prediction (RASTA–PLP) feature extraction methods. The detection and classification of voice pathology are performed using the support vector machine (SVM) classifier. The accuracy of the classifier is computed using speech samples of gastroesophageal reflux disease (GERD) and vocal fold paralysis from a pre-classified database.

Keywords Mel frequency cepstral coefficients · Linear prediction cepstral coefficients · Perceptual linear prediction · Relative spectral transform perceptual linear prediction · Support vector machine

1 Introduction

The transfer of information from one individual to another through speech is called speech communication. Speech is essentially a variation in the air pressure coming from the speaker's mouth. These pressure changes propagate through the air as waves and enter the ears of the listener. The listener decodes these wave

J. C. Saldanha (B)
St. Joseph Engineering College, Affiliated to Visvesvaraya Technological University Belagavi,
Mangalore, Karnataka, India
e-mail: jennifers@sjec.ac.in
M. Suvarna
Mangalore Institute of Technology and Engineering, Affiliated to Visvesvaraya Technological
University Belagavi, Moodbidri, Karnataka, India

© Springer Nature Singapore Pte Ltd. 2020


V. I. George and B. K. Roy (eds.), Advances in Control Instrumentation
Systems, Lecture Notes in Electrical Engineering 660,
https://doi.org/10.1007/978-981-15-4676-1_5

signals into a meaningful message. The term “speech” denotes the sounds that human beings make in order to communicate with one another. The human voice is adversely affected by numerous medical conditions, which primarily arise in the vocal system of the individual. Many tools are available to detect pathology in speech [1]. These tools are either invasive, and therefore potentially harmful, or require the services of an expert to analyze the speech signal features. Hence, an accurate and reliable non-invasive tool is required for the assessment of speech pathology, and the system should recognize speech abnormalities automatically. In general, a good dysphonia detection and classification technique determines whether a speech sample is normal or disordered. If the speaker’s voice is pathological, the system should then categorize the pathology as gastroesophageal reflux disease (GERD) or vocal fold paralysis.
In medical terminology, “dysphonia” means a disorder of voice. It is a variation in the speech signal produced by a disordered larynx during phonation. Two common voice disorders and their symptoms are:
1. GERD: This disorder affects the lower esophageal sphincter (LES) of the digestive system. The LES is a ring of muscle between the esophagus and the stomach.
2. Vocal fold paralysis: Paralysis is the loss of muscle function in a particular part of the body. If one or both vocal folds are unable to move, the person experiences voice problems and may also suffer from breathing and swallowing difficulties.

2 Related Work

In recent years, many researchers have worked on acoustic analysis of speech for automatic recognition of voice dysphonia in patients. This work mainly involves finding the most informative voice features for estimating voice quality and then identifying a classifier to detect the pathology. Pathological voices are mainly characterized by pitch jitter, shimmer, and noise. In the work by Davis [2], the average fundamental frequency is used as a speech feature to identify the presence of a voice disorder. In the work by Kasuya et al. [3], pitch jitter and shimmer characteristics are measured from pathological voices to identify voice pathology. Estimating noise parameters is important in pathological voice analysis, as most disordered voices contain a high degree of noise, resulting in hoarseness. Michaelis et al. [4] use the glottal-to-noise excitation ratio as a feature indicating the presence of pathology in the larynx. The harmonic-to-noise ratio is used as a speech feature in [5–7], along with normalized noise energy and the critical band energy spectrum, as indicators of voice and laryngeal pathology.
Cepstral coefficients of sustained vowel phonation are widely used as parameters to detect and classify normal voice versus pathological voice. Variations in the glottal waveform of sustained vowels are detected easily from the cepstral coefficients. The speech samples are de-convolved into source and system components

by cepstral analysis. The cepstral coefficients are largely uncorrelated with each other and provide a high degree of data compression. Mel frequency cepstral coefficients and linear prediction coefficients have been used to discriminate pathological voices from normal ones. A linear discriminant analysis (LDA) classifier, principal component analysis (PCA) with a minimum distance classifier (MDC), PCA with a k-nearest neighbor (k-NN) classifier, and PCA with an LDA classifier have been implemented, with an accuracy of up to 93% achieved by the PCA + LDA combination. The lowest classification accuracy is obtained with PCA + MDC compared with the rest of the classifiers [8]. Hariharan et al. [9] proposed two new parameters, based on the wavelet packet transform and singular value decomposition, for the detection of dysphonia. The k-means algorithm is used to classify the extracted parameters. The speech signals for that study are taken from the Massachusetts Eye and Ear Infirmary (MEEI) database and the MAPACI database. The accuracy of the k-means classifier is compared with four other classifiers, namely k-nearest neighbor (k-NN), probabilistic neural network, least-squares support vector machine, and general regression neural network. The classification accuracy shows that the proposed parameters are effective in classifying the pathology, and an accuracy of 100% is achieved on both the MEEI and MAPACI voice disorder databases. In the work by Godino-Llorente and Gomez-Vilda [10], mel frequency cepstral coefficients are used as parameters with classifiers such as the multilayer perceptron network and learning vector quantization for voice pathology detection. The architectures developed by the authors detect voice disorders such as glottal cancer under highly reliable conditions. Under identical working conditions, the learning vector quantization technique showed more reliability than the multilayer perceptron technique, yielding an accuracy of 96% in detecting the pathology. Godino-Llorente et al. [11] used mel frequency cepstral coefficient features along with their first and second derivatives as parameters to identify voice disorders. The F-ratio, Fisher discriminant ratio, and Gaussian mixture models are used for optimization of the feature vectors and for classification, achieving a classification accuracy of 96%. In the work proposed by Dibazar et al. [12], mel frequency cepstral coefficients and pitch dynamics are extracted from 700 subjects of normal and different pathology cases from the MEEI database to evaluate their proposed system. These feature sets are modeled by Gaussian mixtures using a hidden Markov model (HMM) classifier, with sustained phoneme /a/ data taken for evaluation. The proposed method gave an accuracy of 99.44% in classifying normal and pathological speech, an 8% improvement in detection error rate over the best-performing classifiers. The analysis was done using carefully measured features prevalent in modern pathological speech analysis. Hirano et al. [13] conducted acoustic analysis of the pathological voices of each patient before and after voice treatment. The acoustic parameters used in the analysis are the amplitude perturbation quotient (APQ), pitch perturbation quotient (PPQ), and normalized noise energy (NNE). The results are summarized as follows: the PPQ and APQ parameters are greater in cases with paralysis than in cases with carcinoma and polyp, but these parameters are not useful for discriminating among the three disease groups under investigation, i.e., paralysis, carcinoma, and polyp. All three parameters correlate with roughness, breathiness, quality and level of hoarseness, rate of airflow, and periodicity of the vocal fold vibration. Zhang and Jiang [14] used the acoustic characteristics of sustained and running vowels to extract parameters such as shimmer, jitter, signal-to-noise ratio (SNR), correlation dimension, and second-order entropy. The analysis of jitter, shimmer, correlation dimension, and second-order entropy for sustained vowels showed a noteworthy distinction between normal and pathological voice samples. For the running voice samples, the jitter and shimmer parameters did not statistically distinguish between normal and disordered voices, but the SNR, correlation dimension, and second-order entropy together gave a remarkable contrast between pathological and normal voices. The MFCC feature set is used with a GMM classifier by Pravena et al. [15] for voice classification; a GMM with 16 mixtures obtained good classification accuracy between normal and pathological samples.
Perceptual linear prediction (PLP) and relative spectral transform perceptual linear prediction (RASTA–PLP) are widely used in the literature for speech signal parameterization and recognition. These features are used in this study to test their effectiveness in detecting the presence of pathology from speech samples.

3 Methodology

The speech signal to be evaluated is given as input to the system. PLP features are extracted to detect whether the speech sample is normal or pathological. The extracted features are given to a trained SVM classifier, which classifies the speech sample as either normal or pathological. If the speech sample is found to be pathological, RASTA–PLP features are extracted and given as input to a second trained SVM classifier, which classifies the speech sample as either a paralysis sample or a GERD sample.
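The two-stage decision flow above can be sketched as follows. This is an illustrative outline only: `detector`, `pathology_classifier`, `plp`, and `rasta_plp` are hypothetical stand-ins for the two trained SVM models and the two feature extractors, assumed to follow a scikit-learn-style `predict` interface.

```python
def classify_sample(signal, detector, pathology_classifier, plp, rasta_plp):
    """Return 'normal', 'paralysis', or 'GERD' for one speech sample.

    detector / pathology_classifier: trained two-class models with a
    sklearn-style predict() method; plp / rasta_plp: feature extractors.
    """
    # Stage 1: PLP features decide whether the sample is pathological.
    if detector.predict([plp(signal)])[0] == "normal":
        return "normal"
    # Stage 2: RASTA-PLP features separate paralysis from GERD.
    return pathology_classifier.predict([rasta_plp(signal)])[0]
```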

4 Implementation

4.1 Windowing

The speech signal input for this work is the sustained phonation of the vowel /a/ from normal and pathological classes. Each sample is resampled at 8 kHz and windowed using a Hamming window of 20 ms duration, producing speech frames of the same duration. An overlap of 50% is used during windowing in order to maintain signal continuity. Figure 1 shows a windowed speech frame of a normal sample, and Figs. 2 and 3 show windowed frames of the paralysis and GERD pathologies, respectively. The glottal excitation for voiced speech is a train of quasi-periodic pulses. Hence, a periodic pattern is observed in the normal speech frame in Fig. 1, whereas the vocal fold paralysis and GERD frames in Figs. 2 and 3, respectively, show no periodicity, owing to the non-periodic vibration of the vocal folds caused by the pathology.

[Fig. 1: Windowed normal speech sample for sustained phonation of vowel /a/]
[Fig. 2: Windowed speech frame of a paralysis sample for sustained phonation of vowel /a/]
[Fig. 3: Windowed speech frame of a GERD sample for sustained phonation of vowel /a/]
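The framing step described above can be sketched as follows (a minimal illustration assuming NumPy; at 8 kHz, a 20 ms frame is 160 samples and a 50% overlap gives a hop of 80 samples):

```python
import numpy as np

def frame_signal(x, fs=8000, frame_ms=20, overlap=0.5):
    """Split signal x into Hamming-windowed frames with the given overlap."""
    frame_len = int(fs * frame_ms / 1000)      # 160 samples at 8 kHz / 20 ms
    hop = int(frame_len * (1 - overlap))       # 80 samples for 50% overlap
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])
```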

4.2 PLP Estimation

The PLP algorithm [16] is based on the psychophysics of human hearing. The PLP model improves speech recognition rates by modeling speech on the basis of auditory perception: the frequency characteristics are transformed to follow those of the basilar membrane in the cochlea of the human ear. The steps involved in PLP computation are shown in Fig. 4. The power spectrum of the windowed speech sample is convolved with a bark filter bank of 18 triangular filters, simulated according to the critical band masking curve. To compensate for the nonlinear sensitivity of the human ear at different frequencies, the convolved spectrum is pre-emphasized using the equal-loudness curve. To suppress amplitude variations across frequency bands and to model the nonlinear relationship between intensity and loudness, cubic-root compression of the filter energies is performed. The final steps of PLP are to obtain the linear prediction coefficients and then convert them to cepstral coefficients.

[Fig. 4: PLP computation. Speech → Hamming window → |FFT|² → Bark filter bank → Equal-loudness pre-emphasis → Intensity–loudness power law → IDFT → Autoregressive modeling → PLP coefficients]

[Fig. 5: Bark filter bank with 18 critical bands]

In the PLP feature, the properties of human perception are approximated using practical simulations, and the resulting auditory spectrum is fitted with an autoregressive all-pole model, which models the human vocal tract in speech production. The conversion of frequency from hertz to a nonlinear scale such as bark closely represents the frequency resolution of human hearing and is given by

\Omega(\omega) = 6 \ln\left[ \frac{\omega}{1200\pi} + \left( \left( \frac{\omega}{1200\pi} \right)^{2} + 1 \right)^{0.5} \right] \qquad (1)

Here, ω = 2πf is the frequency in radians per second, and f is the frequency in hertz. The triangular window is an approximation to the critical band along the basilar membrane in the cochlea of the human ear. The bark scale is linear below 500 Hz and logarithmic above 500 Hz. The bark filter bank with 18 triangular filters is shown in Fig. 5.
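As a quick check, Eq. 1 can be evaluated directly. Since ω = 2πf, the argument ω/(1200π) simplifies to f/600. This is a small illustrative sketch, not part of the original paper:

```python
import math

def hz_to_bark(f):
    """Bark-scale warping of Eq. 1; omega/(1200*pi) reduces to f/600."""
    x = f / 600.0
    return 6.0 * math.log(x + math.sqrt(x * x + 1.0))
```

The mapping is nearly linear at low frequencies and logarithmic at high ones, consistent with the behavior of the bark scale described above.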
The speech spectrum is transformed onto the bark scale by frequency warping using Eq. 1. The warped spectrum is convolved with the bark filter bank of 18 critical band filters, which are simulated using the critical band masking curve given in Eq. 2.
\psi_k(\Omega) =
\begin{cases}
10^{-2.5(\Omega - \Omega_k - 0.5)} & \text{for } \Omega \ge \Omega_k + 0.5 \\
1 & \text{for } \Omega_k - 0.5 < \Omega < \Omega_k + 0.5 \\
10^{1.0(\Omega - \Omega_k + 0.5)} & \text{for } \Omega \le \Omega_k - 0.5
\end{cases} \qquad (2)

As human hearing has different sensitivity at different frequencies, equal-loudness pre-emphasis is performed using Eq. 3. The next step in PLP computation is to apply the intensity–loudness power law.

[Fig. 6: PLP coefficients of a normal speech sample]
[Fig. 7: PLP coefficients of a GERD speech sample]

E(\omega) = \frac{(\omega^{2} + 56.8 \times 10^{6})\,\omega^{4}}{(\omega^{2} + 6.3 \times 10^{6})^{2}\,(\omega^{2} + 0.38 \times 10^{9})} \qquad (3)
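Equation 3 can be evaluated directly to confirm the pre-emphasis behavior; a small sketch (not part of the original paper):

```python
import math

def equal_loudness(f):
    """Equal-loudness pre-emphasis weight E(omega) of Eq. 3, f in hertz."""
    w2 = (2.0 * math.pi * f) ** 2   # omega squared
    num = (w2 + 56.8e6) * w2 ** 2   # (w^2 + 56.8e6) * w^4
    den = (w2 + 6.3e6) ** 2 * (w2 + 0.38e9)
    return num / den
```

The weight grows with frequency over the main speech band, de-emphasizing the low frequencies to which the ear is less sensitive.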

In the last step of PLP computation, the obtained auditory spectrum is approximated by the autocorrelation method of all-pole spectral modeling. The autocorrelation coefficients are then converted to the cepstral coefficients of the all-pole model. Figure 6 shows the plot of PLP coefficients of a normal speech sample. Figures 7 and 8 show the plots of PLP coefficients of GERD and vocal fold paralysis speech samples, respectively.
It is observed that the PLP coefficient plots of the GERD and paralysis samples differ from that of the normal samples. However, no significant difference is observed between the PLP coefficients of the GERD and paralysis samples.

4.3 RASTA–PLP Estimation

RASTA–PLP includes filtering to remove slowly varying noise components from the signal. The RASTA filter is applied to the signal in the log spectral domain. Figure 9 shows the steps in the computation of RASTA–PLP. The spectral analysis is done using the PLP technique. The logarithm of each critical band spectrum is filtered with a band pass filter, and the inverse logarithm is applied to the filtered signal. The characteristic of the band pass filter is varied by compressing the static nonlinearity. The filter suppresses any constant or slowly varying component in each frequency channel of the bark filter bank. The remaining steps are the same as in the PLP computation [17].

[Fig. 8: PLP coefficients of a paralysis speech sample]
[Fig. 9: RASTA–PLP computation. Input speech signal → Discrete Fourier transform → Logarithm → Filtering → Inverse logarithm → Inverse discrete Fourier transform → Solving a set of linear equations → Cepstral recursion → Cepstral coefficients]
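The band pass filtering step can be sketched with the RASTA filter of Hermansky and Morgan [17], an IIR filter with numerator taps (0.2, 0.1, 0, −0.1, −0.2) and a single pole at 0.98, applied along time to each log critical-band trajectory. This is a plain-NumPy sketch; practical implementations usually call a library filtering routine.

```python
import numpy as np

def rasta_filter(traj):
    """Apply the RASTA band pass filter to one log band trajectory."""
    b = [0.2, 0.1, 0.0, -0.1, -0.2]   # FIR part: a smoothed derivative
    pole = 0.98                        # IIR part: leaky integration
    y = np.zeros(len(traj))
    for n in range(len(traj)):
        acc = pole * y[n - 1] if n > 0 else 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * traj[n - k]
        y[n] = acc
    return y
```

Because the numerator taps sum to zero, a constant component in a band is driven toward zero, which is exactly the suppression of constant or slowly varying components described above.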
Figures 10, 11, and 12 show the plots of RASTA–PLP coefficients obtained from normal, GERD, and vocal fold paralysis speech samples, respectively. From Figs. 11 and 12, it is observed that the RASTA–PLP coefficients of GERD and vocal fold paralysis do not vary much, since RASTA–PLP includes a band pass filter which eliminates noise variation in each frequency band. For pathology detection, speaker-specific information is the more relevant cue, as pathology affects human physiology, and this information is suppressed by the above features.

[Fig. 10: RASTA–PLP coefficients of a normal speech sample]
[Fig. 11: RASTA–PLP coefficients of a GERD speech sample]
[Fig. 12: RASTA–PLP coefficients of a paralysis speech sample]

Table 1  Number of voice samples for training and testing

  Speech samples         Total samples   Training samples   Testing samples
  Normal                 53              37                 16
  GERD                   48              34                 14
  Vocal fold paralysis   62              43                 19

4.4 SVM Classifier

SVM classifies data by finding a hyperplane that best separates the two classes of a dataset [18]. It is a supervised machine learning algorithm. The stage of building a predictive SVM model is called the training phase. Once the trained SVM model is obtained, the system is ready to test unknown data and classify it; this stage of classifying unknown test samples is called the testing phase.
Table 1 shows the number of voice samples used in this study. About 70% of the samples from the database are used to obtain an SVM model, and the remaining 30% are used to test the system.
A dataset is created using the known speech samples. It contains the mean and standard deviation of the feature vectors for the training samples used.
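A minimal sketch of this train/test procedure using scikit-learn's SVC. This is illustrative only: random vectors stand in for the PLP/RASTA–PLP mean and standard-deviation features, and the two class means are chosen to be separable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for the feature vectors (53 normal, 110 disordered).
normal = rng.normal(0.0, 1.0, size=(53, 10))
disordered = rng.normal(3.0, 1.0, size=(110, 10))
X = np.vstack([normal, disordered])
y = np.array([0] * 53 + [1] * 110)

# 70% of the samples train the SVM model; the remaining 30% test it.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = SVC(kernel="rbf").fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```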

4.5 Database

The speech signals for this work are taken from the Massachusetts Eye and Ear Infirmary (MEEI) database [19]. The speech samples in this database are available at sampling frequencies of 25 and 50 kHz. The length of each normal voice record is 3 s, whereas each pathologic voice record is 1 s long. In this work, 37 normal and 77 disordered voice samples, including both the paralysis and GERD classes, are used.

5 Results

The sensitivity, specificity, and accuracy of the classifier are calculated as follows:

Table 2  Performance of the SVM classifier for classification of normal and pathological speech samples

                    PLP     RASTA–PLP
  TP                27      12
  TN                12      13
  FP                4       3
  FN                6       21
  Sensitivity (%)   81.81   36.36
  Specificity (%)   75      81.25
  Accuracy (%)      79.59   51.02

Sensitivity = TP / (TP + FN) × 100        (4)
Specificity = TN / (TN + FP) × 100        (5)
Accuracy = (TP + TN) / (TP + FP + FN + TN) × 100        (6)

where,
TP True Positive, which gives the correct acceptance.
FN False Negative, which gives the false rejection.
TN True Negative, which gives the correct rejection.
FP False Positive, which gives the false acceptance.
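Equations 4–6 can be checked against the PLP column of Table 2 (TP = 27, TN = 12, FP = 4, FN = 6); a small sketch:

```python
def sensitivity(tp, fn):
    return tp / (tp + fn) * 100.0

def specificity(tn, fp):
    return tn / (tn + fp) * 100.0

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + fn + tn) * 100.0

# PLP column of Table 2 (normal vs. pathological classification):
tp, tn, fp, fn = 27, 12, 4, 6
```

These values reproduce the sensitivity of about 81.81%, specificity of 75%, and accuracy of about 79.59% reported in Table 2.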
Positive and negative are labels for the two classes used. Table 2 shows the results of classification between the normal and pathological classes, and Table 3 shows the results of classification between the paralysis and GERD classes.

Table 3  Performance of the SVM classifier for classification of paralysis and GERD speech samples

                    PLP     RASTA–PLP
  TP                9       10
  TN                12      16
  FP                7       3
  FN                5       4
  Sensitivity (%)   64.28   71.4
  Specificity (%)   63.15   84.21
  Accuracy (%)      63.63   78.78

6 Discussion

This paper presents a non-invasive way of diagnosing patients suffering from voice disorders using acoustic analysis of the speech signal. The methodology is based on digital processing of the speech signal, which is analyzed to find the variations caused by pathology in the vocal folds and to evaluate the voice accordingly. The technique extracts individual feature sets, PLP and RASTA–PLP, from the input speech samples and classifies the extracted features using an SVM classifier. In voice pathology detection, PLP performs better than RASTA–PLP, which is evident from Figs. 6, 7, and 8. It is observed from the figures that the PLP coefficient plots of GERD and paralysis differ markedly from each other, but they do not show any specific pattern of variation; the variation between the GERD and paralysis plots is random. There is, however, a significant variation between the PLP coefficient plot of the normal sample and those of the pathological samples (both GERD and paralysis). The SVM classifier gives an accuracy of 79.59% for detecting a pathology using PLP coefficients and an accuracy of 51.02% using the RASTA–PLP coefficients. For inter-pathology classification, the SVM classifier gives an accuracy of 63.63% using PLP coefficients and an accuracy of 78.78% using RASTA–PLP coefficients. RASTA–PLP performs inter-pathology classification better than PLP because the RASTA filter acts as a band pass filter and removes the noise variations introduced into the speech signal by the pathology. The accuracy of the SVM could be improved by increasing the number of samples in the dataset.

7 Conclusion

In this work, the feature vectors of the speech samples are obtained using the PLP and RASTA–PLP algorithms. The means and standard deviations of the feature vectors are computed, and two separate datasets are created: one using PLP features and the other using RASTA–PLP features. These datasets are used to train two SVM models. From the experiments, it is concluded, based on specificity, sensitivity, and accuracy, that PLP gives better results when detecting whether a speech sample is normal or pathological, and RASTA–PLP gives better results when classifying between the pathologies. As future work, a multi-class classification system could be developed with a neural network classifier so that different types of pathological speech can be detected.

References

1. Alsulaiman M (2014) Voice pathology assessment systems for dysphonic patients: detection,
classification, and speech recognition. IETE J Res 60(2):156–167
2. Davis SB (1979) Acoustic characteristics of normal and pathological voices. Speech Lang
1:271–335
3. Kasuya H, Endo Y, Saliu S (1993) Novel acoustic measurements of jitter and shimmer
characteristics from pathological voices. In: Proceedings of EUROSPEECH’93, pp 1973–1976
4. Michaelis D, Gramss T, Strube HW (1997) Glottal-to-noise excitation ratio—a new measure for describing pathological voices. Acta Acustica united with Acustica 83(4):700–706
5. Qi Y, Hillman RE (1997) Temporal and spectral estimation of harmonic to noise ratio in human
voice signals. J Acoust Soc Am 102(1):537–543
6. Shama K, Krishna A, Cholayya NU (2007) Study of harmonic to noise ratio and critical band
energy spectrum of speech as acoustic indicators of laryngeal and voice pathology. EURASIP
J Appl Sig Process (1)
7. Yumoto E, Gould WJ, Baer T (1982) Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am 71(6):1544–1550
8. Saldanha JC, Ananthakrishna T, Pinto R (2014) Vocal fold pathology assessment using mel-
frequency cepstral coefficients and linear predictive cepstral coefficients features. J Med Imag
Health Inf 4:1–6
9. Hariharan M, Polat K, Yaacob S (2014) A new feature constituting approach to detection of
vocal fold pathology. Int J Syst Sci 45(8):1622–1634
10. Godino-Llorente J, Gomez-Vilda P (2004) Automatic detection of voice impairments by means
of short-term cepstral parameters and neural network based detectors. IEEE Trans Biomed Eng
51(2):380–384
11. Godino-Llorente J, Gomez-Vilda P, Blanco-Velasco M (2006) Dimensionality reduction of a
pathological voice quality assessment system based on gaussian mixture models and short-term
cepstral parameters. IEEE Trans Biomed Eng 53(10):1943–1953
12. Dibazar AA, Narayanan S, Berger TW (2002) Feature analysis for automatic detection of patho-
logical speech. In: Proceedings of the second joint 24th annual conference and the annual fall
meeting of the biomedical engineering society, engineering in medicine and biology, Houston,
TX, USA
13. Hirano M, Hibi S, Yoshida Y, Hirade Y, Kasuya H, Kikuchi Y (1988) Acoustic analysis of
pathological voice: some results of clinical application. J Acta Oto-Laryngol 105(5–6):432–438
14. Zhang Y, Jiang JJ (2008) Acoustic analyses of sustained and running voices from patients with
laryngeal pathologies. J Voice 22(1):1–9
15. Pravena D, Dhivya S, Durga Devi A (2012) Pathological voice recognition for vocal fold
disease. Int J Comput Appl 47(13):31–37
16. Hermansky H (1990) Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am
87(4):1738–1752
17. Hermansky H, Morgan N (1994) RASTA processing of speech. IEEE Trans Speech Audio
Process 2(4):578–589
18. Fradkin D, Muchnik I (2000) Support vector margins for classification. DIMACS Series in Discrete Mathematics and Theoretical Computer Science
19. Kay Elemetrics Corp (1994) Disordered voice database, version 1.03
