Abstract— In this paper, a proposed speech recognition algorithm is presented. The paper deals with the combination of feature extraction by wavelet transform, subtractive clustering, and an adaptive neuro-fuzzy inference system (ANFIS). The extracted features are used as input to subtractive clustering, which groups the data into clusters, and also as input to the neural network in ANFIS. The initial fuzzy inference system is trained by the neural network to obtain the least possible error between the desired output (target) and the fuzzy inference system (FIS) output, yielding the final FIS. The performance of the proposed speech recognition algorithm (SRA) using a wavelet transform and ANFIS is evaluated on different samples of speech signals (isolated words) with added background noise. The proposed speech recognition algorithm is tested using different isolated words, obtaining a recognition ratio of about 99%.

Key-words—Adaptive neuro-fuzzy inference system, noise cancellation, subtractive clustering.

First A. M.ELWAKDY is with the Department of Electronics Technology, Faculty of Industrial Education, Helwan University, Ameria, EGYPT (Email: mohamedelwakdy2004@yahoo.com)
Second B. E.ELSEHELY is with the Avionics Research and Development Center, Salah Salim, EGYPT (Email: elsehely@yahoo.com)
Third C. M.ELTOKHY is with the Department of Electronics Technology, Faculty of Industrial Education, Helwan University, Ameria, EGYPT (Email: mostafaeltokhy@hotmail.com)
Fourth D. A.ELHENNAWY is with the Faculty of Engineering, Ain Shams University, Abasia, EGYPT (Email: a.hennawy@marsem.com)

I. INTRODUCTION

Speech/voice recognition is a very difficult task to be performed by a computer system [1]. Many speech/voice processing tasks, such as speech and word recognition, have reached satisfactory performance levels in specific applications, and although a variety of commercial products were launched in the last decade, many problems remain an open research area and absolute solutions have not been found yet [2]. Wavelets bring a new tool to speech signal classification [1]. It can be said that when using wavelets [3]-[6], the event is connected to the time when it occurs. Wavelet and wavelet packet analysis have proven to be effective signal processing techniques for a variety of signal processing problems [2].

There are several basic methods of fuzzy clustering; one of them is subtractive clustering. The basic characteristic of this method is illustrated by experiment.

Artificial neural network performance depends on the size and quality of the training samples [6]. When the number of training data is small and not representative of the possibility space, standard neural network results are poor [7], [8]. Incorporation of neuro-fuzzy techniques can improve performance in this case [7], [8]. Fuzzy theory has been used successfully in many applications [9]. These applications show that fuzzy theory can be used to improve neural network performance. Artificial neural networks (ANN) are known to be excellent classifiers, but their performance can be limited by the size and quality of the training set [1]. By incorporating fuzzy techniques into neural networks, a more efficient network results, one which is extremely effective for a class of problems [1]. A combination of wavelet packet signal processing and ANFIS is used to efficiently extract the features from pre-processed real speech signals for the purpose of speech recognition among a variety of English language words [1].

In this study, a proposed speech recognition algorithm that uses a wavelet transform to establish a fuzzy inference system through subtractive clustering and a neural network (ANFIS) is presented as an efficient algorithm in the speech recognition area.

In this paper, English language speech signals (isolated words) were used. In the speech recognition experiment set used in this study, speech signals are transmitted to the computer by using a microphone and an audio card with a 22 kHz sampling frequency.

The paper is organized as follows. Section 2 covers reading signals and noise cancellation, preparation of data using the wavelet transform (wavelet packet decomposition), subtractive clustering, and ANFIS. The experimental applications, including the database of this study, data acquisition, the ANFIS architecture, and the training parameters, are described in Section 3. The effectiveness of the proposed method for classification of speech signals in the proposed SRA through ANFIS is demonstrated in Section 4. Finally, in Section 5 the discussion and conclusion are presented.

2 Proposed Algorithm

The proposed algorithm of speech recognition consists of four parts: noise cancellation, wavelet transform, subtractive clustering, and ANFIS. The proposed algorithm is used to carry out the goal of this research.
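The noise-cancellation stage is not detailed at this point in the paper. One common wavelet-based approach, shown here only as an illustrative sketch, is soft thresholding of the detail coefficients of a one-level Haar transform; the function names and the threshold value are assumptions, not taken from the paper:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar transform: split a signal into approximation and detail."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)              # pad odd-length inputs to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (detail)
    return a, d

def haar_idwt(a, d):
    """Invert the one-level Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wavelet_denoise(signal, threshold):
    """Soft-threshold the detail coefficients, then reconstruct the signal."""
    a, d = haar_dwt(signal)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    return haar_idwt(a, d)

# A smooth tone with additive noise: shrinking the detail band reduces the error.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, 256))
noisy = clean + 0.1 * rng.standard_normal(256)
restored = wavelet_denoise(noisy, threshold=0.1)
print(np.mean((restored - clean) ** 2), np.mean((noisy - clean) ** 2))
```

A single-level transform is used only to keep the sketch short; a practical denoiser would threshold several decomposition levels.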
2.1 English word discrimination using a wavelet transform, subtractive clustering and ANFIS

The proposed algorithm is illustrated in Fig.1. The length of each of the training and checking signals, "one", "three" and "six", is 28000 samples. If the number of samples of a signal is less than 28000, the difference in the number of samples is filled with zeros.
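The fixed-length preparation described above can be sketched as a simple zero-padding step; the function name and the truncation of over-long signals are assumptions for illustration:

```python
import numpy as np

def pad_to_length(signal, target_len=28000):
    """Zero-pad a speech token so every signal has the same fixed length."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= target_len:
        return signal[:target_len]               # truncate longer signals
    padding = np.zeros(target_len - len(signal))  # the difference is zeros
    return np.concatenate([signal, padding])

word = np.ones(25000)        # a token shorter than 28000 samples
padded = pad_to_length(word)
print(len(padded))           # 28000
```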
Fig.1 Flowchart of the proposed algorithm: the training and test speech signals ("one's", "three's" and "six's") pass through noise cancellation and a wavelet transform; the details of the signals (1 to 6) feed subtractive clustering, which builds the initial FIS (fuzzy inference system); ANFIS (adaptive neuro-fuzzy inference system) then trains the FIS parameters, in the form of neural networks, against the targets.
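The "details of signals (1 to 6)" stage of the flowchart can be sketched as computing, for each token, the standard deviation of the detail coefficients at six successive decomposition levels. The sketch below uses a repeated one-level Haar split; the choice of wavelet and the function names are assumptions, since the paper does not specify them here:

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)  # pad odd-length inputs
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def detail_std_features(signal, levels=6):
    """Standard deviation of the detail coefficients at each decomposition level."""
    features = []
    a = signal
    for _ in range(levels):
        a, d = haar_split(a)       # keep splitting the approximation band
        features.append(np.std(d))  # one feature per level
    return np.array(features)

rng = np.random.default_rng(1)
token = rng.standard_normal(28000)   # stands in for a padded speech token
x = detail_std_features(token)
print(x.shape)                       # six features: x1 ... x6
```

These six values play the role of the input standard deviations x1 ... x6 used later as ANFIS inputs.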
2.1.3.1. Range Of Influence

The range of influence indicates the radius of a cluster when the data space is considered as a unit hypercube [18]. A small cluster radius will usually yield many small clusters in the data, resulting in many rules, and vice versa [18]. The value of 0.15 was used for each cluster here.

The criteria for cluster center consideration are based on acceptance and rejection ratios [15]. The initial FIS, as in Fig.1, is constructed based on the range of influence, the squash factor, the accept ratio and the reject ratio.

2.1.4. Adaptive Neuro-Fuzzy Inference System (ANFIS)

Fuzzy modeling [17] is a branch of system identification which deals with the construction of a fuzzy inference system, or fuzzy model, that can predict and explain the behavior of an unknown system described by a set of sample data. The FIS structure contains all the FIS information [14], including variable names and membership function definitions [1].

ANFIS is limited to using Sugeno systems. Both an artificial neural network and fuzzy logic are used in the ANFIS architecture. An adaptive neural network is a network structure consisting of a number of nodes connected through directional links. Each node is characterized by a node function with fixed or adjustable parameters. The learning or training phase of a neural network is a process of determining parameter values that sufficiently fit the training data [18].

Considering a first-order Takagi, Sugeno and Kang (TSK) fuzzy inference system, a fuzzy model contains two rules [19]:

Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2

Fig.4 (a) The first-order TSK fuzzy model; (b) the corresponding ANFIS architecture.

If f1 and f2 are constants instead of linear equations, then we have a zero-order TSK fuzzy model. Fig. 4(a) and (b) illustrate the fuzzy reasoning mechanism and the corresponding ANFIS architecture, respectively. Node functions in the same layer are of the same function family, as described below. Note that oij denotes the output of the ith node in layer j.

Layer 1: Each node i in this layer generates the membership grade of a linguistic label. For instance, the node function of the ith node might be as in Eq. (7), where x is the input to node i, Ai is the linguistic label (small, large, etc.) associated with this node, and {ai, bi, ci} (or {ai, ci} in the latter case) is the parameter set that changes the shape of the membership function. Parameters in this layer are referred to as the 'premise parameters'.
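The subtractive clustering step that builds the initial FIS can be sketched as Chiu's method [13]: every data point receives a density "potential", the highest-potential point becomes a cluster center, its neighbourhood is then penalised, and the process repeats subject to the accept and reject ratios. This is a simplified sketch (the full method also applies a distance test in the in-between case, omitted here); the parameter defaults other than the 0.15 radius are illustrative assumptions:

```python
import numpy as np

def subtractive_clustering(X, radius=0.15, squash=1.25, accept=0.5, reject=0.15):
    """Minimal sketch of subtractive clustering on data scaled to the unit hypercube.

    radius        - range of influence of a cluster center (0.15, as in the paper)
    squash        - squash factor widening the penalised neighbourhood
    accept/reject - ratios against the first center's potential
    """
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2
    # Potential of each point: how dense its neighbourhood is.
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-alpha * dist2).sum(axis=1)
    P1 = P.max()
    centers = []
    while True:
        k = int(np.argmax(P))
        if P[k] > accept * P1:     # clearly dense: accept as a center
            centers.append(X[k])
        else:
            break                   # weak potential: stop (full method is subtler)
        # Revise potentials: subtract the influence of the new center.
        P = P - P[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(axis=1))
    return np.array(centers)

# Two well-separated blobs in the unit square should give two cluster centers.
rng = np.random.default_rng(2)
blob1 = 0.2 + 0.02 * rng.standard_normal((40, 2))
blob2 = 0.8 + 0.02 * rng.standard_normal((40, 2))
centers = subtractive_clustering(np.vstack([blob1, blob2]), radius=0.15)
print(len(centers))
```

Each accepted center becomes one fuzzy rule in the initial FIS, which is why a smaller radius yields more rules.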
Layer 2: Each node in this layer calculates the 'firing strength' of each rule via multiplication, as in Eq. (8):

oi2 = wi = µAi(x) × µBi(y), i = 1, 2. (8)

Layer 3: The ith node of this layer calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths, as in Eq. (9):

oi3 = w̄i = wi / (w1 + w2), i = 1, 2. (9)

For convenience, outputs of this layer are called 'normalized firing strengths'.

Layer 4: Every node i in this layer is a square node with a node function as in Eq. (10):

oi4 = w̄i fi = w̄i (pi x + qi y + ri). (10)

In the forward pass of the hybrid learning procedure, the consequent parameters are identified by the least-squares estimate [18]. In the backward pass, the error rates propagate backward and the premise parameters are updated by gradient descent. When the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters [18]:

f = (w1 / (w1 + w2)) f1 + (w2 / (w1 + w2)) f2 = w̄1 f1 + w̄2 f2
  = (w̄1 x) p1 + (w̄1 y) q1 + (w̄1) r1 + (w̄2 x) p2 + (w̄2 y) q2 + (w̄2) r2, (12)

which is linear in the consequent parameters p1, q1, r1, p2, q2 and r2. A flowchart of the hybrid learning procedure for ANFIS is shown schematically in Fig.5 [18].
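The layer computations above can be sketched as a forward pass of a two-rule first-order TSK model. The generalized bell membership function with parameters {a, b, c} is the standard choice for Layer 1 in ANFIS; the concrete parameter values below are assumptions for illustration, not values from the paper:

```python
import numpy as np

def bell_mf(x, a, b, c):
    """Generalized bell membership function with premise parameters {a, b, c}."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def tsk_forward(x, y, premise, consequent):
    """Forward pass of a two-rule first-order TSK model (Eqs. (8)-(10))."""
    # Layer 1: membership grades of x and y for each rule.
    mu_A = [bell_mf(x, *premise["A"][i]) for i in range(2)]
    mu_B = [bell_mf(y, *premise["B"][i]) for i in range(2)]
    # Layer 2: firing strength of each rule, w_i = mu_Ai(x) * mu_Bi(y).
    w = np.array([mu_A[i] * mu_B[i] for i in range(2)])
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: weighted rule consequents f_i = p_i x + q_i y + r_i.
    f = np.array([p * x + q * y + r for p, q, r in consequent])
    # Layer 5: overall output, a sum that is linear in the consequent parameters.
    return float((w_bar * f).sum())

premise = {  # illustrative premise parameters (a, b, c) per rule
    "A": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)],
    "B": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)],
}
consequent = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]  # (p_i, q_i, r_i)
print(tsk_forward(0.0, 0.0, premise, consequent))
```

Near (2, 2) the second rule fires almost exclusively, so the output approaches its consequent f2 = 2x + 2y + 1.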
The schematic of the ANFIS architecture based on the Sugeno fuzzy model is shown in Fig.6. For simplicity, if the FIS has the input standard deviations x1, x2, x3, x4, x5 and x6, the first-order Sugeno-type fuzzy inference model of six input standard deviations can be expressed with base fuzzy if-then rules as in Eq. (13):

If x1 is A1 and x2 is B1 and x3 is C1 and x4 is D1 and x5 is E1 and x6 is F1,
then f1 = p x1 + r x2 + q x3 + S x4 + SS x5 + PP x6 + u. (13)
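Because the overall output is linear in the consequent parameters once the premise parameters are fixed (Eq. (12)), the forward pass of hybrid learning can identify them by least squares. A minimal sketch for the two-rule case follows; the Gaussian memberships, their centers, and the training data are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def firing(x, y):
    """Normalized firing strengths of two rules with fixed Gaussian premises."""
    w1 = np.exp(-((x - 0.0) ** 2 + (y - 0.0) ** 2))
    w2 = np.exp(-((x - 2.0) ** 2 + (y - 2.0) ** 2))
    s = w1 + w2
    return w1 / s, w2 / s

# Synthetic targets generated from known consequents (p1, q1, r1, p2, q2, r2).
true_theta = np.array([1.0, -1.0, 0.5, 2.0, 0.0, -0.5])
X = rng.uniform(0.0, 2.0, size=(200, 2))
rows = []
for x, y in X:
    wb1, wb2 = firing(x, y)
    # One row of the regressor matrix of Eq. (12):
    rows.append([wb1 * x, wb1 * y, wb1, wb2 * x, wb2 * y, wb2])
A = np.array(rows)
t = A @ true_theta                              # noiseless targets
theta, *_ = np.linalg.lstsq(A, t, rcond=None)   # least-squares estimate
print(np.max(np.abs(theta - true_theta)))
```

On noiseless data the estimate recovers the generating consequent parameters, which is exactly the role of the forward pass in hybrid learning.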
3.1 Database used in this experimental study

A database was created from 3 English words; these 3 English words are given in Table I. One speaker spoke these 3 English words ten times each, separately, for the training and checking phases. The training and checking signals were multiplied by 10 before background noise was added. The total number of tokens considered for training was 30 (1 speaker x 3 English noiseless words x 10 repetitions of a single word). The total number of tokens considered for checking was likewise 30 (1 speaker x 3 English noiseless words x 10 repetitions of a single word).

3.3 ANFIS architecture and training parameters

ANFIS performs intelligent classification using the features from the wavelet packet layer. The training parameters and the structure of the ANFIS used in this study are shown in Table III.

Table III ANFIS architecture and training parameters
Architecture | Training parameters
Fig.10 Trained FIS against the checking data "one's, three's and six's".

Table IV The average error for testing training and checking data against the trained FIS.

The average error for testing                          | Value
training data against the trained FIS                  | 2.6872e-7
checking data against the trained FIS                  | 0.010641

These results are better than those of the reported work [1].

II. CONCLUSION

In this study, pattern recognition used the speech/voice signals of one speaker. The tasks of feature extraction and classification were performed using the wavelet transform, subtractive clustering and ANFIS. The average error for testing the training and checking data against the trained FIS indicated the performance of the speech recognition system. The wavelet packet transform proved to provide very useful features for characterizing the speech signal; furthermore, the information obtained from the wavelet packet transform

III. REFERENCES

[1] E. Avci and Z.H. Akpolat, "Speech recognition using a wavelet packet adaptive network based fuzzy inference system," ScienceDirect, vol. 31, no. 3, 2006, pp. 495-503.
[2] M. Siafarikas, T. Ganchev, and N. Fakotakis, "Wavelet packet based speaker verification," Odyssey, Toledo, Spain, 2004, pp. 257-264.
[3] N. Saito, "Local feature extraction and its application using a library of bases," ProQuest, Yale University, 1994.
[4] J.B. Buckheit and D.L. Donoho, "WaveLab and reproducible research," CiteSeer, Stanford University, New York: Springer, 1995, pp. 55-82.
[5] E. Wesfreid and M.V. Wickerhauser, "Adapted local trigonometric transforms and speech processing," IEEE Trans. Signal Processing, vol. 41, no. 12, 1993, pp. 3596-3600.
[6] E. Visser, M. Otsuka, and T. Lee, "A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments," Speech Communication, vol. 41, no. 2, 2003, pp. 393-407.
[7] P.A. Nava and J.M. Taylor, "Speaker independent voice recognition with a fuzzy neural network," Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, vol. 3, 1996, pp. 2049-2052.
[8] P. Nava and J. Taylor, "The optimization of neural network performance through incorporation of fuzzy theory," Proceedings of the Eleventh International Conference on Systems Engineering, pp. 897-901.
[9] L.A. Zadeh, R. Yager, S. Ovchinnikov, R. Tong, and H. Nguyen, Fuzzy Sets and Applications, New York, 1987.
[10] G. Strang, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1996, pp. 1-34, 53-60, 155-172.
[11] M.S. Crouse, R.D. Nowak, and R.G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Processing, vol. 46, no. 4, 1998, pp. 886-902.
[12] G.W. Horgan, "Using wavelets for data smoothing: a simulation study," Journal of Applied Statistics, vol. 26, no. 8, 1999, pp. 923-932.
[13] S.L. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. Fuzzy Systems, vol. 2, no. 3, 1994, pp. 267-278.
[14] MathWorks, Fuzzy Logic Toolbox for MATLAB, ver. 7, copyright 1984-2004.
[15] S. Nakkrasae, P. Sophatsathit, and W.R. Edwards, "Fuzzy subtractive clustering based indexing approach for software components classification,"