
MODELING OF SPEECH SIGNALS USING FRACTIONAL CALCULUS

Khaled Assaleh
Department of Electrical Engineering
American University of Sharjah
Sharjah, UAE
kassaleh@aus.edu

Wajdi M. Ahmad
Electrical and Computer Engineering Dept.
University of Sharjah
Sharjah, UAE
wajdi@sharjah.ac.ae

ABSTRACT

In this paper, we present a novel approach for speech signal modeling using fractional calculus. This approach is contrasted with the celebrated Linear Predictive Coding (LPC) approach, which is based on integer order models. It is demonstrated via numerical simulations that by using a few integrals of fractional orders as basis functions, the speech signal can be modeled accurately. The new approach has the merit of requiring a smaller number of model parameters, and is demonstrated to be superior to the LPC approach in capturing the details of the modeled signal.

1. INTRODUCTION

Fractional differential equations and their applications have been the focus of an intensive research effort lately. It has been shown that fractional order models can adequately, and perhaps more accurately, describe the behavior of many physical and biological processes and systems [1]. Such models rely heavily on fractional calculus [2], an old field that can be traced back to 1695, when the first communication on the subject between Leibniz and L'Hôpital took place. Continuous-time non-integer order models are typically expressed as differential equations of non-integer order. Whereas the exponential function serves as the basis of solution for integer order differential equations, the Mittag-Leffler function forms the basis for the solution of non-integer order differential equations. In fact, the integer order setting becomes a special case of the more general fractional order setting, where the former is retrieved from the latter by setting all fractional derivative orders to unity. As such, perhaps it is time to revisit the theory of integer order models in the various disciplines, taking the fractional setting as a platform. Within this framework, several types of systems from different fields have recently been investigated under the assumption of fractional order models of such systems; see [3-6] and references therein. The reader is also referred to the proceedings of two recent workshops [FDA '04 and FDA '06] that were entirely devoted to fractional derivatives and their applications. However, to the knowledge of the authors, the fractional order modeling technique has not yet been used in the field of speech analysis.

Speech signal modeling is entirely based on the integer order model known as Linear Predictive Coding (LPC) [7], which utilizes differential equations of integer order. It is the point of this paper to introduce the fractional order concept into this vital field of research, and to compare its performance with that of the existing LPC approach. Our aim in doing so is twofold. First, using fractional calculus for modeling of speech signals is interesting in its own right, as it seems a logical extension of what has already been done in other fields. Furthermore, real systems are fractional by nature, and the integer order models are but approximations of the reality of the actual systems. As such, we are optimistic that fractional order models will prove viable in modeling the speech signal. The impact of this will be savings on the storage requirements of speech signal parameters.

This paper is organized as follows: In section 2, we briefly present the traditional integer order LPC method for speech signal modeling, and introduce our fractional order technique. In section 3, we present and analyze our numerical simulations for both the integer and fractional order models. Finally, in section 4 we draw concluding remarks.

2.1 Speech modeling using Linear Predictive (LP) analysis

Linear predictive (LP) analysis has been the method of choice for speech modeling in speech coding [7, 8] and recognition applications [9]. It follows the source-filter model, which represents the transfer function of the speech production process by an all-pole filter. This corresponds to a linear constant coefficient difference equation (LCCDE).

The principle of LP analysis is that a sample of a short segment (analysis frame) of a speech signal can be approximated as a linear combination of the past P

1-4244-0779-6/07/$20.00 ©2007 IEEE


samples. The underlying assumption is that the signal is stationary over the analysis frame. Hence, the model can be expressed by the following Pth-order LCCDE:

$$\hat{x}(n) = \sum_{k=1}^{P} a_k\, x(n-k), \tag{1}$$

where $x(n)$ is the N-sample long speech analysis frame, $n$ is the discrete time index, and $\{a_k\}$ are the LP coefficients. It is worth mentioning that equation (1) can be viewed as a discretization of a corresponding differential equation that describes speech modeling in the continuous time domain. The LP coefficients can be determined by minimizing the energy of the prediction error given by:

$$\varepsilon = \sum_{n=0}^{N-1} \left( x(n) - \hat{x}(n) \right)^2. \tag{2}$$

In vector-matrix notation the analysis frame and its approximation can be denoted as N×1 column vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$, respectively. Consequently, the prediction coefficients, represented by the vector $\mathbf{a}$, can be determined by the autocorrelation method as

$$\mathbf{a} = \mathbf{R}^{-1}\mathbf{r}, \tag{3}$$

where $\mathbf{R}$ is a P×P Toeplitz autocorrelation matrix whose first column is given by $[R(0)\;\; R(1)\;\; \cdots\;\; R(P-1)]^T$, where

$$R(k) = \sum_{n=0}^{N-1} x(n)\, x(n-k) \tag{4}$$

and $\mathbf{r} = [R(1)\;\; R(2)\;\; \cdots\;\; R(P)]^T$.

2.2 Fractional-order Speech Modeling

Motivated by the effectiveness of the recently introduced fractional order modeling techniques in various applications, we propose to use such techniques to model speech signals. The aim is to investigate a more accurate and more compact representation of speech signals using fractional differential equations, as opposed to the classical LP-based modeling that employs integer order differential equations approximated by difference equations, as shown in the previous section.

Many mathematical definitions of the fractional derivative have been reported in the literature. The Riemann-Liouville definition, one of the most commonly used, defines the fractional derivative of order $\alpha$ of a function $x(t)$ as follows:

$$\frac{d^{\alpha} x(t)}{dt^{\alpha}} = \frac{1}{\Gamma(m-\alpha)} \frac{d^{m}}{dt^{m}} \int_{0}^{t} \frac{x(\tau)}{(t-\tau)^{\alpha-m+1}}\, d\tau, \tag{5}$$

where $m$ is an integer such that $m - 1 \le \alpha < m$, and

$$\Gamma(z) = \int_{0}^{\infty} e^{-t}\, t^{z-1}\, dt \tag{6}$$

is Euler's gamma function.

Numerical techniques based on the Riemann-Liouville definition of fractional integrals and derivatives [2] are normally used for simulations of fractional order models. In this paper, we have carried out our numerical simulations using the backward difference method based on the Grünwald-Letnikov approximation of the fractional derivative, as given by:

$$D^{\alpha} x = \frac{d^{\alpha} x}{dt^{\alpha}} = h^{-\alpha} \sum_{j=0}^{n} (-1)^{j}\, C_{j}^{\alpha}\, x(n-j), \tag{7}$$

where $h$ is the integration step size and

$$C_{j}^{\alpha} \equiv \binom{\alpha}{j} = \frac{\alpha(\alpha-1)\cdots(\alpha-j+1)}{j!}.$$

Analogous to the formulation in equation (1), a speech signal can be expressed as a linear combination of its fractional derivatives in discrete form as follows:

$$\hat{x}(n) = \sum_{k=1}^{Q} \mu_k\, D^{\alpha_k} x. \tag{8}$$

It should be emphasized that a negative order $\alpha = -\beta$ corresponds to a fractional integral of order $\beta$, denoted as $I^{\beta}x$. Thus, for noise immunity and numerical stability, equation (8) can be recast as a fractional integral equation given by:

$$\hat{x}(n) = \sum_{k=1}^{Q} \gamma_k\, I^{\beta_k} x = \sum_{k=1}^{Q} \gamma_k\, \psi_k(n), \tag{9}$$

where $\{\gamma_k\}$ are the sought prediction parameters of the fractional model, henceforth referred to as fractional linear prediction (FLP) coefficients. These coefficients can be determined by minimizing the energy of the fractional prediction error, as given in equation (2).

In vector-matrix notation the sequence corresponding to the fractional integral $\psi_k(n)$ is denoted as the N×1 column vector $\rho_k$. Consequently, the prediction coefficients $\{\gamma_k\}$, represented by the Q×1 vector $\mathbf{g}$, can be determined by the least squares solution

$$\mathbf{g} = (\Lambda^{T}\Lambda)^{-1}\Lambda^{T}\mathbf{x}, \tag{10}$$

where $\Lambda = [\rho_1\;\; \rho_2\;\; \cdots\;\; \rho_Q]$.

3. EXPERIMENTAL RESULTS

Several experiments have been conducted on voiced speech signals to validate our proposed modeling technique. We report here a sample of these experiments which demonstrates the viability of fractional order modeling techniques in modeling speech signals. A frame of 512 samples from the voiced sound 'i' such as in 'hid'
is used here as an example. In Fig. 1, the speech signal frame is displayed (top), along with its 3 fractional integrations (bottom) that are used as the basis functions in our fractional modeling technique. The fractional integration orders used here are 0.15, 0.3, and 0.45. The part of the frame enclosed by the dotted rectangle in the upper half of the figure is shown zoomed in the subsequent plots to improve graph clarity and visualization, and is used to compare the performance of the proposed modeling technique with the commonplace LPC modeling technique.

In Fig. 2, we present the fitting results obtained using FLP with only 2 coefficients, based on the fractional integrals of orders 0.15 and 0.3. The accuracy of fitting can be verified both from the close match between the signal and its estimated version, and from the small fitting error shown in the figure. The signal-to-noise ratio (SNR) is used as a figure of merit for performance comparison. In this simulation, the SNR was calculated as SNR = 26.5 dB.

For the purpose of comparison with the LPC modeling technique, we have modeled the same signal frame using a 12th order LPC, and the results are displayed in Fig. 3. The resulting SNR was 21.6 dB. Two observations can be made from the figure. First, the traditional LPC model (even with a relatively high order) is less effective than a two-parameter FLP model in modeling the signal, especially around the peaks. The second observation is that the error signal in the FLP model is much smoother than that in its LPC counterpart. Hence, the merit of modeling the signal via FLP compared to LPC is clearly realized in terms of fitting accuracy and the number of modeling parameters.
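The FLP fit used in this section can be illustrated with a minimal Python sketch, given here only as one possible reading of equations (7), (9) and (10), not as the authors' implementation. The function names (`gl_weights`, `gl_fractional_integral`, `flp_fit`, `snr_db`) are hypothetical, the step size h = 1 is an assumption, and the SNR is assumed to be the ratio of signal energy to fitting-error energy, since the paper does not state its definition. The demonstration signal is a synthetic damped sinusoid, not the speech frame of Fig. 1.

```python
import numpy as np


def gl_weights(alpha, n_terms):
    """Grunwald-Letnikov weights w_j = (-1)^j * C(alpha, j) for j = 0..n_terms-1."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for j in range(1, n_terms):
        # Recurrence derived from C(alpha, j) = C(alpha, j-1) * (alpha - j + 1) / j
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w


def gl_fractional_integral(x, beta, h=1.0):
    """Fractional integral of order beta: equation (7) evaluated with alpha = -beta."""
    x = np.asarray(x, dtype=float)
    w = gl_weights(-beta, len(x))
    # Causal convolution: y[n] = h**beta * sum_{j=0}^{n} w[j] * x[n - j]
    return h ** beta * np.convolve(x, w)[: len(x)]


def flp_fit(x, betas, h=1.0):
    """Least-squares FLP coefficients g (equation (10)) and the fitted frame."""
    x = np.asarray(x, dtype=float)
    # Columns of the basis matrix are the fractional integrals psi_k(n)
    basis = np.column_stack([gl_fractional_integral(x, b, h) for b in betas])
    g, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return g, basis @ g


def snr_db(x, x_hat):
    """Assumed SNR definition: signal energy over fitting-error energy, in dB."""
    x = np.asarray(x, dtype=float)
    err = x - np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(err ** 2))


if __name__ == "__main__":
    # Synthetic stand-in for a voiced frame (illustrative only)
    n = np.arange(512)
    frame = np.exp(-0.005 * n) * np.sin(0.2 * n)
    g, fitted = flp_fit(frame, betas=[0.15, 0.3])
    print("FLP coefficients:", g)
    print("SNR (dB):", snr_db(frame, fitted))
```

Solving the least-squares problem with `np.linalg.lstsq` rather than forming the inverse in equation (10) explicitly is a numerical convenience; both give the same coefficients when the basis matrix has full column rank.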

Figure 1. A 512-sample speech frame (top) and its fractional integral basis functions (bottom). (Bottom-panel curves: β = 0.15, β = 0.3, β = 0.45; horizontal axis: Sample #.)

Figure 2. Fitting a speech signal using two-parameter FLP model; showing part of the frame, its fit, and fitting error; SNR = 26.5 dB. (Curves: original, fitted, error; horizontal axis: Sample #.)

Figure 3. Fitting a speech signal using 12th order LPC model; showing part of the frame, its fit, and fitting error; SNR = 21.6 dB. (Curves: original, fitted, error; horizontal axis: Sample #.)

Even though it is expected that the advantages of FLP over LPC will be more apparent for lower LPC orders, it is still interesting to show what happens when the LPC order is reduced to two, to allow a direct comparison with the two-parameter FLP. Fig. 4 shows the result of modeling the same segment used before using a 2nd order LPC model. As expected, the modeling is poorer than that obtained using the 12th order model, and certainly much poorer than that of the two-parameter FLP model. The direct comparison of these two-parameter models shows that the FLP based model results in a gain of about 10 dB over the LPC model (26.5 dB versus 16.1 dB).

Finally, in Fig. 5 we show the fitting error plots for the three cases discussed above for the whole 512-sample frame: the 12th order LPC (top), the 2nd order LPC (middle), and the two-parameter FLP (bottom). It is clearly visible that the proposed modeling technique, with only two parameters, is far superior to the LPC modeling technique. The modeling error using a 2nd order LPC is much larger in magnitude and frequency content than that of the two-parameter FLP model. The figure also shows that increasing the LPC order to 12 fails to match the performance of the two-parameter FLP.
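For reference, the LPC baselines compared above follow equations (1), (3) and (4), which can be sketched in Python as shown here. This is a minimal illustration under stated assumptions, not the code used for the reported results: the function names `lpc_autocorrelation` and `lpc_predict` are hypothetical, samples outside the analysis frame are taken as zero, and the normal equations are solved with a general linear solver rather than a Toeplitz-specific recursion.

```python
import numpy as np


def lpc_autocorrelation(x, order):
    """LP coefficients a = R^{-1} r by the autocorrelation method (equations (3), (4))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # R(k) = sum_n x(n) x(n - k), with samples outside the frame assumed zero
    acf = np.array([np.dot(x[k:], x[: n - k]) for k in range(order + 1)])
    # P x P Toeplitz matrix whose first column is [R(0) ... R(P-1)]
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = acf[lags]
    r = acf[1 : order + 1]
    return np.linalg.solve(R, r)


def lpc_predict(x, a):
    """Prediction x_hat(n) = sum_k a_k x(n - k) from equation (1)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    for k, a_k in enumerate(a, start=1):
        # Shift the frame by k samples; earlier samples are treated as zero
        x_hat[k:] += a_k * x[:-k]
    return x_hat


if __name__ == "__main__":
    # Synthetic stand-in for a voiced frame (illustrative only)
    n = np.arange(512)
    frame = np.exp(-0.005 * n) * np.sin(0.2 * n)
    for order in (2, 12):
        a = lpc_autocorrelation(frame, order)
        err = frame - lpc_predict(frame, a)
        print(order, "error energy:", np.sum(err ** 2))
```

Because the error energy of such a fit depends on the frame and on how the frame boundaries are handled, numbers obtained from this sketch should not be expected to reproduce the SNR values reported in Figs. 3 and 4.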
Figure 4. Fitting a speech signal using 2nd order LPC model; showing part of the frame, its fit, and fitting error; SNR = 16.1 dB. (Curves: original, fitted, error; horizontal axis: Sample #.)

Figure 5. Fitting errors; (top) 12th order LPC; (middle) 2nd order LPC; (bottom) two-parameter [0.15 and 0.3] FLP.

4. CONCLUSION

A novel approach for speech signal modeling has been presented. The new approach utilizes fractional integrals as basis functions to model the speech signal. It has been demonstrated that fractional order linear prediction (FLP) models with a smaller number of parameters give better performance than the commonly used LPC-based models in terms of the energy of the fitting error and its smoothness. The smoothness of the fitting error indicates that, in the case of FLP, less information is contained in the fitting error. Furthermore, we have shown that a two-parameter FLP model outperforms a 12th order LPC model. Hence, the proposed FLP modeling technique shows considerable promise as a viable and accurate technique for speech signal modeling. Further investigations are underway for unvoiced signals. The preliminary results obtained encourage further investigation of the proposed modeling technique for other applications such as speech recognition and text-to-speech synthesis.

REFERENCES

[1] I. Podlubny, “Fractional Differential Equations”, Academic Press, 1999.

[2] K. Oldham and J. Spanier, “Fractional Calculus”, Academic Press, New York, 1974.

[3] W. Ahmad and J. C. Sprott, “Chaos in fractional-order autonomous nonlinear systems”, Chaos, Solitons, and Fractals, vol. 16, pp. 339-351, 2003.

[4] W. Ahmad and A. Harb, “On nonlinear control design for autonomous chaotic systems of integer and fractional orders”, Chaos, Solitons, and Fractals, vol. 18, pp. 693-701, 2003.

[5] R. El-Khazali, W. Ahmad, and Y. Al-Assaf, “Sliding Mode Control of Fractional Chaotic Systems”, Proc. IFAC Workshop on Fractional Differentiation and its Application, France, pp. 495-500, July 2004.

[6] W. Ahmad, R. El-Khazali, and Y. Al-Assaf, “Stabilization of Fractional Chaotic Systems Using State-Feedback Control”, Chaos, Solitons, and Fractals, vol. 22, no. 1, pp. 141-150, 2004.

[7] J. Makhoul, “Linear prediction: a tutorial review”, Proceedings of the IEEE, vol. 63, no. 4, April 1975.

[8] J. P. Campbell, T. E. Tremain, and V. C. Welch, “The Federal Standard 1016 4800 bps CELP Voice Coder”, Digital Signal Processing, vol. 1, no. 3, pp. 145-155, 1991.

[9] K. T. Assaleh and R. J. Mammone, “New LP-Derived Features for Speaker Identification”, IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 630-638, 1994.
