
5 Linear Prediction

Linear prediction is one of the most important tools in speech processing. It can be utilized in many ways, but in speech processing its most important property is the ability to model the vocal tract. It can be shown that the lattice-structured model of the vocal tract is an all-pole filter, that is, a filter that has only poles. One can also think that the lack of zeros restricts the filter to emphasizing certain frequencies, which in this case are the formant frequencies of the vocal tract. In reality the vocal tract is not composed of lossless uniform tubes, but in practice modeling the vocal tract with an all-pole filter works fine. Linear prediction (LP) is a useful method for estimating the parameters of this all-pole filter from a recorded speech signal.

Let us first study an example of the usefulness of LP in this respect. Figure 1 presents a 30 ms window of the vowel [a] with a sampling frequency of 16 kHz. Its amplitude spectrum can be found in figure 2, showing the fundamental frequency (dense peaks) and the formants (broad peaks in the pulse envelope). In the same picture, there is also the amplitude response of a 20th-degree LP model, which follows the broad-peak envelope very well.
[Plot: windowed sound; horizontal axis 0 to 500 samples, vertical axis -0.8 to 0.8]

Figure 1: Hanning-windowed waveform of vowel [a].

5.1 Background of Linear Prediction

The term linear prediction refers to the prediction of the output of a linear system based on its input and previous outputs $y(n-1), y(n-2), \ldots, y(n-p)$:

$$\hat{y}(n) = -\sum_{k=1}^{p} a(k)\, y(n-k) + \sum_{k=0}^{N} b(k)\, x(n-k) \tag{1}$$

[Plot: amplitude spectrum and LPC spectrum; horizontal axis: frequency, 0 to 8000 Hz]

Figure 2: Amplitude spectrum and LP spectrum.

The notation $\hat{y}(n)$ refers to the estimate or prediction of $y(n)$. The idea is that once we know the input $x(n)$ and the output $y(n)$, we would like to predict the behaviour of the unknown system $H(z)$, as illustrated in figure 3. In the figure the output has been delayed, so that the real output $y(n)$ cannot be used. The problem is now to determine the constants $a(k)$ and $b(k)$ in such a way that $\hat{y}(n)$ approximates the real output $y(n)$ as accurately as possible. The following terms describe the model:

autoregressive (AR) model: The output is predicted by using only previous outputs and the current input, which means that $b(k) = 0$ for $k > 0$ and only $a(k)$ and $b(0)$ must be determined. This corresponds to an all-pole filter.

moving average (MA) model: In this model the prediction is based only on the input, which gives $a(k) = 0$. This model corresponds to a FIR filter.

autoregressive moving average (ARMA) model: This is the general model as in equation (1), corresponding to a general linear recursive filter.

[Block diagram with blocks H(z), z^{-1}, B(z) and A(z); signals x(n), y(n) and ŷ(n)]

Figure 3: Prediction of the output of the unknown system $H(z)$ based on the input and previous outputs. In speech processing $H(z)$ corresponds to the vocal tract and the input is usually unavailable.

In speech processing the AR model is preferred for the following reasons:

• the input (excitation signal at the vocal cords) is unknown
• the parameters $a(k)$ are computationally easy to determine
• as shown before, the vocal tract is theoretically an all-pole filter (excluding nasal sounds)
• an AR model of higher degree can also model an ARMA model
• a stable all-pole model can be used to represent the amplitude response of any system with any desired precision (however, the degree of the required all-pole model may be considerably high).
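To make the claim about higher-degree AR models concrete, here is a small worked example. The first-order MA system $1 + c z^{-1}$ and the constant $c$ are chosen purely for illustration and do not come from the text; the identity used is the geometric series, valid on the unit circle when $|c| < 1$.

$$1 + c z^{-1} = \frac{1}{1 - c z^{-1} + c^{2} z^{-2} - \cdots} \approx \frac{1}{\sum_{k=0}^{K} (-c)^{k} z^{-k}}$$

The truncated denominator is an all-pole model of degree $K$. Its error is of order $|c|^{K+1}$, so the closer the zero lies to the unit circle, the higher the degree needed for a given precision.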

Consider an all-pole system with transfer function

$$\frac{G}{A(z)},$$

where

$$A(z) = 1 + a(1) z^{-1} + a(2) z^{-2} + \cdots + a(p) z^{-p} \tag{2}$$

and $G$ denotes the gain. The transfer function is the ratio of the z-transforms of the output, $Y(z)$, and the input, $X(z)$:

$$\frac{Y(z)}{X(z)} = \frac{G}{A(z)},$$

which implies

$$A(z)\, Y(z) = G\, X(z). \tag{3}$$

Taking the inverse z-transform of (3) yields the time-domain relation

$$y(n) + \sum_{i=1}^{p} a(i)\, y(n-i) = G\, x(n),$$

which is

$$y(n) = G\, x(n) - \sum_{i=1}^{p} a(i)\, y(n-i) \tag{4}$$
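The difference equation (4) can be tried out directly. The following is a minimal sketch in plain Python; the function names and the one-pole example filter are assumptions made for illustration, not part of the original notes. It runs the recursion and then checks that each output sample is reproduced exactly from the input and the previous outputs.

```python
def all_pole_filter(x, a, gain=1.0):
    """Run y(n) = gain*x(n) - sum_{i=1..p} a[i-1]*y(n-i), with y(n) = 0 for n < 0."""
    y = []
    for n, x_n in enumerate(x):
        acc = gain * x_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc -= a_i * y[n - i]
        y.append(acc)
    return y


def predict_from_past(y, x, a, gain, n):
    """Right-hand side of equation (4), evaluated from known past outputs."""
    past = sum(a_i * y[n - i] for i, a_i in enumerate(a, start=1) if n - i >= 0)
    return gain * x[n] - past


# Hypothetical example: one real pole at z = 0.5, i.e. A(z) = 1 - 0.5 z^-1.
a = [-0.5]
x = [1.0, 0.0, 0.0, 0.0]      # unit impulse as input
y = all_pole_filter(x, a)     # impulse response: successive powers of 0.5
```

Here `a` holds $a(1), \ldots, a(p)$ without the leading 1, matching the sum in equation (4).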

where $x(n)$ is the input, $y(n)$ is the response and $a(1), a(2), \ldots, a(p)$ are the coefficients of the filter $A(z)$. In other words, the output of the all-pole system can be predicted perfectly if the input and the previous outputs are known. In practice the prediction is never perfect, since real systems are neither linear nor of all-pole type and there is generally some noise in the output. Moreover, in speech processing the input $x(n)$ is unknown. Nevertheless, the vocal tract (as well as any other system) can be modeled by using an all-pole model, and in this case the model works really well. So by getting rid of the dependence on the input $x(n)$ in equation (4), we end up with the following model, which will be used from now on:

$$\hat{y}(n) = -\sum_{k=1}^{p} a(k)\, y(n-k)$$

The hat over $y$ refers to an estimate of $y$. Our goal is to determine the parameters $a(1), a(2), \ldots, a(p)$ so that $\hat{y}(n)$ would be close to the recorded speech $y(n)$ in some frame of the signal, i.e., so that the prediction error would be minimized. Once the parameters have been determined, we may, according to equation (2), use the following model of the vocal tract:

$$\frac{G}{A(z)}, \quad \text{where } A(z) = 1 + \sum_{k=1}^{p} a(k)\, z^{-k}.$$

(There is a much more elegant way to estimate [...] but we are now mainly interested in [...])

5.2 Autocorrelation Method

Parameters $a(1), a(2), \ldots, a(p)$ are to be determined so that the sum of squared errors

$$\sum_{n} \left( \hat{y}(n) - y(n) \right)^{2},$$

taken over all indices $n$, is minimized. In practice the sum is finite due to the finiteness of the signal, but it is useful to think that the frame is infinitely long and only a few samples are nonzero. In the following the output $y(n)$ will be denoted as $s(n)$ ($s$ referring to speech). So we have a windowed speech signal where only a finite number of samples are nonzero. With the given prediction coefficients $a(1), a(2), \ldots, a(p)$, the energy of the prediction error can be written as

$$E = \sum_{n=-\infty}^{\infty} e(n)^{2} = \sum_{n=-\infty}^{\infty} \left( s(n) - \hat{s}(n) \right)^{2} = \sum_{n=-\infty}^{\infty} \left( s(n) + \sum_{k=1}^{p} a(k)\, s(n-k) \right)^{2},$$

where $p$ is the length of the prediction filter and $\hat{s}(n)$ is the estimate of $s(n)$ (prediction in this case). With the convention $a(0) = 1$, the energy of the prediction error can be written as

$$E = \sum_{n=-\infty}^{\infty} \left( \sum_{k=0}^{p} a(k)\, s(n-k) \right)^{2}.$$
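As an illustration of the error energy with the convention $a(0) = 1$, the following sketch evaluates it for a short frame that is zero outside its samples. The function names, the three-sample frame and the coefficient values are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
def prediction_error(s, a):
    """e(n) = sum_{k=0..p} a[k]*s(n-k), with a[0] = 1 and s(n) = 0 outside the frame."""
    p = len(a) - 1
    errors = []
    for n in range(len(s) + p):       # the error can be nonzero up to n = N-1+p
        e_n = 0.0
        for k, a_k in enumerate(a):
            if 0 <= n - k < len(s):
                e_n += a_k * s[n - k]
        errors.append(e_n)
    return errors


def error_energy(s, a):
    """Energy of the prediction error: the sum of squared error samples."""
    return sum(e * e for e in prediction_error(s, a))


s = [1.0, 0.5, 0.25]    # hypothetical short frame
a = [1.0, -0.5]         # a(0) = 1, a(1) = -0.5, so the prediction is 0.5*s(n-1)
errors = prediction_error(s, a)
energy = error_energy(s, a)
```

In this example the first error sample comes from the frame starting abruptly and the last one from the frame ending abruptly; inside the frame the prediction is exact.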


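Let us minimize $E$ by choosing suitable coefficients $a(1), a(2), \ldots, a(p)$. A necessary condition for the optimality of the choice of $a(i)$ is that the partial derivative of $E$ with respect to the variable $a(i)$ equals zero. Notice that $E$ depends on the variables $a(1), \ldots, a(p)$, so it could be written as $E(a(1), \ldots, a(p))$, but we omit this to keep the notation short. So let's differentiate! The partial derivative with respect to $a(i)$ ($i = 1, 2, \ldots, p$) is

$$\begin{aligned}
\frac{\partial E}{\partial a(i)}
&= \frac{\partial}{\partial a(i)} \sum_{n} \Big( \sum_{k=0}^{p} a(k)\, s(n-k) \Big)^{2} \\
&= \sum_{n} \frac{\partial}{\partial a(i)} \Big( \sum_{k=0}^{p} a(k)\, s(n-k) \Big)^{2} \\
&= \sum_{n} 2 \Big( \sum_{k=0}^{p} a(k)\, s(n-k) \Big) \frac{\partial}{\partial a(i)} \Big( \sum_{k=0}^{p} a(k)\, s(n-k) \Big) \\
&= \sum_{n} 2 \Big( \sum_{k=0}^{p} a(k)\, s(n-k) \Big)\, s(n-i),
\end{aligned}$$

where the differentiation rule $\frac{d}{dx} f(x)^{2} = 2 f(x) f'(x)$ has been utilized. By regrouping this we get

$$\frac{\partial E}{\partial a(i)} = 2 \sum_{n} \sum_{k=0}^{p} a(k)\, s(n-k)\, s(n-i) = 2 \sum_{k=0}^{p} a(k) \sum_{n} s(n-k)\, s(n-i) = 2 \sum_{k=0}^{p} a(k)\, r(k, i),$$

where

$$r(k, i) = \sum_{n} s(n-k)\, s(n-i)$$

is in fact the autocorrelation of the signal $s(n)$ with delay $k - i$, which is

$$r(k - i) = \sum_{n} s(n)\, s(n - (k - i)).$$

Why? Well, replacing $n$ by $n + i$ in the sum yields

$$\sum_{n} s(n-k)\, s(n-i) = \sum_{n} s(n+i-k)\, s(n+i-i) = \sum_{n} s(n - (k-i))\, s(n).$$

Moreover, the term $r(k, i)$ depends only on the value $k - i$, so it can be denoted by the one-variable autocorrelation function

$$r(k, i) = r(k - i).$$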
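The claim that the double-index sum depends only on the difference of the delays is easy to check numerically. The sketch below does so for a short signal that is zero outside its samples; the function names and the test signal are assumptions made for this illustration.

```python
def sample(s, n):
    """s(n), taken as zero outside the stored frame."""
    return s[n] if 0 <= n < len(s) else 0.0


def shifted_product_sum(s, k, i):
    """Sum over all n of s(n-k)*s(n-i) for nonnegative delays k and i."""
    lo, hi = min(k, i), len(s) + max(k, i)
    return sum(sample(s, n - k) * sample(s, n - i) for n in range(lo, hi))


def autocorr(s, lag):
    """One-variable autocorrelation r(lag) = sum_n s(n)*s(n-lag); even in lag."""
    lag = abs(lag)
    return sum(s[n] * s[n - lag] for n in range(lag, len(s)))


s = [1.0, 2.0, 3.0]                       # hypothetical frame
r = [autocorr(s, lag) for lag in range(3)]
```

For this frame the values are $r(0) = 14$, $r(1) = 8$ and $r(2) = 3$, and for example `shifted_product_sum(s, 3, 1)` equals $r(2)$.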

By setting the derivatives to zero, we obtain:

$$\begin{cases}
\sum_{k=0}^{p} a(k)\, r(k-1) = 0 \\
\sum_{k=0}^{p} a(k)\, r(k-2) = 0 \\
\qquad \vdots \\
\sum_{k=0}^{p} a(k)\, r(k-p) = 0
\end{cases}$$

which can also be written in the form (by remembering that $a(0) = 1$ and $r(k) = r(-k)$)

$$\begin{cases}
\sum_{k=1}^{p} a(k)\, r(k-1) = -r(1) \\
\sum_{k=1}^{p} a(k)\, r(k-2) = -r(2) \\
\qquad \vdots \\
\sum_{k=1}^{p} a(k)\, r(k-p) = -r(p)
\end{cases}$$

which in turn can be reformulated with matrices as:

$$\begin{pmatrix}
r(0) & r(1) & r(2) & \cdots & r(p-1) \\
r(1) & r(0) & r(1) & \cdots & r(p-2) \\
r(2) & r(1) & r(0) & \cdots & r(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r(p-1) & r(p-2) & r(p-3) & \cdots & r(0)
\end{pmatrix}
\begin{pmatrix} a(1) \\ a(2) \\ a(3) \\ \vdots \\ a(p) \end{pmatrix}
= -\begin{pmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(p) \end{pmatrix}$$

Notice that the coefficient matrix is symmetric (due to $r(k) = r(-k)$) and Toeplitz (due to $r(k, i) = r(k - i)$), which is crucial when deriving a fast computational method to find the coefficients $a(1), a(2), \ldots, a(p)$.
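For small $p$ the matrix equation can simply be solved head-on, which gives a useful baseline. The sketch below builds the Toeplitz matrix from the autocorrelation values and solves the system with a small Gaussian elimination written out in plain Python. All names are hypothetical, and the elimination routine is a generic textbook solver rather than anything specific to these notes.

```python
def autocorr(s, lag):
    """r(lag) = sum_n s(n)*s(n-lag) for a frame that is zero outside its samples."""
    return sum(s[n] * s[n - lag] for n in range(lag, len(s)))


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting (adequate for small systems)."""
    n = len(rhs)
    aug = [row[:] + [rhs[j]] for j, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for row in range(n - 1, -1, -1):       # back substitution
        tail = sum(aug[row][c] * sol[c] for c in range(row + 1, n))
        sol[row] = (aug[row][n] - tail) / aug[row][row]
    return sol


def lp_coefficients_direct(s, p):
    """Solve the normal equations for a(1..p) by building the Toeplitz matrix."""
    r = [autocorr(s, lag) for lag in range(p + 1)]
    R = [[r[abs(i - j)] for j in range(p)] for i in range(p)]
    return solve_linear(R, [-r[i] for i in range(1, p + 1)])


s = [1.0, 2.0, 3.0]                    # hypothetical frame
coeffs = lp_coefficients_direct(s, 2)  # a(1), a(2)
```

For this frame the system is $14 a(1) + 8 a(2) = -8$ and $8 a(1) + 14 a(2) = -3$, whose solution is $a(1) = -2/3$, $a(2) = 1/6$. A general solver like this costs on the order of $p^3$ operations and ignores the Toeplitz structure.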
5.2.1 Levinson-Durbin Recursion

Recap: at this point we have derived the equations (the so-called normal equations) for the prediction coefficients $a(1), a(2), \ldots, a(p)$ based on the minimization of the prediction error. Now the coefficients could be solved by inverting the autocorrelation matrix, but this is computationally rather demanding. To help us, Mr. Levinson and Mr. Durbin have developed an efficient algorithm for solving a symmetric Toeplitz system of equations. The basic idea is to solve the matrix equation

$$\mathbf{R}\, \mathbf{a} = -\mathbf{r}$$

(with $\mathbf{R}$ the autocorrelation matrix, $\mathbf{a}$ the vector of coefficients and $\mathbf{r}$ the vector of the values $r(1), \ldots, r(p)$) in steps, that is, by increasing the length of the vector $\mathbf{a}$ and by calculating a new solution based on the previous solution. The optimal coefficients satisfy

$$\sum_{i=0}^{p} a(i)\, r(i) = E,$$

where $E$ is the sum of squares of the prediction error (more information can be found, for instance, in the book T. W. Parsons, Voice and Speech Processing, McGraw-Hill, Inc., 1987). By using this, the group of equations boils down to

$$\begin{pmatrix}
r(0) & r(1) & r(2) & \cdots & r(p) \\
r(1) & r(0) & r(1) & \cdots & r(p-1) \\
r(2) & r(1) & r(0) & \cdots & r(p-2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r(p) & r(p-1) & r(p-2) & \cdots & r(0)
\end{pmatrix}
\begin{pmatrix} 1 \\ a(1) \\ a(2) \\ \vdots \\ a(p) \end{pmatrix}
= \begin{pmatrix} E \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

The matrix on the left is still symmetric and Toeplitz. Assume that we have already solved the equation when $p = 2$. Now, let us see how this helps us to solve $a_3(1), a_3(2), a_3(3)$ when $p = 3$, where the subscript refers to the degree of the equation. So this is what we have already solved:

$$\begin{pmatrix} r(0) & r(1) & r(2) \\ r(1) & r(0) & r(1) \\ r(2) & r(1) & r(0) \end{pmatrix}
\begin{pmatrix} 1 \\ a_2(1) \\ a_2(2) \end{pmatrix}
= \begin{pmatrix} E_2 \\ 0 \\ 0 \end{pmatrix}$$

The structure of the matrix yields

$$\begin{pmatrix} r(0) & r(1) & r(2) \\ r(1) & r(0) & r(1) \\ r(2) & r(1) & r(0) \end{pmatrix}
\begin{pmatrix} a_2(2) \\ a_2(1) \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ E_2 \end{pmatrix}$$

thus: symmetric Toeplitz matrices have the nice property that when the coefficient vector and the result vector are flipped upside down (switch the last and the first element, the second-last and the second, and so on), the equation is still satisfied. Let us now try the following kind of solution to a bigger group of equations:

$$\begin{pmatrix}
r(0) & r(1) & r(2) & r(3) \\
r(1) & r(0) & r(1) & r(2) \\
r(2) & r(1) & r(0) & r(1) \\
r(3) & r(2) & r(1) & r(0)
\end{pmatrix}
\left[ \begin{pmatrix} 1 \\ a_2(1) \\ a_2(2) \\ 0 \end{pmatrix} + k_3 \begin{pmatrix} 0 \\ a_2(2) \\ a_2(1) \\ 1 \end{pmatrix} \right]
= \begin{pmatrix} E_2 \\ 0 \\ 0 \\ q \end{pmatrix} + k_3 \begin{pmatrix} q \\ 0 \\ 0 \\ E_2 \end{pmatrix},$$

where $q = \sum_{i=0}^{2} a_2(i)\, r(3-i)$ with $a_2(0) = 1$.

For this to be a solution, we only require that all the elements, except the first one, in the vector on the right side are equal to zero. It will be so if

$$q + k_3 E_2 = 0,$$

in other words

$$k_3 = -\frac{1}{E_2} \sum_{i=0}^{2} a_2(i)\, r(3-i).$$

We notice that

$$E_3 = (1 - k_3^{2})\, E_2.$$

Justification:

$$E_3 = E_2 + k_3\, q = E_2 + k_3 (-k_3 E_2) = (1 - k_3^{2})\, E_2.$$

We have thus found that by trying a vector that is the sum of the lower-degree solution and its flipped version multiplied by a constant, we get a solution to the problem of the higher degree. The same deduction works in general when increasing the size from $m-1$ to $m$. Thus, the results are

$$k_m = -\frac{1}{E_{m-1}} \sum_{i=0}^{m-1} a_{m-1}(i)\, r(m-i), \qquad E_m = (1 - k_m^{2})\, E_{m-1}$$

and

$$a_m(i) = a_{m-1}(i) + k_m\, a_{m-1}(m-i), \quad i = 1, \ldots, m,$$

with $a_{m-1}(0) = 1$ and $a_{m-1}(m) = 0$. Because $E_m \ge 0$ ($E_m$ is the prediction error for the $m$th-degree filter), it follows that

$$|k_m| \le 1.$$

The values $k_m$ are called reflection coefficients. The Levinson-Durbin recursion is started with the condition

$$E_0 = r(0),$$

which may be thought of as the error of the 0th-degree predictor (no prediction at all).

There also exist other methods and variations for solving the coefficients, but the Levinson-Durbin recursion is the most commonly used one. Besides, calculating the coefficients in this way guarantees that the absolute values of the reflection coefficients are always at most 1 (and strictly below 1 as long as the prediction error stays positive), yielding a stable filter.
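The recursion derived above can be sketched in a few lines of plain Python. This is an illustrative implementation under the conventions of this section ($a(0) = 1$, $E_0 = r(0)$); the function name and return layout are assumptions, and it does not guard against an all-zero signal, for which $E$ would be zero.

```python
def levinson_durbin(r, p):
    """Solve the normal equations from autocorrelation values r[0..p].

    Returns (a, E, ks): the coefficients a[0..p] with a[0] = 1, the final
    prediction error energy E, and the reflection coefficients k_1..k_p.
    """
    a = [1.0] + [0.0] * p
    E = r[0]                      # error of the 0th-degree predictor
    ks = []
    for m in range(1, p + 1):
        # q = sum_{i=0}^{m-1} a_{m-1}(i) * r(m-i)
        q = sum(a[i] * r[m - i] for i in range(m))
        k = -q / E
        prev = a[:]
        for i in range(1, m + 1):
            # a_m(i) = a_{m-1}(i) + k_m * a_{m-1}(m-i); prev[m] is 0, prev[0] is 1
            a[i] = prev[i] + k * prev[m - i]
        E *= (1.0 - k * k)
        ks.append(k)
    return a, E, ks


# Autocorrelation of the hypothetical frame [1, 2, 3]: r(0)=14, r(1)=8, r(2)=3.
a, E, ks = levinson_durbin([14.0, 8.0, 3.0], 2)
```

For these values the recursion gives $a(1) = -2/3$, $a(2) = 1/6$ and $E = 55/6$, which agrees with solving the $2 \times 2$ normal equations by hand. Each step costs on the order of $m$ operations, so the whole recursion is of order $p^2$.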

The degree $p$ of the model is chosen by considering that one complex-conjugate pole pair corresponds to one formant. Because there is approximately one formant per kHz of bandwidth, and the bandwidth is half the sampling frequency, the degree is usually the same as the sampling frequency in kHz. For instance, when the sampling frequency is 8 kHz, the degree of the model is 8. In practice, to compensate for the inaccuracies in the model (the AR assumption and others), the degree is usually chosen to be a little higher. For instance, with a sampling frequency of 8 kHz, a reasonable model degree is 10 or 12, and with a 16 kHz sampling frequency, the degree should be 18 or 20.

The LP analysis method discussed above is perhaps the most important method in speech processing. In speech coding, for instance, it is used to code the excitation and vocal tract contributions separately; in speech recognition it gives information about the spectrum of the speech (and in this way about the phoneme); and in speech synthesis it makes it possible to control the vocal tract and the excitation separately. In Matlab the LPC (or LP) analysis is implemented by the command lpc.
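The rule of thumb for the degree, and the amplitude response of an LP model such as the one plotted in figure 2, can be sketched as follows. Everything here is illustrative: the function names, the `extra` margin parameter and the single-resonance example filter are assumptions, not part of the notes or of Matlab's lpc.

```python
import cmath
import math


def lp_order(fs_hz, extra=2):
    """Rule of thumb from the text: sampling frequency in kHz plus a small margin.

    `extra` is a hypothetical parameter; the text suggests a margin of about 2 to 4.
    """
    return round(fs_hz / 1000) + extra


def lp_amplitude_response(a, gain, fs_hz, freq_hz):
    """|G / A(e^{jw})| with A(z) = sum_k a[k] z^-k and a[0] = 1."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    A = sum(a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a))
    return gain / abs(A)


# Hypothetical single "formant": a pole pair at 1 kHz with radius 0.95, fs = 8 kHz.
fs = 8000
radius, theta = 0.95, 2.0 * math.pi * 1000 / fs
a = [1.0, -2.0 * radius * math.cos(theta), radius ** 2]
freqs = range(0, fs // 2 + 1, 10)          # 10 Hz grid up to half the sampling rate
peak_hz = max(freqs, key=lambda f: lp_amplitude_response(a, 1.0, fs, f))
```

The envelope of this two-pole model peaks close to the pole angle, which is how the broad peaks of an LP spectrum line up with formants.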
