Use of spectral autocorrelation in spectral envelope linear prediction for speech recognition

BY B.LALITHA (08691D3807) Under the guidance of Prof. M. B. MANJUNATHA, M.Tech (PhD)

Outline of the project
1. Introduction to Speech
   i) Speech Production
   ii) Speech Recognition
2. Implementation
3. Spectral envelope LPC analysis
4. Speech recognition using Dynamic Time Warping (DTW)
5. Result

1. INTRODUCTION TO SPEECH
i) Speech Production
• Speech can be characterized as a signal carrying message information.
• The purpose of speech is communication between humans.
• Speech is an acoustic waveform that conveys information from a speaker to a listener.

Human Speech Communication
[Figure: Human → Speech signal → Human ear]

[Figure: speech production mechanism]

Speech Production Mechanism
• Flow of air from lungs
• Vibrating vocal cords
• Speech production cavities
• Lips
• Sound wave
• Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)

Classification of Speech Signals
• Voiced Sounds: Voiced sounds are produced when the vocal cords vibrate. These are quasi-periodic pulses of air which excite the vocal tract.
• These are labeled as /u/, /d/, /w/, /i/, and /e/.
• Unvoiced or Fricative Sounds: Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high velocity to produce turbulence.
• These are labeled as /ʃ/ (the fricative "sh"); /f/ and /s/ are also fricatives.

ii) Speech Recognition
The three basic steps in Automatic Speech Recognition (ASR) are:
1. Parameter Estimation
2. Parameter Comparison
3. Decision Making

IMPLEMENTATION
[Block diagram: Input speech signal → Pre-emphasis → Hamming windowing → Linear prediction (predictive filter) → Spectral autocorrelation → LPC coefficients → Dynamic time warping against the reference word → Extract total stored vectors → Vector with minimum value → Recognized word]

Pre-Emphasis
• Boosting the energy in the high frequencies.
• The spectrum for voiced segments has more energy at lower frequencies than higher frequencies.
  - This is called spectral tilt.
  - Spectral tilt is caused by the nature of the glottal pulse.
• Boosting high-frequency energy gives more information to the acoustic model.
  - Improves phone recognition performance.
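The boost described above can be checked numerically. The following is a minimal sketch, assuming the first-order pre-emphasis filter y(n) = x(n) - a*x(n-1) with a = 15/16 (the coefficient used in the deck's code); the three test frequencies and the constant test input are illustrative assumptions.

```matlab
% Sketch: magnitude response of the pre-emphasis filter y(n) = x(n) - a*x(n-1).
% a = 15/16 comes from the deck's code; the frequencies below are assumptions.
a = 15/16;
w = [0, pi/2, pi];              % DC, quarter of the sampling rate, Nyquist (rad/sample)
H = 1 - a * exp(-1j * w);       % H(e^{jw}) = 1 - a*e^{-jw}
gain = abs(H);                  % linear gain at each frequency
gain_db = 20 * log10(gain);     % the same gain in decibels

% Apply the same filter to a constant (purely low-frequency) input.
x = ones(1, 6);
y = filter([1 -a], 1, x);       % y(1) = 1, later samples settle at 1 - a
```

Low frequencies are attenuated (gain 1 - a at DC) while frequencies near Nyquist are amplified (gain 1 + a), which is the tilt compensation the slide describes.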

Example of pre-emphasis
[Figure: spectral slice from the vowel [aa], before and after pre-emphasis]

Pre-emphasising
b = [1 -15/16];
x = filter(b, 1, x);
Len = length(x);
% Resample: decimation by 4
% x = x(1:Fs/8000:Len);
x = resample(x, 8000, Fs);
Fs = 8000;

[Figure: pre-emphasized and resampled signal]

Windowing
• Speech is a non-stationary signal.
• A window is non-zero inside some region and zero elsewhere.
• The speech extracted from each window is called a frame.

Windowing
[Figure: the windowing process, showing the frame shift and frame size]

Applying the window
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = speech(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;
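The framing loop above can also be written without explicit loops. This is a minimal sketch, assuming a frame size n, a frame shift m and a stand-in signal chosen only for illustration; the Hamming window is written out from its standard symmetric formula so that no toolbox function is needed.

```matlab
% Sketch: split a signal into overlapping frames and apply a Hamming window.
% n, m and the test signal are illustrative assumptions.
n = 8;                                          % frame size in samples
m = 4;                                          % frame shift in samples
speech = (1:20)';                               % stand-in column-vector signal
nbFrame = floor((length(speech) - n) / m) + 1;  % number of complete frames

% idx(i,j) = (j-1)*m + i, the same indexing as the double loop.
idx = repmat((1:n)', 1, nbFrame) + repmat((0:nbFrame-1) * m, n, 1);
M = speech(idx);                                % n-by-nbFrame, one frame per column

% Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*k/(n-1)), k = 0..n-1.
h = 0.54 - 0.46 * cos(2 * pi * (0:n-1)' / (n - 1));
M2 = M .* repmat(h, 1, nbFrame);                % same result as diag(h) * M
```

Scaling each column element-wise avoids building the n-by-n matrix diag(h), which matters only for long frames.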

Common window shapes
• Rectangular window
• Hamming window


Linear prediction
• Linear Predictive Coding (LPC) provides
  - low-dimension representation of speech signal at one frame
  - representation of spectral envelope, not harmonics
  - "analytically tractable" method
  - some ability to identify formants
• LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

• where $a_1, a_2, \ldots, a_p$ are constant for each frame of speech.
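A small worked case may help make the prediction model concrete. The sketch below assumes a sampled sinusoid as the signal, because a sinusoid satisfies s(n) = 2*cos(w)*s(n-1) - s(n-2) exactly, so p = 2 with a1 = 2*cos(w) and a2 = -1 predicts it without error. The frequency w and the length N are illustrative assumptions; real speech is only approximately predictable.

```matlab
% Sketch: linear prediction of a sinusoid from its two previous samples.
% w and N are illustrative assumptions.
w = 0.3;                          % frequency in rad/sample
N = 50;                           % number of samples
s = cos(w * (0:N-1))';            % signal to be predicted
a = [2 * cos(w); -1];             % prediction coefficients a1, a2
p = length(a);                    % prediction order

shat = zeros(N, 1);               % predicted signal
for nn = p+1:N
    shat(nn) = a(1) * s(nn-1) + a(2) * s(nn-2);
end

e = s(p+1:N) - shat(p+1:N);       % prediction error where a prediction exists
max_err = max(abs(e));            % on the order of rounding error
```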

If the error over a segment of speech is defined as

$$E_n = \sum_{m=M_1}^{M_2} e_n^2(m) = \sum_{m=M_1}^{M_2} \left( s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right)^2$$

where $s_n$ is the signal starting at time n, then we can find $a_k$ by setting $\partial E_n / \partial a_k = 0$ for $k = 1, 2, \ldots, p$, obtaining p equations and p unknowns:

$$\sum_{k=1}^{p} \hat{a}_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \qquad 1 \le i \le p$$

Error is minimum (not maximum) when the derivative is zero, because as any $a_k$ changes away from its optimum value, the error will increase.

Features: LPC

$$E_n = \sum_{m=M_1}^{M_2} \left( s(m) - \sum_{k=1}^{p} a_k s(m-k) \right)^2$$

$$E_n = \sum_{m=M_1}^{M_2} \left[ s^2(m) - 2 s(m) \sum_{k=1}^{p} a_k s(m-k) + \left( \sum_{k=1}^{p} a_k s(m-k) \right) \left( \sum_{r=1}^{p} a_r s(m-r) \right) \right]$$

$$E_n = \sum_{m=M_1}^{M_2} \Big[ s^2(m) - 2 s(m) a_1 s(m-1) + a_1 s(m-1) \sum_{r=1}^{p} a_r s(m-r) - 2 s(m) a_2 s(m-2) + a_2 s(m-2) \sum_{r=1}^{p} a_r s(m-r) - \cdots - 2 s(m) a_p s(m-p) + a_p s(m-p) \sum_{r=1}^{p} a_r s(m-r) \Big]$$

Setting the derivative with respect to $a_1$ to zero:

$$\frac{\partial E_n}{\partial a_1} = 0 = \sum_{m=M_1}^{M_2} \Big[ -2 s(m) s(m-1) + 2 a_1 s(m-1) s(m-1) + s(m-1) a_2 s(m-2) + \cdots + s(m-1) a_p s(m-p) + a_2 s(m-2) s(m-1) + a_3 s(m-3) s(m-1) + \cdots + a_p s(m-p) s(m-1) \Big]$$

$$\sum_{m=M_1}^{M_2} \Big[ -2 s(m) s(m-1) + 2 a_1 s(m-1) s(m-1) + 2 a_2 s(m-1) s(m-2) + \cdots + 2 a_p s(m-1) s(m-p) \Big] = 0$$

Repeat the above equations for $a_2, a_3, \ldots, a_p$:

$$\sum_{m=M_1}^{M_2} \left[ -2 s(m) s(m-i) + 2 \sum_{k=1}^{p} a_k s(m-i) s(m-k) \right] = 0, \qquad 1 \le i \le p$$

$$-2 \sum_{m=M_1}^{M_2} s(m) s(m-i) + 2 \sum_{m=M_1}^{M_2} \sum_{k=1}^{p} a_k s(m-i) s(m-k) = 0, \qquad 1 \le i \le p$$

$$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s(m-i) s(m-k) = \sum_{m=M_1}^{M_2} s(m-i) s(m), \qquad 1 \le i \le p$$

LPC Autocorrelation Method
Autocorrelation: measure of periodicity in signal

$$\phi_n(i,k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k)$$

As a result, we can re-write the equation as

$$\sum_{k=1}^{p} \hat{a}_k\, \phi_n(i,k) = \phi_n(i,0), \qquad 1 \le i \le p$$

We can solve for $a_k$ using several methods. The most common method in speech processing is the "autocorrelation" method: force the signal to be zero outside of the interval $0 \le m \le N-1$:

$$\hat{s}_n(m) = s_n(m)\, w(m)$$

where $w(m)$ is a finite-length window (e.g. Hamming) of length N that is zero when m is less than 0 or greater than N-1, and $\hat{s}_n$ is the windowed signal.

Because of setting the signal to zero outside the window, eqn (6):

$$E_n = \sum_{m=0}^{N+p-1} e_n^2(m)$$

$$\phi_n(i,k) = \sum_{m=0}^{N+p-1} \hat{s}_n(m-i)\, \hat{s}_n(m-k), \qquad 1 \le i \le p, \quad 0 \le k \le p$$

and this can be expressed as

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} \hat{s}_n(m)\, \hat{s}_n(m+(i-k)), \qquad 1 \le i \le p, \quad 0 \le k \le p$$

and this is identical to the autocorrelation function for $|i-k|$ because the autocorrelation function is symmetric, $R_n(-k) = R_n(k)$:

$$\phi_n(i,k) = R_n(|i-k|), \qquad R_n(k) = \sum_{m=0}^{N-1-k} \hat{s}_n(m)\, \hat{s}_n(m+k)$$

so the set of equations for $a_k$ (eqn (7)) can be written as a combination of (7) and (12):

$$\sum_{k=1}^{p} \hat{a}_k\, R_n(|i-k|) = R_n(i), \qquad 1 \le i \le p$$
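The identity between the summed products and R(|i-k|) can be verified on a tiny example. This is a minimal sketch, assuming a four-sample stand-in frame x and order p = 2 (both illustrative); it computes R(k) from the short-time autocorrelation formula and, separately, the correlation matrix from the zero-extended signal.

```matlab
% Sketch: R(k) = sum_{m=0}^{N-1-k} x(m)*x(m+k) and the matrix phi(i,k) = R(|i-k|).
% The frame x and the order p are illustrative assumptions.
x = [1 2 3 4]';                      % stand-in windowed frame
p = 2;                               % prediction order
N = length(x);

R = zeros(p + 1, 1);                 % R(k+1) holds R(k), k = 0..p
for k = 0:p
    R(k + 1) = sum(x(1:N-k) .* x(1+k:N));
end

% Direct evaluation over m = 0..N+p-1 with the signal forced to zero
% outside 0..N-1 (zero padding on both sides).
xp = [zeros(p, 1); x; zeros(p, 1)];  % sample m is stored at index p+1+m
phi = zeros(p, p);
for i = 1:p
    for k = 1:p
        for mm = 0:N+p-1
            phi(i, k) = phi(i, k) + xp(p + 1 + mm - i) * xp(p + 1 + mm - k);
        end
    end
end
```

For this frame, phi has R(0) on the diagonal and R(1) off the diagonal, which is the symmetric Toeplitz structure the autocorrelation method relies on.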

In matrix form, equation (14) looks like this:

$$\begin{bmatrix} R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\ R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \hat{a}_1 \\ \hat{a}_2 \\ \hat{a}_3 \\ \vdots \\ \hat{a}_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix}$$

There is a recursive algorithm to solve this: Durbin's solution.

LPC Durbin's Solution
Solve a Toeplitz (symmetric, diagonal elements equal) matrix for values of $\alpha$:

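$$\sum_{k=1}^{p} \alpha_k R_n(|i-k|) = R_n(i), \qquad 1 \le i \le p$$

$$E^{(0)} = R(0)$$

$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j)}{E^{(i-1)}}, \qquad 1 \le i \le p$$

$$\alpha_i^{(i)} = k_i$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

$$\hat{a}_j = \alpha_j^{(p)}$$

We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function $S(z)$ for $z = e^{j\omega}$:

$$S(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{k=1}^{p} a_k e^{-j\omega k}}$$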
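The recursion above translates almost line by line into code. The following is a minimal sketch, assuming a hypothetical autocorrelation sequence R for order p = 2; it cross-checks the recursive result against a direct solve of the Toeplitz system and then evaluates the envelope on a small frequency grid. Taking the gain G as the square root of the final error energy is a common convention, assumed here because the slides do not define G.

```matlab
% Sketch: Durbin's recursion for sum_k alpha_k*R(|i-k|) = R(i), 1 <= i <= p.
% R is a hypothetical autocorrelation sequence; R(k+1) holds R(k).
R = [30; 20; 11];
p = length(R) - 1;

E = R(1);                            % E^(0) = R(0)
alpha = zeros(p, 1);
for i = 1:p
    acc = R(i + 1);                  % R(i) - sum_j alpha_j^(i-1) * R(i-j)
    for j = 1:i-1
        acc = acc - alpha(j) * R(i - j + 1);
    end
    k = acc / E;                     % reflection coefficient k_i
    prev = alpha;                    % alpha^(i-1)
    alpha(i) = k;                    % alpha_i^(i) = k_i
    for j = 1:i-1
        alpha(j) = prev(j) - k * prev(i - j);
    end
    E = (1 - k^2) * E;               % E^(i)
end

% Cross-check: solve the same Toeplitz system directly.
a_direct = toeplitz(R(1:p)) \ R(2:p+1);

% Envelope |S(e^{jw})| = G / |1 - sum_k a_k e^{-jwk}|, with G = sqrt(E) (assumption).
w = linspace(0, pi, 8);
A = 1 - exp(-1j * w' * (1:p)) * alpha;
S = sqrt(E) ./ abs(A);
```

The recursion needs on the order of p^2 operations, compared with p^3 for a general linear solve, which is why it is preferred when the system is Toeplitz.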

Finding frequency envelope using LPC method
for col = 1:nbFrame
    speech1 = M2(:,col)' + 0.000001;   % one windowed frame as a row vector
    % compute Mth-order autocorrelation function (M = Or):
    rx = zeros(1,Or+1)';
    for i = 1:Or+1
        rx(i) = rx(i) + speech1(1:n-i+1) * speech1(1+i-1:n)';
    end
    % prepare the M by M Toeplitz covariance matrix:
    covmatrix = zeros(Or,Or);
    for i = 1:Or
        covmatrix(i,i:Or) = rx(1:Or-i+1)';
        covmatrix(i:Or,i) = rx(1:Or-i+1);
    end
    % solve "normal equations" for prediction coeffs
    Acoeffs = -covmatrix \ rx(2:Or+1);
    Alp = [1, Acoeffs'];               % LP polynomial A(z), as a row vector
    % envelope in dB (log10, not the natural log, is needed for decibels)
    dbenvlp(:,col) = 20*log10(abs(freqz(1,Alp,n*2)'));
end

Dynamic Time Warping (DTW)
• The SELP analysis is evaluated using dynamic time warping.
• T = {t1, t2, …, ti, …, tn}, R = {r1, r2, …, ri, …, rm}
• T = test signal or unknown signal
• R = reference signal or known signal
• A matrix of m x n is created.
• C(x, y) = MIN[ C(x + 1, y), C(x + 1, y + 1), C(x, y + 1) ] + D(x, y)
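The recurrence above can be sketched as a backward fill of the cost matrix, followed by picking the reference with the smallest total cost. This is a minimal illustration, assuming scalar features with absolute difference as the local distance D(x, y) and two hypothetical reference sequences; in the project the elements would be LPC-derived feature vectors and D a vector distance.

```matlab
% Sketch: DTW with C(x,y) = min(C(x+1,y), C(x+1,y+1), C(x,y+1)) + D(x,y).
% T and refs are hypothetical scalar feature sequences.
T = [1 1 2 3 3 4];                       % test (unknown) sequence
refs = {[1 2 3 4], [4 3 2 1]};           % stored reference sequences
cost = zeros(1, numel(refs));

for r = 1:numel(refs)
    R = refs{r};
    m = length(R);
    n = length(T);
    D = abs(repmat(R', 1, n) - repmat(T, m, 1));   % m-by-n local distances
    C = inf(m + 1, n + 1);               % extra Inf row and column act as a border
    for x = m:-1:1
        for y = n:-1:1
            if x == m && y == n
                best = 0;                % end point of the warping path
            else
                best = min([C(x+1, y), C(x+1, y+1), C(x, y+1)]);
            end
            C(x, y) = best + D(x, y);
        end
    end
    cost(r) = C(1, 1);                   % total cost of the best path from (1,1)
end

[~, recognized] = min(cost);             % index of the closest reference
```

Here the test sequence is a time-stretched copy of the first reference, so that reference aligns with zero cost and is selected.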
