16 views

Original Title: l7

Uploaded by AakankshaSingh

- Biomedical Signals: Identification of PVC beats in an EMG signal
- Molnar Etal 2008
- Digital Signal Processing
- PSO TEST
- COCHELEA
- Labov-1963
- Lecture 1
- Voice Tweaker 4 Users Manual Rev 2
- 42913383 Digital Signal Processing Notes
- EC2029-DIP important question.pdf
- Unit 3 - Bandwidth Efficient Schemes
- Matlab Code
- SSUM_ Signals and Systems Using MATLAB
- ml
- ferragne_2010 (1)
- Application of Measurement Models to Impedance Spectroscopy
- 303Mills.pdf
- Arva Manual
- Data Analysis
- Section-5

You are on page 1of 24

Introduction

Linear prediction

Finding the linear prediction coefficients

Alternative representations

This lecture is based on [Dutoit and Marques, 2009, ch1; Taylor, 2009, ch. 12; Rabiner and Schaefer, 2007, ch. 6]

Introduction

Review of speech production

Speech is produced by an excitation signal generated in the throat,

which is modified by resonances due to the shape of the vocal, nasal

and pharyngeal tracts

The excitation signal can be

Glottal pulses created by periodic opening and closing of the vocal folds

(voiced speech)

These periodic components are characterized by their fundamental frequency

(0), whose perceptual correlate is the pitch

A combination of the two

formants

Pitch appears as narrow peaks for fundamental and harmonics

Formants appear as wide peaks in the spectral envelope

Introduction to Speech Processing | Ricardo Gutierrez-Osuna | CSE@TAMU

Linear prediction

The source-filter model

Originally proposed by Gunnar Fant in 1960 as a linear model of

speech production in which glottis and vocal tract are fully uncoupled

According to the model, the speech signal is the output of an allpole filer 1 excited by

=

1 =1

signals, respectively, and is the prediction order

the inverse filter

As discussed before, the excitation signal is either

A sequence of regularly spaced pulses, whose period 0 and amplitude

can be adjusted, or

White Gaussian noise, whose variance 2 can be adjusted

predictability, which gives name to the model

Taking the inverse z-transform, the speech signal can be expressed as

= +

=1

sum of the previous samples plus some excitation contribution

In linear prediction, the term is usually referred to as the error (or

residual) and is often written as to reflect this

Inverse filter

For a given speech signal , and given the LP parameters , the

residual can be estimated as

=

=1

which is simply the output of the inverse filter excited by the speech

signal (see figure below)

excitation signal that led to the speech signal

One will then expect that will approximate a sequence of pulses (for

voiced speech) or white Gaussian noise (for unvoiced speech)

Introduction to Speech Processing | Ricardo Gutierrez-Osuna | CSE@TAMU

How do we estimate the LP parameters?

We seek to estimate model parameters that minimize the

expectation of the residual energy 2

= arg min 2

the covariance method

the autocorrelation method

Using the term to denote the sum squared error, we can state

1

=0

=0

=1

each coefficient and setting to zero

=0

= 2

1

=0

1

=0

+2

=1

1

=0

= 1,2,

= 0

=1

which gives

1

=0

=1

=2

1

=0

Defining , as

, =

1

=0

, 0 =

Or in matrix notation as

1,0

1,1

2,0

2,1

=

=1

1,2

2,2

, 0

, 1 , 2

or even more compactly as =

1,

2,

1

2

efficiently using Cholesky decomposition in 3

10

NOTES

This method is known as the covariance method (for unclear reasons)

The method calculates the error in the region 0 < 1, but to do

so uses speech samples in the region < 1

Note that to estimate the error at 0 , one needs samples up to

If the signal follows an all-pole model, the covariance matrix can produce

an exact solution

In contrast, the method we will see next is suboptimal, but leads to more

efficient and stable solutions

11

The autocorrelation function of a signal can be defined as

extends over to rather than to the range 0 <

, =

Hann), which sets to zero all values outside 0 <

Thus, all errors will be zero before the window and samples after

the window, and the calculation of the error over can be rewritten as

, =

1+

=0

, =

1()

=0

12

thus, , =

which allows us to write , 0 =

=

The resulting matrix

0

1

2

1

=

=1

, as

=1

1

0

1

2

1

2

1 2

0

is now a Toeplitz matrix (symmetric, with all elements on each

diagonal being identical), which is significantly easier to invert

In particular, the Levinson-Durbin recursion provides a solution in 2

13

The frequency response of the LP filter can be found by evaluating the

transfer function on the unit circle at angles 2 , that is

2

2

2

=

1 =1 2

Remember that this all-pole filter models the resonances of the vocal

tract and that the glottal excitation is captured in the residual

Therefore, the frequency response of 1 will be smooth and

free of pitch harmonics

This response is generally referred to as the spectral envelope

14

The next slide shows the spectral envelope for = 12, 40 , and the

reduction in mean-squared error over a range of values

At = 12 the spectral envelope captures the broad spectral peaks (i.e.

the harmonics), whereas at = 40 the spectral peaks also capture the

harmonic structure

Notice also that the MSE curve flattens out above about = 12 and then

decreases modestly after

Also consider the various factors that contribute to the speech spectra

Resonance structure comprising about one resonance per 1Khz, each

resonance needing one complex pole pair

A low-pass glottal pulse spectrum, and a high-pass filter due to radiation

at the lips, which can be modeled by 1-2 complex pole pairs

This leads to a rule of thumb of = 4 + 1000, or about 10-12 LP

coefficients for a sampling rate of = 8

15

Introduction to Speech Processing | Ricardo Gutierrez-Osuna | CSE@TAMU

16

http://www.phys.unsw.edu.au/jw/graphics/voice3.gif

17

Examples

ex7p1.m

Computing linear predictive coefficients

Estimating spectral envelope as a function

of the number of LPC coefficients

Speech synthesis with simple excitation

models (white noise and pulse trains)

ex7p2.m

Repeat the above at the sentence level

18

Alternative representations

A variety of different equivalent representations can be

obtained from the parameters of the LP model

This is important because the LP coefficients are hard to interpret

and also too sensitive to numerical precision

Here we review some of these alternative representations and how

they can be derived from the LP model

Root pairs

Line spectrum frequencies

Reflection coefficients

Log-area ratio coefficients

prediction) will be discussed in a different lecture

19

Root pairs

The polynomial can be factored into complex pairs, each of which

represents a resonance in the model

These roots (poles of the LP transfer function) are relatively stable and are

numerically well behaved

The example in the next slide shows the roots (marked with a ) of a

12-th order model

Eight of the roots (4 pairs) are close to the unit circle, which indicates

they model formant frequencies

The remaining four roots lie well within the unit circle, which means

they only provide for the overall spectral shaping due to glottal and

radiation influences

20

[Rabiner and Schafer, 2007]

21

A more desirable alternative to quantization of the roots of is

based on the so-called line spectrum pair polynomials

= + +1 1

= +1 1

which, when added up, yield the original

All the roots of and are on the unit circle and their frequencies

(angles in the z-plane) are known as the line spectral frequencies

The LSFs are close together when the roots of are close to the unit

circle; in other words, presence of two close LSFs is indicative of a strong

resonance (see previous slide)

LSFs are not overly sensitive to quantization noise and are also stable, so

they are widely used for quantizing LP filters

22

Reflection coefficients

The reflection coefficients represent the fraction of energy reflected at

each section of a non-uniform tube model of the vocal tract

They are a popular choice of LP representation for various reasons

They are easily computed as a by-product of the Levinson-Durbin iteration

They are robust to quantization error

They have a physical interpretation, making then amenable to

interpolation

through the following backward recursion

= = , , 1

1

=

1<

2

1

where we initialize =

23

Log-area ratios

Log-area ratio coefficients are the natural logarithm of the ratio of the

areas of adjacent sections of a lossless tube equivalent to the vocal

tract (i.e., both having the same transfer function)

While it is possible to estimate the ratio of adjacent sections, it is not

possible to find the absolute values of those areas

1

= ln

1 +

where is the LAR and is the corresponding reflection coefficient

24

- Biomedical Signals: Identification of PVC beats in an EMG signalUploaded byOwenHumphreys
- Molnar Etal 2008Uploaded byLalo Lawliet Betra
- Digital Signal ProcessingUploaded bybala_c84111
- PSO TESTUploaded byusama32
- COCHELEAUploaded byBibin Jose
- Labov-1963Uploaded byאדמטהאוצ'וה
- Lecture 1Uploaded byChadwick Barclay
- Voice Tweaker 4 Users Manual Rev 2Uploaded bySalvatore Nauta
- 42913383 Digital Signal Processing NotesUploaded bySahana Balasubramanian
- EC2029-DIP important question.pdfUploaded byjayarajan
- Unit 3 - Bandwidth Efficient SchemesUploaded byTristan Cabatac
- Matlab CodeUploaded byRakib Hyder
- SSUM_ Signals and Systems Using MATLABUploaded bydialauchenna
- mlUploaded byniraka
- ferragne_2010 (1)Uploaded byTyler Lee
- Application of Measurement Models to Impedance SpectroscopyUploaded byYu Shu Hearn
- 303Mills.pdfUploaded byfaisal
- Arva ManualUploaded byJimmy Watt Abarca
- Data AnalysisUploaded byShafi Rahman
- Section-5Uploaded bysaiful khalid
- B.tech. Syllabus Admitted 2013-14Uploaded bylokheshdon
- 1IJBMRFEB20191Uploaded byTJPRC Publications
- Acquisition and Processing Pitfall Associated With Clipping Near-surface Seismic Reflection TracesUploaded byKhamvanh Phengnaone
- pset7solUploaded byownagewolf
- 06689571Uploaded byMekaTron
- Cadbury RevenueUploaded bySameep Verma
- Research on Hollywood movies of 2012Uploaded bySayed Anwar
- MAT 404 Inferential Statistics for ResearchUploaded byMuhammad Qasim
- Using Excel for RegressionUploaded byPhat Luong
- CS20100100001_71976433Uploaded byvlsipranati

- slyc124Uploaded byAakankshaSingh
- StdXi-Voc-EMA-EM-1Uploaded bykalaikalai360
- 78Uploaded byAakankshaSingh
- L14 AddersUploaded byAakankshaSingh
- Kuang 2014Uploaded byAakankshaSingh
- Unit-II Elasticity of Demand and SupplyUploaded byAakankshaSingh
- NITPATNA WorkshopUploaded byAakankshaSingh
- L16 Sequential CKTUploaded byAakankshaSingh
- L13_K_mapUploaded byAakankshaSingh
- Security TutorialUploaded byNina Libidov
- l2Uploaded byAakankshaSingh
- l6Uploaded bysouvik5000
- D17111GC30_sg2Uploaded byAakankshaSingh
- Asimov Quick MathsUploaded byDani Ibrahim
- Electronics EnggUploaded byAakankshaSingh
- cell (1)Uploaded byAakankshaSingh
- EE168Uploaded byAakankshaSingh

- Input Impedance Analysis of Single-phase PFC ConvertersUploaded byCem Caneren
- Frequency Selective Filters LectureUploaded byAditya Chanda
- Presentation RF-Optimization-And-PlanningUploaded byUmmara24
- 01 Gsm IntroductionUploaded byBrian Scarella
- BroadbandUploaded byntakuseni
- 15661 FaUploaded bygiapy0000
- Harman Kardon Avr151 230 Service Manual Rev.0Uploaded byhugosub
- Transmission Line ProtectionUploaded bysansui227kss
- DSP 2015.pdfUploaded byPraveen
- Comm-04-Phase and Frequency ModulationUploaded bySahana B
- ETHERNET PRINCIPLEUploaded byGossan Anicet
- IRJET- Gesture Based Wireless Control of Robotic Hand using Image ProcessingUploaded byIRJET Journal
- intro 5GUploaded bymehran_4x
- Introduction to OFDMUploaded byFalasteen Hora
- ISDN NotesUploaded byaniket-mhatre-164
- 9768 Metro Radio Outdoor en DatasheetUploaded byFemi Oketoki
- AUDIO COMMUNICATION USING OPTICAL FIBERUploaded byjason_patidar
- 3 LAN KINERJA TINGGI 1Uploaded byBambang Waluyojati, S.Kom
- Power Line Carrier Communication (PLCC) _ EEPUploaded byAnonymous i8hifn7
- MSO4000 DPO4000 Mixed Signal Oscilloscope Datasheet 22Uploaded byNaresh Kumar
- The Mobile Broadband AdvantageUploaded byRicardo Casseres Barrios
- Lab 1-Equipment Overview Final.pdfUploaded by27189
- Fault Secure Encoder and Decoder for Nano-Memory ApplicationsUploaded byIRJET Journal
- Fibre-Optics Educator LeafletUploaded byEmmac01
- SmartBridge Series III.pdfUploaded byMohsen
- Service Manual JVC RX-6010vbk, 6012vslUploaded byaliarro
- RF Fields From a WEL Networks Smart MeterUploaded byvilensky3cediel
- Practical Considerations in Using IEPE Accelerómeters - Tp326Uploaded bySebastian Hernandez
- ImageProcessing13 RevisionUploaded byThilaga Mohan
- Tutorial - Digital Audio Re Sampling - Julius SmithUploaded byyonodio