Speech Analysis

by

Dr Philip Jackson

lecturer in speech & audio

Department of Electronic Engineering.

http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr

What’s the point of analysing speech?

• Speech analysis, or speech processing, transforms a speech waveform into a representation that is suitable for extracting its features:

• Human visual inspection

– e.g., by a speech scientist, speech therapist, or forensic phonetician

• Computer analysis

– e.g., for automatic speech recognition, speaker recognition, or paralinguistic processing

And what does that mean?

• Suitable could be:

– amenable to human visual inspection

– using a small number of bits per second (for transmission or storage)

– compatible with the models in a speech recognizer

– in line with our understanding of human auditory processing

Cochlear section

• Cochlea, or inner ear, has a spiral form:

– vestibular canal

– basilar membrane

– tympanic canal

– auditory nerve

Response of the cochlea

Basilar membrane

• A sound wave travels along the basilar membrane

• The membrane vibrates at the position matching the wave's frequency

• This activates the auditory nerves

Short-term spectrum

• Represents the distribution of power with respect to frequency over a time interval centred at time t, like a vertical slice through the spectrogram

• From a source-filter perspective, it gives us some information about the shape of the vocal tract at time t

• From a human speech perception view, it provides similar information to that sent from the cochlea to the auditory nerve

Computing the ST-spectrum

• Analogue-to-Digital (A/D) Conversion

– convert the analogue signal from the microphone into a digital signal

• Windowing

– select a short section of speech, centred at time t, and taper it smoothly towards its edges

• Frequency analysis

– estimate the distribution of power with respect to frequency
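The windowing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the Hamming window is an assumed choice (the slides do not name one), and `hamming` and `windowed_frame` are hypothetical helper names.

```python
import math

def hamming(N):
    """Hamming window of N samples (an assumed choice of smooth window)."""
    if N < 2:
        return [1.0] * N
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def windowed_frame(x, fs, t, duration):
    """Select `duration` seconds of x centred at time t and apply the window."""
    N = int(round(duration * fs))      # frame length in samples
    centre = int(round(t * fs))        # sample index of time t
    start = centre - N // 2
    if start < 0 or start + N > len(x):
        raise ValueError("frame extends beyond the signal")
    w = hamming(N)
    return [x[start + n] * w[n] for n in range(N)]

fs = 8000                                                            # assumed sample rate
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]  # 100 ms test tone
frame = windowed_frame(x, fs, t=0.05, duration=0.025)                # 25 ms frame at 50 ms
```

The resulting frame is what the frequency-analysis step would then operate on.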

Waterfall display

Speech spectrogram

Derived formant tracks

A/D conversion

• Sampling measures the speech signal at regular intervals, indexed by n

• Quantisation encodes each sample $x_n$ with a discrete value

[Figure: $x_n$ plotted against $n$]
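The two A/D steps might look like this in Python. It is a sketch only: the uniform rounding quantiser, the input range of [-1, 1) and the names `sample` and `quantise` are assumptions made for illustration, not details given in the slides.

```python
import math

def sample(signal, fs, duration):
    """Measure a continuous-time signal every 1/fs seconds."""
    n_samples = int(round(duration * fs))
    return [signal(n / fs) for n in range(n_samples)]

def quantise(x, n_bits):
    """Map values in [-1, 1) to signed integer codes (assumed uniform quantiser)."""
    levels = 2 ** (n_bits - 1)
    codes = []
    for v in x:
        code = int(round(v * levels))
        code = max(-levels, min(levels - 1, code))  # clip to the available codes
        codes.append(code)
    return codes

def tone(t):
    """1 kHz sine at half of full scale, standing in for the analogue input."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

x = sample(tone, fs=8000, duration=0.001)   # 8 samples
codes = quantise(x, n_bits=8)               # integers in -128..127
```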

Sample rate

• Nyquist’s theorem: for a signal band-limited to B Hz, a rate of at least 2B samples per second is needed to encode the signal faithfully

• Human ear is sensitive up to about 20 kHz (hence the 44.1 kHz rate for CDs)

• But for speech:

– high-quality needs 10 kHz bandwidth, i.e., 20 kHz sample rate

– bandwidth can be reduced to ~4 kHz (8 kHz rate), for telephone quality

– e.g., 8-bit PCM at 8 kHz = 64 kbit/s

CD-quality: fS = 44.1 kHz

High-quality speech: fS = 20 kHz

Telephone speech: fS = 8 kHz
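The sample-rate and bit-rate arithmetic on this slide can be checked with a few lines of Python. The function names are hypothetical, and the 16-bit stereo format used for the CD case is a known property of audio CDs rather than something stated in the slides.

```python
def min_sample_rate(bandwidth_hz):
    """Lowest sample rate (Hz) allowed by Nyquist's theorem for a given bandwidth."""
    return 2 * bandwidth_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate (bit/s) of uncompressed PCM."""
    return sample_rate_hz * bits_per_sample * channels

telephone = pcm_bit_rate(min_sample_rate(4000), 8)    # 4 kHz bandwidth, 8-bit samples
high_quality_rate = min_sample_rate(10000)            # 10 kHz bandwidth
cd = pcm_bit_rate(44100, 16, channels=2)              # assumed 16-bit stereo
```

Here `telephone` comes to 64000 bit/s, matching the 8-bit PCM example.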

Window functions

Frequency analysis

• Discrete Fourier Transform (DFT) is applied to the windowed digital waveform $\{x(n): n = 0, \ldots, N-1\}$.

• With an N-sample window, an N-point complex spectrum is obtained, $\{X(k): k = 0, \ldots, N-1\}$.

• The modulus squared gives the power spectrum, $|X(k)|^2$

• The logarithm gives the log-power spectrum, $\log |X(k)|^2$
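Given a complex spectrum, the last two steps are short. The sketch below assumes a decibel scale (10 log10) and a small floor to avoid taking the log of zero; neither is specified in the slides, and the example spectrum is made up for illustration.

```python
import math

def power_spectrum(X):
    """Modulus squared of each complex spectral value."""
    return [abs(v) ** 2 for v in X]

def log_power_spectrum(X, floor=1e-12):
    """Log-power in dB; `floor` (an assumption) guards against log(0)."""
    return [10.0 * math.log10(max(p, floor)) for p in power_spectrum(X)]

X = [4 + 0j, 0 - 2j, 0j, 0 + 2j]      # made-up 4-point complex spectrum
power = power_spectrum(X)
log_power = log_power_spectrum(X)
```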

Discrete Fourier transform

• over a finite period of time

• sampled at regular intervals

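Forward transform:

$$X(k) = \sum_{n=0}^{N-1} x(n) \left[ \cos\left(\frac{2\pi k n}{N}\right) - j \sin\left(\frac{2\pi k n}{N}\right) \right]$$

Inverse transform:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \left[ \cos\left(\frac{2\pi k n}{N}\right) + j \sin\left(\frac{2\pi k n}{N}\right) \right]$$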
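A direct, unoptimised Python rendering of the two transforms is given below as a sketch. It costs on the order of N² operations; in practice a fast Fourier transform routine would normally be used instead. The names `dft` and `idft` are chosen here for illustration.

```python
import math

def dft(x):
    """Forward DFT: X(k) = sum_n x(n) [cos(2*pi*k*n/N) - j sin(2*pi*k*n/N)]."""
    N = len(x)
    X = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            a = 2 * math.pi * k * n / N
            acc += x[n] * complex(math.cos(a), -math.sin(a))
        X.append(acc)
    return X

def idft(X):
    """Inverse DFT: x(n) = (1/N) sum_k X(k) [cos(2*pi*k*n/N) + j sin(2*pi*k*n/N)]."""
    N = len(X)
    x = []
    for n in range(N):
        acc = 0j
        for k in range(N):
            a = 2 * math.pi * k * n / N
            acc += X[k] * complex(math.cos(a), math.sin(a))
        x.append(acc / N)
    return x

# A cosine with exactly 2 cycles in 8 samples puts its energy in bins 2 and 6.
x = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
X = dft(x)
x_back = idft(X)   # should reproduce x up to rounding error
```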

Frequency analysis

• Alternative methods include:

– filter-bank analysis (based on a set of band-pass filters)

– approximations of the spectral envelope, e.g., linear predictive coding (LPC)

Time-frequency resolution 1

• If the window is long then

– the time resolution is poor

– the number of points, N, is large

– there are N points in the spectrum

– so there is fine frequency resolution

– narrow-band frequency analysis, or narrow-band spectrum

Narrow-band spectrum

Time-frequency resolution 2

• If the window is short then

– the time resolution is good

– the number of points, N, is small

– there are N points in the spectrum

– so the frequency resolution is coarse

– broad-band frequency analysis, or broad-band spectrum

Wide-band spectrum

Time-frequency resolution 3

• In summary:

– long window, narrow-band spectrum;

– short window, broad-band spectrum.

• Indeed, the bandwidth-time product cannot be made smaller than a half:

$BT \geq \frac{1}{2}$

where $T = N / f_S$ and $f_S$ is the sample rate
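The trade-off can be made concrete by computing the window duration T = N / fS and the DFT bin spacing fS / N = 1 / T for two window lengths. The 8 kHz sample rate and the values of N below are illustrative assumptions, not figures from the slides.

```python
def frame_duration(N, fs):
    """Window length in seconds: T = N / fs."""
    return N / fs

def bin_spacing(N, fs):
    """Spacing between adjacent DFT bins in Hz: fs / N = 1 / T."""
    return fs / N

fs = 8000  # assumed sample rate (Hz)
for N in (256, 32):
    T_ms = 1000 * frame_duration(N, fs)
    print(f"N={N:4d}  T={T_ms:5.1f} ms  bin spacing={bin_spacing(N, fs):6.2f} Hz")
```

With these values the long window (32 ms) gives 31.25 Hz spacing, fine enough for a narrow-band spectrum, while the short window (4 ms) gives 250 Hz spacing, as in a broad-band spectrum.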

Wide-band and narrow-band spectrograms

Mel-frequency filter bank

• Allocation of DFT bins to filters, spaced according to the Mel scale:
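One possible allocation is sketched below. The slide's own figure is not reproduced here, so several details are assumptions: the common Mel formula 2595 log10(1 + f/700), triangular filter shapes, and the hypothetical function names.

```python
import math

def hz_to_mel(f):
    """A widely used Mel-scale formula (assumed; the slides give none)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular weights over DFT bins 0..n_fft//2, one list per filter."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(fs / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_bins):
            f = k * fs / n_fft                 # centre frequency of bin k
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def apply_filter_bank(bank, power):
    """Weighted sum of power-spectrum bins for each filter."""
    return [sum(w * p for w, p in zip(weights, power)) for weights in bank]

bank = mel_filter_bank(n_filters=10, n_fft=256, fs=8000)
energies = apply_filter_bank(bank, [1.0] * 129)   # flat test spectrum
```

Low-frequency filters cover few bins and high-frequency filters cover many, which is the effect of the Mel spacing.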

The real cepstrum

• Procedure for computing cepstral coefficients from the magnitude spectrum:
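The slide's diagram is not available in this text, so the sketch below follows the usual definition of the real cepstrum: the inverse DFT of the log magnitude spectrum. The natural logarithm, the floor value and the function names are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling) of a sequence."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of log|X(k)|; `floor` (an assumption) avoids log(0)."""
    log_mag = [math.log(max(abs(X), floor)) for X in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]

# A unit impulse has a flat magnitude spectrum of 1, so its cepstrum is all zeros.
c_impulse = real_cepstrum([1.0] + [0.0] * 7)
# Doubling the amplitude only changes the zeroth coefficient, to ln 2.
c_scaled = real_cepstrum([2.0] + [0.0] * 7)
```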

Mel-frequency cepstrum

• Procedure for computing cepstral coefficients, based on the output from Mel-frequency binning:
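As the diagram is missing from this text, the sketch below assumes the common convention: take the logarithm of each Mel filter output, then apply a type-II discrete cosine transform. The unnormalised DCT, the default of 13 coefficients and the function names are all assumptions.

```python
import math

def dct_ii(v, n_coeffs):
    """Unnormalised DCT-II: c(i) = sum_m v(m) cos(pi * i * (m + 0.5) / M)."""
    M = len(v)
    return [sum(v[m] * math.cos(math.pi * i * (m + 0.5) / M) for m in range(M))
            for i in range(n_coeffs)]

def mfcc_from_mel_energies(mel_energies, n_coeffs=13, floor=1e-12):
    """Log of each Mel-filter energy, then DCT-II; `floor` avoids log(0)."""
    log_e = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_e, min(n_coeffs, len(log_e)))

# Flat filter outputs (each equal to e, so log = 1) give only a zeroth coefficient.
coeffs = mfcc_from_mel_energies([math.e] * 20, n_coeffs=5)
```

The input `mel_energies` would be the per-filter sums produced by Mel-frequency binning of the power spectrum.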

Summary of Fourier analysis

• Fourier analysis leads to a frequency representation

– good for visualisation

– is reversible

– continuous and discrete time forms

• Wide- and narrow-band spectra obtained by adjusting frame size

• Windowing

– reduces spectral smearing

– allows for adaptation
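The effect of windowing on spectral smearing can be illustrated with a tone that falls between two DFT bins. In this sketch the Hamming window, the frame length of 64, the guard width of 3 bins and the `leakage_fraction` measure are illustrative assumptions.

```python
import cmath
import math

def dft_power(x):
    """Power spectrum over bins 0..N/2 using a direct DFT."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))) ** 2
            for k in range(N // 2 + 1)]

def leakage_fraction(x, peak_bin, guard=3):
    """Share of power lying more than `guard` bins away from the tone."""
    p = dft_power(x)
    far = sum(v for k, v in enumerate(p) if abs(k - peak_bin) > guard)
    return far / sum(p)

N = 64
cycles = 10.5   # tone halfway between bins 10 and 11
tone = [math.sin(2 * math.pi * cycles * n / N) for n in range(N)]
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

rect_leak = leakage_fraction(tone, cycles)                                   # no taper
hamm_leak = leakage_fraction([s * w for s, w in zip(tone, hamming)], cycles)  # tapered
print(f"rectangular: {rect_leak:.4f}  Hamming: {hamm_leak:.6f}")
```

The tapered frame should leave a much smaller share of its power far from the tone than the untapered one.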
