
ssr: Speaker & Speech Recognition

Speech Analysis

by Dr Philip Jackson

Lecturer in speech & audio, Centre for Vision, Speech & Signal Processing, Department of Electronic Engineering.

http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr

**What’s the point of analysing speech?**

• Speech analysis, or speech processing, transforms a speech waveform into a representation that is suitable for extracting its features:

• Human visual inspection

– e.g., by a speech scientist, speech therapist, or forensic phonetician

• Computer analysis

– e.g., for automatic speech recognition, speaker recognition, or paralinguistic processing

**And what does that mean?**

• “Suitable” could mean:

– amenable to human visual inspection
– using a small number of bits per second (for transmission or storage)
– compatible with the models in a speech recognizer
– in line with our understanding of human auditory processing

Cochlear section

• Cochlea, or inner ear, has a spiral form:

– vestibular canal
– basilar membrane
– tympanic canal
– auditory nerve

Response of the cochlea

Basilar membrane

• sound enters at the stapes
• travels along the basilar membrane
• vibrates at matching position
• activates auditory nerves

**Short-term spectrum**

• Represents the distribution of power with respect to frequency over a time interval centred at time t, like a vertical slice through the spectrogram
• From a source-filter perspective, it gives us some information about the shape of the vocal tract at time t
• From a human speech perception view, it provides similar information to that sent from the cochlea to the auditory nerve

**Computing the ST-spectrum**

• Analogue-to-Digital (A/D) Conversion

– convert the analogue signal from the microphone into a digital signal

• Windowing

– select a short section of speech, centred at time t, and smooth

• Frequency analysis

– estimate the distribution of power with respect to frequency

Waterfall display

Speech spectrogram

Derived formant tracks

A/D conversion

• Sampling measures the speech signal at regular intervals, n
• Quantisation encodes the signal x_n with a discrete value

[Figure: sampled signal x_n plotted against sample index n]
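The two A/D steps can be sketched as follows. This is a minimal illustration, not a description of any particular converter: the function name `sample_and_quantise`, the test tone, and the choice of uniform rounding with clipping are all assumptions made for the example.

```python
import math

def sample_and_quantise(signal, sample_rate, duration, n_bits):
    """Sample a continuous-time function and quantise it uniformly.

    signal: function of time (seconds) returning a value in [-1, 1).
    Returns integer codes in [-(2**(n_bits-1)), 2**(n_bits-1) - 1].
    """
    n_samples = int(round(duration * sample_rate))
    half_range = 2 ** (n_bits - 1)
    codes = []
    for n in range(n_samples):
        x_n = signal(n / sample_rate)                    # sampling: measure at t = n / fs
        code = int(math.floor(x_n * half_range + 0.5))   # quantisation: round to nearest level
        code = max(-half_range, min(half_range - 1, code))  # clip to the available codes
        codes.append(code)
    return codes

# Hypothetical input: a 440 Hz tone at half of full scale.
tone = lambda t: 0.5 * math.sin(2 * math.pi * 440.0 * t)
codes = sample_and_quantise(tone, sample_rate=8000, duration=0.01, n_bits=8)
```

With these assumed settings, 10 ms of signal at 8 kHz gives 80 integer codes, each of which fits in 8 bits.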

Sample rate

• Nyquist’s theorem: for a signal bandlimited to B Hz, a rate of at least 2B samples per second is needed to encode the signal faithfully
• Human ear is sensitive up to about 20 kHz (hence the 44.1 kHz rate for CDs)
• But for speech:

– high-quality needs 10 kHz bandwidth, i.e., 20 kHz sample rate
– bandwidth can be reduced to ~4 kHz (8 kHz rate), for telephone quality
– e.g., 8-bit PCM at 8 kHz = 64 kbps

CD-quality: fS = 44.1 kHz

High-quality speech: fS = 20 kHz

Telephone speech: fS = 8 kHz
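The arithmetic behind the telephone example can be checked with a short sketch. The helper names `nyquist_rate` and `pcm_bit_rate` are made up for this illustration.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sample rate (samples/s) for a signal bandlimited to bandwidth_hz."""
    return 2 * bandwidth_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone_rate = nyquist_rate(4000)                        # 8000 samples per second
telephone_kbps = pcm_bit_rate(telephone_rate, 8) / 1000    # 64.0 kbps
high_quality_rate = nyquist_rate(10000)                    # 20000 samples per second
```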

Window functions
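The slide does not name specific windows, so the sketch below assumes two common choices, the Hamming and Hann windows, in their usual symmetric forms. The function names are illustrative.

```python
import math

def hamming(N):
    """Symmetric Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hann(N):
    """Symmetric Hann window of length N (N >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame, window):
    """Multiply a frame of samples by a window of the same length."""
    return [x * w for x, w in zip(frame, window)]

# A constant frame makes the taper easy to see: the output equals the window.
tapered = apply_window([1.0] * 5, hamming(5))
```

Both windows taper the ends of the frame towards zero (Hann) or a small value (Hamming), which smooths the section edges before frequency analysis.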

Frequency analysis

• Discrete Fourier Transform (DFT) is applied to the windowed digital waveform {x(n): n=0,…,N−1}.
• With an N-sample window, an N-point complex spectrum is obtained {X(k): k=0,…,N−1}.
• The modulus squared gives the power spectrum, |X(k)|²
• The logarithm gives the log-power spectrum, log |X(k)|²
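A minimal sketch of the power and log-power spectrum, assuming the frame has already been windowed. The direct O(N²) DFT, the decibel scaling (10·log10), and the small `floor` that avoids taking the log of zero are choices made for this example, not taken from the slides.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT of a sequence (slow, for illustration only)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def power_spectrum(x):
    """Modulus squared of each DFT coefficient."""
    return [abs(X_k) ** 2 for X_k in dft(x)]

def log_power_spectrum(x, floor=1e-12):
    """Power spectrum in decibels; `floor` guards against log(0)."""
    return [10 * math.log10(max(p, floor)) for p in power_spectrum(x)]

# Hypothetical frame: a cosine with exactly 8 cycles in a 64-sample frame.
N = 64
frame = [math.cos(2 * math.pi * 8 * n / N) for n in range(N)]
power = power_spectrum(frame)
peak_bin = max(range(N // 2 + 1), key=lambda k: power[k])
```

For this test frame the power concentrates in bin 8 (and its mirror image at bin N − 8), as expected for a tone that completes a whole number of cycles in the window.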

**Discrete Fourier transform**

• over a finite period of time
• sampled at regular intervals

Forward transform:

$$X(k) = \sum_{n=0}^{N-1} x(n) \left[ \cos\left(\frac{-2\pi kn}{N}\right) + j \sin\left(\frac{-2\pi kn}{N}\right) \right]$$

Inverse transform:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \left[ \cos\left(\frac{+2\pi kn}{N}\right) + j \sin\left(\frac{+2\pi kn}{N}\right) \right]$$
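The forward and inverse transforms can be written directly from their cosine and sine forms. The sketch below is a slow O(N²) illustration that assumes a real-valued input sequence; practical systems use the FFT instead.

```python
import math

def dft(x):
    """Forward DFT of a real sequence, using the cos/sin form."""
    N = len(x)
    X = []
    for k in range(N):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        X.append(complex(re, im))
    return X

def idft(X):
    """Inverse DFT, with the positive-sign exponent and the 1/N scaling."""
    N = len(X)
    x = []
    for n in range(N):
        total = sum(X[k] * complex(math.cos(2 * math.pi * k * n / N),
                                   math.sin(2 * math.pi * k * n / N))
                    for k in range(N))
        x.append(total / N)
    return x

original = [1.0, 2.0, 0.0, -1.0]
recovered = idft(dft(original))   # complex values with near-zero imaginary parts
```

Applying `idft` to the output of `dft` returns the original samples up to rounding error, which is the reversibility property of the transform pair.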

Frequency analysis

• Alternative methods include:

– filter-bank analysis (based on a set of band-pass filters)
– approximations of the spectral envelope, e.g., linear predictive coding (LPC)
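As one illustration of LPC, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which LPC formulation is intended, so this choice, the function names, and the test signal are assumptions. The predictor convention is x̂(n) = Σ a_i · x(n − i).

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    N = len(x)
    return [sum(x[n] * x[n - lag] for n in range(lag, N))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (coefficients, residual energy).

    Assumes r[0] > 0, i.e. the frame is not all zeros.
    """
    alpha = [0.0] * (order + 1)        # alpha[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))
        k = acc / error                # reflection coefficient for this order
        updated = alpha[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        error *= (1.0 - k * k)
    return alpha[1:], error

def lpc(x, order):
    return levinson_durbin(autocorrelation(x, order), order)

# Hypothetical test signal: a decaying exponential, x(n) = 0.9 ** n,
# which a first-order predictor with coefficient 0.9 models well.
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(signal, 2)
```

For this signal the first coefficient comes out close to 0.9 and the second close to zero, since each sample is 0.9 times the previous one.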

**Time-frequency resolution 1**

• If the window is long then

– the time resolution is poor
– the number of points, N, is large
– there are N points in the spectrum
– so there is fine frequency resolution
– narrow-band frequency analysis, or narrow-band spectrum

Narrow-band spectrum

**Time-frequency resolution 2**

• If the window is short then

– the time resolution is good
– the number of points, N, is small
– there are N points in the spectrum
– so the frequency resolution is coarse
– broad-band (or wide-band) frequency analysis, or broad-band spectrum

Wide-band spectrum

**Time-frequency resolution 3**

• In summary:

– long window, narrow-band spectrum;
– short window, broad-band spectrum.

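• Indeed, the bandwidth-time product cannot be made smaller than a half: $BT \geq \tfrac{1}{2}$, where $T = N / f_S$ and $f_S$ is the sample rate, so a finer resolution in time means a coarser resolution in frequency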
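The trade-off can be made concrete by computing the window duration T = N / fS and the spacing between DFT bins, fS / N, for a long and a short window. The 8 kHz sample rate and the window lengths below are example values only.

```python
def window_duration(n_points, sample_rate_hz):
    """Duration T of an N-sample window, in seconds."""
    return n_points / sample_rate_hz

def bin_spacing(n_points, sample_rate_hz):
    """Spacing between adjacent DFT bins, in Hz."""
    return sample_rate_hz / n_points

fs = 8000  # assumed sample rate for the example
short = (window_duration(64, fs), bin_spacing(64, fs))     # 8 ms window, 125 Hz bins
long_ = (window_duration(512, fs), bin_spacing(512, fs))   # 64 ms window, 15.625 Hz bins
```

The product of window duration and bin spacing is the same for every N, so lengthening the window to narrow the bins necessarily lengthens the time interval each spectrum describes.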

Wide-band and narrow-band spectrograms

**Mel-frequency filter bank**

• Allocation of DFT bins to filters, spaced according to the Mel scale:
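The allocation diagram is not reproduced here, so the sketch below shows one common way to do it: triangular filters whose edges are equally spaced on the Mel scale. The Mel formula used (2595·log10(1 + f/700)), the triangular shape, and the bin-rounding rule are widely used conventions assumed for this example; the lecture may use a different variant.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filter weights over DFT bins 0..n_fft/2, one list per filter."""
    mel_hi = hz_to_mel(fs / 2.0)
    # n_filters + 2 edge frequencies, equally spaced in Mel from 0 Hz to fs/2
    edges_hz = [mel_to_hz(i * mel_hi / (n_filters + 1)) for i in range(n_filters + 2)]
    edge_bins = [int(math.floor((n_fft + 1) * f / fs)) for f in edges_hz]
    n_bins = n_fft // 2 + 1
    bank = []
    for m in range(1, n_filters + 1):
        lo, centre, hi = edge_bins[m - 1], edge_bins[m], edge_bins[m + 1]
        weights = [0.0] * n_bins
        for k in range(lo, centre):          # rising slope up to the centre bin
            weights[k] = (k - lo) / (centre - lo)
        for k in range(centre, hi + 1):      # falling slope from the centre bin
            weights[k] = (hi - k) / (hi - centre) if hi > centre else 1.0
        bank.append(weights)
    return bank

bank = mel_filter_bank(n_filters=10, n_fft=256, fs=8000)
```

Each filter output would then be the weighted sum of the power-spectrum values in its bins. Filters are narrow at low frequencies and wider at high frequencies, following the Mel spacing.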

**The real cepstrum**

• Procedure for computing cepstral coefficients from the magnitude spectrum:
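The procedure diagram is missing from this copy, so the sketch below follows the standard definition of the real cepstrum: take the DFT of the frame, take the log of the magnitude, then apply the inverse DFT. The natural logarithm and the `floor` guard against log(0) are assumptions of this example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT or inverse DFT (slow, for illustration only)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of a real frame."""
    log_mag = [math.log(max(abs(X_k), floor)) for X_k in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]

# An impulse of height e has a flat magnitude spectrum |X(k)| = e,
# so its log magnitude is 1 everywhere and the cepstrum is a unit impulse.
cepstrum = real_cepstrum([math.e] + [0.0] * 7)
```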

**Mel-frequency cepstrum**

• Procedure for computing cepstral coefficients, based on the output from Mel-frequency binning:
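A common form of this procedure, assumed here because the diagram is missing, takes the log of each Mel filter output and applies a discrete cosine transform (DCT-II) to decorrelate the values. The function names, the unnormalised DCT, and the `floor` guard are choices made for the example.

```python
import math

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, returning the first n_coeffs coefficients."""
    M = len(values)
    return [sum(values[m] * math.cos(math.pi * q * (m + 0.5) / M) for m in range(M))
            for q in range(n_coeffs)]

def mfcc_from_filter_energies(energies, n_coeffs=13, floor=1e-12):
    """Cepstral coefficients from Mel filter-bank energies (log, then DCT)."""
    log_energies = [math.log(max(e, floor)) for e in energies]
    return dct_ii(log_energies, n_coeffs)

# A flat set of filter energies (all equal to e) has log energy 1 in every band,
# so only the zeroth coefficient is non-zero.
flat = mfcc_from_filter_energies([math.e] * 8, n_coeffs=4)
```

In a recogniser front end the `energies` argument would be the outputs of the Mel filter bank for one frame, and typically only the first dozen or so coefficients are kept.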

**Summary of Fourier analysis**

• Fourier analysis leads to a frequency representation

– good for visualisation
– is reversible
– continuous and discrete time forms

• Wide- and narrow-band spectra obtained by adjusting frame size
• Windowing

– reduces spectral smearing
– allows for adaptation
