Feature Extraction MFCCs PDF

Feature extraction of speech with Mel
frequency cepstral coefficients (MFCCs)

Generic audio classifier
Feature Predicted
Classifier class(es)
extraction
Speech
pressure wave
Stored
acoustic
models
Short-term speech processing
N 1 2
X [k ]   x[
n 0
n ]w[ n ]e  2jnk / N
The source and the filter
Intuition to cepstral processing
In cepstral processing, we are interested in the “frequency content”

of the magnitude spectrum. We apply Fourier techniques to the
magnitude spectrum, just like we usually apply them to time-domain
signals.
From theory to practice
Windowed & pre- FFT |.| log DCT

emphasized frame Truncate
cepstrum vector
The inverse Fourier transform is replaced by
the discrete cosine transform (DCT)
% frame is a short-term chunk of speech

NFFT = 2^nextpow2(length(frame));
frame = filter([1, -0.97],[1],frame);
frame = frame .* hamming(length(frame));
spectrum = fft(frame,NFFT);
logmagspec = log(abs(spectrum(1:NFFT/2+1))+eps);
cepstrum = dct(logmagspec);
cepstrum = cepstrum(2:numcoefficients+1);
We retain only a few of the lowest coefficients. Note that we

drop c[0] since it is proportional to log-energy and therefore
depends on the distance to microphone, the
vocal effort of the speaker, etc.
Matlab demo: Reconstruction of spectra
from cepstral coefficients
Hamming-windowed frame
Each spectrum is reconstructed by
0.04 zeroing out the coefficients as
0.02 shown in the figure, and applying
0 inverse dct to that vector
-0.02
50 100 150 200
Log magnitude spectrum Cepstrum
10
-2
-4 0
-6
-10
20 40 60 80 100 120 0 50 100
Log magnitude spectrum Truncated cepstrum
10
-2
Lower coefficients  spectral envelope,
-4
-6
0
smooth spectral variations, the ”low-frequency”
-8 component of the log magnitude spectrum.
-10
20 40 60 80 100 120 0 50 100
Log magnitude spectrum Truncated cepstrum
10 10
Higher coefficients  the ”periodic” and
0 0 other ”high-frequency” components of the log
magnitude spectrum.
-10 -10
0 50 100 0 50 100
Mel-frequency cepstral coefficients (MFCCs)
- A combination of a psycho-acoustically motivated mel-frequency warped
filterbank and cepstrum
Dimensionality of the
Frame of speech intermediate vectors:
Windowed frame
Pre-emphasis
+ windowing Frame: N=240 samples
Mel-frequency
Magnitude spectrum
filterbank
|FFT| FFT magnitude
Mel- spectrum:
frequency
filtering
NFFT/2+1=129 bins
Mel-filtered spectrum
log ( . )
Mel filter outputs:
M=27 filters
DCT
MFCC vector
Truncation MFCC vector:

d=12 coefficients
Mel-scale filterbank
Typical MFCC parameters
• 25 or 30 msec frame, 50% frame overlap
• FFT size 512, 1024 or 2048
• P=24 to P=30 filters in the Mel bank
• Retain ≈ P/2 cepstral coefficients
• Delta and double delta coefficients and
cepstral mean subtraction (CMS)
How do the MFCCs look like?
Time
(Frame
number)
MFCC
index
The energy of the cepstral coefficients is concentrated

on the low coefficients (low ”quefrencies”)
MFCCs – the Swiss Army Knife of
Speech and Audio Signal Processing
• Speech classification
– Automatic speech recognition
– Speaker identification
– Language identification
– Emotion recognition
• Music information retrieval
– Musical instrument recognition
– Music genre identification
– Singer identification
• Speech synthesis, coding, conversion
– Statistical parametric speech synthesis
– Speaker conversion
– Speech coding
• Others
– Speech pathology classification
– Identification of cell phone models
• etc …
Example MFCC implementations
• Voicebox (Matlab)
http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voi
cebox.html
• Rastamat (Matlab)
http://labrosa.ee.columbia.edu/matlab/rastamat/
• Bob toolkit
http://idiap.github.io/bob/
• Hidden Markov model (HTK) toolkit
http://htk.eng.cam.ac.uk/
Recommended literature
• X. Huang, A. Acero, H.-W. Hon, Spoken Language Processing:
A Guide to Theory, Algorithm and System Development.
Prentice Hall, 2001.
• R. M. Schafer, “Homomorphic systems and cepstrum analysis
of speech”. In J. Benesty, M. Sandhi, Y. Huang (Eds), Springer
Handbook of speech processing, pp. 161—180.

Feature Extraction MFCCs PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Feature Extraction MFCCs PDF

Uploaded by

Copyright:

Available Formats

Feature extraction of speech with Mel

frequency cepstral coefficients (MFCCs)

In cepstral processing, we are interested in the “frequency content”

Windowed & pre- FFT |.| log DCT

% frame is a short-term chunk of speech

We retain only a few of the lowest coefficients. Note that we

Truncation MFCC vector:

The energy of the cepstral coefficients is concentrated

You might also like