Professional Documents
Culture Documents
Feature Predicted
Classifier class(es)
extraction
Speech
pressure wave
Stored
acoustic
models
Short-term speech processing
N 1 2
X [k ] x[
n 0
n ]w[ n ]e 2jnk / N
The source and the filter
Intuition to cepstral processing
cepstrum vector
The inverse Fourier transform is replaced by
the discrete cosine transform (DCT)
Dimensionality of the
Frame of speech intermediate vectors:
Windowed frame
Pre-emphasis
+ windowing Frame: N=240 samples
Mel-frequency
Magnitude spectrum
filterbank
|FFT| FFT magnitude
Mel- spectrum:
frequency
filtering
NFFT/2+1=129 bins
Mel-filtered spectrum
log ( . )
Mel filter outputs:
M=27 filters
DCT
MFCC vector
Time
(Frame
number)
MFCC
index