Professional Documents
Culture Documents
Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis: Speech Technology: A Practical
Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis: Speech Technology: A Practical
Introduction
Topics
Spectrogram
Cepstrum
Mel-Frequency Analysis
Mel-Frequency Cepstral Coefficients
2
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectrogram
3
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
Spectrum
4
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
5
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Amp.
Hz
6
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Rotate it by 90 degrees
Hz
Amplitude
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Hz
MAP spectral amplitude to a grey level (0255) value. 0 represents black and 255
represents white.
Higher the amplitude, darker the
corresponding region.
8
Amplitude
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Hz
Time
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Time Vs Frequency
FFT
FFT
FFT
FFT
representation of a speech
signal is referred to as
spectrogram
Spectrum
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Hz
Time
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
10
11
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
12
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
13
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Hidden Markov
Models implicitly
model these
spectrograms to
perform speech
recognition
14
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Usefulness of Spectrogram
Time-Frequency representation of the speech signal
Spectrogram is a tool to study speech sounds (phones)
Phones and their properties are visually studied by phoneticians
Hidden Markov Models implicitly model spectrograms for speech to
text systems
Useful for evaluation of text to speech systems
A high quality text to speech system should produce synthesized
speech whose spectrograms should nearly match with the natural
sentences.
15
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Cepstral Analysis
16
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Frequency (Hz)
17
dB
Frequency (Hz)
18
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral Envelope
Spectrum
Spectral
Envelope
Spectral
details
19
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral Envelope
Spectrum
log X[k]
Spectral
Envelope
log H[k]
Spectral
details
log E[k]
20
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral Envelope
Spectrum
log X[k]
log X[k] = log H[k] + log E[k]
Spectral
Envelope
log H[k]
Spectral
details
log E[k]
22
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
Envelope
Spectral
details
23
Spectral
Envelope
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
24
High Freq.
region
Spectral
Envelope
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
25
High Freq.
region
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
26
HighTreat
Freq.this as a
regionsine wave
with 4 cycles
per sec.
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
27
Gives a peak
at 4 Hz in
frequency
axis
Low Freq.
region
Spectrum
HighTreat
Freq.this as a
regionsine wave
with 4 cycles
per sec.
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
28
Gives a peak
at 4 Hz in
frequency
axis
Low Freq.
region
Spectrum
HighTreat
Freq.this as a
regionsine wave
with 4 cycles
per sec.
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
29
High Freq.
region
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
30
Spectrum
High Freq.
region
Treat this as a
sine wave with
100 cycles per
sec.
Spectral
Envelope
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
31
High Freq.
region
Spectral
Envelope
IFFT
IFFT
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
32
Spectral
Envelope
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
33
IFFT
log H[k]
Spectral
Envelope
log E[k]
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
34
IFFT
log H[k]
Spectral
Envelope
log E[k]
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
35
IFFT
log H[k]
In practice all you
have access to only
log X[k] and hence
you can obtain x[k]
Spectral
Envelope
log E[k]
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
36
IFFT
log H[k]
If you know x[k]
Filter the low
frequency region to get
h[k]
Spectral
Envelope
log E[k]
A pseudo-frequency
axis
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Spectral
details
37
IFFT
A pseudo-frequency
axis
log H[k]
Spectral
Envelope
log E[k]
Spectral
details
38
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Cepstral Analysis
X [ k ] = H [ k ] E[ k ]
|| X [k ] || = || H [k ] || || E[k ] ||
|| . || denotes magnitude
Take Log on both sides
log || X [k ] || = log ||H [k ] || + log || E[k ] ||
Taking inverse FFT on both sides
x[k ] = h[k ] + e[k ]
39
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Mel-Frequency Analysis
40
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
dB
Frequency (Hz)
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
41
Mel-Frequency Analysis
Mel-Frequency analysis of speech is
based on human perception experiments
It is observed that human ear acts as filter
It concentrates on only certain frequency
components
Mel-Frequency Filters
43
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Mel-Frequency Filters
More no. of filters in low
freq. region
44
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Mel-Frequency Cepstral
Coefficients (MFCC)
Spectrum Mel-Filters Mel-Spectrum
Say log X[k] = log (Mel-Spectrum)
NOW perform Cepstral analysis on log X[k]
log X[k] = log H[k] + log E[k]
Taking IFFT
x[k] = h[k] + e[k]
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Mel-Filters
Cepstral Analy.
46
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT
FFT FFT
FFT
Spectrum
Cepstral
Vectors
47
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
48
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)
Additional Reading
Chapter 6
Pg: 273 281
Pg: 304 311
Pg: 314 - 316
50
Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)