SPEECH ANALYSIS

• Speech analysis: Used to extract features from a speech signal
s(n), directly pertinent for different applications while
suppressing redundant aspects of the speech.
• Transformation of s(n) into another signal, a set of signals, or a
set of parameters, with the objective of simplification and data
reduction.
• An efficient representation for speech recognition would be a
set of parameters which is consistent across speakers, yielding
similar values for the same phonemes uttered by various
speakers, while exhibiting reliable variation for different
phonemes
Methods Of Speech Analysis
• The time domain (operating directly on the
speech waveform)
• The frequency domain (after a spectral
transformation of the speech).
Short-Time Speech Analysis
• Speech is dynamic and time varying
• Speech analysis assumes that signal properties
change relatively slowly with time
• During speech production, the vocal tract shape
and type of excitation may not alter much over
short durations (100 ms for vowels, 10 ms for
stops). This allows a short time window to be
examined as if the signal were stationary within
it, yielding parameters averaged over the window
• To capture the dynamics, successive analysis
windows are allowed to overlap
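The overlapping short-time windows described above can be sketched as a simple framing routine. This is a minimal illustration, not a prescribed method: the function name `frame_signal`, the 8 kHz sampling rate, the 25 ms frame length, and the 10 ms hop are all assumptions chosen for the example.

```python
def frame_signal(signal, frame_len, hop_len):
    """Split a sequence into frames of frame_len samples, advancing
    hop_len samples each time. A trailing partial frame is dropped."""
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop_len
    return frames


# Assumed values for illustration only.
sample_rate = 8000                      # samples per second
frame_len = sample_rate * 25 // 1000    # 25 ms -> 200 samples
hop_len = sample_rate * 10 // 1000      # 10 ms -> 80 samples

signal = [0.0] * sample_rate            # one second of silence as a stand-in
frames = frame_signal(signal, frame_len, hop_len)
print(len(frames), len(frames[0]))      # 98 frames of 200 samples each
```

Because the hop (80 samples) is shorter than the frame (200 samples), consecutive frames share 120 samples, which is what lets slowly changing properties be tracked from one frame to the next.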
Windowing
• Windowing is the multiplication of speech signal
s(n) by a window w(n), which yields a set of
speech samples weighted by the shape of the
window.
• Windows have a finite length, typically 20-30 ms, to
include 2-3 pitch periods
• Window can be shifted to examine any part of s(n)
• Choice of window size: a trade-off among 3 requirements
(1) short enough to satisfy stationarity
(2) long enough to limit the number of frames per second
(3) long enough to get the desired parameters
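The multiplication of s(n) by w(n) can be sketched as follows. The slides do not name a particular window here, so the Hamming window used below is an assumption, as are the helper names `hamming` and `apply_window`.

```python
import math


def hamming(length):
    """Hamming window of the given length (an assumed choice of w(n))."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, window):
    """Weight each speech sample by the corresponding window value."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [s * w for s, w in zip(frame, window)]


# A constant frame makes the window shape visible in the output:
# small weights at the edges, full weight in the middle.
w = hamming(5)
weighted = apply_window([1.0] * 5, w)
print([round(v, 2) for v in weighted])
```

A rectangular window would correspond to `window = [1.0] * length`, which leaves the samples unchanged inside the frame.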
Shape of Analysis Window
Time-domain parameters
• Advantage: Simplicity in calculation and physical interpretation.
• Speech features relevant for coding and recognition occur in
temporal analysis: e.g., energy (or amplitude), voicing, and F0.

• Energy can be used to segment speech in automatic recognition
systems, and must be replicated in synthesizing speech. Accurate
voicing and F0 estimation are crucial for many speech coders.
• Time features, e.g., zero-crossing rate and autocorrelation, provide
inexpensive spectral detail without formal spectral techniques
Signal Analysis in the Time Domain
Short-Time Average Energy and Magnitude
• Q(n) corresponds to short-time energy or amplitude if T in
Equation (1) is a squaring or absolute magnitude operation.
• Energy emphasizes high amplitudes (since the signal is
squared in calculating Q(n)), while the amplitude or magnitude
measure avoids such emphasis and is simpler to calculate
• Such measures can help segment speech into smaller phonetic
units, e.g., approximately corresponding to syllables or
phonemes.
• The large variation in amplitude between voiced and unvoiced
speech, as well as smaller variations between phonemes with
different manners of articulation, permit segmentations based
on energy Q(n) in automatic recognition systems
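Equation (1) is not reproduced in this section. A common general form, assumed here, is Q(n) = sum over m of T[s(m)] w(n - m), where T is the per-sample transformation and w is the analysis window. The sketch below follows that assumed form; the function names are illustrative only.

```python
def short_time_measure(signal, window, n, transform):
    """Assumed form of Q(n): sum over m of T[s(m)] * w(n - m),
    for a finite window w(0), ..., w(N-1). Samples outside the
    signal are treated as zero."""
    total = 0.0
    for k, w in enumerate(window):      # k plays the role of n - m
        m = n - k
        if 0 <= m < len(signal):
            total += transform(signal[m]) * w
    return total


def short_time_energy(signal, window, n):
    """T is a squaring operation."""
    return short_time_measure(signal, window, n, lambda x: x * x)


def short_time_magnitude(signal, window, n):
    """T is an absolute-magnitude operation."""
    return short_time_measure(signal, window, n, abs)


signal = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0]                       # rectangular window, 2 samples long
print(short_time_energy(signal, rect, 3))     # 16 + 9 = 25.0
print(short_time_magnitude(signal, rect, 3))  # 4 + 3 = 7.0
```

In this toy signal the last two samples are 3 to 4 times larger than the first, yet their energy contribution is 9 to 16 times larger, which illustrates how squaring emphasizes high amplitudes.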
Short-Time Average Zero-crossing Rate (ZCR)
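A minimal sketch of a per-frame zero-crossing rate, assuming the usual definition: count the sign changes between successive samples and divide by the number of sample pairs. Treating a zero-valued sample as positive is a convention chosen here, and the function names are illustrative.

```python
def sign(x):
    """Sign convention assumed here: zero counts as positive."""
    return 1 if x >= 0 else -1


def zero_crossing_count(frame):
    """Number of sign changes between successive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if sign(a) != sign(b))


def zero_crossing_rate(frame):
    """Sign changes per successive-sample pair, between 0.0 and 1.0."""
    if len(frame) < 2:
        return 0.0
    return zero_crossing_count(frame) / (len(frame) - 1)


print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0, every pair crosses
print(zero_crossing_rate([1.0, 2.0, 3.0]))         # 0.0, no crossings
```

A rapidly alternating frame gives a high rate and a slowly varying frame gives a low one, which is why the ZCR serves as an inexpensive indicator of where the signal's spectral energy is concentrated.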
Short-Time Autocorrelation Function
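A minimal sketch of the short-time autocorrelation of one (already windowed) frame, assuming the common definition R(k) = sum over m of x(m) x(m + k), with the sum limited to samples inside the frame. The function name and the toy periodic frame are illustrative only.

```python
def short_time_autocorrelation(frame, max_lag):
    """R(k) for k = 0 .. max_lag, using only samples inside the frame."""
    n = len(frame)
    top = min(max_lag, n - 1)
    return [sum(frame[m] * frame[m + k] for m in range(n - k))
            for k in range(top + 1)]


print(short_time_autocorrelation([1.0, 2.0, 3.0], 2))  # [14.0, 8.0, 3.0]

# A toy frame with a period of 4 samples: the largest value at a
# non-zero lag falls at the period.
frame = [1.0, 0.0, -1.0, 0.0] * 4
r = short_time_autocorrelation(frame, 8)
best_lag = max(range(1, len(r)), key=lambda k: r[k])
print(best_lag)                                        # 4
```

For a periodic frame the autocorrelation peaks at lags equal to multiples of the period, which is the property that makes it useful for voicing and F0 estimation.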