AUDIO SIGNAL PROCESSING AND CODING
Andreas Spanias, Ted Painter, Venkatraman Atti
WILEY-INTERSCIENCE A John Wiley & Sons, Inc., Publication
 
INTRODUCTION
and transmitted for each block from each subband. Subband bit allocations are derived from a simplified psychoacoustic analysis. The original coder reported in [Thei87] considered only in-band simultaneous masking. Later, as described in [Stol88], interband simultaneous masking and temporal masking were added to the bit rate calculation. Temporal postmasking is exploited by updating scale factors less frequently during periods of signal decay. The MASCAM coder was reported to achieve high-quality results for 15 kHz bandwidth input signals at bit rates between 80 and 100 kb/s per channel. A similar subband coder was developed at Philips during the same period. As described by Veldhuis et al. in [Veld89], the Philips group investigated subband schemes based on 20- and 26-band nonuniform filter banks. Like the original MASCAM system, the Philips coder relies upon a highly simplified masking model that considers only the upward spread of simultaneous masking. Thresholds are derived from a prototype basilar excitation function under worst-case assumptions regarding the frequency separation of masker and maskee. Within each subband, signal energy levels are treated as single maskers. Given the SNR targets derived from the masking model, uniform ADPCM is applied to the normalized output of each subband. The Philips coder was claimed to deliver high-quality coding of CD-quality signals at 110 kb/s for the 26-band version and 180 kb/s for the 20-band version.
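The simplified masking model described above (each subband's energy acting as a single masker, upward spread only, worst-case masker/maskee separation) can be sketched as follows. This is a minimal illustration, not the MASCAM or Philips procedure: the offset and per-band slope are hypothetical placeholders standing in for the prototype basilar excitation function, and overlapping maskers are combined by taking the maximum.

```python
import numpy as np


def upward_spread_threshold(levels_db, offset_db=15.0, slope_db_per_band=10.0):
    """Masking threshold (dB) per subband from a simplified upward-spread model.

    Each subband's energy level is treated as a single masker. A masker in
    band i masks band i itself and the bands j > i above it (never below),
    with attenuation growing linearly with the band distance j - i.
    offset_db and slope_db_per_band are hypothetical values, not parameters
    taken from [Thei87] or [Veld89].
    """
    levels = np.asarray(levels_db, dtype=float)
    n = len(levels)
    threshold = np.full(n, -np.inf)
    for i in range(n):            # masker band
        for j in range(i, n):     # maskee bands at or above the masker
            contribution = levels[i] - offset_db - slope_db_per_band * (j - i)
            threshold[j] = max(threshold[j], contribution)
    return threshold


def snr_targets(levels_db, **model_params):
    """SNR target per band = signal level minus masking threshold (dB).

    A target <= 0 dB means the band is fully masked and needs no bits.
    """
    levels = np.asarray(levels_db, dtype=float)
    return levels - upward_spread_threshold(levels, **model_params)


if __name__ == "__main__":
    print(snr_targets([60.0, 20.0, 50.0]))
```

With these illustrative numbers, the weak middle band receives a negative SNR target because the strong band below it masks it, while the top band is limited only by its own energy.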
 8.1.1.2 Masking Pattern Adapted Universal Subband Integrated Coding and Multiplexing (MUSICAM)
Based primarily upon coders developed at IRT and Philips, the MUSICAM algorithm [Wies90] [Dehe91] was successful in the 1990 ISO/IEC competition [SBC90] for a new audio coding standard. It eventually formed the basis for MPEG-1 and MPEG-2 audio layers I and II. Relative to its predecessors, MUSICAM (Figure 8.1) makes several practical tradeoffs between complexity, delay, and quality. By utilizing a uniform bandwidth, 32-band pseudo-QMF bank (aka “polyphase” filter bank) instead of a tree-structured QMF bank, both complexity and delay are greatly reduced relative to the IRT and Philips coders. Delay and complexity are 10.66 ms and 5 MFLOPS, respectively. These improvements are realized at the expense of using a sub-optimal filter bank, however, in the sense that filter bandwidths (constant 750 Hz for 48 kHz sample rate) no longer correspond to the critical band rate. Despite these excessive filter bandwidths at low frequencies, high-quality coding is still possible with MUSICAM due to its enhanced psychoacoustic analysis. High-resolution spectral estimates (46 Hz/line at 48 kHz sample rate) are obtained through the use of a 1024-point FFT in parallel with the PQMF bank. This parallel structure allows for improved estimation of masking thresholds and hence determination of more accurate minimum signal-to-mask ratios (SMRs) required within each subband.

[Figure 8.1 blocks: s(n); polyphase analysis filterbank, 32 ch. (750 Hz @ 48 kHz); 1024-pt. FFT; psychoacoustic analysis; bit allocation; quantization; scale factors (8, 16, 24 ms); samples; side info.]
Figure 8.1. MUSICAM encoder (after [Wies90]).

The MUSICAM psychoacoustic analysis procedure is essentially the same as the MPEG-1 psychoacoustic model 1. The remainder of MUSICAM works as follows. Subband output sequences are processed in 8-ms blocks (12 subband samples, i.e., 384 input samples, at 48 kHz), which is close to the temporal resolution of the auditory system (4–6 ms). Scale factors are extracted from each block and encoded using 6 bits over a 120-dB dynamic range. Occasionally, temporal redundancy is exploited by repeating slowly changing scale factors within a single subband over 2 or 3 blocks (16 or 24 ms). Repetition is avoided during transient periods such as sharp attacks. Subband samples are quantized and coded in accordance with the SMR requirements for each subband as determined by the psychoacoustic analysis. Bit allocations for each subband are transmitted as side information. On the CCIR five-grade impairment scale, MUSICAM scored 4.6 (std. dev. 0.7) at 128 kb/s and 4.3 (std. dev. 1.1) at 96 kb/s per monaural channel, compared to 4.7 (std. dev. 0.6) on the same scale for the uncoded original. Quality was reported to suffer somewhat at 96 kb/s for critical signals containing sharp attacks (e.g., triangle, castanets), and this was reflected in the relatively high standard deviation of 1.1.
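The MUSICAM figures quoted above follow from simple arithmetic, and the scale-factor step can be sketched in a few lines. This is a hypothetical illustration, not the MUSICAM or MPEG bitstream procedure. It assumes a full-scale input of 1.0, 64 scale-factor codes spaced uniformly from 0 dB down to −120 dB (about 1.9 dB per step), and an invented tolerance rule for deciding when a scale factor is repeated over 2 or 3 blocks. The actual standard uses its own tables and repetition patterns.

```python
import math

FS_HZ = 48_000        # sample rate
N_BANDS = 32          # uniform pseudo-QMF (polyphase) bands
FFT_SIZE = 1024       # psychoacoustic FFT length
BLOCK_SAMPLES = 12    # subband samples per block

band_width_hz = FS_HZ / 2 / N_BANDS                 # 750.0 Hz per band
fft_resolution_hz = FS_HZ / FFT_SIZE                # 46.875 Hz per FFT line
block_ms = 1000 * BLOCK_SAMPLES * N_BANDS / FS_HZ   # 12 samples at FS/32 -> 8.0 ms

SF_BITS = 6
SF_MAX_INDEX = 2 ** SF_BITS - 1                      # 63
SF_STEP_DB = 120.0 / SF_MAX_INDEX                    # assumed ~1.905 dB spacing


def scale_factor_index(block):
    """6-bit index of the smallest scale factor >= the block's peak magnitude.

    Assumes |samples| <= 1. Index 0 is 0 dB (1.0); index 63 is -120 dB.
    """
    peak = max(abs(s) for s in block)
    if peak <= 0.0:
        return SF_MAX_INDEX
    level_db = 20.0 * math.log10(peak)
    idx = math.floor(-level_db / SF_STEP_DB)   # round toward the larger scale factor
    return min(max(idx, 0), SF_MAX_INDEX)


def scale_factor_value(idx):
    """Linear scale factor for a 6-bit index."""
    return 10.0 ** (-idx * SF_STEP_DB / 20.0)


def group_scale_factors(indices, tol=1):
    """Hypothetical repetition rule for three consecutive blocks of one subband.

    Blocks whose indices differ by at most `tol` share one transmitted value
    (the smallest index, i.e. the largest scale factor, so no block overflows).
    A large jump, such as an attack, breaks the group.
    """
    a, b, c = indices
    if max(indices) - min(indices) <= tol:
        return [min(indices)]          # one value covers all three blocks (24 ms)
    if abs(a - b) <= tol:
        return [min(a, b), c]          # repeat over the first two blocks (16 ms)
    if abs(b - c) <= tol:
        return [a, min(b, c)]          # repeat over the last two blocks (16 ms)
    return [a, b, c]                   # transient: no repetition
```

For example, `group_scale_factors([30, 15, 2])` keeps all three values because the level rises sharply from block to block, while `group_scale_factors([10, 10, 11])` sends a single shared value.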
MUSICAM was selected by ISO/IEC for MPEG-1 audio due to its desirable combination of high quality, reasonable complexity, and manageable delay. Also, bit error robustness was found to be very good (errors nearly imperceptible) up to a bit error rate of $10^{-3}$.
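The SMR-driven quantization described for MUSICAM can be illustrated with a generic greedy bit allocator. This is a minimal sketch, not the MUSICAM or MPEG-1 Layer I/II allocation procedure. It assumes that each added bit per sample raises a subband's SNR by roughly 6.02 dB, that it costs 12 bits per block (one per subband sample), and that the bit goes to the band whose noise-to-mask ratio (SMR minus SNR) is currently worst.

```python
import numpy as np

DB_PER_BIT = 6.02   # rule-of-thumb SNR gain per quantizer bit (assumption)


def allocate_bits(smr_db, budget_bits, samples_per_block=12, max_bits=15):
    """Greedy SMR-driven bit allocation for one block.

    smr_db: minimum signal-to-mask ratio per subband (dB) from the
        psychoacoustic analysis.
    budget_bits: bits available for subband samples in this block.
    Returns the number of bits per sample for each subband.
    """
    smr = np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(smr), dtype=int)
    remaining = budget_bits
    while remaining >= samples_per_block:
        nmr = smr - DB_PER_BIT * bits          # noise-to-mask ratio per band (dB)
        nmr[bits >= max_bits] = -np.inf        # saturated bands take no more bits
        band = int(np.argmax(nmr))
        if nmr[band] <= 0.0:                   # quantization noise masked everywhere
            break
        bits[band] += 1
        remaining -= samples_per_block
    return bits


if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -3.0], budget_bits=1000))
```

The band with a negative SMR receives no bits, since its signal already lies below the masking threshold. When the budget runs out first, the bits go to the bands with the largest remaining noise-to-mask ratio.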
8.2 DWT AND DISCRETE WAVELET PACKET TRANSFORM (DWPT)
The previous section described subband coding algorithms that utilize banks of fixed-resolution bandpass QMF or pseudo-QMF finite impulse response (FIR) filters. This section describes a different class of subband coders that rely instead upon a filter-bank interpretation of the discrete wavelet transform (DWT). DWT-based subband coders offer increased flexibility over the subband coders described previously since identical filter-bank magnitude frequency responses can be obtained for many different choices of a wavelet basis, or equivalently, choices of filter coefficients. This flexibility presents an opportunity for basis optimization. The advantage of this optimization in the audio coding application is illustrated by the following example. First, a desired filter-bank magnitude response can be established. This response might be matched to the auditory filter bank. Then, for each segment of audio, one can adaptively choose a wavelet basis that minimizes the rate for some target distortion level. Given a psychoacoustically derived distortion target, the encoding remains perceptually transparent.
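The basis flexibility described above can be seen in the simplest case. For a length-4 orthonormal wavelet, the Daubechies D4 lowpass filter and its time-reversed version have identical magnitude responses but define different bases, so a coder could choose between them for each segment. The sketch below uses a hypothetical rate proxy in place of a real rate measurement: the number of transform coefficients whose magnitude exceeds a distortion-derived threshold. A practical coder would substitute its own rate and perceptual-distortion measures.

```python
import numpy as np


def d4_lowpass():
    """Daubechies D4 lowpass coefficients (orthonormal: sum of squares = 1)."""
    s3 = np.sqrt(3.0)
    return np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))


def magnitude_response(h, n_freq=512):
    """|H(omega)| on a uniform grid over [0, pi]."""
    omega = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(len(h))
    return np.abs(np.exp(-1j * np.outer(omega, k)) @ h)


def one_level_dwt(x, h):
    """Periodic one-level DWT: filter, then 2:1 decimation -> (approximation, detail)."""
    K, N = len(h), len(x)
    g = np.array([(-1) ** k * h[K - 1 - k] for k in range(K)])   # QMF highpass
    idx = (2 * np.arange(N // 2)[:, None] + np.arange(K)[None, :]) % N
    return x[idx] @ h, x[idx] @ g


def rate_proxy(x, h, threshold):
    """Hypothetical rate measure: coefficients too large to be zeroed at this distortion."""
    y_lp, y_hp = one_level_dwt(x, h)
    return int(np.sum(np.abs(np.concatenate([y_lp, y_hp])) > threshold))


def choose_basis(x, candidates, threshold):
    """Index of the candidate lowpass filter with the smallest rate proxy for x."""
    return int(np.argmin([rate_proxy(x, h, threshold) for h in candidates]))


if __name__ == "__main__":
    h_min = d4_lowpass()
    h_rev = h_min[::-1]   # time reversal: same |H|, different basis
    print(np.allclose(magnitude_response(h_min), magnitude_response(h_rev)))
    segment = np.cumsum(np.random.default_rng(0).standard_normal(64))
    print(choose_basis(segment, [h_min, h_rev], threshold=1.0))
```

Longer filters admit many more coefficient sets sharing one magnitude response, which widens the search space available to this kind of per-segment selection.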
 
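[Figure: the input $\mathbf{x}$ feeds a lowpass filter $H_{lp}(\Omega)$ and a highpass filter $H_{hp}(\Omega)$, each followed by 2:1 decimation, producing $\mathbf{y}_{lp}$ and $\mathbf{y}_{hp}$; together these form $\mathbf{y} = \mathbf{Q}\mathbf{x}$.]
Figure 8.2. Filter-bank interpretation of the DWT.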
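The filter-bank interpretation in Figure 8.2 can be checked numerically. The sketch below assumes the Daubechies D4 coefficients as the set $\{c_k\}$ ($K = 4$), periodic (circular) wrap-around at the block edges, and the common alternating-sign highpass $g_k = (-1)^k c_{K-1-k}$. These are illustrative choices, not the only valid ones. The code builds the $N \times N$ matrix $\mathbf{Q}$, whose first $N/2$ rows apply the decimated lowpass filter and whose last $N/2$ rows apply the decimated highpass filter. It confirms that $\mathbf{Q}$ is orthonormal, splits $\mathbf{y} = \mathbf{Q}\mathbf{x}$ into approximation and detail halves, and checks the power complementary condition (8.1) with $H_{lp}$ scaled by $1/\sqrt{2}$ so that $H_{lp}(0) = 1$.

```python
import numpy as np


def d4_coefficients():
    """Daubechies D4 coefficients c_0..c_3 (unit energy, sum = sqrt(2))."""
    s3 = np.sqrt(3.0)
    return np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))


def dwt_matrix(c, N):
    """N x N one-level periodic DWT matrix Q built from coefficients {c_k}.

    Rows 0..N/2-1 apply the lowpass filter and keep every second output
    (approximation); rows N/2..N-1 do the same with the highpass filter (detail).
    """
    K = len(c)
    g = np.array([(-1) ** k * c[K - 1 - k] for k in range(K)])
    Q = np.zeros((N, N))
    for i in range(N // 2):
        for k in range(K):
            col = (2 * i + k) % N          # circular wrap at the block edge
            Q[i, col] += c[k]
            Q[N // 2 + i, col] += g[k]
    return Q


def power_complementary(c, n_freq=256):
    """|H_lp(W)|^2 + |H_lp(W + pi)|^2 on a grid, with H_lp scaled so H_lp(0) = 1."""
    omega = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(len(c))

    def H(w):
        return (np.exp(-1j * np.outer(w, k)) @ c) / np.sqrt(2.0)

    return np.abs(H(omega)) ** 2 + np.abs(H(omega + np.pi)) ** 2


if __name__ == "__main__":
    c = d4_coefficients()
    N = 8
    Q = dwt_matrix(c, N)
    x = np.random.default_rng(1).standard_normal(N)
    y = Q @ x
    y_lp, y_hp = y[: N // 2], y[N // 2:]            # approximation, detail
    print(np.allclose(Q @ Q.T, np.eye(N)))          # orthonormal
    print(np.allclose(Q.T @ y, x))                  # inverse is the transpose
    print(np.allclose(power_complementary(c), 1))   # condition (8.1)
```

Because $\mathbf{Q}$ is orthonormal, the synthesis filter bank is simply $\mathbf{Q}^T$. Applying the same matrix again to $\mathbf{y}_{lp}$ alone yields the next stage of the tree.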
A detailed discussion of specific technical conditions associated with the various wavelet families is beyond the scope of this book, and this chapter therefore concentrates upon high-level coder architectures. In-depth treatment of wavelets is available from many sources, e.g., [Daub92]. Before describing the wavelet-based coders, however, it is useful to summarize some basic wavelet characteristics. Wavelets are a family of basis functions for the space of square-integrable signals. A finite-energy signal can be represented as a weighted sum of the translates and dilates of a single wavelet. Continuous-time wavelet signal analysis can be extended to discrete-time and square-summable sequences. Under certain assumptions, the DWT acts as an orthonormal linear transform $\mathbb{R}^N \rightarrow \mathbb{R}^N$. For a compact (finite) support wavelet of length $K$, the associated transformation matrix, $\mathbf{Q}$, is fully determined by a set of coefficients $\{c_k\}$ for $0 \le k \le K-1$. As shown in Figure 8.2, this transformation matrix has an associated filter-bank interpretation. One application of the transform matrix, $\mathbf{Q}$, to an $N \times 1$ signal vector, $\mathbf{x}$, generates an $N \times 1$ vector of wavelet-domain transform coefficients, $\mathbf{y}$. The $N \times 1$ vector $\mathbf{y}$ can be separated into two $\frac{N}{2} \times 1$ vectors of approximation and detail coefficients, $\mathbf{y}_{lp}$ and $\mathbf{y}_{hp}$, respectively. The spectral content of the signal $\mathbf{x}$ captured in $\mathbf{y}_{lp}$ and $\mathbf{y}_{hp}$ corresponds to the frequency subbands realized in the 2:1 decimated output sequences from a QMF bank (Section 6.4), which obeys the “power complementary condition”, i.e.,

$$|H_{lp}(\Omega)|^2 + |H_{lp}(\Omega + \pi)|^2 = 1, \qquad (8.1)$$

where $H_{lp}(\Omega)$ is the frequency response of the lowpass filter. Therefore, recursive DWT applications effectively pass input data through a tree-structured cascade of lowpass (LP) and highpass (HP) filters followed by 2:1 decimation at every node. The forward/inverse transform matrices of a particular wavelet are associated with a corresponding QMF analysis/synthesis filter bank. The usual wavelet decomposition implements an octave-band filter bank structure as shown in Figure 8.3. In the figure, frequency subbands associated with the coefficients from each stage are schematically represented for an audio signal sampled at 44.1 kHz.

Wavelet packet (WP) or discrete wavelet packet transform (DWPT) representations, on the other hand, decompose both the detail and approximation coefficients at each stage of the tree, as shown in Figure 8.4. In the figure, frequency subbands
