
DIGITAL AUDIO TUTORIAL

An Introduction to Digital Audio

Malcolm Omar Hawksford


University of Essex

Part 1: Time sampling and jitter

In these notes, I want to explore some of the principles of digital processing and their
relationship to audio engineering. As well as fundamentals, the subjects of oversampling, noise
shaping and digital filters will be considered, as these are both topical and sit at the frontier
between analogue and digital signals. My interest in this subject commenced in 1968, when
I undertook a study of deltamodulation as a means of encoding colour-television signals. I was
immediately impressed by this technique, both by its relative simplicity and by the way it
represented an almost natural gateway between the analogue and digital domains. This simplicity
was reflected both within the converter topology and in the structure of the digitised signal, which
at a microscopic level is digital, while at a macroscopic level it has analogue attributes.
Deltamodulation and the derived system, delta-sigma modulation, are members of the family of
noise-shaping converters and will therefore be described as special cases of these more general
structures once we have established some foundations.

Noise shaping has a close affinity with negative feedback where techniques of closed-loop
control can be used as a means of error reduction since the distortion generated by a non-
linearity is distributed within a wide frequency space so as to yield minimum impairment within
the critical audio band. In understanding this process, it will be necessary to reconsider our
perception of distortion when described as a function of frequency. For example, when
evaluating an amplifier there is a tendency to concentrate upon the in-band distortion while the
out-of-band distortion is often ignored. However, the ensemble of distortion is uniquely linked
and is an expression of the system non-linearity and frequency dependence for a given signal
excitation. Hence, if a circuit is internally modified or interacts with an external system so as to
reconfigure the out-of-band distortion, changes in the in-band distortion residue should be
anticipated.

These comments on distortion are raised because in understanding noise shaping, it is
appropriate to apply analogue analysis techniques, particularly where a digital system is heavily
oversampled. Thus, in principle, our investigation can give insight into both recursive digital
systems (i.e. those using feedback) and analogue systems where non-linearity and negative
feedback occur.

The simultaneous application of oversampling and noise shaping enables an interchange of
system complexity between the analogue and digital domains, where in a sense we can move
towards a more natural conversion that requires only minimal analogue signal processing. The
attractiveness of this strategy relates to the numerous distortion mechanisms inherent in
analogue circuitry that include active/passive device nonideality, problems of ground-rail design
and power supplies and the hostile electromagnetic environment when shared with digital
circuits. The design and mass implementation of high-order analogue filters also pose problems
that can compromise a system's performance, whereas digital filters are robust, reproducible,
can offer linear phase and a dynamic range definable by the designer, together with transfer
functions that can approach the ideal.

The aim of these notes is to describe some of the fundamental theory underlying digital audio
with a specific bias towards ADC and DAC systems using oversampling and noise shaping,
where the parallels between analogue and digital circuits will be highlighted. The process of
delta-sigma modulation will also be included as this offers significant hardware simplification
and is particularly attractive as a means of digital-to-analogue conversion. Results will also be
presented suggesting how distortion generated by hardware imperfections can be decorrelated
from the signal to form a noise-like residue.

To commence our study, the three fundamental processes of digital audio will be discussed,
time sampling, amplitude quantization and dither, as a prelude to a more detailed description of
oversampling and noise shaping. These three processes enable a continuous signal to be
transformed into a function that is both discrete in time and amplitude but where the distortion
necessarily incurred by quantization is decorrelated to a noise-like residue. Hence, in a
correctly functioning (uniformly quantised and sampled) digital audio system the only
observable impairments should be attributable to bandlimitation and an additive noise residue,
and that is our target. Any other impairments are a function of hardware problems and non-
optimal signal processing and are not of a fundamental origin.

AES 10th INTERNATIONAL CONFERENCE

The first stage of digitisation of an analogue signal is to transform the continuous function into a
discrete function of time by using the process of time sampling. Fig. 1 illustrates the technique
where a bandlimited signal φ(t) is sampled by a uniform sequence of impulses s(n) whose
repetition frequency fs Hz defines the sampling frequency.

Mathematically, sampling is a process of multiplication where the sampled waveform T(n) is
related to φ(t) and s(n) as,

$$T(n) = \phi(t)\, s(n)$$

Since the sequence of samples s(n) is periodic, it can be described by the Fourier series,

$$s(n) = 1 + 2 \sum_{r=1}^{\infty} \cos(2\pi r f_s t)$$

and by substitution reveal an expression for the time sampled sequence,

$$T(n) = \phi(t) + 2 \sum_{r=1}^{\infty} \phi(t) \cos(2\pi r f_s t)$$

This expression is more readily recognised when described in the frequency domain as shown
in Fig. 2, where the terms φ(t) cos(2πr fs t) represent amplitude modulation that gives rise to
the familiar replication of the spectrum of φ(t) about fs and its harmonics.

In describing the process of sampling in the frequency domain, it was assumed that the input
signal was bandlimited to fc Hz where fc < fs/2. This strategy is necessary to prevent a
distortion termed aliasing distortion: the spectrum of T(n) shows that had fc > fs/2, then
the lower sideband of the first sampling harmonic would have overlapped with the baseband
spectrum of φ(t); this overlap is also observed for all adjacent replicated spectra. If frequency
overlapping, or aliasing, occurs it is impossible to discriminate the individual structure of the
overlapping spectra; the distortion is therefore irreversible. To prevent aliasing distortion the
spectrum of φ(t) must be bandlimited such that fc < fs/2, i.e. the sampling frequency must be
greater than twice the highest frequency in the bandlimited signal to be sampled; this is a
statement of Nyquist's sampling theorem. In practical terms, the sampling theorem describes a
condition that guarantees all the detail in a bandlimited signal, φ(t), will be captured without
distortion, though the strict requirement of bandlimitation is fundamental.
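As a minimal sketch of why the overlap cannot be undone, the following Python fragment computes the baseband frequency onto which a sinusoid folds when the condition fc < fs/2 is violated, and checks that the two tones produce the same samples. The function name `alias_frequency` and the 30 kHz test tone are illustrative assumptions, not part of the original notes.

```python
import math

def alias_frequency(f, fs):
    """Return the frequency in 0..fs/2 onto which a tone at f folds when sampled at fs."""
    f_mod = f % fs                   # the sampled spectrum repeats every fs
    return min(f_mod, fs - f_mod)    # the lower sideband of the nearest harmonic folds back

fs = 44100.0
f_in = 30000.0                       # deliberately above fs/2
f_alias = alias_frequency(f_in, fs)

# Once sampled, the 30 kHz tone and its alias give the same sequence of values.
max_diff = max(
    abs(math.cos(2 * math.pi * f_in * n / fs) - math.cos(2 * math.pi * f_alias * n / fs))
    for n in range(1000)
)
print(f_alias, max_diff)
```

Because the two sample sequences agree to within rounding error, no later processing can tell which tone was present at the input.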

The realisation of circuits which can meet this theoretical ideal and match the specification of
digital audio (16 bit, 44.1 kHz) sets a formidable design challenge. To illustrate the magnitude
of the problem, consider the following question:

"How far does a sample have to be displaced in time to introduce a 1-quantum error (δ), when
the sample is close to the zero crossing of a 15 kHz, full amplitude sine wave?"

Fig. 1 Illustration of the process of sampling in the time domain (analogue signal, sampling sequence s(n), time-sampled signal T(n)).

Fig. 2 Illustration of the process of sampling in the frequency domain (spectrum of analogue signal, sampling harmonics, spectrum of time-sampled sequence T(n)).

The question can be answered with reference to Fig. 3, where a sine wave is shown close to a
zero crossing and τ is the sample timing error. The sine wave of amplitude A and angular
frequency ω has a zero-crossing slope of Aω. Hence for an error of one quantum δ, the
corresponding timing error is τ = δ/(Aω). That is, if A is the maximum amplitude for an N-bit,
uniform quantizer, then $A = \delta \cdot 2^{N-1}$ and $\tau = 2^{1-N}/\omega$, where if N = 16 bit and f = 15 kHz
(ω = 2πf), τ = 0.324 ns. This analysis reveals that extremely small timing errors, as a result of
timing jitter, can introduce an effective increase in quantization distortion that is more prevalent
for high-frequency signals and illustrates one of the problems in seeking the full performance
potential of the digital audio format. This class of problem is also motivation to investigate
oversampled systems as by dispersing the coding over a number of samples, the system is less
susceptible to an error in an individual Nyquist sample.
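The arithmetic behind the 0.324 ns figure can be checked with a short Python sketch. It assumes only the small-angle relationship stated in the text (timing error equals one quantum divided by the zero-crossing slope); the function name is hypothetical.

```python
import math

def timing_error_for_one_quantum(n_bits, freq_hz):
    """Timing shift (seconds) that moves a full-scale sine by one quantum at a zero crossing.

    Uses tau = delta / (A * omega) with A = delta * 2**(N - 1),
    which simplifies to tau = 2**(1 - N) / omega.
    """
    omega = 2.0 * math.pi * freq_hz
    return 2.0 ** (1 - n_bits) / omega

tau = timing_error_for_one_quantum(16, 15e3)
print(f"{tau * 1e9:.3f} ns")   # prints "0.324 ns"
```

Each additional bit of resolution halves the tolerable timing error, as does each doubling of signal frequency.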

Let us now take a closer look at the role of the anti-aliasing, low-pass filter that is required to
bandlimit the input signal prior to sampling. Although the operation of a low-pass filter can be
described as offering progressive attenuation of a sinewave as the frequency is increased, the
equivalent impulse response is also a useful descriptor, where the filter is seen to extend the
duration of an impulse by introducing time dispersion. In general, the more rapid the
attenuation of a low-pass filter, the greater is the time dispersion.

Our present interest is whether a system using an anti-aliasing filter in association with sampling
and a signal recovery filter to convert the signal back to analogue, introduces observable non-
linear distortion when processing an impulsive input signal. The experimental system is shown
in Fig. 4, where an impulse signal, via a delay of T seconds, is applied to an anti-aliasing filter,
sampler and signal recovery filter. The anti-aliasing filter has a transfer function A(f) with a
nominal cut-off frequency of 20 kHz while the recovery filter has an assumed brick-wall
response at 22.05 kHz. The system enables the recovered output signal to be observed as both
a function of A(f) and the input impulse timing as it is varied with respect to the sampling
sequence. As there is an almost infinite range of A(f) that could be selected, two representative
examples are chosen to demonstrate optimal and non-optimal anti-aliasing filtering:

Example 1: ideal filter with unity gain to 20 kHz and infinite attenuation above 20 kHz.

Example 2: non-ideal filter, see table.

Example 2 is representative of a high(ish)-order analogue filter, while example 1 is an ideal
target more closely matched by a digital filter.

A computer program was written to calculate the overall system impulse response as the input
impulse timing T was progressively delayed with respect to the sampling sequence by 0, 0.25
τ, 0.5 τ and 0.75 τ, where τ equals the sampling period of 22.68 μs.

The results are reproduced in Fig. 5 where, for example 1, as might be anticipated, no
differences in output are observable over the range of T, thus demonstrating the effectiveness of
the sampling theorem. However, the results for example 2 do show a variation with T. This is a
consequence of aliasing, which produces a dynamic distortion that is dependent upon the relative
timing of the input and the sampling function, where the following observations should be
made:

* Fig. 5 (e), (f), (g), (h) show a change in shape of the impulse response as T is varied
from 0 to 0.75 τ; any waveform change as a function of pure time delay is a non-linear
distortion.

* Fig. 5 (f), (g), (h) reveal an error in peak amplitude that is maximum for T = 0.5 τ.

* Fig. 5 (f), (h) also reveal a timing error where the pulse maximum is displaced from the
0.25 τ and 0.75 τ locations.

Fig. 3 Estimate of timing error on a sample (signal near a zero crossing shown against the quantization levels).

Fig. 4 Experimental system to observe effect of sampling on an impulse (impulse generator, time delay T, anti-aliasing filter A(f), sampler driven by sampling sequence s(n), recovery LPF, analogue output).

Example 2: non-ideal filter with the following spot gain/frequency response

Frequency, kHz      Gain, dB

f < 22.1               0
22.1                 -10
27.5625              -30
33.075               -60
38.5875              -80
44.1                -100
f > 44.1             -∞

* All curves pass through the Nyquist samples where the optimal recovery filter is
responsible for signal interpolation between samples.

* One should reflect upon the specification for the anti-aliasing filter if the sample error at T
= 0.5 τ is to be held within a few quanta and the timing error to within a few
nanoseconds.
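The behaviour reported for example 1 can be reproduced with a small Python sketch. This is not the author's program: it assumes an ideal 20 kHz low-pass (a sinc impulse response), ideal sampling at 44.1 kHz and an ideal brick-wall recovery filter implemented as a truncated sinc interpolation, with all function names chosen for illustration. It compares the recovered waveform with the filtered impulse for delays of 0, 0.25, 0.5 and 0.75 of a sampling period.

```python
import math

FS = 44100.0      # sampling frequency, Hz
FC = 20000.0      # ideal anti-aliasing cut-off, Hz (example 1)
TS = 1.0 / FS     # sampling period, s

def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def filtered_impulse(t, delay):
    """Impulse delayed by `delay` after an ideal low-pass of cut-off FC (unit peak)."""
    return sinc(2.0 * FC * (t - delay))

def recovered(t, delay, n_terms=4000):
    """Sample the filtered impulse at n*TS, then interpolate with an ideal
    recovery filter at FS/2 (sum truncated to +/- n_terms samples)."""
    return sum(
        filtered_impulse(n * TS, delay) * sinc((t - n * TS) / TS)
        for n in range(-n_terms, n_terms + 1)
    )

def max_shape_error(delay_fraction):
    """Largest difference between recovered and original pulse near its centre."""
    delay = delay_fraction * TS
    offsets = (-2.0, -1.25, -0.5, 0.0, 0.5, 1.25, 2.0)   # in sampling periods
    return max(
        abs(recovered(delay + k * TS, delay) - filtered_impulse(delay + k * TS, delay))
        for k in offsets
    )

errors = {frac: max_shape_error(frac) for frac in (0.0, 0.25, 0.5, 0.75)}
for frac, err in errors.items():
    print(frac, err)
```

With the ideal filter the residual differences come only from truncating the interpolation sum, so the pulse shape is independent of the delay. Replacing `filtered_impulse` with the response of a filter that passes energy above FS/2 would be one way to explore the example 2 behaviour.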

From this experiment and earlier discussion, some preliminary observations and conclusions
can be drawn:

(a) Bandlimitation of an impulsive signal causes the response to be time dispersed, where for
an ideal filter of cut-off frequency fc Hz, the impulse response is

$$h(t) = \frac{\sin(2\pi f_c t)}{2\pi f_c t}$$

and exhibits a form similar to that of Fig. 5(b).

(b) However, this response cannot be achieved by a practical filter as the precursive region
(i.e. the response before the pulse maximum) is infinite. A practical filter therefore has a
truncated response yielding a known and predictable frequency response error.

(c) The required filter specification for an analogue filter is difficult to achieve and designs
can suffer a range of degradation due to circuit and component imperfections that span
non-linear distortion, temperature dependence, aging, noise, susceptibility to
electromagnetic fields and non-ideal group delay performance.

(d) Aliasing distortion generates dynamic distortion where the shape of a signal is changed as
a function of the relative timing of signal and sampling function. The system is non-
linear as the impulse response is no longer unique.

(e) However, providing the criteria of anti-aliasing filters can be met and the sampler
approaches that of the mathematically ideal model, then the sampling theorem is sufficient
to guarantee no fundamental in-band distortion generation.

Fig. 5 Impulse response with fractional sample period time delays (panels (a) to (h); markers represent Nyquist samples; amplitude error indicated for the non-ideal filter).

In the early days of digital audio, I expressed concern about the consequences of time
dispersion due to anti-aliasing and signal recovery filters. However, my present stance is that,
providing the filters can approach the theoretical ideal in order to minimise in-band errors and
do not degrade the signal other than that defined by the desired transfer function, then the sonic
performance consequences are minimal. Nevertheless, the attainment of this ideal is not
straightforward from the viewpoint of instrumentation, where there are numerous opportunities
for sound degradation.

Part 2: Amplitude quantization and dither

The next stage in reviewing the fundamentals of digital audio as a prelude to discussing
oversampling and noise shaping is to introduce amplitude quantization and to observe its
association with sampling, especially as the high frequency spectral components of the resulting
error are not constrained by the anti-aliasing filter. The arguments have so far assumed infinite
signal resolution which in a digital environment is not the case, as digital signals must be discrete
in both time and amplitude. The mapping of a continuous analogue signal to a discrete signal is
termed amplitude quantization, a process that, at worst, introduces non-linear distortion but, at
best, generates only a noise-like residue. In this dialogue we shall concentrate on uniform
amplitude quantization and discuss the determination of signal-to-noise ratio (SNR) and the effect
of sampling upon the quantization distortion spectrum. The co-subject of dither is also relevant,
where we shall investigate the encoding of low-level, non-sinusoidal signals and build upon the
observations made in Part 1, linking time shifted impulses and aliasing distortion.

The process of transforming a continuous signal c to a discrete signal is succinctly described by
reference to the stair-case transfer characteristic illustrated in Fig. 1(a). This characteristic acts as
a look-up table so that a signal in the range (r - 0.5)d to (r + 0.5)d is approximated to a new level
rd where, for uniform quantization, r is an integer and d a constant quantization interval or
quantum. Consequently, in an ideal system, the resulting quantization distortion e spans a range
-d/2 to d/2, although due to non-optimum electronics, this range may be exceeded. It is also
desirable that all values of error over the quantization interval are equi-probable, where this aspect
will be returned to in later discussions.

The SNR can be estimated by observing the periodic behaviour of quantization distortion
expressed as a function of signal level as shown in Fig. 1(b). Assuming equi-probable levels,
then the mean-square value of quantization distortion is also the mean-square value of this
periodic waveform, which can be estimated by averaging e² over the interval c = 0 to d/2 (where
e = c), that is

$$\text{mean-square quantisation distortion} = \frac{2}{d} \int_0^{d/2} c^2 \, dc = \frac{d^2}{12}$$

For an N-bit quantizer the maximum amplitude A for a sinewave is $2^N d/2$, which has the
corresponding mean-square value of $A^2/2$, whereby

$$\text{SNR} = 10 \log_{10} \left[ \frac{\text{mean-square signal}}{\text{mean-square noise}} \right] = (6.02N + 1.76)\ \text{dB}$$

This result shows for an N = 16 bit system that SNR = 98 dB and that a 1 bit increment to the
quantizer resolution changes the SNR theoretically by 6.02 dB (usually rounded down to
6 dB).
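Both results can be checked numerically. The Python sketch below, with hypothetical function names, averages e² over one quantization interval to recover d²/12 and then evaluates the SNR of a full-scale sinewave directly from the two mean-square values rather than from the closed-form expression.

```python
import math

def quantization_noise_power(d, steps=100000):
    """Average e**2 over one quantization interval (-d/2, d/2), taking e = c."""
    total = 0.0
    for k in range(steps):
        c = -d / 2 + (k + 0.5) * d / steps    # midpoint rule
        total += c * c
    return total / steps

def ideal_snr_db(n_bits, d=1.0):
    """SNR of a full-scale sinewave (A = 2**N * d / 2) against noise power d**2 / 12."""
    amplitude = (2 ** n_bits) * d / 2
    signal_power = amplitude ** 2 / 2
    noise_power = d ** 2 / 12
    return 10 * math.log10(signal_power / noise_power)

print(quantization_noise_power(1.0), 1 / 12)
print(round(ideal_snr_db(16), 2))             # close to 6.02 * 16 + 1.76
print(round(ideal_snr_db(17) - ideal_snr_db(16), 2))
```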

Fig. 1a Uniform (linear) quantization transfer characteristic.

Fig. 1b Quantization distortion as a function of signal level.

So far, the investigation of quantization has avoided any reference to time sampling or to the
bandwidth over which the quantization noise was measured. Inevitably, when the amplitude
quantised signal is sampled, the spectrum of the quantization distortion is changed. However,
what is probably surprising on initial encounter is that when the quantised signal is sampled, the
quantization noise power remains unchanged and is concentrated into the frequency band 0 to fs/2
Hz, where fs Hz is the sampling frequency, although a replication of this spectrum also occurs
about each harmonic of fs Hz. This is an important observation and is worth further
consideration, especially as we shall revisit this result in association with oversampling in both
ADC and DAC applications.

The process is not too difficult to comprehend where in Figure 2, an illustrative power spectrum
of the quantization distortion is shown which inevitably decays to insignificance at high
frequency. Since this spectrum represents the total quantization distortion, the area under the
curve must correspond to the noise power d²/12. Also shown in Figure 2 is the sampling
frequency fs Hz and its associated harmonics rfs Hz. Remember from part 1 that the sampling
function and the signal to be sampled are effectively multiplied, which corresponds to a
convolution in the frequency domain. In simple language, this is equivalent to the frequency
shifts normally associated with amplitude modulation, where for example, considering the rth
harmonic of the sampling function, the spectrum in the range (rfs - fs/2) to (rfs + fs/2) Hz is
frequency shifted into the baseband region 0 to fs/2 Hz. If we now observe this frequency shift
for all the remaining sampling harmonics where there exists significant quantization distortion, the
complete distortion spectrum of power d²/12 becomes folded and irreversibly interwoven into the
region 0 to fs/2 Hz. Strictly, this relocation of the distortion spectrum is a gross form of aliasing
distortion as described in part 1, but it only applies to the quantization distortion which is
generated in the amplitude quantizer after the anti-aliasing filter.
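A simple numerical experiment is consistent with this picture. The Python sketch below quantizes a sampled sinewave at two sampling rates and measures the mean-square error of the sample sequence, which by Parseval's relation is the power contained in the band 0 to fs/2. The test signal (a 997 Hz tone of roughly 1000 quanta amplitude) and the function names are assumptions chosen for illustration.

```python
import math

def mid_tread_quantize(c, d):
    """Uniform quantizer: inputs in ((r - 0.5)d, (r + 0.5)d) map to level r*d."""
    return d * math.floor(c / d + 0.5)

def sampled_error_power(fs, n_samples=200000, f_sig=997.0, amplitude=1000.3, d=1.0):
    """Mean-square quantization error of a sinewave sampled at fs Hz."""
    total = 0.0
    for n in range(n_samples):
        c = amplitude * math.sin(2 * math.pi * f_sig * n / fs)
        e = mid_tread_quantize(c, d) - c
        total += e * e
    return total / n_samples

powers = {fs: sampled_error_power(fs) for fs in (44100.0, 4 * 44100.0)}
for fs, p in powers.items():
    print(fs, p, 1 / 12)
```

The measured power stays close to d²/12 at both rates, so raising the sampling rate spreads the same distortion power over a wider band.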

In part 1, we demonstrated that time sampling resulted in a harmonically related series of
amplitude modulated carriers, which was a result of the input signal being sampled at uniformly-
spaced time instants. However, parallels can also be drawn with amplitude quantization, where
the transfer characteristic of Figure 1(a) reveals an input function c that is effectively sampled at
uniform instants, but along the amplitude axis, where the sampling instants are d/2, 3d/2, 5d/2,
etc. Indeed, the error e when expressed as a function of c shows precise periodicity (see Figure
1(b)). Consequently, taking e = c over the interval -d/2 to d/2 as before, the error e can be
represented by a Fourier series as,

$$e = \frac{d}{\pi} \sum_{N=1}^{\infty} \frac{(-1)^{N+1}}{N} \sin\!\left(\frac{2\pi N c}{d}\right)$$

This result demonstrates that quantization distortion is equivalent to an infinite series of phase-
modulated sinusoids, that enables digital coding to be modelled in terms of both amplitude and
phase modulation which forms a useful analysis tool in understanding the formation of non-linear
distortion spectra when a signal is digitised.
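The series representation can be tested against the sawtooth error directly. The following Python sketch assumes the mid-tread characteristic described earlier, with e = c over -d/2 < c < d/2 and period d, for which the Fourier coefficients alternate in sign; the function names are illustrative.

```python
import math

def sawtooth_error(c, d):
    """Quantization error e = c - Q(c) for a uniform quantizer with levels r*d."""
    return c - d * math.floor(c / d + 0.5)

def fourier_error(c, d, n_terms=20000):
    """Partial sum of (d/pi) * sum over N >= 1 of (-1)**(N+1) / N * sin(2 pi N c / d)."""
    s = 0.0
    for n in range(1, n_terms + 1):
        s += (-1) ** (n + 1) / n * math.sin(2 * math.pi * n * c / d)
    return d / math.pi * s

for c in (0.3, 1.2, -0.7):
    print(c, sawtooth_error(c, 1.0), fourier_error(c, 1.0))
```

Substituting a time-varying signal for c turns each term into a sinusoid whose phase is modulated by the signal, which is the phase-modulation view described above.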

The fundamental processes of time sampling and amplitude quantization have now been
introduced and give our first glimpse of the gateway between the analogue and digital domains.
However, as we move closer and observe the finer detail, the gateway exhibits some rough edges
like splinters that emerge to trap the flow of information and cause low-level distortion. The
gateway is in need of smoothing to remove the rough edges so as to allow low-level signals to
flow with only random-like perturbation rather than complete obstruction.

To illustrate the low-level distortion of a quantizer, consider a sinusoidal input function
c = 2d sin ωt applied to the quantizer of Figure 1(a). In Figure 3 the resultant waveform is
shown, where gross distortion is evident, together with the periodic nature of the distortion.
Since the signal and distortion are both periodic they are correlated, whereby a Fourier analysis
would reveal a line spectrum reminiscent of harmonic and intermodulation distortion in amplifier
systems. Because of the sampled nature of the waveform and the non-bandlimitation of the error
waveform, aliasing distortion generally produces a non-harmonic distortion spectrum. This type
of behaviour of a digital converter is very much the down-side and exhibits a highly objectionable
sonic signature, albeit at low level.

Fig. 2 Aliasing of quantization distortion to baseband together with replication of aliased spectrum about sampling harmonics.

Fig. 3 Low-level quantisation distortion without dither (input c = 2d sin(ωt) and quantised signal).

If the input signal c is lowered still further so that the amplitude is less than d, then the output
becomes a square wave for a mid-riser quantizer, that is, the converter has minimal capability of
coding low-level signals. The objective for low-level encoding is to translate the quantization
distortion to a noise-like residue which exhibits no periodicity, whereby the distortion sounds like
a steady hiss reminiscent of white noise, together with the capability for the signal c to sink into
the noise without obvious distortion. This goal is attainable and is achieved using additive dither
or, as will be seen later, high-order noise shaping and oversampling.
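The square-wave behaviour is easy to reproduce. The Python sketch below assumes a mid-riser quantizer with output levels at odd multiples of d/2 and a threshold at zero, and applies one period of a sinewave of amplitude 0.4 d; the names and the choice of 64 samples per period are illustrative.

```python
import math

def mid_riser_quantize(c, d):
    """Mid-riser uniform quantizer: levels at +/-d/2, +/-3d/2, ... with a threshold at 0."""
    return d * (math.floor(c / d) + 0.5)

d = 1.0
samples_per_period = 64

def quantised_period(amplitude):
    """One period of a quantised sinewave (half-sample offset avoids exact zero crossings)."""
    return [
        mid_riser_quantize(
            amplitude * math.sin(2 * math.pi * (n + 0.5) / samples_per_period), d
        )
        for n in range(samples_per_period)
    ]

output = quantised_period(0.4 * d)
levels = sorted(set(output))
print(levels)   # [-0.5, 0.5]
```

Halving the input amplitude leaves the output unchanged, so all information about signal level and shape below one quantum is lost without dither.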

My first experience of the effectiveness of dither was in picture coding for television systems in
the late 1960s. If a video waveform is quantised in amplitude, it exhibits contours of constant
brightness. For example, a continuous video grey scale would map to a stair-case like grey
scale; this effect has also become popular more recently in video recorders as solarisation.
However, if a noise-like signal is added to the video signal prior to quantization, the discrete
bands of constant brightness are broken down in exchange for a noisy picture where, if the
picture were attenuated, it would appear to sink into the noise; effectively, the signal and
quantization distortion have become decorrelated.

The technique is equally applicable to digital audio and can be considered a fundamental
requirement of the conversion process. Basically, a dither signal is a noise-like signal of low
level which is added to the audio signal prior to quantization. Ideally, after quantization, the same
noise signal should be subtracted in the digital domain to minimise noise impairment. However,
since the dither is usually of low level and is uncorrelated with the signal, this correction is often
omitted.

Although the dither process is usually implemented with an additive noise signal, it is equivalent
to consider the process as a jittering of the quantization levels. Consequently, the quantizer
comparison levels are no longer confined to fixed values but on average can occupy all
intermediate values when observed over a sufficiently long time period. This observation
demonstrates how the low-level coding performance of a quantizer is extended to enable signals
to be converted that sink into the noise. The target in designing a dither signal is to produce a
smooth quantization distortion noise spectrum that is completely decorrelated from the signal,
whereby quantization distortion appears as additive noise.

The effectiveness of quantization distortion decorrelation can be tested by applying a periodic
signal to a converter and then performing an average of the output taken over successive periods
of the waveform. As the number of averages is increased, the correlated signal components will
add coherently and thus reinforce while the decorrelated components will add only as a power
addition; hence, averaging can ultimately expose the repetitive components, allowing the
performance of the dither signal to be tested.

To illustrate the technique of dither, a computer model was constructed based upon the system of
Figure 4. A controlled level of noise was added to a time-sampled sinewave of amplitude 0.5 d
and subsequently subtracted after quantization to prevent unnecessary noise degradation. The
quantised output sequence x(n) was then averaged over 256 consecutive input periods to reduce
the decorrelated distortion components and expose the effectiveness of low-level conversion
below the noise floor. In Figure 5, the results are shown for a range of noise n(n) to demonstrate
the coding dependence on the level of dither, where a split-window presentation shows the coded
output both with and without averaging.
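A comparable experiment can be sketched in a few lines of Python. This is not the author's model: it assumes a mid-tread quantizer, uniformly distributed dither specified by its peak-to-peak span, subtraction of the dither after quantization and 32 samples per period, and the function names are hypothetical. It averages the output over 256 periods and reports the RMS difference from the input sinewave.

```python
import math
import random

def quantize(c, d=1.0):
    """Uniform (mid-tread) quantizer with levels r*d."""
    return d * math.floor(c / d + 0.5)

def averaged_output(dither_span, periods=256, samples_per_period=32,
                    amplitude=0.5, d=1.0, seed=1):
    """Average the dithered, quantised sinewave over `periods` periods.

    dither_span is the peak-to-peak width of the uniform dither, in units of d.
    """
    rng = random.Random(seed)
    acc = [0.0] * samples_per_period
    for _ in range(periods):
        for n in range(samples_per_period):
            c = amplitude * math.sin(2 * math.pi * (n + 0.5) / samples_per_period)
            noise = rng.uniform(-dither_span / 2, dither_span / 2)
            acc[n] += quantize(c + noise, d) - noise   # subtract dither after quantization
    return [a / periods for a in acc]

def rms_error(averaged, samples_per_period=32, amplitude=0.5):
    """RMS difference between the averaged output and the input sinewave."""
    err = 0.0
    for n, y in enumerate(averaged):
        c = amplitude * math.sin(2 * math.pi * (n + 0.5) / samples_per_period)
        err += (y - c) ** 2
    return math.sqrt(err / len(averaged))

results = {span: rms_error(averaged_output(span)) for span in (0.0, 0.25, 1.0)}
for span, err in results.items():
    print(span, err)
```

With no dither the quantizer output is zero throughout and the averaged result never approaches the input, whereas with a dither span of one quantum the averaged output converges on the sinewave as the number of averages grows.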

Fig. 5 Quantisation distortion without and with signal averaging for a range of dither level 0 to 1.25 d (left: no averaging; right: 256 averages; SNRo denotes SNR after 256 averages).

The results of Figure 5 show that, providing the dither sequence n(n) produces effective quantizer
comparison levels that span the full range -0.5 d to 0.5 d, excellent low-level coding into the
noise occurs, but if n(n) does not achieve this target then there is inevitable non-linear signal
impairment. Interestingly, for dither with a uniform level distribution, if the dither is increased
over the optimum value (± d/2), there is an increase in non-linear signal distortion (see n(n) =
1.25 d). However, our discussion has been based upon dither with uniform amplitude distributions
where a marked sensitivity to dither level was demonstrated (see Fig. 5 for example). In practice,
dither signals with non-uniform probability distributions such as the triangular distribution find
greater favour [5,7] where lower modulation noise and a reduced sensitivity to absolute level is
revealed. For example, Blesser [6] has shown that a concentration of narrow band dither located
at the Nyquist frequency results in enhanced SNR even though the dither is of greater level
compared with noise of uniform spectral density. However, for high-accuracy/high-resolution
ADCs, source noise and amplifier noise are often sufficient to guarantee decorrelation; where dither
is mandatory is in the digital domain, whenever signal truncation (such as when 18/20 bit data is
reduced to 16 bit) generates re-quantization distortion. Hence, the good news is that dither
translates quantization distortion to a noise-like residue and therefore represents the third
fundamental operation necessary to optimally convert an analogue signal to digital.
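For the word-length reduction case, a minimal Python sketch of dithered re-quantization is given here. It assumes integer input samples, triangular-distribution dither formed as the sum of two uniform variables (spanning plus or minus one new quantum) and rounding to the nearest new level; the function name and parameters are illustrative rather than taken from the original notes.

```python
import random

def requantize(samples, shift_bits, dither=True, seed=0):
    """Reduce integer samples by `shift_bits` bits (e.g. 20 bit to 16 bit: shift_bits=4).

    With dither=True, triangular-PDF dither spanning +/- one new quantum is added
    before rounding. Returns integers in units of the new quantum.
    """
    rng = random.Random(seed)
    step = 1 << shift_bits                       # one new quantum in old quanta
    out = []
    for x in samples:
        if dither:
            # the sum of two uniform variables has a triangular distribution
            x = x + rng.uniform(-step / 2, step / 2) + rng.uniform(-step / 2, step / 2)
        out.append(int((x + step / 2) // step))  # round to the nearest new level
    return out

samples = [4] * 20000                            # constant input of 0.25 new quanta
plain = requantize(samples, 4, dither=False)
dithered = requantize(samples, 4, dither=True)
undithered_mean = sum(plain) / len(plain)
dithered_mean = sum(dithered) / len(dithered)
print(undithered_mean, dithered_mean)
```

Without dither the quarter-quantum input is truncated away entirely, while with dither its value survives in the mean of the output at the cost of added noise.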

We have now explored in some detail the fundamental processes of time sampling, amplitude
quantization and dither. The next stage is to investigate some techniques for implementing the
conversion processes, taking account of our ultimate target to achieve the full potential of the
digital audio format, together with the minimum of degradation from associated hardware. In part
3 we shall apply the fundamental principles already discussed and introduce the topic of
oversampling and demonstrate its application in both ADC and DAC.

Part 3: Oversampling

Digital audio introduces several difficult-to-achieve design targets, although the difficulty is
usually more a result of the chosen conversion strategy that can include high-order analogue
filters, sample-and-hold circuits and ADC/DAC electronics with sub-optimal quantization
characteristics. Secondary errors in all these sub-systems are prevalent where, for example, anti-
aliasing filters can introduce in-band amplitude ripple and low-level noise and distortion, sample-
and-holds can exhibit non-linear acquisition distortion and jitter and ADC/DACs can have glitch,
slew-induced distortion and non-uniform quantization levels. The latter is particularly disturbing
as not only can high-level non-linear distortion be introduced by overall curvature in the transfer
characteristic, but displaced quantization levels produce further irregularity and limit low-level
resolution by preventing dither from realising linear signal coding into the quantization noise
floor. These classes of error when located within a sea of wide-band electromagnetic fields
generated by supporting digital electronics are some of the reasons there is current interest in
identifying more radical conversion strategies that can either by-pass or more optimally control
these non-idealities.

The present trend is to design both ADCs and DACs to attain > 16 bit resolution. In the recording
studio this approach is welcome as there is not only greater flexibility on dynamic range, but
when signals are ultimately truncated to 16 bit - using optimal digital dither or noise shaping - to
facilitate coding into the noise, the theoretical target dictated by 16 bit quantization is more
accurately met, where the re-quantization distortion appears only as an additive noise source
superimposed on the initial, wider-dynamic range, digital signal.

The motivation to produce DACs with > 16 bit resolution is also welcome for a number of
reasons. To realise the full potential of dither in association with quantization, the ADC and DAC
should achieve a resolution and accuracy commensurate with 16 bit. Any non-linear
displacements of quantization levels will distort low-level signals as they approach and ultimately
enter the noise floor. This is one reason why 18 to 20 bit DACs have recently become popular
where good 16 bit accuracy should then be achieved enabling signals to be recovered to at least 12
dB into the noise, subject, of course, to optimum coding at the ADC. Also with the advent of
loudspeaker frequency response correction using digital equalisation [1, 2], there is an inherent
requirement for a signal range in excess of 16 bit unless re-quantization can be tolerated. Finally,
there are the equally important requirements of minimal performance drift with time and
temperature, the elimination of preset type calibration or routine alignment together with a
tolerance to hardware imperfections. To solve these problems we shall turn to chaos [3, 4] and
the 1-bit DAC, but first we must understand sampling rate conversion.

Oversampling is applicable to both ADC and DAC, where the sampling rate Rfs Hz is chosen to
be greater than the sampling rate fs Hz dictated by Nyquist's sampling theorem. The parameter R
is the oversampling ratio and is usually selected as a power of 2 for computational efficiency,
although this is not fundamental and non-integer ratios can be accommodated. In digital systems,
sampling rates can be either reduced, a process termed decimation (where samples are
discarded), or increased, a process of interpolation (where new samples are created). It is
important to note that decimation may represent a real loss of information as inevitably the
information bandwidth is reduced, but interpolation can never represent a gain of information. In
fact, processing errors may actually imply a slight loss of information even though the sampling
rate is increased!

It is helpful to categorise oversampling into mild oversampling, where R = 2 to (say) 16, and
heavy oversampling, where R > 16 but usually tending to 256 and beyond. The reason for this
grouping is the association with noise shaping (to be discussed in part 4), as it has a dramatic
impact on system topology in both ADC and DAC systems.

To introduce oversampling, consider an ADC where R = 4, Rfs = 176.4 kHz and where optimal
dither is assumed to give a uniform spectral spread of quantization distortion over the band 0 to
88.2 kHz. There are two principal consequences of x4 oversampling:

(i) The input bandwidth must be restricted to 0 to 154.35 kHz to prevent aliasing distortion
entering the audio band 0 to 22.05 kHz. Since, in practice, most audio signals in the absence of
ultrasonic interference have minimal spectral content above ~30 kHz, the oversampled ADC
requires virtually no pre-quantizer filtering; hence, degradation by now redundant analogue
processing can be eliminated. A degree of aliasing distortion is permissible providing the audio
band is not entered, where the process is illustrated in Fig. 1.

(ii) A second consequence of R = 4 oversampling is the location of the quantization spectrum
into the band 0 to 88.2 kHz, whereby only 0.25 (assuming a uniform noise spectrum) of the
power resides in the range 0 to 22.05 kHz. This is our first encounter with noise shaping and is
equivalent to a 6 dB reduction in noise power, representing a 1 bit improvement in
resolution.
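The arithmetic behind consequence (ii) can be checked with a short sketch. It assumes, as the text does, a uniform noise spectrum over 0 to Rfs/2, and takes one bit of resolution as a factor of four in noise power; the function name `inband_noise_gain` is illustrative only.

```python
import math

def inband_noise_gain(R):
    """Uniform quantization noise spread over 0..R*fs/2, of which only 0..fs/2 is kept.

    Returns (fraction of noise power in band, change in dB, equivalent bits gained).
    """
    fraction = 1.0 / R
    change_db = 10.0 * math.log10(fraction)
    bits_gained = 0.5 * math.log2(R)   # one bit corresponds to a factor of 4 in power
    return fraction, change_db, bits_gained

for R in (2, 4, 16):
    fraction, change_db, bits_gained = inband_noise_gain(R)
    print(f"R = {R:2d}: fraction = {fraction:.4f}, {change_db:.2f} dB, {bits_gained:.2f} bit")
```

Under these assumptions each doubling of R yields only half a bit, which is one way of seeing why mild oversampling alone gives modest resolution gains.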

As the oversampling ratio is increased above R = 4, it may be supposed that further enhancements
can be achieved. However, there are limitations with this more conventional approach:

(a) The faster ADC conversion rate may introduce error as a consequence of finite settling
times.

(b) The quantisation distortion spectrum falls (see Fig. 2 in part 2) [5] with frequency,
thus limiting significant noise advantage with increasing R.

To down-convert the 176.4 kHz sampling rate to 44.1 kHz requires decimation by a factor of 4
where the process is illustrated in Fig. 2. The digital signal is initially filtered to remove all signal
components in the band 22.05 kHz to 154.35 kHz whereby sub-sampling can reduce the
sampling rate without aliasing distortion corrupting the audio band.
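As a rough illustration of this decimation process, the sketch below filters at 176.4 kHz and then keeps every fourth sample. The Hann-windowed sinc design, the 101-tap length, the 20 kHz cut-off and the 1 kHz and 60 kHz test tones are assumptions chosen for brevity; they are not the filter described in the text.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hann-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2.0 * cutoff if t == 0 else math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]          # unity gain at DC

def fir(x, taps):
    """Direct-form FIR filter with zero initial state."""
    return [sum(taps[k] * x[n - k] for k in range(min(n + 1, len(taps))))
            for n in range(len(x))]

def decimate(x, taps, R):
    """Low-pass filter at the high rate, then keep every R-th sample."""
    return fir(x, taps)[::R]

R = 4
high_rate = 176400.0
taps = lowpass_taps(101, 20000.0 / high_rate)

def tone(freq, count=2048):
    return [math.sin(2.0 * math.pi * freq * n / high_rate) for n in range(count)]

# A 1 kHz tone should pass; a 60 kHz tone, which would otherwise alias to
# 15.9 kHz after sub-sampling, should be rejected by the digital filter.
wanted = decimate(tone(1000.0), taps, R)
unwanted = decimate(tone(60000.0), taps, R)
peak_wanted = max(abs(v) for v in wanted[30:])      # skip the filter start-up transient
peak_unwanted = max(abs(v) for v in unwanted[30:])
print(peak_wanted, peak_unwanted)
```

A practical decimator would compute only the retained output samples; the full-rate filtering here is kept for clarity.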

Fig. 2 reveals that the digital, decimation filter achieves the same function as the analogue
anti-aliasing filter used with Nyquist sampling as discussed in part 2. In fact, the oversampling
process has enabled the anti-aliasing filter to be transferred from the analogue to the digital
domain, otherwise its function is virtually identical. The advantages are that a digital filter can be
designed to exhibit a near-ideal transfer function, with low in-band amplitude ripple, rapid rates
of out-of-band attenuation, zero group delay distortion, an extremely wide dynamic range,
together with time-invariant performance and no aging or thermal problems. These reasons,
coupled with exact replication in manufacture, designer specified dynamic range and an
insensitivity to power supply, component quality and ground-rail problems associated with
analogue circuits, make the approach attractive and cost-effective for volume production. There
is also a reduction in sensitivity to jitter (see part 1) [6] as several samples now participate in the
calculation of each Nyquist sample allowing a degree of signal averaging.

A more familiar application of oversampling is in digital-to-analogue conversion, where the
process of interpolation is used to increase the sampling rate typically by a factor of 4. The
attraction of this technique parallels that used in the ADC example, where processes normally
performed in the analogue domain are transferred into the digital domain, thus relaxing the
demands on analogue circuitry and simultaneously allowing better signal recovery and
enhancement in resolution. However, it should be stressed that interpolation in no way adds to
the source information or to the theoretical resolution of ADC/DAC; it is a technique more
sympathetic to hardware non-idealities.

Fig. 2 Times 4 decimation and equivalent low-pass filter process in digital and analogue domains in ADC.
[Frequency axis: 0, fs, 2fs, 3fs, 4fs. Annotations: "Digital, decimation filter rejects this frequency space"; "Signal sub-sampled producing an equivalent spectrum to a direct Nyquist sampled ADC". Block diagrams: oversampled path from input through sampler and sub-sampler to Nyquist samples; equivalent path from input to Nyquist samples at fs.]


The basic process of interpolation is shown in Fig. 3 and consists of two stages:

(i) Zero samples are inserted into the digital code so that the effective data rate is increased
from, say, 44.1 kHz to 176.4 kHz.

(ii) Sample values are calculated using a digital, low-pass filter with a pass band ideally 0 to
22.05 kHz, so that the sampling sideband pairs associated with 44.1 kHz, 88.2 kHz
and 132.3 kHz are attenuated.
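The two stages above can be sketched as follows. The Hann-windowed sinc filter, its 101 taps and the 1 kHz test tone are assumptions made for illustration; the inner loop also skips the multiplications by the inserted zeros, since only every fourth sample entering the filter is non-zero.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hann-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2.0 * cutoff if t == 0 else math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]

def interpolate(x, taps, R):
    """Stage (i): insert R-1 zeros after each sample. Stage (ii): low-pass filter."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0.0] * (R - 1))
    out = []
    for n in range(len(up)):
        acc = 0.0
        # up[n - k] is non-zero only when (n - k) is a multiple of R,
        # so step through the taps in strides of R and skip the zeros.
        for k in range(n % R, min(n + 1, len(taps)), R):
            acc += taps[k] * up[n - k]
        out.append(R * acc)          # gain of R restores the original amplitude
    return out

R = 4
low_rate = 44100.0
taps = lowpass_taps(101, 0.125)      # 22.05 kHz relative to 176.4 kHz
x = [math.sin(2.0 * math.pi * 1000.0 * m / low_rate) for m in range(256)]
y = interpolate(x, taps, R)

# Compare with the same tone sampled directly at 176.4 kHz, allowing for the
# filter delay of (101 - 1) / 2 = 50 high-rate samples.
delay = 50
worst_error = max(abs(y[n] - math.sin(2.0 * math.pi * 1000.0 * (n - delay) / (R * low_rate)))
                  for n in range(150, len(y)))
print(worst_error)
```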

In principle, the same digital filter design can be used to perform the interpolation, low-pass filter
as was used to perform decimation in the ADC system, just as in a conventional system with
analogue filters the same filter can be used for both anti-aliasing and signal recovery.

In principle, when oversampling is used in a DAC, the DAC amplitude resolution can be reduced.
If the output of the interpolation filter is truncated, say by 1 bit, then the process introduces
additional re-quantization distortion. Providing there is either sufficient signal activity or the
inclusion of digital dither [7], then this extra quantisation distortion will have a near-uniform
power spectrum which for R = 4 is spread across the band 0 to 88.2 kHz. Consequently, only
0.25 of this extra noise power falls within the audio band 0 to 22.05 kHz and corresponds to a
1-bit enhancement. However, this truncation process should be approached with some caution,
otherwise a low-level signal as it enters the noise will be distorted. It is better practice in this
circumstance to retain a 16 bit (or higher) DAC and use oversampling to reduce the effect of slight
low-level linearity errors in the DAC. In taking advantage of the extra resolution capability of an
oversampled DAC, the digital signals produced by the interpolation filter must be handled with
care, especially as truncation is required; it is here that the truncation error can be combined with
a noise shaping algorithm to modify the noise spectrum and is a subject to be revisited in part 4.
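The need for caution when re-quantizing low-level signals can be illustrated with a small sketch. It assumes a constant input of 0.3 of one output step and triangular-PDF dither formed from two uniform random values; both choices, and the function name `requantize`, are illustrative and not taken from the text.

```python
import math
import random

def requantize(value, dither=False, rng=random):
    """Round to the nearest output step, optionally adding TPDF dither first."""
    if dither:
        # Triangular-PDF dither: sum of two independent uniform values.
        value += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return math.floor(value + 0.5)

rng = random.Random(1)
level = 0.3          # constant input, a fraction of one output step
count = 50000

plain_mean = sum(requantize(level) for _ in range(count)) / count
dithered_mean = sum(requantize(level, True, rng) for _ in range(count)) / count
print(plain_mean, dithered_mean)
```

Without dither the low-level input is lost entirely (every output is zero), whereas with dither the average output follows the input and the error appears as noise.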

The discussion on oversampling has revealed that digital filters are the key that enables
interpolation and decimation to be performed and because the filters use an algorithmic approach
based upon precise arithmetic rather than the vagaries of inductors, resistors and capacitors, they
are precise with designable resolution to equal or, indeed, exceed our resolution requirements.
Also, with present-day processors, filters with over 200 coefficients can readily be
accommodated, giving considerable degrees of freedom to a filter design.

A filter with zero group delay distortion requires an impulse response that is even-symmetric, as
shown in Fig. 4(a). Therefore, because the precursive response must be finite for a realisable
filter, the overall response must also be finite and the signal delayed effectively by one half the
duration of the filter's impulse response. Consequently, interpolation and decimation filters are
usually designed using a derivative of the direct form of finite impulse response filter (FIR) [8]
illustrated in Fig. 4(b), where a1 ... an represent the n coefficients defining the response and τ is
the sampling period (5.669 µs at 176.4 kHz).

If the input sequence x(n) is a single, unit sample, then as the pulse is shifted along the filter at the
sampling rate, the output impulse sequence y(n) maps out the impulse response where each
sample is directly associated with a coefficient {ar}. Hence, the design of a FIR filter is about
selecting the {ar} coefficients to achieve the required frequency response. For example, with 200
coefficients (that is 100 variables for a symmetric filter), where each coefficient is represented by
a 16-bit word, the number of filter permutations is, to say the least, high. Consequently, to select
a coefficient set to match the requirements of cut-off frequency, pass-band amplitude ripple,
transition band, stop-band attenuation and stop-band amplitude ripple is a rather daunting task
which goes beyond the present discussions, although some reference material [9, 10, 11] is
included for further reading. However, for illustration, Fig. 5 shows four filter designs using the
Parks-McClellan [12] optimisation procedure where the number of filter coefficients is varied
from 25 to 100 in steps of 25 and the filters are matched approximately to the needs of digital
anti-aliasing and signal recovery applications.
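The statement that a unit sample maps out the coefficient set can be confirmed with a minimal direct-form sketch. The five coefficients are arbitrary, even-symmetric values chosen for illustration, not a designed filter.

```python
def fir_direct_form(x, coeffs):
    """Direct-form FIR: a tapped delay line whose taps are weighted and summed."""
    delay_line = [0.0] * len(coeffs)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]     # shift the new sample in
        y.append(sum(a * d for a, d in zip(coeffs, delay_line)))
    return y

coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]      # illustrative, even-symmetric
impulse = [1.0] + [0.0] * 7
response = fir_direct_form(impulse, coeffs)
print(response)
```

The output reproduces the coefficients in order and is then zero, so the impulse response is finite. With an even-symmetric set, the response is centred on the middle coefficient, which corresponds to the half-duration delay noted earlier.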

Fig. 3 4-times interpolation illustrated in both frequency and time domain.
[Frequency axis: 0, fs, 2fs, 3fs, 4fs. Upper spectrum: digital signal using Nyquist sampling. Centre spectrum: digital signal with insertion of zero samples (no change), corresponding to the insertion of zero samples to increase the sampling rate by a factor of four. Lower spectrum: digital signal band-limited by the interpolation, low-pass filter to create new samples at 4-times Nyquist rate; spectral replications now occur at 4fs.]

Fig. 4a Example impulse response of digital LPF (corresponds to Fig. 5, 100 coefficients).

Fig. 4b Direct form of FIR filter.
[Annotations: zero samples are introduced prior to the interpolation filter; T = 1/(4fs). N.B. Three in four multiplications are by zero in times four interpolation. To make the direct form of FIR filter more computationally efficient, the multiplications by zero samples are ignored.]

Fig. 5 Four examples of digital LPF using Parks-McClellan optimisation (sampling rate = 44.1 kHz).
[Panels: 25 coefficients, cut-off = 18 kHz; 50 coefficients, cut-off = 20 kHz; 75 coefficients, cut-off = 21 kHz; 100 coefficients, cut-off = 21 kHz. Horizontal axis: frequency in kilohertz, approximately 15 to 22.05.]

A cautionary note with respect to digital filters is that, although they can be designed to
commendable standards, there are inherent distortion mechanisms. For example, when two N-
bit numbers are summed, the result is a (N + 1) bit number while, when two N-bit numbers are
multiplied, the result is a 2N-bit number. Multiplications and, to a lesser extent, additions are
therefore forms of distortion when the result is truncated. Hence, when many such operations
are performed there may be degradation that affects the dynamic range and signal resolving
ability of the filter. Also, some filters truncate the coefficients that define the transfer function to
16-bit; this is not a non-linear distortion but can introduce perceptible linear, frequency response
errors. However, in reviewing these inherent distortions, it is necessary to emphasise that they
can be designed to be insignificant providing an adequate number range is inherent in the filter
architecture.
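The word-length growth described above can be verified directly. The sketch assumes unsigned 16-bit operands for simplicity, and the `truncate` helper and the example operands are hypothetical stand-ins for a fixed-point stage.

```python
N = 16
largest = (1 << N) - 1                       # largest unsigned N-bit value

sum_bits = (largest + largest).bit_length()          # N + 1 bits
product_bits = (largest * largest).bit_length()      # 2N bits
print(sum_bits, product_bits)                        # 17 32

def truncate(value, drop_bits):
    """Discard the least-significant bits, as a fixed-point stage might."""
    return (value >> drop_bits) << drop_bits

product = 40000 * 50001
error = product - truncate(product, N)       # information lost when 2N bits become N
print(error)
```

The truncation error is always less than one step of the shortened word, but it is signal-dependent, which is why an adequate internal number range matters.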

The advantages of mild oversampling have now been introduced, where performance gains are
achievable, though it is important to note that these do not extend the theoretical capabilities of a
conventional system; they only allow a more optimum hardware to be implemented. The main
lesson has been the transformation of the anti-aliasing and signal-recovery filters from the
analogue to the digital domain, where the higher-performance potential of digital filtering could be
realised and the associated analogue circuitry simplified.

Finally, a related note on one aspect of digital processing which, although well-known in
communication theory appears to cause controversy in digital audio. In part 1 we emphasised the
importance of minimising sample jitter, as this results in an increase in noise and distortion.
Fundamentally, ADC systems should be designed to attain a minimum of jitter, thus enabling
sample instants to be precisely located on the basis of a constant sampling period. If jitter enters
during source conversion it cannot be subsequently removed as there is no inherent measure of
the sample timing error. However, in data storage, transmission and reading from disk or tape,
there are numerous means for the introduction of jitter. Indeed, in an extreme case, vibration of
an optical fibre, particularly at the send/receive ends where there is relative motion with respect to the
optical transducers, may result in slight jitter correlated to the music. However, while signals
remain in the digital domain where jitter is appropriately bounded so that signal recognition and
arithmetic processing are not impaired, then there is no degradation of information even though
the digital words may be displaced from their optimum location. Consequently, at the DAC it is
imperative that the samples (oversampled or otherwise) are re-timed so that each sample attains its
correct relative amplitude and temporal co-ordinates. This re-timing is, in principle, achieved by
re-sampling the digital data against a synchronised, low-noise clock source, preferably in
association with a buffer memory. Hence, providing there are no reading and arithmetic errors
and samples are re-timed where the circuitry after re-timing in no way interacts with earlier stages
of electronics by either direct interconnection, power supplies or electromagnetic fields, then CD
vibration, optical fibre quality, etc. will not affect sound quality. However, there is evidence to
suggest that the optimal conditions for re-timing are not always achieved, leading to a
proliferation of rather "band aid" techniques that are not cognate with the philosophy of digital
systems, but nevertheless span the eternal triangle between objectivity, subjectivity and
psychology.

In part 4 heavy oversampling and noise shaping will be presented in more detail, with
applications to both ADC and DAC. Also, the relationship of delta-sigma modulation to noise
shaping will be described and the 1-bit DAC introduced as the ultimate gateway, even though it is
30 years old and preceded virtually all of the modern DACs.

Part 4: noise shaping

Information theory [1] defines the information capacity of a communication channel as a function
of bandwidth and signal-to-noise ratio and supports the concept that noise and bandwidth can be
interchanged whilst maintaining a constant channel capacity. Two familiar techniques that exploit
this exchange are FM and PCM, where the enhanced noise rejection characteristics at the expense of
increased transmission bandwidth are well known to listeners of radio and NICAM [2] television.

In digital audio, the source information is bounded by Nyquist sampling and the uniform
amplitude quantization of each sample, i.e. 44.1 kHz by 16 bit quantization. However, from
information theory, it can be inferred that this format is not unique and there exists a potential for
an exchange between sampling rate and sample amplitude resolution.

In part 3 we demonstrated that the sampling rate can be increased by interpolation, where
redundant samples are generated; consequently, by using the ideas of information theory, there
emerges the possibility of reducing the amplitude resolution of each sample without incurring a
loss of source information. There is an undeniable logic to this proposal, where the greater the
oversampling ratio, the less dependent the performance becomes upon an individual sample, so
enabling coarser quantization to be used.

An important clue to understanding this proposal is that the higher the oversampling ratio, the
greater is the fractional bandwidth in which no useful information resides. Hence, if samples are
more coarsely quantized, the extra re-quantization noise can be concentrated into this redundant
frequency space. The process of location and spectral redistribution of re-quantization distortion
is commonly designated noise shaping and is the subject of part 4. Noise shaping coders [3] can
be implemented in both the analogue and digital domains and therefore can function as either an
ADC or a DAC. In both cases, the coders operate with oversampled data and achieve shaping of
the distortion spectrum by applying local negative feedback to the re-quantization system; a noise
shaper is therefore a recursive coder.

The present discussion aims to explain the operation of a noise shaping coder and then to describe
some specific applications to DAC and ADC that include the delta-sigma modulator (DSM) [4, 5],
the two Philips DAC [6,7] systems and Bob Adams' 20 bit ADC [8]. Also, as hinted in part 1
[9], a brief comparison is attempted to link the non-linear distortion generated by a feedback
amplifier with the distortion residue of a noise shaping coder as, at a conceptual level, a similar
process is observed.

The canonic model of a noise shaping coder that is applicable to either an analogue or digital
realisation, is shown in Fig. 1. The non-linear, quantization characteristic Q is presented as a
unity-gain amplifier with an additive error sequence q(n) to represent the quantization distortion,
while the forward path has a differencing stage and a frequency dependent transfer function A.
The quantizer Q defines the output sequence quantization levels which are selected to be coarser
than those of the input sequence, where the noise shaping coder then seeks to relocate the extra
quantization noise into the oversampled frequency space and not the audio band.

The output sequence y(n) is expressed as a function of the input sequence x(n) and quantization
distortion sequence q(n) as,

$$ y(n) = \frac{A}{1 + A}\, x(n) + \frac{1}{1 + A}\, q(n) $$

where, defining a frequency dependent, noise shaping function as $D_f = (1 + A)^{-1}$ and noting the
gain A >> 1 within the audio band, reduces the expression for y(n) to,

$$ y(n) \approx x(n) + q(n)\, D_f $$

This succinct expression gives considerable insight into the noise shaping process, where for
large values of A the input and output sequences are almost identical, differing only (in this
approximated model of quantization) by the additive quantization distortion which is frequency
shaped by Df, a factor inversely proportional to the gain stage A. Since the noise shaper is
required to suppress noise at low frequency and relocate it at high frequency, the transfer function
A should exhibit a high, low-frequency gain that progressively falls with increasing frequency,
whereby Df is correspondingly small at low frequency and progressively rises with increased
frequency, thus shaping the noise spectrum.

Fig. 1 Canonic form of noise shaper showing quantizer with additive error.
[Block diagram labels: input x(n), quantizer Q with additive error q(n), output y(n).]

However, since the stage A is part of a loop with unity-gain feedback, A must be selected to form
a stable system. A basic solution that is common to most unity-gain stable operational amplifiers,
is to choose A to be an integrator, taking the form A = 1/(jωT), whereby the distortion shaping
factor becomes

$$ |D_f| = \frac{\omega T}{\sqrt{1 + (\omega T)^2}} \approx \omega T \quad \text{for } \omega T \ll 1 $$

(ω is angular frequency and T is a time constant)

that is, at low frequency the noise shaping characteristic has a slope of 6.02 dB/octave which
mirrors the slope of A of −6.02 dB/octave. However, in practice this slope is found to be rather
low and requires extremely high oversampling ratios to attain an acceptable low-frequency SNR.
The problem is addressed by increasing the order of the loop, that is the number R_N of cascaded
integrators, whereupon the noise shaping slope approaches 6.02 R_N dB/octave with
corresponding improvements in coding performance. However, forming A directly from a
cascade of integrators does not meet the stability criteria. Consequently the transfer function must
be tempered by the inclusion of (R_N − 1) zeros to shape the high-frequency transfer function to a
first-order response. The modified system can be implemented using (R_N − 1) feedforward paths
[10], where in Fig. 2 a noise shaper of order R_N is shown configured as an ADC with a quantizer
Q using a back-to-back flash converter and DAC, where the flash ADC output forms the digital
output signal in a noise-shaped, oversampled format.
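The first-order slope quoted above follows from Df = (1 + A)^-1 with the integrator A = 1/(jωT), and can be checked numerically. The sketch below evaluates the change in |Df| over one octave; the values of ωT are arbitrary sample points and the function names are illustrative.

```python
import math

def df_first_order(wT):
    """|Df| = |1 / (1 + A)| with the integrator A = 1 / (j * w * T)."""
    A = 1.0 / (1j * wT)
    return abs(1.0 / (1.0 + A))

def slope_db_per_octave(wT):
    """Change in |Df|, in dB, when the frequency is doubled from wT to 2 * wT."""
    return 20.0 * math.log10(df_first_order(2.0 * wT) / df_first_order(wT))

low = slope_db_per_octave(0.001)     # well below the loop corner, wT << 1
high = slope_db_per_octave(10.0)     # well above it, where Df tends to unity
print(round(low, 3), round(high, 3))
```

At low frequency the slope is very close to 6.02 dB/octave, and it flattens once ωT exceeds unity. An idealised cascade of R_N such factors would multiply the low-frequency slope by R_N, although, as the text notes, a practical loop needs additional zeros for stability.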

The noise shaping coder can also be configured as a DAC [11] though the loop processes must
now be converted to arithmetic operations, i.e.:

(i) A digital accumulator or integrator is formed by applying unity-gain, positive feedback
around a one-sample delay, whereby new data is added to past values to produce a
summation of sequential samples.

(ii) The re-quantization of data is performed by arithmetic truncation where numbers are
reduced to the required precision.

The transformed noise shaper is shown as a digital system in Fig. 3 where the digital integrators
and re-quantization process that reduces the amplitude resolution of the input samples can be
observed.
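A minimal first-order instance of these two operations is sketched below. It is an assumption-laden illustration rather than the structure of Fig. 3: a single accumulator is fed with the difference between the input and the previous output, and re-quantization is performed by truncation to a coarse step.

```python
import math

def noise_shape_first_order(x, step):
    """First-order loop: accumulate (input - previous output), then truncate."""
    acc = 0.0            # digital integrator (accumulator) state
    y_prev = 0.0
    out = []
    for sample in x:
        acc += sample - y_prev                  # (i) accumulation of the difference
        y = step * math.floor(acc / step)       # (ii) re-quantization by truncation
        out.append(y)
        y_prev = y
    return out

step = 1.0
x = [0.3] * 1000                 # input far finer than the output step
y = noise_shape_first_order(x, step)
mean_y = sum(y) / len(y)

# The running sum of (output - input) equals the current truncation error,
# so it never grows: the error is pushed away from low frequencies.
running = 0.0
worst = 0.0
for a, b in zip(x, y):
    running += b - a
    worst = max(worst, abs(running))
print(mean_y, worst)
```

Although each output sample is only 0 or 1, the average output tracks the input, which is the behaviour a subsequent low-pass filter would recover.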

Fig. 4 Distortion reduction factor |Df| against frequency for R_N = 1 to 6.
[Frequency axis marked fs/200, fs/20, fs/6 and fs/2; vertical scale from 40 dB to −160 dB.]

Fig. 5 Time response of R_N = 1 and R_N = 4 noise shapers; input is a 20 kHz sine wave, output signal spans −8d to +8d quanta. Note reduced activity when loop order is low.
[Horizontal axis: samples, 0 to 512.]

The noise shaping function for this system can be shown to be,

$$ |D_f| = \left[ 2 \sin\!\left( \frac{\pi f}{R f_s} \right) \right]^{R_N} $$

where fs is the Nyquist sampling frequency, R the oversampling ratio and R_N the order of the
noise shaper. The form of the noise shaping function, Df, is shown in Fig. 4 for R_N = 1 to 6,
where the following points are highlighted:

(i) for f << Rfs,

$$ |D_f| \approx \left[ \frac{2 \pi f}{R f_s} \right]^{R_N} $$

and shows a similar slope to the analogue model

(ii) for f = Rfs/6, |Df| = 1 for all R_N

(iii) for f > Rfs/6, |Df| > 1, that is, the noise spectrum is actually amplified

(iv) Df is maximum at f = Rfs/2 where

$$ |D_f| = 2^{R_N} $$
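The highlighted points can be checked numerically. The sketch assumes the magnitude form |Df| = [2 sin(πf/(Rfs))]^R_N, which is consistent with points (i) to (iv), and uses R = 200 with fs = 44.1 kHz purely as example values.

```python
import math

def df_magnitude(f, rate, order):
    """|Df| = (2 sin(pi f / rate)) ** order, where rate = R * fs."""
    return (2.0 * math.sin(math.pi * f / rate)) ** order

rate = 200 * 44100.0      # R * fs = 8.82 MHz (assumed example)

# (ii) unity at Rfs/6 for every loop order
unity = [df_magnitude(rate / 6.0, rate, order) for order in range(1, 7)]

# (iv) maximum of 2 ** order at Rfs/2
peak_4th = df_magnitude(rate / 2.0, rate, 4)

# (i) low-frequency approximation at 20 kHz for a 4th-order loop
exact = df_magnitude(20000.0, rate, 4)
approx = (2.0 * math.pi * 20000.0 / rate) ** 4
inband_db = 20.0 * math.log10(exact)

print(unity, peak_4th, inband_db)
```

For a 4th-order loop the peak gain of 16 applied to a re-quantization error of half a quantum gives an excursion of about 8 quanta at the output.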

All the observations on noise shaping coders have been based upon linear analysis and the
assumption that the non-linear quantizer can be modelled as an additive process. To validate (or
otherwise!) this approach, a computer simulation is required to model the noise shaper
sample-by-sample and to include the precise non-linearity of the quantizer. Fortunately, the general
predictions of linear analysis are confirmed and the trend in the noise shaping characteristic
established. However, one area where the computer response may initially appear to be at
variance with expectation is in the time-domain structure of the output sequence, particularly
when R_N > 2. As this is a particularly interesting observation, the time response of an R_N = 4
noise coder is shown in Fig. 5, where the sampling rate is 8.82 MHz and the input signal is a 20
kHz sine-wave with an amplitude just greater than one output quantum d, i.e. just greater
than the re-quantization interval, which represents a significant reduction in sample resolution. The
computation reveals that the output sequence is noise-like and spans a range of −8d to +8d
output quanta (i.e. 4 bit), and is considerably greater than the input signal amplitude. Hence,
assuming 16 bit source data, the input samples have approximately undergone a 16 bit to 4 bit
transformation. On initial scrutiny, the loop would appear unstable. However, further
computation reveals that the signal activity is non-divergent and that when the output sequence is
bandlimited to 30 kHz, the input signal is revealed and is coded with a SNR well in excess of 100
dB. The random output signal is a form of chaos and is a consequence of combining
non-linearity within a feedback path and simultaneously forcing the finely quantized input sequence
through a coarse quantizer. On second scrutiny, however, the observation of output noise that is
greater than the re-quantization quantum d should not be too surprising, where a glance at the
expression for |Df| shows a gain for f > Rfs/6, with the −8d to +8d quantizer output range
closely matched to the product of |Df| at Rfs/2 and the re-quantization distortion error of ±0.5d.

The 4th-order digital noise shaper, when oversampled with a ratio of 200 can therefore form the
basis of a DAC using only a 4 bit output code (i.e. 16 reconstruction levels) and since chaotic
activity forces the output to span almost the full range of the 4 bit quantizer, irrespective of input
signal level, hardware errors in the DAC tend to manifest themselves as a noise-like residue rather
than non-linear distortion. The system also yields a theoretical audio band resolution well in
excess of the 16 bit by 44.1 kHz format so, combined with an appropriate digital interpolation
filter, the inherent ability to decorrelate hardware-related errors at the digital-analogue gateway
and the minimal requirements for analogue filtering, this technique could prove appropriate for
high resolution DAC applications with an inherent ability to code signals deep into the noise floor
(providing the source information was correctly dithered). However, there still remains the
problem of sample jitter which can degrade SNR because of the large inter-sample changes
associated with this class of noise shaping; we will comment further on this error mechanism
when discussing the 1 bit, DSM DAC.

The above example of noise shaping is a more extreme case. However, the same principles can
be observed in the more familiar x4 oversampling system employed in the well-known Philips
DAC [6]. This converter uses a combination of oversampling and noise shaping to enhance the
performance of the DAC and signal recovery operation.

The problem of truncating the output data from the oversampling filter was introduced in part 3
[12] where the need for caution was emphasised to minimise degradation of low-level signals as
they enter the noise floor. The Philips system approaches this problem by using a 1st-order noise
shaper similar to that in Fig. 3, where the coder input is derived from the 120 coefficient, FIR
oversampling filter (data typically expressed as 28 bit numbers at this stage) and the quantizer Q is
matched to the characteristics of the output DAC, i.e. for a 14 bit DAC, Q truncates from 28 bit to
14 bit while for a 16 bit DAC, Q truncates from 28 bit to 16 bit. In Fig. 6, the digital noise

T-32 AESloth INTERNATIONAL


CONFERENCE
DIGITAL AUDIO TUTORIAL

r,.Q' ..0 ':i:i:i:i:i:i:i:i:i:i:i:i:i:E:i;i&!E:-:i:i-:.::F:i:i:.::i:i:i:i


c _'-iiiiiii'i?iiii_'ii'ii_.:ii!
':'!ii!.:i!!i??_!_:'::'Ei'":i:_
:" '

o_ _"
n, ._E'-:'_::"-':"-ii'"'-¥-':':'_"'"::':-"'::'_---¥
.:!i_i!i!i!_:'".i!iii:.:'_
.:...-...,.-.:.:.:.:.:.:.:.:.:..
:' I _/ ! 8 -_.0_ _ o
-.
"'3 : ::::::::-: :::: ::
"0 _ :-:-:.:.:-:-:-:-:-:-:.:-:
.:. ::3 ::::::::::::::::::::::::::::: 0 / · 3 0 _' = = -- n
:3- -E:[:i:!:i::'-'.:E::
_:_o E:':EE:.-':E:::::_:::::::::: _ / -- = = = -.
--.._: :3 !:':!:[:_:.-'&:!::.:_:_:_
:i: _ :::::::::::::::::::::::::::::: o, '_, · o _ _ _3 =

u_ ':':i:i:i:_:i:i:-E:i:i
:_:= E:'-:i-:i:i:i:i:
.......... ---E:': (ITu' ._. 5' _- ",,I
[i_ q/ &_.v,,_ _ -,-o -o E
-'-'- ,,, iiiii!=,i[ii[[i::m[_!ii:::::g::iiii?:ii::%!ii L -,-q =-_ - o_

0, ,_ _!..'_o
_'_----_l:_ii::::ii::iii::::_iiiii_i::!?:!i?:
_i!i_ i!!::i::i !/ ·
,_ _,.o,
_ mE._
3
u_
0 "' _i_::_!:
O. _:_'""_----:_:_i:
'_'"""--':_ [ii::_i_ e ii!::::::::
c _ '_'i!
m !i:'_<'--':--iiiiii:
t""_:'_:_":_!_'_--_.[!jl
!_i..'.:.:_::._
i / ' 3 9 _"
(_ ::::f0 :':i:i...! :-: :::::i:: _'""""
m c "-ii--:i:i:i:i:i:: + .:_
..... _:::i_ii_iiii
0 _
_ _"ii
_:-'::__ _i
:i::::'--"
'"_--E
':" I:: _i_ilE ':: 'o'":':iiii iii.-":-ii::i?if:i::ilZ:::.=ii
-:i::_:::::E:_-': iiiii_!iiii
"_ ':':': _ '"':'__--::i::::':':':':"
_!.-'.-'_,
i_ '"":':':':':_'"':':'"
' _:_;-_i ............. =========================
_c _-_-_:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
'0............ ::i::::il
-n!iiii::::ii
:=J m '!_'!_
:::.-::: ii:iiiiiiiiiiiiiiiiii_"'_::_:_:
_ :::::::::::::::::::::::::::::::::::::::::::::::::::
::ii-_'-=:i:i::':'-:_ :::::::::::::::::::::::::::::::::::
....................
i:..:::._i :i:o
_c c_ ::::: 'ii-.:,,' i:!:i:':!:!:-:!:i:!:!:!::':E:-:':!:.:E:!:!:i:':i:i:i:i:i:i:i:!:ii!ii::.:-:-:
:.:.::iiiiiiiiiiii ::::_
C= !iiiiiii'
-- ::::':1_ :::::::::::::::::::::::::::::::::::::::::::
':::i_ !: aL o' E'ii

_' _ :'ii."ii_- 'wi


o. ! i m -- i:i:i
_ ::? '."i-iiiiiiiiiiiiiiii'iiiiii:"i:,.'!.:_ii.:i!?_
:ii?::g_i??:?:i::iiii?gi?:ii::i iiiiii?,ili!il
!iiii!iiiiiiiiiii'
!i:L::::::::.:.,.,i iiii:iiiii
']:."E-_i:i':::E:iE:-:!::
.._:E:]:::i:i:i::':i-:i:':i:i:i:i:i:]
:. .-i:i:i:iu_ E:i:i:i:

0
'0 ®
__ .??.:._::_:'
_
r-_
':':i_J :'-:;-:E:i:i:i:::E!':':'i'::'":_:E:i!_E_"?:_i]i?i.':E::
::::'::-'.'::i:::':::i:::'::';!:
E_:''':!!
::
::::::::::::::::::::::::::::::::::::::::::::::_I E:_
'_:
'-- :.!.::._!.!!:
::'.::F
_::i::':'
_E!'."_i :::::::::::::::::::::::::::::::::::::
i--:iii :i:i:i:'::E:i:i":i:::
"":": i:.:_E!:E:'"' iii:iii!i ii?iii:i:::::: ::::::Ei?i]i]i_'i:
":0 _E]!
!'''''_'''_'''''_:':_:_i:i:i''r'''''_:'''''''_''''':'''''''[-_E::_E:__ i?_i:
::':i::
0.. :_!:'_.,'
." '_.._'_i_'_.-'i_-'-.:?:-_'.:;_-_ii_:i:E-_i:il
-_-i."-' ..:-i-i-i-i-'-E-i-i.i-i-i-i
_:::! '_:i:':'::..:':?:E:_::'"'"""'"" ' '"'"'"'"" E:_:::::_:__ ':::::
_' cr
_ _-_:i_`_`_i--`-_`_--_i!:i:i:i:i:i!!:_`-_:!!_!_i!:_:_!_!_i_
!:_:.::i_i:_:i:_:iE:_:_:.::-:i;
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:._i:-:!: iii::i'"_'
..'..-ii'.',,
:i-'.':_'_:_:_E:._._!:':i ':':':'_:"_:-'_-__ '_::_:.
_--_-_-!i
........ '-<.m _-!_
::3' 0 '"":%::_-_'.':._i:E:i:E:i:i:i:i:i:i:i:i:i:i_:::i:i:? _':::': :':': :':':" ======================================
:::':::::::::::::::"' _ :::::'
'...':':?
.... ...:_.:.':?':_::.-'::fii:i:i:i':_i:'iiii'.:i
i!'!."'!i!i:]-."._?!i[_EEi!iii-':
_ _i!iii?_i

_' I.
I_-_'""_--?_
....... E::::
_i::ig::iii?:::::::g::?:?:!!::g?-:_--_!::_ii_i
I -,1 _!:_":":'iiiii!g_
_:.. ::"-'=====================================
ii(] ::' :::::::::::' :::::_.:::::'-
::::::::::::::::_
'E_::::::_i_::_i_i!i_!_!i_ii::?:_ii!_?:i_?:iii::::0'3
:::::::::::::::::::::::::::::
....................
::::::::::::::::::::::::::::::
....................
-o ::::::::::
_i::::i
?3:.::::::
s:i:-.'ii.-':i_
iiii"_.:E.'iiii'ii.:-:E.'i.:._
i!:'!i!.:!_i ':i:3'_ ':-$i:'
::::::::::::::::::::::::::::::::::::::::::
:::--::
::::::::'-::':[:_:
C :.":i_:_:::
::'::_:::_
.o ii_!.:'.:::!
..':'_!ii!
_!!__
iEEEE_:?'?:_::_i'i_?.::dE_i!i_iS:::i!!i_i_i!
_
C. ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
_ :::::::::::::::::::::::::

--1

6
_- m ?:iii::i::i::i
_ r'-
<
--, _ :_::::::ili!::!
0 _

_' _ ........... o.
m o :_:.:.:--.
:7' (D i:i:i::CD _ ':i:i_
·-_ _:.:.:.

=r :::-iiiB. ii::i::i::
B. i :;:;:;:

"O i??'.-'_
m E:?:m
0 '_ -"_ -
_ ii_?:ii
_o.
f,_ :i:i:!:
_' _' _ :':': i:i:i:ia)
::::::; _ __ _ ::::::
..
o_ '_ _ iii- :i:i:i:o_ c° _ _i_.::'
_' o_ _!:.'.': ::::::::=r
o' ?'-'.-? _????O cr
-. :?::??
:-.'v

'[[ii[[

-'??:_0
--.

2O
Z
U

AES loth INTERNATIONAL CONFERENCE T-33


HAWKSFORD

shaper in Fig. 3 is reconfigured for RN = 1 to reveal the equivalence with the more familiar
Philips format, though conceptually the systems are identical. The Fig. 6 configuration shows
that the feedback path carries a signal related to the 28 bit to 16 bit (or 14 bit) mmcation error of
the quantizer, where topological parallels can also be drawn with error ¥eedback distortion
reduction in amplifier systems. The rationale for choosing x4 oversampling is evident: not only
does it offer about 1 bit enhancement in resolution from oversampling, but the noise shaping
advantage yields approximately a further 1 bit enhancement and cleverly deals with the mmcation
distortion between oversampling filter and DAC.
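
The two 1 bit figures can be reproduced with a short calculation. The sketch below assumes white requantization noise spread uniformly from 0 to Rfs/2, a first-order noise transfer function |D| = 2 sin(pi f / (R fs)), and 6.02 dB per bit; the function names are illustrative only.

```python
import math

def inband_noise_fraction(R, shaped):
    """In-band (0 to fs/2) noise power relative to the total unshaped requantization noise.

    Assumes white noise over 0..R*fs/2 and, when shaped, a first-order
    noise transfer function |D| = 2 sin(pi f / (R fs)).
    """
    if not shaped:
        return 1.0 / R
    # Closed-form integral of 4 sin^2(pi f / (R fs)) over 0..fs/2, normalised by R*fs/2
    return (2.0 / math.pi) * (math.pi / R - math.sin(math.pi / R))

def bits(power_ratio):
    """Convert a noise power reduction into equivalent bits at 6.02 dB per bit."""
    return -10.0 * math.log10(power_ratio) / 6.02

R = 4
oversampling_bits = bits(inband_noise_fraction(R, shaped=False))
total_bits = bits(inband_noise_fraction(R, shaped=True))
shaping_bits = total_bits - oversampling_bits

print(f"oversampling alone: {oversampling_bits:.2f} bit")
print(f"first-order shaping adds: {shaping_bits:.2f} bit")
```

With these assumptions x4 oversampling contributes about 1 bit and first-order shaping a little over 1 bit more, in line with the approximate figures given above.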

However, it is also interesting to observe that a noise shaping advantage is only achieved for
frequencies < Rfs/6, i.e. 29.4 kHz (though there is no information above 22 kHz), and that the
main improvement occurs at lower frequencies since the noise spectrum slope is 6.02 dB/octave;
hence, the advantage of using a true 16 bit DAC rather than a 14 bit DAC in this application.
However, in observing the noise performance of the first-order, noise shaped DAC, the psycho-
acoustic advantage implied by the Fletcher-Munson hearing characteristics should be noted,
which shows that at high frequency, significant extra low-level noise can be tolerated, particularly
if this is in exchange for lower noise in the critical mid-range band. Nevertheless, this advantage
is only realised if the DAC has an accuracy greater than 14 to 16 bit resolution (and ideally
approaching 18 bit +); it is no panacea for DACs with poor linearity, glitch or "class B" type
distortion where the MSB changes at low signal levels. It would appear that, providing the
conventional 16 bit DAC can exhibit a low-level resolution commensurate with 16 bit, there is
little advantage with this class of DAC in exceeding R = 4. Indeed, the errors encountered with
finite settling times, slew-rate related problems and sample jitter could actually lead to
degradation as the oversampling ratio is increased. The real advantage comes when oversampling
and noise shaping work together to reduce the DAC resolution and chaos is used to decorrelate
hardware-related non-idealities by allowing, within bounds, all the DAC levels to participate in
the conversion, irrespective of audio signal level.

There is a class of highly oversampled and noise shaped coder where the quantizer Q is restricted to
only two levels, thus forming a serial output code that is a binary sequence. This class of coder is
called delta-sigma modulation (DSM) [4, 5] and was introduced in 1962 as a derivative of the
historically significant delta-modulator, which dates from a patent of 1947/48 [13] and de Jager's
work of 1952 [14]. However, the two-level restriction on Q interacts with the closed-loop transfer
function, requiring A to be limited to a first-, second- or, with some care, a third-order system,
otherwise irreversible non-linear instability results.

The attractiveness of this scheme is the simplicity of the digital-analogue gateway, where in
principle a single binary gate can be used together with a modest analogue-signal recovery filter.
The DSM is representative of the limiting case in transforming a multi-level signal to a more
coarse quantization, where the coder generates sequences of 1 and 0 pulses which, when
averaged, yield a close approximation to the input data within the audio band. Since the rate of
occurrence or density of 1 and 0 pulses is made proportional to the input signal, this class of
converter is sometimes called pulse density modulation and, because the output sequence is in a
coded-binary, serial format, the global name bitstream is gaining favour. To understand the
process of DSM it is constructive to draw a parallel with frequency modulation (see reference 4,
part 3). Imagine the input signal modulates the frequency of a voltage controlled oscillator, where
for zero input the frequency is 0.5 of the DSM sample rate. At each positive zero crossing of the
FM signal a constant-area pulse is produced, thus making the rate of occurrence of pulses directly
proportional to the audio signal. If the pulse sequence is now averaged, the recovered signal is
proportional to the pulse area and hence to the input signal. However, to quantize the signal, the
pulses must be constrained in time by relocating them to the nearest sampling instants, which occur
at a rate equal to the DSM sampling rate. That is, the time axis is divided into a regular array of
time slots where, if a pulse (from the VCO) is produced within a time slot, it is then relocated to
an instant coincident with the end of that slot. Consequently, a first-order DSM can be modelled as
time quantized frequency modulation [19].
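
The time quantized FM model can be sketched as a phase accumulator. The code below is a simplified illustration that assumes the input is held constant within each time slot and is normalised to the range -1 to +1; the function name time_quantized_fm is hypothetical.

```python
def time_quantized_fm(samples):
    """Pulse sequence from a VCO whose pulses are moved to the end of each time slot.

    samples: input values in [-1, 1], one per DSM time slot
             (assumed constant within a slot).
    Returns a list of 1 (pulse in this slot) and 0 (no pulse).
    """
    phase = 0.0                      # VCO phase in cycles
    pulses = []
    for x in samples:
        phase += 0.5 * (1.0 + x)     # 0.5 cycle per slot for zero input
        if phase >= 1.0:             # a positive zero crossing fell inside this slot
            phase -= 1.0
            pulses.append(1)         # pulse relocated to the end of the slot
        else:
            pulses.append(0)
    return pulses

idle = time_quantized_fm([0.0] * 8)
dense = time_quantized_fm([0.5] * 4000)
print(idle)
print(sum(dense) / len(dense))
```

For zero input the VCO runs at half the sample rate, so a pulse appears in every other slot; a positive input raises the VCO frequency and hence the pulse density.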

To demonstrate the behaviour of DSM, Fig. 7 shows both a first- and second-order digital DSM,
together with example pulse-output waveforms for a sinusoidal excitation, where the density of 1
and 0 pulses is seen to follow the input amplitude, though the time quantization of samples should
be noted. Observe how the second-order system introduces more energetic combinations of 1,0
pulse patterns, which when averaged by the reconstruction filter give improved signal coding
particularly at lower signal levels.

Fig. 7 First- and second-order delta-sigma modulation with example time domain output sequences.
(b) 1st-order delta-sigma modulator (1 integrator): analogue input X, loop filter A, quantiser Q,
1-bit digital output Y1, feedback path.
(a) 2nd-order delta-sigma modulator (2 integrators): analogue input X, quantiser Q, 1-bit digital
output Y2.
First-order delta-sigma modulator output pulse waveform for a sinusoidal input signal. Note that
the output pulses are 100% duration; also observe how low-amplitude inputs are poorly coded where
the output sequence follows a ...010101... pattern.
Second-order delta-sigma modulator output pulse waveform for a sinusoidal input signal. Note that
the output pulses are 100% duration; also observe how low-amplitude inputs are encoded more
accurately by the breakdown of ...010101... sequences, where there is evidence of greater pulse
activity. Input is identical to first-order case.
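
The first- and second-order behaviour described here can be explored with a small simulation. The sketch below is one common discrete-time arrangement of one or two integrators around a two-level quantizer, with outputs of +1 and -1 standing for the 1 and 0 pulses; it is an assumed structure for illustration and is not taken from Fig. 7.

```python
import math

def dsm(samples, order):
    """1 bit delta-sigma modulator with one or two integrators (assumed structure).

    samples: input values in (-1, 1); order: 1 or 2.
    Returns a list of +1 / -1 output values.
    """
    v1 = v2 = 0.0    # integrator states
    y = 0            # previous output (no feedback before the first sample)
    out = []
    for x in samples:
        v1 += x - y                       # first integrator sums input minus feedback
        if order == 1:
            y = 1 if v1 >= 0.0 else -1    # two-level quantizer Q
        else:
            v2 += v1 - y                  # second integrator, also fed by the output
            y = 1 if v2 >= 0.0 else -1
        out.append(y)
    return out

def mean(seq):
    return sum(seq) / len(seq)

n = 4096
sine = [0.5 * math.sin(2.0 * math.pi * k / n) for k in range(n)]
first = dsm(sine, 1)
second = dsm(sine, 2)

# Average over the positive half cycle: both coders track the input mean (about 0.318)
half = n // 2
print(mean(sine[:half]), mean(first[:half]), mean(second[:half]))
# With zero input the first-order coder idles in an alternating pattern
print(dsm([0.0] * 6, 1))
```

Averaging the output over many samples recovers the input level for either order; the difference between the two lies in the pulse patterns, which is where the second-order coder breaks up the regular idling sequence.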

Providing a high sampling rate and high-order noise shaping can be combined, then DSM offers
an extremely high-performance potential both in ADC and DAC that overcomes the accuracy
problem of the least significant bit which, as we have discussed, is a major factor in determining
the low-level resolution of a digital system. However, it is important to realise that, as with PCM
and dither, DSM has been in existence for a number of years and has been researched in depth for
all manner of electronic applications from ADC, DAC to communications and even motor speed
control. What is now new is the ability to use VLSI to integrate this system where a sophisticated
specification can now be achieved at low cost (circa £15 per chip in small quantities). It is here
that digital audio is gaining the initiative as it represents one of the most demanding applications
of conversion technology.

Philips have recently introduced a DAC [7] (device number SAA 7322 and later SAA 7323) that
incorporates 2nd-order DSM in association with multi-stage interpolation to achieve sampling rate
conversion from 44.1 kHz to 11.2896 MHz, representing an oversampling ratio of 256. The
system schematic and process flow diagram are shown in Fig. 8 and incorporate a x4 FIR
interpolation filter, together with two further stages of x32 and x2 interpolation using
sample-and-hold filtering that are each equivalent to a FIR filter with equal coefficients and
therefore eliminate the need for multiplication. The signal processing includes a digital dither
source of 176 kHz added digitally at a level of -20 dB after the x32 interpolator to reduce distortion
due to idle-channel noise [15], which is a quasi-periodic bit pattern distortion affecting DSM coders
at low-signal levels by producing whistling and buzzing noises. At low level and without dither, the
bit pattern approximates to an idling sequence ...010101... occasionally breaking into
...0101101010... or ...101001010... type patterns, which because of the infrequency of the ...11...
or ...00... pulse groups, produces audible howling sounds; the addition of hf dither randomises and
reduces the period of these patterns thus making them less audible. The x4 interpolator claims a
±0.035 dB amplitude response ripple with -60 dB attenuation above 24.2 kHz; amplitude response
compensation for a 3rd-order Butterworth analogue filter with a -3 dB break frequency of 60 kHz
is also included to fine tune the in-band frequency response.

To date a number of manufacturers have designed DAC systems using bitstream/DSM technology,
where probably the most technically logical is that designed by Bob Stuart. In this system the
left/right channels of two bitstream converters are reassigned to each channel and decode
left/minus left and right/minus right respectively. 3rd-order, passive filters then bandlimit the
signal directly after each switched-capacitor integrator, whereon two precision differencing
amplifiers form the matrix (L)-(-L) and (R)-(-R) and also act as output buffers. The principal
advantage of this technique is that power supply noise internal to the device itself forms a common
mode signal that is rejected by the output differencing amplifiers, rather like a bridge configured
power amplifier powered from a common supply; the system is therefore inherently less supply
dependent and probably more tolerant of ground rail related errors.
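
The statement that sample-and-hold interpolation is equivalent to an FIR filter with equal coefficients can be checked directly. The sketch below uses short integer test data and hypothetical helper names; it compares holding each sample 32 times with zero-stuffing followed by a 32-tap filter whose coefficients are all 1, and also confirms the overall rate arithmetic.

```python
def hold_interpolate(x, L):
    """Sample-and-hold interpolation: repeat each input sample L times."""
    return [v for v in x for _ in range(L)]

def zero_stuff(x, L):
    """Insert L-1 zeros after each input sample."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (L - 1))
    return out

def fir(x, h):
    """Direct-form FIR filter with the output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

x = [3, -1, 4, 1, -5]                        # arbitrary test samples
L = 32
held = hold_interpolate(x, L)
filtered = fir(zero_stuff(x, L), [1] * L)    # 32 equal coefficients, additions only

output_rate = 44100 * 4 * 32 * 2             # x4 FIR stage, then x32 and x2 hold stages
print(held == filtered, output_rate)
```

Because every coefficient is 1, the filter reduces to additions (or simply to repeating the sample), which is why these stages need no multiplier.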

A problem common to all oversampled DAC systems is the deleterious effect of sample jitter on
SNR. The effect of sample jitter is exacerbated by the common expedient of using 100% duration
samples during signal reconstruction. Consider the simple example of a bitstream coder in the
presence of jitter as shown in Fig. 9a, where the jitter produces an error in the area of adjacent
pulses. Obviously for a sequence ...111... there is no problem; however, for a sequence
...0101... the effect of sample jitter on reconstruction sample area is maximised. In practice what
is required is for the sample area (whether 1 or 0) to remain constant even in the presence of jitter.
A solution to this problem is to reconstruct the pulse area using a leaky integrator circuit that
generates an output proportional to both the pulse amplitude and period but where the integrator
time constant is also proportional to the sample period; thus if the sample duration is, say,
increased by jitter then the averaged integrator output remains unchanged. In the Philips
bitstream converters a switched-capacitor network, shown in Fig. 9b, is used to realise an
integrator with a time constant proportional to the instantaneous clock frequency, where this
technique can dramatically reduce the noise related to jitter. In effect the switched-capacitor
network replaces a pulse by a signal proportional to a constant charge, irrespective of the pulse
timing; thus the reconstructed pulse area is related to a 1 or 0 pulse and not the jittered pulse area.
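
The dependence of the area error on the bit pattern can be illustrated numerically. The sketch below assumes 100% duration pulses drawn at +D for a 1 and -D for a 0, with a timing error applied to each pulse boundary; the bipolar levels, the symbol D and the function names are assumptions made for illustration.

```python
def pulse_areas(bits, edge_jitter, T=1.0, D=1.0):
    """Area of each 100% duration pulse when its edges are displaced by jitter.

    bits: sequence of 1/0; a 1 is drawn at +D and a 0 at -D (assumed bipolar levels).
    edge_jitter: len(bits) + 1 timing errors, one per pulse boundary.
    """
    areas = []
    for n, b in enumerate(bits):
        width = T + edge_jitter[n + 1] - edge_jitter[n]
        areas.append((D if b else -D) * width)
    return areas

def area_error(bits, edge_jitter, T=1.0, D=1.0):
    """Total reconstructed area minus the jitter-free area."""
    ideal = sum((D if b else -D) * T for b in bits)
    return sum(pulse_areas(bits, edge_jitter, T, D)) - ideal

tau = 0.01
jitter = [0.0, tau, 0.0]               # only the shared middle edge is late
err_10 = area_error([1, 0], jitter)    # one pulse followed by a zero pulse
err_11 = area_error([1, 1], jitter)    # no transition at the jittered edge

print(err_10, err_11)
```

A late edge between a 1 and a 0 lengthens one pulse and shortens the other, and with opposite polarities the two errors add to 2 D tau; between two identical pulses they cancel. A reconstruction stage that delivers a fixed charge per pulse has no width term at all, which is the property exploited by the switched-capacitor network.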

Fig. 9a ...10... pulse sequence with timing error (jitter). Labels: nominal period = T, one pulse,
zero pulse, area error, timing error = τ; pulse area error over two adjacent samples = 2Dτ.

Fig. 9b Switched capacitor network used for signal reconstruction in DSM DACs, after Naus et al [7].

In the future, higher-order DSM coders may emerge where the multi-level to 1 bit conversion is
performed in two stages [10, 16], the first stage using for example a 4th-order, recursive noise
shaper similar to Fig. 3 while the second stage incorporates a non-recursive, look-up table to
convert the 16-output levels into equivalent serial binary pulses. Thus, the advantages of a 1 bit
code are retained, together with the more optimum coding of a 4th-order noise shaper which
offers no significant idle-channel noise and a resolution deep into the source noise
(commensurate with the correct application of dither at the ADC and desensitisation of jitter at the
DAC). If 256 times oversampling were retained for the first stage (i.e. 11.2896 MHz with
respect to 44.1 kHz), then the second stage operating open loop would yield a serial bit stream of
about 250 MHz, which gives a measure of the potential dynamic range. The two-stage approach
would also simplify the demands on high-speed digital processing circuitry enabling practical
circuitry using available device technology.

Fig. 10 Front end of dbx 4-bit ADC (higher resolution converter uses 6-bit flash converter).
Labels: precision resistor network, reference levels, precision flash converter, CMOS buffer,
encode, 4-bit digital output, back-to-back ADC/DAC, loop transfer function equivalent to A in Fig. 1.

However, multi-bit oversampled and noise shaped converters also exist where a development by
Robert Adams [8] approaches 20 bit conversion by employing a front-end topology reminiscent
of the analogue noise shaper in Fig. 2. The front-end coder illustrated in Fig. 10 incorporates a 6
bit, back-to-back flash converter and DAC configured within a 4th-order feedback loop. The
system requires (virtually) no analogue filtering and performs anti-aliasing filtering in the digital
domain, where samples at 6 MHz are converted to 44.1 kHz (or 48 kHz on selection) by using a
4-stage decimation filter. The first three decimation filters use the sample-and-hold approach, that
is, FIR filters with equal coefficients to eliminate the need for multiplication, while the final stage
incorporates a half-band FIR filter [17] which, again, saves multiplications, since every other
sample in the impulse response is zero.
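
The saving offered by a half-band filter can be seen from its coefficients. The sketch below builds a small windowed-sinc half-band low-pass filter; the Hann window, the length and the function name are arbitrary choices for illustration and do not describe the filter used in [8].

```python
import math

def half_band_taps(M):
    """Windowed-sinc half-band low-pass filter, taps for n = -M..M (Hann window assumed)."""
    taps = []
    for n in range(-M, M + 1):
        # Ideal response with cut-off at a quarter of the sample rate
        ideal = 0.5 if n == 0 else math.sin(math.pi * n / 2.0) / (math.pi * n)
        window = 0.5 * (1.0 + math.cos(math.pi * n / (M + 1)))
        taps.append(ideal * window)
    return taps

M = 11
h = half_band_taps(M)

# Taps at even offsets from the centre (other than the centre itself) vanish
zero_taps = [n for n in range(-M, M + 1)
             if n != 0 and n % 2 == 0 and abs(h[n + M]) < 1e-12]
multiplies = sum(1 for v in h if abs(v) >= 1e-12)

print(len(h), len(zero_taps), multiplies)
```

In this example 10 of the 23 taps are zero to within rounding error, so only 13 products are needed per output sample before any further saving from coefficient symmetry.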

Hence to conclude, using either bitstream or oversampled and noise shaped multi-bit technology
in association with VLSI, a substantial simplification in analogue processing can be achieved in
exchange for more complicated, but precise, digital processing. The high sampling rates
employed together with switched-capacitor circuitry do much to circumvent such problems of
analogue filter errors, sample-and-hold errors and jitter sensitivity and approach more closely the
theoretical boundaries dictated by sampling, amplitude quantization and dither in association with
the standard digital format of 16 bit by 44.1 kHz.

The [...] aspect of these new, highly [...] architectures is the significant reduction in
potentially degrading analogue circuitry and, by the nature of the internal digital architectures and,
ultimately, the one-bit converter, have the means to approach the theoretical performance
boundaries. As these techniques become more established and used with well-engineered support
systems, the true potential of digital audio will be revealed. There is no doubt that the objectives
of near-theoretic, anti-aliasing filters and the opportunity to code signals deep into the noise
without non-linear distortion, even in the presence of high-level signal components, can be
achieved. Even if questions are raised about the sonic signature of interpolation and decimation
filters and the adequacy of the present digital format, the oversampled ADC and DAC systems can
always be connected back-to-back without reduction to 16 bit by 44.1 kHz, allowing these
questions to be answered in an objective manner using a precise frame of reference. However, it
must be remembered that although the systems approach may identify converters with potentially
better performance, the final results will depend upon the skill of implementation and the care
taken by the designer to minimise all secondary sources of error that arise when combining digital
and analogue circuitry.

Acknowledgement

Permission to edit and reproduce these notes from a recent series in Hi-Fi News and Record
Review by the editor Steve Harris is gratefully acknowledged.

References to part 2

1. Sheppard, W.F., "On the calculation of the most probable value of frequency constants
for data arranged according to the equivalent divisions of scale", Proc. London Math.
Soc., 24, part 2, 352, 1898


2. Cattermole, K.W., Principles of pulse code modulation, Iliffe Books Ltd, 592 02834 8,
1969

3. Croll, M.G., "PCM for high quality sound distribution: quantising distortion at very low
signal levels", BBC Res. Report, 1970/18, 1970

4. Hawksford, M.J. "Unified theory of digital modulation", Proc. IEE, 121, (2), 109-115,
February 1974

5. Vanderkooy, J., Lipshitz, S., "Resolution below the least significant bit in digital systems
with dither", JAES, 32, 106-113, March 1984 (correction ibid., p.889, November 1984)

6. Blesser, B.A., "The application of narrow band dither operating at the Nyquist frequency
in digital systems to provide improved signal-to-noise ratio over conventional dither",
81st AES Convention, preprint 2416 (C7), November 1986

7. Vanderkooy, J., Lipshitz, S., "Dither in Digital Audio", JAES, 12, 966-974, December
1987

References to part 3

1. Hawksford, M.J., "The Essex Echo: Within these walls", HFN, part 1, June 1988, part
2, August 1988

2. Greenfield, R. and Hawksford, M.J., "Efficient filter design for loudspeaker
equalisation", preprint 2764 (M-8), 86th AES Convention, Hamburg, 1989

3. Lorenz, E.N., "Deterministic nonperiodic flow", Journal of Atmospheric Sciences,
Boston, 20, (2), pp.130-141, March 1963

4. Gleick, J., Chaos: making a new science, ISBN 0434 29554X, W. Heinemann Ltd, 1988

5. Hawksford, M.O.J., "Digital Discourse 2", HFN/RR, vol 35, no 4, pp 31-35, April 1990

6. Hawksford, M.O.J., "Digital Discourse 1", HFN/RR, vol 35, no 2, pp 49-53, Feb. 1990

7. Vanderkooy, J. and Lipshitz, S., "Dither in digital audio", JAES 12, 966-974, December
1987

8. Oppenheim, A.V. and Schafer, R.W., Digital Signal Processing, Prentice Hall,
Englewood Cliffs, 1975

9. Rabiner, L.R., and Gold, B., Theory and application of digital signal processing, Prentice
Hall Inc., 1975

10. Crochiere, R.E. and Rabiner, L.R., "Optimum FIR filter implementation for decimation,
interpolation and narrow band filtering", IEEE Trans. Acoust., Speech, Signal
Processing, ASSP-23, 444-456, October 1975

11. Goodman, D.J. and Carey, M.J., "Nine digital filters for decimation and interpolation",
IEEE, ASSP-25, (2), 121-126, April 1977

12. Parks, T.W. and McClellan, J.H., "Chebyshev approximation for nonrecursive digital
filters with linear phase", IEEE Trans. Circuit Theory, CT- 19, 189-194, March 1972

References to part 4

1. Shannon, C.E., "A mathematical theory of communication", BSTJ, 27, 379-423, 623-656,
1948

2. Caine, C.R., English, A.R., "NICAM 3: A companded pcm system for the transmission
of high quality sound programmes", IBC Conference publication 191, 1980

3. Tewksbury, S.K., Hallock, R.W., "Oversampled linear predictive and noise-shaping
coders of order N > 1", IEEE, CAS-25, (7), pp 437-447, July 1978

4. Inose, H., Yasuda, Y., Murakami, J., "A telemetering system by code modulation - delta
sigma modulation", IRE Trans., PGSET 8, 204, September 1962

5. Inose, H., Yasuda, Y., "A unity bit coding method by negative feedback", Proc. IEEE, 51,
1524, November 1963

6. Goedhart, D., Van de Plassche, R.J., Stikvoort, E.F., "Digital to analogue conversion in
playing compact disc", Philips Techn. Rev. 40, (6), pp 174-179, 1982

7. Naus, P.J., Dijkmans, E.C., Stikvoort, E.F., McKnight, A.J., Holland, D.J., Bradinal,
W., "A CMOS stereo 16-bit D/A converter for digital audio", IEEE J., SC-22, (3), June
1987

8. Adams, R.W., "Design and implementation of an audio 18-bit analogue-to-digital converter
using oversampling techniques", JAES, 34, (3), pp 153-166, March 1986

9. Hawksford, M.O.J., "Digital Discourse 1", HFN/RR, vol 35, no 2, pp 49-55, Feb. 1990

10. Hawksford, M.J., "Nth-order recursive sigma-ADC machinery at the analogue-digital
gateway", 78th AES Convention, Anaheim, preprint 2248 A-15, May 1985

11. Hawksford, M.J., "Oversampling and noise shaping for digital to analogue conversion",
Institute of Acoustics, Reproduced Sound 3, pp 151-175, 1987

12. Hawksford, M.O.J., "Digital Discourse 3", HFN/RR, vol 35, no 6, pp 45-47, June 1990

13. Deloraine, E.M., Van Mierlo and Derjavitch, "Méthode et système de transmission par
impulsions", French Patent 932.140, 1947/48.

14. de Jager, "Deltamodulation, a method of pcm transmission using 1-unit code", Philips
Res. Rep. 7, pp 442-466, 1952

15. Wang, P.P., "Idle channel noise of delta modulation", IEEE, COM-16, (5), 737, October
1968

16. Hawksford, M.J., "Multi-level to 1-bit transformations for applications in digital-to-
analogue converters using oversampling and noise shaping", Proc. Institute of Acoustics,
10, (7), pp 129-143, November 1988

17. Mintzer, F., "On half-band, third-band and Nth-band FIR filters and their design", IEEE,
ASSP-30, pp 734-738, October 1982

18. Hawksford, M.O.J., "Digital Discourse 2", HFN/RR, vol 35, no 4, pp 31-35, April 1990

19. Flood, J.E. and Hawksford, M.O.J., "Exact model for delta-modulation processes", IEE,
Vol. 118, No. 9, pp 1155-1161, September 1971

September 1991

T-42 AES 10th INTERNATIONAL CONFERENCE
