Chapter 6
Basics of Digital Audio
Digitization
Digitization means conversion to a stream of numbers, and
preferably these numbers should be integers for efficiency.
Fig. 6.1 shows the 1-dimensional nature of sound: amplitude
values depend on a 1D variable, time. (And note that images
depend instead on a 2D set of variables, x and y).
Fig. 6.1: An analog signal: continuous measurement of pressure wave. (Axes: amplitude against time.)
Fig. 6.2: Panels (a) and (b). (Axes: amplitude against time.)
Nyquist Theorem
Signals can be decomposed into a sum of sinusoids. Fig. 6.3
shows how weighted sinusoids can build up quite a complex
signal.
Fig. 6.3 panel labels: fundamental frequency; + 0.5 × (2 × fundamental); + 0.33 × (3 × fundamental); + 0.25 × (4 × fundamental); + 0.5 × (5 × fundamental).
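The build-up of a complex signal from weighted sinusoids can be sketched in a few lines. The weights 0.5, 0.33, 0.25, 0.5 are the ones shown in Fig. 6.3; the weight 1.0 for the fundamental, the use of sine rather than cosine, the 100 Hz fundamental, and the 8 kHz sampling rate are all assumptions made only for illustration.

```python
import math

def harmonic_sum(t, fundamental_hz, weights):
    """Sum of sinusoids at integer multiples of the fundamental.

    weights[k] scales the (k+1)-th multiple; weights[0] is the fundamental.
    """
    return sum(w * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
               for k, w in enumerate(weights))

# Weights from Fig. 6.3; the leading 1.0 is an assumption.
WEIGHTS = [1.0, 0.5, 0.33, 0.25, 0.5]

# One period of an illustrative 100 Hz fundamental, sampled at 8 kHz.
samples = [harmonic_sum(n / 8000, 100.0, WEIGHTS) for n in range(80)]
```

At a quarter of the period only the odd multiples contribute, so the value there is 1 − 0.33 + 0.5 = 1.17.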
Whereas frequency is an absolute measure, pitch is generally relative: a perceptual, subjective quality of sound.
(a) Pitch and frequency are linked by setting the note A above
middle C to exactly 440 Hz.
(b) An octave above that note takes us to another A note.
An octave corresponds to doubling the frequency. Thus
with the middle A on a piano (A4 or A440) set
to 440 Hz, the next A up is at 880 Hz, or one octave
above.
(c) Harmonics: any series of musical tones whose frequencies
are integral multiples of the frequency of a fundamental
tone: Fig. 6.
(d) If we allow non-integer multiples of the base frequency, we
allow non-A notes and have a more complex resulting
sound.
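The octave and harmonic relationships above reduce to simple arithmetic. The following is a minimal sketch; the function names are hypothetical and chosen only for this example.

```python
A4_HZ = 440.0  # the A above middle C, as fixed in item (a)

def octave_shift(freq_hz, octaves):
    """Each octave up doubles the frequency; each octave down halves it."""
    return freq_hz * 2.0 ** octaves

def harmonics(fundamental_hz, count):
    """Integral multiples of the fundamental, starting with the fundamental."""
    return [k * fundamental_hz for k in range(1, count + 1)]

next_a = octave_shift(A4_HZ, 1)      # 880.0
first_three = harmonics(A4_HZ, 3)    # [440.0, 880.0, 1320.0]
```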
Fig. 6.4: Aliasing. (a): A single frequency. (b): Sampling at
exactly the frequency produces a constant. (c): Sampling at
1.5 times per cycle produces an alias perceived frequency.
The relationship among the Sampling Frequency, True Frequency, and the Alias Frequency is as follows:
$$f_{alias} = f_{sampling} - f_{true}, \quad \text{for } f_{true} < f_{sampling} < 2 \times f_{true} \qquad (6.1)$$
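Eq. (6.1) can be checked numerically. This sketch assumes the relationship applies when the sampling frequency lies strictly between the true frequency and twice the true frequency, and it refuses other inputs rather than guessing; the function name is hypothetical.

```python
def alias_frequency(f_true, f_sampling):
    """Alias frequency per Eq. (6.1), valid for f_true < f_sampling < 2 * f_true."""
    if not (f_true < f_sampling < 2 * f_true):
        raise ValueError("Eq. (6.1) is assumed to hold only for "
                         "f_true < f_sampling < 2 * f_true")
    return f_sampling - f_true

# Case (c) of Fig. 6.4: sampling 1.5 times per cycle of a unit-frequency tone.
alias = alias_frequency(1.0, 1.5)   # 0.5
```

With an 8 kHz sampling rate, a 5.5 kHz tone would be perceived at 8 − 5.5 = 2.5 kHz under the same formula.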
[Figure: plotted against true frequency (kHz).]
$$SNR = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}} \qquad (6.2)$$
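The two forms of Eq. (6.2) give the same number, which a short sketch makes concrete. The function names are hypothetical; voltages are assumed positive.

```python
import math

def snr_db(v_signal, v_noise):
    """SNR in decibels from signal and noise voltages (amplitude form)."""
    return 20 * math.log10(v_signal / v_noise)

def snr_db_from_power(p_signal, p_noise):
    """SNR in decibels from signal and noise powers (squared-voltage form)."""
    return 10 * math.log10(p_signal / p_noise)

# A signal voltage 10 times the noise voltage corresponds to 20 dB.
ratio_db = snr_db(10.0, 1.0)
```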
Sound | Level (dB)
Threshold of hearing | 0
Rustle of leaves | 10
Very quiet room | 20
Average room | 40
Conversation | 60
Busy street | 70
Loud radio | 80
Train through station | 90
Riveter | 100
Threshold of discomfort | 120
Threshold of pain | 140
Damage to ear drum | 160
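$$SQNR = 20 \log_{10} \frac{V_{signal}}{V_{quan\_noise}} = 20 \log_{10} \frac{2^{N-1}}{\frac{1}{2}} = 20 \times N \times \log_{10} 2 = 6.02N \text{ (dB)} \qquad (6.3)$$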
Notes:
(a) We map the maximum signal to $2^{N-1} - 1$ ($\simeq 2^{N-1}$) and
the most negative signal to $-2^{N-1}$.
(b) Eq. (6.3) is the Peak signal-to-noise ratio, PSQNR: peak
signal and peak noise.
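As a quick numerical check of Eq. (6.3), the sketch below assumes a peak signal of $2^{N-1}$ and a peak quantization noise of half a quantization step, which is how the notes above are read here. The function name is hypothetical.

```python
import math

def peak_sqnr_db(bits):
    """Peak signal-to-quantization-noise ratio for N-bit uniform quantization.

    Assumes peak signal 2**(N-1) and peak quantization noise 1/2.
    """
    return 20 * math.log10(2 ** (bits - 1) / 0.5)

sqnr_8 = peak_sqnr_db(8)     # about 48.16 dB
sqnr_16 = peak_sqnr_db(16)   # about 96.33 dB
```

Each extra bit adds about 6.02 dB, since $20 \log_{10} 2 \approx 6.02$.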
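μ-law:
$$r = \frac{\operatorname{sgn}(s)}{\ln(1 + \mu)} \ln\left\{1 + \mu \left|\frac{s}{s_p}\right|\right\}, \qquad \left|\frac{s}{s_p}\right| \le 1 \qquad (6.9)$$
A-law:
$$r = \begin{cases} \dfrac{A}{1 + \ln A} \left(\dfrac{s}{s_p}\right), & \left|\dfrac{s}{s_p}\right| \le \dfrac{1}{A} \\[2ex] \dfrac{\operatorname{sgn}(s)}{1 + \ln A} \left[1 + \ln A \left|\dfrac{s}{s_p}\right|\right], & \dfrac{1}{A} \le \left|\dfrac{s}{s_p}\right| \le 1 \end{cases} \qquad (6.10)$$
$$\text{where } \operatorname{sgn}(s) = \begin{cases} 1 & \text{if } s > 0, \\ -1 & \text{otherwise} \end{cases}$$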
[Figure: nonlinear transform r (μ-law with μ = 100, A-law with A = 87.6) plotted against s/s_p, both axes running from −1 to 1.]
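The companding curves of Eqs. (6.9) and (6.10) can be sketched directly. This reads the A-law bracket as $1 + \ln(A\,|s/s_p|)$, which is the reading that makes the two branches meet at $|s/s_p| = 1/A$. The default parameters μ = 100 and A = 87.6 are the values labelled in the figure; the expander is not given in the text and is obtained here simply by inverting Eq. (6.9). All function names are hypothetical.

```python
import math

def sgn(s):
    """Sign convention used with Eqs. (6.9) and (6.10): 1 if s > 0, else -1."""
    return 1.0 if s > 0 else -1.0

def mu_law(s, s_peak=1.0, mu=100.0):
    """mu-law compressor, Eq. (6.9)."""
    x = abs(s / s_peak)
    return sgn(s) * math.log(1 + mu * x) / math.log(1 + mu)

def a_law(s, s_peak=1.0, a=87.6):
    """A-law compressor, Eq. (6.10): linear near zero, logarithmic elsewhere."""
    x = abs(s / s_peak)
    if x <= 1 / a:
        return a / (1 + math.log(a)) * (s / s_peak)
    return sgn(s) * (1 + math.log(a * x)) / (1 + math.log(a))

def mu_law_expand(r, s_peak=1.0, mu=100.0):
    """Inverse of mu_law (derived by solving Eq. (6.9) for s)."""
    return sgn(r) * s_peak * ((1 + mu) ** abs(r) - 1) / mu

# Small inputs are boosted: 0.1 maps to roughly 0.52 with mu = 100.
boosted = mu_law(0.1)
```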
Audio Filtering
Prior to sampling and AD conversion, the audio signal is
also usually filtered to remove unwanted frequencies. The
frequencies kept depend on the application:
(a) For speech, typically from 50Hz to 10kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens
out lower and higher frequencies.
(b) An audio music signal will typically contain from about 20Hz up to
20kHz.
(c) At the DA converter end, high frequencies may reappear in the output: because of sampling and then quantization, the smooth input
signal is replaced by a series of step functions containing all possible
frequencies.
(d) So at the decoder side, a lowpass filter is used after the DA circuit.
Quality | Sample Rate (kHz) | Bits per Sample | Mono/Stereo | Data Rate, uncompressed (kB/sec) | Frequency Band (kHz)
Telephone | 8 | 8 | Mono | 8 | 0.200-3.4
AM Radio | 11.025 | 8 | Mono | 11.0 | 0.1-5.5
FM Radio | 22.05 | 16 | Stereo | 88.2 | 0.02-11
CD | 44.1 | 16 | Stereo | 176.4 | 0.005-20
DAT | 48 | 16 | Stereo | 192.0 | 0.005-20
DVD Audio | 192 (max) | 24 (max) | 6 channels | 1,200.0 (max) | 0-96 (max)
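The uncompressed data rates in the table follow from sample rate × bits per sample × number of channels. The sketch below assumes 1 kB = 1000 bytes, which is what reproduces the tabulated figures; the function name is hypothetical.

```python
def data_rate_kB_per_sec(sample_rate_khz, bits_per_sample, channels):
    """Uncompressed data rate in kB/sec (1 kB taken as 1000 bytes)."""
    return sample_rate_khz * bits_per_sample * channels / 8

fm_rate = data_rate_kB_per_sec(22.05, 16, 2)   # 88.2
cd_rate = data_rate_kB_per_sec(44.1, 16, 2)    # 176.4
dat_rate = data_rate_kB_per_sec(48, 16, 2)     # 192.0
```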
Synthetic Sounds
1. FM (Frequency Modulation): one approach to generating
synthetic sound:
$$x(t) = A(t) \cos[\omega_c \pi t + I(t) \cos(\omega_m \pi t + \phi_m) + \phi_c] \qquad (6.11)$$
Fig. 6.7: Frequency Modulation. (a): A single frequency, cos(2πt). (b): Twice the
frequency. (c): Usually, FM is carried out using a sinusoid argument to a
sinusoid. (d): A more complex form arises from a carrier frequency, 2πt,
and a modulating frequency, 4πt, cosine inside the sinusoid. (Axes: magnitude against time, 0 to 1.)
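A minimal FM sketch follows, reading Eq. (6.11) as $x(t) = A(t) \cos[\omega_c \pi t + I(t) \cos(\omega_m \pi t + \phi_m) + \phi_c]$. Holding the envelope A and the modulation index I constant is an assumption made to keep the example short, and the parameter names are hypothetical. The defaults correspond to panel (d) of Fig. 6.7: a carrier argument of 2πt with a cosine of 4πt inside.

```python
import math

def fm_sample(t, amplitude=1.0, carrier=2.0, index=1.0, modulator=4.0,
              phi_m=0.0, phi_c=0.0):
    """One sample of an FM signal with constant envelope and modulation index."""
    inner = index * math.cos(modulator * math.pi * t + phi_m)
    return amplitude * math.cos(carrier * math.pi * t + inner + phi_c)

# 101 samples over one second: cos(2*pi*t + cos(4*pi*t)).
panel_d = [fm_sample(n / 100) for n in range(101)]

# With index 0 the modulation vanishes and the plain carrier remains.
plain_carrier = [fm_sample(n / 100, index=0.0) for n in range(101)]
```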
MIDI Concepts
MIDI channels are used to separate messages.
(a) There are 16 channels numbered from 0 to 15. The channel forms
the last 4 bits (the least significant bits) of the message.
(b) Usually a channel is associated with a particular instrument: e.g.,
channel 1 is the piano, channel 10 is the drums, etc.
(c) Nevertheless, one can switch instruments midstream, if desired, and
associate another instrument with any channel.
System messages
(a) Several other types of messages, e.g. a general message for all instruments indicating a change in tuning or timing.
(b) If the first 4 bits are all 1s, then the message is interpreted as a
system common message.
General MIDI: A standard mapping specifying what instruments (what patches) will be associated with what channels.
(a) In General MIDI, channel 10 is reserved for percussion instruments,
and there are 128 patches associated with standard instruments.
(b) For most instruments, a typical message might be a Note On message
(meaning, e.g., a keypress and release), consisting of what channel,
what pitch, and what velocity (i.e., volume).
(c) For percussion instruments, however, the pitch data means which
kind of drum.
(d) A Note On message consists of a status byte (which message type, which channel)
followed by two data bytes (pitch and velocity). It is followed by a Note
Off message, which also has a pitch (which note to turn off) and a
velocity (often set to zero).
The data in a MIDI status byte is between 128 and 255; each
of the data bytes is between 0 and 127. Actual MIDI bytes
are 10-bit, including a start bit (0) and a stop bit (1).
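The byte layout described above can be sketched as follows. The helper names are hypothetical; the value &H9n for a Note On status byte is the one listed in the voice-message table of these notes, and the choice of key 60 and velocity 64 is an arbitrary example.

```python
def parse_status(status):
    """Split a status byte into (message-type nibble, channel 0..15)."""
    if not 128 <= status <= 255:
        raise ValueError("a status byte lies in 128..255")
    return status >> 4, status & 0x0F

def note_on(channel, key, velocity):
    """Build a 3-byte Note On message: status byte plus two data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    for value in (key, velocity):
        if not 0 <= value <= 127:
            raise ValueError("data bytes lie in 0..127")
    return bytes([0x90 | channel, key, velocity])

message = note_on(9, 60, 64)               # channel index 9 is "channel 10"
kind, channel = parse_status(message[0])   # (0x9, 9)
```

A status byte whose first four bits are all 1s, such as &HF8, yields the type nibble 0xF, which marks a system message.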
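[Figure: a transmitting device connected to a synthesizer.]
[Figure: amplitude envelope of a note against time t, with labels Note on, Attack, Decay, Sustain, Note off, Release.]
[Figure: synthesizer controls, including the pitch bend wheel and the modulation wheel.]
[Figure: MIDI connections through OUT, IN, and THRU ports among a master keyboard, MIDI module A, MIDI module B, etc.]
[Figure: message types, including real-time messages and exclusive messages.]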
Voice Message | Status Byte | Data Byte1 | Data Byte2
Note Off | &H8n | Key number | Note Off velocity
Note On | &H9n | Key number | Note On velocity
Polyphonic Key Pressure | &HAn | Key number | Amount
Control Change | &HBn | Controller num. | Controller value
Program Change | &HCn | Program number | None
Channel Pressure | &HDn | Pressure value | None
Pitch Bend | &HEn | MSB | LSB
(** &H indicates hexadecimal, and n in the status byte hex value stands
for a channel number. All values are in 0..127 except Controller number,
which is in 0..120)
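1st Data Byte | Description | Meaning of 2nd Data Byte
&H79 | Reset all controllers | None; set to 0
&H7A | Local control | 0 = off; 127 = on
&H7B | All notes off | None; set to 0
&H7C | Omni mode off | None; set to 0
&H7D | Omni mode on | None; set to 0
&H7E | Mono mode on (Poly mode off) | Controller number
&H7F | Poly mode on (Mono mode off) | None; set to 0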
B. System Messages:
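System Common Message | Status Byte | Number of Data Bytes
MIDI Timing Code | &HF1 | 1
Song Position Pointer | &HF2 | 2
Song Select | &HF3 | 1
Tune Request | &HF6 | None
EOX (terminator) | &HF7 | None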
System Real-Time Message | Status Byte
Timing Clock | &HF8
Start Sequence | &HFA
Continue Sequence | &HFB
Stop Sequence | &HFC
Active Sensing | &HFE
System Reset | &HFF
General MIDI
General MIDI is a scheme for standardizing the assignment
of instruments to patch numbers.
a) A standard percussion map specifies 47 percussion sounds.
b) Where a note appears on a musical score determines what percussion instrument is being struck: a bongo drum, a cymbal.
c) Other requirements for General MIDI compatibility: MIDI device must
support all 16 channels; a device must be multitimbral (i.e., each
channel can play a different instrument/program); a device must be
polyphonic (i.e., each channel is able to play many voices); and there
must be a minimum of 24 dynamically allocated voices.
General MIDI Level 2: An extended General MIDI has recently been defined, with a standard .smf ("Standard MIDI
File") format defined, including extra character information such as karaoke lyrics.
c) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more
likely values (expanded discussion in Chap. 7).
Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal
and its corresponding PCM signals. (b) Decoded staircase signal. (c)
Reconstructed signal after low-pass filtering.
2. A discontinuous signal contains not just frequency components due to the original signal, but also a theoretically
infinite set of higher-frequency components:
(a) This result is from the theory of Fourier analysis, in
signal processing.
(b) These higher frequencies are extraneous.
(c) Therefore the output of the digital-to-analog converter
goes to a low-pass filter that allows only frequencies
up to the original maximum to be retained.
[Figure: PCM signal chain. Input analog speech signal → bandlimiting filter → μ-law or A-law compressor → linear PCM → transmission → μ-law or A-law expander → digital-to-analog converter → low-pass filter → output analog speech signal.]
$$\hat{f}_n = f_{n-1}, \qquad e_n = f_n - \hat{f}_n \qquad (6.12)$$
c Prentice Hall 2003
Li & Drew
(c) But it is often the case that some function of a few of the previous values, $f_{n-1}$, $f_{n-2}$, $f_{n-3}$, etc., provides a better prediction.
Typically, a linear predictor function is used:
$$\hat{f}_n = \sum_{k=1}^{2 \text{ to } 4} a_{n-k} f_{n-k} \qquad (6.13)$$
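A lossless predictive coder can be sketched with a two-tap predictor. The specific predictor here, the floored average of the two previous samples, is an assumption (a linear predictor with both coefficients equal to 1/2, rounded down so errors stay integers), as are the sample values and the convention of replicating the first sample to start the prediction.

```python
def encode(signal):
    """Return the first sample and the prediction errors for the rest."""
    history = [signal[0], signal[0]]          # replicate the first value
    errors = []
    for f in signal[1:]:
        prediction = (history[-1] + history[-2]) // 2
        errors.append(f - prediction)
        history.append(f)
    return signal[0], errors

def decode(first, errors):
    """Rebuild the signal by running the same predictor on decoded values."""
    out = [first, first]
    for e in errors:
        out.append((out[-1] + out[-2]) // 2 + e)
    return out[1:]

first, errors = encode([21, 22, 27, 25, 22])   # errors: [1, 6, 1, -4]
restored = decode(first, errors)               # [21, 22, 27, 25, 22]
```

The errors are small and centered near zero, and the decoder recovers the input exactly because no quantization is involved.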
[Figure: signal magnitude plotted over 8000 samples, with histograms of count against sample value and count against sample difference.]
$$e_5 = 22 - 26 = -4 \qquad (6.15)$$
The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. Fig. 6.16
shows a typical schematic diagram used to encapsulate this
type of system:
[Fig. 6.16: encoder and decoder block diagrams built around a predictor, with signals $f_n$, $\hat{f}_n$, $e_n$, and the reconstructed $f_n$.]
DPCM
Differential PCM is exactly the same as Predictive Coding,
except that it incorporates a quantizer step.
(a) One scheme for analytically determining the best set of quantizer
steps, for a non-uniform quantizer, is the Lloyd-Max quantizer, which
is based on a least-squares minimization of the error term.
(b) Our nomenclature: signal values: $f_n$ the original signal, $\hat{f}_n$ the
predicted signal, and $\tilde{f}_n$ the quantized, reconstructed signal.
(c) DPCM: form the prediction; form an error $e_n$ by subtracting the prediction from the actual signal; then quantize the error to a quantized
version, $\tilde{e}_n$.
The set of equations that describe DPCM is as follows:
$$\begin{aligned} \hat{f}_n &= \text{function\_of}(\tilde{f}_{n-1}, \tilde{f}_{n-2}, \tilde{f}_{n-3}, \ldots), \\ e_n &= f_n - \hat{f}_n, \\ \tilde{e}_n &= Q[e_n], \\ &\text{transmit } codeword(\tilde{e}_n) \end{aligned} \qquad (6.16)$$
(d) The main effect of the coder-decoder process is to produce reconstructed, quantized signal values $\tilde{f}_n = \hat{f}_n + \tilde{e}_n$.
$$\min \sum_{n=i}^{i+N-1} (f_n - Q[f_n])^2 \qquad (6.17)$$
Since signal differences are very peaked, we could model them using a
Laplacian probability distribution function, which is strongly peaked at
zero: it looks like
$$l(x) = \left(1/\sqrt{2\sigma^2}\right) \exp\left(-\sqrt{2}\,|x|/\sigma\right)$$
for variance $\sigma^2$.
So typically one assigns quantization steps for a quantizer with nonuniform steps by assuming signal differences, $d_n$, are drawn from such a
distribution and then choosing steps to minimize
$$\min \sum_{n=i}^{i+N-1} (d_n - Q[d_n])^2 \, l(d_n) . \qquad (6.18)$$
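[Figure: DPCM encoder and decoder. Encoder: $f_n$ minus the predictor output $\hat{f}_n$ gives $e_n$; the quantizer gives $\tilde{e}_n$; the symbol coder emits the binary stream; the predictor is driven by the reconstructed $\tilde{f}_n$. Decoder: the symbol decoder recovers $\tilde{e}_n$ from the binary stream and adds the predictor output $\hat{f}_n$ to give the reconstructed $\tilde{f}_n$.]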
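$$\hat{f}_n = \mathrm{trunc}\left[(\tilde{f}_{n-1} + \tilde{f}_{n-2})/2\right] \qquad (6.19)$$
so that $e_n = f_n - \hat{f}_n$ is an integer.
As well, use the quantization scheme:
$$\tilde{e}_n = Q[e_n] = 16 \cdot \mathrm{trunc}\left[(255 + e_n)/16\right] - 256 + 8, \qquad \tilde{f}_n = \hat{f}_n + \tilde{e}_n \qquad (6.20)$$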
First, we note that the error is in the range −255..255, i.e., there are
511 possible levels for the error term. The quantizer simply divides the
error range into 32 patches of about 16 levels each. It also makes the
representative reconstructed value for each patch equal to the midway
point for each group of 16 levels.
Table 6.7 gives output values for any of the input codes: 5-bit codes are
mapped to 32 reconstruction levels in a staircase fashion.
Table 6.7 DPCM quantizer reconstruction levels.
e_n in range | Quantized to value
-255 .. -240 | -248
-239 .. -224 | -232
... | ...
-31 .. -16 | -24
-15 .. 0 | -8
1 .. 16 | 8
17 .. 32 | 24
... | ...
225 .. 240 | 232
241 .. 255 | 248
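As an example stream of signal values, consider
$$f_1 = 130, \quad f_2 = 150, \quad f_3 = 140, \quad f_4 = 200, \quad f_5 = 230 .$$
The first step gives
$$\hat{f}_1 = 130, \quad e_1 = 0, \quad \tilde{e}_1 = 0, \quad \tilde{f}_1 = 130 .$$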
On the decoder side, we again assume extra values $\tilde{f}$ equal to the correct
value $f_1$, so that the first reconstructed value $\tilde{f}_1$ is correct. What is
received is $\tilde{e}_n$, and the reconstructed $\tilde{f}_n$ is identical to that on the encoder
side, provided we use exactly the same prediction rule.
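The DPCM example can be traced in code. This is a sketch under stated assumptions: the predictor is taken to be the truncated average of the two previous reconstructed values, the quantizer is written as a formula chosen to reproduce the reconstruction levels of Table 6.7, the first sample is sent exactly and replicated to start the predictor, and no clamping to the 8-bit range is applied. Function names are hypothetical.

```python
def quantize(e):
    """Map an error in -255..255 to the midpoint of its 16-level patch."""
    return 16 * ((255 + e) // 16) - 256 + 8

def dpcm_encode(signal):
    """Return quantized errors and the encoder's reconstructed values."""
    recon = [signal[0], signal[0]]            # first value sent exactly
    q_errors = []
    for f in signal[1:]:
        prediction = (recon[-1] + recon[-2]) // 2
        q = quantize(f - prediction)
        q_errors.append(q)
        recon.append(prediction + q)          # predict from reconstructed values
    return q_errors, recon[1:]

def dpcm_decode(first, q_errors):
    """Rebuild the signal using the same prediction rule as the encoder."""
    recon = [first, first]
    for q in q_errors:
        recon.append((recon[-1] + recon[-2]) // 2 + q)
    return recon[1:]

q_errors, encoder_view = dpcm_encode([130, 150, 140, 200, 230])
decoder_view = dpcm_decode(130, q_errors)
# q_errors: [24, -8, 56, 56]; both views: [130, 154, 134, 200, 223]
```

Because the encoder predicts from reconstructed values rather than from the original samples, the decoder tracks it exactly and quantization error does not accumulate.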
DM
DM (Delta Modulation): simplified version of DPCM. Often
used as a quick AD converter.
1. Uniform-Delta DM: use only a single quantized error value, either
positive or negative.
(a) a 1-bit coder. Produces coded output that follows the original
signal in a staircase fashion. The set of equations is:
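$$\begin{aligned} \hat{f}_n &= \tilde{f}_{n-1}, \\ e_n &= f_n - \hat{f}_n = f_n - \tilde{f}_{n-1}, \\ \tilde{e}_n &= \begin{cases} +k & \text{if } e_n > 0, \text{ where } k \text{ is a constant} \\ -k & \text{otherwise} \end{cases} \\ \tilde{f}_n &= \hat{f}_n + \tilde{e}_n . \end{aligned} \qquad (6.21)$$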
Note that the prediction simply involves a delay.
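(b) Consider signal values $f_1 = 10$, $f_2 = 11$, $f_3 = 13$, $f_4 = 15$, with an exact first reconstructed value $\tilde{f}_1 = f_1 = 10$.
(c) With step value $k = 4$:
$$\hat{f}_2 = 10, \quad e_2 = 11 - 10 = 1, \quad \tilde{e}_2 = 4, \quad \tilde{f}_2 = 10 + 4 = 14$$
$$\hat{f}_3 = 14, \quad e_3 = 13 - 14 = -1, \quad \tilde{e}_3 = -4, \quad \tilde{f}_3 = 14 - 4 = 10$$
$$\hat{f}_4 = 10, \quad e_4 = 15 - 10 = 5, \quad \tilde{e}_4 = 4, \quad \tilde{f}_4 = 10 + 4 = 14 .$$
The reconstructed set of values 10, 14, 10, 14 is close to the
correct set 10, 11, 13, 15.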
(d) However, DM copes less well with rapidly changing signals. One
approach to mitigating this problem is to simply increase the sampling, perhaps to many times the Nyquist rate.
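The uniform-delta example above can be reproduced with a few lines. This sketch assumes the first value is known exactly to the decoder and encodes each later sample as a single bit (1 for a step of +k, 0 for −k); the function name and the bit convention are hypothetical.

```python
def delta_modulate(signal, k):
    """1-bit coder: step the reconstruction by +k or -k toward the signal."""
    recon = [signal[0]]                  # first value assumed exact
    bits = []
    for f in signal[1:]:
        step = k if f - recon[-1] > 0 else -k
        bits.append(1 if step > 0 else 0)
        recon.append(recon[-1] + step)
    return bits, recon

bits, staircase = delta_modulate([10, 11, 13, 15], 4)
# bits: [1, 0, 1]; staircase: [10, 14, 10, 14]
```

A steadily rising input such as 10, 20, 30, 40 with the same step gives 10, 14, 18, 22, which shows the staircase falling behind a rapidly changing signal.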
2. Adaptive DM: If the slope of the actual signal curve is high, the
staircase approximation cannot keep up. For a steep curve, we should
change the step size k adaptively.
One scheme for analytically determining the best set of quantizer
steps, for a non-uniform quantizer, is Lloyd-Max.
ADPCM
ADPCM (Adaptive DPCM) takes the idea of adapting the
coder to suit the input much further. Two pieces
make up a DPCM coder: the quantizer and the predictor.
1. In Adaptive DM, adapt the quantizer step size to suit the
input. In DPCM, we can change the step size as well as
decision boundaries, using a non-uniform quantizer.
We can carry this out in two ways:
(a) Forward adaptive quantization: use the properties of the input
signal.
(b) Backward adaptive quantization: use the properties of the
quantized output. If quantized errors become too large, we should
change the non-uniform quantizer.
$$\hat{f}_n = \sum_{i=1}^{M} a_i \tilde{f}_{n-i} \qquad (6.22)$$
$$\min \sum_{n=1}^{N} (f_n - \hat{f}_n)^2 \qquad (6.23)$$
(b) Here we would sum over a large number of samples $f_n$, for the current
patch of speech, say. But because $\hat{f}_n$ depends on the quantization
we have a difficult problem to solve. As well, we should really be
changing the fineness of the quantization at the same time, to suit
the signal's changing nature; this makes things problematical.
(c) Instead, one usually resorts to solving the simpler problem that results
from using not $\tilde{f}_n$ in the prediction, but instead simply the signal $f_n$
itself. Explicitly writing in terms of the coefficients $a_i$, we wish to
solve:
$$\min \sum_{n=1}^{N} \left(f_n - \sum_{i=1}^{M} a_i f_{n-i}\right)^2 \qquad (6.24)$$
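The simplified least-squares problem is linear in the coefficients, so one way to solve it is to set the derivative with respect to each coefficient to zero and solve the resulting M linear equations. The sketch below does this with a small Gaussian-elimination helper; it is an illustration, not the procedure of any particular ADPCM standard. The sum runs over the samples that have M predecessors, and the function names and test signal are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_predictor(signal, order):
    """Least-squares coefficients a_1..a_M for f_n ~ sum_i a_i * f_{n-i}."""
    rows = range(order, len(signal))
    R = [[sum(signal[n - i] * signal[n - j] for n in rows)
          for j in range(1, order + 1)] for i in range(1, order + 1)]
    r = [sum(signal[n] * signal[n - i] for n in rows)
         for i in range(1, order + 1)]
    return solve(R, r)

# A test signal that obeys f_n = 1.5 f_{n-1} - 0.7 f_{n-2} exactly.
signal = [1.0, 2.0]
for _ in range(28):
    signal.append(1.5 * signal[-1] - 0.7 * signal[-2])

coeffs = fit_predictor(signal, 2)   # close to [1.5, -0.7]
```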
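[Figure: ADPCM encoder and decoder. Encoder: convert to uniform PCM, subtract the adaptive predictor output $\hat{f}_n$ from $f_n$ to form $e_n$, adaptive quantizer gives $\tilde{e}_n$, 32 kbps output. Decoder: 32 kbps input, $\tilde{e}_n$ added to the adaptive predictor output $\hat{f}_n$, convert to PCM, 64 kbps A-law or μ-law PCM output.]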