Digital Communications and Signal Processing – with Matlab Examples
Prof. Jianfeng Feng
Department of Computer Science and Centre for Scientific Computing
University of Warwick CV4 7AL, UK
Contents

1 Introduction
2 Data Transmission
  2.1 The transmission of information
    2.1.1 General Form
    2.1.2 Examples
    2.1.3 The conversion of analogue and digital signals
    2.1.4 The relationship between information, bandwidth and noise
  2.2 Communication Techniques
    2.2.1 Time, frequency and bandwidth
    2.2.2 Digital modulation: ASK, FSK and PSK
    2.2.3 Spread spectrum techniques
    2.2.4 Digital demodulation
    2.2.5 Noise in communication systems: probability and random signals
    2.2.6 Errors in digital communication
    2.2.7 Timing control in digital communication
3 Information and coding theory
  3.1 Information sources and entropy
  3.2 Information source coding
    3.2.1 Huffman coding
  3.3 Channel Capacity
  3.4 Error detection coding
    3.4.1 Hamming distance
    3.4.2 Parity Check Codes
  3.5 Encryption
4 Signal Representation
  4.1 Sequences and their representation
  4.2 Discrete Time Fourier Transform (DTFT)
    4.2.1 Computation of the DTFT
  4.3 Discrete Fourier Transform (DFT)
    4.3.1 The relationship between DFT and DTFT
    4.3.2 DFT for spectral estimation
  4.4 *Sampling and reconstruction*
5 Digital Filters
  5.1 Operations on Sequences
  5.2 Filters
  5.3 Nonrecursive Filters
    5.3.1 Operational Definition
    5.3.2 Zeros
  5.4 Recursive Filters
    5.4.1 Operational Definition
    5.4.2 Poles and Zeros
  5.5 Frequency and digital filters
    5.5.1 Poles, Zeros and Frequency Response
    5.5.2 Filter Types
  5.6 Simple Filter Design
  5.7 Matched Filter
  5.8 Noise in Communication Systems: Stochastic Processes
    5.8.1 Detection of known signals in noise
  5.9 Wiener Filter
  5.10 Having Fun: Contrast Enhancement
6 Appendix: Mathematics
  6.1 Sinusoids
  6.2 Complex Numbers
  6.3 The Exponential Function
  6.4 Trigonometric identities
  6.5 Spectrum
    6.5.1 Fourier's Song
  6.6 Matlab program for simple filter design
  6.7 Fourier Transform: From Real to Complex Variables
  6.8 More Details on Example 8
1 Introduction
Digital communications and signal processing refers to the field of study concerned with the transmission and processing of digital data. This is in contrast with analogue communications. While analogue communications use a continuously varying signal, a digital transmission can be broken down into discrete messages. Transmitting data in discrete messages allows for greater signal processing capability. The ability to process a communications signal means that errors caused by random processes can be detected and corrected. Digital signals can also be sampled instead of continuously monitored, and multiple signals can be multiplexed together to form one signal.
Because of all these advantages, and because recent advances in wideband communication channels and solid-state electronics have allowed scientists to fully realise these advantages, digital communications has grown quickly. Digital communications is quickly edging out analogue communication because of the vast demand to transmit computer data and the ability of digital communications to do so.
Here is a summary of what we will cover in this course:
1. Data transmission: channel characteristics, signalling methods, interference and noise, and synchronisation;
2. Information sources and coding: information theory, coding of information for efficiency and error protection, encryption;
3. Signal representation: representation of discrete-time signals in time and frequency; z-transform and Fourier representations; discrete approximation of continuous signals; sampling and quantisation; and data compression;
4. Filtering: analysis and synthesis of discrete-time filters; finite impulse response and infinite impulse response filters; frequency response of digital filters; poles and zeros; filters for correlation and detection; matched filters; and stochastic signals and noise processes;
5. Digital signal processing applications: processing of images using digital techniques.
The application of DCSP in industry and our daily life is enormous, although in this introductory module we are only able to touch on a few simple examples.
Part of the current lecture notes on DSP is taken from the lecture notes of Prof. R. Wilson. Much material is adapted from public domain sources. Many thanks to Dr. Enrico Rossoni, who spent considerable time going through the manuscript several times to correct typos. The sections included only for your reference, which I will not go through during lectures, are marked with a ∗.
Figure 1: A communications system
2 Data Transmission
2.1 The transmission of information
2.1.1 General Form
A communications system is responsible for the transmission of information from the sender to
the recipient. At its simplest, the system contains (see Fig. 1)
1. A modulator that takes the source signal and transforms it so that it is physically suitable for the transmission channel
2. A transmission channel that is the physical link between the communicating parties
3. A transmitter that actually introduces the modulated signal into the channel, usually amplifying the signal as it does so
4. A receiver that detects the transmitted signal on the channel and usually amplifies it (as it will have been attenuated by its journey through the channel)
5. A demodulator that recovers the original source signal from the received signal and passes it to the sink
At each stage, signal processing techniques are required to detect signals, filter out noise and extract features, as we will discuss in the second part of our course.
Figure 2: Time domain and frequency domain representation of a signal.
Digital data is universally represented by strings of 1s and 0s. Each one or zero is referred to as a bit. Often, but not always, these bit strings are interpreted as numbers in a binary number system: thus 101001₂ = 41₁₀. The information content of a digital signal is equal to the number of bits required to represent it. Thus a signal that may vary between 0 and 7 has an information content of 3 bits. Written as an equation, this relationship is

I = log₂(n) bits (2.1)

where n is the number of levels a signal may take. It is important to appreciate that information is a measure of the number of different outcomes a value may take.
The information rate is a measure of the speed with which information is transferred. It is measured in bits/second or b/s.
2.1.2 Examples
Telecommunications traffic is characterised by great diversity. A non-exclusive list is the following:
1. Audio signals. An audio signal is an example of an analogue signal. It occupies a frequency range from about 200 Hz to about 15 kHz. Speech signals occupy a smaller range of frequencies, and telephone speech typically occupies the range 300 Hz to 3300 Hz. The range of frequencies occupied by a signal is called its bandwidth (see Fig. 2).
2. Television. A television signal is an analogue signal created by linearly scanning a two-dimensional image. Typically the signal occupies a bandwidth of about 6 MHz.
3. Teletext is written (or drawn) communication that is interpreted visually. Telex describes a message limited to a predetermined set of alphanumeric characters.
4. Reproducing cells, in which the daughter cells' DNA contains information from the parent cells.
5. A disk drive.
6. Our brain.
The use of digital signals and modulation has great advantages over analogue systems. These are:
1. High fidelity. The discrete nature of digital signals makes their distinction in the presence of noise easy. Very high fidelity transmission and representation are possible.
2. Time independence. A digitised signal is a stream of numbers. Once digitised, a signal may be transmitted at a rate unconnected with its recording rate.
3. Source independence. Digital signals may be transmitted using the same format irrespective of the source of the communication. Voice, video and text may be transmitted over the same channel.
4. Signals may be coded. The same transmitted message has an infinite number of meanings according to the rule used to interpret it.
One disadvantage of digital communication is the increased expense of transmitters and receivers. This is particularly true of real-time communication of analogue signals.
2.1.3 The conversion of analogue and digital signals
In order to send analogue signals over a digital communication system, or process them on a digital computer, we need to convert analogue signals to digital ones. This process is performed by an analogue-to-digital converter (ADC). The analogue signal is sampled (i.e. measured at regularly spaced instants) (Fig. 3) and then quantised (Fig. 3, bottom panel), i.e. converted to discrete numeric values. The converse operation to the ADC is performed by a digital-to-analogue converter (DAC).
The ADC process is governed by an important law. The Nyquist–Shannon theorem (discussed later in these notes) states that an analogue signal of bandwidth B can be completely recreated from its sampled form provided it is sampled at a rate equal to at least twice its bandwidth. That is,

S ≥ 2B (2.2)

The rate at which an ADC generates bits depends on how many bits are used in the converter. For example, a speech signal has an approximate bandwidth of 4 kHz. If this is sampled by an 8-bit ADC at the Nyquist sampling rate, the bit rate R needed to transmit the signal without loss of information is

R = 8 bits × 2B = 64000 b/s (2.3)
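The sampling and quantisation steps of Fig. 3 are easy to reproduce numerically. The following MATLAB sketch is illustrative only (the test tone, rates and word length are arbitrary choices, not taken from the notes):

% Illustrative ADC sketch: sample and quantise an 'analogue' signal
fs = 8000;                          % sampling rate (Hz): at least twice the bandwidth
t  = 0:1/fs:0.01;                   % 10 ms of sampling instants
x  = sin(2*pi*1000*t);              % a 1 kHz test tone standing in for the analogue input
nbits  = 8;                         % an 8-bit converter
levels = 2^nbits;                   % number of quantisation levels
codes  = round((x + 1)/2*(levels - 1));   % map [-1,1] onto integer codes 0..255
xq     = 2*codes/(levels - 1) - 1;        % quantised amplitudes (what a DAC would return)
stem(t, xq); xlabel('t (s)'); ylabel('amplitude');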
Figure 3: Upper panel: Periodic sampling of an analogue signal. Bottom panel: Quantisation of a sampled signal.
2.1.4 The relationship between information, bandwidth and noise
The most important question associated with a communication channel is the maximum rate at which it can transfer information. Analogue signals passing through physical channels may not achieve arbitrarily fast changes. The rate at which a signal may change is determined by the bandwidth: a signal of bandwidth B may change at a maximum rate of 2B, so the maximum information rate is 2B. If changes of differing magnitude are each associated with a separate bit, the information rate may be increased. Thus, if each time the signal changes it can take one of n levels, the information rate is increased to

R = 2B log₂(n) b/s (2.4)

This formula states that as n tends to infinity, so does the information rate.
Is there a limit on the number of levels? The limit is set by the presence of noise. If we continue to subdivide the magnitude of the changes into ever decreasing intervals, we reach a point where we cannot distinguish the individual levels because of the presence of noise. Noise therefore places a limit on the maximum rate at which we can transfer information. Obviously, what really matters is the signal-to-noise ratio (SNR). This is defined as the ratio of signal power S to noise power N, and is often expressed in decibels (dB):

SNR = 10 log₁₀(S/N) dB (2.5)

The sources of noise signals vary widely.
1. Impulse noise is common in low frequency circuits and arises from electric fields generated by electrical switching. It appears as bursts at the receiver, and when present can have a catastrophic effect due to its large power. Other people's signals can also generate noise: crosstalk is the term given to the pick-up of radiated signals from adjacent cabling. When radio links are used, interference from other transmitters can be problematic.
2. Thermal noise is always present. This is due to the random motion of electric charges present in all media. It can be generated externally, or internally at the receiver.
There is a theoretical maximum to the rate at which information passes error-free over a channel. This maximum is called the channel capacity, C. The famous Hartley–Shannon law states that the channel capacity C (which we will discuss in detail later) is given by

C = B log₂(1 + S/N) b/s (2.6)

For example, a 10 kHz channel operating at an SNR of 15 dB (S/N = 31.623) has a theoretical maximum information rate of 10000 log₂(32.623) ≈ 50278 b/s.
The theorem makes no statement as to how the channel capacity is achieved. In fact, in practice channels only approach this limit. The task of providing high channel efficiency is the goal of coding techniques.
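This calculation is easily checked in MATLAB; a minimal sketch of the example above:

% Hartley-Shannon capacity of a 10 kHz channel at 15 dB SNR
B = 10e3;                          % bandwidth (Hz)
SNRdB = 15;
C = B * log2(1 + 10^(SNRdB/10))    % approximately 5.03e4 b/s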
2.2 Communication Techniques
2.2.1 Time, frequency and bandwidth
Most signals carried by communication channels are modulated forms of sine waves. A sine wave is described mathematically by the expression

s(t) = A cos(ωt + φ) (2.7)

The quantities A, ω, φ are termed the amplitude, frequency and phase of the sine wave. We can describe this signal in two ways. One way is to describe its evolution in the time domain, as in the equation above. The other way is to describe its frequency content, in the frequency domain. The cosine wave s(t) has a single frequency, ω = 2πf.
This representation is quite general. In fact we have the following theorem, due to Fourier.
Theorem 1 Any signal x(t) of period T can be represented as the sum of a set of cosinusoidal and sinusoidal waves of different frequencies and phases.
Mathematically,

x(t) = A₀ + Σ_{n=1}^{∞} Aₙ cos(nωt) + Σ_{n=1}^{∞} Bₙ sin(nωt) (2.8)
Figure 4: Upper panel: a square wave. Bottom panel: The frequency spectrum for the square wave.
where

A₀ = (1/T) ∫_{−T/2}^{T/2} x(t) dt

Aₙ = (2/T) ∫_{−T/2}^{T/2} x(t) cos(nωt) dt

Bₙ = (2/T) ∫_{−T/2}^{T/2} x(t) sin(nωt) dt

ω = 2π/T (2.9)

Here A₀ is the d.c. term, and T is the period of the signal. The description of a signal in terms of its constituent frequencies is called its frequency spectrum.
Example 1 As an example, consider the square wave (Fig. 4)

s(t) = 1 for 0 < t < π, 2π < t < 3π, . . . , and zero otherwise. (2.10)

This has the Fourier series:

s(t) = 1/2 + (2/π)[sin(t) + (1/3) sin(3t) + (1/5) sin(5t) + ···] (2.11)
A graph of the spectrum has a line at the odd harmonic frequencies 1, 3, 5, 7, . . . , whose respective amplitudes decay as 2/π, 2/3π, 2/5π, ···. The spectrum of a signal is usually shown as a two-sided spectrum with positive and negative frequency components.

Figure 5: A square wave with 3, 4, 5 and 6 of its Fourier terms.

The coefficients are obtained according to
A₀ = (1/2π) ∫₀^π 1 dt = 1/2

Aₙ = (2/2π) ∫₀^π cos(nt) dt = (1/nπ) sin(nπ) = 0

Bₙ = (2/2π) ∫₀^π sin(nt) dt = (1/nπ)(1 − cos(nπ)) (2.12)

which gives B₁ = 2/π, B₂ = 0, B₃ = 2/3π, . . .
A periodic signal is uniquely determined by its coefficients Aₙ, Bₙ. For example, in Example 1 we have

x(t) ↔ {···, B₋ₙ, ···, B₋₂, B₋₁, B₀, B₁, B₂, ···, Bₙ, ···} (2.13)

If we truncate the series to finitely many terms, the signal can be approximated by a finite series, as shown in Fig. 5.
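Figure 5 can be reproduced in a few lines of MATLAB; a sketch (the time span and numbers of terms are arbitrary choices):

% Partial sums of the square-wave Fourier series, Eq. (2.11)
t = linspace(0, 4*pi, 1000);
for nterms = 3:6
    s = 0.5*ones(size(t));                      % the d.c. term
    for k = 1:nterms
        s = s + (2/pi)*sin((2*k-1)*t)/(2*k-1);  % odd harmonics only
    end
    subplot(4,1,nterms-2); plot(t, s); title(sprintf('%d terms', nterms));
end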
In general, a signal can be represented as follows (see Appendix 6.7):

x(t) = A₀ + Σ_{n=1}^{∞} Aₙ cos(nωt) + Σ_{n=1}^{∞} Bₙ sin(nωt)
     = A₀ + Σ_{n=1}^{∞} Aₙ [exp(jnωt) + exp(−jnωt)]/2 + Σ_{n=1}^{∞} Bₙ [exp(jnωt) − exp(−jnωt)]/(2j)
     = Σ_{n=−∞}^{∞} cₙ exp(jnωt) (2.14)

which is the exponential form of the Fourier series. In this expression the values cₙ are complex, and so |cₙ| and arg(cₙ) are the magnitude and the phase of the spectral component respectively, with

cₙ = (1/T) ∫_{−T/2}^{T/2} x(t) exp(−jnωt) dt (2.15)

where ω = 2π/T.
Signals whose spectra consist of isolated lines are periodic, i.e. they repeat themselves indefinitely. The lines in such a spectrum are infinitely thin, i.e. they have zero bandwidth. The Hartley–Shannon law tells us that the maximum information rate of a zero bandwidth channel is zero. Thus zero bandwidth signals carry no information. To permit the signal to carry information we must introduce the capacity for aperiodic change. The consequence of an aperiodic change is to introduce a spread of frequencies into the signal.
If the square wave signal discussed in the previous example is replaced with an aperiodic signal, the spectrum changes substantially. The Fourier transform pair is

X(F) = FT{x(t)} = ∫_{−∞}^{∞} x(t) exp(−j2πFt) dt

x(t) = IFT{X(F)} = ∫_{−∞}^{∞} X(F) exp(j2πFt) dF (2.16)
Example 2 Consider the case of a rectangular pulse. In particular, define the signal

x(t) = 1 if −0.5 ≤ t < 0.5, and 0 otherwise.

This is shown in Fig. 6 and its Fourier transform can be readily computed from the definition as

X(F) = ∫_{−0.5}^{0.5} x(t) exp(−j2πFt) dt = sin(πF)/(πF)

and is plotted in Fig. 6.
There are a number of features to note:
1. The bandwidth of the signal is only approximately finite. Most of the energy is contained in a limited region called the mainlobe. However, some energy is found at all frequencies.
2. The spectrum has positive and negative frequencies. These are symmetric about the origin. This may seem non-intuitive but can be seen from the equations above.
The bandwidth of a communication channel is limited by the physical construction of the channel.
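The transform pair of Example 2 can be verified numerically; a minimal sketch, approximating the Fourier integral by a Riemann sum over the support of the pulse:

% Numerical check of Example 2: X(F) = sin(pi*F)/(pi*F)
F  = linspace(-5, 5, 1000);        % frequency axis (chosen to avoid F = 0 exactly)
dt = 1e-3;
t  = -0.5:dt:0.5-dt;               % support of the pulse, where x(t) = 1
X  = zeros(size(F));
for k = 1:numel(F)
    X(k) = sum(exp(-1j*2*pi*F(k)*t))*dt;   % Riemann sum for the Fourier integral
end
plot(F, real(X), F, sin(pi*F)./(pi*F), '--');   % numerical vs analytic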
Figure 6: A square wave and its Fourier transform.
2.2.2 Digital modulation: ASK, FSK and PSK
There are three ways in which the bandwidth of the channel carrier may be altered simply: altering the amplitude, frequency or phase of the carrier wave. These techniques give rise to amplitude-shift-keying (ASK), frequency-shift-keying (FSK) and phase-shift-keying (PSK), respectively.
ASK describes the technique by which a carrier wave is multiplied by the digital signal f(t). Mathematically, the modulated carrier signal s(t) is (Fig. 7)

s(t) = f(t) cos(ω_c t + φ) (2.17)

ASK is a special case of amplitude modulation (AM). Amplitude modulation has the property of translating the spectrum of the modulation f(t) to the carrier frequency. The bandwidth of the signal remains unchanged. This can be seen if we examine the simple case f(t) = cos(ωt) and use the identities

cos(A + B) = cos(A) cos(B) − sin(A) sin(B)
cos(A − B) = cos(A) cos(B) + sin(A) sin(B) (2.18)

Then

s(t) = cos(ωt) cos(ω_c t) = (1/2)[cos((ω_c + ω)t) + cos((ω_c − ω)t)]

See Fig. 8.
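A short MATLAB sketch of ASK (using the bit pattern of Fig. 7; the rates are arbitrary choices) makes the 'switched carrier' picture concrete:

% ASK: multiply the carrier by the binary message f(t)
bits = [1 0 1 0 0 1 0 1 1];           % the bit pattern of Fig. 7
fs = 10000; Tb = 0.01; fc = 1000;     % sample rate, bit duration, carrier (arbitrary)
t  = 0:1/fs:numel(bits)*Tb - 1/fs;
f  = bits(floor(t/Tb) + 1);           % message signal, one value per bit period
s  = f .* cos(2*pi*fc*t);             % the ASK waveform, Eq. (2.17) with phi = 0
plot(t, s);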
Figure 7: Amplitude shift keying

Figure 8: Amplitude shift keying: frequency domain

FSK describes the modulation of a carrier (or two carriers) by using a different frequency for a 1 or a 0. The resultant modulated signal may be regarded as the sum of two amplitude-modulated signals of different carrier frequencies:

s(t) = f₀(t) cos(ω₀t + φ) + f₁(t) cos(ω₁t + φ)

Figure 9: Frequency shift keying

FSK is classified as wideband if the separation between the two carrier frequencies is larger than the bandwidth of the spectra of f₀ and f₁. In this case the spectrum of the modulated signal appears as two separate ASK signals.
PSK describes the modulation technique that alters the phase of the carrier. Mathematically,

s(t) = cos(ω_c t + φ(t))

Binary phase-shift-keying (BPSK) has only two phases, 0 and π. It is therefore a type of ASK with f(t) taking the values −1 and 1, and its bandwidth is the same as that of ASK (Fig. 11).
Phase-shift keying offers a simple way of increasing the number of levels in the transmission without increasing the bandwidth, by introducing smaller phase shifts. Quadrature phase-shift-keying (QPSK) has four phases: 0, π/2, π, 3π/2. M-ary PSK has M phases.
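BPSK is generated by the same multiplication as ASK, with the bits mapped onto ±1; a minimal sketch (same arbitrary parameters as the ASK sketch above):

% BPSK: phase 0 or pi, i.e. multiply the carrier by +/-1
bits = [1 0 1 0 0 1 0 1 1];
fs = 10000; Tb = 0.01; fc = 1000;
t  = 0:1/fs:numel(bits)*Tb - 1/fs;
f  = 2*bits(floor(t/Tb) + 1) - 1;     % map 0 -> -1, 1 -> +1
s  = f .* cos(2*pi*fc*t);             % same bandwidth as ASK
plot(t, s);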
2.2.3 Spread spectrum techniques
Spread-spectrum techniques are methods in which energy generated at a single frequency is deliberately spread over a wide band of frequencies. This is done for a variety of reasons, including increasing resistance to natural interference or jamming, and preventing hostile detection.

Figure 10: Frequency shift keying: frequency domain

Figure 11: Phase shift keying

Figure 12: Frequency hopping spread spectrum technique

We shall not delve deeply into mechanisms, but shall look at one particular technique called frequency hopping, as shown in Fig. 12. In frequency hopping, the bandwidth is effectively split into frequency channels. The signal is then spread across the channels. The hop set (channel hopping sequence) is not arbitrary, but determined by the use of a pseudo-random sequence. The receiver can reproduce the identical hop set and so decode the signal. The hop rate (the rate at which the signal switches channels) can be thousands of times a second, so the dwell time (time spent on one channel) is very short. If the hop set is generated by a pseudo-random number generator, then the seed to that generator is effectively a key for decoding the transmitted message, and so this technique has obvious security applications, for instance military use or in mobile phone systems.
2.2.4 Digital demodulation
From the discussion above it might appear that QPSK offers advantages over ASK, FSK and PSK. However, the demodulation of these signals involves varying degrees of difficulty, and hence expense. The method of demodulation is an important factor in determining the selection of a modulation scheme. There are two types of demodulation, distinguished by the need to provide knowledge of the phase of the carrier. Demodulation schemes requiring the carrier phase are termed coherent. Those that do not need knowledge of the carrier phase are termed incoherent. Incoherent demodulation can be applied to ASK and wideband FSK. It describes demodulation schemes that are sensitive only to the power in the signal. With ASK, the power is either present, or it is not. With wideband FSK, the power is either present at one frequency, or the other. Incoherent demodulation is inexpensive but has poorer performance. Coherent demodulation requires more complex circuitry, but has better performance.
In ASK incoherent demodulation, the signal is passed to an envelope detector. This is a device that produces as output the outline of the signal. A decision is made as to whether the signal is present or not. Envelope detection is the simplest and cheapest method of demodulation. In optical communications, phase modulation is technically very difficult, and ASK is the only option. In the electrical and microwave context, however, it is considered crude. In addition, systems where the signal amplitude may vary unpredictably, such as microwave links, are not suitable for ASK modulation.
Incoherent demodulation can also be used for wideband FSK. Here the signal is passed to two circuits, each sensitive to one of the two carrier frequencies. Circuits whose output depends on the frequency of the input are called discriminators, or filters. The outputs of the two discriminators are interrogated to determine the signal. Incoherent FSK demodulation is simple and cheap, but very wasteful of bandwidth. The signal must be wideband FSK to ensure the two signals f₀(t) and f₁(t) are distinguished. It is used in circumstances where bandwidth is not the primary constraint.
With coherent demodulation systems, the incoming signal is compared with a replica of the carrier wave. This is obviously necessary with PSK signals, because here the power in the signal is constant. The difficulty with coherent detection is the need to keep the phase of the replica signal, termed the local oscillator, 'locked' to the carrier. This is not easy to do. Oscillators are sensitive to (among other things) temperature, and a 'free-running' oscillator will gradually drift in frequency and phase.
Another way to demodulate the signal is to multiply the incoming signal by a replica of the carrier. If the output of this process is h(t), we have

h(t) = f(t) cos(ω_c t) cos(ω_c t) = (f(t)/2)[1 + cos(2ω_c t)] = f(t)/2 + (f(t)/2) cos(2ω_c t)

i.e. the original signal plus a term at twice the carrier frequency. By removing, or filtering out, the harmonic term, the output of the demodulation is the modulation f(t). Suppose there is some phase error φ present in the local oscillator signal. The demodulator output is then

h(t) = f(t) cos(ω_c t) cos(ω_c t + φ) = (f(t)/2) cos(φ) + (f(t)/2) cos(2ω_c t + φ)

so that after filtering the recovered signal is (f(t)/2) cos(φ). As the phase error drifts towards π/2, the consequence for the correct interpretation of the demodulated signal is clearly catastrophic. Therefore, more sophisticated methods such as differential phase-shift-keying (DPSK) have to be introduced to resolve the issue.
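The effect of local-oscillator phase error can be seen directly in simulation. In this sketch a simple moving average stands in for the low-pass filter, and all parameters are illustrative:

% Coherent demodulation of BPSK with a phase error phi in the local oscillator
fs = 10000; Tb = 0.01; fc = 1000;
bits = [1 0 1 0 0 1 0 1 1];
t   = 0:1/fs:numel(bits)*Tb - 1/fs;
f   = 2*bits(floor(t/Tb) + 1) - 1;        % +/-1 message
s   = f .* cos(2*pi*fc*t);                % BPSK signal
phi = pi/3;                               % phase error: try 0, pi/3, pi/2 ...
h   = s .* cos(2*pi*fc*t + phi);          % mixer output: f/2*cos(phi) plus a 2fc term
win = round(fs/fc);                       % average over one carrier cycle
hf  = filter(ones(1,win)/win, 1, h);      % crude low-pass filter removes the 2fc term
plot(t, hf);                              % recovered amplitude is scaled by cos(phi)/2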
2.2.5 Noise in communication systems: probability and random signals
Noise plays a crucial role in communication systems. In theory, it determines the theoretical capacity of the channel. In practice, it determines the number of errors occurring in a digital communication. We shall consider how noise determines the error rates in the next subsection. In this subsection we shall provide a description of noise.
Noise is a random signal. By this we mean that we cannot predict its value. We can only make statements about the probability of it taking a particular value, or range of values. The probability density function (pdf) p(x) of a random signal, or random variable, x is defined via the probability that x takes a value between x₀ and x₀ + δx. We write this as follows:

p(x₀)δx = P(x₀ < x < x₀ + δx)
The probability that the random variable will take a value lying between x₁ and x₂ is then the integral of the pdf over the interval [x₁, x₂]:

P(x₁ < x < x₂) = ∫_{x₁}^{x₂} p(x) dx

The probability P(−∞ < x < ∞) is unity. Thus

∫_{−∞}^{∞} p(x) dx = 1

A density satisfying the equation above is termed normalised. The cumulative distribution function (CDF) P(x₀) is defined to be the probability that the random variable x is less than x₀:

P(x₀) = P(x < x₀) = ∫_{−∞}^{x₀} p(x) dx

From the rules of integration:

P(x₁ < x < x₂) = P(x₂) − P(x₁)
Some commonly used distributions are:
1. Continuous distributions. An example of a continuous distribution is the Normal, or Gaussian, distribution:

p(x) = (1/(σ√(2π))) exp(−(x − m)²/(2σ²)) (2.19)

where m is the mean value of p(x) and σ is the standard deviation of the distribution (see Fig. 13). The constant term ensures that the distribution is normalised. This expression is important as many actually occurring noise sources can be described by it, i.e. white noise. Further, we can simplify the expression by considering the source to be a zero-mean random variable, i.e. m = 0.
How would this be used? If we want to know the probability of, say, the noise signal n(t) lying in the interval [−v₁, v₁], we would evaluate

P(v₁) − P(−v₁)

In general, to evaluate P(−x₁ < x < x₁), if we use the substitution

u = x/(√2 σ), dx = √2 σ du

then we have (for m = 0)

P(−x₁ < x < x₁) = (1/√π) ∫_{−u₁}^{u₁} exp(−u²) du = (2/√π) ∫₀^{u₁} exp(−u²) du
Figure 13: Gaussian distribution pdf.
where u₁ = x₁/(√2 σ). The distribution function P(x) is usually written in terms of the error function erf(x). The complementary error function erfc is defined by

erfc(x) = 1 − erf(x)

2. Discrete distributions. Probability density functions need not be continuous. If a random variable can only take discrete values, its pdf takes the form of lines (see Fig. 14). An example of a discrete distribution is the Poisson distribution

p(n) = P(x = n) = (αⁿ/n!) exp(−α)

where n = 0, 1, 2, ···.
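Returning to the Gaussian case, MATLAB provides erf and erfc directly, so the interval probability above is easy to check; a sketch for assumed values σ = 1 and x₁ = 1.5:

% P(-x1 < x < x1) for a zero-mean Gaussian with standard deviation sigma
sigma = 1; x1 = 1.5;                           % illustrative values
Pa = erf(x1/(sqrt(2)*sigma));                  % closed form of the integral above
p  = @(x) exp(-x.^2/(2*sigma^2))/(sqrt(2*pi)*sigma);
Pn = integral(p, -x1, x1);                     % numerical check against the pdf
fprintf('erf: %.6f   numerical: %.6f\n', Pa, Pn);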
We cannot predict the value a random variable may take on a particular occasion, but we can introduce measures that summarise what we expect to happen on average. The two most important measures are the mean (or expectation) and the standard deviation.
The mean η of a random variable x is defined to be

η = ∫ x p(x) dx

or, for a discrete distribution,

η = Σ n p(n)

In the examples above we have assumed the mean of the Gaussian distribution to be 0, while the mean of the Poisson distribution is found to be α. The mean of a distribution is, in common usage, the average value taken by the corresponding random variable.
Figure 14: Discrete distribution pdf (Poisson).
The variance σ² is defined to be

σ² = ∫ (x − η)² p(x) dx

or, for a discrete distribution,

σ² = Σ (n − η)² p(n)

The square root of the variance is called the standard deviation. The standard deviation is a measure of the spread of the probability distribution around the mean. A small standard deviation means the distribution is concentrated about the mean. A large value indicates a wide range of possible outcomes. The Gaussian distribution contains the standard deviation within its definition. The Poisson distribution has a variance of α, and hence a standard deviation of √α.
In many cases the noise present in communication signals can be modelled as a zero-mean Gaussian random variable. This means that its amplitude at a particular time has a pdf given by Eq. (2.19) above. The statement that noise is zero-mean says that, on average, the noise signal takes the value zero. We have already seen that the signal-to-noise ratio is an important quantity in determining the performance of a communication channel. The noise power referred to in the definition is the mean noise power. The SNR can therefore be rewritten as

SNR = 10 log₁₀(S/σ²)

If only thermal noise is considered, we have σ² = kT_m B, where k is Boltzmann's constant (k = 1.38 × 10⁻²³ J/K), T_m is the temperature and B is the receiver bandwidth.
Figure 15: Schematic of noise on a two level line
2.2.6 Errors in digital communication
We noted earlier that one of the most important advantages of digital communications is that it permits very high fidelity. In this subsection we shall investigate this more closely. We shall consider in detail only BPSK systems, and comment on the alternative modulations.
In the absence of noise, the signal V from a BPSK system can take one of two values, ±v_b. In the ideal case, if the signal is greater than 0, the value that is read is assigned to 1; if the signal is less than 0, the value that is read is assigned to 0. When noise is present, this distinction between ±v_b (with the threshold at 0) becomes blurred. There is a finite probability of the signal dropping below 0, and thus being assigned 0, even though a 1 was transmitted. When this happens, we say that a bit-error has occurred. The probability that a bit-error will occur in a given time is referred to as the bit-error rate (BER) (see Fig. 15).
We suppose that the signal V, which has the signal levels ±v_b, is combined with noise N of variance σ². The probability that an error will occur in the transmission of a 1 is

P(N + v_b < 0) = P(N < −v_b) = (1/√π) ∫_{−∞}^{−v_b/(√2σ)} exp(−u²) du = (1/2) erfc(v_b/(√2 σ))

Similarly, the probability that an error will occur in the transmission of a 0 is

P(N − v_b > 0) = P(N > v_b) = (1/√π) ∫_{v_b/(√2σ)}^{∞} exp(−u²) du = (1/2) erfc(v_b/(√2 σ))
It is usual to write these expressions in terms of the ratio of E_b (energy per bit) to E_n (noise power per unit Hz). The power S in the signal is, on average, v_b², and the total energy in the signalling period T is v_b²T. Using the expressions above, we have

(1/2) erfc(v_b/(√2 σ)) = (1/2) erfc(√(E_b/(2T E_n B)))

where we have used the fact that σ² = kT_m B = E_n B for temperature T_m.
For BPSK, the signalling period T is half the reciprocal of the bandwidth B, i.e. T = 1/(2B); thus

P(error) = (1/2) erfc(√(E_b/E_n)) (2.20)

Figure 16: Expressions for error rates in some modulation schemes
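Eq. (2.20) can be checked by direct Monte Carlo simulation; a sketch (the sample size and E_b/E_n range are arbitrary choices):

% Simulated BER of BPSK vs the theoretical 0.5*erfc(sqrt(Eb/En))
EbEn_dB = 0:2:10; nbits = 1e6;
ber = zeros(size(EbEn_dB));
for k = 1:numel(EbEn_dB)
    EbEn  = 10^(EbEn_dB(k)/10);
    vb    = 1;                                 % signal levels +/- vb
    sigma = vb/sqrt(2*EbEn);                   % since Eb/En = vb^2/(2*sigma^2)
    tx    = 2*(rand(1,nbits) > 0.5) - 1;       % random +/-1 bits
    rx    = tx + sigma*randn(1,nbits);         % add zero-mean Gaussian noise
    ber(k) = mean(sign(rx) ~= tx);             % decision threshold at zero
end
semilogy(EbEn_dB, ber, 'o', EbEn_dB, 0.5*erfc(sqrt(10.^(EbEn_dB/10))), '-');
xlabel('E_b/E_n (dB)'); ylabel('P(error)');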
All coherent detection schemes give rise to error rates of the form in Eq. (2.20) above. For example, QPSK has twice the error probability of BPSK, reflecting the fact that with a quadrature scheme there are more ways an error can occur. Narrowband FSK has an error probability rather worse than QPSK, although its numerical value depends on the exact scheme used. Fig. 17 shows graphs of P(error) for incoherent ASK, incoherent FSK, BPSK, and DPSK; the expressions are given in Fig. 16.
Incoherent demodulation schemes have a higher probability of error than coherent schemes. Incoherent schemes are forms of power detection, i.e. they produce an output proportional to the square of the input. Power detection always decreases the SNR. It is quite easy to see why this is so.
Suppose the input, X, is of the form X = v + N, as before. The input SNR is

SNR_in = v²/N²

If we square the input, the output is

X² = (v + N)² = v² + 2vN + N²
Figure 17: Comparison of error rates in some modulation schemes
Assuming the SNR is high, vN ≫ N², and the SNR of the output is

SNR_out ∼ (v²)²/(2vN)² = SNR_in/4

This decrease in the signal-to-noise ratio causes an increase in the error probability. The detailed analysis is beyond our scope. Although poorer, the performance of incoherent schemes is nonetheless good. This explains the widespread use of incoherent ASK and FSK.
Error rates are usually quoted as bit error rates (BER). The conversion from error probability to BER is numerically simple: BER = P(error). However, this conversion assumes that the probabilities of errors from bit to bit are independent. This may or may not be a reasonable assumption. In particular, loss of timing can cause multiple bit failures that dramatically increase the BER.
When signals travel along the channel, they are attenuated. As the signal loses power, the BER increases with the length of the channel. Regenerators, placed at regular intervals, can dramatically reduce the error rate over long channels. To determine the BER of a channel with N regenerators, it is simplest to calculate first the probability of no error. This is the probability of no error over one regenerator section, raised to the Nth power:

P(no error over N regenerators) = (1 − P(error))^N

assuming the regenerators are regularly spaced and the probabilities are independent. The BER is then determined simply by

P(error over N regenerators) = 1 − P(no error over N regenerators)

This avoids having to enumerate all the ways in which the multiple system can fail.
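As a numerical illustration (the per-section error probability here is an assumed value):

% BER over a chain of N regenerators, assuming independent, identical sections
p   = 1e-6;                        % per-section error probability (assumed)
N   = 100;                         % number of regenerator sections
ber = 1 - (1 - p)^N;               % 1 - P(no error anywhere)
fprintf('BER over %d sections: %.3e (compare N*p = %.3e)\n', N, ber, N*p);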
Figure 18: The received signal could be 1001010 or 11000011001100.
2.2.7 Timing control in digital communication
In addition to providing the analogue modulation and demodulation functions, digital communication also requires timing control. Timing control is required to identify the rate at which bits are transmitted and to identify the start and end of each bit. This permits the receiver to correctly identify each bit in the transmitted message. Bits are never sent individually. They are grouped together in segments, called blocks. A block is the minimum segment of data that can be sent with each transmission. Usually, a message will contain many such blocks. Each block is framed by binary characters identifying the start and end of the block.
The type of method used depends on the source of the timing information. If the timing in the receiver is generated by the receiver, separately from the transmitter, the transmission is termed asynchronous. If the timing is generated, directly or indirectly, from the transmitter clock, the transmission is termed synchronous.
Asynchronous transmission is used for low data-rate transmission and stand-alone equipment. We will not discuss it in detail here. Synchronous transmission is used for high data-rate transmission. The timing is generated by sending a separate clock signal, or by embedding the timing information into the transmission. This information is used to synchronise the receiver circuitry to the transmitter clock. The necessity of introducing a clock in signal transmission is obvious if we look at Fig. 18. Without a clock, would we be able to tell whether it is a 1001010 or a 11000011001100?
Synchronous receivers require a timing signal from the transmitter. An additional channel may be used in the system to transmit the clock signal. This is wasteful of bandwidth, and it is more customary to embed the timing signal within the transmitted data stream by use of suitable encoding (self-clocking encoding).
Figure 19: Bipolar coding.
In bipolar coding, a binary 0 is encoded as zero volts. A binary 1 is encoded alternately as a positive voltage and a negative voltage (see Fig. 19). Other systems must synchronise using some form of out-of-band communication, or add frame synchronisation sequences that don't carry data to the signal. These alternative approaches require either an additional transmission medium for the clock signal or a loss of performance due to overhead, respectively. Bipolar encoding is often a good compromise: runs of ones will not cause a lack of transitions; however, long sequences of zeros are still an issue. Since the signal does not change while the data being sent is zero, long runs of zeros produce no transitions and hence a loss of synchronisation. Where frequent transitions are a requirement, a self-clocking encoding such as the Manchester code discussed below may be more appropriate.
Manchester code (also known as Phase Encoding, or PE) is a form of data communications in which each bit of data is signified by at least one voltage level transition. Manchester encoding is therefore considered to be self-clocking, which means that accurate synchronisation of a data stream is possible. Manchester coding has been adopted into many efficient and widely used telecommunications standards, such as Ethernet.
Here is a summary of Manchester code:
• Data and clock signals are combined to form a single self-synchronising data stream;
• each encoded bit contains a transition at the midpoint of a bit period;
• the direction of the transition determines whether the bit is a "0" or a "1"; and
• the first half is the true bit value and the second half is the complement of the true bit value (see Fig. 20).
Figure 20: Manchester coding.
Manchester codes always have a transition at the middle of each bit period. The direction of the mid-bit transition is what carries the data, with a low-to-high transition indicating one binary value, and a high-to-low transition indicating the other. Transitions that do not occur mid-bit do not carry useful information, and exist only to place the signal in a state where the necessary mid-bit transition can take place. Though this allows the signal to be self-clocking, it essentially doubles the bandwidth.
Today, however, there are many more sophisticated codes (e.g. 8B/10B encoding) which accomplish the same aims with less bandwidth overhead and less synchronisation ambiguity in pathological cases.
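A minimal Manchester encoder in MATLAB, following the convention in the bullets above (first half carries the true bit value; the bit pattern is an arbitrary choice):

% Manchester encoding: each bit becomes two half-bit levels with a mid-bit transition
bits = [1 0 1 1 0 0 1];
enc  = zeros(1, 2*numel(bits));
for k = 1:numel(bits)
    if bits(k) == 1
        enc(2*k-1:2*k) = [ 1 -1];   % 1: true value first, then its complement
    else
        enc(2*k-1:2*k) = [-1  1];   % 0: the opposite transition
    end
end
stairs(0:0.5:numel(bits)-0.5, enc); ylim([-1.5 1.5]);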
3 Information and coding theory
Information theory is concerned with the description of information sources, the representation of the information from a source, and the transmission of this information over a channel. It is perhaps the best example of how a deep mathematical theory can be successfully applied to solving engineering problems.
Information theory is a discipline in applied mathematics involving the quantification of data, with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel. The measure of data, known as information entropy, is usually expressed by the average number of bits needed for storage or communication.
Applications of fundamental topics of information theory include ZIP files (lossless data compression), MP3s (lossy data compression), and DSL (channel coding). The field is at the crossroads of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.
Information theory is generally considered to have been founded in 1948 by Claude Shannon in his seminal work, A Mathematical Theory of Communication. The central paradigm of classic information theory is the engineering problem of the transmission of information over a noisy channel. The most fundamental results of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by its entropy; and Shannon's noisy-channel coding theorem, which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold called the channel capacity. The channel capacity can be approached by using appropriate encoding and decoding systems.
Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory, which is the main focus of our course.
Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the maximum possible for that channel. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes is cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis.
3.1 Information sources and entropy
We start our examination of information theory by way of an example.
Consider predicting the activity of the Prime Minister tomorrow. This prediction is an information source. Assume this information source has two outcomes:
• The Prime Minister will be in his office;
• The Prime Minister will be naked and run 10 miles in London.
Clearly, the outcome 'in office' contains little information; it is a highly probable outcome. The outcome 'naked run', however, contains considerable information; it is a highly improbable event.
A measure of the information contained in an outcome was introduced by Hartley in 1927. He defined the information contained in an outcome xᵢ as

I(xᵢ) = −log₂ p(xᵢ)

This measure satisfies our requirement that the information contained in an outcome is proportional to its uncertainty. If P(xᵢ) = 1, then I(xᵢ) = 0, telling us that a certain event contains no information.
The definition above also satisfies the requirement that the total information in independent events should add. Clearly, our Prime Minister prediction for two days contains twice as much information as that for one day. For two independent outcomes xᵢ and xⱼ,

I(xᵢ and xⱼ) = −log₂ P(xᵢ and xⱼ) = −log₂ P(xᵢ)P(xⱼ) = I(xᵢ) + I(xⱼ)
Hartley's measure defines the information in a single outcome. The measure entropy H(X) defines the information content of the source X as a whole. It is the mean information provided by the source. We have

H(X) = Σᵢ P(xᵢ) I(xᵢ) = −Σᵢ P(xᵢ) log₂ P(xᵢ)

A binary symmetric source (BSS) is a source with two outputs whose probabilities are p and 1 − p respectively. The Prime Minister source discussed above is a BSS. The entropy of the source is

H(X) = −p log₂ p − (1 − p) log₂(1 − p)
The function (Fig. 21) takes the value zero when p = 0. When one outcome is certain, so is the
other, and the entropy is zero. As p increases, so too does the entropy, until it reaches a maximum
when p = 1 − p = 0.5. When p is greater than 0.5, the curve declines symmetrically to zero,
reached when p = 1. We conclude that the average information in the BSS is maximised when
both outcomes are equally likely. The entropy is measuring the average uncertainty of the source.
Figure 21: Entropy vs. p
(The term entropy is borrowed from thermodynamics. There too it is a measure of the uncertainty, or disorder, of a system.)
When p = 0.5, H(X) = 1. The unit of entropy is bits/symbol. An equally probable BSS has an entropy, or average information content per symbol, of 1 bit per symbol.
By long tradition, engineers have used the word bit to describe both the symbol and its information content. A BSS whose outputs are 1 or 0 has an output we describe as a bit. The entropy of a source is also measured in bits, so that we might say the equiprobable BSS has an information rate of 1 bit/bit. The numerator bit refers to the information content, while the denominator bit refers to the symbol 1 or 0. We can avoid this by writing it as 1 bit/symbol. When p ≠ 0.5, the BSS information rate falls. When p = 0.1, H(X) = 0.47 bits/symbol. This means that on average, each symbol (1 or 0) of the source output provides 0.47 bits of information.
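The entropy curve of Fig. 21 and the value H(0.1) quoted above take a few lines of MATLAB:

% Entropy of a binary symmetric source
H = @(p) -p.*log2(p) - (1-p).*log2(1-p);
p = linspace(0.001, 0.999, 500);            % avoid p = 0 and p = 1 exactly
plot(p, H(p)); xlabel('p'); ylabel('H(X)');
fprintf('H(0.1) = %.2f bits/symbol\n', H(0.1));   % prints 0.47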
3.2 Information source coding
It seems intuitively reasonable that an information source of entropy H needs on average only H binary bits to represent each symbol. Indeed, the equiprobable BSS generates on average 1 information bit per symbol bit. However, consider the Prime Minister example again. Suppose the probability of 'naked run' is 0.1 (N) and that of 'office' is 0.9 (O). We have already noted that this source has an entropy of 0.47 bits/symbol. Suppose we identify 'naked run' with 1 and 'office' with 0. This representation uses 1 binary bit per symbol, hence it is using more binary bits per symbol than the entropy suggests is necessary.
Shannon's first theorem states that an instantaneous code can be found that encodes a source of entropy H(X) with an average number of bits per symbol B_s such that

B_s ≥ H(X)

Ordinarily, the longer the sequence of symbols, the closer B_s will be to H(X).

Sequence                 OOO    OON    ONO    NOO    NNO    NON    ONN    NNN
Probability              0.729  0.081  0.081  0.081  0.009  0.009  0.009  0.001
Codeword                 0      1      01     10     11     00     000    111
Codeword length (bits)   1      1      2      2      2      2      3      3

Weighted length: 1.2 bits/sequence.  Entropy: 1.407 bits/sequence.

Table 1: Variable length source coding
The replacement of the symbols naked run/office with a binary representation is termed source coding. In any coding operation we replace the symbol with a codeword. The purpose of source coding is to reduce the number of bits required to convey the information provided by the information source: to minimise the average length of codes.
Central to source coding is the use of sequences. By this, we mean that codewords are not simply associated with a single outcome, but with a sequence of outcomes. To see why this is useful, let us return to the problem of the Prime Minister. Suppose we group the outcomes in threes, according to their probability, and assign binary codewords to these grouped outcomes. Table 1 shows such a code, and the probability of each codeword occurring. The entropy of each three-symbol sequence is

−[0.729 log₂(0.729) + 3 × 0.081 log₂(0.081) + 3 × 0.009 log₂(0.009) + 0.001 log₂(0.001)] = 1.4070 bits

while the average length of the code is

0.729×1 + 0.081×1 + 2×0.081×2 + 2×0.009×2 + 3×0.009 + 3×0.001 = 1.2 bits/sequence
This example shows how using sequences permits us to decrease the average number of bits per symbol. Moreover, without difficulty, we have found a code that has an average bit usage less than the source entropy. However, there is a difficulty with the code in Table 1. Before a code can be decoded, it must be parsed. Parsing describes the activity of breaking the message string into its component codewords. After parsing, each codeword can be decoded into its symbol sequence. An instantaneously parsable code is one that can be parsed as soon as the last bit of a codeword is received. An instantaneous code must satisfy the prefix condition: no codeword may be a prefix of any other codeword. This condition is not satisfied by the code in Table 1.
3.2.1 Huffman coding
The code in Table 2, however, is an instantaneously parsable code: it satisfies the prefix condition.

Sequence                 A      B      C      D      E      F      G      H
Probability              0.729  0.081  0.081  0.081  0.009  0.009  0.009  0.001
Codeword                 1      011    010    001    00011  00010  00001  00000
Codeword length (bits)   1      3      3      3      5      5      5      5

Weighted length: 1.598 bits/sequence.  Entropy: 1.407 bits/sequence.

Table 2: OOO=A, OON=B, ONO=C, NOO=D, NNO=E, NON=F, ONN=G, NNN=H

As a consequence of Shannon's source coding theorem, the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights. The code in Table 2 uses

0.729×1 + 0.081×3×3 + 0.009×5×3 + 0.001×5 = 1.5980

bits per sequence, only slightly larger than the sequence entropy of 1.407 bits. So not only is this code optimal in the sense that no other feasible code performs better, it is also very close to the theoretical limit established by Shannon. In fact, this is the Huffman code for the sequence set. We might conclude that there is little point in expending effort on finding a better code. The codeword for each sequence is found by generating the Huffman code tree for the sequence. A Huffman code tree is an unbalanced binary tree.
History
In 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and quickly proved this method the most efficient.
In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code. Huffman avoided the major flaw of the suboptimal Shannon–Fano coding by building the tree from the bottom up instead of from the top down.
Problem definition
Given a set of symbols and their probabilities, find a prefix-free binary code with minimum expected codeword length.
The derivation of the Huffman code tree is shown in Fig. 22 and the tree itself is shown in Fig. 23. In both these figures, the letters A to H have been used in place of the sequences in Table 2 to make them easier to read.
Note that Huffman coding relies on the use of bit patterns of variable length. In most data communication systems, the data symbols are encoded as bit patterns of a fixed length, e.g. 8 bits. This is done for technical simplicity. Often, coding schemes such as Huffman coding are used on a source symbol set to produce variable bit-length coding, and are referred to as compression algorithms.
In Fig. 22 the sequences are ordered with respect to the probability of the sequence occurring, with the highest probability at the top of the list. The tree is derived bottom-up, in terms of branch nodes and leaf nodes, by combining probabilities and removing leaf nodes in progressive stages.
Figure 22: Example derivation of a Huffman code tree
Figure 23: Example Huffman code tree
As shown in Fig. 22, the two lowest leaf nodes G and H have their weights added, and the topmost of the pair is labelled with a 1 and the lower one with a 0. In the next stage the symbols G and H are represented by a single node carrying their combined weight, and the list is rewritten, again in order of the weights. The two lowest leaf nodes are now E and F; they are labelled 1 and 0, respectively, and their weights are added to be taken on to the next stage. This continues until only two nodes remain. The Huffman tree shown in Fig. 23 is then produced by following backwards along the arrows. To derive the codewords from the tree, descend from the top node, and list the 1s and 0s in the order they appear until you reach the leaf node for one of the letters.
We summarise the procedure here.
Creating the tree:
1. Start with as many leaves as there are symbols.
2. Queue all leaf nodes into the first queue (in order).
3. While there is more than one node in the queues:
• Remove the two nodes with the lowest weight from the queues.
• Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
• Update the parent links in the two just-removed nodes to point to the just-created parent node.
• Queue the new node into the second queue.
4. The remaining node is the root node; the tree has now been generated.
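The codeword lengths (and hence the average length) can also be computed without building the tree explicitly, by repeatedly merging the two lowest-weight nodes and adding one bit to every symbol beneath them. A sketch reproducing the numbers for Table 2:

% Huffman codeword lengths by repeated merging of the two lowest-weight nodes
p      = [0.729 0.081 0.081 0.081 0.009 0.009 0.009 0.001];   % weights of A..H
lens   = zeros(size(p));           % codeword length of each symbol
groups = num2cell(1:numel(p));     % symbols contained under each active node
w      = p;                        % weight of each active node
while numel(w) > 1
    [w, idx] = sort(w); groups = groups(idx);    % ascending by weight
    merged = [groups{1}, groups{2}];             % symbols under the two lowest nodes
    lens(merged) = lens(merged) + 1;             % each gains one more bit
    w = [w(1)+w(2), w(3:end)];                   % replace the pair by their parent
    groups = [{merged}, groups(3:end)];
end
avglen  = sum(p .* lens)           % 1.5980 bits/sequence
entropy = -sum(p .* log2(p))       % 1.4070 bits/sequence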
3.3 Channel Capacity
One of the most famous results of information theory is Shannon's channel coding theorem. For a given channel there exists a code that will permit error-free transmission across the channel at a rate R, provided R < C, the channel capacity. Equality is achieved only when the SNR is infinite.
As we have already noted, the astonishing part of the theory is the existence of a channel capacity. Shannon's theorem is both tantalizing and frustrating. It offers error-free transmission, but it makes no statement as to what code is required. In fact, all we may deduce from the proof of the theorem is that it must be a long one. No one has yet found a code that permits the use of a channel at its capacity. However, Shannon has thrown down the gauntlet, in as much as he has proved that the code exists.
We shall not give a description of how the capacity is calculated. However, an example is instructive. The binary channel is a channel with a binary input and output. Associated with each output is a probability p that the output is correct, and a probability 1 − p that it is not. For such a channel, the channel capacity turns out to be

C = 1 + p log_2 p + (1 − p) log_2 (1 − p)
Figure 24: Channel capacity C of the binary channel as a function of the bit error probability p.
Here (Fig. 24), p is the bit error probability. If p = 0, then C = 1. If p = 0.5, then C = 0. Thus
if there is an equal probability of receiving a 1 or a 0, irrespective of the signal sent, the channel is
completely unreliable and no message can be sent across it.
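The curve of Fig. 24 is easily reproduced; a quick Matlab sketch:

% Evaluate and plot the binary channel capacity C(p).
p = linspace(1e-6, 1 - 1e-6, 200);        % avoid log2(0) at the endpoints
C = 1 + p.*log2(p) + (1-p).*log2(1-p);    % capacity per channel use
plot(p, C); xlabel('p'); ylabel('C'); grid on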
So defined, the channel capacity is a dimensionless number. We normally quote the capacity as a rate, in bits/second. To do this we relate each output to a change in the signal. For a channel of bandwidth B, we can transmit at most 2B changes per second. Thus the capacity in bits/second is 2BC. For the binary channel we have

C = 2B[1 + p log_2 p + (1 − p) log_2 (1 − p)]

For the binary channel the maximum bit rate W is 2B. We note that C < W, i.e. the capacity is always less than the bit rate. The data rate D, or information rate, describes the rate of transfer of data bits across the channel. In theory we have

W > C > D
Shannon's channel coding theorem applies to the channel, not to the source. If the source is optimally coded, we can rephrase the channel coding theorem: a source of information with entropy H(X) can be transmitted error-free over a channel provided H(X) ≤ C.
3.4 Error detection coding
3.4.1 Hamming distance
The task of source coding is to represent the source information with the minimum number of symbols. When a code is transmitted over a channel in the presence of noise, errors will occur. The task of channel coding is to represent the source information in a manner that minimises the probability of errors in decoding.
It is apparent that channel coding requires the use of redundancy. If all possible outputs of the channel correspond uniquely to a source input, there is no possibility of detecting errors in the transmission. To detect, and possibly correct, errors, the channel code sequence must be longer than the source sequence. The rate R of a channel code is the average ratio of the source sequence length to the channel code length. Thus R < 1.
A good channel code is designed so that, if a few errors occur in transmission, the output can still be decoded to the correct input. This is possible because, although incorrect, the output is sufficiently similar to the input to be recognisable. The idea of similarity is made more precise by the definition of the Hamming distance. Let x and y be two binary sequences of the same length. The Hamming distance between these two sequences is the number of symbols that disagree.
For example:
• The Hamming distance between 1011101 and 1001001 is 2.
• The Hamming distance between 2143896 and 2233796 is 3.
• The Hamming distance between "toned" and "roses" is 3.
Suppose the code x is transmitted over the channel. Due to errors, y is received. The decoder will assign to y the code x that minimises the Hamming distance between x and y. For example, consider the codewords

a = (100000), b = (011000), c = (000111)

If the transmitter sends 100000 but there is a single bit error and the receiver gets 100001, it can be seen that the nearest codeword is in fact 100000, and so the correct codeword is found.
It can be shown that to detect n bit errors, a coding scheme requires the use of codewords with a Hamming distance of at least n + 1. It can also be shown that to correct n bit errors requires a coding scheme with a Hamming distance of at least 2n + 1 between the codewords.
By designing a good code, we try to ensure that the Hamming distance between the possible codewords x is larger than the Hamming distance arising from errors.
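A small Matlab sketch of this nearest-codeword decoding, using the codewords of the example above (it assumes a Matlab recent enough for implicit expansion):

% Hamming distances and nearest-codeword decoding.
codebook = [1 0 0 0 0 0;                  % a
            0 1 1 0 0 0;                  % b
            0 0 0 1 1 1];                 % c
r = [1 0 0 0 0 1];                        % received: a with one bit error
d = sum(codebook ~= r, 2)                 % Hamming distance to each codeword
[~, nearest] = min(d)                     % decodes to codeword 1, i.e. a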
3.4.2 Parity Check Codes
The theoretical limitations of coding are set by the results of information theory. These results are frustrating in that they offer little clue as to how the coding should be performed. Errors occur; they must, at the very least, be detected.
Error detection coding is designed to permit the detection of errors. Once detected, the receiver may ask for a retransmission of the erroneous bits, or it may simply inform the recipient that the transmission was corrupted. In a binary channel, error checking codes are called parity check codes. Practical codes are normally block codes. A block code converts a fixed length of K data bits to a fixed length N code word, where N > K. The rate of the code is the ratio K/N, and the redundancy of the code is 1 − K/N. Our ability to detect errors depends on the rate. A low rate has a high detection probability, but a high redundancy. The receiver will assign to the received codeword the preassigned codeword that minimises the Hamming distance between the two words. If we wish to identify any pattern of n or fewer errors, the Hamming distance between the preassigned codes must be n + 1 or greater.
A very common code is the single parity check code. This code appends to each K data bits an additional bit whose value is chosen to make the K + 1 bit word have even or odd parity. Such a choice is said to have even (odd) parity. With even (odd) parity, a single bit error will make the received word odd (even). The preassigned code words are always even (odd), and hence are separated by a Hamming distance of 2 or more.
To see how the addition of a parity bit can improve error performance, consider the following example. A common choice of K is eight. Suppose that the BER is p = 10^{-4}. Then

P(single bit error) = p
P(no error in a single bit) = 1 − p
P(no error in 8 bits) = (1 − p)^8
P(undetected error in 8 bits) = 1 − (1 − p)^8 ≈ 8.0 × 10^{-4}

So, the probability of a transmission with an error is as above. With the addition of a parity bit we can detect any single bit error, so:

P(no error in a single bit) = 1 − p
P(no error in 9 bits) = (1 − p)^9
P(single error in 9 bits) = 9 P(single bit error) P(no error in the other 8 bits) = 9p(1 − p)^8
P(undetected error in 9 bits) = 1 − P(no error in 9 bits) − P(single error in 9 bits)
= 1 − (1 − p)^9 − 9p(1 − p)^8 ≈ 3.6 × 10^{-7}

As can be seen, the addition of a parity bit has reduced the undetected error rate by three orders of magnitude.
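The two probabilities are easily checked numerically; a short Matlab verification:

% Numerical check of the parity-bit calculation above, with p = 1e-4.
p = 1e-4;
P_err8 = 1 - (1-p)^8;                     % undetected error, 8 bits, no parity
P_err9 = 1 - (1-p)^9 - 9*p*(1-p)^8;       % undetected error, 9 bits with parity
fprintf('without parity: %.1e   with parity: %.1e\n', P_err8, P_err9)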
Single parity bits are common in asynchronous, character oriented transmission. Where synchronous transmission is used, additional parity symbols are added that check not only the parity of each 8 bit row, but also the parity of each 8 bit column. The column is formed by listing each successive 8 bit word one beneath the other. This type of parity checking is called block sum checking, and it can correct any single bit error in the transmitted block of rows and columns. However, there are some combinations of errors that will go undetected in such a scheme (see Table 3).
Parity checking in this way provides good protection against single and multiple errors when the errors are independent. However, in many circumstances, errors occur in groups, or bursts. Parity checking of the kind just described then provides little protection. In these circumstances, a polynomial code is used.
p1 B6 B5 B4 B3 B2 B1 B0
0 1 0 0 0 0 0 0
1 0 1 0 1 0 0 0
0 1 0 (*) 0 0 1 (*) 1 0
0 0 1 0 0 0 0 0
1 0 1 0 1 1 0 1
0 1 0 0 0 0 0 0
1 1 1 (*) 0 0 0 (*) 1 1
1 0 0 0 0 0 1 1
p2 1 1 0 0 0 0 0 1
Table 3: p1 is odd parity for rows; p2 is even parity for columns. (*) marks an undetected error combination.
The mechanism of polynomial codes is beyond the scope of this course, and we shall not discuss it in detail.
Error correction coding is more sophisticated than error detection coding. Its aim is to detect and locate errors in transmission. Once located, the correction is trivial: the bit is inverted. Error correction coding requires lower rate codes than error detection, often markedly so. It is therefore uncommon in terrestrial communication, where better performance is usually obtained with error detection and retransmission. However, in satellite communication, the propagation delay often means that many frames are transmitted before an instruction to retransmit is received. This can make the task of data handling very complex. Real-time transmission often precludes retransmission: it is necessary to get it right first time. In these special circumstances, the additional bandwidth required for the redundant check bits is an acceptable price. There are two principal types: Hamming codes and convolutional codes. Again, we will not discuss them in detail here.
3.5 Encryption
In all our discussion of coding, we have not mentioned what is popularly supposed to be the purpose of coding: security. We have only considered coding as a mechanism for improving the integrity of the communication system in the presence of noise. The use of coding for security has a different name: encryption. The use of digital computers has made highly secure communication a normal occurrence. The basis for key based encryption is that it is very much easier to encrypt with knowledge of the key than it is to decipher without knowledge of the key. The principle is just that of a combination lock. With a computer, the number of digits in the lock can be very large. Of course, one still has to keep the combination secure.
The most commonly used encryption algorithms are block ciphers. This means that the al
gorithm splits the plain text (message to be encrypted) into (usually) ﬁxed size blocks which are
then subjected to various functions to produce a block of cipher text. The most common functions
are permutations based on expansion and compression and straight shufﬂing transformations. In a
straight permutation, the bits of an n bit block are simply reordered. In expansion, as well as being
39
O0
O5
O1
O2
O3
O4
O0
O5
O1
O2
O3
O4
O1
O2
O3
O0
I0
I1
I2
I3
I4
I5
I0
I1
I2
I3
I4
I5
expansion compression straight
I0
I1
I2
I3
Figure 25: Examples of block cipher permutations.
reordered, the grouped of n bits is converted to m bits (m > n), with some bits being duplicated.
In compression, the n bit block in converted to a p bits (p < n), with some of the original bits
unused (see Fig. 25).
The most widely used form of encryption is defined by the National Bureau of Standards and is known as the data encryption standard (DES). The DES is a block cipher, splitting the data stream into 64-bit blocks that are enciphered separately. A unique key of 56 bits is then used to perform a succession of transposition and substitution operations. A 56 bit key has 7.2 × 10^{16} possible combinations. Assuming a powerful computer could attempt 10^8 combinations per second, it would still take over 20 years to break the code. If the code is changed once per year, there is little possibility of it being broken, unless the code breaker had additional information. The DES converts 64 bits of plain text into 64 bits of cipher text. The receiver uses the same key to decipher the cipher text into plain text.
The difficulty with this method is that each block is independent. This permits an interceptor in possession of the key to introduce additional blocks without the recipient being aware of this fact.
Like the combination of a lock, the system is only secure if the key is secure. If the key is changed often, the security of the key becomes a problem, because the transfer of the key between sender and receiver may not be secure. This is avoided by the use of matched keys. In a matched key scheme, the encryption is not reversible with the same key. The message is encrypted using one key, and decrypted with a second, matched key. The receiver makes available the first, public key. This key is used by the sender to encrypt the message. This message is unintelligible to anyone not in possession of the second, private key. In this way the private key need not be transferred. The most famous such scheme is the Public Key mechanism using the work of Rivest, Shamir and Adleman (RSA). It is based on the multiplication of extremely large numbers and, with current technology, is computationally very expensive.
RSA numbers are composite numbers having exactly two prime factors that have been listed in the Factoring Challenge of RSA Security, and have been particularly chosen to be difficult to factor. While RSA numbers are much smaller than the largest known primes, their factorization is significant because of the curious property of numbers that proving or disproving a number to be prime (primality testing) seems to be much easier than actually identifying the factors of a number (prime factorization).
Thus, while it is trivial to multiply two large numbers together, it can be extremely difficult to determine the factors if only their product is given.
With some ingenuity, this property can be used to create practical and efﬁcient encryption
systems for electronic data. RSA Laboratories sponsors the RSA Factoring Challenge to encourage
research into computational number theory and the practical difﬁculty of factoring large integers,
and because it can be helpful for users of the RSA encryption publickey cryptography algorithm
for choosing suitable key lengths for an appropriate level of security.
A cash prize is awarded to the first person to factor each challenge number. RSA numbers were originally spaced at intervals of 10 decimal digits between 100 and 500 digits, and prizes were awarded according to a complicated formula.
A list of the open challenge numbers may be downloaded from the RSA homepage.
Number      digits      prize (USD)   factored (references)
RSA-100     100         –             Apr. 1991
RSA-110     110         –             Apr. 1992
RSA-120     120         –             Jun. 1993
RSA-129     129         –             Apr. 1994 (Leutwyler 1994, Cipra 1995)
RSA-130     130         –             Apr. 10, 1996
RSA-140     140         –             Feb. 2, 1999 (te Riele 1999)
RSA-150     150         –             Apr. 6, 2004 (Aoki 2004)
RSA-155     155         –             Aug. 22, 1999 (te Riele 1999, Peterson 1999)
RSA-160     160         –             Apr. 1, 2003 (Bahr et al. 2003)
RSA-200     200         –             May 9, 2005 (Weisstein 2005)
RSA-576     (576 bits)  10000         Dec. 3, 2003 (Franke 2003; Weisstein 2003)
RSA-640     (640 bits)  20000         Nov. 4, 2005 (Weisstein 2005)
RSA-704     (704 bits)  30000         open
RSA-768     (768 bits)  50000        open
RSA-896     (896 bits)  75000         open
RSA-1024    (1024 bits) 100000        open
RSA-1536    (1536 bits) 150000        open
RSA-2048    (2048 bits) 200000        open
(From RSA-576 onward the challenge numbers are named by their size in bits rather than decimal digits.)
Example 3 In order to see all this in action, we want to stick with numbers that we can actually work with.
Finding RSA numbers
Take the two primes 5 and 11, giving the modulus R = 5 × 11 = 55 and φ = (5 − 1)(11 − 1) = 40. Choose a public exponent P coprime to 40, say P = 7; the matching private exponent Q must satisfy PQ ≡ 1 (mod 40), and Q = 23 works since 7 × 23 = 161 = 4 × 40 + 1. So we have 7 for P, our public key, and 23 for Q, our private key (a very small RSA number).
Encoding
We create the following character set:
2 3 4 6 7 8 9 12 13 14 16 17 18
A B C D E F G H I J K L M
19 21 23 24 26 27 28 29 31 32 34 36 37
N O P Q R S T U V W X Y Z
38 39 41 42 43 46 47 48 49 51 52 53
sp 0 1 2 3 4 5 6 7 8 9 ?
The message we will encrypt is ”VENIO” (Latin for ”I come”):
V E N I O
31 7 19 13 21
To encode it, we simply need to raise each number to the power P, modulo R = 55.
V: 31^7 (mod 55) = 27512614111 (mod 55) = 26
E: 7^7 (mod 55) = 823543 (mod 55) = 28
N: 19^7 (mod 55) = 893871739 (mod 55) = 24
I: 13^7 (mod 55) = 62748517 (mod 55) = 7
O: 21^7 (mod 55) = 1801088541 (mod 55) = 21
So, our encrypted message is 26, 28, 24, 7, 21 – or RTQEO in our personalized character set. When
the message ”RTQEO” arrives on the other end of our insecure phone line, we can decrypt it simply
by repeating the process – this time using Q, our private key, in place of P.
R: 26^23 (mod 55) = 350257144982200575261531309080576 (mod 55) = 31
T: 28^23 (mod 55) = 1925904380037276068854119113162752 (mod 55) = 7
Q: 24^23 (mod 55) = 55572324035428505185378394701824 (mod 55) = 19
E: 7^23 (mod 55) = 27368747340080916343 (mod 55) = 13
O: 21^23 (mod 55) = 2576580875108218291929075869661 (mod 55) = 21
The result is 31,7,19,13,21 – or ”VENIO”, our original message.
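The whole example can be reproduced in a few lines of Matlab. The sketch below avoids the huge intermediate powers shown above by using modular exponentiation (square-and-multiply); powmod is a helper written here, not a built-in function:

% RSA toy example: encrypt and decrypt 'VENIO' with P = 7, Q = 23, R = 55.
P = 7; Q = 23; R = 55;
msg = [31 7 19 13 21];                       % 'VENIO' in the character set
enc = arrayfun(@(m) powmod(m, P, R), msg)    % -> 26 28 24 7 21 ('RTQEO')
dec = arrayfun(@(c) powmod(c, Q, R), enc)    % -> 31 7 19 13 21 ('VENIO')

function y = powmod(b, e, m)
% Modular exponentiation b^e mod m by repeated squaring.
y = 1; b = mod(b, m);
while e > 0
    if mod(e, 2) == 1, y = mod(y*b, m); end
    b = mod(b*b, m); e = floor(e/2);
end
end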
4 Signal Representation
In this chapter, we are going to present a detailed discussion of the sampling theorem mentioned before: how fast to sample an analogue signal so that we can recover the signal perfectly. We will also introduce some basic tools which are essential for the later part of the module, including the z-transform, the discrete time Fourier transform (DTFT) and the discrete Fourier transform (DFT).
4.1 Sequences and their representation
A sequence is an infinite series of real numbers {x(n)}, which is written

{x(n)} = {···, x(−1), x(0), x(1), x(2), ···, x(n), ···}

This can be used to represent a sampled signal, i.e. x(n) = x(nT), where x(t) is the original (continuous) function of time. Sometimes sequence elements are subscripted, x_n being used in place of x(n).
The most basic tool in DCSP is the Z-transform (ZT), which is related to the generating function used in the analysis of series. In mathematics and signal processing, the Z-transform converts a discrete time domain signal, which is a sequence of real numbers, into a complex frequency domain representation. The ZT of {x(n)} is

X(z) = Σ_{n=−∞}^{∞} x(n) z^{-n}

where the variable z is in general a complex number. The first point to note about ZTs is that some sequences have simple rational ZTs.
Example 4
{x(n)} = {1, r, r^2, ···}, i.e. x(n) = r^n for n ≥ 0 and x(n) = 0 for n < 0, has ZT

X(z) = Σ_{n=0}^{∞} r^n z^{-n}

which can be summed, provided |r/z| < 1, to give

X(z) = 1/(1 − r z^{-1})

To check that this is correct, we simply invert the ZT by long division:

X(z) = 1 + r z^{-1} + r^2 z^{-2} + ···

which obviously corresponds to the original sequence.
The ZT's main property concerns the effect of a shift of the original sequence. Consider the ZT

Y(z) = Σ_n x(n) z^{-(m+n)}

which can be written

Y(z) = z^{-m} X(z)

This obviously corresponds to the sequence {y(n)} = {x(n − m)}.
Now if we add sequences {a(n)}, {b(n)}, we get a sequence

{c(n)} = {···, a(−1) + b(−1), a(0) + b(0), a(1) + b(1), ···}

with ZT

C(z) = Σ_n (a(n) + b(n)) z^{-n} = A(z) + B(z)

which is just the sum of the ZTs of {a(n)}, {b(n)}, i.e. the ZT is linear.
Now consider the product of two ZTs,

C(z) = A(z) B(z)

for example. The question arises as to what sequence this represents. If we write it out in full,

C(z) = Σ_n a(n) z^{-n} Σ_m b(m) z^{-m}

which can be rewritten

C(z) = Σ_n Σ_m a(n) b(m) z^{-(m+n)}

which is the ZT of the sequence

{c(n)} = {Σ_m a(m) b(n − m)}

Sequences {c(n)} of this form are called the convolution of the two component sequences {a(n)}, {b(n)}, and are sometimes written as

c(n) = a(n) ∗ b(n)

Convolution describes the operation of a digital filter, as we shall see in due course. The fundamental reason why we use the ZT is that convolution is reduced to multiplication, and this is a consequence of the even more basic shift property expressed in the equation above.
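In Matlab, convolution of finite sequences is exactly polynomial multiplication of their ZTs, as a quick sketch shows:

% Convolution of two finite sequences = product of their Z-transforms.
a = [1 2 3];        % A(z) = 1 + 2z^-1 + 3z^-2
b = [1 -1];         % B(z) = 1 - z^-1
c = conv(a, b)      % -> [1 1 1 -3], the coefficients of A(z)B(z)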
Example 5 (Discrete time unit impulse)
The unit impulse (Fig. 26) δ(n) is the most elementary signal, and it provides the simplest expansion. It is defined as

δ(n) = 1 if n = 0, and 0 otherwise   (4.1)

Any discrete signal can be expanded into a superposition of elementary shifted impulses, each one representing one of the samples. This is expressed as

x(n) = Σ_k x(k) δ(n − k)

where each term x(k)δ(n − k) in the summation accounts for the kth sample of the sequence.
Figure 26: An impulse δ(n) and a shifted impulse δ(n −k)
4.2 Discrete Time Fourier Transform (DTFT)
The basic tool of signal analysis is the Fourier transform, which is treated in detail in a number of
references and revisited before. Although the Fourier transform is not the only transform, it is the
one most widely used as a tool in signal analysis. Most likely, you have seen the Fourier transform
in its symbolic formulation applied to signals expressed mathematically. For example, we know
what the Fourier transform of a rectangular pulse, of a sinusoid, or of a decaying exponential is.
However, in most applications of interest, the goal is to determine the frequency content of a signal
from a ﬁnite set of samples stored on a disk or a tape.
The discrete Fourier transform (DFT) is the algorithm we use for numerical computation. With the DFT we compute an estimate of the frequency spectrum of any sort of data set stored as an array of numerical entries. In this chapter we quickly review the Fourier transform for discrete time signals (the DTFT, discrete time Fourier transform) and present the DFT in detail. In particular, we will be concentrating on two classes of applications: (1) spectral estimation, and (2) compression. The latter involves a variant of the DFT called the discrete cosine transform (or DCT), which is particularly well behaved when applied to approximation problems.
Deﬁnition and properties:
The DTFT gives the frequency representation of a discrete time sequence with inﬁnite length.
By deﬁnition
X(ω) = DTFT{x(n)} = Σ_{n=−∞}^{∞} x(n) exp(−jωn),  −π ≤ ω < π
x(n) = IDTFT{X(ω)} = (1/2π) ∫_{−π}^{π} X(ω) exp(jωn) dω,  −∞ < n < ∞   (4.2)

Figure 27: DTFT magnitude (left) and phase (right).
The inverse discrete time Fourier transform (IDTFT) can be easily determined by substituting the expression of the DTFT, which yields

(1/2π) ∫_{−π}^{π} X(ω) exp(jωn) dω = Σ_{m=−∞}^{∞} x(m) [(1/2π) ∫_{−π}^{π} exp(jω(n − m)) dω] = x(n)

where we used the fact that the term in brackets in the expression above is 1 when n = m and 0 for all other cases.
By definition, X(ω) is always periodic with period 2π, since

X(ω + 2π) = X(ω)

and this is the reason why all the information is included within one period, say in the interval −π ≤ ω < π, as shown in Fig. 27.
Example 6 (Fig. 28) Consider the signal x(n) = 0.5^n, n = 0, 1, 2, ··· (and x(n) = 0 for n < 0); then

X(ω) = Σ_{n=0}^{∞} 0.5^n exp(−jωn) = 1/(1 − 0.5 exp(−jω))
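This closed form can be checked numerically by truncating the DTFT sum, a legitimate approximation here since 0.5^n decays quickly; a Matlab sketch:

% Compare a truncated DTFT sum for x(n) = 0.5^n (n >= 0) with the
% closed form 1/(1 - 0.5 exp(-jw)).
w = linspace(-pi, pi, 512);
n = 0:100;                                % 0.5^n is negligible beyond this
X_sum = (0.5.^n) * exp(-1j*n'*w);         % truncated DTFT sum
X_cf  = 1 ./ (1 - 0.5*exp(-1j*w));        % closed form
plot(w, abs(X_sum), w, abs(X_cf), '--'); xlabel('\omega')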
Figure 28: Upper panel: magnitude of the DTFT X(ω) (left) and one period of it (right). Bottom panel: phase of the DTFT X(ω).
Figure 29: Sinusoid and its DTFT.
Example 7 Consider the signal x(n) = δ(n); then

X(ω) = Σ_{n=−∞}^{∞} δ(n) exp(−jωn) = 1
4.2.1 Computation of the DTFT
It is well known that the whole Fourier approach to signal analysis is based on the expansion of a signal in terms of sinusoids or, more precisely, complex exponentials. In this approach we begin analyzing a signal by determining the frequencies contributing to its spectrum in terms of magnitudes and phases.
For example, if a sequence is a sinusoid x(n) = cos(ω_0 n) of infinite length, its DTFT yields two 'delta' functions, X(ω) = πδ(ω − ω_0) + πδ(ω + ω_0), as in Fig. 29, where we assume all frequencies ω and ω_0 to be within the interval [−π, π). This shows that the DTFT gives perfect frequency localization, as an exact concentration of energy at ±ω_0, provided (a) the sequence lasts from −∞ to ∞ and we have an infinite amount of memory to actually collect all the data points, and (b) we compute X(ω) for all possible frequencies ω in the interval [−π, π), again requiring an infinite amount of memory and an infinite computational time.
In practice, we do not have inﬁnite memory, we do not have inﬁnite time, and also any signal we
want to analyze does not have inﬁnite duration. In addition, the spectrum of the signal changes with
time, just as music is composed of different notes that change with time. Consequently, the DTFT
generally is not computable unless we have an analytical expression for the signal we analyze, in
which case we can compute it symbolically. But most of the time this is not the case, especially
when we want to determine the frequency spectrum of a signal measured from an experiment. In
this situation, we do not have an analytical expression for the signal, and we need to develop an
algorithm that can be computed numerically in a ﬁnite number of steps, such as the discrete Fourier
transform (DFT).
4.3 Discrete Fourier Transform (DFT)
Definition
The discrete Fourier transform (DFT) and its inverse, the inverse discrete Fourier transform (IDFT), associate a vector X = (X(0), X(1), ···, X(N − 1)) with a vector x = (x(0), x(1), ···, x(N − 1)) of N data points, as follows:

X(k) = DFT{x(n)} = Σ_{n=0}^{N−1} x(n) exp(−j2πkn/N),  k = 0, 1, ···, N − 1
x(n) = IDFT{X(k)} = (1/N) Σ_{k=0}^{N−1} X(k) exp(j2πkn/N),  n = 0, ···, N − 1   (4.3)
If we define w_N = exp(−j2π/N), then

X(k) = DFT{x(n)} = Σ_{n=0}^{N−1} x(n) (w_N^k)^n

which implies that X(k) is a weighted summation of the basis sequences (w_N^k)^n.
Example 8 (see Appendix 6.8) Let x = [1, 2, −1, −1] be a data vector of length N = 4. Then, applying the definition, w_4 = exp(−j2π/4) = −j, and therefore

X(k) = 1 + 2(−j)^k − (−j)^{2k} − (−j)^{3k}

for k = 0, 1, 2, 3. This yields the DFT vector

X = DFT{x} = [1, 2 − 3j, −1, 2 + 3j]
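Matlab's built-in fft computes exactly the DFT of definition (4.3), so Example 8 can be checked directly:

% Check of Example 8 with the built-in FFT.
x = [1 2 -1 -1];
X = fft(x)          % -> 1   2-3i   -1   2+3i
x_rec = ifft(X)     % -> 1 2 -1 -1, recovering the data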
4.3.1 The relationship between DFT and DTFT
In spite of their similarities, the DFT and DTFT are two quite different operations. Whereas the DFT is a numerical operation, which computes a finite number of coefficients X(0), ···, X(N − 1) from a finite set of data x(0), ···, x(N − 1), the DTFT is not computable numerically because it yields a continuous function, X(ω), based on an infinite sequence x(n).
In this section we address the problem of estimating X(ω) = DTFT{x(n)} based on the
DFT of a sample of ﬁnite length N. The ﬁnite sample can be given in terms of either an analytic
expression or a set of observations stored in memory, such as a vector.
Before going any further, let us see how we can reconcile the fact that we defined the DFT for data in the form x(0), ···, x(N − 1), while in reality we might have data starting at any point in time, such as x(n_0), ···, x(n_0 + N − 1). To simplify the arguments to follow, we assume the initial time to be n_0 = −N/2, with any choice of rounding for odd values of the data length N. Then we can write the DFT in terms of the finite sum

X̂_N(ω) = Σ_{n=−N/2}^{N/2−1} x(n) exp(−jωn)

and X(k) = X̂_N(2πk/N). Also recall the DTFT as an infinite sum

X(ω) = Σ_{n=−∞}^{∞} x(n) exp(−jωn)
Now the question is: will X̂_N → X as N → ∞ for each value of the frequency variable ω? If the answer is positive, the approximation makes sense and we can say that we can estimate the frequency spectrum of the whole signal based on a set of samples. The more data (i.e. the larger N is), the better the approximation.
What decides the convergence is how the signal x(n) behaves as the index n → ∞. In order to fully understand the arguments presented, recall that the infinite series Σ 1/n^L, with L an integer, is divergent for L = 1 and convergent for L ≥ 2. In particular, we distinguish three different cases, all illustrated in Fig. 30.
• Absolutely summable signals. These signals are such that

Σ_{n=−∞}^{∞} |x(n)| < ∞

For these signals the DTFT always exists, is finite, and converges for every ω, as

lim_{N→∞} X̂_N(ω) = X(ω) < ∞

The fact that X(ω) is finite comes from the following inequality:

|X(ω)| = |Σ_n x(n) exp(−jωn)| ≤ Σ_n |x(n)| < ∞

These are the best-behaved sequences in terms of convergence. This means that for every ω, we can approximate X(ω) arbitrarily closely by the finite sum X̂_N(ω), with N sufficiently large.
Typical examples in this class are signals decaying as 1/n^2, for which |x(n)| ≤ C/n^2 for n sufficiently large, and in particular signals decaying exponentially fast.
Figure 30: Three cases.
• Signals with finite energy (not absolutely summable). These are signals such that

Σ_n |x(n)|^2 < ∞

For these signals we do not have pointwise convergence as in the preceding case, but we can say that

lim_{N→∞} ∫ |X(ω) − X̂_N(ω)|^2 dω = 0

• The third case, signals which are neither absolutely summable nor of finite energy, is one of the more important issues in DSP, and we will treat it in the next subsection.
Example 9 Consider the sequence

x(n) = sin(0.25πn)/(πn)

We want to estimate X(ω) = DTFT{x(n)} using the DFT. Choose, for example, finite sequences x(n) for −L ≤ n ≤ L, of length N = 2L + 1. The plots of X̂_N for several values of L are shown in Fig. 31, and the lack of convergence at the discontinuities (ω = ±π/4) is evident.
Example 10 Consider the sequence

x(n) = sin^2(0.25πn)/(πn)^2

Again we want to estimate X(ω) = DTFT{x(n)} using the DFT. Fig. 32 shows convergence for all values of ω by comparing two cases with L = 32 and L = 1024. This is expected because the signal is absolutely summable.
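Plots like Figs. 31 and 32 can be generated by evaluating the truncated sum directly; a sketch for Example 9 (the value at n = 0 is filled in with the limit 0.25):

% Truncated DTFT of x(n) = sin(0.25*pi*n)/(pi*n) for two window sizes.
w = linspace(0, pi, 1024);
for L = [32 1024]
    n = -L:L;
    x = sin(0.25*pi*n)./(pi*n);
    x(n == 0) = 0.25;                 % limit of the expression at n = 0
    X = x * exp(-1j*n'*w);            % truncated DTFT sum, length N = 2L+1
    plot(w, abs(X)); hold on
end
xlabel('\omega'); legend('L = 32', 'L = 1024'); hold off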
4.3.2 DFT for spectral estimation
One of the uses of the DFT is to estimate the frequency spectrum of a signal. In particular, given a continuous time signal x(t), the goal is to determine its frequency components by taking a finite set of samples as a vector, x(0), x(1), ···, x(N − 1), and computing its DFT as in the previous section. This is shown in Fig. 33.
The fact that we window the signal x(n) (i.e. we take a finite set of data) will have an effect on the frequency spectrum we compute, and we have to be aware of that when we interpret the results of the DFT. This can be seen by defining a window sequence

w(n) = 1 if n = 0, 1, ···, N − 1, and 0 otherwise
Figure 31: Lack of convergence at the discontinuity, for L = 32, 128, 512, 1024.
Figure 32: An absolutely summable case (L = 32 and L = 1024).
Figure 33: Windowing and then sampling.
and the DTFT of the windowed signal is

X_w(ω) = DTFT{w(n)x(n)} = Σ_{n=0}^{N−1} x(n) exp(−jωn)

Now notice that we can relate the right-hand side of the preceding expression to the DFT of the finite sequence x(0), ···, x(N − 1) as

X(k) = X_w(ω)|_{ω=2πk/N}
In other words, by the DFT we compute samples of the frequency spectrum of the windowed signal x(n)w(n). To see the effect of the windowing operation on the frequency spectrum, let us examine the case when the signal x(n) is a complex exponential, x(n) = exp(jω_0 n). Then

X_w(ω) = Σ_{n=−∞}^{∞} w(n) exp(jω_0 n) exp(−jωn) = W(ω − ω_0)

where the rightmost term is the frequency spectrum of the window function

W(ω) = DTFT{w(n)} = Σ_{n=0}^{N−1} exp(−jωn) = (1 − exp(−jωN))/(1 − exp(−jω))

This expression can easily be manipulated into a simpler form, and we can write

W(ω) = exp(−jω(N − 1)/2) sin(ωN/2)/sin(ω/2)
The magnitude of this expression is shown in Fig. 34; it repeats periodically with period 2π. As a result, the magnitude of the DTFT of the windowed complex exponential is given by

|X_w(ω)| = |W(ω − ω_0)| = |sin[(ω − ω_0)N/2]| / |sin[(ω − ω_0)/2]|
Fig. 34 compares the frequency spectra of the complex exponential before and after the windowing operation. Without windowing, the complex exponential is perfectly localized in frequency, since

DTFT{exp(jω_0 n)} = 2πδ(ω − ω_0)

Once we window the signal, its frequency spectrum is no longer perfectly localized. The energy spreads around the frequency ω_0 within the mainlobe, and trickles to other frequencies by the sidelobes of the sinc function.
Figure 34: Left: sin(ω_0 n) and cos(ω_0 n) for n = 1, 2, ···, 10 (N = 10). Right: the spectrum of exp(jω_0 n) with ω_0 = 0.5.

A number of windows in current use are shown in Fig. 35. In all cases, they are designed with a smoother transition between the two regions (inside and outside the window), and as a consequence the sidelobes are lower. The drawback is that they all have a wider mainlobe. Given a fixed set of data points, we have to accept a compromise between frequency resolution (the width of the mainlobe) and sidelobe level.
As we have seen, the DFT is the basis of a number of applications and is one of the most important tools in digital signal processing. The problem with this algorithm is that a brute-force implementation would hardly be feasible in real time, even for a data set of modest length. Fortunately, the DFT can be computed in a very efficient way, exploiting the very structure of the algorithm we have presented, by using the fast Fourier transform (FFT). It is known that for a data set of length N the complexity of the FFT grows as N log_2 N, while the complexity of the DFT computed by brute force grows as N^2. The FFT is widely used in many applications.
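The difference is easy to observe; a sketch comparing a brute-force DFT matrix with the built-in fft:

% Brute-force DFT (O(N^2)) versus the built-in FFT (O(N log N)).
N = 4096; x = randn(1, N);
n = 0:N-1; k = n';
tic; X_slow = exp(-1j*2*pi*k*n/N) * x.'; t_slow = toc;
tic; X_fast = fft(x).';                  t_fast = toc;
fprintf('max difference %.2e, speedup %.0fx\n', ...
        max(abs(X_slow - X_fast)), t_slow/t_fast)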
4.4 *Sampling and reconstruction*
We have assumed so far that the sequence of samples on which we are to operate is given to us. In practice, we shall usually have to obtain it by sampling a continuous time (or space) signal x(t), say. The question we consider here is under what conditions we can completely reconstruct the original signal x(t) from its discretely sampled signal x(n).
To this end, let us introduce more notation. Consider a signal x(t) and its samples x(n) = x(nT_s), with T_s being the sampling interval and F_s = 1/T_s the sampling frequency.
Now the question is how we can relate X(ω) = DTFT{x(n)} to X(F) = FT{x(t)}. In particular, we want to identify conditions under which we can determine X(F) from X(ω), or in other words, whether we can reconstruct the analog signal x(t) from its samples x(n).
Figure 35: Various windows. Created from the Matlab demos: Signal Processing: Transforms: DFT.
To achieve this goal, define

x_s(t) = Σ_{n=−∞}^{∞} x(nT_s) δ(t − nT_s) = x(t) Σ_n δ(t − nT_s)

This signal represents the sampling operation; all the information between the sampling instants is lost, but at the same time the signal is still defined in continuous time. Therefore we can take the Fourier transform, using both expressions on the right-hand side, to obtain

X_s(F) = FT{x_s(t)} = Σ_{n=−∞}^{∞} x(n) exp(−j2π(F/F_s)n) = X(F) ∗ F_s Σ_{k=−∞}^{∞} δ(F − kF_s)

using the fact that

FT{Σ_n δ(t − nT_s)} = F_s Σ_k δ(F − kF_s)
The convolution with a delta function yields just a shift of the variable, so we can write

X_s(F) = X(ω)|_{ω=2π(F/F_s)} = F_s Σ_k X(F − kF_s)

where X(ω) = DTFT{x(n)}. In other words, when a signal x(t) is sampled in time, its frequency spectrum X(F) is repeated in frequency.
We can see that if the signal x(t) is bandlimited, in the sense that X(F) = 0 for |F| > F_B, for some frequency F_B called the bandwidth of the signal, and we sample at a frequency F_s > 2F_B, then there is no overlap between the repetitions of the frequency spectrum. In other words, if F_s > 2F_B,

X_s(F) = F_s X(F)

in the interval −F_s/2 < F < F_s/2, and X(F) can be fully recovered from X_s(F). This is very important because it states that the signal x(t) can be fully recovered from its samples x(n) = x(nT_s), provided we sample fast enough, F_s > 2F_B.
The way to recover the continuous time signal x(t) from its samples x(nT_s) is by an interpolating function g(t). In fact, define

x̂(t) = Σ_n x(nT_s) g(t − nT_s)

Its Fourier transform is determined using simple properties:

X̂(F) = Σ_n x(n) exp(−j2πFnT_s) G(F)

where G(F) = FT{g(t)} is the transform of the interpolating function. As we have seen, the summation is just X_s(F). Therefore, if G(F) is such that

G(F) = T_s rect(F/F_s) = T_s if −F_s/2 < F < F_s/2, and 0 otherwise,
then X̂(F) = X(F) and x̂(t) = x(t). This implies that the ideal interpolating function is given by

g(t) = IFT{T_s rect(F/F_s)} = sinc(t/T_s)

and we can perfectly reconstruct the continuous time signal x(t) from its samples x(n):

x(t) = Σ_n x(n) sinc((t − nT_s)/T_s)

In summary, we have the following Nyquist–Shannon sampling theorem.
Theorem 2 Exact reconstruction of a continuous-time baseband signal from its samples is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth.
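A direct Matlab illustration of the theorem, reconstructing a 3 Hz cosine from samples taken at F_s = 10 Hz > 2F_B (sinc here is the Signal Processing Toolbox function):

% Sinc interpolation of a bandlimited signal from its samples.
Fs = 10; Ts = 1/Fs;
n = -50:50;                                % sample indices
xs = cos(2*pi*3*n*Ts);                     % samples of a 3 Hz cosine
t = linspace(-1, 1, 500);                  % dense time grid
xr = zeros(size(t));
for k = 1:numel(n)                         % sum of shifted sinc functions
    xr = xr + xs(k) * sinc((t - n(k)*Ts)/Ts);
end
plot(t, xr, t, cos(2*pi*3*t), '--'); legend('reconstructed', 'original')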
Figure 36: Multiplier, adder and delay units.
5 Digital Filters
A digital filter is any electronic filter that works by performing digital mathematical operations on an intermediate form of a signal. Digital filters can achieve virtually any filtering effect that can be expressed as a mathematical function or algorithm. The two primary limitations of digital filters are their speed (the filter can't operate any faster than the computer at the heart of the filter) and their cost. However, as the cost of integrated circuits has continued to drop over time, digital filters have become increasingly commonplace and are now an essential element of many everyday objects such as radios, cellphones, and stereo receivers.
A well-known signal processing wizard is said to have remarked, "When you think about it, everything is a filter."
5.1 Operations on Sequences
Digital filtering is all based on a small set of very simple linear shift invariant operations on sequences. These are
1. Multiplication by a constant coefficient. If {x(n)} is the input sequence, the output is

{y(n)} = a{x(n)} = {···, ax(−1), ax(0), ax(1), ···, ax(n), ···}

Multipliers are normally depicted graphically by a triangle, as in Fig. 36.
2. Adders do the obvious thing: if {x_1(n)}, {x_2(n)} are the input sequences, the output is just

{y(n)} = {···, x_1(−1) + x_2(−1), x_1(0) + x_2(0), x_1(1) + x_2(1), ···}
3. The other component is a delay unit, i.e. a single memory element. This operates on the input sequence {x(n)} to give the output

{y(n)} = D{x(n)} = {x(n − 1)}

Note that you must regard this as shifting the whole sequence, rather than acting independently on the elements.
5.2 Filters
The commonly used digital ﬁlters are all based on combinations of these three operations. The
general ﬁlter can be described by the operational equation
{y(n)} = Σ_{m=0}^{N} a(m) D^m {x(n)} + Σ_{m=1}^{N} b(m) D^m {y(n)}

Note that the sum over the outputs starts at a delay of 1, not 0. If we use the ZT, with {x(n)} ↔ X(z), {y(n)} ↔ Y(z), we must first note that the operator D has the ZT representation z^{-1}, from the shift result we established before. This means that the filter above can be rewritten in ZT form:

Y(z) (1 − Σ_{m=1}^{N} b(m) z^{-m}) = Σ_{m=0}^{N} a(m) z^{-m} X(z)

Now the ratio

H(z) = Y(z)/X(z)

is known as the ZT transfer function of the filter:

H(z) = Σ_{m=0}^{N} a(m) z^{-m} / (1 − Σ_{m=1}^{N} b(m) z^{-m})

The transfer function H(z) is the ZT of the filter output sequence when the input sequence has ZT X(z) = 1. This is the ZT of the unit impulse sequence

{x(n)} = {1, 0, 0, ···}

Thus the other way to characterise the filter is in terms of its impulse response, since the response to any sequence is just the sum of the responses to the individual impulses of the sequence components.
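This general filter is what Matlab's filter function implements, with one caveat: filter(B, A, x) uses the convention A(1)y(n) = B·x − A(2:end)·y, so the feedback coefficients b(m) above enter A with opposite sign. A sketch for a first order recursive filter with illustrative coefficients:

% Impulse response of y(n) = x(n) + 0.5 x(n-1) + 0.9 y(n-1).
a_ff = [1 0.5];                  % feedforward coefficients a(0), a(1)
b_fb = 0.9;                      % feedback coefficient b(1)
x = [1 zeros(1, 9)];             % unit impulse
y = filter(a_ff, [1 -b_fb], x)   % note the sign flip on the feedback term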
Figure 37: Direct nonrecursive filter implementation.
5.3 Nonrecursive Filters
5.3.1 Operational Deﬁnition
When a filter is nonrecursive, its operational equation can be written

{y(n)} = Σ_{m=0}^{N} h(m) D^m {x(n)}

Such filters are also called finite impulse response (FIR) filters, for the obvious reason that their IR contains only finitely many nonzero terms. Correspondingly, the ZT of a nonrecursive filter can be written as

H(z) = Σ_{m=0}^{N} h(m) z^{-m}

and the impulse response is just

{h(n)} = {h(0), h(1), ···, h(N)}

Such a filter obviously has finite memory, in the sense that its response to a particular input element is zero after N time steps. This can also be seen from the block diagram of a direct implementation of the filter, Fig. 37, which is a canonical form. FIR filters are also called moving average (MA) filters.
Figure 38: The zeros of an FIR filter transfer function lie on the real axis or occur in complex conjugate pairs.
5.3.2 Zeros
Trying to figure out how an FIR filter will behave, given its IR, is not always so simple. Another way of looking at it is through its frequency domain behaviour. We can make a start on this by examining the zeros of its transfer function H(z), i.e. those values of z for which

H(z) = 0

Since H(z) is a polynomial of order N with real coefficients, it follows that the equation has N roots, which are either real or occur in complex conjugate pairs. We can express H(z) in terms of the roots by writing

z^N H(z) = h(0) Π_{m=1}^{N} (z − z_m)

where in general z_m = |z_m| exp(j arg[z_m]) is the mth root, or zero, of the transfer function. The zeros of a transfer function are usually denoted graphically in the complex z-plane by circles, as shown in Fig. 38.
The factorisation of H(z) implies that H(z) can be written as a product of quadratic polynomials in z^{-1} with real coefficients:

H(z) = Π_{m=1}^{M} (h_m(0) + h_m(1) z^{-1} + h_m(2) z^{-2})

This is effectively a series implementation of the transfer function. Series implementations have advantages in terms of both hardware and software modularity and sensitivity to coefficient quantization. They are often used in practice.
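The zeros, and a series (cascade) realisation in second-order sections, can be obtained directly in Matlab (tf2sos is in the Signal Processing Toolbox; the impulse response here is illustrative):

% Zeros and second-order-section factorisation of a small FIR filter.
h = [1 -1.2 0.5 0.3];       % impulse response h(0), ..., h(3)
z = roots(h)                % zeros: real or in complex conjugate pairs
[sos, g] = tf2sos(h, 1)     % cascade of quadratic sections, overall gain g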
Figure 39: Block diagram of an ARMA ﬁlter.
5.4 Recursive Filters
5.4.1 Operational Deﬁnition
Of the many filter transfer functions which are not FIR, the most commonly used in DSP are the recursive filters, so called because their current output depends not only on the last N inputs but also on the last N outputs. They have operational equations which are written as

(1 − Σ_{m=1}^{N} b(m) D^m) {y(n)} = Σ_{m=0}^{N} a(m) D^m {x(n)}

and a ZT transfer function

H(z) = A(z)/B(z)

where

A(z) = Σ_{m=0}^{N} a(m) z^{-m},  B(z) = 1 − Σ_{m=1}^{N} b(m) z^{-m}

If B(z) ≠ 1, their impulse response

{h(n)} = {h(0), h(1), ···}

is infinite: they are IIR filters. The block diagram (Fig. 39) of such a filter is an obvious generalisation of that for the nonrecursive Nth order filter. If the numerator coefficients a(m) = 0 for m ≥ 1, then the filter is called an all pole filter (or sometimes an autoregressive (AR) filter).
Figure 40: Poles and zeros of a filter transfer function.
5.4.2 Poles and Zeros
Writing the numerator and denominator of H(z) in factored form we get

H(z) = A(z)/B(z) = Π_{n=1}^{N} (z − α_n) / Π_{n=1}^{N} (z − β_n)

We know that the roots α_n are the zeros of the transfer function. The roots β_n of the equation B(z) = 0 are called the poles of the transfer function. They have greater significance for the behaviour of H(z): it is singular at the points z = β_n. Poles are drawn on the z-plane as crosses, as shown in Fig. 40.
On this figure, the unit circle has been shown on the z-plane. In order for a recursive filter to be bounded input bounded output (BIBO) stable, all of the poles of its transfer function must lie inside the unit circle. A filter is BIBO stable if any bounded input sequence gives rise to a bounded output sequence. Now if the poles of the transfer function lie inside the unit circle, then they represent geometric series with coefficients whose magnitudes |β_m| < 1, i.e. sequences {1, β_m, β_m^2, ···} which are convergent. Consequently, the filter output, which is a sum of such sequences weighted by the appropriate input terms, is bounded if the input is. If, on the other hand, |β_m| > 1, the geometric series diverges and the filter output will grow without bound as n → ∞. If |β_m| = 1, the filter is said to be conditionally stable: some input sequences will lead to bounded output sequences and some will not.
Since FIR filters have no poles, they are always BIBO stable: zeros have no effect on stability.
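Stability can therefore be checked simply by computing the pole magnitudes; a sketch, using the denominator that appears in the design example of Section 5.6:

% BIBO stability check from the pole magnitudes.
B = [1 -1.343 0.9025];              % denominator coefficients of H(z)
poles = roots(B);
if all(abs(poles) < 1)
    disp('all poles inside the unit circle: BIBO stable')
else
    disp('not (unconditionally) stable')
end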
5.5 Frequency and digital ﬁlters
5.5.1 Poles, Zeros and Frequency Response
Now suppose we have the ZT transfer function of a filter

H(z) = A(z)/B(z)

We can formally represent the frequency response of the filter by substituting z = exp(jω) to obtain

H(ω) = A(ω)/B(ω)

Obviously, H(ω) depends on the locations of the poles and zeros of the transfer function, a fact which can be made more explicit by factoring the numerator and denominator polynomials to write

H(ω) = K Π_{n=1}^{N} (exp(jω) − α_n) / Π_{n=1}^{N} (exp(jω) − β_n)

where K is a constant, and α_n, β_n are respectively the nth zero and pole of the transfer function. Each factor in the numerator or denominator is a complex function of frequency, which has a graphical interpretation in terms of the location of the corresponding root in relation to the unit circle. This is illustrated in Fig. 41. The factors are simply the difference vectors between the root location and the point on the unit circle at angle ω. This is true for any stable filter (i.e. poles inside the unit circle).
Writing, for example, the nth zero as

α_n = |α_n| exp(jθ_n)

we find that it makes a contribution with magnitude

|A_n(ω)|^2 = 1 + |α_n|^2 − 2|α_n| cos(ω − θ_n)

to the overall frequency response. This has a minimum at ω = θ_n and a maximum at ω = θ_n + π. In other words, the contribution of a zero to the filter frequency response is a minimum at the angle of the zero and a maximum at the diametrically opposite frequency. Similarly, the contribution from a pole is a maximum at the angle of the pole and a minimum at the opposite angle. Thus we can make a reasonable guess about the filter frequency response simply by looking at the pole-zero diagram: lowpass filters have poles near ω = 0 and zeros at high frequencies, and so on.
To get a more detailed picture of the frequency response, it is always useful to calculate it at the frequencies ω = 0, π/2, π, which are simple to evaluate; even with these few points and the locations of the poles and zeros, we can get a good idea of how the response will look. Note also that, because they are in the denominator, the contribution of a pole to the response grows as |β_n| → 1, while a zero on the unit circle implies a null in the response at that frequency.
Figure 41: The contribution of a pole or zero to a filter frequency response.
To make matters more concrete, Fig. 42 shows what effect varying the magnitude and angle of a simple zero has on the frequency response magnitude, while Fig. 43 does the same thing for a pole. A significant point to note in comparing the two diagrams is how even a single pole can give a sharp peak in the frequency response magnitude, whereas a zero inevitably gives a rather broad one. This is a consequence of the poles belonging to the denominator of the transfer function, of course. It is primarily for this reason that recursive designs are preferred in many applications.
5.5.2 Filter Types
There are four main classes of filter in widespread use: lowpass, highpass, bandpass and bandstop filters. The names are self-explanatory, but the extent to which the ideal frequency responses can be achieved in practice is limited. These four types are shown in Fig. 44.
In practice, these ideal shapes cannot be realised exactly, and do not have a good approximation in terms of a rational transfer function of modest order. To achieve a given magnitude response, a recursive filter of a given order is generally better than a nonrecursive one, but this is often at the expense of an unpleasant (nonlinear) phase response: linear phase implies the filter simply delays the input signal, as well as shaping its frequency content. The resulting filter will be inadequate in several respects compared with the ideal:
1. Finite stopband attenuation. A practical filter will not have a null response to frequencies outside its passband: this would require zeros at all such frequencies.
2. Finite transition bandwidth. Any realisation will not have a discontinuous frequency response: there will be a band of frequencies which are neither stopped nor passed, but are transitional.
Figure 42: The effect of varying (i) the magnitude and (ii) the angle of a simple zero. The squared magnitude is plotted vs. radial frequency.
Figure 43: The effect of varying (i) the magnitude and (ii) the angle of a simple pole. The squared magnitude is plotted vs. radial frequency.
Figure 44: The common types of digital filter frequency response: (i) lowpass, (ii) highpass, (iii) bandpass, (iv) bandstop.
3. Many filters also have a nonuniform response in the passband: passband ripple.
Different designs, such as Butterworth, Chebyshev or Elliptic, have different trade-offs between these artifacts. Originally, much digital filter design was based on the transformation of analogue designs, using a variety of techniques including the impulse-invariant and bilinear transforms.
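Such classical designs are available ready-made in Matlab's Signal Processing Toolbox; for instance, a sketch of a 4th order Butterworth lowpass with cutoff at one quarter of the Nyquist frequency:

% Butterworth lowpass design (digital, via the bilinear transform).
[B, A] = butter(4, 0.25);    % order 4, normalised cutoff 0.25
freqz(B, A)                  % plot its magnitude and phase response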
5.6 Simple Filter Design
In a number of cases, we can design a linear filter almost by inspection, by moving poles and zeros like pieces on a chessboard. This is not just a simple exercise designed for an introductory course, for in many cases the use of more sophisticated techniques might not yield a significant improvement in performance.
In this example, consider an audio signal s(n), digitized with a sampling frequency F_s kHz. The signal is affected by a narrowband (i.e., very close to sinusoidal) disturbance w(n). Fig. 45 shows the frequency spectrum of the overall signal plus noise, x(n) = s(n) + w(n), which can be easily determined.
Notice two facts:
1. The signal has a frequency spectrum concentrated within the interval 0 to 20 kHz.
2. The disturbance is at frequency F_0 kHz.
Now the goal is to design and implement a simple filter that rejects the disturbance without affecting the signal excessively. We will follow these steps:
Figure 45: A signal with a disturbance. 'Say It' by the John Coltrane Quartet.
Step 1: Frequency domain specifications. We need to reject the signal at the frequency of the disturbance. Ideally, we would like to have the following frequency response:

H(ω) = 0 if ω = ω_0, and 1 otherwise

where ω_0 = 2π(F_0/F_s) (let us say it is π/4) radians, the digital frequency of the disturbance. Recall that the digital frequency is a relative frequency, and therefore has no dimensions.
Step 2: Determine poles and zeros. We need to place two zeros on the unit circle, z_1 = exp(jω_0) = exp(jπ/4) and z_2 = exp(−jω_0) = exp(−jπ/4). This would yield the transfer function

H(z) = K(1 − z_1 z^{-1})(1 − z_2 z^{-1}) = K(1 − 1.414 z^{-1} + z^{-2})

If we choose, say, K = 1, the frequency response is as shown in Fig. 46. As we can see, it rejects the desired frequency, as expected, but greatly distorts the signal.
A better choice would be to select the poles close to the zeros, within the unit circle for stability. For example, let the poles be p_1 = ρ exp(jω_0) and p_2 = ρ exp(−jω_0). With ρ = 0.95, for example, we obtain the transfer function

H(z) = K (z − z_1)(z − z_2) / ((z − p_1)(z − p_2)) = 0.9543 (z^2 − 1.414z + 1) / (z^2 − 1.343z + 0.9025)

where we choose the overall gain K so that H(z)|_{z=1} = 1. This yields the frequency response shown in Fig. 47, which is more selective than the previous choice. The only caveat is that the poles have to be within the unit circle, even in the presence of numerical uncertainties.
Figure 46: Frequency response with zeros only, and actual output of the filtered signal.
Figure 47: Frequency response with poles close to the unit circle.
Step 3: Determine the difference equation in the time domain. From the transfer function, the difference equation is determined by inspection:

y(n) = 1.343y(n − 1) − 0.9025y(n − 2) + 0.9543x(n) − 1.3495x(n − 1) + 0.9543x(n − 2)

The difference equation can be easily implemented as a recursion in a high-level language. The final result is the signal y(n) with the frequency spectrum shown in Fig. 48, where we notice the absence of the disturbance.
A Matlab program is available in the Appendix.
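A sketch along the lines of that program, with a synthetic test signal standing in for the audio (the disturbance is placed at the digital frequency π/4 used above):

% Notch filter of Section 5.6 applied to a test signal.
n = 0:999;
s = randn(1, 1000);                   % stand-in for the audio signal
x = s + 3*sin(pi/4*n);                % add a narrowband disturbance
B = 0.9543 * [1 -1.414 1];            % zeros on the unit circle at +-pi/4
A = [1 -1.343 0.9025];                % poles at 0.95 exp(+-j pi/4)
y = filter(B, A, x);                  % the difference equation as a recursion
freqz(B, A)                           % inspect the notch at omega = pi/4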
5.7 Matched Filter
One common problem in signal processing is that of detecting the presence of a known signal against a background of noise. This has applications in radar, sonar, communications and pattern recognition. The problem is to choose the filter IR {h(n)} which maximises the filter output when the known signal {x(n)} is the input. If we want to maximise the output at time 0, then

y(0) = Σ_{n=−∞}^{∞} h(n) x(−n)

As it stands, however, this is not a well defined problem: if we double all the filter coefficients, the output will also double. We must therefore constrain the filter coefficients to have unit energy, i.e.

Σ_n h^2(n) = 1
Figure 48: Frequency spectrum of the filtered signal.
We can solve this constrained maximisation using the technique of Lagrange multipliers, which tells us to maximise the function

E(h) = y(0) − λ(Σ_n h^2(n) − 1)

where the second term is the constraint and λ is called the Lagrange multiplier, which is a number we shall select so that the constraint equation above is satisfied. Differentiating the equation above with respect to h(n) gives

∂E(h)/∂h(n) = x(−n) − 2λh(n)

and setting this to 0 gives the solution

h(n) = c x(−n)

where the constant c = 1/(2λ) is chosen so that the energy is unity:

c = (Σ_n x^2(n))^{-1/2}

In other words, the best filter, which is called the matched filter for the signal, has an IR which is just the time-reverse of the signal sequence. For this filter, the output at time n is (up to the constant c)

y(n) = Σ_m x(−m) x(n − m) = Σ_m x(m) x(n + m)

In this case, the sequence {y(n)} is called the autocorrelation sequence of the signal {x(n)} and is defined as

r_x(n) = Σ_m x(m) x(n + m)
The autocorrelation sequence {r_x(n)} has two important properties:
1. Its maximum is at time 0: r_x(0) ≥ r_x(n), n ≠ 0.
2. It is symmetrical about 0: r_x(−n) = r_x(n).
The ZT of {r_x(n)} is

R_x(z) = X(z) X(z^{-1})
Similarly, we can define the cross-correlation between two sequences {a(n)}, {b(n)} as

r_ab(n) = Σ_m a(m) b(m − n)

Note that here we have a minus sign, i.e. {b(n)} is delayed with respect to {a(n)}. This is important because in general cross-correlation is not symmetric. In fact

r_ab(−n) = r_ba(n)

unless of course the two sequences are the same, a(n) = b(n), in which case we get the autocorrelation.
5.8 Noise in Communication Systems: Stochastic Processes
The main application of matched filtering techniques is in the detection of known (deterministic) signals in a background of random disturbances. In order to understand how well they work, it is necessary to find a way of modelling random noise: a sample from a stochastic process x(t), which is a random variable for each fixed t.
Besides the mean and variance we defined before, the third important statistical description of a process is second order: the autocorrelation sequence. Unlike that defined for deterministic signals in the previous section, this must be defined in an appropriate way for a random process, i.e. as an expected value:

R(n) = E[(x(m) − Ex(m))(x(m − n) − Ex(m − n))]

Here we have assumed stationarity, i.e. the quantity above depends only on the time difference n between m and m − n. Although it is defined in a completely different way from the deterministic autocorrelation before, the autocorrelation for random sequences shares the two properties of the deterministic autocorrelation: symmetry and a maximum at the origin. Like the process mean, it can also be estimated by an appropriate sample average, but getting a reasonable estimate is a complex task, known as spectrum estimation, for which we do not have space here. It is essential to be clear about the distinction between these two types of correlation sequence: one applies to known, finite energy signals, the other to random, finite power signals. The ZT of the autocorrelation sequence is defined as

S(z) = Σ_n R(n) z^{-n}
The autocorrelation of a process describes how similar it is to its time shifts: how dependent the
value at a given time n is on that at a different time. If the process has a nonzero mean, however, a
large value of the autocorrelation will result even if the variation of the signal is completely random.
Consequently, it is generally preferable to remove the mean before calculating the correlation; the
resulting measure is called the autocovariance. Among random processes, the most random are the
white noise processes, so called because of their spectral properties. If {ξ(n)} is a sample from a
zero mean stationary white noise process with variance σ², then it satisfies

Eξ(n) = 0,   R_ξ(n) = σ² δ_{n0}

In other words, its autocorrelation sequence is an impulse at the origin. This means that values
ξ(n) and ξ(m) taken at different times are completely unrelated. The whiteness of the power spectrum
can be seen by considering either S(z) or, equivalently, the Fourier transform

S(ω) = S(exp(jω)) = σ²

which is a constant dependent on the noise power: the spectrum is flat.
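This behaviour is easy to verify numerically. The sketch below (sample size, variance and maximum lag are our own choices) estimates the autocorrelation of a simulated white noise sample by averaging lagged products:

% Estimated autocorrelation of zero mean white noise: an impulse at lag 0.
N = 1e5; sigma = 2;
xi = sigma * randn(N, 1);                % sample from a white Gaussian noise process
maxlag = 10;
R = zeros(2*maxlag + 1, 1);
for n = -maxlag:maxlag
    R(n + maxlag + 1) = mean(xi(1+maxlag : N-maxlag) .* xi(1+maxlag+n : N-maxlag+n));
end
stem(-maxlag:maxlag, R)                  % close to sigma^2 = 4 at lag 0 and to 0 elsewhere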
By and large, we shall not deal explicitly with the probability densities, but only work with
the expected values, which are a sufficient characterisation of the signal processes for much of
the DCSP used at present, from audio analysis to computer vision or radar. Although various
probability densities are used, the most widespread is the Gaussian or normal density, as we have
discussed before. This has a number of significant features.
1. It is completely characterised by its first and second order statistics. For a single variate, the
normal density is given as in Chapter II.

2. Any linear combination of normal variables is also normal. This means that if we filter a
normal process, we get another normal process (with different mean and correlation).

3. The sum of a large number of independent random variables is approximately normally
distributed. This result is called the Central Limit Theorem (see the sketch below).

Normal processes have been very widely studied and are a very useful tool in designing DCSP
algorithms of all sorts.
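The Central Limit Theorem is easily seen at work numerically; in the sketch below (the number of terms and of trials are arbitrary choices of ours), sums of twelve independent uniform variables already produce a near-Gaussian histogram:

% Central Limit Theorem illustration: sums of uniform variables look Gaussian.
S = sum(rand(12, 1e5)) - 6;    % 1e5 trials, each the sum of 12 uniforms, shifted to zero mean
hist(S, 50)                    % the histogram closely follows a normal density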
5.8.1 Detection of known signals in noise
We can now turn to the problem which motivated the above summary of the elements of the theory
of stochastic processes, namely the detection of a known, ﬁnite energy signal {x(n)} in noise,
which we shall assume is a sample from a zero mean white noise process {ξ(n)}. The noisy signal
is just

w(n) = x(n) + ξ(n)

with

E(ξ(n)x(m)) = x(m)Eξ(n) = 0,   E(ξ(n)ξ(m)) = σ² δ_{mn}

Figure 49: Model of a typical signal detection application, such as radar or communication: impulses i(n) drive the transmitter filter {x(n)}; white noise v(n) is added in the channel; the receiver applies a matched filter {x(n)} and a threshold to produce the output y(n) and the decision sequence d(n).
The overall system is modelled by the example in Fig. 49, in which a sequence of impulses
{i(n)}, which might occur at regular or random intervals, drives the transmitter filter, which generates
the signal sequence {x(n)} in response to each impulse. The detector operates on the stream of
signals, corrupted by the channel noise process, using a matched filter, whose output is thresholded
to determine whether the signal is present at a given time. The system output is the binary decision
sequence {d(n)}. In a radar system, of course, the input sequence {i(n)} would represent the
returns from different targets, which would be randomly spaced in time, while in a communication
system the impulses would occur at regular intervals, with data being used to modulate their phase
or amplitude.
Now, according to our previous calculation, the filter which maximises the output in response
to the known signal is just the matched filter for the signal,

h(n) = x(N − n)

but the signal in the present case is corrupted by noise, whose effects we must take into account.
First, suppose that no signal is present, so w(n) = ξ(n). In signal detection theory, this is known as
the null hypothesis. In this case, the filter output at time N has statistics

Ey(N) = 0

and

Ey²(N) = σ² Σ_{n=0}^{N} h²(n)
To proceed, now consider the case where the signal is present. Then

Ey(N) = Σ_{n=0}^{N} x²(n)

and

Ey²(N) = (Σ_{n=0}^{N} x²(n))² + σ² Σ_{n=0}^{N} h²(n)
Since the matched ﬁlter maximises the mean output when the signal is present and the mean output
is 0 when the signal is absent, the matched ﬁlter is the best ﬁlter for detecting the signal when the
noise is white.
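The following sketch simulates the system of Fig. 49 for a single pulse; the pulse shape, noise level and pulse position are our own choices. The matched filter output shows a clear peak where the pulse is buried, which the threshold can then pick out:

% Detection of a known pulse in white noise using a matched filter.
N = 16;
x = sin(pi*(0:N-1)'/N);              % known finite-energy pulse (arbitrary shape)
x = x / sqrt(sum(x.^2));             % normalise to unit energy
sigma = 0.2;
w = [zeros(50,1); x; zeros(50,1)];   % pulse embedded in a longer record
w = w + sigma * randn(size(w));      % corrupted by white noise
h = flipud(x);                       % matched filter: time-reverse of the pulse
y = conv(w, h);                      % filter output
plot(y)                              % a clear peak appears where the pulse ends
d = y > 0.5;                         % thresholding gives the decision sequence {d(n)}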
5.9 Wiener Filter
If there is a significant amount of noise in the image, then it is better to find a way to deal with the
randomness directly than to rely on methods which are essentially ad hoc. The simplest way to deal
with noise is just to low-pass filter the signal, since, as we have already seen, most of its energy is at
low frequencies. Unfortunately, this will blur the edges too, leading to undesirable results. To do
any better, we must use a signal model.
The statistical approach to the problem is based on stochastic models of the image and noise
processes. The simplest one to describe is Wiener filtering, named after its inventor. To understand
it, we shall start with a very simple problem. Suppose we have a single sample y = x + ξ, where
x is the desired variable and ξ is a sample of white noise, independent of x, with variance σ². We
assume that the variance of x is 1 and that both signal and noise have zero mean:

Ex = Eξ = Exξ = 0,   Ex² = 1,   Eξ² = σ²

We want to choose a coefficient a so that the difference between the estimate x̂ = ay and x is
as small as possible:

min_a E[x − ay]²
Differentiating the error with respect to a and equating to zero gives

2E(xy) = 2aEy²

or

a = E(xy) / Ey²

at which value the error x − x̂ is orthogonal to the data y, in the sense that

E(x − x̂)y = Exy − aEy² = 0
This orthogonality principle can readily be extended to solve the problem of estimating images
distorted by noise.
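For this single-sample problem the optimum is explicit: E(xy) = Ex² = 1 and Ey² = 1 + σ², so a = 1/(1 + σ²). The Monte-Carlo sketch below (sample size and σ are arbitrary choices) recovers this value and checks the orthogonality of the error to the data:

% Monte-Carlo check of the single-sample Wiener coefficient.
N = 1e6; sigma = 0.5;
x  = randn(N, 1);                  % zero mean, unit-variance signal
xi = sigma * randn(N, 1);          % independent noise
y  = x + xi;
a = mean(x .* y) / mean(y .^ 2)    % close to 1/(1 + sigma^2) = 0.8
xhat = a * y;
mean((x - xhat) .* y)              % orthogonality: approximately zero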
The Wiener formulation works for any type of noise or signal spectrum and any filter. If it
has a weakness, it is that the AR signal models which typically get used do not do a good job of
representing image features like edges, so the Wiener filter tends to blur them. This problem
can be solved by using a nonstationary formulation, possibly using a state space model to give an
optimum recursive nonstationary filter. It goes beyond the scope of these notes to go into the details.
5.10 Having Fun: Contrast Enhancement
Many images are captured in low light conditions, which can cause them to have low contrast
between the darker and lighter regions. This problem can be solved by contrast stretching, in
which a nonlinear function such as f(x) = x² or f(x) = exp(ax) is used to increase the dynamic
range of the image. The most sophisticated form of contrast enhancement is known as histogram
equalisation, which works by trying to spread out the intensities so that their probability density is
as uniform as possible.
Fig. 50 shows a comparison between two figures: the left one is generated from a positive
Gaussian random variable, the right one from a uniformly distributed random variable. The uniform
one clearly appears to have the stronger contrast.
Histogram equalisation is based on the observation that if u is a random variable with probability
density p_u(x), the cumulative distribution of u is defined to be the total probability that u is less
than x, i.e.

P_u(x) = ∫_{−∞}^{x} p_u(y) dy

If we define the function

f(u) = P_u(u)

which is just the cumulative distribution of u, then f(u) itself has the cumulative distribution

P_f(x) = P_u(P_u⁻¹(x)) =
    x,  0 < x < 1
    0,  x ≤ 0
    1,  x ≥ 1

which corresponds to a uniform density p_f(x) = 1, 0 ≤ x ≤ 1, as intended. Because the
intensities are normally integers, rather than real numbers, this equalisation can only be achieved
approximately in practice, however. In Fig. 51 we show another example.
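In Matlab the mapping f(u) = P_u(u) can be built directly from the image histogram and applied as a lookup table; the sketch below does this for a synthetic low-contrast image of our own making (with the Image Processing Toolbox, the function histeq performs the same task):

% Histogram equalisation of an 8-bit image via its empirical CDF.
img = uint8(min(max(round(60 + 25*randn(256)), 0), 255));  % synthetic low-contrast image
v = double(img(:));
counts = accumarray(v + 1, 1, [256 1]);   % histogram of the intensity levels 0..255
cdf = cumsum(counts) / numel(v);          % empirical cumulative distribution P_u
lut = uint8(round(255 * cdf));            % the mapping f = P_u, rescaled to 0..255
eq = lut(double(img) + 1);                % apply the lookup table pixel by pixel
figure; subplot(1,2,1); imagesc(img); colormap gray; title('original');
subplot(1,2,2); imagesc(eq); title('equalised');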
Figure 50: Upper panel: left is from the Gaussian, right from the uniform. Bottom panel: left is the histogram of the Gaussian, right is the histogram of the uniform.
Figure 51: Another application.
6 Appendix: Mathematics
6.1 Sinusoids
The term sinusoid is used here to designate a waveform of the type
A cos(2πft + φ) = A cos(ωt + φ)
Thus, a sinusoid is defined as a cosine with amplitude A, frequency ω, and phase φ. We could
also have defined a sinusoid based on the sine function, as A sin(2πft + φ), in which case φ would
differ by 90° (π/2 radians). A sinusoid's phase is typically denoted by φ and is in radian
units. The instantaneous frequency of a sinusoid is defined as the derivative of the instantaneous
phase with respect to time:

ω = (d/dt)[ωt + φ]

A discrete-time sinusoid is simply obtained from a continuous-time sinusoid by replacing t by
nT:

A cos(2πf nT + φ) = A cos(ωnT + φ)
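For instance (all parameter values here are our own choices), a 10 ms stretch of a sampled 440 Hz sinusoid can be generated and displayed as:

% A discrete-time sinusoid obtained by sampling a continuous one.
Fs = 8000; T = 1/Fs;              % sampling frequency and sampling interval
A = 1; f = 440; phi = 0;          % amplitude, frequency (Hz) and phase
n = 0:79;                         % 80 samples = 10 ms of signal
x = A * cos(2*pi*f*n*T + phi);
stem(n, x)                        % the sequence A cos(omega n T + phi)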
6.2 Complex Numbers
j = √(−1)

z = x + jy = r exp(jθ)

x = r cos(θ)

y = r sin(θ)

r = |z| = √(x² + y²)

θ = ∠z = tan⁻¹(y/x)

|z1 z2| = |z1| |z2|

|z1 / z2| = |z1| / |z2|

|exp(jθ)| = 1

∠r = 0 (for real r > 0)

z1 z2 = (x1 x2 − y1 y2) + j(x1 y2 + x2 y1)

z1 z2 = r1 r2 exp(j(θ1 + θ2))

z̄ = x − jy = r exp(−jθ)

z z̄ = |z|² = x² + y² = r²
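These identities are easy to check numerically, for example (with an arbitrary test value):

% Rectangular and polar forms of a complex number agree.
z = 3 + 4j;
r = abs(z), theta = angle(z)      % r = 5, theta = atan2(4, 3)
r * exp(1j*theta)                 % returns 3.0000 + 4.0000i
z * conj(z)                       % returns 25 = r^2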
6.3 The Exponential Function
e^x = lim_{n→∞} (1 + x/n)^n

e^x = Σ_{n=0}^{∞} x^n / n!

e^{jθ} = cos(θ) + j sin(θ)

e^{jnθ} = cos(nθ) + j sin(nθ)

sin(θ) = (e^{jθ} − e^{−jθ}) / 2j

cos(θ) = (e^{jθ} + e^{−jθ}) / 2

e = 2.71828182...
6.4 Trigonometric identities
sin(−θ) = −sin(θ)

cos(−θ) = cos(θ)

sin(A + B) = sin(A) cos(B) + cos(A) sin(B)

cos(A + B) = cos(A) cos(B) − sin(A) sin(B)

sin²(θ) = (1 − cos(2θ)) / 2

cos²(θ) = (1 + cos(2θ)) / 2

sin(A) sin(B) = (cos(A − B) − cos(A + B)) / 2

cos(A) cos(B) = (cos(A + B) + cos(A − B)) / 2

sin(A) cos(B) = (sin(A + B) + sin(A − B)) / 2

sin(A) + sin(B) = 2 sin((A + B)/2) cos((A − B)/2)

cos(A) + cos(B) = 2 cos((A + B)/2) cos((A − B)/2)
6.5 Spectrum
In this module, we think of filters primarily in terms of their effect on the spectrum of a signal. This
is appropriate because the ear (to a first approximation) converts the time-waveform at the eardrum
into a neurologically encoded spectrum. Intuitively, a spectrum (a complex function of frequency
ω) gives the amplitude and phase of the sinusoidal signal component at frequency ω. Mathematically,
the spectrum of a signal x is the Fourier transform of its time-waveform. Equivalently, the
spectrum is the z transform evaluated on the unit circle, z = exp(jωT).
We denote both the spectrum and the z transform of a signal by upper-case letters. For example,
if the time-waveform is denoted x(n), its z transform is called X(z) and its spectrum
is therefore X(exp(jωT)). The time-waveform x(n) is said to "correspond" to its z transform
X(z), meaning they are transform pairs. This correspondence is often denoted x(n) ↔ X(z),
or x(n) ↔ X(exp(jωT)). Both the z transform and its special case, the (discrete-time) Fourier
transform, are said to transform from the time domain to the frequency domain.
We deal most often with discrete time nT (or simply n) but continuous frequency f (or ω =
2πf). This is because the computer can represent only digital signals, and digital time-waveforms
are discrete in time but may have energy at any frequency. On the other hand, if we were going
to talk about FFTs (Fast Fourier Transforms, efficient implementations of the Discrete Fourier
Transform, or DFT), then we would have to discretize the frequency variable also, in order to
represent spectra inside the computer. In this book, however, we use spectra only for conceptual
insights into the perceptual effects of digital filtering; therefore, we avoid discrete frequency for
simplicity.
When we wish to consider an entire signal as a "thing in itself," we write x(·), meaning the
whole time-waveform (x(n) for all n), or X(·), to mean the entire spectrum taken as a whole.
Imagine, for example, that we have plotted x(n) on a strip of paper that is infinitely long. Then
x(·) refers to the complete picture, while x(n) refers to the nth sample point on the plot.
6.5.1 Fourier’s Song
Integrate your function times a complex exponential
It’s really not so hard you can do it with your pencil
And when you’re done with this calculation
You've got a brand new function - the Fourier Transformation
What a prism does to sunlight, what the ear does to sound
Fourier does to signals, it’s the coolest trick around
Now ﬁltering is easy, you don’t need to convolve
All you do is multiply in order to solve.
From time into frequency–from frequency to time
Every operation in the time domain
Has a Fourier analog - that's what I claim
Think of a delay, a simple shift in time
It becomes a phase rotation - now that's truly sublime!
And to differentiate, here’s a simple trick
Just multiply by J omega, ain’t that slick?
Integration is the inverse, what you gonna do?
Divide instead of multiply–you can do it too.
From time into frequency–from frequency to time
Let’s do some examples... consider a sine
It's mapped to a delta, in frequency - not time
Now take that same delta as a function of time
Mapped into frequency –of course –it’s a sine!
Sine x on x is handy, let’s call it a sinc.
Its Fourier Transform is simpler than you think.
You get a pulse that’s shaped just like a top hat...
Squeeze the pulse thin, and the sinc grows fat.
Or make the pulse wide, and the sinc grows dense,
The uncertainty principle is just common sense.
6.6 Matlab program for simple ﬁlter design
(From Enrico Rossoni)
clear all; close all; clc;

Fs = 11025;                % sampling frequency (Hz)
Fnoise = Fs/8;             % frequency of the sinusoidal disturbance
SNR = 5;                   % desired signal-to-noise ratio (dB)
load jazz signal;          % load the audio samples into the variable 'signal'
samplelength = length(signal);
SAMPLINGFREQUENCY = Fs;
NFFT = min([256, length(signal)]);
NOVERLAP = 0;
WINDOW = hanning(NFFT);
[Ps, F] = spectrum(signal, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
spectrumsignal = Ps(:,1); Psignal = mean(spectrumsignal);
deltats = 1/Fs;
inoise = min(find(F >= Fnoise));                  % index of the disturbance frequency
fprintf(1, 'Signal-to-Noise ratio = %g dB\n', SNR);
Pnoise = spectrumsignal(inoise) * 10^(-SNR/10);   % noise power giving the desired SNR
ampnoise = sqrt(Pnoise);
noise = ampnoise * sin(2*pi*Fnoise*(1:samplelength)'*deltats);
Pn = spectrum(noise, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
spectrumnoise = Pn(:,1); Pnoise = mean(spectrumnoise)
noisy = signal + noise;                           % the disturbed signal
Pns = spectrum(noisy, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
% Second-order notch filter centred on Fs/8: poles at radius 0.95,
% zeros on the unit circle at the same angle (note a2 and b1 are negative).
a1 = 1.3435; a2 = -0.9025;
b0 = 0.9543; b1 = 0.9543*(-1.4142); b2 = 0.9543;
y = zeros(size(noisy)); y(1) = 0; y(2) = 0;
for ii = 3:samplelength
    y(ii) = a1*y(ii-1) + a2*y(ii-2) + b0*noisy(ii) + b1*noisy(ii-1) + b2*noisy(ii-2);
end
Pfiltered = spectrum(y, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
figure(2); plot(F*0.001, Pns(:,1), F*0.001, Pfiltered(:,1), 'r');
xlabel('Frequency (kHz)'); ylabel('|X(F)|');
legend('Disturbed Signal', 'Filtered Signal');
fprintf('Playing filtered audio sample\n'); soundsc(y)
6.7 Fourier Transform: From Real to Complex Variables
For a signal x(t) with a period T, we have
x(t) = A_0/2 + Σ_{n=1}^{∞} [A_n cos(nωt) + B_n sin(nωt)]     (6.1)

where ω = 2π/T and A_n, B_n are defined as in Chapter II.
Using the Euler formula as in this Appendix, we have

exp(jnωt) = cos(nωt) + j sin(nωt)     (6.2)

and

exp(−jnωt) = cos(−nωt) + j sin(−nωt) = cos(nωt) − j sin(nωt)     (6.3)

where j² = −1. Eq. (6.2) + Eq. (6.3) gives us

cos(nωt) = [exp(jnωt) + exp(−jnωt)] / 2     (6.4)

Eq. (6.2) − Eq. (6.3) yields

sin(nωt) = [exp(jnωt) − exp(−jnωt)] / 2j     (6.5)

Substituting Eq. (6.4) and Eq. (6.5) into Eq. (6.1), we obtain

x(t) = A_0/2 + Σ_{n=1}^{∞} [A_n cos(nωt) + B_n sin(nωt)]
     = A_0/2 + Σ_{n=1}^{∞} { A_n [exp(jnωt) + exp(−jnωt)]/2 + B_n [exp(jnωt) − exp(−jnωt)]/2j }
     = A_0/2 + Σ_{n=1}^{∞} { [A_n/2 + B_n/2j] exp(jnωt) + [A_n/2 − B_n/2j] exp(−jnωt) }     (6.6)

If we define

c_0 = A_0/2,   c_n = A_n/2 + B_n/2j,   c_{−n} = A_n/2 − B_n/2j

Eq. (6.6) becomes

x(t) = c_0 + Σ_{n=1}^{∞} c_n exp(jnωt) + Σ_{n=1}^{∞} c_{−n} exp(−jnωt)
     = Σ_{n=−∞}^{∞} c_n exp(jnωt)     (6.7)
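As a numerical illustration of Eq. (6.7), the square wave of Example 1 can be re-synthesised from its complex coefficients: there A_n = 0 for n ≥ 1 and B_n = 2/(nπ) for odd n, so c_n = B_n/2j and c_{−n} = −B_n/2j. The sketch below (the number of harmonics retained is an arbitrary choice of ours) sums the truncated series:

% Re-synthesis of the square wave of Example 1 from complex coefficients c_n.
t = linspace(0, 4*pi, 2000);
s = (1/2) * ones(size(t));            % c_0 = A_0/2 = 1/2
for n = 1:2:21                        % only odd harmonics are present
    Bn = 2/(n*pi);
    cn = Bn/(2j); cmn = -Bn/(2j);     % c_n = B_n/2j and c_{-n} = -B_n/2j (A_n = 0)
    s = s + cn*exp(1j*n*t) + cmn*exp(-1j*n*t);
end
plot(t, real(s))                      % real() only removes rounding residue; s is real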
6.8 More Details on Example 8
Given a ﬁnite sequence x[n], its Discrete Fourier Transform (DFT) X(k) is deﬁned as
X(k) = DFT{x[n]} = Σ_{n=0}^{N−1} x[n] exp[−j(2π/N)kn],   k = 0, . . . , N − 1

where N is the length of the sequence. Note that by defining the set of angular frequencies ω_k =
(2π/N)k, k = 0, . . . , N − 1, we can rewrite the definition above as

X(k) = Σ_{n=0}^{N−1} x[n] exp[−j ω_k n]     (6.8)
The meaning of this formula is clear if you think that the sequences e_k[n] = exp(jω_k n),
k = 0, . . . , N − 1, form a 'basis' for representing any sequence of N numbers (you can easily
check that these N sequences are all independent). Therefore, we can interpret X(k) as the 'projection'
of our original signal onto the kth sequence of our basis. In conclusion, we can regard the Fourier
transform as a simple 'change of coordinates' (from time to frequency, and from frequency to time).
Let us consider for example the sequence x[n] = (1, 2, −1, −1). In this case we have N = 4 and
ω_k = (π/2)k. From Eq. (6.8) we then obtain

X(k) = Σ_{n=0}^{N−1} x[n] exp[−j ω_k n]
     = x(0) exp(−j(π/2)k · 0) + x(1) exp(−j(π/2)k · 1) + x(2) exp(−j(π/2)k · 2) + x(3) exp(−j(π/2)k · 3)
     = 1 + 2 · (−j)^k − 1 · (−j)^{2k} − 1 · (−j)^{3k}

where we have used the Euler formula to get

exp(−jπ/2) = cos(−π/2) + j sin(−π/2) = −j,

and the basic property of the exponential exp(nz) = [exp(z)]^n. Finally, from the equation above
we get X[k] = (1, 2 − 3j, −1, 2 + 3j).
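The hand calculation can be checked against Matlab's built-in FFT, which implements the same definition:

% DFT of x[n] = (1, 2, -1, -1) via the built-in FFT.
x = [1 2 -1 -1];
X = fft(x)          % returns 1, 2-3i, -1, 2+3i, as computed above
xr = ifft(X)        % the inverse transform recovers the original sequence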
As we mentioned above, these complex numbers can be thought of as coordinates to represent our
signal. In fact, for any given finite sequence x[n] we have¹

x[n] = (1/N) Σ_{k=0}^{N−1} X(k) e_k[n]
The coefficients in this expansion can be more conveniently rewritten as X(k) = ρ_k exp(jφ_k),
where ρ_k = |X(k)| is the modulus and φ_k = arg(X(k)) is the phase of X(k). Thus, in order to
reconstruct our sequence, we just need to scale each basis sequence by ρ_k, rotate its elements by
φ_k in the complex plane and finally sum all the resulting sequences together.
In Figure 52 we plot the signal x in the time domain (top) and in the z-domain (bottom). In
Figure 53 we plot the signal in the frequency domain, i.e. the amplitude and the phase of its DFT
coefficients X(k) are plotted as a function of the frequency index k. We will usually refer to this
as the spectrum of x.
¹ The factor 1/N is due to the fact that our basis sequences {e_k} are not normalized.
Figure 52: The signal x plotted in the time domain (top: x[n] versus n) and in the z-domain (bottom: Im(X) versus Re(X), showing X(1), . . . , X(4)).
Figure 53: The signal x in the frequency domain: amplitude ρ_k = |X(k)| and phase φ_k = arg(X(k)) versus the frequency index k.
2
Contents
1 Introduction 2 Data Transmission 2.1 The transmission of information . . . . . . . . . . . . . . . . . . . . . . 2.1.1 General Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.3 The conversion of analogue and digital signals . . . . . . . . . . 2.1.4 The relationship between information, bandwidth and noise . . . 2.2 Communication Techniques . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Time, frequency and bandwidth . . . . . . . . . . . . . . . . . . 2.2.2 Digital modulation: ASK, FSK and PSK . . . . . . . . . . . . . 2.2.3 Spread spectrum techniques . . . . . . . . . . . . . . . . . . . . 2.2.4 Digital demodulation . . . . . . . . . . . . . . . . . . . . . . . . 2.2.5 Noise in communication systems: probability and random signals 2.2.6 Errors in digital communication . . . . . . . . . . . . . . . . . . 2.2.7 Timing control in digital communication . . . . . . . . . . . . . 3 Information and coding theory 3.1 Information sources and entropy 3.2 Information source coding . . . 3.2.1 Huffman coding . . . . 3.3 Channel Capacity . . . . . . . . 3.4 Error detection coding . . . . . 3.4.1 Hamming distance . . . 3.4.2 Parity Check Codes . . . 3.5 Encryption . . . . . . . . . . . . 4 5 5 5 6 7 8 9 9 13 15 17 18 22 25 28 29 30 31 34 35 35 36 38 42 42 44 47 48 48 51 55
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
4 Signal Representation 4.1 Sequences and their representation . . . . . . . . 4.2 Discrete Time Fourier Transform (DTFT) . . . . 4.2.1 Computation of the DTFT . . . . . . . . 4.3 Discrete Fourier Transform (DFT) . . . . . . . . 4.3.1 The relationship between DFT and DTFT 4.3.2 DFT for spectral estimation . . . . . . . 4.4 *Sampling and reconstruction* . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
5 Digital Filters 59 5.1 Operations on Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.2 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.3 Nonrecursive Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3
5.3.1 Operational Deﬁnition . . . . . . . . . . . . . . 5.3.2 Zeros . . . . . . . . . . . . . . . . . . . . . . . 5.4 Recursive Filters . . . . . . . . . . . . . . . . . . . . . 5.4.1 Operational Deﬁnition . . . . . . . . . . . . . . 5.4.2 Poles and Zeros . . . . . . . . . . . . . . . . . . 5.5 Frequency and digital ﬁlters . . . . . . . . . . . . . . . 5.5.1 Poles, Zeros and Frequency Response . . . . . . 5.5.2 Filter Types . . . . . . . . . . . . . . . . . . . . 5.6 Simple Filter Design . . . . . . . . . . . . . . . . . . . 5.7 Matched Filter . . . . . . . . . . . . . . . . . . . . . . . 5.8 Noise in Communication Systems: Stochastic Processes 5.8.1 Detection of known signals in noise . . . . . . . 5.9 Wiener Filter . . . . . . . . . . . . . . . . . . . . . . . 5.10 Having Fun: Contrast Enhancement . . . . . . . . . . . 6 Appendix: Mathematics 6.1 Sinusoids . . . . . . . . . . . . . . . . . . . . . . . 6.2 Complex Numbers . . . . . . . . . . . . . . . . . . 6.3 The Exponential Function . . . . . . . . . . . . . . . 6.4 Trigonometric identities . . . . . . . . . . . . . . . . 6.5 Spectrum . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 Fourier’s Song . . . . . . . . . . . . . . . . 6.6 Matlab program for simple ﬁlter design . . . . . . . 6.7 Fourier Transform: From Real to Complex Variables 6.8 More Details on Example 8 . . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
61 62 63 63 64 65 65 66 68 71 73 74 76 77 79 79 80 80 81 81 82 82 83 84
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
4
1 Introduction
Digital communications and signal processing refers to the ﬁeld of study concerned with the transmission and processing of digital data. This is in contrast with analog communications. While analog communications use a continuously varying signal, a digital transmission can be broken down into discrete messages. Transmitting data in discrete messages allows for greater signal processing capability. The ability to process a communications signal means that errors caused by random processes can be detected and corrected. Digital signals can also be sampled instead of continuously monitored and multiple signals can be multiplexed together to form one signal. Because of all these advantages, and because recent advances in wideband communication channels and solidstate electronics have allowed scientists to fully realize these advantages, digital communications has grown quickly. Digital communications is quickly edging out analog communication because of the vast demand to transmit computer data and the ability of digital communications to do so. Here is a summary on what we will cover in this course. 1. Data transmission: Channel characteristics, signalling methods, interference and noise, and synchronisation; 2. Information Sources and Coding: Information theory, coding of information for efﬁciency and error protection encryption; 3. Signal Representation: Representation of discrete time signals in time and frequency; z transform and Fourier representations; discrete approximation of continuous signals; sampling and quantisation; and data compression; 4. Filtering: Analysis and synthesis of discrete time ﬁlters; ﬁnite impulse response and inﬁnite impulse response ﬁlters; frequency response of digital ﬁlters; poles and zeros; ﬁlters for correlation and detection; matched ﬁlters and stochastic signals and noise processes; 5. Digital Signal Processing applications: Processing of images using digital techniques. The application of DCSP in industry and our daily life is enormous, although in this introductory module we are only able to touch several simple examples. Part of the current lecture notes on DSP is taken from lecture notes of Prof. R. Wilson. Many materials are adopted from public domain materials. Many thanks to Dr. Enrico Rossoni who has spent a considerable time on going through the manuscript several times to correct typos. The sections I include only for your reference and I will not go through them during lectures are marked with a ∗.
A demodulator that receives the original source signal from the received signal and passes it to the sink At each stage. as we will discuss in the second part of our course. At its simplest. signal processing techniques are required to detect signals. A modulator that takes the source signal and transforms it so that it is physically suitable for the transmission channel 2. . A transmission channel that is the physical link between the communicating parties 3. the system contains (see Fig. ﬁlter out noise and extract features. 1) 1.5 source (noise) noise. distortion. A transmitter that actually introduces the modulated signal into the channel.1 General Form A communications system is responsible for the transmission of information from the sender to the recipient.1 The transmission of information 2. usually amplifying the signal as it does so 4. A receiver that detects the transmitted signal on the channel and usually ampliﬁes it (as it will have been attenuated by its journey through the channel) 5. attenuation (noise) sink modulator transmitter transmission channel receiver demodulator Figure 1: A communications system 2 Data Transmission 2.1.
Speech signals occupy a smaller range of frequencies. It is measured in bits/second or b/s. The information content of a digital signal is equal to the number of bits required to represent it.1) where n is the number of levels a signal may take.6 time domain frequency domain bandwidth time f1 f2 frequency Figure 2: Time domain and frequency domain representation of a signal. Often. Television. The information rate is a measure of the speed with which information is transferred. these bit strings are interpreted as numbers in a binary number system. 2). A television signal is an analogue signal created by linearly scanning a two dimensional image. and telephone speech typically occupies the range 300 Hz to 3300 Hz. Thus a signal that may vary between 0 and 7 has an information content of 3 bits. 2. Digital data is universally represented by strings of 1s or 0s. It occupies a frequency range from about 200 Hz to about 15KHz. Thus 1010012 = 4110 . Each one or zero is referred to as a bit.2 Examples Telecommunications trafﬁc is characterised by great diversity. Written as an equation this relationship is I = log2 (n) bits (2. An audio signal is an example of an analogue signal. 2. The range of frequencies occupied by the signal is called its bandwidth (see Fig. . but not always. A nonexclusive list is the following: 1. Audio signals. It is important to appreciate that information is a measure of the number of different outcomes a value may take.1. Typically the signal occupies a bandwidth of about 6 MHz.
For example. High ﬁdelity. or process them on a digital computer. Time independence. That is S ≥ 2B (2. we need to convert analogue signals to digital ones. the bit rate R to transform the signal without loss of information is R = 8bits × 2B = 64000b/s (2. Reproducing cells. One disadvantage of digital communication is the increased expense of transmitters and receivers. Once digitised a signal may be transmitted at a rate unconnected with its recording rate. 3. The same transmitted message has an inﬁnite number of meanings according to the rule used to interpret it. 3. If this is sampled by an 8bit ADC at the Nyquist sampling rate. 2. 4.7 3. This process is performed by an analoguetodigital converter (ADC). A disk drive 6.2) The rate at which an ADC generates bits depends on how many bits are used in the converter. 4. These are: 1.e. 5. video and text may be transmitted using the same channel. measured at regularly spaced instant) (Fig 3) and then quantised (Fig. Our brain The use of digital signals and modulation has great advantages over analogue systems. The converse operation to the ADC is performed by a digitaltoanalogue converter (DAC). 2. A digitised signal is a stream of numbers. The ADC process is governed by an important law.3 The conversion of analogue and digital signals In order to send analogue signals over a digital communication system. The analogue signal is sampled (i. a speech signal has an approximate bandwidth of 4kHz. Signals may be coded. Telex describes a message limited to a predetermined set of alphanumeric characters. bottom panel) i. Teletext is written (or drawn) communications that are interpreted visually.1. Very high ﬁdelity transmission and representation are possible. converted to discrete numeric values. The discrete nature of digital signals makes their distinction in the presence of noise easy.3) . The NyquistShannon Theorem (which will be discussed in Chapter 3) states that an analogue signal of bandwidth B can be completely recreated from its sampled form provided it is sampled at a rate equal to at least twice its bandwidth.e. in which the daughter cells’s DNA contains information from the parent cells. Source independence. Voice. This is particularly true of realtime communication of analogue signals. The digital signals may be transmitted using the same format irrespective of the source of the communication.
a signal of bandwidth B may change at a maximum rate of 2B. Namely. Obviously.8 signal sampled signal t t sampling signal error in sampling due to 1 low sampling frequency t 10 9 8 7 6 5 t 4 3 2 1 0 quantisation levels 9 4 2 6 3 5 t sampled signal sampled. and is often expressed in deciBels (dB): SN R = 10 log10 (S/N ) dB The source of noise signals vary widely.1. Noise therefore places a limit on the maximum rate at which we can transfer information. If we continue to subdivide the magnitude of the changes into ever decreasing intervals. 2. so does the information rate.4 The relationship between information. we reach a point where we cannot distinguish the individual levels because of the presence of noise. Bottom panel: Quantisation of a sampled signal. if each time the signal changes it can take one of n levels. the information rate is increased to R = 2B log2 (n) b/s (2. This is deﬁned by the ratio signal power S to noise power N . the information rate may be increased. quantised signal Figure 3: Upper panel: Periodic sampling of an analogue signal. bandwidth and noise The most important question associated with a communication channel is the maximum rate at which it can transfer information.4) This formula states that as n tends to inﬁnity. The rate at which a signal may change is determined by the bandwidth. what really matters is the signal to noise ratio (SNR). Is there a limit on the number of levels? The limit is set by the presence of noise. so the maximum information rate is 2B. Thus. If changes of differing magnitude are each associated with a separate bit.5) . (2. Analogue signals passing through physical channels may not achieve arbitrarily fast changes.
frequency and phase of the sine wave. The task of providing high channel efﬁciency is the goal of coding techniques.2 Communication Techniques 2. frequency and bandwidth Most signal carried by communication channels are modulated forms of sine waves. We can describe this signal in two ways. 2. This is due to the random motion of electric charges present in all media. The cosine wave. Input noise is common in low frequency circuits and arises from electric ﬁelds generated by electrical switching. ω = 2πf . A sine wave is described mathematically by the expression s(t) = A cos(ωt + φ) (2. The other way is to describe its frequency content.7) The quantities A. This representation is quite general. and when present can have a catastrophic effect due to its large power. ω. Thermal noise is always present. in frequency domain. It appears as bursts at the receiver. The famous HartleyShannon Law states that the channel capacity.8) . Mathematically ∞ ∞ x(t) = A0 + n=1 An cos(ωnt) + n=1 Bn sin(ωnt) (2. in practice channels only approach this limit.6) For example. a 10kHz channel operating at a SNR of 15dB has a theoretical maximum information rate of 10000 log2 (31. Other people’s signals can generate noise: crosstalk is the term given to the pickup of radiated signals from adjacent cabling.2. One way is to describe its evolution in time domain.623) = 49828b/s.1 Time. There is a theoretical maximum to the rate at which information passes error free over a channel. In fact we have the following theorem due to Fourier. C. The theorem makes no statement as to how the channel capacity is achieved. or internally at the receiver. When radio links are used. 2. as in the equation above. has a single frequency. In fact. Theorem 1 Any signal x(t) of period T can be represented as the sum of a set of cosinusoidal and sinusoidal waves of different frequencies and phases. interference from other transmitters can be problematic. This maximum is called the channel capacity. C (which we will discuss in details later) is given by C = B log2 (1 + (S/N )) b/s (2. s(t).9 1. φ are termed the amplitude. It can be generated externally.
whose respective amplitudes decay as 2/π. 0 < t < π. consider the square wave (Fig. 2/3π. 1. and T is the period of the signal..9. 4) s(t) = 1.10) where A graph of the spectrum has a line at the odd harmonic frequencies.5. 2π < t < 3π. term.. . Example 1 As an example. The spectrum of a signal is usually shown as a twosided .11) (2. · · · .10 1 t s(f) 5 3 1 0 1 3 5 f Figure 4: Upper panel: a square wave.. . The description of a signal in terms of its constituent frequencies is called its frequency spectrum.. This has the Fourier series: s(t) = 1 2 1 1 + [sin(t) + sin(3t) + sin(5t) + · · · ] 2 π 3 5 (2.c.. and zero otherwise. Bottom panel: The frequency spectrum for the square wave. T /2 A = 1 0 x(t)dt T −T /2 T /2 A = 2 n x(t) cos(ωnt)dt T −T /2 (2.3.9) T /2 B = 2 n x(t) sin(ωnt)dt T −T /2 ω = 2π T where A0 is the d.
Bn . } If we truncate the series into ﬁnite terms. 5. B2 = 0. B−n . Bn . · · · . in the Example 1. · · · . .12) = sin(nπ) nπ = 0 π 2 Bn = 1 sin(nt)dt 2π 0 1 = (1 − cos(nπ)) nπ which gives B1 = 2/π. B3 = 2/3π... the signal can be approximated by a ﬁnite series as shown in Fig.11 1 1 0 0 1 1 0 0 Figure 5: A square wave with 3. . we have x(t)  (2. B−2 . B0 . B2 . The coefﬁcients are obtained according to π 1 A0 = 1dt 2π 0 1 = 2 π 2 An = 1 cos(nt)dt 2π 0 1 (2. For example.5 and 6 of its Fourier terms. A periodic signal is uniquely decided by its coefﬁcients An . · · · . B1 ..13) {· · · .4. spectrum with positive and negative frequency components. B−1 .
i. There are a number of features to note: 1. ∞ x(t) exp(−j2πF t)dt X(F ) = F T {x(t)} = −∞ (2.14) which is the exponential form of the Fourier series. they have zero bandwidth.16) ∞ x(t) = IF T {X(t)} = X(t) exp(j2πF t)dt −∞ Example 2 Consider the case of a rectangular pulse.7) x(t) = A0 + ∞ An cos(ωnt) + ∞ Bn sin(ωnt) n=1 n=1 = A0 + ∞ An [exp(jωnt) + exp(−jωnt)]/2 n=1 + ∞ Bn [exp(jωnt) − exp(−jωnt)]/2j n=1 ∞ = n=−∞ cn exp(jωnt) (2. i. some energy is found at all frequencies.e. However. the spectrum changes substantially. Thus zero bandwidth signals carry no information. Most of the energy is contained in a limited region called the mainlobe. If the square wave signal discussed in the previous example is replaced with an aperiodic sequence.5 x(t) exp(−j2πF t)dt = and is plotted in Fig. 6.e. they represent themselves indeﬁnitely. 0. The bandwidth of the signal is only approximately ﬁnite. the values cn are complex and so cn  and arg(cn ) are the magnitude and the phase of the spectral component respectively.5 otherwise sin(πF ) πF This is shown in Fig.5 X(F ) = −0. In this expression. cn = 1 T T /2 x(t) exp(−jωnt)dt −T /2 (2. if − 0.5 ≤ t < 0. 6 and its Fourier transform can be readily computed from the deﬁnition. The lines in this spectrum are inﬁnitely thin. The consequence of an aperiodic change is to introduce a spread of frequencies into the signal. Signals whose spectra consist of isolated lines are periodic.12 In general. as 0. . This may seem nonintuitive but can be seen from equations above. a signal can be represented as follows (see Appendix 6. 2. The bandwidth of a communication channel is limited by the physical construction of the channel. The HartleyShannon law tells us that the maximum information rate of a zero bandwidth channel is zero. The spectrum has positive and negative frequencies. These are symmetric about the origin.15) where ω = 2π/T . In particular deﬁne the signal x(t) = 1. To permit the signal to carry information we must introduce the capacity for aperiodic change.
8 0. Mathematically the modulated carrier signal s(t) is (Fig.2 −0.2 0 −5 0 5 1 0.2 Digital modulation: ASK. These techniques give rise to amplitudeshiftkeying (ASK).13 2 1.17) ASK is a special case of amplitude modulation (AM). These are the altering of the amplitude.4 1.2 0 −0.4 −5 0 5 Figure 6: A square wave and its Fourier transform.18) . 2. The resultant modulated signal may be regarded as the sum of two amplitude modulated (2. Amplitude modulation has the property of translating the spectrum of the modulation f (t) to the carrier frequency. FSK describes the modulation of a carrier (or two carriers) by using a different frequency for a 1 or 0.4 0. respectively. ASK describes the technique by which a carrier wave is multiplied by the digital signal f (t). frequencyshiftkeying (FSK) and phaseshiftkeying (PSK). FSK and PSK There are three ways in which the bandwidth of the channel carrier may be altered simply.6 0.2 1 0. frequency and phase of the carrier wave. The bandwidth of the signal remains unchanged.6 0. 7) s(t) = f (t) cos(ωc t + φ) (2.8 0. 8. This can be seen if we examine a simple case when f (t) = cos(ωt) and we use the identities: cos(A + B) = cos(A) cos(B) − sin(A) sin(B) cos(A − B) = cos(A) cos(B) + sin(A) sin(B) then s(t) = cos(ωt) cos(ωc t) = 1/2[cos((ω + ωc )t) + cos((ω − ωc )t) See ﬁg.2.4 0.6 1.8 1.
14 carrier wave ASK modulated carrier signal 1 0 1 0 0 1 0 1 1 Figure 7: Amplitude shift keying fc f fc f Figure 8: Amplitude shift keying: frequency domain .
3 Spread spectrum techniques Spreadspectrum techniques are methods in which energy generated at a single frequency is deliberately spread over a wide band of frequencies.15 carrier wave for 1 FSK modulated carrier carrier wave for 0 signal 1 0 1 0 0 1 0 1 1 Figure 9: Frequency shift keying signals of different carrier frequency s(t) = f0 (t) cos(ω0 t + φ) + f1 (t) cos(ω1 t + φ) FSK is classiﬁed as wideband if the separation between the two carrier frequencies is larger than the bandwidth of the spectrums of f0 and f1 . In frequency hopping.2. 11). and its bandwidth is the same as that of ASK (Fig. 2. This is done for a variety of reasons. 0. but shall look at one particular technique that is used called frequency hopping. as shown in Fig. It is therefore a type of ASK with f (t) taking he values −1 and 1. Phaseshift keying offers a simple way of increasing the number of levels in the transmission without increasing the bandwidth by introducing smaller phase shifts. PSK describes the modulation technique that alters the phase of the carrier. Mathematically s(t) = cos(ωc t + φ(t)) Binary phaseshiftkeying (BPSK) has only two phases. Quadrature phaseshiftkeying (QPSK) has four phases. We shall not delve deeply into mechanisms. including increasing resistance to natural interference or jamming and to prevent hostile detection. Mary PSK has M phases. 12.π/2. π. In this case the spectrum of the modulated signal appears as two separate ASK signals. 3π/2. the bandwidth is . 0 and π.
16 fc1 fc0 f fc1 fc0 f Figure 10: Frequency shift keying: frequency domain carrier wave BPSK modulated carrier signal 1 0 1 0 0 1 0 1 1 Figure 11: Phase shift keying .
It describes demodulation schemes that are sensitive only to the power in the signal. Those that do not need knowledge of the carrier phase are termed incoherent. Demodulation schemes requiring the carrier phase are termed coherent. and so this technique has obvious security applications. With ASK. or it is not. the signal is passed to an envelope detector. If the hop set is generated by a pseudo random number generator then the seed to that generator is effectively a key decoding the transmitted message. for instance military use or in mobile phone systems. The hop rate (the rate at which the signal switches channels) can be thousands of times a second. or the other. but has better performance. The hop set (channel hopping sequence) is not arbitrary. 2. The method of demodulation is an important factor in determining the selection of a modulation scheme. This is a device that produces as output the outline of the signal. However. In ASK incoherent demodulation. the power is either present at one frequency. but determined by the use of a pseudo random sequence.2. The signal is then spread across the channels. Incoherent modulation is inexpensive but has poorer performance. so the dwell time (time spent on one channel) is very short.4 Digital demodulation From the discussion above it might appear that QPSK offers advantages over both ASK. the demodulation of these signals requires various degrees of difﬁculty and hence expense. The receiver can reproduce the identical hop set and so decode the signal. There are two types of demodulation which are distinguished by the need to provide knowledge of the phase of the carrier. the power is either present. Coherent demodulation requires more complex circuity. A decision is made as to whether the signal is . FSK and PSK. Incoherent demodulation can be applied to ASK and wideband FSK.17 f spread signal unspread signal t Figure 12: Frequency hopping spread spectrum technique effectively split into frequency channels. With wideband FSK.
termed local oscillator. The Probability density function (pdf) p(x) of a random signal. In this subsection we shall provide a description of noise. In the electrical and microwave context. the incoming signal is compared with a replica of the carrier wave. In theory. such as microwave links. Suppose there is some phase error φ present in the local oscillator signal. systems where the signal amplitude may vary unpredictably. Circuits whose output depends on the frequency of the input are called discriminators or ﬁlters. it determines the theoretical capacity of the channel. the harmonic term. We can only make statements about the probability of it taking a particular value. The difﬁculty with coherent detection is the need to keep the phase of the replica signal. Therefore. some more sophisticated methods such as differential phaseshiftkeying (DSPK) have to be introduced to resolve the issue. Envelope detection is the simplest and cheapest method of demodulation. phase modulation is technically very difﬁcult. Another way to demodulate the signal is performed by multiplying the incoming signal with a replica of the carrier. Oscillators are sensitive to (among other things) temperature.e. If the output of this process is h(t). The outputs of the two discriminators are interrogated to determine the signal. each sensitive to one of the two carrier frequencies. or ﬁltering out. 2. In optical communications. We shall consider how the noise determines the error rates in the next subsection. or random variable x is deﬁned to be the probability that the random variable x takes a value between x0 and x0 + δx. Incoherent demodulation can also be used for wideband FSK. and a ‘freerunning’ oscillator will gradually drift in frequency and phase. the output of the demodulation is the modulation f (t). After ﬁltering. or range of values.18 present or not. however. the output of a demodulator will be f (t) f (t) h(t) = f (t) cos(ωc t) cos(ωc t + φ) = cos(φ) + cos(2ωc t + φ) 2 2 Clearly the consequence for the correct interpretation of the demodulated signal is catastrophic. the original signal plus a term at twice the carrier frequency. By this we mean that we cannot predict its value. We write this as follows: . Noise is a random signal. The signal must be wideband FSK to ensure the two signals f0 (t) and f1 (t) are distinguished. are not suitable for ASK modulation. but very wasteful of bandwidth. By removing.2. it is considered crude. It is used in circumstances where bandwidth is not the primary constraint. This is obviously necessary with PSK signals. Incoherent FSK demodulation is simple and cheap.5 Noise in communication systems: probability and random signals Noise plays a crucial role in communication systems. This is not easy to do. In practise it determines the number of errors occurring in a digital communication. we have that h(t) = f (t) cos(ωc t) cos(ωc t) = f (t) f (t) f (t) [1 + cos(2ωc t)] = + cos(2ωc t) 2 2 2 i. In addition. Here the signals are passed to two circuits. ‘locked’ to the carrier. because here the power in the signal is constant. and ASK is the only option. With coherent demodulation systems.
dx = duσ 2 then we have (m = 0) 1 P (−x1 < x < x1 ) = √ π 2 exp(−u2 )du = √ π −u1 u1 u1 0 exp(−u2 )du . or Gaussian distribution: 1 −(x − m)2 ) (2. n(t). the noise signal. Further. m = 0. Continuous distributions. we would evaluate: P (v1 ) − P (−v1 ) In general to evaluate P (−x1 < x < x1 ) if we use √ √ u = x/( 2σ).e. σ is the standard deviation of the distribution (see Fig. white noise. This expression is important as many actually occurring noise source can be described by it.19) p(x) = √ exp( 2σ 2 2πσ where m is the mean value of p(x).19 p(x0 )δx = P (x0 < x < x0 + δx) The probability that the random variable will take a value lying between x1 and x2 is then the integral of the pdf over the interval [x1 x2 ]: x2 P (x1 < x < x2 ) = x1 p(x)dx The probability P (−∞ < x < ∞) is unity. An example of a continuous distribution is the Normal. 13). i. say. The constant term ensures that the distribution is normalized. we can simplify the expression by considering the source to be a zero mean random variable. The cumulative distribution function (CDF) P (x) is deﬁned to be the probability that a random variable. Thus ∞ p(x)dx = 1 −∞ a density satisfying the equation above is termed normalized. i. having the value [−v1 . v1 ]. x is less than x0 x0 P (x0 ) = P (x < x0 ) = −∞ p(x)dx From the rules of integration: P (x1 < x < x2 ) = P (x2 ) − P (x1 ) Some commonly used distributions are: 1. How would this be used? If we want to know the probability of.e.
8 0. √ where u1 = x1 /( 2σ). The mean η of a random variable x is deﬁned to be η= or. 14). If a random variable can only take discrete value..7 0. The complementary error function erf c is deﬁned by erfc(x) = 1 − erf(x) 2. its PDF takes the forms of lines (see Fig.2 0. 2.5 0.5 1 1.9 0. The distribution function P (x) is usually written in terms of a function of the error function erf(x). while the mean of the Poisson distribution is found to be α. An example of a discrete distribution is the Poisson distribution p(n) = P (x = n) = where n = 0. Discrete distributions. The mean of a distribution is.1 0 1. 1.5 1 0.4 0. Probability density functions need not be continuous. · · · .20 0. The two most important measures are the mean (or expectation) and the standard deviation. We cannot predict the value a random variable may take on a particular occasion but we can introduce measures that summarise what we expect to happen on average.3 0. for a discrete distribution: η= np(n) In the examples above we have assumed that the mean of the Gaussian distribution to be 0. xp(x)dx αn exp(−α) n! .6 0.5 0 0.5 Figure 13: Gaussian distribution pdf. the average value taken by the corresponding random variable. in common sense.
14 0.19) above. This means that its amplitude at a particular time has a PDF given by Eq. The variance σ 2 is deﬁned to be σ2 = or.12 0.06 0. A small standard deviation means the distribution is concentrated about the mean. We have already seen that the signal to noise ratio is an important quantity in determining the performance of a communication channel. The standard deviation is a measure of the spread of the probability distribution around the mean.38 × 10−23 J/K). A large value indicates a wide range of possible outcomes. σ2 = (n − η)2 p(n) (x − η)2 p(x)dx The square root of the variance is called standard deviation.02 0 0 2 4 6 8 10 12 14 16 18 20 Figure 14: Discrete distribution pdf (Poisson). The statement that noise is zero mean says that. In many cases the noise present in communication signals can be modelled as a zeromean. . on average.08 0.1 0.04 0. The Poisson distribution has a standard deviation of α2 . Tm is the temperature and B is the receiver bandwidth. we have σ 2 = kTm B where k is the Boltzman’s constant (k = 1. The Gaussian distribution contains the standard deviation within its deﬁnition. (2.16 0. The noise power referred to in the deﬁnition is the mean noise power. Gaussian random variable. the noise signal takes the values zero.21 0.18 0. It can therefore be rewritten as SN R = 10 log10 (S/σ 2 ) If only thermal noise is considered. for a discrete distribution.
22 0. 15). We shall consider in detail only BPSK systems.2 0.1 0 + 0  Figure 15: Schematic of noise on a two level line 2.8 0. The probability that an error will occur in the transmission of a 1 is 2 P (N + vb < 0) = P (N < −vb ) = √ π −vb −∞ 1 exp(−u2 )du = erfc(vb /2σ) 2 Similarly the probability that an error will occur in the transmission of a 0 is 2 P (N − vb > 0) = P (N > vb ) = √ π ∞ vb 1 exp(−u2 )du = erfc(vb /2σ) 2 It is usual to write these expressions in terms of the ratio of Eb (energy per bit) to En (noise 2 power per unit Hz). When noise is present.6 Errors in digital communication We noted earlier that one of the most important advantages of digital communications is that it permits very high ﬁdelity. and comment on the alternative modulations. the value that is read is assigned to 1. In the absence of noise. The power S in the signal is. In the ideal case. we say that a biterror has occurred.5 0.3 0. We suppose that the signal V .4 0.6 0. The probability that a biterror will occur in a given time is referred to as the biterror rate (BER) (see Fig. When this happens. the value that is read is assigned to 0. on average vb . and thus being assigned 0. and the total energy in the . from a BPSK system can take one of two values ±vb . There is a ﬁnite probability of the signal dropping below 0. In this subsection we shall investigate this more closely.9 0. which has the signal levels ±vb . is combined with noise N of variance σ 2 . if the signal is greater than 0. even though a 1 was transmitted.2. If the signal is less than 0. this distinction between ±vb (with the threshold at 0) becomes blurred. the signal V .7 0.
For BPSK. For example.23 Figure 16: Expressions for error rates in some modulation schemes 2 signalling period T is vb T . reﬂecting the fact that with a quadrature scheme. and DPSK.20) above. Using the expressions above. QPSK has twice the error probability of BPSK. T = 1/2B. Fig.20) 2 All coherent detection schemes give rise to error rates of the form in Eq. although its numerical value depends on the exact scheme used. the output is X 2 = (v + N )2 v2 N2 . Suppose the input. incoherent FSK. Power detection always decreases the SNR. X. It is quite easy to see why this is so. i. is of the form X = v + N . Incoherent schemes are forms of power detection.e. 17 shows graphs of P (error) for incoherent ASK. thus 1 P (error) = erfc( Eb /En ) (2. i. Narrowband FSK has an error probability rather worse than QPSK. Incoherent demodulation schemes have a higher probability of error than coherent schemes. (2. produce an output proportional to the square of the input.e. the expressions are given in Table 16. as before. BPSK. we have 1 1 erfc(vb /2σ) = erfc( 2 2 Eb ) 2T En B where we have used the fact that σ 2 = kTm B = En B for temperature Tm . The input SNR is SN Rin = If we square the input. the signaling period T is half the reciprocal of the bandwidth B. there are more ways an error can occur.
and the SNR of the output is SN Rout (v 2 )2 SN Rin ∼ = 2 (2vN ) 4 This decrease in the signaltonoise ratio causes an increase in the error probability. can dramatically reduce the error rate over long channels. However. The detailed analysis is beyond our scope. In particular. it is simple to calculate ﬁrst the probability of no error. raised to the Nth power: P ( No error over N regenerators) = (1 − P (error))N assuming the regenerators are regularly spaced and the probabilities are independent. To determine the BER of the channel with N regenerators.0001 1e05 1e06 1e07 1e08 1e09 2 4 6 8 10 Eb/En (dB) 12 14 16 Figure 17: Comparison of error rates in some modulation schemes Assuming the SNR is high.001 "ask" "fsk" "bpsk" "dpsk" P{error} 0. they are being attenuated. . their performance is good nonetheless. The conversion from error probability to BER is numerically simple: BER = P (error).01 0. As the signal is losing power. This explains the widespread use of incoherent ASK and FSK. This probability is the probability of no error over one regenerator.1 0. vN >> N 2 . however. Although poorer. This may or may not be a reasonable assumption. When signals travel along the channel. loss of timing can cause multiple bit failures that can dramatically increase the BER. Regenerators.24 1 0. The BER is then determined simply by: P ( error over N regenerators) = 1 − P ( no error over N regenerators) This avoids having to enumerate all the ways in which the multiple system can fail. this conversion assumes that the probability of errors from bittobit are independent. the BER increases with the length of the channel. placed at regular intervals. Error rates are usually quoted as bit error rates (BER).
25
data
Figure 18: The received signal could be 1001010 or 11000011001100. 2.2.7 Timing control in digital communication In addition to providing the analogue modulation and demodulation functions, digital communication also requires timing control. Timing control is required to identify the rate at which bits are transmitted and to identify the start and end of each bit. This permits the receiver to correctly identify each bit in the transmitted message. Bits are never sent individually. They are grouped together in segments, called blocks. A block is the minimum segment of data that can be sent with each transmission. Usually, a message will contain many such blocks. Each block is framed by binary characters identifying the start and end of the block. The type of method used depends on the source of the timing information. If the timing in the receiver is generated by the receiver, separately from the transmitter, the transmission is termed asynchronous. If the timing is generated, directly or indirectly, from the transmitter clock the transmission is termed synchronous. Asynchronous transmission is used for low datarate transmission and standalone equipment. We will not discuss it in detail here. Synchronous transmission is used for high data rate transmission. The timing is generated by sending a separate clock signal, or embedding the timing information into the transmission. This information is used to synchronize the receiver circuitry to the transmitter clock. The necessity to introduce a clock in signal transmission is obvious if we look at Fig. 18. Without a clock, would we be able to tell whether it is a 1001010 or a 11000011001100? Synchronous receivers require a timing signal from the transmitter. An additional channel may be used in the system to transmit the clock signal. This is wasteful of bandwidth, and it is more customary to embed the timing signal within the transmitted data stream by use of suitable encoding (selfclocking encoding).
Figure 19: Bipolar coding.

In bipolar coding, a binary 0 is encoded as zero volts, while a binary 1 is encoded alternately as a positive voltage and a negative voltage (see Fig. 19). Other systems must synchronize using some form of out-of-band communication, or add frame synchronization sequences that carry no data to the signal. These alternative approaches require either an additional transmission medium for the clock signal or a loss of performance due to overhead, respectively. Bipolar encoding is often a good compromise: runs of ones will not cause a lack of transitions; long sequences of zeros, however, are still an issue. Since the signal does not change for as long as the data to send is a zero, a long run of zeros produces no transitions and a loss of synchronization. Where frequent transitions are a requirement, a self-clocking encoding such as the Manchester code discussed below may be more appropriate.

Manchester code (also known as Phase Encoding, or PE) is a form of data communications in which each bit of data is signified by at least one voltage level transition. Manchester encoding is therefore considered to be self-clocking, which means that accurate synchronisation of a data stream is possible. Manchester coding has been adopted into many efficient and widely used telecommunications standards, such as Ethernet. Here is a summary of Manchester code:

• data and clock signals are combined to form a single self-synchronizing data stream;
• each encoded bit contains a transition at the midpoint of a bit period;
• the direction of the transition determines whether the bit is a "0" or a "1"; and
• the first half is the true bit value and the second half is the complement of the true bit value (see Fig. 20), as illustrated by the sketch below.
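A minimal Matlab sketch of this encoding convention (the data vector is an arbitrary example): the first half-bit carries the true bit value and the second half-bit its complement, so every bit period contains a mid-bit transition.

    bits = [1 0 0 1 0 1 0];              % example data
    enc  = zeros(1, 2*length(bits));
    for k = 1:length(bits)
        enc(2*k-1) = bits(k);            % first half: true bit value
        enc(2*k)   = 1 - bits(k);        % second half: complement
    end
    v = 2*enc - 1;                       % map 0/1 to -1/+1 voltage levels
    stairs(v); ylim([-1.5 1.5])          % note the guaranteed mid-bit transitions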
Figure 20: Manchester coding.

Manchester codes always have a transition at the middle of each bit period. The direction of the mid-bit transition is what carries the data, with a low-to-high transition indicating one binary value and a high-to-low transition indicating the other. Transitions that do not occur mid-bit carry no useful information, and exist only to place the signal in a state where the necessary mid-bit transition can take place. Though this allows the signal to be self-clocking, it essentially doubles the bandwidth. However, there are today many more sophisticated codes (e.g. 8B/10B encoding) which accomplish the same aims with less bandwidth overhead, and less synchronisation ambiguity in pathological cases.
3 Information and coding theory
Information theory is concerned with the description of information sources, the representation of the information from a source, and the transmission of this information over a channel. It is perhaps the best example of how a deep mathematical theory can be successfully applied to solving engineering problems.

Information theory is a discipline in applied mathematics involving the quantification of data, with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel. The measure of data, known as information entropy, is usually expressed by the average number of bits needed for storage or communication. Applications of fundamental topics of information theory include ZIP files (lossless data compression), MP3s (lossy data compression), and DSL (channel coding). The field is at the crossroads of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.

Information theory is generally considered to have been founded in 1948 by Claude Shannon in his seminal work, A Mathematical Theory of Communication. The central paradigm of classic information theory is the engineering problem of the transmission of information over a noisy channel. The most fundamental results of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by its entropy; and Shannon's noisy-channel coding theorem, which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold, called the channel capacity. The channel capacity can be approached by using appropriate encoding and decoding systems.

Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, and machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory, which is the main focus of our course.

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the net error rate of data communication over a noisy channel, to near the limit that Shannon proved is the maximum possible for that channel. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis.
3.1 Information sources and entropy

We start our examination of information theory by way of an example. Consider predicting the activity of the Prime Minister tomorrow. This prediction is an information source. Assume the information source has two outcomes:

• The Prime Minister will be in his office.
• The Prime Minister will be naked and run 10 miles in London.

Clearly, the outcome 'in office' contains little information; it is a highly probable outcome. The outcome 'naked run', however, contains considerable information; it is a highly improbable event. In information theory, an information source is a probability distribution, i.e. a set of probabilities assigned to a set of outcomes. This reflects the fact that the information contained in an outcome is determined not only by the outcome, but by how uncertain it is. An almost certain outcome contains little information.

A measure of the information contained in an outcome was introduced by Hartley in 1927. He defined the information contained in an outcome xi as

    I(xi) = − log2 p(xi)

This measure satisfies our requirement that the information contained in an outcome is proportional to its uncertainty. If P(xi) = 1, then I(xi) = 0, telling us that a certain event contains no information. The definition above also satisfies the requirement that the total information in independent events should add. Clearly, our Prime Minister prediction for two days contains twice as much information as that for one day. For two independent outcomes xi and xj,

    I(xi and xj) = − log2 P(xi and xj) = − log2 P(xi)P(xj) = I(xi) + I(xj)

Hartley's measure defines the information in a single outcome. The measure entropy H(X) defines the information content of the source X as a whole: it is the mean information provided by the source. We have

    H(X) = Σ_i P(xi) I(xi) = − Σ_i P(xi) log2 P(xi)

A binary symmetric source (BSS) is a source with two outputs whose probabilities are p and 1 − p respectively. The Prime Minister example discussed above is a BSS. The entropy of the source is

    H(X) = −p log2 p − (1 − p) log2 (1 − p)

This function (Fig. 21) takes the value zero when p = 0: when one outcome is certain, so is the other, and the entropy is zero. As p increases, so too does the entropy, until it reaches a maximum when p = 1 − p = 0.5. When p is greater than 0.5, the curve declines symmetrically to zero, reached when p = 1. We conclude that the average information in the BSS is maximised when both outcomes are equally likely. The entropy is measuring the average uncertainty of the source.
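The curve of Fig. 21 is a one-liner to reproduce (plain Matlab; the probability grid is arbitrary):

    p = 0.001:0.001:0.999;
    H = -p.*log2(p) - (1-p).*log2(1-p);    % BSS entropy in bits/symbol
    plot(p, H); xlabel('p'); ylabel('H(X)');
    [Hmax, i] = max(H);
    [Hmax, p(i)]                           % maximum of 1 bit/symbol at p = 0.5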
(The term entropy is borrowed from thermodynamics; there too it is a measure of the uncertainty, or disorder, of a system.) When p = 0.5, H(X) = 1: an equally probable BSS has an entropy of 1 bit per symbol.

Figure 21: Entropy vs. p.

The unit of entropy is bits/symbol, or average information content per symbol. Suppose we identify 'naked run' with 1 and 'office' with 0. A BSS whose outputs are 1 or 0 has an output we describe as a bit. The entropy of a source is also measured in bits. By long tradition, engineers have used the word bit to describe both the symbol and its information content, so that we might say the equiprobable BSS has an information rate of 1 bit/bit: the numerator bit refers to the information content, while the denominator bit refers to the symbol 1 or 0. We can avoid this by writing it as 1 bit/symbol. This means that, on average, the equiprobable BSS generates 1 information bit per symbol bit. When p ≠ 0.5, the BSS information rate falls.

3.2 Information source coding

It seems intuitively reasonable that an information source of entropy H needs on average only H binary bits to represent each symbol. Indeed, consider the Prime Minister example again. Suppose the probability of 'naked run' is 0.1 (N) and that of 'office' is 0.9 (O). We have already noted that this source has an entropy of 0.47 bits/symbol: when p = 0.1, each symbol (1 or 0) of source output provides 0.47 bits of information. Suppose we identify 'naked run' with 1 and 'office' with zero. This representation uses 1 binary bit per symbol, hence is using more binary bits per symbol than the entropy suggests is necessary.

Shannon's first theorem states that an instantaneous code can be found that encodes a
source of entropy H(X) with an average number of bits per symbol Bs such that

    Bs ≥ H(X)

Ordinarily, the longer the sequence of symbols encoded at a time, the closer Bs will be to H(X). As a consequence of Shannon's source coding theorem, the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights.

In any coding operation we replace the symbol with a codeword. The replacement of the symbols naked run/office with a binary representation is termed source coding. The purpose of source coding is to reduce the number of bits required to convey the information provided by the information source: to minimize the average length of codes. Central to source coding is the use of sequences. By this, we mean that codewords are not simply associated with a single outcome, but with a sequence of outcomes. To see why this is useful, let us return to the problem of the Prime Minister. Suppose we group the outcomes in threes, according to their probability, and assign binary codewords to these grouped outcomes. Table 1 shows such a code, and the probability of each codeword occurring.

    Sequence   Probability   Codeword   Codeword length (in bits)
    OOO        0.729         0          1
    OON        0.081         1          1
    ONO        0.081         01         2
    NOO        0.081         10         2
    NNO        0.009         11         2
    NON        0.009         00         2
    ONN        0.009         000        3
    NNN        0.001         111        3

Table 1: Variable length source coding.

The entropy of the sequence set is

    −(0.729 log2 0.729 + 3×0.081 log2 0.081 + 3×0.009 log2 0.009 + 0.001 log2 0.001) = 1.4070 bits/sequence

The average length of the coding is

    0.729×1 + 0.081×1 + 2×0.081×2 + 2×0.009×2 + 0.009×3 + 0.001×3 = 1.2 bits/sequence

i.e. 0.4 bits per symbol. This example shows how using sequences permits us to decrease the average number of bits per symbol. However, we seem to have found a code whose average bit usage is less than the source entropy. There is, however, a difficulty with the code in Table 1. Before a code can be decoded, it must be parsed. Parsing describes the activity of breaking the message string into its component codewords. After parsing, each codeword can be decoded into its symbol sequence. An instantaneously parsable code is one that can be parsed as soon as the last bit of a codeword is received. An instantaneous code must satisfy the prefix condition: no codeword may be a prefix of any other codeword. This condition is not satisfied by the code in Table 1, so the code is not uniquely decodable; this is why it appears to beat the entropy.
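These numbers are easy to verify (plain Matlab):

    p   = [0.729 0.081 0.081 0.081 0.009 0.009 0.009 0.001];
    len = [1     1     2     2     2     2     3     3    ];
    H    = -sum(p.*log2(p))    % 1.4070 bits/sequence (0.47 bits/symbol)
    Lavg = sum(p.*len)         % 1.2 bits/sequence: below H, which a
                               % uniquely decodable code cannot achieve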
3.2.1 Huffman coding

The code in Table 2, however, is an instantaneously parsable code: it satisfies the prefix condition.

    Sequence   Probability   Codeword   Codeword length (in bits)
    A          0.729         1          1
    B          0.081         011        3
    C          0.081         010        3
    D          0.081         001        3
    E          0.009         00011      5
    F          0.009         00010      5
    G          0.009         00001      5
    H          0.001         00000      5

Table 2: OOO=A, OON=B, ONO=C, NOO=D, NNO=E, NON=F, ONN=G, NNN=H.

In this example, the weighted average codeword length is

    0.729×1 + 3×0.081×3 + 3×0.009×5 + 0.001×5 = 1.5980 bits per sequence

This is larger than the calculated entropy of 1.4070 bits per sequence, but it is very close to the theoretical limit established by Shannon. In fact, this is the Huffman code for the sequence set: no other feasible code performs better, so there is little point in expending effort in finding a better code.

Note that Huffman coding relies on the use of bit patterns of variable length. In most data communication systems, the data symbols are encoded as bit patterns of a fixed length, e.g. 8 bits. This is done for technical simplicity. Often, coding schemes such as Huffman coding are used on a source symbol set to produce variable bit-length coding, and are referred to as compression algorithms.

Problem definition: given a set of symbols and their probabilities, find a prefix-free binary code with minimum expected codeword length.

History: in 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, who had worked with information theory inventor Claude Shannon to develop a similar code, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and quickly proved this method the most efficient. In doing so, the student outdid his professor. Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

The codeword for each sequence is found by generating the Huffman code tree for the sequence set. A Huffman code tree is an unbalanced binary tree. The tree is derived bottom up, in terms of branch nodes and leaf nodes, by combining probabilities and removing leaf nodes in progressive stages. The derivation of the Huffman code tree is shown in Fig. 22, and the tree itself is shown in Fig. 23. In Fig. 22 the sequences are ordered with respect to the probability of the sequence occurring, with the highest probability at the top of the list. In both these figures, the letters A to H have been used in place of the sequences in Table 2 to make them easier to read.
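Assuming the Communications Toolbox is available, its huffmandict function reproduces these codeword lengths directly (the 0/1 labelling of branches, and hence the exact codewords, may differ from Table 2):

    prob = [0.729 0.081 0.081 0.081 0.009 0.009 0.009 0.001];
    sym  = {'A','B','C','D','E','F','G','H'};
    [dict, avglen] = huffmandict(sym, prob);
    avglen                     % 1.5980 bits per sequence, as computed above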
Figure 22: Example derivation of a Huffman code tree.

Figure 23: Example Huffman code tree.
As shown in Fig. 22, the two lowest leaf nodes G and H have their weights added, and they are labelled 1 and 0 respectively. In the next stage the symbols G and H are represented by the combined node with the lowest weight, and the list is rewritten. The two lowest leaf nodes are now E and F, and their weights are added to be taken onto the next stage. This continues until only two nodes remain, and the topmost node is labelled with a 1 and the lower one with a 0; the tree has now been generated. The Huffman tree shown in Fig. 23 is then produced by following backwards along the arrows. To derive the codewords from the tree, descend from the top node, and list the 1s and 0s in the order they appear until you reach the leaf node for one of the letters.

Creating the tree:

1. Start with as many leaves as there are symbols.
2. Queue all leaf nodes into the first queue (in order).
3. While there is more than one node in the queues:
   • remove the two nodes with the lowest weight from the queues;
   • create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight;
   • update the parent links in the two just-removed nodes to point to the just-created parent node;
   • queue the new node into the second queue, again in order of the weights.
4. The remaining node is the root node; the tree has now been generated.

3.3 Channel Capacity

One of the most famous results of information theory is Shannon's channel coding theorem: for a given channel there exists a code that will permit error-free transmission across the channel at a rate R, provided R ≤ C, where C is the channel capacity. Shannon's theorem is both tantalizing and frustrating. It offers error-free transmission, but it makes no statement as to what code is required; in fact, all we may deduce from the proof of the theorem is that the code must be a long one. No one has yet found a code that permits the use of a channel at its capacity. Shannon has thrown down the gauntlet, in as much as he has proved that the code exists. As we have already noted, the astonishing part of the theory is the existence of a channel capacity at all.

We shall not give a description of how the capacity is calculated in general; however, an example is instructive. The binary channel is a channel with a binary input and output. Associated with each output is a probability p that the output is correct, and a probability 1 − p that it is not. For such a channel, the channel capacity turns out to be

    C = 1 + p log2 p + (1 − p) log2 (1 − p)
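The capacity curve of Fig. 24 can be evaluated directly (plain Matlab; the endpoints are excluded because p log2 p must be taken as 0 at p = 0 and p = 1):

    p = 0.001:0.001:0.999;
    C = 1 + p.*log2(p) + (1-p).*log2(1-p);
    plot(p, C); xlabel('p'); ylabel('C')   % C = 0 at p = 0.5, C = 1 at p = 0 or 1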
Here p is the bit error probability (Fig. 24). When p = 0.5, C = 0: if there is an equal probability of receiving a 1 or a 0, irrespective of the signal sent, the channel is completely unreliable and no message can be sent across it. If p = 0 (or p = 1), then C = 1.

Figure 24: Channel capacity.

So defined, the channel capacity is a non-dimensional number. We normally quote the capacity as a rate, in bits/second. To do this we relate each output to a change in the signal: for a channel of bandwidth B, we can transmit at most 2B changes per second, so the capacity in bits/second is 2BC. For the binary channel this gives

    C = 2B[1 + p log2 p + (1 − p) log2 (1 − p)]

For the binary channel the maximum bit rate W is 2B. We note that C < W: the capacity is always less than the bit rate, with equality achieved only when the SNR is infinite. The data rate D describes the rate of transfer of data bits across the channel. If the source is optimally coded, we can rephrase the channel coding theorem: a source of information with entropy H(X) can be transmitted error-free over a channel provided H(X) ≤ C. In theory we have

    W > C > D

Shannon's channel coding theorem applies to the channel, not to the source.

3.4 Error detection coding

3.4.1 Hamming distance

The task of source coding is to represent the source information with the minimum number of symbols. When a code is transmitted over a channel in the presence of noise, errors will occur.
The task of channel coding is to represent the source information in a manner that minimises the probability of errors in decoding. It is apparent that channel coding requires the use of redundancy: the channel code sequence must be longer than the source sequence. The rate R of a channel code is the average ratio of the source sequence length to the channel code length; thus R < 1. A good channel code is designed so that, if a few errors occur in transmission, the output can still be decoded with the correct input. This is possible because, although incorrect, the output is sufficiently similar to the input to be recognisable. If all possible outputs of the channel corresponded uniquely to a source input, there would be no possibility of detecting errors in the transmission.

The idea of similarity is made more firm by the definition of the Hamming distance. Let x and y be two binary sequences of the same length. The Hamming distance between these two codes is the number of symbols that disagree. For example:

• the Hamming distance between 1011101 and 1001001 is 2;
• the Hamming distance between 2143896 and 2233796 is 3;
• the Hamming distance between "toned" and "roses" is 3.

Suppose the code x is transmitted over the channel and, due to error, y is received. The decoder will assign to y the code x that minimises the Hamming distance between x and y. By designing a good code, we try to ensure that the Hamming distance between the possible codewords x is larger than the Hamming distance arising from errors. For example, consider the codewords a = (100000), b = (011000), c = (000111). If the transmitter sends 100000 but there is a single bit error and the receiver gets 100001, it can be seen that the nearest codeword is in fact 100000, and so the correct codeword is found.

It can be shown that to detect n bit errors, a coding scheme requires the use of codewords with a Hamming distance of at least n + 1; that is, if we wish to identify any pattern of n or fewer errors, the Hamming distance between the preassigned codes must be n + 1 or greater. It can also be shown that to correct n bit errors requires a coding scheme with a Hamming distance of at least 2n + 1 between the codewords.

Error detection coding is designed to permit the detection of errors. Once detected, the receiver may ask for a retransmission of the erroneous bits, or it may simply inform the recipient that the transmission was corrupted. In a binary channel, error checking codes are called parity check codes. Practical codes are normally block codes. A block code converts a fixed length of K data bits to a fixed length N code word, where N > K. The rate of the code is the ratio K/N, and the redundancy of the code is 1 − K/N. Our ability to detect errors depends on the rate: a low rate has a high detection probability, but a high redundancy.

A very common code is the single parity check code. This code appends to each K data bits an additional bit whose value is taken to make the K + 1 bit word even or odd. Such a choice is said to have even (odd) parity. With even (odd) parity, a single bit error will make the received word odd (even). The preassigned codewords are always even (odd), and hence are separated by a Hamming distance of 2 or more. The receiver will assign to the received codeword the preassigned codeword that minimises the Hamming distance between the two words. A common choice of K is eight. Single parity bits are common in asynchronous, character-oriented transmission.
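Minimum-distance decoding, as in the codeword example above, is a few lines of plain Matlab:

    C = [1 0 0 0 0 0; 0 1 1 0 0 0; 0 0 0 1 1 1];   % codewords a, b, c
    y = [1 0 0 0 0 1];                              % received word, one bit flipped
    d = sum(C ~= repmat(y, 3, 1), 2);               % Hamming distance to each codeword
    [~, i] = min(d);
    decoded = C(i, :)                               % recovers a = 100000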
To see how the addition of a parity bit can improve error performance, consider the following example. Suppose that the BER is p = 10^−4. For an 8 bit word with no parity bit,

    P(single bit error)          = p
    P(no error in a single bit)  = 1 − p
    P(no error in 8 bits)        = (1 − p)^8
    P(undetected error in 8 bits) = 1 − (1 − p)^8 = 7.9 × 10^−4

With the addition of a parity bit we can detect any single bit error, so:

    P(no error in 9 bits)      = (1 − p)^9
    P(single error in 9 bits)  = 9 P(single bit error) P(no error in the other 8 bits) = 9p(1 − p)^8
    P(undetected error in 9 bits) = 1 − (1 − p)^9 − 9p(1 − p)^8 = 3.6 × 10^−7

As can be seen, the addition of a parity bit has reduced the undetected error rate by three orders of magnitude.

Parity checking in this way provides good protection against single and multiple errors when the probabilities of the errors are independent. However, in many circumstances, errors occur in groups, or bursts, and parity checking of the kind just described then provides little protection. In these circumstances, additional parity symbols can be added that check not only the parity of each 8 bit row, but also the parity of each 8 bit column; the column is formed by listing each successive 8 bit word one beneath the other. This type of parity checking is called block sum checking, and it can correct any single bit error in the transmitted block of rows and columns. However, there are some combinations of errors that will go undetected in such a scheme (see Table 3). Where synchronous transmission is used, a polynomial code is used instead.
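The single-parity probabilities above are easy to verify numerically (plain Matlab):

    p = 1e-4;
    P8 = 1 - (1-p)^8                   % 7.997e-4: undetected error, no parity
    P9 = 1 - (1-p)^9 - 9*p*(1-p)^8     % 3.599e-7: undetected error with parity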
Table 3: An example of block sum checking; p1 is odd parity for rows, p2 is even parity for columns, and (*) marks an undetected error combination.

The mechanism of polynomial codes is beyond the scope of this course. We have so far considered coding only as a mechanism for improving the integrity of the communication system in the presence of noise. Error detection coding aims to detect and locate errors in transmission; once located, the correction is trivial: the bit is inverted. Error correction coding is more sophisticated than error detection coding, often markedly so, and requires lower rate codes than error detection. It is therefore uncommon in terrestrial communication, where better performance is usually obtained with error detection and retransmission. However, in satellite communication, the propagation delay often means that many frames are transmitted before an instruction to retransmit is received, which can make the task of data handling very complex. Real-time transmission often precludes retransmission: it is necessary to get it right first time. In these special circumstances, the additional bandwidth required for the redundant check-bits is an acceptable price. There are two principal types of error correction code: Hamming codes and convolutional codes. Again, we will not discuss them in detail here.

3.5 Encryption

In all our discussion of coding, we have not mentioned what is popularly supposed to be the purpose of coding: security. The use of coding for security has a different name: encryption. The use of digital computers has made highly secure communication a normal occurrence. The basis for key-based encryption is that it is very much easier to encrypt with knowledge of the key than it is to decipher without knowledge of the key. The principle is just that of a combination lock: with a computer the number of digits in the lock can be very large. Of course, one still has to keep the combination secure.

The most commonly used encryption algorithms are block ciphers. This means that the algorithm splits the plain text (the message to be encrypted) into (usually) fixed size blocks, which are then subjected to various functions to produce a block of cipher text. The most common functions are permutations based on expansion and compression, and straight shuffling transformations.
Figure 25: Examples of block cipher permutations: expansion, compression, and straight permutation.

In expansion, a group of n bits is converted to m bits (m > n), with some bits being duplicated. In compression, the n bit block is converted to a p bit block (p < n), with some of the original bits unused. In a straight permutation, the bits of an n bit block are simply reordered (see Fig. 25).

The most widely used form of encryption is defined by the National Bureau of Standards and is known as the Data Encryption Standard (DES). The DES is a block cipher, splitting the data stream into 64-bit blocks that are enciphered separately; it converts 64 bits of plain text into 64 bits of cipher text. A unique key of 56 bits is used to perform a succession of transposition and substitution operations. A 56 bit key has 7.2 × 10^16 possible combinations. Assuming a powerful computer could attempt 10^8 combinations per second, it would still take over 20 years to break the code. If the code is changed once per year, there is little possibility of it being broken, unless the code breaker had additional information. The receiver uses the same key to decipher the cipher text into plain text. The difficulty with this method is that each block is independent: this permits an interceptor in possession of the key to introduce additional blocks without the recipient being aware of the fact.

Like the combination of a lock, the system is only secure if the key is secure. If the key is changed often, the security of the key itself becomes a problem, because the transfer of the key between sender and receiver may not be secure. This is avoided by the use of matched keys. In a matched key scheme, the encryption is not reversible with the same key: the message is encrypted using one key, and decrypted with a second, matched key. The receiver makes available the first, public key; this key is used by the sender to encrypt the message. This message is unintelligible to anyone not in possession of the second, private key. In this way the private key need not be transferred. The most famous such scheme is the public key mechanism based on the work of Rivest, Shamir and Adleman (RSA). It is based on the multiplication of extremely large numbers: with current technology,
while it is trivial to multiply two large numbers together, it is computationally very expensive to determine the factors if only their product is given.

RSA numbers are composite numbers having exactly two prime factors, which have been listed in the Factoring Challenge of RSA Security and have been particularly chosen to be difficult to factor. RSA Laboratories sponsors the RSA Factoring Challenge to encourage research into computational number theory and the practical difficulty of factoring large integers, and because it can be helpful for users of the RSA encryption public-key cryptography algorithm in choosing suitable key lengths for an appropriate level of security. A cash prize is awarded to the first person to factor each challenge number. RSA numbers were originally spaced at intervals of 10 decimal digits between 100 and 500 digits, and prizes were awarded according to a complicated formula. While RSA numbers are much smaller than the largest known primes, their factorization is significant because of the curious property of numbers that proving or disproving a number to be prime (primality testing) seems to be much easier than actually identifying the factors of a number (prime factorization). Thus, with some ingenuity, this property can be used to create practical and efficient encryption systems for electronic data. A list of the open challenge numbers may be downloaded from the RSA homepage.

    Number     Digits   Prize (USD)   Factored (references)
    RSA-100    100      -             Apr. 1991
    RSA-110    110      -             Apr. 1992
    RSA-120    120      -             Jun. 1993
    RSA-129    129      -             Apr. 1994 (Leutwyler 1994, Cipra 1995)
    RSA-130    130      -             Apr. 10, 1996
    RSA-140    140      -             Feb. 2, 1999 (te Riele 1999)
    RSA-150    150      -             Apr. 6, 2004 (Aoki 2004)
    RSA-155    155      -             Aug. 22, 1999 (te Riele 1999, Peterson 1999)
    RSA-160    160      -             Apr. 1, 2003 (Bahr et al. 2003)
    RSA-200    200      -             May 9, 2005 (Weisstein 2005)
    RSA-576    -        10000         Dec. 3, 2003 (Franke 2003, Weisstein 2003)
    RSA-640    -        20000         Nov. 2, 2005 (Weisstein 2005)
    RSA-704    -        30000         open
    RSA-768    -        50000         open
    RSA-896    -        75000         open
    RSA-1024   -        100000        open
    RSA-1536   -        150000       open
    RSA-2048   -        200000        open

Example 3 In order to see all this in action, we want to stick with numbers that we can actually work with.
Finding RSA numbers. So we have 7 for P, our public key, and 23 for Q, our private key; our RSA number (very small) is R = 55.

Encoding. We create the following character set:

     2 A     19 N     38 sp
     3 B     21 O     39 0
     4 C     23 P     41 1
     6 D     24 Q     42 2
     7 E     26 R     43 3
     8 F     27 S     46 4
     9 G     28 T     47 5
    12 H     29 U     48 6
    13 I     31 V     49 7
    14 J     32 W     51 8
    16 K     34 X     52 9
    17 L     36 Y     53 ?
    18 M     37 Z

The message we will encrypt is "VENIO" (Latin for "I come"):

    V   E   N   I   O
    31  7   19  13  21

To encode it, we simply need to raise each number to the power of P modulo R = 55:

    V: 31^7 (mod 55) = 27512614111 (mod 55) = 26
    E: 7^7 (mod 55) = 823543 (mod 55) = 28
    N: 19^7 (mod 55) = 893871739 (mod 55) = 24
    I: 13^7 (mod 55) = 62748517 (mod 55) = 7
    O: 21^7 (mod 55) = 1801088541 (mod 55) = 21

So, our encrypted message is 26, 28, 24, 7, 21, or RTQEO in our personalized character set. When the message "RTQEO" arrives on the other end of our insecure phone line, we can decrypt it simply by repeating the process, this time using Q, our private key, in place of P:

    R: 26^23 (mod 55) = 350257144982200575261531309080576 (mod 55) = 31
    T: 28^23 (mod 55) = 1925904380037276068854119113162752 (mod 55) = 7
    Q: 24^23 (mod 55) = 55572324035428505185378394701824 (mod 55) = 19
    E: 7^23 (mod 55) = 27368747340080916343 (mod 55) = 13
    O: 21^23 (mod 55) = 2576580875108218291929075869661 (mod 55) = 21

The result is 31, 7, 19, 13, 21, or "VENIO", our original message.
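The decryption powers above overflow double precision, so a square-and-multiply modular exponentiation is the natural way to reproduce this example in Matlab. A minimal sketch (modpow is our own helper function, not a built-in; it is safe here because all intermediate products stay below 55²):

    function r = modpow(b, e, m)
    % Computes mod(b^e, m) by square-and-multiply, never forming b^e.
        r = 1; b = mod(b, m);
        while e > 0
            if mod(e, 2) == 1, r = mod(r*b, m); end
            b = mod(b*b, m);
            e = floor(e/2);
        end
    end

Usage:

    msg = [31 7 19 13 21];
    enc = arrayfun(@(x) modpow(x, 7, 55), msg)     % 26 28 24 7 21
    dec = arrayfun(@(x) modpow(x, 23, 55), enc)    % 31 7 19 13 21 again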
4 Signal Representation

In this chapter, we are going to present a detailed discussion of the sampling theorem mentioned before: how fast to sample an analogue signal so that we can recover the signal perfectly. We will also introduce some basic tools which are essential for the later parts of the module, including the z-transform, the discrete time Fourier transform (DTFT) and the discrete Fourier transform (DFT).

4.1 Sequences and their representation

A sequence is an infinite series of real numbers {x(n)}, which is written

    {x(n)} = {· · · , x(−1), x(0), x(1), x(2), · · · }

This can be used to represent a sampled signal, i.e. x(n) = x(nT), where x(t) is the original (continuous) function of time. Sometimes sequence elements are subscripted, x_n being used in place of x(n). The most basic tool in DCSP is the Z transform (ZT), which is related to the generating function used in the analysis of series. In mathematics and signal processing, the Z-transform converts a discrete time domain signal, which is a sequence of real numbers, into a complex frequency domain representation. The ZT of {x(n)} is

    X(z) = Σ_{n=−∞}^{∞} x(n) z^{−n}

where the variable z is in general a complex number. The first point to note about ZTs is that some sequences have simple rational ZTs.

Example 4 The sequence x(n) = r^n for n ≥ 0, and x(n) = 0 for n < 0, has ZT

    X(z) = Σ_{n=0}^{∞} r^n z^{−n}

which can be simplified, if |r/z| < 1, to give

    X(z) = 1/(1 − r z^{−1})

To check that this is correct, we simply invert the ZT by long division,

    X(z) = 1 + r z^{−1} + r² z^{−2} + · · ·

which obviously corresponds to the original sequence {x(n)} = {1, r, r², · · · }.
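Assuming the Symbolic Math Toolbox is available, Example 4 can be checked symbolically:

    syms n z r
    ztrans(r^n, n, z)      % returns z/(z - r), i.e. 1/(1 - r*z^-1)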
The ZT's main property concerns the effect of a shift of the original sequence. Consider the ZT

    Y(z) = Σ_n x(n) z^{−(m+n)}

which can be written

    Y(z) = z^{−m} X(z)

This obviously corresponds to the sequence {y(n)} = {x(n − m)}.

Now if we add sequences {a(n)} and {b(n)}, we get a sequence

    {c(n)} = {· · · , a(−1) + b(−1), a(0) + b(0), a(1) + b(1), · · · }

with ZT

    C(z) = Σ_n (a(n) + b(n)) z^{−n} = A(z) + B(z)

which is just the sum of the ZTs of {a(n)} and {b(n)}: the ZT is linear.

Now consider the product of two ZTs, C(z) = A(z)B(z). The question arises as to what sequence this represents. If we write it out in full,

    C(z) = Σ_n a(n) z^{−n} Σ_m b(m) z^{−m} = Σ_n Σ_m a(n) b(m) z^{−(m+n)}

which can be rewritten as

    C(z) = Σ_n [Σ_m a(m) b(n − m)] z^{−n}

which is the ZT of the sequence

    c(n) = Σ_m a(m) b(n − m)

Sequences {c(n)} of this form are called the convolution of the two component sequences {a(n)} and {b(n)}, and are sometimes written as c(n) = a(n) ∗ b(n). Convolution describes the operation of a digital filter, as we shall see in due course. The fundamental reason why we use the ZT is that it reduces convolution to multiplication; this is a consequence of the even more basic shift property expressed in the equation above.

Example 5 (Discrete time unit impulse) The unit impulse (Fig. 26) δ(n) is the most elementary signal, and it provides the simplest expansion. It is defined as

    δ(n) = 1 if n = 0, and 0 otherwise.    (4.1)

Any discrete signal can be expanded into a superposition of elementary shifted impulses, each one representing one of the samples. This is expressed as

    x(n) = Σ_k x(k) δ(n − k)

where each term x(k)δ(n − k) in the summation expresses the kth sample of the sequence.

Figure 26: An impulse δ(n) and a shifted impulse δ(n − k).
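The convolution/multiplication property is easy to check numerically: multiplying the coefficient polynomials of two ZTs is exactly what Matlab's conv computes (the sequences are arbitrary examples):

    a = [1 2 3];          % a(n): coefficients of A(z) in powers of z^-1
    b = [1 1 1];          % b(n): coefficients of B(z)
    c = conv(a, b)        % [1 3 6 5 3] = coefficients of C(z) = A(z)B(z)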
4.2 Discrete Time Fourier Transform (DTFT)

The basic tool of signal analysis is the Fourier transform, which is treated in detail in a number of references and was revisited before. Most likely, you have seen the Fourier transform in its symbolic formulation applied to signals expressed mathematically: for example, we know what the Fourier transform of a rectangular pulse, of a sinusoid, or of a decaying exponential is. However, in most applications of interest, the goal is to determine the frequency content of a signal from a finite set of samples stored on a disk or a tape. The discrete Fourier transform (DFT) is the algorithm we use for numerical computation: with the DFT we compute an estimate of the frequency spectrum of any sort of data set stored as an array of numerical entries, such as a vector. In particular, we will be concentrating on two classes of applications: (1) spectral estimation, and (2) compression. The latter involves a variant of the DFT called the discrete cosine transform (DCT), which is particularly well behaved when applied to approximation problems. Although the Fourier transform is not the only transform, it is the one most widely used as a tool in signal analysis. In this chapter we quickly review the Fourier transform for discrete time signals (the DTFT, discrete time Fourier transform) and present the DFT in detail.

Definition and properties: the DTFT gives the frequency representation of a discrete time sequence with infinite length. By definition,
    X(ω) = DTFT{x(n)} = Σ_{n=−∞}^{∞} x(n) exp(−jωn), −π ≤ ω < π    (4.2)

    x(n) = IDTFT{X(ω)} = (1/2π) ∫_{−π}^{π} X(ω) exp(jωn) dω, −∞ < n < ∞

The inverse discrete time Fourier transform (IDTFT) can be easily verified by substituting the expression for the DTFT, which yields

    (1/2π) ∫_{−π}^{π} X(ω) exp(jωn) dω = Σ_{m=−∞}^{∞} x(m) [(1/2π) ∫_{−π}^{π} exp(jω(n − m)) dω] = x(n)

where we used the fact that the term in brackets is 1 when n = m and 0 in all other cases. By definition, X(ω) is always periodic with period 2π, since X(ω + 2π) = X(ω); this is the reason why all the information is included within one period, say in the interval −π ≤ ω < π (Fig. 27).

Figure 27: DTFT magnitude (left) and phase (right).

Example 6 (Fig. 28) Consider the signal x(n) = 0.5^n, n = 0, 1, 2, · · ·. Then

    X(ω) = Σ_{n=0}^{∞} 0.5^n exp(−jωn) = 1/(1 − 0.5 exp(−jω))
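The closed form of Example 6 can be evaluated on a dense frequency grid and plotted (plain Matlab; the grid size is arbitrary):

    w = linspace(-pi, pi, 512);
    X = 1 ./ (1 - 0.5*exp(-1j*w));       % DTFT of x(n) = 0.5^n
    subplot(2,1,1), plot(w, abs(X))      % magnitude
    subplot(2,1,2), plot(w, angle(X))    % phase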
Figure 28: Upper panels: magnitude (left) and phase (right) of the DTFT of x(n); lower panels: one period of each.
Figure 29: A sinusoid and its DTFT.

Example 7 Consider the signal x(n) = δ(n). Then

    X(ω) = Σ_{n=−∞}^{∞} δ(n) exp(−jωn) = 1

4.2.1 Computation of the DTFT

It is well known that the whole Fourier approach to signal analysis is based on the expansion of a signal in terms of sinusoids or, more precisely, complex exponentials. In this approach we begin analyzing a signal by determining the frequencies contributing to its spectrum, in terms of magnitudes and phases. For example, if a sequence is a sinusoid x(n) = cos(ω0 n) of infinite length, its DTFT yields two 'delta' functions,

    X(ω) = πδ(ω − ω0) + πδ(ω + ω0)

as in Fig. 29, where we assume all frequencies ω and ω0 to be within the interval [−π, π). This shows that the DTFT gives perfect frequency localization, as an exact concentration of energy at ±ω0, provided (a) the sequence lasts from −∞ to ∞ and we have an infinite amount of memory to actually collect all the data points, and (b) we compute X(ω) for all possible frequencies ω in the interval [−π, π), again requiring an infinite amount of memory and an infinite computational time.

In practice, we do not have infinite memory, we do not have infinite time, and any signal we want to analyze does not have infinite duration. In addition, in many cases the spectrum of the signal changes with time, just as music is composed of different notes that change with time. Consequently, the DTFT generally is not computable unless we have an analytical expression for the signal we analyze, in which case we can compute it symbolically. But most of the time this is not the case, especially when we want to determine the frequency spectrum of a signal measured from an experiment. In
this situation, we do not have an analytical expression for the signal, and we need to develop an algorithm that can be computed numerically in a finite number of steps. This is the discrete Fourier transform (DFT), which computes a finite number of coefficients X(0), X(1), · · · , X(N − 1) from a finite set of data x(0), · · · , x(N − 1).

4.3 Discrete Fourier Transform (DFT)

Definition: the discrete Fourier transform (DFT) and its inverse, the inverse discrete Fourier transform (IDFT), associate a vector X = (X(0), · · · , X(N − 1)) with a vector x = (x(0), x(1), · · · , x(N − 1)) of N data points, as follows:

    X(k) = DFT{x(n)} = Σ_{n=0}^{N−1} x(n) exp(−j2πkn/N), k = 0, 1, · · · , N − 1    (4.3)

    x(n) = IDFT{X(k)} = (1/N) Σ_{k=0}^{N−1} X(k) exp(j2πkn/N), n = 0, 1, · · · , N − 1

If we define wN = exp(−j2π/N), then

    X(k) = DFT{x(n)} = Σ_{n=0}^{N−1} x(n) (wN)^{kn}

which implies that X(k) is a weighted summation of the basis (wN)^{kn}.

Example 8 (see Appendix 6.3.8) Let x = [1, 2, −1, −1] be a data vector of length N = 4. Then, applying the definition with w4 = exp(−j2π/4) = −j,

    X(k) = 1 + 2(−j)^k − (−j)^{2k} − (−j)^{3k} for k = 0, 1, 2, 3

This yields the DFT vector X = DFT{x} = [1, 2 − 3j, −1, 2 + 3j].
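Matlab's fft implements exactly this definition, so Example 8 can be checked in one line:

    x = [1 2 -1 -1];
    X = fft(x)           % returns [1, 2-3i, -1, 2+3i]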
4.3.1 The relationship between DFT and DTFT

In spite of their similarities, the DFT and DTFT are two quite different operations. Whereas the DFT is a numerical operation, which computes a finite number of coefficients X(0), · · · , X(N − 1) from a finite set of data, the DTFT is not computable numerically, because it yields a continuous function, X(ω), based on an infinite sequence x(n). In this section we address the problem of estimating X(ω) = DTFT{x(n)} based on the DFT of a sample of finite length N. The finite sample can be given in terms of either an analytic expression or a set of observations stored in memory.

Before going any further, let us see how we can reconcile the fact that we defined the DFT for data in the form x(0), · · · , x(N − 1), while in reality we might have data starting at any point in time, such as x(n0), · · · , x(n0 + N − 1). To simplify the arguments to follow, we assume the initial time to be n0 = −N/2, with any choice of rounding for odd values of the data length N. Then we can write the DFT in terms of the finite sum

    X̂N(ω) = Σ_{n=−N/2}^{N/2−1} x(n) exp(−jωn)

with X(k) = X̂N(k2π/N). Also recall the DTFT as an infinite sum,

    X(ω) = Σ_{n=−∞}^{∞} x(n) exp(−jωn)

Now the question is: will X̂N → X as N → ∞ for each value of the frequency variable ω? If the answer is positive, the approximation makes sense, and we can say that we can estimate the frequency spectrum of the whole signal based on a set of samples; the more data we have (i.e. the larger N is), the better the approximation. In order to fully understand the arguments presented, recall that the infinite series Σ 1/n^L, with L an integer, is divergent for L = 1 and convergent for L ≥ 2. What decides the convergence is how the signal x(n) behaves as the index n → ∞. In terms of convergence, we distinguish three different cases, all illustrated in Fig. 30.

• Absolutely summable signals. These signals are such that

    Σ_{n=−∞}^{∞} |x(n)| < ∞

For these signals the DTFT always exists, is finite, and converges for every ω. This means that for every ω,

    lim_{N→∞} X̂N(ω) = X(ω) < ∞

The fact that X(ω) is finite comes from the inequality

    |X(ω)| = |Σ x(n) exp(−jωn)| ≤ Σ |x(n)| < ∞

These are the best-behaved sequences. Typical examples in this class are signals decaying exponentially fast, and signals decaying as 1/n², for which |x(n)| ≤ C/n² for n sufficiently large.
Figure 30: Three cases.
• Signals with finite energy (not absolutely summable). These are signals such that

    Σ_{n=−∞}^{∞} |x(n)|² < ∞

For these signals we do not have pointwise convergence as in the preceding case, but we can say that

    lim_{N→∞} ∫ |X(ω) − X̂N(ω)|² dω = 0

• Signals that are neither absolutely summable nor square summable, such as the complex exponentials considered below. This is one of the more important issues in DSP, and we will treat this third case in the next subsection.

Example 9 Consider the sequence

    x(n) = sin(0.25πn)/(πn)

We want to estimate X(ω) = DTFT(x(n)) using the DFT. Choose, for example, the finite sequences x̂(n) for −L ≤ n ≤ L, of length N = 2L + 1. The plots of X̂N(k) for several values of L are shown in Fig. 31, and the lack of convergence at the discontinuities (ω = ±π/4) is evident.

Example 10 Consider the sequence

    x(n) = sin²(0.25πn)/(πn)²

Again we want to estimate X(ω) = DTFT(x(n)) using the DFT. Fig. 32 shows convergence for all values of ω by comparing the two cases L = 32 and L = 1024. This is expected, because the signal is absolutely summable.
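A sketch of the Example 9 computation (plain Matlab; the values of L and the frequency grid are arbitrary):

    L = 512; n = -L:L;
    x = sin(0.25*pi*n)./(pi*n);
    x(n == 0) = 0.25;                  % limit value of x(n) at n = 0
    w = linspace(-pi, pi, 2048);
    XN = x * exp(-1j*n'*w);            % finite sum over -L..L, for each w
    plot(w, real(XN))                  % ripples persist near w = +/- pi/4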
Figure 31: Discontinuity, for L = 32, 128, 512 and 1024.
Figure 32: A summable case.

Figure 33: Windowing and then sampling.
4.3.2 DFT for spectral estimation

One of the uses of the DFT is to estimate the frequency spectrum of a signal. In particular, given a continuous time signal x(t), the goal is to determine its frequency components by taking a finite set of samples as a vector x(0), x(1), · · · , x(N − 1) and computing its DFT as in the previous section. The fact that we window the signal x(n) (i.e. we take a finite set of data) will have an effect on the frequency spectrum we compute, and we have to be aware of that when we interpret the results of the DFT; this is shown in Fig. 33. The effect can be seen by defining a window sequence

    w(n) = 1 if n = 0, 1, · · · , N − 1, and 0 otherwise

and the DTFT of the windowed signal,

    Xw(ω) = DTFT(w(n)x(n)) = Σ_{n=0}^{N−1} x(n) exp(−jωn)

Now notice that we can relate the right-hand side of the preceding expression to the DFT of the finite sequence x(0), · · · , x(N − 1) as

    X(k) = Xw(ω) at ω = k2π/N

In other words, with the DFT we compute samples of the frequency spectrum of the windowed signal x(n)w(n). To see the effect of the windowing operation on the frequency spectrum, let us examine the case when the signal x(n) is a complex exponential, such as x(n) = exp(jω0 n). Without windowing, the complex exponential is perfectly localized in frequency, since DTFT(exp(jω0 n)) = 2πδ(ω − ω0). Once we window the signal, its frequency spectrum is no longer perfectly localized. Then

    Xw(ω) = Σ_{n=0}^{N−1} w(n) exp(jω0 n) exp(−jωn) = W(ω − ω0)

where the right-hand term is the frequency spectrum of the window function,

    W(ω) = DTFT(w(n)) = Σ_{n=0}^{N−1} exp(−jωn) = (1 − exp(−jωN)) / (1 − exp(−jω))

This expression can be easily manipulated into the simpler form

    W(ω) = exp(−jω(N − 1)/2) sin(ωN/2) / sin(ω/2)

The magnitude of this expression is shown in Fig. 34. As a result, the magnitude of the DTFT of the windowed complex exponential is given by

    |Xw(ω)| = |W(ω − ω0)| = |sin[(ω − ω0)N/2] / sin[(ω − ω0)/2]|

Fig. 34 compares the frequency spectra of the complex exponential before and after the windowing operation. The energy spreads around the frequency ω0 within the mainlobe, and trickles to other frequencies through the sidelobes of the sinc-like function. A number of windows currently in use are shown in Fig. 35. In all cases, they are designed with a smoother transition between the two regions (inside and outside the window), and as a consequence the sidelobes are lower. The drawback is that they all have a wider mainlobe.
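The mainlobe/sidelobe trade-off is easy to see numerically (plain Matlab; the Hamming window is built explicitly so no toolbox is needed, and the parameter values are arbitrary):

    N = 64; n = (0:N-1)'; w0 = 0.5;
    x = exp(1j*w0*n);                        % complex exponential, N samples
    ham = 0.54 - 0.46*cos(2*pi*n/(N-1));     % Hamming window
    f = linspace(0, 2*pi, 1024);
    plot(f, abs(fft(x, 1024))), hold on      % rectangular window: high sidelobes
    plot(f, abs(fft(x.*ham, 1024)))          % Hamming: lower sidelobes, wider mainlobe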
Given a fixed data length N, we have to accept a compromise between frequency resolution (width of the mainlobe) and sidelobe level.

Figure 34: Left: sin(ω0 n) and cos(ω0 n). Right: the spectrum of exp(jω0 n) with n = 1, 2, · · · , 10 (N = 10) and ω0 = 0.5.

As we have seen, the DFT is the basis of a number of applications, and is one of the most important tools in digital signal processing. The problem with this algorithm is that a brute-force implementation would hardly be feasible in real time, even for a data set of modest length: it is known that for a data set of length N, the complexity of the DFT computed by brute force grows as N². Fortunately, the DFT can be computed in a very efficient way by exploiting the very structure of the algorithm we have presented, using the fast Fourier transform (FFT), which is an efficient way of computing the DFT. The complexity of the FFT grows as N log2 N, and the FFT is widely used in many applications.

4.4 *Sampling and reconstruction*

We have assumed so far that the sequence of samples on which we are to operate is given to us, say as a fixed set of data points. In practice, we shall typically have to obtain it by sampling a continuous time (or space) signal x(t). The question we consider here is under what conditions we can completely reconstruct the original signal x(t) from its discretely sampled signal x(n), or in other words, whether we can reconstruct the analog signal x(t) from its samples x(n). To this end, let us introduce more notation. Consider a signal x(t) and its samples x(n) = x(nTs), with Ts being the sampling interval and Fs = 1/Ts the sampling frequency. Now the question is how we can relate X(ω) = DTFT(x(n)) to X(F) = FT(x(t)); in particular, we want to identify conditions under which we can determine X(F) from X(ω).
Figure 35: Various windows and their magnitude spectra. Created from the Matlab demos (Signal Processing: transforms: DFT).
To achieve this goal, define

    xs(t) = Σ_{n=−∞}^{∞} x(nTs) δ(t − nTs) = x(t) Σ_n δ(t − nTs)

This signal represents the sampling operation: all the information between the sampling instants is lost, but at the same time the signal is still defined in continuous time. Therefore we can take the Fourier transform, using both expressions on the right-hand side. The first expression gives

    Xs(F) = FT(xs(t)) = Σ_{n=−∞}^{∞} x(n) exp(−j2π(F/Fs)n) = X(ω) at ω = 2π(F/Fs)

where X(ω) = DTFT(x(n)). The second expression, together with the fact that FT(Σ_n δ(t − nTs)) = Fs Σ_k δ(F − kFs), gives

    Xs(F) = X(F) ∗ Fs Σ_k δ(F − kFs) = Fs Σ_k X(F − kFs)

since the convolution with a delta function yields just a shift of the variable. In other words, when a signal x(t) is sampled in time, its frequency spectrum X(F) is repeated in frequency with period Fs.

We can see that if the signal x(t) is bandlimited, in the sense that X(F) = 0 for |F| > FB, for some frequency FB called the bandwidth of the signal, and we sample at a frequency Fs > 2FB, then there is no overlap between the repetitions of the frequency spectrum. In fact, if Fs > 2FB, then Xs(F) = Fs X(F) in the interval −Fs/2 < F < Fs/2, and X(F) can be fully recovered from Xs(F). This is very important, because it states that the signal x(t) can be fully recovered from its samples x(n) = x(nTs), provided we sample fast enough: Fs > 2FB.

The way to recover the continuous time signal x(t) from its samples x(nTs) is by an interpolating function g(t). Define

    x̂(t) = Σ_n x(nTs) g(t − nTs)

Its Fourier transform is determined using simple properties:

    X̂(F) = [Σ_n x(n) exp(−j2πF Ts n)] G(F)

where G(F) = FT(g(t)); the summation in brackets is just Xs(F). Therefore, if G(F) is such that

    G(F) = Ts rect(F/Fs) = Ts if −Fs/2 < F < Fs/2, and 0 otherwise,
then X̂(F) = X(F) and x̂(t) = x(t). This implies that the ideal interpolating function is given by

    g(t) = IFT(Ts rect(F/Fs)) = sinc(t/Ts)

and we can perfectly reconstruct the continuous time signal x(t) from its samples x(n):

    x̂(t) = Σ_n x(n) sinc((t − nTs)/Ts)

In summary, we have the following Nyquist–Shannon sampling theorem.

Theorem 2 Exact reconstruction of a continuous-time baseband signal from its samples is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth.

Figure 36: Multiplier, adder and delay units.

5 Digital Filters

A digital filter is any electronic filter that works by performing digital mathematical operations on an intermediate form of a signal. Digital filters can achieve virtually any filtering effect that can be expressed as a mathematical function or algorithm. The two primary limitations of digital filters are their speed (the filter cannot operate any faster than the computer at the heart of the filter) and their cost. However, as the cost of integrated circuits has continued to drop over time, digital filters have become increasingly commonplace, and are now an essential element of many everyday objects such as radios, cellphones, and stereo receivers. A well-known signal processing wizard is said to have remarked, "When you think about it, everything is a filter."
5.1 Operations on Sequences

Digital filtering is all based on a small set of very simple linear shift invariant operations on sequences. These are:

1. Multiplication by a constant coefficient a. If {x(n)} is the input sequence, the output is

    {y(n)} = a{x(n)} = {· · · , ax(−1), ax(0), ax(1), · · · , ax(n), · · · }

Multipliers are normally depicted graphically by a triangle, as in Fig. 36.

2. Addition. Adders do the obvious thing: if {x1(n)} and {x2(n)} are the input sequences, the output is just

    {y(n)} = {· · · , x1(−1) + x2(−1), x1(0) + x2(0), x1(1) + x2(1), · · · }
3. Delay. The other component is a delay unit, i.e. a single memory element. This operates on the input sequence {x(n)} to give the output

    {y(n)} = D{x(n)} = {x(n − 1)}

Note that you must regard this as changing the sequence as a whole, rather than acting independently on its elements.

5.2 Filters

The commonly used digital filters are all based on combinations of these three operations. The general filter can be described by the operational equation

    {y(n)} = Σ_{m=0}^{N} a(m) D^m {x(n)} + Σ_{m=1}^{N} b(m) D^m {y(n)}

Note that the sum over the outputs starts at a delay of 1, not 0. If we use the ZT, with {x(n)} ↔ X(z) and {y(n)} ↔ Y(z), we must first note that the operator D has the ZT representation z^{−1}, from the shift result we established before. This means that the filter above can be rewritten in ZT form as

    Y(z) (1 − Σ_{m=1}^{N} b(m) z^{−m}) = Σ_{m=0}^{N} a(m) z^{−m} X(z)

Now the ratio H(z) = Y(z)/X(z) is known as the ZT transfer function of the filter:

    H(z) = [Σ_{m=0}^{N} a(m) z^{−m}] / [1 − Σ_{m=1}^{N} b(m) z^{−m}]

The transfer function H(z) is the ZT of the filter output sequence when the input sequence has ZT X(z) = 1, which is the ZT of the unit impulse sequence {x(n)} = {1, 0, 0, · · · }. Thus the other way to characterise the filter is in terms of its impulse response, since the response to any sequence is just the sum of the responses to the individual impulses of the sequence components.
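This operational equation is what Matlab's filter function implements; note that its second argument is the denominator vector [1, −b(1), · · · , −b(N)]. A minimal sketch with made-up coefficients:

    a = [0.5 0.5];                 % feedforward coefficients a(0), a(1)
    b = 0.9;                       % feedback coefficient b(1)
    x = [1 zeros(1, 19)];          % unit impulse input
    h = filter(a, [1 -b], x)       % the filter's impulse response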
5.3 Nonrecursive Filters

5.3.1 Operational Definition

When a filter is nonrecursive, its operational equation can be written as

    {y(n)} = Σ_{m=0}^{N} h(m) D^m {x(n)}

Such a filter obviously has finite memory, in the sense that its response to a particular input element is zero after N time steps. Correspondingly, the ZT of a nonrecursive filter can be written as

    H(z) = Σ_{m=0}^{N} h(m) z^{−m}

and the impulse response is just {h(n)} = {h(0), h(1), · · · }. Such filters are also called finite impulse response (FIR) filters, for the obvious reason that their impulse response contains only finitely many nonzero terms. This can also be seen from the block diagram of a direct implementation of the filter, Fig. 37, which is a canonical form.

Figure 37: Direct nonrecursive filter implementation.

FIR filters are also called moving average (MA) filters.
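For instance, a 5-tap moving average filter (plain Matlab; the input is an arbitrary example):

    h = ones(1, 5)/5;            % impulse response of a moving average
    x = randn(1, 100);           % example input
    y = filter(h, 1, x);         % nonrecursive filtering: finite memory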
5.3.2 Zeros

Trying to figure out how an FIR filter will behave, given its IR, is not always so simple. We can make a start on this by examining the zeros of its transfer function H(z), i.e. those values of z for which H(z) = 0. Since H(z) is a polynomial of order N with real coefficients, it follows that the equation has N roots, which are either real or occur in complex conjugate pairs. We can express H(z) in terms of the roots by writing

H(z) = z^{-N} Π_{m=1}^{N} (z − z_m)

where in general z_m = |z_m| exp(j arg[z_m]) is the mth root, or zero, of the transfer function. The zeros of a transfer function are usually denoted graphically in the complex z-plane by circles, as shown in Fig. 38.

The factorisation of H(z) implies that H(z) can be written as a product of quadratic polynomials in z^{-1} with real coefficients:

H(z) = Π_{m=1}^{M} (h_m(0) + h_m(1) z^{-1} + h_m(2) z^{-2})

This is effectively a series implementation of the transfer function. Series implementations have advantages in terms of both hardware and software modularity and sensitivity to coefficient quantization, and they are often used in practice. Another way of looking at an FIR filter is through its frequency-domain behaviour.

Figure 38: The zeros of an FIR filter transfer function lie on the real axis or occur in conjugate pairs
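In Matlab the zeros are obtained directly from the coefficient vector with the built-in roots function; a small sketch with arbitrary example coefficients:

h = [1 -1.2 0.81];   % H(z) = 1 - 1.2 z^-1 + 0.81 z^-2
zs = roots(h);       % zeros of the transfer function
disp(zs)             % a complex conjugate pair, as expected
disp(abs(zs))        % common radius of the pair
disp(angle(zs))      % angles of the pair in the z-plane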
5.4 Recursive Filters

5.4.1 Operational Definition

Of the many filter transfer functions which are not FIR, the most commonly used in DSP are the recursive filters, so called because their current output depends not only on the last N inputs but also on the last N outputs. They have operational equations which are written as

(1 − Σ_{m=1}^{N} b(m) D^m) {y(n)} = Σ_{m=0}^{N} a(m) D^m {x(n)}

and a ZT transfer function

H(z) = A(z)/B(z)

where

A(z) = Σ_{m=0}^{N} a(m) z^{-m},  B(z) = 1 − Σ_{m=1}^{N} b(m) z^{-m}

Unless B(z) = 1, the impulse response {h(n)} = {h(0), h(1), · · · } of such a filter is infinite: these are IIR filters. If the numerator coefficients a(m) vanish for m ≥ 1, so that A(z) is a constant, the filter is called an all-pole filter (or sometimes an autoregressive (AR) filter). The block diagram (Fig. 39) of such a filter is an obvious generalisation of that for the nonrecursive Nth-order filter.

Figure 39: Block diagram of an ARMA filter.
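To see an infinite impulse response directly, one can drive a first-order all-pole filter with a unit impulse; a minimal sketch with an arbitrary pole at z = 0.9:

b = 0.9;                  % y(n) = x(n) + 0.9 y(n-1)
x = [1 zeros(1, 19)];     % unit impulse
y = filter(1, [1 -b], x); % all-pole (AR) recursion
disp(y)                   % geometric decay 1, 0.9, 0.81, ... with no end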
5.4.2 Poles and Zeros

Writing the numerator and denominator of H(z) in factored form we get

H(z) = A(z)/B(z) = (Π_{n=1}^{N} (z − α_n)) / (Π_{n=1}^{N} (z − β_n))

We know that the roots α_n are the zeros of the transfer function. The roots β_n of the equation B(z) = 0 are called the poles of the transfer function. They have greater significance for the behaviour of H(z): it is singular at the points z = β_n. Poles are drawn on the z-plane as crosses, as shown in Fig. 40; on this figure, the unit circle has also been shown.

A filter is bounded-input bounded-output (BIBO) stable if any bounded input sequence gives rise to a bounded output sequence. Now if the poles of the transfer function lie inside the unit circle, i.e. |β_m| < 1, then they represent geometric sequences {1, β_m, β_m^2, · · · } with a coefficient whose magnitude |β_m| < 1, which are convergent. Consequently the filter output, which is a sum of such sequences weighted by the appropriate input terms, is bounded if the input is. If, on the other hand, |β_m| > 1, the geometric series diverges and the filter output will grow without bound as n → ∞. If |β_m| = 1, the filter is said to be conditionally stable: some input sequences will lead to bounded output sequences and some will not.

In order for a recursive filter to be bounded-input bounded-output stable, all of the poles of its transfer function must therefore lie inside the unit circle. Since FIR filters have no poles, they are always BIBO stable: zeros have no effect on stability.

Figure 40: Poles and zeros of a filter transfer function.
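Since stability is decided entirely by the pole radii, it can be checked in a couple of lines by factoring the denominator; the coefficients below are an arbitrary example:

A = [1 -1.5 0.9];            % denominator 1 - 1.5 z^-1 + 0.9 z^-2
poles = roots(A);            % poles of the transfer function
if all(abs(poles) < 1)
    disp('all poles inside the unit circle: BIBO stable')
else
    disp('not BIBO stable (or at best conditionally stable)')
end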
Writing. This is true for any stable ﬁlter (i. 41. π. for example.e poles inside unit circle). while a zero on the unit circle implies a null in the response at that frequency. a fact which ca n be made more explicit by factoring the numerator and denominator polynomials to write N (exp(jω) − αn ) H(ω) = K n=1 N n=1 (exp(jω) − βn ) where K is a constant. This has a minimum at ω = θn and a maximum at ω = θn + π. βn are respectively the nth zero and pole of the transfer function. Note also that. we can get a good idea of how the response will look. Similarly. αn . but even with these few points and the locations of poles and zeros. which has a graphical interpretation in the rms of the location of the corresponding root in relation to the unit circle This is illustrated in Fig. the contribution of a zero to the ﬁlter frequency response is a minimum at the angle of the zero and a maximum at the diametrically opposite frequency. because they are in the denominator.65 5. π/2. In other words. Each factor in the numerator or denominator is a complex function of frequency. The factors are simply the difference vectors between the root location and the point on the unit circle at angle ω. the nth zero as αn = αn  exp(jθn ) we ﬁnd that it makes a contribution with magnitude An (ω)2 = 1 + αn 2 − 2αn  · (ω − θn ) to the overall frequency response. Thus we can make a reasonable guess about the ﬁlter frequency response imply by looking at the polezero diagram: lowpass ﬁlters have poles near the original ω = 0 and zeros at high frequencies and so on.1 Poles. the contribution of a pole to the response grows as βn  → 1. . To get a more detailed picture of the frequency response.5. which are simple to calculate. H(ω) depends on the locations of the poles and zeros of the transfer function. it is always useful to calculate it at frequencies ω = 0. Zeros and Frequency Response Now suppose we have the ZT transfer function of a ﬁlter H(z) = A(z) B(z) We can formally represent the frequency response of the ﬁlter by substituting z = exp(jω) and obtain A(ω) H(ω) = B(ω) Obviously.5 Frequency and digital ﬁlters 5. the contribution from a pole is a maximum at eh angle of the pole and a minimum at the opposite angle.
Figure 41: The contribution of a pole or zero to a filter frequency response

To make matters more concrete, Fig. 42 shows what effect varying the magnitude and angle of a simple zero has on the frequency response magnitude, while Fig. 43 does the same thing for a pole. A significant point to note in comparing the two diagrams is how even a single pole can give a sharp peak in the frequency response magnitude, where a zero inevitably gives a rather broad dip. This is a consequence of the poles belonging to the denominator of the transfer function. To achieve a given magnitude response, a recursive filter of a given order is generally better than a nonrecursive one, but this is often at the expense of an unpleasant (nonlinear) phase response; linear phase implies that the filter simply delays the input signal, as well as shaping its frequency content. It is primarily for this reason that recursive designs are preferred in many applications.

5.5.2 Filter Types

There are four main classes of filter in widespread use: lowpass, highpass, bandpass and bandstop filters. The names are self-explanatory, and the four types are shown in Fig. 44. In practice, these ideal shapes require impulse responses of infinite length which do not have a good approximation in terms of a rational transfer function, so the extent to which the ideal frequency responses can be achieved in practice is limited. The resulting filter will be inadequate in several respects compared with the ideal:

1. Finite transition bandwidth. Any realisation will not have a discontinuous frequency response: there will be a band of frequencies which are neither stopped nor passed, but are transitional.

2. Finite stopband attenuation. A practical filter will not have a null response to frequencies outside its passband: this would require zeros at all such frequencies.
Figure 42: The effect of varying (i) the magnitude and (ii) the angle of a simple zero. The squared magnitude is plotted vs. radial frequency.

Figure 43: The effect of varying (i) the magnitude and (ii) the angle of a simple pole. The squared magnitude is plotted vs. radial frequency.
3. Passband ripple. Many filters also have a nonuniform response in the passband.

Different designs, such as Butterworth, Chebyshev or Elliptic, have different tradeoffs between these artifacts. Originally, much digital filter design was based on the transformation of analogue designs, using a variety of techniques including the impulse-invariant and bilinear transforms.

Figure 44: The common types of digital filter frequency response: (i) lowpass, (ii) highpass, (iii) bandpass, (iv) bandstop.

5.6 Simple Filter Design

In a number of cases we can design a linear filter almost by inspection, by moving poles and zeros like pieces on a chessboard. This is not just a simple exercise designed for an introductory course, for in many cases the use of more sophisticated techniques might not yield a significant improvement in performance.

In this example, consider an audio signal s(n), digitized with a sampling frequency Fs kHz. The signal is affected by a narrowband (i.e. very close to sinusoidal) disturbance w(n) at frequency F0 kHz, so that the observed signal is x(n) = s(n) + w(n). Fig. 45 shows the frequency spectrum of the overall signal plus noise. Notice two facts:

1. The signal has a frequency spectrum concentrated within the interval 0 to 20 kHz.

2. The disturbance is at a frequency which can be easily determined.

Now the goal is to design and implement a simple filter that rejects the disturbance without affecting the signal excessively. We will follow these steps:
Figure 45: A signal with a disturbance.

Step 1: Frequency domain specifications. Ideally, we would like to have the frequency response

H(ω) = 0 if ω = ω0, and 1 otherwise

where ω0 = 2π(F0/Fs) is the digital frequency of the disturbance (let us say ω0 = π/4 radians; the audio sample used in this example is 'Say it' by the John Coltrane Quartet). Recall that the digital frequency is a relative frequency and therefore has no dimensions. In other words, we need to reject the signal at the frequency of the disturbance.

Step 2: Determine poles and zeros. We need to place two zeros on the unit circle, z1 = exp(jω0) = exp(jπ/4) and z2 = exp(−jω0) = exp(−jπ/4). This would yield the transfer function

H(z) = K(z^{-1} − z1)(z^{-1} − z2) = K(1 − 1.414 z^{-1} + z^{-2})

If we choose, for example, K = 1, the frequency response is as shown in Fig. 46. As we can see, it rejects the desired frequency, as expected, but greatly distorts the signal.

A better choice would be to select the poles close to the zeros, within the unit circle. For example, let the poles be p1 = ρ exp(jω0) and p2 = ρ exp(−jω0), with ρ = 0.95. With this choice we obtain the transfer function

H(z) = K ((z − z1)(z − z2)) / ((z − p1)(z − p2)) = 0.9543 (z^2 − 1.414z + 1) / (z^2 − 1.343z + 0.9025)

where the overall gain K is chosen so that H(z)|_{z=1} = 1, which yields K = 0.9543. The only caveat is that, for stability, the poles must lie within the unit circle, so that the filter remains stable even in the presence of numerical uncertainties. This yields the frequency response shown in Fig. 47, which is far more selective than the previous choice.
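The pole and zero placement above is easy to reproduce numerically; a sketch that rebuilds the coefficients from ρ and ω0 and normalises the gain at z = 1:

w0  = pi/4;                        % digital frequency of the disturbance
rho = 0.95;                        % pole radius (inside the unit circle)
Bz  = [1 -2*cos(w0) 1];            % zeros at angles +/- w0 on the unit circle
Az  = [1 -2*rho*cos(w0) rho^2];    % poles at the same angles, radius rho
K   = polyval(Az, 1) / polyval(Bz, 1);   % enforce H(1) = 1
fprintf('K = %.4f\n', K)           % prints K = 0.9543, as in the text
B   = K * Bz;                      % final numerator coefficients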
Figure 46: Frequency response with zeros and actual output of filtered signal.
Figure 47: Frequency response with poles close to the unit circle

Step 3: Determine the difference equation in the time domain. From the transfer function, the difference equation is determined by inspection:

y(n) = 1.343 y(n − 1) − 0.9025 y(n − 2) + 0.9543 x(n) − 1.3495 x(n − 1) + 0.9543 x(n − 2)

The difference equation can easily be implemented as a recursion in a high-level language; a Matlab program is available in the Appendix. The final result is the signal y(n), with the frequency spectrum shown in Fig. 48, where we notice the absence of the disturbance.

5.7 Matched Filter

One common problem in signal processing is that of detecting the presence of a known signal against a background of noise. This has applications in radar, sonar, communications and pattern recognition. The problem is to choose the filter IR {h(n)} which maximises the filter output {y(n)}. If we want to maximise the output at time 0, then the quantity to maximise is

y(0) = Σ_{n=−∞}^{∞} h(n) x(−n)

As it stands, this is not a well-defined problem: if we double all the filter coefficients, the output will also double. We must therefore constrain the filter coefficients to have finite energy, i.e.

Σ_n h^2(n) = 1
Figure 48: Frequency spectrum of filtered signal

We can solve this constrained maximisation using the technique of Lagrange multipliers, which tells us to maximise the function

E(h) = y(0) − λ (Σ_n h^2(n) − 1)

where the second term is the constraint and λ, called the Lagrange multiplier, is a number we shall select so that the constraint equation above is satisfied. Differentiating with respect to h(n) gives

∂E(h)/∂h(n) = x(−n) − 2λ h(n)

and setting this to 0 gives the solution

h(n) = ε x(−n)

where the constant ε = 1/(2λ) is chosen so that the energy is unity, i.e. ε = (Σ_n x^2(n))^{-1/2}. In other words, the best filter, which is called the matched filter for the signal, has an IR which is just the time-reverse of the signal sequence. For this filter, the output is

y(n) = ε Σ_m x(−m) x(n − m) = ε Σ_m x(m) x(n + m)

In this case, up to the constant ε, the sequence {y(n)} is the autocorrelation sequence of the signal {x(n)}, defined as

r_x(n) = Σ_m x(m) x(n + m)
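A quick numerical sketch of matched filtering (the pulse shape is an arbitrary example): time-reverse and normalise the known signal, then filter a stream containing it; the output peaks where the pulse ends.

x = [1 2 3 2 1];                   % known signal (arbitrary example)
h = fliplr(x) / sqrt(sum(x.^2));   % matched filter: time-reversed, unit energy
s = [zeros(1,10) x zeros(1,10)];   % the pulse embedded in a longer stream
y = filter(h, 1, s);               % matched filter output
[~, n] = max(y);
fprintf('output peaks at n = %d\n', n)   % N samples after the pulse onset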
The autocorrelation sequence {r_x(n)} has two important properties:

1. It is symmetrical about 0, i.e. r_x(−n) = r_x(n).

2. Its maximum is at time 0, i.e. r_x(0) > r_x(n) for n ≠ 0.

The ZT of {r_x(n)} is

R_x(z) = X(z) X(z^{-1})

Similarly, we can define the cross-correlation between two sequences {a(n)}, {b(n)} as

r_ab(n) = Σ_m a(m) b(m − n)

Note that here we have m − n, i.e. {b(n)} is delayed with respect to {a(n)}. This is important because in general cross-correlation is not symmetric; in fact r_ab(−n) = r_ba(n), unless of course the two sequences are the same, a(n) = b(n), in which case we get the autocorrelation.

5.8 Noise in Communication Systems: Stochastic Processes

The main application of matched filtering techniques is in the detection of known (deterministic) signals in a background of random disturbances. In order to understand how well they work, it is necessary to find a way of modelling random noise: a sample from a stochastic process x(t), which is a random variable for each fixed t.

Besides the mean and variance we defined before, the third important statistical description of a process is second order: the autocorrelation sequence. Unlike that defined for deterministic signals in the previous section, this must be defined in an appropriate way for a random process, namely as an expected value:

R(n) = E[(x(m) − Ex(m))(x(m − n) − Ex(m − n))]

Here we have assumed stationarity, i.e. that the quantity above depends only on the time difference n. Like the process mean, it can also be estimated by an appropriate sample average, but getting a reasonable estimate, a problem known as spectrum estimation, is a complex task which we do not have space for here. If the process has nonzero mean, it is generally preferable to remove the mean before calculating the correlation, since otherwise a large value of the autocorrelation will result even if the variation of the signal is completely random; the resulting measure is called the autocovariance. It is essential to be clear about the distinction between these two types of correlation sequence: one applies to known, finite energy signals, the other to random, finite power signals.
Among random processes, the most random are the white noise processes, so called because of their spectral properties. If {ξ(n)} is a sample from a zero mean stationary white noise process with variance σ^2, then it satisfies

Eξ(n) = 0,  R_ξ(n) = σ^2 δ_{n0}

In other words, its autocorrelation sequence is an impulse at the origin. This means that values ξ(n), ξ(m) taken at different times are completely unrelated. Although it is defined in a completely different way from the deterministic autocorrelation before, the autocorrelation for random sequences shares the two properties of the deterministic autocorrelation: symmetry and a maximum at the origin. The autocorrelation of a process describes how similar it is to its time shifts: how dependent the value at a given time is on the value at a different time.

The ZT of the autocorrelation sequence is defined as

S(z) = Σ_n R(n) z^{-n}

The whiteness of the power spectrum can be seen by considering either S(z) or, equivalently, the Fourier transform

S(ω) = S(exp(jω)) = σ^2

which is a constant depending on the noise power: the spectrum is flat.

Although various probability densities are used, by far the most widespread is the Gaussian or normal density. For a single variate, the normal density is given as in Chapter II. This has a number of significant features:

1. Any linear combination of normal variables is also normal. This means that if we filter a normal process, we get another normal process (with different mean and correlation).

2. A normal process is completely characterised by its first and second order statistics.

3. The sum of a large number of independent random variables is generally approximately normally distributed. This result is called the Central Limit Theorem.

Normal processes have been very widely studied and are a very useful tool in designing DCSP algorithms of all sorts, from audio analysis to computer vision or radar. By and large, we shall not deal explicitly with the probability densities, but only work with the expected values, which are a sufficient characterisation of the signal processes for much of the DCSP used at present.
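The impulse-like autocorrelation of white noise is easy to verify with a sample average; a minimal sketch using Gaussian pseudo-random numbers:

N = 10000;
xi = randn(1, N);        % zero-mean white Gaussian noise, variance 1
R = zeros(1, 6);
for k = 0:5
    R(k+1) = mean(xi(1:N-k) .* xi(1+k:N));   % sample autocorrelation at lag k
end
disp(R)    % close to 1 at lag 0 and close to 0 at all other lags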
5.8.1 Detection of known signals in noise

We can now turn to the problem which motivated the above summary of the elements of the theory of stochastic processes, namely the detection of a known, finite energy signal {x(n)} in noise. The overall system is modelled by the example in Fig. 49, in which a sequence of impulses {i(n)}, which might occur at regular or random intervals, drives the transmitter filter, which generates the signal sequence {x(n)} in response to each impulse. In a radar system, the input sequence {i(n)} would represent the returns from different targets, which would be randomly spaced in time; in a communication system, the impulses would occur at regular intervals, with data being used to modulate their phase or amplitude. The transmitted signal is corrupted by the channel noise, which we shall assume is a sample from a zero mean white noise process {ξ(n)}, i.e.

Eξ(n) = 0,  E(ξ(n)ξ(m)) = σ^2 δ_{mn}

The detector operates on the stream of received signals using a matched filter, whose output is thresholded to determine whether the signal is present at a given time. The system output is the binary decision sequence {d(n)}.

Figure 49: Model of a typical signal detection application.

Now, according to our previous calculation, the filter which maximises the output in response to the known signal is just the matched filter for the signal,

h(n) = x(N − n)

but the signal in the present case is corrupted by noise, whose effects we must take into account. The noisy signal is just w(n) = x(n) + ξ(n), with

E(ξ(n)x(m)) = x(m) Eξ(n) = 0

First, suppose that no signal is present (in signal detection theory this is known as the null hypothesis), so that w(n) = ξ(n). In this case, the filter output at time N has statistics

E[y(N)] = 0
and

E[y^2(N)] = σ^2 Σ_{n=0}^{N} h^2(n)

To proceed, now consider the case where the signal is present. Then

E[y(N)] = Σ_{n=0}^{N} x^2(n)

and

E[y^2(N)] = (Σ_{n=0}^{N} x^2(n))^2 + σ^2 Σ_{n=0}^{N} h^2(n)

Since the matched filter maximises the mean output when the signal is present, and the mean output is 0 when the signal is absent, the matched filter is the best filter for detecting the signal when the noise is white.

5.9 Wiener Filter

If there is a significant amount of noise in an image, the simple way to deal with it is just to lowpass filter the signal, since as we have already seen most of its energy is at low frequencies. Unfortunately, this will blur the edges too, leading to undesirable results. It is better to find a way to deal with the randomness directly than to rely on methods which are essentially ad hoc. The statistical approach to the problem is based on stochastic models of the image and noise processes. The simplest one to describe is Wiener filtering, named after its inventor.

To understand it, we shall start with a very simple problem. Suppose we have a single sample

y = x + ξ

where x is the desired variable and ξ is a sample of white noise, independent of x, with variance σ^2. We assume that the variance of x is 1 and that both signal and noise have zero mean:

Ex = Eξ = Exξ = 0,  Ex^2 = 1,  Eξ^2 = σ^2

We want to choose a coefficient a so that the difference between the estimate x̂ = ay and x is as small as possible:

min_a E[x − ay]^2

Differentiating the error with respect to a and equating to zero gives 2E(xy) = 2aEy^2, or

a = E(xy)/Ey^2

at which value the error x − x̂ is orthogonal to the data y, in the sense that

E[(x − x̂)y] = E(xy) − aEy^2 = 0
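With the stated assumptions, E(xy) = Ex^2 = 1 and Ey^2 = 1 + σ^2, so the optimal coefficient works out to a = 1/(1 + σ^2). A simulation sketch confirming this:

sigma = 0.5;  N = 100000;
x  = randn(1, N);                    % desired variable: zero mean, unit variance
xi = sigma * randn(1, N);            % independent white noise
y  = x + xi;                         % observed data
a_hat = mean(x .* y) / mean(y.^2);   % sample estimate of E(xy)/Ey^2
fprintf('a_hat = %.3f, theory = %.3f\n', a_hat, 1/(1 + sigma^2))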
This orthogonality principle can readily be extended to solve the problem of estimating images distorted by noise. The Wiener formulation works for any type of noise or signal spectrum and any filter. If it has a weakness, it is that the AR signal models which typically get used do not do a good job of representing image features like edges, so that the Wiener filter tends to blur them. This problem can be solved by using a nonstationary formulation, possibly using a state space model to give an optimum recursive nonstationary filter; it goes beyond the scope of these notes to go into details.

5.10 Having Fun: Contrast Enhancement

Many images are captured in low light conditions, which can cause them to have low contrast between the darker and lighter regions. This problem can be solved by contrast stretching, in which a nonlinear function such as f(x) = x^2 or f(x) = exp[ax] is used to increase the dynamic range of the image.

The most sophisticated form of contrast enhancement is known as histogram equalisation, which works by trying to spread out the intensities so that their probability density is as uniform as possible. It is based on the observation that if u is a random variable with probability density p_u(x), 0 ≤ x ≤ 1, then the cumulative distribution of u, defined as the total probability that u is less than x,

P_u(x) = ∫_{−∞}^{x} p_u(y) dy

can be used as a transformation. If we define the function f(u) = P_u(u), which is just the cumulative distribution of u, then f(u) itself has the cumulative distribution

P_f(x) = P_u(P_u^{-1}(x)) = 0 for x ≤ 0,  x for 0 < x < 1,  1 for x ≥ 1

which corresponds to a uniform density p_f(x) = 1. Because the intensities are normally integers, rather than real numbers, this equalisation can only be achieved approximately in practice.

Fig. 50 shows a comparison between two figures: the left one is generated from a positive Gaussian random variable, the right from a uniformly distributed random variable. It is clearly demonstrated that the uniform one has a stronger contrast to us. In Fig. 51, we show another example.
Figure 50: Upper panel: left is from Gaussian, right from uniform. Bottom panel: left is the histogram of Gaussian, right is the histogram of uniform.
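A minimal sketch of histogram equalisation for an 8-bit image, using only the empirical cumulative distribution; the input image here is synthetic:

img = uint8(64 + 32*randn(256));            % synthetic low-contrast 8-bit image
v = double(img(:));
counts = accumarray(v + 1, 1, [256 1]);     % histogram of intensity levels 0..255
cdf = cumsum(counts) / numel(v);            % empirical cumulative distribution
lut = uint8(round(255 * cdf));              % map each level through its CDF
eq = lut(double(img) + 1);                  % equalised image, intensities spread out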
Figure 51: Another application.

6 Appendix: Mathematics

6.1 Sinusoids

The term sinusoid is used here to designate a waveform of the type

A cos(2πft + φ) = A cos(ωt + φ)

Thus, a sinusoid is defined as a cosine at amplitude A, frequency ω, and phase φ. A sinusoid's phase is typically denoted by φ and is in radian units. We could also have defined a sinusoid based on the sine function, as A sin(2πft + φ), in which case φ would be different by 90 degrees (π/2 radians).

The instantaneous frequency of a sinusoid is defined as the derivative of the instantaneous phase with respect to time:

ω = d/dt [ωt + φ]

A discrete-time sinusoid is simply obtained from a continuous-time sinusoid by replacing t by nT:

A cos(2πfnT + φ) = A cos(ωnT + φ)
6.2 Complex Numbers

j^2 = −1
z = x + jy = r exp(jθ)
x = r cos(θ),  y = r sin(θ)
r = |z| = sqrt(x^2 + y^2)
θ = ∠z = tan^{-1}(y/x)
|z1 z2| = |z1| |z2|,  |z1/z2| = |z1|/|z2|
|exp(jθ)| = 1,  ∠r = 0 (for real r > 0)
z1 z2 = (x1 x2 − y1 y2) + j(x1 y2 + x2 y1)
z1 z2 = r1 r2 exp(j(θ1 + θ2))
z̄ = x − jy = r exp(−jθ)
z z̄ = |z|^2 = x^2 + y^2 = r^2

6.3 The Exponential Function

e = 2.71828182...
e^x = lim_{n→∞} (1 + x/n)^n
e^x = Σ_{n=0}^{∞} x^n / n!
e^{jθ} = cos(θ) + j sin(θ)
e^{jnθ} = cos(nθ) + j sin(nθ)
sin(θ) = (e^{jθ} − e^{−jθ}) / (2j)
cos(θ) = (e^{jθ} + e^{−jθ}) / 2
6.4 Trigonometric identities

sin(−θ) = −sin(θ)
cos(−θ) = cos(θ)
sin(A + B) = sin(A) cos(B) + cos(A) sin(B)
cos(A + B) = cos(A) cos(B) − sin(A) sin(B)
sin^2(θ) = (1 − cos(2θ)) / 2
cos^2(θ) = (1 + cos(2θ)) / 2
sin(A) sin(B) = (cos(A − B) − cos(A + B)) / 2
cos(A) cos(B) = (cos(A + B) + cos(A − B)) / 2
sin(A) cos(B) = (sin(A + B) + sin(A − B)) / 2
sin(A) + sin(B) = 2 sin((A + B)/2) cos((A − B)/2)
cos(A) + cos(B) = 2 cos((A + B)/2) cos((A − B)/2)

6.5 Spectrum

In this module, we think of filters primarily in terms of their effect on the spectrum of a signal. This is appropriate because the ear (to a first approximation) converts the time-waveform at the eardrum into a neurologically encoded spectrum. Mathematically, the spectrum of a signal x is the Fourier transform of its time-waveform; equivalently, the spectrum is the z transform evaluated on the unit circle z = exp(jωT). We denote both the spectrum and the z transform of a signal by uppercase letters. For example, if the time-waveform is denoted x(n), its z transform is called X(z) and its spectrum is therefore X(exp(jωT)). The time-waveform x(n) is said to "correspond" to its z transform X(z), meaning they are transform pairs; this correspondence is often denoted x(n) ↔ X(z), or x(n) ↔ X(exp(jωT)). Both the z transform and its special case, the (discrete-time) Fourier transform, are therefore said to transform from the time domain to the frequency domain.

Intuitively, a spectrum (a complex function of frequency ω) gives the amplitude and phase of the sinusoidal signal component at frequency ω. We deal most often with discrete time nT (or simply n) but continuous frequency f (or ω = 2πf). This is because the computer can represent only digital signals, and digital time-waveforms are discrete in time but may have energy at any frequency. If, on the other hand, we were going to talk about FFTs (Fast Fourier Transforms, efficient implementations of the Discrete Fourier Transform, or DFT), then we would have to discretize the frequency variable also in order to represent spectra inside the computer. In this book, however, we use spectra only for conceptual insights into the perceptual effects of digital filtering, and we avoid discrete frequency for simplicity.
When we wish to consider an entire signal as a "thing in itself," we write x(·), meaning the whole time-waveform (x(n) for all n), or X(·), to mean the entire spectrum taken as a whole. Imagine, for example, that we have plotted x(n) on a strip of paper that is infinitely long. Then x(·) refers to the complete picture, while x(n) refers to the nth sample point on the plot.

6.5.1 Fourier's Song

Integrate your function times a complex exponential
It's really not so hard you can do it with your pencil
And when you're done with this calculation
You've got a brand new function: the Fourier Transformation
What a prism does to sunlight, what the ear does to sound
Fourier does to signals, it's the coolest trick around
Now filtering is easy, you don't need to convolve
All you do is multiply in order to solve.

From time into frequency, from frequency to time.
Every operation in the time domain
Has a Fourier analog, that's what I claim
Think of a delay, a simple shift in time
It becomes a phase rotation, now that's truly sublime!
And to differentiate, here's a simple trick
Just multiply by J omega, ain't that slick?
Integration is the inverse, what you gonna do?
Divide instead of multiply, you can do it too.

From time into frequency, from frequency to time.
Let's do some examples, consider a sine
It's mapped to a delta, in frequency, not time
Now take that same delta as a function of time
Mapped into frequency, of course, it's a sine!
Sine x on x is handy, let's call it a sinc.
Its Fourier Transform is simpler than you think.
You get a pulse that's shaped just like a top hat.
Squeeze the pulse thin, and the sinc grows fat.
Or make the pulse wide, and the sinc grows dense.
The uncertainty principle is just common sense.

6.6 Matlab program for simple filter design (From Enrico Rossoni)

clear all; close all; clc;

Fs = 11025;                 % sampling frequency (Hz)
Fnoise = Fs/8;              % disturbance frequency (Hz)
SNR = 5;                    % signal-to-noise ratio (dB)

load jazz signal            % load the audio sample into 'signal'
samplelength = length(signal);
deltats = 1/Fs;

NFFT = min([256, length(signal)]);
NOVERLAP = 0;
WINDOW = hanning(NFFT);
SAMPLINGFREQUENCY = Fs;

% spectrum is the legacy toolbox analyser (modern equivalent: pwelch);
% the argument order below is assumed from its old signature
[Ps, F] = spectrum(signal, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
spectrumsignal = Ps(:,1);
Psignal = mean(spectrumsignal);

% Build a sinusoidal disturbance at the required SNR
inoise = min(find(F >= Fnoise));
fprintf(1, 'Signal-to-Noise ratio = %g dB\n', SNR);
Pnoise = spectrumsignal(inoise) * 10^(-SNR/10);
ampnoise = sqrt(Pnoise);
noise = ampnoise * sin(2*pi*Fnoise*(1:samplelength)'*deltats);
Pn = spectrum(noise, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
spectrumnoise = Pn(:,1);
Pnoise = mean(spectrumnoise);

noisy = signal + noise;
Pns = spectrum(noisy, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);

figure(2);
plot(F*0.001, Pns(:,1), 'r');
xlabel('Frequency (kHz)'); ylabel('|X(F)|');

% Recursive notch filter from Section 5.6:
% y(n) = a1 y(n-1) + a2 y(n-2) + b0 x(n) + b1 x(n-1) + b2 x(n-2)
% (signs of a2 and b1 follow the difference equation in Section 5.6)
a1 = 1.3435;  a2 = -0.9025;
b0 = 0.9543;  b1 = -0.9543*1.4142;  b2 = 0.9543;

y = zeros(size(noisy));
y(1) = 0; y(2) = 0;
for ii = 3:samplelength
    y(ii) = a1*y(ii-1) + a2*y(ii-2) + b0*noisy(ii) + b1*noisy(ii-1) + b2*noisy(ii-2);
end

Pfiltered = spectrum(y, NFFT, NOVERLAP, WINDOW, SAMPLINGFREQUENCY);
hold on; plot(F*0.001, Pfiltered(:,1));
legend('Disturbed Signal', 'Filtered Signal');

fprintf('Playing filtered audio sample\n');
soundsc(y)
6.7 Fourier Transform: From Real to Complex Variables

For a signal x(t) with a period T, we have

x(t) = A0/2 + Σ_{n=1}^{∞} [An cos(nωt) + Bn sin(nωt)]    (6.1)

where ω = 2π/T and the Ai, Bi are defined as in Chapter II. Using the Euler formula as in this Appendix, we have

exp(jnωt) = cos(nωt) + j sin(nωt)    (6.2)
and

exp(−jnωt) = cos(−nωt) + j sin(−nωt) = cos(nωt) − j sin(nωt)    (6.3)

where j^2 = −1. Eq. (6.2) + Eq. (6.3) gives us

cos(nωt) = [exp(jnωt) + exp(−jnωt)] / 2    (6.4)

and Eq. (6.2) − Eq. (6.3) yields

sin(nωt) = [exp(jnωt) − exp(−jnωt)] / (2j)    (6.5)

Substituting Eq. (6.4) and Eq. (6.5) into Eq. (6.1), we obtain

x(t) = A0/2 + Σ_{n=1}^{∞} [An (exp(jnωt) + exp(−jnωt))/2 + Bn (exp(jnωt) − exp(−jnωt))/(2j)]
     = A0/2 + Σ_{n=1}^{∞} [(An/2 + Bn/(2j)) exp(jnωt) + (An/2 − Bn/(2j)) exp(−jnωt)]    (6.6)

If we define

c0 = A0/2,  cn = An/2 + Bn/(2j),  c−n = An/2 − Bn/(2j)

then Eq. (6.6) becomes

x(t) = c0 + Σ_{n=1}^{∞} cn exp(jnωt) + Σ_{n=1}^{∞} c−n exp(−jnωt) = Σ_{n=−∞}^{∞} cn exp(jnωt)    (6.7)

6.8 More Details on Example 8

Given a finite sequence x[n], its Discrete Fourier Transform (DFT) X(k) is defined as

X(k) = DFT{x[n]} = Σ_{n=0}^{N−1} x[n] exp[−j(2π/N)kn],  k = 0, . . . , N − 1
where N is the length of the sequence. Note that by defining the set of angular frequencies ωk = (2π/N)k, k = 0, . . . , N − 1, we can rewrite the equation above as

X(k) = Σ_{n=0}^{N−1} x[n] exp[−jωk n]    (6.8)

The meaning of this formula is clear if you think that the sequences ek[n] = exp(jωk n), k = 0, . . . , N − 1, form a 'basis' for representing any sequence of N numbers (you can easily check that these N sequences are all independent). Therefore, we can interpret X(k) as the 'projection' of our original signal onto the kth sequence of the basis. Thus, these complex numbers can be thought of as coordinates to represent our signal, and we can regard the Fourier transform as a simple 'change of coordinates' (from time to frequency and, for the inverse transform, from frequency to time). In fact, for any given finite sequence x[n] we have(1)

x[n] = (1/N) Σ_{k=0}^{N−1} X(k) ek[n]

The coefficients in this expansion can be more conveniently rewritten as X(k) = ρk exp(jφk), where ρk = |X(k)| is the modulus and φk = arg(X(k)) is the phase of X(k). Thus, in order to reconstruct our sequence, we just need to scale each basis sequence by ρk, rotate its elements by φk in the complex plane, and finally sum all the resulting sequences up together.

Let us consider for example the sequence x[n] = (1, 2, −1, −1). In Figure 52 we plot the signal x respectively in the time domain (top) and in the z domain (bottom). In this case we have N = 4 and ωk = (π/2)k. From Eq. (6.8) we then obtain

X(k) = x(0) exp(−j(π/2)k·0) + x(1) exp(−j(π/2)k·1) + x(2) exp(−j(π/2)k·2) + x(3) exp(−j(π/2)k·3)
     = 1 + 2(−j)^k − (−j)^{2k} − (−j)^{3k}

where we have used the Euler formula to get exp(−jπ/2) = cos(−π/2) + j sin(−π/2) = −j, and the basic property of the exponential exp(nz) = [exp(z)]^n. From the equation above we get

X[k] = (1, 2 − 3j, −1, 2 + 3j)

In Figure 53 we plot the signal in the frequency domain: the amplitude and the phase of its DFT coefficients X(k) are plotted as a function of the frequency index k. We will usually refer to this as the spectrum of x.

(1) The factor 1/N is due to the fact that our basis sequences {ek} are not normalized.
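This hand calculation can be checked against Matlab's built-in fft, which implements exactly the DFT defined above:

x = [1 2 -1 -1];
X = fft(x);       % returns 1, 2-3i, -1, 2+3i, matching the hand calculation
rho = abs(X);     % moduli rho_k
phi = angle(X);   % phases phi_k
disp(X); disp(rho); disp(phi)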
Figure 52: The signal x plotted in the time domain (top) and in the z-domain (bottom).

Figure 53: The signal x in the frequency domain.