Chapter 13

Fourier Analysis
In addition to their inestimable importance in mathematics and its applications,
Fourier series also serve as the entry point into the wonderful world of Fourier analy-
sis and its wide-ranging extensions and generalizations. An entire industry is devoted to
further developing the theory and enlarging the scope of applications of Fourier–inspired
methods. New directions in Fourier analysis continue to be discovered and exploited in
a broad range of physical, mathematical, engineering, chemical, biological, financial, and
other systems. In this chapter, we will concentrate on four of the most important variants:
discrete Fourier sums leading to the Fast Fourier Transform (FFT); the modern theory
of wavelets; the Fourier transform; and, finally, its cousin, the Laplace transform. In ad-
dition, more general types of eigenfunction expansions associated with partial differential
equations in higher dimensions will appear in the following chapters.
Modern digital media, such as CD’s, DVD’s and MP3’s, are based on discrete data,
not continuous functions. One typically samples an analog signal at equally spaced time
intervals, and then works exclusively with the resulting discrete (digital) data. The asso-
ciated discrete Fourier representation re-expresses the data in terms of sampled complex
exponentials; it can, in fact, be handled by finite-dimensional vector space methods, and
so, technically, belongs back in the linear algebra portion of this text. However, the insight
gained from the classical continuous Fourier theory proves to be essential in understand-
ing and analyzing its discrete digital counterpart. An important application of discrete
Fourier sums is in signal and image processing. Basic data compression and noise removal
algorithms are applied to the sample’s discrete Fourier coefficients, acting on the obser-
vation that noise tends to accumulate in the high frequency Fourier modes, while most
important features are concentrated at low frequencies. Section 13.1 develops the basic
Fourier theory in this discrete setting, culminating in the Fast Fourier Transform (FFT),
an efficient numerical algorithm for passing between a signal and its discrete Fourier
coefficients.
One of the inherent limitations of classical Fourier methods, both continuous and
discrete, is that they are not well adapted to localized data. (In physics, this lack of
localization is the basis of the Heisenberg Uncertainty Principle.) As a result, Fourier-based
signal processing algorithms tend to be inaccurate and/or inefficient when confronting
highly localized signals or images. In the second section, we introduce the modern theory
of wavelets, which is a recent extension of Fourier analysis that more naturally incorporates
multiple scales and localization. Wavelets are playing an increasingly dominant role in
many modern applications; for instance, the JPEG 2000 digital image compression format
is based on wavelets, as are the computerized FBI fingerprint data used in law enforcement
in the United States.

2/25/07 © 2006 Peter J. Olver
Spectral analysis of non-periodic functions defined on the entire real line requires re-
placing the Fourier series by a limiting Fourier integral. The resulting Fourier transform
plays an essential role in functional analysis, in the analysis of ordinary and partial differ-
ential equations, and in quantum mechanics, data analysis, signal processing, and many
other applied fields. In Section 13.3, we introduce the most important applied features of
the Fourier transform.
The closely related Laplace transform is a basic tool in engineering applications. To
mathematicians, the Fourier transform is the more fundamental of the two, while the
Laplace transform is viewed as a certain real specialization. Both transforms change differ-
entiation into multiplication, thereby converting linear differential equations into algebraic
equations. The Fourier transform is primarily used for solving boundary value problems
on the real line, while initial value problems, particularly those involving discontinuous
forcing terms, are effectively handled by the Laplace transform.
13.1. Discrete Fourier Analysis and the Fast Fourier Transform.
In modern digital media — audio, still images or video — continuous signals are
sampled at discrete time intervals before being processed. Fourier analysis decomposes the
sampled signal into its fundamental periodic constituents — sines and cosines, or, more
conveniently, complex exponentials. The crucial fact, upon which all of modern signal
processing is based, is that the sampled complex exponentials form an orthogonal basis.
This section introduces the Discrete Fourier Transform, and concludes with the Fast
Fourier Transform, an efficient algorithm for computing the discrete Fourier
representation and reconstructing the signal from its Fourier coefficients.
We will concentrate on the one-dimensional version here. Let f(x) be a function
representing the signal, defined on an interval a ≤ x ≤ b. Our computer can only store its
measured values at a finite number of sample points a ≤ x_0 < x_1 < · · · < x_n ≤ b. In the
simplest and, by far, the most common case, the sample points are equally spaced, and so

    x_j = a + j h,   j = 0, . . . , n,   where   h = (b − a)/n

indicates the sample rate. In signal processing applications, x represents time instead of
space, and the x_j are the times at which we sample the signal f(x). Sample rates can be
very high, e.g., every 10–20 milliseconds in current speech recognition systems.
For simplicity, we adopt the “standard” interval of 0 ≤ x ≤ 2π, and the n equally
spaced sample points†

    x_0 = 0,   x_1 = 2π/n,   x_2 = 4π/n,   . . . ,   x_j = 2jπ/n,   . . . ,   x_{n−1} = 2(n−1)π/n.        (13.1)
(Signals defined on other intervals can be handled by simply rescaling the interval to have
length 2π.) Sampling a (complex-valued) signal or function f(x) produces the sample

† We will find it convenient to omit the final sample point x_n = 2π from consideration.
Figure 13.1. Sampling e^{−i x} and e^{7 i x} on n = 8 sample points.
vector

    f = ( f_0, f_1, . . . , f_{n−1} )^T = ( f(x_0), f(x_1), . . . , f(x_{n−1}) )^T,

where

    f_j = f(x_j) = f(2jπ/n).        (13.2)
Sampling cannot distinguish between functions that have the same values at all of the
sample points — from the sampler’s point of view they are identical. For example, the
periodic complex exponential function

    f(x) = e^{i n x} = cos nx + i sin nx

has sampled values

    f_j = f(2jπ/n) = exp( i n · 2jπ/n ) = e^{2jπ i} = 1   for all   j = 0, . . . , n − 1,

and hence is indistinguishable from the constant function c(x) ≡ 1 — both lead to the
same sample vector ( 1, 1, . . . , 1 )^T. This has the important implication that sampling at n
equally spaced sample points cannot detect periodic signals of frequency n. More generally,
the two complex exponential signals

    e^{i (k+n) x}   and   e^{i k x}

are also indistinguishable when sampled. This has the important consequence that we
need only use the first n periodic complex exponential functions

    f_0(x) = 1,   f_1(x) = e^{i x},   f_2(x) = e^{2 i x},   . . . ,   f_{n−1}(x) = e^{(n−1) i x},        (13.3)

in order to represent any 2π periodic sampled signal. In particular, exponentials e^{−i k x} of
“negative” frequency can all be converted into positive versions, namely e^{i (n−k) x}, by the
same sampling argument. For example,

    e^{−i x} = cos x − i sin x   and   e^{(n−1) i x} = cos(n−1) x + i sin(n−1) x
have identical values on the sample points (13.1). However, off of the sample points, they
are quite different; the former is slowly varying, while the latter represents a high frequency
oscillation. In Figure 13.1, we compare e^{−i x} and e^{7 i x} when there are n = 8 sample values,
indicated by the dots on the graphs. The top row compares the real parts, cos x and cos 7x,
while the bottom row compares the imaginary parts, − sin x and sin 7x. Note that both
functions have the same pattern of sample values, even though their overall behavior is
strikingly different.
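This sample-point coincidence is easy to confirm numerically. The following sketch (plain NumPy; the variable names are our own) checks that e^{−i x} and e^{7 i x} agree on the n = 8 grid but not in between:

```python
import numpy as np

n = 8
x = 2 * np.pi * np.arange(n) / n      # the sample points x_j = 2*j*pi/n of (13.1)

low = np.exp(-1j * x)                 # samples of the slowly varying e^{-ix}
high = np.exp(7j * x)                 # samples of the rapidly oscillating e^{7ix}

# On the grid the two signals are indistinguishable (aliasing) ...
assert np.allclose(low, high)
# ... but off the grid they differ:
assert not np.isclose(np.exp(-1j * 0.1), np.exp(7j * 0.1))
```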
This effect is commonly referred to as aliasing†. If you view a moving particle under
. If you view a moving particle under
a stroboscopic light that flashes only eight times, you would be unable to determine which
of the two graphs the particle was following. Aliasing is the cause of a well-known artifact
in movies: spoked wheels can appear to be rotating backwards when our brain interprets
the discretization of the high frequency forward motion imposed by the frames of the
film as an equivalently discretized low frequency motion in reverse. Aliasing also has
important implications for the design of music CD’s. We must sample an audio signal at
a sufficiently high rate that all audible frequencies can be adequately represented. In fact,
human appreciation of music also relies on inaudible high frequency tones, and so a much
higher sample rate is actually used in commercial CD design. But the sample rate that was
selected remains controversial; hi-fi aficionados complain that it was not set high enough
to fully reproduce the musical quality of an analog LP record!
The discrete Fourier representation decomposes a sampled function f(x) into a linear
combination of complex exponentials. Since we cannot distinguish sampled exponentials
of frequency higher than n, we only need consider a finite linear combination
    f(x) ∼ p(x) = c_0 + c_1 e^{i x} + c_2 e^{2 i x} + · · · + c_{n−1} e^{(n−1) i x} = Σ_{k=0}^{n−1} c_k e^{i k x}        (13.4)
of the first n exponentials (13.3). The symbol ∼ in (13.4) means that the function f(x)
and the sum p(x) agree on the sample points:

    f(x_j) = p(x_j),   j = 0, . . . , n − 1.        (13.5)

Therefore, p(x) can be viewed as a (complex-valued) interpolating trigonometric polynomial
of degree ≤ n − 1 for the sample data f_j = f(x_j).
Remark: If f(x) is real, then p(x) is also real on the sample points, but may very well
be complex-valued in between. To avoid this unsatisfying state of affairs, we will usually
discard its imaginary component, and regard the real part of p(x) as “the” interpolating
trigonometric polynomial. On the other hand, sticking with a purely real construction
unnecessarily complicates the analysis, and so we will retain the complex exponential form
(13.4) of the discrete Fourier sum.

† In computer graphics, the term “aliasing” is used in a much broader sense that covers a
variety of artifacts introduced by discretization — particularly, the jagged appearance of lines
and smooth curves on a digital monitor.
Since we are working in the finite-dimensional vector space C^n throughout, we may
reformulate the discrete Fourier series in vectorial form. Sampling the basic exponentials
(13.3) produces the complex vectors

    ω_k = ( e^{i k x_0}, e^{i k x_1}, e^{i k x_2}, . . . , e^{i k x_{n−1}} )^T
        = ( 1, e^{2kπ i /n}, e^{4kπ i /n}, . . . , e^{2(n−1)kπ i /n} )^T,        k = 0, . . . , n − 1.        (13.6)
The interpolation conditions (13.5) can be recast in the equivalent vector form

    f = c_0 ω_0 + c_1 ω_1 + · · · + c_{n−1} ω_{n−1}.        (13.7)
In other words, to compute the discrete Fourier coefficients c_0, . . . , c_{n−1} of f, all we need
to do is rewrite its sample vector f as a linear combination of the sampled exponential
vectors ω_0, . . . , ω_{n−1}.
Now, as with continuous Fourier series, the absolutely crucial property is the orthonormality
of the basis elements ω_0, . . . , ω_{n−1}. Were it not for the power of orthogonality,
Fourier analysis might have remained a mere mathematical curiosity, rather than today’s
indispensable tool.
Proposition 13.1. The sampled exponential vectors ω_0, . . . , ω_{n−1} form an orthonormal
basis of C^n with respect to the inner product

    ⟨ f , g ⟩ = (1/n) Σ_{j=0}^{n−1} f_j \overline{g_j} = (1/n) Σ_{j=0}^{n−1} f(x_j) \overline{g(x_j)},        f , g ∈ C^n.        (13.8)
Remark: The inner product (13.8) is a rescaled version of the standard Hermitian dot
product (3.90) between complex vectors. We can interpret the inner product between the
sample vectors f , g as the average of the sampled values of the product signal f(x) \overline{g(x)}.
Remark: As usual, orthogonality is no accident. Just as the complex exponentials
are eigenfunctions for a self-adjoint boundary value problem, so their discrete sampled
counterparts are eigenvectors for a self-adjoint matrix eigenvalue problem; details can be
found in Exercise 8.4.12. Here, though, to keep the discussion on track, we shall outline a
direct proof.
Proof: The crux of the matter relies on properties of the remarkable complex numbers

    ζ_n = e^{2π i /n} = cos(2π/n) + i sin(2π/n),   where   n = 1, 2, 3, . . . .        (13.9)

Particular cases include

    ζ_2 = −1,   ζ_3 = −1/2 + (√3/2) i,   ζ_4 = i,   and   ζ_8 = √2/2 + (√2/2) i.        (13.10)
The n-th power of ζ_n is

    ζ_n^n = ( e^{2π i /n} )^n = e^{2π i} = 1,
Figure 13.2. The Fifth Roots of Unity.
and hence ζ_n is one of the complex n-th roots of unity. There are, in fact, n
different complex n-th roots of 1, including 1 itself, namely the powers of ζ_n:

    ζ_n^k = e^{2kπ i /n} = cos(2kπ/n) + i sin(2kπ/n),   k = 0, . . . , n − 1.        (13.11)
Since it generates all the others, ζ_n is known as the primitive n-th root of unity. Geometrically,
the n-th roots (13.11) lie on the vertices of a regular unit n-gon in the complex plane;
see Figure 13.2. The primitive root ζ_n is the first vertex we encounter as we go around the
n-gon in a counterclockwise direction, starting at 1. Continuing around, the other roots
appear in their natural order ζ_n^2, ζ_n^3, . . . , ζ_n^{n−1}, finishing back at ζ_n^n = 1. The complex
conjugate of ζ_n is the “last” n-th root:

    e^{−2π i /n} = \overline{ζ_n} = 1/ζ_n = ζ_n^{n−1} = e^{2(n−1)π i /n}.        (13.12)
The complex numbers (13.11) are a complete set of roots of the polynomial z^n − 1,
which can therefore be factored:

    z^n − 1 = (z − 1)(z − ζ_n)(z − ζ_n^2) · · · (z − ζ_n^{n−1}).

On the other hand, elementary algebra provides us with the real factorization

    z^n − 1 = (z − 1)(1 + z + z^2 + · · · + z^{n−1}).

Comparing the two, we conclude that

    1 + z + z^2 + · · · + z^{n−1} = (z − ζ_n)(z − ζ_n^2) · · · (z − ζ_n^{n−1}).

Substituting z = ζ_n^k into both sides of this identity, we deduce the useful formula

    1 + ζ_n^k + ζ_n^{2k} + · · · + ζ_n^{(n−1)k} = n if k = 0, and 0 if 0 < k < n.        (13.13)

Since ζ_n^{n+k} = ζ_n^k, this formula can easily be extended to general integers k; the sum is
equal to n if n evenly divides k and is 0 otherwise.
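The summation formula (13.13), including its extension to general integers k, is easy to verify numerically; `root_of_unity_sum` is our own helper name:

```python
import numpy as np

def root_of_unity_sum(n, k):
    """Evaluate 1 + zeta_n^k + zeta_n^{2k} + ... + zeta_n^{(n-1)k}, cf. (13.13)."""
    zeta_n = np.exp(2j * np.pi / n)           # the primitive n-th root of unity (13.9)
    return sum(zeta_n ** (j * k) for j in range(n))

# The sum equals n when n divides k, and 0 otherwise:
assert abs(root_of_unity_sum(8, 0) - 8) < 1e-9
assert abs(root_of_unity_sum(8, 16) - 8) < 1e-9       # extension: 8 divides 16
assert all(abs(root_of_unity_sum(8, k)) < 1e-9 for k in range(1, 8))
```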
Now, let us apply what we’ve learned to prove Proposition 13.1. First, in view of
(13.11), the sampled exponential vectors (13.6) can all be written in terms of the n-th roots
of unity:

    ω_k = ( 1, ζ_n^k, ζ_n^{2k}, ζ_n^{3k}, . . . , ζ_n^{(n−1)k} )^T,   k = 0, . . . , n − 1.        (13.14)

Therefore, applying (13.12–13), we conclude that

    ⟨ ω_k , ω_l ⟩ = (1/n) Σ_{j=0}^{n−1} ζ_n^{jk} \overline{ζ_n^{jl}} = (1/n) Σ_{j=0}^{n−1} ζ_n^{j(k−l)} = 1 if k = l, and 0 if k ≠ l,   for 0 ≤ k, l < n,

which establishes orthonormality of the sampled exponential vectors. Q.E.D.
Orthonormality of the basis vectors implies that we can immediately compute the
Fourier coefficients in the discrete Fourier sum (13.4) by taking inner products:

    c_k = ⟨ f , ω_k ⟩ = (1/n) Σ_{j=0}^{n−1} f_j \overline{e^{i k x_j}} = (1/n) Σ_{j=0}^{n−1} f_j e^{−i k x_j} = (1/n) Σ_{j=0}^{n−1} ζ_n^{−jk} f_j.        (13.15)

In other words, the discrete Fourier coefficient c_k is obtained by averaging the sampled
values of the product function f(x) e^{−i k x}. The passage from a signal to its Fourier coefficients
is known as the Discrete Fourier Transform, or DFT for short. The reverse procedure of
reconstructing a signal from its discrete Fourier coefficients via the sum (13.4) (or (13.7))
is known as the Inverse Discrete Fourier Transform, or IDFT. The Discrete Fourier Transform
and its inverse define mutually inverse linear transformations on the space C^n, whose
matrix representations can be found in Exercise 5.7.9.
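For modest n, the transform pair can be implemented directly from (13.15) and (13.4); `dft` and `idft` are hypothetical names, and note the 1/n averaging, so `dft` agrees with `numpy.fft.fft` divided by n:

```python
import numpy as np

def dft(f):
    """Averaged coefficients c_k = (1/n) sum_j zeta_n^{-jk} f_j, as in (13.15)."""
    n = len(f)
    j = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(j, j) / n) @ np.asarray(f) / n

def idft(c):
    """Reconstruct the samples f_j = sum_k c_k e^{i k x_j}, as in (13.4)/(13.7)."""
    n = len(c)
    j = np.arange(n)
    return np.exp(2j * np.pi * np.outer(j, j) / n) @ np.asarray(c)

# The two maps are mutually inverse linear transformations on C^n:
rng = np.random.default_rng(0)
f = rng.standard_normal(8) + 1j * rng.standard_normal(8)
assert np.allclose(idft(dft(f)), f)
assert np.allclose(dft(f), np.fft.fft(f) / 8)
```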
Example 13.2. If n = 4, then ζ_4 = i. The corresponding sampled exponential
vectors

    ω_0 = ( 1, 1, 1, 1 )^T,   ω_1 = ( 1, i, −1, −i )^T,   ω_2 = ( 1, −1, 1, −1 )^T,   ω_3 = ( 1, −i, −1, i )^T,

form an orthonormal basis of C^4 with respect to the averaged Hermitian dot product

    ⟨ v , w ⟩ = (1/4) ( v_0 \overline{w_0} + v_1 \overline{w_1} + v_2 \overline{w_2} + v_3 \overline{w_3} ),   where   v = ( v_0, v_1, v_2, v_3 )^T,   w = ( w_0, w_1, w_2, w_3 )^T.
Given the sampled function values

    f_0 = f(0),   f_1 = f(π/2),   f_2 = f(π),   f_3 = f(3π/2),

we construct the discrete Fourier representation

    f = c_0 ω_0 + c_1 ω_1 + c_2 ω_2 + c_3 ω_3,        (13.16)

where

    c_0 = ⟨ f , ω_0 ⟩ = (1/4)(f_0 + f_1 + f_2 + f_3),        c_1 = ⟨ f , ω_1 ⟩ = (1/4)(f_0 − i f_1 − f_2 + i f_3),
    c_2 = ⟨ f , ω_2 ⟩ = (1/4)(f_0 − f_1 + f_2 − f_3),        c_3 = ⟨ f , ω_3 ⟩ = (1/4)(f_0 + i f_1 − f_2 − i f_3).
Figure 13.3. The Discrete Fourier Representation of x^2 − 2πx.
We interpret this decomposition as the complex exponential interpolant

    f(x) ∼ p(x) = c_0 + c_1 e^{i x} + c_2 e^{2 i x} + c_3 e^{3 i x}

that agrees with f(x) on the sample points. For instance, if

    f(x) = 2πx − x^2,

then

    f_0 = 0,   f_1 = 7.4022,   f_2 = 9.8696,   f_3 = 7.4022,

and hence

    c_0 = 6.1685,   c_1 = −2.4674,   c_2 = −1.2337,   c_3 = −2.4674.

Therefore, the interpolating trigonometric polynomial is given by the real part of

    p(x) = 6.1685 − 2.4674 e^{i x} − 1.2337 e^{2 i x} − 2.4674 e^{3 i x},        (13.17)

namely,

    Re p(x) = 6.1685 − 2.4674 cos x − 1.2337 cos 2x − 2.4674 cos 3x.        (13.18)

In Figure 13.3 we compare the function, with the interpolation points indicated, and discrete
Fourier representations (13.18) for both n = 4 and n = 16 points. The resulting
graphs point out a significant difficulty with the Discrete Fourier Transform as developed
so far. While the trigonometric polynomials do indeed correctly match the sampled function
values, their pronounced oscillatory behavior makes them completely unsuitable for
interpolation away from the sample points.
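The numerical values quoted above are straightforward to reproduce; this sketch recomputes the samples and coefficients of f(x) = 2πx − x² for n = 4:

```python
import numpy as np

n = 4
x = 2 * np.pi * np.arange(n) / n              # sample points 0, pi/2, pi, 3*pi/2
f = 2 * np.pi * x - x ** 2                    # sampled values of f(x) = 2*pi*x - x^2

# Averaged coefficients c_k = (1/n) sum_j f_j e^{-i k x_j}, cf. (13.15):
c = np.array([(f * np.exp(-1j * k * x)).sum() / n for k in range(n)])

assert np.allclose(f, [0.0, 7.4022, 9.8696, 7.4022], atol=1e-3)
assert np.allclose(c, [6.1685, -2.4674, -1.2337, -2.4674], atol=1e-3)
```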
However, this difficulty can be rectified by being a little more clever. The problem is
that we have not been paying sufficient attention to the frequencies that are represented
in the Fourier sum. Indeed, the graphs in Figure 13.3 might remind you of our earlier
observation that, due to aliasing, low and high frequency exponentials can have the same
sample data, but differ wildly in between the sample points. While the first half of the
Figure 13.4. The Low Frequency Discrete Fourier Representation of x^2 − 2πx.
summands in (13.4) represent relatively low frequencies, the second half do not, and can be
replaced by equivalent lower frequency, and hence less oscillatory, exponentials. Namely, if
0 < k ≤ n/2, then e^{−i k x} and e^{i (n−k) x} have the same sample values, but the former is of
lower frequency than the latter. Thus, for interpolatory purposes, we should replace the
second half of the summands in the Fourier sum (13.4) by their low frequency alternatives.
If n = 2m + 1 is odd, then we take

    p̂(x) = c_{−m} e^{−i m x} + · · · + c_{−1} e^{−i x} + c_0 + c_1 e^{i x} + · · · + c_m e^{i m x} = Σ_{k=−m}^{m} c_k e^{i k x}        (13.19)
as the equivalent low frequency interpolant. If n = 2m is even — which is the most
common case occurring in applications — then

    p̂(x) = c_{−m} e^{−i m x} + · · · + c_{−1} e^{−i x} + c_0 + c_1 e^{i x} + · · · + c_{m−1} e^{i (m−1) x} = Σ_{k=−m}^{m−1} c_k e^{i k x}        (13.20)
will be our choice. (It is a matter of personal taste whether to use e^{−i m x} or e^{i m x} to
represent the highest frequency term.) In both cases, the Fourier coefficients with negative
indices are the same as their high frequency alternatives:

    c_{−k} = c_{n−k} = ⟨ f , ω_{n−k} ⟩ = ⟨ f , ω_{−k} ⟩,        (13.21)

where ω_{−k} = ω_{n−k} is the sample vector for e^{−i k x} ∼ e^{i (n−k) x}.
Returning to the previous example, for interpolating purposes, we should replace
(13.17) by the equivalent low frequency interpolant

    p̂(x) = −1.2337 e^{−2 i x} − 2.4674 e^{−i x} + 6.1685 − 2.4674 e^{i x},        (13.22)

with real part

    Re p̂(x) = 6.1685 − 4.9348 cos x − 1.2337 cos 2x.

Graphs of the n = 4 and 16 low frequency trigonometric interpolants can be seen in
Figure 13.4. Thus, by utilizing only the lowest frequency exponentials, we successfully
suppress the aliasing artifacts, resulting in a quite reasonable trigonometric interpolant to
the given function.
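The passage from the aliased sum (13.4) to the low frequency interpolant is just a reindexing of the same coefficients via (13.21); `low_freq` and `evaluate` below are our own helper names, and the n = 4 data reproduce the example above:

```python
import numpy as np

def low_freq(c):
    """Reindex the coefficients of (13.4) into the low-frequency sum (13.19)/(13.20),
    using c_{-k} = c_{n-k}.  Returns the frequency list and matching coefficients."""
    n = len(c)
    m = n // 2
    ks = np.arange(-m, n - m)       # -m..m-1 when n = 2m, and -m..m when n = 2m+1
    return ks, np.asarray(c)[ks % n]

def evaluate(ks, coeffs, x):
    """Evaluate the interpolant sum_k c_k e^{i k x}."""
    return sum(ck * np.exp(1j * k * np.asarray(x)) for k, ck in zip(ks, coeffs))

# Sample f(x) = 2*pi*x - x^2 at n = 4 points and form its coefficients:
n = 4
x = 2 * np.pi * np.arange(n) / n
f = 2 * np.pi * x - x ** 2
c = np.array([(f * np.exp(-1j * k * x)).sum() / n for k in range(n)])

ks, coeffs = low_freq(c)
# The low-frequency interpolant still matches the samples exactly:
assert np.allclose(evaluate(ks, coeffs, x), f)
```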
Remark: The low frequency version also serves to unravel the reality of the Fourier
representation of a real function f(x). Since ω_{−k} = \overline{ω_k}, formula (13.21) implies that
c_{−k} = \overline{c_k}, and so the common frequency terms

    c_{−k} e^{−i k x} + c_k e^{i k x} = a_k cos kx + b_k sin kx

add up to a real trigonometric function. Therefore, the odd n interpolant (13.19) is a real
trigonometric polynomial, whereas in the even version (13.20) only the highest frequency
term c_{−m} e^{−i m x} produces a complex term — which is, in fact, 0 on the sample points.
Compression and Noise Removal
In a typical experimental signal, noise primarily affects the high frequency modes,
while the authentic features tend to appear in the low frequencies. Think of the hiss and
static you hear on an AM radio station or a low quality audio recording. Thus, a very
simple, but effective, method for denoising a corrupted signal is to decompose it into its
Fourier modes, as in (13.4), and then discard the high frequency constituents. A similar
idea underlies the Dolby™ recording system used on most movie soundtracks: during
the recording process, the high frequency modes are artificially boosted, so that scaling
them back when showing the movie in the theater has the effect of eliminating much of
the extraneous noise. The one design issue is the specification of a cut-off between low
and high frequency, that is, between signal and noise. This choice will depend upon the
properties of the measured signal, and is left to the discretion of the signal processor.
A correct implementation of the denoising procedure is facilitated by using the
unaliased forms (13.19–20) of the trigonometric interpolant, in which the low frequency
summands only appear when | k | is small. In this version, to eliminate high frequency
components, we replace the full summation by

    q_l(x) = Σ_{k=−l}^{l} c_k e^{i k x},        (13.23)

where l < (n + 1)/2 specifies the selected cut-off frequency between signal and noise. The
2l + 1 ≪ n low frequency Fourier modes retained in (13.23) will, in favorable situations,
capture the essential features of the original signal while simultaneously eliminating the
high frequency noise.
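A minimal sketch of this truncation, using NumPy's FFT to obtain the coefficients (`numpy.fft.fft` omits the 1/n averaging of (13.15), which cancels in the round trip); `denoise` is a hypothetical name:

```python
import numpy as np

def denoise(f, l):
    """Keep only the 2l+1 lowest-frequency modes of the sampled signal f,
    i.e. resynthesize the truncated sum q_l(x) of (13.23) on the sample grid."""
    c = np.fft.fft(f)
    k = np.fft.fftfreq(len(f), d=1.0 / len(f))   # mode frequencies 0, 1, ..., -2, -1
    c[np.abs(k) > l] = 0.0                       # discard the high-frequency modes
    return np.fft.ifft(c).real

# A smooth signal corrupted by random noise, on 256 sample points:
rng = np.random.default_rng(0)
t = 2 * np.pi * np.arange(256) / 256
clean = np.sin(t) + 0.5 * np.cos(2 * t)
noisy = clean + 0.1 * rng.standard_normal(256)
smooth = denoise(noisy, 5)                       # retain the 2l+1 = 11 lowest modes

# Truncation brings the signal much closer to the uncorrupted original:
assert np.max(np.abs(smooth - clean)) < np.max(np.abs(noisy - clean))
```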
In Figure 13.5 we display a sample signal followed by the same signal corrupted by
adding in random noise. We use n = 2^9 = 512 sample points in the discrete Fourier
representation, and to remove the noise, we retain only the 2l + 1 = 11 lowest frequency modes.
In other words, instead of all n = 512 Fourier coefficients c_{−256}, . . . , c_{−1}, c_0, c_1, . . . , c_{255}, we
only compute the 11 lowest order ones c_{−5}, . . . , c_5. Summing up just those 11 exponentials
produces the denoised signal q(x) = c_{−5} e^{−5 i x} + · · · + c_5 e^{5 i x}. To compare, we plot both
the original signal and the denoised version on the same graph. In this case, the maximal
deviation is less than .15 over the entire interval [ 0, 2π ].
Figure 13.5. Denoising a Signal. (Panels: The Original Signal; The Noisy Signal; The Denoised Signal; Comparison of the Two.)
Figure 13.6. Compressing a Signal. (Panels: The Original Signal; Moderate Compression; High Compression.)
The same idea underlies many data compression algorithms for audio recordings, digi-
tal images and, particularly, video. The goal is efficient storage and/or transmission of the
signal. As before, we expect all the important features to be contained in the low frequency
constituents, and so discarding the high frequency terms will, in favorable situations, not
lead to any noticeable degradation of the signal or image. Thus, to compress a signal
(and, simultaneously, remove high frequency noise), we retain only its low frequency dis-
crete Fourier coefficients. The signal is reconstructed by summing the associated truncated
discrete Fourier series (13.23). A mathematical justification of Fourier-based compression
algorithms relies on the fact that the Fourier coefficients of smooth functions tend rapidly
to zero — the smoother the function, the faster the decay rate. Thus, the small high
frequency Fourier coefficients will be of negligible importance.
In Figure 13.6, the same signal is compressed by retaining, respectively, 2l + 1 = 21
and 2l + 1 = 7 Fourier coefficients instead of all n = 512 that would be required for
complete accuracy. For the case of moderate compression, the maximal deviation between
the signal and the compressed version is less than 1.5 × 10^{−4} over the entire interval,
while even the highly compressed version deviates at most .05 from the original signal. Of
course, the lack of any fine scale features in this particular signal means that a very high
compression can be achieved — the more complicated or detailed the original signal, the
more Fourier modes need to be retained for accurate reproduction.
The Fast Fourier Transform
While one may admire an algorithm for its intrinsic beauty, in the real world, the
bottom line is always efficiency of implementation: the less total computation, the faster
the processing, and hence the more extensive the range of applications. Orthogonality is
the first and most important feature of many practical linear algebra algorithms, and is the
critical feature of Fourier analysis. Still, even the power of orthogonality reaches its limits
when it comes to dealing with truly large scale problems such as three-dimensional medical
imaging or video processing. In the early 1960’s, James Cooley and John Tukey, [44],
discovered† a much more efficient approach to the Discrete Fourier Transform, exploiting
the rather special structure of the sampled exponential vectors. The resulting algorithm is
known as the Fast Fourier Transform, often abbreviated FFT, and its discovery launched
the modern revolution in digital signal and data processing, [30, 31].
In general, computing all the discrete Fourier coefficients (13.15) of an n times sampled
signal requires a total of n^2 complex multiplications and n^2 − n complex additions. Note
also that each complex addition

    z + w = (x + i y) + (u + i v) = (x + u) + i (y + v)        (13.24)

generally requires two real additions, while each complex multiplication

    z w = (x + i y)(u + i v) = (xu − yv) + i (xv + yu)        (13.25)

requires 4 real multiplications and 2 real additions, or, by employing the alternative formula

    xv + yu = (x + y)(u + v) − xu − yv        (13.26)

for the imaginary part, 3 real multiplications and 5 real additions. (The choice of formula
(13.25) or (13.26) will depend upon the processor’s relative speeds of multiplication and
addition.) Similarly, given the Fourier coefficients c_0, . . . , c_{n−1}, reconstruction of the sampled
signal via (13.4) requires n^2 − n complex multiplications and n^2 − n complex additions.
As a result, both computations become quite labor intensive for large n. Extending these
ideas to multi-dimensional data only exacerbates the problem.
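The three-multiplication variant (13.26) is a two-liner; `cmul3` is our own name for the helper:

```python
def cmul3(x, y, u, v):
    """Real and imaginary parts of (x + iy)(u + iv) using (13.26):
    3 real multiplications (x*u, y*v, (x+y)*(u+v)) and 5 real additions."""
    p, q = x * u, y * v
    return p - q, (x + y) * (u + v) - p - q

# (1 + 2i)(3 + 4i) = -5 + 10i:
assert cmul3(1.0, 2.0, 3.0, 4.0) == (-5.0, 10.0)
```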
In order to explain the method without undue complication, we return to the original,
aliased form of the discrete Fourier representation (13.4). (Once one understands how the
FFT works, one can easily adapt the algorithm to the low frequency version (13.20).) The
seminal observation is that if the number of sample points

    n = 2m

is even, then the primitive m-th root of unity ζ_m equals the square of the primitive
n-th root:

    ζ_m = ζ_n^2.

† In fact, the key ideas can be found in Gauss’ hand computations in the early 1800’s, but his
insight was not fully appreciated until modern computers arrived on the scene.
We use this fact to split the summation (13.15) for the order n discrete Fourier coefficients
into two parts, collecting together the even and the odd powers of ζ_n^k:

    c_k = (1/n) [ f_0 + f_1 ζ_n^{−k} + f_2 ζ_n^{−2k} + · · · + f_{n−1} ζ_n^{−(n−1)k} ]

        = (1/n) [ f_0 + f_2 ζ_n^{−2k} + f_4 ζ_n^{−4k} + · · · + f_{2m−2} ζ_n^{−(2m−2)k} ]
              + ζ_n^{−k} (1/n) [ f_1 + f_3 ζ_n^{−2k} + f_5 ζ_n^{−4k} + · · · + f_{2m−1} ζ_n^{−(2m−2)k} ]

        = (1/2) { (1/m) [ f_0 + f_2 ζ_m^{−k} + f_4 ζ_m^{−2k} + · · · + f_{2m−2} ζ_m^{−(m−1)k} ] }
              + (ζ_n^{−k}/2) { (1/m) [ f_1 + f_3 ζ_m^{−k} + f_5 ζ_m^{−2k} + · · · + f_{2m−1} ζ_m^{−(m−1)k} ] }.        (13.27)
Now, observe that the expressions in braces are the order m Fourier coefficients for the
sample data

    f^e = ( f_0, f_2, f_4, . . . , f_{2m−2} )^T = ( f(x_0), f(x_2), f(x_4), . . . , f(x_{2m−2}) )^T,
    f^o = ( f_1, f_3, f_5, . . . , f_{2m−1} )^T = ( f(x_1), f(x_3), f(x_5), . . . , f(x_{2m−1}) )^T.        (13.28)
Note that f^e is obtained by sampling f(x) on the even sample points x_{2j}, while f^o is
obtained by sampling the same function f(x), but now at the odd sample points x_{2j+1}. In
other words, we are splitting the original sampled signal into two “half-sampled” signals
obtained by sampling on every other point. The even and odd Fourier coefficients are
    c_k^e = (1/m) [ f_0 + f_2 ζ_m^{−k} + f_4 ζ_m^{−2k} + · · · + f_{2m−2} ζ_m^{−(m−1)k} ],
    c_k^o = (1/m) [ f_1 + f_3 ζ_m^{−k} + f_5 ζ_m^{−2k} + · · · + f_{2m−1} ζ_m^{−(m−1)k} ],        k = 0, . . . , m − 1.        (13.29)
Since they contain just m data values, both the even and odd samples require only m
distinct Fourier coefficients, and we adopt the identification

    c_{k+m}^e = c_k^e,   c_{k+m}^o = c_k^o,   k = 0, . . . , m − 1.        (13.30)
Therefore, the order n = 2m discrete Fourier coefficients (13.27) can be constructed from
a pair of order m discrete Fourier coefficients via

    c_k = (1/2) ( c_k^e + ζ_n^{−k} c_k^o ),   k = 0, . . . , n − 1.        (13.31)
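The recursion (13.31) translates directly into a divide-and-conquer sketch; `fft_recursive` is a hypothetical name, and the result carries the 1/n averaging of (13.15), so it equals `numpy.fft.fft` divided by n:

```python
import numpy as np

def fft_recursive(f):
    """Averaged Fourier coefficients c_k via the splitting (13.31);
    the length of f must be a power of 2."""
    f = np.asarray(f, dtype=complex)
    n = len(f)
    if n == 1:
        return f                          # order-1 transform: c_0 = f_0
    ce = fft_recursive(f[0::2])           # coefficients of the even samples f^e
    co = fft_recursive(f[1::2])           # coefficients of the odd samples f^o
    k = np.arange(n)
    m = n // 2
    # c^e_k and c^o_k are m-periodic in k, cf. (13.30):
    return 0.5 * (ce[k % m] + np.exp(-2j * np.pi * k / n) * co[k % m])

rng = np.random.default_rng(0)
f = rng.standard_normal(16)
assert np.allclose(fft_recursive(f), np.fft.fft(f) / 16)
```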
Now if m = 2l is also even, then we can play the same game on the order m Fourier
coefficients (13.29), reconstructing each of them from a pair of order l discrete Fourier
coefficients — obtained by sampling the signal at every fourth point. If n = 2^r is a power
of 2, then this game can be played all the way back to the start, beginning with the trivial
order 1 discrete Fourier representation, which just samples the function at a single point.
The result is the desired algorithm. After some rearrangement of the basic steps, we arrive
at the Fast Fourier Transform, which we now present in its final form.
We begin with a sampled signal on n = 2^r sample points. To efficiently program
the Fast Fourier Transform, it helps to write out each index 0 ≤ j < 2^r in its binary (as
opposed to decimal) representation

    j = j_{r−1} j_{r−2} . . . j_2 j_1 j_0,   where   j_ν = 0 or 1;        (13.32)

the notation is shorthand for its r digit binary expansion

    j = j_0 + 2 j_1 + 4 j_2 + 8 j_3 + · · · + 2^{r−1} j_{r−1}.
We then define the bit reversal map

    ρ( j_{r−1} j_{r−2} . . . j_2 j_1 j_0 ) = j_0 j_1 j_2 . . . j_{r−2} j_{r−1}.        (13.33)

For instance, if r = 5, and j = 13, with 5 digit binary representation 01101, then ρ(j) = 22
has the reversed binary representation 10110. Note especially that the bit reversal map
ρ = ρ_r depends upon the original choice of r = log_2 n.
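The map ρ is a few lines of integer arithmetic; `bit_reverse` is our own name:

```python
def bit_reverse(j, r):
    """Reverse the r-bit binary representation of j, implementing rho in (13.33)."""
    out = 0
    for _ in range(r):
        out = (out << 1) | (j & 1)   # peel the lowest bit off j, append it to out
        j >>= 1
    return out

# The worked example: rho(01101) = 10110, i.e. rho(13) = 22 for r = 5:
assert bit_reverse(13, 5) == 22
assert bit_reverse(22, 5) == 13      # rho is its own inverse
```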
Secondly, for each 0 ≤ k < r, define the maps

    α_k(j) = j_{r−1} . . . j_{k+1} 0 j_{k−1} . . . j_0,
    β_k(j) = j_{r−1} . . . j_{k+1} 1 j_{k−1} . . . j_0 = α_k(j) + 2^k,
        for j = j_{r−1} j_{r−2} . . . j_1 j_0.        (13.34)

In other words, α_k(j) sets the k-th binary digit of j to 0, while β_k(j) sets it to 1. In the
preceding example, α_2(13) = 9, with binary form 01001, while β_2(13) = 13, with binary
form 01101. The bit operations (13.33–34) are especially easy to implement on modern
binary computers.
Given a sampled signal f_0, . . . , f_{n−1}, its discrete Fourier coefficients c_0, . . . , c_{n−1} are
computed by the following iterative algorithm:

    c_j^{(0)} = f_{ρ(j)},        c_j^{(k+1)} = (1/2) ( c_{α_k(j)}^{(k)} + ζ_{2^{k+1}}^{−j} c_{β_k(j)}^{(k)} ),        j = 0, . . . , n − 1,   k = 0, . . . , r − 1,        (13.35)

in which ζ_{2^{k+1}} is the primitive 2^{k+1} root of unity. The final output of the iterative
procedure, namely

    c_j = c_j^{(r)},   j = 0, . . . , n − 1,        (13.36)

are the discrete Fourier coefficients of our signal. The preprocessing step of the algorithm,
where we define c_j^{(0)}, produces a more convenient rearrangement of the sample values.
The subsequent steps successively combine the Fourier coefficients of the appropriate even
and odd sampled subsignals together, reproducing (13.27) in a different notation. The
following example should help make the overall process clearer.
Example 13.3. Consider the case r = 3, and so our signal has n = 2^3 = 8 sampled
values f_0, f_1, ..., f_7. We begin the process by rearranging the sample values

    c_0^{(0)} = f_0,   c_1^{(0)} = f_4,   c_2^{(0)} = f_2,   c_3^{(0)} = f_6,
    c_4^{(0)} = f_1,   c_5^{(0)} = f_5,   c_6^{(0)} = f_3,   c_7^{(0)} = f_7,
in the order specified by the bit reversal map ρ. For instance, ρ(3) = 6, or, in binary
notation, ρ(011) = 110.
The first stage of the iteration is based on ζ_2 = −1. Equation (13.35) gives

    c_0^{(1)} = (1/2)(c_0^{(0)} + c_1^{(0)}),   c_1^{(1)} = (1/2)(c_0^{(0)} − c_1^{(0)}),
    c_2^{(1)} = (1/2)(c_2^{(0)} + c_3^{(0)}),   c_3^{(1)} = (1/2)(c_2^{(0)} − c_3^{(0)}),
    c_4^{(1)} = (1/2)(c_4^{(0)} + c_5^{(0)}),   c_5^{(1)} = (1/2)(c_4^{(0)} − c_5^{(0)}),
    c_6^{(1)} = (1/2)(c_6^{(0)} + c_7^{(0)}),   c_7^{(1)} = (1/2)(c_6^{(0)} − c_7^{(0)}),
where we combine successive pairs of the rearranged sample values. The second stage of
the iteration has k = 1 with ζ_4 = i. We find

    c_0^{(2)} = (1/2)(c_0^{(1)} + c_2^{(1)}),   c_1^{(2)} = (1/2)(c_1^{(1)} − i c_3^{(1)}),
    c_2^{(2)} = (1/2)(c_0^{(1)} − c_2^{(1)}),   c_3^{(2)} = (1/2)(c_1^{(1)} + i c_3^{(1)}),
    c_4^{(2)} = (1/2)(c_4^{(1)} + c_6^{(1)}),   c_5^{(2)} = (1/2)(c_5^{(1)} − i c_7^{(1)}),
    c_6^{(2)} = (1/2)(c_4^{(1)} − c_6^{(1)}),   c_7^{(2)} = (1/2)(c_5^{(1)} + i c_7^{(1)}).
Note that the indices of the combined pairs of coefficients differ by 2. In the last step,
where k = 2 and ζ_8 = (√2/2)(1 + i), we combine coefficients whose indices differ by
4 = 2^2; the final output

    c_0 = c_0^{(3)} = (1/2)( c_0^{(2)} + c_4^{(2)} ),
    c_4 = c_4^{(3)} = (1/2)( c_0^{(2)} − c_4^{(2)} ),
    c_1 = c_1^{(3)} = (1/2)( c_1^{(2)} + (√2/2)(1 − i) c_5^{(2)} ),
    c_5 = c_5^{(3)} = (1/2)( c_1^{(2)} − (√2/2)(1 − i) c_5^{(2)} ),
    c_2 = c_2^{(3)} = (1/2)( c_2^{(2)} − i c_6^{(2)} ),
    c_6 = c_6^{(3)} = (1/2)( c_2^{(2)} + i c_6^{(2)} ),
    c_3 = c_3^{(3)} = (1/2)( c_3^{(2)} − (√2/2)(1 + i) c_7^{(2)} ),
    c_7 = c_7^{(3)} = (1/2)( c_3^{(2)} + (√2/2)(1 + i) c_7^{(2)} ),

is the complete set of discrete Fourier coefficients.
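In code, the entire algorithm (13.35)–(13.36) takes only a few lines. The following Python sketch is our own arrangement (a production FFT would work in place rather than allocate a new list per stage); it uses the text's normalization c_k = (1/n) Σ_j f_j ζ_n^{−jk}:

```python
import cmath

def fft(f):
    """Discrete Fourier coefficients c_0, ..., c_{n-1} of n = 2**r samples,
    computed by the iteration (13.35)-(13.36)."""
    n = len(f)
    r = n.bit_length() - 1
    assert 1 << r == n, "number of samples must be a power of 2"

    def rho(j):                          # bit reversal map (13.33)
        out = 0
        for _ in range(r):
            out = (out << 1) | (j & 1)
            j >>= 1
        return out

    c = [complex(f[rho(j)]) for j in range(n)]       # c^(0)_j = f_{rho(j)}
    for k in range(r):                               # stages k = 0, ..., r-1
        zeta_inv = cmath.exp(-2j * cmath.pi / 2 ** (k + 1))   # zeta_{2^{k+1}}^{-1}
        step = 1 << k
        # j & ~step is alpha_k(j); j | step is beta_k(j)
        c = [0.5 * (c[j & ~step] + zeta_inv ** j * c[j | step]) for j in range(n)]
    return c
```

The output agrees, up to rounding error, with evaluating the defining sums c_k = (1/n) Σ_j f_j e^{−2πi jk/n} directly.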
Let us count the number of arithmetic operations required in the Fast Fourier Transform
algorithm. At each stage in the computation, we must perform n = 2^r complex
additions/subtractions and the same number of complex multiplications. (Actually, the
number of multiplications is slightly smaller, since multiplications by ±1 and ±i are
extremely simple. However, this does not significantly alter the final operations count.)
There are r = log_2 n stages, and so we require a total of r n = n log_2 n complex
additions/subtractions and the same number of multiplications. Now, when n is large, n log_2 n
is significantly smaller than n^2, which is the number of operations required for the direct
algorithm. For instance, if n = 2^{10} = 1,024, then n^2 = 1,048,576, while n log_2 n = 10,240
— a net savings of 99%. As a result, many large scale computations that would be
intractable using the direct approach are immediately brought into the realm of feasibility.
This is the reason why all modern implementations of the Discrete Fourier Transform are
based on the FFT algorithm and its variants.
The reconstruction of the signal from the discrete Fourier coefficients c_0, ..., c_{n−1} is
sped up in exactly the same manner. The only differences are that we replace ζ_n^{−1} = \overline{ζ_n}
by ζ_n, and drop the factors of 1/2, since there is no need to divide by n in the final result
(13.4). Therefore, we apply the slightly modified iterative procedure

    f_j^{(0)} = c_{ρ(j)},    f_j^{(k+1)} = f_{α_k(j)}^{(k)} + ζ_{2^{k+1}}^{j} f_{β_k(j)}^{(k)},
        j = 0, ..., n−1,   k = 0, ..., r−1,   (13.37)
and finish with

    f(x_j) = f_j = f_j^{(r)},   j = 0, ..., n−1.   (13.38)
Example 13.4. The reconstruction formulae in the case of n = 8 = 2^3 Fourier
coefficients c_0, ..., c_7, which were computed in Example 13.3, can be implemented as follows.
First, we rearrange the Fourier coefficients in bit reversed order:

    f_0^{(0)} = c_0,   f_1^{(0)} = c_4,   f_2^{(0)} = c_2,   f_3^{(0)} = c_6,
    f_4^{(0)} = c_1,   f_5^{(0)} = c_5,   f_6^{(0)} = c_3,   f_7^{(0)} = c_7.
Then we begin combining them in successive pairs:

    f_0^{(1)} = f_0^{(0)} + f_1^{(0)},   f_1^{(1)} = f_0^{(0)} − f_1^{(0)},
    f_2^{(1)} = f_2^{(0)} + f_3^{(0)},   f_3^{(1)} = f_2^{(0)} − f_3^{(0)},
    f_4^{(1)} = f_4^{(0)} + f_5^{(0)},   f_5^{(1)} = f_4^{(0)} − f_5^{(0)},
    f_6^{(1)} = f_6^{(0)} + f_7^{(0)},   f_7^{(1)} = f_6^{(0)} − f_7^{(0)}.
Next,

    f_0^{(2)} = f_0^{(1)} + f_2^{(1)},   f_1^{(2)} = f_1^{(1)} + i f_3^{(1)},
    f_2^{(2)} = f_0^{(1)} − f_2^{(1)},   f_3^{(2)} = f_1^{(1)} − i f_3^{(1)},
    f_4^{(2)} = f_4^{(1)} + f_6^{(1)},   f_5^{(2)} = f_5^{(1)} + i f_7^{(1)},
    f_6^{(2)} = f_4^{(1)} − f_6^{(1)},   f_7^{(2)} = f_5^{(1)} − i f_7^{(1)}.
Finally, the sampled signal values are

    f(x_0) = f_0^{(3)} = f_0^{(2)} + f_4^{(2)},
    f(x_4) = f_4^{(3)} = f_0^{(2)} − f_4^{(2)},
    f(x_1) = f_1^{(3)} = f_1^{(2)} + (√2/2)(1 + i) f_5^{(2)},
    f(x_5) = f_5^{(3)} = f_1^{(2)} − (√2/2)(1 + i) f_5^{(2)},
    f(x_2) = f_2^{(3)} = f_2^{(2)} + i f_6^{(2)},
    f(x_6) = f_6^{(3)} = f_2^{(2)} − i f_6^{(2)},
    f(x_3) = f_3^{(3)} = f_3^{(2)} − (√2/2)(1 − i) f_7^{(2)},
    f(x_7) = f_7^{(3)} = f_3^{(2)} + (√2/2)(1 − i) f_7^{(2)}.
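The inverse procedure (13.37)–(13.38) can be sketched in Python in the same style as before (our own arrangement): the identical butterfly, with ζ in place of ζ^{−1} and the halving factors dropped.

```python
import cmath

def ifft(c):
    """Recover the samples f_0, ..., f_{n-1} from the discrete Fourier
    coefficients via the modified iteration (13.37)-(13.38)."""
    n = len(c)
    r = n.bit_length() - 1
    assert 1 << r == n

    def rho(j):                          # bit reversal map (13.33)
        out = 0
        for _ in range(r):
            out = (out << 1) | (j & 1)
            j >>= 1
        return out

    f = [complex(c[rho(j)]) for j in range(n)]       # f^(0)_j = c_{rho(j)}
    for k in range(r):
        zeta = cmath.exp(2j * cmath.pi / 2 ** (k + 1))   # primitive 2^{k+1} root of unity
        step = 1 << k
        f = [f[j & ~step] + zeta ** j * f[j | step] for j in range(n)]
    return f
```

The result agrees with evaluating the reconstruction sums f_j = Σ_k c_k e^{2πi jk/n} directly, so composing it with the forward transform recovers the original samples.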
13.2. Wavelets.
Trigonometric Fourier series, both continuous and discrete, are amazingly powerful,
but they do suffer from one potentially serious defect. The basis functions e^{i kx} = cos kx +
i sin kx are spread out over the entire interval [−π, π], and so are not well suited to
processing localized signals — meaning data that are concentrated in a relatively small
region. Indeed, the most concentrated data of all — a single delta function — has every
Fourier component of equal magnitude in its Fourier series (12.61), and its high degree
of localization is completely obscured. Ideally, one would like to construct a system of
functions that is orthogonal, and so has all the advantages of the Fourier trigonometric
functions, but, in addition, adapts to localized structures in signals. This dream was the
inspiration for the development of the modern theory of wavelets.
The Haar Wavelets
Although the modern era of wavelets started in the mid 1980's, the simplest example
of a wavelet basis was discovered by the Hungarian mathematician Alfréd Haar in 1910,
[88]. We consider the space of functions (signals) defined on the interval [0, 1], equipped
with the standard L^2 inner product

    ⟨ f , g ⟩ = ∫_0^1 f(x) g(x) dx.   (13.39)
Figure 13.7. The First Four Haar Wavelets ϕ_1(x), ϕ_2(x), ϕ_3(x), ϕ_4(x).
This choice is merely for convenience, being slightly better suited to our construction than
[−π, π] or [0, 2π]. Moreover, the usual scaling arguments can be used to adapt the wavelet
formulas to any other interval.
The Haar wavelets are certain piecewise constant functions. The first four are graphed
in Figure 13.7. The first is the box function

    ϕ_1(x) = ϕ(x) = { 1,   0 < x ≤ 1,
                      0,   otherwise,   (13.40)

known as the scaling function, for reasons that shall appear shortly. Although we are only
interested in the value of ϕ(x) on the interval [0, 1], it will be convenient to extend it, and
all the other wavelets, to be zero outside the basic interval. Its values at the points of
discontinuity, i.e., 0, 1, are not critical, but, unlike the Fourier series midpoint value, it will
be more convenient to consistently choose the left hand limiting value. The second Haar
function

    ϕ_2(x) = w(x) = {  1,   0 < x ≤ 1/2,
                      −1,   1/2 < x ≤ 1,
                       0,   otherwise,   (13.41)
is known as the mother wavelet. The third and fourth Haar functions are compressed
versions of the mother wavelet:

    ϕ_3(x) = w(2x) = {  1,   0 < x ≤ 1/4,
                       −1,   1/4 < x ≤ 1/2,
                        0,   otherwise,

    ϕ_4(x) = w(2x − 1) = {  1,   1/2 < x ≤ 3/4,
                           −1,   3/4 < x ≤ 1,
                            0,   otherwise,

called the daughter wavelets. One can easily check, by direct evaluation of the integrals,
that the four Haar wavelet functions are orthogonal with respect to the L^2 inner product
(13.39).
The scaling transformation x ↦ 2x serves to compress the wavelet function, while
the translation 2x ↦ 2x − 1 moves the compressed version to the right by half a unit.
Furthermore, we can represent the mother wavelet by compressing and translating the
scaling function:

    w(x) = ϕ(2x) − ϕ(2x − 1).   (13.42)

It is these two operations of scaling and translation — coupled with the all-important
orthogonality — that underlie the power of wavelets.
The Haar wavelets have an evident discretization. If we decompose the interval (0, 1]
into the four subintervals

    ( 0, 1/4 ],   ( 1/4, 1/2 ],   ( 1/2, 3/4 ],   ( 3/4, 1 ],   (13.43)

on which the four wavelet functions are constant, then we can represent each of them by
a vector in R^4 whose entries are the values of each wavelet function sampled on the
corresponding subinterval. In this manner, we obtain the wavelet sample vectors

    v_1 = ( 1, 1, 1, 1 )^T,   v_2 = ( 1, 1, −1, −1 )^T,
    v_3 = ( 1, −1, 0, 0 )^T,   v_4 = ( 0, 0, 1, −1 )^T,   (13.44)
that form the orthogonal wavelet basis of R^4 we encountered in Examples 2.35 and 5.10.
Orthogonality of the vectors (13.44) with respect to the standard Euclidean dot product is
equivalent to orthogonality of the Haar wavelet functions with respect to the inner product
(13.39). Indeed, if

    f(x) ∼ f = ( f_1, f_2, f_3, f_4 )   and   g(x) ∼ g = ( g_1, g_2, g_3, g_4 )

are piecewise constant real functions that achieve the indicated values on the four
subintervals (13.43), then their L^2 inner product

    ⟨ f , g ⟩ = ∫_0^1 f(x) g(x) dx = (1/4) ( f_1 g_1 + f_2 g_2 + f_3 g_3 + f_4 g_4 ) = (1/4) f · g

is equal to the averaged dot product of their sample values — the real form of the inner
product (13.8) that was used in the discrete Fourier transform.
Since the vectors (13.44) form an orthogonal basis of R^4, we can uniquely decompose
any such piecewise constant function as a linear combination of wavelets

    f(x) = c_1 ϕ_1(x) + c_2 ϕ_2(x) + c_3 ϕ_3(x) + c_4 ϕ_4(x),

or, equivalently, in terms of the sample vectors,

    f = c_1 v_1 + c_2 v_2 + c_3 v_3 + c_4 v_4.
The required coefficients

    c_k = ⟨ f , ϕ_k ⟩ / ‖ ϕ_k ‖^2 = ( f · v_k ) / ‖ v_k ‖^2

are fixed by our usual orthogonality formula (5.7). Explicitly,

    c_1 = (1/4)( f_1 + f_2 + f_3 + f_4 ),   c_2 = (1/4)( f_1 + f_2 − f_3 − f_4 ),
    c_3 = (1/2)( f_1 − f_2 ),               c_4 = (1/2)( f_3 − f_4 ).
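For a concrete check, the four coefficient formulas and the reconstruction f = c_1 v_1 + c_2 v_2 + c_3 v_3 + c_4 v_4 can be coded directly (a small sketch of our own):

```python
def haar4(f1, f2, f3, f4):
    """Haar wavelet coefficients of the piecewise constant function taking
    values f1, ..., f4 on the four subintervals (13.43)."""
    c1 = (f1 + f2 + f3 + f4) / 4
    c2 = (f1 + f2 - f3 - f4) / 4
    c3 = (f1 - f2) / 2
    c4 = (f3 - f4) / 2
    return c1, c2, c3, c4

def haar4_inverse(c1, c2, c3, c4):
    """Rebuild the samples from f = c1 v1 + c2 v2 + c3 v3 + c4 v4, using the
    sample vectors (13.44)."""
    return (c1 + c2 + c3,
            c1 + c2 - c3,
            c1 - c2 + c4,
            c1 - c2 - c4)
```

For example, the samples (4, 2, 5, 1) have wavelet coefficients (3, 0, 1, 2), and the inverse map recovers the samples exactly.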
Before proceeding to the more general case, let us introduce an important analytical
definition that quantifies precisely how localized a function is.

Definition 13.5. The support of a function f(x), written supp f, is the closure of
the set where f(x) ≠ 0.

Thus, a point will belong to the support of f(x) provided f is not zero there, or at
least is not zero at nearby points. More precisely:

Lemma 13.6. If f(a) ≠ 0, then a ∈ supp f. More generally, a point a ∈ supp f if
and only if there exists a convergent sequence x_n → a such that f(x_n) ≠ 0. Conversely,
a ∉ supp f if and only if f(x) ≡ 0 on an interval a − δ < x < a + δ for some δ > 0.
Intuitively, the smaller the support of a function, the more localized it is. For example,
the support of the Haar mother wavelet (13.41) is supp w = [0, 1] — the point x = 0 is
included, even though w(0) = 0, because w(x) ≠ 0 at nearby points. The two daughter
wavelets have smaller support:

    supp ϕ_3 = [ 0, 1/2 ],    supp ϕ_4 = [ 1/2, 1 ],

and so are twice as localized. An extreme case is the delta function, whose support is a
single point. In contrast, the support of the Fourier trigonometric basis functions is all of
R, since they only vanish at isolated points.
The effect of scalings and translations on the support of a function is easily discerned.

Lemma 13.7. If supp f = [a, b] and g(x) = f(r x − δ), then

    supp g = [ (a + δ)/r , (b + δ)/r ].

In other words, scaling x by a factor r compresses the support of the function by a
factor 1/r, while translating x translates the support of the function.
The key requirement for a wavelet basis is that it contains functions with arbitrarily
small support. To this end, the full Haar wavelet basis is obtained from the mother wavelet
by iterating the scaling and translation processes. We begin with the scaling function

    ϕ(x),   (13.45)

from which we construct the mother wavelet via (13.42). For any "generation" j ≥ 0, we
form the wavelet offspring by first compressing the mother wavelet so that its support fits
into an interval of length 2^{−j},

    w_{j,0}(x) = w(2^j x),   so that supp w_{j,0} = [ 0, 2^{−j} ],   (13.46)

and then translating w_{j,0} so as to fill up the entire interval [0, 1] by 2^j subintervals, each
of length 2^{−j}, defining

    w_{j,k}(x) = w_{j,0}(x − 2^{−j} k) = w(2^j x − k),   where k = 0, 1, ..., 2^j − 1.   (13.47)

Lemma 13.7 implies that supp w_{j,k} = [ 2^{−j} k, 2^{−j}(k + 1) ], and so the combined supports
of all the j-th generation of wavelets is the entire interval:

    ⋃_{k=0}^{2^j − 1} supp w_{j,k} = [ 0, 1 ].

The primal generation, j = 0, just consists of the mother wavelet

    w_{0,0}(x) = w(x).

The first generation, j = 1, consists of the two daughter wavelets already introduced as
ϕ_3 and ϕ_4, namely

    w_{1,0}(x) = w(2x),    w_{1,1}(x) = w(2x − 1).
The second generation, j = 2, appends four additional granddaughter wavelets to our
basis:

    w_{2,0}(x) = w(4x),   w_{2,1}(x) = w(4x − 1),   w_{2,2}(x) = w(4x − 2),   w_{2,3}(x) = w(4x − 3).

The 8 Haar wavelets ϕ, w_{0,0}, w_{1,0}, w_{1,1}, w_{2,0}, w_{2,1}, w_{2,2}, w_{2,3} are constant on the 8
subintervals of length 1/8, taking the successive sample values indicated by the columns of
the matrix

          ( 1   1   1   0   1   0   0   0 )
          ( 1   1   1   0  −1   0   0   0 )
          ( 1   1  −1   0   0   1   0   0 )
    W_8 = ( 1   1  −1   0   0  −1   0   0 )   (13.48)
          ( 1  −1   0   1   0   0   1   0 )
          ( 1  −1   0   1   0   0  −1   0 )
          ( 1  −1   0  −1   0   0   0   1 )
          ( 1  −1   0  −1   0   0   0  −1 )

Orthogonality of the wavelets is manifested in the orthogonality of the columns of W_8.
(Unfortunately, the usual terminological constraints of Definition 5.18 prevent us from
calling W_8 an orthogonal matrix, because its columns are not orthonormal!)
The n-th stage consists of 2^{n+1} different wavelet functions, comprising the scaling
function and all the generations up to the n-th: w_0(x) = ϕ(x) and w_{j,k}(x) for 0 ≤ j ≤ n
and 0 ≤ k < 2^j. They are all constant on each subinterval of length 2^{−n−1}.
Theorem 13.8. The wavelet functions ϕ(x), w_{j,k}(x) form an orthogonal system
with respect to the inner product (13.39).

Proof: First, note that each wavelet w_{j,k}(x) is equal to +1 on an interval of length
2^{−j−1} and to −1 on an adjacent interval of the same length. Therefore,

    ⟨ w_{j,k} , ϕ ⟩ = ∫_0^1 w_{j,k}(x) dx = 0,   (13.49)
since the +1 and −1 contributions cancel each other. If two different wavelets w_{j,k} and
w_{l,m} with, say, j ≤ l, have supports which are either disjoint, or just overlap at a single
point, then their product w_{j,k}(x) w_{l,m}(x) ≡ 0, and so their inner product is clearly zero:

    ⟨ w_{j,k} , w_{l,m} ⟩ = ∫_0^1 w_{j,k}(x) w_{l,m}(x) dx = 0.

Otherwise, except in the case when the two wavelets are identical, the support of w_{l,m} is
entirely contained in an interval where w_{j,k} is constant, and so w_{j,k}(x) w_{l,m}(x) = ± w_{l,m}(x).
Therefore, by (13.49),

    ⟨ w_{j,k} , w_{l,m} ⟩ = ∫_0^1 w_{j,k}(x) w_{l,m}(x) dx = ± ∫_0^1 w_{l,m}(x) dx = 0.
Finally, we compute

    ‖ ϕ ‖^2 = ∫_0^1 dx = 1,    ‖ w_{j,k} ‖^2 = ∫_0^1 w_{j,k}(x)^2 dx = 2^{−j}.   (13.50)

The second formula follows from the fact that | w_{j,k}(x) | = 1 on an interval of length 2^{−j}
and is 0 elsewhere.   Q.E.D.
In direct analogy with the trigonometric Fourier series, the wavelet series of a signal
f(x) is given by

    f(x) ∼ c_0 ϕ(x) + Σ_{j=0}^{∞} Σ_{k=0}^{2^j − 1} c_{j,k} w_{j,k}(x).   (13.51)
Orthogonality implies that the wavelet coefficients c_0, c_{j,k} can be immediately computed
using the standard inner product formula coupled with (13.50):

    c_0 = ⟨ f , ϕ ⟩ / ‖ ϕ ‖^2 = ∫_0^1 f(x) dx,

    c_{j,k} = ⟨ f , w_{j,k} ⟩ / ‖ w_{j,k} ‖^2
            = 2^j ∫_{2^{−j} k}^{2^{−j} k + 2^{−j−1}} f(x) dx − 2^j ∫_{2^{−j} k + 2^{−j−1}}^{2^{−j}(k+1)} f(x) dx.   (13.52)

The convergence properties of the wavelet series (13.51) are similar to those of Fourier
series; details can be found in [52].
Example 13.9. In Figure 13.8, we plot the Haar expansions of the signal in the first
plot. The next plots show the partial sums over j = 0, ..., r with r = 2, 3, 4, 5, 6. We have
used a discontinuous signal to demonstrate that there is no nonuniform Gibbs phenomenon
in a Haar wavelet expansion. Indeed, since the wavelets are themselves discontinuous, they
do not have any difficulty uniformly converging to a discontinuous function. On the other
hand, it takes quite a few wavelets to begin to accurately reproduce the signal. In the last
plot, we combine a total of 2^6 = 64 Haar wavelets, which is considerably more than would
be required in a comparably accurate Fourier expansion (excluding points very close to
the discontinuity).
Figure 13.8. Haar Wavelet Expansion.
Remark: To the novice, there may appear to be many more wavelets than trigono-
metric functions. But this is just another illusion of the magic show of infinite dimensional
space. The point is that they both form a countably infinite set of functions, and so could,
if necessary, but less conveniently, be numbered in order 1, 2, 3, . . . . On the other hand,
accurate reproduction of functions usually does require many more Haar wavelets. This
handicap makes the Haar system of less use in practical situations, and served to motivate
the search for a more sophisticated choice of wavelet basis.
Just as the discrete Fourier representation arises from a sampled version of the full
Fourier series, so there is a discrete wavelet transformation for suitably sampled signals.
To take full advantage of the wavelet basis, we sample the signal f(x) at n = 2^r equally
spaced sample points x_k = k/2^r, for k = 0, ..., n − 1, on the interval [0, 1]. As before, we
can identify the sampled signal with the vector

    f = ( f(x_0), f(x_1), ..., f(x_{n−1}) )^T = ( f_0, f_1, ..., f_{n−1} )^T ∈ R^n.   (13.53)
Since we are only sampling on intervals of length 2^{−r}, the discrete wavelet transform of our
sampled signal will only use the first n = 2^r wavelets ϕ(x) and w_{j,k}(x) for j = 0, ..., r − 1
and k = 0, ..., 2^j − 1. Let w_0 ∼ ϕ(x) and w_{j,k} ∼ w_{j,k}(x) denote the corresponding
sampled wavelet vectors; all of their entries are either +1, −1, or 0. (When r = 3, so
n = 8, these are the columns of the wavelet matrix (13.48).) Moreover, orthogonality of
the wavelets immediately implies that the wavelet vectors form an orthogonal basis of R^n
with n = 2^r. We can decompose our sample vector (13.53) as a linear combination of the
sampled wavelets,

    f = ĉ_0 w_0 + Σ_{j=0}^{r−1} Σ_{k=0}^{2^j − 1} ĉ_{j,k} w_{j,k},   (13.54)
where, by our usual orthogonality formulae,

    ĉ_0 = ⟨ f , w_0 ⟩ / ‖ w_0 ‖^2 = (1/2^r) Σ_{i=0}^{2^r − 1} f_i,

    ĉ_{j,k} = ⟨ f , w_{j,k} ⟩ / ‖ w_{j,k} ‖^2
            = 2^{j−r} ( Σ_{i = 2^{r−j} k}^{2^{r−j} k + 2^{r−j−1} − 1} f_i  −  Σ_{i = 2^{r−j} k + 2^{r−j−1}}^{2^{r−j}(k+1) − 1} f_i ).   (13.55)
These are the basic formulae connecting the function f(x), or, rather, its sample vector
f, and its discrete wavelet transform consisting of the 2^r coefficients ĉ_0, ĉ_{j,k}. The
reconstructed function

    f̃(x) = ĉ_0 ϕ(x) + Σ_{j=0}^{r−1} Σ_{k=0}^{2^j − 1} ĉ_{j,k} w_{j,k}(x)   (13.56)

is constant on each subinterval of length 2^{−r}, and has the same value

    f̃(x) = f̃(x_i) = f(x_i) = f_i,    x_i = 2^{−r} i ≤ x < x_{i+1} = 2^{−r}(i + 1),

as our signal at the left hand endpoint of the interval. In other words, we are interpolating
the sample points by a piecewise constant (and thus discontinuous) function.
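The transform and its inverse translate directly into code. The following Python sketch (our own indexing convention, with the coefficients stored in a dictionary keyed by (j, k)) computes the discrete Haar wavelet coefficients of n = 2^r samples and then reassembles the samples, which it recovers exactly:

```python
def haar_dwt(f):
    """Discrete Haar wavelet coefficients of n = 2**r samples,
    computed straight from the defining inner-product sums."""
    n = len(f)
    r = n.bit_length() - 1
    c0 = sum(f) / n
    c = {}
    for j in range(r):
        half = 2 ** (r - j - 1)          # half the support of w_{j,k}, in samples
        for k in range(2 ** j):
            lo = 2 ** (r - j) * k        # first sample index under w_{j,k}
            plus = sum(f[lo:lo + half])
            minus = sum(f[lo + half:lo + 2 * half])
            c[(j, k)] = (plus - minus) * 2 ** (j - r)
    return c0, c

def haar_idwt(c0, c, r):
    """Evaluate the reconstruction at the sample points x_i = i / 2**r."""
    n = 2 ** r
    f = [c0] * n                         # contribution of the scaling function
    for (j, k), cjk in c.items():
        half = 2 ** (r - j - 1)
        lo = 2 ** (r - j) * k
        for i in range(lo, lo + half):   # +1 part of the wavelet
            f[i] += cjk
        for i in range(lo + half, lo + 2 * half):   # -1 part
            f[i] -= cjk
    return f
```

Because the sampled wavelet vectors form an orthogonal basis of R^n, the round trip reproduces the samples up to rounding error.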
Modern Wavelets
The main defect of the Haar wavelets is that they do not provide a very efficient means
of representing even very simple functions — it takes quite a large number of wavelets to
reproduce signals with any degree of precision. The reason for this is that the Haar
wavelets are piecewise constant, and so even an affine function y = αx + β requires many
sample values, and hence a relatively extensive collection of Haar wavelets, to be accurately
reproduced. In particular, compression and denoising algorithms based on Haar wavelets
are either insufficiently precise or hopelessly inefficient, and hence of minor practical value.
For a long time it was thought impossible to simultaneously achieve the
requirements of localization, orthogonality and accurate reproduction of simple functions.
The breakthrough came in 1988, when the Belgian mathematician Ingrid Daubechies
produced the first examples of wavelet bases that realized all three basic
criteria. Since then, wavelets have developed into a sophisticated and burgeoning industry
with major impact on modern technology. Significant applications include compression,
storage and recognition of fingerprints in the FBI's data base, and the JPEG2000 image
format, which, unlike earlier Fourier-based JPEG standards, incorporates wavelet
technology in its image compression and reconstruction algorithms. In this section, we will
present a brief outline of the basic ideas underlying Daubechies' remarkable construction.
Figure 13.9. The Hat Function.

The recipe for any wavelet system involves two basic ingredients — a scaling function
and a mother wavelet. The latter can be constructed from the scaling function by a
prescription similar to that in (13.42), and therefore we first concentrate on the properties
of the scaling function. The key requirement is that the scaling function must solve a
dilation equation of the form

    ϕ(x) = Σ_{k=0}^{p} c_k ϕ(2x − k) = c_0 ϕ(2x) + c_1 ϕ(2x − 1) + ⋯ + c_p ϕ(2x − p)   (13.57)
for some collection of constants c_0, ..., c_p. The dilation equation relates the function ϕ(x)
to a finite linear combination of its compressed translates. The coefficients c_0, ..., c_p are
not arbitrary, since the properties of orthogonality and localization will impose certain
rather stringent requirements.

Example 13.10. The Haar or box scaling function (13.40) satisfies the dilation
equation (13.57) with c_0 = c_1 = 1, namely

    ϕ(x) = ϕ(2x) + ϕ(2x − 1).   (13.58)

We recommend that you convince yourself of the validity of this identity before continuing.
Example 13.11. Another example of a scaling function is the hat function

    ϕ(x) = { x,       0 ≤ x ≤ 1,
             2 − x,   1 ≤ x ≤ 2,
             0,       otherwise,   (13.59)

graphed in Figure 13.9, whose variants play a starring role in the finite element method,
cf. (11.167). The hat function satisfies the dilation equation

    ϕ(x) = (1/2) ϕ(2x) + ϕ(2x − 1) + (1/2) ϕ(2x − 2),   (13.60)

which is (13.57) with c_0 = 1/2, c_1 = 1, c_2 = 1/2. Again, the reader should be able to check
this identity by hand.
The dilation equation (13.57) is a kind of functional equation, and, as such, is not
so easy to solve. Indeed, the mathematics of functional equations remains much less well
developed than that of differential equations or integral equations. Even to prove that
(nonzero) solutions exist is a nontrivial analytical problem. Since we already know two
explicit examples, let us defer the discussion of solution techniques until we understand
how the dilation equation can be used to construct a wavelet basis.
Given a solution to the dilation equation, we define the mother wavelet to be

    w(x) = Σ_{k=0}^{p} (−1)^k c_{p−k} ϕ(2x − k)
         = c_p ϕ(2x) − c_{p−1} ϕ(2x − 1) + c_{p−2} ϕ(2x − 2) − ⋯ ± c_0 ϕ(2x − p).   (13.61)

This formula directly generalizes the Haar wavelet relation (13.42), in light of its dilation
equation (13.58). The daughter wavelets are then all found, as in the Haar basis, by
iteratively compressing and translating the mother wavelet:

    w_{j,k}(x) = w(2^j x − k).   (13.62)

In the general framework, we do not necessarily restrict our attention to the interval [0, 1],
and so j and k can, in principle, be arbitrary integers.
Let us investigate what sort of conditions should be imposed on the dilation coefficients
c_0, ..., c_p in order that we obtain a viable wavelet basis by this construction. First,
localization of the wavelets requires that the scaling function has bounded support, and
so ϕ(x) ≡ 0 when x lies outside some bounded interval [a, b]. If we integrate both sides of
(13.57), we find

    ∫_a^b ϕ(x) dx = ∫_{−∞}^{∞} ϕ(x) dx = Σ_{k=0}^{p} c_k ∫_{−∞}^{∞} ϕ(2x − k) dx.   (13.63)

Now using the change of variables y = 2x − k, with dx = (1/2) dy, we find

    ∫_{−∞}^{∞} ϕ(2x − k) dx = (1/2) ∫_{−∞}^{∞} ϕ(y) dy = (1/2) ∫_a^b ϕ(x) dx,   (13.64)

where we revert to x as our (dummy) integration variable. We substitute this result back
into (13.63). Assuming that ∫_a^b ϕ(x) dx ≠ 0, we discover that the dilation coefficients must
satisfy

    c_0 + ⋯ + c_p = 2.   (13.65)
Example 13.12. Once we impose the constraint (13.65), the very simplest version
of the dilation equation is

    ϕ(x) = 2 ϕ(2x),   (13.66)

where c_0 = 2 is the only (nonzero) coefficient. Up to constant multiple, the only "solutions"
of the functional equation (13.66) with bounded support are scalar multiples of the delta
function δ(x), which follows from the identity in Exercise 11.2.5. Other solutions, such as
ϕ(x) = 1/x, are not localized, and thus not useful for constructing a wavelet basis.
The second condition we require is orthogonality of the wavelets. For simplicity, we
only consider the standard L^2 inner product†

    ⟨ f , g ⟩ = ∫_{−∞}^{∞} f(x) g(x) dx.

It turns out that the orthogonality of the complete wavelet system is guaranteed once we
know that the scaling function ϕ(x) is orthogonal to all its integer translates:

    ⟨ ϕ(x) , ϕ(x − m) ⟩ = ∫_{−∞}^{∞} ϕ(x) ϕ(x − m) dx = 0   for all m ≠ 0.   (13.67)
We first note that the formula

    ⟨ ϕ(2x − k) , ϕ(2x − l) ⟩ = ∫_{−∞}^{∞} ϕ(2x − k) ϕ(2x − l) dx   (13.68)
        = (1/2) ∫_{−∞}^{∞} ϕ(x) ϕ(x + k − l) dx = (1/2) ⟨ ϕ(x) , ϕ(x + k − l) ⟩

follows from the same change of variables y = 2x − k used in (13.64). Therefore, since ϕ
satisfies the dilation equation (13.57),

    ⟨ ϕ(x) , ϕ(x − m) ⟩ = ⟨ Σ_{j=0}^{p} c_j ϕ(2x − j) , Σ_{k=0}^{p} c_k ϕ(2x − 2m − k) ⟩   (13.69)
        = Σ_{j,k=0}^{p} c_j c_k ⟨ ϕ(2x − j) , ϕ(2x − 2m − k) ⟩
        = (1/2) Σ_{j,k=0}^{p} c_j c_k ⟨ ϕ(x) , ϕ(x + j − 2m − k) ⟩.
If we require orthogonality (13.67) of all the integer translates of ϕ, then the left hand side
of this identity will be 0 unless m = 0, while only the summands with j = 2m + k will be
nonzero on the right. Therefore, orthogonality requires that

    Σ_{0 ≤ k ≤ p−2m} c_{2m+k} c_k = { 2,   m = 0,
                                      0,   m ≠ 0.   (13.70)

The algebraic equations (13.65, 70) for the dilation coefficients are the key requirements
for the construction of an orthogonal wavelet basis.
For example, if we have just two nonzero coefficients c_0, c_1, then (13.65, 70) reduce to

    c_0 + c_1 = 2,    c_0^2 + c_1^2 = 2,

and so c_0 = c_1 = 1 is the only solution, resulting in the Haar dilation equation (13.58). If
we have three coefficients c_0, c_1, c_2, then (13.65), (13.70) require

    c_0 + c_1 + c_2 = 2,    c_0^2 + c_1^2 + c_2^2 = 2,    c_0 c_2 = 0.
† In all instances, the functions have bounded support, and so the inner product integral can
be reduced to an integral over a finite interval where both f and g are nonzero.
Thus either c_2 = 0 and c_0 = c_1 = 1, and we are back to the Haar case, or c_0 = 0 and
c_1 = c_2 = 1, and the resulting dilation equation is a simple reformulation of the Haar case;
see Exercise . In particular, the hat function (13.59) does not give rise to orthogonal
wavelets.
The remarkable fact, discovered by Daubechies, is that there is a nontrivial solution
for four (and, indeed, any even number) of nonzero coefficients c_0, c_1, c_2, c_3. The basic
equations (13.65), (13.70) require

    c_0 + c_1 + c_2 + c_3 = 2,    c_0^2 + c_1^2 + c_2^2 + c_3^2 = 2,    c_0 c_2 + c_1 c_3 = 0.   (13.71)
The particular values

    c_0 = (1 + √3)/4,   c_1 = (3 + √3)/4,   c_2 = (3 − √3)/4,   c_3 = (1 − √3)/4   (13.72)

solve (13.71). These coefficients correspond to the Daubechies dilation equation

    ϕ(x) = (1 + √3)/4 ϕ(2x) + (3 + √3)/4 ϕ(2x − 1)
           + (3 − √3)/4 ϕ(2x − 2) + (1 − √3)/4 ϕ(2x − 3).   (13.73)
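It is straightforward to verify numerically that the values (13.72) satisfy all three equations in (13.71):

```python
from math import sqrt

s = sqrt(3.0)
c = [(1 + s) / 4, (3 + s) / 4, (3 - s) / 4, (1 - s) / 4]   # (13.72)

assert abs(sum(c) - 2) < 1e-12                       # c0 + c1 + c2 + c3 = 2
assert abs(sum(ck * ck for ck in c) - 2) < 1e-12     # sum of squares = 2
assert abs(c[0] * c[2] + c[1] * c[3]) < 1e-12        # c0*c2 + c1*c3 = 0
```

(Verifying the identities symbolically, by expanding the squares and products of 1 ± √3 and 3 ± √3, is an equally quick exercise.)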
Any nonzero solution of bounded support to this remarkable functional equation will give
rise to a scaling function ϕ(x), a mother wavelet

    w(x) = (1 − √3)/4 ϕ(2x) − (3 − √3)/4 ϕ(2x − 1)
           + (3 + √3)/4 ϕ(2x − 2) − (1 + √3)/4 ϕ(2x − 3),   (13.74)

and then, by compression and translation (13.62), the complete system of orthogonal
wavelets w_{j,k}(x).
Before explaining how to solve the Daubechies dilation equation, let us complete the
proof of orthogonality. It is easy to see that, by translation invariance, since ϕ(x) and
ϕ(x − m) are orthogonal for any m ≠ 0, so are ϕ(x − k) and ϕ(x − l) for any k ≠ l. Next
we prove orthogonality of ϕ(x − m) and w(x). Rewriting the mother wavelet (13.61) as
w(x) = Σ_{j=0}^{p} (−1)^{j+1} c_j ϕ(2x − p + j), which is valid because p is odd, we compute

    ⟨ w(x) , ϕ(x − m) ⟩ = ⟨ Σ_{j=0}^{p} (−1)^{j+1} c_j ϕ(2x − p + j) , Σ_{k=0}^{p} c_k ϕ(2x − 2m − k) ⟩
        = Σ_{j,k=0}^{p} (−1)^{j+1} c_j c_k ⟨ ϕ(2x − p + j) , ϕ(2x − 2m − k) ⟩
        = (1/2) Σ_{j,k=0}^{p} (−1)^{j+1} c_j c_k ⟨ ϕ(x) , ϕ(x + p − j − 2m − k) ⟩,

using (13.68). By orthogonality (13.67) of the translates of ϕ, the only summands that are
nonzero are those with j = p − 2m − k; the resulting coefficient of ‖ ϕ(x) ‖^2 is

    Σ_k (−1)^k c_{p−2m−k} c_k = 0,

where the sum is over all 0 ≤ k ≤ p such that 0 ≤ p − 2m − k ≤ p. Since p is odd, each
term in the sum appears twice, with opposite signs, and hence the result is always zero —
no matter what the coefficients c_0, ..., c_p are! The proof of orthogonality of the translates
w(x − m) of the mother wavelet, along with all her wavelet descendants w(2^j x − k), relies
on a similar argument, and the details are left as an exercise for the reader.
Figure 13.10. Approximating the Daubechies Wavelet.
Solving the Dilation Equation
Let us next discuss how to solve the dilation equation (13.57). The solution we are
after does not have an elementary formula, and we require a slightly sophisticated approach
to recover it. The key observation is that (13.57) has the form of a fixed point equation

    ϕ = F[ϕ],

not in ordinary Euclidean space, but in an infinite-dimensional function space. With luck,
the fixed point (or, more correctly, fixed function) will be stable, and so, starting with a
suitable initial guess ϕ_0(x), the successive iterates

    ϕ_{n+1} = F[ϕ_n]

will converge to the desired solution: ϕ_n(x) → ϕ(x). In detail, the iterative version of
the dilation equation (13.57) reads

    ϕ_{n+1}(x) = Σ_{k=0}^{p} c_k ϕ_n(2x − k),   n = 0, 1, 2, ....   (13.75)
Before attempting to prove convergence of this iterative procedure to the Daubechies
scaling function, let us experimentally investigate what happens. A reasonable choice for
the initial guess might be the Haar scaling or box function

    ϕ_0(x) = { 1,   0 < x ≤ 1,
               0,   otherwise.

In Figure 13.10 we graph the next 5 iterates ϕ_1(x), ..., ϕ_5(x). They clearly appear to
be converging to some function ϕ(x), although the final result does look a little bizarre.
Bolstered by this preliminary experimental evidence, we can now try to prove convergence
of the iterative scheme. This turns out to be true; a fully rigorous proof relies on the
Fourier transform, [52], but is a little too advanced for this text and will be omitted.
Theorem 13.13. The functions ϕ_n(x) defined by the iterative functional
equation (13.75) converge uniformly to a continuous function ϕ(x), called the Daubechies
scaling function.
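The iteration (13.75) is easy to run numerically. Each iterate can be evaluated exactly at the dyadic grid points x = i/2^N, since doubling x keeps it on the grid; the sketch below (our own discretization, not from the text) iterates the Daubechies rule on the interval [0, 3], which contains the support of every iterate:

```python
from math import sqrt

s = sqrt(3.0)
c = [(1 + s) / 4, (3 + s) / 4, (3 - s) / 4, (1 - s) / 4]   # Daubechies coefficients (13.72)

def cascade(n_iter=30, N=6):
    """Iterate the dilation rule (13.75), tracking the values of the iterates
    at the dyadic grid points x = i / 2**N on [0, 3]."""
    m = 3 * 2 ** N + 1
    # initial guess: the box function, equal to 1 on (0, 1]
    phi = [1.0 if 0 < i <= 2 ** N else 0.0 for i in range(m)]
    for _ in range(n_iter):
        new = [0.0] * m
        for i in range(m):
            for k, ck in enumerate(c):
                idx = 2 * i - k * 2 ** N      # grid index of the point 2x - k
                if 0 <= idx < m:              # phi_n vanishes outside [0, 3]
                    new[i] += ck * phi[idx]
        phi = new
    return phi
```

At the integer points x = 1 and x = 2, the iterates settle down rapidly, consistent with the eigenvalue analysis carried out below.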
Once we have established convergence, we are now able to verify that the scaling
function and consequential system of wavelets form an orthogonal system of functions.
Proposition 13.14. All integer translates ϕ(x − k), for k ∈ Z, of the Daubechies
scaling function, and all wavelets w_{j,k}(x) = w(2^j x − k), j ≥ 0, are mutually orthogonal
functions with respect to the L² inner product. Moreover, ‖ϕ‖² = 1, while ‖w_{j,k}‖² = 2^{−j}.
Proof : As noted earlier, the orthogonality of the entire wavelet system will follow once
we know the orthogonality (13.67) of the scaling function and its integer translates. We
use induction to prove that this holds for all the iterates ϕ_n(x), and so, in view of uniform
convergence, the limiting scaling function also satisfies this property. Details are relegated
to Exercise . Q.E.D.
In practical computations, the limiting procedure for constructing the scaling function
is not so convenient, and an alternative means of computing its values is employed. The
starting point is to determine its values at integer points. First, the initial box function
has values ϕ_0(m) = 0 for all integers m ∈ Z except ϕ_0(1) = 1. The iterative functional
equation (13.75) will then produce the values of the iterates ϕ_n(m) at integer points m ∈ Z.
A simple induction will convince you that ϕ_n(m) = 0 except for m = 1 and m = 2, and,
therefore, by (13.75),

    ϕ_{n+1}(1) = (3+√3)/4 ϕ_n(1) + (1+√3)/4 ϕ_n(2),
    ϕ_{n+1}(2) = (1−√3)/4 ϕ_n(1) + (3−√3)/4 ϕ_n(2),
since all other terms are 0. This has the form of a linear iterative system

    v^{(n+1)} = A v^{(n)}    (13.76)

with coefficient matrix

    A = [ (3+√3)/4   (1+√3)/4 ]     and where     v^{(n)} = [ ϕ_n(1) ]
        [ (1−√3)/4   (3−√3)/4 ],                            [ ϕ_n(2) ].
Referring back to Chapter 10, the solution to such an iterative system is specified by
the eigenvalues and eigenvectors of the coefficient matrix, which are

    λ_1 = 1,   v_1 = [ (1+√3)/4 ]         λ_2 = 1/2,   v_2 = [ −1 ]
                     [ (1−√3)/4 ],                           [  1 ].
We write the initial condition as a linear combination of the eigenvectors

    v^{(0)} = [ ϕ_0(1) ]  =  [ 1 ]  =  2 v_1 − (1−√3)/2 v_2.
              [ ϕ_0(2) ]     [ 0 ]

The solution is

    v^{(n)} = A^n v^{(0)} = 2 A^n v_1 − (1−√3)/2 A^n v_2 = 2 v_1 − (1/2^n) (1−√3)/2 v_2.
Figure 13.11. The Daubechies Scaling Function and Mother Wavelet.
The limiting vector

    [ ϕ(1) ]  =  lim_{n→∞} v^{(n)}  =  2 v_1  =  [ (1+√3)/2 ]
    [ ϕ(2) ]                                     [ (1−√3)/2 ]

gives the desired values of the scaling function:

    ϕ(1) = (1+√3)/2 = 1.366025 . . . ,    ϕ(2) = (1−√3)/2 = −.366025 . . . ,
    ϕ(m) = 0  for all  m ≠ 1, 2.    (13.77)
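The eigenvector calculation can be double-checked by simply running the linear iteration (13.76); a quick numerical sketch (our own illustration, not part of the text):

```python
import numpy as np

s = np.sqrt(3.0)
A = np.array([[3 + s, 1 + s],
              [1 - s, 3 - s]]) / 4.0     # coefficient matrix of (13.76)

v = np.array([1.0, 0.0])                 # v^(0) = (phi_0(1), phi_0(2))
for _ in range(60):                      # iterate v^(n+1) = A v^(n)
    v = A @ v

# the lambda_2 = 1/2 component has died out, leaving 2 v_1
print(v)   # approximately [ 1.3660254, -0.3660254]
```

After a few dozen iterations the λ_2 = 1/2 mode is negligible, and `v` agrees with ((1+√3)/2, (1−√3)/2) to machine precision.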
With this in hand, the Daubechies dilation equation (13.73) then prescribes the func-
tion values ϕ(m/2) at all half integers, because when x = m/2 then 2x − k = m − k is an
integer. Once we know its values at the half integers, we can re-use equation (13.73) to give
its values at quarter integers m/4. Continuing onwards, we determine the values of ϕ(x) at
all dyadic points, meaning rational numbers of the form x = m/2^j for m, j ∈ Z. Continuity
will then prescribe its value at any other x ∈ R, since x can be written as the limit of dyadic
numbers x_n — namely those obtained by truncating its binary (base 2) expansion at the
n-th digit beyond the decimal (or, rather, “binary”) point. But, in practice, this latter step
is unnecessary, since all computers are ultimately based on the binary number system, and
so only dyadic numbers actually reside in a computer’s memory. Thus, there is no real
need to determine the value of ϕ at non-dyadic points.
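The subdivision pass from integers to half integers, quarter integers, and so on can be sketched as follows; the dictionary representation and the helper `refine` are our own choices (dyadic rationals are exact in binary floating point, so float keys are safe here):

```python
import numpy as np

s = np.sqrt(3.0)
c = [(1 + s) / 4, (3 + s) / 4, (3 - s) / 4, (1 - s) / 4]   # Daubechies coefficients

# start from the integer values (13.77); phi vanishes outside [0, 3]
vals = {0.0: 0.0, 1.0: (1 + s) / 2, 2.0: (1 - s) / 2, 3.0: 0.0}

def refine(vals):
    """One halving step: phi(x) = sum_k c_k phi(2x - k); for x on the new
    half-spacing grid, 2x - k lies on the already-known coarser grid."""
    new = dict(vals)
    xs = sorted(vals)
    for x in [(u + v) / 2 for u, v in zip(xs[:-1], xs[1:])]:
        new[x] = sum(ck * vals.get(2 * x - k, 0.0) for k, ck in enumerate(c))
    return new

for _ in range(3):        # after 3 passes: values at all multiples of 1/8 in [0, 3]
    vals = refine(vals)
```

For instance, the first pass gives ϕ(1/2) = c_0 ϕ(1) = (2+√3)/4 ≈ 0.933, since ϕ(0) = ϕ(−1) = ϕ(−2) = 0.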
The preceding scheme was used to produce the graphs of the Daubechies scaling
function in Figure 13.11. It is a continuous, but non-differentiable, function — and its graph
has a very jagged, fractal-like appearance when viewed at close range. The Daubechies
scaling function is, in fact, a close relative of the famous example of a continuous, nowhere
differentiable function originally due to Weierstrass, [127, 154], whose construction also
relies on a similar scaling argument.
With the values of the Daubechies scaling function on a sufficiently dense set of dyadic
points in hand, the consequential values of the mother wavelet are given by formula (13.74).
Note that supp ϕ = supp w = [0, 3]. The daughter wavelets are then found by the usual
compression and translation procedure (13.62).
Figure 13.12. Daubechies Wavelet Expansion.
The Daubechies wavelet expansion of a function whose support is contained in† [0, 1]
is then given by

    f(x) ∼ c_0 ϕ(x) + Σ_{j=0}^{∞} Σ_{k=−2}^{2^j−1} c_{j,k} w_{j,k}(x).    (13.78)

The inner summation begins at k = −2 so as to include all the wavelet offspring w_{j,k} whose
support has a nontrivial intersection with the interval [0, 1]. The wavelet coefficients c_0, c_{j,k}
are computed by the usual orthogonality formula

    c_0 = ⟨f, ϕ⟩ = ∫_0^3 f(x) ϕ(x) dx,

    c_{j,k} = ⟨f, w_{j,k}⟩/‖w_{j,k}‖² = 2^j ∫_{2^{−j}k}^{2^{−j}(k+3)} f(x) w_{j,k}(x) dx = ∫_0^3 f(2^{−j}(x + k)) w(x) dx,    (13.79)
where we agree that f(x) = 0 whenever x < 0 or x > 1. In practice, one employs a
numerical integration procedure, e.g., the trapezoid rule, [9], based on dyadic nodes to
speedily evaluate the integrals (13.79). A proof of completeness of the resulting wavelet
basis functions can be found in [52]. Compression and denoising algorithms based on
retaining only low frequency modes proceed as before, and are left as exercises for the
reader to implement.
Example 13.15. In Figure 13.12, we plot the Daubechies wavelet expansions of
the same signal as in Example 13.9. The first plot is the original signal, and the following
show the partial sums of (13.78) over j = 0, . . . , r with r = 2, 3, 4, 5, 6. Unlike the Haar
expansion, the Daubechies wavelets do exhibit the nonuniform Gibbs phenomenon at the
interior discontinuity as well as the endpoints, since the function is set to 0 outside the
interval [ 0, 1]. Indeed, the Daubechies wavelets are continuous, and so cannot converge
uniformly to a discontinuous function.
† For functions with larger support, one should include additional terms in the expansion
corresponding to further translates of the wavelets so as to cover the entire support of the function.
Alternatively, one can translate and rescale x to fit the function’s support inside [0, 1].
13.3. The Fourier Transform.
Fourier series and their ilk are designed to solve boundary value problems on bounded
intervals. The extension of Fourier methods to unbounded intervals — the entire real line
— leads naturally to the Fourier transform, which is a powerful mathematical tool for the
analysis of non-periodic functions. The Fourier transform is of fundamental importance in
a broad range of applications, including both ordinary and partial differential equations,
quantum mechanics, signal processing, control theory, and probability, to name but a few.
We begin by motivating the Fourier transform as a limiting case of Fourier series.
Although the rigorous details are rather exacting, the underlying idea is not so difficult. Let
f(x) be a reasonably nice function defined for all −∞< x < ∞. The goal is to construct a
Fourier expansion for f(x) in terms of basic trigonometric functions. One evident approach
is to construct its Fourier series on progressively larger and larger intervals, and then take
the limit as the intervals’ length becomes infinite. The limiting process converts the Fourier
sums into integrals, and the resulting representation of a function is renamed the Fourier
transform. Since we are dealing with an infinite interval, there are no longer any periodicity
requirements on the function f(x). Moreover, the frequencies represented in the Fourier
transform are no longer constrained by the length of the interval, and so we are effectively
decomposing a quite general, non-periodic function into a continuous superposition of
trigonometric functions of all possible frequencies.
Let us present the details of this construction in a more concrete form. The computa-
tions will be significantly simpler if we work with the complex version of the Fourier series
from the outset. Our starting point is the rescaled Fourier series (12.85) on a symmetric
interval [−ℓ, ℓ] of length 2ℓ, which we rewrite in the adapted form

    f(x) ∼ Σ_{ν=−∞}^{∞} √(π/2) (f̂_ℓ(k_ν)/ℓ) e^{i k_ν x}.    (13.80)

The sum is over the discrete collection of frequencies

    k_ν = πν/ℓ,    ν = 0, ±1, ±2, . . . ,    (13.81)
corresponding to those trigonometric functions that have period 2ℓ. For reasons that will
soon become apparent, the Fourier coefficients of f are now denoted as

    c_ν = ⟨f, e^{i k_ν x}⟩ = (1/2ℓ) ∫_{−ℓ}^{ℓ} f(x) e^{−i k_ν x} dx = √(π/2) f̂_ℓ(k_ν)/ℓ,

so that

    f̂_ℓ(k_ν) = (1/√(2π)) ∫_{−ℓ}^{ℓ} f(x) e^{−i k_ν x} dx.    (13.82)

This reformulation of the basic Fourier series formula allows us to smoothly pass to the
limit when the interval’s length ℓ → ∞.
This reformulation of the basic Fourier series formula allows us to smoothly pass to the
limit when the intervals’ length ℓ →∞.
On an interval of length 2ℓ, the frequencies (13.81) required to represent a function
in Fourier series form are equally distributed, with interfrequency spacing

    ∆k = k_{ν+1} − k_ν = π/ℓ.
As ℓ → ∞, the spacing ∆k → 0, and so the relevant frequencies become more and more
densely packed in the space of all possible frequencies: −∞ < k < ∞. In the limit, we
anticipate that all possible frequencies will be represented. Indeed, letting k_ν = k be
arbitrary in (13.82), and sending ℓ → ∞, results in the infinite integral

    f̂(k) = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−i kx} dx    (13.83)

known as the Fourier transform of the function f(x). If f(x) is a reasonably nice function,
e.g., piecewise continuous and decaying to 0 reasonably quickly as |x| → ∞, its Fourier
transform f̂(k) is defined for all possible frequencies −∞ < k < ∞. This formula will
sometimes conveniently be abbreviated as

    f̂(k) = F[f(x)],    (13.84)

where F is the Fourier transform operator. The extra √(2π) factor in front of the integral,
which is sometimes omitted, is for later convenience.
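Definition (13.83) is straightforward to approximate numerically by truncating the integral and applying the trapezoid rule. The helper below, and the choice of the Gaussian test case e^{−x²} ↦ e^{−k²/4}/√2 (an entry of the transform table later in this section), are our own illustrative choices:

```python
import numpy as np

def fourier_transform(f, k, L=20.0, n=4001):
    """Approximate (1/sqrt(2 pi)) * integral_{-L}^{L} f(x) e^{-ikx} dx
    by the trapezoid rule."""
    x = np.linspace(-L, L, n)
    g = f(x) * np.exp(-1j * k * x)
    w = np.full(n, x[1] - x[0]); w[0] /= 2; w[-1] /= 2   # trapezoid weights
    return (g * w).sum() / np.sqrt(2 * np.pi)

k = 1.5
approx = fourier_transform(lambda x: np.exp(-x**2), k)
exact = np.exp(-k**2 / 4) / np.sqrt(2)   # table entry for e^{-x^2} (a = 1)
```

For a smooth, rapidly decaying integrand like the Gaussian, the trapezoid rule is extremely accurate, and `approx` matches `exact` essentially to machine precision.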
To reconstruct the function from its Fourier transform, we employ a similar limiting
procedure on the Fourier series (13.80), which we first rewrite in a more suggestive form:

    f(x) ∼ (1/√(2π)) Σ_{ν=−∞}^{∞} f̂_ℓ(k_ν) e^{i k_ν x} ∆k.    (13.85)
For each fixed value of x, the right hand side has the form of a Riemann sum, [9, 155],
over the entire frequency space −∞ < k < ∞, for the function g_ℓ(k) = f̂_ℓ(k) e^{i kx}. Under
reasonable hypotheses, as ℓ → ∞, the functions f̂_ℓ(k) → f̂(k) converge to the Fourier
transform; moreover, the interfrequency spacing ∆k → 0, and so one expects the Riemann
sums to converge to the integral

    f(x) ∼ (1/√(2π)) ∫_{−∞}^{∞} f̂(k) e^{i kx} dk,    (13.86)

known as the inverse Fourier transform, that serves to recover the original signal from its
Fourier transform. In abbreviated form,

    f(x) = F^{−1}[f̂(k)]    (13.87)
is the inverse of the Fourier transform operator (13.84). In this manner, the Fourier series
(13.85) becomes a Fourier integral that reconstructs the function f(x) as a (continuous)
superposition of complex exponentials e^{i kx} of all possible frequencies. The contribution
of each such exponential is the value of the Fourier transform f̂(k) at the exponential’s
frequency.
It is also worth pointing out that both the Fourier transform (13.84) and its inverse
(13.87) define linear maps on function space. This means that the Fourier transform of
the sum of two functions is the sum of their individual transforms, while multiplying a
function by a constant multiplies its Fourier transform by the same factor:

    F[f(x) + g(x)] = F[f(x)] + F[g(x)] = f̂(k) + ĝ(k),
    F[c f(x)] = c F[f(x)] = c f̂(k).    (13.88)
Figure 13.13. Fourier Transform of a Rectangular Pulse.
A similar statement holds for the inverse Fourier transform F^{−1}.
Recapitulating, by letting the length of the interval go to ∞, the discrete Fourier
series has become a continuous Fourier integral, while the Fourier coefficients, which were
defined only at a discrete collection of possible frequencies, have become an entire function
f̂(k) defined on all of frequency space k ∈ R. The reconstruction of f(x) from its Fourier
transform f̂(k) via (13.86) can be rigorously justified under suitable hypotheses. For
example, if f(x) is piecewise C¹ on all of R and f(x) → 0 reasonably rapidly as
|x| → ∞, ensuring that its Fourier integral (13.83) converges, then it can be proved that
the inverse Fourier integral (13.86) will converge to f(x) at all points of continuity, and
to the midpoint ½ (f(x⁻) + f(x⁺)) at jump discontinuities — just like a Fourier series. In
particular, its Fourier transform f̂(k) → 0 must also decay as |k| → ∞, implying that
(as with Fourier series) the very high frequency modes make negligible contributions to
the reconstruction of the signal. A more precise, general result will be formulated in
Theorem 13.28 below.
Example 13.16. The Fourier transform of a rectangular pulse or box function

    f(x) = σ(x + a) − σ(x − a) = { 1,  −a < x < a,
                                 { 0,  |x| > a,    (13.89)

of width 2a is easily computed:

    f̂(k) = (1/√(2π)) ∫_{−a}^{a} e^{−i kx} dx = (e^{i ka} − e^{−i ka})/(√(2π) i k) = √(2/π) sin ak / k.    (13.90)
On the other hand, the reconstruction of the pulse via the inverse transform (13.86) tells
us that

    (1/π) ∫_{−∞}^{∞} e^{i kx} (sin ak / k) dk = f(x) = { 1,    −a < x < a,
                                                       { 1/2,  x = ±a,
                                                       { 0,    |x| > a.    (13.91)
Note the convergence to the middle of the jump discontinuities at x = ±a. Splitting this
complex integral into its real and imaginary parts, we deduce a pair of striking trigono-
metric integral identities

    (1/π) ∫_{−∞}^{∞} (cos kx sin ak / k) dk = { 1,    −a < x < a,
                                              { 1/2,  x = ±a,
                                              { 0,    |x| > a,

    (1/π) ∫_{−∞}^{∞} (sin kx sin ak / k) dk = 0.    (13.92)
Just as many Fourier series yield nontrivial summation formulae, the reconstruction
of a function from its Fourier transform often leads to nontrivial integral identities. One
cannot compute the integral (13.91) by the Fundamental Theorem of Calculus, since there
is no elementary function† whose derivative equals the integrand. Moreover, it is not even
clear that the integral converges; indeed, the amplitude of the oscillatory integrand decays
like 1/|k|, but the latter function does not have a convergent integral, and so the usual
comparison test for infinite integrals, [9, 46, 167], fails to apply. Thus, the convergence
of the integral is marginal at best: the trigonometric oscillations somehow overcome the
slow rate of decay of 1/k and thereby induce the (conditional) convergence of the integral!
In Figure 13.13 we display the box function with a = 1, its Fourier transform, along with
a reconstruction obtained by numerically integrating (13.92). Since we are dealing with
an infinite integral, we must break off the numerical integrator by restricting it to a finite
interval. The first graph is obtained by integrating from −5 ≤ k ≤ 5 while the second is
from −10 ≤ k ≤ 10. The non-uniform convergence of the integral leads to the appearance
of a Gibbs phenomenon at the two discontinuities, similar to that of a Fourier series.
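The truncated reconstruction is easy to reproduce; here is a sketch (the cutoff K, the grid size, and the use of `np.sinc` to handle the removable singularity at k = 0 are our own choices):

```python
import numpy as np

a = 1.0
def reconstruct(x, K=200.0, n=400001):
    """(1/pi) * integral_{-K}^{K} cos(kx) sin(ak)/k dk, via the trapezoid rule."""
    k = np.linspace(-K, K, n)
    # a * sinc(a k / pi) = sin(ak)/k, finite at k = 0 (numpy sinc is sin(pi x)/(pi x))
    g = np.cos(k * x) * a * np.sinc(a * k / np.pi)
    w = np.full(n, k[1] - k[0]); w[0] /= 2; w[-1] /= 2
    return (g * w).sum() / np.pi
```

As K grows, the values approach 1 inside the pulse, 0 outside, and 1/2 at the jumps x = ±a, while plotting `reconstruct` over a grid of x values shows the persistent Gibbs oscillations near the discontinuities.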
Example 13.17. Consider an exponentially decaying right-handed pulse‡

    f_r(x) = { e^{−ax},  x > 0,
             { 0,        x < 0,    (13.93)

where a > 0. We compute its Fourier transform directly from the definition:

    f̂(k) = (1/√(2π)) ∫_0^{∞} e^{−ax} e^{−i kx} dx = −(1/√(2π)) [ e^{−(a+i k)x}/(a + i k) ]_{x=0}^{∞} = 1/(√(2π) (a + i k)).
† One can use Euler’s formula (3.84) to reduce the integrand to one of the form e^{αk}/k, but it
can be proved, [int], that there is no formula for ∫ (e^{αk}/k) dk in terms of elementary functions.
‡ Note that we can’t Fourier transform the entire exponential function e^{−ax} because it does
not go to zero at both ±∞, which is required for the integral (13.83) to converge.
Figure 13.14. Exponential Pulses: the right pulse f_r(x), the left pulse f_l(x), the even pulse f_e(x), and the odd pulse f_o(x).
As in the preceding example, the inverse Fourier transform for this function produces a
nontrivial complex integral identity:

    (1/2π) ∫_{−∞}^{∞} e^{i kx}/(a + i k) dk = { e^{−ax},  x > 0,
                                              { 1/2,      x = 0,
                                              { 0,        x < 0.    (13.94)
Similarly, a pulse that decays to the left,

    f_l(x) = { e^{ax},  x < 0,
             { 0,       x > 0,    (13.95)

where a > 0 is still positive, has Fourier transform

    f̂(k) = 1/(√(2π) (a − i k)).    (13.96)
This also follows from the general fact that the Fourier transform of f(−x) is f̂(−k); see
Exercise . The even exponentially decaying pulse

    f_e(x) = e^{−a|x|}    (13.97)

is merely the sum of left and right pulses: f_e = f_r + f_l. Thus, by linearity,

    f̂_e(k) = 1/(√(2π) (a + i k)) + 1/(√(2π) (a − i k)) = √(2/π) a/(k² + a²).    (13.98)

The result is real and even because f_e(x) is a real even function; see Exercise . The
inverse Fourier transform (13.86) produces another nontrivial integral identity:
inverse Fourier transform (13.86) produces another nontrivial integral identity:
e
−a| x |
=
1
π


−∞
a e
i kx
k
2
+ a
2
dk =
a
π


−∞
cos kx
k
2
+ a
2
dk. (13.99)
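Identity (13.99) can likewise be verified by numerical quadrature; a sketch, with illustrative truncation parameters of our own choosing (the integrand decays like 1/k², so a modest cutoff already gives three digits):

```python
import numpy as np

def even_pulse(x, a=1.0, K=500.0, n=500001):
    """(a/pi) * integral_{-K}^{K} cos(kx)/(k^2 + a^2) dk by the trapezoid rule."""
    k = np.linspace(-K, K, n)
    g = np.cos(k * x) / (k**2 + a**2)
    w = np.full(n, k[1] - k[0]); w[0] /= 2; w[-1] /= 2
    return a / np.pi * (g * w).sum()
```

For a = 1, the computed values track e^{−|x|} to within the O(1/K) truncation error.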
(The imaginary part of the integral vanishes because the integrand is odd.) On the other
hand, the odd exponentially decaying pulse,

    f_o(x) = (sign x) e^{−a|x|} = { e^{−ax},   x > 0,
                                  { −e^{ax},   x < 0,    (13.100)

is the difference of the right and left pulses, f_o = f_r − f_l, and has purely imaginary and
odd Fourier transform

    f̂_o(k) = 1/(√(2π) (a + i k)) − 1/(√(2π) (a − i k)) = −i √(2/π) k/(k² + a²).    (13.101)
The inverse transform is

    (sign x) e^{−a|x|} = −(i/π) ∫_{−∞}^{∞} k e^{i kx}/(k² + a²) dk = (1/π) ∫_{−∞}^{∞} k sin kx/(k² + a²) dk.    (13.102)
As a final example, consider the rational function

    f(x) = 1/(x² + a²),    where a > 0.    (13.103)

Its Fourier transform requires integrating

    f̂(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−i kx}/(x² + a²) dx.    (13.104)
The indefinite integral (anti-derivative) does not appear in basic integration tables, and, in
fact, cannot be done in terms of elementary functions. However, we have just managed to
evaluate this particular integral! Look at (13.99). If we change x to k and k to −x, then
we exactly recover the integral (13.104) up to a factor of a √(2/π). Therefore, in accordance
with Theorem 13.18, we conclude that the Fourier transform of (13.103) is

    f̂(k) = √(π/2) e^{−a|k|}/a.    (13.105)
This last example is indicative of an important general fact. The reader has no doubt
already noted the remarkable similarity between the Fourier transform (13.83) and its
inverse (13.86). Indeed, the only difference is that the former has a minus sign in the
exponential. This implies the following Symmetry Principle relating the direct and inverse
Fourier transforms.
Theorem 13.18. If the Fourier transform of the function f(x) is f̂(k), then the
Fourier transform of f̂(x) is f(−k).
Symmetry allows us to reduce the tabulation of Fourier transforms by half. For
instance, referring back to Example 13.16, we deduce that the Fourier transform of the
function

    f(x) = √(2/π) sin ax / x

is

    f̂(k) = σ(−k + a) − σ(−k − a) = σ(k + a) − σ(k − a) = { 1,    −a < k < a,
                                                          { 1/2,  k = ±a,
                                                          { 0,    |k| > a.    (13.106)

Note that, by linearity, we can divide both f(x) and f̂(k) by √(2/π) to deduce the Fourier
transform of (sin ax)/x.
Warning: Some authors leave out the √(2π) factor in the definition (13.83) of the
Fourier transform f̂(k). This alternative convention does have a slight advantage of elim-
inating many √(2π) factors in the Fourier transforms of elementary functions. However,
this requires an extra such factor in the reconstruction formula (13.86), which is achieved
by replacing √(2π) by 2π. A significant disadvantage is that the resulting formulae for
the Fourier transform and its inverse are not as close, and so the Symmetry Principle of
Theorem 13.18 requires some modification. (On the other hand, convolution — to be dis-
cussed below — is a little easier without the extra factor.) When consulting any particular
reference book, the reader always needs to check which convention is being used. See also
Exercise .
All of the functions in Example 13.17 required a > 0 for the Fourier integrals to con-
verge. The functions that emerge in the limit as a goes to 0 are of fundamental importance.
Let us start with the odd exponential pulse (13.100). When a → 0, the function f_o(x)
converges to the sign function

    f(x) = sign x = σ(x) − σ(−x) = { +1,  x > 0,
                                   { −1,  x < 0.    (13.107)

Taking the limit of the Fourier transform (13.101) leads to

    f̂(k) = −i √(2/π) (1/k).    (13.108)

The nonintegrable singularity of f̂(k) at k = 0 is indicative of the fact that the sign func-
tion does not decay as |x| → ∞. In this case, neither the Fourier transform integral nor its
inverse are well-defined as standard (Riemann, or even Lebesgue, [155]) integrals. Never-
theless, it is possible to rigorously justify these results within the framework of generalized
functions.
More interesting are the even pulse functions f_e(x), which, in the limit a → 0, become
the constant function

    f(x) ≡ 1.    (13.109)

The limit of the Fourier transform (13.98) is

    lim_{a→0} √(2/π) a/(k² + a²) = { 0,  k ≠ 0,
                                   { ∞,  k = 0.    (13.110)
This limiting behavior should remind the reader of our construction (11.31) of the delta
function as the limit of the functions

    δ(x) = lim_{n→∞} n/(π (1 + n² x²)) = lim_{a→0} a/(π (a² + x²)).

Comparing with (13.110), we conclude that the Fourier transform of the constant function
(13.109) is a multiple of the delta function in the frequency variable:

    f̂(k) = √(2π) δ(k).    (13.111)
The direct transform integral

    δ(k) = (1/2π) ∫_{−∞}^{∞} e^{−i kx} dx

is, strictly speaking, not defined because the infinite integral of the oscillatory sine and
cosine functions doesn’t converge! However, this identity can be validly interpreted within
the framework of weak convergence and generalized functions. On the other hand, the
inverse transform formula (13.86) yields

    ∫_{−∞}^{∞} δ(k) e^{i kx} dk = e^{i 0 x} = 1,

which is in accord with the basic definition (11.36) of the delta function. As in the previous
case, the delta function singularity at k = 0 manifests the lack of decay of the constant
function.
Conversely, the delta function δ(x) has constant Fourier transform

    δ̂(k) = (1/√(2π)) ∫_{−∞}^{∞} δ(x) e^{−i kx} dx = e^{−i k·0}/√(2π) = 1/√(2π),    (13.112)

a result that also follows from the Symmetry Principle of Theorem 13.18. To determine
the Fourier transform of a delta spike δ_y(x) = δ(x − y) concentrated at position x = y, we
compute

    δ̂_y(k) = (1/√(2π)) ∫_{−∞}^{∞} δ(x − y) e^{−i kx} dx = e^{−i ky}/√(2π).    (13.113)
The result is a pure exponential in frequency space. Applying the inverse Fourier transform
(13.86) leads, formally, to the remarkable identity

    δ_y(x) = δ(x − y) = (1/2π) ∫_{−∞}^{∞} e^{−i k(x−y)} dk = (1/2π) ⟨ e^{i ky} , e^{i kx} ⟩,    (13.114)
where ⟨· , ·⟩ denotes the usual L² inner product on R. Since the delta function vanishes
for x ≠ y, this identity is telling us that complex exponentials of differing frequencies
are mutually orthogonal. However, this statement must be taken with a grain of salt,
since the integral does not converge in the normal (Riemann or Lebesgue) sense. But
it is possible to make sense of this identity within the language of generalized functions.
Indeed, multiplying both sides by f(x), and then integrating with respect to x, we find

    f(y) = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x) e^{−i k(x−y)} dx dk.    (13.115)

This is a perfectly valid formula, being a restatement (or, rather, combination) of the basic
formulae (13.83, 86) connecting the direct and inverse Fourier transforms of the function
f(x).
Conversely, the Symmetry Principle tells us that the Fourier transform of a pure
exponential e^{i lx} will be a shifted delta spike √(2π) δ(k − l), concentrated in frequency
space. Both results are particular cases of the general Shift Theorem, whose proof is left
as an exercise for the reader.
Theorem 13.19. If f(x) has Fourier transform f̂(k), then the Fourier transform of
the shifted function f(x − y) is e^{−i ky} f̂(k). Similarly, the transform of the product function
e^{i lx} f(x), for l real, is the shifted transform f̂(k − l).
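The Shift Theorem is easy to test numerically on a Gaussian — a convenient smooth, rapidly decaying choice; the helper and parameter values below are our own illustration:

```python
import numpy as np

def ft(fvals, x, k):
    """Trapezoid-rule Fourier transform of samples fvals on the grid x, at frequency k."""
    w = np.full(len(x), x[1] - x[0]); w[0] /= 2; w[-1] /= 2
    return (fvals * np.exp(-1j * k * x) * w).sum() / np.sqrt(2 * np.pi)

x = np.linspace(-20.0, 20.0, 4001)
k, y = 1.3, 0.7
lhs = ft(np.exp(-(x - y)**2), x, k)                    # transform of the shifted Gaussian
rhs = np.exp(-1j * k * y) * ft(np.exp(-x**2), x, k)    # e^{-iky} times the original transform
```

The two complex numbers agree to machine precision, as Theorem 13.19 predicts.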
Since the Fourier transform uniquely associates a function f̂(k) on frequency space
with each (reasonable) function f(x) on physical space, one can characterize functions by
their transforms. Practical applications rely on tables (or, even better, computer algebra
systems) that recognize a wide variety of transforms of basic functions of importance
in applications. The accompanying table lists some of the most important examples of
functions and their Fourier transforms. Note that, according to the Symmetry Principle of
Theorem 13.18, each tabular entry can be used to deduce two different Fourier transforms.
A more extensive collection of Fourier transforms can be found in [140].
Derivatives and Integrals
One of the most remarkable and important properties of the Fourier transform is that
it converts calculus into algebra! More specifically, the two basic operations in calculus —
differentiation and integration of functions — are realized as algebraic operations on their
Fourier transforms. (The downside is that algebraic operations become more complicated
in the transform domain.)
Let us begin with derivatives. If we differentiate† the basic inverse Fourier transform
formula

    f(x) ∼ (1/√(2π)) ∫_{−∞}^{∞} f̂(k) e^{i kx} dk

with respect to x, we obtain

    f′(x) ∼ (1/√(2π)) ∫_{−∞}^{∞} i k f̂(k) e^{i kx} dk.    (13.116)

The resulting integral is itself in the form of an inverse Fourier transform, namely of
i k f̂(k), which immediately implies the following basic result.
† We are assuming the integrand is sufficiently nice so that we can bring the derivative under
the integral sign; see [64, 195] for a fully rigorous derivation.
Short Table of Fourier Transforms

    f(x)                        f̂(k)

    1                           √(2π) δ(k)
    δ(x)                        1/√(2π)
    σ(x)                        √(π/2) δ(k) − i/(√(2π) k)
    sign x                      −i √(2/π) (1/k)
    σ(x + a) − σ(x − a)         √(2/π) sin ak / k
    e^{−ax} σ(x)                1/(√(2π) (a + i k))
    e^{ax} (1 − σ(x))           1/(√(2π) (a − i k))
    e^{−a|x|}                   √(2/π) a/(k² + a²)
    e^{−ax²}                    e^{−k²/4a}/√(2a)
    tan^{−1} x                  −i √(π/2) e^{−|k|}/k + (π^{3/2}/√2) δ(k)
    f(cx + d)                   (e^{i kd/c}/|c|) f̂(k/c)
    f(−x)                       f̂(−k)
    f̂(x)                       f(−k)
    f′(x)                       i k f̂(k)
    x f(x)                      i f̂′(k)
    f ∗ g(x)                    √(2π) f̂(k) ĝ(k)

    Note: The parameter a > 0 is always real and positive,
    while c ≠ 0 is any nonzero complex number.
Proposition 13.20. The Fourier transform of the derivative f′(x) of a function is
obtained by multiplication of its Fourier transform by i k:

    F[f′(x)] = i k f̂(k).    (13.117)

Similarly, the Fourier transform of x f(x) is obtained by differentiating the Fourier trans-
form of f(x):

    F[x f(x)] = i df̂/dk.    (13.118)

The second statement follows from the first by use of the Symmetry Principle of
Theorem 13.18.
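The differentiation rule (13.117) can be confirmed numerically, again with a Gaussian, whose derivative is −2x e^{−x²}; the quadrature helper and parameters are our own illustrative choices:

```python
import numpy as np

def ft(fvals, x, k):
    """Trapezoid-rule Fourier transform of samples fvals on the grid x, at frequency k."""
    w = np.full(len(x), x[1] - x[0]); w[0] /= 2; w[-1] /= 2
    return (fvals * np.exp(-1j * k * x) * w).sum() / np.sqrt(2 * np.pi)

x = np.linspace(-20.0, 20.0, 4001)
k = 2.0
f = np.exp(-x**2)
fprime = -2 * x * np.exp(-x**2)    # derivative of the Gaussian
lhs = ft(fprime, x, k)             # transform of f'
rhs = 1j * k * ft(f, x, k)         # i k times the transform of f
```

The two sides agree to machine precision, matching F[f′(x)] = i k f̂(k).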
Example 13.21. The derivative of the even exponential pulse f_e(x) = e^{−a|x|} is a
multiple of the odd exponential pulse f_o(x) = (sign x) e^{−a|x|}:

    f_e′(x) = −a (sign x) e^{−a|x|} = −a f_o(x).

Proposition 13.20 says that their Fourier transforms are related by

    f̂_o(k) = −(i k/a) f̂_e(k) = −i √(2/π) k/(k² + a²),

as previously noted in (13.98, 101). On the other hand, since the odd exponential pulse
has a jump discontinuity of magnitude 2 at x = 0, its derivative contains a delta function,
and is equal to

    f_o′(x) = −a e^{−a|x|} + 2 δ(x) = −a f_e(x) + 2 δ(x).

This is reflected in the relation between their Fourier transforms. If we multiply (13.101)
by i k we obtain

    i k f̂_o(k) = √(2/π) k²/(k² + a²) = √(2/π) − √(2/π) a²/(k² + a²) = 2 δ̂(k) − a f̂_e(k).
The Fourier transform, just like Fourier series, is completely compatible with the calculus
of generalized functions.
Higher order derivatives are handled by iterating formula (13.117), and so:
Corollary 13.22. The Fourier transform of f^{(n)}(x) is (i k)^n f̂(k).
This result has an important consequence: the smoothness of f(x) is manifested
in the rate of decay of its Fourier transform f̂(k). We already noted that the Fourier
transform of a (nice) function must decay to zero at large frequencies: f̂(k) → 0 as
|k| → ∞. If the n-th derivative f^{(n)}(x) is also a reasonable function, then its Fourier
transform, namely (i k)^n f̂(k), must go to zero as |k| → ∞. This requires that f̂(k) go to
zero more rapidly than |k|^{−n}. Thus, the smoother f(x), the more rapid the decay of its
Fourier transform. As a general rule of thumb, local features of f(x), such as smoothness,
are manifested by global features of f̂(k), such as decay for large |k|. The Symmetry
Principle implies that the reverse is also true: global features of f(x) correspond to local
features of f̂(k). This local-global duality is one of the major themes of Fourier theory.
Integration of functions is the inverse operation to differentiation, and so should corre-
spond to division by i k in frequency space. As with Fourier series, this is not completely
correct; there is an extra constant involved, and this contributes an extra delta function
in frequency space.
Proposition 13.23. If f(x) has Fourier transform f̂(k), then the Fourier transform
of its integral g(x) = ∫_{−∞}^{x} f(y) dy is

    ĝ(k) = −(i/k) f̂(k) + π f̂(0) δ(k).    (13.119)
Proof: First notice that

    lim_{x→−∞} g(x) = 0,    lim_{x→+∞} g(x) = ∫_{−∞}^{∞} f(x) dx = √(2π) f̂(0).

Therefore, by subtracting a suitable multiple of the step function from the integral, the
resulting function

    h(x) = g(x) − √(2π) f̂(0) σ(x)

decays to 0 at both ±∞. Consulting our table of Fourier transforms, we find

    ĥ(k) = ĝ(k) − π f̂(0) δ(k) + (i/k) f̂(0).    (13.120)

On the other hand,

    h′(x) = f(x) − √(2π) f̂(0) δ(x).

Since h(x) → 0 as |x| → ∞, we can apply our differentiation rule (13.117), and conclude
that

    i k ĥ(k) = f̂(k) − f̂(0).    (13.121)

Combining (13.120) and (13.121) establishes the desired formula (13.119). Q.E.D.
Example 13.24. The Fourier transform of the inverse tangent function

    f(x) = tan^{−1} x = ∫_0^x dy/(1 + y²) = ∫_{−∞}^x dy/(1 + y²) − π/2

can be computed by combining Proposition 13.23 with (13.105):

    f̂(k) = −i √(π/2) e^{−|k|}/k + (π^{3/2}/√2) δ(k).
Applications to Differential Equations
The fact that the Fourier transform changes differentiation in the physical domain
into multiplication in the frequency domain is one of its most compelling features. A
particularly important consequence is that the Fourier transform effectively converts dif-
ferential equations into algebraic equations, and thereby opens the door to their solution
by elementary algebra! One begins by applying the Fourier transform to both sides of
the differential equation under consideration. Solving the resulting algebraic equation will
produce a formula for the Fourier transform of the desired solution, which can then be
immediately reconstructed via the inverse Fourier transform.
The Fourier transform is particularly well adapted to boundary value problems on the
entire real line. In place of the boundary conditions used on finite intervals, we look for
solutions that decay to zero sufficiently rapidly as [ x[ →∞ — in order that their Fourier
transform be well-defined (in the context of ordinary functions). In quantum mechanics,
[129], these solutions are known as the bound states of the system, and correspond to
subatomic particles that are trapped or localized in a region of space by some sort of
force field. For example, the bound electrons in an atom are localized by the electrostatic
attraction of the nucleus.
As a specific example, consider the boundary value problem

    −d²u/dx² + ω² u = h(x),    −∞ < x < ∞,    (13.122)

where ω > 0 is a positive constant. In lieu of boundary conditions, we require that the
solution u(x) → 0 as |x| → ∞. We will solve this problem by applying the Fourier
transform to both sides of the differential equation. Taking Corollary 13.22 into account,
the result is a linear algebraic equation

    k² û(k) + ω² û(k) = ĥ(k)

relating the Fourier transforms of u and h. Unlike the differential equation, the transformed
equation can be immediately solved for

    û(k) = ĥ(k)/(k² + ω²).    (13.123)
Therefore, we can reconstruct the solution by applying the inverse Fourier transform for-
mula (13.86):

    u(x) = (1/√(2π)) ∫_{−∞}^{∞} ĥ(k) e^{i kx}/(k² + ω²) dk.    (13.124)
For example, if the forcing function is an even exponential pulse,

    h(x) = e^{−|x|}    with    ĥ(k) = √(2/π) 1/(k² + 1),

then (13.124) writes the solution as a Fourier integral:

    u(x) = (1/π) ∫_{−∞}^{∞} e^{i kx}/((k² + ω²)(k² + 1)) dk = (1/π) ∫_{−∞}^{∞} cos kx/((k² + ω²)(k² + 1)) dk,
2/25/07 734 c (2006 Peter J. Olver
noting that the imaginary part of the complex integral vanishes because the integrand is
an odd function. (Indeed, if the forcing function is real, the solution must also be real.)
The Fourier integral can be explicitly evaluated by using partial fractions to rewrite
´ u(k) =

2
π
1
(k
2
+ ω
2
)(k
2
+ 1)
=

2
π
1
ω
2
−1

1
k
2
+ 1

1
k
2
+ ω
2

, ω
2
= 1,
Thus, according to our Fourier Transform Table, the solution to this boundary value
problem is
u(x) =
e
−| x |

1
ω
e
−ω | x |
ω
2
−1
when ω
2
= 1. (13.125)
The reader may wish to verify that this function is indeed a solution, meaning that it is
twice continuously differentiable (which is not so immediately apparent from the formula),
decays to 0 as [ x[ →∞, and satisfies the differential equation everywhere. The “resonant”
case ω
2
= 1 is left as an exercise.
Remark: The method of partial fractions that you learned in first year calculus is
often an effective tool for evaluating (inverse) Fourier transforms of rational functions.
A particularly important case is when the forcing function h(x) = δ
y
(x) = δ(x − y)
represents a unit impulse concentrated at x = y. The resulting square-integrable solution
is the Green’s function G(x, y) for the boundary value problem. According to (13.123), its
Fourier transform with respect to x is
´
G(k, y) =
1


e
−i ky
k
2
+ ω
2
.
which is the product of an exponential factor e
−i ky
, representing the Fourier transform
of δ
y
(x), times a multiple of the Fourier transform of the even exponential pulse e
−ω | x |
.
We apply Theorem 13.19, and conclude that the Green’s function for this boundary value
problem is an exponential pulse centered at y, namely
G(x, y) =
1
2 ω
e
−ω | x−y |
.
Observe that, as with other self-adjoint boundary value problems, the Green’s function is
symmetric under interchange of x and y. As a function of x, it satisfies the homogeneous
differential equation −u
′′
+ ω
2
u = 0, except at the point x = y when its derivative has
a jump discontinuity of unit magnitude. It also decays as [ x[ → ∞, as required by
the boundary conditions. The Green’s function superposition principle tells us that the
solution to the inhomogeneous boundary value problem (13.122) under a general forcing
can be represented in the integral form
u(x) =


−∞
G(x, y) h(y) dy =
1


−∞
e
−ω| x−y |
h(y) dy. (13.126)
The reader may enjoy recovering the particular exponential solution (13.125) from this
integral formula.
2/25/07 735 c (2006 Peter J. Olver
Convolution
The final Green’s function formula (13.126) is indicative of a general property of
Fourier transforms. The right hand side has the form of a convolution product between
functions.
Definition 13.25. The convolution of scalar functions f(x) and g(x) is the scalar
function h = f ∗ g defined by the formula
h(x) = f(x) ∗ g(x) =


−∞
f(x −y) g(y) dy. (13.127)
We record the basic properties of the convolution product, leaving their verification
as exercises for the reader. All of these assume that the implied convolution integrals
converge.
(a) Symmetry: f ∗ g = g ∗ f,
(b) Bilinearity:

f ∗ (ag + bh) = a(f ∗ g) + b(f ∗ h),
(af + bg) ∗ h = a(f ∗ h) + b(g ∗ h),
a, b ∈ C,
(c) Associativity: f ∗ (g ∗ h) = (f ∗ g) ∗ h,
(d) Zero function: f ∗ 0 = 0,
(e) Delta function: f ∗ δ = f.
One tricky feature is that the constant function 1 is not a unit for the convolution
product; indeed,
f ∗ 1 =


−∞
f(y) dy
is a constant function — the total integral of f — not the original function f(x). In fact,
according to the last property, the delta function plays the role of the “convolution unit”:
f(x) ∗ δ(x) =


−∞
f(x −y) δ(y) dy = f(x),
which follows from the basic property (11.36) of the delta function.
In particular, our solution (13.126) has the form of a convolution product between an
even exponential pulse g(x) =
1

e
−ω| x |
and the forcing function:
u(x) = g(x) ∗ h(x).
On the other hand, its Fourier transform (13.123) is the ordinary multiplicative product
´ u(k) = ´ g(k)
´
h(k)
of the Fourier transforms of g and h. In fact, this is a general property of the Fourier trans-
form: convolution in the physical domain corresponds to multiplication in the frequency
domain, and conversely.
2/25/07 736 c (2006 Peter J. Olver
Theorem 13.26. The Fourier transform of the convolution u(x) = f(x) ∗ g(x) of
two functions is a multiple of the product of their Fourier transforms:
´ u(k) =


´
f(k) ´ g(k).
Vice versa, the Fourier transform of their product h(x) = f(x) g(x) is, up to multiple, the
convolution of their Fourier transforms:
´
h(k) =
1


´
f(k) ∗ ´ g(k).
Proof : Combining the definition of the Fourier transform with the convolution for-
mula (13.127), we find
´ u(k) =
1


−∞
h(x) e
−i kx
dx =
1


−∞


−∞
f(x −y) g(y) e
−i kx
dxdy.
where we are assuming that the integrands are sufficiently nice to allow us to interchange
the order of integration, [9]. Applying the change of variables z = x − y in the inner
integral produces
´ u(k) =
1


−∞


−∞
f(z) g(y) e
−i k(y+z)
dy dz
=

1


−∞
f(z) e
−i kz
dz

1


−∞
g(y) e
−i ky
dy

=


´
f(k) ´ g(k).
The second statement follows directly from the Symmetry Principle of Theorem 13.18.
Q.E.D.
Example 13.27. We already know, (13.106), that the Fourier transform of
f(x) =
sin x
x
is the box function
´
f(k) =

π
2

σ(k + 1) −σ(k −1)

=

π
2
, −1 < k < 1,
0, [ k [ > 1,
We also know that the Fourier transform of
g(x) =
1
x
is ´ g(k) = −i

π
2
sign k.
Therefore, the Fourier transform of their product
h(x) = f(x) g(x) =
sin x
x
2
2/25/07 737 c (2006 Peter J. Olver
-3 -2 -1 1 2 3
-1
-0.5
0.5
1
Figure 13.15. The Fourier transform of
sin x
x
2
.
can be obtained by convolution:
´
h(k) =
1


´
f ∗ ´ g(k) =
1


−∞
´
f(l) ´ g(k −l) dl
= −i

π
8

1
−1
sign(k −l) dl =

i

π
2
k < −1,
−i

π
2
k, −1 < k < 1,
−i

π
2
k > 1.
A graph of
´
h(k) appears in Figure 13.15.
The Fourier Transform on Hilbert Space
While we do not have the space to embark on a fully rigorous treatment of the theory
underlying the Fourier transform, it is worth outlining a few of the most basic features.
We have already noted that the Fourier transform, when defined, is a linear map, taking
functions f(x) on physical space to functions
´
f(k) on frequency space. A critical question is
precisely which function space should the theory be applied to. Not every function admits
a Fourier transform in the classical sense

— the Fourier integral (13.83) is required to
converge, and this places restrictions on the function and its asymptotics at large distances.
It turns out the proper setting for the rigorous theory is the Hilbert space of complex-
valued square-integrable functions — the same infinite-dimensional vector space that lies
at the heart of modern quantum mechanics. In Section 12.5, we already introduced the
Hilbert space L2[ −π, π] on a finite interval; here we adapt Definition 12.33 to the entire
real line. Thus, the Hilbert space L
2
= L
2
(R) is the infinite-dimensional vector space

We leave aside the more advanced issues involving generalized functions in this subsection.
2/25/07 738 c (2006 Peter J. Olver
consisting of all complex-valued functions f(x) which are defined for all x ∈ R and have
finite L
2
norm:
| f |
2
=


−∞
[ f(x) [
2
dx < ∞. (13.128)
For example, any piecewise continuous function that satisfies the decay criterion
[ f(x) [ ≤
M
[ x[
1/2+δ
, for all sufficiently large [ x[ ≫0, (13.129)
for some M > 0 and δ > 0, belongs to L
2
. However, as in Section 12.5, Hilbert space
contains many more functions, and the precise definitions and identification of its elements
is quite subtle. The inner product on the Hilbert space L
2
is prescribed in the usual
manner,
' f , g ` =


−∞
f(x) g(x) dx.
The Cauchy–Schwarz inequality
[ ' f , g ` [ ≤ | f | | g |
ensures that the inner product integral is finite whenever f, g ∈ L
2
.
Let us state the fundamental theorem governing the effect of the Fourier transform on
functions in Hilbert space. It can be regarded as a direct analog of the pointwise conver-
gence Theorem 12.7 for Fourier series. A fully rigorous proof can be found in [175, 195].
Theorem 13.28. If f(x) ∈ L
2
is square-integrable, then its Fourier transform
´
f(k) ∈
L
2
is a well-defined, square-integrable function of the frequency variable k. If f(x) is
continuously differentiable at a point x, then its inverse Fourier transform (13.86) equals
its value f(x). More generally, if the right and left hand limits f(x

), f

(x

), and f(x
+
),
f

(x
+
) exist, then the inverse Fourier transform integral converges to the average value
1
2

f(x

) + f(x
+
)

.
Thus, the Fourier transform
´
f = T[ f ] defines a linear transformation from L
2
func-
tions of x to L
2
functions of k. In fact, the Fourier transform preserves inner products.
This important result is known as Parseval’s formula, whose Fourier series counterpart
appeared in (12.117).
Theorem 13.29. If
´
f(k) = T[ f(x)] and ´ g(k) = T[ g(x)], then ' f , g ` = '
´
f , ´ g `, i.e.,


−∞
f(x) g(x) dx =


−∞
´
f(k) ´ g(k) dk. (13.130)
Proof : Let us sketch a formal proof that serves to motivate why this result is valid.
We use the definition (13.83) of the Fourier transform to evaluate


−∞
´
f(k) ´ g(k) dk =


−∞

1


−∞
f(x) e
−i kx
dx

1


−∞
g(y) e
+i ky
dy

dk
=


−∞


−∞
f(x) g(y)

1


−∞
e
−i k(x−y)
dk

dxdy.
2/25/07 739 c (2006 Peter J. Olver
Now according to (13.114), the inner k integral can be replaced by a delta function δ(x−y),
and hence


−∞
´
f(k) ´ g(k) dk =


−∞


−∞
f(x) g(y) δ(x −y) dxdy =


−∞
f(x) g(x) dx.
This completes our “proof”; see [175, 195] for a rigorous version. Q.E.D.
In particular, orthogonal functions, ' f , g ` = 0, will have orthogonal Fourier trans-
forms, '
´
f , ´ g ` = 0. Choosing f = g in (13.130) results in the Plancherel formula | f |
2
=
|
´
f |
2
, or, explicitly,


−∞
[ f(x[
2
dx =


−∞
[
´
f(k) [
2
dk. (13.131)
We conclude that the Fourier transform map T: L
2
→ L
2
defines a unitary or norm-
preserving linear transformation on Hilbert space, mapping L
2
functions of the physical
variable x to L
2
functions of the frequency variable k.
Quantum Mechanics and the Uncertainty Principle
In quantum mechanics, the wave functions or probability densities of a quantum sys-
tem are characterized as unit elements, | ϕ| = 1 of the underlying state space, which, in a
one-dimensional model of a single particle, is the Hilbert space L
2
= L
2
(R). All measurable
physical quantities are represented as linear operators A: L
2
→ L
2
acting on (a suitable
dense subspace of) the state space. These are obtained by the tahter myusterious process
of “quantizing” their classical counterparts, which are ordinary functions. In particular,
the particle’s position, usually denoted by Q, is represented by the operation of multiplica-
tion by x, so Q[ ϕ] = xϕ(x), whereas its momentum, denoted by P for historical reasons,
is represented by the differentiation operator P = d/dx, so that P[ ϕ] = ϕ

(x).
In its popularized form, Heisenberg’s Uncertainty Principle is a by now familiar philo-
sophical concept. The principle, first formulated by the twentieth century German physi-
cist Werner Heisenberg, one of the founders of modern quantum mechanics, states that,
in a physical system, certain quantities cannot be simultaneously measured with complete
accuracy. For instance, the more precisely one measures the position of a particle, the
less accuracy there will be in the measurement of its momentum; vice versa, the greater
the accuracy in the momentum, the less certainty in its position. A similar uncertainty
couples energy and time. Experimental verification of the uncertainty principle can be
found even in fairly simple situations. Consider a light beam passing through a small hole.
The position of the photons is constrained by the hole; the effect of their momenta is in
the pattern of light diffused on a screen placed beyond the hole. The smaller the hole, the
more constrained the position, and the wider the image on the screen, meaning the less
certainty there is in the momentum.
This is not the place to discuss the philosophical and experimental consequences of
Heisenberg’s principle. What we will show is that the Uncertainty Principle is, in fact, a
rigorous theorem concerning the Fourier transform! In quantum theory, each of the paired
quantities, e.g., position and momentum, are interrelated by the Fourier transform. Indeed,
2/25/07 740 c (2006 Peter J. Olver
Proposition 13.20 says that the Fourier transform of the differentiation operator represent-
ing monmentum is a multiplication operator representing position on the frequency space
and vice versa. This Fourier transform-based duality between position and momentum, or
multiplication and differentiation, lies at the heart of the Uncertainty Principle.
In general, suppose A is a linear operator on Hilbert space representing a physical
quantity. If the quantum system is in a state represented by a particular wave function
ϕ, then the localization of the quantity A is measured by the norm | A[ ϕ] |. The smaller
this norm, the more accurate the measurement. Thus, | Q[ ϕ] | = | xϕ(x) | measures the
localization of the position of the particle represented by ϕ; the smaller | Q[ ϕ] |, the more
concentrated the probability of finding the particle near

x = 0 and hence the smaller the
error in the measurement of its position. Similarly, by Plancherel’s formula (13.131),
| P[ ϕ] | = | ϕ

(x) | = | i k ´ ϕ(k) | = | k ´ ϕ(k) |,
measures the localization in the momentum of the particle, which is small if and only
if its Fourier transform is concentrated near k = 0, and hence the smaller the error in
its measured momentum. With this interpretation, the Uncertainty Principle states that
these two quantities cannot simultaneously be arbitrarily small.
Theorem 13.30. If ϕ(x) is a wave function, so | ϕ| = 1, then
| Q[ ϕ] | | P[ ϕ] | ≥
1
2
. (13.132)
Proof : The proof rests on the Cauchy–Schwarz inequality

' xϕ(x) , ϕ

(x) `

≤ | xϕ(x) | | ϕ

(x) | = | Q[ ϕ] | | P[ ϕ] |. (13.133)
On the other hand, writing out the inner product term
' xϕ(x) , ϕ

(x) ` =


−∞
xϕ(x) ϕ

(x) dx.
Let us integrate by parts, using the fact that
ϕ(x) ϕ

(x) =
d
dx

1
2
ϕ(x)
2

.
Since ϕ(x) →0 as [ x[ →∞, the boundary terms vanish, and hence
' xϕ(x) , ϕ

(x) ` =


−∞
xϕ(x) ϕ

(x) dx = −


−∞
1
2
ϕ(x)
2
dx = −
1
2
,
since | ϕ| = 1. Substituting back into (13.133) completes the proof. Q.E.D.
The inequality (13.132) quantifies the statement that the more accurately we measure
the momentum Q, the less accurately we are able to measure the position P, and vice
versa. For more details and physical consequences, you should consult an introductory
text on mathematical quantum mechanics, e.g., [124, 129].

At another position a, one replaces x by x −a.
2/25/07 741 c (2006 Peter J. Olver
13.4. The Laplace Transform.
In engineering applications, the Fourier transform is often overshadowed by a close
relative. The Laplace transform plays an essential role in control theory, linear systems
analysis, electronics, and many other fields of practical engineering and science. How-
ever, the Laplace transform is most properly interpreted as a particular real form of the
more fundamental Fourier transform. When the Fourier transform is evaluated along the
imaginary axis, the complex exponential factor becomes real, and the result is the Laplace
transform, which maps real-valued functions to real-valued functions. Since it is so closely
allied to the Fourier transform, the Laplace transform enjoys many of its featured proper-
ties, including linearity. Moreover, derivatives are transformed into algebraic operations,
which underlies its applications to solving differential equations. The Laplace transform
is one-sided; it only looks forward in time and prefers functions that decay — transients.
The Fourier transform looks in both directions and prefers oscillatory functions. For this
reason, while the Fourier transform is used to solve boundary value problems on the real
line, the Laplace transform is much better adapted to initial value problems.
Since we will be applying the Laplace transform to initial value problems, we switch
our variable from x to t to emphasize this fact. Suppose f(t) is a (reasonable) function
which vanishes on the negative axis, so f(t) = 0 for all t < 0. The Fourier transform of f
is
´
f(k) =
1


0
f(t) e
−i kt
dt,
since, by our assumption, its negative t values make no contribution to the integral. The
Laplace transform of such a function is obtained by replacing i k by a real

variable s,
leading to
F(s) = L[ f(t)] =


0
f(t) e
−st
dt, (13.134)
where, in accordance with the standard convention, the factor of

2π has been omitted.
By allowing complex values of the Fourier frequency variable k, we may identify the Laplace
transform with

2π times the evaluation of the Fourier transform for values of k = −i s
on the imaginary axis:
F(s) =


´
f(−i s). (13.135)
Since the exponential factor in the integral has become real, the Laplace transform L takes
real functions to real functions. Moreover, since the integral kernel e
−st
is exponentially
decaying for s > 0, we are no longer required to restrict our attention to functions that
decay to zero as t →∞.
Example 13.31. Consider an exponential function f(t) = e
αt
, where the exponent
α is allowed to be complex. Its Laplace transform is
F(s) =


0
e
(α−s)t
dt =
1
s −α
. (13.136)

One can also define the Laplace transform at complex values of s, but this will not be required
in the applications discussed here.
2/25/07 742 c (2006 Peter J. Olver
Note that the integrand is exponentially decaying, and hence the integral converges, if and
only if Re (α −s) < 0. Therefore, the Laplace transform (13.136) is, strictly speaking,
only defined at sufficiently large s > Re α. In particular, for an oscillatory exponential,
L[ e
i ωt
] =
1
s − i ω
=
s + i ω
s
2
+ ω
2
provided s > 0.
Taking real and imaginary parts of this identity, we discover the formulae for the Laplace
transforms of the simplest trigonometric functions:
L[ cos ωt ] =
s
s
2
+ ω
2
, L[ sin ωt ] =
ω
s
2
+ ω
2
. (13.137)
Two additional important transforms are
L[ 1] =


0
e
−st
dt =
1
s
, L[ t ] =


0
t e
−st
dt =
1
s


0
e
−st
dt =
1
s
2
. (13.138)
The second computation relies on an integration by parts, making sure that the boundary
terms at s = 0, ∞ vanish.
Remark: In every case, we really mean the Laplace transform of the function whose
values are given for t > 0 and is equal to 0 for all negative t. Therefore, the function 1 in
reality signifies the step function
σ(t) =

1, t > 0,
0, t < 0,
(13.139)
and so the first formula in (13.138) should more properly be written
L[ σ(t)] =
1
s
. (13.140)
However, in the traditional approach to the Laplace transform, one only considers the
functions on the positive t axis, and so the step function and the constant function are, from
this viewpoint, indistinguishable. However, once one moves beyond a purely mechanistic
approach, any deeper understanding of the properties of the Laplace transform requires
keeping this distinction firmly in mind.
Let us now pin down the precise class of functions to which the Laplace transform can
be applied.
Definition 13.32. A function f(t) is said to have exponential growth of order a if
[ f(t) [ < M e
at
, for all t > t
0
, (13.141)
for some M > 0 and t
0
> 0.
Note that the exponential growth condition only depends upon the function’s behavior
for large values of t. If a < 0, then f is, in fact, exponentially decaying as x → ∞. Since
e
at
< e
bt
for a < b and all t > 0, if f(t) has exponential growth of order a, it automat-
ically has exponential growth of any higher order b > a. All polynomial, trigonometric,
2/25/07 743 c (2006 Peter J. Olver
and exponential functions (with linear argument) have exponential growth. The simplest
example of a function that does not satisfy any exponential growth bound is f(t) = e
t
2
,
since it grows faster than any simple exponential e
at
.
The following result guarantees the existence of the Laplace transform, at least for
sufficiently large values of the transform variable s, for a rather broad class of functions
that includes almost all of the functions that arise in applications.
Theorem 13.33. If f(t) is piecewise continuous and has exponential growth of order
a, then its Laplace transform F(s) = L[ f(t)] is defined for all s > a.
Proof : The exponential growth inequality (13.141) implies that we can bound the in-
tegrand in (13.134) by [ f(t) e
−st
[ < M e
(a−s)t
. Therefore, as soon as s > a, the integrand
is exponentially decaying as t → ∞, and this suffices to ensure the convergence of the
Laplace transform integral. Q.E.D.
Theorem 13.33 is an existential result, and of course, in practice, we may not be able
to explicitly evaluate the Laplace transform integral. Nevertheless, the Laplace transforms
of most common functions are not hard to find, and extensive lists have been tabulated,
[141]. An abbreviated table of Laplace transforms can be found on the following page.
Nowadays, the most convenient sources of transform formulas are computer algebra pack-
ages, including Mathematica and Maple.
According to Theorem 13.28, when it exists, the Fourier transform uniquely specifies
the function, except possibly at jump discontinuities where the limiting value must be half
way in between. An analogous result can be established for the Laplace transform.
Lemma 13.34. If f(t) and g(t) are piecewise continuous functions that are of ex-
ponential growth, and L[ f(t)] = L[ g(t)] for all s sufficiently large, then f(t) = g(t) at all
points of continuity t > 0.
In fact, there is an explicit formula for the inverse Laplace transform, which follows
from its identification, (13.135), with the Fourier transform along the imaginary axis.
Under suitable hypotheses, a given function F(s) is the Laplace transform of the function
f(t) determined by the complex integral formula

f(t) =
1
2π i

i ∞
−i ∞
F(s) e
st
ds, t > 0. (13.142)
In practice, one hardly ever uses this complicated formula to compute the inverse Laplace
transform. Rather, one simply relies on tables of known Laplace transforms, coupled with
a few basic rules that will be covered in the following subsection.

See Section 16.5 for details on complex integration. The stated formula doesn’t apply to all
functions of exponential growth. A more universally valid inverse Laplace transform formula is
obtained by shifting the complex contour to run from b − i ∞to b + i ∞for some b > a, the order
of exponential growth of f.
2/25/07 744 c (2006 Peter J. Olver
Table of Laplace Transforms
f(t) F(s)
1
1
s
t
1
s
2
t
n
n!
s
n+1
δ(t −c) e
−sc
e
αt
1
s −α
cos ωt
s
s
2
+ ω
2
sin ωt
ω
s
2
+ ω
2
e
ct
f(t) F(s −c)
σ(t −c) f(t −c) e
−sc
F(s)
t f(t) −F

(s)
f

(t) s F(s) −f(0)
f
(n)
(t) s
n
F(s) −s
n−1
f(0) −
−s
n−2
f

(0) − −f
(n−1)
(0)
f(t) ∗ g(t) F(s) G(s)
In this table, n is a non-negative integer, ω is any real number, while c ≥ 0 is any
non-negative real number.
2/25/07 745 c (2006 Peter J. Olver
The Laplace Transform Calculus
The first observation is that the Laplace transform is a linear operator, and so
L[ f + g ] = L[ f ] +L[ g ], L[ cf ] = c L[ f ], (13.143)
for any constant c. Moreover, just like its Fourier progenitor, the Laplace transform
converts calculus into algebra. In particular, differentiation turns into multiplication by
the transform variable, but with one additional term that depends upon the value of the
function at t = 0.
Theorem 13.35. Let f(t) be continuously differentiable for t > 0 and have expo-
nential growth of order a. If L[ f(t)] = F(s) then, for s > a,
L[ f

(t)] = s F(s) −f(0). (13.144)
Proof : The proof relies on an integration by parts:
L[ f

(t)] =


0
f

(t) e
−st
dt = f(t) e
−st


t =0
+ s


0
f(t) e
−st
dt
= lim
t →∞
f(t) e
−st
−f(0) + s F(s).
The exponential growth inequality (13.141) implies that first term vanishes for s > a, and
the remaining terms agree with (13.144). Q.E.D.
Example 13.36. According to Example 13.31, the Laplace transform of the function
sin ωt is
L[ sin ωt ] =
ω
s
2
+ ω
2
.
Its derivative is
d
dt
sin ωt = ω cos ωt,
and therefore
L[ ω cos ωt ] = s L[ sin ωt ] =
ωs
s
2
+ ω
2
,
since sin ωt vanishes at t = 0. The result agrees with (13.137). On the other hand,
d
dt
cos ωt = −ω sin ωt,
and so
L[ −ω sin ωt ] = s L[ cos ωt ] −1 =
s
2
s
2
+ ω
2
−1 = −
ω
2
s
2
+ ω
2
,
again in agreement with the known formula.
2/25/07 746 c (2006 Peter J. Olver
Remark: The final term −f(0) in (13.144) is a manifestation of the discontinuity in
f(t) at t = 0. Keep in mind that the Laplace transform only applies to functions with
f(t) = 0 for all t < 0, and so if f(0) = 0, then f(t) has a jump discontinuity of magnitude
f(0) at t = 0. Therefore, by the calculus of generalized functions, its derivative f

(t) should
include a delta function term, namely f(0) δ(0), which accounts for the additional constant
term in its transform. In the practical approach to the Laplace transform calculus, one
suppresses the delta function when computing the derivative f

(t). However, its effect
must reappear on the other side of the differentiation formula (13.144), and the upshot is
the extra term −f(0).
Laplace transforms of higher order derivatives are found by iterating the first order
formula (13.144). For example, if f ∈ C
2
, then
L[ f
′′
(t)] = s L[ f

(t)] −f

(0) = s
2
F(s) −s f(0) −f

(0). (13.145)
In general, for an n times continuously differentiable function,
L[ f
(n)
(t)] = s
n
F(s) −s
n−1
f(0) −s
n−2
f

(0) − −f
(n−1)
(0). (13.146)
On the other hand, multiplying the function by t corresponds to differentiation of its
Laplace transform (up to a change of sign):
L[ t f(t)] = −F

(s). (13.147)
The proof of this formula is left as an exercise for the reader.
Conversely, integration corresponds to dividing the Laplace transform by s, so
L
¸
t
0
f(τ) dτ

=
F(s)
s
. (13.148)
Unlike the Fourier transform, there are no additional terms in the integration formula as
long as we start the integral at t = 0. For instance,
L[ t
2
] =
1
s
L[ 2t ] =
2
s
3
, and, more generally, L[ t
n
] =
n!
s
n+1
. (13.149)
There is also a shift formula, analogous to Theorem 13.19 for Fourier transforms, but
with one important caveat. Since all functions must vanish for t < 0, we are only allowed
to shift them to the right, a shift to the left would produce nonzero function values for
some t < 0. In general, the Laplace transform of the function f(t −c) shifted to the right
by an amount c > 0, is
L[ f(t −c)] =


0
f(t −c) e
−st
dt =


−c
f(t) e
−s(t+c)
dt (13.150)
=

0
−c
f(t) e
−s(t+c)
dt +


0
f(t) e
−s(t+c)
dt = e
−sc


0
f(t) e
−st
dt = e
−sc
F(s).
In this computation, we first used a change of variables in the integral, replacing t −c by
t; then, the fact that f(t) ≡ 0 for t < 0 was used to eliminate the integral from −c to 0.
When using the shift formula in practice, it is important to keep in mind that c > 0 and
the function f(t − c) vanishes for all t < c. In the table, the factor σ(t − c) is used to
remind the user of this fact.
2/25/07 747 c (2006 Peter J. Olver
Example 13.37. Consider the square wave pulse f(t) =

1, b < t < c,
0, otherwise,
for some
0 < b < c. To compute its Laplace transform, we write it as the difference
f(t) = σ(t −b) −σ(t −c)
of shifted versions of the step function (13.139). Combining the shift formula (13.150) and
the formula (13.140) for the Laplace transform of the step function, we find
L[ f(t)] = L[ σ(t −b)] −L[ σ(t −c)] =
e
−sb
−e
−sc
s
. (13.151)
We already noted that the Fourier transform of the convolution product of two func-
tions is realized as the ordinary product of their individual transforms. A similar result
holds for the Laplace transform. Let f(t), g(t) be given functions. Since we are implicitly
assuming that the functions vanish at all negative values of t, their convolution product
(13.127) reduces to a finite integral
h(t) = f(t) ∗ g(t) =

t
0
f(t −τ) g(τ) dτ. (13.152)
In particular h(t) = 0 for all t < 0 also. Further, it is not hard to show that the convolution
of two functions of exponential growth also has exponential growth.
Theorem 13.38. If L[ f(t)] = F(s) and L[ g(t)] = G(s), then the convolution
h(t) = f(t) ∗ g(t) has Laplace transform given by the product H(s) = F(s) G(s).
The proof of the convolution theorem for the Laplace transform proceeds along the
same lines as its Fourier transform version Theorem 13.26, and is left as Exercise for the
reader.
Applications to Initial Value Problems
The key application of the Laplace transform is to facilitate the solution of initial value
problems for linear, constant coefficient ordinary differential equations. As a prototypical
example, consider the second order initial value problem
a
d
2
u
dt
2
+ b
du
dt
+ cu = f(t), u(0) = α,
du
dt
(0) = β,
(13.153)
in which a, b, c are constant. We will solve the initial value problem by applying the Laplace
transform to both sides of the differential equation. In view of the differentiation formulae
(13.144, 145),
a

s
2
L[ u(t)] −s u(0) −

u(0)

+ b

s L[ u(t)] −u(0)

+ c L[ u(t)] = L[ f(t)].
Setting L[ u(t)] = U(s) and L[ f(t)] = F(s), and making use of the initial conditions, the
preceding equation takes the form
(as
2
+ bs + c) U(s) = F(s) + (as + b) α + aβ. (13.154)
2/25/07 748 c (2006 Peter J. Olver
Thus, by applying the Laplace transform, we have effectively reduced the entire initial
value problem to a single elementary algebraic equation! Solving for
U(s) =
F(s) + (as + b) α + aβ
as
2
+ bs + c
, (13.155)
we then recover solution u(t) to the initial value problem by finding the inverse Laplace
transform of U(s). As noted earlier, in practice the inverse transform is found by suitably
massaging the answer (13.155) to be a combination of known transforms.
Example 13.39. Consider the initial value problem

u + u = 10 e
−3t
, u(0) = 1,

u(0) = 2.
Taking the Laplace transform of the differential equation as above, we find
(s
2
+ 1) U(s) −s −2 =
10
s + 3
, and so U(s) =
s + 2
s
2
+ 1
+
10
(s + 3)(s
2
+ 1)
.
The second summand does not directly correspond to any of the entries in our table of
Laplace transforms. However, we can use the method of partial fractions to write it as a
sum
U(s) =
s + 2
s
2
+ 1
+
1
s + 3
+
3 −s
s
2
+ 1
=
1
s + 3
+
5
s
2
+ 1
of terms appearing in the table. We conclude that the solution to our initial value problem
is
u(t) = e
−3t
+ 5 sin t.
Of course, the last example is a problem that you can easily solve directly. The stan-
dard method learned in your first course on differential equations is just as effective in
finding the final solution, and does not require all the extra Laplace transform machin-
ery! The Laplace transform method is, however, particularly effective when dealing with
complications that arise in cases of discontinuous forcing functions.
Example 13.40. Consider a mass vibrating on a spring with fixed stiffness c = 4.
Assume that the mass starts at rest, is then subjected to a unit force over time interval
1
2
π < t < 2π, after which it left to vibrate on its own. The initial value problem is
d
2
u
dt
2
+ 4u =

1,
1
2
π < t < 2π,
0, otherwise,
u(0) =

u(0) = 0.
Taking the Laplace transform of the differential equation, and using (13.151), we find
(s
2
+ 4) U(s) =
e
−πs/2
−e
−2πs
s
, and so U(s) =
e
−πs/2
−e
−2πs
s(s
2
+ 4)
.
Therefore, by the shift formula (13.150)
u(t) = h

t −
1
2
π

−h(t −2π),
2/25/07 749 c (2006 Peter J. Olver
where h(t) is the function with Laplace transform
L[ h(t)] = H(s) =
1
s(s
2
+ 4)
=
1
4

1
s

s
s
2
+ 4

,
which has been conveniently rewritten using partial fractions. Referring to our table of
Laplace transforms,
h(t) =

1
4

1
4
cos 2t, t > 0,
0, t < 0.
Therefore, our desired solution is
u(t) =

0, 0 ≤ t ≤
1
2
π,
1
4
+
1
4
cos 2t,
1
2
π ≤ t ≤ 2π,
1
2
cos 2t, 2π ≤ t.
Note that the solution u(t) is only C¹ at the points of discontinuity of the forcing function.
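The C¹ matching at the junction points is easy to confirm numerically. A minimal sketch — not part of the text — encoding the piecewise formula above and its hand-computed derivative:

```python
import math

# Check that the piecewise solution of Example 13.40 is C^1: u and u' agree
# from both sides at the junctions t = pi/2 and t = 2*pi.

def u(t):
    if t <= math.pi/2:
        return 0.0
    if t <= 2*math.pi:
        return 0.25 + 0.25*math.cos(2*t)
    return 0.5*math.cos(2*t)

def du(t):
    # u'(t), differentiated by hand on each interval of continuity
    if t <= math.pi/2:
        return 0.0
    if t <= 2*math.pi:
        return -0.5*math.sin(2*t)
    return -math.sin(2*t)

for tc in (math.pi/2, 2*math.pi):
    print(abs(u(tc - 1e-12) - u(tc + 1e-12)))   # u is continuous
    print(abs(du(tc - 1e-12) - du(tc + 1e-12)))  # u' is continuous
```

The second derivative, by contrast, jumps at both junctions, which is exactly the discontinuity inherited from the forcing function.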
Remark: A direct solution of this problem would proceed as follows. One solves
the differential equation on each interval of continuity of the forcing function, leading to
a solution on that interval depending upon two integration constants. The integration
constants are then adjusted so that the solution satisfies the initial conditions and is
continuously differentiable at each point of discontinuity of the forcing function. The details
are straightforward, but messy. The Laplace transform method successfully bypasses these
intervening manipulations.
Example 13.41. Consider the initial value problem

d²u/dt² + ω²u = f(t),    u(0) = u'(0) = 0,

involving a general forcing function f(t). Applying the Laplace transform to the differential equation and using the initial conditions,

(s² + ω²) U(s) = F(s),    and hence    U(s) = F(s)/(s² + ω²).
The right hand side is the product of the Laplace transform of the forcing function f(t) and that of the trigonometric function k(t) = (sin ωt)/ω. Therefore, Theorem 13.38 implies that the solution can be written as their convolution

u(t) = f ∗ k(t) = ∫₀ᵗ k(t − τ) f(τ) dτ = ∫₀ᵗ [sin ω(t − τ)/ω] f(τ) dτ,    (13.156)

where

k(t) = { (sin ωt)/ω,   t > 0,
         0,            t < 0.
The integral kernel k(t − τ) is known as the fundamental solution to the initial value problem, and prescribes the response of the system to a unit impulse force that is applied instantaneously at the time t = τ. Note particularly that (unlike boundary value problems) the impulse only affects the solution at later times t > τ. For initial value problems, the fundamental solution plays a role similar to that of a Green's function in a boundary value problem. The convolution formula (13.156) can be viewed as a linear superposition of fundamental solution responses induced by expressing the forcing function

f(t) = ∫₀^∞ f(τ) δ(t − τ) dτ

as a superposition of concentrated impulses over time.
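The convolution formula (13.156) can be evaluated numerically for any forcing function. The following sketch — with an assumed forcing f(t) = t and ω = 2, neither taken from the text — approximates the integral by the trapezoidal rule and compares with the easily derived closed-form solution:

```python
import math

# Evaluate the convolution (13.156) for u'' + 4u = t, u(0) = u'(0) = 0, and
# compare with the exact solution u(t) = t/4 - sin(2t)/8.

omega = 2.0
f = lambda tau: tau   # assumed illustrative forcing

def u_conv(t, n=4000):
    # trapezoidal approximation of (13.156)
    h = t / n
    total = 0.0
    for j in range(n + 1):
        tau = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * math.sin(omega*(t - tau)) / omega * f(tau)
    return total * h

u_exact = lambda t: t/4 - math.sin(2*t)/8
print(abs(u_conv(1.5) - u_exact(1.5)))  # small quadrature error
```

The same loop works verbatim for discontinuous or tabulated forcing data, which is precisely where the fundamental-solution viewpoint pays off.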
This concludes our brief introduction to the Laplace transform and a few of its many
applications to physical problems. More details can be found in almost all applied texts
on mechanics, electronic circuits, signal processing, control theory, and many other areas,
[Ltr].

Spectral analysis of non-periodic functions defined on the entire real line requires replacing the Fourier series by a limiting Fourier integral. The resulting Fourier transform plays an essential role in functional analysis, in the analysis of ordinary and partial differential equations, and in quantum mechanics, data analysis, signal processing, and many other applied fields. In Section 13.3, we introduce the most important applied features of the Fourier transform. The closely related Laplace transform is a basic tool in engineering applications. To mathematicians, the Fourier transform is the more fundamental of the two, while the Laplace transform is viewed as a certain real specialization. Both transforms change differentiation into multiplication, thereby converting linear differential equations into algebraic equations. The Fourier transform is primarily used for solving boundary value problems on the real line, while initial value problems, particularly those involving discontinuous forcing terms, are effectively handled by the Laplace transform.

13.1. Discrete Fourier Analysis and the Fast Fourier Transform.
In modern digital media — audio, still images or video — continuous signals are sampled at discrete time intervals before being processed. Fourier analysis decomposes the sampled signal into its fundamental periodic constituents — sines and cosines, or, more conveniently, complex exponentials. The crucial fact, upon which all of modern signal processing is based, is that the sampled complex exponentials form an orthogonal basis. The section introduces the Discrete Fourier Transform, and concludes with an introduction to the Fast Fourier Transform, an efficient algorithm for computing the discrete Fourier representation and reconstructing the signal from its Fourier coefficients. We will concentrate on the one-dimensional version here.

Let f(x) be a function representing the signal, defined on an interval a ≤ x ≤ b. Our computer can only store its measured values at a finite number of sample points a ≤ x_0 < x_1 < ··· < x_n ≤ b. In the simplest and, by far, the most common case, the sample points are equally spaced, and so x_j = a + j h, j = 0, …, n, where h = (b − a)/n indicates the sample rate. In signal processing applications, x represents time instead of space, and the x_j are the times at which we sample the signal f(x). Sample rates can be very high, e.g., every 10–20 milliseconds in current speech recognition systems. For simplicity, we adopt the "standard" interval of 0 ≤ x ≤ 2π, and the n equally spaced sample points†

x_0 = 0,   x_1 = 2π/n,   x_2 = 4π/n,   …,   x_j = 2jπ/n,   …,   x_{n−1} = 2(n−1)π/n.    (13.1)

(Signals defined on other intervals can be handled by simply rescaling the interval to have length 2π.) Sampling a (complex-valued) signal or function f(x) produces the sample vector

† We will find it convenient to omit the final sample point x_n = 2π from consideration.


f = ( f_0, f_1, …, f_{n−1} )^T = ( f(x_0), f(x_1), …, f(x_{n−1}) )^T,    where    f_j = f(x_j) = f(2jπ/n).    (13.2)

Figure 13.1.    Sampling e^{−ix} and e^{7ix} on n = 8 sample points.

Sampling cannot distinguish between functions that have the same values at all of the sample points — from the sampler's point of view they are identical. For example, the periodic complex exponential function

f(x) = e^{inx} = cos nx + i sin nx    has sampled values    f_j = f(2jπ/n) = exp(i n · 2jπ/n) = e^{2jπi} = 1    for all    j = 0, …, n − 1,

and hence is indistinguishable from the constant function c(x) ≡ 1 — both lead to the same sample vector ( 1, 1, …, 1 )^T. This has the important implication that sampling at n equally spaced sample points cannot detect periodic signals of frequency n. More generally, the two complex exponential signals

e^{i(k+n)x}    and    e^{ikx}

are also indistinguishable when sampled. This has the important consequence that we need only use the first n periodic complex exponential functions

f_0(x) = 1,   f_1(x) = e^{ix},   f_2(x) = e^{2ix},   …,   f_{n−1}(x) = e^{(n−1)ix},    (13.3)

in order to represent any 2π periodic sampled signal. In particular, exponentials e^{−ikx} of "negative" frequency can all be converted into positive versions, namely e^{i(n−k)x}, by the same sampling argument. For example,

e^{−ix} = cos x − i sin x    and    e^{(n−1)ix} = cos(n−1)x + i sin(n−1)x

have identical values on the sample points (13.1). However, off of the sample points, they are quite different: the former is slowly varying, while the latter represents a high frequency oscillation. In Figure 13.1, we compare e^{−ix} and e^{7ix} when there are n = 8 sample values. The top row compares the real parts, cos x and cos 7x, while the bottom row compares the imaginary parts, sin x and − sin 7x. Note that both functions have the same pattern of sample values, indicated by the dots on the graphs, even though their overall behavior is strikingly different.

This effect is commonly referred to as aliasing†. If you view a moving particle under a stroboscopic light that flashes only eight times, you would be unable to determine which of the two graphs the particle was following. Aliasing is the cause of a well-known artifact in movies: spoked wheels can appear to be rotating backwards when our brain interprets the discretization of the high frequency forward motion imposed by the frames of the film as an equivalently discretized low frequency motion in reverse.

Aliasing also has important implications for the design of music CD's. We must sample an audio signal at a sufficiently high rate that all audible frequencies can be adequately represented. In fact, human appreciation of music also relies on inaudible high frequency tones, and so a much higher sample rate is actually used in commercial CD design. But the sample rate that was selected remains controversial; hi fi aficionados complain that it was not set high enough to fully reproduce the musical quality of an analog LP record!

† In computer graphics, the term "aliasing" is used in a much broader sense that covers a variety of artifacts introduced by discretization — particularly, the jagged appearance of lines and smooth curves on a digital monitor.

The discrete Fourier representation decomposes a sampled function f(x) into a linear combination of complex exponentials. Since we cannot distinguish sampled exponentials of frequency higher than n, we only need consider a finite linear combination

f(x) ∼ p(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + ··· + c_{n−1} e^{(n−1)ix} = Σ_{k=0}^{n−1} c_k e^{ikx}    (13.4)

of the first n exponentials (13.3). The symbol ∼ in (13.4) means that the function f(x) and the sum p(x) agree on the sample points:

f(x_j) = p(x_j),    j = 0, …, n − 1.    (13.5)

Therefore, p(x) can be viewed as a (complex-valued) interpolating trigonometric polynomial of degree ≤ n − 1 for the sample data f_j = f(x_j).

Remark: If f(x) is real, then p(x) is also real on the sample points, but may very well be complex-valued in between. To avoid this unsatisfying state of affairs, we will usually discard its imaginary component, and regard the real part of p(x) as "the" interpolating trigonometric polynomial. On the other hand, sticking with a purely real construction unnecessarily complicates the analysis, and so we will retain the complex exponential form (13.4) of the discrete Fourier sum.
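The aliasing identity is easy to confirm numerically. A minimal sketch — the computation below is illustrative, not taken from the text:

```python
import cmath, math

# e^{-ix} and e^{i(n-1)x} = e^{7ix} agree at every one of the n = 8 sample
# points (13.1), even though they are very different functions in between.

n = 8
xs = [2*math.pi*j/n for j in range(n)]

low  = [cmath.exp(-1j*x) for x in xs]
high = [cmath.exp(7j*x) for x in xs]
print(max(abs(a - b) for a, b in zip(low, high)))  # essentially zero
```

Evaluating the same two exponentials at any point strictly between the samples shows how far apart the functions really are.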

Since we are working in the finite-dimensional vector space Cⁿ throughout, we may reformulate the discrete Fourier series in vectorial form. Sampling the basic exponentials (13.3) produces the complex vectors

ω_k = ( e^{ikx_0}, e^{ikx_1}, e^{ikx_2}, …, e^{ikx_{n−1}} )^T = ( 1, e^{2kπi/n}, e^{4kπi/n}, …, e^{2(n−1)kπi/n} )^T,    k = 0, …, n − 1.    (13.6)

The interpolation conditions (13.5) can be recast in the equivalent vector form

f = c_0 ω_0 + c_1 ω_1 + ··· + c_{n−1} ω_{n−1}.    (13.7)

In other words, to compute the discrete Fourier coefficients c_0, …, c_{n−1} of f, all we need to do is rewrite its sample vector f as a linear combination of the sampled exponential vectors ω_0, …, ω_{n−1}. Now, as with continuous Fourier series, the absolutely crucial property is the orthonormality of the basis elements ω_0, …, ω_{n−1}. Were it not for the power of orthogonality, Fourier analysis might have remained a mere mathematical curiosity, rather than today's indispensable tool.

Proposition 13.1. The sampled exponential vectors ω_0, …, ω_{n−1} form an orthonormal basis of Cⁿ with respect to the inner product

⟨ f, g ⟩ = (1/n) Σ_{j=0}^{n−1} f_j ḡ_j = (1/n) Σ_{j=0}^{n−1} f(x_j) g(x_j)‾,    f, g ∈ Cⁿ.    (13.8)

Remark: The inner product (13.8) is a rescaled version of the standard Hermitian dot product (3.90) between complex vectors. We can interpret the inner product between the sample vectors f, g as the average of the sampled values of the product signal f(x) g(x)‾.

Remark: As usual, orthogonality is no accident. Just as the complex exponentials are eigenfunctions for a self-adjoint boundary value problem, so their discrete sampled counterparts are eigenvectors for a self-adjoint matrix eigenvalue problem; details can be found in Exercise 8. Here, though, to keep the discussion on track, we shall outline a direct proof.

Proof: The crux of the matter relies on properties of the remarkable complex numbers

ζ_n = e^{2πi/n} = cos(2π/n) + i sin(2π/n),    where    n = 1, 2, 3, ….    (13.9)

Particular cases include

ζ_2 = −1,    ζ_3 = −½ + (√3/2) i,    ζ_4 = i,    and    ζ_8 = (√2/2) + (√2/2) i.    (13.10)

The nth power of ζ_n is ζ_nⁿ = (e^{2πi/n})ⁿ = e^{2πi} = 1, and hence ζ_n is one of the complex nth roots of unity: ζ_n = ⁿ√1.

There are, in fact, n different complex nth roots of 1, namely the powers of ζ_n:

ζ_n^k = e^{2kπi/n} = cos(2kπ/n) + i sin(2kπ/n),    k = 0, …, n − 1.    (13.11)

Since it generates all the others, ζ_n is known as the primitive nth root of unity. Geometrically, the nth roots (13.11) lie on the vertices of a regular unit n–gon in the complex plane; see Figure 13.2. The primitive root ζ_n is the first vertex we encounter as we go around the n–gon in a counterclockwise direction, starting at 1. Continuing around, the other roots appear in their natural order ζ_n², ζ_n³, …, ζ_n^{n−1}, finishing back at ζ_nⁿ = 1. The complex conjugate of ζ_n is the "last" nth root:

e^{−2πi/n} = ζ̄_n = ζ_n^{n−1} = e^{2(n−1)πi/n}.    (13.12)

Figure 13.2.    The Fifth Roots of Unity.

The complex numbers (13.11) are a complete set of roots of the polynomial zⁿ − 1, which can therefore be factored:

zⁿ − 1 = (z − 1)(z − ζ_n)(z − ζ_n²) ··· (z − ζ_n^{n−1}).

On the other hand, elementary algebra provides us with the real factorization

zⁿ − 1 = (z − 1)(1 + z + z² + ··· + z^{n−1}).

Comparing the two, we conclude that

1 + z + z² + ··· + z^{n−1} = (z − ζ_n)(z − ζ_n²) ··· (z − ζ_n^{n−1}).

Substituting z = ζ_n^k into both sides of this identity, we deduce the useful formula

1 + ζ_n^k + ζ_n^{2k} + ··· + ζ_n^{(n−1)k} = { n,   k = 0,
                                              0,   0 < k < n.    (13.13)

Since ζ_n^{n+k} = ζ_n^k, this formula can easily be extended to general integers k; the sum is equal to n if n evenly divides k, and is 0 otherwise.
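Formula (13.13) can be checked numerically in a few lines; the following sketch (not part of the text) sums the kth powers of the nth roots of unity directly:

```python
import cmath, math

# The sum 1 + zeta_n^k + zeta_n^{2k} + ... + zeta_n^{(n-1)k} equals n when
# n divides k, and 0 otherwise, as asserted in (13.13).

def power_sum(n, k):
    zeta = cmath.exp(2j*math.pi/n)
    return sum(zeta**(j*k) for j in range(n))

print(abs(power_sum(8, 0) - 8))   # n divides k: the sum is n
print(abs(power_sum(8, 16) - 8))  # the extension to general integers k
print(abs(power_sum(8, 3)))       # otherwise the sum vanishes
```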

Now, let us apply what we have learned to prove Proposition 13.1. First, in view of (13.11), the sampled exponential vectors (13.6) can all be written in terms of the nth roots of unity:

ω_k = ( 1, ζ_n^k, ζ_n^{2k}, ζ_n^{3k}, …, ζ_n^{(n−1)k} )^T,    k = 0, …, n − 1.    (13.14)

Therefore, applying (13.13), we conclude that

⟨ ω_k, ω_l ⟩ = (1/n) Σ_{j=0}^{n−1} ζ_n^{jk} ζ̄_n^{jl} = (1/n) Σ_{j=0}^{n−1} ζ_n^{j(k−l)} = { 1,  k = l,
                                                                                          0,  k ≠ l,      0 ≤ k, l < n,

which establishes orthonormality of the sampled exponential vectors. Q.E.D.

Orthonormality of the basis vectors implies that we can immediately compute the Fourier coefficients in the discrete Fourier sum (13.4) by taking inner products:

c_k = ⟨ f, ω_k ⟩ = (1/n) Σ_{j=0}^{n−1} f_j e^{−ikx_j} = (1/n) Σ_{j=0}^{n−1} ζ_n^{−jk} f_j.    (13.15)

In other words, the discrete Fourier coefficient c_k is obtained by averaging the sampled values of the product function f(x) e^{−ikx}. The passage from a signal to its Fourier coefficients is known as the Discrete Fourier Transform, or DFT for short. The reverse procedure of reconstructing a signal from its discrete Fourier coefficients via the sum (13.4) (or (13.7)) is known as the Inverse Discrete Fourier Transform, or IDFT. The Discrete Fourier Transform and its inverse define mutually inverse linear transformations on the space Cⁿ, whose matrix representations can be found in Exercise 5.7.

Example 13.2. If n = 4, then ζ_4 = i. The corresponding sampled exponential vectors

ω_0 = ( 1, 1, 1, 1 )^T,   ω_1 = ( 1, i, −1, −i )^T,   ω_2 = ( 1, −1, 1, −1 )^T,   ω_3 = ( 1, −i, −1, i )^T,

form an orthonormal basis of C⁴ with respect to the averaged Hermitian dot product

⟨ v, w ⟩ = ¼ ( v_0 w̄_0 + v_1 w̄_1 + v_2 w̄_2 + v_3 w̄_3 ),    where    v = ( v_0, v_1, v_2, v_3 )^T,   w = ( w_0, w_1, w_2, w_3 )^T.

Given the sampled function values

f_0 = f(0),   f_1 = f(½π),   f_2 = f(π),   f_3 = f(3π/2),

we construct the discrete Fourier representation

f = c_0 ω_0 + c_1 ω_1 + c_2 ω_2 + c_3 ω_3,    (13.16)

where

c_0 = ⟨ f, ω_0 ⟩ = ¼ (f_0 + f_1 + f_2 + f_3),        c_1 = ⟨ f, ω_1 ⟩ = ¼ (f_0 − i f_1 − f_2 + i f_3),
c_2 = ⟨ f, ω_2 ⟩ = ¼ (f_0 − f_1 + f_2 − f_3),        c_3 = ⟨ f, ω_3 ⟩ = ¼ (f_0 + i f_1 − f_2 − i f_3).

We interpret this decomposition as the complex exponential interpolant

f(x) ∼ p(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + c_3 e^{3ix}    (13.17)

that agrees with f(x) on the sample points. For instance, if

f(x) = 2πx − x²,    then    f_0 = 0,   f_1 = 7.4022,   f_2 = 9.8696,   f_3 = 7.4022,

and hence

c_0 = 6.1685,   c_1 = −2.4674,   c_2 = −1.2337,   c_3 = −2.4674.    (13.18)

Therefore, the interpolating trigonometric polynomial is given by the real part of

p(x) = 6.1685 − 2.4674 e^{ix} − 1.2337 e^{2ix} − 2.4674 e^{3ix},

namely,

Re p(x) = 6.1685 − 2.4674 cos x − 1.2337 cos 2x − 2.4674 cos 3x.

In Figure 13.3 we compare the function and its discrete Fourier representations (13.17–18) for both n = 4 and n = 16 points, with the interpolation points indicated by the dots. The resulting graphs point out a significant difficulty with the Discrete Fourier Transform as developed so far. While the trigonometric polynomials do indeed correctly match the sampled function values, their pronounced oscillatory behavior makes them completely unsuitable for interpolation away from the sample points. Indeed, the graphs in Figure 13.3 might remind you of our earlier observation that, due to aliasing, low and high frequency exponentials can have the same sample data, but differ wildly in between the sample points. The problem is that we have not been paying sufficient attention to the frequencies that are represented in the Fourier sum. However, this difficulty can be rectified by being a little more clever. While the first half of the summands in (13.4) represent relatively low frequencies, the second half do not, and can be replaced by equivalent lower frequency, and hence less oscillatory, exponentials.

Figure 13.3.    The Discrete Fourier Representation of x² − 2πx.
Namely, if 0 < k ≤ ½n, then e^{−ikx} and e^{i(n−k)x} have the same sample values, but the former is of lower frequency than the latter. Thus, for interpolatory purposes, we should replace the second half of the summands in the Fourier sum (13.4) by their low frequency alternatives. If n = 2m + 1 is odd, then we take

p̂(x) = c_{−m} e^{−imx} + ··· + c_{−1} e^{−ix} + c_0 + c_1 e^{ix} + ··· + c_m e^{imx} = Σ_{k=−m}^{m} c_k e^{ikx}    (13.19)

as the equivalent low frequency interpolant. If n = 2m is even — which is the most common case occurring in applications — then

p̂(x) = c_{−m} e^{−imx} + ··· + c_{−1} e^{−ix} + c_0 + c_1 e^{ix} + ··· + c_{m−1} e^{i(m−1)x} = Σ_{k=−m}^{m−1} c_k e^{ikx}    (13.20)

will be our choice. (It is a matter of personal taste whether to use e^{−imx} or e^{imx} to represent the highest frequency term.) In both cases, the Fourier coefficients with negative indices are the same as their high frequency alternatives:

c_{−k} = c_{n−k} = ⟨ f, ω_{n−k} ⟩ = ⟨ f, ω_{−k} ⟩,    (13.21)

where ω_{−k} = ω_{n−k} is the sample vector for e^{−ikx} ∼ e^{i(n−k)x}.

Returning to the previous example, for interpolating purposes we should replace (13.17) by the equivalent low frequency interpolant

p̂(x) = −1.2337 e^{−2ix} − 2.4674 e^{−ix} + 6.1685 − 2.4674 e^{ix},    (13.22)

with real part

Re p̂(x) = 6.1685 − 4.9348 cos x − 1.2337 cos 2x.

Graphs of the n = 4 and 16 low frequency trigonometric interpolants can be seen in Figure 13.4. Thus, by utilizing only the lowest frequency exponentials, we successfully suppress the aliasing artifacts, resulting in a quite reasonable trigonometric interpolant to the given function.

Figure 13.4.    The Low Frequency Discrete Fourier Representation of x² − 2πx.
Remark: The low frequency version also serves to unravel the reality of the Fourier representation of a real function f(x). Since ω_{−k} = ω̄_k, formula (13.21) implies that c_{−k} = c̄_k, and so the common frequency terms

c_{−k} e^{−ikx} + c_k e^{ikx} = a_k cos kx + b_k sin kx

add up to a real trigonometric function. Therefore, the odd n interpolant (13.19) is a real trigonometric polynomial, whereas in the even version (13.20) only the highest frequency term c_{−m} e^{−imx} produces a complex term — which is, in fact, 0 on the sample points.

Compression and Noise Removal

In a typical experimental signal, noise primarily affects the high frequency modes, while the authentic features tend to appear in the low frequencies. Think of the hiss and static you hear on an AM radio station or a low quality audio recording. Thus, a very simple, but effective, method for denoising a corrupted signal is to decompose it into its Fourier modes, as in (13.4), and then discard the high frequency constituents. A similar idea underlies the Dolby™ recording system used on most movie soundtracks: during the recording process, the high frequency modes are artificially boosted, so that scaling them back when showing the movie in the theater has the effect of eliminating much of the extraneous noise. The one design issue is the specification of a cut-off between low and high frequency, that is, between signal and noise. This choice will depend upon the properties of the measured signal, and is left to the discretion of the signal processor.

A correct implementation of the denoising procedure is facilitated by using the unaliased forms (13.19, 20) of the trigonometric interpolant, in which the low frequency summands only appear when |k| is small. In this version, to eliminate high frequency components, we replace the full summation by

q_l(x) = Σ_{k=−l}^{l} c_k e^{ikx},    (13.23)

where l < ½(n + 1) specifies the selected cut-off frequency between signal and noise. The 2l + 1 ≪ n low frequency Fourier modes retained in (13.23) will, in favorable situations, capture the essential features of the original signal while simultaneously eliminating the high frequency noise.

In Figure 13.5 we display a sample signal followed by the same signal corrupted by adding in random noise. We use n = 2⁹ = 512 sample points in the discrete Fourier representation, and to remove the noise, we retain only the 2l + 1 = 11 lowest frequency modes. In other words, instead of all n = 512 Fourier coefficients c_{−256}, …, c_{−1}, c_0, c_1, …, c_{255}, we only compute the 11 lowest order ones c_{−5}, …, c_5. Summing up just those 11 exponentials produces the denoised signal

q(x) = c_{−5} e^{−5ix} + ··· + c_5 e^{5ix}.

To compare, we plot both the original signal and the denoised version on the same graph. In this case, the maximal deviation is less than .15 over the entire interval [0, 2π].

Figure 13.5.    Denoising a Signal.

The same idea underlies many data compression algorithms for audio recordings, digital images and, particularly, video. The goal is efficient storage and/or transmission of the signal. A mathematical justification of Fourier-based compression algorithms relies on the fact that the Fourier coefficients of smooth functions tend rapidly to zero — the smoother the function, the faster the decay rate. Thus, the small high frequency Fourier coefficients will be of negligible importance, and so discarding the high frequency terms will, in favorable situations, not lead to any noticeable degradation of the signal or image. In other words, we expect all the important features to be contained in the low frequency constituents. Thus, to compress a signal (and, simultaneously, remove high frequency noise), we retain only its low frequency discrete Fourier coefficients. The signal is reconstructed by summing the associated truncated discrete Fourier series (13.23).

In Figure 13.6, the same signal is compressed by retaining, respectively, 2l + 1 = 21 and 2l + 1 = 7 Fourier coefficients only, instead of all n = 512 that would be required for complete accuracy. For the case of moderate compression, the maximal deviation between the signal and the compressed version is less than 1.5 × 10⁻⁴ over the entire interval, while even the highly compressed version deviates at most .05 from the original signal. Of course, the lack of any fine scale features in this particular signal means that a very high compression can be achieved — the more complicated or detailed the original signal, the more Fourier modes need to be retained for accurate reproduction.

Figure 13.6.    Compressing a Signal.
we retain only its low frequency discrete Fourier coefficients. 2/25/07 701 c 2006 Peter J.6. 7 6 5 4 3 2 1 5 6 1 2 3 4 5 6 The Original Signal Moderate Compression Compressing a Signal. to compress a signal (and.5 × 10−4 over the entire interval. In Figure 13. As before.7 6 5 4 3 2 1 1 2 3 4 5 6 7 6 5 4 3 2 1 1 2 3 4 5 6 The Original Signal 7 6 5 4 3 2 1 1 2 3 4 5 6 7 6 5 4 3 2 1 The Noisy Signal 1 2 3 4 5 6 The Denoised Signal Figure 13. video. High Compression Figure 13. the maximal deviation between the signal and the compressed version is less than 1. the faster the decay rate. respectively. The same idea underlies many data compression algorithms for audio recordings. and so discarding the high frequency terms will.5. not lead to any noticeable degradation of the signal or image. in favorable situations. remove high frequency noise). the small high frequency Fourier coefficients will be of negligible importance. 2 l + 1 = 21 and 2 l + 1 = 7 Fourier coefficients only instead of all n = 512 that would be required for complete accuracy. Thus.6. the same signal is compressed by retaining. A mathematical justification of Fourier-based compression algorithms relies on the fact that the Fourier coefficients of smooth functions tend rapidly to zero — the smoother the function. The goal is efficient storage and/or transmission of the signal. we expect all the important features to be contained in the low frequency constituents. 7 6 5 4 3 2 1 1 2 3 4 5 6 7 6 5 4 3 2 1 1 2 3 4 Comparison of the Two Denoising a Signal.23). Thus. simultaneously. digital images and. particularly. The signal is reconstructed by summing the associated truncated discrete Fourier series (13. Olver . For the case of moderate compression.

24) requires 4 real multiplications and 2 real additions. computing all the discrete Fourier coefficients (13.25) or (13. . The Fast Fourier Transform While one may admire an algorithm for its intrinsic beauty. In the early 1960’s. cn−1 .) Similarly.) The seminal observation is that if the number of sample points n = 2m In fact.26) for the imaginary part. Still. and its discovery launched the modern revolution in digital signal and data processing. even the power of orthogonality reaches its limits when it comes to dealing with truly large scale problems such as three-dimensional medical imaging or video processing. Orthogonality is the first and most important feature of many practical linear algebra algorithms. Extending these ideas to multi-dimensional data only exacerbates the problem. but his insight was not fully appreciated until modern computers arrived on the scene. † 2/25/07 702 c 2006 Peter J. or. in the real world. and hence the more extensive the range of applications. reconstruction of the sampled signal via (13. discovered† a much more efficient approach to the Discrete Fourier Transform. The resulting algorithm is known as the Fast Fourier Transform. 3 real multiplications and 5 real additions.4) requires n2 −n complex multiplications and n2 −n complex additions. Of course. In general. while each complex multiplication z w = (x + i y) (u + i v) = (x u − y v) + i (x v + y u) x v + y u = (x + y) (u + v) − x u − y v (13. the lack of any fine scale features in this particular signal means that a very high compression can be achieved — the more complicated or detailed the original signal.15) of an n times sampled signal requires a total of n2 complex multiplications and n2 − n complex additions. .while even the highly compressed version deviates at most . often abbreviated FFT. and is the critical feature of Fourier analysis.25) (13. exploiting the rather special structure of the sampled exponential vectors. 
the bottom line is always efficiency of implementation: the less total computation.26) will depend upon the processor’s relative speeds of multiplication and addition. 31]. Olver . (The choice of formula (13. James Cooley and John Tukey. the faster the processing. we return to the original. [44].20). aliased form of the discrete Fourier representation (13. . As a result. both computations become quite labor intensive for large n. the more Fourier modes need to be retained for accurate reproduction. [30. the key ideas can be found in Gauss’ hand computations in the early 1800’s. by employing the alternative formula (13. Note also that each complex addition z + w = (x + i y) + (u + i v) = (x + u) + i (y + v) generally requires two real additions.05 from the original signal.4). (Once one understands how the FFT works. given the Fourier coefficients c0 . In order to explain the method without undue complication. . one can easily adapt the algorithm to the low frequency version (13.

. f5 . . k k k = 0. . (13. . f2 . 2/25/07 703 c 2006 Peter J.29). . If n = 2r is a power of 2.27) can be constructed from a pair of order m discrete Fourier coefficients via ck = 1 2 −k ce + ζn co . .31) Now if m = 2 l is also even.27) 1 − − − f + f3 ζm k + f5 ζm 2 k + · · · + f2m−1 ζm (m−1) k m 1 . . k o co k+m = ck . (13. .30) Therefore. f (x2 ). m − 1. then this game can be played all the way back to the start. f (x3 ). 1 −k −2k − (m−1) k o f + f3 ζm + f5 ζm + · · · + f2m−1 ζm . f2m−1 T T = f (x0 ). which just samples the function at a single point. ck = (13. . . (13. m √ 1 equals the square of the primitive We use this fact to split the summation (13. f (x2m−2 ) = f (x1 ). . observe that the expressions in braces are the order m Fourier coefficients for the sample data f e = f0 . the order n = 2 m discrete Fourier coefficients (13. but now at the odd sample points x2j+1 . f2m−2 f o = f1 . . we are splitting the original sampled signal into two “half-sampled” signals obtained by sampling on every other point. . . . . beginning with the trivial order 1 discrete Fourier representation. The even and odd Fourier coefficients are 1 − − − ce = f0 + f2 ζm k + f4 ζm 2 k + · · · + f2m−2 ζm (m−1) k . . and we adopt the identification e ck+m = ce . . . f (x5 ). f3 . . k = 0.15) for the order n discrete Fourier coefficients k into two parts. both the even and odd samples require only m distinct Fourier coefficients. Olver . . In other words. . f (x2m−1 ) T T .is even.29) m 1 Since they contain just m data values. f4 . n − 1.28) Note that f e is obtained by sampling f (x) on the even sample points x2j . . . . f (x4 ). m − 1. then we can play the same game on the order m Fourier coefficients (13. Now. . reconstructing each of them from a pair of order l discrete Fourier coefficients — obtained by sampling the signal at every fourth point. . then the primitive mth root of unity ζm = nth root: 2 ζm = ζn . . while f o is obtained by sampling the same function f (x). 
collecting together the even and the odd powers of ζn : ck = 1 −k − − f + f1 ζn + f2 ζn 2 k + · · · + fn−1 ζn (n−1) k n 0 1 − − − = f0 + f2 ζn 2 k + f4 ζn 4 k + · · · + f2m−2 ζn (2 m−2) k + n −k 1 − − − + ζn f + f3 ζn 2 k + f5 ζn 4 k + · · · + f2m−1 ζn (2 m−2) k n 1 1 1 − − − + f + f2 ζm k + f4 ζm 2 k + · · · + f2m−2 ζm (m−1) k = 2 m 0 + − ζn k 2 (13. . k m k = 0.
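The key identity (13.31) can be verified directly on a small data set; the following sketch (illustrative data, not from the text) rebuilds the order-8 coefficients from the two order-4 transforms:

```python
import cmath, math

# Verify the even-odd splitting (13.31):
#   c_k = ( c^e_{k mod m} + zeta_n^{-k} c^o_{k mod m} ) / 2,   n = 2m.

def dft(f):
    n = len(f)
    zeta = cmath.exp(2j*math.pi/n)
    return [sum(f[p] * zeta**(-p*k) for p in range(n)) / n for k in range(n)]

f = [0.5, 1.0, -2.0, 3.0, 2.5, -1.0, 0.0, 4.0]   # n = 8, m = 4
n, m = len(f), len(f)//2
c      = dft(f)
c_even = dft(f[0::2])   # samples f_0, f_2, ...
c_odd  = dft(f[1::2])   # samples f_1, f_3, ...
zn = cmath.exp(2j*math.pi/n)
rebuilt = [(c_even[k % m] + zn**(-k) * c_odd[k % m]) / 2 for k in range(n)]
print(max(abs(a - b) for a, b in zip(c, rebuilt)))  # essentially zero
```

The periodicity (13.30) appears here as the index reduction `k % m`.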

After some rearrangement of the basic steps, we arrive at the Fast Fourier Transform, which we now present in its final form. We begin with a sampled signal on n = 2^r sample points. To efficiently program the Fast Fourier Transform, it helps to write out each index 0 ≤ j < 2^r in its binary (as opposed to decimal) representation

j = j_{r−1} j_{r−2} … j_1 j_0,    where    j_ν = 0 or 1;    (13.32)

the notation is shorthand for its r digit binary expansion

j = j_0 + 2 j_1 + 4 j_2 + 8 j_3 + ··· + 2^{r−1} j_{r−1}.

We then define the bit reversal map

ρ( j_{r−1} j_{r−2} … j_1 j_0 ) = j_0 j_1 … j_{r−2} j_{r−1}.    (13.33)

For instance, if r = 5, and j = 13, with 5 digit binary representation 01101, then ρ(j) = 22 has the reversed binary representation 10110. Note especially that the bit reversal map ρ = ρ_r depends upon the original choice of r = log₂ n.

Secondly, for each 0 ≤ k < r, define the maps

α_k(j) = j_{r−1} … j_{k+1} 0 j_{k−1} … j_1 j_0,
β_k(j) = j_{r−1} … j_{k+1} 1 j_{k−1} … j_1 j_0 = α_k(j) + 2^k,    for    j = j_{r−1} j_{r−2} … j_1 j_0.    (13.34)

In other words, α_k(j) sets the kth binary digit of j to 0, while β_k(j) sets it to 1. In the preceding example, α₂(13) = 9, with binary form 01001, while β₂(13) = 13, with binary form 01101. The bit operations (13.33–34) are especially easy to implement on modern binary computers.

Given a sampled signal f_0, …, f_{n−1}, its discrete Fourier coefficients c_0, …, c_{n−1} are computed by the following iterative algorithm:

c_j^{(0)} = f_{ρ(j)},    c_j^{(k+1)} = ½ ( c_{α_k(j)}^{(k)} + ζ_{2^{k+1}}^{−j} c_{β_k(j)}^{(k)} ),    j = 0, …, n − 1,    k = 0, …, r − 1,    (13.35)

in which ζ_{2^{k+1}} is the primitive 2^{k+1} root of unity. The final output of the iterative procedure, namely

c_j = c_j^{(r)},    j = 0, …, n − 1,    (13.36)

are the discrete Fourier coefficients of our signal. The preprocessing step of the algorithm, where we define c_j^{(0)}, produces a more convenient rearrangement of the sample values. The subsequent steps successively combine the Fourier coefficients of the appropriate even and odd sampled subsignals together, reproducing (13.27) in a different notation. The following example should help make the overall process clearer.

Example 13.3. Consider the case r = 3, and so our signal has n = 2³ = 8 sampled values f_0, f_1, …, f_7. We begin the process by rearranging the sample values

c_0^{(0)} = f_0,   c_1^{(0)} = f_4,   c_2^{(0)} = f_2,   c_3^{(0)} = f_6,   c_4^{(0)} = f_1,   c_5^{(0)} = f_5,   c_6^{(0)} = f_3,   c_7^{(0)} = f_7,

The first stage of the iteration is based on ζ₂ = −1. Equation (13.35) gives

    c_0^(1) = ½(c_0^(0) + c_1^(0)),   c_1^(1) = ½(c_0^(0) − c_1^(0)),
    c_2^(1) = ½(c_2^(0) + c_3^(0)),   c_3^(1) = ½(c_2^(0) − c_3^(0)),
    c_4^(1) = ½(c_4^(0) + c_5^(0)),   c_5^(1) = ½(c_4^(0) − c_5^(0)),
    c_6^(1) = ½(c_6^(0) + c_7^(0)),   c_7^(1) = ½(c_6^(0) − c_7^(0)),

where we combine successive pairs of the rearranged sample values. The second stage of the iteration has k = 1 with ζ₄ = i. We find

    c_0^(2) = ½(c_0^(1) + c_2^(1)),   c_1^(2) = ½(c_1^(1) − i c_3^(1)),
    c_2^(2) = ½(c_0^(1) − c_2^(1)),   c_3^(2) = ½(c_1^(1) + i c_3^(1)),
    c_4^(2) = ½(c_4^(1) + c_6^(1)),   c_5^(2) = ½(c_5^(1) − i c_7^(1)),
    c_6^(2) = ½(c_4^(1) − c_6^(1)),   c_7^(2) = ½(c_5^(1) + i c_7^(1)).

Note that the indices of the combined pairs of coefficients now differ by 2. In the last step, where k = 2 and ζ₈ = (√2/2)(1 + i), we combine coefficients whose indices differ by 4 = 2². The final output

    c_0 = c_0^(3) = ½( c_0^(2) + c_4^(2) ),
    c_1 = c_1^(3) = ½( c_1^(2) + (√2/2)(1 − i) c_5^(2) ),
    c_2 = c_2^(3) = ½( c_2^(2) − i c_6^(2) ),
    c_3 = c_3^(3) = ½( c_3^(2) − (√2/2)(1 + i) c_7^(2) ),
    c_4 = c_4^(3) = ½( c_0^(2) − c_4^(2) ),
    c_5 = c_5^(3) = ½( c_1^(2) − (√2/2)(1 − i) c_5^(2) ),
    c_6 = c_6^(3) = ½( c_2^(2) + i c_6^(2) ),
    c_7 = c_7^(3) = ½( c_3^(2) + (√2/2)(1 + i) c_7^(2) ),

is the complete set of discrete Fourier coefficients.

Let us count the number of arithmetic operations required in the Fast Fourier Transform algorithm. At each stage in the computation, we must perform n = 2^r complex additions/subtractions and the same number of complex multiplications. (Actually, the number of multiplications is slightly smaller, since multiplications by ±1 and ±i are extremely simple. However, this does not significantly alter the final operations count.) There are r = log₂ n stages, and so we require a total of r n = n log₂ n complex additions/subtractions and the same number of multiplications. Now, when n is large, n log₂ n is significantly smaller than n², which is the number of operations required for the direct algorithm. For instance, if n = 2¹⁰ = 1,024, then n² = 1,048,576, while n log₂ n = 10,240: a net savings of 99%. As a result, many large scale computations that would be intractable using the direct approach are immediately brought into the realm of feasibility. This is the reason why all modern implementations of the Discrete Fourier Transform are based on the FFT algorithm and its variants.

The reconstruction of the signal from the discrete Fourier coefficients c_0, …, c_{n−1} is speeded up in exactly the same manner. The only differences are that we replace ζ_n^{−1} by ζ_n, and drop the factors of ½, since there is no need to divide by n in the final result (13.4). Therefore, we apply the slightly modified iterative procedure

    f_j^(0) = c_{ρ(j)},    f_j^(k+1) = f_{α_k(j)}^(k) + ζ_{2^{k+1}}^{j} f_{β_k(j)}^(k),
        j = 0, …, n−1,   k = 0, …, r−1,      (13.37)
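The modified reverse iteration (13.37) can be sketched in the same style as the forward pass (again, names and test values are ours); the only changes are the sign of the exponent on ζ and the dropped factor of ½.

```python
import cmath

def bit_reverse(j, r):
    # reverse the r-bit binary representation of j
    out = 0
    for _ in range(r):
        out = (out << 1) | (j & 1)
        j >>= 1
    return out

def ifft_samples(c):
    # reconstruct the sample values from the discrete Fourier coefficients, cf. (13.37):
    # the same butterflies as the forward pass, but with zeta^(+j) and no factor of 1/2
    n = len(c)
    r = n.bit_length() - 1
    assert 1 << r == n
    f = [c[bit_reverse(j, r)] for j in range(n)]
    for k in range(r):
        zeta = cmath.exp(2j * cmath.pi / (1 << (k + 1)))
        f = [f[j & ~(1 << k)] + zeta ** j * f[j | (1 << k)] for j in range(n)]
    return f
```

Running the forward sketch and then this one on the same data should return the original samples, which is a convenient round-trip check.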

and finish with

    f(x_j) = f_j = f_j^(r),    j = 0, …, n−1.      (13.38)

Example 13.4. The reconstruction formulae in the case of n = 8 = 2³ Fourier coefficients c_0, …, c_7, which were computed in Example 13.3, can be implemented as follows. First, we rearrange the Fourier coefficients in bit reversed order:

    f_0^(0) = c_0,  f_1^(0) = c_4,  f_2^(0) = c_2,  f_3^(0) = c_6,
    f_4^(0) = c_1,  f_5^(0) = c_5,  f_6^(0) = c_3,  f_7^(0) = c_7.

Then we begin combining them in successive pairs:

    f_0^(1) = f_0^(0) + f_1^(0),   f_1^(1) = f_0^(0) − f_1^(0),
    f_2^(1) = f_2^(0) + f_3^(0),   f_3^(1) = f_2^(0) − f_3^(0),
    f_4^(1) = f_4^(0) + f_5^(0),   f_5^(1) = f_4^(0) − f_5^(0),
    f_6^(1) = f_6^(0) + f_7^(0),   f_7^(1) = f_6^(0) − f_7^(0).

Next,

    f_0^(2) = f_0^(1) + f_2^(1),   f_1^(2) = f_1^(1) + i f_3^(1),
    f_2^(2) = f_0^(1) − f_2^(1),   f_3^(2) = f_1^(1) − i f_3^(1),
    f_4^(2) = f_4^(1) + f_6^(1),   f_5^(2) = f_5^(1) + i f_7^(1),
    f_6^(2) = f_4^(1) − f_6^(1),   f_7^(2) = f_5^(1) − i f_7^(1).

Finally, the sampled signal values are

    f(x_0) = f_0^(3) = f_0^(2) + f_4^(2),
    f(x_1) = f_1^(3) = f_1^(2) + (√2/2)(1 + i) f_5^(2),
    f(x_2) = f_2^(3) = f_2^(2) + i f_6^(2),
    f(x_3) = f_3^(3) = f_3^(2) − (√2/2)(1 − i) f_7^(2),
    f(x_4) = f_4^(3) = f_0^(2) − f_4^(2),
    f(x_5) = f_5^(3) = f_1^(2) − (√2/2)(1 + i) f_5^(2),
    f(x_6) = f_6^(3) = f_2^(2) − i f_6^(2),
    f(x_7) = f_7^(3) = f_3^(2) + (√2/2)(1 − i) f_7^(2).

13.2. Wavelets

Trigonometric Fourier series, both continuous and discrete, are amazingly powerful, but they do suffer from one potentially serious defect. The basis functions e^{ikx} = cos kx + i sin kx are spread out over the entire interval [−π, π], and so are not well-suited to processing localized signals, meaning data that are concentrated in a relatively small region. Indeed, the most concentrated data of all, a single delta function, has every Fourier component of equal magnitude in its Fourier series (12.61), and so its high degree of localization is completely obscured. Ideally, one would like to construct a system of functions that is orthogonal, and so has all the advantages of the Fourier trigonometric functions, but, in addition, adapts to localized structures in signals. This dream was the inspiration for the development of the modern theory of wavelets.

The Haar Wavelets

Although the modern era of wavelets started in the mid 1980's, the simplest example of a wavelet basis was discovered by the Hungarian mathematician Alfréd Haar in 1910, [88]. We consider the space of functions (signals) defined on the interval [0, 1], equipped with the standard L² inner product

    ⟨ f , g ⟩ = ∫₀¹ f(x) g(x) dx.      (13.39)

The Haar wavelets are certain piecewise constant functions; the first four are graphed in Figure 13.7. The first is the box function

    φ₁(x) = φ(x) = { 1,  0 < x ≤ 1;   0,  otherwise, }      (13.40)

known as the scaling function, for reasons that shall appear shortly. Although we are only interested in the value of φ(x) on the interval [0, 1], it will be convenient to extend it, and all the other wavelets, to be zero outside the basic interval. Moreover, at the points of discontinuity it will be more convenient to consistently choose the left hand limiting value, unlike the Fourier series midpoint value. This choice is merely for convenience, and is not critical. Working on [0, 1], rather than [−π, π] or [0, 2π], is slightly better suited to our construction; the usual scaling arguments can be used to adapt the wavelet formulas to any other interval.

The second Haar function,

    φ₂(x) = w(x) = { 1,  0 < x ≤ ½;   −1,  ½ < x ≤ 1;   0,  otherwise, }      (13.41)

is known as the mother wavelet.

Figure 13.7. The First Four Haar Wavelets φ₁(x), φ₂(x), φ₃(x), φ₄(x).

The third and fourth Haar functions are compressed versions of the mother wavelet:

    φ₃(x) = w(2x)     = { 1,  0 < x ≤ ¼;   −1,  ¼ < x ≤ ½;   0,  otherwise, }
    φ₄(x) = w(2x − 1) = { 1,  ½ < x ≤ ¾;   −1,  ¾ < x ≤ 1;   0,  otherwise, }

called daughter wavelets. Conversely, we can represent the mother wavelet by compressing and translating the scaling function:

    w(x) = φ(2x) − φ(2x − 1).      (13.42)

The scaling transformation x ↦ 2x serves to compress the wavelet function, while the translation 2x ↦ 2x − 1 moves the compressed version to the right by half a unit. It is these two operations of scaling and translation, coupled with the all-important orthogonality, that underlie the power of wavelets. One can easily check, by direct evaluation of the integrals, that the four Haar wavelet functions are orthogonal with respect to the L² inner product (13.39).

The Haar wavelets have an evident discretization. If we decompose the interval (0, 1] into the four subintervals

    ( 0, ¼ ],   ( ¼, ½ ],   ( ½, ¾ ],   ( ¾, 1 ],      (13.43)

on which the four wavelet functions are constant, then we can represent each of them by a vector in R⁴ whose entries are the values of the wavelet function sampled at the left endpoint of each subinterval. In this manner, we obtain the wavelet sample vectors

    v₁ = (1, 1, 1, 1)ᵀ,   v₂ = (1, 1, −1, −1)ᵀ,   v₃ = (1, −1, 0, 0)ᵀ,   v₄ = (0, 0, 1, −1)ᵀ,      (13.44)

that form the orthogonal wavelet basis of R⁴ we encountered in Examples 2.35 and 5. Orthogonality of the vectors (13.44) with respect to the standard Euclidean dot product is equivalent to orthogonality of the Haar wavelet functions with respect to the inner product (13.39). Indeed, if

    f(x) ∼ f = (f₁, f₂, f₃, f₄)   and   g(x) ∼ g = (g₁, g₂, g₃, g₄)

are piecewise constant real functions that achieve the indicated values on the four subintervals (13.43), then their L² inner product

    ⟨ f , g ⟩ = ∫₀¹ f(x) g(x) dx = ¼ ( f₁ g₁ + f₂ g₂ + f₃ g₃ + f₄ g₄ ) = ¼ f · g

is equal to the averaged dot product of their sample values, which is the real form of the inner product (13.8) that was used in the discrete Fourier transform.

Since the vectors (13.44) form an orthogonal basis of R⁴, we can uniquely decompose any such piecewise constant function as a linear combination of wavelets,

    f(x) = c₁ φ₁(x) + c₂ φ₂(x) + c₃ φ₃(x) + c₄ φ₄(x),

or, equivalently, in terms of the sample vectors,

    f = c₁ v₁ + c₂ v₂ + c₃ v₃ + c₄ v₄.

The required coefficients

    c_k = ⟨ f , φ_k ⟩ / ‖φ_k‖² = f · v_k / ‖v_k‖²

are fixed by our usual orthogonality formula (5.7). Explicitly,

    c₁ = ¼ (f₁ + f₂ + f₃ + f₄),   c₂ = ¼ (f₁ + f₂ − f₃ − f₄),
    c₃ = ½ (f₁ − f₂),             c₄ = ½ (f₃ − f₄).      (13.45)

Before proceeding to the more general case, let us introduce an important analytical definition that quantifies precisely how localized a function is.

Definition 13.5. The support of a function f(x), written supp f, is the closure of the set where f(x) ≠ 0.

Intuitively, a point will belong to the support of f(x) provided f is not zero there, or at least is not zero at nearby points. More precisely:

Lemma 13.6. A point a ∈ supp f if and only if there exists a convergent sequence x_n → a such that f(x_n) ≠ 0. Conversely, a ∉ supp f if and only if f(x) ≡ 0 on an interval a − δ < x < a + δ for some δ > 0.

Thus, if f(a) ≠ 0, then a ∈ supp f. For example, the support of the Haar mother wavelet (13.41) is supp w = [0, 1]; the point x = 0 is included, even though w(0) = 0, because w(x) ≠ 0 at nearby points. Intuitively, the smaller the support of a function, the more localized it is. An extreme case is the delta function, whose support is a single point. In contrast, the support of the Fourier trigonometric basis functions is all of R, since they only vanish at isolated points.

The effect of scalings and translations on the support of a function is easily discerned.

Lemma 13.7. If supp f = [a, b], and g(x) = f(r x − δ), then

    supp g = [ (a + δ)/r , (b + δ)/r ].

In other words, scaling x by a factor r compresses the support of the function by a factor 1/r, while translating x translates the support of the function.

The key requirement for a wavelet basis is that it contain functions with arbitrarily small support. The two daughter wavelets have smaller support,

    supp φ₃ = [ 0, ½ ],   supp φ₄ = [ ½, 1 ],

and so are twice as localized.
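The four-sample Haar decomposition (13.45) and its inversion are easy to spell out in code; this small Python sketch (our names, not the text's) computes the coefficients and reconstructs the samples.

```python
def haar4_coeffs(f):
    # wavelet coefficients (13.45) of a 4-sample signal
    f1, f2, f3, f4 = f
    return ((f1 + f2 + f3 + f4) / 4,
            (f1 + f2 - f3 - f4) / 4,
            (f1 - f2) / 2,
            (f3 - f4) / 2)

def haar4_reconstruct(c):
    # invert: f = c1 v1 + c2 v2 + c3 v3 + c4 v4, with the sample vectors (13.44)
    c1, c2, c3, c4 = c
    return [c1 + c2 + c3, c1 + c2 - c3, c1 - c2 + c4, c1 - c2 - c4]
```

For example, the samples (4, 2, 5, 1) decompose into mean 3, coarse difference 0, and fine differences 1 and 2.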

The full Haar wavelet basis is obtained from the mother wavelet by iterating the scaling and translation processes. For any "generation" j ≥ 0, we form the wavelet offspring by first compressing the mother wavelet so that its support fits into an interval of length 2^{−j},

    w_{j,0}(x) = w(2^j x),   so that   supp w_{j,0} = [0, 2^{−j}],      (13.46)

and then translating w_{j,0} so as to fill up the entire interval [0, 1], defining

    w_{j,k}(x) = w_{j,0}(x − 2^{−j} k) = w(2^j x − k),   k = 0, 1, …, 2^j − 1.      (13.47)

Lemma 13.7 implies that supp w_{j,k} = [2^{−j} k, 2^{−j}(k+1)], each of length 2^{−j}, and so the combined supports of the j-th generation of wavelets fill the entire interval:

    ⋃_{k=0}^{2^j − 1} supp w_{j,k} = [0, 1].

The primal generation, j = 0, just consists of the mother wavelet w_{0,0}(x) = w(x). The first generation, j = 1, consists of the two daughter wavelets already introduced as φ₃ and φ₄, namely w_{1,0}(x) = w(2x) and w_{1,1}(x) = w(2x − 1). The second generation, j = 2, appends four additional granddaughter wavelets to our basis:

    w_{2,0}(x) = w(4x),   w_{2,1}(x) = w(4x − 1),   w_{2,2}(x) = w(4x − 2),   w_{2,3}(x) = w(4x − 3).

The 8 Haar wavelets φ, w_{0,0}, w_{1,0}, w_{1,1}, w_{2,0}, w_{2,1}, w_{2,2}, w_{2,3} are constant on the 8 subintervals of length ⅛, taking the successive sample values indicated by the columns of the matrix

          ⎛ 1   1   1   0   1   0   0   0 ⎞
          ⎜ 1   1   1   0  −1   0   0   0 ⎟
          ⎜ 1   1  −1   0   0   1   0   0 ⎟
    W₈ =  ⎜ 1   1  −1   0   0  −1   0   0 ⎟      (13.48)
          ⎜ 1  −1   0   1   0   0   1   0 ⎟
          ⎜ 1  −1   0   1   0   0  −1   0 ⎟
          ⎜ 1  −1   0  −1   0   0   0   1 ⎟
          ⎝ 1  −1   0  −1   0   0   0  −1 ⎠

Orthogonality of the wavelets is manifested in the orthogonality of the columns of W₈. (Unfortunately, the usual terminological constraints of Definition 5.18 prevent us from calling W₈ an orthogonal matrix, because its columns are not orthonormal!)

The n-th stage consists of 2^{n+1} different wavelet functions, comprising the scaling function and all the generations up to the n-th: w₀(x) = φ(x) and w_{j,k}(x) for 0 ≤ j ≤ n and 0 ≤ k < 2^j. They are all constant on each subinterval of length 2^{−n−1}.

Theorem 13.8. The wavelet functions φ(x), w_{j,k}(x) form an orthogonal system with respect to the inner product (13.39).
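At the discrete level, the orthogonality asserted in Theorem 13.8 amounts to the orthogonality of the columns of W₈, which is easy to confirm numerically (a quick sketch of ours, not the text's code).

```python
# sample values of the 8 Haar wavelets on the 8 subintervals: the columns of W8 in (13.48)
W8 = [
    [1,  1,  1,  0,  1,  0,  0,  0],
    [1,  1,  1,  0, -1,  0,  0,  0],
    [1,  1, -1,  0,  0,  1,  0,  0],
    [1,  1, -1,  0,  0, -1,  0,  0],
    [1, -1,  0,  1,  0,  0,  1,  0],
    [1, -1,  0,  1,  0,  0, -1,  0],
    [1, -1,  0, -1,  0,  0,  0,  1],
    [1, -1,  0, -1,  0,  0,  0, -1],
]

def col_dot(a, b):
    # Euclidean dot product of columns a and b of W8
    return sum(row[a] * row[b] for row in W8)
```

The off-diagonal dot products all vanish, while the diagonal values 8, 8, 4, 4, 2, 2, 2, 2 are 8 times the squared L² norms 1, 1, ½, ½, ¼, ¼, ¼, ¼, consistent with the averaged inner product ⟨f, g⟩ = ⅛ f · g on 8 samples.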

Proof: First, note that each wavelet w_{j,k}(x) is equal to +1 on an interval of length 2^{−j−1} and to −1 on an adjacent interval of the same length. Therefore,

    ⟨ w_{j,k} , φ ⟩ = ∫₀¹ w_{j,k}(x) dx = 0,      (13.49)

since the +1 and −1 contributions cancel each other. If two different wavelets w_{j,k} and w_{l,m}, with, say, j ≤ l, have supports which are either disjoint, or just overlap at a single point, then their product w_{j,k}(x) w_{l,m}(x) ≡ 0, and so their inner product is clearly zero:

    ⟨ w_{j,k} , w_{l,m} ⟩ = ∫₀¹ w_{j,k}(x) w_{l,m}(x) dx = 0.

Otherwise, except in the case when the two wavelets are identical, the support of w_{l,m} is entirely contained in an interval where w_{j,k} is constant, and so w_{j,k}(x) w_{l,m}(x) = ± w_{l,m}(x). Therefore, by (13.49),

    ⟨ w_{j,k} , w_{l,m} ⟩ = ± ∫₀¹ w_{l,m}(x) dx = 0.

Finally, we compute

    ‖φ‖² = ∫₀¹ dx = 1,   ‖w_{j,k}‖² = ∫₀¹ w_{j,k}(x)² dx = 2^{−j},      (13.50)

the second formula following from the fact that |w_{j,k}(x)| = 1 on an interval of length 2^{−j} and is 0 elsewhere.   Q.E.D.

In direct analogy with the trigonometric Fourier series, the wavelet series of a signal f(x) is given by

    f(x) ∼ c₀ φ(x) + Σ_{j=0}^{∞} Σ_{k=0}^{2^j − 1} c_{j,k} w_{j,k}(x).      (13.51)

Orthogonality implies that the wavelet coefficients c₀, c_{j,k} can be immediately computed using the standard inner product formula coupled with (13.50):

    c₀ = ⟨ f , φ ⟩ / ‖φ‖² = ∫₀¹ f(x) dx,
    c_{j,k} = ⟨ f , w_{j,k} ⟩ / ‖w_{j,k}‖²
            = 2^j ∫_{2^{−j}k}^{2^{−j}k + 2^{−j−1}} f(x) dx − 2^j ∫_{2^{−j}k + 2^{−j−1}}^{2^{−j}(k+1)} f(x) dx.      (13.52)

The convergence properties of the wavelet series (13.51) are similar to those of Fourier series; details can be found in [52].

Example 13.9. In Figure 13.8, we plot the Haar expansions of the signal in the first plot. The subsequent plots show the partial sums over j = 0, …, r with r = 2, 3, 4, 5. We have used a discontinuous signal to demonstrate that there is no nonuniform Gibbs phenomenon in a Haar wavelet expansion. Indeed, since the wavelets are themselves discontinuous, they do not have any difficulty uniformly converging to a discontinuous function. In the last plot, we combine a total of 2⁶ = 64 Haar wavelets, which is considerably more than would be required in a comparably accurate Fourier expansion (excluding points very close to the discontinuity). On the other hand, it takes quite a few wavelets to begin to accurately reproduce the signal.

Figure 13.8. Haar Wavelet Expansion.

Just as the discrete Fourier representation arises from a sampled version of the full Fourier series, so there is a discrete wavelet transformation for suitably sampled signals. To take full advantage of the wavelet basis, we sample the signal f(x) at n = 2^r equally spaced sample points x_k = k/2^r, for k = 0, …, n − 1, on the interval [0, 1]. We can then identify the sampled signal with the vector

    f = ( f(x₀), f(x₁), …, f(x_{n−1}) )ᵀ = ( f₀, f₁, …, f_{n−1} )ᵀ ∈ Rⁿ.      (13.53)

Since we are only sampling on intervals of length 2^{−r}, the discrete wavelet transform of our sampled signal will only use the first n = 2^r wavelets φ(x) and w_{j,k}(x) for j = 0, …, r − 1 and k = 0, …, 2^j − 1. Let w₀ ∼ φ(x) and w_{j,k} ∼ w_{j,k}(x) denote the corresponding sampled wavelet vectors; all of their entries are either +1, −1, or 0. (When r = 3, so n = 8, these are the columns of the wavelet matrix (13.48).) Moreover, orthogonality of the wavelets immediately implies that the wavelet vectors form an orthogonal basis of Rⁿ with n = 2^r.

Remark: To the novice, there may appear to be many more wavelets than trigonometric functions. But this is just another illusion of the magic show of infinite dimensional space. The point is that they both form a countably infinite set of functions, and so could, if necessary, be numbered in order 1, 2, 3, … .

We can decompose our sample vector (13.53) as a linear combination of the sampled wavelets,

    f = c₀ w₀ + Σ_{j=0}^{r−1} Σ_{k=0}^{2^j − 1} c_{j,k} w_{j,k},      (13.54)
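The discrete Haar transform that results, namely the mean value c₀ together with the normalized block differences c_{j,k}, can be sketched in Python as follows (function names are ours; a sketch of the construction, not the book's code).

```python
def haar_transform(f):
    # discrete Haar wavelet coefficients of a signal sampled at n = 2^r points
    n = len(f)
    r = n.bit_length() - 1
    assert 1 << r == n
    c0 = sum(f) / n                # mean value: coefficient of the scaling function
    coeffs = {}
    for j in range(r):
        m = n >> j                 # w_{j,k} is supported on a block of m samples
        for k in range(1 << j):
            start = k * m
            plus = sum(f[start:start + m // 2])
            minus = sum(f[start + m // 2:start + m])
            coeffs[j, k] = (plus - minus) * (1 << j) / n
    return c0, coeffs

def haar_reconstruct(c0, coeffs, n):
    # rebuild the samples by summing the sampled wavelets with their coefficients
    f = [c0] * n
    for (j, k), c in coeffs.items():
        m = n >> j
        for i in range(k * m, k * m + m // 2):
            f[i] += c
        for i in range(k * m + m // 2, (k + 1) * m):
            f[i] -= c
    return f
```

For n = 4 this reproduces the coefficients (13.45); compression and denoising then amount to discarding or shrinking small entries of `coeffs` before reconstructing.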

where, by our usual orthogonality formulae,

    c₀ = ⟨ f , w₀ ⟩ / ‖w₀‖² = 2^{−r} Σ_{i=0}^{2^r − 1} f_i,
    c_{j,k} = ⟨ f , w_{j,k} ⟩ / ‖w_{j,k}‖²
            = 2^{j−r} ( Σ_{i=κ}^{κ + 2^{r−j−1} − 1} f_i  −  Σ_{i=κ + 2^{r−j−1}}^{κ + 2^{r−j} − 1} f_i ),
      with κ = 2^{r−j} k.      (13.55)

These are the basic formulae connecting a function f(x), or, rather, its sample vector f, with its discrete wavelet transform, consisting of the 2^r coefficients c₀, c_{j,k}. The reconstructed function

    f̃(x) = c₀ φ(x) + Σ_{j=0}^{r−1} Σ_{k=0}^{2^j − 1} c_{j,k} w_{j,k}(x)      (13.56)

is constant on each subinterval of length 2^{−r}, and has the same value

    f̃(x) = f̃(x_i) = f(x_i) = f_i,    x_i = 2^{−r} i ≤ x < x_{i+1} = 2^{−r}(i + 1),

as our signal at the left hand endpoint of the interval. In other words, we are interpolating the sample points by a piecewise constant (and thus discontinuous) function.

Modern Wavelets

The main defect of the Haar wavelets is that they do not provide a very efficient means of representing even very simple functions; it takes quite a large number of wavelets to reproduce signals with any degree of precision. The reason is that the Haar wavelets are piecewise constant, and so even an affine function y = α x + β requires many sample values, and hence a relatively extensive collection of Haar wavelets, to be accurately reproduced. In particular, compression and denoising algorithms based on Haar wavelets are either insufficiently precise or hopelessly inefficient, and hence of minor practical value.

For a long time it was thought impossible to simultaneously achieve the three requirements of localization, orthogonality, and accurate reproduction of simple functions. The breakthrough came in 1988, when the Belgian mathematician Ingrid Daubechies produced the first examples of wavelet bases that realized all three basic criteria. Since then, wavelets have developed into a sophisticated and burgeoning industry with major impact on modern technology. Significant applications include compression, the storage and recognition of fingerprints in the FBI's data base, and the JPEG2000 image format, which, unlike the earlier Fourier-based JPEG standards, incorporates wavelet technology in its image compression and reconstruction algorithms. In this section, we will present a brief outline of the basic ideas underlying Daubechies' remarkable construction.

The recipe for any wavelet system involves two basic ingredients: a scaling function and a mother wavelet. The latter can be constructed from the scaling function by a prescription similar to that in (13.42), and therefore we first concentrate on the properties of the scaling function. The key requirement is that the scaling function must solve a

. Olver . dilation equation of the form p The Hat Function. . 2 − x. namely ϕ(x) = ϕ(2 x) + ϕ(2 x − 1). Even to prove that (nonzero) solutions exist is a nontrivial analytical problem.1.9. cp . c2 = 1 2. otherwise. .59) ϕ(x) =  0. The dilation equation relates the function ϕ(x) to a finite linear combination of its compressed translates. . . Another example of a scaling function is the hat function  0 ≤ x ≤ 1.10.2 -2 -1 -0. The hat function satisfies the dilation equation ϕ(2 x) + ϕ(2 x − 1) + = 1. (13. ϕ(x) = which is (13. . (13. The coefficients c0 . whose variants play a starring role in the finite element method.6 0.167). (13. Indeed. 1 2 graphed in Figure 13.58) We recommend that you convince yourself of the validity of this identity before continuing. and.57) for some collection of constants c0 . Example 13.60) 1 2 . is not so easy to solve.11. Since we already know two 2/25/07 714 c 2006 Peter J. The Haar or box scaling function (13.8 0. 1 2 ϕ(2 x − 2). (11.57) with c0 = this identity by hand. 1 ≤ x ≤ 2. since the properties of orthogonality and localization will impose certain rather stringent requirements. .40) satisfies the dilation equation (13. Example 13. the reader should be able to check The dilation equation (13.  x. as such. cf.57) with c0 = c1 = 1.2 1 2 3 4 Figure 13.9. ϕ(x) = k=0 ck ϕ(2 x − k) = c0 ϕ(2 x) + c1 ϕ(2 x − 1) + · · · + cp ϕ(2 x − p) (13.4 0. . c1 Again. the mathematics of functional equations remains much less well developed than that of differential equations or integral equations.2 1 0.57) is a kind of functional equation. cp are not arbitrary.

The dilation equation (13.57) is a kind of functional equation, and, as such, is not so easy to solve. Indeed, the mathematics of functional equations remains much less well developed than that of differential or integral equations, and even to prove that (nonzero) solutions exist is a nontrivial analytical problem. Since we already know two explicit examples, let us defer the discussion of solution techniques until we understand how the dilation equation can be used to construct a wavelet basis.

Given a solution to the dilation equation, we define the mother wavelet to be

    w(x) = Σ_{k=0}^{p} (−1)^k c_{p−k} φ(2x − k)
         = c_p φ(2x) − c_{p−1} φ(2x − 1) + c_{p−2} φ(2x − 2) − ⋯ ± c₀ φ(2x − p).      (13.61)

This formula directly generalizes the Haar wavelet relation (13.42), in light of its dilation equation (13.58). The daughter wavelets are then all found, as in the Haar basis, by iteratively compressing and translating the mother wavelet:

    w_{j,k}(x) = w(2^j x − k).      (13.62)

In the general framework, we do not necessarily restrict our attention to the interval [0, 1], and so j and k can, in principle, be arbitrary integers.

Let us investigate what sort of conditions should be imposed on the dilation coefficients c₀, …, c_p in order that we obtain a viable wavelet basis by this construction. First, localization of the wavelets requires that the scaling function have bounded support, so that φ(x) ≡ 0 when x lies outside some bounded interval [a, b]. If we integrate both sides of (13.57), we find

    ∫_a^b φ(x) dx = ∫_{−∞}^{∞} φ(x) dx = Σ_{k=0}^{p} c_k ∫_{−∞}^{∞} φ(2x − k) dx.      (13.63)

Now using the change of variables y = 2x − k, with dx = ½ dy, we find

    ∫_{−∞}^{∞} φ(2x − k) dx = ½ ∫_{−∞}^{∞} φ(y) dy = ½ ∫_a^b φ(x) dx,      (13.64)

where we revert to x as our (dummy) integration variable. We substitute this result back into (13.63). Assuming that ∫_a^b φ(x) dx ≠ 0, we discover that the dilation coefficients must satisfy

    c₀ + c₁ + ⋯ + c_p = 2.      (13.65)

Example 13.12. Once we impose the constraint (13.65), the very simplest version of the dilation equation is

    φ(x) = 2 φ(2x),      (13.66)

where c₀ = 2 is the only (nonzero) coefficient. Up to constant multiple, the only "solutions" of the functional equation (13.66) with bounded support are scalar multiples of the delta function δ(x), a fact that follows from an identity established in the exercises for Chapter 11. Other solutions, such as φ(x) = 1/x, are not localized, and thus not useful for constructing a wavelet basis.

The second condition we require is orthogonality of the wavelets. For simplicity, we only consider the standard L² inner product

    ⟨ f , g ⟩ = ∫_{−∞}^{∞} f(x) g(x) dx.

(In all instances, the functions have bounded support, and so the inner product integral reduces to an integral over a finite interval where both f and g are nonzero.) It turns out that the orthogonality of the complete wavelet system is guaranteed once we know that the scaling function is orthogonal to all its integer translates:

    ⟨ φ(x) , φ(x − m) ⟩ = ∫_{−∞}^{∞} φ(x) φ(x − m) dx = 0   for all m ≠ 0.      (13.67)

It is easy to see that, by translation invariance, φ(x − k) and φ(x − l) are then orthogonal for any k ≠ l. We first note the formula

    ⟨ φ(2x − k) , φ(2x − l) ⟩ = ½ ∫_{−∞}^{∞} φ(x) φ(x + k − l) dx = ½ ⟨ φ(x) , φ(x + k − l) ⟩,      (13.68)

which follows from the same change of variables y = 2x − k used in (13.64). Therefore, since φ satisfies the dilation equation (13.57),

    ⟨ φ(x) , φ(x − m) ⟩
      = ⟨ Σ_{j=0}^{p} c_j φ(2x − j) , Σ_{k=0}^{p} c_k φ(2x − 2m − k) ⟩
      = Σ_{j,k=0}^{p} c_j c_k ⟨ φ(2x − j) , φ(2x − 2m − k) ⟩
      = ½ Σ_{j,k=0}^{p} c_j c_k ⟨ φ(x) , φ(x + j − 2m − k) ⟩.      (13.69)

If we require orthogonality (13.67) of all the integer translates of φ, then the left hand side of this identity will be 0 unless m = 0, while only the summands with j = 2m + k will be nonzero on the right. Therefore, orthogonality requires that

    Σ_{0 ≤ k ≤ p − 2m} c_{2m+k} c_k = { 2,  m = 0;   0,  m ≠ 0. }      (13.70)

The algebraic equations (13.65, 70) for the dilation coefficients are the key requirements for the construction of an orthogonal wavelet basis.

For example, if we have just two nonzero coefficients c₀, c₁, then (13.65, 70) reduce to

    c₀ + c₁ = 2,   c₀² + c₁² = 2,

and so c₀ = c₁ = 1 is the only solution, resulting in the Haar dilation equation (13.58). If we have three coefficients c₀, c₁, c₂, then (13.65, 70) require

    c₀ + c₁ + c₂ = 2,   c₀² + c₁² + c₂² = 2,   c₀ c₂ = 0.

Thus either c₂ = 0 or c₀ = 0, and we are back to the Haar case: the resulting dilation equation is a simple reformulation of (13.58). In particular, the hat function (13.59) does not give rise to orthogonal wavelets, since its coefficients fail the second condition: ¼ + 1 + ¼ ≠ 2.

The remarkable fact, discovered by Daubechies, is that there is a nontrivial solution for four (and, indeed, any even number) of nonzero coefficients c₀, c₁, c₂, c₃. The basic equations (13.65, 70) require

    c₀ + c₁ + c₂ + c₃ = 2,   c₀² + c₁² + c₂² + c₃² = 2,   c₀ c₂ + c₁ c₃ = 0.      (13.71)

The particular values

    c₀ = (1 + √3)/4,   c₁ = (3 + √3)/4,   c₂ = (3 − √3)/4,   c₃ = (1 − √3)/4      (13.72)

solve (13.71), as direct computation confirms.
ϕ(x − m) = = (−1) j =0 p j+1 cj ϕ(2 x − 1 + j) . The particular values c0 = √ 1+ 3 4 .70) require c0 + c1 + c2 + c3 = 2. The remarkable fact. by compression and translation (13. It is easy to see that.Thus either c2 = 0. (13. the hat function (13. Before explaining how to solve the Daubechies dilation equation. . and we are back to the Haar case. c1 . k where the sum is over all 0 ≤ k ≤ p such that 0 ≤ 1 − 2 m − k ≤ p. (13. along with all her wavelet descendants w(2j x − k).72) solve (13. c3 .71). By orthogonality (13.k (x). √ 3− 3 4 . c0 c2 + c1 c3 = 0. Next we prove orthogonality of ϕ(x − m) and w(x): p p w(x) . c1 = c2 = 1. or c0 = 0. These coefficients correspond to the Daubechies dilation equation ϕ(x) = √ 1+ 3 4 ϕ(2 x) + ϕ(2 x − 1) + ϕ(2 x − 2) + ϕ(2 x − 3). . any even number) of nonzero coefficients c0 .65). 2/25/07 717 c 2006 Peter J. see Exercise . ϕ(x − 1 + j − 2 m − k) . √ 1− 3 4 . the resulting coefficient of ϕ(x) 2 is (−1)k c1−2 m−k ck = 0. discovered by Daubechies.62).68). k=0 ck ϕ(2 x − 2 m − k) j. (13. . = 1 2 j. and the resulting dilation equation is a simple reformulation of the Haar case. a mother wavelet w(x) = √ 1− 3 4 ϕ(2 x) − √ 3− 3 4 ϕ(2 x − 1) + √ 3+ 3 4 ϕ(2 x − 2) − √ 1+ 3 4 ϕ(2 x − 3). Each term in the sum appears twice.67) of the translates of ϕ. In particular.71) c1 = √ 3+ 3 4 c2 = √ 3− 3 4 c3 = √ 1− 3 4 (13. 0 1 2 3 √ 3+ 3 4 .k = 0 p (−1)j+1 cj ck ϕ(2 x − 1 + j) . cp are! The proof of orthogonality of the translates w(x − m) of the mother wavelet. . c2 . c0 = c1 = 1. the complete system of orthogonal wavelets wj.59) does not give rise to orthogonal wavelets. The basic equations (13. ϕ(2 x − 2 m − k) (−1)j+1 cj ck ϕ(x) . let us complete the proof of orthogonality. by translation invariance. the only summands that are nonzero are when j = 2 m + k + 1. and the details are left as an exercise for the reader. since ϕ(x) and ϕ(x − m) are orthogonal for any m = 0. and hence the result is always zero — no matter what the coefficients c0 . 
These coefficients correspond to the Daubechies dilation equation

    φ(x) = (1+√3)/4 · φ(2x) + (3+√3)/4 · φ(2x − 1) + (3−√3)/4 · φ(2x − 2) + (1−√3)/4 · φ(2x − 3).      (13.73)

Any nonzero solution of bounded support to this remarkable functional equation will give rise to a scaling function φ(x), a mother wavelet

    w(x) = (1−√3)/4 · φ(2x) − (3−√3)/4 · φ(2x − 1) + (3+√3)/4 · φ(2x − 2) − (1+√3)/4 · φ(2x − 3),      (13.74)

obtained from (13.61), and then, by compression and translation (13.62), the complete system of orthogonal wavelets w_{j,k}(x).

Before explaining how to solve the Daubechies dilation equation, let us complete the proof of orthogonality. We must show that the mother wavelet is orthogonal to the translates of the scaling function. Using (13.61), the dilation equation, and (13.68),

    ⟨ w(x) , φ(x − m) ⟩
      = ⟨ Σ_{j=0}^{p} (−1)^j c_{p−j} φ(2x − j) , Σ_{k=0}^{p} c_k φ(2x − 2m − k) ⟩
      = ½ Σ_{j,k=0}^{p} (−1)^j c_{p−j} c_k ⟨ φ(x) , φ(x + j − 2m − k) ⟩.

By orthogonality (13.67) of the translates of φ, the only nonzero summands are those with j = 2m + k, and the resulting coefficient of ‖φ(x)‖² is

    ½ Σ_k (−1)^k c_k c_{p−2m−k},

the sum being over all 0 ≤ k ≤ p with 0 ≤ p − 2m − k ≤ p. Each term in this sum appears twice, with opposite signs: the summands for k and k′ = p − 2m − k are (−1)^k c_k c_{k′} and (−1)^{k′} c_{k′} c_k, and since the number of coefficients is even, p is odd, so (−1)^{k′} = −(−1)^k. Hence the result is always zero, no matter what the coefficients c₀, …, c_p are! The proof of orthogonality of the translates w(x − m) of the mother wavelet relies on a similar argument, and the details are left as an exercise for the reader.

Solving the Dilation Equation

Let us next discuss how to solve the dilation equation (13.57). The solution we are after does not have an elementary formula, and we require a slightly sophisticated approach to recover it. The key observation is that (13.57) has the form of a fixed point equation

    φ = F[φ],

not in ordinary Euclidean space, but in an infinite-dimensional function space. With luck, the fixed point (or, more correctly, fixed function) will be stable, and so, starting with a suitable initial guess φ₀(x), the successive iterates φ_{n+1} = F[φ_n] will converge to the desired solution: φ_n(x) → φ(x). In detail, the iterative version of the dilation equation (13.57) reads

    φ_{n+1}(x) = Σ_{k=0}^{p} c_k φ_n(2x − k),    n = 0, 1, 2, … .      (13.75)

A reasonable choice for the initial guess is the Haar scaling or box function

    φ₀(x) = { 1,  0 < x ≤ 1;   0,  otherwise. }

Before attempting to prove convergence of this iterative procedure to the Daubechies scaling function, let us experimentally investigate what happens. In Figure 13.10 we graph the next 5 iterates φ₁(x), …, φ₅(x). There clearly appears to be convergence to some function φ(x), although the final result does look a little bizarre. Bolstered by this preliminary experimental evidence, we can now try to prove convergence of the iterative scheme. This turns out to be true.

Theorem 13.13. The functions φ_n(x) defined by the iterative functional equation (13.75) converge uniformly to a continuous function φ(x), called the Daubechies scaling function.

A fully rigorous proof relies on the Fourier transform, [52], but is a little too advanced for this text and will be omitted.

Figure 13.10. Approximating the Daubechies Wavelet.
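The iteration (13.75) can be carried out numerically on a fixed dyadic grid covering the support [0, 3]; this "cascade" sketch in Python (grid resolution, iteration count, and names are our choices, not the text's) converges to the Daubechies scaling function, and in particular its values at x = 1 and x = 2 settle down to (1 + √3)/2 and (1 − √3)/2.

```python
from math import sqrt

s = sqrt(3)
c = [(1 + s) / 4, (3 + s) / 4, (3 - s) / 4, (1 - s) / 4]
N = 6                      # dyadic grid of spacing 2^-N on the support [0, 3]
M = 3 * (1 << N)           # index of the right endpoint x = 3

def cascade_step(phi):
    # one application of phi_{n+1}(x) = sum_k c_k phi_n(2x - k) on the grid
    new = [0.0] * (M + 1)
    for m in range(M + 1):
        for k, ck in enumerate(c):
            idx = 2 * m - k * (1 << N)   # grid index of the point 2x - k
            if 0 <= idx <= M:
                new[m] += ck * phi[idx]
    return new

# initial guess: the box function, with the left-continuous convention
phi = [1.0 if 0 < m <= (1 << N) else 0.0 for m in range(M + 1)]
for _ in range(40):
    phi = cascade_step(phi)
```

Since 2x − k lands back on the same grid, each step only shuffles and combines existing values; the error in the integer-point values is halved at every step, matching the eigenvalue ½ analysis carried out below by hand.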

Once we have established convergence, we are able to verify that the scaling function and the consequent system of wavelets form an orthogonal system of functions.

Proposition 13.14. All integer translates φ(x − k), for k ∈ Z, of the Daubechies scaling function, and all wavelets w_{j,k}(x) = w(2^j x − k), j ≥ 0, are mutually orthogonal functions with respect to the L² inner product. Moreover, ‖φ‖² = 1, while ‖w_{j,k}‖² = 2^{−j}.

Proof: As noted earlier, the orthogonality of the entire wavelet system will follow once we know the orthogonality (13.67) of the scaling function and its integer translates. We use induction to prove that this holds for all the iterates φ_n(x), and so, in view of uniform convergence, the limiting scaling function also satisfies this property. Details are relegated to the exercises.   Q.E.D.

In practical computations, the limiting procedure for constructing the scaling function is not so convenient, and an alternative means of computing its values is employed. The starting point is to determine its values at integer points. First, the initial box function has values φ₀(m) = 0 for all integers m ∈ Z except φ₀(1) = 1. The iterative functional equation (13.75) will then produce the values of the iterates φ_n(m) at integer points m ∈ Z. A simple induction will convince you that φ_n(m) = 0 except for m = 1 and m = 2, and, therefore, since all other terms vanish,

    φ_{n+1}(1) = (3+√3)/4 · φ_n(1) + (1+√3)/4 · φ_n(2),
    φ_{n+1}(2) = (1−√3)/4 · φ_n(1) + (3−√3)/4 · φ_n(2).
This has the form of a linear iterative system v(n+1) = A v(n) with coefficient matrix A= √ 3+ 3 4 √ 1− 3 4 √ 1+ 3 4 √ 3− 3 4 (13. Proposition 13.

(13. Note that supp ϕ = supp w = [ 0.77) With this in hand.5 -0.5 2 1. . meaning rational numbers of the form x = m/2j for m. in practice. [127. The Daubechies scaling function is. because when x = 1 m then 2 x − k = m − k is an 2 integer. Once we know its values at the half integers.73) then prescribes the func1 tion values ϕ 2 m at all half integers. and so only dyadic numbers actually reside in a computer’s memory. j ∈ Z.5 -0. 2 for all m = 1.366025 .5 0. ϕ(1) ϕ(2) = lim v(n) = 2 v1 = n→∞ √ 1+ 3 2 √ 1− 3 2 gives the desired values of the scaling function: √ 1− 3 = 1. whose construction also relies on a similar scaling argument.5 -0.5 3 3. It is continuous.5 2 2.74). But. 154]. .5 0.62).5 -1 -1. Olver . 2/25/07 720 c 2006 Peter J. The preceding scheme was used to produce the graphs of the Daubechies scaling function in Figure 13. Continuing onwards.5 1 0. but non-differentiable function — and its graph has a very jagged. there is no real need to determine the value of ϕ at non-dyadic points. .2 1. the consequential values of the mother wavelet are given by formula (13.73) to give 1 its values at quarter integers 4 m.5 1 1. . Continuity will then prescribe its value at any other x ∈ R since x can be written as the limit of dyadic numbers xn — namely those obtained by truncating its binary (base 2) expansion at the nth digit beyond the decimal (or. With the values of the Daubechies scaling function on a sufficiently dense set of dyadic points in hand.5 -0. 2. The limiting vector The Daubechies Scaling Function and Mother Wavelet. Thus. rather “binary”) point.5 -1 -1.5 1 1. we can re-use equation (13. a close relative of the famous example of a continuous.366025 .5 Figure 13.5 2 2. in fact.11. . 3 ]. this latter step is unnecessary.5 3 3. √ −1 + 3 ϕ(2) = = − .5 1 0. we determine the values of ϕ(x) at all dyadic points.11. . since all computers are ultimately based on the binary number system. fractal-like appearance when viewed at close range. 
the Daubechies dilation equation (13. ϕ(1) = 2 ϕ(m) = 0. nowhere differentiable function originally due to Weierstrass. The daughter wavelets are then found by the usual compression and translation procedure (13.
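The eigenvector computation and the first refinement step can be sketched in a few lines. This is an illustrative snippet, not part of the original text; it assumes NumPy, and the dictionary-based bookkeeping is a choice made here for clarity. The refinement loop is exactly the dilation equation evaluated at half-integer arguments.

```python
import numpy as np

s3 = np.sqrt(3.0)
c = [(1 + s3)/4, (3 + s3)/4, (3 - s3)/4, (1 - s3)/4]   # dilation coefficients

# Values at the integers: the eigenvector of the 2x2 iteration matrix (13.76)
# for eigenvalue 1, normalized so that phi(1) + phi(2) = 1.
A = np.array([[c[1], c[0]],
              [c[3], c[2]]])
w, V = np.linalg.eig(A)
v = np.real(V[:, np.argmin(np.abs(w - 1))])
v = v / v.sum()

phi = {0.0: 0.0, 1.0: v[0], 2.0: v[1], 3.0: 0.0}

# One refinement step: the dilation equation gives phi at the half integers.
for m in (1, 3, 5):
    phi[m/2] = sum(ck * phi.get(m - k, 0.0) for k, ck in enumerate(c))
# Repeating the same step on the successively finer grids yields all dyadic values.
```

One pleasant surprise the computation reveals: $\varphi(3/2) = 0$ exactly, even though the scaling function is nonzero throughout the interior of its support elsewhere.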

The Daubechies wavelet expansion of a function whose support is contained in† $[0, 1]$ is then given by
$$f(x) \sim c_0\, \varphi(x) + \sum_{j=0}^{\infty}\; \sum_{k=-2}^{2^j - 1} c_{j,k}\, w_{j,k}(x).  \eqno(13.78)$$
The inner summation begins at $k = -2$ so as to include all the wavelet offspring $w_{j,k}$ whose support has a nontrivial intersection with the interval $[0, 1]$. The wavelet coefficients $c_0, c_{j,k}$ are computed by the usual orthogonality formulae
$$c_0 = \langle f, \varphi \rangle = \int_0^3 f(x)\, \varphi(x)\, dx, \qquad c_{j,k} = \langle f, w_{j,k} \rangle = 2^j \int_{2^{-j}k}^{2^{-j}(k+3)} f(x)\, w_{j,k}(x)\, dx = \int_0^3 f\bigl(2^{-j}(x+k)\bigr)\, w(x)\, dx,  \eqno(13.79)$$
where we agree that $f(x) = 0$ whenever $x < 0$ or $x > 1$. In practice, one employs a numerical integration procedure, e.g., the trapezoid rule, based on dyadic nodes to speedily evaluate the integrals (13.79). A proof of completeness of the resulting wavelet basis functions can be found in [52]. Compression and denoising algorithms based on retaining only low frequency modes proceed as before, and are left as exercises for the reader to implement.

Example 13.15. In Figure 13.12, we plot the Daubechies wavelet expansions of the same signal as before. The first plot is the original signal, and the following show the partial sums of (13.78) over $j = 0, \ldots, r$ with $r = 2, 3, 4, 5, 6$. Unlike the Haar expansion, the Daubechies wavelets are continuous, and so cannot converge uniformly to a discontinuous function. Indeed, the Daubechies wavelets do exhibit the nonuniform Gibbs phenomenon at the interior discontinuity as well as at the endpoints, since the function is set to 0 outside the interval $[0, 1]$.

† For functions with larger support, one can translate and rescale $x$ to fit the function's support inside $[0, 1]$. Alternatively, one can include additional terms in the expansion corresponding to further translates of the wavelets so as to cover the entire support of the function.

Figure 13.12: Daubechies Wavelet Expansion.

The Fourier Transform

Fourier series and their ilk are designed to solve boundary value problems on bounded intervals. The extension of Fourier methods to unbounded intervals, meaning the entire real line, leads naturally to the Fourier transform, which is a powerful mathematical tool for the analysis of non-periodic functions. The Fourier transform is of fundamental importance in a broad range of applications, including both ordinary and partial differential equations, quantum mechanics, signal processing, control theory, and probability, to name but a few.

We begin by motivating the Fourier transform as a limiting case of Fourier series. Although the rigorous details are rather exacting, the underlying idea is not so difficult. Let $f(x)$ be a reasonably nice function defined for all $-\infty < x < \infty$. The goal is to construct a Fourier expansion for $f(x)$ in terms of basic trigonometric functions. One evident approach is to construct its Fourier series on progressively larger and larger intervals, and then take the limit as the intervals' length becomes infinite. The limiting process converts the Fourier sums into integrals, and the resulting representation of a function is renamed the Fourier transform. Since we are dealing with an infinite interval, there are no longer any periodicity requirements on the function $f(x)$. Moreover, the frequencies represented in the Fourier transform are no longer constrained by the length of the interval, and so we are effectively decomposing a quite general, non-periodic function into a continuous superposition of trigonometric functions of all possible frequencies.

Let us present the details of this construction in a more concrete form. The computations will be significantly simpler if we work with the complex version of the Fourier series from the outset. Our starting point is the rescaled Fourier series (12.85) on a symmetric interval $[-\ell, \ell]$ of length $2\ell$, which we rewrite in the adapted form
$$f(x) \sim \sum_{\nu=-\infty}^{\infty} \sqrt{\frac{\pi}{2}}\; \frac{\hat f_\ell(k_\nu)}{\ell}\; e^{i k_\nu x}.  \eqno(13.80)$$
The sum is over the discrete collection of frequencies
$$k_\nu = \frac{\pi\nu}{\ell}, \qquad \nu = 0, \pm 1, \pm 2, \ldots,  \eqno(13.81)$$
corresponding to those trigonometric functions that have period $2\ell$. On an interval of length $2\ell$, the frequencies (13.81) required to represent a function in Fourier series form are equally distributed, with interfrequency spacing
$$\Delta k = k_{\nu+1} - k_\nu = \frac{\pi}{\ell}.$$
For reasons that will soon become apparent, the Fourier coefficients of $f$ are now denoted as
$$c_\nu = \langle\, f, e^{i k_\nu x} \,\rangle = \frac{1}{2\ell} \int_{-\ell}^{\ell} f(x)\, e^{-i k_\nu x}\, dx = \sqrt{\frac{\pi}{2}}\; \frac{\hat f_\ell(k_\nu)}{\ell},$$
so that
$$\hat f_\ell(k_\nu) = \frac{1}{\sqrt{2\pi}} \int_{-\ell}^{\ell} f(x)\, e^{-i k_\nu x}\, dx.  \eqno(13.82)$$
This reformulation of the basic Fourier series formula allows us to smoothly pass to the limit as the interval's length $\ell \to \infty$.
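The limiting process can be watched numerically. The sketch below, not part of the original text, assumes SciPy and uses $f(x) = e^{-|x|}$, whose transform (derived later in this section) is $\sqrt{2/\pi}/(1+k^2)$; it evaluates the finite-interval quantity (13.82) at a fixed frequency for growing $\ell$ and checks that the error shrinks. The helper name `fhat_ell` and the sample values of $\ell$ are choices made here.

```python
import numpy as np
from scipy.integrate import quad

# Finite-interval rescaled Fourier coefficient (13.82) of f(x) = e^{-|x|},
# which should converge to the Fourier transform sqrt(2/pi)/(1 + k^2).
f = lambda x: np.exp(-abs(x))
fhat_exact = lambda k: np.sqrt(2/np.pi)/(1 + k**2)

def fhat_ell(k, ell):
    # imaginary part vanishes because f is even
    re, _ = quad(lambda x: f(x)*np.cos(k*x), -ell, ell)
    return re/np.sqrt(2*np.pi)

errors = [abs(fhat_ell(1.0, ell) - fhat_exact(1.0)) for ell in (2, 5, 10)]
```

Because $e^{-|x|}$ decays exponentially, the truncation error falls off like $e^{-\ell}$, so even modest interval lengths give the transform to several digits.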

As $\ell \to \infty$, the interfrequency spacing $\Delta k \to 0$, and so the relevant frequencies become more and more densely packed in the space of all possible frequencies: $-\infty < k < \infty$. In the limit, we anticipate that all possible frequencies will be represented. Indeed, letting $k_\nu = k$ be arbitrary in (13.82), and sending $\ell \to \infty$, results in the infinite integral
$$\hat f(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx,  \eqno(13.83)$$
known as the Fourier transform of the function $f(x)$. The extra $\sqrt{2\pi}$ factor in front of the integral, which is sometimes omitted, is for later convenience. If $f(x)$ is a reasonably nice function, e.g., piecewise continuous and decaying to 0 reasonably quickly as $|x| \to \infty$, then its Fourier transform $\hat f(k)$ is defined for all possible frequencies $-\infty < k < \infty$. This formula will sometimes conveniently be abbreviated as
$$\hat f(k) = F[\,f(x)\,],  \eqno(13.84)$$
where $F$ is the Fourier transform operator.

To reconstruct the function from its Fourier transform, we employ a similar limiting procedure on the Fourier series (13.80), which we first rewrite in a more suggestive form:
$$f(x) \sim \frac{1}{\sqrt{2\pi}} \sum_{\nu=-\infty}^{\infty} \hat f_\ell(k_\nu)\, e^{i k_\nu x}\, \Delta k.  \eqno(13.85)$$
For each fixed value of $x$, the right hand side has the form of a Riemann sum, over the entire frequency space $-\infty < k < \infty$, for the function $g_\ell(k) = \hat f_\ell(k)\, e^{ikx}$. Under reasonable hypotheses, [9, 155], as $\ell \to \infty$, the functions $\hat f_\ell(k) \to \hat f(k)$ converge to the Fourier transform; moreover, the interfrequency spacing $\Delta k \to 0$, and so one expects the Riemann sums to converge to the integral
$$f(x) \sim \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat f(k)\, e^{ikx}\, dk,  \eqno(13.86)$$
known as the inverse Fourier transform, that serves to recover the original signal from its Fourier transform. In this manner, the Fourier series (13.80) becomes a Fourier integral that reconstructs the function $f(x)$ as a (continuous) superposition of complex exponentials $e^{ikx}$ of all possible frequencies. The contribution of each such exponential is the value of the Fourier transform $\hat f(k)$ at the exponential's frequency. In abbreviated form,
$$f(x) = F^{-1}[\,\hat f(k)\,]  \eqno(13.87)$$
is the inverse of the Fourier transform operator (13.84).

It is also worth pointing out that both the Fourier transform (13.84) and its inverse (13.87) define linear maps on function space. This means that the Fourier transform of the sum of two functions is the sum of their individual transforms, while multiplying a function by a constant multiplies its Fourier transform by the same factor:
$$F[\,f(x) + g(x)\,] = F[\,f(x)\,] + F[\,g(x)\,] = \hat f(k) + \hat g(k), \qquad F[\,c\,f(x)\,] = c\,F[\,f(x)\,] = c\,\hat f(k).  \eqno(13.88)$$

Recapitulating: by letting the length of the interval go to $\infty$, the discrete Fourier series has become a continuous Fourier integral, while the Fourier coefficients, which were defined only at a discrete collection of possible frequencies, have become a function $\hat f(k)$ defined on all of frequency space $k \in \mathbb{R}$.

The reconstruction of $f(x)$ from its Fourier transform $\hat f(k)$ via (13.86) can be rigorously justified under suitable hypotheses. For example, if $f(x)$ is piecewise C$^1$ on all of $\mathbb{R}$ and decays reasonably rapidly as $|x| \to \infty$, ensuring that its Fourier integral (13.83) converges, then it can be proved that the inverse Fourier integral (13.86) will converge to $f(x)$ at all points of continuity, and to the midpoint $\frac12\bigl(f(x^-) + f(x^+)\bigr)$ at jump discontinuities, just like a Fourier series. A similar statement holds for the inverse Fourier transform $F^{-1}$. A more precise, general result will be formulated in Theorem 13.28 below. In particular, its Fourier transform $\hat f(k) \to 0$ must also decay as $|k| \to \infty$, implying that (as with Fourier series) the very high frequency modes make negligible contributions to the reconstruction of the signal.

Example 13.16. The Fourier transform of a rectangular pulse or box function
$$f(x) = \sigma(x+a) - \sigma(x-a) = \begin{cases} 1, & -a < x < a, \\ 0, & |x| > a, \end{cases}  \eqno(13.89)$$
of width $2a$ is easily computed:
$$\hat f(k) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ikx}\, dx = \frac{e^{ika} - e^{-ika}}{\sqrt{2\pi}\; i\,k} = \sqrt{\frac{2}{\pi}}\; \frac{\sin ak}{k}.  \eqno(13.90)$$
On the other hand, the reconstruction of the pulse via the inverse transform (13.86) tells us that
$$\frac{1}{\pi} \int_{-\infty}^{\infty} e^{ikx}\, \frac{\sin ak}{k}\, dk = f(x) = \begin{cases} 1, & -a < x < a, \\ \frac12, & x = \pm a, \\ 0, & |x| > a. \end{cases}  \eqno(13.91)$$
Note the convergence to the middle of the jump discontinuities at $x = \pm a$. Splitting this complex integral into its real and imaginary parts, we deduce a pair of striking trigonometric integral identities:
$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\cos kx\; \sin ak}{k}\, dk = \begin{cases} 1, & -a < x < a, \\ \frac12, & x = \pm a, \\ 0, & |x| > a, \end{cases} \qquad\qquad \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin kx\; \sin ak}{k}\, dk = 0.  \eqno(13.92)$$
One cannot compute the integral (13.91) directly, since there is no elementary function† whose derivative equals the integrand. Moreover, it is not even clear that the integral converges: the amplitude of the oscillatory integrand decays like $1/|k|$, but the latter function does not have a convergent integral, and so the usual comparison test for infinite integrals fails to apply. Thus, the convergence of the integral is marginal at best: the trigonometric oscillations somehow overcome the slow rate of decay of $1/k$, and thereby induce the (conditional) convergence of the integral! In Figure 13.13 we display the box function with $a = 1$, along with a reconstruction obtained by numerically integrating (13.91). Since we are dealing with an infinite integral, we must break off the numerical integrator by restricting it to a finite interval. The first graph is obtained by integrating from $-5 \le k \le 5$, while the second is from $-10 \le k \le 10$. The non-uniform convergence of the integral leads to the appearance of a Gibbs phenomenon at the two discontinuities, similar to that of a Fourier series.

Figure 13.13: Fourier Transform of a Rectangular Pulse.

Example 13.17. Consider an exponentially decaying right-handed pulse‡
$$f_r(x) = \begin{cases} e^{-ax}, & x > 0, \\ 0, & x < 0, \end{cases}  \eqno(13.93)$$
where $a > 0$. We compute its Fourier transform directly from the definition:
$$\hat f_r(k) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-ax}\, e^{-ikx}\, dx = -\,\frac{1}{\sqrt{2\pi}}\; \frac{e^{-(a+ik)x}}{a+ik}\,\bigg|_{x=0}^{\infty} = \frac{1}{\sqrt{2\pi}\,(a+ik)}.  \eqno(13.94)$$

† One can use Euler's formula (3.84) to reduce the integrand to one of the form $e^{\alpha k}/k$; but it can be proved, [ int ], that there is no formula for $\int (e^{\alpha k}/k)\, dk$ in terms of elementary functions.

‡ Note that we can't Fourier transform the entire exponential function $e^{-ax}$, because it does not go to zero at both $\pm\infty$, which is required for the integral (13.83) to converge.
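Both halves of the box-pulse example can be checked numerically. The following sketch, not part of the original text, assumes SciPy; the truncated inverse integral is evaluated in closed form via the sine integral $\mathrm{Si}$, using $\sin ak \cos kx = \frac12[\sin((a+x)k) + \sin((a-x)k)]$. The helper names `fhat`, `Si`, and `partial_inverse`, and the cutoff values, are choices made here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import sici

a = 1.0

# Transform of the box pulse by direct numerical integration, compared with
# the closed form sqrt(2/pi) * sin(a k)/k.
def fhat(k):
    re, _ = quad(lambda x: np.cos(k*x), -a, a)   # imaginary part vanishes by symmetry
    return re / np.sqrt(2*np.pi)

# Truncated inverse transform (1/pi) * int_{-K}^{K} e^{ikx} sin(ak)/k dk,
# expressed through the sine integral Si (odd, so Si(-t) = -Si(t)).
def Si(t):
    s, _ = sici(abs(t))
    return np.sign(t) * s

def partial_inverse(x, K):
    return (Si((a + x)*K) + Si((a - x)*K)) / np.pi
```

As $K$ grows, `partial_inverse` tends to 1 inside the pulse, 1/2 at the jumps $x = \pm a$, and 0 outside, exhibiting the slow, conditionally convergent behavior described above.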

and. If the Fourier transform of the function f (x) is f (k). in accordance with Theorem 13. Symmetry allows us to reduce the tabulation of Fourier transforms by half. For instance. referring back to Example 13.16.103) is f (k) = π e− a | k | .18. Therefore. then we exactly recover the integral (13.99). we conclude that the Fourier transform of (13. x < 0.18.) On the other hand. Indeed.104) The indefinite integral (anti-derivative) does not appear in basic integration tables. Olver . and has purely imaginary and odd Fourier transform f o (k) = √ 1 1 − √ = −i 2 π (a + i k) 2 π (a − i k) i π ∞ −∞ k 2 . + c2 where c > 0. The reader has no doubt already noted the remarkable similarity between the Fourier transform (13. (13.103) Its Fourier transform requires integrating 1 f (k) = √ 2π ∞ −∞ e− i k x dx. the only difference is that the former has a minus sign in the exponential. 2 a (13. the odd exponentially decaying pulse.100) is the difference of the right and left pulses. then the Fourier transform of f (x) is f (− k).86). If we change x to k and k to − x. we have just managed to evaluate this particular integral! Look at (13. 2 + a2 π k (13.(The imaginary part of the integral vanishes because the integrand is odd. Theorem 13. fo (x) = (sign x) e− a | x | = e− a x .101) The inverse transform is (sign x) e− a | x | = − 1 k eikx dk = 2 + a2 k π ∞ −∞ k sin k x dk.105) This last example is indicative of an important general fact.102) As a final example. (13. consider the rational function f (x) = x2 1 . x2 + a2 (13. in fact.104) up to a factor of a 2/π. However. This implies the following Symmetry Principle relating the direct and inverse Fourier transforms. k 2 + a2 (13. cannot be done in terms of elementary functions.83) and its inverse (13. − ea x . fo = fr − fl . x > 0. we deduce that the Fourier transform of the function 2 sin a x f (x) = π x 2/25/07 727 c 2006 Peter J.

108) The nonintegrable singularity of f (k) at k = 0 is indicative of the fact that the sign function does not decay as | x | → ∞. (13. However. Nevertheless. which is achieved √ by replacing 2 π by 2 π. Let us start with the odd exponential pulse (13.110) Peter J. the function fo (x) converges to the sign function f (x) = sign x = σ(x) − σ(− x) = +1. the reader always needs to check which convention is being used.100).109) The limit of the Fourier transform (13.106) Taking the limit of the Fourier transform (13.107)   1. When a → 0. it is possible to rigorously justify these results within the framework of generalized functions. k = ± a.83) of the Fourier transform f (k).) When consulting any particular reference book. Olver 2/25/07 . k = 0. transform of x √ Warning: Some authors leave out the 2 π factor in the definition (13. k = 0. π k (13. The functions that emerge in the limit as a goes to 0 are of fundamental importance. and so the Symmetry Principle of Theorem 13. (13. convolution — to be discussed below — is a little easier without the extra factor. or even Lebesgue.18 requires some modification. A significant disadvantage is that the resulting formulae for the Fourier transform and its inverse are not as close. 1 . which. (13. (On the other hand. this requires an extra such factor in the reconstruction formula (13. This alternative convention does have a slight advantage of elim√ inating many 2 π factors in the Fourier transforms of elementary functions. −1. More interesting are the even pulse functions fe (x). x > 0. we can divide both f (x) and f (k) by 2/π to deduce the Fourier sin a x . in the limit a → 0. x < 0. neither the Fourier transform integral nor its inverse are well-defined as standard (Riemann.98) is lim 2 2a = 2 + a2 π k 728 0. In this case.is Note that. c 2006 a→0 (13.86). ∞. | k | > a. [155]) integrals.101) leads to f (k) = − i 2 1 . See also Exercise . 
All of the functions in Example 13.17 required a > 0 for the Fourier integrals to converge. f (k) = σ(− k + a) − σ( −k − a) = σ(k + a) − σ(k − a) =  2 0. − a < k < a. become the constant function f (x) ≡ 1. by linearity.

This limiting behavior should remind the reader of our construction (11.31) of the delta function as the limit of the functions
$$\delta(x) = \lim_{n\to\infty}\; \frac{n}{\pi\,(1 + n^2 x^2)} = \lim_{a\to 0}\; \frac{a}{\pi\,(a^2 + x^2)}.$$
Comparing with (13.110), we conclude that the Fourier transform of the constant function (13.109) is a multiple of the delta function in the frequency variable:
$$\hat f(k) = \sqrt{2\pi}\; \delta(k).  \eqno(13.111)$$
The direct transform integral
$$\delta(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ikx}\, dx$$
is, strictly speaking, not defined, because the infinite integral of the oscillatory sine and cosine functions doesn't converge! However, this identity can be validly interpreted within the framework of weak convergence and generalized functions. On the other hand, the inverse transform formula (13.86) yields
$$\int_{-\infty}^{\infty} \delta(k)\, e^{ikx}\, dk = e^{ikx}\,\Big|_{k=0} = 1,$$
which is in accord with the basic definition (11.36) of the delta function. As in the previous case, the delta function singularity at $k = 0$ manifests the lack of decay of the constant function.

Conversely, the delta function $\delta(x)$ has constant Fourier transform:
$$\hat\delta(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x)\, e^{-ikx}\, dx = \frac{e^{-ik\cdot 0}}{\sqrt{2\pi}} \equiv \frac{1}{\sqrt{2\pi}},  \eqno(13.112)$$
a result that also follows from the Symmetry Principle of Theorem 13.18. To determine the Fourier transform of a delta spike $\delta_y(x) = \delta(x-y)$ concentrated at position $x = y$, we compute
$$\hat\delta_y(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x-y)\, e^{-ikx}\, dx = \frac{e^{-iky}}{\sqrt{2\pi}}.  \eqno(13.113)$$
The result is a pure exponential in frequency space. Applying the inverse Fourier transform (13.86) leads, formally, to the remarkable identity
$$\delta_y(x) = \delta(x-y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ik(x-y)}\, dk = \frac{1}{2\pi}\, \langle\, e^{iky},\, e^{ikx} \,\rangle,  \eqno(13.114)$$
where $\langle\,\cdot\,,\,\cdot\,\rangle$ denotes the usual $L^2$ inner product on $\mathbb{R}$. Since the delta function vanishes for $x \ne y$, this identity is telling us that complex exponentials of differing frequencies are mutually orthogonal. However, this statement must be taken with a grain of salt, since the integral does not converge in the normal (Riemann or Lebesgue) sense. But

it is possible to make sense of this identity within the language of generalized functions. Indeed, multiplying both sides by $f(x)$, and then integrating with respect to $x$, we find
$$f(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, e^{-ik(x-y)}\, dx\, dk.  \eqno(13.115)$$

This is a perfectly valid formula, being a restatement (or, rather, combination) of the basic formulae (13.83, 86) connecting the direct and inverse Fourier transforms of the function $f(x)$.

Conversely, the Symmetry Principle tells us that the Fourier transform of a pure exponential $e^{ilx}$ will be a shifted delta spike $\sqrt{2\pi}\,\delta(k-l)$, concentrated in frequency space. Both results are particular cases of the general Shift Theorem, whose proof is left as an exercise for the reader.

Theorem 13.19. If $f(x)$ has Fourier transform $\hat f(k)$, then the Fourier transform of the shifted function $f(x-y)$ is $e^{-iky}\, \hat f(k)$. Similarly, the transform of the product function $e^{ilx} f(x)$, for $l$ real, is the shifted transform $\hat f(k-l)$.

Since the Fourier transform uniquely associates a function $\hat f(k)$ on frequency space with each (reasonable) function $f(x)$ on physical space, one can characterize functions by their transforms. Practical applications rely on tables (or, even better, computer algebra systems) that recognize a wide variety of transforms of basic functions of importance in applications. The accompanying table lists some of the most important examples of functions and their Fourier transforms. Note that, according to the Symmetry Principle of Theorem 13.18, each tabular entry can be used to deduce two different Fourier transforms. A more extensive collection of Fourier transforms can be found in [140].

Derivatives and Integrals

One of the most remarkable and important properties of the Fourier transform is that it converts calculus into algebra! More specifically, the two basic operations in calculus, differentiation and integration of functions, are realized as algebraic operations on their Fourier transforms. (The downside is that algebraic operations become more complicated in the transform domain.) Let us begin with derivatives. If we differentiate† the basic inverse Fourier transform formula
$$f(x) \sim \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat f(k)\, e^{ikx}\, dk$$
with respect to $x$, we obtain
$$f'(x) \sim \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} i\,k\, \hat f(k)\, e^{ikx}\, dk.  \eqno(13.116)$$
The resulting integral is itself in the form of an inverse Fourier transform, namely of $i\,k\,\hat f(k)$, which immediately implies the following basic result.

† We are assuming that the integrand is sufficiently nice, so that we can bring the derivative under the integral sign; see [64, 195] for a fully rigorous derivation.
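The Shift Theorem is easy to test numerically. The sketch below, not part of the original text, assumes SciPy; the quadrature-based helper `ft`, the Gaussian test function, and the sample values of $y$ and $k$ are choices made here for illustration.

```python
import numpy as np
from scipy.integrate import quad

def ft(f, k):
    """Fourier transform (1/sqrt(2 pi)) * int f(x) e^{-ikx} dx by quadrature."""
    re, _ = quad(lambda x: f(x)*np.cos(k*x), -np.inf, np.inf)
    im, _ = quad(lambda x: -f(x)*np.sin(k*x), -np.inf, np.inf)
    return (re + 1j*im)/np.sqrt(2*np.pi)

f = lambda x: np.exp(-x**2)
y, k = 0.8, 1.3

# Shift Theorem: F[f(x - y)](k) = e^{-iky} * F[f](k)
lhs = ft(lambda x: f(x - y), k)
rhs = np.exp(-1j*k*y) * ft(f, k)
```

The same helper can be reused to spot-check any entry of the transform table against direct numerical integration.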


Short Table of Fourier Transforms

    f(x)                          f̂(k)
    --------------------------------------------------------------
    1                             √(2π) δ(k)
    δ(x)                          1/√(2π)
    σ(x)                          √(π/2) δ(k) − i/(√(2π) k)
    sign x                        −i √(2/π) / k
    σ(x+a) − σ(x−a)               √(2/π) sin(ak)/k
    e^{−ax} σ(x)                  1/(√(2π)(a + ik))
    e^{ax} (1 − σ(x))             1/(√(2π)(a − ik))
    e^{−a|x|}                     √(2/π) a/(k² + a²)
    e^{−ax²}                      e^{−k²/(4a)} / √(2a)
    tan⁻¹ x                       −i √(π/2) e^{−|k|} / k
    f(cx + d)                     (e^{ikd/c}/|c|) f̂(k/c)
    f̂(x)                          f(−k)
    conj f(x)                     conj f̂(−k)
    f′(x)                         i k f̂(k)
    x f(x)                        i f̂′(k)
    f ∗ g(x)                      √(2π) f̂(k) ĝ(k)

Note: The parameter a > 0 is always real and positive, while c ≠ 0 is any nonzero complex number.

Proposition 13.20. The Fourier transform of the derivative $f'(x)$ of a function is obtained by multiplication of its Fourier transform by $i\,k$:
$$F[\,f'(x)\,] = i\,k\, \hat f(k).  \eqno(13.117)$$
Similarly, the Fourier transform of $x\,f(x)$ is obtained by differentiating the Fourier transform of $f(x)$:
$$F[\,x\,f(x)\,] = i\, \frac{d\hat f}{dk}.  \eqno(13.118)$$
The second statement follows from the first by use of the Symmetry Principle of Theorem 13.18.

Example 13.21. The derivative of the even exponential pulse $f_e(x) = e^{-a|x|}$ is a multiple of the odd exponential pulse $f_o(x) = (\operatorname{sign} x)\, e^{-a|x|}$:
$$f_e'(x) = -\,a\, (\operatorname{sign} x)\, e^{-a|x|} = -\,a\, f_o(x).$$
Proposition 13.20 says that their Fourier transforms are related by
$$\hat f_o(k) = -\,\frac{i\,k}{a}\, \hat f_e(k) = -\,i\, \sqrt{\frac{2}{\pi}}\; \frac{k}{k^2+a^2},$$
as previously noted in (13.101). On the other hand, since the odd exponential pulse has a jump discontinuity of magnitude 2 at $x = 0$, its derivative contains a delta function, and is equal to
$$f_o'(x) = -\,a\, e^{-a|x|} + 2\,\delta(x) = -\,a\, f_e(x) + 2\,\delta(x).$$
This is reflected in the relation between their Fourier transforms. If we multiply (13.101) by $i\,k$, we obtain
$$i\,k\, \hat f_o(k) = \sqrt{\frac{2}{\pi}}\; \frac{k^2}{k^2+a^2} = \sqrt{\frac{2}{\pi}} - \sqrt{\frac{2}{\pi}}\; \frac{a^2}{k^2+a^2} = 2\,\hat\delta(k) - a\, \hat f_e(k).$$
The Fourier transform, just like Fourier series, is completely compatible with the calculus of generalized functions.

Higher order derivatives are handled by iterating formula (13.117), and so:

Corollary 13.22. The Fourier transform of $f^{(n)}(x)$ is $(i\,k)^n\, \hat f(k)$.

This result has an important consequence: the smoothness of $f(x)$ is manifested in the rate of decay of its Fourier transform $\hat f(k)$. We already noted that the Fourier transform of a (nice) function must decay to zero at large frequencies: $\hat f(k) \to 0$ as $|k| \to \infty$. If the $n$th derivative $f^{(n)}(x)$ is also a reasonable function, then its Fourier transform $(i\,k)^n \hat f(k)$ must go to zero as $|k| \to \infty$. This requires that $\hat f(k)$ go to zero more rapidly than $|k|^{-n}$. Thus, the smoother $f(x)$, the more rapid the decay of its Fourier transform. As a general rule of thumb, local features of $f(x)$, such as smoothness, are manifested by global features of $\hat f(k)$, such as decay for large $|k|$. The Symmetry Principle implies that the reverse is also true: global features of $f(x)$ correspond to local features of $\hat f(k)$. This local-global duality is one of the major themes of Fourier theory.

Integration of functions is the inverse operation to differentiation, and so should correspond to division by $i\,k$ in frequency space. As with Fourier series, this is not completely correct; there is an extra constant involved, and this contributes an extra delta function in frequency space.

Proposition 13.23. If $f(x)$ has Fourier transform $\hat f(k)$, then the Fourier transform of its integral $g(x) = \int_{-\infty}^{x} f(y)\, dy$ is
$$\hat g(k) = -\,\frac{i}{k}\, \hat f(k) + \pi\, \hat f(0)\, \delta(k).  \eqno(13.119)$$
Proof: First notice that
$$\lim_{x \to -\infty} g(x) = 0, \qquad \lim_{x \to +\infty} g(x) = \int_{-\infty}^{\infty} f(x)\, dx = \sqrt{2\pi}\; \hat f(0).$$
Therefore, by subtracting a suitable multiple of the step function from the integral, the resulting function
$$h(x) = g(x) - \sqrt{2\pi}\; \hat f(0)\, \sigma(x)$$
decays to 0 at both $\pm\infty$. Moreover,
$$h'(x) = f(x) - \sqrt{2\pi}\; \hat f(0)\, \delta(x),$$
and so, applying our differentiation rule (13.117), we conclude that
$$i\,k\, \hat h(k) = \hat f(k) - \hat f(0).  \eqno(13.120)$$
On the other hand, consulting our table for the Fourier transform of the step function,
$$\hat h(k) = \hat g(k) - \sqrt{2\pi}\; \hat f(0)\, \hat\sigma(k) = \hat g(k) - \pi\, \hat f(0)\, \delta(k) + \frac{i\, \hat f(0)}{k}.  \eqno(13.121)$$
Combining (13.120) and (13.121) establishes the desired formula (13.119). Q.E.D.

Example 13.24. The Fourier transform of the inverse tangent function
$$f(x) = \tan^{-1} x = \int_0^x \frac{dy}{1+y^2} = \int_{-\infty}^{x} \frac{dy}{1+y^2} \;-\; \frac{\pi}{2}$$
can be computed by combining Proposition 13.23 with (13.105). The transform of the integral term is $-\,i\,\sqrt{\pi/2}\; e^{-|k|}/k + \bigl(\pi^{3/2}/\sqrt{2}\bigr)\, \delta(k)$, while subtracting the constant $\frac{\pi}{2}$ removes its transform $\bigl(\pi^{3/2}/\sqrt{2}\bigr)\, \delta(k)$; the delta terms cancel, as they must since $\tan^{-1} x$ is odd, leaving
$$\hat f(k) = -\,i\, \sqrt{\frac{\pi}{2}}\; \frac{e^{-|k|}}{k}.$$

Applications to Differential Equations

The fact that the Fourier transform changes differentiation in the physical domain into multiplication in the frequency domain is one of its most compelling features.
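Before proceeding, the differentiation rule (13.117) and the Gaussian entry of the transform table can both be spot-checked numerically. This sketch is not part of the original text; it assumes SciPy, and the helper name `ft` and the sample frequency are choices made here.

```python
import numpy as np
from scipy.integrate import quad

def ft(f, k):
    """Fourier transform (1/sqrt(2 pi)) * int f(x) e^{-ikx} dx by quadrature."""
    re, _ = quad(lambda x: f(x)*np.cos(k*x), -np.inf, np.inf)
    im, _ = quad(lambda x: -f(x)*np.sin(k*x), -np.inf, np.inf)
    return (re + 1j*im)/np.sqrt(2*np.pi)

f  = lambda x: np.exp(-x**2)        # Gaussian: the table entry with a = 1
df = lambda x: -2*x*np.exp(-x**2)   # its derivative

k = 1.7
lhs = ft(df, k)            # F[f'](k)
rhs = 1j*k*ft(f, k)        # i k F[f](k), per Proposition 13.20
gauss = np.exp(-k**2/4)/np.sqrt(2)  # table: F[e^{-x^2}] = e^{-k^2/4}/sqrt(2)
```

The Gaussian is a convenient test case precisely because of the smoothness-decay duality discussed above: it is infinitely differentiable, and its transform decays faster than any power of $1/|k|$.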

In place of the boundary conditions used on finite intervals.86): 1 u(x) = √ 2π ∞ −∞ h(k) e i k x dk. In quantum mechanics. dx2 − ∞ < x < ∞. these solutions are known as the bound states of the system. In lieu of boundary conditions. For example. k2 + ω2 (13.124) For example. the transformed equation can be immediately solved for u(k) = h(k) + ω2 (13. As a specific example.124) writes the solution as a Fourier integral: u(x) = 2/25/07 1 π ∞ −∞ eikx 1 dk = 2 + ω 2 )(k 2 + 1) (k π 734 ∞ −∞ (k 2 cos k x dk .22 into account.123) k2 Therefore. and thereby opens the door to their solution by elementary algebra! One begins by applying the Fourier transform to both sides of the differential equation under consideration. (13. the result is a linear algebraic equation k 2 u(k) + ω 2 u(k) = h(k) relating the Fourier transforms of u and h. π k2 + 1 then (13. consider the boundary value problem − d2 u + ω 2 u = h(x). [129]. which can then be immediately reconstructed via the inverse Fourier transform. the bound electrons in an atom are localized by the electrostatic attraction of the nucleus. Olver .122) where ω > 0 is a positive constant. we look for solutions that decay to zero sufficiently rapidly as | x | → ∞ — in order that their Fourier transform be well-defined (in the context of ordinary functions). The Fourier transform is particularly well adapted to boundary value problems on the entire real line. Solving the resulting algebraic equation will produce a formula for the Fourier transform of the desired solution. + ω 2 )(k 2 + 1) c 2006 Peter J. we can reconstruct the solution by applying the inverse Fourier transform formula (13. We will solve this problem by applying the Fourier transform to both sides of the differential equation. Taking Corollary 13. we require that the solution u(x) → 0 as | x | → ∞. Unlike the differential equation. h(x) = e− | x | with h(k) = 1 2 . 
if the forcing function is an even exponential pulse.particularly important consequence is that the Fourier transform effectively converts differential equations into algebraic equations. and correspond to subatomic particles that are trapped or localized in a region of space by some sort of force field.

The Fourier integral can be explicitly evaluated by using partial fractions to rewrite

    û(k) = √(2/π) · 1/((k² + ω²)(k² + 1)) = √(2/π) · 1/(ω² − 1) [ 1/(k² + 1) − 1/(k² + ω²) ],   ω² ≠ 1.

Thus, according to our Fourier Transform Table, the solution to this boundary value problem is

    u(x) = ( e^{−|x|} − (1/ω) e^{−ω|x|} ) / (ω² − 1)   when ω² ≠ 1.   (13.125)

In the evaluation, the imaginary part of the complex integral vanishes because the integrand is an odd function, and so the solution must also be real if the forcing function is real. The reader may wish to verify that this function is indeed a solution, meaning that it is twice continuously differentiable (which is not so immediately apparent from the formula), decays to 0 as |x| → ∞, and satisfies the differential equation everywhere. The "resonant" case ω² = 1 is left as an exercise.

Remark: The method of partial fractions that you learned in first year calculus is often an effective tool for evaluating (inverse) Fourier transforms of rational functions.

A particularly important case is when the forcing function h(x) = δ_y(x) = δ(x − y) represents a unit impulse concentrated at x = y. The resulting square-integrable solution is the Green's function G(x, y) for the boundary value problem. According to (13.123), its Fourier transform with respect to x is

    Ĝ(k, y) = (1/√(2π)) e^{−iky} / (k² + ω²),

which is the product of an exponential factor e^{−iky}, representing the Fourier transform of δ_y(x), times a multiple of the Fourier transform of the even exponential pulse e^{−ω|x|}. We apply Theorem 13.19, and conclude that the Green's function for this boundary value problem is an exponential pulse centered at y, namely

    G(x, y) = (1/(2ω)) e^{−ω|x−y|}.

Observe that, as a function of x, it satisfies the homogeneous differential equation −u″ + ω² u = 0, except at the point x = y, where its derivative has a jump discontinuity of unit magnitude. It also decays as |x| → ∞, as required by the boundary conditions. Moreover, as with other self-adjoint boundary value problems, the Green's function is symmetric under interchange of x and y. The Green's function superposition principle tells us that the solution to the inhomogeneous boundary value problem (13.122) under a general forcing can be represented in the integral form

    u(x) = ∫_{−∞}^{∞} G(x, y) h(y) dy = (1/(2ω)) ∫_{−∞}^{∞} e^{−ω|x−y|} h(y) dy.   (13.126)

The reader may enjoy recovering the particular exponential solution (13.125) from this integral formula.

2/25/07   735   © 2006   Peter J. Olver
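The superposition formula can be sanity-checked numerically. The following sketch is not from the text; the grid, the value ω = 2, and the Gaussian forcing are illustrative assumptions. It builds u(x) from the Green's function integral and confirms that −u″ + ω²u ≈ h by finite differences.

```python
import numpy as np

# Numerical check of u(x) = (1/(2w)) * integral exp(-w|x-y|) h(y) dy
# as the solution of -u'' + w^2 u = h.  The grid, w = 2, and the
# Gaussian forcing are assumed test choices, not from the text.
w = 2.0
x = np.linspace(-8.0, 8.0, 1201)
dx = x[1] - x[0]
h = np.exp(-x**2)                                  # sample forcing

# superposition integral, evaluated by a Riemann sum on the grid
G = np.exp(-w * np.abs(x[:, None] - x[None, :])) / (2 * w)
u = (G * h[None, :]).sum(axis=1) * dx

# check the differential equation at interior grid points
upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
residual = np.max(np.abs(-upp + w**2 * u[1:-1] - h[1:-1]))
print(residual)
```

The residual is limited only by the quadrature and finite-difference step sizes, which supports the claim that the exponential pulse kernel inverts the differential operator.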

Convolution

The final Green's function formula (13.126) is indicative of a general property of Fourier transforms. The right hand side has the form of a convolution product between an even exponential pulse g(x) = (1/(2ω)) e^{−ω|x|} and the forcing function:

    u(x) = g(x) ∗ h(x).

On the other hand, its Fourier transform (13.123) is, up to a constant multiple, the ordinary product of the Fourier transforms of g and h. In fact, this is a general property of the Fourier transform: convolution in the physical domain corresponds to multiplication in the frequency domain, and conversely.

Definition 13.25. The convolution of scalar functions f(x) and g(x) is the scalar function h = f ∗ g defined by the formula

    h(x) = f(x) ∗ g(x) = ∫_{−∞}^{∞} f(x − y) g(y) dy.   (13.127)

We record the basic properties of the convolution product, leaving their verification as exercises for the reader. All of these assume that the implied convolution integrals converge.

(a) Symmetry: f ∗ g = g ∗ f.
(b) Bilinearity: f ∗ (a g + b h) = a (f ∗ g) + b (f ∗ h), (a f + b g) ∗ h = a (f ∗ h) + b (g ∗ h), for a, b ∈ C.
(c) Associativity: f ∗ (g ∗ h) = (f ∗ g) ∗ h.
(d) Zero function: f ∗ 0 = 0.
(e) Delta function: f ∗ δ = f.

One tricky feature is that the constant function 1 is not a unit for the convolution product; indeed,

    f ∗ 1 = ∫_{−∞}^{∞} f(y) dy

is a constant function, the total integral of f, not the original function f(x). In fact, according to the last property, the delta function plays the role of the "convolution unit":

    f(x) ∗ δ(x) = ∫_{−∞}^{∞} f(x − y) δ(y) dy = f(x),

which follows from the basic property (11.36) of the delta function.
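As a quick illustration, the symmetry property (a) and the failure of the constant function 1 to act as a unit can both be confirmed by discretizing the convolution integral. The test functions and grid below are assumed choices, not from the text.

```python
import numpy as np

# Discretized convolution  (a*b)(x_i) ~ sum_j a(x_i - y_j) b(y_j) dx.
# The test functions f, g and the grid are illustrative assumptions.
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
f = np.exp(-x**2)               # Gaussian
g = np.exp(-np.abs(x))          # even exponential pulse

def conv(a, b, pts):
    # values of `a` off the grid are treated as 0 (both functions decay)
    return np.array([np.sum(np.interp(p - x, x, a, left=0.0, right=0.0) * b) * dx
                     for p in pts])

pts = x[::1000]                           # a few sample points
fg = conv(f, g, pts)
gf = conv(g, f, pts)
sym_err = np.max(np.abs(fg - gf))         # symmetry: f*g = g*f

total = np.sum(f) * dx                    # f*1 is the constant integral of f
print(sym_err, total)
```

The symmetry error is at round-off level, while f ∗ 1 collapses to the single number ∫ f dy = √π, exactly as the text warns.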

Theorem 13.27. The Fourier transform of the convolution u(x) = f(x) ∗ g(x) of two functions is a multiple of the product of their Fourier transforms:

    û(k) = √(2π) f̂(k) ĝ(k).

Vice versa, the Fourier transform of their product h(x) = f(x) g(x) is, up to multiple, the convolution of their Fourier transforms:

    ĥ(k) = (1/√(2π)) f̂(k) ∗ ĝ(k).

Proof: Combining the definition of the Fourier transform with the convolution formula (13.127), we find

    û(k) = (1/√(2π)) ∫_{−∞}^{∞} h(x) e^{−ikx} dx = (1/√(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x − y) g(y) e^{−ikx} dx dy,

where we are assuming that the integrands are sufficiently nice to allow us to interchange the order of integration, [9]. Applying the change of variables z = x − y in the inner integral produces

    û(k) = (1/√(2π)) ∫∫ f(z) g(y) e^{−ik(y+z)} dy dz
         = √(2π) ( (1/√(2π)) ∫_{−∞}^{∞} f(z) e^{−ikz} dz ) ( (1/√(2π)) ∫_{−∞}^{∞} g(y) e^{−iky} dy ) = √(2π) f̂(k) ĝ(k).

The second statement follows directly from the Symmetry Principle.  Q.E.D.

Example 13.28. We already know, (13.106), that the Fourier transform of f(x) = sin x / x is the box function

    f̂(k) = √(π/2) ( σ(k + 1) − σ(k − 1) ) = { √(π/2),  −1 < k < 1;   0,  |k| > 1 }.

We also know that the Fourier transform of g(x) = 1/x is ĝ(k) = −i √(π/2) sign k. Therefore, the Fourier transform of their product

    h(x) = f(x) g(x) = sin x / x²

can be obtained by convolution:

    ĥ(k) = (1/√(2π)) f̂ ∗ ĝ(k) = (1/√(2π)) ∫_{−∞}^{∞} f̂(l) ĝ(k − l) dl
         = −i √(π/8) ∫_{−1}^{1} sign(k − l) dl
         = { −i √(π/2),  k > 1;   −i √(π/2) k,  −1 < k < 1;   i √(π/2),  k < −1 }.
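The factor √(2π) in Theorem 13.27 is easy to check numerically on Gaussians, where everything is available in closed form: with f = g = e^{−x²/2} one has f̂(k) = e^{−k²/2} and (f ∗ g)(x) = √π e^{−x²/4}. The quadrature grid and sample frequencies below are assumed choices.

```python
import numpy as np

# Check  u_hat(k) = sqrt(2*pi) * f_hat(k) * g_hat(k)  for f = g = exp(-x^2/2):
# f_hat(k) = exp(-k^2/2) and (f*g)(x) = sqrt(pi) exp(-x^2/4) in closed form.
# The quadrature grid and sample frequencies are assumed test choices.
x = np.linspace(-30.0, 30.0, 12001)
dx = x[1] - x[0]

def ft(samples, k):
    # f_hat(k) = (1/sqrt(2*pi)) * integral f(x) e^{-ikx} dx, by a Riemann sum
    return np.sum(samples * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

f = np.exp(-x**2 / 2)
fg = np.sqrt(np.pi) * np.exp(-x**2 / 4)          # the convolution f*g

ks = np.linspace(-3.0, 3.0, 7)
lhs = np.array([ft(fg, k) for k in ks])          # transform of f*g
rhs = np.sqrt(2 * np.pi) * np.array([ft(f, k) ** 2 for k in ks])
err = np.max(np.abs(lhs - rhs))
print(err)
```

For smooth, rapidly decaying integrands the trapezoidal-type sum is extremely accurate, so the two sides agree to near machine precision.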

A graph of ĥ(k) appears in Figure 13.15.

[Figure 13.15. The Fourier transform of sin x / x².]

The Fourier Transform on Hilbert Space

While we do not have the space to embark on a fully rigorous treatment of the theory underlying the Fourier transform, it is worth outlining a few of the most basic features. We have already noted that the Fourier transform, when defined, is a linear map, taking functions f(x) on physical space to functions f̂(k) on frequency space. A critical question is precisely which function space the theory should be applied to. Not every function admits a Fourier transform in the classical sense†: the Fourier integral (13.83) is required to converge, and this places restrictions on the function and its asymptotics at large distances. It turns out that the proper setting for the rigorous theory is the Hilbert space of complex-valued square-integrable functions, the same infinite-dimensional vector space that lies at the heart of modern quantum mechanics. In Section 12.5, we already introduced the Hilbert space L²[−π, π] on a finite interval; here we adapt Definition 12.33 to the entire real line. Thus, the Hilbert space L² = L²(R) is the infinite-dimensional vector space

† We leave aside the more advanced issues involving generalized functions in this subsection.

consisting of all complex-valued functions f(x) which are defined for all x ∈ R and have finite L² norm:

    ‖f‖² = ∫_{−∞}^{∞} |f(x)|² dx < ∞.   (13.128)

For example, any piecewise continuous function that satisfies the decay criterion

    |f(x)| ≤ M / |x|^{1/2+δ},   for all sufficiently large |x| ≫ 0,   (13.129)

for some M > 0 and δ > 0, belongs to L². However, Hilbert space contains many more functions, and the precise definition and identification of its elements is quite subtle. The inner product on the Hilbert space L² is prescribed in the usual manner,

    ⟨f, g⟩ = ∫_{−∞}^{∞} f(x) ḡ(x) dx.

The Cauchy–Schwarz inequality

    |⟨f, g⟩| ≤ ‖f‖ ‖g‖

ensures that the inner product integral is finite whenever f, g ∈ L².

Let us state the fundamental theorem governing the effect of the Fourier transform on functions in Hilbert space. A fully rigorous proof can be found in [175, 195].

Theorem 13.28. If f(x) ∈ L² is square-integrable, then its Fourier transform f̂(k) ∈ L² is a well-defined, square-integrable function of the frequency variable k. If f(x) is continuously differentiable at a point x, then its inverse Fourier transform (13.86) equals its value f(x). More generally, if the right and left hand limits f(x⁻), f(x⁺), and f′(x⁻), f′(x⁺) exist, then the inverse Fourier transform integral converges to the average value ½ [ f(x⁻) + f(x⁺) ]. It can be regarded as a direct analog of the pointwise convergence Theorem 12.7 for Fourier series.

Thus, the Fourier transform f̂ = F[f] defines a linear transformation from L² functions of x to L² functions of k. In fact, the Fourier transform preserves inner products. This important result is known as Parseval's formula, whose Fourier series counterpart appeared in (12.117).

Theorem 13.29. If f̂(k) = F[f(x)] and ĝ(k) = F[g(x)], then ⟨f, g⟩ = ⟨f̂, ĝ⟩, i.e.,

    ∫_{−∞}^{∞} f(x) ḡ(x) dx = ∫_{−∞}^{∞} f̂(k) ĝ(k)‾ dk.   (13.130)

Proof: Let us sketch a formal proof that serves to motivate why this result is valid. We use the definition (13.83) of the Fourier transform to evaluate

    ∫_{−∞}^{∞} f̂(k) ĝ(k)‾ dk = ∫_{−∞}^{∞} ( (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−ikx} dx ) ( (1/√(2π)) ∫_{−∞}^{∞} ḡ(y) e^{+iky} dy ) dk
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x) ḡ(y) ( (1/2π) ∫_{−∞}^{∞} e^{−ik(x−y)} dk ) dx dy.

Now, according to (13.114), the inner k integral can be replaced by a delta function δ(x − y), and hence

    ∫_{−∞}^{∞} f̂(k) ĝ(k)‾ dk = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x) ḡ(y) δ(x − y) dx dy = ∫_{−∞}^{∞} f(x) ḡ(x) dx.

This completes our "proof"; see [175, 195] for a rigorous version.  Q.E.D.

In particular, orthogonal functions, satisfying ⟨f, g⟩ = 0, will have orthogonal Fourier transforms, ⟨f̂, ĝ⟩ = 0. Choosing f = g in (13.130) results in the Plancherel formula ‖f‖² = ‖f̂‖², or, explicitly,

    ∫_{−∞}^{∞} |f(x)|² dx = ∫_{−∞}^{∞} |f̂(k)|² dk.   (13.131)

We conclude that the Fourier transform map F: L² → L² defines a unitary or norm-preserving linear transformation on Hilbert space, mapping L² functions of the physical variable x to L² functions of the frequency variable k.

Quantum Mechanics and the Uncertainty Principle

In quantum mechanics, the wave functions or probability densities of a quantum system are characterized as unit elements, ‖φ‖ = 1, of the underlying state space, which, in a one-dimensional model of a single particle, is the Hilbert space L² = L²(R). All measurable physical quantities are represented as linear operators A: L² → L² acting on (a suitable dense subspace of) the state space. These are obtained by the rather mysterious process of "quantizing" their classical counterparts, which are ordinary functions. In particular, the particle's position, usually denoted by Q, is represented by the operation of multiplication by x, so Q[φ] = x φ(x), whereas its momentum, denoted by P for historical reasons, is represented by the differentiation operator P = d/dx, so that P[φ] = φ′(x).

Heisenberg's Uncertainty Principle, first formulated by the twentieth century German physicist Werner Heisenberg, one of the founders of modern quantum mechanics, is a by now familiar philosophical concept. In its popularized form, the principle states that, in a physical system, certain quantities cannot be simultaneously measured with complete accuracy. For instance, the more precisely one measures the position of a particle, the less accuracy there will be in the measurement of its momentum; vice versa, the greater the accuracy in the momentum, the less certainty in its position. A similar uncertainty couples energy and time. Experimental verification of the uncertainty principle can be found even in fairly simple situations. Consider a light beam passing through a small hole. The position of the photons is constrained by the hole; the effect of their momenta appears in the pattern of light diffused on a screen placed beyond the hole. The smaller the hole, the more constrained the position, and the wider the image on the screen, meaning the less certainty there is in the momentum.

This is not the place to discuss the philosophical and experimental consequences of Heisenberg's principle. What we will show is that the Uncertainty Principle is, in fact, a rigorous theorem concerning the Fourier transform!
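Since the argument below leans on the Plancherel formula (13.131), a quick numerical sanity check may be reassuring. The pulse f(x) = e^{−|x|} has transform f̂(k) = √(2/π)/(1 + k²), and both squared norms equal 1; the truncation limits and grids are assumed choices.

```python
import numpy as np

# Plancherel check ||f||^2 = ||f_hat||^2 for the pulse f(x) = exp(-|x|),
# whose transform is f_hat(k) = sqrt(2/pi)/(1 + k^2); both norms equal 1.
# Grids and truncation limits are illustrative assumptions.
x = np.linspace(-40.0, 40.0, 16001)
k = np.linspace(-200.0, 200.0, 80001)

f_norm2 = np.sum(np.exp(-2 * np.abs(x))) * (x[1] - x[0])
fhat_norm2 = np.sum((2 / np.pi) / (1 + k**2) ** 2) * (k[1] - k[0])
print(f_norm2, fhat_norm2)
```

Note that the frequency-side integrand decays only like k⁻⁴, so the k grid must extend much further than the x grid to reach the same accuracy.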

In general, suppose A is a linear operator on Hilbert space representing a physical quantity. If the quantum system is in a state represented by a particular wave function φ, then the localization of the quantity A is measured by the norm ‖A[φ]‖; the smaller this norm, the more accurate the measurement. Thus, ‖Q[φ]‖ = ‖x φ(x)‖ measures the localization of the position of the particle represented by φ: the smaller ‖Q[φ]‖, the more concentrated the probability of finding the particle near† x = 0, and hence the smaller the error in the measurement of its position. Similarly, by Plancherel's formula (13.131),

    ‖P[φ]‖ = ‖φ′(x)‖ = ‖i k φ̂(k)‖ = ‖k φ̂(k)‖

measures the localization in the momentum of the particle, which is small if and only if its Fourier transform is concentrated near k = 0, and hence the smaller the error in its measured momentum. Indeed, Proposition 13.20 says that the Fourier transform converts the differentiation operator representing momentum into a multiplication operator representing position on the frequency space, and vice versa. This Fourier transform-based duality between position and momentum lies at the heart of the Uncertainty Principle, which states that these two quantities cannot simultaneously be arbitrarily small.

Theorem 13.30. If φ(x) is a wave function, so ‖φ‖ = 1, then

    ‖Q[φ]‖ ‖P[φ]‖ ≥ ½.   (13.132)

Proof: The proof rests on the Cauchy–Schwarz inequality

    |⟨ x φ(x), φ′(x) ⟩| ≤ ‖x φ(x)‖ ‖φ′(x)‖ = ‖Q[φ]‖ ‖P[φ]‖.   (13.133)

On the other hand, let us integrate by parts, writing out the inner product term

    ⟨ x φ(x), φ′(x) ⟩ = ∫_{−∞}^{∞} x φ(x) φ′(x) dx = − ∫_{−∞}^{∞} ½ φ(x)² dx = −½,

using the fact that φ φ′ = d/dx (½ φ(x)²) together with ‖φ‖ = 1. Since φ(x) → 0 as |x| → ∞, the boundary terms vanish. Substituting back into (13.133) completes the proof.  Q.E.D.

The inequality (13.132) quantifies the statement that the more accurately we measure the position Q, the less accurately we are able to measure the momentum P, and vice versa. For more details and physical consequences, you should consult an introductory text on mathematical quantum mechanics, e.g., [124, 129].

† At another position a, one replaces x by x − a.
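It is well known that the Gaussian attains equality in the uncertainty bound; the following check (the grid is an assumed choice) confirms that the unit wave function φ(x) = π^{−1/4} e^{−x²/2} gives ‖Q[φ]‖ ‖P[φ]‖ = ½ exactly.

```python
import numpy as np

# The Gaussian phi(x) = pi^(-1/4) exp(-x^2/2) is a unit wave function
# attaining equality in ||Q[phi]|| * ||P[phi]|| >= 1/2.  The grid is an
# assumed choice; the derivative phi' = -x phi is used in exact form.
x = np.linspace(-15.0, 15.0, 30001)
dx = x[1] - x[0]
phi = np.pi ** -0.25 * np.exp(-x**2 / 2)
dphi = -x * phi

norm = np.sum(phi**2) * dx                       # should be 1
pos = np.sqrt(np.sum((x * phi) ** 2) * dx)       # ||Q[phi]||
mom = np.sqrt(np.sum(dphi**2) * dx)              # ||P[phi]||
print(norm, pos * mom)
```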

13.4. The Laplace Transform.

In engineering applications, the Fourier transform is often overshadowed by a close relative, the Laplace transform. The Laplace transform plays an essential role in control theory, linear systems analysis, electronics, signal processing, and many other fields of practical engineering and science. However, the Laplace transform is most properly interpreted as a particular real form of the more fundamental Fourier transform: when the Fourier transform is evaluated along the imaginary axis, the complex exponential factor becomes real, and the result is the Laplace transform, which maps real-valued functions to real-valued functions. Moreover, by allowing complex values of the Fourier frequency variable k, we are no longer required to restrict our attention to functions that decay to zero as t → ∞. The Fourier transform looks in both directions and prefers oscillatory functions; the Laplace transform is one-sided, only looks forward in time, and prefers functions that decay, that is, transients. While the Fourier transform is used to solve boundary value problems on the real line, the Laplace transform is much better adapted to initial value problems. Since we will be applying the Laplace transform to initial value problems, we switch our variable from x to t to emphasize this fact.

Suppose f(t) is a (reasonable) function which vanishes on the negative axis, so f(t) = 0 for all t < 0. The Fourier transform of f is

    f̂(k) = (1/√(2π)) ∫₀^∞ f(t) e^{−ikt} dt,

since, by our assumption, its negative t values make no contribution to the integral. The Laplace transform of such a function is obtained by replacing ik by a real† variable s, leading to

    F(s) = L[f(t)] = ∫₀^∞ f(t) e^{−st} dt,   (13.134)

where, in accordance with the standard convention, the factor of √(2π) has been omitted. Since the exponential factor in the integral has become real, the Laplace transform L takes real functions to real functions. Moreover, since the integral kernel e^{−st} is exponentially decaying for s > 0, functions with moderate growth are allowed. Thus, we may identify the Laplace transform with √(2π) times the evaluation of the Fourier transform for values of k = −is on the imaginary axis:

    F(s) = √(2π) f̂(−is).   (13.135)

Since it is so closely allied to the Fourier transform, the Laplace transform enjoys many of its featured properties, including linearity. Moreover, derivatives are transformed into algebraic operations, which underlies its applications to solving differential equations.

Example 13.31. Consider an exponential function f(t) = e^{αt}, where the exponent α is allowed to be complex. Its Laplace transform is

    F(s) = ∫₀^∞ e^{(α−s)t} dt = 1/(s − α).   (13.136)

† One can also define the Laplace transform at complex values of s, but this will not be required in the applications discussed here.

Note that the integrand is exponentially decaying, and hence the integral converges, if and only if Re(α − s) < 0. Therefore, the Laplace transform (13.136) is, strictly speaking, only defined at sufficiently large s > Re α. In particular, for an oscillatory exponential,

    L[e^{iωt}] = 1/(s − iω) = (s + iω)/(s² + ω²),   provided s > 0.

Taking real and imaginary parts of this identity, we discover the formulae for the Laplace transforms of the simplest trigonometric functions:

    L[cos ωt] = s/(s² + ω²),    L[sin ωt] = ω/(s² + ω²).   (13.137)

Two additional important transforms are

    L[1] = ∫₀^∞ e^{−st} dt = 1/s,    L[t] = ∫₀^∞ t e^{−st} dt = 1/s².   (13.138)

The second computation relies on an integration by parts, making sure that the boundary terms at t = 0, ∞ vanish.

Remark: In every case, we really mean the Laplace transform of the function whose values are given for t > 0 and is equal to 0 for all negative t. Therefore, the function 1 in reality signifies the step function

    σ(t) = { 1,  t > 0;   0,  t < 0 },   (13.139)

and so the first formula in (13.138) should more properly be written

    L[σ(t)] = 1/s.   (13.140)

However, in the traditional approach to the Laplace transform, one only considers the functions on the positive t axis, and so the step function and the constant function are, from this viewpoint, indistinguishable. But, once one moves beyond a purely mechanistic approach, any deeper understanding of the properties of the Laplace transform requires keeping this distinction firmly in mind.

Let us now pin down the precise class of functions to which the Laplace transform can be applied.

Definition 13.32. A function f(t) is said to have exponential growth of order a if

    |f(t)| < M e^{at},   for all t > t₀,   (13.141)

for some M > 0 and t₀ > 0.

Note that the exponential growth condition only depends upon the function's behavior for large values of t. If a < 0, then f is, in fact, exponentially decaying as t → ∞. Since e^{at} < e^{bt} for a < b and all t > 0, if f(t) has exponential growth of order a, it automatically has exponential growth of any higher order b > a.
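The transforms computed so far can be reproduced symbolically; the following sketch uses SymPy's laplace_transform, with α, ω > 0 assumed purely so the convergence conditions stay simple.

```python
import sympy as sp

# SymPy reproduces the transforms derived above; alpha, omega > 0 are
# assumed here only so that the convergence conditions are simple.
t, s = sp.symbols('t s', positive=True)
alpha, omega = sp.symbols('alpha omega', positive=True)

F_exp = sp.laplace_transform(sp.exp(alpha * t), t, s, noconds=True)
F_cos = sp.laplace_transform(sp.cos(omega * t), t, s, noconds=True)
F_sin = sp.laplace_transform(sp.sin(omega * t), t, s, noconds=True)

ok_exp = sp.simplify(F_exp - 1 / (s - alpha)) == 0          # (13.136)
ok_cos = sp.simplify(F_cos - s / (s**2 + omega**2)) == 0    # (13.137)
ok_sin = sp.simplify(F_sin - omega / (s**2 + omega**2)) == 0
print(ok_exp, ok_cos, ok_sin)
```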

All polynomial, trigonometric, and exponential functions (with linear argument) have exponential growth. The simplest example of a function that does not satisfy any exponential growth bound is f(t) = e^{t²}, since it grows faster than any simple exponential e^{at}.

The following result guarantees the existence of the Laplace transform, at least for sufficiently large values of the transform variable s, for a rather broad class of functions that includes almost all of the functions that arise in applications.

Theorem 13.33. If f(t) is piecewise continuous and has exponential growth of order a, then its Laplace transform F(s) = L[f(t)] is defined for all s > a.

Proof: The exponential growth inequality (13.141) implies that we can bound the integrand in (13.134) by |f(t) e^{−st}| < M e^{(a−s)t}. Therefore, as soon as s > a, the integrand is exponentially decaying as t → ∞, and this suffices to ensure the convergence of the Laplace transform integral.  Q.E.D.

Theorem 13.33 is an existential result, and, of course, in practice we may not be able to explicitly evaluate the Laplace transform integral. According to Theorem 13.28, when it exists, the Fourier transform uniquely specifies the function, except possibly at jump discontinuities, where the limiting value must be half way in between. An analogous result can be established for the Laplace transform.

Lemma 13.34. If f(t) and g(t) are piecewise continuous functions that are of exponential growth, and L[f(t)] = L[g(t)] for all s sufficiently large, then f(t) = g(t) at all points of continuity t > 0.

In fact, there is an explicit formula for the inverse Laplace transform, which follows from its identification, (13.135), with the Fourier transform along the imaginary axis. Under suitable hypotheses, a given function F(s) is the Laplace transform of the function f(t) determined by the complex integral formula†

    f(t) = (1/(2πi)) ∫_{−i∞}^{i∞} F(s) e^{st} ds,   t > 0.   (13.142)

In practice, one hardly ever uses this complicated formula to compute the inverse Laplace transform. Rather, one simply relies on tables of known Laplace transforms, coupled with a few basic rules that will be covered in the following subsection. Thus, the Laplace transforms of most common functions are not hard to find, and extensive lists have been tabulated, [141]. An abbreviated table of Laplace transforms can be found on the following page. Nowadays, the most convenient sources of transform formulas are computer algebra packages, including Mathematica and Maple.

† See Section 16.5 for details on complex integration. The stated formula doesn't apply to all functions of exponential growth. A more universally valid inverse Laplace transform formula is obtained by shifting the complex contour to run from b − i∞ to b + i∞ for some b > a, the order of exponential growth of f.

Table of Laplace Transforms

    f(t)                    F(s)

    1                       1/s
    t                       1/s²
    tⁿ                      n!/s^{n+1}
    δ(t − c)                e^{−sc}
    e^{αt}                  1/(s − α)
    cos ωt                  s/(s² + ω²)
    sin ωt                  ω/(s² + ω²)
    e^{ct} f(t)             F(s − c)
    σ(t − c) f(t − c)       e^{−sc} F(s)
    t f(t)                  −F′(s)
    f′(t)                   s F(s) − f(0)
    f⁽ⁿ⁾(t)                 sⁿ F(s) − s^{n−1} f(0) − s^{n−2} f′(0) − ⋯ − f^{(n−1)}(0)
    f(t) ∗ g(t)             F(s) G(s)

In this table, n is a non-negative integer, ω is any real number, while c ≥ 0 is any non-negative real number.

The Laplace Transform Calculus

The first observation is that the Laplace transform is a linear operator, and so

    L[f + g] = L[f] + L[g],    L[c f] = c L[f],   (13.143)

for any constant c. Moreover, just like its Fourier progenitor, the Laplace transform converts calculus into algebra. In particular, differentiation turns into multiplication by the transform variable, but with one additional term that depends upon the value of the function at t = 0.

Theorem 13.35. Let f(t) be continuously differentiable for t > 0 and have exponential growth of order a. If L[f(t)] = F(s), then, for s > a,

    L[f′(t)] = s F(s) − f(0).   (13.144)

Proof: The proof relies on an integration by parts:

    L[f′(t)] = ∫₀^∞ f′(t) e^{−st} dt = [ f(t) e^{−st} ]_{t=0}^{∞} + s ∫₀^∞ f(t) e^{−st} dt
             = lim_{t→∞} f(t) e^{−st} − f(0) + s F(s).

The exponential growth inequality (13.141) implies that the first term vanishes for s > a, and the remaining terms agree with (13.144).  Q.E.D.

Example 13.36. According to (13.137), the Laplace transform of the function sin ωt is

    L[sin ωt] = ω/(s² + ω²).

Its derivative is d(sin ωt)/dt = ω cos ωt, and therefore

    L[ω cos ωt] = s L[sin ωt] = ωs/(s² + ω²),

since sin ωt vanishes at t = 0. The result agrees with (13.137). On the other hand, d(cos ωt)/dt = −ω sin ωt, and so

    L[−ω sin ωt] = s L[cos ωt] − 1 = s²/(s² + ω²) − 1 = −ω²/(s² + ω²),

again in agreement with the known formula.
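The differentiation rule (13.144) can be spot-checked symbolically; the sample function below is an assumed choice, any continuously differentiable function of exponential growth would do.

```python
import sympy as sp

# Check L[f'] = s F(s) - f(0) on the sample function f(t) = exp(-2t) sin(3t)
# (an assumed test choice).
t, s = sp.symbols('t s', positive=True)
f = sp.exp(-2 * t) * sp.sin(3 * t)

F = sp.laplace_transform(f, t, s, noconds=True)
Fp = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)
ok = sp.simplify(Fp - (s * F - f.subs(t, 0))) == 0
print(ok)
```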

Remark: The final term −f(0) in (13.144) is a manifestation of the discontinuity in f(t) at t = 0. Keep in mind that the Laplace transform only applies to functions with f(t) = 0 for all t < 0, and so, if f(0) ≠ 0, then f(t) has a jump discontinuity of magnitude f(0) at t = 0. Therefore, by the calculus of generalized functions, its derivative f′(t) should include a delta function term, namely f(0) δ(t), which accounts for the additional constant term in its transform. In the practical approach to the Laplace transform calculus, one suppresses the delta function when computing the derivative f′(t); however, its effect must reappear on the other side of the differentiation formula (13.144), and the upshot is the extra term −f(0).

Laplace transforms of higher order derivatives are found by iterating the first order formula (13.144). For example, if f ∈ C², then

    L[f″(t)] = s L[f′(t)] − f′(0) = s² F(s) − s f(0) − f′(0).   (13.145)

In general, for an n times continuously differentiable function,

    L[f⁽ⁿ⁾(t)] = sⁿ F(s) − s^{n−1} f(0) − s^{n−2} f′(0) − ⋯ − f^{(n−1)}(0).   (13.146)

Conversely, multiplying the function by t corresponds to differentiation of its Laplace transform (up to a change of sign):

    L[t f(t)] = −F′(s).   (13.147)

The proof of this formula is left as an exercise for the reader.

On the other hand, integration corresponds to dividing the Laplace transform by s, so

    L[ ∫₀ᵗ f(τ) dτ ] = F(s)/s.   (13.148)

Unlike the Fourier transform, there are no additional terms in the integration formula, as long as we start the integral at t = 0. For instance,

    L[t²] = (1/s) L[2t] = 2/s³,   and, in general,   L[tⁿ] = n!/s^{n+1}.   (13.149)

There is also a shift formula, analogous to Theorem 13.19 for Fourier transforms, but with one important caveat. Since all functions must vanish for t < 0, we are only allowed to shift them to the right; a shift to the left would produce nonzero function values for some t < 0. In general, the Laplace transform of the function f(t − c), shifted to the right by an amount c > 0, is

    L[f(t − c)] = ∫₀^∞ f(t − c) e^{−st} dt = ∫_{−c}^∞ f(t) e^{−s(t+c)} dt = ∫₀^∞ f(t) e^{−s(t+c)} dt = e^{−sc} F(s).   (13.150)

In this computation, we first used a change of variables in the integral, replacing t − c by t; then, the fact that f(t) ≡ 0 for t < 0 was used to eliminate the integral from −c to 0. When using the shift formula in practice, it is important to keep in mind that c > 0 and that the function f(t − c) vanishes for all t < c. In the table, the factor σ(t − c) is used to remind the user of this fact.
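The multiplication and integration rules can likewise be confirmed symbolically; f(t) = sin t below is an assumed test function.

```python
import sympy as sp

# Check the multiplication rule L[t f(t)] = -F'(s) and the integration
# rule L[int_0^t f dtau] = F(s)/s on f(t) = sin t (assumed test function).
t, s, tau = sp.symbols('t s tau', positive=True)
f = sp.sin(t)
F = sp.laplace_transform(f, t, s, noconds=True)          # 1/(s^2 + 1)

mult = sp.laplace_transform(t * f, t, s, noconds=True)
ok_mult = sp.simplify(mult + sp.diff(F, s)) == 0         # L[t f] = -F'

antider = sp.integrate(f.subs(t, tau), (tau, 0, t))      # 1 - cos t
ok_int = sp.simplify(
    sp.laplace_transform(antider, t, s, noconds=True) - F / s) == 0
print(ok_mult, ok_int)
```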

Example 13.37. Consider the square wave pulse

    f(t) = { 1,  b < t < c;   0,  otherwise },   for some 0 < b < c.

To compute its Laplace transform, we write it as the difference

    f(t) = σ(t − b) − σ(t − c)

of shifted versions of the step function (13.139). Combining the shift formula (13.150) and the formula (13.140) for the Laplace transform of the step function, we find

    L[f(t)] = L[σ(t − b)] − L[σ(t − c)] = (e^{−sb} − e^{−sc}) / s.   (13.151)

We already noted that the Fourier transform of the convolution product of two functions is realized as the ordinary product of their individual transforms. A similar result holds for the Laplace transform. Let f(t), g(t) be given functions. Since we are implicitly assuming that the functions vanish at all negative values of t, their convolution product (13.127) reduces to a finite integral

    h(t) = f(t) ∗ g(t) = ∫₀ᵗ f(t − τ) g(τ) dτ.   (13.152)

In particular, h(t) = 0 for all t < 0 also. Further, it is not hard to show that the convolution of two functions of exponential growth also has exponential growth.

Theorem 13.38. If L[f(t)] = F(s) and L[g(t)] = G(s), then the convolution h(t) = f(t) ∗ g(t) has Laplace transform given by the product H(s) = F(s) G(s).

The proof of the convolution theorem for the Laplace transform proceeds along the same lines as its Fourier transform version, Theorem 13.27, and is left as an Exercise for the reader.

Applications to Initial Value Problems

The key application of the Laplace transform is to facilitate the solution of initial value problems for linear, constant coefficient ordinary differential equations. As a prototypical example, consider the second order initial value problem

    a u″ + b u′ + c u = f(t),    u(0) = α,   u′(0) = β,   (13.153)

in which a, b, c are constant. We will solve the initial value problem by applying the Laplace transform to both sides of the differential equation. In view of the differentiation formulae (13.144, 145), and making use of the initial conditions,

    a [ s² L[u(t)] − s u(0) − u′(0) ] + b [ s L[u(t)] − u(0) ] + c L[u(t)] = L[f(t)].

Setting L[u(t)] = U(s) and L[f(t)] = F(s), the preceding equation takes the form

    (a s² + b s + c) U(s) = F(s) + (a s + b) α + a β.   (13.154)

Thus, we have effectively reduced the entire initial value problem to a single elementary algebraic equation! Solving for

    U(s) = [ F(s) + (a s + b) α + a β ] / (a s² + b s + c),   (13.155)

we then recover the solution u(t) to the initial value problem by finding the inverse Laplace transform of U(s). In practice, the inverse transform is found by suitably massaging the answer (13.155) to be a combination of known transforms.

Example 13.39. Consider the initial value problem

    u″ + u = 10 e^{−3t},    u(0) = 1,   u′(0) = 2.

Taking the Laplace transform of the differential equation as above, we find

    (s² + 1) U(s) − s − 2 = 10/(s + 3),   and so   U(s) = (s + 2)/(s² + 1) + 10/((s + 3)(s² + 1)).

The second summand does not directly correspond to any of the entries in our table of Laplace transforms. However, we can use the method of partial fractions to write it as a sum

    U(s) = (s + 2)/(s² + 1) + 1/(s + 3) + (3 − s)/(s² + 1) = 1/(s + 3) + 5/(s² + 1)

of terms appearing in the table. We conclude that the solution to our initial value problem is

    u(t) = e^{−3t} + 5 sin t.

Of course, the last example is a problem that you can easily solve directly; the standard method learned in your first course on differential equations is just as effective in finding the final solution, and does not require all the extra Laplace transform machinery. The Laplace transform method is, however, particularly effective when dealing with complications that arise in cases of discontinuous forcing functions.

Example 13.40. Consider a mass vibrating on a spring with fixed stiffness c = 4. Assume that the mass starts at rest, is then subjected to a unit force over the time interval ½π < t < 2π, after which it is left to vibrate on its own. The initial value problem is

    u″ + 4u = { 1,  ½π < t < 2π;   0,  otherwise },    u(0) = u′(0) = 0.

Taking the Laplace transform of the differential equation, and using (13.151), we find

    (s² + 4) U(s) = (e^{−πs/2} − e^{−2πs}) / s,   and so   U(s) = (e^{−πs/2} − e^{−2πs}) / (s(s² + 4)).

Therefore, by the shift formula (13.150),

    u(t) = h(t − ½π) − h(t − 2π),
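The partial-fraction answer of Example 13.39 is readily verified symbolically: the claimed solution should satisfy the differential equation and both initial conditions.

```python
import sympy as sp

# Verify the solution of Example 13.39: u(t) = exp(-3t) + 5 sin t should
# satisfy u'' + u = 10 exp(-3t), u(0) = 1, u'(0) = 2.
t = sp.symbols('t')
u = sp.exp(-3 * t) + 5 * sp.sin(t)

ode_ok = sp.simplify(sp.diff(u, t, 2) + u - 10 * sp.exp(-3 * t)) == 0
ic_ok = (u.subs(t, 0) == 1) and (sp.diff(u, t).subs(t, 0) == 2)
print(ode_ok, ic_ok)
```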

where h(t) is the function with Laplace transform

    L[h(t)] = H(s) = 1/(s(s² + 4)) = ¼ ( 1/s − s/(s² + 4) ),

which has been conveniently rewritten using partial fractions. Referring to our table of Laplace transforms,

    h(t) = { ¼ − ¼ cos 2t,  t > 0;   0,  t < 0 }.

Therefore, our desired solution is

    u(t) = { 0,  0 ≤ t ≤ ½π;   ¼ + ¼ cos 2t,  ½π ≤ t ≤ 2π;   ½ cos 2t,  2π ≤ t }.

Note that the solution u(t) is only C¹ at the points of discontinuity of the forcing function.

Remark: A direct solution of this problem would proceed as follows. One solves the differential equation on each interval of continuity of the forcing function, leading to a solution on that interval depending upon two integration constants. The integration constants are then adjusted so that the solution satisfies the initial conditions and is continuously differentiable at each point of discontinuity of the forcing function. The details are straightforward, but messy. The Laplace transform method successfully bypasses these intervening manipulations.

Example 13.41. Consider the initial value problem

    u″ + ω² u = f(t),    u(0) = u′(0) = 0,

involving a general forcing function f(t). Applying the Laplace transform to the differential equation and using the initial conditions,

    (s² + ω²) U(s) = F(s),   and hence   U(s) = F(s)/(s² + ω²).

The right hand side is the product of the Laplace transform of the forcing function f(t) and that of the trigonometric function

    k(t) = { (sin ωt)/ω,  t > 0;   0,  t < 0 }.

Therefore, Theorem 13.38 implies that the solution can be written as their convolution

    u(t) = f ∗ k(t) = ∫₀ᵗ k(t − τ) f(τ) dτ = (1/ω) ∫₀ᵗ sin ω(t − τ) f(τ) dτ.   (13.156)
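The convolution formula (13.156) can be checked numerically; with f(t) = t and ω = 2 (an assumed test case) the exact solution u(t) = t/ω² − sin(ωt)/ω³ is available for comparison.

```python
import numpy as np

# Numerical check of u(t) = (1/w) * int_0^t sin(w (t - tau)) f(tau) dtau
# for u'' + w^2 u = f, u(0) = u'(0) = 0.  With f(t) = t and w = 2
# (assumed test case) the exact solution is t/w^2 - sin(w t)/w^3.
w = 2.0
tg = np.linspace(0.0, 5.0, 5001)        # time grid
dt = tg[1] - tg[0]
f = tg                                   # forcing f(t) = t

sample = tg[::500]                       # a few output times
u = np.array([np.sum(np.sin(w * (T - tg[tg <= T])) * f[tg <= T]) * dt / w
              for T in sample])          # Duhamel integral, Riemann sum
exact = sample / w**2 - np.sin(w * sample) / w**3
err = np.max(np.abs(u - exact))
print(err)
```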

The integral kernel k(t − τ) is known as the fundamental solution to the initial value problem, and prescribes the response of the system to a unit impulse force that is applied instantaneously at the time t = τ. Note particularly that (unlike boundary value problems) the impulse only affects the solution at later times t > τ. For initial value problems, the fundamental solution plays a role similar to that of a Green's function in a boundary value problem. The convolution formula (13.156) can be viewed as a linear superposition of fundamental solution responses induced by expressing the forcing function

    f(t) = ∫₀^∞ f(τ) δ(t − τ) dτ

as a superposition of concentrated impulses over time.

This concludes our brief introduction to the Laplace transform and a few of its many applications to physical problems. More details can be found in almost all applied texts on mechanics, electronic circuits, signal processing, control theory, and many other areas, [Ltr].