Math 395 (Fall 2010).

Honors Analysis I
19. Convolutions and applications
Convolution is an operation on functions that is as important as the Fourier transform.
In this section we study some of the basic questions one can ask about convolutions and
discuss a few important applications of the answers we will find.
19.1. Main themes. Here are the main themes we will encounter:
• The convolution f ∗ g of two functions, f and g, is usually a “nice” (for instance,
continuous, differentiable, or even infinitely differentiable) function provided one
(or both) of f or g is “nice.” In fact, sometimes f ∗ g is “nicer” than both f and
g: if f ∈ L^p(R^n) and g ∈ L^q(R^n), where 1/p + 1/q = 1 and 1 < p, q < ∞, then f ∗ g
is continuous even though f and g may both be discontinuous.
• Convolution is related to averaging a function over various sets.
• The ideas above can be used for approximating “bad” functions with well-behaved
ones. As a concrete example, we will use convolutions to show that if Ω ⊂ Rn is
any open set, then C_0^∞(Ω) is dense in L^p(Ω) for every 1 ≤ p < ∞ [1].
19.2. First question. Before studying applications of convolutions, we must first determine
when the convolution of two functions f and g is defined. Recall that if
f, g are functions on R^n, their convolution is formally defined as the integral
\[ (f * g)(x) = \int_{\mathbb{R}^n} f(x-y)\, g(y)\, dm(y) \qquad \forall\, x \in \mathbb{R}^n. \tag{19.1} \]
We will take the natural viewpoint that f ∗ g is defined when the functions f and g are
both measurable (otherwise we have little hope of giving meaning to the right hand side
of (19.1)), and when the function y ↦ f(x − y)g(y) is integrable for almost every x ∈ R^n.
In this case f ∗ g is well defined as an element of the quotient space

\[ \mathrm{Fun}(\mathbb{R}^n) = \frac{\{\text{measurable functions } f : \mathbb{R}^n \to \mathbb{C}\}}{\{\text{functions that are } 0 \text{ almost everywhere}\}}. \]
Remarks 19.1. (1) For many practical applications, it is important to insist only that the
function y ↦ f(x − y)g(y) is integrable for almost every x ∈ R^n. For example, this
requirement holds if f, g are integrable. However, for general f, g ∈ L^1(R^n), the
function y ↦ f(x − y)g(y) could fail to be integrable for certain values x ∈ R^n.
(2) There are some situations where the product y ↦ f(x − y)g(y) is not integrable, and
yet the right hand side of (19.1) can be defined. This is related to the phenomenon that
a function f on R could have a conditionally convergent improper integral ∫_{−∞}^{∞} f(x) dx
without being integrable. We only insist on integrability to simplify the exposition.
[1] If X is a topological space and f : X → C is a function, the support of f, denoted supp(f), is
the closure in X of the set {x ∈ X : f(x) ≠ 0}. The space C_0^∞(Ω) consists of all infinitely differentiable
functions f : Ω → C such that supp(f) is compact.

19.3. Some examples. In addition to the question of when the function y ↦ f(x − y)g(y)
is integrable for almost every x ∈ R^n, one can also ask what properties the convolution
f ∗ g has (of course, the answer depends on the assumptions one makes about f and
g). Here are some of the most common examples that arise in the study of general
measure-theoretic properties of functions (examples that have to do with differentiability
are discussed later on in this section).
(1) Suppose that 1 ≤ p, q ≤ ∞ and 1/p + 1/q = 1. If f ∈ L^p(R^n) and g ∈ L^q(R^n),
then f ∗ g exists everywhere (as opposed to merely almost everywhere), and satisfies
|(f ∗ g)(x)| ≤ ||f||_p · ||g||_q for every x ∈ R^n. This follows very easily from Hölder’s
inequality. If, moreover, 1 < p, q < ∞, then f ∗ g is a continuous function that
vanishes at infinity (this is a little more challenging to prove; see Theorem 19.16).
(2) If f, g ∈ L^1(R^n), then f ∗ g exists almost everywhere and is also integrable; in fact,
||f ∗ g||_1 ≤ ||f||_1 · ||g||_1. This is a fairly easy consequence of Fubini’s theorem.
(3) In problem 9.4 you were asked to show that if 1 ≤ p < ∞, if f ∈ L^p(R^n), and if
g ∈ L^1(R^n), then f ∗ g exists a.e. and lies in L^p(R^n); in fact, ||f ∗ g||_p ≤ ||f||_p · ||g||_1.
This is shown using Minkowski’s inequality, and implies (2) as a special case.
(4) Finally, here is a result that generalizes both (2) and (3), as well as the first assertion of
(1). It is known as Young’s inequality. Let 1 ≤ p, q, r ≤ ∞ be such that 1/p + 1/q = 1 + 1/r.
If f ∈ L^p(R^n) and g ∈ L^q(R^n), then f ∗ g exists a.e. and belongs to L^r(R^n). In fact,
||f ∗ g||_r ≤ ||f||_p · ||g||_q. Note that by taking q = 1 and r = p one recovers (3), while
by taking r = ∞ one recovers (more or less) the first assertion of (1).
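For the record, here is how the bound in (1) comes out of Hölder’s inequality. This is only a sketch, written for 1 ≤ p < ∞ (when p = ∞ one bounds |f(x − y)| by ||f||_∞ instead); it uses the fact that the Lebesgue measure is invariant under the change of variables y ↦ x − y.

```latex
\[
|(f*g)(x)|
\;\le\; \int_{\mathbb{R}^n} |f(x-y)|\,|g(y)|\,dm(y)
\;\le\; \Big( \int_{\mathbb{R}^n} |f(x-y)|^p\,dm(y) \Big)^{1/p} \cdot \|g\|_q
\;=\; \|f\|_p \cdot \|g\|_q .
\]
```

In particular the integrand y ↦ f(x − y)g(y) is integrable for every x, which is why f ∗ g exists everywhere in this case.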
19.4. Exercise. It is instructive to compute (at least once) the convolution of two
functions by hand directly from the definition. Let f = 1_[0,1] = g be the indicator function
of the interval [0, 1] ⊂ R. Calculate f ∗ g using the definition (19.1). You should obtain
a continuous piecewise linear function that equals x when 0 ≤ x ≤ 1, equals 2 − x when
1 ≤ x ≤ 2, and is zero everywhere else. (Do this without looking at your class notes!)
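Once you have done the computation by hand, a numerical sanity check can be reassuring. The following is a minimal Python sketch (not a substitute for the exercise): it approximates the integral in (19.1) by a midpoint Riemann sum and compares it with the piecewise linear function described above. The helper names indicator, conv_at and tent are ours, chosen only for this illustration.

```python
def indicator(t):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def conv_at(x, steps=20000):
    """Midpoint Riemann sum for (f*g)(x) with f = g = 1_[0,1].

    Since g vanishes outside [0, 1], it suffices to integrate
    y -> f(x - y) over y in [0, 1].
    """
    h = 1.0 / steps
    return sum(indicator(x - (k + 0.5) * h) for k in range(steps)) * h


def tent(x):
    """The closed form predicted in the exercise."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 <= x <= 2.0:
        return 2.0 - x
    return 0.0


if __name__ == "__main__":
    # Print the numerical value next to the predicted one at a few points.
    for x in (-0.5, 0.25, 1.0, 1.6, 2.5):
        print(x, conv_at(x), tent(x))
```

The two columns should agree up to the step size of the Riemann sum.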
19.5. Averages. In order to understand how convolutions are related to averages, it is
useful to ask the following question. Suppose [2] f ∈ L^1_loc(R^n), and consider the function
that to a point x ∈ R^n assigns the average of f over the ball

\[ B_\epsilon(x) = \{\, y \in \mathbb{R}^n : \|y - x\| < \epsilon \,\}, \]

where ε > 0 is fixed. How can we express this function using convolution?
The answer is that this function is f ∗ g_ε, where g_ε = (1/m(B_ε(0))) · 1_{B_ε(0)} is the normalized
indicator function of the ε-ball around the origin (the normalization is needed because we
want to get the honest averages of f; thus we need to make sure that ∫_{R^n} g_ε dm = 1).
[2] We write L^1_loc(R^n) for the space of locally integrable functions on R^n, that is, measurable functions
f : R^n → C such that ∫_K |f| dm < ∞ for every compact subset K ⊂ R^n. Note that L^1_loc(R^n) contains
L^p(R^n) for every 1 ≤ p ≤ ∞. Since for the most part of this section we deal with averages of functions
over various small neighborhoods of points, it is natural to work with locally integrable functions.
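To see why convolution with the normalized indicator function of the ε-ball produces averages, write g_ε = (1/m(B_ε(0))) · 1_{B_ε(0)} and unwind the definition (19.1). The sketch below uses the substitution z = x − y, which maps B_ε(0) onto B_ε(x) and preserves the Lebesgue measure; the variable z is introduced only for this computation.

```latex
\[
(f * g_\epsilon)(x)
= \frac{1}{m(B_\epsilon(0))} \int_{B_\epsilon(0)} f(x-y)\,dm(y)
= \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} f(z)\,dm(z),
\]
```

where we also used m(B_ε(0)) = m(B_ε(x)). The right hand side is exactly the average of f over B_ε(x).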

Now we mention two very vague heuristic principles (it is difficult to make them precise).
1. Continuity and differentiability properties of functions have something to do with their
“local oscillation.”
2. Replacing the values of a function with its local averages around every point (as above)
should decrease oscillation.
Example 19.2. The function f on R defined by f (x) = sin(1/x) if 0 < x < 1 and f (x) = 0
otherwise is not continuous at 0 because it “oscillates too much.” Nevertheless, Theorem
19.16 below implies that if g_ε = (1/(2ε)) · 1_{(−ε,ε)}, as above, then f ∗ g_ε is continuous for every ε > 0.
Because of the heuristic principles mentioned above, we expect that convolution with
something like 1_{B_ε(0)} should improve the local behavior of functions. In fact, it is even
better to consider local weighted averages, where the weight function is smooth.
Let us explain what we mean by this. Suppose g_ε ∈ C^∞(R^n) has the following properties:
• g_ε vanishes outside B_ε(0) (in particular, g_ε has compact support);
• g_ε ≥ 0 everywhere;
• ∫_{R^n} g_ε dm = 1 (so g_ε is an “honest” weight function).
It turns out that if f ∈ L^1_loc(R^n), the convolution f ∗ g_ε is always smooth (see Lemma
19.11 below). It also turns out that if f ∈ L^p(R^n) for some 1 ≤ p < ∞ and ε is
sufficiently small, then f ∗ g_ε is close to f with respect to the L^p norm. This fact is used
for approximating L^p functions by smooth functions with compact support, and a large
part of the material that follows is devoted to explaining the details of this technique.
Going back to the ordinary “local averages” of a function f (without any weights), let
us briefly mention what is known.
Definition 19.3. If f ∈ L^1_loc(R^n), a point x ∈ R^n is said to be a Lebesgue point of f if
\[ \lim_{\epsilon \to 0} \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} |f(y) - f(x)|\, dm(y) = 0. \]

Example 19.4. If f is continuous at x, it easily follows that x is a Lebesgue point of f .

Remark 19.5. In general, it is easy to check (using the triangle inequality) that if x ∈ R^n
is a Lebesgue point of a function f ∈ L^1_loc(R^n), then
\[ f(x) = \lim_{\epsilon \to 0} \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} f(y)\, dm(y) \]
(though the converse is not always true).
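The triangle inequality step mentioned in Remark 19.5 can be written out as follows (a short sketch; the only input is that the constant f(x) equals its own average over the ball):

```latex
\[
\Big| \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} f(y)\,dm(y) - f(x) \Big|
= \Big| \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} \big( f(y) - f(x) \big)\,dm(y) \Big|
\le \frac{1}{m(B_\epsilon(x))} \int_{B_\epsilon(x)} |f(y) - f(x)|\,dm(y),
\]
```

and the right hand side tends to 0 as ε → 0 whenever x is a Lebesgue point of f.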

The following result is very remarkable, and a priori it is not at all clear why something
like that should be true. We will not prove it in this course [3].
Theorem 19.6. If f ∈ L^1_loc(R^n), then almost every x ∈ R^n is a Lebesgue point of f.
[3] Recommended reading: Chapter 7 of Rudin’s “Real and Complex Analysis”.

Equivalently, the set of points x ∈ R^n such that x is not a Lebesgue point of f has
measure zero. As a corollary, if f ∈ L^1_loc(R^n), we have f ∗ ((1/m(B_ε(0))) · 1_{B_ε(0)}) → f as ε → 0
pointwise almost everywhere on R^n.

19.6. Density of C_c(Ω) in L^p(Ω). An important step in showing that C_0^∞(Ω) is dense in
L^p(Ω) for an open set Ω ⊂ R^n and for every 1 ≤ p < ∞ is to prove the following weaker
statement. We will see that it follows fairly easily from the regularity properties of the
Lebesgue measure. In fact, it is related to a special case of a result known as Lusin’s
theorem, but a discussion of the latter is beyond the scope of these notes [4].
If X is any topological space, we write [5] C_c(X) for the space of continuous functions
X → C whose support is compact.
Theorem 19.7. If Ω ⊂ R^n is an open set and 1 ≤ p < ∞, then C_c(Ω) is dense in L^p(Ω).
Proof. Write S(Ω) for the space of all simple (measurable) functions s : Ω → C such
that m({s ≠ 0}) < ∞. We have seen before that S(Ω) is dense in L^p(Ω). Thus it suffices
to show that every s ∈ S(Ω) can be written as the L^p limit of a sequence {f_j}_{j=1}^∞ of
functions in C_c(Ω).
By linearity, we may assume that s = 1_A, where A ⊂ Ω is a Borel subset with m(A) < ∞.
Fix ε > 0. By the regularity properties of the Lebesgue measure, there exist a compact
subset K and an open subset U of R^n such that K ⊂ A ⊂ U ⊂ Ω and m(U \ K) < ε.
By Lemma 19.9 below, there exists f ∈ C_c(Ω) such that 0 ≤ f ≤ 1 everywhere, f ≡ 0
on Ω \ U and f ≡ 1 on K. It follows that |f − 1_A| ≤ 1 everywhere, and f = 1_A outside
U \ K. Hence ||f − 1_A||_p ≤ ε^{1/p}. Since ε > 0 is arbitrary, the result follows. □
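The final estimate in the proof of Theorem 19.7 is the following one-line computation (spelled out here as a sketch, using only that |f − 1_A| ≤ 1 everywhere and that f − 1_A vanishes outside U \ K):

```latex
\[
\|f - 1_A\|_p^p
= \int_{\Omega} |f - 1_A|^p \, dm
= \int_{U \setminus K} |f - 1_A|^p \, dm
\le m(U \setminus K) < \epsilon .
\]
```

Taking p-th roots gives the bound on ||f − 1_A||_p used above.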
Lemma 19.8. If X is a metric space with distance function dist : X × X −→ [0, +∞)
and C0 , C1 ⊂ X are closed subsets such that C0 ∩ C1 = ∅, then there exists a continuous
function f : X −→ [0, 1] such that f (x) = 0 ⇐⇒ x ∈ C0 and f (x) = 1 ⇐⇒ x ∈ C1 .
Proof. Recall that for a closed subset C ⊂ X, we can consider the function
\[ \operatorname{dist}(x, C) = \inf \{\, \operatorname{dist}(x, y) : y \in C \,\}, \]
which is continuous w.r.t. x and has the property that dist(x, C) = 0 ⇐⇒ x ∈ C. Put
\[ f(x) = \frac{\operatorname{dist}(x, C_0)}{\operatorname{dist}(x, C_0) + \operatorname{dist}(x, C_1)}. \]
Note that this is well defined because C_0 ∩ C_1 = ∅, so the denominator is never zero. It
is easy to see that f has all the required properties. □
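The formula in the proof of Lemma 19.8 is concrete enough to be evaluated directly. Here is a minimal Python sketch for X = R, under the simplifying assumption (ours, for illustration only) that C_0 and C_1 are finite unions of closed bounded intervals, each given as a list of pairs (a, b); the function names are hypothetical.

```python
def dist_to_intervals(x, intervals):
    """Distance from x to a closed set given as a finite union of intervals [a, b]."""
    # For a single interval the distance is a - x to the left of it,
    # x - b to the right of it, and 0 inside it.
    return min(max(a - x, 0.0, x - b) for a, b in intervals)


def separating_function(x, c0, c1):
    """f(x) = dist(x, C0) / (dist(x, C0) + dist(x, C1)) for disjoint closed C0, C1."""
    d0 = dist_to_intervals(x, c0)
    d1 = dist_to_intervals(x, c1)
    # The denominator is positive because C0 and C1 are assumed to be disjoint.
    return d0 / (d0 + d1)


if __name__ == "__main__":
    C0 = [(0.0, 1.0)]
    C1 = [(2.0, 3.0)]
    for x in (0.5, 1.25, 1.5, 1.75, 2.5, 4.0):
        print(x, separating_function(x, C0, C1))
```

In this example the function vanishes exactly on [0, 1], equals 1 exactly on [2, 3], and interpolates continuously in between.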
[4] Recommended reading: Chapter 2 of Rudin’s “Real and Complex Analysis”.
[5] Even though we agreed to use the notation C_0^∞(Ω) for the space of infinitely differentiable functions
with compact support on a given open set Ω ⊂ R^n, it would be inadmissible to write C_0(X) in place
of C_c(X) because the former symbol means something different. For instance, C_0(R) is the space of
continuous functions f : R → C such that lim_{x→±∞} f(x) = 0; thus C_0(R) ⊋ C_c(R).

Lemma 19.9. Let K ⊂ U ⊂ Ω ⊂ R^n be subsets [6] such that K is compact and U, Ω are
open. There exists a function f ∈ C_c(Ω) such that 0 ≤ f ≤ 1 everywhere, f ≡ 0 on Ω \ U
and f ≡ 1 on K.
Proof. For each x ∈ K there exists a number r_x > 0 such that B_{r_x}(x) ⊂ U (where B_r(x)
denotes the ball of radius r around x). Tautologically, K ⊂ ⋃_{x∈K} B_{r_x/2}(x). Since K is
compact, there is a finite set of points x_1, x_2, . . . , x_N ∈ K such that K ⊂ ⋃_{j=1}^N B_{r_{x_j}/2}(x_j).
The union V := ⋃_{j=1}^N B_{r_{x_j}/2}(x_j) is open and its closure in Ω is compact because it equals
the union of the finitely many closed balls of radius r_{x_j}/2 around the points x_j. Now let C = Ω \ V; note that
Ω \ U ⊂ C. By Lemma 19.8, there is a continuous function f : Ω → [0, 1] such that
f ≡ 1 on K and f ≡ 0 on C. Since f ≡ 0 outside V, we see that supp(f) is compact. □

19.7. Density of C0∞ (Ω) in Lp (Ω). One of the main goals of this section is to sketch a
proof of the following result (the missing details are a part of Problem Set 9).
Theorem 19.10. If Ω ⊂ Rn is open and 1 ≤ p < ∞, then C0∞ (Ω) is dense in Lp (Ω).
There are two approaches to this result. One has a more ad hoc nature; the special
case where Ω = R appears as problem 9.7 on the homework. That approach uses the fact
that step functions are dense in Lp (R). A more conceptual approach uses convolutions,
and it is the one that will be explained in this subsection.
Lemma 19.11. If f ∈ L^1_loc(R^n) and g ∈ C_0^∞(R^n), then f ∗ g ∈ C^∞(R^n). In fact,
\[ \frac{\partial (f * g)}{\partial x_j} = f * \frac{\partial g}{\partial x_j} \qquad \text{for all } 1 \le j \le n. \]
This lemma is a slightly more general form of homework problem 9.8. Note that, by
induction, it suffices to prove the second statement of the lemma, because each of the
partial derivatives ∂g/∂x_j belongs to C_0^∞(R^n). To prove the second assertion, one rewrites it
in a form suitable for application of the result on differentiation under the integral sign
(problem 9.1 on the homework).
First we claim that we may assume that f ∈ L^1(R^n) (i.e., f is integrable instead of
being merely locally integrable). Indeed, we are interested in the partial derivatives of
the function f ∗ g at a given point x ∈ R^n. The partial derivatives only depend on the
values of f ∗ g on a small open ball B_ε(x) around the point x. Furthermore, if we put
K = supp(g), then K is compact, and hence the set A of points of the form x′ − y, where
x′ ∈ B_ε(x) and y ∈ K, is open and bounded. But the values of f outside A do not affect the
values of f ∗ g on B_ε(x). In other words, if we put f_1 = 1_A · f, then (f_1 ∗ g)(x′) = (f ∗ g)(x′)
for every x′ ∈ B_ε(x). However, f_1 is integrable because the closure of A is compact and f
is locally integrable.

[6] In fact, R^n can be replaced with any metric space that has the Heine-Borel property.

So from now on we assume that f ∈ L^1(R^n). Recall that
\[ (f * g)(x) = (g * f)(x) = \int_{\mathbb{R}^n} g(x-y)\, f(y)\, dm(y). \]
Fix x ∈ R^n and 1 ≤ j ≤ n, and consider the function
\[ F(y, t) = g(x + t e_j - y)\, f(y), \qquad y \in \mathbb{R}^n,\ t \in (-1, 1), \]
where e_j ∈ R^n denotes the j-th standard basis vector (i.e., e_j has 1 in the j-th coordinate
and 0’s everywhere else). Using the assumption that g ∈ C_0^∞(R^n), it is not hard to check
that the assumptions of homework problem 9.1 are satisfied for the function F. But
\[ \frac{\partial F}{\partial t}(y, 0) = \frac{\partial g}{\partial x_j}(x - y)\, f(y) \quad \text{and} \quad \frac{d}{dt}\Big|_{t=0} \int_{\mathbb{R}^n} F(y, t)\, dm(y) = \frac{\partial (g * f)}{\partial x_j}(x), \]
so the lemma follows.

Lemma 19.12. There exists a function δ ∈ C_0^∞(R^n) such that:
• δ(x) ≥ 0 for all x ∈ R^n;
• δ(x) = δ(−x) for all x ∈ R^n;
• δ(x) = 0 when ||x|| ≥ 1 (where ||x|| = (x_1^2 + · · · + x_n^2)^{1/2});
• ∫_{R^n} δ(x) dm(x) = 1.
This is homework problem 9.9.
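To get a feeling for what such a function looks like, here is a minimal numerical sketch in Python for n = 1. It uses the familiar candidate exp(−1/(1 − x^2)) on (−1, 1), extended by zero; this particular choice is our assumption (Lemma 19.12 only asserts existence), and the verification that it is infinitely differentiable remains part of the homework. The names bump, integrate and delta are hypothetical helpers.

```python
import math


def bump(x):
    """Unnormalized candidate: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def integrate(func, a, b, steps=20000):
    """Midpoint rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (k + 0.5) * h) for k in range(steps)) * h


# Numerical value of the integral of the unnormalized bump over [-1, 1].
NORMALIZER = integrate(bump, -1.0, 1.0)


def delta(x):
    """Normalized bump, so that its (numerical) integral equals 1."""
    return bump(x) / NORMALIZER


if __name__ == "__main__":
    print("normalizing constant:", NORMALIZER)
    print("integral of delta:", integrate(delta, -1.0, 1.0))
    print("symmetry check:", delta(0.3) == delta(-0.3))
    print("value at the boundary:", delta(1.0))
```

Nonnegativity, symmetry and vanishing for |x| ≥ 1 are built into the formula; only the normalization is computed numerically.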
Lemma 19.13. Let δ ∈ C_0^∞(R^n) satisfy the requirements of Lemma 19.12. For each
ε > 0 define δ_ε(x) = ε^{−n} · δ(x/ε). If f ∈ C_c(R^n), then f ∗ δ_ε → f as ε → 0 pointwise on
R^n, and f ∗ δ_ε → f as ε → 0 with respect to the L^p norm for each 1 ≤ p < ∞.
You should prove this lemma as part of your solution of homework problem 9.10.
The statement about pointwise convergence is proved in essentially the same way as
the fact that for a continuous function f on R^n, every point x ∈ R^n is a Lebesgue point
(cf. Example 19.4). The precise form of the functions δ_ε is not important. What matters
is that ∫_{R^n} δ_ε dm = 1 for each ε > 0, and the support of δ_ε is contained in the closed
ball of radius ε around 0.
To deduce the statement about L^p convergence, use the Dominated Convergence Theorem
together with the fact that if ε < 1, then the supports of all the functions f ∗ δ_ε are
contained in a single compact set, namely, {x ∈ R^n : dist(x, supp(f)) ≤ 1}.
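The convergence f ∗ δ_ε → f can also be observed numerically. The Python sketch below works in dimension n = 1 and makes two choices that are ours and not part of the lemma: δ is the normalized bump exp(−1/(1 − x^2)), and f is the "tent" function max(0, 1 − |x|), which lies in C_c(R). It measures the largest deviation |f ∗ δ_ε − f| over a grid of sample points for a few values of ε; this is an illustration, not a proof.

```python
import math


def bump(x):
    """Unnormalized bump exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def midpoint(func, a, b, steps=400):
    """Midpoint rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (k + 0.5) * h) for k in range(steps)) * h


NORMALIZER = midpoint(bump, -1.0, 1.0, steps=4000)


def delta_eps(x, eps):
    """delta_eps(x) = eps^(-1) * delta(x / eps), i.e. the case n = 1."""
    return bump(x / eps) / (NORMALIZER * eps)


def tent(x):
    """A continuous function with compact support [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def mollified(x, eps):
    """(f * delta_eps)(x); delta_eps vanishes outside [-eps, eps]."""
    return midpoint(lambda y: tent(x - y) * delta_eps(y, eps), -eps, eps)


def sup_error(eps):
    """Largest deviation |f * delta_eps - f| over a grid in [-2, 2]."""
    grid = [-2.0 + 0.1 * k for k in range(41)]
    return max(abs(mollified(x, eps) - tent(x)) for x in grid)


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.125):
        print(eps, sup_error(eps))
```

Since the tent function satisfies |f(x) − f(y)| ≤ |x − y|, the deviation is at most ε at every point, and the printed values should shrink as ε does.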

Finally, why is Lemma 19.13 useful in problem 9.10 on the homework? It’s the same
idea that we saw in the proof of the Riemann-Lebesgue lemma, and again in the proof of
Theorem 19.16 (see below), which was already explained in class. Namely, the fact that
C_c(R^n) is dense in L^p(R^n) for 1 ≤ p < ∞ (see Theorem 19.7) together with the second
assertion of Lemma 19.13 is enough to deduce that f ∗ δ_ε → f as ε → 0 in L^p(R^n) for
every f ∈ L^p(R^n). An important ingredient here is the observation that ||δ_ε||_1 = 1 for
every ε > 0 by construction. Hence ||f ∗ δ_ε||_p ≤ ||f||_p for every f ∈ L^p(R^n) by homework
problem 9.4.
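In outline, the deduction runs as follows (a sketch only; the details are part of problem 9.10). Here η > 0 is an arbitrary tolerance and f_0 ∈ C_c(R^n) is chosen with ||f − f_0||_p < η/3; both symbols are introduced just for this computation.

```latex
\[
\|f * \delta_\epsilon - f\|_p
\le \|(f - f_0) * \delta_\epsilon\|_p + \|f_0 * \delta_\epsilon - f_0\|_p + \|f_0 - f\|_p
\le 2\,\|f - f_0\|_p + \|f_0 * \delta_\epsilon - f_0\|_p .
\]
```

The first term on the right is less than 2η/3, and by Lemma 19.13 the second term is less than η/3 once ε is sufficiently small.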

Remark 19.14. Because of Lemma 19.13 and problem 9.10 on the homework, the family
of functions {δ_ε}_{ε>0} is sometimes called an approximate identity. One can, in fact, give
precise meaning to the statement that as ε → 0, the functions δ_ε “converge to the Dirac
delta-function”; the latter acts as an identity for convolution (see Example 19.19 below).
At last, we are ready for the
Proof of Theorem 19.10. Fix f ∈ L^p(Ω) and a > 0. By Theorem 19.7, there is f_0 ∈ C_c(Ω)
such that ||f − f_0||_p < a/2. Since f_0 has compact support, we can extend it by zero outside
Ω and view it as an element of C_c(R^n). By Lemma 19.13, for sufficiently small ε > 0, we
have ||f_0 − f_0 ∗ δ_ε||_p < a/2, and by Lemma 19.11, f_0 ∗ δ_ε is infinitely differentiable for every
ε > 0. Now if ε is sufficiently small, the compact set {x ∈ R^n : dist(x, supp(f_0)) ≤ ε} is
contained in Ω, and by construction, it contains the support of f_0 ∗ δ_ε. Thus for sufficiently
small ε > 0, we have f_0 ∗ δ_ε ∈ C_0^∞(Ω), and also ||f − f_0 ∗ δ_ε||_p < a by the triangle inequality.
Since a > 0 is arbitrary, the proof of the theorem is complete. □
19.8. A concrete application of convolutions. In this subsection we will prove
Proposition 19.15. If A, B ⊂ R are Borel subsets of positive (Lebesgue) measure, then
the set A + B = {a + b : a ∈ A, b ∈ B} contains a nonempty open interval.
Along the way, we will prove the following result, which is interesting in its own right.
Theorem 19.16. Let 1 < p, q < ∞ be such that 1/p + 1/q = 1. If f ∈ L^p(R) and g ∈ L^q(R),
then the convolution f ∗ g is a continuous function on R that vanishes at infinity.
Before moving on, let us deduce Proposition 19.15 from Theorem 19.16.
Proof of Proposition 19.15. Since m(A) > 0 and m(B) > 0, there exists N ≥ 1 such that
m(A ∩ [−N, N]) > 0 and m(B ∩ [−N, N]) > 0. Replacing A with A ∩ [−N, N] and B
with B ∩ [−N, N], we may assume that A and B are bounded (in fact, it is enough to
assume that they have finite measure).
In this case the indicator functions f = 1_A and g = 1_B belong to every L^p space; in
particular, to L^2(R). Applying Theorem 19.16 with p = q = 2, we see that the function
h = 1_A ∗ 1_B is continuous. Using Fubini’s theorem and the translation invariance of the
Lebesgue measure it is easy to calculate ∫_R h dm = m(A) · m(B) > 0. In particular, h is
not identically zero, so by continuity, there is a nonempty open interval I ⊂ R such that
h(x) ≠ 0 for all x ∈ I. However, if h(x) ≠ 0 for a given x ∈ R, then 1_A(x − y) 1_B(y) ≠ 0
for some y ∈ R, whence x = (x − y) + y ∈ A + B. Therefore I ⊂ A + B, as desired. □
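The calculation of the integral of h = 1_A ∗ 1_B mentioned in the proof can be written out as follows (a sketch; Fubini’s theorem applies because the integrand is nonnegative and measurable, and the inner integral is computed using the translation invariance of m):

```latex
\[
\int_{\mathbb{R}} h \, dm
= \int_{\mathbb{R}} \int_{\mathbb{R}} 1_A(x - y)\, 1_B(y) \, dm(y) \, dm(x)
= \int_{\mathbb{R}} 1_B(y) \Big( \int_{\mathbb{R}} 1_A(x - y) \, dm(x) \Big) dm(y)
= \int_{\mathbb{R}} 1_B(y)\, m(A) \, dm(y)
= m(A) \cdot m(B).
\]
```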
The proof of Theorem 19.16 is based on the lemma below (which is also useful in some
other contexts). To state it, let us make the following definition. If f is any function on
R and x ∈ R, we let Tx f be the function given by (Tx f )(y) = f (x + y). Note that the
operator Tx (acting on the space of all functions on R) preserves various properties such
as measurability, integrability, and so on. Moreover, for any measurable f : R −→ C, we
have ||Tx f ||p = ||f ||p for each 0 < p ≤ ∞ by the translation invariance of the Lebesgue
measure. Thus Tx induces an isometry on each of the spaces Lp (R) with 1 ≤ p ≤ ∞.

Lemma 19.17. Fix f ∈ L^p(R), where 1 ≤ p < ∞. Given x ∈ R and a sequence of
points {x_j}_{j=1}^∞ ⊂ R converging to x, we have T_{x_j} f → T_x f as j → ∞ with respect to the
L^p-norm, i.e., lim_{j→∞} ||T_{x_j} f − T_x f||_p = 0.
Proof. The key ingredient is the fact that the space T of step functions on R is dense in
L^p(R) for every 1 ≤ p < ∞. In more detail, suppose first that f = 1_[a,b] is the indicator
function of a closed bounded interval [a, b] ⊂ R. Then T_x f is the indicator function of
[a − x, b − x] for every x ∈ R. If |x_j − x| is sufficiently small, the symmetric difference
[a − x, b − x] △ [a − x_j, b − x_j] is a union of two intervals of length |x − x_j|, and therefore
||T_x f − T_{x_j} f||_p ≤ (2 · |x − x_j|)^{1/p}. This proves that lim_{j→∞} ||T_{x_j} f − T_x f||_p = 0 in this case.
By linearity, the same assertion holds for every f ∈ T.
Now let f ∈ L^p(R) be arbitrary and fix ε > 0. Since T is dense in L^p(R), there exists
f_0 ∈ T such that ||f − f_0||_p < ε/3. By the previous paragraph, there exists N ≥ 1 such
that ||T_{x_j} f_0 − T_x f_0||_p < ε/3 for every j ≥ N. So if j ≥ N, it follows that
\[
\begin{aligned}
\|T_{x_j} f - T_x f\|_p &\le \|T_{x_j} f - T_{x_j} f_0\|_p + \|T_{x_j} f_0 - T_x f_0\|_p + \|T_x f_0 - T_x f\|_p \\
&= \|f - f_0\|_p + \|T_{x_j} f_0 - T_x f_0\|_p + \|f - f_0\|_p < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,
\end{aligned}
\]
which is what we need to show. □
Proof of Theorem 19.16. Let us first prove that f ∗ g is continuous. Fix a point x ∈ R
and a sequence {x_j}_{j=1}^∞ ⊂ R converging to x. We have
\[
\begin{aligned}
|(f * g)(x_j) - (f * g)(x)| &= \Big| \int_{\mathbb{R}} f(x_j - y)\, g(y)\, dm(y) - \int_{\mathbb{R}} f(x - y)\, g(y)\, dm(y) \Big| \\
&\le \int_{\mathbb{R}} |f(x_j + y) - f(x + y)| \cdot |g(-y)|\, dm(y) \\
&\le \|T_{x_j} f - T_x f\|_p \cdot \|g\|_q,
\end{aligned}
\]
where in the second step we used the change of variables y ↔ −y (which turns f(x_j − y)
into f(x_j + y) = (T_{x_j} f)(y)) and in the third step we used Hölder’s inequality. By
Lemma 19.17, ||T_{x_j} f − T_x f||_p → 0 as j → ∞, which implies that f ∗ g is continuous.
We skip the details of the proof that f ∗ g vanishes at infinity (since this statement is
not used in the proof of Proposition 19.15). However, the idea is again very similar to
the one we used in the Riemann-Lebesgue lemma and in Lemma 19.17. Namely, if f and
g are both step functions, it is easy to see that f ∗ g is in fact identically zero outside
some bounded interval. Then one uses the fact that the space T of step functions on R
is dense in L^p(R) and in L^q(R), along with Hölder’s inequality. □

19.9. Other instances of convolution. Convolution can be defined for objects other
than functions; here we mention two useful examples (although there are others).
19.9.1. Convolution of functions with measures. Let µ be a positive [7] measure on the σ-algebra
B of Borel subsets of R^n, and let f : R^n → C be a Borel-measurable function.
The convolution of f with µ is the function defined by
\[ (f * \mu)(x) = \int_{\mathbb{R}^n} f(x - y)\, d\mu(y). \]

Note that this formula is very similar to (19.1). In fact, if [8] µ = g · m for some nonnegative
measurable function g : R^n → [0, +∞], then f ∗ µ coincides with the function f ∗ g
defined by (19.1). However, looking at f ∗ µ is also sometimes useful when µ does not
come from any function.
Remark 19.18. As before, one must ask when f ∗ µ is defined. We will not
go into any details of the possible answers here, except to say that, tautologically, if for
a given x ∈ R^n the function y ↦ f(x − y) is integrable with respect to µ, then the value
(f ∗ µ)(x) is defined. A notable special case is when µ(R^n) < ∞ (for instance, µ could
be a probability measure). Then f ∗ µ is defined everywhere whenever f is a bounded
measurable function (though it could also be defined for other kinds of f).
Example 19.19. Let µ be defined [9] by µ(A) = 1 if 0 ∈ A and µ(A) = 0 if 0 ∉ A. Then there
is no measurable function g on R^n such that µ = g · m, for instance, because µ({0}) ≠ 0.
Directly from the definition, we have f ∗ µ = f for every measurable f : R^n → C.
Remark 19.20. In probability theory texts a different convention is used: the value of the
convolution of f with µ at a point x is defined as the integral
\[ \int_{\mathbb{R}^n} f(x + y)\, d\mu(y). \]
This is the same as the convolution in our sense of the function f with the measure
defined by µ̌(A) = µ(−A) for every Borel subset A ⊂ R^n, where −A = {−x : x ∈ A}.
In particular, if µ is symmetric in the sense that µ(A) = µ(−A) for every Borel subset
A ⊂ R^n, then the two notions of convolution (analytic and probabilistic) coincide.
19.9.2. Convolution of two measures. For simplicity let us consider the following situation.
Let µ_1, µ_2 be two finite [10] positive measures on the σ-algebra B of Borel subsets of R^n.
Then we can define the convolution of µ1 and µ2 as another finite measure on B. If µ1
and µ2 are both probability measures (i.e., µ1 (Rn ) = 1 = µ2 (Rn )), then so is µ1 ∗ µ2 .
[7] In fact, convolution can also be defined for complex measures. However, since we never discussed the
definition of the integral of a function with respect to a complex measure, we will restrict attention to
positive measures throughout this subsection of the notes.
[8] Recall again that this means that µ(A) = ∫_A g dm for every Borel subset A ⊂ R^n.
[9] This measure is related to the Dirac delta-function.
[10] This means that µ_1(R^n) < ∞ and µ_2(R^n) < ∞.

The definition is as follows. Let a : R^n × R^n → R^n denote the addition map: a(x, y) =
x + y. For every Borel subset A ⊂ R^n, we set
\[ (\mu_1 * \mu_2)(A) = (\mu_1 \otimes \mu_2)\big(a^{-1}(A)\big). \]
The first example below explains the similarity between this definition and (19.1).
Examples 19.21. (1) Suppose that µ_1 and µ_2 are both absolutely continuous [11] with respect
to the Lebesgue measure m. By the Radon-Nikodym theorem, there exist measurable
functions g_1, g_2 : R^n → [0, +∞) such that µ_1 = g_1 · m and µ_2 = g_2 · m. One can then
easily deduce from Fubini’s theorem that µ_1 ∗ µ_2 = (g_1 ∗ g_2) · m, where g_1 ∗ g_2 is the
convolution of g_1 with g_2 defined in (19.1).
(2) Let X and Y be R^n-valued random variables defined on a probability space (Ω, F, P),
and assume that they are independent [12]. Then P_{X+Y} = P_X ∗ P_Y. (Conversely, if the
last equality holds, then X and Y are independent.) Recall that P_X is the probability
distribution of X, i.e., the probability measure on B given by P_X(A) = P[X ∈ A].
This result follows easily from Fubini’s theorem and the definition of independence,
although the converse is a little more tricky to prove.

[11] Recall: this means that if A ∈ B and m(A) = 0, then µ_1(A) = 0 = µ_2(A).

[12] I.e., for any pair of Borel subsets A, B ⊂ R^n, we have P[X ∈ A, Y ∈ B] = P[X ∈ A] · P[Y ∈ B].
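For measures supported on finitely many points of R, the definition of µ_1 ∗ µ_2 reduces to a finite sum, and the identity P_{X+Y} = P_X ∗ P_Y from Examples 19.21(2) becomes the familiar rule for the distribution of a sum of independent random variables. The Python sketch below illustrates this with two fair dice; representing a measure as a dictionary {point: mass} and the name convolve_measures are our conventions for this illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def convolve_measures(mu1, mu2):
    """Convolution of two finitely supported measures on R.

    Each measure is a dict {point: mass}. By definition,
    (mu1 * mu2)(A) = (mu1 x mu2)({(x, y) : x + y in A}), so the mass of a
    single point s is the sum of mu1({x}) * mu2({y}) over all x + y = s.
    """
    result = defaultdict(Fraction)
    for (x, px), (y, py) in product(mu1.items(), mu2.items()):
        result[x + y] += px * py
    return dict(result)


# Distribution of one fair die, and of the sum of two independent fair dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_measures(die, die)

if __name__ == "__main__":
    for total in sorted(two_dice):
        print(total, two_dice[total])
    print("total mass:", sum(two_dice.values()))
```

The total mass of the result is 1, in agreement with the statement that the convolution of two probability measures is again a probability measure.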