
Lecture 6

1 Characteristic Functions
For weak convergence of probability measures on R^d, or equivalently weak convergence of R^d-valued random variables, an important convergence determining class of functions is the family of functions f(x) = e^{it·x}, with t ∈ R^d. These functions lead to

Definition 1.1 [Characteristic Function] Let X be an R^d-valued random variable with distribution µ on R^d. Then the characteristic function of X (or µ) is defined to be

    φ(t) := E[e^{it·X}] = ∫_{R^d} e^{it·x} µ(dx).

When µ has a density with respect to Lebesgue measure, i.e., µ(dx) = ρ(x)dx, φ(·) is just
the Fourier transform of the function ρ(·). Therefore in general, we can think of φ(·) as the
Fourier transform of the measure µ.
Here are some properties which are immediate from the definition:

Proposition 1.2 [Properties of Characteristic Functions]

(i) Since e^{it·x} = cos(t·x) + i sin(t·x) has bounded real and imaginary parts, φ(t) is well-defined.

(ii) φ(0) = 1 and |φ(t)| ≤ 1 for all t ∈ Rd .

(iii) φ is uniformly continuous on R^d. More precisely,

    |φ(t + h) − φ(t)| = |E[e^{i(t+h)·X}] − E[e^{it·X}]| = |E[e^{it·X}(e^{ih·X} − 1)]| ≤ E[|e^{ih·X} − 1|],

which tends to 0 as h → 0 by the bounded convergence theorem, uniformly in t since the bound does not depend on t.

(iv) The complex conjugate of φ(t) is the characteristic function of −X, and φ(t) ∈ R for all t ∈ R^d if X and −X have the same distribution.

(v) For a ∈ R and b ∈ R^d, aX + b has characteristic function φ_{aX+b}(t) = e^{ib·t} φ(at).

(vi) If X and Y are two independent random variables with characteristic functions φ_X and φ_Y, then X + Y has characteristic function φ_{X+Y}(t) = E[e^{it·(X+Y)}] = φ_X(t) φ_Y(t).

Proposition 1.3 [Common Distributions and Their Characteristic Functions]

(a) The delta measure at a ∈ R: µ(dx) = δ_a(dx). Then φ(t) = e^{iat}.


(b) The coin flip: µ({1}) = µ({−1}) = 1/2. Then φ(t) = (e^{it} + e^{−it})/2 = cos t.
(c) The Poisson distribution with parameter λ > 0: µ({n}) = e^{−λ} λ^n/n! for n ∈ {0} ∪ N. Then

    φ(t) = e^{−λ} Σ_{n=0}^∞ e^{itn} λ^n/n! = e^{−λ(1−e^{it})}.

(d) The compound Poisson distribution: X = Σ_{i=1}^T Y_i, where T has Poisson distribution with parameter λ, and (Y_i)_{i∈N} is an i.i.d. sequence, independent of T, with common characteristic function ψ(·). Then φ(t) = E[e^{itX}] = e^{−λ(1−ψ(t))}.

(e) The Gaussian (or standard normal) distribution: µ(dx) = (1/√(2π)) e^{−x²/2} dx on R. Then

    φ(t) = (1/√(2π)) ∫_R e^{itx} e^{−x²/2} dx = e^{−t²/2} (1/√(2π)) ∫_R e^{−(x−it)²/2} dx = e^{−t²/2}

by contour integration.

(f) The exponential and gamma distributions: µ(dx) = (1/Γ(γ)) e^{−x} x^{γ−1} dx on [0, ∞), where γ > 0 and Γ(γ) is the normalizing constant. When γ = 1, µ is the exponential distribution. Then

    φ(t) = (1/Γ(γ)) ∫_0^∞ e^{itx} e^{−x} x^{γ−1} dx = 1/(1 − it)^γ.
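These closed forms are easy to sanity-check by simulation. Below is a minimal sketch (Python with numpy; λ = 3 and the sample size are arbitrary choices) comparing a Monte Carlo estimate of E[e^{itX}] with the Poisson formula from (c):

import numpy as np

rng = np.random.default_rng(0)
lam = 3.0
X = rng.poisson(lam, size=200_000)       # samples from Poisson(lam)

for t in [0.0, 0.5, 1.0, 2.0]:
    mc = np.exp(1j * t * X).mean()       # Monte Carlo estimate of E[e^{itX}]
    exact = np.exp(-lam * (1 - np.exp(1j * t)))
    print(t, mc, exact)                  # agree up to ~1/sqrt(200000) noise

The same loop works verbatim for the other examples, with the sampler and closed form swapped accordingly.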

The next result shows that the characteristic function of a probability measure on R
uniquely determines the probability measure, just as the Fourier transform of a function can
be inverted to recover the original function.
Theorem 1.4 [The Inversion Formula] Let φ(·) be the characteristic function of a probability measure µ on R. If a < b, then

    lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) φ(t) dt = µ((a, b)) + (µ({a}) + µ({b}))/2.
Note that Theorem 1.4 recovers µ((a, b)) from φ(·) for all a < b with µ({a}) = µ({b}) = 0, and hence recovers µ. This implies that the family of functions {x ↦ e^{itx} : t ∈ R} is a distribution determining class.
Proof of Theorem 1.4. Note that

    |(e^{−ita} − e^{−itb})/(it)| = |∫_a^b e^{−ity} dy| ≤ |b − a|   for all t ∈ R.   (1.1)

Therefore by Fubini, we can write

    lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) φ(t) dt
        = lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) (∫ e^{itx} µ(dx)) dt
        = lim_{T→∞} (1/2π) ∫ (∫_{−T}^T ((e^{it(x−a)} − e^{it(x−b)})/(it)) dt) µ(dx)
        = lim_{T→∞} (1/2π) ∫ (∫_{−T}^T ((sin t(x−a) − sin t(x−b))/t) dt) µ(dx)
        = lim_{T→∞} (1/2π) ∫ (u(T, x−a) − u(T, x−b)) µ(dx),

where we used that (cos t(x−a) − cos t(x−b))/(it) is an odd function of t, and

    u(T, z) := ∫_{−T}^T (sin tz)/t dt = 2 ∫_0^T (sin tz)/t dt = 2 ∫_0^{zT} (sin t)/t dt −→ 2 sign(z) ∫_0^∞ (sin t)/t dt   as T → ∞,

where sign(z) = 0 if z = 0, −1 if z < 0, and 1 if z > 0. Furthermore, it is known that the so-called Dirichlet integral satisfies ∫_0^∞ (sin t)/t dt = lim_{T→∞} ∫_0^T (sin t)/t dt = π/2 (see [1, Appendix 6, Ex. 6.6]). Therefore sup_{z,T} |u(T, z)| < ∞, and (1/2π)(u(T, x−a) − u(T, x−b)) converges pointwise as T → ∞ to 1 for x ∈ (a, b), to 1/2 for x ∈ {a, b}, and to 0 for x ∉ [a, b]. Hence by the bounded convergence theorem,

    lim_{T→∞} (1/2π) ∫ (u(T, x−a) − u(T, x−b)) µ(dx) = µ((a, b)) + (µ({a}) + µ({b}))/2.

It is known in the theory of Fourier transforms that the faster f(x) tends to 0 as |x| → ∞, the smoother (more differentiable) its Fourier transform f̂ is. Conversely, the faster f̂(t) → 0 as |t| → ∞, the smoother f is. The characteristic function φ is the Fourier transform of the measure µ. When φ is integrable, which can be interpreted as a condition on the decay of φ(t) as |t| → ∞, it can be shown that µ is “smooth” in the sense that it admits a density.

Theorem 1.5 [Inverse Fourier Transform] Let φ(t) = ∫_R e^{itx} µ(dx) be such that ∫_R |φ(t)| dt < ∞. Then µ(dx) = f(x) dx with bounded density

    f(x) = (1/2π) ∫_R e^{−itx} φ(t) dt.

Proof. By (1.1) and the assumption that φ is integrable, we can apply the dominated convergence theorem in Theorem 1.4 to conclude that for all a < b,

    µ((a, b)) + (µ({a}) + µ({b}))/2 = lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) φ(t) dt
        = (1/2π) ∫_R ((e^{−ita} − e^{−itb})/(it)) φ(t) dt ≤ (|b − a|/2π) ∫_R |φ(t)| dt,

which implies that µ has no atoms (let b ↓ a to see that µ({a}) = 0 for every a). Therefore by Fubini, for all a < b,

    µ((a, b)) = (1/2π) ∫_R ((e^{−ita} − e^{−itb})/(it)) φ(t) dt = (1/2π) ∫_R (∫_a^b e^{−ity} dy) φ(t) dt
        = ∫_a^b ((1/2π) ∫_R e^{−ity} φ(t) dt) dy = ∫_a^b f(y) dy,

which implies that µ(dx) = f(x) dx.
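Theorem 1.5 is also easy to test numerically. A minimal sketch, using the standard normal case from Proposition 1.3(e); the truncation [−20, 20] and the grid size are ad hoc choices, justified since φ(t) = e^{−t²/2} is integrable and decays fast:

import numpy as np

phi = lambda t: np.exp(-t**2 / 2)        # characteristic function of N(0,1)
t = np.linspace(-20.0, 20.0, 4001)       # truncated, discretized integration grid
dt = t[1] - t[0]

for x in [0.0, 1.0, 2.0]:
    # Riemann sum for f(x) = (1/2pi) \int e^{-itx} phi(t) dt
    f = (np.exp(-1j * t * x) * phi(t)).sum().real * dt / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, f, exact)                   # recovers the N(0,1) density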

The above proof can be extended to show that probability measures on R^d are also uniquely determined by their characteristic functions. Alternatively, see [2, Theorem 15.8] for a proof which uses the fact that the algebra of functions generated by {e^{it·x} : t ∈ R^d} is dense in a suitable sense in the space of bounded continuous functions, which makes the characteristic function distribution determining.

2 Characteristic Functions and Weak Convergence


We now show that the class of functions {x ↦ e^{itx} : t ∈ R} is not only distribution determining, but also convergence determining. More precisely, we show

Theorem 2.1 [Lévy’s Continuity Theorem on R] Let (µn )n∈N be a sequence of proba-
bility measures on R, with characteristic functions (φn )n∈N . If µn ⇒ µ∞ , then φn converges
pointwise to φ∞ , the characteristic function of µ∞ . Conversely if φn converges pointwise to a
function φ∞ which is continuous at 0, then φ∞ is the characteristic function of a probability
measure µ∞ , and µn ⇒ µ∞ .

Proof. Since e^{itx} has bounded and continuous real and imaginary parts, φ_n → φ_∞ follows from µ_n ⇒ µ_∞ by the definition of weak convergence.
For the converse, assume that φn → φ∞ pointwise on R, where φ∞ is continuous at 0.
To prove that µ_n converges weakly to some limit µ_∞, we only need to show that {µ_n}_{n∈N} is a relatively compact set of probability measures, i.e., every subsequence of (µ_n)_{n∈N} has a further weakly convergent subsequence. Indeed, the assumption φ_n → φ_∞ implies that all subsequential weak limits of (µ_n)_{n∈N} have characteristic function φ_∞, and since the characteristic function determines the distribution (Theorem 1.4), µ_n then converges weakly to a unique limit µ_∞ with characteristic function φ_∞.
By Prohorov’s Theorem, {µn }n∈N is relatively compact if and only if it is tight, namely,
for all  > 0, we can find A such that

µn (−∞, −A) + µn (A, ∞) ≤  for all n ∈ N. (2.2)

Information about the tail probability µ_n((−∞, −A)) + µ_n((A, ∞)) can in fact be recovered from the behavior of φ_n near 0. We proceed as follows. Note that for any T > 0, by Fubini,

    (1/2T) ∫_{−T}^T φ_n(t) dt = ∫ ((1/2T) ∫_{−T}^T e^{itx} dt) µ_n(dx) = ∫ (sin Tx)/(Tx) µ_n(dx)
        ≤ ∫_{|x|<l} |(sin Tx)/(Tx)| µ_n(dx) + ∫_{|x|≥l} |(sin Tx)/(Tx)| µ_n(dx)
        ≤ µ_n((−l, l)) + (1/(Tl)) µ_n(x : |x| ≥ l) = 1 − (1 − 1/(Tl)) µ_n(x : |x| ≥ l),

where we used that |(sin y)/y| ≤ 1 and |sin y| ≤ 1 for all y ∈ R. Therefore

    (1 − 1/(Tl)) µ_n(x : |x| ≥ l) ≤ 1 − (1/2T) ∫_{−T}^T φ_n(t) dt.

Choosing l = 2/T then gives

    µ_n(x : |x| ≥ 2/T) ≤ 2(1 − (1/2T) ∫_{−T}^T φ_n(t) dt) = (1/T) ∫_{−T}^T (1 − φ_n(t)) dt,   (2.3)

where the right-hand side converges to 2(1 − (1/2T) ∫_{−T}^T φ_∞(t) dt) as n → ∞ by the assumption φ_n → φ_∞ and the bounded convergence theorem. On the other hand, the assumptions φ_∞(0) = 1 and φ_∞ continuous at 0 imply that 2(1 − (1/2T) ∫_{−T}^T φ_∞(t) dt) → 0 as T → 0. Therefore given any ε > 0, we can first choose T sufficiently small and then n sufficiently large, say n ≥ n_0(ε), such that

    µ_n(x : |x| ≥ 2/T) ≤ (1/T) ∫_{−T}^T |φ_∞(t) − φ_n(t)| dt + (1/T) ∫_{−T}^T (1 − φ_∞(t)) dt ≤ ε.

We can then choose T to be even smaller such that the above bound holds for all µ_i with 1 ≤ i < n_0(ε), which establishes the tightness condition (2.2) with A = 2/T.
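Theorem 2.1 is the engine behind characteristic-function proofs of the central limit theorem, and the pointwise convergence is easy to watch numerically. A minimal sketch (Monte Carlo, with Uniform[0, 1] summands; the sample sizes are arbitrary) showing φ_n(t) → e^{−t²/2} for standardized sums:

import numpy as np

rng = np.random.default_rng(1)
ts = np.array([0.5, 1.0, 2.0])

for n in [1, 4, 16, 64]:
    # standardized sums of n i.i.d. Uniform[0,1] variables (mean 1/2, variance 1/12)
    S = rng.random((50_000, n)).sum(axis=1)
    Z = (S - n / 2) / np.sqrt(n / 12.0)
    phi_n = np.exp(1j * np.outer(ts, Z)).mean(axis=1)   # empirical E[e^{itZ}]
    print(n, np.round(phi_n, 3), np.round(np.exp(-ts**2 / 2), 3))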

Remark. We can complement (2.3) by establishing an inequality in the reverse direction, so that how quickly φ(t) → 1 as t → 0 is controlled by the tail probability:

    |1 − φ(t)| ≤ ∫ |e^{itx} − 1| µ(dx) = ∫_{|x|<l} |e^{itx} − 1| µ(dx) + ∫_{|x|≥l} |e^{itx} − 1| µ(dx)
        ≤ ∫_{|x|<l} |∫_0^{tx} e^{iy} dy| µ(dx) + 2µ(x : |x| ≥ l)   (2.4)
        ≤ |t| l + 2µ(x : |x| ≥ l).

If we choose l = l(t) such that l ↑ ∞ and |t|l → 0 as t → 0, then we obtain a bound on how fast φ(t) → 1 as t → 0 in terms of how fast µ(x : |x| ≥ l) → 0 as l → ∞.

Lévy’s Continuity Theorem can be extended to higher dimensions.

Theorem 2.2 [Lévy’s Continuity Theorem on Rd ] Let (µn )n∈N be a sequence of proba-
bility measures on Rd with characteristic functions (φn )n∈N . If φn converges pointwise to a
function φ∞ which is continuous at 0, then µn ⇒ µ∞ for some probability measure µ∞ on Rd
with characteristic function φ∞ .

Proof. As in the one-dimensional case, it suffices to show that {µn }n∈N is tight. Let Xn :=
(Xn (1), . . . , Xn (d)) ∈ Rd be a random variable with distribution µn . We leave it as an exercise
to show that

Exercise 2.3 A family of Rd -valued random variables {Xn := (Xn (1), . . . , Xn (d))}n∈N is
tight if and only if for each coordinate 1 ≤ i ≤ d, {Xn (i)}n∈N is a tight family of R-valued
random variables.

Therefore it only remains to show that {X_n(i)}_{n∈N} is tight for each 1 ≤ i ≤ d. Note that E[e^{itX_n(1)}] = φ_n(t, 0, . . . , 0) → φ_∞(t, 0, . . . , 0), where φ_∞(t, 0, . . . , 0) is continuous at t = 0 by assumption. Therefore by Lévy's Continuity Theorem on R, {X_n(1)}_{n∈N} is tight, and similarly {X_n(i)}_{n∈N} is tight for each 1 ≤ i ≤ d.

The next result shows that weak convergence of Rd -valued random variables can be reduced
to weak convergence of a family of R-valued random variables, which can be useful at times.

Theorem 2.4 [Cramér-Wold Device] Let X_n := (X_n(1), . . . , X_n(d)), n ∈ N, be a sequence of R^d-valued random variables. Then X_n converges weakly to some limit X_∞ = (X_∞(1), . . . , X_∞(d)) if and only if for every λ ∈ R^d, ⟨λ, X_n⟩ := Σ_{i=1}^d λ_i X_n(i) converges weakly to some limit Y_λ, in which case Y_λ has the same distribution as ⟨λ, X_∞⟩.

Proof. If X_n ⇒ X_∞, then ⟨λ, X_n⟩ ⇒ ⟨λ, X_∞⟩ by the Continuous Mapping Theorem, since x ↦ ⟨λ, x⟩ is a continuous map from R^d to R. Conversely, if ⟨λ, X_n⟩ ⇒ Y_λ for some Y_λ for each λ ∈ R^d, then X_n(i) converges weakly for each 1 ≤ i ≤ d, which by Exercise 2.3 implies that {X_n}_{n∈N} is tight and hence relatively compact. On the other hand, E[e^{i⟨t,X_n⟩}] → E[e^{iY_t}] uniquely determines the characteristic function of any subsequential weak limit of {X_n}_{n∈N}, and hence X_n converges weakly to a unique limit X_∞, with Y_λ equal in distribution to ⟨λ, X_∞⟩.
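A quick numerical illustration of the Cramér-Wold reduction, with d = 2 and made-up parameters: X_n is a standardized sum of n i.i.d. Uniform[0,1]² vectors, so X_n ⇒ N(0, I_2), and any projection ⟨λ, X_n⟩ should be close in distribution to N(0, |λ|²). A sketch comparing the empirical CDF of the projection with the limiting normal CDF:

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
lam = np.array([2.0, -1.0])                 # an arbitrary direction lambda
n, N = 50, 20_000                           # CLT index and sample size (ad hoc)

U = rng.random((N, n, 2))                   # N samples of n i.i.d. Uniform[0,1]^2 vectors
Xn = (U.sum(axis=1) - n / 2) / np.sqrt(n / 12.0)
proj = Xn @ lam                             # <lambda, X_n>

s = np.linalg.norm(lam)                     # limiting law of proj is N(0, |lambda|^2)
for x in [-1.0, 0.0, 1.5]:
    emp = (proj <= s * x).mean()            # empirical P(proj <= s*x)
    print(x, emp, 0.5 * (1 + erf(x / sqrt(2))))   # vs standard normal CDF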

3 Characteristic Functions and Moments


In the proof of Lévy’s Continuity Theorem, we saw a close connection between the tail prob-
ability µ(−∞, −A) + µ(A, ∞) of a measure µ and the behavior of its characteristic function
φ near 0. We now explore this connection further and show how moments of µ are related to
the higher order derivatives of φ at 0.

Theorem 3.1 [Derivatives of a Characteristic Function] Let µ be a probability measure on R with characteristic function φ.

(i) If ∫ |x|^n µ(dx) < ∞ for some n ∈ N, then φ has a continuous n-th derivative φ^{(n)}(t) = ∫ (ix)^n e^{itx} µ(dx). In particular, ∫ x^n µ(dx) = (−i)^n φ^{(n)}(0).

(ii) Conversely, if φ^{(2n)}(0) exists for some n ∈ N, then ∫ x^{2n} µ(dx) < ∞. However, φ^{(2n−1)} existing and being continuous for some n ∈ N does not imply that ∫ |x|^{2n−1} µ(dx) < ∞.

Note that ∫ x² µ(dx) = −φ''(0), and hence the only probability measure with φ''(0) = 0 is the delta measure at 0. In other words, a non-trivial φ for which φ''(0) exists must have φ''(0) < 0.
The proof of part (i) requires checking conditions which allow passing differentiation inside an integral, while part (ii) can be proved by writing x^{2n} as a suitable limit, such as x² = lim_{h→0} 2(1 − cos hx)/h², and then using Fatou's Lemma. See e.g. [1, Section 2.3.c] and [2, Section 15.4].
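Part (i) gives a practical way to read moments off φ. A small sketch (finite differences with an arbitrary step h; Poisson(λ) from Proposition 1.3(c), whose mean is λ and second moment λ + λ²):

import numpy as np

lam = 3.0
phi = lambda t: np.exp(-lam * (1 - np.exp(1j * t)))  # Poisson(lam) char. function

h = 1e-4                                             # ad hoc finite-difference step
d1 = (phi(h) - phi(-h)) / (2 * h)                    # central difference ~ phi'(0)
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2        # ~ phi''(0)

print((-1j) * d1, lam)                               # E[X]   = (-i)   phi'(0)  = lam
print((-1j)**2 * d2, lam + lam**2)                   # E[X^2] = (-i)^2 phi''(0) = lam + lam^2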

Exercise 3.2 Give an example of a probability distribution µ on R such that its characteristic function φ satisfies φ'(0) = 0, and yet ∫ |x| µ(dx) = ∞. (Hint: take a symmetric µ and follow the definition of φ'(0) to explore the conditions needed on the tail of µ.)

Fractional moments can also be expressed in terms of the characteristic function.


Exercise 3.3 For p ∈ (0, 2), use the identity ∫_R (1 − cos tx)/|t|^{p+1} dt = C_p |x|^p to express ∫ |x|^p µ(dx) in terms of the characteristic function φ of µ.

4 The Method of Moments


Given a sequence of probability measures (µ_n)_{n∈N} on R, it is often easier to show that the moments ∫ x^k µ_n(dx) → m_k ∈ R for each k ∈ N than to show that the characteristic functions of (µ_n)_{n∈N} converge. This raises the natural question: under what conditions on the moment sequence (m_k)_{k∈N} can we conclude that µ_n ⇒ µ for some probability measure µ?
Note that ∫ x² µ_n(dx) → m_2 < ∞ implies sup_n ∫ x² µ_n(dx) = C < ∞, and hence by Chebyshev's inequality

    sup_n µ_n(|x| > l) ≤ C/l²,

which in turn implies that {µ_n}_{n∈N} is a tight (and hence relatively compact) family of probability measures. Therefore subsequential weak limits are guaranteed to exist. It only remains to find conditions on (m_k)_{k∈N} such that there exists a unique probability measure µ with ∫ x^k µ(dx) = m_k for all k ∈ N. This is known as the Hamburger moment problem.

Exercise 4.1 Assume that lim_{n→∞} ∫ x^k µ_n(dx) = m_k ∈ R for each k ∈ N, for a sequence of probability measures (µ_n)_{n∈N} on R. Prove that if µ is any subsequential weak limit of (µ_n)_{n∈N}, then m_k = ∫ x^k µ(dx) for all k ∈ N.

This exercise shows that in our context, the existence of a solution to the Hamburger moment
problem with moment sequence (mk )k∈N is guaranteed. The real issue is the uniqueness of
the solution, for which it is necessary to impose some conditions on (mk )k∈N .
We now construct a moment sequence (m_k)_{k∈N} which corresponds to two distinct probability measures. Let µ = Σ_{n=0}^∞ a_n δ_{e^n} and ν = Σ_{n=0}^∞ b_n δ_{e^n}, with a_n, b_n ≥ 0 and Σ a_n = Σ b_n = 1. Assume further that for each k ∈ N,

    m_k = ∫ x^k µ(dx) = Σ_{n=0}^∞ a_n e^{kn} = ∫ x^k ν(dx) = Σ_{n=0}^∞ b_n e^{kn}.

Writing c_n = a_n − b_n, the construction of µ ≠ ν satisfying the above conditions is equivalent to finding a sequence (c_n)_{n≥0}, not identically 0, such that

    C(z) := Σ_{n=0}^∞ c_n z^n = 0   for z = 1, e, e², . . .,   with Σ_n |c_n| < ∞.

Indeed, given such (c_n)_{n≥0}, we can simply take a_n := 2 max{c_n, 0}/Σ_n |c_n| and b_n := 2 max{−c_n, 0}/Σ_n |c_n|; the condition C(1) = 0 ensures Σ a_n = Σ b_n = 1. One explicit construction is to take C(z) = Π_{n=0}^∞ (1 − z/e^n), which is an entire function by the Weierstrass Factorization Theorem and vanishes exactly at z = 1, e, e², . . .. Expanding C(z) then gives (c_n)_{n≥0}, and Σ_{n=0}^∞ |c_n| e^{kn} < ∞ for all k ∈ N since the radius of convergence of C(z) is ∞.

To ensure that at most one probability measure corresponds to (m_k)_{k∈N}, we require m_k to grow not too fast in k. The classic condition is Carleman's condition:

    Σ_{k=1}^∞ m_{2k}^{−1/(2k)} = ∞,   (4.5)

which is slightly weaker than what we will assume.
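As a quick numerical illustration of (4.5) (a sketch with an arbitrary cutoff, not needed for the proofs): for the standard Gaussian, m_{2k} = (2k − 1)!!, so m_{2k}^{−1/(2k)} decays only like a constant times 1/√k and the series diverges:

from math import exp, log

def log_m2k(k):                    # log of (2k-1)!! = E[Z^{2k}] for Z ~ N(0,1)
    return sum(log(2 * j - 1) for j in range(1, k + 1))

terms = [exp(-log_m2k(k) / (2 * k)) for k in range(1, 200)]
print(sum(terms))                  # partial sums grow like sqrt(K): the series diverges

By contrast, for lognormal-type growth m_{2k} = e^{2k²} the terms are e^{−k}, which are summable, consistent with the lognormal distribution being a classical example of moment non-uniqueness.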


Theorem 4.2 Let (m_k)_{k∈N} be such that Σ_{k=1}^∞ m_{2k} a^{2k}/(2k)! < ∞ for some a > 0. Then there is at most one probability measure µ with ∫ x^k µ(dx) = m_k for all k ∈ N.

Proof. It suffices to determine uniquely the characteristic function of any µ with moment sequence (m_k)_{k∈N}. Since m_k ∈ R for all k ∈ N, it follows that φ(t) = ∫ e^{itx} µ(dx) is infinitely differentiable at each t ∈ R, with |φ^{(2k)}(t)| ≤ m_{2k} for each k ∈ N. Since for k ≥ 0,

    |φ^{(2k+1)}(t)| ≤ ∫ |x|^{2k+1} µ(dx) ≤ √(m_{2k} m_{2k+2}) ≤ (m_{2k} + m_{2k+2})/2

by the Cauchy-Schwarz inequality, the assumption Σ_{k=1}^∞ m_{2k} a^{2k}/(2k)! < ∞ implies that for any t ∈ R, the series

    φ(t + z) = Σ_{k=0}^∞ φ^{(k)}(t) z^k/k!

is analytic in z for |z| < a (see [1, 2] for detailed justifications of the power series expansion). In particular, for |z| < a, φ(z) is determined by its Taylor series at 0, with Taylor coefficients φ^{(k)}(0) determined by (m_k)_{k∈N}. We can then repeat the argument and Taylor expand at t = ±a/2, ±a, ±3a/2, . . . to conclude that φ(z) is determined by (m_k)_{k∈N} for all z = t + ix with t ∈ R and −a < x < a. In particular, φ is determined by (m_k)_{k∈N}, which in turn determines µ.

Remark. The most common distributions, such as the exponential or Gaussian distribution, satisfy Carleman's condition and the condition in Theorem 4.2. Therefore, to prove that (µ_n)_{n∈N} converges weakly to the exponential or Gaussian distribution, it suffices to prove that the moments of µ_n converge to those of the exponential or Gaussian. This is called the method of moments for proving weak convergence. On the other hand, a distribution whose moments satisfy Carleman's condition (4.5) is uniquely determined by its characteristic function φ on [−ε, ε] for any ε > 0, by Theorem 3.1(i). Therefore, to prove weak convergence to such a distribution using Lévy's Continuity Theorem, it is sufficient to verify the convergence of the characteristic functions on [−ε, ε] for any ε > 0.
Lastly we note that, without assuming Carleman’s condition, a distribution in general is
not uniquely determined by its characteristic function on a finite interval [−a, a]. This can be
seen easily from the following result (see [1, 2] for a proof):

Theorem 4.3 [Pólya’s Criterion] If φ : R → [0, 1] is a real, continuous and even function
with φ(0) = 1, limt→∞ φ(t) = 0, and φ is convex on [0, ∞), then φ is a characteristic function.
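Pólya's criterion can be tried out concretely. A minimal sketch (the grids are ad hoc): the tent function φ(t) = max(0, 1 − |t|) satisfies the hypotheses, and inverting it numerically as in Theorem 1.5 produces a nonnegative function integrating to 1; in fact f(x) = (1 − cos x)/(πx²).

import numpy as np

phi = lambda t: np.maximum(0.0, 1.0 - np.abs(t))   # real, even, convex on [0, inf)

t = np.linspace(-1.0, 1.0, 2001)                   # phi vanishes outside [-1, 1]
x = np.linspace(-60.0, 60.0, 6001)
dt, dx = t[1] - t[0], x[1] - x[0]

# Riemann sum for f(x) = (1/2pi) \int e^{-itx} phi(t) dt at every grid point x
f = np.array([(np.exp(-1j * t * xi) * phi(t)).sum().real for xi in x]) * dt / (2 * np.pi)

print(f.min())       # >= 0 up to discretization error: f is a density
print(f.sum() * dx)  # ~ 1 (mass outside [-60, 60] is ~ 2/(60*pi))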

References
[1] R. Durrett. Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.

[2] A. Klenke. Probability Theory: A Comprehensive Course, Springer-Verlag.
