1 Characteristic Functions
For weak convergence of probability measures on R^d, or equivalently weak convergence of R^d-valued random variables, an important convergence determining class consists of the functions f(x) = e^{it·x} with t ∈ R^d. They lead to
Definition 1.1 [Characteristic Function] Let X be an R^d-valued random variable with distribution µ on R^d. Then the characteristic function of X (or µ) is defined to be
    φ(t) := E[e^{it·X}] = ∫_{R^d} e^{it·x} µ(dx).
When µ has a density with respect to Lebesgue measure, i.e., µ(dx) = ρ(x)dx, φ(·) is just
the Fourier transform of the function ρ(·). Therefore in general, we can think of φ(·) as the
Fourier transform of the measure µ.
Here are some properties which are immediate from the definition:
(i) Since e^{it·x} = cos(t·x) + i sin(t·x) has bounded real and imaginary parts, φ(t) is well-defined.
(iv) The complex conjugate of φ(t) is the characteristic function of −X, and φ(t) ∈ R for all t ∈ R^d if X and −X have the same distribution.
(vi) If X and Y are two independent random variables with characteristic functions φ_X and φ_Y, then X + Y has characteristic function φ_{X+Y}(t) = E[e^{it·(X+Y)}] = φ_X(t)φ_Y(t).
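The multiplicativity in (vi) can be sanity-checked empirically: the empirical characteristic function of a sum of independent samples should match the product of the empirical characteristic functions of the summands, up to Monte Carlo error. A minimal sketch, where the two distributions, the sample size, and the test point t are arbitrary illustrative choices:

```python
import cmath
import random

random.seed(0)
N = 200_000
xs = [random.gauss(0, 1) for _ in range(N)]        # X ~ N(0,1), illustrative choice
ys = [random.expovariate(1.0) for _ in range(N)]   # Y ~ Exp(1), drawn independently of X

def empirical_cf(samples, t):
    # Monte Carlo estimate of E[e^{itZ}] from a list of samples of Z
    return sum(cmath.exp(1j * t * z) for z in samples) / len(samples)

t = 0.8
lhs = empirical_cf([x + y for x, y in zip(xs, ys)], t)   # CF of the sum X + Y
rhs = empirical_cf(xs, t) * empirical_cf(ys, t)          # product of the two CFs
assert abs(lhs - rhs) < 0.01   # agreement up to Monte Carlo error
```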
(d) The compound Poisson distribution: X = Σ_{i=1}^{T} Y_i, where T has Poisson distribution with parameter λ, and (Y_i)_{i∈N} is an i.i.d. sequence, independent of T, with characteristic function ψ(·). Then φ(t) = E[e^{itX}] = e^{−λ(1−ψ(t))}.
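The formula follows by conditioning: given T = k, X = Y_1 + · · · + Y_k has characteristic function ψ(t)^k, so summing against the Poisson weights must reproduce e^{−λ(1−ψ(t))}. A minimal numerical sketch, taking Y_i standard normal (so ψ(t) = e^{−t²/2}) and λ = 1.5 as illustrative choices:

```python
import cmath
import math

lam = 1.5                                # illustrative Poisson parameter λ
psi = lambda t: cmath.exp(-t**2 / 2)     # CF of Y_i ~ N(0,1), an illustrative choice

def phi_by_conditioning(t, terms=80):
    # E[e^{itX}] = Σ_k P(T=k) E[e^{itX} | T=k] = Σ_k e^{-λ} λ^k/k! · ψ(t)^k
    return sum(math.exp(-lam) * lam**k / math.factorial(k) * psi(t)**k
               for k in range(terms))

def phi_closed_form(t):
    # the closed form φ(t) = e^{-λ(1-ψ(t))}
    return cmath.exp(-lam * (1 - psi(t)))

for t in (0.0, 0.5, 1.0, 2.0):
    assert abs(phi_by_conditioning(t) - phi_closed_form(t)) < 1e-12
```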
(e) The Gaussian (or standard normal) distribution: µ(dx) = (1/√(2π)) e^{−x²/2} dx on R. Then
    φ(t) = (1/√(2π)) ∫_R e^{itx} e^{−x²/2} dx = e^{−t²/2} (1/√(2π)) ∫_R e^{−(x−it)²/2} dx = e^{−t²/2}
by contour integration.
(f) The exponential and gamma distributions: µ(dx) = (1/Γ(γ)) e^{−x} x^{γ−1} dx on [0, ∞), where γ > 0 and Γ(γ) is the normalizing constant. When γ = 1, µ is the exponential distribution. Then
    φ(t) = (1/Γ(γ)) ∫_0^∞ e^{itx} e^{−x} x^{γ−1} dx = 1/(1 − it)^γ.
The next result shows that the characteristic function of a probability measure on R
uniquely determines the probability measure, just as the Fourier transform of a function can
be inverted to recover the original function.
Theorem 1.4 [The Inversion Formula] Let φ(·) be the characteristic function of a probability measure µ on R. If a < b, then
    lim_{T→∞} (1/(2π)) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φ(t) dt = µ((a, b)) + (µ({a}) + µ({b}))/2.
Note that Theorem 1.4 recovers µ((a, b)) from φ(·) for all a < b with µ({a}) = µ({b}) = 0, and hence recovers µ. This implies that the family of functions {x ↦ e^{itx} : t ∈ R} is a distribution determining class.
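As a concrete illustration, take the exponential distribution, whose characteristic function 1/(1 − it) was computed in example (f); the limit in Theorem 1.4 can be approximated by truncating the integral at a large T. The truncation level and grid below are ad hoc choices:

```python
import cmath
import math

phi = lambda t: 1 / (1 - 1j * t)   # CF of the exponential distribution, example (f)

def inversion(a, b, T=500.0, n=100_000):
    # midpoint Riemann sum for (1/2π) ∫_{-T}^{T} (e^{-ita} - e^{-itb})/(it) φ(t) dt
    h = 2 * T / n
    total = 0j
    for j in range(n):
        t = -T + (j + 0.5) * h   # midpoints avoid the removable singularity at t = 0
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * phi(t)
    return (total * h / (2 * math.pi)).real

# the exponential law has no atoms, so the limit is µ((1, 2)) = e^{-1} - e^{-2}
assert abs(inversion(1, 2) - (math.exp(-1) - math.exp(-2))) < 1e-3
```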
Proof of Theorem 1.4. Note that
    |(e^{−ita} − e^{−itb})/(it)| = |∫_a^b e^{−ity} dy| ≤ |b − a| for all t ∈ R.    (1.1)
Therefore by Fubini, we can write
    lim_{T→∞} (1/(2π)) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φ(t) dt
        = lim_{T→∞} (1/(2π)) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) ∫_R e^{itx} µ(dx) dt
        = lim_{T→∞} ∫_R (1/(2π)) ∫_{−T}^{T} ((e^{it(x−a)} − e^{it(x−b)})/(it)) dt µ(dx)
        = lim_{T→∞} ∫_R (1/(2π)) ∫_{−T}^{T} ((sin t(x−a) − sin t(x−b))/t) dt µ(dx)
        = lim_{T→∞} (1/(2π)) ∫_R (u(T, x−a) − u(T, x−b)) µ(dx),
where u(T, θ) := ∫_{−T}^{T} (sin(tθ)/t) dt.
It is known in the theory of Fourier transforms that, the faster f (x) tends to 0 as |x| → ∞,
the smoother (more differentiable) is its Fourier transform fˆ. Conversely, the faster fˆ(t) → 0
as |t| → ∞, the smoother is f . The characteristic function φ is the Fourier transform of the
measure µ. When φ is integrable, which can be interpreted as a condition on the decay of
φ(t) as |t| → ∞, it can be shown that µ is “smooth” in the sense that it admits a density.
Theorem 1.5 [Inverse Fourier Transform] Let φ(t) = ∫_R e^{itx} µ(dx) be such that ∫_R |φ(t)| dt < ∞. Then µ has a density f with respect to Lebesgue measure, given by f(y) = (1/(2π)) ∫_R e^{−ity} φ(t) dt.
Proof. By (1.1) and the assumption that φ is integrable, we can apply the dominated convergence theorem in Theorem 1.4 to conclude that for all a < b,
    µ((a, b)) + (µ({a}) + µ({b}))/2 = lim_{T→∞} (1/(2π)) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φ(t) dt
        = (1/(2π)) ∫_R ((e^{−ita} − e^{−itb})/(it)) φ(t) dt ≤ (|b − a|/(2π)) ∫_R |φ(t)| dt,
which implies that µ has no atoms (let b ↓ a: the left hand side tends to µ({a}) while the right hand side tends to 0). Therefore by Fubini, for all a < b,
    µ((a, b)) = (1/(2π)) ∫_R ((e^{−ita} − e^{−itb})/(it)) φ(t) dt = (1/(2π)) ∫_R ∫_a^b e^{−ity} dy φ(t) dt
        = ∫_a^b (1/(2π)) ∫_R e^{−ity} φ(t) dt dy = ∫_a^b f(y) dy,
where f(y) := (1/(2π)) ∫_R e^{−ity} φ(t) dt.
The above proof can be extended to show that probability measures on R^d are also uniquely determined by their characteristic functions. Alternatively, see [2, Theorem 15.8] for a proof which uses the fact that the algebra of functions generated by {e^{it·x} : t ∈ R^d} is dense in a suitable sense in the space of bounded continuous functions, which makes the characteristic function distribution determining.
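The inverse Fourier transform is easy to test numerically. For the standard Gaussian, φ(t) = e^{−t²/2} is integrable, and the formula f(y) = (1/(2π)) ∫_R e^{−ity} φ(t) dt should return the Gaussian density; the truncation window [−10, 10] below is an ad hoc choice (the tail of φ beyond it is negligible):

```python
import cmath
import math

phi = lambda t: math.exp(-t**2 / 2)   # Gaussian CF, integrable on R

def density_from_cf(y, T=10.0, n=4000):
    # midpoint Riemann sum for f(y) = (1/2π) ∫_{-T}^{T} e^{-ity} φ(t) dt
    h = 2 * T / n
    total = 0j
    for j in range(n):
        t = -T + (j + 0.5) * h
        total += cmath.exp(-1j * t * y) * phi(t)
    return (total * h / (2 * math.pi)).real

# should recover the Gaussian density (1/√(2π)) e^{-y²/2}
for y in (0.0, 1.0, -2.0):
    assert abs(density_from_cf(y) - math.exp(-y**2 / 2) / math.sqrt(2 * math.pi)) < 1e-8
```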
Theorem 2.1 [Lévy’s Continuity Theorem on R] Let (µn )n∈N be a sequence of proba-
bility measures on R, with characteristic functions (φn )n∈N . If µn ⇒ µ∞ , then φn converges
pointwise to φ∞ , the characteristic function of µ∞ . Conversely if φn converges pointwise to a
function φ∞ which is continuous at 0, then φ∞ is the characteristic function of a probability
measure µ∞ , and µn ⇒ µ∞ .
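The continuity assumption at 0 is essential. A standard illustration (not from the text above, but a routine example): for µ_n = N(0, n), the characteristic functions φ_n(t) = e^{−nt²/2} converge pointwise to the function that is 1 at t = 0 and 0 elsewhere, which is discontinuous at 0, and correspondingly the mass of µ_n spreads out so that no weak limit exists. A quick sketch:

```python
import math

# φ_n(t) = e^{-n t²/2} is the CF of µ_n = N(0, n)
phi_n = lambda n, t: math.exp(-n * t**2 / 2)

# pointwise limit: 1 at t = 0, 0 for t != 0 -- discontinuous at the origin
assert all(phi_n(n, 0.0) == 1.0 for n in (1, 10, 100, 10**6))
for t in (0.1, 1.0, 5.0):
    assert phi_n(10**6, t) < 1e-9

# the mass escapes: µ_n([-A, A]) = erf(A/√(2n)) → 0 for any fixed window A
mass_in_window = lambda n, A: math.erf(A / math.sqrt(2 * n))
assert mass_in_window(10**6, 10.0) < 0.01
```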
Proof. Since eitx has bounded and continuous real and imaginary parts, φn → φ∞ follows
from µn ⇒ µ∞ by the definition of weak convergence.
For the converse, assume that φn → φ∞ pointwise on R, where φ∞ is continuous at 0.
To prove that µn converges weakly to some limit µ∞ , we only need to show that {µn }n∈N
is a relatively compact set of probability measures, which implies that every subsequence of
{µn}n∈N has a further subsequential weak limit. The assumption φn → φ∞ implies that all subsequential weak limits of (µn)n∈N have characteristic function φ∞, and hence coincide by the uniqueness established in Theorem 1.4; therefore µn converges weakly to a single limit µ∞ with characteristic function φ∞.
By Prohorov’s Theorem, {µn}n∈N is relatively compact if and only if it is tight, namely, for all ε > 0, we can find A > 0 such that
    sup_{n∈N} µn(R \ [−A, A]) ≤ ε.    (2.2)
Information about the tail probability µn((−∞, −A)) + µn((A, ∞)) can in fact be recovered from the behavior of φn near 0. We proceed as follows. Note that for any T > 0, by Fubini,
    (1/(2T)) ∫_{−T}^{T} φn(t) dt = ∫_R (1/(2T)) ∫_{−T}^{T} e^{itx} dt µn(dx) = ∫_R (sin(Tx)/(Tx)) µn(dx),
and hence
    |(1/(2T)) ∫_{−T}^{T} φn(t) dt| ≤ ∫_{|x|<l} |sin(Tx)/(Tx)| µn(dx) + ∫_{|x|≥l} |sin(Tx)/(Tx)| µn(dx)
        ≤ µn((−l, l)) + (1/(Tl)) µn(x : |x| ≥ l) = 1 − (1 − 1/(Tl)) µn(x : |x| ≥ l),
where we used that |sin(y)/y| ≤ 1 and |sin y| ≤ 1 for all y ∈ R. Therefore
    (1 − 1/(Tl)) µn(x : |x| ≥ l) ≤ 1 − (1/(2T)) ∫_{−T}^{T} φn(t) dt.
Choosing l = 2/T then gives
    µn(x : |x| ≥ 2/T) ≤ 2(1 − (1/(2T)) ∫_{−T}^{T} φn(t) dt) = (1/T) ∫_{−T}^{T} (1 − φn(t)) dt,    (2.3)
where the right hand side converges to 2(1 − (1/(2T)) ∫_{−T}^{T} φ∞(t) dt) as n → ∞ by the assumption φn → φ∞ and the bounded convergence theorem. On the other hand, the assumption φ∞(0) = 1 and the continuity of φ∞ at 0 imply that 2(1 − (1/(2T)) ∫_{−T}^{T} φ∞(t) dt) → 0 as T → 0. Therefore given any ε > 0, we can first choose T sufficiently small and then n sufficiently large, say n ≥ n0(ε), such that
    µn(x : |x| ≥ 2/T) ≤ (1/T) ∫_{−T}^{T} (φ∞(t) − φn(t)) dt + (1/T) ∫_{−T}^{T} (1 − φ∞(t)) dt ≤ ε.
We can then choose T to be even smaller such that the above bound holds for all µi with 1 ≤ i < n0(ε), which establishes the tightness condition (2.2) with A = 2/T.
If we choose l = l(t) such that l ↑ ∞ and |t|l → 0 as t → 0, then we obtain a bound on how
fast φ(t) → 1 as t → 0 in terms of how fast µ(x : |x| ≥ l) → 0 as l → ∞.
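The tail bound (2.3) can also be checked on a concrete case. Taking µ = N(0, 1) with φ(t) = e^{−t²/2} as an illustrative choice, the left side µ(x : |x| ≥ 2/T) equals erfc((2/T)/√2), and the right side can be computed by quadrature:

```python
import math

phi = lambda t: math.exp(-t**2 / 2)   # CF of N(0,1), an illustrative choice

def tail(a):
    # µ(x : |x| >= a) for µ = N(0,1)
    return math.erfc(a / math.sqrt(2))

def bound(T, n=10_000):
    # midpoint Riemann sum for the right side of (2.3): (1/T) ∫_{-T}^{T} (1 - φ(t)) dt
    h = 2 * T / n
    return sum(1 - phi(-T + (j + 0.5) * h) for j in range(n)) * h / T

# the inequality µ(x : |x| >= 2/T) <= (1/T)∫(1-φ(t))dt holds at each T tested
for T in (0.5, 1.0, 2.0):
    assert tail(2 / T) <= bound(T)
```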
Theorem 2.2 [Lévy’s Continuity Theorem on Rd ] Let (µn )n∈N be a sequence of proba-
bility measures on Rd with characteristic functions (φn )n∈N . If φn converges pointwise to a
function φ∞ which is continuous at 0, then µn ⇒ µ∞ for some probability measure µ∞ on Rd
with characteristic function φ∞ .
Proof. As in the one-dimensional case, it suffices to show that {µn }n∈N is tight. Let Xn :=
(Xn (1), . . . , Xn (d)) ∈ Rd be a random variable with distribution µn . We leave it as an exercise
to show that
Exercise 2.3 A family of Rd -valued random variables {Xn := (Xn (1), . . . , Xn (d))}n∈N is
tight if and only if for each coordinate 1 ≤ i ≤ d, {Xn (i)}n∈N is a tight family of R-valued
random variables.
Therefore it only remains to show that {Xn(i)}n∈N is tight for each 1 ≤ i ≤ d. Note that E[e^{itXn(1)}] = φn(t, 0, . . . , 0) → φ∞(t, 0, . . . , 0), where φ∞(t, 0, . . . , 0) is continuous at t = 0 by assumption. Therefore by Lévy’s Continuity Theorem on R, {Xn(1)}n∈N is tight, and similarly {Xn(i)}n∈N is tight for each 1 ≤ i ≤ d.
The next result shows that weak convergence of Rd -valued random variables can be reduced
to weak convergence of a family of R-valued random variables, which can be useful at times.
(ii) Conversely, if φ^{(2n)}(0) exists for some n ∈ N, then ∫ x^{2n} µ(dx) < ∞. However, the existence and continuity of φ^{(2n−1)} for some n ∈ N does not imply that ∫ |x|^{2n−1} µ(dx) < ∞.
Note that ∫ x² µ(dx) = −φ″(0), and hence the only probability measure with φ″(0) = 0 is the delta measure at 0. In other words, a non-trivial φ must have φ″(0) < 0.
The proof of part (i) requires checking conditions which allow passing differentiation inside an integral, while part (ii) can be proved by writing x^{2n} as a suitable limit, such as x² = lim_{h→0} 2(1 − cos(hx))/h², and then using Fatou’s Lemma. See e.g. [1, Section 2.3.c] and [2, Section 15.4].
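The identity ∫ x² µ(dx) = −φ″(0) can be checked numerically with a central second difference, e.g. for the standard Gaussian whose second moment is 1; the step size below is an ad hoc choice balancing truncation against rounding error:

```python
import math

phi = lambda t: math.exp(-t**2 / 2)   # CF of N(0,1); its second moment is 1

# central second difference approximating φ''(0)
h = 1e-3
second_deriv = (phi(h) - 2 * phi(0) + phi(-h)) / h**2
assert abs(-second_deriv - 1.0) < 1e-6   # -φ''(0) equals the second moment
```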
Exercise 3.2 Give an example of a probability distribution µ on R such that its characteristic function φ has φ′(0) = 0, and yet ∫ |x| µ(dx) = ∞. (Hint: take a symmetric µ and follow the definition of φ′(0) to explore the conditions needed for the tail of µ.)
    sup_n µn(x : |x| > l) ≤ C/l²,
which in turn implies that {µn}n∈N is a tight (and hence relatively compact) family of probability measures. Therefore subsequential weak limits are guaranteed to exist. It only remains to find conditions on (mk)k∈N such that there exists a unique probability measure µ with ∫ x^k µ(dx) = mk for all k ∈ N. This is known as the Hamburger moment problem.
Exercise 4.1 Assume that lim_{n→∞} ∫ x^k µn(dx) = mk ∈ R for each k ∈ N, for a sequence of probability measures (µn)n∈N on R. Prove that if µ is any subsequential weak limit of (µn)n∈N, then mk = ∫ x^k µ(dx) for all k ∈ N.
This exercise shows that in our context, the existence of a solution to the Hamburger moment
problem with moment sequence (mk )k∈N is guaranteed. The real issue is the uniqueness of
the solution, for which it is necessary to impose some conditions on (mk )k∈N .
We now construct a moment sequence (mk)k∈N which corresponds to two distinct probability measures. Let µ = Σ_{n=0}^{∞} an δ_{e^n} and ν = Σ_{n=0}^{∞} bn δ_{e^n}, with an, bn ≥ 0 and Σ an = Σ bn = 1. Assume further that for each k ∈ N,
    mk = ∫ x^k µ(dx) = Σ_{n=0}^{∞} an e^{kn} = ∫ x^k ν(dx) = Σ_{n=0}^{∞} bn e^{kn}.
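A classical continuous counterpart (Heyde’s example, not part of the construction above) makes the same point: the lognormal density ρ(x) and the perturbed densities ρ(x)(1 + ε sin(2π ln x)), |ε| ≤ 1, share all moments, because the perturbation contributes nothing to any moment. A numerical sketch of that cancellation, substituting x = e^y:

```python
import math

def perturbation_moment(k, n=20_000, lo=-12.0, hi=12.0):
    # ∫ x^k ρ(x) sin(2π ln x) dx with ρ the standard lognormal density;
    # substituting x = e^y turns it into ∫ e^{ky} e^{-y²/2}/√(2π) · sin(2πy) dy,
    # which vanishes by the odd symmetry of the integrand about y = k
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        y = lo + (j + 0.5) * h
        total += math.exp(k * y - y**2 / 2) * math.sin(2 * math.pi * y)
    return total * h / math.sqrt(2 * math.pi)

# the perturbation has zero k-th moment for every integer k
for k in range(4):
    assert abs(perturbation_moment(k)) < 1e-6
```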
To ensure that at most one probability measure corresponds to (mk)k∈N, we require that mk not grow too fast in k. The classic condition is Carleman’s condition:
    Σ_{k=1}^{∞} m_{2k}^{−1/(2k)} = ∞.    (4.5)
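As an illustration, the even moments of the standard Gaussian, m_{2k} = (2k−1)!!, satisfy (4.5): the Carleman terms m_{2k}^{−1/(2k)} behave like √(e/(2k)), so the partial sums grow like √n, just as for Σ 1/√k. A numeric sketch working in logarithms, via the identity (2k−1)!! = 2^k Γ(k+1/2)/√π:

```python
import math

def carleman_term(k):
    # m_{2k}^{-1/(2k)} for the standard Gaussian, where m_{2k} = (2k-1)!!
    log_m2k = k * math.log(2) + math.lgamma(k + 0.5) - 0.5 * math.log(math.pi)
    return math.exp(-log_m2k / (2 * k))

def partial_sum(n):
    return sum(carleman_term(k) for k in range(1, n + 1))

# partial sums roughly double when n quadruples (√n growth → divergence)
s100, s400, s1600 = partial_sum(100), partial_sum(400), partial_sum(1600)
assert s1600 > 1.8 * s400 > 3.2 * s100
```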
Proof. It suffices to determine uniquely the characteristic function of any µ with moment sequence (mk)k∈N. Since mk ∈ R for all k ∈ N, it follows that φ(t) = ∫ e^{itx} µ(dx) is infinitely differentiable at each t ∈ R, with |φ^{(2k)}(t)| ≤ m_{2k} for each k ∈ N. Since for k ≥ 0,
    |φ^{(2k+1)}(t)| ≤ ∫ |x|^{2k+1} µ(dx) ≤ √(m_{2k} m_{2k+2}) ≤ (m_{2k} + m_{2k+2})/2
by the Cauchy–Schwarz inequality, the assumption Σ_{k=1}^{∞} m_{2k} a^{2k}/(2k)! < ∞ implies that for any t ∈ R (see [1, 2] for detailed justifications on the power series expansion)
    φ(t + z) = Σ_{k=0}^{∞} φ^{(k)}(t) z^k/k!
is analytic in z with |z| < a. In particular, for |z| < a, φ(z) is determined by its Taylor
series at 0, with Taylor coefficients φ(k) (0) determined by (mk )k∈N . We can then repeat the
argument and Taylor expand at t = ±a/2, ±a, ±3a/2, . . . to conclude that φ(z) is determined
by (mk )k∈N for all z = t + ix, with t ∈ R and −a < x < a. In particular, φ is determined by
(mk )k∈N , which in turn determines µ.
Remark. The most common distributions, such as the exponential or Gaussian distribution, satisfy Carleman’s condition or the condition in Theorem 4.2. Therefore to prove that (µn)n∈N converges weakly to the exponential or Gaussian distribution, it suffices to prove that the moments of µn converge to those of the exponential or Gaussian. This is called the method of moments for proving weak convergence. On the other hand, a distribution whose moments satisfy Carleman’s condition (4.5) is uniquely determined by its characteristic function φ on [−ε, ε] for any ε > 0, by Theorem 3.1 (i). Therefore to prove weak convergence to such a distribution using Lévy’s Continuity Theorem, it suffices to verify the convergence of the characteristic functions on [−ε, ε] for any ε > 0.
Lastly we note that, without assuming Carleman’s condition, a distribution in general is
not uniquely determined by its characteristic function on a finite interval [−a, a]. This can be
seen easily from the following result (see [1, 2] for a proof):
Theorem 4.3 [Pólya’s Criterion] If φ : R → [0, 1] is a real, continuous and even function
with φ(0) = 1, limt→∞ φ(t) = 0, and φ is convex on [0, ∞), then φ is a characteristic function.
References
[1] R. Durrett. Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.