
CONVERGENCE OF PROBABILITY DISTRIBUTIONS

GENERATING FUNCTIONS
Probability generating function (pgf): for $X$ discrete with values in $\{0, 1, 2, \dots\}$, $g_X(s) = E[s^X]$ for $0 \le s \le 1$.
Moment generating function (mgf): for $X$ a general real-valued random variable, $M_X(t) = E[e^{tX}]$.
Characteristic function: for $X$ a general real-valued random variable, $\phi_X(t) = E[e^{itX}] = E[\cos(tX)] + i\,E[\sin(tX)]$.

These are just slices of the same complex function $E[e^{zX}]$. The real slice (the mgf) is the biggest possible; its radius of convergence is $\sup\{t : M_X(t) < \infty\}$.
Principle: bigger functions give better tail bounds. The mgf is very useful, if it exists:
Chernoff bound: $P\{X > x\} \le e^{-tx} M_X(t)$ for all $t > 0$.
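As an illustrative sketch: for a standard normal $Z$ the mgf is $e^{t^2/2}$ (derived below), and minimising the Chernoff bound over $t > 0$ gives $t = x$. The comparison below of the bound against the exact tail uses illustrative names and values only.

```python
import numpy as np
from scipy.stats import norm

# Chernoff bound for a standard normal Z: P{Z > x} <= exp(-t*x) * M_Z(t),
# with M_Z(t) = exp(t^2 / 2); minimising over t > 0 gives t = x.
def chernoff_bound(x, t):
    return np.exp(-t * x + t**2 / 2)

for x in [1.0, 2.0, 3.0]:
    print(x, chernoff_bound(x, t=x), norm.sf(x))   # bound vs exact tail P{Z > x}
```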

PROPERTIES OF MOMENT GENERATING FUNCTIONS


- $M_X(0) = 1$.
- If $Y = aX + b$ for constants $a$ and $b$, then $M_Y(t) = e^{bt} M_X(at)$.
- The $k$-th derivative of $M_X(t)$ at $t = 0$ is $E[X^k]$.
- If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$.
- Uniqueness: if $M_X(t) = M_Y(t)$ for all $|t| < t_0$, for some positive $t_0$, then $X \stackrel{d}{=} Y$.
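A quick Monte Carlo sanity check of the independence property, using two independent Exp(1) variables (for which $M_X(t) = 1/(1-t)$ for $t < 1$); the sample size and the choice $t = 0.25$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(1.0, size=n)   # X ~ Exp(1): M_X(t) = 1/(1 - t) for t < 1
y = rng.exponential(1.0, size=n)   # Y ~ Exp(1), independent of X

t = 0.25
print(np.mean(np.exp(t * (x + y))),                     # Monte Carlo M_{X+Y}(t)
      np.mean(np.exp(t * x)) * np.mean(np.exp(t * y)),  # M_X(t) * M_Y(t)
      (1 - t) ** -2)                                     # exact value, 16/9
```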

USING UNIQUENESS
$Z$ standard normal:
$$M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz = e^{t^2/2}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(z-t)^2/2}\,dz = e^{t^2/2}.$$

$X \sim N(\mu, \sigma^2)$, so $X = \mu + \sigma Z$ and
$$M_X(t) = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \sigma^2 t^2/2}.$$

GAMMA DISTRIBUTION
$$f(x) = \frac{\lambda^r x^{r-1}}{\Gamma(r)}\,e^{-\lambda x} \quad \text{for } x > 0.$$
$$M_X(t) = \int_0^{\infty} e^{tx}\,\frac{\lambda^r x^{r-1}}{\Gamma(r)}\,e^{-\lambda x}\,dx = \int_0^{\infty} \frac{\lambda^r x^{r-1}}{\Gamma(r)}\,e^{-(\lambda - t)x}\,dx = \left(\frac{\lambda}{\lambda - t}\right)^{r} = \left(1 - \frac{t}{\lambda}\right)^{-r} \quad \text{for } t < \lambda.$$

χ² DISTRIBUTION

Let $Y$ have the distribution of $Z^2$, with $Z$ standard normal. Then
$$M_Y(t) = E[e^{tY}] = E[e^{tZ^2}] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{tz^2}\,e^{-z^2/2}\,dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(1-2t)z^2/2}\,dz = (1 - 2t)^{-1/2} \quad \text{for } t < \tfrac12.$$

This is the same mgf as the Gamma($\tfrac12$, $\tfrac12$), so by uniqueness $Z^2 \sim \mathrm{Gamma}(\tfrac12, \tfrac12)$.

DECOMPOSING A NORMAL DISTRIBUTION


Is there a non-normal distribution whose convolution with itself is standard normal? That is, can we have
$$\int_{-\infty}^{\infty} f(x)\,f(z - x)\,dx = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}?$$
In terms of mgfs,
$$M_{X_1 + X_2}(t) = M_X(t)^2 = e^{t^2/2}, \qquad \text{so} \qquad M_X(t) = e^{t^2/4}.$$
By uniqueness, $X \sim N(0, \tfrac12)$: any distribution with an mgf whose convolution with itself is standard normal must itself be normal.
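A numerical sanity check of this conclusion (grid range and spacing are arbitrary choices): convolving the $N(0, \tfrac12)$ density with itself on a grid reproduces the standard normal density up to discretisation error.

```python
import numpy as np
from scipy.stats import norm

# Convolve the N(0, 1/2) density with itself numerically and compare with
# the standard normal density.
z = np.linspace(-8, 8, 4001)                  # symmetric grid around 0
dx = z[1] - z[0]
f = norm.pdf(z, scale=np.sqrt(0.5))           # density of N(0, 1/2)

conv = np.convolve(f, f, mode="same") * dx    # numerical (f * f)(z)
print(np.max(np.abs(conv - norm.pdf(z))))     # small: discretisation error only
```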

WHY UNIQUENESS?
Equal mgf on an interval ⟹ equal analytic continuation ⟹ equal characteristic function.
Distributions are equal when $P(X \in A) = P(Y \in A)$ for all events $A$. Equivalent: distributions are equal when $E[f(X)] = E[f(Y)]$ for all bounded functions $f$.

Equivalent: distributions are equal when $E[f(X)] = E[f(Y)]$ for enough functions $f$. What's enough? Enough to approximate all functions. Examples:
- all bounded continuous functions;
- smooth functions with bounded derivative;
- the functions $e^{itx}$ (Fourier analysis);
- polynomials?


WHAT COULD GO WRONG?: THE MOMENT PROBLEM


Suppose we have random variables $X$ and $Y$ with
$$E[X^k] = E[Y^k] \quad \text{for all } k.$$
Does it follow that $X \stackrel{d}{=} Y$?

"Proof" sketch: moments determine the moment generating function; moment generating functions determine the distribution; therefore, equal moments determine equal distributions. FALSE!


In fact, equality of moments does not determine equality of distributions. Let $X$ have the standard log-normal distribution,
$$f_X(x) = (2\pi)^{-1/2}\,x^{-1}\,e^{-\frac{1}{2}\ln^2 x} \quad \text{for } x > 0,$$
and define $Y$ to have density
$$f_Y(y) = f_X(y)\,\bigl(1 + \sin(2\pi \ln y)\bigr) \quad \text{for } y > 0.$$
For any integer $k \ge 0$ we have (substituting $\ln y = t = s + k$)
$$\int_0^{\infty} y^k f_X(y)\sin(2\pi \ln y)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2 + kt}\sin(2\pi t)\,dt = \frac{e^{k^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-s^2/2}\sin(2\pi s)\,ds = 0 \quad \text{by symmetry,}$$
so $E[Y^k] = E[X^k]$ for every $k$, even though $f_Y \ne f_X$.
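A numerical sketch of this computation (the integration range is an arbitrary truncation): working in the variable $t = \ln y$, the $k$-th moments of $f_X$ and $f_Y$ both agree with $e^{k^2/2}$.

```python
import numpy as np
from scipy import integrate

# Moments of the log-normal f_X and the perturbed f_Y(y) = f_X(y)(1 + sin(2*pi*ln y)),
# computed after the substitution t = ln y: the k-th moment becomes
# int exp(k*t) * phi(t) * (1 + sin(2*pi*t)) dt, with phi the N(0,1) density.
phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

for k in range(4):
    m_X, _ = integrate.quad(lambda t: np.exp(k * t) * phi(t), -20, 20, limit=200)
    m_Y, _ = integrate.quad(lambda t: np.exp(k * t) * phi(t) * (1 + np.sin(2 * np.pi * t)),
                            -20, 20, limit=200)
    print(k, m_X, m_Y, np.exp(k**2 / 2))   # the three columns agree
```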


So equality of moments does not determine equality of distributions: $X$ and $Y$ above have the same moments (all finite), but different distributions.

What's wrong with our proof? The tails of the density fall off faster than any polynomial (hence all moments are finite) but slower than any exponential: $M_X(t) = \infty$ for all $t > 0$, so we can't apply the uniqueness theorem.

Theorem: the distribution is determined by its moments when $E[X^k]^{1/k}$ doesn't increase faster than linearly in $k$.

CONVERGENCE
Fundamentally two kinds of convergence in probability:

Strong convergence: $X_1, X_2, \dots$ are defined on the same space, and
$$X = \lim_{n \to \infty} X_n \text{ exists with probability 1,}$$
that is, $X$ is a r.v. defined on the same space and
$$P\{\forall \varepsilon > 0\ \exists N \text{ s.t. } \forall n > N,\ |X_n - X| < \varepsilon\} = 1.$$

Weak convergence: convergence of probabilities, $X_n \to_d X$. The variables need not be on the same space; we need a notion of when the probabilities are close.

STRONG CONVERGENCE
Example: $\varepsilon_1, \varepsilon_2, \dots$ i.i.d. with $P(\varepsilon_i = 1) = P(\varepsilon_i = 0) = \tfrac12$, and
$$X_n = \sum_{i=1}^{n} 2^{-i}\varepsilon_i.$$
This is always a Cauchy sequence, so the limit $X$ exists; $X$ has the uniform distribution on $(0,1)$. What if $\varepsilon_i$ has a different distribution? Then $X$ has a distribution on $(0,1)$ that is neither discrete nor continuous.
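An illustrative simulation (truncation at 50 bits and the sample size are arbitrary choices): with fair bits, the empirical cdf of $X_{50} = \sum_{i=1}^{50} 2^{-i}\varepsilon_i$ matches the uniform cdf.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate X_50 = sum_{i=1}^{50} 2^{-i} * eps_i with fair coin flips eps_i;
# the limit X should be uniform on (0, 1).
n_terms, n_samples = 50, 100_000
eps = rng.integers(0, 2, size=(n_samples, n_terms))
weights = 2.0 ** -np.arange(1, n_terms + 1)
x = eps @ weights

for q in [0.1, 0.25, 0.5, 0.9]:
    print(q, np.mean(x <= q))   # empirical cdf at q; should be close to q
```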

WEAK CONVERGENCE
First idea: random variables $X$ and $X'$ are close if $|P(X \in A) - P(X' \in A)|$ is small for all events $A$. This works for discrete $X, X'$, but is problematic for others.

Example: take
$$X_n = \sum_{i=1}^{n} 2^{-i}\varepsilon_i, \qquad X = \sum_{i=1}^{\infty} 2^{-i}\varepsilon_i$$
from before, and let $A = \{k/2^n : n = 1, 2, \dots;\ k = 0, 1, \dots, 2^n - 1\}$. Then
$$P\{X_n \in A\} = 1 \text{ for all } n, \qquad \text{but} \qquad P\{X \in A\} = 0.$$
You'd want strong convergence ⟹ weak convergence.

Idea: weaker convergence. Restrict the relevant sets. We have a class of functions $\mathcal{F}$ and
$$d(X, X') = \sup_{f \in \mathcal{F}} \bigl|E[f(X)] - E[f(X')]\bigr|.$$
(Metrics aren't really part of this course.) $X_n \to_d X$ if
$$\lim_{n \to \infty} E[f(X_n)] = E[f(X)]$$
for all bounded continuous $f$. Equivalent:
- $P(X_n \le x) \to P(X \le x)$ for every $x$ where the limit cdf is continuous;
- $E[f(X_n)] \to E[f(X)]$ for smooth $f$ with $|f^{(k)}| \le C$;
- $E[\exp(itX_n)] \to E[\exp(itX)]$ for every $t$;
- polynomials?

CONTINUITY THEOREM
- For probability generating functions
- For moment generating functions
- For characteristic functions

You can prove convergence with any of these collections of functions. You may want to know how far off a particular function may be.


DISCRETE CONVERGENCE
Let $X_n$ have the Binomial$(n, \lambda/n)$ distribution. Then
$$P\{X_n = k\} = \binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!}\cdot\underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{\text{converges to } 1}\cdot\underbrace{\left(1 - \frac{\lambda}{n}\right)^{n-k}}_{\text{converges to } e^{-\lambda}}$$
so $P\{X_n = k\} \to e^{-\lambda}\lambda^k/k!$, the Poisson$(\lambda)$ probability.


DISCRETE CONVERGENCE
Let $X_n$ have the Binomial$(n, \lambda/n)$ distribution. Its pgf is
$$g_n(s) = \left(1 + \frac{\lambda(s-1)}{n}\right)^{n} \longrightarrow e^{\lambda(s-1)},$$
which we recognise as the pgf of Poisson$(\lambda)$.
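An illustrative numerical check (the values of $\lambda$, $n$ and the range of $k$ are arbitrary): the Binomial$(n, \lambda/n)$ pmf approaches the Poisson$(\lambda)$ pmf as $n$ grows.

```python
import numpy as np
from scipy.stats import binom, poisson

# Compare the Binomial(n, lam/n) pmf with the Poisson(lam) pmf at k = 0..10.
lam = 3.0
ks = np.arange(11)
for n in [10, 100, 10_000]:
    diff = np.max(np.abs(binom.pmf(ks, n, lam / n) - poisson.pmf(ks, lam)))
    print(n, diff)   # the maximum difference shrinks as n increases
```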


CONVERGENCE IN PROBABILITY
The simplest form of convergence in probability is when the limit is a $\delta$-distribution, that is, a deterministic point. $X_1, X_2, \dots$ converge in probability to $x$ ($X_n \to_p x$) if any of the following EQUIVALENT conditions holds:

(i) $X_n \to_d \delta_x$ (the distribution with $P(\{x\}) = 1$);
(ii) $E[f(X_n)] \to f(x)$ for bounded continuous $f$;
(iii) $\forall \varepsilon > 0$, $\lim_{n \to \infty} P\{|X_n - x| > \varepsilon\} = 0$;
(iv) $\forall \varepsilon, \delta > 0$, $P\{|X_n - x| > \varepsilon\} < \delta$ for $n$ sufficiently large.

CONVERGENCE IN PROBABILITY
(i) $X_n \to_d \delta_x$ (the distribution with $P(\{x\}) = 1$)
⟺ (ii) $E[f(X_n)] \to f(x)$ for bounded continuous $f$

This is just the definition of convergence in distribution, since for the limit $\delta_x$ we have $E[f(X)] = f(x)$.


CONVERGENCE IN PROBABILITY
(ii) $E[f(X_n)] \to f(x)$ for bounded continuous $f$
⟹ (iii) $\forall \varepsilon > 0$, $\lim_{n \to \infty} P\{|X_n - x| > \varepsilon\} = 0$

Proof: let $f(y) = \min\{1, |x - y|/\varepsilon\}$. Then $E[f(X_n)] \ge P\{|X_n - x| > \varepsilon\}$, and $f(x) = 0$.

[Figure: plot of $f(y) = \min\{1, |x - y|/\varepsilon\}$, equal to 0 at $y = x$ and to 1 for $|y - x| \ge \varepsilon$.]


CONVERGENCE IN PROBABILITY
(iii) $\forall \varepsilon > 0$, $\lim_{n \to \infty} P\{|X_n - x| > \varepsilon\} = 0$
⟺ (iv) $\forall \varepsilon, \delta > 0$, $P\{|X_n - x| > \varepsilon\} < \delta$ for $n$ sufficiently large.

This is just the definition of the limit.


CONVERGENCE IN PROBABILITY
(iv) $\forall \varepsilon, \delta > 0$, $P\{|X_n - x| > \varepsilon\} < \delta$ for $n$ sufficiently large
⟹ (ii) $E[f(X_n)] \to f(x)$ for bounded continuous $f$

Proof: let $f$ be any continuous function with $\sup |f| = 1$. Choose any $\delta > 0$. By the definition of continuity, we may find $\varepsilon > 0$ s.t. $|f(y) - f(x)| < \delta$ when $|x - y| < \varepsilon$. For $n$ sufficiently large,
$$\bigl|E[f(X_n)] - f(x)\bigr| = \Bigl|E\bigl[(f(X_n) - f(x))\,1_{\{|X_n - x| < \varepsilon\}} + (f(X_n) - f(x))\,1_{\{|X_n - x| \ge \varepsilon\}}\bigr]\Bigr| \le \delta + E\bigl[2 \cdot 1_{\{|X_n - x| \ge \varepsilon\}}\bigr] \le \delta + 2\delta.$$
Since $\delta$ may be chosen arbitrarily small, this completes the proof.


CONVERGENCE IN PROBABILITY
(ii) $E[f(X_n)] \to f(x)$ for bounded continuous $f$
⟺ (iii) $\forall \varepsilon > 0$, $\lim_{n \to \infty} P\{|X_n - x| > \varepsilon\} = 0$
⟺ (iv) $\forall \varepsilon, \delta > 0$, $P\{|X_n - x| > \varepsilon\} < \delta$ for $n$ sufficiently large.


WEAK LAW OF LARGE NUMBERS


Theorem: let $X_1, X_2, \dots$ be i.i.d. with $E[X_i] = \mu$ and $E[X_i^2] < \infty$. Let $S_n = n^{-1}(X_1 + \cdots + X_n)$. Then $S_n \to_p \mu$.

Proof: let $\sigma^2 = \mathrm{Var}(X_i) < \infty$. We know that $\mathrm{Var}(S_n) = n^{-1}\sigma^2$ and $E[S_n] = \mu$. By Chebyshev,
$$P\{|S_n - \mu| > \varepsilon\} \le \frac{\sigma^2}{n\varepsilon^2}.$$
Regardless of $\varepsilon$, this $\to 0$ as $n \to \infty$, which completes the proof.
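An illustrative simulation with Exp(1) variables ($\mu = \sigma^2 = 1$; the choice of $\varepsilon$ and the number of repetitions are arbitrary): the empirical frequency of a deviation larger than $\varepsilon$ stays below the Chebyshev bound and tends to 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Weak law for Exp(1) samples: frequency of |S_n - mu| > eps versus the
# Chebyshev bound sigma^2 / (n * eps^2), with mu = sigma^2 = 1.
eps, reps = 0.1, 10_000
for n in [10, 100, 1_000]:
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    freq = np.mean(np.abs(means - 1.0) > eps)   # empirical P{|S_n - mu| > eps}
    print(n, freq, 1.0 / (n * eps**2))          # frequency vs Chebyshev bound
```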

