
Week 1: Review of Probability and Statistics

Key Concepts:
Sample space and events
Rules of probability
Conditional probability and independence
Computing probabilities

Probability: Definition and Properties
(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) If A1, A2, ··· are mutually exclusive, i.e. Ai ∩ Aj = ∅ for i ≠ j, then
     P(∪i Ai) = Σi P(Ai)
(iv) It is not hard to see that P(∅) = 0
(v) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
    More generally, if A1, A2, ···, An are n events, then (inclusion-exclusion)
    P(A1 ∪ ··· ∪ An) = Σi P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − ··· + (−1)^{n−1} P(A1 ∩ ··· ∩ An)
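To make rule (v) concrete, here is a minimal simulation sketch, assuming Python with NumPy is available; the die-rolling events A and B are illustrative choices, not taken from the slides.

```python
import numpy as np

# Simulate rolls of a fair six-sided die (illustrative example).
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)

# Two example events: A = "roll is even", B = "roll is at least 4".
A = (rolls % 2 == 0)
B = (rolls >= 4)

# Check rule (v): P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
print((A | B).mean(), A.mean() + B.mean() - (A & B).mean())  # the two numbers agree
```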
Week 1: Review of Probability

Conditional Probability & Independence:
The conditional probability of A given B is
    P(A|B) = P(A ∩ B) / P(B)
Events A and B are independent if
    P(A ∩ B) = P(A)P(B)
Discrete Probability
A discrete distribution is specified by a list of values x1, x2, ···, xn with associated probabilities P(x1), P(x2), ···, P(xn), where
(i) P(xi) ≥ 0
(ii) Σi P(xi) = 1
Examples: Bernoulli(p), Binomial(n, p), hypergeometric, and Poisson.
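As a quick check of (i) and (ii) for the families just listed, here is a small sketch, assuming scipy.stats is available; the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

# Example parameters, chosen only for illustration.
dists = {
    "Bernoulli(0.3)":       (stats.bernoulli(0.3),       np.arange(0, 2)),
    "Binomial(10, 0.4)":    (stats.binom(10, 0.4),       np.arange(0, 11)),
    "Hypergeom(20, 7, 12)": (stats.hypergeom(20, 7, 12), np.arange(0, 8)),
    "Poisson(3)":           (stats.poisson(3),           np.arange(0, 100)),  # truncated infinite support
}

for name, (dist, support) in dists.items():
    p = dist.pmf(support)
    # (i) probabilities are non-negative, (ii) they sum to (essentially) 1
    print(name, p.min() >= 0, round(p.sum(), 6))
```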
Continuous Probability
A continuous probability distribution is completely defined via the probability density function f. The p.d.f. satisfies
(i) f ≥ 0
(ii) ∫_{−∞}^{∞} f(x) dx = 1
(iii) Probabilities are calculated by
     P(a, b) = ∫_a^b f(x) dx
(iv) Note: for a continuous distribution, P(X = x) = 0 for all x, so P(a, b) = P[a, b]
Examples: Normal, Uniform, exponential, and Pareto.
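A minimal numerical sketch of (ii) and (iii), assuming scipy is available; the exponential density with rate 1 is just one example.

```python
import numpy as np
from scipy import integrate, stats

# Exponential(rate = 1) density as an example continuous p.d.f. (it vanishes for x < 0).
f = lambda x: np.exp(-x) if x >= 0 else 0.0

# (ii) the density integrates to 1
total, _ = integrate.quad(f, 0, np.inf)

# (iii) P(a, b) is the integral of f from a to b; compare with the exact CDF difference
a, b = 0.5, 2.0
p_ab, _ = integrate.quad(f, a, b)

print(total)                                           # ≈ 1
print(p_ab, stats.expon.cdf(b) - stats.expon.cdf(a))   # the two agree
```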
Week 1: Review of Probability

Continuous Probability on R²
Given a p.d.f. f(x1, x2), for any a < b and c < d,
    P(a < X1 < b, c < X2 < d) = ∫_c^d ∫_a^b f(x1, x2) dx1 dx2
More generally, for densities on Rⁿ and any A ⊂ Rⁿ,
    P(A) = ∫_A f(x1, x2, ···, xn) dx1 dx2 ··· dxn
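As an illustration of the rectangle probability above, here is a sketch assuming scipy is available; the joint density f(x1, x2) = x1 + x2 on the unit square is a standard textbook example, not one from the slides.

```python
from scipy import integrate

# Example joint density on the unit square: f(x1, x2) = x1 + x2 (it integrates to 1).
f = lambda x1, x2: x1 + x2

a, b = 0.2, 0.7   # bounds for X1 (inner integral)
c, d = 0.1, 0.9   # bounds for X2 (outer integral)

# dblquad integrates func(inner, outer): the outer variable runs over [c, d]
# and the inner variable over [a, b].
prob, _ = integrate.dblquad(lambda x1, x2: f(x1, x2), c, d, a, b)
print("P(a < X1 < b, c < X2 < d) ≈", prob)
```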
Random Variables
A real-valued random variable X is a function from a probability space (Ω, P) to R:
    X : Ω → R
The distribution of X is the probability on R given by
    P(X ∈ A) = P(X⁻¹(A)) = P({ω : X(ω) ∈ A})
Typically, all that matters is the distribution of X; the underlying sample space is not very relevant.
Cumulative Distribution Function
The function F : R → [0, 1] defined by F(t) = P(X ≤ t) is called the (cumulative) distribution function of X.
F is increasing in t, right continuous, with lim_{t→−∞} F(t) = 0 and lim_{t→∞} F(t) = 1.
If P(X = x) > 0, then F has a jump at x with jump size equal to P(X = x).
If the distribution is continuous, then F is continuous.
If f is the p.d.f., then F(t) = ∫_{−∞}^t f(x) dx and f(x) = dF(t)/dt evaluated at t = x.
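The last two relations can be checked numerically; a minimal sketch, assuming scipy is available and using the standard normal purely as an example.

```python
import numpy as np
from scipy import integrate, stats

# F(t) = integral of the density from −∞ to t, with f the standard normal p.d.f.
t = 1.3
F_t, _ = integrate.quad(stats.norm.pdf, -np.inf, t)
print(F_t, stats.norm.cdf(t))          # both ≈ Φ(1.3)

# f(x) = dF/dt at x, approximated by a central finite difference
x, h = 0.4, 1e-5
f_est = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(f_est, stats.norm.pdf(x))        # both ≈ φ(0.4)
```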
Moment Generating Function
The function MX(t) := E(e^{tX}) is called the moment generating function (MGF) of X.
The MGF, if it exists in an interval containing 0, uniquely determines the distribution.
    E(X) = dMX(t)/dt evaluated at t = 0
The MGF may not always exist.
E(e^{itX}) is called the characteristic function. It always exists and has nice properties.
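A small sketch of the relation E(X) = M′(0), assuming Python is available; the Exponential(rate 2) MGF, M(t) = 2/(2 − t) for t < 2, is an illustrative choice, not one from the slides.

```python
# Example: Exponential with rate 2, whose MGF is M(t) = 2 / (2 - t) for t < 2 and whose mean is 1/2.
lam = 2.0
M = lambda t: lam / (lam - t)

# E(X) = M'(0), approximated here by a central finite difference at 0
h = 1e-6
mean_from_mgf = (M(h) - M(-h)) / (2 * h)

print(mean_from_mgf, 1 / lam)   # both ≈ 0.5
```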
Week 1: Review of Probability

Joint Distribution: Discrete
If (X, Y) are two discrete random variables, their joint probabilities are described by the joint probability distribution
    P(xi, yj) = P(X = xi, Y = yj)
for all xi, yj, with P(xi, yj) ≥ 0 and Σ_{i,j} P(xi, yj) = 1.
PX(xi) = Σj P(xi, yj) is called the marginal distribution of X.
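A minimal sketch of computing marginals from a joint table, assuming NumPy is available; the 2 × 3 joint pmf here is made up for illustration.

```python
import numpy as np

# Example joint pmf P(xi, yj): rows index values of X, columns index values of Y.
P = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.30, 0.20]])

print(P.sum())          # the entries sum to 1

P_X = P.sum(axis=1)     # marginal of X: row sums, PX(xi) = Σj P(xi, yj)
P_Y = P.sum(axis=0)     # marginal of Y: column sums
print(P_X, P_Y)
```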
Joint Distribution: Continuous
The joint density of (X, Y) is given by the joint p.d.f. f(x, y):
    P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
The marginal density is given by
    fX(x) = ∫_{−∞}^{∞} f(x, y) dy
The conditional density of Y given X = x is
    f(y|x) = f(x, y) / fX(x)
Independence
X and Y are independent if and only if f(x, y) = fX(x) fY(y) for all x, y.
If X and Y are independent, then ρ(X, Y) = 0. The converse is not true.
If X and Y are independent, V(X + Y) = V(X) + V(Y).
If X1 and X2 are independent, then
    M_{X1+X2}(t) = M_{X1}(t) × M_{X2}(t)
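A quick simulation sketch of two of these facts, assuming NumPy is available; the particular distributions of X and Y are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Independent X and Y (arbitrary illustrative distributions)
X = rng.exponential(scale=2.0, size=n)
Y = rng.normal(loc=1.0, scale=3.0, size=n)

# Under independence: correlation ≈ 0 and V(X + Y) ≈ V(X) + V(Y)
print(np.corrcoef(X, Y)[0, 1])
print(np.var(X + Y), np.var(X) + np.var(Y))
```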
Inequalities
Markov's inequality: if X ≥ 0, then
    P(X ≥ a) ≤ E(X) / a
Chebyshev's inequality:
    P(|X − E(X)| > a) ≤ V(X) / a²
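Both bounds can be checked by simulation; a minimal sketch assuming NumPy is available, with an Exponential(1) sample as an arbitrary non-negative example.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=500_000)   # X ≥ 0 with E(X) = 1, V(X) = 1
a = 3.0

# Markov: P(X ≥ a) ≤ E(X)/a
print((X >= a).mean(), X.mean() / a)

# Chebyshev: P(|X − E(X)| > a) ≤ V(X)/a²
print((np.abs(X - X.mean()) > a).mean(), X.var() / a**2)
```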
Week 1: Limit Theorems

Key Concepts:
Notions of convergence
Law of Large Numbers (LLN)
Central Limit Theorem (CLT)
Notions of Convergence
Let (Ω, P) be a probability space. For each ω ∈ Ω, we define a sequence of random variables X1(ω), X2(ω), ···
We know what it means to say a sequence of numbers an converges to a.
X1(ω), X2(ω), ··· are functions. A natural definition would be
    Xn → X if Xn(ω) → X(ω) for all ω
Note that the underlying probability plays no role here.
We need a notion of convergence that uses P.
Convergence in Probability
We say that Xn → X in probability if, for every positive ε,
    P(|Xn − X| > ε) → 0 as n → ∞
or, equivalently,
    P(|Xn − X| ≤ ε) → 1 as n → ∞.
This means that by taking n sufficiently large, one can achieve arbitrarily high probability that Xn is arbitrarily close to X.
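A minimal simulation sketch of this definition, assuming NumPy is available; the choice of X̄n as averages of U(0, 1) draws (with limit µ = 0.5) and ε = 0.05 is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps, reps = 0.5, 0.05, 2000

# Estimate P(|X̄n − µ| > ε) for averages of n Uniform(0, 1) draws, for increasing n
for n in [10, 100, 1000, 10_000]:
    xbar = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # shrinks toward 0 as n grows
```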
Week 1: Limit Theorems

Convergence in Distribution:
Xn → X in distribution if
    F_{Xn}(t) → F_X(t) for all t such that F_X is continuous at t.
The restriction to points of continuity makes this definition look artificial (we will discuss this in a moment).
Note: if the limit X has a continuous distribution, then F_X is continuous and we have
    F_{Xn}(t) → F_X(t) for all t.
Now, suppose Yn ∼ N(0, σn) and σn → 0. Then
    P(−a < Yn < a) = P(−a/σn < Z < a/σn) → 1
(why?), since a/σn → ∞.
So we would like to say that the distribution of Yn converges to the probability concentrated at 0.
But Fn(0) = 0.5 for all n and F(0) = 1, so Fn(0) does not converge to F(0).
F is not continuous at 0. At all other t, Fn(t) → F(t).
In the topics we cover, F will typically be a Normal distribution, so we do not have to worry about discontinuity points: there are none.
A useful tool for showing convergence in distribution is the following:
    If M_{Xn}(t) → M_X(t), then Xn converges to X in distribution.
Example: Poisson Approximation to Binomial
If n pn → λ, then Bin(n, pn) converges to Poisson(λ).
MGF of Bin(n, pn): Mn(t) = (1 − pn + pn e^t)^n
Write pn = λ/n, so that
    Mn(t) = (1 − (λ/n)(1 − e^t))^n
Since (1 − x/n)^n → e^{−x}, we get Mn(t) → exp(λ(e^t − 1)), which is the MGF of Poisson(λ).
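A quick numerical sketch of this approximation, assuming scipy is available; λ = 3 and the values of n are arbitrary choices.

```python
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(0, 15)

# The Bin(n, λ/n) pmf approaches the Poisson(λ) pmf as n grows
for n in [10, 100, 1000]:
    max_diff = np.max(np.abs(stats.binom.pmf(k, n, lam / n) - stats.poisson.pmf(k, lam)))
    print(n, max_diff)   # decreases with n
```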
Example: Normal Approximation to Poisson
Let λ1, λ2, ··· be an increasing sequence with λn → ∞, with corresponding random variables Xn ∼ Poisson(λn). Then the standardized Xn converge to Z ∼ N(0, 1).
MGF of Poisson(λn): Mn(t) = exp(λn(e^t − 1))
Write Xn in its standardized form as Zn = (Xn − λn)/√λn.
One can check that M_{Zn}(t) converges to M_Z(t).
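A numerical sketch of the approximation, assuming scipy is available; the grid of x values and the values of λ are arbitrary choices.

```python
import numpy as np
from scipy import stats

# Standardized Poisson(λ) CDF versus Φ at a few points, for growing λ
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for lam in [5, 50, 500]:
    # P(Zn ≤ x) where Zn = (Xn − λ)/√λ and Xn ~ Poisson(λ)
    approx = stats.poisson.cdf(lam + x * np.sqrt(lam), lam)
    print(lam, np.max(np.abs(approx - stats.norm.cdf(x))))   # discrepancy shrinks as λ grows
```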
Week 1: Limit Theorems

Law of Large Numbers:

Theorem (WLLN)
Let X1, X2, ···, Xn be a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σ². Let X̄n = n⁻¹ Σ_{i=1}^n Xi. Then, for any ε > 0,
    P(|X̄n − µ| > ε) → 0 as n → ∞

Proof.
We first find E(X̄n) and Var(X̄n):
    E(X̄n) = (1/n) Σ_{i=1}^n E(Xi) = µ
Since the Xi's are independent,
    Var(X̄n) = (1/n²) Σ_{i=1}^n Var(Xi) = σ²/n
The desired result now follows immediately from Chebyshev's inequality:
    P(|X̄n − µ| > ε) ≤ Var(X̄n)/ε² = σ²/(n ε²) → 0 as n → ∞
Law of Large Numbers: Problems
1. Let X1, X2, ··· be a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σi². Show that if n⁻² Σ_{i=1}^n σi² → 0, then X̄ → µ in probability.
2. Let Xi be as in Problem 1 but with E(Xi) = µi and n⁻¹ Σ_{i=1}^n µi → µ. Show that X̄ → µ in probability.
Solution (Problem 1)
X1, X2, ··· is a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σi². If n⁻² Σ σi² → 0, show that the WLLN holds.
    E(X̄) = µ,   Var(X̄) = Var(Σ_{i=1}^n Xi / n) = (Σ_{i=1}^n σi²) / n²
By Chebyshev,
    P(|X̄ − µ| > ε) ≤ (1/ε²) (Σ_{i=1}^n σi²) / n²
The last term goes to zero by assumption.
Week 1: Limit Theorems

Example (Monte Carlo):
Suppose we want to evaluate ∫_0^1 f(x) dx, which is difficult to evaluate analytically.
Note: ∫_0^1 f(x) dx = E(f(X)) with respect to the U(0, 1) distribution.
Simulate x1, x2, ···, xn (large n) from the U(0, 1) distribution.
By the WLLN,
    (1/n) Σ_{i=1}^n f(xi) ≈ ∫_0^1 f(x) dx
This is called Monte Carlo integration.
Homework 1, Problems 19 & 20
Find a Monte Carlo approximation to ∫_0^1 cos(2πx) dx.
Find an estimate of the standard deviation of the approximation.
One possible way to set up the computation is sketched below.
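A minimal sketch of the Monte Carlo recipe from the previous slide applied to this integrand, assuming NumPy is available; the sample size n and the seed are arbitrary, and the standard deviation of the estimate is taken as the sample standard deviation of f(xi) divided by √n.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

x = rng.uniform(size=n)
fx = np.cos(2 * np.pi * x)

estimate = fx.mean()                      # Monte Carlo estimate of the integral of cos(2πx) over [0, 1]
std_error = fx.std(ddof=1) / np.sqrt(n)   # estimated standard deviation of the estimate

print(estimate, std_error)
```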


Week 1: Limit Theorems

Central Limit Theorem (CLT):

Theorem (CLT)
Let X1, X2, ··· be a sequence of independent random variables with E(Xi) = 0 and Var(Xi) = σ², with common distribution F (that is, X1, X2, ··· i.i.d. ∼ F).
Assume that F has MGF M(t) defined in an interval around 0.
Let Sn = Σ_{i=1}^n Xi. Then for all −∞ < x < ∞,
    P(Sn/(σ√n) ≤ x) → Φ(x) as n → ∞
where Φ(x) = P(Z ≤ x) is the CDF of the standard normal.
Note: sd(Sn) = √n σ, so Sn/(σ√n) has mean 0 and s.d. 1.
Dividing the numerator and denominator of Sn/(σ√n) by n, we get
    P(√n X̄n / σ ≤ x) → Φ(x) as n → ∞
If E(Xi) = µ, we can apply the CLT to Xi − µ (which has expected value 0), and so in this case
    P(√n (X̄n − µ) / σ ≤ x) → Φ(x) as n → ∞
Typically we use the CLT to get an approximation of P(√n (X̄n − µ)/σ ≤ x).

Extensions of CLT
The central limit theorem can be proved in greater generality.

How good is the approximation?
If F is symmetric and has tails that die rapidly, the approximation is good.
When F is highly skewed or has tails that go to 0 very slowly, we need a large n to get a good approximation.
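A simulation sketch of the approximation quality, assuming NumPy and scipy are available; the Exponential(1) distribution (skewed, with µ = σ = 1) and the chosen values of n are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma = 1.0, 1.0                 # Exponential(rate 1): skewed, with mean 1 and sd 1
x_grid = np.array([-1.0, 0.0, 1.0])
reps = 50_000

for n in [5, 30, 200]:
    samples = rng.exponential(scale=1.0, size=(reps, n))
    z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
    # empirical P(√n(X̄n − µ)/σ ≤ x) versus Φ(x); agreement improves with n for this skewed F
    print(n, [round((z <= x).mean(), 3) for x in x_grid], np.round(stats.norm.cdf(x_grid), 3))
```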
Week 1: Limit Theorems

Proof of the Central Limit Theorem (CLT):

Proof.
M(t) = E(e^{tX}),   M_{Sn}(t) = E(e^{t Sn}) = [M(t)]^n
Let
    Zn = Sn/(σ√n),   so M_{Zn}(t) = E(e^{t Sn/(σ√n)}) = M_{Sn}(t/(σ√n))
Thus,
    M_{Zn}(t) = [M(t/(σ√n))]^n
We want to show that, as n → ∞, this goes to e^{t²/2}.
We will make use of the following result:
    (1 + (b + an)/n)^n → e^b as an → 0
So we also need to express M_{Zn}(t) in this form (how?).
Note M(0) = 1.
Since E(X) = 0, M′(0) = 0; since E(X²) = σ², we have M″(0) = σ².
By Taylor expansion,
    M(s) = M(0) + s M′(0) + (s²/2) M″(0) + (s³/6) M‴(0)
         = 1 + (s²/2) σ² + (s³/6) M‴(0)
Proof Continued:
With s = t/(σ√n),
    M(s) = 1 + (s²/2) σ² + (s³/6) M‴(0)
becomes
    M(t/(σ√n)) = 1 + t²/(2n) + εn,   where εn = (t³/(6 σ³ n^{3/2})) M‴(0).
Show that M(t/(σ√n)) can be written as
    M(t/(σ√n)) = 1 + (t²/2 + an)/n
with an → 0. We then have
    M_{Zn}(t) = [M(t/(σ√n))]^n = [1 + (t²/2 + an)/n]^n → e^{t²/2}
as required.
Week 1: Limit Theorems

Problems
17. Suppose that a measurement has mean µ and variance σ² = 25. Let X̄ be the average of n such independent measurements. How large should n be so that
    P{|X̄ − µ| < 1} = .95?

Solution:
    P{|X̄ − µ| < 1} = P{ |√n (X̄ − µ)/5| < √n/5 } ≈ P{ |Z| < √n/5 } = .95
But we also know that
    P{|Z| < 1.96} = .95
so
    √n/5 = 1.96,   n = (1.96)² × 5² ≈ 96.04,
and we round up to n = 97.
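A short numerical check of this sample-size calculation, assuming scipy is available; the variable names are just illustrative.

```python
import numpy as np
from scipy import stats

sigma, half_width, conf = 5.0, 1.0, 0.95

z = stats.norm.ppf(1 - (1 - conf) / 2)    # ≈ 1.96
n = (z * sigma / half_width) ** 2         # ≈ 96.04
print(z, n, int(np.ceil(n)))              # round up to n = 97
```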
Problem (total weight)
Let Xi be the weight of the i-th package, with E(Xi) = 15 and σ = 10. The total weight of 100 packages is T = Σ_{i=1}^{100} Xi. Find P(T > 1700).
    P(T > 1700) = P( (T − 1500)/(10 × 10) > (1700 − 1500)/(10 × 10) ) ≈ P(Z > 2) ≈ 0.023
Week 1: Limit Theorems

Problems
Let X1, X2, ···, Xn ∼ U(0, 1) and let M = max(X1, X2, ···, Xn). Then
    P{1 − M < t} = P{M > 1 − t} = 1 − (1 − t)^n
    P{1 − M < t/n} = P{n(1 − M) < t} = 1 − (1 − t/n)^n
As n → ∞, 1 − (1 − t/n)^n → 1 − e^{−t}, so
    n(1 − M) → Exp(1) in distribution.
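A simulation sketch of this limit, assuming NumPy is available; n = 500 and the values of t are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 100_000

# M = max of n Uniform(0, 1) draws; compare P(n(1 − M) < t) with the Exp(1) CDF 1 − e^{−t}
M = rng.uniform(size=(reps, n)).max(axis=1)
scaled = n * (1 - M)

for t in [0.5, 1.0, 2.0]:
    print(t, (scaled < t).mean(), 1 - np.exp(-t))
```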
