
5. Limit Theorems
Prof. Jane-Hwa Huang

In today's era of big data, we collect large amounts of data X_1, X_2, …. The values differ from one another, so they are random variables, and they may not follow a known, specific distribution.
Q: What information can we extract from such massive data? Does it have any useful properties?
For example, does the average of the data converge in the limit (as the amount of data grows)?
Mean: M_n = S_n/n = (X_1 + ⋯ + X_n)/n, as n → ∞?



Outline
- Introduction
- Markov and Chebyshev Inequalities
- The Weak Law of Large Numbers
- Convergence in Probability
- The Central Limit Theorem
- The Strong Law of Large Numbers



Introduction (basic assumptions of this chapter)

[Figure: asymptotic lines y = x and x = 0]
Limiting Case I
- S_n = X_1 + ⋯ + X_n (sum of iid RVs)
- As n → ∞, S_n = ?
- E[S_n] = E[X_1] + ⋯ + E[X_n] = nμ
- var[S_n] = var(X_1) + ⋯ + var(X_n) = nσ²
- As n → ∞, both E[S_n] and var[S_n] → ∞.
- This limiting behavior is not a useful property.



Limiting Case II (the theoretical basis of opinion polling)
EX: The power consumption of a mobile phone is a RV X_i.

    True mean: μ = (1/N) · (X_1 + X_2 + ⋯ + X_N), if there are N mobile phones in total.

RVs X_1, X_2, …, X_n: the power-consumption samples for n mobile phones (n ≪ N).

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n   (the mean value of these n samples)

- As n → ∞, with S_n = X_1 + ⋯ + X_n, what is M_n = S_n/n?
- E[M_n] = E[S_n]/n = nμ/n = μ
- var[M_n] = var[S_n]/n² = nσ²/n² = σ²/n
- As n → ∞, E[M_n] = μ and var[M_n] → 0.

Physical meaning: if n → ∞, var[M_n] → 0, so M_n is very close to μ and does not vary significantly. Sample mean ≈ true mean μ.
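
To see the shrinking variance numerically, here is a minimal Python sketch (not part of the slides; it assumes NumPy and uses exponential(1) samples, an arbitrary choice giving μ = σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0  # exponential(1): mean 1, variance 1

for n in [10, 100, 1000, 10000]:
    # 2000 independent realizations of the sample mean M_n
    M = rng.exponential(scale=1.0, size=(2000, n)).mean(axis=1)
    print(f"n={n:5d}  E[Mn]~{M.mean():.3f}  var[Mn]~{M.var():.5f}  sigma^2/n={sigma2 / n:.5f}")
```

The empirical variance of M_n tracks σ²/n while the empirical mean stays at μ, exactly as derived above.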
Limiting Case III
- Define the standardized sum Z_n = (S_n − nμ)/(σ√n) (the same quantity used in Section 5.4).
- E[Z_n] = 0
- var[Z_n] = var[S_n]/(nσ²) = 1
- The mean and variance remain unchanged as n increases, for any distribution of the X_i.
- Can we find something from these properties?

- Limit theorems provide inference and statistics for large data sets (e.g., X_1, X_2, …, X_n, …).
- Even if the probability distribution is unknown, the limit theorems can provide useful information as n → ∞.
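
A companion sketch (same assumptions as the previous snippet: Python with NumPy, exponential(1) samples) checks that Z_n keeps mean 0 and variance 1 as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0  # exponential(1): mean 1, std 1

for n in [10, 100, 1000]:
    S = rng.exponential(scale=1.0, size=(20_000, n)).sum(axis=1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sum Z_n
    print(f"n={n:4d}  E[Zn]~{Z.mean():+.3f}  var[Zn]~{Z.var():.3f}")
```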
5.1 Markov and Chebyshev Inequalities
- These inequalities provide probabilistic bounds:
  - even though the distribution is unknown;
  - only the mean and variance need to be known.


Markov Inequality (a last resort when little is known)

- If a RV X can take only nonnegative values, then

    P(X ≥ a) ≤ E[X]/a,  for all a > 0.

- Only the mean E[X] needs to be known.

Proof:
Define Y_a = 0 if X < a, and Y_a = a if X ≥ a.
Then Y_a ≤ X always holds, so E[Y_a] ≤ E[X].

    E[Y_a] = 0 · P(Y_a = 0) + a · P(Y_a = a) = a · P(X ≥ a) ≤ E[X]

Therefore

    P(X ≥ a) ≤ E[X]/a.
Ex. 5.1: Markov Inequality
- X ~ U[0, 4].
- Use the Markov inequality to find upper bounds for P(X ≥ a), for a = 2, 3, and 4, and compare with the exact probabilities.
Sol:
- E[X] = 2, so the Markov bounds are

    P(X ≥ 2) ≤ 2/2 = 1,  P(X ≥ 3) ≤ 2/3,  P(X ≥ 4) ≤ 2/4 = 1/2.

- Exact values: P(X ≥ 2) = 1/2, P(X ≥ 3) = 1/4, P(X ≥ 4) = 0.
- The bound given by the Markov inequality is quite loose (only the mean is known).
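
A quick numerical check of how loose the bound is (a sketch, not from the slides; it assumes Python with NumPy and estimates the exact tails by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=1_000_000)  # X ~ U[0, 4], E[X] = 2

for a in [2, 3, 4]:
    bound = 2 / a                # Markov: P(X >= a) <= E[X]/a
    exact = (x >= a).mean()      # Monte Carlo estimate of P(X >= a)
    print(f"a={a}: Markov bound = {bound:.3f}, exact ~ {exact:.3f}")
```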
Chebyshev Inequality

- If X is a RV with mean μ and variance σ², then

    P(|X − μ| ≥ c) ≤ σ²/c²,  for all c > 0.

- This bounds the probability that X falls outside the range (μ − c, μ + c).
- Both the mean μ and the variance σ² must be known.

Proof:
Consider the nonnegative RV (X − μ)² and apply the Markov inequality with a = c²:

    P((X − μ)² ≥ c²) ≤ E[(X − μ)²]/c² = σ²/c².

Since (X − μ)² ≥ c² if and only if |X − μ| ≥ c,

    P(|X − μ| ≥ c) ≤ σ²/c².
Alternative Form of the Chebyshev Inequality

- Let c = kσ. Then

    P(|X − μ| ≥ kσ) ≤ σ²/(k²σ²) = 1/k².

- The probability that a RV takes a value more than k standard deviations away from its mean is at most 1/k².
- Tighter than the Markov inequality, since it also uses the information in the variance.
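
The following sketch (assuming the same U[0, 4] example as Ex. 5.1, so μ = 2 and σ² = 16/12; Python with NumPy) compares the 1/k² bound with the exact deviation probability:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=1_000_000)  # U[0, 4]: mu = 2, sigma^2 = 16/12
mu, sigma = 2.0, np.sqrt(16 / 12)

for k in [1.0, 1.5, 2.0]:
    bound = 1 / k**2                              # Chebyshev: <= 1/k^2
    exact = (np.abs(x - mu) >= k * sigma).mean()  # P(|X - mu| >= k sigma)
    print(f"k={k}: Chebyshev bound = {bound:.3f}, exact ~ {exact:.3f}")
```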



5.2 The Weak Law of Large Numbers
- Samples X_1, X_2, … are iid RVs with true mean μ.
- Sample mean: M_n = (X_1 + ⋯ + X_n)/n.
- For every ε > 0, we have

    lim_{n→∞} P(|M_n − μ| ≥ ε) = 0,

  where |M_n − μ| is |sample mean − true mean|.
- Physical sense: the sample mean M_n of a large number of iid RVs is very close to the true mean with high probability.
- The only necessary assumption is a well-defined mean E[X].
- The law remains valid even if the variance var(X) is infinite.


Proof

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n

- E[M_n] = μ
- var[M_n] = var[S_n]/n² = nσ²/n² = σ²/n

According to the Chebyshev inequality,

    P(|M_n − μ| ≥ ε) ≤ (σ²/n)/ε²,  for any ε > 0.

- For any ε > 0, this probability → 0 as n → ∞.
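
The proof can be replayed numerically. A minimal sketch (assuming Python with NumPy and U[0, 1] samples, so μ = 0.5 and σ² = 1/12):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, eps = 0.5, 1 / 12, 0.05  # X ~ U[0, 1]

for n in [100, 1000, 10000]:
    Mn = rng.uniform(0, 1, size=(5000, n)).mean(axis=1)
    p_dev = (np.abs(Mn - mu) >= eps).mean()   # empirical P(|Mn - mu| >= eps)
    bound = sigma2 / (n * eps**2)             # Chebyshev bound
    print(f"n={n:5d}  P(|Mn-mu|>=eps) ~ {p_dev:.4f}  bound = {bound:.4f}")
```

Both columns go to 0, with the empirical probability well below the Chebyshev bound, as expected from a loose inequality.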
Ex. 5.5: Polling
- Q: 1. How many samples are enough?
     2. What estimation error ε can we tolerate?
     3. What confidence level (a probability) do we require?
- EX: For estimation error ε = 0.1 and confidence level 95%:
      Pr{estimation error < ε = 0.1} ≥ 95% (at least 95%).
- p: the true fraction of all voters supporting a candidate.
- n voters are selected at random.
- M_n: the fraction of the n selected voters supporting the candidate. (We use the sample mean M_n to estimate p.)
- How large must n be to estimate the true mean p well?
[Sol] The answer of one voter is a Bernoulli RV with success probability p and variance σ² = p(1 − p).
Ex. 5.5: Polling (continued)
Note: max p(1 − p) = 1/4, attained at p = 1/2 (Ex. 5.3).

- By the Chebyshev inequality (a loose bound),

    P(|M_n − p| ≥ ε) ≤ var(M_n)/ε² = p(1 − p)/(nε²) ≤ 1/(4nε²),  for any ε > 0.

- If ε = 0.1 and n = 100,

    P(|M_100 − p| ≥ 0.1) ≤ 1/(4 · 100 · (0.1)²) = 0.25.

- With 100 samples: Pr{estimation error ≥ ε = 0.1} ≤ 25%,
  so Pr{estimation error < ε = 0.1} ≥ 75% (confidence level = 75%).
- Now raise the confidence level to at least 95% and tighten the accuracy level ε to 0.01:

    P(|M_n − p| ≥ 0.01) ≤ 1/(4n(0.01)²) ≤ (1 − 0.95) = 0.05.

- According to this inequality, n ≥ 50000 voters are needed to achieve the required accuracy.
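
The sample-size rule 1/(4nε²) ≤ δ is easy to wrap in a small helper. A sketch (the function name is mine, not from the slides):

```python
import math

def chebyshev_poll_size(eps: float, delta: float) -> int:
    # From P(|Mn - p| >= eps) <= 1/(4 n eps^2) <= delta, solve for n.
    return math.ceil(1 / (4 * delta * eps**2))

print(chebyshev_poll_size(0.1, 0.25))    # 100   (75% confidence, eps = 0.1)
print(chebyshev_poll_size(0.01, 0.05))   # 50000 (95% confidence, eps = 0.01)
```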
5.3 Convergence in Probability

- Weak law of large numbers: the sample mean M_n converges to the true mean μ.
- But what does "convergence" mean?
  - EX: 1/n. As n → ∞, 1/n converges to 0.
- However, M_1, …, M_n are a sequence of RVs (since X_1, …, X_n are RVs), so the RV M_n may vary from realization to realization.

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n

- We therefore have to define "convergence" for RVs: convergence in probability.
Convergence of a Deterministic Sequence
- Let a_1, a_2, … be a sequence of real numbers, and let a be another real number.
- The sequence a_1, a_2, … converges to a, written lim_{n→∞} a_n = a,
- (Condition:) if for every ε > 0 there exists some n_0 such that

    |a_n − a| ≤ ε,  for all n ≥ n_0.

  (For every ε we can find a threshold n_0 such that for all n ≥ n_0, a_n differs from a by at most ε.)

- EX: a_n = 1/n; as n → ∞, a = 0.
  - For ε = 0.01, n_0 = 100.
  - For ε = 0.001, n_0 = 1000.
Convergence in Probability

- A sequence of RVs Y_1, Y_2, …, Y_n converges to a real number a in probability,
- (Condition:) if for every ε > 0 we have

    lim_{n→∞} P(|Y_n − a| ≥ ε) = 0,

  or equivalently: for every ε > 0 and δ > 0 there is a threshold n_0 such that for all n ≥ n_0, the probability that Y_n differs from a by more than ε is smaller than δ.
- The distribution of Y_n becomes concentrated around a.
- ε: accuracy level.
- 1 − δ: confidence level.
Ex. 5.7: Convergence in Probability

- Y ~ exponential distribution (λ = 1).
- Let Y_n = Y/n.
- Does the sequence Y_n converge to zero in probability?
Sol:

    P(|Y_n − 0| ≥ ε) = P(Y/n ≥ ε) = P(Y ≥ nε) = e^{−nε}   (the tail 1 − CDF of the exponential)

    lim_{n→∞} P(|Y_n − 0| ≥ ε) = lim_{n→∞} e^{−nε} = 0
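
A simulation of this example (a sketch assuming Python with NumPy; 200000 draws of Y):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200_000)  # Y ~ exp(lambda = 1)
eps = 0.1

for n in [1, 10, 50, 100]:
    p = (y / n >= eps).mean()  # empirical P(|Yn - 0| >= eps)
    print(f"n={n:3d}  P(Yn >= eps) ~ {p:.5f}  e^(-n eps) = {np.exp(-n * eps):.5f}")
```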



5.5 The Strong Law of Large Numbers
- Similar to the weak law: it concerns the convergence of the sample mean to the true mean.
- Let X_1, X_2, … be a sequence of iid RVs with true mean μ.
- Then the sequence of sample means M_n = (X_1 + ⋯ + X_n)/n converges to the true mean μ with probability 1, in the sense that

    P( lim_{n→∞} M_n = μ ) = 1.

  (Pr{sample mean = true mean} = 1; the sample mean → the true mean.)
- Compare the weak law of large numbers: Pr{deviation greater than an error ε} → 0.
Convergence with Probability 1

- Let Y_1, Y_2, … be a sequence of RVs (not necessarily independent) associated with the same probabilistic model.
- Let c be a real number.
- We say that Y_n converges to c with probability 1 (or almost surely) if

    P( lim_{n→∞} Y_n = c ) = 1.


Example: Convergence with Probability 1
- X_1, X_2, … ~ iid U[0, 1].
- Y_n = min{X_1, …, X_n}.
- Show that Y_n converges to 0 with probability 1.
Proof:
- For any positive ε,

    P(Y_n ≥ ε) = P(min(X_1, …, X_n) ≥ ε) = P(X_1 ≥ ε, …, X_n ≥ ε)
               = P(X_1 ≥ ε) ⋯ P(X_n ≥ ε) = (1 − ε)^n.

- This holds for every ε > 0, and

    lim_{n→∞} P(Y_n ≥ ε) = lim_{n→∞} (1 − ε)^n = 0.

- Since Y_n takes values in [0, 1] and is nonincreasing in n, its limit exists and must equal 0 with probability 1:

    P( lim_{n→∞} Y_n = 0 ) = 1.
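
The "with probability 1" claim is about individual sample paths, which a simulation makes vivid. A minimal sketch (assuming Python with NumPy), tracking Y_n = min(X_1, …, X_n) along 20 independent paths:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(20, 10_000))  # 20 paths of X_1, ..., X_10000

# Running minimum along each path: Y_n = min(X_1, ..., X_n)
Yn = np.minimum.accumulate(x, axis=1)

# Every single path is driven toward 0, not just "most" of them:
print("largest Y_10000 over all paths:", Yn[:, -1].max())  # ~1e-4
```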
5.4 The Central Limit Theorem

Observation:
- Let X_1, X_2, … be a sequence of iid RVs with common mean μ and variance σ².
- Define

    Z_n = (S_n − nμ)/(σ√n),  where S_n = X_1 + ⋯ + X_n.

- Note: Z_n has the same mean (0) and variance (1) for any distribution of the X_i and any n.
- Can Z_n be approximated by a particular distribution at a large n?
The Central Limit Theorem

- Let X_1, X_2, … be a sequence of iid RVs with common mean μ and variance σ².
- Define

    Z_n = (S_n − nμ)/(σ√n).

- Then the CDF of Z_n converges to the standard normal CDF

    Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx,

  in the sense that

    lim_{n→∞} P(Z_n ≤ z) = Φ(z),  for every z.

- Notes:
  - Holds for any kind of RV, at a large n;
  - the RVs must be iid;
  - only a finite mean and variance are needed;
  - used to find the (approximate) distribution of large data sets.
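
The convergence of CDFs can be observed directly. A sketch (assuming Python with NumPy and SciPy; U[0, 1] summands are an arbitrary non-normal choice):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 100, 50_000
mu, sigma = 0.5, np.sqrt(1 / 12)  # X ~ U[0, 1]

S = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sums Z_n

for z in [-1.0, 0.0, 1.0, 1.92]:
    print(f"z={z:+.2f}  P(Zn<=z) ~ {(Z <= z).mean():.4f}  Phi(z) = {norm.cdf(z):.4f}")
```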
Normal Approximation Based on the Central Limit Theorem
- S_n = X_1 + ⋯ + X_n, where the X_i are iid RVs with mean μ and variance σ².
- If n is large, the CDF P(S_n ≤ c) can be approximated by treating S_n as if it were normal.
1) Calculate the mean μ_S and the variance σ_S² of S_n:

    E[S_n] = μ_S = nμ,  var(S_n) = σ_S² = nσ².

2) Standardize (normal → standard normal, for table look-up):

    P(S_n ≤ c) = P( (S_n − μ_S)/σ_S ≤ (c − μ_S)/σ_S = z ) ≈ Φ(z).

- The CDF Φ(z) is available from standard normal CDF tables.
Example: Normal Approximation Based on the Central Limit Theorem
- Load a plane with 100 packages.
- Each package's weight ~ U[5, 50] (pounds).
- P(total weight > 3000 pounds) = ?
Sol:
- Mean and variance of a one-package weight (uniform distribution):

    μ = (5 + 50)/2 = 27.5,  σ² = (50 − 5)²/12 = 168.75.

- Total weight: S_100, with

    E[S_100] = 100μ = 2750,  var(S_100) = 100σ² = 16875,  σ_S = 10σ ≈ 129.9.

- Then

    P(S_100 ≤ 3000) ≈ Φ( (3000 − 2750)/129.9 ) = Φ(1.92),
    P(S_100 > 3000) = 1 − P(S_100 ≤ 3000) ≈ 1 − Φ(1.92) = 0.0274.
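
The same computation in a few lines (a sketch assuming Python with SciPy, whose norm.cdf replaces the table look-up):

```python
from math import sqrt
from scipy.stats import norm

n = 100
mu = (5 + 50) / 2            # mean of U[5, 50]: 27.5
var = (50 - 5) ** 2 / 12     # variance of U[5, 50]: 168.75

z = (3000 - n * mu) / sqrt(n * var)
print(f"z = {z:.2f}, P(S100 > 3000) ~ {1 - norm.cdf(z):.4f}")  # z ~ 1.92, tail ~ 0.027
```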
De Moivre-Laplace Approximation to the Binomial
- S_n ~ binomial(n, p). (A binomial RV is the sum of n Bernoulli RVs.)
- The central limit approximation treats S_n as normal with mean np and variance np(1 − p):

    P(k ≤ S_n ≤ l) = P( (k − np)/√(np(1−p)) ≤ (S_n − np)/√(np(1−p)) ≤ (l − np)/√(np(1−p)) )
                   ≈ Φ( (l − np)/√(np(1−p)) ) − Φ( (k − np)/√(np(1−p)) ).

- If l = k, this central limit approximation gives P(S_n = l) ≈ 0, which is useless.
- De Moivre-Laplace approximation (an improved version):
  - n is large;
  - k and l are nonnegative integers; then

    P(k ≤ S_n ≤ l) ≈ Φ( (l + 1/2 − np)/√(np(1−p)) ) − Φ( (k − 1/2 − np)/√(np(1−p)) ).
Proof
- Binomial = sum of Bernoulli RVs: S_n = X_1 + ⋯ + X_n.
- E[X_i] = p, var(X_i) = p(1 − p) (Bernoulli).
- E[S_n] = np, var(S_n) = np(1 − p) (binomial).
- By the central limit theorem, binomial ≈ normal:

    P(S_n ≤ c) ≈ Φ( (c − np)/√(np(1−p)) ),

  where Φ is the CDF of the standard normal.
- If k = l, this gives P(S_n = k) ≈ 0, which is useless. (✗)
- De Moivre-Laplace approximation: use the half-unit correction

    P(S_n = k) ≈ P(k − 1/2 ≤ S_n ≤ k + 1/2).


Example: De Moivre-Laplace Approximation to the Binomial
- S_n ~ binomial(n = 36, p = 0.5), so np = 18 and √(np(1−p)) = 3.
- P(S_n ≤ 21) = ? Compare with the exact value.
Sol:
- Exact value (hard to compute by hand):

    P(S_n ≤ 21) = Σ_{k=0}^{21} C(36, k) (0.5)^36 ≈ 0.8785.

- Central limit approximation:

    P(S_n ≤ 21) ≈ Φ( (21 − np)/√(np(1−p)) ) = Φ( (21 − 18)/3 ) = Φ(1) = 0.8413.

- De Moivre-Laplace approximation (much closer):

    P(S_n ≤ 21) ≈ Φ( (21 + 0.5 − np)/√(np(1−p)) ) = Φ( (21.5 − 18)/3 ) = Φ(1.17) = 0.879.

- Approximation for a single value, P(S_n = 19) = ?

    P(S_n = 19) ≈ Φ( (19.5 − 18)/3 ) − Φ( (18.5 − 18)/3 ) = 0.6915 − 0.5675 = 0.124.

  Exact: C(36, 19) (0.5)^36 ≈ 0.1251 (very close).
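
All of these values can be checked in a few lines (a sketch assuming Python with SciPy; binom gives the exact values, norm the approximations):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 36, 0.5
m, s = n * p, sqrt(n * p * (1 - p))  # 18 and 3

print("exact  P(Sn<=21):", binom.cdf(21, n, p))          # ~0.8785
print("CLT    P(Sn<=21):", norm.cdf((21 - m) / s))       # ~0.8413
print("DM-L   P(Sn<=21):", norm.cdf((21.5 - m) / s))     # ~0.879
print("DM-L   P(Sn=19): ", norm.cdf((19.5 - m) / s) - norm.cdf((18.5 - m) / s))
print("exact  P(Sn=19): ", binom.pmf(19, n, p))          # ~0.1251
```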
Excerpted and adapted from: "Poll samples: the more the better?"
[聯合報 (United Daily News) / reporters 李名揚 and 郭錦萍]

- In theory, if a question offers only two possible answers (such as yes or no), then regardless of the size of the total population, once the number of valid responses (the sample size) reaches 1068, the poll's margin of error is ±3 percentage points and the confidence level is 95%.
- "A margin of error of ±3 percentage points" means that if an election poll shows candidate A with a support rating of 42%, his true support lies between 39% and 45%. Basically, one candidate can be said to lead only when the support ratings differ by more than 6 percentage points.
- "A confidence level of 95%" means that out of every 100 polls conducted this way, on average 95 can be trusted; there is still a 5% chance that the true value falls outside the poll's margin of error.

(ε = 3%, δ = 5%, at n = 1068)
    M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n
    E[M_n] = E[S_n]/n = p and var(M_n) = var(S_n)/n² = p(1 − p)/n.

According to the central limit theorem, we assume that M_n can be approximated as a normal random variable:

    P(|M_n − p| ≥ ε) = P( |M_n − p|/√var(M_n) ≥ ε/√(p(1−p)/n) )
                     ≤ P( |M_n − p|/√var(M_n) ≥ 2ε√n )
                     ≈ 2( 1 − Φ(2ε√n) ) = 2 − 2Φ(2ε√n),

where p(1 − p) ≤ 1/4, so that ε/√(p(1−p)/n) ≥ ε/√(1/(4n)) = 2ε√n.

For the confidence level to be at least 95% with accuracy level ε = 3%:

    2 − 2Φ(2ε√n) = 2 − 2Φ(0.06√n) ≤ 0.05  ⇒  Φ(0.06√n) ≥ 0.975
    ⇒  1.96 ≤ 0.06√n  ⇒  n ≥ (1.96/0.06)² ≈ 1067.1.

Hence n = 1068 samples suffice, matching the newspaper's figure.
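
The last step inverts Φ, which is exactly what a normal quantile function does. A sketch (assuming Python with SciPy; norm.ppf is the inverse CDF):

```python
from math import ceil
from scipy.stats import norm

eps, delta = 0.03, 0.05
z = norm.ppf(1 - delta / 2)        # 1.96 for a 95% confidence level
n = (z / (2 * eps)) ** 2           # from 2 * eps * sqrt(n) >= z, with p(1-p) <= 1/4
print(z, n, ceil(n))               # 1.96, ~1067.1, 1068
```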

