
5. Limit Theorems
Prof. Jane-Hwa Huang

In today's era of big data, we collect large amounts of data X_1, X_2, …. The values differ from one another, so they are random variables, and they may not follow a known, specific distribution.
Q: What information can we extract from such massive data? Does it have any useful properties?
For example, does the average of the data converge in the limit (as the amount of data grows)?
Mean: M_n = S_n/n = (X_1 + ⋯ + X_n)/n, as n → ∞?



Outline
- Introduction
- Markov and Chebyshev Inequalities
- The Weak Law of Large Numbers
- Convergence in Probability
- The Central Limit Theorem
- The Strong Law of Large Numbers



Introduction (basic assumptions of this chapter)

[Figure: asymptotic lines y = x and x = 0]
Limiting Case I
- S_n = X_1 + ⋯ + X_n (sum of iid RVs)
- As n → ∞, S_n = ?
- E[S_n] = E[X_1] + ⋯ + E[X_n] = nμ
- var[S_n] = var(X_1) + ⋯ + var(X_n) = nσ²
- As n → ∞, both E[S_n] and var[S_n] → ∞.
- This limiting behavior is not a useful property.



Limiting Case II (the theoretical basis of opinion polling)
EX: The power consumption of a mobile phone is a RV X_i.

    True mean: μ = (1/N) · (X_1 + X_2 + ⋯ + X_N), if there are N mobile phones in total.

RVs X_1, X_2, …, X_n: the power-consumption samples for n mobile phones (n ≪ N).

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n   (the mean value of these n samples)

- As n → ∞, with S_n = X_1 + ⋯ + X_n, what is M_n = S_n/n?
- E[M_n] = E[S_n]/n = nμ/n = μ
- var[M_n] = var[S_n]/n² = nσ²/n² = σ²/n
- As n → ∞, E[M_n] = μ and var[M_n] → 0.

Physical meaning: if n → ∞, var[M_n] → 0, so M_n is very close to μ and does not vary significantly. Sample mean ≈ true mean μ.
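
To see the shrinking variance numerically, here is a minimal Python sketch (not part of the slides; it assumes NumPy and uses exponential(1) samples, an arbitrary choice giving μ = σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0  # exponential(1): mean 1, variance 1

for n in [10, 100, 1000, 10000]:
    # 2000 independent realizations of the sample mean M_n
    M = rng.exponential(scale=1.0, size=(2000, n)).mean(axis=1)
    print(f"n={n:5d}  E[Mn]~{M.mean():.3f}  var[Mn]~{M.var():.5f}  sigma^2/n={sigma2 / n:.5f}")
```

The empirical variance of M_n tracks σ²/n while the empirical mean stays at μ, exactly as derived above.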
Limiting Case III
- Define the standardized sum Z_n = (S_n − nμ)/(σ√n) (the same quantity used in Section 5.4).
- E[Z_n] = 0
- var[Z_n] = var[S_n]/(nσ²) = 1
- The mean and variance remain unchanged as n increases, for any distribution of the X_i.
- Can we find something from these properties?

- Limit theorems provide inference and statistics for large data sets (e.g., X_1, X_2, …, X_n, …).
- Even if the probability distribution is unknown, the limit theorems can provide useful information as n → ∞.
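
A companion sketch (same assumptions as the previous snippet: Python with NumPy, exponential(1) samples) checks that Z_n keeps mean 0 and variance 1 as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0  # exponential(1): mean 1, std 1

for n in [10, 100, 1000]:
    S = rng.exponential(scale=1.0, size=(20_000, n)).sum(axis=1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sum Z_n
    print(f"n={n:4d}  E[Zn]~{Z.mean():+.3f}  var[Zn]~{Z.var():.3f}")
```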
5.1 Markov and Chebyshev Inequalities
- These inequalities provide probabilistic bounds:
  - even though the distribution is unknown;
  - only the mean and variance need to be known.


Markov Inequality (a last resort when little is known)

- If a RV X can take only nonnegative values, then

    P(X ≥ a) ≤ E[X]/a,  for all a > 0.

- Only the mean E[X] needs to be known.

Proof:
Define Y_a = 0 if X < a, and Y_a = a if X ≥ a.
Then Y_a ≤ X always holds, so E[Y_a] ≤ E[X].

    E[Y_a] = 0 · P(Y_a = 0) + a · P(Y_a = a) = a · P(X ≥ a) ≤ E[X]

Therefore

    P(X ≥ a) ≤ E[X]/a.
Ex. 5.1: Markov Inequality
- X ~ U[0, 4].
- Use the Markov inequality to find upper bounds for P(X ≥ a), for a = 2, 3, and 4, and compare with the exact probabilities.
Sol:
- E[X] = 2, so the Markov bounds are

    P(X ≥ 2) ≤ 2/2 = 1,  P(X ≥ 3) ≤ 2/3,  P(X ≥ 4) ≤ 2/4 = 1/2.

- Exact values: P(X ≥ 2) = 1/2, P(X ≥ 3) = 1/4, P(X ≥ 4) = 0.
- The bound given by the Markov inequality is quite loose (only the mean is known).
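
A quick numerical check of how loose the bound is (a sketch, not from the slides; it assumes Python with NumPy and estimates the exact tails by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=1_000_000)  # X ~ U[0, 4], E[X] = 2

for a in [2, 3, 4]:
    bound = 2 / a                # Markov: P(X >= a) <= E[X]/a
    exact = (x >= a).mean()      # Monte Carlo estimate of P(X >= a)
    print(f"a={a}: Markov bound = {bound:.3f}, exact ~ {exact:.3f}")
```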
Chebyshev Inequality

- If X is a RV with mean μ and variance σ², then

    P(|X − μ| ≥ c) ≤ σ²/c²,  for all c > 0.

- This bounds the probability that X falls outside the range (μ − c, μ + c).
- Both the mean μ and the variance σ² must be known.

Proof:
Consider the nonnegative RV (X − μ)² and apply the Markov inequality with a = c²:

    P((X − μ)² ≥ c²) ≤ E[(X − μ)²]/c² = σ²/c².

Since (X − μ)² ≥ c² if and only if |X − μ| ≥ c,

    P(|X − μ| ≥ c) ≤ σ²/c².
Alternative Form of the Chebyshev Inequality

- Let c = kσ. Then

    P(|X − μ| ≥ kσ) ≤ σ²/(k²σ²) = 1/k².

- The probability that a RV takes a value more than k standard deviations away from its mean is at most 1/k².
- Tighter than the Markov inequality, since it also uses the information in the variance.
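
The following sketch (assuming the same U[0, 4] example as Ex. 5.1, so μ = 2 and σ² = 16/12; Python with NumPy) compares the 1/k² bound with the exact deviation probability:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=1_000_000)  # U[0, 4]: mu = 2, sigma^2 = 16/12
mu, sigma = 2.0, np.sqrt(16 / 12)

for k in [1.0, 1.5, 2.0]:
    bound = 1 / k**2                              # Chebyshev: <= 1/k^2
    exact = (np.abs(x - mu) >= k * sigma).mean()  # P(|X - mu| >= k sigma)
    print(f"k={k}: Chebyshev bound = {bound:.3f}, exact ~ {exact:.3f}")
```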



5.2 The Weak Law of Large Numbers
- Samples X_1, X_2, … are iid RVs with true mean μ.
- Sample mean: M_n = (X_1 + ⋯ + X_n)/n.
- For every ε > 0, we have

    lim_{n→∞} P(|M_n − μ| ≥ ε) = 0,

  where |M_n − μ| is |sample mean − true mean|.
- Physical sense: the sample mean M_n of a large number of iid RVs is very close to the true mean with high probability.
- The only necessary assumption is a well-defined mean E[X].
- The law remains valid even if the variance var(X) is infinite.


Proof

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n

- E[M_n] = μ
- var[M_n] = var[S_n]/n² = nσ²/n² = σ²/n

According to the Chebyshev inequality,

    P(|M_n − μ| ≥ ε) ≤ (σ²/n)/ε²,  for any ε > 0.

- For any ε > 0, this probability → 0 as n → ∞.
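
The proof can be replayed numerically. A minimal sketch (assuming Python with NumPy and U[0, 1] samples, so μ = 0.5 and σ² = 1/12):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, eps = 0.5, 1 / 12, 0.05  # X ~ U[0, 1]

for n in [100, 1000, 10000]:
    Mn = rng.uniform(0, 1, size=(5000, n)).mean(axis=1)
    p_dev = (np.abs(Mn - mu) >= eps).mean()   # empirical P(|Mn - mu| >= eps)
    bound = sigma2 / (n * eps**2)             # Chebyshev bound
    print(f"n={n:5d}  P(|Mn-mu|>=eps) ~ {p_dev:.4f}  bound = {bound:.4f}")
```

Both columns go to 0, with the empirical probability well below the Chebyshev bound, as expected from a loose inequality.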
Ex. 5.5: Polling
- Q: 1. How many samples are enough?
     2. What estimation error ε can we tolerate?
     3. What confidence level (a probability) do we require?
- EX: For estimation error ε = 0.1 and confidence level 95%:
      Pr{estimation error < ε = 0.1} ≥ 95% (at least 95%).
- p: the true fraction of all voters supporting a candidate.
- n voters are selected at random.
- M_n: the fraction of the n selected voters supporting the candidate. (We use the sample mean M_n to estimate p.)
- How large must n be to estimate the true mean p well?
[Sol] The answer of one voter is a Bernoulli RV with success probability p and variance σ² = p(1 − p).
Ex. 5.5: Polling (continued)
Note: max p(1 − p) = 1/4, attained at p = 1/2 (Ex. 5.3).

- By the Chebyshev inequality (a loose bound),

    P(|M_n − p| ≥ ε) ≤ var(M_n)/ε² = p(1 − p)/(nε²) ≤ 1/(4nε²),  for any ε > 0.

- If ε = 0.1 and n = 100,

    P(|M_100 − p| ≥ 0.1) ≤ 1/(4 · 100 · (0.1)²) = 0.25.

- With 100 samples: Pr{estimation error ≥ ε = 0.1} ≤ 25%,
  so Pr{estimation error < ε = 0.1} ≥ 75% (confidence level = 75%).
- Now raise the confidence level to at least 95% and tighten the accuracy level ε to 0.01:

    P(|M_n − p| ≥ 0.01) ≤ 1/(4n(0.01)²) ≤ (1 − 0.95) = 0.05.

- According to this inequality, n ≥ 50000 voters are needed to achieve the required accuracy.
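
The sample-size rule 1/(4nε²) ≤ δ is easy to wrap in a small helper. A sketch (the function name is mine, not from the slides):

```python
import math

def chebyshev_poll_size(eps: float, delta: float) -> int:
    # From P(|Mn - p| >= eps) <= 1/(4 n eps^2) <= delta, solve for n.
    return math.ceil(1 / (4 * delta * eps**2))

print(chebyshev_poll_size(0.1, 0.25))    # 100   (75% confidence, eps = 0.1)
print(chebyshev_poll_size(0.01, 0.05))   # 50000 (95% confidence, eps = 0.01)
```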
5.3 Convergence in Probability

- Weak law of large numbers: the sample mean M_n converges to the true mean μ.
- But what does "convergence" mean?
  - EX: 1/n. As n → ∞, 1/n converges to 0.
- However, M_1, …, M_n are a sequence of RVs (since X_1, …, X_n are RVs), so the RV M_n may vary from realization to realization.

    Sample mean: M_n = (X_1 + X_2 + ⋯ + X_n)/n

- We therefore have to define "convergence" for RVs: convergence in probability.
Convergence of a Deterministic Sequence
- Let a_1, a_2, … be a sequence of real numbers, and let a be another real number.
- The sequence a_1, a_2, … converges to a, written lim_{n→∞} a_n = a,
- (Condition:) if for every ε > 0 there exists some n_0 such that

    |a_n − a| ≤ ε,  for all n ≥ n_0.

  (For every ε we can find a threshold n_0 such that for all n ≥ n_0, a_n differs from a by at most ε.)

- EX: a_n = 1/n; as n → ∞, a = 0.
  - For ε = 0.01, n_0 = 100.
  - For ε = 0.001, n_0 = 1000.
Convergence in Probability

- A sequence of RVs Y_1, Y_2, …, Y_n converges to a real number a in probability,
- (Condition:) if for every ε > 0 we have

    lim_{n→∞} P(|Y_n − a| ≥ ε) = 0,

  or equivalently: for every ε > 0 and δ > 0 there is a threshold n_0 such that for all n ≥ n_0, the probability that Y_n differs from a by more than ε is smaller than δ.
- The distribution of Y_n becomes concentrated around a.
- ε: accuracy level.
- 1 − δ: confidence level.
Ex. 5.7: Convergence in Probability

- Y ~ exponential distribution (λ = 1).
- Let Y_n = Y/n.
- Does the sequence Y_n converge to zero in probability?
Sol:

    P(|Y_n − 0| ≥ ε) = P(Y/n ≥ ε) = P(Y ≥ nε) = e^{−nε}   (the tail 1 − CDF of the exponential)

    lim_{n→∞} P(|Y_n − 0| ≥ ε) = lim_{n→∞} e^{−nε} = 0
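
A simulation of this example (a sketch assuming Python with NumPy; 200000 draws of Y):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200_000)  # Y ~ exp(lambda = 1)
eps = 0.1

for n in [1, 10, 50, 100]:
    p = (y / n >= eps).mean()  # empirical P(|Yn - 0| >= eps)
    print(f"n={n:3d}  P(Yn >= eps) ~ {p:.5f}  e^(-n eps) = {np.exp(-n * eps):.5f}")
```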



5.5 The Strong Law of Large Numbers
- Similar to the weak law: it concerns the convergence of the sample mean to the true mean.
- Let X_1, X_2, … be a sequence of iid RVs with true mean μ.
- Then the sequence of sample means M_n = (X_1 + ⋯ + X_n)/n converges to the true mean μ with probability 1, in the sense that

    P( lim_{n→∞} M_n = μ ) = 1.

  (Pr{sample mean = true mean} = 1; the sample mean → the true mean.)
- Compare the weak law of large numbers: Pr{deviation greater than an error ε} → 0.
Convergence with Probability 1

- Let Y_1, Y_2, … be a sequence of RVs (not necessarily independent) associated with the same probabilistic model.
- Let c be a real number.
- We say that Y_n converges to c with probability 1 (or almost surely) if

    P( lim_{n→∞} Y_n = c ) = 1.


Example: Convergence with Probability 1
- X_1, X_2, … ~ iid U[0, 1].
- Y_n = min{X_1, …, X_n}.
- Show that Y_n converges to 0 with probability 1.
Proof:
- For any positive ε,

    P(Y_n ≥ ε) = P(min(X_1, …, X_n) ≥ ε) = P(X_1 ≥ ε, …, X_n ≥ ε)
               = P(X_1 ≥ ε) ⋯ P(X_n ≥ ε) = (1 − ε)^n.

- This holds for every ε > 0, and

    lim_{n→∞} P(Y_n ≥ ε) = lim_{n→∞} (1 − ε)^n = 0.

- Since Y_n takes values in [0, 1] and is nonincreasing in n, its limit exists and must equal 0 with probability 1:

    P( lim_{n→∞} Y_n = 0 ) = 1.
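
The "with probability 1" claim is about individual sample paths, which a simulation makes vivid. A minimal sketch (assuming Python with NumPy), tracking Y_n = min(X_1, …, X_n) along 20 independent paths:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(20, 10_000))  # 20 paths of X_1, ..., X_10000

# Running minimum along each path: Y_n = min(X_1, ..., X_n)
Yn = np.minimum.accumulate(x, axis=1)

# Every single path is driven toward 0, not just "most" of them:
print("largest Y_10000 over all paths:", Yn[:, -1].max())  # ~1e-4
```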
5.4 The Central Limit Theorem

Observation:
- Let X_1, X_2, … be a sequence of iid RVs with common mean μ and variance σ².
- Define

    Z_n = (S_n − nμ)/(σ√n),  where S_n = X_1 + ⋯ + X_n.

- Note: Z_n has the same mean (0) and variance (1) for any distribution of the X_i and any n.
- Can Z_n be approximated by a particular distribution at a large n?
The Central Limit Theorem

- Let X_1, X_2, … be a sequence of iid RVs with common mean μ and variance σ².
- Define

    Z_n = (S_n − nμ)/(σ√n).

- Then the CDF of Z_n converges to the standard normal CDF

    Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx,

  in the sense that

    lim_{n→∞} P(Z_n ≤ z) = Φ(z),  for every z.

- Notes:
  - Holds for any kind of RV, at a large n;
  - the RVs must be iid;
  - only a finite mean and variance are needed;
  - used to find the (approximate) distribution of large data sets.
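
The convergence of CDFs can be observed directly. A sketch (assuming Python with NumPy and SciPy; U[0, 1] summands are an arbitrary non-normal choice):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 100, 50_000
mu, sigma = 0.5, np.sqrt(1 / 12)  # X ~ U[0, 1]

S = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sums Z_n

for z in [-1.0, 0.0, 1.0, 1.92]:
    print(f"z={z:+.2f}  P(Zn<=z) ~ {(Z <= z).mean():.4f}  Phi(z) = {norm.cdf(z):.4f}")
```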
Normal Approximation Based on the Central Limit Theorem
- S_n = X_1 + ⋯ + X_n, where the X_i are iid RVs with mean μ and variance σ².
- If n is large, the CDF P(S_n ≤ c) can be approximated by treating S_n as if it were normal.
1) Calculate the mean μ_S and the variance σ_S² of S_n:

    E[S_n] = μ_S = nμ,  var(S_n) = σ_S² = nσ².

2) Standardize (normal → standard normal, for table look-up):

    P(S_n ≤ c) = P( (S_n − μ_S)/σ_S ≤ (c − μ_S)/σ_S = z ) ≈ Φ(z).

- The CDF Φ(z) is available from standard normal CDF tables.
Example: Normal Approximation Based on the Central Limit Theorem
- Load a plane with 100 packages.
- Each package's weight ~ U[5, 50] (pounds).
- P(total weight > 3000 pounds) = ?
Sol:
- Mean and variance of a one-package weight (uniform distribution):

    μ = (5 + 50)/2 = 27.5,  σ² = (50 − 5)²/12 = 168.75.

- Total weight: S_100, with

    E[S_100] = 100μ = 2750,  var(S_100) = 100σ² = 16875,  σ_S = 10σ ≈ 129.9.

- Then

    P(S_100 ≤ 3000) ≈ Φ( (3000 − 2750)/129.9 ) = Φ(1.92),
    P(S_100 > 3000) = 1 − P(S_100 ≤ 3000) ≈ 1 − Φ(1.92) = 0.0274.
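
The same computation in a few lines (a sketch assuming Python with SciPy, whose norm.cdf replaces the table look-up):

```python
from math import sqrt
from scipy.stats import norm

n = 100
mu = (5 + 50) / 2            # mean of U[5, 50]: 27.5
var = (50 - 5) ** 2 / 12     # variance of U[5, 50]: 168.75

z = (3000 - n * mu) / sqrt(n * var)
print(f"z = {z:.2f}, P(S100 > 3000) ~ {1 - norm.cdf(z):.4f}")  # z ~ 1.92, tail ~ 0.027
```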
De Moivre-Laplace Approximation to the Binomial
- S_n ~ binomial(n, p). (A binomial RV is the sum of n Bernoulli RVs.)
- The central limit approximation treats S_n as normal with mean np and variance np(1 − p):

    P(k ≤ S_n ≤ l) = P( (k − np)/√(np(1−p)) ≤ (S_n − np)/√(np(1−p)) ≤ (l − np)/√(np(1−p)) )
                   ≈ Φ( (l − np)/√(np(1−p)) ) − Φ( (k − np)/√(np(1−p)) ).

- If l = k, this central limit approximation gives P(S_n = l) ≈ 0, which is useless.
- De Moivre-Laplace approximation (an improved version):
  - n is large;
  - k and l are nonnegative integers; then

    P(k ≤ S_n ≤ l) ≈ Φ( (l + 1/2 − np)/√(np(1−p)) ) − Φ( (k − 1/2 − np)/√(np(1−p)) ).
Proof
- Binomial = sum of Bernoulli RVs: S_n = X_1 + ⋯ + X_n.
- E[X_i] = p, var(X_i) = p(1 − p) (Bernoulli).
- E[S_n] = np, var(S_n) = np(1 − p) (binomial).
- By the central limit theorem, binomial ≈ normal:

    P(S_n ≤ c) ≈ Φ( (c − np)/√(np(1−p)) ),

  where Φ is the CDF of the standard normal.
- If k = l, this gives P(S_n = k) ≈ 0, which is useless. (✗)
- De Moivre-Laplace approximation: use the half-unit correction

    P(S_n = k) ≈ P(k − 1/2 ≤ S_n ≤ k + 1/2).


Example: De Moivre-Laplace Approximation to the Binomial
- S_n ~ binomial(n = 36, p = 0.5), so np = 18 and √(np(1−p)) = 3.
- P(S_n ≤ 21) = ? Compare with the exact value.
Sol:
- Exact value (hard to compute by hand):

    P(S_n ≤ 21) = Σ_{k=0}^{21} C(36, k) (0.5)^36 ≈ 0.8785.

- Central limit approximation:

    P(S_n ≤ 21) ≈ Φ( (21 − np)/√(np(1−p)) ) = Φ( (21 − 18)/3 ) = Φ(1) = 0.8413.

- De Moivre-Laplace approximation (much closer):

    P(S_n ≤ 21) ≈ Φ( (21 + 0.5 − np)/√(np(1−p)) ) = Φ( (21.5 − 18)/3 ) = Φ(1.17) = 0.879.

- Approximation for a single value, P(S_n = 19) = ?

    P(S_n = 19) ≈ Φ( (19.5 − 18)/3 ) − Φ( (18.5 − 18)/3 ) = 0.6915 − 0.5675 = 0.124.

  Exact: C(36, 19) (0.5)^36 ≈ 0.1251 (very close).
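
All of these values can be checked in a few lines (a sketch assuming Python with SciPy; binom gives the exact values, norm the approximations):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 36, 0.5
m, s = n * p, sqrt(n * p * (1 - p))  # 18 and 3

print("exact  P(Sn<=21):", binom.cdf(21, n, p))          # ~0.8785
print("CLT    P(Sn<=21):", norm.cdf((21 - m) / s))       # ~0.8413
print("DM-L   P(Sn<=21):", norm.cdf((21.5 - m) / s))     # ~0.879
print("DM-L   P(Sn=19): ", norm.cdf((19.5 - m) / s) - norm.cdf((18.5 - m) / s))
print("exact  P(Sn=19): ", binom.pmf(19, n, p))          # ~0.1251
```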
Excerpted and adapted from: "Poll samples: the more the better?"
[聯合報 (United Daily News) / reporters 李名揚 and 郭錦萍]

- In theory, if a question offers only two possible answers (such as yes or no), then regardless of the size of the total population, once the number of valid responses (the sample size) reaches 1068, the poll's margin of error is ±3 percentage points and the confidence level is 95%.
- "A margin of error of ±3 percentage points" means that if an election poll shows candidate A with a support rating of 42%, his true support lies between 39% and 45%. Basically, one candidate can be said to lead only when the support ratings differ by more than 6 percentage points.
- "A confidence level of 95%" means that out of every 100 polls conducted this way, on average 95 can be trusted; there is still a 5% chance that the true value falls outside the poll's margin of error.

(ε = 3%, δ = 5%, at n = 1068)
    M_n = (X_1 + X_2 + ⋯ + X_n)/n = S_n/n
    E[M_n] = E[S_n]/n = p and var(M_n) = var(S_n)/n² = p(1 − p)/n.

According to the central limit theorem, we assume that M_n can be approximated as a normal random variable:

    P(|M_n − p| ≥ ε) = P( |M_n − p|/√var(M_n) ≥ ε/√(p(1−p)/n) )
                     ≤ P( |M_n − p|/√var(M_n) ≥ 2ε√n )
                     ≈ 2( 1 − Φ(2ε√n) ) = 2 − 2Φ(2ε√n),

where p(1 − p) ≤ 1/4, so that ε/√(p(1−p)/n) ≥ ε/√(1/(4n)) = 2ε√n.

For the confidence level to be at least 95% with accuracy level ε = 3%:

    2 − 2Φ(2ε√n) = 2 − 2Φ(0.06√n) ≤ 0.05  ⇒  Φ(0.06√n) ≥ 0.975
    ⇒  1.96 ≤ 0.06√n  ⇒  n ≥ (1.96/0.06)² ≈ 1067.1.

Hence n = 1068 samples suffice, matching the newspaper's figure.
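
The last step inverts Φ, which is exactly what a normal quantile function does. A sketch (assuming Python with SciPy; norm.ppf is the inverse CDF):

```python
from math import ceil
from scipy.stats import norm

eps, delta = 0.03, 0.05
z = norm.ppf(1 - delta / 2)        # 1.96 for a 95% confidence level
n = (z / (2 * eps)) ** 2           # from 2 * eps * sqrt(n) >= z, with p(1-p) <= 1/4
print(z, n, ceil(n))               # 1.96, ~1067.1, 1068
```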

