Limit Theorems
Prof. Jane-Hwa Huang
Limiting Case I
Sn = X1 + … + Xn. (Sum of iid RVs)
n → ∞, Sn = ?
E[Sn] = E[X1] + … + E[Xn] = nμ
var[Sn] = var(X1) + … + var(Xn) = nσ²
n → ∞, E[Sn] → ∞ and var[Sn] → ∞
True mean: μ = (Σ_{i=1}^{N} Xi)/N, if there are N mobile phones in total.
RVs X1, X2, …, Xn: the samples of power consumption for n mobile phones (n ≤ N).
Sample mean: Mn = (X1 + X2 + … + Xn)/n = Sn/n (the mean value of these n samples)
n → ∞, Sn = X1 + … + Xn, then Mn = Sn/n = ?
E[Mn] = E[Sn]/n = nμ/n = μ
var[Mn] = var[Sn]/n² = nσ²/n² = σ²/n
n → ∞, E[Mn] = μ and var[Mn] → 0
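The shrinking variance of the sample mean can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes Xi ~ U[0, 4] purely for illustration (so μ = 2 and σ² = 16/12), and the function name `sample_mean_stats` is hypothetical.

```python
import random
import statistics

def sample_mean_stats(n, trials=2000, seed=0):
    """Empirical mean and variance of Mn = Sn / n for Xi ~ U[0, 4] (assumed)."""
    rng = random.Random(seed)
    means = [sum(rng.uniform(0, 4) for _ in range(n)) / n for _ in range(trials)]
    return statistics.fmean(means), statistics.pvariance(means)

SIGMA2 = 4 ** 2 / 12  # variance of U[0, 4] is (b - a)^2 / 12

for n in (1, 10, 100):
    m, v = sample_mean_stats(n)
    # E[Mn] should stay near mu = 2 while var[Mn] tracks sigma^2 / n.
    print(f"n={n:4d}  E[Mn]~{m:.3f}  var[Mn]~{v:.4f}  sigma^2/n={SIGMA2 / n:.4f}")
```

The empirical variance should fall roughly by a factor of 10 each time n grows by a factor of 10, while the empirical mean stays close to 2.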
For the normalized sum Zn = (Sn − nμ)/(σ√n):
E[Zn] = 0
var[Zn] = var[Sn]/(nσ²) = 1
Mean and variance remain unchanged as n increases,
so the distribution neither spreads out nor shrinks to a point.
X ~ U[0, 4].
Use the Markov inequality to find upper
bounds for P(X ≥ n), for n = 2, 3, and 4.
Compare with exact probabilities.
Sol:
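E[X] = 2, so the Markov inequality gives P(X ≥ n) ≤ E[X]/n = 2/n.
n = 2: bound 2/2 = 1, exact P(X ≥ 2) = 0.5
n = 3: bound 2/3 ≈ 0.67, exact P(X ≥ 3) = 0.25
n = 4: bound 2/4 = 0.5, exact P(X ≥ 4) = 0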
The bound given by the Markov inequality is quite
loose. (Only the mean is known.)
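The comparison for X ~ U[0, 4] can be reproduced with a few lines of code. This is a small sketch; the helper names `markov_bound` and `exact_tail_uniform` are hypothetical and not from the slides.

```python
def markov_bound(mean, a):
    """Markov inequality: P(X >= a) <= E[X] / a for nonnegative X and a > 0."""
    return min(1.0, mean / a)

def exact_tail_uniform(a, lo=0.0, hi=4.0):
    """Exact P(X >= a) for X ~ U[lo, hi]."""
    if a <= lo:
        return 1.0
    if a >= hi:
        return 0.0
    return (hi - a) / (hi - lo)

for a in (2, 3, 4):
    # The bound uses only E[X] = 2; the exact value uses the full distribution.
    print(f"a={a}  Markov bound={markov_bound(2.0, a):.3f}  exact={exact_tail_uniform(a):.3f}")
```

For every threshold the bound is at least twice the exact probability, which illustrates how little the mean alone constrains the tail.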
Chebyshev Inequality
Let c = kσ.
P(|X − μ| ≥ kσ) ≤ σ²/(k²σ²) = 1/k²
The probability of an RV taking a value more than
kσ away from the mean is at most 1/k².
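To get a feel for how conservative 1/k² is, the bound can be compared with an exact two-sided tail. The sketch below assumes a normal RV for the exact value; that choice is an illustration only and is not taken from the slides. The function names are hypothetical.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Chebyshev inequality: P(|X - mu| >= k*sigma) <= 1 / k^2."""
    return min(1.0, 1.0 / k ** 2)

def normal_two_sided_tail(k):
    """Exact P(|X - mu| >= k*sigma) when X is normal (assumed for illustration)."""
    return 2.0 * (1.0 - NormalDist().cdf(k))

for k in (1, 2, 3):
    print(f"k={k}  Chebyshev bound={chebyshev_bound(k):.4f}  normal exact={normal_two_sided_tail(k):.4f}")
```

The bound holds for any distribution with finite variance, which is why it is much larger than the exact tail of one specific distribution.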
Sample mean: Mn = (X1 + X2 + … + Xn)/n = Sn/n
E[Mn] = μ
var[Mn] = var[Sn]/n² = nσ²/n² = σ²/n
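If ε = 0.1 and n = 100, P(|M100 − p| ≥ 0.1) ≤ 1/(4 · 100 · (0.1)²) = 0.25
(Chebyshev inequality with var[Mn] = p(1−p)/n ≤ 1/(4n), where the Xi are Bernoulli with mean p.)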
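The same bound can be evaluated for other sample sizes. A minimal sketch, assuming Bernoulli samples with unknown p so that var[Mn] ≤ 1/(4n); the name `chebyshev_poll_bound` is hypothetical.

```python
def chebyshev_poll_bound(n, eps):
    """Chebyshev bound on P(|Mn - p| >= eps), using var[Mn] = p(1-p)/n <= 1/(4n)."""
    return min(1.0, 1.0 / (4.0 * n * eps ** 2))

for n in (100, 1000, 10000):
    # The bound decreases like 1/n for a fixed accuracy eps.
    print(f"n={n:6d}  bound on P(|Mn - p| >= 0.1) = {chebyshev_poll_bound(n, 0.1):.4f}")
```

Because the bound never uses the actual value of p, it is valid for every p in [0, 1], at the price of being loose.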
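P(|Yn − 0| ≥ ε) = P(Yn ≥ ε) = P(Y/n ≥ ε) = P(Y ≥ nε) = e^(−nε)   (1 − CDF of Y)
lim_{n→∞} P(|Yn − 0| ≥ ε) = lim_{n→∞} e^(−nε) = 0
(The tail e^(−y) is that of an exponential Y with rate 1.)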
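The limit can be illustrated by simulation. This sketch assumes Y is exponential with rate 1 and Yn = Y/n, which is the reading under which the tail equals e^(−nε); both function names are hypothetical.

```python
import math
import random

def tail_prob_exact(n, eps):
    """P(Yn >= eps) = P(Y >= n*eps) = exp(-n*eps) for Y ~ Exp(1) (assumed)."""
    return math.exp(-n * eps)

def tail_prob_simulated(n, eps, trials=50_000, seed=0):
    """Monte Carlo estimate of P(Y/n >= eps) with Y ~ Exp(1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.expovariate(1.0) / n >= eps)
    return hits / trials

for n in (1, 2, 5, 10):
    print(f"n={n:2d}  exact={tail_prob_exact(n, 0.5):.4f}  simulated={tail_prob_simulated(n, 0.5):.4f}")
```

Both columns should shrink toward 0 as n grows, which is what convergence of Yn to 0 in probability means.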
Observation:
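Let X1, X2, … be a sequence of iid RVs with
common mean μ and variance σ².
Define Zn = (X1 + … + Xn − nμ)/(σ√n).
Then the CDF of Zn converges to the standard normal CDF Φ(z) as n → ∞ (central limit theorem).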
Note:
Holds for any distribution of the Xi when n is large.
The Xi must be iid RVs.
Only finite mean and variance are needed.
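A quick simulation shows the normalized sum behaving like a standard normal even when the Xi are far from normal. This is a sketch under assumptions not in the slides: Xi ~ Exp(1) (so μ = σ = 1), n = 50, and hypothetical function names.

```python
import math
import random
from statistics import NormalDist

def standardized_sum(n, rng, mu=1.0, sigma=1.0):
    """One draw of Zn = (Sn - n*mu) / (sigma*sqrt(n)) with Xi ~ Exp(1) (assumed)."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def empirical_cdf_of_zn(n, z, trials=5000, seed=0):
    """Fraction of simulated Zn values that are <= z."""
    rng = random.Random(seed)
    return sum(standardized_sum(n, rng) <= z for _ in range(trials)) / trials

for z in (-1.0, 0.0, 1.0):
    print(f"z={z:+.1f}  empirical P(Zn<=z)={empirical_cdf_of_zn(50, z):.3f}  Phi(z)={NormalDist().cdf(z):.3f}")
```

The exponential distribution is strongly skewed, so small differences near z = 0 are expected at n = 50; they shrink as n increases.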
Used to find the distribution of large data sets.
Normal Approximation Based on the
Central Limit theorem
Sn = X1 + … + Xn, Xi are iid RVs with mean μ
and variance σ².
If n is large, CDF P(Sn≤c) can be approximated
by treating Sn as if it were normal.
1) Calculate the mean μS and the variance σS² of Sn.
E[Sn] = μS = nμ, var(Sn) = σS² = nσ²
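The procedure can be sketched in code. Only step 1 appears above; the remaining steps used here (standardize c as z = (c − μS)/σS, then read off Φ(z)) are the usual continuation and are an assumption about what the slides intended. The numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def normal_approx_cdf(c, n, mu, sigma2):
    """Approximate P(Sn <= c) by treating Sn as normal."""
    mu_s = n * mu                    # step 1: mean of Sn
    sigma_s = math.sqrt(n * sigma2)  # step 1: standard deviation of Sn
    z = (c - mu_s) / sigma_s         # assumed step 2: standardize c
    return NormalDist().cdf(z)       # assumed step 3: look up Phi(z)

# Hypothetical example: n = 100 iid U[0, 4] samples (mu = 2, sigma^2 = 4/3).
print(f"P(S100 <= 210) ~ {normal_approx_cdf(210, 100, 2.0, 4.0 / 3.0):.4f}")
```

Only μ and σ² of a single Xi are needed; the shape of the Xi distribution does not enter the approximation.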
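P(k ≤ Sn ≤ l) ≈ Φ((l − np)/√(np(1−p))) − Φ((k − np)/√(np(1−p)))
With the 1/2 correction: P(k ≤ Sn ≤ l) ≈ Φ((l + 1/2 − np)/√(np(1−p))) − Φ((k − 1/2 − np)/√(np(1−p)))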
Proof
Binomial = Sum of Bernoulli RVs
Sn= X1+…+ Xn
E[Xi]= p, var(Xi)=p(1-p) (Bernoulli)
E[Sn]= np, var(Sn)= np(1-p) (Binomial)
By the central limit theorem, binomial → normal for large n.
If k = l, the formula without the 1/2 correction gives P(Sn = k) = 0, which is wrong.
De Moivre-Laplace approximation
Result: the approximation is very close to the exact value.
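The effect of the 1/2 correction can be checked against the exact binomial probability. A minimal sketch with hypothetical values n = 36 and p = 0.5 (the numbers of the original example did not survive extraction); the function names are also hypothetical.

```python
import math
from statistics import NormalDist

def binom_exact(n, p, k, l):
    """Exact P(k <= Sn <= l) for Sn ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, l + 1))

def de_moivre_laplace(n, p, k, l, correction=True):
    """Normal approximation of P(k <= Sn <= l), optionally with the 1/2 correction."""
    shift = 0.5 if correction else 0.0
    sd = math.sqrt(n * p * (1 - p))
    phi = NormalDist().cdf
    return phi((l + shift - n * p) / sd) - phi((k - shift - n * p) / sd)

n, p = 36, 0.5  # hypothetical example values
print(f"P(Sn = 19): exact={binom_exact(n, p, 19, 19):.4f}  "
      f"corrected={de_moivre_laplace(n, p, 19, 19):.4f}  "
      f"uncorrected={de_moivre_laplace(n, p, 19, 19, correction=False):.4f}")
```

Without the correction a single-point event gets probability 0, while the corrected formula assigns it the normal mass of the interval [k − 1/2, k + 1/2].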
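Excerpted and adapted from: "Poll samples: the more the better?"
[聯合報 (United Daily News) / reporters 李名揚, 郭錦萍]
In theory, if a question has only two possible answers (such as yes or no), then
regardless of the total population size, once the number of valid responses (the
sample size) reaches 1068, the poll's "margin of error" is plus or minus 3
percentage points and the "confidence level" is 95%.
"A margin of error of plus or minus 3 percentage points" means that if an election poll
shows candidate A with 42% support, his true support lies between
39% and 45%. Basically, only when support differs by 6 percentage points or more
can one candidate be said to lead.
"A 95% confidence level" means that, of every 100 polls conducted
this way, on average 95 can be trusted. There is still a 5%
chance, however, that the result falls outside the poll's margin of error.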
Accuracy 3%, error probability 5% at n = 1068
Mn = (X1 + X2 + … + Xn)/n = Sn/n
E[Mn] = E[Sn]/n = p and var(Mn) = var(Sn)/n² = p(1−p)/n
According to the central limit theorem, we assume that
M n can be approximated as a normal random variable.
P(|Mn − p| ≥ ε) = P( |Mn − p|/√var(Mn) ≥ ε/√(p(1−p)/n) )
                ≤ P( |Mn − p|/√var(Mn) ≥ ε/√(1/(4n)) )
                ≈ 2[1 − Φ(ε/√(1/(4n)))]
                = 2 − 2Φ(2ε√n)
where p(1−p) ≤ 1/4.
For the confidence level to be at least 95% and the accuracy level 3% (ε = 0.03),
2 − 2Φ(2ε√n) = 2 − 2Φ(0.06√n) ≤ 0.05, so Φ(0.06√n) ≥ 0.975
0.06√n ≥ 1.96, so n ≥ 1067.1
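The sample-size calculation generalizes to other accuracy and confidence levels. The sketch below solves 2 − 2Φ(2ε√n) ≤ 1 − confidence for n; the function name `poll_sample_size` is hypothetical, and the result relies on the same normal approximation and worst case p(1−p) ≤ 1/4 as the derivation.

```python
import math
from statistics import NormalDist

def poll_sample_size(eps, confidence):
    """Smallest real n with 2 - 2*Phi(2*eps*sqrt(n)) <= 1 - confidence."""
    delta = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # about 1.96 for 95% confidence
    return (z / (2.0 * eps)) ** 2

n_real = poll_sample_size(0.03, 0.95)
print(f"n >= {n_real:.1f}, so at least {math.ceil(n_real)} valid responses")
```

Rounding up gives the 1068 respondents quoted in the news excerpt. Halving the margin of error to 1.5% roughly quadruples the required sample size, since n scales with 1/ε².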