$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
has mean E Zn = 0 and variance Var(Zn ) = 1. The central limit theorem states that the CDF of Zn converges to the standard normal CDF.
Let X1, X2, ..., Xn be i.i.d. random variables with expected value E[Xi] = μ < ∞ and variance 0 < Var(Xi) = σ² < ∞. Then, the random variable
$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
converges in distribution to the standard normal random variable as n goes to infinity, that is,
$$\lim_{n \to \infty} P(Z_n \le x) = \Phi(x), \qquad \text{for all } x \in \mathbb{R},$$
where Φ(x) is the standard normal CDF.
An interesting thing about the CLT is that it does not matter what the distribution of the Xi's is. The Xi's can be discrete, continuous, or mixed random variables. To get a feeling for the CLT, let us look at some examples. Let's assume that the Xi's are Bernoulli(p). Then E[Xi] = p and Var(Xi) = p(1 − p). Also, Yn = X1 + X2 + ⋯ + Xn has a Binomial(n, p) distribution. Thus,
$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}},$$
where Yn ∼ Binomial(n, p) . Figure 7.1 shows the PMF of Zn for different values of n. As you see, the shape of the PMF gets closer to a normal PDF curve as n
increases. Here, Zn is a discrete random variable, so mathematically speaking it has a PMF not a PDF. That is why the CLT states that the CDF (not the PDF) of Zn
converges to the standard normal CDF. Nevertheless, since PMF and PDF are conceptually similar, the figure is useful in visualizing the convergence to normal
distribution.
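To see this convergence concretely, one can simulate the normalized Bernoulli sum. The sketch below uses only the Python standard library; the choice p = 0.3, the sample sizes, and the trial count are arbitrary illustrations, not from the text. It compares the empirical value of P(Zn ≤ 1) with Φ(1) ≈ 0.8413:

```python
import math
import random

def phi(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_zn(n, p, trials=20000, seed=0):
    """Draw samples of Z_n = (Y_n - n*p) / sqrt(n*p*(1-p)),
    where Y_n is a sum of n Bernoulli(p) random variables."""
    rng = random.Random(seed)
    scale = math.sqrt(n * p * (1 - p))
    samples = []
    for _ in range(trials):
        y = sum(1 for _ in range(n) if rng.random() < p)  # Y_n for one trial
        samples.append((y - n * p) / scale)
    return samples

# The empirical P(Z_n <= 1) should approach Phi(1) ~ 0.8413 as n grows.
for n in (5, 30, 100):
    zs = simulate_zn(n, p=0.3)
    frac = sum(z <= 1.0 for z in zs) / len(zs)
    print(n, round(frac, 4))
```

Because Zn is discrete, the empirical CDF converges to Φ but never matches it exactly for finite n, which mirrors the remark above about PMFs versus PDFs.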
Fig. 7.1 - Zn is the normalized sum of n independent Bernoulli(p) random variables. The shape of its PMF, PZn(z), resembles the normal curve as n increases.
As another example, let's assume that the Xi's are Uniform(0, 1). Then E[Xi] = 1/2 and Var(Xi) = 1/12. In this case,
$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - \frac{n}{2}}{\sqrt{n/12}}.$$
Figure 7.2 shows the PDF of Zn for different values of n. As you see, the shape of the PDF gets closer to the normal PDF as n increases.
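A quick simulation along the same lines suggests how fast this convergence is for uniform summands. This is a sketch using only the standard library; the choice n = 12 and the trial count are arbitrary illustrations:

```python
import math
import random

def zn_uniform(n, trials=10000, seed=1):
    """Samples of Z_n = (X_1 + ... + X_n - n/2) / sqrt(n/12)
    for i.i.d. X_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    scale = math.sqrt(n / 12.0)
    return [(sum(rng.random() for _ in range(n)) - n / 2.0) / scale
            for _ in range(trials)]

# The sample mean and standard deviation of Z_n should be near 0 and 1.
zs = zn_uniform(n=12)
mean = sum(zs) / len(zs)
std = math.sqrt(sum((z - mean) ** 2 for z in zs) / len(zs))
print(round(mean, 2), round(std, 2))
```

In fact, the sum of 12 Uniform(0, 1) variables minus 6 is a classical quick-and-dirty normal generator, precisely because the CLT kicks in so quickly here.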
By the CLT, (Y − nμ)/(√n σ) is approximately standard normal, so we can write
$$P(90 < Y \le 110) \approx \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 0.8427.$$
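The final normal-CDF difference is easy to verify numerically. A minimal check using only the standard library, with Φ written in terms of `math.erf`:

```python
import math

def phi(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = phi(math.sqrt(2)) - phi(-math.sqrt(2))
print(round(p, 4))  # 0.8427
```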
Example 7.2
In a communication system each data packet consists of 1000 bits. Due to noise, each bit may be received in error with probability 0.1. It is assumed that bit errors occur independently. Find the probability that there are more than 120 errors in a given data packet.
Solution
Let us define Xi as the indicator random variable for the ith bit in the packet. That is, Xi = 1 if the ith bit is received in error, and Xi = 0 otherwise.
Then the Xi 's are i.i.d. and Xi ∼ Bernoulli(p = 0.1). If Y is the total number of bit errors in the packet, we have
$$Y = X_1 + X_2 + \cdots + X_n,$$
where n = 1000. Then
$$E[X_i] = \mu = p = 0.1, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = 0.09.$$
$$
\begin{aligned}
P(Y > 120) &= P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - n\mu}{\sqrt{n}\,\sigma}\right) \\
&= P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - 100}{\sqrt{90}}\right) \\
&\approx 1 - \Phi\left(\frac{20}{\sqrt{90}}\right) \\
&= 0.0175
\end{aligned}
$$
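As a sanity check, the CLT approximation above can be computed alongside the exact binomial tail. A sketch using only the standard library (`math.comb` does the exact counting):

```python
import math

n, p = 1000, 0.1
var = p * (1 - p)  # per-bit variance sigma^2 = 0.09

def phi(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# CLT approximation: P(Y > 120) ~ 1 - Phi((120 - n*mu) / (sqrt(n)*sigma))
approx = 1.0 - phi((120 - n * p) / math.sqrt(n * var))

# Exact tail of Binomial(1000, 0.1) for comparison
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(121, n + 1))

print(round(approx, 4), round(exact, 4))
```

The two values are close but not identical; the gap shrinks further with the continuity correction discussed below, since Y is integer-valued.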
Continuity Correction:
Let us assume that Y ∼ Binomial(n = 20, p = 1/2), and suppose that we are interested in P(8 ≤ Y ≤ 10). We know that a Binomial(n = 20, p = 1/2) random variable can be written as the sum of n i.i.d. Bernoulli(p) random variables:
$$Y = X_1 + X_2 + \cdots + X_n.$$
Since Xi ∼ Bernoulli(p = 1/2), we have
$$E[X_i] = \mu = p = \frac{1}{2}, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = \frac{1}{4}.$$
$$
\begin{aligned}
P(8 \le Y \le 10) &= P\left(\frac{8 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10 - n\mu}{\sqrt{n}\,\sigma}\right) \\
&= P\left(\frac{8 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10 - 10}{\sqrt{5}}\right) \\
&\approx \Phi(0) - \Phi\left(\frac{-2}{\sqrt{5}}\right) \\
&= 0.3145
\end{aligned}
$$
Since n = 20 is relatively small here, we can actually find P(8 ≤ Y ≤ 10) exactly. We have
$$
\begin{aligned}
P(8 \le Y \le 10) &= \sum_{k=8}^{10} \binom{n}{k} p^k (1-p)^{n-k} \\
&= \left[\binom{20}{8} + \binom{20}{9} + \binom{20}{10}\right] \left(\frac{1}{2}\right)^{20} \\
&= 0.4565
\end{aligned}
$$
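This sum is easy to verify directly with `math.comb`:

```python
import math

n, p = 20, 0.5
# Exact P(8 <= Y <= 10) for Y ~ Binomial(20, 1/2)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, 11))
print(round(exact, 4))  # 0.4565
```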
We notice that our approximation is not so good. Part of the error is due to the fact that Y is a discrete random variable and we are using a continuous distribution to find
P (8 ≤ Y ≤ 10) . Here is a trick to get a better approximation, called continuity correction. Since Y can only take integer values, we can write
$$
\begin{aligned}
P(8 \le Y \le 10) &= P(7.5 < Y < 10.5) \\
&= P\left(\frac{7.5 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - n\mu}{\sqrt{n}\,\sigma}\right) \\
&= P\left(\frac{7.5 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - 10}{\sqrt{5}}\right) \\
&\approx \Phi\left(\frac{0.5}{\sqrt{5}}\right) - \Phi\left(\frac{-2.5}{\sqrt{5}}\right) \\
&= 0.4567
\end{aligned}
$$
As we see, using continuity correction, our approximation improved significantly. The continuity correction is particularly useful when we would like to find
P (y1 ≤ Y ≤ y2 ) , where Y is binomial and y1 and y2 are close to each other.
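Both approximations for the Binomial(20, 1/2) example are easy to reproduce. A minimal sketch using only the standard library, with Φ written via `math.erf`:

```python
import math

def phi(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# n*p and sqrt(n*p*(1-p)) for n = 20, p = 1/2
mu, sd = 10.0, math.sqrt(5.0)

# Plain CLT approximation of P(8 <= Y <= 10)
plain = phi((10 - mu) / sd) - phi((8 - mu) / sd)
# With the half-unit continuity correction
corrected = phi((10.5 - mu) / sd) - phi((7.5 - mu) / sd)

print(round(plain, 4), round(corrected, 4))
```

The corrected value lands essentially on top of the exact answer 0.4565, while the plain approximation misses by about 0.14.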
More generally, let
$$Y = X_1 + X_2 + \cdots + X_n.$$
Suppose that we are interested in finding P(A) = P(l ≤ Y ≤ u) using the CLT, where l and u are integers. Since Y is an integer-valued random variable, we can write
$$P(A) = P\left(l - \frac{1}{2} \le Y \le u + \frac{1}{2}\right).$$
It turns out that the above expression sometimes provides a better approximation for P (A) when applying the CLT. This is called the continuity correction and it is
particularly useful when Xi 's are Bernoulli (i.e., Y is binomial).
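As a reusable recipe, the correction amounts to widening the interval by half a unit on each side before standardizing. A hypothetical helper along these lines (not from the text; standard library only, Φ via `math.erf`):

```python
import math

def phi(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_clt(l, u, n, p, correction=True):
    """CLT approximation of P(l <= Y <= u) for Y ~ Binomial(n, p),
    optionally applying the half-unit continuity correction."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    half = 0.5 if correction else 0.0
    return phi((u + half - mu) / sd) - phi((l - half - mu) / sd)

# Binomial(20, 1/2), P(8 <= Y <= 10): corrected vs. uncorrected
print(round(binom_clt(8, 10, 20, 0.5), 4),
      round(binom_clt(8, 10, 20, 0.5, correction=False), 4))
```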