Binomial Calculations for Compound
Events
The Normal Approximation to the
Binomial
Consider the following problem:
Suppose we draw an SRS of 1,500 Americans and
want to assess whether the representation of
blacks in the sample is accurate. We know that
about 12% of Americans are black, so we expect
X, the number of blacks in the sample, to be
around 180. Allowing a little leeway, what is the
probability that the sample contains 170 or fewer
blacks?
P(X ≤ 170) = Σ_{j=0}^{170} P(X = j)
           = Σ_{j=0}^{170} (1500 choose j) (0.12)^j (0.88)^(1500−j)
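Sums like this are tedious by hand but quick to evaluate exactly with a short script. Here is a minimal Python sketch using only the standard library (the variable names are illustrative):

```python
from math import comb

# Exact tail probability P(X <= 170) for X ~ Binomial(n = 1500, p = 0.12),
# summing the Binomial pmf term by term (math.comb requires Python 3.8+).
n, p, cutoff = 1500, 0.12, 170
prob = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(cutoff + 1))
print(f"P(X <= {cutoff}) = {prob:.4f}")
```

Even for n = 1500 the exact sum computes in well under a second, which makes it a useful check on the Normal approximation developed below.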
It turns out that as n gets larger, the
Binomial distribution looks increasingly
like the Normal distribution.
Consider the following Binomial
histograms, each representing 10,000
samples from a Binomial distribution with
p = 0.1:
[Figure: six Binomial histograms (10,000 samples each, p = 0.1) for n = 5, 50, 500 (top row) and n = 10, 100, 1000 (bottom row); axes: Value vs. Probability.]
The approximating Normal distribution has parameters

µ = np
σ = √(np(1 − p))
When is the approximation
appropriate?
np ≥ 10 and
n(1 − p) ≥ 10
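Checked against the running SRS example (a trivial sketch; n and p come from the problem above):

```python
# Rule-of-thumb check for the SRS example: the Normal approximation is
# considered appropriate when np >= 10 and n(1 - p) >= 10.
n, p = 1500, 0.12
print(n * p, n * (1 - p))                 # np = 180, n(1 - p) = 1320
print(n * p >= 10 and n * (1 - p) >= 10)  # True
```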
Behavior of the Approximation as a
Function of p, for n = 100
[Figure: Binomial histograms for n = 100 at p = 0.001, 0.005, 0.01, 0.05, 0.1, and 0.5; axes: Value vs. Probability.]
Calculations with the Normal
Approximation
For the SRS example, µ = np = 1500 × 0.12 = 180 and σ = √(np(1 − p)) = √158.4 ≈ 12.59. Thus, using the approximating Normal distribution Y ∼ N(180, 12.59), we calculate

P(X ≤ 170) ≈ P(Y ≤ 170.5) = P(Z ≤ (170.5 − 180)/12.59) ≈ P(Z ≤ −0.75) = 0.2266
The Continuity Correction*
The addition of 0.5 on the previous slide is an example of the continuity correction, which refines the approximation by accounting for the fact that the Binomial distribution is discrete while the Normal distribution is continuous.
In general, we make the following
adjustments:
P(X ≤ x) ≈ P(Y ≤ x + 0.5)
P(X < x) = P(X ≤ x − 1) ≈ P(Y ≤ x − 0.5)
P(X ≥ x) ≈ P(Y ≥ x − 0.5)
P(X > x) = P(X ≥ x + 1) ≈ P(Y ≥ x + 0.5)
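The effect of the correction is easy to see numerically with the standard library's statistics.NormalDist (a sketch; the parameters 180 and 12.59 come from the SRS example above):

```python
from statistics import NormalDist

# Normal approximation to P(X <= 170) for X ~ Binomial(1500, 0.12),
# with and without the continuity correction (mu = 180, sigma ~= 12.59).
Y = NormalDist(mu=180, sigma=12.59)
print(f"without correction: {Y.cdf(170):.4f}")
print(f"with correction:    {Y.cdf(170.5):.4f}")
```

The corrected value lies closer to the exact Binomial probability, since each Binomial mass at an integer x is spread over the interval (x − 0.5, x + 0.5) under the Normal curve.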
Sampling Distributions
The Normal approximation to the
Binomial distribution is, in fact, a special
case of a more general phenomenon.
The general reason for this phenomenon
depends on the notion of a sampling
distribution.
Consider the following setup: We observe
a sample of size n from some population
and compute the mean
x̄ = (1/n) Σ_{i=1}^{n} x_i
Because the sample is drawn at random, the mean x̄ is itself a random quantity.
If we repeatedly drew samples of size n
and calculated x̄, we could ascertain the
sampling distribution of x̄.
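The "repeatedly drew samples" idea is straightforward to simulate. Here is a minimal Python sketch (the Uniform(0, 1) population is an illustrative assumption, not from the slides):

```python
import random
from statistics import mean, stdev

# Sketch of a sampling distribution: repeatedly draw samples of size n
# from a population (Uniform(0, 1) here) and record the sample mean.
random.seed(0)
n, reps = 50, 5000
xbars = [mean(random.random() for _ in range(n)) for _ in range(reps)]

# Uniform(0, 1) has mu = 0.5 and sigma = 1/sqrt(12) ~= 0.2887, so the
# x-bars should center near mu with spread near sigma/sqrt(n) ~= 0.041.
print(mean(xbars), stdev(xbars))
```

A histogram of `xbars` would display the sampling distribution directly; its center and spread anticipate the formulas derived on the next slides.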
A Word on Notation
From here on we write X̄, with a capital X, when we regard the sample mean as a random variable, and x̄ for its observed value in a particular sample.
The Mean and Standard Deviation
of X̄
à n
! n
1 X 1 X
Var(X̄) = Var Xi = 2 Var(Xi )
n n
i=1 i=1
n
1 X 2 1 2 σ2
= 2 σ = 2 (nσ ) =
n n n
i=1
p σ
SD(X) = Var(X) = √
n
The Central Limit Theorem*
Now we know that X̄ has mean µ and standard deviation σ/√n, but what is its distribution?
If X1, X2, . . . , Xn are Normally distributed, then X̄ is also Normally distributed. Thus,

Xi ∼ N(µ, σ) for all i   =⇒   X̄ ∼ N(µ, σ/√n)
The Central Limit Theorem
Suppose X1, X2, . . . , Xn are iid random variables with mean µ and finite standard deviation σ.
If n is sufficiently large, the sampling distribution of X̄ is approximately Normal with mean µ and standard deviation σ/√n.
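The remarkable part is that this holds even when the Xi are far from Normal. A simulation sketch (the Exponential(1) population, seed, and repetition count are illustrative choices):

```python
import random
from statistics import mean

# CLT sketch: sample means of a strongly skewed population, Exponential(1)
# (mu = sigma = 1), should be approximately N(mu, sigma/sqrt(n)) for large n.
random.seed(1)
n, reps = 100, 4000
xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Under the Normal approximation, about 68.3% of the x-bars should fall
# within one standard error (sigma/sqrt(n)) of mu.
se = 1 / n ** 0.5
within = sum(abs(x - 1) <= se for x in xbars) / reps
print(within)  # roughly 0.68
```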
Normal Approximation to the
Binomial Revisited
What does all this have to do with the
Normal approximation to the Binomial?
An observation from a Binomial distribution
Y is actually the sum of n independent
observations from a simpler distribution, the
Bernoulli distribution.
A Bernoulli random variable X takes the
value 1 with probability p or the value 0 with
probability 1 − p, and has
E[X] = p and SD(X) = √(p(1 − p))
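This sum-of-Bernoullis view can be checked by simulation (a sketch; the seed and repetition count are arbitrary choices):

```python
import random
from statistics import mean, stdev

# A Binomial(n, p) observation is the sum of n independent Bernoulli(p)
# indicators. Simulate many such sums and compare their mean and SD
# against np and sqrt(np(1 - p)).
random.seed(2)
n, p, reps = 1500, 0.12, 2000

def binomial_draw():
    """One Binomial(n, p) draw built from n Bernoulli(p) indicators."""
    return sum(random.random() < p for _ in range(n))

draws = [binomial_draw() for _ in range(reps)]
print(mean(draws))   # near np = 180
print(stdev(draws))  # near sqrt(np(1 - p)) ~= 12.59
```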
According to the CLT, X̄ is approximately N(µ, σ)-distributed, where

µ = p and σ = √(p(1 − p)/n)

Consequently, for the count nX̄ = Σ_{i=1}^{n} Xi:

E[nX̄] = nE[X̄] = nµ = np
Var(nX̄) = n²Var(X̄) = n²σ² = n²(p(1 − p)/n) = np(1 − p)

Thus, in general, if X1, . . . , Xn are iid random variables with mean µ and standard deviation σ, then

Σ_{i=1}^{n} Xi ∼̇ N(nµ, √n σ)

where ∼̇ means "is approximately distributed as."