
Probability With The Binomial Distribution And

Pascal’s Triangle

A Key Idea In Statistics

By Scott Hartshorn
Thank You!
Thank you for getting this book! This book walks through how to use the
binomial equation to calculate the likelihood of certain outcomes after a
number of discrete events, for instance how likely you are to roll 3 sixes out
of 12 rolls of a die. One way to visualize the binomial distribution is to use
Pascal’s triangle, and the beginning of this book focuses on using Pascal’s
triangle to calculate outcomes. Later, the book gets into calculating outcomes
with uneven probabilities or outcomes for a large number of events and the
binomial equation is used for those calculations.
The final part of the book is a guide for how to approximate the binomial
results using a normal curve, as well as when it is correct to do so. An
example of when you would want to do that approximation is when there are
a huge number of discrete events and you need to calculate them over a large
range, such as if you are trying to estimate the odds of a politician getting at
least 50% of the votes out of a million votes cast.
If you want to help us produce more material like this, then please leave a
positive review on Amazon. It really does make a difference!
Your Free Gift
As a way of saying thank you for your purchase, I’m offering this
binomial theorem cheat sheet that’s exclusive to my readers.
This FREE PDF cheat sheet has the binomial equation, and an example. It
also shows Pascal’s triangle, which is quite useful for visualizing the
binomial distribution. This is a PDF document that I encourage you to print,
save, and share. You can download it by going here

http://www.fairlynerdy.com/binomial-theorem-cheat-sheet/
Table of Contents
1. Thank You!
2. Binomial Distribution – The Basics
3. Example 1 – Binomial Theorem With Pascal’s Triangle
4. Other Ways To Think Of And To Visualize Pascal’s Triangle
5. For Even Odds – Highest Values In The Middle
6. A “Real Life” Example Of The Binomial Distribution In Action
7. Binomial Theorem With Uneven Odds
8. The Binomial Equation
9. Another Example With The Binomial Equation
10. Binomial Theorem With “At Least” A Number
11. Cumulative Distribution Function
12. Approximation To The Binomial Distribution
13. How To Use Mean And Standard Deviation To Approximate
Binomial Distribution
14. When Is The Normal Curve Approximation Good Enough?
15. Multinomial Distribution
16. So How Would You Use This Multinomial Equation?
17. If You Find Bugs & Omissions:
18. More Books
Binomial Distribution – The Basics
The Binomial Distribution can be used any time there are one or more
discrete events that have exactly two outcomes. An example of a discrete
event is a coin flip, or a roll of a die, or a vote cast by a single voter. Each
event happens and is done. This contrasts to a continuous event such as
learning a new skill, or the tide rolling in, where there is no clear start or stop.
The binomial equation also only applies to events with two mutually
exclusive outcomes. Yes/No, True/False, Success/Failure. Events with 3 or
more outcomes would fall under the multinomial equation, which is covered
at the end of this book.
The purpose of using the binomial equation is to determine how likely a
given outcome is after a series of events. For instance, if you flip a coin 10
times, how likely are you to get 7 heads? If you have a 52% winning
percentage at blackjack, how likely are you to be ahead after 1000 hands?
Results from the binomial equation have a characteristic shape, similar to the
normal curve, shown in the example below.

That shape can change as you change the number of events, or the likelihood
of a given event. A low probability for an event will cause the bulk of the
curve to slide to the left. A high probability for a given event will cause the
bulk of the curve to slide to the right.
The first example below starts simple, with only a few discrete events and a
50% likelihood for those events. Subsequent examples add additional
complexity step by step.
Example 1 – Binomial Theorem With Pascal’s Triangle
In this example suppose that you have a fair coin, and you flip it 10 times.
What is the probability that you will get exactly 4 heads?
There are a couple of different ways we could solve this problem, but I’m
going to use Pascal’s triangle, since it is simple and intuitive. Later examples
will use the Binomial equation, since it is more powerful, but harder to
remember.

What is Pascal’s Triangle?


Pascal’s triangle is one way to visualize the outcomes of successive events
that can have one of two mutually exclusive outcomes. For instance, if you
have a series of events that were all True/False, or Heads/Tails, or Yes/No
and wanted to visualize the total probability of getting a certain number of
outcome A or outcome B as you kept doing more and more of the events, a
good way to do that visualization is to use Pascal’s triangle. We are showing
it here because a series of two mutually exclusive events is exactly what the
binomial theorem represents, and hence Pascal’s triangle is a good way to
show the outcomes from the binomial theorem.
Pascal’s triangle starts with the number 1 at the very top row of the triangle.
Each row below that has an additional digit in it compared to the row above
it. Those digits represent the numbers of A/B outcomes (or success/failure)
as you add more events. Here are the first three rows of Pascal’s triangle

The top row represents having no events, and has only the single digit 1.
This is saying that if you don’t do any events then there is only one possible
outcome (having no successes). The next row, which we have labeled Row
1, has two outcomes. It represents a single event, for instance a single flip of
a coin. The two outcomes are the number of times you will get zero heads
after that flip of the coin, and the number of times you will get 1 head after
that flip. The next row, which we have labeled Row 2, has 3 possible
outcomes. For the coin example, this would be getting zero heads, getting
one head, or getting two heads in two flips of the coin.
Note that the total sum of Row 2 is 4. This means that if you flip a fair coin
twice, and repeat that pair of flips 4 times, the most likely result is to get zero
heads on one pair of flips, one head on two pairs of flips, and two heads on
one pair of flips. However, maybe you don’t want to do 4 pairs of flips,
maybe you only want to do one pair of flips and want to know your odds of
getting a pair of heads, a head and a tail, or a pair of tails. In that case you
can normalize the results. The total sum of Row 2 is 4. If we divide all the
values in Row 2 by 4 they become .25, .50, and .25. This means that if you
do a single pair of flips, your chance of getting zero heads is 25%, your
chance of getting a head and a tail is 50%, and your chance of getting two
heads is 25%.
Naturally, Pascal’s triangle doesn’t stop at Row 2. It continues down
indefinitely for as long as you wish to continue calculating it, with each
subsequent row representing adding another event. This is the same as
saying that you could continue flipping coins and tabulating the outcomes for
as long as you wish. You could always do another flip or add another row.
You can calculate each row from the row above it. Each number in a row is
the sum of the two numbers directly above it. For instance, the 2 in Row 2 is
the sum of the two 1’s above it. The two 1’s on the edges of row 2 only have
a single number directly above them. Those outside edges of the triangle will
remain 1 the whole way down.
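If you would rather let a computer do the adding, here is a rough Python sketch of the rule described above. The function name pascal_rows is just an illustrative choice, and padding each row with zeros is one convenient way (an assumption of this sketch) to handle the edge cells that only have a single number above them.

```python
def pascal_rows(num_events):
    """Return rows 0..num_events of Pascal's triangle as lists of integers."""
    rows = [[1]]  # Row 0: no events, one possible outcome
    for _ in range(num_events):
        prev = rows[-1]
        # Pad with a zero on each side so edge cells have only one real parent
        padded = [0] + prev + [0]
        rows.append([padded[i] + padded[i + 1] for i in range(len(prev) + 1)])
    return rows

rows = pascal_rows(5)
for n, row in enumerate(rows):
    print(f"Row {n}: {row}")

# Normalize Row 2 to get probabilities for a single pair of flips
row2_probs = [value / sum(rows[2]) for value in rows[2]]
print(row2_probs)  # [0.25, 0.5, 0.25]
```

Each new row only needs the row directly above it, which is why you can keep extending the triangle for as long as you wish.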
Here is Pascal’s triangle continued down to 5 events

Seeing it extended down this far makes it obvious how the numbers are the
sum of the numbers above them, and also obvious that the biggest numbers in
the triangle are going to be at the center of any given row, with the smaller
numbers on the outside of the row.
Other Ways To Think Of And To Visualize Pascal’s Triangle
There are a couple of ways to think about Pascal’s triangle. The first way is
what was described above, each value is the sum of the numbers above it.
The second way is that each number represents the value of the combination
formula where a number’s row and column corresponds to the number of
events and number of successes. This way of utilizing Pascal’s triangle is
quite useful, and we will go over it later in the book. The third way of
thinking about Pascal’s triangle we will touch on briefly here, because it is a
very intuitive way of thinking about it.
And that way is that each value in Pascal’s triangle represents the number of
paths you can take to reach it. For instance, the 2 in Row 2 can be reached
by going left and then right, or right and then left. Either of the 3’s in the
third row can be reached 3 ways.

If you think of these paths as events, such as flips of a coin, this is showing
that if you flip a coin three times there are 3 different ways that you can get a
single head. Each rightward arrow represents a success (heads) and each
leftward arrow represents a failure (tails). Each distinct path is a different
permutation of the events.

Path 1 is showing heads-tails-tails


Path 2 is showing tails-tails-heads
Path 3 is showing tails-heads-tails
Naturally, for any given number, there are two ways to get to that number,
either from the number above and to the left with a rightward arrow, or the
number above and to the right with a leftward arrow. (Note this isn’t true for
the ones on the edges, there is only one path to get to them). So the number
of paths you can take to reach a number is the sum of the number of paths
you can take to reach its two parents.
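The path counting idea can be sketched as a short recursive function in Python. The name count_paths and the choice to number rows and successes starting from zero are assumptions made for this illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_paths(row, successes):
    """Number of left/right paths from the top of the triangle to a cell."""
    if successes < 0 or successes > row:
        return 0  # outside the triangle: no path reaches here
    if row == 0:
        return 1  # the single starting cell at the top
    # Arrive by a rightward arrow (success) or a leftward arrow (failure)
    return count_paths(row - 1, successes - 1) + count_paths(row - 1, successes)

print(count_paths(2, 1))  # 2
print(count_paths(3, 1))  # 3
```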
Reshaping Pascal’s Triangle For Easier Charts
The visualization of Pascal’s triangle shown above, where each row is offset
from the row above it, is probably the most common way to show it.
However there have been a number of other visualizations developed as
well. One of those is shown below

The values are the same as in the other version of Pascal’s triangle. For
instance, row 5 has values of 1, 5, 10, 10, 5, and 1. But in this version each
number is the sum of the number above it and the number to the left and
above it. It happens that this modified format for Pascal’s triangle is easier
to create in tables such as the ones that Excel uses, so this book will tend to
use that version.
Many of the charts and tables in this book have been created in Excel. You
can download all of them, for free, here
http://www.fairlynerdy.com/binomial-theorem-examples/
For Even Odds – Highest Values In The Middle
We observed that the highest values in Pascal’s triangle were in the middle of
any given row. Let’s look at why that is, using coin flips as an example. If
we go down to 3 flips, we see that the values in the row are 1, 3, 3, 1.
Since the sum of those numbers is 8, this means that there are 8 possible
orders that all the flips could be in. The values in Pascal’s triangle and the
binomial equation correspond to the Combination values, where order is not
important. This is as opposed to Permutation values where the order is
important. If we list out all 8 possible permutations for 3 flips, they are

T T T ( 0 Heads )
H T T ( 1 Head )
T H T ( 1 Head )
T T H ( 1 Head )
T H H ( 2 Heads )
H T H ( 2 Heads )
H H T ( 2 Heads )
H H H ( 3 Heads )
And we can see the same values in these permutations as we did in Pascal’s
triangle. For either no heads, or all heads, there is only 1 possible way to get
that outcome: keep flipping either all heads or all tails. However if you can
have some of the flips be heads and some be tails then there are multiple ways to
get that outcome, and it occurs more frequently.
The probability of getting either 0, 1, 2, or 3 heads out of 3 flips is the total
number of times each outcome occurred, divided by the 8 possible outcomes.
This is

0 Heads: 1 time, 12.5%


1 Head: 3 times, 37.5%
2 Heads: 3 times, 37.5%
3 Heads: 1 time, 12.5%
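As a quick check on that table, here is a small Python sketch that lists every ordering of 3 flips and counts the heads in each. The variable names are only illustrative.

```python
from collections import Counter
from itertools import product

flips = 3
# Every ordered sequence of heads (H) and tails (T) for 3 flips
sequences = list(product("HT", repeat=flips))
heads_counts = Counter(seq.count("H") for seq in sequences)

for heads in range(flips + 1):
    ways = heads_counts[heads]
    print(f"{heads} heads: {ways} of {len(sequences)} = {ways / len(sequences):.1%}")
```

Listing every sequence like this gets slow quickly, since the number of sequences doubles with every extra flip.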

With 10 events, and a 50% likelihood for a single event, we can pull the
distribution of binomial results from row 10 of Pascal’s triangle
(1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1).
Those results are plotted below

With a 50% likelihood, the most frequent results are in the center of the range
of outcomes. After 10 flips there are 1024 possible outcomes, and 252 of
those outcomes have 5 heads / 5 tails. So if you were to guess the number of
heads after 10 flips, you would guess 5 as the most likely outcome, even
though it is only a 24.6% likelihood overall.
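Those numbers are easy to confirm in Python. This sketch leans on math.comb (available in Python 3.8 and later), which returns the same values as the cells of Pascal’s triangle.

```python
from math import comb

n = 10
row10 = [comb(n, k) for k in range(n + 1)]
total = sum(row10)          # 2**10 = 1024 possible sequences
probs = [c / total for c in row10]
print(row10)
print(total, row10[5], round(probs[5], 3))  # 1024 252 0.246
```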
As the number of events increases, the overall shape remains more or less the
same, however it spreads out farther to the right to encompass more possible
outcomes. Since there are more possible outcomes, the likelihood of any
individual outcome decreases. The chart below shows the binomial
distribution at a 50% likelihood for 6 events, 10 events, and 14 events. The
distribution for each line has been normalized so that the total probability
(area under each curve) is 1.0. You can see how adding more events
stretches out the distribution and makes it less tall.
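To put rough numbers on that flattening, here is a small Python sketch that prints the total probability and the height of the peak for 6, 10, and 14 events at even odds. The helper name even_odds_distribution is an assumption for this example.

```python
from math import comb

def even_odds_distribution(n):
    """Normalized row n of Pascal's triangle (50% likelihood per event)."""
    return [comb(n, k) / 2**n for k in range(n + 1)]

for n in (6, 10, 14):
    dist = even_odds_distribution(n)
    # Total stays at 1.0 while the tallest bar keeps shrinking
    print(n, round(sum(dist), 6), round(max(dist), 4))
```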

One interesting observation is that this binomial distribution looks a lot like a
normal distribution. This is a fact that we will take advantage of later in the
book in order to approximate binomial results without having to calculate all
of them for large numbers of events.

Non Equal Likelihood


We’ve highlighted several times that the shape of the chart changes for
different probabilities of success. For instance the lower the probability of an
individual success, the lower the number of successes there will be. In the
chart below we have included the binomial distribution for 10 events where
each individual event has a 30% likelihood.
As you can see, this shifts the distribution of outcomes to the left.
Importantly however, the full range of outcomes doesn’t change. Even with
a 30% likelihood, you can still have 10 successes in 10 trials. No matter how
small or large the likelihood of an individual event, as long as it isn’t 0% or
100%, the binomial distribution of outcomes will go all the way from 0
successes to the full number of possible successes.
Of course, as the number of events increases, the odds of getting 0 successes
or all successes become vanishingly small.
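Here is a rough Python sketch of that 30% case. It borrows the binomial equation that is explained later in the book, and the function name binomial_probs is just an illustrative choice.

```python
from math import comb

def binomial_probs(n, p):
    """Probability of 0, 1, ..., n successes in n events."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

probs = binomial_probs(10, 0.3)
most_likely = max(range(len(probs)), key=lambda k: probs[k])
print(most_likely)   # 3
print(probs[10])     # tiny, but not zero
```

The peak sits at 3 successes instead of 5, yet 10 successes out of 10 is still possible.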

This Version Of Pascal’s Triangle Only Works For Equally Likely Outcomes
It is important to note that the Pascal’s triangle we were using above is only
valid for events that have equally likely outcomes. I.e. ones where the
probability of success or failure is 50%. A coin flip is a good example of
this, which is why it was used as an example several times. If the outcomes
are not equally likely then you cannot use this version of Pascal’s triangle to
directly calculate binomial results, although variants are still useful.
The next section looks at a whimsical “Real Life” application of Pascal’s
triangle, and the following section looks at a variant of Pascal’s triangle for
uneven odds, before showing how to avoid Pascal’s triangle entirely and just
use the Binomial equation.
A “Real Life” Example Of The Binomial Distribution In Action
Plinko
One game that uses the binomial distribution is “Plinko” in the television
game show “The Price Is Right”. In this game contestants drop a puck into a
big board that has a number of rows of pegs. At each peg, the puck bounces
left or right until it lands in one of several slots at the bottom labeled with
different amounts of money. At the time of this writing, the amounts of
money that can be won in the slots are $100, $500, $1,000, $0, $10,000, $0,
$1,000, $500, and $100.
A left or right option at each peg replicates what we see in Pascal’s triangle.
The distribution of results for Plinko pucks can be predicted from the
binomial distribution.
We know that the most common slot to land in (if the puck starts dead center)
is the $10,000 slot in the middle. However the two $0 slots on either side are
also quite common. Still, from the contestant’s point of view, the winning
strategy is to start with the puck exactly in the middle, which gives them the
largest likelihood of getting the $10,000 prize. More on that strategy, and
how one time CBS accidentally rigged the game to always land on $10,000,
can be found in this interesting (and fairly short) article
www.theatlantic.com/entertainment/archive/2013/09/how-to-game-plinko/280088/

Bean Machine
The Bean Machine was a late 1800’s invention that used this same bouncing
off a peg property to generate a random number in the binomial distribution.
One example of a bean machine is shown here
Photo by Antoine Taveneaux CC BY-SA 3.0
This device used the same principle of left-right choices to divide the balls
into slots resulting in a binomial distribution.
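A bean machine is simple to imitate in software. The Python sketch below is an idealized model, not a description of any real board: it assumes every bounce is an independent 50/50 choice, it ignores walls, and the 8 rows of pegs are an arbitrary choice that happens to give 9 slots. The function name drop_balls and the fixed random seed are also just assumptions for the example.

```python
import random
from collections import Counter

def drop_balls(num_balls, num_rows, seed=0):
    """Simulate balls bouncing left (0) or right (1) at each row of pegs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    slots = Counter()
    for _ in range(num_balls):
        # The final slot is just the number of rightward bounces
        slot = sum(rng.randint(0, 1) for _ in range(num_rows))
        slots[slot] += 1
    return [slots[s] for s in range(num_rows + 1)]

counts = drop_balls(num_balls=10_000, num_rows=8)
print(counts)
```

With enough balls, the counts pile up in the middle slots in roughly the same proportions as row 8 of Pascal’s triangle.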
Binomial Theorem With Uneven Odds
As promised, it is time to look at events that do not have an equal chance of
success. To simulate this, instead of flipping a coin we will think of rolling a
die.
A typical die has 6 sides. The odds of rolling a given number on that die are
1 in 6, so 16.7%. To make the numbers a little bit more round, let’s assume
that you are using a 4 sided die, which does exist. A success for you is to roll a
4, anything else is a failure. That means the success rate is 1 in 4, so 25%.
Why is rolling a 4 considered a success? Well truthfully, mathematicians,
such as you and I, don’t concern ourselves overly much with “How can we
relate this to the real world?” kind of thoughts, but for this series of
problems we can assume that we are simulating a role playing game, such as
Dungeons and Dragons, and you need to get hits with your 4 sided die.
If you roll the 4 sided die 7 times, what are the chances of getting exactly 2
hits?
We can solve that question directly with the binomial equation, which is

$$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where n is the total number of events, k is the number of successful events,
and p is the likelihood of a successful event on a single trial. (Note, n over k
inside parentheses at the front of the equation is mathematical shorthand for
the combination formula.)
However before diving into that equation, it is once again worth looking at
the problem in terms of Pascal’s triangle. Previously, we set up Pascal’s
triangle such that each number was the sum of the two numbers directly
above it. Including the number above and to the left in the sum simulated
having a success on the most recent event. I.e. if you go from 2 events, 1
success to 3 events, 2 successes you have had a success on the most recent
event. Including the number directly above (or above and to the right) in the
sum simulated having a failure on the most recent event. For example, if
you go from 2 events, 2 successes to 3 events, 2 successes you have had a
failure on the most recent event.
Because the probability of success and failure were the same in the previous
iteration of Pascal’s triangle, we didn’t apply any weighting to the chance of
success vs failure. Here is a version of Pascal’s triangle where each
subsequent event assumes a 75% chance of failure, and a 25% chance of
success on the latest event.

To answer our question, we can see from row 7 column 3 of this triangle that
the odds of getting exactly 2 hits out of 7 rolls with the 4 sided die are 31.1%.
This version of Pascal’s triangle was simple to generate. Each cell is 75% of
the value above it added to 25% of the value above it to the left. An
immediate difference that you will notice in this version of Pascal’s triangle
vs the standard version at a 50% probability is that these numbers are smaller
than one, and they are also decimals instead of integers.
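The book’s tables were built in Excel, but the same weighted triangle can be sketched in a few lines of Python. The function name weighted_triangle and the zero-based indexing (so 2 successes is index 2, which is column 3 of the table) are assumptions of this sketch.

```python
def weighted_triangle(num_events, p_success):
    """Rows of the modified triangle; each row sums to 1.0."""
    rows = [[1.0]]
    for _ in range(num_events):
        prev = rows[-1]
        new_row = []
        for k in range(len(prev) + 1):
            # Value directly above, weighted by the chance of a failure
            from_failure = prev[k] * (1 - p_success) if k < len(prev) else 0.0
            # Value above and to the left, weighted by the chance of a success
            from_success = prev[k - 1] * p_success if k > 0 else 0.0
            new_row.append(from_failure + from_success)
        rows.append(new_row)
    return rows

rows = weighted_triangle(7, 0.25)
print(round(rows[7][2], 3))  # 0.311
```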
The fact that these numbers are decimals instead of integers is simply because
an arbitrary fraction does not necessarily have a clean integer ratio, so it is
often simpler to work with decimals. Pascal’s triangle with even odds of 50%
could easily scale as 1+1=2. That is why in the normal version of Pascal’s
triangle each row is double the value of the row above it. In this modified
triangle, the fact that the numbers are smaller than 1 is because the sum of
each row is equal to the sum of the row above it, i.e. 1.0. This is different
than the standard Pascal’s triangle. In the standard version, the sum of each
row is double that of the row above it, but then you divide by 2 to the power of the
row number to normalize it. If we wanted, we could multiply each row by 2
to the appropriate power to get a scaled value.
This version of Pascal’s triangle was intended mainly to highlight how each
subsequent cell is formed. Namely that every directly downward step is
equivalent to multiplying by .75, and every downward and rightward step is
equivalent to multiplying by .25. In addition to those multiplications, they
are also multiplied by the standard values in Pascal’s triangle which
correspond to how many routes could be taken to get to a given cell.
That means that to calculate the value of any cell in this triangle, we simply
need to know what row it is in, and how many successes we have had.
The value of any cell in this modified Pascal’s triangle is equal to

The value of that same cell in the standard Pascal’s triangle, multiplied by
The probability of success raised to the power of the number of successes, multiplied by
The probability of failure raised to the power of the number of failures

If n is the number of events, k is the number of successes, and p is the
probability of success, the formula for any cell in this modified Pascal’s
triangle is

$$\binom{n}{k} p^k (1-p)^{n-k}$$

An intuitive meaning of this equation isn’t obvious at first glance, however it
turns out to make quite good sense. The next section breaks down and
explains the equation so it can be more easily understood.
The Binomial Equation
The binomial equation can be used in place of Pascal’s triangle for problems
with uneven odds, or to save calculation compared to Pascal’s triangle. The
binomial equation has the advantage of calculating a cell directly, instead of
having to calculate every row above it.
For instance, if you wanted to find row 10 with Pascal’s triangle, you would
have to do 66 calculations (1+2+3…+9+10+11). With the binomial equation
you would do 11. If you wanted to do the same calculation for row 1,000
you would do 501,501 calculations for Pascal’s triangle, and 1001 for the
binomial equation. In computer science terms, Pascal’s triangle is order n
squared, the binomial equation is order n.
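Those counts are easy to reproduce. The Python sketch below simply counts cells, treating each cell as one calculation as the paragraph above does. The two function names are made up for this example.

```python
def triangle_cell_count(n):
    """Cells needed to build Pascal's triangle down to row n."""
    # Rows 0..n hold 1, 2, ..., n + 1 cells
    return sum(range(1, n + 2))

def equation_cell_count(n):
    """Cells needed when each value in row n is computed directly."""
    return n + 1

for n in (10, 1000):
    print(n, triangle_cell_count(n), equation_cell_count(n))
```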
The binomial equation is shown below

$$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$$

This equation appears somewhat complicated at first glance. However if we
break it down it can be understood very intuitively. The first part of the
equation

$$f(k; n, p)$$

just shows that we are going to be calculating a result for a single specific
value in the binomial distribution. This equation doesn’t show all the
possible outcomes of flipping a coin 10 times. It doesn’t show how often you
will get 0 heads, and 1 head, and 2 heads etc. This equation only shows the
probability for a single one of those outcomes. For instance you could use it
to calculate the probability of getting exactly 4 heads in 10 flips. Of course,
you could then use it again to also calculate the probability of 3 heads, and
again for 5 heads and again for 6 heads etc.
As far as what the letters mean

f – just means it is a function of k, n, p


k – This is the number of successful events. i.e. the total number
of heads in the coin flips, or the total number of outcome A out of
A/B
n – This is the total number of events. This would be all the flips,
regardless of the outcome
p – This is a decimal number between 0 and 1 inclusive
representing the probability of a successful event on a single trial.
For instance if you were rolling a die and it only counted if you
got a 6, then p would be 1/6 = .1667

The second part of the equation

$$\binom{n}{k}$$

is an abbreviation for the combination formula. This part of the equation is
what we were simulating with the original Pascal’s triangle. The
combination formula shows the number of ways a subset can be selected
from a group without replacement when the order of the selection does not
matter.

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

The ! is a factorial. A factorial means multiply a number by all the numbers
below it down to 1. For instance 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 =
3,628,800
If we wanted to generate the value for the combination equation for 5 items,
select 2 (i.e. 5 flips of the coin, 2 heads) the equation would be

$$\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{120}{2 \cdot 6} = 10$$

For this book, we are going to use the combination formula without diving
very much more deeply into why it works the way it does. But the
combination formula, and permutation formula, have interesting properties in
their own right. If you are interested in them, you may want to check out my
book Probability with Permutations and Combinations on Amazon.
The value of 10 from the combination formula of 5 pick 2 turns out to be the
same value that is in Pascal’s triangle for 5 events, 2 successes

If you were to solve the combination formula for 5 events and 0 successes, 1
success, 2 successes etc. all the way through 5 successes, you would
duplicate the 1, 5, 10, 10, 5, and 1 values that are in Pascal’s triangle.
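Here is a short Python sketch of the combination formula written out with factorials. The function name combinations is an illustrative choice. Python’s built-in math.comb gives the same answers.

```python
from math import factorial

def combinations(n, k):
    """n! / (k! * (n - k)!), the number of ways to choose k items from n."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(factorial(10))                              # 3628800
print(combinations(5, 2))                         # 10
print([combinations(5, k) for k in range(6)])     # [1, 5, 10, 10, 5, 1]
```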
The third part of the equation is

$$p^k (1-p)^{n-k}$$

This part is the probability of the event occurring raised to the power of the
number of times it occurs, multiplied by the probability of the event not
occurring raised to the power of the number of times it didn’t occur.
For instance, if you had an event that had a 30% probability of occurring and
it happened 4 times in 10 trials, this part of the equation would be

$$0.3^4 \times 0.7^6 \approx 0.00095$$

This is the part of the equation that normalizes the results. When we looked
at the probability of getting heads or tails after 3 flips based on Pascal’s
triangle, we started with the 1, 3, 3, 1 in row 3

And then we divided those numbers by 8 to get 12.5%, 37.5%, 37.5%, and
12.5%. The p^k * (1-p)^(n-k) part of the equation is what was doing that
dividing by 8. In that example, the likelihood of a successful outcome was
50%. That meant that both p and 1-p were equal values of .5. Therefore this
part of the equation

$$p^k (1-p)^{n-k}$$

became

$$0.5^k \times 0.5^{n-k}$$

and further reduced to

$$0.5^n = \frac{1}{2^n}$$

And since n was 3 for our 3 trials, 1 divided by 2^n is effectively dividing by
8. This ends up normalizing all the probabilities from the different numbers
of successes so that they all add up to 1.0.
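Putting the pieces together, the whole binomial equation fits in a few lines of Python. This is a minimal sketch. The function name binomial_probability and the argument order (k, n, p) are assumptions chosen to match the letters described above.

```python
from math import comb

def binomial_probability(k, n, p):
    """f(k; n, p): chance of exactly k successes in n events."""
    ways = comb(n, k)                       # same value as Pascal's triangle
    weight = p**k * (1 - p)**(n - k)        # normalizing part of the equation
    return ways * weight

three_flips = [binomial_probability(k, 3, 0.5) for k in range(4)]
print(three_flips)                                   # [0.125, 0.375, 0.375, 0.125]
print(round(binomial_probability(2, 7, 0.25), 3))    # 0.311
```

The first line of output matches the 12.5%, 37.5%, 37.5%, and 12.5% from the 3 flip example, and the second matches the 4 sided die example.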
Another Example With The Binomial Equation
This is another example using the binomial equation directly and not Pascal’s
triangle. Here let’s assume that you are gambling at a casino and you decide
to play 100 hands of blackjack. Surprisingly, your odds of winning any given
hand are actually 55%, maybe you are counting cards or something. What
are the chances that after 100 hands you have won exactly 51 of them?
Here we can just do a straightforward plugging in of values into the binomial
equation.

$$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$$

n is the total number of events, 100 hands in this case. k is the number of
hands that you won, 51, and p is the odds of winning a given hand, .55

$$f(51; 100, 0.55) = \binom{100}{51} \times 0.55^{51} \times 0.45^{49} \approx 0.0577$$

The result in this case is .0577, or 5.77%, which is the probability of you
winning exactly 51 hands.
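If you want to check that number yourself, the plugging in can be done in Python. This is only a sketch of the arithmetic, with variable names chosen to match the equation.

```python
from math import comb

n, k, p = 100, 51, 0.55
probability = comb(n, k) * p**k * (1 - p)**(n - k)
print(round(probability, 4))  # 0.0577
```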
Binomial Theorem With “At Least” A Number
The previous example showed the probability of getting exactly 51 wins out
of 100 bets. Honestly that example felt a little bit contrived, because in the
real world what you are more frequently concerned with is “at least” or “no
more than” rather than an exact value. The gambler in the previous example
is more likely to care about winning at least 51 bets, so that he knows he
leaves a winner for the day, rather than winning exactly 51 bets.
The solution to an at least problem is simply to solve the exact number
problem multiple times, and add up all the probabilities. For instance, if you
had the problem of calculating the odds of getting 4 or fewer heads in 10
flips, you would solve for the odds of getting 0 heads, and separately solve
for the odds of getting 1, and 2, and 3, and 4. Adding up all of those odds
would give you the odds of four or fewer.
Visualizing that using Pascal’s triangle

1 + 10 + 45 + 120 + 210 = 386


So there are 386 permutations of flips that give 4 or fewer heads. The total
number of possible outcomes is always 2 raised to the power of the
number of events, which is 10 in this case. 2^10 = 1024.
386 divided by 1024 is .377. So our final result for this problem is a 37.7%
chance of getting 4 or fewer heads.
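The same adding up can be sketched in Python. The helper name at_most is an assumption for this example. It just calls the binomial equation once for each outcome and sums the results.

```python
from math import comb

def at_most(k_max, n, p):
    """Probability of k_max or fewer successes in n events."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

ways = sum(comb(10, k) for k in range(5))
print(ways, ways / 2**10)             # 386 0.376953125
print(round(at_most(4, 10, 0.5), 3))  # 0.377
```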
Unfortunately, if you need an exact solution to an “at least” kind of problem
there isn’t any other option than solving each outcome individually and
adding them all together. That is why the example solved above was 4 heads
or fewer in 10 flips, rather than 51 or more wins in 100 hands, simply due to
space constraints of showing the 51-100 solutions. The actual odds of a 55%
gambler winning at least 51 hands are about 82% (the odds of winning more
than 51 hands are about 76%). That calculation is done in the
supplemental Excel file you can download here for free.
As the binomial problems increase in size, the amount of calculation can
become cumbersome. For instance, if someone wanted the odds of getting
fewer than 1000 heads on 5000 coin flips you would have to solve the
problem 1000 times and keep a running sum. In that case, using Excel or
writing some software to solve it is your best solution. Fortunately the
equations are straight forward enough that Excel or custom software is fairly
simple.
The only trick that can save time for an exact solution in these cases is that if
the question calls for more than half of the outcomes, for instance flipping the
coin 20 times and getting at least 1 head but not more than 19 heads, you can
solve the inverse of the problem and subtract. I.e. solve for getting 0 heads &
20 heads, and subtract the sum of those probabilities from 1.0.
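Here is a rough Python sketch comparing the long way with the subtraction trick for the 20 flip example. The helper name exactly is made up for this illustration.

```python
from math import comb

n, p = 20, 0.5

def exactly(k):
    """Probability of exactly k heads in 20 fair flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

direct = sum(exactly(k) for k in range(1, 20))   # 19 terms
shortcut = 1.0 - (exactly(0) + exactly(20))      # 2 terms
print(direct, shortcut)
```

Both approaches give the same answer, but the second one only needs two evaluations of the binomial equation.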
Although there isn’t a shortcut for solving the exact solution to the binomial
equation over a range of numbers, if you are interested in that range, it turns
out that there is a pretty good approximation to that solution using a normal
curve and knowing the mean and standard deviation of the binomial results.
If you get into millions or billions of events, where Excel or simple scripts
run into calculation problems, the approximation becomes useful. We will go
over that approximation in the next section.
Cumulative Distribution Function
In technical terms, most of the charts that we have shown so far for the
binomial distribution have been the probability mass function. They plot a
given outcome, and the probability of that outcome occurring. For the
binomial distribution, they have the characteristic hump shape like the ones
below.

However, in the previous section, when we started summing that
probability mass function over a range of values, what we were building was the
“cumulative distribution function”. This function is the running sum of all
the probabilities up to a certain number. It rises from near zero up to
one, and has a different characteristic shape similar to a stretched out S.
We can shift the cumulative distribution function left or right by decreasing
or increasing the probability of the event respectively.

And we can stretch the cumulative distribution out by increasing the number
of events.
So one way to think of problems such as “What are the odds that you will get
no more than 6 heads in 10 coin flips?” is that you are generating a
cumulative distribution, and then selecting a value off of it.
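That idea can be sketched in Python with a running sum. The use of itertools.accumulate here is just one convenient way to build the cumulative values.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))   # running sum of the probabilities

print(round(cdf[6], 3))   # 0.828
print(round(cdf[-1], 3))  # 1.0
```

So the odds of getting no more than 6 heads in 10 coin flips are about 82.8%.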
Approximation To The Binomial Distribution
Up until now we have shown the exact results for the binomial theorem.
However there are times when it is more useful to have an approximation.
For instance, let’s say that there was an election and you knew any single
voter had a 48% chance of voting for a specific candidate. You also know
that 1,000 votes will be cast.
What are the odds that the candidate will get at least 50% of the vote?
To get the exact solution, we could use the binomial equation for when there
are uneven odds

$$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$$

and then sum all the outcomes for getting 500 votes, 501 votes, 502 votes, etc.
all the way to getting 1000 votes.
However that amount of math would be onerous to do by hand. A computer
could complete the calculations for all 501 outcomes just fine, but you can
certainly conceive of situations where you wouldn’t want to do that
calculation repeatedly, such as within another loop.
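As a rough sketch of what that computer calculation could look like, the Python below sums all 501 outcomes directly. It is an illustration under stated assumptions rather than the author's method: the function name binomial_tail is invented here, and Fraction is used (an assumption of convenience) so that the very large and very small factors stay exact instead of being rounded.

```python
from fractions import Fraction
from math import comb

def binomial_tail(n, p, k_min):
    """Exact probability of k_min or more successes in n events."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k_min, n + 1))

# Fraction keeps every term exact, so tiny terms are not lost to rounding.
p_vote = Fraction(48, 100)
exact_tail = binomial_tail(1000, p_vote, 500)
print(f"P(at least 500 of 1000 votes) = {float(exact_tail):.5f}")
```

This runs quickly once, but it does 501 large-number multiplications per call, which is the kind of cost you might not want inside another loop.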
Fortunately, it turns out that the results from a binomial expansion are fairly
close to the normal curve, if the number of binomial events is large enough.
If we assume a normal curve, then all we need is the mean value and the
standard deviation of the binomial distribution and we can use those results
for our calculations without having to solve for every branch in the binomial
expansion.
This section goes over how to calculate the mean value and standard
deviation resulting from the binomial theorem and how to use those results.
This section also gives some examples for how close the normal curve is to
the binomial distribution, and how many binomial events you need for it to
be a good approximation.

Mean Value Of The Binomial Distribution


The mean value that results from a binomial expansion is simple. If the
probability of a single event is p, then the average number of successful
events after n trials will be np

$$\mu = np$$

I.e. if there is a 40% chance of success from a single event, then after 10
events the average number of successful outcomes will be 4.
Standard Deviation Of The Binomial Distribution
The standard deviation of the binomial distribution actually turns out to be
a surprisingly simple equation, although deriving the equation is more
challenging than for the mean value.
The variance of the binomial expansion is

$$\sigma^2 = np(1-p)$$

Where n is the number of events and p is the probability of a successful
outcome from a single event. Sigma squared is the variance. Variance is the
square of the standard deviation. So the standard deviation of the binomial
expansion is

$$\sigma = \sqrt{np(1-p)}$$

Let’s give an example of this calculation. If there is a 50% chance of
success, and you do 5 trials, we expect the standard deviation to be

$$\sigma = \sqrt{5 \times 0.5 \times (1-0.5)} = \sqrt{1.25} \approx 1.118$$

Now let’s see what the results would be if we actually did the binomial
expansion and calculated the standard deviation.
From Pascal’s triangle, here is the binomial expansion up to 5 events

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
  1   5  10  10   5   1

This says that we would have 1 time with 0 successes, 5 times with 1 success,
10 times with 2 successes, 10 times with 3 successes, 5 times with 4
successes and 1 time with 5 successes. If we listed all 32 of those numbers
out we would have
0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
5
The equation for population standard deviation is

$$\sigma = \sqrt{\frac{\sum (x - u)^2}{n}}$$

Here u is the mean value, n is the total number of values (32 in this case) and
x is each individual value. Note that we are using population standard
deviation as opposed to sample standard deviation. That makes the
denominator of the above equation n as opposed to (n-1). We are using the
population equation because we are taking the standard deviation of the entire
set of data, not just a sampled subset
The mean value is 2.5, which you can get from the average of all of those
values, or using the previous equation of

$$np = 5 \times 0.5 = 2.5$$

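You can get standard deviation out of nearly any spreadsheet software, or
mathematical tool. In Excel it would be =STDEV.P(). However, the manual
calculation is shown below and matches the value of 1.118 we got
previously.

$$\sigma = \sqrt{\frac{1(0-2.5)^2 + 5(1-2.5)^2 + 10(2-2.5)^2 + 10(3-2.5)^2 + 5(4-2.5)^2 + 1(5-2.5)^2}{32}} = \sqrt{\frac{40}{32}} = \sqrt{1.25} \approx 1.118$$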
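For readers who would rather check this with code than a spreadsheet, here is a small Python sketch. It is an added illustration, not from the original text; the variable names are arbitrary, and it uses statistics.pstdev, which is the population (divide by n) version of standard deviation.

```python
from statistics import mean, pstdev

# Row 5 of Pascal's triangle: how many of the 32 outcomes have k successes.
pascal_row = [1, 5, 10, 10, 5, 1]

# List every outcome: one 0, five 1s, ten 2s, and so on.
outcomes = [k for k, count in enumerate(pascal_row) for _ in range(count)]

print(len(outcomes))      # 32 values
print(mean(outcomes))     # 2.5
print(pstdev(outcomes))   # population standard deviation, about 1.118
```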

Just to be sure that the formula works for all cases, let’s do it one more time
with 5 events, but this time use a 30% probability of success instead of a 50%
probability.
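Based on our equation

$$\sigma = \sqrt{5 \times 0.3 \times (1-0.3)} = \sqrt{1.05} \approx 1.025$$

In this case the row of the modified Pascal’s triangle for 5 events (scaled
so the sum of the row is 2^n = 32) would be

5.378   11.525   9.878   4.234   0.907   0.078

The mean value is .3 * 5 = 1.5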
In this case we have 5.378 times that we had 0 successes, 11.525 times that
we had 1 success, 9.878 times that we had 2 successes, 4.234 times that we
had 3 successes, .907 times that we had 4 successes and .078 times that we
had 5 successes.
If we plug those numbers into the standard deviation calculation the result
matches the value from our previous equation.

$$\sigma = \sqrt{\frac{5.378(0-1.5)^2 + 11.525(1-1.5)^2 + 9.878(2-1.5)^2 + 4.234(3-1.5)^2 + 0.907(4-1.5)^2 + 0.078(5-1.5)^2}{32}} \approx \sqrt{1.05} \approx 1.025$$

As a side note, if you do want to manually calculate standard deviation for
those values you would probably have to do the calculation like it is done in
the equation above. It would be difficult to list out non-whole amounts of a
number in order to do the calculation in a spreadsheet.
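One way around the non-whole counts is to treat them as weights. The Python sketch below is an added illustration with made-up variable names: it uses the probability of each outcome as its weight (multiplying each weight by 32 gives the scaled counts quoted above) and computes a weighted mean and weighted population standard deviation.

```python
from math import comb, sqrt

n, p = 5, 0.3

# Probability of each number of successes (the weights, which sum to 1).
weights = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Weighted mean and weighted population standard deviation.
weighted_mean = sum(k * w for k, w in enumerate(weights))
variance = sum(w * (k - weighted_mean) ** 2 for k, w in enumerate(weights))
weighted_std = sqrt(variance)

print(f"mean = {weighted_mean:.3f}, std = {weighted_std:.4f}")
print(f"formula std = {sqrt(n * p * (1 - p)):.4f}")
```

Because the weights already sum to 1, there is no need to divide by the total count at the end.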
How To Use Mean And Standard Deviation To Approximate
Binomial Distribution
Let’s go back to the starting problem, which was “There is a candidate who
has a 48% chance of getting a vote from any given voter. What is that
candidate’s chances of getting at least 500 out of 1000 votes?”
Using the equations above we know that the mean number of votes is
1000 * .48 = 480, and the standard deviation in the number of votes is the
square root of 1000 * .48 * .52 = 249.6, which is 15.8.
Solving this example is almost as simple as pulling a cumulative probability
out of a Z-table for normal distribution, however there is one trick to be
aware of. That trick is the difference between discrete events and a range.
The binomial theorem deals with discrete events. The candidate can get 500
votes, or they can get 499 votes, or they can get 501 votes. However they
cannot get 500.25 votes. Therefore we need to round the real range into
discrete numbers. So any number between 499.5 and 500.5 would round to
500. Any number between 500.5 and 501.5 would round to 501. Since we
care about the candidate getting at least 500 votes, we need to find the
probability that the candidate will get at least 499.5 votes in the real range.
Since this is an approximation on the normal curve, we need to find the
number of standard deviations 499.5 is from the mean value.
The total difference between the required value of 499.5 and the mean value
of 480 is 19.5
The standard deviation from the binomial distribution is 15.8.
19.5 / 15.8 = 1.234 standard deviations
So the question becomes, “What are the odds that the candidate will get votes
that are at least 1.234 standard deviations above the mean?” The most
common way to get that kind of result is to look it up on a cumulative
probability Z-table. Z-tables can come in different formats. Sometimes they
show the tails of the normal distribution, sometimes they show the center.
Sometimes they show results from the left and sometimes from the right.
The Z-table shown below gives cumulative results from the left, as
illustrated by the image below.
We read the Z-table below by looking up 1.2 in the left-most column, and
pairing it with .03 in the top-most row. That total result is the value for
1.23 standard deviations above the mean (I am ignoring additional significant
digits).

This is a cumulative Z-table from the left. The result for 1.23 from this
Z-table is .8907. That tells us that there is an 89.07% chance that the total
votes the candidate gets will be less than 1.23 standard deviations above the
mean. That means there is a 1 - .8907 = 10.93% chance the candidate will get
votes more than 1.23 standard deviations above the mean.
That is the solution to our problem: there is a 10.93% chance of getting at
least 500 votes.
If we didn’t want to use a Z-table, we could use Excel directly to get the
result from the Z-value. Plugging in the 1.234 + additional decimal places
into the Excel function =NORM.S.DIST(Z-value,TRUE) gives the cumulative
probability the same as the Z-table. Subtracting that from 1, Excel gives the
final result to be 10.855%, which is slightly different from the Z-table
result due to our rounding 1.234 + additional decimal places to 1.23
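The same lookup can be done without a Z-table or Excel. The Python sketch below is an added illustration, not the author's method: it assumes Python 3.8 or later and uses statistics.NormalDist from the standard library, whose cdf method gives the cumulative probability from the left. The variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.48
mean_votes = n * p                    # 480
std_votes = sqrt(n * p * (1 - p))     # about 15.8

# Continuity correction: "at least 500 votes" starts at 499.5 on the real line.
z = (499.5 - mean_votes) / std_votes

# NormalDist().cdf is the cumulative probability from the left, like the Z-table.
chance_at_least_500 = 1 - NormalDist().cdf(z)
print(f"z = {z:.3f}, P(at least 500 votes) = {chance_at_least_500:.4f}")
```

Since no rounding of the Z-value happens here, the result should line up with the Excel figure rather than the Z-table figure.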

How Good Was The Approximation?


But wait, this was just an approximation. What would the results have been
if we actually did the binomial expansion?
If we did the full binomial expansion, we would have determined that
10.858% of the values were greater than or equal to 500 votes. This is quite
close to the 10.855% derived from the normal curve and likely within
acceptable error for most situations.
When Is The Normal Curve Approximation Good Enough?
The above example showed how to use the normal curve to approximate the
binomial results. But when is that approximation good enough?
As a rule of thumb the normal distribution is a good approximation of the
binomial distribution as long as

$$np \ge 10$$

and

$$n(1-p) \ge 10$$

The very best approximations occur when p is near .5, n is very large, and
you are looking near the middle of the distribution instead of at the very
edges.
For instance, here we have 100 events with a probability of .4

As you can see, the normal distribution and binomial distribution match quite
well. This fits our rule of thumb since np = 40 and n(1-p) = 60, both of
which are greater than 10.
Let’s look at a plot that should be at the edges of our rule of thumb. Here
p = .25 and n = 50, so np = 12.5. This match is adequate, but not quite as
good a fit as the one above.

As we continue to decrease the number of events, this time with n = 10 and
p = .25, the normal distribution is not a very good fit for the binomial
distribution, as shown below.
Of course, if you only have 10 events, it is simple to calculate the full
binomial expansion and not worry about a normal approximation.
So far what we have seen is that with a large n and a p near .5 there is a good
match, and with a very small n there isn’t a good match. What about with a
large n but a small p? Here is an example with n = 1000 and p = .01. This
has a large enough number of events that you might want to approximate it,
and just matches our rule of thumb with np = 10.
Once again, as with the previous example on the boundary of our rule of
thumb, we see that this is a fair approximation. With a larger n, say 2500
and p = .01, it becomes an even better approximation.
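To put rough numbers on "good", "adequate" and "not very good", the Python sketch below checks the rule of thumb and measures the largest gap between the exact cumulative sum and the normal curve. This is an added illustration: the function names and the choice of "largest cumulative gap" as the yardstick are assumptions made here, not something the original text defines. It uses plain floats, so it is only meant for moderate n (a few hundred events).

```python
from math import comb, sqrt
from statistics import NormalDist

def rule_of_thumb_ok(n, p):
    """True when both np and n(1-p) are at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def max_cdf_gap(n, p):
    """Largest difference between the exact and approximate cumulative sums."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    running_sum, worst = 0.0, 0.0
    for k in range(n + 1):
        running_sum += comb(n, k) * p**k * (1 - p)**(n - k)
        # k + 0.5 is the continuity correction for "k or fewer".
        worst = max(worst, abs(running_sum - normal.cdf(k + 0.5)))
    return worst

for n, p in [(100, 0.4), (50, 0.25), (10, 0.25)]:
    print(n, p, rule_of_thumb_ok(n, p), round(max_cdf_gap(n, p), 4))
```

The gap should shrink as the cases move from the bottom of that list to the top, which mirrors the plots described above.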
Skewness and Kurtosis
In addition to mean and standard deviation, there are two other measures of a
distribution that you can play with in order to make a normal curve better
match the binomial distribution. However these measures could be more
hassle than they are worth, so they might only be worthwhile if you need to
be really accurate with your approximation. These measures are skewness and
kurtosis.

Skewness
Skewness measures whether one of the tails of the curve is longer than the other. For
instance, in the binomial distribution, if there is a 30% likelihood of a single
event, and you do 10 events the mean will be at 3. The left tail will go from 0
to 3, and the right tail will go from 3 to 10. This is known as skewed right,
because the right tail is longer.
In the picture below, the blue curve for 50% likelihood has no skew, and the
red curve for 30% likelihood is skewed right.

The equation to calculate skew for the binomial distribution is

$$\text{skewness} = \frac{1-2p}{\sqrt{np(1-p)}}$$

Where n = number of events, p = probability of a successful event.
This shows that as p moves farther away from .5, and closer to either 0 or 1,
the size of the skewness will increase. However as the number of events
increases the skewness will decrease in proportion to one over the square
root of the number of events.
If you have a binomial distribution with a high skew and need a very good
approximation, there exists a skew normal distribution
https://en.wikipedia.org/wiki/Skew_normal_distribution that might do a
better job of approximating the distribution than the baseline normal.
However if you stick to the rule of thumb that np >= 10 and n(1-p) >=10 in
order to use the normal approximation, worrying about skew probably isn’t
necessary.
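As a quick check on the skew equation, the Python sketch below compares the closed-form value against skewness computed directly from the probabilities. It is an added illustration; the function names are made up, and "skewness" here means the third standardized moment, which is the usual definition and the one the closed-form equation corresponds to.

```python
from math import comb, sqrt

def binomial_skewness(n, p):
    """Skewness from the closed-form equation (1 - 2p) / sqrt(np(1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

def skewness_from_pmf(n, p):
    """Skewness computed directly as the third standardized moment."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    std = sqrt(sum(w * (k - mean) ** 2 for k, w in enumerate(pmf)))
    return sum(w * ((k - mean) / std) ** 3 for k, w in enumerate(pmf))

print(binomial_skewness(10, 0.5))   # no skew at p = .5
print(binomial_skewness(10, 0.3))   # positive: skewed right
print(skewness_from_pmf(10, 0.3))   # should agree with the line above
```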

Kurtosis
Another measure of a distribution is the kurtosis. The kurtosis measures how
much of the curve lies in the tails vs. in the middle. A distribution with a
large kurtosis has a “fat tail”, that is seemingly unlikely events are more
likely to occur than you would think.
The standard normal distribution has a kurtosis of 3. The kurtosis of other
distributions is often compared against that value to see if it is greater
than or smaller than the kurtosis of the standard normal distribution. The
difference (kurtosis minus 3) is known as “Excess Kurtosis”. If it is
positive then the curve has more values in the tails than the normal curve,
if it is negative then the curve has fewer values in the tails.
The equation for Excess Kurtosis of the binomial distribution is

$$\text{excess kurtosis} = \frac{1 - 6p(1-p)}{np(1-p)}$$

The binomial distribution with p = .5 has a slightly negative excess kurtosis
(the equation works out to -2/n). That means you are less likely to get
outcomes that are many standard deviations from the mean. Once p gets below
.21 or above .79 the excess kurtosis is positive, which means unlikely events
are slightly more likely than they would be in the standard normal
distribution.
However since the value of n is in the denominator, the excess kurtosis
shrinks toward zero as the number of events increases. So for any large
value of n the likely significance of the kurtosis is small.
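The sign change near p = .21 is easy to see numerically. The short Python sketch below is an added illustration with a made-up function name; it simply evaluates the standard excess kurtosis formula for the binomial distribution at a few probabilities.

```python
def binomial_excess_kurtosis(n, p):
    """Excess kurtosis: (1 - 6p(1 - p)) / (np(1 - p))."""
    return (1 - 6 * p * (1 - p)) / (n * p * (1 - p))

# Negative near p = .5, crossing to positive just below p = .21
for p in (0.5, 0.3, 0.21, 0.1):
    print(p, round(binomial_excess_kurtosis(100, p), 4))
```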
Multinomial Distribution
Up until now we’ve discussed only the binomial distribution. That is, the
distribution with two mutually exclusive outcomes. However you don’t always
have to have only two outcomes. Sometimes you can have more.
For instance, we’ve limited our events to success/failure. But what if the
outcomes could be A, B, or C? Instead of a binomial distribution (“bi”
meaning two), we will use a multinomial distribution.
It turns out that the multinomial equation is nearly the same as the binomial
equation. Let’s take a look at the binomial equation again in a slightly
different way. The binomial equation that we saw before was

$$P = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

Where n is the total number of events, k is the number of successful events,
and p is the probability of a successful event. Let’s rewrite that equation
using pa as the probability of event A (success) and pb as the probability of
event B (failure). We will also use A, B to represent the number of times
event A and B occur respectively. The equation becomes

$$P = \frac{n!}{A!\,B!}\, p_a^A\, p_b^B$$

and we know

$$A + B = n$$

And

$$p_a + p_b = 1$$

This is the exact same equation as the binomial equation except it is more
clearly two distinct events. So what would the equation be for three distinct
outcomes?

$$P = \frac{n!}{A!\,B!\,C!}\, p_a^A\, p_b^B\, p_c^C$$

The first part is just the combination formula for 3 outcomes.

$$\frac{n!}{A!\,B!\,C!}$$

It isn’t common to see the combination formula expressed as 3 or more
outcomes, but it works just fine. The second part of the multinomial equation
is just raising the probability of a single event to the power of the number
of times that event occurs

$$p_a^A\, p_b^B\, p_c^C$$

And note that

$$A + B + C = n$$

And

$$p_a + p_b + p_c = 1$$

Basically all the event counts add up to the total number of events and all
the probabilities add up to 1. Extending the multinomial equation to more
than 3 possible outcomes becomes obvious. 4 outcomes would just be

$$P = \frac{n!}{A!\,B!\,C!\,D!}\, p_a^A\, p_b^B\, p_c^C\, p_d^D$$

and each additional outcome gets included both in the denominator and as a
power.
So How Would You Use This Multinomial Equation?
For starters, visualization is a problem. We don’t have an equivalent to
Pascal’s triangle in arbitrary dimensions. For 3 dimensions one could
visualize a pyramid or a stack of Pascal’s triangles, but it becomes even more
difficult with higher dimensions.
However simply using the equation is straightforward. Let’s say you have a
sack with a large number of balls. (Large so that the probability doesn’t
change as you draw balls from it). 50% of the balls are red, 30% are blue,
and 20% are green.
If you draw 9 balls, what are the odds that you will pull exactly 4 red ones, 3
blue ones, and 2 green ones?
We can plug those values into the equation and get

$$P = \frac{9!}{4!\,3!\,2!} \times 0.5^4 \times 0.3^3 \times 0.2^2 = 1260 \times 0.0000675$$

The result is 8.505%.
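The multinomial equation extends to any number of outcomes, so it is convenient to write it once as a function. The Python sketch below is an added illustration, not from the original text; the name multinomial_probability is invented here, and it assumes Python 3.8 or later for math.prod.

```python
from math import factorial, prod

def multinomial_probability(counts, probabilities):
    """Probability of seeing exactly these counts for each outcome."""
    n = sum(counts)
    ways = factorial(n)
    for count in counts:
        ways //= factorial(count)          # builds n! / (A! B! C! ...)
    chance = prod(p**c for p, c in zip(probabilities, counts))
    return ways * chance

# 4 red, 3 blue, 2 green, with 50% / 30% / 20% of each color
result = multinomial_probability([4, 3, 2], [0.5, 0.3, 0.2])
print(f"{result:.5f}")
```

Passing only two counts and two probabilities reduces this to the ordinary binomial equation, which is a handy sanity check.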


If You Find Bugs & Omissions:
We put some effort into trying to make this book as bug free as possible, and
including what we thought was the most important information. However if
you have found some errors or significant omissions that we should address
please email us here

And let us know. If you do, then let us know if you would like free copies of
our future books. Also, a big thank you!
More Books
If you liked this book, you may be interested in checking out some of my
other books such as

Probability With Permutations And Combinations – Which gives
examples on how to use permutations and combinations. It goes
over the equations in ways rarely seen, such as how to do
combinations or permutations for 3 or more groups.
Bayes Theorem Examples – Which walks through how to update
your probability estimates as you get new information about
things. It gives half a dozen easy to understand examples on how
to use Bayes Theorem
Machine Learning With Random Forests And Decision Trees–
which goes through how Random Forests and Decision Trees
work at a conceptual level, mostly programming language
agnostic.
Thank You
Before you go, I’d like to say thank you for purchasing my eBook. I know
you have a lot of options online to learn this kind of information. So a big
thank you for reading all the way to the end.
If you like this book, then I need your help. Please take a moment to leave
a review on Amazon. It really does make a difference, and will help me
continue to write quality eBooks on Math, Statistics, and Computer Science.

P.S.
I would love to hear from you. It is easy for you to connect with us on
Facebook here
https://www.facebook.com/FairlyNerdy
or on our webpage here
http://www.FairlyNerdy.com
But it’s often better to have one-on-one conversations. So I encourage you to
reach out over email with any questions you have or just to say hi!
Simply write here:

~ Scott Hartshorn
