Pascal’s Triangle
By Scott Hartshorn
Thank You!
Thank you for getting this book! This book walks through how to use the
binomial equation to calculate the likelihood of certain outcomes after a
number of discrete events, for instance how likely you are to roll 3 sixes out
of 12 rolls of a die. One way to visualize the binomial distribution is to use
Pascal’s triangle, and the beginning of this book focuses on using Pascal’s
triangle to calculate outcomes. Later, the book gets into calculating outcomes
with uneven probabilities or outcomes for a large number of events and the
binomial equation is used for those calculations.
The final part of the book is a guide for how to approximate the binomial
results using a normal curve, as well as when it is correct to do so. An
example of when you would want to do that approximation is when there are
a huge number of discrete events and you need to calculate them over a large
range, such as if you are trying to estimate the odds of a politician getting at
least 50% of the votes out of a million votes cast.
If you want to help us produce more material like this, then please leave a
positive review on Amazon. It really does make a difference!
Your Free Gift
As a way of saying thank you for your purchase, I’m offering this
binomial theorem cheat sheet that’s exclusive to my readers.
This FREE PDF cheat sheet has the binomial equation, and an example. It
also shows Pascal’s triangle, which is quite useful for visualizing the
binomial distribution. This is a PDF document that I encourage you to print,
save, and share. You can download it by going here
http://www.fairlynerdy.com/binomial-theorem-cheat-sheet/
Table of Contents
1. Thank You!
2. Binomial Distribution – The Basics
3. Example 1 – Binomial Theorem With Pascal’s Triangle
4. Other Ways To Think Of And To Visualize Pascal’s Triangle
5. For Even Odds – Highest Values In The Middle
6. A “Real Life” Example Of The Binomial Distribution In Action
7. Binomial Theorem With Uneven Odds
8. The Binomial Equation
9. Another Example With The Binomial Equation
10. Binomial Theorem With “At Least” A Number
11. Cumulative Distribution Function
12. Approximation To The Binomial Distribution
13. How To Use Mean And Standard Deviation To Approximate
Binomial Distribution
14. When Is The Normal Curve Approximation Good Enough?
15. Multinomial Distribution
16. So How Would You Use This Multinomial Equation?
17. If You Find Bugs & Omissions:
18. More Books
Binomial Distribution – The Basics
The Binomial Distribution can be used any time there are one or more
discrete events that have exactly two outcomes. An example of a discrete
event is a coin flip, or a roll of a die, or a vote cast by a single voter. Each
event happens and is done. This contrasts to a continuous event such as
learning a new skill, or the tide rolling in, where there is no clear start or stop.
The binomial equation also only applies to events with two mutually
exclusive outcomes. Yes/No, True/False, Success/Failure. Events with 3 or
more outcomes would fall under the multinomial equation, which is covered
at the end of this book.
The purpose of using the binomial equation is to determine how likely a
given outcome is after a series of events. For instance, if you flip a coin 10
times, how likely are you to get 7 heads? If you have a 52% winning
percentage at blackjack, how likely are you to be ahead after 1000 hands?
Results from the binomial equation have a characteristic shape, similar to the
normal curve, shown in the example below.
That shape can change as you change the number of events, or the likelihood
of a given event. A low probability for an event will cause the bulk of the
curve to slide to the left. A high probability for a given event will cause the
bulk of the curve to slide to the right.
The first example below starts simple, with only a few discrete events and a
50% likelihood for those events. Subsequent examples add additional
complexity step by step.
Example 1 – Binomial Theorem With Pascal’s Triangle
In this example suppose that you have a fair coin, and you flip it 10 times.
What is the probability that you will get exactly 4 heads?
There are a couple of different ways we could solve this problem, but I’m
going to use Pascal’s triangle, since it is simple and intuitive. Later examples
will use the Binomial equation, since it is more powerful, but harder to
remember.
The top row represents having no events, and has only the single digit 1.
This is saying that if you don’t do any events then there is only one possible
outcome (having no successes). The next row, which we have labeled Row
1, has two outcomes. It represents a single event, for instance a single flip of
a coin. The two outcomes are the number of times you will get zero heads
after that flip of the coin, and the number of times you will get 1 head after
that flip. The next row, which we have labeled Row 2, has 3 possible
outcomes. For the coin example, this would be getting zero heads, getting
one head, or getting two heads in two flips of the coin.
Note that the total sum of Row 2 is 4. This means that if you flip a fair coin
twice, and repeat that pair of flips 4 times, the most likely result is to get zero
heads on one pair of flips, one head on two pairs of flips, and two heads on
one pair of flips. However, maybe you don’t want to do 4 pairs of flips,
maybe you only want to do one pair of flips and want to know your odds of
getting a pair of heads, a head and a tail, or a pair of tails. In that case you
can normalize the results. The total sum of Row 2 is 4. If we divide all the
values in Row 2 by 4 they become .25, .50, and .25. This means that if you
do a single pair of flips, your chance of getting zero heads is 25%, your
chance of getting a head and a tail is 50%, and your chance of getting two
heads is 25%.
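The normalization step described above can be sketched in a few lines of Python. This is only an illustration; the variable names are made up for this example.

```python
# Row 2 of Pascal's triangle: ways to get 0, 1, or 2 heads in two flips
row_2 = [1, 2, 1]

# The row sums to 4, the number of equally likely flip sequences
total = sum(row_2)

# Dividing each count by the row total turns counts into probabilities
probabilities = [count / total for count in row_2]

print(probabilities)  # [0.25, 0.5, 0.25]
```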
Naturally, Pascal’s triangle doesn’t stop at Row 2. It continues down
indefinitely for as long as you wish to continue calculating it, with each
subsequent row representing adding another event. This is the same as
saying that you could continue flipping coins and tabulating the outcomes for
as long as you wish. You could always do another flip or add another row.
You can calculate each row from the row above it. Each number in a row is
the sum of the two numbers directly above it. For instance, the 2 in Row 2 is
the sum of the two 1’s above it. The two 1’s on the edges of row 2 only have
a single number directly above them. Those outside edges of the triangle will
remain 1 the whole way down.
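As a rough sketch of that rule, the function below builds each row from the row above it. The function name `pascal_rows` is hypothetical and chosen just for this example.

```python
def pascal_rows(num_events):
    """Return rows 0..num_events of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(num_events):
        above = rows[-1]
        # Pad with a zero on each side so the edge cells only see a single 1
        padded = [0] + above + [0]
        # Each new number is the sum of the two numbers directly above it
        rows.append([padded[i] + padded[i + 1] for i in range(len(padded) - 1)])
    return rows


for row in pascal_rows(5):
    print(row)
```

The last row printed is 1, 5, 10, 10, 5, 1, which is the row for 5 events.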
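Here is Pascal’s triangle continued down to 5 events
Row 0: 1
Row 1: 1, 1
Row 2: 1, 2, 1
Row 3: 1, 3, 3, 1
Row 4: 1, 4, 6, 4, 1
Row 5: 1, 5, 10, 10, 5, 1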
Seeing it extended down this far makes it obvious how the numbers are the
sum of the numbers above them, and also obvious that the biggest numbers in
the triangle are going to be at the center of any given row, with the smaller
numbers on the outside of the row.
Other Ways To Think Of And To Visualize Pascal’s Triangle
There are a couple of ways to think about Pascal’s triangle. The first way is
what was described above: each value is the sum of the numbers above it.
The second way is that each number represents the value of the combination
formula where a number’s row and column corresponds to the number of
events and number of successes. This way of utilizing Pascal’s triangle is
quite useful, and we will go over it later in the book. The third way of
thinking about Pascal’s triangle we will touch on briefly here, because it is a
very intuitive way of thinking about it.
And that way is that each value in Pascal’s triangle represents the number of
paths you can take to reach it. For instance, the 2 in Row 2 can be reached
by going left and then right, or right and then left. Either of the 3’s in the
third row can be reached 3 ways.
If you think of these paths as events, such as flips of a coin, this is showing
that if you flip a coin three times there are 3 different ways that you can get a
single head. Each rightward arrow represents a success (heads) and each
leftward arrow represents a failure (tails). Each distinct path is a different
permutation of the events.
The values are the same as in the other version of Pascal’s triangle. For
instance, row 5 has values of 1, 5, 10, 10, 5, and 1. But in this version each
number is the sum of the number above it and the number to the left and
above it. It happens that this modified format for Pascal’s triangle is easier
to create in tables such as the ones that Excel uses, so this book will tend to
use that version.
Many of the charts and tables in this book have been created in Excel. You
can download all of them, for free, here
http://www.fairlynerdy.com/binomial-theorem-examples/
For Even Odds – Highest Values In The Middle
We observed that the highest values in Pascal’s triangle were in the middle of
any given row. Let’s look at why that is, using coin flips as an example. If
we go down to 3 flips, we see that the values in the row are 1, 3, 3, 1
Since the sum of those numbers is 8, this means that there are 8 possible
orders that all the flips could be in. The values in Pascal’s triangle and the
binomial equation correspond to the Combination values, where order is not
important. This is as opposed to Permutation values where the order is
important. If we list out all 8 possible permutations for 3 flips, they are
T T T ( 0 Heads )
H T T ( 1 Head )
T H T ( 1 Head )
T T H ( 1 Head )
T H H ( 2 Heads )
H T H ( 2 Heads )
H H T ( 2 Heads )
H H H ( 3 Heads )
And we can see the same values in these permutations as we did in Pascal’s
triangle. For either no heads, or all heads, there is only 1 possible way to get
that outcome: keep flipping either all heads or all tails. However, if you can
have some of the flips be heads and some be tails, then there are multiple
ways to get that outcome, and it occurs more frequently.
The probability of getting either 0, 1, 2, or 3 heads out of 3 flips is the total
number of times each outcome occurred, divided by the 8 possible outcomes.
This is
0 Heads: 1 / 8 = 12.5%
1 Head: 3 / 8 = 37.5%
2 Heads: 3 / 8 = 37.5%
3 Heads: 1 / 8 = 12.5%
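The same count can be checked by brute force. The sketch below lists every sequence of 3 flips and tallies the heads in each; it is only an illustration using the Python standard library.

```python
from collections import Counter
from itertools import product

flips = 3

# Every ordered sequence of heads and tails, e.g. ('H', 'T', 'T')
sequences = list(product("HT", repeat=flips))

# Count how many sequences contain 0, 1, 2, or 3 heads
heads_counts = Counter(seq.count("H") for seq in sequences)

# Divide by the 8 possible sequences to get probabilities
probabilities = {
    heads: heads_counts[heads] / len(sequences) for heads in range(flips + 1)
}

print(probabilities)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```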
With 10 events, and a 50% likelihood for a single event, we can pull the
distribution of binomial results from row 10 of Pascal’s triangle
1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1
Those results are plotted below
With a 50% likelihood, the most frequent results are in the center of the range
of outcomes. After 10 flips there are 1024 possible outcomes, and 252 of
those outcomes have 5 heads / 5 tails. So if you were to guess the number of
heads after 10 flips, you would guess 5 as the most likely outcome, even
though it is only a 24.6% likelihood overall.
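Those numbers can be reproduced with Python's built-in `math.comb`, which returns the same values as the cells of Pascal's triangle. This is a quick check, not a required step.

```python
from math import comb

n = 10

# Row 10 of Pascal's triangle, one value per possible number of heads
row_10 = [comb(n, k) for k in range(n + 1)]

total_outcomes = sum(row_10)                 # 1024, the same as 2**10
prob_five_heads = row_10[5] / total_outcomes

print(row_10[5], total_outcomes)             # 252 1024
print(round(prob_five_heads, 3))             # 0.246
```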
As the number of events increases, the overall shape remains more or less the
same, however it spreads out farther to the right to encompass more possible
outcomes. Since there are more possible outcomes, the likelihood of any
individual outcome decreases. The chart below shows the binomial
distribution at a 50% likelihood for 6 events, 10 events, and 14 events. The
distribution for each line has been normalized so that the total probability
(area under each curve) is 1.0. You can see how adding more events
stretches out the distribution and makes it less tall.
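To see the flattening numerically, the sketch below normalizes the rows for 6, 10, and 14 events and prints the tallest value in each. The helper name `even_odds_distribution` is made up for this example.

```python
from math import comb


def even_odds_distribution(n):
    """Row n of Pascal's triangle divided by 2**n, so the values sum to 1."""
    return [comb(n, k) / 2**n for k in range(n + 1)]


for n in (6, 10, 14):
    dist = even_odds_distribution(n)
    # The peak shrinks as n grows, while the total probability stays at 1
    print(n, round(max(dist), 4), round(sum(dist), 4))
```

The peak drops from about .31 at 6 events to about .21 at 14 events, while each row still sums to 1.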
One interesting observation is that this binomial distribution looks a lot like a
normal distribution. This is a fact that we will take advantage of later in the
book in order to approximate binomial results without having to calculate all
of them for large numbers of events.
Bean Machine
The Bean Machine was a late 1800s invention that used balls bouncing left or
right off of rows of pegs to generate random numbers that follow the binomial
distribution. One example of a bean machine is shown here
Photo by Antoine Taveneaux CC BY-SA 3.0
This device used the same principle of left-right choices to divide the balls
into slots resulting in a binomial distribution.
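A bean machine is easy to imitate in code. The sketch below is a simple simulation under assumed settings (10 rows of pegs, 10,000 balls, and a fixed random seed so the run is repeatable); none of these numbers come from the real device.

```python
import random
from collections import Counter


def bean_machine(num_balls, num_rows, p_right=0.5, seed=0):
    """Drop balls through rows of pegs and count how many land in each slot."""
    rng = random.Random(seed)
    slots = Counter()
    for _ in range(num_balls):
        # At each peg the ball bounces right with probability p_right
        rights = sum(rng.random() < p_right for _ in range(num_rows))
        # The slot index is simply the number of rightward bounces
        slots[rights] += 1
    return [slots[k] for k in range(num_rows + 1)]


counts = bean_machine(num_balls=10_000, num_rows=10)
print(counts)
```

The middle slots collect the most balls, and the counts roughly follow row 10 of Pascal's triangle scaled to 10,000 balls.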
Binomial Theorem With Uneven Odds
As promised, it is time to look at events that do not have an equal chance of
success. To simulate this, instead of flipping a coin we will think of rolling a
die.
A typical die has 6 sides. The odds of rolling a given number on that die are
1 in 6, so 16.7%. To make the numbers a little bit more round, let’s assume
that you are using a 4 sided die, which does exist. A success for you is to roll
a 4, anything else is a failure. That means the success rate is 1 in 4, so 25%.
Why is rolling a 4 considered a success? Well truthfully, mathematicians,
such as you and I, don’t concern ourselves overly much with “How can we
relate this to the real world?” kind of thoughts, but for this series of
problems we can assume that we are simulating a role playing game, such as
Dungeons and Dragons, and you need to get hits with your 4 sided die.
You roll the 4 sided die 7 times. What are the chances of getting exactly 2
hits?
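We can solve that question directly with the binomial equation, which is
P(k) = C(n, k) * p^k * (1-p)^(n-k)
where C(n, k) is the combination formula. We can also solve it with a version
of Pascal’s triangle modified for a 25% chance of success.
To answer our question, we can see from row 7 column 3 of this triangle that
the odds of getting exactly 2 hits out of 7 rolls with the 4 sided die are 31.1%.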
This version of Pascal’s triangle was simple to generate. Each cell is 75% of
the value above it added to 25% of the value above it to the left. An
immediate difference that you will notice in this version of Pascal’s triangle
vs the standard version at a 50% probability is that these numbers are smaller
than one, and they are also decimals instead of integers.
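The 75% / 25% rule can be sketched directly. In the code below the function name `weighted_pascal_rows` is hypothetical, and the column index counts successes starting from zero, so 2 hits is index 2.

```python
def weighted_pascal_rows(num_events, p_success):
    """Build the modified triangle where every row sums to 1.0."""
    rows = [[1.0]]
    for _ in range(num_events):
        above = rows[-1]
        new_row = []
        for k in range(len(above) + 1):
            # A failure keeps the same number of successes (the cell above)
            from_above = above[k] * (1 - p_success) if k < len(above) else 0.0
            # A success adds one (the cell above and to the left)
            from_above_left = above[k - 1] * p_success if k > 0 else 0.0
            new_row.append(from_above + from_above_left)
        rows.append(new_row)
    return rows


rows = weighted_pascal_rows(7, 0.25)
print(round(rows[7][2], 3))  # 0.311
```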
The fact that these numbers are decimals instead of integers is simply because
an arbitrary fraction does not necessarily have a clean integer ratio, so it is
often simpler to work with decimals. Pascal’s triangle with even odds of 50%
could easily scale as 1+1=2. That is why in the normal version of Pascal’s
triangle each row is double the value of the row above it. In this modified
triangle, the fact that the numbers are smaller than 1 is because the sum of
each row is equal to the sum of the row above it, i.e. 1.0. This is different
from the standard Pascal’s triangle. In the standard version, the sum of each
row is double that of the row above it, and you divide by 2 to the power of the
row number to normalize it. If we wanted, we could multiply each row by 2
to the appropriate power to get a scaled value.
This version of Pascal’s triangle was intended mainly to highlight how each
subsequent cell is formed. Namely that every directly downward step is
equivalent to multiplying by .75, and every downward and rightward step is
equivalent to multiplying by .25. In addition to those multiplications, they
are also multiplied by the standard values in Pascal’s triangle which
correspond to how many routes could be taken to get to a given cell.
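That means that to calculate the value of any cell in this triangle, we simply
need to know what row it is in, and how many successes we have had.
The value of any cell in this modified Pascal’s triangle is equal to
(number of routes to the cell) * .25^(number of successes) * .75^(number of failures)
If n is the number of events, and k is the number of successes, the formula for
any cell in this modified Pascal’s triangle is
C(n, k) * .25^k * .75^(n-k)
The Binomial Equation
Replacing the 25% with a general probability of success p gives the binomial
equation
P(k) = C(n, k) * p^k * (1-p)^(n-k)
The first part of the equation, P(k),
just shows that we are going to be calculating a result for a single specific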
value in the binomial distribution. This equation doesn’t show all the
possible outcomes of flipping a coin 10 times. It doesn’t show how often you
will get 0 heads, and 1 head, and 2 heads etc. This equation only shows the
probability for a single one of those outcomes. For instance you could use it
to calculate the probability of getting exactly 4 heads in 10 flips. Of course,
you could then use it again to also calculate the probability of 3 heads, and
again for 5 heads and again for 6 heads etc.
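As a minimal sketch, and assuming the equation takes its standard form (combination value times p^k times (1-p)^(n-k)), a single outcome can be computed like this. The function name `binomial_pmf` is made up for this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n events, each with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# One call gives one outcome: exactly 4 heads in 10 flips
four_heads = binomial_pmf(4, 10, 0.5)
print(round(four_heads, 4))  # 0.2051

# Calling it once per outcome gives the whole distribution
all_outcomes = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(round(sum(all_outcomes), 4))  # 1.0
```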
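As far as what the letters mean
n is the total number of events
k is the number of successes
p is the probability of success for a single event
C(n, k) is the combination formula, n! / (k! * (n-k)!)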
For this book, we are going to use the combination formula without diving
very much more deeply into why it works the way it does. But the
combination formula, and permutation formula, have interesting properties in
their own right. If you are interested in them, you may want to check out my
book Probability with Permutations and Combinations on Amazon.
For example, 5 pick 2 is 5! / (2! * 3!) = 120 / (2 * 6) = 10. The value of 10
from the combination formula of 5 pick 2 turns out to be the same value that is
in Pascal’s triangle for 5 events, 2 successes.
If you were to solve the combination formula for 5 events and 0 successes, 1
success, 2 successes etc. all the way through 5 successes, you would
duplicate the 1, 5, 10, 10, 5, and 1 values that are in Pascal’s triangle.
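Here is a short check of that claim, written with factorials so the combination formula n! / (k! * (n-k)!) is visible. The function name `combinations` is chosen just for this sketch; Python's built-in `math.comb` returns the same values.

```python
from math import factorial


def combinations(n, k):
    """Number of ways to pick k successes out of n events, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))


row_5 = [combinations(5, k) for k in range(6)]
print(row_5)  # [1, 5, 10, 10, 5, 1]
```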
The third part of the equation is
p^k * (1-p)^(n-k)
This part is the probability of the event occurring raised to the power of the
number of times it occurs, multiplied by the probability of the event not
occurring raised to the power of the number of times it didn’t occur.
For instance, if you had an event that had a 30% probability of occurring and
it happened 4 times in 10 trials, this part of the equation would be
.3^4 * .7^6 = .000953
This is the part of the equation that normalizes the results. When we looked
at the probability of getting heads or tails after 3 flips based on Pascal’s
triangle, we started with the 1, 3, 3, 1 in row 3.
And then we divided those numbers by 8 to get 12.5%, 37.5%, 37.5%, and
12.5%. The p^k * (1-p)^(n-k) part of the equation is what was doing that
dividing by 8. In that example, the likelihood of a successful outcome was
50%. That meant that both p and 1-p were equal values of .5. Therefore this
part of the equation
p^k * (1-p)^(n-k)
became
.5^k * .5^(3-k) = .5^3 = 1/8
n is the total number of events, 100 hands in this case. k is the number of
hands that you won, 51, and p is the odds of winning a given hand, .55.
The result in this case is .0577, or 5.77%, which is the probability of you
winning exactly 51 hands.
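Plugging those three numbers into the standard form of the binomial equation can be done in a couple of lines. This is a sketch for checking the arithmetic, not the only way to compute it.

```python
from math import comb

n = 100   # total hands played
k = 51    # hands won
p = 0.55  # chance of winning a single hand

# Combination value times the probability of one specific ordering
probability = comb(n, k) * p**k * (1 - p)**(n - k)

print(round(probability, 4))  # 0.0577
```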
Binomial Theorem With “At Least” A Number
The previous example showed the probability of getting exactly 51 wins out
of 100 bets. Honestly that example felt a little bit contrived, because in the
real world what you are more frequently concerned with is “at least” or “no
more than” rather than an exact value. The gambler in the previous example
is more likely to care about winning at least 51 bets, so that he knows he
leaves a winner for the day, rather than winning exactly 51 bets.
The solution to an at least problem is simply to solve the exact number
problem multiple times, and add up all the probabilities. For instance, if you
had the problem of calculating the odds of getting 4 or fewer heads in 10
flips, you would solve for the odds of getting 0 heads, and separately solve
for the odds of getting 1, and 2, and 3, and 4. Adding up all of those odds
would give you the odds of four or fewer.
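That add-them-up approach is just a loop. In the sketch below the helper names `prob_at_most` and `prob_at_least` are made up for this example, and the binomial equation is assumed to be in its standard form.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n events."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def prob_at_most(k_max, n, p):
    """Add up the exact probabilities for 0 through k_max successes."""
    return sum(binomial_pmf(k, n, p) for k in range(k_max + 1))


def prob_at_least(k_min, n, p):
    """Add up the exact probabilities for k_min through n successes."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


four_or_fewer = prob_at_most(4, 10, 0.5)
print(round(four_or_fewer, 4))  # 0.377

# The gambler's question: at least 51 wins out of 100 hands at 55%
at_least_51_wins = prob_at_least(51, 100, 0.55)
print(round(at_least_51_wins, 3))
```

The value .377 is (1 + 10 + 45 + 120 + 210) / 1024, taken from the first five cells of row 10 of Pascal's triangle.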
Visualizing that using Pascal’s triangle
However in the previous section, when we started talking about summing that
probability mass function over a range of values, what we get is the
“cumulative distribution function”. This function is the running sum of all
the probabilities up to a certain number. It always starts at zero and goes to
one, and has a different characteristic shape similar to a stretched out S.
We can shift the cumulative distribution function left or right by decreasing
or increasing the probability of the event respectively.
And we can stretch the cumulative distribution out by increasing the number
of events
So one way to think of problems such as “What are the odds that you will get
no more than 6 heads in 10 coin flips?” is that you are generating a
cumulative distribution, and then selecting a value off of it.
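A running sum like that is easy to build with `itertools.accumulate`. The sketch below generates the cumulative distribution for 10 coin flips and reads one value off of it.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.5

# Probability of exactly k heads, for k = 0 through 10
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Running sum of those probabilities: the cumulative distribution
cdf = list(accumulate(pmf))

# "No more than 6 heads" is the cumulative value at index 6
no_more_than_6 = cdf[6]
print(round(no_more_than_6, 4))  # 0.8281
```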
Approximation To The Binomial Distribution
Up until now we have shown the exact results for the binomial theorem.
However there are times when it is more useful to have an approximation.
For instance, let’s say that there was an election and you knew any single
voter had a 48% chance of voting for a specific candidate. You also know
that 1,000 votes will be cast.
What are the odds that the candidate will get at least 50% of the vote?
To get the exact solution, we could use the binomial equation for when there
are uneven odds
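The mean of the binomial distribution is the number of events multiplied by
the probability of success for a single event, n * p.
I.e. if there is a 40% chance of success from a single event, then after 10
events the average number of successful outcomes will be 4.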
Standard Deviation Of The Binomial Distribution
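The standard deviation of the binomial distribution actually turns out to be a
surprisingly simple equation, although deriving the equation is more
challenging than for the mean value.
The variance of the binomial distribution is
variance = n * p * (1-p)
and the standard deviation is the square root of the variance
standard deviation = sqrt(n * p * (1-p))
For 5 events with a 50% probability of success, this is
sqrt(5 * .5 * .5) = sqrt(1.25) = 1.118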
Now let’s see what the results would be if we actually did the binomial
expansion and calculated the standard deviation.
From Pascal’s triangle, here is the binomial expansion up to 5 events. The
row for 5 events is 1, 5, 10, 10, 5, 1
This says that we would have 1 time with 0 successes, 5 times with 1 success,
10 times with 2 successes, 10 times with 3 successes, 5 times with 4
successes and 1 time with 5 successes. If we listed all 32 of those numbers
out we would have
0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
5
The equation for population standard deviation is
standard deviation = sqrt( sum( (x - u)^2 ) / n )
Here u is the mean value, n is the total number of values (32 in this case) and
x is each individual value. Note that we are using population standard
deviation as opposed to sample standard deviation. That makes the
denominator of the above equation n as opposed to (n-1). We are using the
population equation because we are taking the standard deviation of the entire
set of data, not just a sampled subset.
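The mean value is 2.5, which you can get from the average of all of those
values, or using the previous equation of
mean = n * p = 5 * .5 = 2.5
You can get standard deviation out of nearly any spreadsheet software, or
mathematical tool. In Excel it would be =STDEV.P(). However the manual
calculation is shown below and matches the value of 1.118 we got
previously.
sum of (x - u)^2 = 1 * 6.25 + 5 * 2.25 + 10 * .25 + 10 * .25 + 5 * 2.25 + 1 * 6.25 = 40
standard deviation = sqrt(40 / 32) = sqrt(1.25) = 1.118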
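The same manual calculation can be sketched in Python by expanding row 5 of Pascal's triangle into the list of 32 values and applying the population standard deviation formula. The variable names are chosen just for this illustration.

```python
from math import sqrt

# Row 5 of Pascal's triangle: how many times each success count occurs
counts = [1, 5, 10, 10, 5, 1]

# Expand into the 32 individual values: 0, 1, 1, 1, 1, 1, 2, 2, ...
values = [successes for successes, count in enumerate(counts) for _ in range(count)]

mean = sum(values) / len(values)

# Population variance: divide by n, not by (n - 1)
variance = sum((x - mean) ** 2 for x in values) / len(values)
std_dev = sqrt(variance)

print(mean, round(std_dev, 3))  # 2.5 1.118
```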
Just to be sure that the formula works for all cases, let’s do it one more time
with 5 events, but this time use a 30% probability of success instead of a 50%
probability.
Based on our equation
standard deviation = sqrt(5 * .3 * .7) = sqrt(1.05) = 1.025
In this case the modified Pascal’s triangle (scaled so the sum of each row is
2^n) would have the following values in row 5
5.378, 11.525, 9.878, 4.234, .907, .078
The mean value is .3 * 5 = 1.5
In this case we have 5.378 times that we had 0 successes, 11.525 times that
we had 1 success, 9.878 times that we had 2 successes, 4.234 times that we
had 3 successes, .907 times that we had 4 successes and .078 times that we
had 5 successes.
If we plug those numbers into the standard deviation calculation the result
matches the value from our previous equation.
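Here is one way to run that check, treating the scaled values as weights in a weighted standard deviation. It is a sketch that assumes the standard formula sqrt(n * p * (1-p)) for comparison.

```python
from math import comb, sqrt

n, p = 5, 0.3

# Row 5 of the modified triangle, scaled so the row sums to 2**n = 32
weights = [2**n * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
total = sum(weights)

# Weighted mean and weighted population standard deviation
mean = sum(k * w for k, w in enumerate(weights)) / total
variance = sum(w * (k - mean) ** 2 for k, w in enumerate(weights)) / total
std_dev = sqrt(variance)

formula_std_dev = sqrt(n * p * (1 - p))

print([round(w, 3) for w in weights])
print(round(mean, 3), round(std_dev, 3), round(formula_std_dev, 3))
```

Both standard deviations come out to about 1.025, and the mean is 1.5.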
This is a cumulative Z-table from the left. The result for 1.23 from this Z-
table is .8907. That tells us that there is an 89.07% chance that the total votes
the candidate will get will be less than 1.23 standard deviations above the
mean. That means there is a 10.93% chance the candidate will have more
than 1.23 standard deviations above the mean in votes.
That is the solution to our problem, there is a 10.93% chance of getting at
least 500 votes.
If we didn’t want to use a Z-table, we could use Excel directly to get the result
from the Z-value. Plugging in the 1.234 + additional decimal places into the
Excel function =NORM.S.DIST(Z-value,TRUE) gives the cumulative
probability the same as the Z-table. Subtracting that from 1, Excel gives the
final result to be 10.855%, which is slightly different from the Z-table result
due to our rounding 1.234 + additional decimal places to 1.23.
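The whole approximation can be sketched in Python with `statistics.NormalDist`. One assumption is labeled in the code: the Z-value of 1.234 is reproduced here by measuring from 499.5 votes rather than 500 (a continuity correction), which is inferred from the numbers rather than stated in the text.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.48
votes_needed = 500

mean = n * p                      # 480 expected votes
std_dev = sqrt(n * p * (1 - p))   # about 15.8 votes

# Assumption: continuity correction, so "at least 500" starts at 499.5
z = (votes_needed - 0.5 - mean) / std_dev

# Area under the standard normal curve to the right of z
approx = 1 - NormalDist().cdf(z)

# Exact binomial answer for comparison: add up every outcome from 500 to 1000
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(votes_needed, n + 1))

print(round(z, 3))       # 1.234
print(round(approx, 4))
print(round(exact, 4))
```

The approximate and exact results agree closely, which is the point of using the normal curve when the number of events is this large.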
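The rule of thumb used in the examples below is that the normal approximation
is reasonable when np and n(1-p) are both at least 10.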
The very best approximations occur when p is near .5, n is very large, and
you are looking near the middle of the distribution instead of at the very
edges.
For instance, here we have 100 events with a probability of .4
As you can see, the normal distribution and binomial distribution match quite
well. This fits our rule of thumb since np = 40 and n(1-p) = 60, both of
which are greater than 10.
Let’s look at a plot that should be at the edges of our rule of thumb. Here
p = .25 and n = 50, so np = 12.5. This match is adequate, but not quite as good
a fit as the one above.
As we continue to decrease the number of events, this time with n=10 and p =
.25, the normal distribution is not a very good fit for the binomial
distribution, as shown below
Of course, if you only have 10 events, it is simple to calculate the full
binomial expansion and not worry about a normal approximation.
So far what we have seen is that with a large n and a p near .5 there is a good
match, and with a very small n there isn’t a good match. What about with a
large n but a small p? Here is an example with n = 1000 and p = .01. This
has a large enough number of events that you might want to approximate it,
and just matches our rule of thumb with np = 10.
Once again, as with the previous example on the boundary of our rule of
thumb, we see that this is a fair approximation. With a larger n, say 2500
and p = .01, it becomes an even better approximation
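The comparisons above can be made numerically as well as visually. The sketch below checks each example against the rule of thumb (np and n(1-p) both at least 10, as used in the text) and reports the largest gap between the binomial value and the normal curve. The function names are hypothetical, and logarithms are used so that large values of n do not overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def binomial_pmf(k, n, p):
    """Binomial probability computed in log space to handle large n."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1 - p))


def normal_pdf(x, mean, std_dev):
    """Height of the normal curve with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / std_dev) ** 2) / (std_dev * sqrt(2 * pi))


def compare(n, p):
    """Return (meets rule of thumb, largest gap between the two curves)."""
    mean = n * p
    std_dev = sqrt(n * p * (1 - p))
    worst_gap = max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, std_dev))
        for k in range(n + 1)
    )
    meets_rule = n * p >= 10 and n * (1 - p) >= 10
    return meets_rule, worst_gap


for n, p in [(100, 0.4), (50, 0.25), (10, 0.25), (1000, 0.01), (2500, 0.01)]:
    meets_rule, worst_gap = compare(n, p)
    print(n, p, meets_rule, round(worst_gap, 4))
```

Only the n = 10 case fails the rule of thumb, and it also shows the largest gap of the five.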
Skewness and Kurtosis
In addition to mean and standard deviation, there are two other measures of a
distribution that you can play with in order to make a normal curve better
match the binomial distribution. However these measures could be more
hassle than they are worth, so might only be worthwhile if you need to be
really accurate with your approximation. These measures are skewness and
kurtosis
Skewness
Skewness describes whether one of the tails of the curve is longer than the
other. For instance, in the binomial distribution, if there is a 30% likelihood
of a single event, and you do 10 events, the mean will be at 3. The left tail will
go from 0 to 3, and the right tail will go from 3 to 10. This is known as skewed
right, because the right tail is longer.
In the picture below, the blue curve for 50% likelihood has no skew, and the
red curve for 30% likelihood is skewed right.
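Skewness can also be measured as a number. The sketch below computes it directly from the binomial probabilities as the third central moment divided by the standard deviation cubed, which is the usual definition; the function name is made up for this example.

```python
from math import comb


def binomial_skewness(n, p):
    """Skewness computed directly from the binomial probabilities."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    third_moment = sum((k - mean) ** 3 * prob for k, prob in enumerate(pmf))
    # Dividing by the standard deviation cubed makes the result unitless
    return third_moment / variance**1.5


print(round(binomial_skewness(10, 0.5), 3))  # zero skew for 50% likelihood
print(round(binomial_skewness(10, 0.3), 3))  # 0.276, positive means skewed right
```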
Kurtosis
Another measure of a distribution is the kurtosis. The kurtosis measures how
much of the curve lies in the tails vs. in the middle. A distribution with a
large kurtosis has a “fat tail”, that is seemingly unlikely events are more
likely to occur than you would think.
The standard normal distribution has a kurtosis of 3. The kurtosis of other
distributions is often calculated to see if it is greater than or smaller than
the kurtosis of the standard normal distribution. This calculated difference is
known as “Excess Kurtosis”. If it is positive then the curve has more values
in the tail than the normal curve, if it is negative then the curve has fewer
values in the tail.
The equation for Excess Kurtosis of the binomial distribution is
excess kurtosis = (1 - 6 * p * (1-p)) / (n * p * (1-p))
The binomial distribution with p = .5 has a slightly negative excess kurtosis.
That means you are less likely to get outcomes that are many standard
deviations away from the mean than you would be with a normal distribution.
Once p gets below .21 or above .79 the excess kurtosis is positive, which means
unlikely events are slightly more likely than they would be in the standard
normal distribution.
However since the value of n is in the denominator, the size of the excess
kurtosis decreases as the number of events increases. So for any large
value of n the likely significance of the kurtosis is small.
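These statements can be checked with the standard formula for the excess kurtosis of a binomial distribution, (1 - 6p(1-p)) / (np(1-p)). The sketch below is a quick check with a made-up function name.

```python
def binomial_excess_kurtosis(n, p):
    """Standard excess kurtosis formula for the binomial distribution."""
    q = 1 - p
    return (1 - 6 * p * q) / (n * p * q)


# Slightly negative at p = .5
print(round(binomial_excess_kurtosis(10, 0.5), 3))    # -0.2

# Positive once p drops below about .21
print(round(binomial_excess_kurtosis(10, 0.1), 3))    # 0.511

# Much smaller when n is large, because n is in the denominator
print(round(binomial_excess_kurtosis(1000, 0.1), 5))  # 0.00511
```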
Multinomial Distribution
Up until now we’ve discussed only the binomial distribution. That is, the
distribution with two mutually exclusive outcomes. However you don’t always
have to have only two outcomes. Sometimes you can have more.
For instance, we’ve limited our events to success/failure. But what if the
outcomes could be A, B, or C? Instead of a Bi-nomial distribution, meaning
two, we will use a multinomial distribution.
It turns out that the multinomial equation is nearly the same as the binomial
equation. Let’s take a look at the binomial equation again in a slightly
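different way. The binomial equation that we saw before was
P = n! / (k! * (n-k)!) * p^k * (1-p)^(n-k)
If we call the number of successes k1, the number of failures k2, and their
probabilities p1 and p2, this can be written as
P = n! / (k1! * k2!) * p1^k1 * p2^k2
and we know
k1 + k2 = n
And
p1 + p2 = 1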
This is the exact same equation as the binomial equation except it is more
clearly two distinct events. So what would the equation be for three distinct
outcomes?
P = n! / (k1! * k2! * k3!) * p1^k1 * p2^k2 * p3^k3
And
k1 + k2 + k3 = n
p1 + p2 + p3 = 1
Basically all the event counts add up to the total number of events and all the
probabilities add up to 1. Extending the multinomial equation to more than 3
possible outcomes becomes obvious. 4 outcomes would just be
P = n! / (k1! * k2! * k3! * k4!) * p1^k1 * p2^k2 * p3^k3 * p4^k4
and each additional outcome gets included both in the denominator and as a
power.
So How Would You Use This Multinomial Equation?
For starters, visualization is a problem. We don’t have an equivalent to
Pascal’s triangle in arbitrary dimensions. For 3 dimensions one could
visualize a pyramid or a stack of Pascal’s triangles, but it becomes even more
difficult with higher dimensions.
However simply using the equation is straightforward. Let’s say you have a
sack with a large number of balls. (Large so that the probability doesn’t
change as you draw balls from it). 50% of the balls are red, 30% are blue,
and 20% are green.
If you draw 9 balls, what are the odds that you will pull exactly 4 red ones, 3
blue ones, and 2 green ones?
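We can plug those values into the equation and get
9! / (4! * 3! * 2!) * .5^4 * .3^3 * .2^2 = 1260 * .0000675 = .085
So there is about an 8.5% chance of drawing exactly that combination.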
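The ball-drawing example can be worked in a few lines of Python. This sketch assumes the standard multinomial form, n! divided by the factorial of each count, times each probability raised to its count; the function name is hypothetical.

```python
from math import factorial, prod


def multinomial_probability(counts, probabilities):
    """Probability of seeing exactly these counts for each outcome."""
    n = sum(counts)
    # Number of distinct orderings: n! divided by the factorial of each count
    ways = factorial(n)
    for count in counts:
        ways //= factorial(count)
    # Probability of any one specific ordering
    one_ordering = prod(p**c for p, c in zip(probabilities, counts))
    return ways * one_ordering


# 4 red, 3 blue, 2 green when the sack is 50% red, 30% blue, 20% green
result = multinomial_probability([4, 3, 2], [0.5, 0.3, 0.2])
print(round(result, 5))  # 0.08505
```

With only two outcomes this reduces to the binomial equation, so the same function can be used for both.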
And let us know. If you do, then let us know if you would like free copies of
our future books. Also, a big thank you!
More Books
If you liked this book, you may be interested in checking out some of my
other books such as
P.S.
I would love to hear from you. It is easy for you to connect with us on
Facebook here
https://www.facebook.com/FairlyNerdy
or on our webpage here
http://www.FairlyNerdy.com
But it’s often better to have one-on-one conversations. So I encourage you to
reach out over email with any questions you have or just to say hi!
Simply write here:
~ Scott Hartshorn