
Section 6.3
Binomial and Geometric Random Variables
Introduction
We frequently encounter experimental
situations where there are two
outcomes of interest
–Boy/girl
–Makes a shot/misses
–Meets requirements/doesn’t meet
–Flip coin—kicking off/receiving
Binomial Settings
When the same chance process is repeated several times, we are often
interested in whether a particular outcome does or doesn’t happen on
each repetition. Some random variables count the number of times the
outcome of interest occurs in a fixed number of repetitions. They are
called binomial random variables.
A binomial setting arises when we perform several independent trials of
the same chance process and record the number of times that a
particular outcome occurs. The four conditions for a binomial setting are:

B• Binary? The possible outcomes of each trial can be classified as
“success” or “failure.”
I• Independent? Trials must be independent; that is, knowing the result of
one trial must not tell us anything about the result of any other trial.
N• Number? The number of trials n of the chance process must be fixed in
advance.
S• Success? There is the same probability p of success on each trial.
Binomial Distribution
• Binomial distribution – the distribution
of the count X of successes in the
binomial setting with parameters n
and p
• n = number of observations
• p = probability of success
We say X is B(n, p).
Determine whether the random variables below have a
binomial distribution. Justify your answer
1. Roll a fair die 10 times and let X = the number of 6s.
Yes: B - success = 6, failure = 1-5
I – knowing the outcome of one roll doesn’t influence the outcome of any other rolls
N - n = 10 trials
S – prob is always the same 1/6
2. Shoot a basketball 20 times from various distances on the
court. Let Y = number of shots made.
No: B - success = make the shot, failure = don’t make the shot
I – knowing the outcome of one shot doesn’t influence the outcome of any other shot
N - n = 20 trials
S – prob changes as you move around the court.

3. Observe the next 100 cars that go by and let C = color.
No: B – no success or failure. Too many colors.
I – knowing the color of one car doesn’t tell us the color of any other car
N - n = 100 trials
S – don’t know what a success is to determine what the probability is.
• Binomial distributions are an
important class of discrete
probability distributions
• Remember – not all counts have
binomial distributions
• The central problem of a binomial
experiment is to find the probability
of the number of successes out of n
trials
Binomial Probabilities
Each child of a particular pair of parents has probability 0.25 of
having type O blood. Genetics says that children receive genes
from each of their parents independently. If these parents have 5
children, the count X of children with type O blood is a binomial
random variable with n = 5 trials and probability p = 0.25 of a
success on each trial. In this setting, a child with type O blood is a
“success” (S) and a child with another blood type is a “failure” (F).
What’s P(X = 2)?
$P(SSFFF) = (0.25)(0.25)(0.75)(0.75)(0.75) = (0.25)^2(0.75)^3 = 0.02637$
However, there are a number of different arrangements in which 2 out
of the 5 children have type O blood:
SSFFF SFSFF SFFSF SFFFS FSSFF
FSFSF FSFFS FFSSF FFSFS FFFSS
Verify that each arrangement has the same probability, $(0.25)^2(0.75)^3 = 0.02637$.
Therefore, $P(X = 2) = 10(0.25)^2(0.75)^3 = 0.2637$
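Binomial coefficient
The number of ways of arranging k successes among n observations is
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
This is also written nCr.

Binomial probability
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
Here $\binom{n}{k}$ is the number of arrangements of k successes, $p^k$ is the
probability of k successes, and $(1-p)^{n-k}$ is the probability of n − k failures.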
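The counting argument and the formula can be checked against each other with a short Python sketch. The function name binomial_pmf is a hypothetical helper chosen for this illustration, not something from the slides; it uses only math.comb and itertools.product from the standard library.

```python
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): (number of arrangements) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Enumerate every length-5 string of S/F and keep those with exactly 2 successes.
arrangements = ["".join(seq) for seq in product("SF", repeat=5)
                if seq.count("S") == 2]

print(len(arrangements), comb(5, 2))        # two ways of counting the arrangements
print(round(binomial_pmf(2, 5, 0.25), 4))   # P(X = 2) in the blood-type example
```

Listing the arrangements is practical only for small n; the binomial coefficient gives the same count without enumeration.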
Review: Algebra II Topic

• Binomial expansion: $(x + y)^n$, nCr
• Pascal’s Triangle

How to Find Binomial Probabilities
Step 1: State the distribution and the values of interest. Specify a
binomial distribution with the number of trials n, success probability p, and
the values of the variable clearly identified.
Step 2: Perform calculations—show your work!
Do one of the following:
(i) Use the binomial probability formula to find the desired probability; or
(ii) Use binompdf or binomcdf command and label each of the inputs.
Step 3: Answer the question.
Example: How to Find Binomial Probabilities
Each child of a particular pair of parents has probability 0.25 of having
blood type O. Suppose the parents have 5 children
(a) Find the probability that exactly 3 of the children have type O blood.

Let X = the number of children with type O blood. We know X has a
binomial distribution with n = 5 and p = 0.25.
$P(X = 3) = \binom{5}{3}(0.25)^3(0.75)^2 = 10(0.25)^3(0.75)^2 = 0.08789$
(b) Should the parents be surprised if more than 3 of their children have
type O blood? To answer this, we need to find P(X > 3).
$P(X > 3) = P(X = 4) + P(X = 5)$
$= \binom{5}{4}(0.25)^4(0.75)^1 + \binom{5}{5}(0.25)^5(0.75)^0$
$= 5(0.25)^4(0.75)^1 + 1(0.25)^5(0.75)^0$
$= 0.01465 + 0.00098 = 0.01563$
Since there is only about a 1.6% chance that more than 3 children out of 5
would have type O blood, the parents should be surprised!
cdf
• We frequently want to find the probability
that a random variable takes a range of values
• The cumulative distribution function of X
calculates the sum of the probabilities for 0,
1, 2, …, up to a given value x
• That is, it calculates the probability of
obtaining at most x successes in n trials
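A minimal Python sketch of the same idea follows. The helper names binomial_pmf and binomial_cdf are assumptions made for this illustration; binomial_cdf is meant to play the role of the calculator's binomcdf, but it is not the calculator's implementation.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): add the probabilities of 0, 1, ..., k successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Blood-type example: n = 5 children, p = 0.25.
at_most_3 = binomial_cdf(3, 5, 0.25)
more_than_3 = 1 - at_most_3          # complement rule
print(round(at_most_3, 4), round(more_than_3, 4))
```

The complement rule turns any "more than" question into a cumulative probability, which is why the cdf is the more commonly used command.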
From Using a TI-83 or TI-84 Series Graphing Calculator
in an Introductory Statistics class
By W. Scott Street, IV
Dept of Statistical Sciences & Operations Research
VA Commonwealth University
Cumulative probability

Suppose a cereal manufacturer puts cards of
famous athletes in boxes of cereal in the hope of
boosting sales. The manufacturer announces that
20% of the boxes contain an Alex Rodriguez card,
30% contain a card of Michael Phelps and the rest
contain a card of Serena Williams. If you buy 6 boxes
of cereal, what is the probability that you get exactly 2
A-Rod cards?
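The probability of one particular arrangement of 2 successes among
6 trials (for example, SSFFFF) is
(.2)(.2)(.8)(.8)(.8)(.8) = 0.0164

The number of arrangements is
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$, so $\binom{6}{2} = \frac{6!}{2!\,(6-2)!} = 15$

Using the binomial formula $\binom{n}{r} p^r (1-p)^{n-r}$:
$P(\text{number of successes} = 2) = 15(0.2)^2(0.8)^4 = 0.246$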
The Last Kiss
Do people have a preference for the last thing they taste?
Researchers at the University of Michigan designed a study to find
out. The researchers gave 22 students five different Hershey’s
Kisses (milk chocolate, dark chocolate, crème, caramel, and
almond) in random order and asked the student to rate each one.
Participants were not told how many Kisses they would be
tasting. However, when the 5th and final Kiss was presented,
participants were told that it would be their last one. Of the 22
students, 14 of them gave the final Kiss the highest rating.
<http://www.sitemaker.umich.edu/eob/files/obrienellsworth2012.pdf>
Problem: Assume that the participants in the study don’t have a
special preference for the last thing they taste. That is, assume
that the probability a person would prefer the last Kiss tasted is p
= 0.20.
(a) What is the probability that exactly 5 of the 22 participants
would prefer the last Kiss they tried?
(b) What is the probability that 14 or more of the 22 participants
would prefer the last Kiss they tried?
(a) Step 1: State the distribution and the values of interest. Let X =
the number of participants who prefer the last Kiss they taste. X
has a binomial distribution with n = 22 and p = 0.20. We want to
find P(X = 5).

Step 2: Perform calculations. Show your work!
$P(X = 5) = \binom{22}{5}(0.20)^5(0.80)^{17} = 0.1898$

Using technology: The command binompdf(trials:22,
p:0.20, x value:5) gives 0.1898.

Step 3: Answer the question. There is about a 19% chance that
exactly 5 participants would choose the last Kiss, assuming that
they have no special preference for the last thing they taste.
(b) Step 1: State the distribution and the values of interest.
Let X = the number of participants who prefer the last Kiss they
taste. X has a binomial distribution with n = 22 and p = 0.20.
We want to find P(X ≥ 14).

Step 2: Perform calculations. Show your work! P(X ≥ 14) = 1 –
P(X ≤ 13). Using technology: The command 1 –
binomcdf(trials:22, p:0.20, x value:13) gives 0.00001.

Step 3: Answer the question. There is about a 0.001% chance
that 14 or more participants would choose the last Kiss,
assuming that they have no special preference for the last thing
they taste. Because this probability is so small, there is
convincing evidence that the participants have a preference for
the last thing they taste. It is almost impossible to get 14 or
more just by chance.
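Both parts of the Last Kiss problem can be reproduced with a few lines of Python. This is only a sketch: binomial_pmf is a hypothetical helper built on math.comb, and P(X ≥ 14) is computed two ways (the complement rule used above and a direct sum over the upper tail) so that the two results can be compared.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 22, 0.20

exactly_5 = binomial_pmf(5, n, p)

# Two routes to P(X >= 14): the complement rule, and summing the upper tail.
via_complement = 1 - sum(binomial_pmf(k, n, p) for k in range(14))
via_tail = sum(binomial_pmf(k, n, p) for k in range(14, n + 1))

print(f"P(X = 5)   = {exactly_5:.4f}")
print(f"P(X >= 14) = {via_tail:.7f}")
```

When a tail probability is this small, summing the tail directly avoids subtracting two nearly equal numbers, although at this scale either route gives the same rounded answer.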
A quality engineer selects an SRS of 10 switches
from a large shipment for detailed inspection.
Unknown to the engineer, 10% of the switches in
the shipment fail to meet the specifications.
What is the probability that no more than 1 of the
10 switches in the sample fail inspection?
Let X = the number of switches that fail. We are
looking for P(X ≤ 1). This is a binomial distribution with
n = 10 and p = 0.1.
binomcdf(10, 0.1, 1) = 0.7361
The probability that no more than 1 switch will
fail is about 73.6%.
Mean and Standard Deviation of a Binomial Distribution
We describe the probability distribution of a binomial random variable
just like any other distribution – by looking at the shape, center, and
spread. Consider the probability distribution of X = number of children
with type O blood in a family with 5 children.
x_i   0        1        2        3        4        5
p_i   0.2373   0.3955   0.2637   0.0879   0.0147   0.00098

Shape: The probability distribution of X is skewed to
the right. It is more likely to have 0, 1, or 2 children
with type O blood than a larger value.
Center: The median number of children with
type O blood is 1. Based on our formula for the
mean:
$\mu_X = \sum x_i p_i = (0)(0.2373) + (1)(0.3955) + \dots + (5)(0.00098) = 1.25$
Spread: The variance of X is
$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 1.25)^2(0.2373) + (1 - 1.25)^2(0.3955) + \dots + (5 - 1.25)^2(0.00098) = 0.9375$
The standard deviation of X is $\sigma_X = \sqrt{0.9375} = 0.968$
Mean and Standard Deviation of a Binomial Random Variable
If a count X has the binomial distribution with number of trials n and
probability of success p, the mean and standard deviation of X are

$\mu_X = np$
$\sigma_X = \sqrt{np(1-p)}$

Note: These formulas work ONLY for binomial distributions.
They can’t be used for other distributions!
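As a check, the shortcut formulas can be compared with the general definitions of mean and variance for a discrete random variable. The sketch below uses the blood-type setting (n = 5, p = 0.25); the variable names are illustrative only.

```python
from math import comb, sqrt

n, p = 5, 0.25
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Definitions that apply to any discrete random variable.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# Shortcut formulas that apply only to binomial random variables.
print(round(mean, 4), n * p)
print(round(sqrt(variance), 4), round(sqrt(n * p * (1 - p)), 4))
```

The two lines of output should agree within each line, which is the point of the shortcut: for a binomial count there is no need to build the whole probability table.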
Tastes as good as the real thing?
The makers of a diet cola claim that its taste is indistinguishable
from the full-calorie version of the same cola. To investigate, an
AP® Statistics student named Emily prepared small samples of each
type of soda in identical cups. Then she had volunteers taste each
cola in a random order and try to identify which was the diet cola
and which was the regular cola. Overall, 23 of the 30 subjects
made the correct identification.
If we assume that the volunteers really couldn’t tell the difference,
then each one was guessing with a 1/2 chance of being correct. Let
X = the number of volunteers who correctly identify the colas.
Problem:
(a) Explain why X is a binomial random variable.
(b) Find the mean and the standard deviation of X. Interpret each
value in context.
(c) Of the 30 volunteers, 23 made correct identifications. Does this
give convincing evidence that the volunteers can taste the
difference between the diet and regular colas?
(a) The chance process is each volunteer guessing which sample is the diet cola.
Binary? Yes; guesses are either correct or incorrect. Independent? Yes; the result
of one volunteer’s guess tells us nothing about the results of other volunteers’
guesses. Number? Yes; there are 30 trials. Success? Yes; the probability of
guessing correctly is always 50%. Because X is counting the number of successful
guesses, X is a binomial random variable with n = 30 and p = 0.50.

(b) The mean of X is $\mu_X = np = 30(0.5) = 15$, and the standard deviation of X is
$\sigma_X = \sqrt{np(1-p)} = \sqrt{30(0.5)(1-0.5)} = 2.74$.
If this experiment were repeated many times and the volunteers were
randomly guessing, the average number of correct guesses would be about 15.
Also, the number of correct guesses would typically vary by about 2.74 from the
mean (15).

(c) P(X ≥ 23) = 1 – P(X ≤ 22) = 1 – binomcdf(trials:30, p:0.5, x value:22) = 1 –
0.9974 = 0.0026. There is a very small chance that there would be 23 or more
correct guesses if the volunteers couldn’t tell the difference in the colas. Therefore,
we have convincing evidence that the volunteers can taste the difference.
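The phrase "if this experiment were repeated many times" can be made concrete with a simulation. The sketch below is an illustration under stated assumptions, not part of the original solution: the seed, the number of repetitions, and the helper count_correct are arbitrary choices, and the results are estimates that vary slightly with the seed.

```python
import random

def count_correct(rng, n=30):
    """Simulate n volunteers guessing at random: each of n random bits is a
    correct guess with probability 1/2, so the number of 1-bits is B(n, 1/2)."""
    return bin(rng.getrandbits(n)).count("1")

rng = random.Random(2024)          # fixed seed so the run is repeatable
reps = 100_000
results = [count_correct(rng) for _ in range(reps)]

average = sum(results) / reps
share_23_or_more = sum(x >= 23 for x in results) / reps
print(f"average correct: {average:.2f}")
print(f"proportion of runs with 23 or more correct: {share_23_or_more:.4f}")
```

The average should land close to the mean of 15, and the proportion of runs with 23 or more correct guesses should be close to the binomial probability 0.0026 found in part (c).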
Dead batteries
Almost everyone has one—a drawer that holds
miscellaneous batteries of all sizes. Suppose that
your drawer contains 8 AAA batteries but only 6 of
them are good. You need to choose 4 for your
graphing calculator. If you randomly select 4
batteries, what is the probability that all 4 of them
will work?
Using the binomial distribution you get:
$P(X = 4) = \binom{4}{4}(0.75)^4(0.25)^0 = 0.3164$
The correct answer is:
$\frac{6}{8} \cdot \frac{5}{7} \cdot \frac{4}{6} \cdot \frac{3}{5} = 0.2143$
Because we are sampling without replacement, the
selections of batteries aren’t independent. We can ignore
this problem if the sample we are selecting is less than
10% of the population. However, in this case we are
sampling 50% of the population (4/8), so it is not
reasonable to ignore the lack of independence and use
the binomial distribution. This explains why the binomial
probability is so different from the actual probability.
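The gap between the two answers is easy to reproduce. The following Python sketch computes the binomial value and the exact value in two ways (multiplying conditional probabilities, and counting selections with math.comb); the variable names are illustrative.

```python
from math import comb

good, total, picks = 6, 8, 4

# Binomial model: pretends each pick is an independent trial with p = 6/8.
binomial = (good / total) ** picks

# Exact answer, multiplying conditional probabilities pick by pick.
exact = 1.0
for i in range(picks):
    exact *= (good - i) / (total - i)

# Same answer by counting: ways to choose 4 good batteries / ways to choose any 4.
by_counting = comb(good, picks) / comb(total, picks)

print(round(binomial, 4), round(exact, 4), round(by_counting, 4))
```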
NASCAR cards and cereal boxes
In the “NASCAR Cards and Cereal Boxes” example
from Section 5.1, we read about a cereal company
that put 1 of 5 different cards into each box of
cereal. Each card featured a different driver: Jeff
Gordon, Dale Earnhardt, Jr., Tony Stewart, Danica
Patrick, or Jimmie Johnson. Suppose that the
company printed 20,000 of each card, so there were
100,000 total boxes of cereal with a card inside. If a
person bought 6 boxes at random, what is the
probability of getting no Danica Patrick cards?
Let X be the number of Danica Patrick cards obtained from 6
different boxes of cereal. Because we are sampling without
replacement, the trials are not independent. The distribution of X
is not quite binomial—but it is close. If we assume X is binomial
with n = 6 and p = 0.2,

6
P(X = 0) =   (0.2)0 (0.8)6 = 0.262144
0
There is a 26.2% chance of getting no Danica Patrick cards if a
person bought 6 boxes at random.

The actual probability, using the general multiplication rule, is


P(no Danica Patrick cards)
 80,000  79,999  79,998  79,997  79,996  79,995 
 100,000  99,999  99,998  99,997  99,996  99,995  = 0.262134
      
Normal Approximations to Binomial Distributions

• As the number of trials n gets larger, the
binomial distribution gets close to a normal
distribution
• When n is large, we can use normal
probability calculations to approximate
hard-to-calculate binomial probabilities.

Binomial Distributions in Statistical Sampling
The binomial distributions are important in statistics when we wish to
make inferences about the proportion p of successes in a population.

Almost all real-world sampling, such as taking an SRS from a population
of interest, is done without replacement. However, sampling without
replacement leads to a violation of the independence condition.

When the population is much larger than the sample, a count of
successes in an SRS of size n has approximately the binomial
distribution with n equal to the sample size and p equal to the proportion
of successes in the population.
10% Condition
When taking an SRS of size n from a population of size N, we can use a
binomial distribution to model the count of successes in the sample as
long as $n \le \frac{1}{10}N$.
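To see what the 10% condition buys, the sketch below compares the binomial answer with the exact without-replacement answer for the two earlier examples (the battery drawer and the NASCAR cereal boxes). The helper names ten_percent_ok and p_no_success are hypothetical, and the battery case is phrased as "no bad battery", which is the same event as "all 4 work".

```python
from math import comb

def ten_percent_ok(n, N):
    """10% condition: the sample is at most one tenth of the population."""
    return n <= N / 10

def p_no_success(n, N, successes):
    """Exact P(no successes) when drawing n items without replacement."""
    return comb(N - successes, n) / comb(N, n)

# (label, sample size n, population size N, successes in the population)
cases = [("batteries, no bad battery", 4, 8, 2),
         ("cereal boxes, no Danica Patrick card", 6, 100_000, 20_000)]

for label, n, N, successes in cases:
    p = successes / N
    binomial = (1 - p) ** n
    exact = p_no_success(n, N, successes)
    print(f"{label}: 10% condition met? {ten_percent_ok(n, N)}; "
          f"binomial {binomial:.6f}, exact {exact:.6f}")
```

When the condition fails (4 of 8 batteries) the two answers differ noticeably; when it holds (6 of 100,000 boxes) they agree to about four decimal places.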
Normal Approximations for Binomial Distributions
As n gets larger, something interesting happens to the shape of a
binomial distribution. The figures below show histograms of binomial
distributions for different values of n and p.
What do you notice as n gets larger?

Normal Approximation For Binomial Distributions: The Large Counts Condition
Suppose that X has the binomial distribution with n trials and success
probability p. When n is large, the distribution of X is approximately
Normal with mean and standard deviation
$\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$
As a rule of thumb, we will use the Normal approximation when n is so
large that np ≥ 10 and n(1 – p) ≥ 10. That is, the expected numbers of
successes and failures are both at least 10.
In that case, X is approximately $N\left(np, \sqrt{np(1-p)}\right)$.
Teens and debit cards
In a survey of 506 teenagers aged 14 to 18, subjects
were asked a variety of questions about personal
finance
(http://www.nclnet.org/personal-finance/66-teens-and-money/120-ncl-survey-teens-and-financial-education).
One question asked teens if they had a debit card.
Problem: Suppose that exactly 10% of teens aged 14 to
18 have debit cards. Let X = the number of teens in a
random sample of size 506 who have a debit card.
(a) Show that the distribution of X is approximately
binomial.
(b) Check the conditions for using a Normal
approximation in this setting.
(c) Use a Normal distribution to estimate the probability
that 40 or fewer teens in the sample have debit cards.
(a) Binary? Yes; teens either have a debit card or they don’t.
Independent? No; because we are sampling without replacement.
However, because the sample size (n = 506) is much less than 10% of
the population size (there are millions of teens aged 14 to 18), the
responses will be very close to independent. Number? Yes; there is a
fixed sample size, n = 506. Success? Yes; the unconditional probability
of selecting a teen with a debit card is 10%.

(b) We need to check the Large Counts condition. Because np =
506(0.10) = 50.6 and n(1 – p) = 506(0.90) = 455.4 are both at least 10,
we should be safe using the Normal approximation.
(c) Step 1: State the distribution and the values of interest.
$\mu_X = np = 506(0.10) = 50.6$ and $\sigma_X = \sqrt{np(1-p)} = \sqrt{506(0.1)(0.9)} = 6.75$.
Thus, X is approximately Normally distributed with mean
50.6 and standard deviation 6.75. We want to find P(X ≤ 40).

Step 2: Perform calculations. Show your work! Standardizing
the boundary value gives
$z = \frac{40 - 50.6}{6.75} = -1.57$. Using Table A, P(Z ≤ –1.57) = 0.0582.
Using technology: The command normalcdf(lower:–1000,
upper:40, μ:50.6, σ:6.75) gives an area of 0.0582.
(Note: The probability using the binomial distribution
is 0.064.)

Step 3: Answer the question. There is about a 6% chance that 40 or
fewer teens in a sample of size 506 will have a debit card.
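The whole calculation, including the comparison with the exact binomial value in the note, can be sketched in Python. This uses statistics.NormalDist from the standard library in place of Table A or normalcdf; the variable names are illustrative, and no continuity correction is applied, matching the worked solution above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 506, 0.10

# Large Counts condition: expected successes and failures are both at least 10.
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation to P(X <= 40), then the exact binomial value.
normal_estimate = NormalDist(mu, sigma).cdf(40)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(41))

print(large_counts_ok, round(mu, 1), round(sigma, 2))
print(round(normal_estimate, 3), round(exact, 3))
```

The Normal estimate is slightly below the exact binomial probability here, which is typical when the approximation is used without a continuity correction.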
Warmup
1. (#6.76 and 78) Suppose you purchase a bundle of 10
bare-root rhubarb plants. The sales clerk tells you that
5% of these plants will die before producing any
rhubarb. Assume that the bundle is a random sample
of plants and that the sales clerk’s statement is
accurate. Let Y = the number of plants that die before
producing any rhubarb.
a. Find P(Y=1)
b. Would you be surprised if 3 or more of the plants in
the bundle die before producing any rhubarb?
2. (#6.86) Engineers define reliability as the probability
that an item will perform its function under specific
conditions for a specific period of time. A certain model
of aircraft engine is designed so that each engine has
probability 0.999 of performing properly for an hour of
flight. Company engineers test an SRS of 350 engines of
this model. Let X = the number of engines that operate for an
hour without failure.
a. Explain why X is a binomial random variable.
b. Find the mean and standard deviation of X. Interpret
each value in context.
c. Two engines failed the test. Are you convinced that
this model of engine is less reliable than it is
supposed to be? Compute P(X ≤ 348) and use the
result to justify your answer.
Consider the following situations:
• Flip a coin until you get a head
• Roll a die until you get a 3
• In basketball, attempt a three-point
shot until you make a basket
Geometric Settings
In a binomial setting, the number of trials n is fixed and the
binomial random variable X counts the number of successes.

In other situations, the goal is to repeat a chance behavior
until a success occurs. These situations are called
geometric settings.

A geometric setting arises when we perform independent
trials of the same chance process and record the number of
trials it takes to get one success. On each trial, the probability
p of success must be the same.
Geometric Settings
In a geometric setting, if we define the random variable Y to be the
number of trials needed to get the first success, then Y is called a
geometric random variable. The probability distribution of Y is called
a geometric distribution.

The number of trials Y that it takes to get a success in a geometric setting
is a geometric random variable. The probability distribution of Y is a
geometric distribution with parameter p, the probability of a success on
any trial. The possible values of Y are 1, 2, 3, . . . .

As with binomial random variables, it is important to be able to
distinguish situations in which the geometric distribution does and
doesn’t apply!
A probability distribution table for the geometric
random variable is a bit different because it
never ends. The probabilities are the terms of a
geometric sequence
X      1    2        3           4           5           6           …
P(X)   p    (1-p)p   (1-p)^2 p   (1-p)^3 p   (1-p)^4 p   (1-p)^5 p   …
Geometric Probability Formula
The Lucky Day Game. The random variable of interest in this game is Y =
the number of guesses it takes to correctly match the lucky day. What is the
probability the first student guesses correctly? The second? Third? What is
the probability the kth student guesses correctly?

P(Y = 1) = 1/7 = 0.1429
P(Y = 2) = (6/7)(1/7) = 0.1224
P(Y = 3) = (6/7)(6/7)(1/7) = 0.1050
Geometric Probability Formula
If Y has the geometric distribution with probability p of success on each
trial, the possible values of Y are 1, 2, 3, … . If k is any one of these
values,
$P(Y = k) = (1-p)^{k-1} p$
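The formula translates directly into code. In this short sketch, geometric_pmf is a hypothetical helper name, and the loop reproduces the first three Lucky Day probabilities with p = 1/7.

```python
def geometric_pmf(k, p):
    """P(Y = k): k - 1 failures in a row, then a success on trial k."""
    return (1 - p) ** (k - 1) * p

# Lucky Day game: p = 1/7 on each guess.
for k in (1, 2, 3):
    print(k, round(geometric_pmf(k, 1 / 7), 4))
```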
Example
Roll a die until you see a 6.
• The probability of rolling a 6 = 1/6
• The probability of rolling the
first 6 on the first roll:
P(X = 1) = 1/6.
geometpdf(1/6,1)
Calculating Probabilities
• The probability of rolling the first 6
after the first roll:
P(X > 1) = 1 - 1/6.
1-geometpdf(1/6,1)
• The probability of rolling the first 6
on the second roll:
P(X = 2) = (5/6)*(1/6).
geometpdf(1/6,2)
Calculating Probabilities
• The probability of rolling the first 6
on the second roll or before:
P(X ≤ 2) = (1/6) + (5/6)*(1/6)
geometcdf(1/6,2)
• The probability of rolling the first 6
after the second roll:
P(X > 2) = 1 - ((1/6) + (5/6)*(1/6))
1-geometcdf(1/6,2)
Better formulas?
With q = 1 − p:
$P(X = n) = p q^{n-1}$, geometpdf(p,n)
$P(X > n) = (1-p)^n$, 1-geometcdf(p,n)
$P(X \le n) = 1 - (1-p)^n$, geometcdf(p,n)
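The closed forms can be checked against term-by-term sums. The sketch below uses the die example (p = 1/6); geometric_pmf and geometric_cdf are hypothetical helpers, with geometric_cdf standing in for the calculator's geometcdf.

```python
def geometric_pmf(k, p):
    """P(X = k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(n, p):
    """P(X <= n) by adding the first n terms of the distribution."""
    return sum(geometric_pmf(k, p) for k in range(1, n + 1))

p = 1 / 6   # rolling a 6 with a fair die
for n in (1, 2, 3, 10):
    summed = geometric_cdf(n, p)
    closed_form = 1 - (1 - p) ** n
    # columns: n, P(X <= n) by summing, P(X <= n) by formula, P(X > n)
    print(n, round(summed, 4), round(closed_form, 4), round(1 - summed, 4))
```

The shortcut works because X > n means exactly that the first n trials are all failures, which has probability (1 − p)^n.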
Monopoly
In the board game Monopoly, one way to get out of jail is to roll
doubles. Suppose that this was the only way a player could get out of
jail. The random variable of interest in this example is Y = number of
attempts it takes to roll doubles one time.
(a) Find the probability that it takes 3 turns to roll doubles.
$P(Y = 3) = (5/6)^2(1/6) = 0.116$
(b) Find the probability that it takes more than 3 turns to roll doubles,
and interpret this value in context.
Because there are an infinite number of possible values of Y greater
than 3, we will use the complement rule.
$P(Y > 3) = 1 - P(Y \le 3)$
$= 1 - P(Y = 3) - P(Y = 2) - P(Y = 1)$
$= 1 - (5/6)^2(1/6) - (5/6)^1(1/6) - (5/6)^0(1/6)$
$= 0.5787$, or on the calculator: 1 - geometcdf(p: 1/6, x: 3) = 0.579
If a player tried to get out of jail many, many times by trying to roll
doubles, about 58% of the time it would take more than 3 attempts.
Mean of a Geometric Random Variable
The table below shows part of the probability distribution of Y. We can’t
show the entire distribution because the number of trials it takes to get
the first success could be an incredibly large number.
y_i   1       2       3       4       5       6       …
p_i   0.143   0.122   0.105   0.090   0.077   0.066   …

Shape: The heavily right-skewed shape is characteristic
of any geometric distribution. That’s because the most
likely value is 1.

Center: The mean of Y is µY = 7. We’d expect it to take
7 guesses to get our first success.

Spread: The standard deviation of Y is σY = 6.48. If the class played the Lucky Day game
many times, the number of homework problems the students receive would differ from 7
by an average of 6.48.

Mean (Expected Value) Of A Geometric Random Variable
If Y is a geometric random variable with probability p of success on each
trial, then its mean (expected value) is E(Y) = µY = 1/p.
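Because the distribution never ends, the mean cannot be computed from a finite table, but a long truncated sum gets very close. The sketch below does this for the Lucky Day game (p = 1/7); the cutoff of 2000 terms is an arbitrary assumption, chosen so that the neglected tail is negligible.

```python
from math import sqrt

p = 1 / 7        # Lucky Day game
terms = 2000     # the distribution never ends, so the sums are truncated here

pmf = [(k, (1 - p) ** (k - 1) * p) for k in range(1, terms + 1)]
mean = sum(k * pk for k, pk in pmf)
sd = sqrt(sum((k - mean) ** 2 * pk for k, pk in pmf))

print(round(mean, 2), round(1 / p, 2))   # truncated sum vs. the formula 1/p
print(round(sd, 2))                      # compare with the 6.48 quoted above
```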
