
2.4.1 Estimating the Population Mean
After we take a random sample from a population, that sample’s mean is our best estimate
of that population’s mean. However, the sample mean is only a point estimate. To make
decisions as managers, we need more than a point estimate. We also need to have a
measure of the accuracy of our estimate—in this case, how close our estimate is to the true
population mean. Based on the sample mean and a specified level of confidence, we can
create a confidence interval.

A confidence interval depends on the sample’s mean, standard deviation, and sample size.
As we’ll see, a confidence interval also depends on how “confident” we would like to be that
the range contains the true mean of the population.  For example, if our sample mean is 50,
we might calculate a 95% confidence interval from 42 to 58.  We can be 95% confident that
our interval contains the true population mean. 


Using the properties of the normal distribution and the central limit theorem, we can
construct a range of values that is very likely to contain the true population mean.
Specifically, for a normal distribution, we know that if we select any value at random,
there is a 95% chance that it will be within about two standard deviations of that
distribution's mean. Now, let's apply that logic to a sample mean in the context of a
particular normal distribution: the distribution of sample means. If we take a sufficiently
large sample (let's say at least 30 points), there is a 95% chance that the mean of that
sample falls within about two standard deviations of the mean of the distribution of sample
means. But the central limit theorem tells us that the mean of the distribution of sample
means is the same as the population mean. So we can conclude that there is a 95% chance
that the mean of our sample is within about two standard deviations of the true population
mean.

Remember that the standard deviation of the distribution of sample means is different from
that of the population distribution. Specifically, for the distribution of sample means,
two standard deviations equals 2 times the population standard deviation divided by the
square root of n, our sample size.

Let's build this up step by step to make sure we understand the logic. First, we take a
sample from a population and compute its mean. We know that the sample's mean is a point on
the distribution of sample means. Thus, there is a 95% chance that this sample mean is
within two standard deviations of the mean of that distribution. Equivalently, we can
conclude that there is a 95% chance that the sample mean is within two standard deviations
of the true population mean.

Next, we'll turn this around and look at an interval centered at a sample mean because
that's what a confidence interval is. Let's look at two samples. First, let's look at a
sample whose mean falls outside the two standard deviation range. Since the sample mean is
outside the range, it must be more than two standard deviations away from the population
mean. Therefore, an interval extending two standard deviations on either side of the sample
mean could not contain the true population mean. Since we're using a 95% confidence level,
we know that 5% of all sample means should fall outside the two standard deviation range
around the population mean. Therefore, 5% of all samples we obtain will have confidence
intervals that do not contain the true population mean.

Now, let's think about the remaining 95% of samples whose means do fall within the two
standard deviation range around the population mean. If we take one of those samples, the
interval extending about two standard deviations on either side of the sample mean must
contain the true population mean. Putting these two cases together, we conclude that 95% of
all samples will have confidence intervals that contain the true population mean.

Importantly, we are not saying that 95% of the time, our sample mean is the population
mean. What we are saying is that for 95% of all random samples, a range that is centered at
the sample mean and extends two standard deviations on either side of it contains the true
population mean. To visualize the general concept of a confidence interval, imagine taking
20 different samples from a population and drawing a confidence interval around each
sample's mean. On average, 95% of these intervals, or 19 out of 20, would actually contain
the true population mean.

A 95% confidence level should be interpreted as saying that if we took 100 samples from a
population and created a 95% confidence interval for each sample, on average 95 of the
100 confidence intervals (that is, 95%) would contain the true population mean.

Often people misinterpret a confidence interval’s meaning, thinking that for a 95%
confidence interval, there is “a 95% probability (or 95% chance) that the confidence interval
contains the true population mean.”  This is false. For any given sample and a
corresponding confidence interval, the population mean either is or is not in the confidence
interval.  Because we don't know the true population mean, we cannot say whether a
particular sample’s confidence interval is one that contains the population mean.
However, we do know that, on average, 95 out of 100 such confidence intervals do contain
the true mean, which is why we say we're 95% confident that our interval does.
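
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 100, standard deviation 15), the sample size of 50, and the function name `coverage_rate` are assumptions chosen for the example, not values from the course. It draws many samples, builds a 95% interval around each sample mean, and reports the share of intervals that contain the true mean.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage_rate(num_samples=2000, sample_size=50, pop_mean=100.0,
                  pop_sd=15.0, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain pop_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Two-sided z-value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(num_samples):
        # Hypothetical population: normal with the assumed mean and sd.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation
        margin = z * s / math.sqrt(sample_size)
        if x_bar - margin <= pop_mean <= x_bar + margin:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {coverage_rate():.3f}")
```

The reported share should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the long-run behavior of the method.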

Changing the Confidence Level


What if we wish to construct an interval based on a level of confidence other than 95%?
Higher levels of confidence require wider confidence intervals that have a higher likelihood
of containing the true population mean; lower levels of confidence require narrower
confidence intervals. For example, a 68% confidence interval would be approximately half
the width of a 95% confidence interval; it extends only one standard deviation of the
distribution of sample means on either side of the sample mean.
We know that the distribution of sample means is a normal distribution with standard
deviation $\sigma/\sqrt{n}$. Thus, we can infer that a confidence interval must cover a
distance of $z\,\sigma/\sqrt{n}$ on either side of the sample mean, where $z$ is the z-value
associated with our specified level of confidence. (For example, the z-value associated with
the 95% confidence interval is approximately 2, 1.96 to be exact.) However, since we rarely
know the standard deviation of the underlying population, $\sigma$, we will substitute our
best estimate of $\sigma$, which is the sample standard deviation, $s$. The term
$z\,\sigma/\sqrt{n}$, estimated by $z\,s/\sqrt{n}$, is often referred to as the confidence
interval’s “margin of error.”
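
As a rough illustration of where the z-values come from, the sketch below looks them up with Python's standard-library `NormalDist`. The helper name `z_value` is hypothetical. It also compares the 68% and 95% z-values to show why the 68% interval is about half as wide.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided z-value for a confidence level such as 0.95."""
    alpha = 1 - confidence  # significance level
    # Half of alpha sits in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    for level in (0.68, 0.95, 0.99):
        print(f"{level:.0%} confidence: z = {z_value(level):.2f}")
    # Ratio of interval widths for the same sample (same s and n).
    print(f"68% width / 95% width = {z_value(0.68) / z_value(0.95):.2f}")
```

Because the margin of error is z times the same quantity, $s/\sqrt{n}$, the ratio of two z-values is also the ratio of the corresponding interval widths.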

The equation for the confidence interval is shown below. The width of the confidence
interval changes with the sample size and the confidence level.

$$\bar{x} \pm z\frac{s}{\sqrt{n}}$$

Which of the following would increase the width of the confidence interval? Select all that
apply.

Suppose that you have a sample with a mean of 50. You construct a 95% confidence
interval and find that the lower and upper bounds are 42 and 58. What does this 95%
confidence interval around the sample mean indicate? Select all that apply.

• 95% of the population distribution lies between 42 and 58.
  The 95% confidence interval estimates the population mean and does not tell us about the
  distribution of the population.
• There is a 95% chance that the sample mean lies between 42 and 58.
  We construct a confidence interval around the sample mean to draw conclusions about the
  population mean, not the sample mean. Indeed, we know the sample mean is 50, so there
  is a 100% chance that the range includes the sample mean.
• There is a 95% chance that the population mean lies between 42 and 58.
  There is not a 95% chance that the population mean lies between 42 and 58. If multiple
  95% confidence intervals were calculated to estimate the population mean, on average,
  95% of these confidence intervals would contain the true mean. The confidence interval’s
  level of confidence does not tell us the chance, probability, or likelihood that an
  individual confidence interval contains the true population mean.
• We are 95% confident that the population mean lies between 42 and 58. (CORRECT)
  The 95% confidence interval is a range around the sample mean. We can say that we
  are 95% confident that the true population mean is within this range, based on the methods
  we used to calculate the range. If we were to construct similar intervals for 100 samples
  drawn from this population, on average 95 of the intervals would contain the true
  population mean.

2.4.2 Large Samples
We know that if we have a large enough sample size (typically defined as at least 30
data points), we can construct a confidence interval using properties of the normal
distribution such as z-values, normal distribution rules of thumb, etc. For large samples,
Excel consolidates the multiple steps for building a confidence interval into a single function,
CONFIDENCE.NORM.

=CONFIDENCE.NORM(alpha, standard_dev, size)

• alpha, the significance level, equals one minus the confidence level (for
  example, a 95% confidence interval would correspond to the significance
  level 0.05).
• standard_dev is the standard deviation of the population distribution. We will
  typically use the sample standard deviation, $s$, which is our best estimate of
  our population’s standard deviation.
• size is the sample size, $n$.

CONFIDENCE.NORM returns the margin of error, $z\,s/\sqrt{n}$, where $z$ is the z-value
associated with the specified level of confidence. The lower and upper bounds of the
confidence interval are equal to the sample mean, plus or minus that margin of error. Thus
the confidence interval is:

$$\bar{x} \pm \text{CONFIDENCE.NORM}(\text{alpha}, \text{standard\_dev}, \text{size})$$
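
For readers working outside Excel, the following is a minimal Python sketch of the same calculation. The function names `confidence_norm` and `normal_interval` are hypothetical and only mirror the Excel argument order. The example numbers (mean 15, standard deviation 2, sample size 100) are taken from one of the practice exercises later in these notes.

```python
import math
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z * s / sqrt(n), analogous to Excel's CONFIDENCE.NORM."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided z-value
    return z * standard_dev / math.sqrt(size)


def normal_interval(mean, alpha, standard_dev, size):
    """Lower and upper bounds: the sample mean minus/plus the margin of error."""
    margin = confidence_norm(alpha, standard_dev, size)
    return mean - margin, mean + margin


if __name__ == "__main__":
    lower, upper = normal_interval(15, 0.05, 2, 100)
    print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Keeping the margin of error in its own function matches the spreadsheet workflow, where the margin is computed once and then added to and subtracted from the sample mean.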

Related: Alternative Excel Functions

CONFIDENCE.NORM replaces the function:

=CONFIDENCE(alpha, standard_dev, size)

Throughout the course we will provide alternative functions that existed prior to Excel 2010
that can still be used in Excel 2010.
Step 1

Calculate the sample mean and standard deviation using either Excel’s descriptive statistics
tool or the functions AVERAGE and STDEV.S and enter those values into
cells D2 and D3 respectively.

Step 2

In cell D4, enter the function =CONFIDENCE.NORM(0.05,D3,100) to calculate the
confidence interval’s margin of error.

• We link to the standard deviation by entering D3 to ensure that the actual value is
  used rather than the rounded value that is displayed. (To see the full value you can
  double-click on the cell.) Remember that in this course, your answers will be
  deemed incorrect if you do not capture the actual values by linking to cells.
• Note that 100 is the sample size. If you didn’t know the sample size, you could use
  the COUNT function to count the number of numerical values in column A.

The margin of error for the 95% confidence interval is
CONFIDENCE.NORM(0.05,D3,100)=1.25.

Step 3

In cell D5, subtract the margin of error from the sample mean to calculate the lower bound
of the 95% confidence interval.

Step 4

In cell D6, add the margin of error to the sample mean to calculate the upper bound of the
95% confidence interval.

The lower bound of the 95% confidence interval is the mean minus the margin of error,
D2-D4=25.64. The upper bound of the 95% confidence interval is the mean plus the margin of
error, D2+D4=28.14.

Interpret the 95% confidence interval for average BMI. What do the lower and upper bounds
of the confidence interval tell us?
The average BMI for the sample is 26.89. The lower bound for a 95% confidence level is
25.64 and the upper bound is 28.14. This means that if we keep taking samples out of the
entire population, 95 out of 100 samples will contain the true population mean and 5 will not.
So we are 95% confident that the true population mean will fall within the range between
25.64 and 28.14.
I hope I got this right... --Dee

They tell us that we have a 95% 'confidence' that the population mean lies within the
confidence interval. In other words, if we took 100 samples from the population and
calculated the alpha = 0.05 CI for each of them, on average we would expect to have 95 of
those CIs to contain the true population mean. --Michael

The 95% confidence interval suggests that for 95% of all random samples, a range that is
two standard deviations wide and is centered at the sample mean contains the true
population mean. Therefore, for 95 out of 100 random samples, the true population mean
will be between the lower and upper bounds. --Andrew

The 95% confidence interval for average BMI shows that we are 95% confident that the BMI
of the population will be between the range calculated, meaning it will be between the lower
and upper values.
The lower and upper bounds tell us that we are confident that 95% of the population will be
between this interval. --Jefferson
You can separate the calculations into separate formulas or you can combine them into one
calculation. The lower bound of the 95% confidence interval is the sample mean minus the
margin of error, that is B1–CONFIDENCE.NORM(0.05,B2,B3)=222.52. The upper bound of
the 95% confidence interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.NORM(0.05,B2,B3)=227.48.

Based on our calculated values from above, we are 95% confident that the true population
mean lies between 222.52 and 227.48. That is, on average, 95% of the confidence intervals
constructed in this manner would contain the true population mean.
You can separate the calculations into separate formulas or you can combine them into one
calculation. The lower bound of the 99% confidence interval is the sample mean minus the
margin of error, that is B1–CONFIDENCE.NORM(0.01,B2,B3)=221.74. Similarly, the upper
bound of the 99% confidence interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.NORM(0.01,B2,B3)=228.26. 

Based on our calculated values from above, we are 99% confident that the true population
mean lies between 221.74 and 228.26. Notice that the width of the 99% confidence interval
is larger than the width of the 95% confidence interval that we previously calculated, which
ranged from 222.52 to 227.48.
2.4.3 Small Samples
So far, we have assumed in our confidence interval calculations that the sample size is
“large.” Convention defines “large” as greater than or equal to 30 observations. But what if
our sample size is small? Is there a way to estimate the population mean even if we have
only a handful of data points? Can we still create a confidence interval?

It depends: if we don’t know anything about the underlying population, we cannot create a
confidence interval with fewer than 30 data points because the properties of the Central
Limit Theorem may not hold.

However, if the underlying population is roughly normally distributed, we can use a
confidence interval to estimate the population mean as long as we modify our approach
slightly. We can gain insight into whether a data set is approximately normally distributed by
looking at the shape of a histogram of that data. There are formal tests of normality that are
beyond the scope of this course.

To estimate the population mean with a small sample, we use a t-distribution instead of a
“z-distribution”, that is, a normal distribution. A t-distribution looks similar to a normal
distribution but is not as tall in the center and has thicker tails. These differences reflect
the fact that a t-distribution is more likely than a normal distribution to have values farther
from the mean. Therefore, the normal distribution’s “rules of thumb” do not apply. The shape
of a t-distribution depends on the sample size; as the sample size grows towards 30, the
t-distribution becomes very similar to a normal distribution.

Assuming the same level of confidence, how does the width of the confidence interval for
small samples compare with that for large samples?

It is wider (CORRECT)
Smaller samples have greater uncertainty, which means wider confidence intervals.
It is narrower
See correct answer for explanation.
Drill Down: Descriptive Statistics and Confidence Intervals

Another way to find the confidence interval is to use the descriptive statistics tool we
learned about earlier in the course. We just check the Confidence Level for Mean box and
enter the desired level of confidence. The resulting output table calculates the margin of
error which we can then add to and subtract from the sample mean to find the confidence
interval range. The descriptive statistics tool uses CONFIDENCE.T to calculate the margin
of error. Thus, we typically don’t use the descriptive statistics tool for large samples
because the t-distribution is more conservative than the normal distribution. However, for
very large samples, the tool generates a confidence interval that’s very close to the results
of CONFIDENCE.NORM.

Spreadsheet: Calculating Confidence Intervals for Small Samples
Let’s return to our estimate of the average BMI of U.S. adults. Suppose now that we
gathered information from only 15 people and we want to calculate a 95% confidence
interval. Suppose our new sample has a mean of 25.97 kg/m2 and a standard deviation of
7.10 kg/m2.

Step 1

In cell B5, enter the function =CONFIDENCE.T(0.05,B2,B3). This calculates the margin of
error, which we will add to and subtract from the sample mean.

The margin of error for the 95% confidence interval is CONFIDENCE.T(0.05,B2,B3)=3.93.

Step 2

In cell B6, subtract the margin of error from the sample mean to calculate the lower bound
of the 95% confidence interval. Enter =B1-B5.

Step 3

In cell B7, add the margin of error to the sample mean to calculate the upper bound of the
95% confidence interval. Enter =B1+B5.
The lower bound of the 95% confidence interval is the mean minus the margin of error,
B1-B5=22.04. The upper bound of the 95% confidence interval is the mean plus the margin of
error, B1+B5=29.90.

The lower bound is approximately 22.04 and the upper bound is 29.90, so we can be 95%
confident that the true mean BMI of all U.S. adults is between 22.04 kg/m2 and 29.90
kg/m2. Recall that the lower and upper bounds for the 95% confidence interval when the
sample size was 100 were 25.64 kg/m2 and 28.14 kg/m2, respectively. The results are
different for two reasons: the samples contain different data, so their means and standard
deviations are not equal, and the sample sizes are different. Note that even if the mean and
standard deviation of the small sample were the same as those of the large sample, the
confidence intervals would still be different. This is because there is greater uncertainty
when dealing with smaller samples. Thus, in either case, the confidence interval for the
smaller sample would be wider.

Drill Down: Finding the t-value in Excel

The function T.INV.2T can find the t-value for a desired level of confidence.

=T.INV.2T(probability, degrees_freedom)

• probability is the significance level, that is, 1 - confidence level; for a 95%
  confidence interval, the significance level is 0.05.
• degrees_freedom is the number of degrees of freedom, which in this case is simply
  the sample size minus one, or n-1.

For example, for the BMI example where the confidence level was 95% and n=15, the
t-value would be T.INV.2T(0.05,14)=2.14.

T.INV.2T replaces the function:

=TINV(probability, degrees_freedom)
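
To see how the t-value feeds into the small-sample margin of error, here is a hedged Python sketch that uses only the standard library. Python has no built-in t-distribution, so this sketch approximates the two-tailed probability by numerically integrating the t density with Simpson's rule and then searching for the t-value by bisection. The function names (`t_inv_2t`, `confidence_t`) are hypothetical stand-ins that mimic the Excel argument order, not Excel's actual implementation.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_prob(t, df, steps=2000):
    """P(|T| > t), from a Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def t_inv_2t(probability, df):
    """t-value whose two-tailed probability equals `probability` (bisection)."""
    lo, hi = 0.0, 1000.0  # assumed search range, ample for typical inputs
    for _ in range(100):
        mid = (lo + hi) / 2
        if two_tailed_prob(mid, df) > probability:
            lo = mid  # tails still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_t(alpha, standard_dev, size):
    """Margin of error t * s / sqrt(n), analogous to Excel's CONFIDENCE.T."""
    return t_inv_2t(alpha, size - 1) * standard_dev / math.sqrt(size)


if __name__ == "__main__":
    print(f"t-value (95%, df=14): {t_inv_2t(0.05, 14):.2f}")
    print(f"margin of error: {confidence_t(0.05, 7.10, 15):.2f}")
```

With the BMI inputs from the spreadsheet example (s = 7.10, n = 15), the sketch should reproduce the t-value of about 2.14 and the margin of error of about 3.93 reported above.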
2.4.4 Choosing a Sample Size
Comparing the BMI 95% confidence intervals shows the impact of sample size on the width
of confidence intervals. In general, we know that the larger the sample size, the tighter the
confidence interval. Now we would like to determine how to find the right sample size to
ensure a desired level of accuracy, that is, to ensure that the confidence interval’s width is
less than a specified quantity. 

For example, suppose we want to construct a 95% confidence interval for the true mean
BMI that has a margin of error of 1 kg/m2. That is, our desired level of accuracy is 1 kg/m2;
we want to be 95% confident that our sample mean is within 1 kg/m2 of the true
population mean. How large does our sample size need to be in order to produce this
level of accuracy with 95% confidence?  

For now, let’s focus on samples that are large enough to warrant use of the normal
distribution. Remember that for large samples, the lower and upper bounds of a confidence
interval are given by $\bar{x} \pm z\,\sigma/\sqrt{n}$. A sample of size $n$ gives us a
confidence interval that extends a distance, $z\,\sigma/\sqrt{n}$, on either side of the
mean. The distance, $z\,\sigma/\sqrt{n}$, is the confidence interval’s margin of error.

To find the sample size necessary to ensure a specified margin of error is less than or equal
to a given distance, $M$, we just rearrange the equation and solve for the sample size, $n$.

$$n \ge \left(\frac{z\,\sigma}{M}\right)^2$$

Recall that we usually don’t know $\sigma$, the true standard deviation of the population.
Moreover, when determining the appropriate sample size, we typically would not have even
taken a sample yet, so we don’t have a sample standard deviation. In a case like this, we
could take a preliminary sample and use that sample’s standard deviation, $s$, as an
estimate of $\sigma$. Thus, to ensure that the margin of error is less than $M$, the sample
size must satisfy:

$$n \ge \left(\frac{z\,s}{M}\right)^2$$
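
The rearranged inequality can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `required_sample_size` is hypothetical, and the example treats the 15-person BMI sample (s = 7.10 kg/m2) as if it were the preliminary sample, which the course text does not state.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, s, max_margin):
    """Smallest n with z * s / sqrt(n) <= max_margin (large-sample case)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-value
    # n >= (z * s / M)^2, rounded up because n must be a whole number.
    return math.ceil((z * s / max_margin) ** 2)


if __name__ == "__main__":
    # Assumption: s = 7.10 comes from a preliminary sample; target margin 1 kg/m2.
    n = required_sample_size(0.95, 7.10, 1.0)
    print(f"Sample size needed: {n}")
```

Rounding up matters: rounding down would leave the margin of error slightly above the target. The result should also be checked against the large-sample assumption (at least 30 observations) that justifies using a z-value here.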
2.4.5 Estimating the Population Proportion
Sometimes our sample does not consist of numerical values. For example, if we pose a
"yes" or "no" question, the data will consist only of yes and no answers. Even though our
question has only two possible responses, we still have to address an inherent uncertainty:
how often will each response occur? In such cases, we usually convey the survey results by
reporting $\bar{p}$, the percent of the total number of responses that were "yes" responses.
$\bar{p}$ is our best estimate of our variable of interest, $p$, the true percentage of "yes"
responses in the underlying population. Because every respondent must answer "yes" or "no",
we know that the percentage of "no" responses equals $1 - \bar{p}$.

Before we can calculate the confidence interval for the proportion of "yes" or "no" answers,
we need to calculate $\bar{p}$. The easiest way to do this is to assign a “dummy variable” to each
response—a variable that can take on only the values 0 and 1. (Dummy variables are also
called indicator variables or binary variables; we will learn more about them in the
regression modules.) In this case, we’ll assign 1 to every "yes" response and 0 to every "no"
response. (Typically we assign 1 to our value of interest, in this case the proportion of "yes"
responses.) To do so, we use Excel’s IF function.

=IF(logical_test,[value_if_true],[value_if_false])

To make this function assign the value 1 if the referenced cell is “Yes,” and 0 if the
referenced cell is not (in this case, if the cell referenced is “No”), we would enter the IF
function for every observation. In this example, the following formula refers to the first
observation in cell A2.

=IF(A2="Yes",1,0)

Be sure to use the correct formula syntax when creating dummy variables. For example, if
you place quotation marks around the numbers in your formula, that is, if you
write =IF(A2="Yes","1","0") instead of writing =IF(A2="Yes",1,0), the misconfigured
formula will produce a 1 or 0 result that looks like a number, but does not perform like a
number. The quotation marks in the formula tell the spreadsheet to treat the 1 or 0 as text,
rather than as a numeric value. If you then try to perform any calculations on these
text-formatted values, you will get an error.


Suppose we want to estimate what proportion of all students plan to enroll in a new online
course. To construct a confidence interval around the sample proportion, we begin by
translating the responses into a dummy variable.

Step 1

In cell B2, enter the function =IF(A2="Yes",1,0).

• This function says that if cell A2 equals "Yes", then enter a 1 in cell B2 and if
  cell A2 does not equal "Yes", then enter a 0 in cell B2.

Step 2

Copy and paste the formula from cell B2 into cells B3:B51.

• This assigns a dummy variable value in column B for each data point in column A.
• To use auto-fill, enter the first value in cell B2. Highlight B2 and place your cursor at
  the bottom right-hand corner of the cell. The cursor will turn into a black cross. Drag
  the cross down the column until you reach cell B51. When you release the mouse,
  the values will auto-fill.
There should be a 1 or 0 in cells B2:B51. In cell B2, enter the function =IF(A2="Yes",1,0),
then copy and paste this function into cells B3:B51.

The mean of our sample is 0.60, or 60%, and the standard deviation is 0.49, or 49%. You
can use the descriptive statistics tool or AVERAGE(B2:B51) and STDEV.S(B2:B51).

Step 4

Now calculate the 95% confidence interval using the appropriate formula for this sample
size. Remember to add and subtract the margin of error from the sample mean to calculate
the lower and upper bounds of the confidence interval.
You can separate the calculations into individual steps or you can combine them into one
function. Since the sample size is greater than 30, we can use the CONFIDENCE.NORM
function. The margin of error is CONFIDENCE.NORM(0.05,E3,50)=0.14, or 14%. The lower
bound of the 95% confidence interval is the sample mean minus the margin of error,
E2-CONFIDENCE.NORM(0.05,E3,50)=0.60-0.14=0.46, or 46%. Similarly, the upper bound of
the 95% confidence interval is the sample mean plus the margin of error,
E2+CONFIDENCE.NORM(0.05,E3,50)=0.60+0.14=0.74, or 74%.
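
The dummy-variable workflow above can be sketched end to end in Python. The response list here is hypothetical: 30 "Yes" and 20 "No" answers, chosen only because they are consistent with the reported sample mean of 0.60 and sample size of 50. The function name `proportion_interval` is likewise an assumption.

```python
import math
import statistics
from statistics import NormalDist


def proportion_interval(responses, confidence=0.95):
    """Return (p_bar, margin, lower, upper) for a list of "Yes"/"No" answers."""
    # Dummy variable: 1 for "Yes", 0 otherwise, like =IF(A2="Yes",1,0).
    dummy = [1 if r == "Yes" else 0 for r in responses]
    p_bar = statistics.mean(dummy)   # share of "Yes" answers
    s = statistics.stdev(dummy)      # sample standard deviation, like STDEV.S
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(len(dummy))
    return p_bar, margin, p_bar - margin, p_bar + margin


if __name__ == "__main__":
    # Hypothetical data matching the reported mean of 0.60 with n = 50.
    responses = ["Yes"] * 30 + ["No"] * 20
    p_bar, margin, lower, upper = proportion_interval(responses)
    print(f"p_bar={p_bar:.2f} margin={margin:.2f} interval={lower:.2f} to {upper:.2f}")
```

Using integer 0/1 values rather than the strings "0" and "1" is what lets the mean and standard deviation be computed directly, which is the same point the spreadsheet text makes about quotation marks.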

Formula for Confidence Intervals around the Proportion
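
For reference, the standard large-sample confidence interval for a proportion can be written as below, using $\bar{p}$ for the sample proportion, $n$ for the sample size, and $z$ for the z-value of the chosen confidence level. This is the common textbook form, offered here as an assumption about what this heading covers.

```latex
\[
  \bar{p} \;\pm\; z \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n}}
\]
```

For 0/1 dummy data, the sample standard deviation works out to $s = \sqrt{\tfrac{n}{n-1}\,\bar{p}(1-\bar{p})}$, so the spreadsheet approach of applying CONFIDENCE.NORM to the dummy variable gives nearly the same margin of error when $n$ is large.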


Verifying Sample Size for Low Probability Events
Sample size is particularly important when dealing with very small (or very large)
proportions. Suppose we are sampling to find the prevalence of Amyotrophic Lateral
Sclerosis (ALS), a disease commonly known as Lou Gehrig’s disease. In the United States,
an estimated six to eight people per 100,000 have ALS. That is, the likelihood that a person
in the U.S. has ALS is between 0.00006 and 0.00008, or between 0.006% and 0.008%.
Would our sample be useful if we surveyed 100 people? No. Since the proportion we are
estimating is very small, we need to have a large enough sample to make sure that it
includes at least some people with the disease. Otherwise, we will not have enough data
to obtain a good estimate of the true proportion. The following guidelines are typically used
when estimating proportions to ensure that a sample is large enough to provide a good
estimate. The sample size $n$ must be large enough to satisfy both conditions:

$$n\,\bar{p} \ge 5 \quad \text{and} \quad n\,(1 - \bar{p}) \ge 5$$

Spreadsheet
Let’s check to make sure that 50, the sample size we used to estimate the proportion of
students that plan to enroll in the new online course, is large enough to satisfy both
guidelines. We know that $n$=50 and earlier we determined that $\bar{p}$=0.60. Now we just
need to check each inequality.
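
A quick way to run this kind of check is sketched below. The threshold of 5 expected "yes" and 5 expected "no" responses is an assumption: it is a commonly cited rule of thumb (some texts use 10), so it is exposed as a parameter. The function names are hypothetical.

```python
import math


def sample_is_large_enough(n, p_bar, threshold=5):
    """Check both guidelines: n*p_bar and n*(1 - p_bar) reach the threshold."""
    # threshold=5 is an assumed rule of thumb, not a value confirmed by the notes.
    return n * p_bar >= threshold and n * (1 - p_bar) >= threshold


def minimum_sample_size(p, threshold=5):
    """Smallest n for which the rarer outcome is expected `threshold` times."""
    return math.ceil(threshold / min(p, 1 - p))


if __name__ == "__main__":
    print(sample_is_large_enough(50, 0.60))        # online-course survey
    print(sample_is_large_enough(100, 0.00007))    # ALS survey of 100 people
    print(minimum_sample_size(0.00007))            # rough size needed for ALS
```

For the online-course survey, the expected counts are 30 "yes" and 20 "no" responses, so both guidelines are met. For a proportion as small as the ALS prevalence, a sample of 100 falls far short, in line with the discussion above.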
2.4 Summary
Lesson Summary
The sample mean is only a point estimate. We can construct a range around the sample
mean, called a confidence interval, which contains the true population mean with a certain
level (e.g., 95%) of confidence. For a 95% confidence interval, on average, 95% of
samples drawn from the population will have the population mean within the confidence
interval. Note that a confidence interval’s level of confidence does not tell us the chance,
probability, or likelihood that an individual confidence interval contains the true population
mean.

• The width of the confidence interval depends on the level of confidence, our
  best estimate of the population standard deviation, and the sample size. We
  control only the level of confidence and the sample size.
• For large samples, the lower and upper bounds are calculated using the
  following equation:

  $$\bar{x} \pm z\frac{s}{\sqrt{n}}$$

  o The function CONFIDENCE.NORM calculates the margin of error,
    which we add to and subtract from the sample mean to find the
    confidence interval.
• For small samples, the lower and upper bounds are calculated using the
  following equation:

  $$\bar{x} \pm t\frac{s}{\sqrt{n}}$$

  o For small samples, we use a t-distribution, which is shorter and wider
    than a normal distribution. The t-distribution provides a wider range, a
    more conservative estimate of where the true population mean lies.
  o The function CONFIDENCE.T calculates the margin of error, which we
    add to and subtract from the sample mean to find the confidence
    interval.
• We can also calculate confidence intervals for proportions.
  o To do so, we must convert data to dummy (0, 1) variables. After that,
    we can proceed as we would with any other confidence interval.
  o When estimating the true population proportion, we should ensure that
    the sample size is large enough by checking that both of the following
    conditions are true: $n\,\bar{p} \ge 5$, and $n\,(1 - \bar{p}) \ge 5$. If either of these
    guidelines is not satisfied, we must collect a larger sample.
• Calculating Confidence Intervals

Excel Summary
• =CONFIDENCE.NORM(alpha, standard_dev, size)
• =CONFIDENCE.T(alpha, standard_dev, size)
• =IF(logical_test,[value_if_true],[value_if_false])

The margin of error is based on the significance level (1-confidence level, or 1-0.95=0.05),
the standard deviation (in B2) and the sample size (in B3). We can compute the margin of
error using the Excel function CONFIDENCE.NORM(0.05,B2,B3). The lower bound of the
95% confidence interval is the sample mean minus the margin of error, that is
B1-CONFIDENCE.NORM(0.05,B2,B3)=15-0.39=14.61. The upper bound of the 95%
confidence interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.NORM(0.05,B2,B3)=15+0.39=15.39. You must link directly to cells to
obtain the correct answer.

Calculate the 99% confidence interval for the true population mean for the BMI data. Recall
that the new sample contains 15 people and has a mean of 25.97 kg/m2 and a standard
deviation of 7.10 kg/m2.

Because our sample has fewer than 30 cases, we cannot assume that the distribution of
sample means will be normal, and must use the t-distribution. The margin of error is based
on the significance level (1-confidence level, or 1-0.99=0.01), the standard deviation (in B2)
and the sample size (in B3). We can compute the margin of error using the Excel function
CONFIDENCE.T(0.01,B2,B3). The lower bound of the 99% confidence interval is the
sample mean minus the margin of error, that is
B1-CONFIDENCE.T(0.01,B2,B3)=25.97-5.46=20.51. The upper bound of the 99% confidence
interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.T(0.01,B2,B3)=25.97+5.46=31.43. We can be 99% confident that the true mean
BMI of all U.S. adults is between 20.51 kg/m2 and 31.43 kg/m2. You must link directly to
cells to obtain the correct answer.

Calculate the 80% confidence interval for the true population mean based on a sample with
$\bar{x}$=225, s=8.5, and n=45.

The margin of error is based on the significance level (1-confidence level, in this case,
100%-80%=20%), the standard deviation (in cell B2) and the sample size (in cell B3). We
can compute the margin of error using the Excel function
CONFIDENCE.NORM(0.20,B2,B3)=1.62. The lower bound of the 80% confidence interval
is the sample mean minus the margin of error, that is
B1-CONFIDENCE.NORM(0.20,B2,B3)=225-1.62=223.38. The upper bound of the 80%
confidence interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.NORM(0.20,B2,B3)=225+1.62=226.62. You must link directly to cells to
obtain the correct answer.

Calculate the 90% confidence interval for the true population mean based on a sample
with $\bar{x}$=15, s=2, and n=100.

You can separate the calculations into separate formulas or you can combine them into one
calculation. The margin of error is based on the significance level (1-confidence level, or 1-
0.90=0.1), the standard deviation (in B2), and the sample size (in B3). We can compute the
margin of error using the Excel function CONFIDENCE.NORM(0.10,B2,B3). The lower
bound of the 90% confidence interval is the sample mean minus the margin of error, that is
B1–CONFIDENCE.NORM(0.10,B2,B3)=14.67. The upper bound of the 90% confidence
interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.NORM(0.10,B2,B3)=15.33. You must link directly to cells to obtain the
correct answer.
Calculate the 90% confidence interval for the true population mean based on a sample
with x̄=15, s=2, and n=20.

Because our sample has fewer than 30 cases, we cannot assume that the distribution of
sample means will be normal, and must use the t-distribution. The margin of error is based
on the significance level (1-confidence level, or 1-0.90=0.10), the standard deviation (in B2),
and the sample size (in B3). We can compute the margin of error using the Excel
function CONFIDENCE.T(0.10,B2,B3). The lower bound of the 90% confidence interval is
the sample mean minus the margin of error, that is B1–CONFIDENCE.T(0.10,B2,B3)=15-
0.77=14.23. The upper bound of the 90% confidence interval is the sample mean plus the
margin of error, that is B1+CONFIDENCE.T(0.10,B2,B3)= 15+0.77=15.77. You must link
directly to cells to obtain the correct answer.
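
For readers who want to check the t-based margin without Excel, here is a rough Python sketch. The Python standard library has no t-distribution quantile, so this version approximates one numerically (Simpson's rule for the CDF, bisection to invert it). The helper names t_pdf, t_cdf, t_quantile, and confidence_t are hypothetical, and the numerical approach is an illustrative assumption, not how Excel computes CONFIDENCE.T.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus the 0.5 mass below zero."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Invert the CDF by bisection (assumes 0.5 < p < 1 and quantile < 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_t(alpha, standard_dev, size):
    """Margin of error from the t-distribution with size - 1 degrees of freedom."""
    return t_quantile(1 - alpha / 2, size - 1) * standard_dev / math.sqrt(size)


# Values from the example above: mean 15, s = 2, n = 20, 90% confidence.
mean, s, n = 15, 2, 20
margin = confidence_t(0.10, s, n)
lower, upper = mean - margin, mean + margin
print(f"margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The printed values should agree with the 0.77 margin and the 14.23 to 15.77 interval above. The interval is wider than the normal-based one for n=100 because the sample is smaller and the t critical value is larger.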

To better assess student understanding of confidence intervals, a professor gives a test to a
random sample of 15 students. The grades for those students are provided below.
Calculate the 90% confidence interval for the true average grade on the test.
First, calculate the mean and standard deviation of the sample grades using formulas
=AVERAGE(A2:A16) and =STDEV.S(A2:A16) in any of the open cells. The values are
approximately 76.93 and 11.73 respectively. 

Second, calculate the margin of error. Because the sample size is less than 30, use the
function CONFIDENCE.T(alpha, standard_dev, size) to find the margin of error using the t-
distribution. Here, alpha is 0.1 and the sample size is 15. For the standard deviation value,
you need to reference the cell in which you calculated the standard deviation.  The result is
approximately =CONFIDENCE.T(0.1,11.73,15) = 5.34. 

Alternatively, you could use the Descriptive Statistics tool to calculate the mean, standard
deviation, and margin of error. To include the margin of error calculation in the Descriptive
Statistics output, check the “Confidence Interval” box and adjust the confidence level to
90%. Note that the Descriptive Statistics tool uses CONFIDENCE.T by default to calculate
the margin of error.  

The lower bound of the 90% confidence interval is the mean minus the margin of error,
approximately 76.93–5.34=71.60. The upper bound of the 90% confidence interval is the
mean plus the margin of error, approximately 76.93+5.34=82.27. You must link directly to
cells in all of your calculations in order to obtain the correct answer.
You can also use a single formula to complete each of the calculations:
=AVERAGE(A2:A16)-CONFIDENCE.T(0.1, STDEV.S(A2:A16),15) for the lower bound, and
=AVERAGE(A2:A16)+CONFIDENCE.T(0.1, STDEV.S(A2:A16),15) for the upper bound.

2.5.1 Amazon's Inventory Sampling


 So I have a great example
of, one of our warehouses
 wanted to save some money and
invented a new kind of divider
 in between inventory on
a shelf, and this divider
 was made of cardboard,
which was inexpensive.
 And they installed
these dividers
 across the whole portion
of the warehouse.
 They were very
proud of the work,
 and we were proud of
their scrappiness.
 And unfortunately though,
after a couple of months,
 we saw inventory defects
spike in this part
 of this particular warehouse.
 We went back to do some deep
dive analytics and inspection
 to understand why.
 And it turned out that as
pickers were removing items,
 they were bumping the
items next to them,
 and slowly they would flatten
some of the cardboard.
 And so, items from one bin
were drifting into another bin.
 And when we assigned an
inventory checker to go out
 to check to see if
the item was there,
 it was no longer in the
place it was supposed to be.
 It was in the one next to it,
and that counts as a defect.
 Because when a
picker comes, they're
 only looking in the
zone where the software
 tells them to pick.
 If it's not there, they note
it as a defect and move on.
 So we actually were introducing
more customer defects because
 of this change that was made
to the process in an attempt
 to improve it, and we discovered
it through our sampling.

Let’s analyze the inventory sampling data Amazon collected after the implementation of
cardboard dividers. In particular, let’s construct confidence intervals to estimate the true
inventory defect rate for each of the three different storage types at the warehouse. One of
these storage types uses the cardboard divider.

Amazon sampled 5,000 observations from each of the three different storage types and
recorded “1” if there was a defect in the bin and “0” if there was not. The mean defect rate
and standard deviation for each storage type are provided below.
Calculate the 90% confidence interval for the true population mean based on a sample
with x̄=225, s=8.5, and n=10.

Because our sample has fewer than 30 cases, we cannot assume that the distribution of
sample means will be normal, and must use the t-distribution. The margin of error is based
on the significance level (1-confidence level, or 1-0.90=0.10), the standard deviation (in B2)
and the sample size (in B3). We can compute the margin of error using the Excel function
CONFIDENCE.T(0.10,B2,B3) and it is approximately 4.93.  The lower bound of the 90%
confidence interval is the sample mean minus the margin of error, that is B1–
CONFIDENCE.T(0.10,B2,B3)=225-4.93=220.07. The upper bound of the 90% confidence
interval is the sample mean plus the margin of error, that is
B1+CONFIDENCE.T(0.10,B2,B3)= 225+4.93=229.93. You must link directly to cells to
obtain the correct answer.

Using a 95% confidence level, calculate your best estimate of the true defect rate of storage
type 1.
You can separate the calculations into individual steps or you can combine them into one
function. The lower bound of the 95% confidence interval is the sample mean minus the
margin of error, B2–CONFIDENCE.NORM(0.05,B3,B4)=0.0251, or 2.51%. The upper
bound of the 95% confidence interval is the sample mean plus the margin of error,
B2+CONFIDENCE.NORM(0.05,B3,B4)=0.0345, or 3.45%.
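
Because each observation is recorded as 0 or 1, the sample mean is the defect rate itself, and the sample standard deviation follows directly from that rate. The Python sketch below illustrates this under a loudly stated assumption: the mean for storage type 1 is not reproduced in this text, so the value 0.0298 is inferred as the midpoint of the reported interval and is not taken from the original spreadsheet.

```python
from math import sqrt
from statistics import NormalDist

n = 5000          # observations per storage type (from the text)
p_hat = 0.0298    # ASSUMED: midpoint of the reported 2.51% to 3.45% interval

# For 0/1 data the sample variance is n * p * (1 - p) / (n - 1),
# which is the quantity STDEV.S would return squared.
s = sqrt(p_hat * (1 - p_hat) * n / (n - 1))

# 95% confidence leaves 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)
margin = z * s / sqrt(n)

lower, upper = p_hat - margin, p_hat + margin
print(f"s = {s:.4f}, margin = {margin:.4f}")
print(f"95% interval: {lower:.2%} to {upper:.2%}")
```

With the assumed mean, the bounds round to the same 2.51% and 3.45% reported above, which suggests the inferred value is at least consistent with the spreadsheet.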

Storage type 3 uses the cardboard divider. As Jeff Wilke mentioned, the inventory defect rate
is higher in that part of the warehouse, which corresponds to storage type 3.
We can say with 95% confidence that the true inventory defect rate for storage type 3
lies between 5.21% and 6.51%. This is almost twice as high as for the other
two storage types, which is likely why management recognized it quickly. –Dee
