# Chapter 8: Interval Estimation

GBS221, Class 25221 April 11, 2011 Notes Compiled by Nicolas C. Rouse, Instructor, Phoenix College

## Chapter Objectives

1. Understand how to calculate the margin of error and interval estimate when the population standard deviation is known.
2. Understand how to calculate the margin of error and interval estimate when the population standard deviation is not known.
3. Understand how to determine the sample size based on existing data.
4. Understand how to calculate the sample size based on the population proportion.

## 1. Population Mean: $\sigma$ Known

In most applications $\sigma$ is not known, and $s$ is used to compute the margin of error. In some applications, however, large amounts of relevant historical data are available and can be used to estimate the population standard deviation prior to sampling. Also, in quality control applications where a process is assumed to be operating correctly, or in control, it is appropriate to treat the population standard deviation as known. We refer to such cases as the $\sigma$ known case.

**Margin of Error: $\sigma$ Known**

- The margin of error is the plus or minus value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter.
- The confidence associated with an interval estimate is the confidence level. For example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level.
- The confidence coefficient is the confidence level expressed as a decimal value. For example, .95 is the confidence coefficient for a 95% confidence level.

**Interval Estimate: $\sigma$ Known**

- The confidence interval is another name for an interval estimate. For a population mean in the $\sigma$ known case, the interval estimate is

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (8.1)$$

where $(1 - \alpha)$ is the confidence coefficient and $z_{\alpha/2}$ is the $z$ value providing an area of $\alpha/2$ in the upper tail of the standard normal distribution.
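As a rough illustration of the $\sigma$ known case, the sketch below computes the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$ and the resulting interval around the sample mean. The function name `z_interval` and the input values are hypothetical, chosen only for this example; the $z$ value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (margin_of_error, lower, upper) for the sigma-known case."""
    alpha = 1 - confidence
    # z value with an area of alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Hypothetical inputs: sample mean 82, known sigma 20, sample size 100
margin, lower, upper = z_interval(x_bar=82, sigma=20, n=100)
print(f"margin of error = {margin:.2f}")
print(f"95% interval    = ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level (for example `confidence=0.99`) increases $z_{\alpha/2}$ and therefore widens the interval; increasing `n` narrows it.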

**Practical Advice**

- If the population follows a normal distribution, the confidence interval provided by expression (8.1) is exact. In other words, if expression (8.1) were used repeatedly to generate 95% confidence intervals, exactly 95% of the intervals generated would contain the population mean.
- If the population does not follow a normal distribution, the confidence interval provided by expression (8.1) will be approximate. In this case, the quality of the approximation depends on both the distribution of the population and the sample size.
- In most applications, a sample size of $n > 30$ is adequate when using expression (8.1) to develop an interval estimate of a population mean. If the population is not normally distributed, but is roughly symmetric, sample sizes as small as 15 can be expected to provide good approximate confidence intervals. With smaller sample sizes, expression (8.1) should only be used if the analyst believes, or is willing to assume, that the population distribution is at least approximately normal.

## 2. Population Mean: $\sigma$ Unknown

When developing an interval estimate of a population mean we usually do not have a good estimate of the population standard deviation either. In these cases, we must use the same sample to estimate both $\mu$ and $\sigma$. This situation represents the $\sigma$ unknown case. When $s$ is used to estimate $\sigma$, the margin of error and the interval estimate for the population mean are based on a probability distribution known as the t distribution.

The t distribution is a family of similar probability distributions, with a specific t distribution depending on a parameter known as the degrees of freedom. The t distribution with one degree of freedom is unique, as is the t distribution with two degrees of freedom, with three degrees of freedom, and so on. As the number of degrees of freedom increases, the difference between the t distribution and the standard normal distribution becomes smaller and smaller. A t distribution with more degrees of freedom exhibits less variability and more closely resembles the standard normal distribution. The mean of the t distribution is zero.

**Margin of Error: $\sigma$ Unknown**

- To compute an interval estimate of $\mu$ for the $\sigma$ unknown case, the sample standard deviation $s$ is used to estimate $\sigma$, and $z_{\alpha/2}$ is replaced by the t distribution value $t_{\alpha/2}$. The margin of error is then given by $t_{\alpha/2}\, s/\sqrt{n}$. With this margin of error, the general expression for an interval estimate of a population mean when $\sigma$ is unknown follows.

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \qquad (8.2)$$

where $s$ is the sample standard deviation, $(1 - \alpha)$ is the confidence coefficient, and $t_{\alpha/2}$ is the $t$ value providing an area of $\alpha/2$ in the upper tail of the t distribution with $n - 1$ degrees of freedom.
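The $\sigma$ unknown interval, $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ with $n - 1$ degrees of freedom, can be sketched in Python as follows. The standard library has no t distribution, so this example approximates $t_{\alpha/2}$ numerically (Simpson's rule for the cumulative probability, bisection for the quantile). That numerical approach, the helper names `t_cdf`, `t_upper`, and `t_interval`, and the sample data are all assumptions made for illustration; in practice a t table or a statistics package would normally supply the t value.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, stdev


def t_cdf(t, df, steps=4000):
    """Approximate P(T <= t) for t >= 0 using Simpson's rule on [0, t]."""
    # Log of the normalizing constant of the t density (lgamma avoids overflow)
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_coef - (df + 1) / 2 * log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric about zero, so P(T <= 0) = 0.5
    return 0.5 + total * h / 3


def t_upper(alpha_half, df):
    """t value with an area of alpha_half in the upper tail (bisection)."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha_half:
            lo = mid  # upper-tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data, confidence=0.95):
    """Return (margin_of_error, lower, upper) for the sigma-unknown case."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    t = t_upper((1 - confidence) / 2, df=n - 1)
    margin = t * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Hypothetical sample of n = 8 observations
sample = [10, 8, 12, 15, 13, 11, 6, 5]
margin, lower, upper = t_interval(sample)
print(f"margin of error = {margin:.2f}")
print(f"95% interval    = ({lower:.2f}, {upper:.2f})")
```

Calling `t_upper(0.025, df)` with increasing `df` gives values that shrink toward 1.96, which is the sense in which the t distribution approaches the standard normal distribution.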

The reason the number of degrees of freedom associated with the $t$ value in expression (8.2) is $n - 1$ concerns the use of $s$ as an estimate of the population standard deviation $\sigma$. The sample standard deviation is computed from the $n$ deviations $x_i - \bar{x}$, which always sum to zero, so only $n - 1$ of them are free to vary.

**Practical Advice**

- If the population follows a normal distribution, the confidence interval provided by expression (8.2) is exact and can be used for any sample size. If the population does not follow a normal distribution, the confidence interval provided by expression (8.2) will be approximate. In this case, the quality of the approximation depends on both the distribution of the population and the sample size.
- In most applications, a sample size of $n > 30$ is adequate when using expression (8.2) to develop an interval estimate of a population mean. However, if the population distribution is highly skewed or contains outliers, most statisticians would recommend increasing the sample size to 50 or more. If the population is not normally distributed but is roughly symmetric, sample sizes as small as 15 can be expected to provide good approximate confidence intervals. With smaller sample sizes, expression (8.2) should only be used if the analyst believes, or is willing to assume, that the population distribution is at least approximately normal.

## 3. Determining Sample Size

To provide a desired margin of error, we need to determine an appropriate sample size. To understand how this process is done, we return to the $\sigma$ known case presented in Section 8.1. Using expression (8.1), the interval estimate is $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$. The quantity $z_{\alpha/2}(\sigma/\sqrt{n})$ is the margin of error. Thus, we see that $z_{\alpha/2}$, the population standard deviation $\sigma$, and the sample size $n$ combine to determine the margin of error. Once we select a confidence coefficient $1 - \alpha$, $z_{\alpha/2}$ can be determined. Then, if we have a value for $\sigma$, we can determine the sample size $n$ needed to provide any desired margin of error. The sample size calculated below provides the desired margin of error at the chosen confidence level.

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$$

$E$ is the margin of error that the user is willing to accept, and the value of $z_{\alpha/2}$ follows directly from the confidence level to be used in developing the interval estimate. Although user preference must be considered, 95% confidence is the most frequently chosen value ($z_{.025} = 1.96$). Finally, use of the above equation requires a value for the population standard deviation $\sigma$. However, even if $\sigma$ is unknown, we can use the above equation provided we have a preliminary or planning value for $\sigma$. The three methods of obtaining a planning value for $\sigma$ are:

- Use the estimate of the population standard deviation computed from data of previous studies as the planning value for $\sigma$.
- Use a pilot study to select a preliminary sample. The sample standard deviation from the preliminary sample can be used as the planning value for $\sigma$.
- Use judgment or a best guess for the value of $\sigma$. For example, we might begin by estimating the largest and smallest data values in the population. The difference between the largest and smallest values provides an estimate of the range for the data. Finally, the range divided by 4 is often suggested as a rough approximation of the standard deviation and thus an acceptable planning value for $\sigma$.
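The sample size calculation for a population mean, $n = (z_{\alpha/2})^2\sigma^2/E^2$, can be sketched as below. The function names and the planning values are hypothetical. Rounding the result up to the next whole number is the usual convention, since a fractional observation is not possible and rounding down would give a margin of error slightly larger than the one requested.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma_planning, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma_planning / margin) ** 2)


def sigma_from_range(largest, smallest):
    """Rough planning value for sigma: the estimated range divided by 4."""
    return (largest - smallest) / 4


# Hypothetical planning value taken from a previous study
print(sample_size_for_mean(sigma_planning=9.65, margin=2))

# Hypothetical best-guess extremes of 20 and 60 give a planning value of 10
guess = sigma_from_range(largest=60, smallest=20)
print(sample_size_for_mean(sigma_planning=guess, margin=2))
```

Because $E$ appears squared in the denominator, halving the desired margin of error roughly quadruples the required sample size.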

## 4. Population Proportion

The sampling distribution of $\bar{p}$ plays a key role in computing the margin of error for the interval estimate. In Chapter 7 we said that the sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever $np > 5$ and $n(1 - p) > 5$. Because the sampling distribution of $\bar{p}$ is then approximately normal, if we choose $z_{\alpha/2}\sigma_{\bar{p}}$ as the margin of error in an interval estimate of a population proportion, we know that approximately $100(1 - \alpha)\%$ of the intervals generated will contain the true population proportion. But $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$ cannot be used directly in the computation of the margin of error because $p$ will not be known; $p$ is what we are trying to estimate. So $\bar{p}$ is substituted for $p$, which gives the interval estimate $\bar{p} \pm z_{\alpha/2}\sqrt{\bar{p}(1 - \bar{p})/n}$.
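A minimal sketch of the interval estimate of a population proportion, with the sample proportion $\bar{p}$ standing in for the unknown $p$ in the margin of error $z_{\alpha/2}\sqrt{\bar{p}(1 - \bar{p})/n}$. The function name and the survey counts are hypothetical. The normal-approximation check uses $\bar{p}$ in place of $p$, which is itself an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (p_bar, margin_of_error, lower, upper) for a proportion."""
    p_bar = successes / n
    # Rule of thumb for the normal approximation, checked with p_bar
    # because the true p is unknown (an assumption of this sketch)
    if n * p_bar <= 5 or n * (1 - p_bar) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, margin, p_bar - margin, p_bar + margin


# Hypothetical survey: 396 of 900 respondents answered "yes"
p_bar, margin, lower, upper = proportion_interval(successes=396, n=900)
print(f"sample proportion = {p_bar:.2f}")
print(f"margin of error   = {margin:.4f}")
print(f"95% interval      = ({lower:.4f}, {upper:.4f})")
```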

Let us consider the question of how large the sample size should be to obtain an estimate of a population proportion at a specified level of precision. The rationale for the sample size determination in developing interval estimates of $p$ is similar to the rationale used in Section 8.3 to determine the sample size for estimating a population mean. The margin of error associated with an interval estimate of a population proportion is $z_{\alpha/2}\sqrt{\bar{p}(1 - \bar{p})/n}$. The margin of error is based on the value of $z_{\alpha/2}$, the sample proportion $\bar{p}$, and the sample size $n$. Larger sample sizes provide a smaller margin of error and better precision. Let $E$ denote the desired margin of error:

$$E = z_{\alpha/2}\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$

Solving this equation for $n$ provides a formula for the sample size that will provide a margin of error of size $E$:

$$n = \frac{(z_{\alpha/2})^2\,\bar{p}(1 - \bar{p})}{E^2}$$

We cannot use this formula to compute the sample size that will provide the desired margin of error because $\bar{p}$ will not be known until after we select the sample. What we need, then, is a planning value for $\bar{p}$ that can be used to make the computation. Using $p^*$ to denote the planning value for $\bar{p}$, the following formula can be used to compute the sample size that will provide a margin of error of size $E$.

$$n = \frac{(z_{\alpha/2})^2\,p^*(1 - p^*)}{E^2}$$

In practice, the planning value $p^*$ can be chosen by one of the following procedures.

- Use the sample proportion from a previous sample of the same or similar units.
- Use a pilot study to select a preliminary sample. The sample proportion from this sample can be used as the planning value, $p^*$.
- Use judgment or a best guess for the value of $p^*$.
- If none of the preceding alternatives apply, use a planning value of $p^* = .50$.
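The sample size calculation for a proportion, $n = (z_{\alpha/2})^2\,p^*(1 - p^*)/E^2$, can be sketched as follows. The function name and the planning values are hypothetical, and the result is rounded up to the next whole number. The default of $p^* = .50$ is the conservative choice: the product $p^*(1 - p^*)$ is largest at .50, so that planning value never understates the required sample size.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, p_planning=0.50, confidence=0.95):
    """Smallest n giving a margin of error of at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_planning * (1 - p_planning) / margin ** 2)


# Hypothetical planning value of .44 from an earlier sample
print(sample_size_for_proportion(margin=0.025, p_planning=0.44))

# No prior information: fall back on the conservative p* = .50
print(sample_size_for_proportion(margin=0.025))
```

Comparing the two calls shows how little is lost by using $p^* = .50$ when the true proportion is not far from one half.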

## Key Terms

- **Interval estimate**: An estimate of a population parameter that provides an interval believed to contain the value of the parameter. For the interval estimates in this chapter, it has the form: point estimate ± margin of error.
- **Margin of error**: The ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter.
- **$\sigma$ known**: The case when historical data or other information provides a good value for the population standard deviation prior to taking a sample. The interval estimation procedure uses this known value of $\sigma$ in computing the margin of error.
- **Confidence level**: The confidence associated with an interval estimate. For example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level.
- **Confidence coefficient**: The confidence level expressed as a decimal value. For example, .95 is the confidence coefficient for a 95% confidence level.
- **Confidence interval**: Another name for an interval estimate.
- **$\sigma$ unknown**: The more common case when no good basis exists for estimating the population standard deviation prior to taking the sample. The interval estimation procedure uses the sample standard deviation $s$ in computing the margin of error.
- **t distribution**: A family of probability distributions that can be used to develop an interval estimate of a population mean whenever the population standard deviation $\sigma$ is unknown and is estimated by the sample standard deviation $s$.
- **Degrees of freedom**: A parameter of the t distribution. When the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has $n - 1$ degrees of freedom, where $n$ is the size of the simple random sample.

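## Key Formulas

Interval estimate of a population mean, $\sigma$ known:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (8.1)$$

Interval estimate of a population mean, $\sigma$ unknown:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \qquad (8.2)$$

Sample size for an interval estimate of a population mean:

$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$$

Interval estimate of a population proportion:

$$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$

Sample size for an interval estimate of a population proportion:

$$n = \frac{(z_{\alpha/2})^2\,p^*(1 - p^*)}{E^2}$$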