Subject: Statistics

**Binomial Approximation to the Hypergeometric Distribution**

We often want to determine the proportion (percentage) of members of a finite population that have a specified attribute. For instance, we might be interested in the proportion of U.S. adults that have Internet access. Here the population consists of all U.S. adults, and the attribute is “has Internet access.” Or we might want to know the proportion of U.S. businesses that are minority owned. In this case, the population consists of all U.S. businesses, and the attribute is “minority owned.”

Generally, the population under consideration is too large for the population proportion to be found by taking a census. Imagine, for instance, trying to interview every U.S. adult to determine the proportion that have Internet access. So, in practice, we rely mostly on sampling and use the sample data to estimate the population proportion.

Suppose that a simple random sample of size n is taken from a population in which the proportion of members that have a specified attribute is p. Then a random variable of primary importance in the estimation of p is the number of members sampled that have the specified attribute, which we denote X. The exact probability distribution of X depends on whether the sampling is done with or without replacement.

If sampling is done with replacement, the sampling process constitutes Bernoulli trials: Each selection of a member from the population corresponds to a trial. A success occurs on a trial if the member selected in that trial has the specified attribute; otherwise, a failure occurs. The trials are independent because the sampling is done with replacement. The success probability remains the same from trial to trial, since it always equals the proportion of the population that has the specified attribute. Therefore the random variable X has the binomial distribution with parameters n (the sample size) and p (the population proportion).

In reality, however, sampling is ordinarily done without replacement.
Under these circumstances, the sampling process does not constitute Bernoulli trials because the trials are not independent and the success probability varies from trial to trial.
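
As a minimal sketch (in Python; the values of n, p, N, and M are hypothetical and not taken from the text), the binomial probabilities for sampling with replacement can be computed directly from the binomial formula. The last few lines illustrate why sampling without replacement breaks the Bernoulli-trial assumptions: the chance of a success on the second draw depends on what happened on the first.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for the binomial distribution with parameters n and p.

    This is the exact distribution of X when sampling is done WITH
    replacement: each of the n draws is an independent trial whose
    success probability is the population proportion p.
    """
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: sample n = 10 members with replacement from a
# population in which a proportion p = 0.8 has the attribute.
n, p = 10, 0.8
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(f"P(X = 8) = {binomial_pmf(8, n, p):.4f}")  # 0.3020
print(f"Total probability = {sum(probs):.4f}")     # 1.0000

# Without replacement the trials are dependent. With a hypothetical population
# of N = 20 members, M = 16 of whom have the attribute (so p = 0.8), the chance
# that the second member drawn has the attribute depends on the first draw:
N, M = 20, 16
print(f"After a success: {(M - 1) / (N - 1):.4f}")  # 15/19 = 0.7895
print(f"After a failure: {M / (N - 1):.4f}")        # 16/19 = 0.8421
```

Neither conditional probability equals 0.8, so the draws are no longer identical, independent trials.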

In other words, the random variable X does not have a binomial distribution. Its distribution is important, however, and is referred to as a hypergeometric distribution. We won’t present the hypergeometric probability formula here because, in practice, a hypergeometric distribution can usually be approximated by a binomial distribution. The reason is that, if the sample size does not exceed 5% of the population size, there is little difference between sampling with and without replacement, so X has approximately the binomial distribution with parameters n and p. The binomial distribution is the most important and most widely used discrete probability distribution. Other common discrete probability distributions are the Poisson, hypergeometric, and geometric distributions, which you are asked to consider in the exercises.
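
To see the 5% guideline numerically, the sketch below compares exact hypergeometric probabilities with their binomial approximation. It is a minimal illustration in Python under assumed values: a hypothetical population of N = 1000 members, M = 300 of whom have the attribute (so p = M/N = 0.3), and the helper names `hypergeometric_pmf`, `binomial_pmf`, and `max_pmf_gap` are our own, not from the text. It uses the standard counting formula for the hypergeometric distribution.

```python
from math import comb


def hypergeometric_pmf(x: int, N: int, M: int, n: int) -> float:
    """P(X = x) when n members are sampled WITHOUT replacement from a
    population of size N in which M members have the attribute."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for the binomial distribution with parameters n and p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_pmf_gap(N: int, M: int, n: int) -> float:
    """Largest absolute difference between the exact (hypergeometric) and
    approximate (binomial, p = M/N) probabilities over x = 0, ..., n."""
    p = M / N
    return max(
        abs(hypergeometric_pmf(x, N, M, n) - binomial_pmf(x, n, p))
        for x in range(n + 1)
    )


# Hypothetical population: N = 1000 members, M = 300 with the attribute (p = 0.3).
N, M = 1000, 300
for n in (50, 500):  # samples of 5% and 50% of the population
    print(f"n/N = {n / N:.0%}: largest probability gap = {max_pmf_gap(N, M, n):.4f}")
```

With the sample at 5% of the population the two sets of probabilities agree closely, while at 50% the gap grows noticeably, which is consistent with the rule of thumb stated above.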
