

TO: ROBERT GROVES, DIRECTOR

FROM: ANDREW A. BEVERIDGE, PROFESSOR OF SOCIOLOGY, QUEENS COLLEGE AND

GRADUATE CENTER CUNY AND CONSULTANT TO THE NEW YORK TIMES FOR CENSUS

AND OTHER DEMOGRAPHIC ANALYSES

*

SUBJECT: IMPACT OF MISCOMPUTED STANDARD ERRORS AND FLAWED DATA RELEASE

POLICIES ON THE AMERICAN COMMUNITY SURVEY

DATE: 1/24/2011

CC: RODERICK LITTLE, ASSOCIATE DIRECTOR; CATHY MCCULLY, CHIEF, U.S. CENSUS

BUREAU REDISTRICTING DATA OFFICE

This memo is to alert the Bureau to two problems with the American Community Survey as released that

degrade the utility of the data through confusion and filtering (not releasing all data available for a specific

geography). The resultant errors could reflect poorly on the Bureau’s ability to conduct the American

Community Survey and could potentially undercut support for that effort. The problems are as follows:

1. The Standard Errors, and the Margins of Error (MOE) based upon them, reported for data

in the American Community Survey are miscomputed. The usual approximation for computing

standard errors is applied even where it is inappropriate (generally where a given cell in a

table makes up only a very small or a very large portion of the total). As a result, the

standard reports show MOEs for many cells that include “negative” numbers of people.

2. The rules for not releasing data about specific fields or variables in the one-year and three-

year files (so-called “filtering”) in a given table seriously undercut the value of the one-year

and three-year ACS for many communities and areas throughout the United States by

depriving them of valuable data. (The filtering is also based in part on the miscomputed

MOEs.)

This memo discusses the issues underlying each problem. Attached to this memo are a number of items to

support and illustrate my points.

A. MISCOMPUTED STANDARD ERRORS

Background

The standard errors and MOEs reported with the American Community Survey are miscomputed

in certain situations and lead to erroneous results. When very small or very large proportions of a given

table are in a specific category then the standard errors are computed improperly. Indeed, there are a massive

number of instances where the Margins of Error (MOEs) include zero or negative counts of persons with

given characteristics. This happens even though individuals were found in the sample with specific

*

I write as a friend of the Census Bureau. Since 1993 I have served as consultant to the New York Times with regard to

the Census and other demographic matters. I use Census and ACS data in my research and have testified using Census

data numerous times in court.

Miscomputation of Standard Error and Flawed Release Rules in the ACS--2

characteristics (e.g. Non-Hispanic Asian). Obviously, if one finds any individuals of a given category in a

given area, the MOE should never be negative or include zero for that category. If it did, then the method by

which it was computed should be considered incorrect. For instance, in the case of non-Hispanic Asians in

counties, in some 742 counties where the point estimate of the number of non-Hispanic Asians is greater

than zero, the published MOE implies that the actual number in those counties could be negative, and in 40

additional cases the MOE includes zero. Similarly, for the 419 counties where the point estimate is zero, the

published MOE implies that there could be “negative” or “minus” Non-Hispanic Asians in the county. (See

Appendix 1 for these and other results. Here I have only shared results from Table B03002. Many other

tables exhibit similar issues.) This problem permeates the data released from the 2005-2009 ACS, and has

contributed to filtering issues in the three-year and one-year files, which will be discussed later in this memo.

Issues with the ACS Approach to Computing Standard Error

This problem appears to be directly related to the method by which Standard Errors and MOEs are

computed for the American Community Survey, especially for situations where a given data field constitutes a

very large or very small proportion of a specific table. Chapter 12 of the ACS Design and Methodology describes in

some detail the methods used to compute standard error in the ACS (see Appendix 2). It appears that the

problems with using the normal approximation or Wald approximation in situations with large or small

proportions are ignored. This is especially surprising given the fact that the problems of using a normal

approximation to the binomial distribution under these conditions are well known. I have attached an article

that surveys some of these issues. (See Appendix 3, Lawrence D. Brown, T. Tony Cai and Anirban

DasGupta. “Interval Estimation for a Binomial Proportion.” Statistical Science 16:2 (2001): p.101–33.) The

article also includes responses from several researchers who have tackled this problem. Indeed, even

Wikipedia has references to this problem in its entry regarding the “Binomial Proportion.” Much of the

literature makes the point that even in conditions where the issue is not a large or small proportion, the

normal approximation may result in inaccuracies. Some standard statistical packages (e.g., SAS, SUDAAN)

have implemented several of the suggested remedies. Indeed, some software programs even warn users when

the Wald or normal approximation should not be used.
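A short worked derivation may make the point concrete. It assumes a simple random sample of size $n$ with sample proportion $\hat{p} > 0$ and a normal critical value $z$; it ignores the complex sample design of the ACS, so it is an illustration of the Wald approximation and not a description of the Bureau's actual computation. Under those assumptions, the lower Wald bound is negative exactly when:

```latex
\hat{p} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < 0
\;\Longleftrightarrow\;
\hat{p}^{2} < \frac{z^{2}\,\hat{p}(1-\hat{p})}{n}
\;\Longleftrightarrow\;
n\hat{p} < z^{2}(1-\hat{p})
\;\Longleftrightarrow\;
\hat{p} < \frac{z^{2}}{n + z^{2}}
```

With $z = 1.645$ (the 90 percent level), $z^{2} \approx 2.71$, so in this simplified setting the lower bound falls below zero whenever the number of sample cases $n\hat{p}$ is smaller than about $2.71(1-\hat{p})$, which for a small proportion means one or two observed cases.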

It is true that on page 12-6 of ACS Design and Methodology the following statement seems to indicate that the

Census Bureau recognizes the problem (similar statements are in other documents regarding the ACS):

Users are cautioned to consider ‘‘logical’’ boundaries when creating confidence bounds from

the margins of error. For example, a small population estimate may have a calculated lower

bound less than zero. A negative number of people does not make sense, so the lower

bound should be set to zero instead. Likewise, bounds for percents should not go below

zero percent or above 100 percent.

In fact, this means that it was obvious to the Bureau that the method for computing standard errors and

MOEs resulted in errors in certain situations, but nothing, so far, has been done to remedy the problem.

Furthermore, in American FactFinder, MOEs that include negative numbers abound, even where respondents

were found in the area. (See Appendix 4 for an example.) It is equally problematic to have the MOE

include zero when the sample found individuals or households with a given characteristic. I should note that

exact confidence intervals for binomial proportions and appropriate approximations under these conditions

are asymmetric, and never include zero or become negative. Therefore, the ACS’s articulated (but not

implemented) MOE remedy of setting the lower bound to zero would not be a satisfactory solution. Rather,

the method of computation must be changed to reflect the accepted statistical literature and statistical practice

by major statistical packages and to guard against producing data that are illogical and thus likely to be

misunderstood and criticized.
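The contrast between the Wald interval and the asymmetric alternatives can be sketched in a few lines of Python. This is a minimal illustration under simple random sampling, using only the standard library; the sample counts (2 cases out of 500) are hypothetical, and the ACS's design-based variance estimation is not modeled.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_value(conf):
    """Two-sided normal critical value, e.g. about 1.645 for conf=0.90."""
    return NormalDist().inv_cdf(0.5 + conf / 2)


def wald_interval(x, n, conf=0.90):
    """Wald (normal approximation) interval; can fall below 0 or above 1."""
    z = z_value(conf)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, conf=0.90):
    """Wilson (score) interval; asymmetric around p for small or large p."""
    z = z_value(conf)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, tol=1e-12):
    """Bisection for f(p) = target on [0, 1], where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, conf=0.90):
    """Exact (Clopper-Pearson) interval by inverting the binomial tests."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: _binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: _binom_cdf(x, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    x, n = 2, 500  # hypothetical: 2 sample cases out of 500
    for name, fn in [("Wald", wald_interval), ("Wilson", wilson_interval),
                     ("Clopper-Pearson", clopper_pearson_interval)]:
        lo, hi = fn(x, n)
        print(f"{name:16s} lower={lo * n:8.3f} upper={hi * n:8.3f} (sample cases)")
```

For these hypothetical counts the Wald lower bound is negative, while the Wilson and Clopper-Pearson lower bounds stay strictly above zero, which is the behavior one would expect whenever at least one case is observed.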


Consequences of the Miscomputation

There are serious consequences stemming from this miscomputation:

1. The usefulness, accuracy and reliability of ACS data may be thrown into question. With the

advent of the ACS 2005-2009 data, many of the former uses of the Census long-form will now

depend upon the ACS five year file, while others could be based upon the one-year and three-year

files. One of these uses is in the contentious process of redistricting Congress, state legislatures and

local legislative bodies. In any area where the Voting Rights Act provisions are used to assess

potential districting plans, Citizen Voting Age Population (CVAP) numbers need to be computed

for various cognizable groups, including Hispanics, Asian and Pacific Islanders, Non-Hispanic White

and Non-Hispanic Black, etc. Since citizenship data is not collected in the decennial census, such

data, which used to come from the long form, must now be produced from the 2005-2009 ACS. In

2000 the Census Bureau created a special tabulation (STP76) based upon the long form to report

CVAP by group for the voting age population to help in the computation of the so-called “effective

majority.” I understand that the Department of Justice has requested a similar file based upon the

2005 to 2009 ACS. Unless the standard error for the ACS is correctly computed, I envision that

during the very contentious and litigious process of redistricting, someone could allege that the ACS

is completely without value for estimating CVAP due to its flawed standard errors and MOEs, and

therefore is not useful for redistricting. I am sure that somewhere a redistricting expert would testify

to that position if the current standard error and MOE computation procedure were maintained.

This could lead to a serious public relations and political problem, not only for the ACS, but for

census data generally.

When a redistricting expert suggested this exact scenario to me, I became convinced that I

should formally bring these issues to the Bureau’s attention. Given the fraught nature of the

politics over the ACS and census data generally, I think it is especially important that the

massive number of MOEs including “minus and zero people” be addressed

immediately by choosing an appropriate method to compute MOEs and implementing it in

the circumstances discussed above.

I should note that in addition to redistricting, the ACS data will be used for the following:

1. Equal employment monitoring using an Equal Employment Opportunity

Commission File, which includes education and occupation data, so it must be

based upon the ACS.

2. Housing availability and affordability studies using a file created for the Department of Housing

and Urban Development (HUD) called the “Comprehensive Housing Affordability

Strategy” (CHAS) file. This file also requires ACS data since it includes several

housing characteristics and cost items.

3. Language assistance assessments for voting by the Department of Justice, which is

another data file that can only be created from the ACS because it is based upon the

report of the language spoken at home.

Because the ACS provides richer data on a wider variety of topics than the decennial census, there

are a multitude of other uses for the ACS in planning transportation, school enrollment, districting

and more where the MOEs would be a significant issue. In short, the miscomputed MOEs are likely

to cause local and state government officials, private researchers, and courts of law to use the ACS

less effectively and less extensively, or to stop using it altogether.


B. FLAWED RELEASE POLICIES FOR DATA TABLES IN THE ONE-YEAR AND THREE-

YEAR ACS FILES

Background

Having yearly and tri-yearly data available makes the ACS potentially much more valuable than the infrequent

Census long form. However, much of that added value has been undercut by the miscomputation of MOEs

coupled with disclosure rules for data in the one-year and three-year ACS files. This has meant that

important data for many communities and areas have been “filtered,” that is, not released. The general rule,

of course, is that data are released for areas with a population of 65,000 or greater for the one-year file and for

areas with a population of 20,000 or greater for the three-year files. However, the implementation of the so-

called “filtering rule,” used to prevent unreliable statistical data from being released, has had the practical

effect of blocking the release of much important data.

As Chapter 13 of the ACS Design and Methodology states:

The main data release rule for the ACS tables works as follows. Every detailed table consists

of a series of estimates. Each estimate is subject to sampling variability that can be

summarized by its standard error. If more than half of the estimates in the table are not

statistically different from 0 (at a 90 percent confidence level), then the table fails. Dividing

the standard error by the estimate yields the coefficient of variation (CV) for each estimate.

(If the estimate is 0, a CV of 100 percent is assigned.) To implement this requirement for

each table at a given geographic area, CVs are calculated for each table’s estimates, and the

median CV value is determined. If the median CV value for the table is less than or equal to

61 percent, the table passes for that geographic area and is published; if it is greater than 61

percent, the table fails and is not published. (Chapter 13 of the ACS Design and Methodology,

p. 13-7, see Appendix 5.)
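The rule quoted above can be sketched as a short function. This is an illustrative reading of the quoted text, not the Bureau's code: it assumes standard errors are recovered from published 90 percent MOEs by dividing by 1.645, and both example tables are hypothetical.

```python
from statistics import median

Z90 = 1.645  # assumed factor linking a 90 percent MOE to its standard error


def table_passes_filter(estimates, moes, max_median_cv=0.61):
    """Apply the median-CV release rule to one table for one geography.

    Returns (passes, median_cv). A zero estimate is assigned a CV of
    100 percent, as the quoted rule specifies.
    """
    cvs = []
    for est, moe in zip(estimates, moes):
        if est == 0:
            cvs.append(1.0)
        else:
            standard_error = moe / Z90
            cvs.append(standard_error / est)
    med = median(cvs)
    return med <= max_median_cv, med


# Hypothetical homogeneous area: two populated groups, six empty cells.
homogeneous_est = [70000, 68500, 1500, 0, 0, 0, 0, 0, 0]
homogeneous_moe = [1200, 1300, 700, 120, 120, 120, 120, 120, 120]

# Hypothetical diverse area: every cell is populated.
diverse_est = [70000, 30000, 20000, 900, 8000, 300, 4000, 5000, 1800]
diverse_moe = [1200, 1500, 1400, 500, 1100, 250, 900, 1000, 700]

if __name__ == "__main__":
    print(table_passes_filter(homogeneous_est, homogeneous_moe))
    print(table_passes_filter(diverse_est, diverse_moe))
```

In this sketch the homogeneous table fails because more than half of its cells are zero, even though its two populated cells are estimated precisely, while the diverse table passes.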

Negative Consequences of the Filtering Rules

These rules on their face are flawed, since they make the release of data about non-Hispanic blacks or non-

Hispanic whites contingent on the presence of members of other groups living in the same area, such as

Hispanic Native Americans or Alaskan Natives, Hispanic Asians, or Hispanic Hawaiian Natives or other

Pacific Islanders. Therefore, for areas populated by just a few groups, the Bureau will not release the data

about any group for that area. Thus, many cities, counties and other areas will not have important data about

the population composition reported because they do not have enough different population groups living

within the specified area. Furthermore, using the CV based upon the miscomputed standard error (most

especially for cases where few or zero individuals in the area are in a given category) means that the

likelihood of reporting a high CV is increased and even more areas will not have some data released. (As

noted in the section discussing the miscomputed standard errors, the computation of the standard error and

thus the computation of CV is flawed.)

To demonstrate the impact of this rule, I assessed the “filtering” for the variable B03002 (Hispanic status by

race) in the 2009 one-year release, using the Public Use Micro-data Areas (PUMAs). PUMAs are areas that

were designed to be used with the Public Use Micro-Data files and are required to include at least 100,000

people. Due to the Bureau’s release rules, the only areas released for the one-year and three-year files that

provide complete coverage of the United States were the nation, the states, and the PUMAs. Using

PUMAs, it is easy to understand the impact of the “filtering” rules since every PUMA received data in 2009.

For the Race and Hispanic status table (B03002), the table elements include total population; total non-

Hispanic population; non-Hispanic population for the following groups: white; black; native American or

Alaskan Native; Asian; native Hawaiian or other Pacific Islander; some other race; two or more races; two

races including some other race; two races excluding some other race, and three or more races; and total

Hispanic population and then population for each of the same groups listed above but for those that are

Hispanic, such as Hispanic Asian, Hispanic native Hawaiian and other Pacific Islander, as well as Hispanic


White and Hispanic black. The table contains some 21 items, of which 3 are subtotals of other

items. (See Appendix 2 for an example.)

Plainly, many of the cells in this table are likely to be zero or near zero for many geographies within the

United States. For that reason, it is not surprising that 1,274 of 2,101 PUMAs (well more than half) were

“filtered” for the 2009 one-year file. Those not filtered are likely to be urban areas with a substantial degree

of population diversity, while those filtered are often the opposite. The map presented on the final page of

this memo shows graphically which PUMAs had table B03002 filtered. This particular example shows where

the proportion of the population that was non-Hispanic white was not revealed in this table.

Looking at the map, the problem becomes clear. Vast portions of the United States are filtered, as

represented by the green cross-hatch pattern. Parts of North Dakota and Maine for example do not have a

report on the number of non-Hispanic whites. In a similar vein, there is at least one PUMA in New York

City that is so concentrated in terms of non-Hispanic African American population that B03002 is “filtered.”

Conclusion and Recommendations

Based upon this analysis, I would make three recommendations to the Bureau:

1) That a User Note immediately be drafted indicating that there is a problem with the MOEs (and the

standard errors which are used to compute them) in the ACS for certain situations, and that the

Bureau will recompute them for the ACS 2005-2009 file on an expedited basis. We are basically one

month from the start of redistricting, with Virginia and New Jersey required to redraw their lines by

early summer. Both states have substantial numbers of foreign-born individuals, many of whom are

Hispanic, so the Citizens of Voting Age by group calculations are very important and dependent on

clear ACS data.

2) That the MOEs for the 2005-2009 ACS (as well as all of the one- and three-year files already issued)

be recomputed using a method that takes into account the issues surrounding the “binomial

proportion.” That the ACS 2005-2009 data be re-released with these new MOE files. This should

be done almost immediately for the more recent files and as soon as possible for the others.

3) That the Bureau adopt a new release policy based not upon whole tables (the specifications of which

are in any event arbitrary), but rather based upon specific variables or table cells. In this way, the

release of important data would not be subject to miscomputed standard errors, or to the size or

estimated variability of other variables or cells. It also may make sense to dispense with “filtering”

altogether. This policy should then be applied to previously released data, and the data should be re-

released.

The problems identified in this memo are causing serious issues regarding the ACS’s use in a wide variety of

settings, and could seriously threaten the viability of this irreplaceable and vital data source if they became

widely known. As a Census and ACS user, I truly hope that steps can be taken to remedy these problems

swiftly.


MAP OF THE LOWER 48 STATES SHOWING NON-RELEASE (“FILTERING”) OF DATA FOR RACE AND HISPANIC

STATUS BY PUMAS. GREEN CROSS-HATCHES INDICATE NON-RELEASE

Andrew A. Beveridge <andy@socialexplorer.com>

Andrew A. Beveridge <andrew.beveridge@qc.cuny.edu> Tue, Feb 8, 2011 at 8:44 AM

To: Roderick.Little@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>

Cc: robert.m.groves@census.gov, Catherine.Clark.McCully@census.gov, "sharon.m.stern" <Sharon.M.Stern@census.gov>

Bcc: j.trent.alexander@census.gov, Matthew Ericson <matte@nytimes.com>

Dear Rod:

It was good to talk to you yesterday along with David Rindskopf

regarding the issues raised in my memo regarding MOE and filtering in

the ACS Data.

We understand the fact that recomputing and re-releasing the

confidence intervals would be a very time consuming process.

Nonetheless, I do believe that some sort of statement regarding the

MOE's is necessary, particularly around negative and zero values for

groups that were found in the sample, as well as the intervals for

situations where no subjects were found.

I attach to this memo an example table from the CVAP data released on

Friday. This is for a block group in Brooklyn, and as you can see the

point estimates for seven of the 11 groups are zero, and the MOE's are

123 in every case. These groups include several that have very low

frequencies in Brooklyn (e.g., American Indian or Alaskan Native). In

the litigation context, it would be very easy to make such numbers

look absurd. I still remain concerned that this could easily harm the

Bureau's credibility, which would be very damaging given the context

of support for Bureau activities.

I also agree that doing some research into what would be an

appropriate way to view and compute Confidence Intervals for the ACS,

possibly including reference to the Decennial Census results and also

contemplating the importance of various spatial distributions would

ultimately be very helpful. At the same time, it is nonetheless the

fact that at least two of the major statistical package vendors have

implemented procedures, which take into account the issue of high or

low proportions in estimating confidence intervals from survey data.

I attach the write-up on confidence intervals for SAS's SURVEYFREQ,

one of a set of procedures that SAS has developed to handle the

analysis of data drawn from complex samples. They have the following

option for the computation of confidence limits: "PROC SURVEYFREQ

also provides the PSMALL option, which uses the alternative confidence

limit type for extreme (small or large) proportions and uses the Wald

confidence limits for all other proportions (not extreme)." (See

Attachment)

At the same time, later yesterday afternoon I was speaking with a very

well known lawyer who works on redistricting issues. He had looked at

the release of the CVAP data at the block-group level and basically

concluded that it was effectively worthless. Part of this, I think,

is due to the fact that data such as those in the attachment do

not look reliable on their face.

If nothing is done, it will be very difficult to defend these data in

court or use them to help in drawing districts. I remain convinced

that there also could be serious collateral damage to the Census

Bureau in general. I for one hope that this does not occur. I look

forward to your response to the issues I raised.

Very truly yours,

Andy

--

Andrew A. Beveridge

Prof of Sociology Queens College and Grad Ctr CUNY

Chair Queens College Sociology Dept

Office: 718-997-2848

Email: andrew.beveridge@qc.cuny.edu

252A Powdermaker Hall

65-30 Kissena Blvd

Flushing, NY 11367-1597

www.socialexplorer.com

Attachment_2_8_2011.doc

196K

Attachment to EMAIL to Rod Little 2/8/2011 -1- 1

Block Group 2, Census Tract 505, Kings County, New York

LNTITLE                                                         LNNUMBER  CVAP_EST  CVAP_MOE
Total                                                                  1       625       241
Not Hispanic or Latino                                                 2       195       124
American Indian or Alaska Native Alone                                 3         0       123
Asian Alone                                                            4        35        55
Black or African American Alone                                        5        45        51
Native Hawaiian or Other Pacific Islander Alone                        6         0       123
White Alone                                                            7       120        80
American Indian or Alaska Native and White                             8         0       123
Asian and White                                                        9         0       123
Black or African American and White                                   10         0       123
American Indian or Alaska Native and Black or African American        11         0       123
Remainder of Two or More Race Responses                               12         0       123
Hispanic or Latino                                                    13       430       197
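The implied confidence bounds for this block group can be checked with simple arithmetic. The sketch below takes the estimates and MOEs from the table above and computes estimate minus MOE and estimate plus MOE for each line; the variable and function names are illustrative only.

```python
# (line title, CVAP estimate, CVAP margin of error), copied from the table
rows = [
    ("Total", 625, 241),
    ("Not Hispanic or Latino", 195, 124),
    ("American Indian or Alaska Native Alone", 0, 123),
    ("Asian Alone", 35, 55),
    ("Black or African American Alone", 45, 51),
    ("Native Hawaiian or Other Pacific Islander Alone", 0, 123),
    ("White Alone", 120, 80),
    ("American Indian or Alaska Native and White", 0, 123),
    ("Asian and White", 0, 123),
    ("Black or African American and White", 0, 123),
    ("American Indian or Alaska Native and Black or African American", 0, 123),
    ("Remainder of Two or More Race Responses", 0, 123),
    ("Hispanic or Latino", 430, 197),
]


def bounds(est, moe):
    """Lower and upper bounds implied by a symmetric margin of error."""
    return est - moe, est + moe


# Lines whose implied lower bound is a negative number of people.
negative_lower = [name for name, est, moe in rows if bounds(est, moe)[0] < 0]

# Lines with a zero point estimate.
zero_estimates = [name for name, est, moe in rows if est == 0]

# Lines where people were found but the lower bound is still negative.
found_but_negative = [
    name for name, est, moe in rows if est > 0 and bounds(est, moe)[0] < 0
]

if __name__ == "__main__":
    for name, est, moe in rows:
        lo, hi = bounds(est, moe)
        print(f"{name:64s} {lo:6d} to {hi:6d}")
```

Nine of the thirteen lines have a negative implied lower bound: the seven lines with a zero estimate, plus Asian Alone (35 minus 55) and Black or African American Alone (45 minus 51), where citizens of voting age were found in the sample.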

The SURVEYFREQ Procedure

Confidence Limits for Proportions

http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_surveyfreq_a0000000221.htm

If you specify the CL option in the TABLES statement, PROC SURVEYFREQ computes

confidence limits for the proportions in the frequency and crosstabulation tables.

By default, PROC SURVEYFREQ computes Wald ("linear") confidence limits if you do not

specify an alternative confidence limit type with the TYPE= option. In addition to Wald

confidence limits, the following types of design-based confidence limits are available for

proportions: modified Clopper-Pearson (exact), modified Wilson (score), and logit confidence

limits.

PROC SURVEYFREQ also provides the PSMALL option, which uses the alternative confidence

limit type for extreme (small or large) proportions and uses the Wald confidence limits for all

other proportions (not extreme). For the default PSMALL= value of 0.25, the procedure

computes Wald confidence limits for proportions between 0.25 and 0.75 and computes the

alternative confidence limit type for proportions that are outside of this range. See Curtin et al.

(2006).
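The switching behavior described for PSMALL can be sketched roughly as follows. This is not SAS code and does not reproduce PROC SURVEYFREQ: it assumes a normal critical value in place of the t percentile, uses a plain Wilson interval on a caller-supplied effective sample size as the alternative type, and treats the endpoints 0.25 and 0.75 as inside the Wald range. All function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_limits(p, se, z):
    """Symmetric limits p +/- z * se."""
    return p - z * se, p + z * se


def wilson_limits(p, n_eff, z):
    """Wilson (score) limits using an effective sample size n_eff."""
    denom = 1 + z * z / n_eff
    center = (p + z * z / (2 * n_eff)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n_eff + z * z / (4 * n_eff**2))
    return center - half, center + half


def psmall_style_limits(p, se, n_eff, psmall=0.25, alpha=0.05):
    """Use Wald limits for mid-range proportions, Wilson limits otherwise."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if psmall <= p <= 1 - psmall:
        return "wald", wald_limits(p, se, z)
    return "wilson", wilson_limits(p, n_eff, z)


if __name__ == "__main__":
    # Hypothetical inputs: a mid-range and an extreme proportion.
    print(psmall_style_limits(p=0.50, se=0.035, n_eff=200))
    print(psmall_style_limits(p=0.01, se=0.007, n_eff=200))
```

With the hypothetical extreme proportion, a Wald interval would extend below zero (0.01 minus roughly 1.96 times 0.007), whereas the alternative limits remain positive.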


See Korn and Graubard (1999), Korn and Graubard (1998), Curtin et al. (2006), and Sukasih and

Jang (2005) for details about confidence limits for proportions based on complex survey data,

including comparisons of their performance. See also Brown, Cai, and DasGupta (2001), Agresti

and Coull (1998) and the other references cited in the following sections for information about

binomial confidence limits.

For each table request, PROC SURVEYFREQ produces a nondisplayed ODS table, "Table

Summary," which contains the number of observations, strata, and clusters that are included in

the analysis of the requested table. When you request confidence limits, the "Table Summary"

data set also contains the degrees of freedom df and the value of $t_{df,\alpha/2}$ that is used to compute the

confidence limits. See Example 84.3 for more information about this output data set.

Wald Confidence Limits

PROC SURVEYFREQ computes standard Wald ("linear") confidence limits for proportions by

default. These confidence limits use the variance estimates that are based on the sample design.

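For the proportion in table cell $(r,c)$, the Wald confidence limits are computed as

$$\hat{P}_{rc} \pm t_{df,\alpha/2} \times \mathrm{StdErr}(\hat{P}_{rc})$$

where $\hat{P}_{rc}$ is the estimate of the proportion in table cell $(r,c)$, $\mathrm{StdErr}(\hat{P}_{rc})$ is the standard error of the

estimate, and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of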

freedom calculated as described in the section Degrees of Freedom. The confidence level is

determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95%

confidence limits.

The confidence limits for row proportions and column proportions are computed similarly to the

confidence limits for table cell proportions.

Modified Confidence Limits

PROC SURVEYFREQ uses the modification described in Korn and Graubard (1998) to compute

design-based Clopper-Pearson (exact) and Wilson (score) confidence limits. This modification

substitutes the degrees-of-freedom adjusted effective sample size for the original sample size in

the confidence limit computations.

The effective sample size $n_e$ is computed as

$$n_e = n / \mathrm{Deff}$$

where $n$ is the original sample size (unweighted frequency) that corresponds to the total domain

of the proportion estimate, and $\mathrm{Deff}$ is the design effect.


If the proportion is computed for a table cell of a two-way table, then the domain is the two-way

table, and the sample size is the frequency of the two-way table. If the proportion is a row

proportion, which is based on a two-way table row, then the domain is the row, and the sample

size is the frequency of the row.

The design effect for an estimate is the ratio of the actual variance (estimated based on the

sample design) to the variance of a simple random sample with the same number of observations.

See the section Design Effect for details about how PROC SURVEYFREQ computes the design

effect.

If you do not specify the ADJUST=NO option, the procedure applies a degrees-of-freedom

adjustment to the effective sample size to compute the modified sample size. If you specify

ADJUST=NO, the procedure does not apply the adjustment and uses the effective sample size

in the confidence limit computations.

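The modified sample size $n_e^*$ is computed by applying a degrees-of-freedom adjustment to the

effective sample size $n_e$ as

$$n_e^* = n_e \left( \frac{t_{(n-1),\alpha/2}}{t_{df,\alpha/2}} \right)^2$$

where $n$ is the original sample size, df is the degrees of freedom, and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution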

with df degrees of freedom. The section Degrees of Freedom describes the computation of the

degrees of freedom df, which is based on the variance estimation method and the sample design.

The confidence level is determined by the value of the ALPHA= option, which by default

equals 0.05 and produces 95% confidence limits.

The design effect is usually greater than 1 for complex survey designs, and in that case the

effective sample size is less than the actual sample size. If the adjusted effective sample size $n_e^*$ is

greater than the actual sample size $n$, then the procedure truncates the value of $n_e^*$ to $n$, as

recommended by Korn and Graubard (1998). If you specify the TRUNCATE=NO option, the

procedure does not truncate the value of $n_e^*$.
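The adjustment and truncation steps can be sketched as follows. This is an illustration, not the SAS implementation: it takes the degrees-of-freedom adjustment to be the squared ratio of two t percentiles (with n minus 1 and with df degrees of freedom), in the manner of Korn and Graubard (1998), and it asks the caller to supply those percentiles because the Python standard library has no t distribution. The sample size, design effect, and df are hypothetical; the t values are the usual tabulated 97.5th percentiles, rounded.

```python
def adjusted_effective_n(n, deff, t_n_minus_1, t_df, truncate=True):
    """Degrees-of-freedom adjusted effective sample size.

    n            original (unweighted) sample size for the domain
    deff         design effect for the estimate
    t_n_minus_1  t percentile with n - 1 degrees of freedom (caller-supplied)
    t_df         t percentile with df degrees of freedom (caller-supplied)
    truncate     cap the result at n, mirroring the default truncation
    """
    n_eff = n / deff                           # effective sample size
    n_adj = n_eff * (t_n_minus_1 / t_df) ** 2  # assumed form of the adjustment
    if truncate and n_adj > n:
        n_adj = n
    return n_adj


if __name__ == "__main__":
    # Hypothetical: n = 1000, df = 30, alpha = 0.05.
    # Tabulated values: t(999) is about 1.962 and t(30) is about 2.042.
    print(adjusted_effective_n(1000, deff=2.0, t_n_minus_1=1.962, t_df=2.042))
    # A design effect below 1 would push the result above n, so it is capped.
    print(adjusted_effective_n(1000, deff=0.5, t_n_minus_1=1.962, t_df=2.042))
```

In the first call the effective sample size of 500 shrinks further once the degrees-of-freedom adjustment is applied; in the second the adjusted value would exceed the actual sample size and is truncated to 1000.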

Modified Clopper-Pearson Confidence Limits

Clopper-Pearson (exact) confidence limits for the binomial proportion are constructed by

inverting the equal-tailed test based on the binomial distribution. This method is attributed to

Clopper and Pearson (1934). See Leemis and Trivedi (1996) for a derivation of the distribution

expression for the confidence limits.

PROC SURVEYFREQ computes modified Clopper-Pearson confidence limits according to the

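approach of Korn and Graubard (1998). The degrees-of-freedom adjusted effective sample size $n_e^*$

is substituted for the sample size in the Clopper-Pearson computation, and the adjusted

effective sample size times the proportion estimate is substituted for the number of positive

responses. (Or if you specify the ADJUST=NO option, the procedure uses the unadjusted

effective sample size $n_e$ instead of $n_e^*$.)

The modified Clopper-Pearson confidence limits for a proportion ($P_L$ and $P_U$) are computed as

$$P_L = \left( 1 + \frac{n_e^* - \hat{p}\,n_e^* + 1}{\hat{p}\,n_e^* \, F\left(\alpha/2,\; 2\hat{p}\,n_e^*,\; 2(n_e^* - \hat{p}\,n_e^* + 1)\right)} \right)^{-1}$$

$$P_U = \left( 1 + \frac{n_e^* - \hat{p}\,n_e^*}{(\hat{p}\,n_e^* + 1)\, F\left(1-\alpha/2,\; 2(\hat{p}\,n_e^* + 1),\; 2(n_e^* - \hat{p}\,n_e^*)\right)} \right)^{-1}$$

where $F(a, b, c)$ is the $a$th percentile of the F distribution with $b$ and $c$ degrees of freedom, $n_e^*$ is the

adjusted effective sample size, and $\hat{p}$ is the proportion estimate.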

Modified Wilson Confidence Limits

Wilson confidence limits for the binomial proportion are also known as score confidence limits

and are attributed to Wilson (1927). The confidence limits are based on inverting the normal test

that uses the null proportion in the variance (the score test). See Newcombe (1998) and Korn and

Graubard (1999) for details.

PROC SURVEYFREQ computes modified Wilson confidence limits by substituting the degrees-

of-freedom adjusted effective sample size $n_e^*$ for the original sample size in the standard Wilson

computation. (Or if you specify the ADJUST=NO option, the procedure substitutes the

unadjusted effective sample size $n_e$.)

The modified Wilson confidence limits for a proportion are computed as

$$\left( \hat{p} + \frac{k^2}{2 n_e^*} \pm k \sqrt{ \frac{\hat{p}(1-\hat{p})}{n_e^*} + \frac{k^2}{4 (n_e^*)^2} } \right) \Bigg/ \left( 1 + \frac{k^2}{n_e^*} \right)$$

where $n_e^*$ is the adjusted effective sample size and $\hat{p}$ is the estimate of the proportion. With the

degrees-of-freedom adjusted effective sample size $n_e^*$, the computation uses $k = z_{\alpha/2}$. With the

unadjusted effective sample size, which you request with the ADJUST=NO option, the

computation uses $k = t_{df,\alpha/2}$. See Curtin et al. (2006) for details.

Logit Confidence Limits

If you specify the TYPE=LOGIT option, PROC SURVEYFREQ computes logit confidence

limits for proportions. See Agresti (2002) and Korn and Graubard (1998) for more information.

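Logit confidence limits for proportions are based on the logit transformation $Y = \log(\hat{p}/(1-\hat{p}))$.

The logit confidence limits $P_L$ and $P_U$ are computed as

$$P_L = \frac{\exp(Y_L)}{1 + \exp(Y_L)}, \qquad P_U = \frac{\exp(Y_U)}{1 + \exp(Y_U)}$$

where

$$(Y_L, Y_U) = \log\left(\frac{\hat{p}}{1-\hat{p}}\right) \pm t_{df,\alpha/2} \times \frac{\mathrm{StdErr}(\hat{p})}{\hat{p}(1-\hat{p})}$$

where $\hat{p}$ is the estimate of the proportion, $\mathrm{StdErr}(\hat{p})$ is the standard error of the estimate, and

$t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom. The degrees of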

freedom are calculated as described in the section Degrees of Freedom. The confidence level is

determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95%

confidence limits.
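A logit interval of this kind can be sketched in Python as follows. The sketch assumes the interval is built on the log-odds scale, with the standard error of the log-odds approximated by the standard error of the proportion divided by p(1 - p), and then mapped back to the proportion scale. The critical value defaults to a normal quantile; a caller with access to t tables can pass the t percentile instead. The example inputs are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def logit_limits(p, se, crit=None, alpha=0.05):
    """Confidence limits for a proportion via the logit transformation.

    p     estimated proportion, strictly between 0 and 1
    se    standard error of the estimated proportion
    crit  critical value; defaults to the normal quantile for alpha
    """
    if not 0 < p < 1:
        raise ValueError("logit limits need 0 < p < 1")
    if crit is None:
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    y = log(p / (1 - p))             # log-odds of the estimate
    half = crit * se / (p * (1 - p))  # half-width on the log-odds scale

    def expit(v):
        return 1 / (1 + exp(-v))

    # Back-transforming keeps both limits strictly inside (0, 1).
    return expit(y - half), expit(y + half)


if __name__ == "__main__":
    lower, upper = logit_limits(p=0.01, se=0.007)  # hypothetical inputs
    print(f"lower={lower:.5f} upper={upper:.5f}")
```

Because the limits are obtained by back-transforming from the log-odds scale, they are asymmetric around the estimate and cannot be negative or exceed one.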

Andrew A. Beveridge <andy@socialexplorer.com>

roderick.little@census.gov <roderick.little@census.gov> Tue, Feb 8, 2011 at 9:22 AM

To: "Andrew A. Beveridge" <andrew.beveridge@qc.cuny.edu>

Cc: andy@socialexplorer.com, Catherine.Clark.McCully@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>, robert.m.groves@census.gov,

"sharon.m.stern" <Sharon.M.Stern@census.gov>

Andy and David, thanks for this. The SAS appendix looks interesting and seems to be a good start on the problem. I think I have a better idea of the main

concerns from our conversation, and will convey this to the ACS team as we consider next steps. Best, Rod

From: "Andrew A. Beveridge" <andrew.beveridge@qc.cuny.edu>

To: Roderick.Little@census.gov, David Rindskopf <drindskopf@gc.cuny.edu>

Cc: robert.m.groves@census.gov, Catherine.Clark.McCully@census.gov, "sharon.m.stern" <Sharon.M.Stern@census.gov>

Date: 02/08/2011 08:45 AM

Subject: Confidence Limits for Small and Large Proportion in ACS Data

Sent by: andy@socialexplorer.com


[attachment "Attachment_2_8_2011.doc" deleted by Roderick Little/DIR/HQ/BOC]

13.8 IMPORTANT NOTES ON MULTIYEAR ESTIMATES

While the types of data products for the multiyear estimates are almost entirely identical to those

used for the 1-year estimates, there are several distinctive features of the multiyear estimates that

data users must bear in mind.

First, the geographic boundaries that are used for multiyear estimates are always the boundary as of January 1 of the final year of the period. Therefore, if a geographic area has gained or lost territory during the multiyear period, this practice can have a bearing on the user’s interpretation of the estimates for that geographic area.

Secondly, for multiyear period estimates based on monetary characteristics (for example, median

earnings), inflation factors are applied to the data to create estimates that reflect the dollar values

in the final year of the multiyear period.

Finally, although the Census Bureau tries to minimize the changes to the ACS questionnaire, these changes will occur from time to time. Changes to a question can result in the inability to build certain estimates for a multiyear period containing the year in which the question was changed. In addition, if a new question is introduced during the multiyear period, it may be impossible to make estimates of characteristics related to the new question for the multiyear period.

13.9 CUSTOM DATA PRODUCTS

The Census Bureau offers a wide variety of general-purpose data products from the ACS designed to meet the needs of the majority of data users. They contain predefined sets of data for standard census geographic areas. For users whose data needs are not met by the general-purpose products, the Census Bureau offers customized special tabulations on a cost-reimbursable basis through the ACS custom tabulation program. Custom tabulations are created by tabulating data from ACS edited and weighted data files. These projects vary in size, complexity, and cost, depending on the needs of the sponsoring client.

Each custom tabulation request is reviewed in advance by the DRB to ensure that confidentiality is protected. The requestor may be required to modify the original request to meet disclosure avoidance requirements. For more detailed information on the ACS Custom Tabulations program, go to <http://www.census.gov/acs/www/Products/spec_tabs/index.htm>.

13−8 Preparation and Review of Data Products ACS Design and Methodology

U.S. Census Bureau

Appendix 1. Tabulation of MOEs, Including Those That Include Zero or Negative Cases for Table B03002 (Race by Hispanic Status)

Neg MOE vs EST: Total population (B03002_1_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Not Reported | 3,098 | 96.18 | 3,098 | 96.18 |
| MOE Less than Estimate | 123 | 3.82 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino (B03002_2_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Not Reported | 2,503 | 77.71 | 2,503 | 77.71 |
| MOE Greater than Estimate | 2 | 0.06 | 2,505 | 77.77 |
| MOE Less than Estimate | 715 | 22.20 | 3,220 | 99.97 |
| Est Zero, MOE Positive | 1 | 0.03 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: White alone (B03002_3_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Greater than Estimate | 4 | 0.12 | 4 | 0.12 |
| MOE Less than Estimate | 3,215 | 99.81 | 3,219 | 99.94 |
| Est Zero, MOE Positive | 2 | 0.06 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Black or African American alone (B03002_4_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 24 | 0.75 | 24 | 0.75 |
| MOE Greater than Estimate | 429 | 13.32 | 453 | 14.06 |
| MOE Less than Estimate | 2,473 | 76.78 | 2,926 | 90.84 |
| Est Zero, MOE Positive | 295 | 9.16 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: American Indian and Alaska Native alone (B03002_5_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 35 | 1.09 | 35 | 1.09 |
| MOE Greater than Estimate | 665 | 20.65 | 700 | 21.73 |
| MOE Less than Estimate | 2,133 | 66.22 | 2,833 | 87.95 |
| Est Zero, MOE Positive | 388 | 12.05 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Asian alone (B03002_6_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 40 | 1.24 | 40 | 1.24 |
| MOE Greater than Estimate | 742 | 23.04 | 782 | 24.28 |
| MOE Less than Estimate | 2,020 | 62.71 | 2,802 | 86.99 |
| Est Zero, MOE Positive | 419 | 13.01 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone (B03002_7_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 21 | 0.65 | 21 | 0.65 |
| MOE Greater than Estimate | 893 | 27.72 | 914 | 28.38 |
| MOE Less than Estimate | 419 | 13.01 | 1,333 | 41.38 |
| Est Zero, MOE Positive | 1,888 | 58.62 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Some other race alone (B03002_8_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 28 | 0.87 | 28 | 0.87 |
| MOE Greater than Estimate | 1,082 | 33.59 | 1,110 | 34.46 |
| MOE Less than Estimate | 817 | 25.36 | 1,927 | 59.83 |
| Est Zero, MOE Positive | 1,294 | 40.17 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Two or more races (B03002_9_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 34 | 1.06 | 34 | 1.06 |
| MOE Greater than Estimate | 339 | 10.52 | 373 | 11.58 |
| MOE Less than Estimate | 2,706 | 84.01 | 3,079 | 95.59 |
| Est Zero, MOE Positive | 142 | 4.41 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Two or more races: Two races including Some other race (B03002_10_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 25 | 0.78 | 25 | 0.78 |
| MOE Greater than Estimate | 1,019 | 31.64 | 1,044 | 32.41 |
| MOE Less than Estimate | 505 | 15.68 | 1,549 | 48.09 |
| Est Zero, MOE Positive | 1,672 | 51.91 | 3,221 | 100.00 |

Neg MOE vs EST: Not Hispanic or Latino: Two or more races: Two races excluding Some other race, and three or more races (B03002_11_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 32 | 0.99 | 32 | 0.99 |
| MOE Greater than Estimate | 354 | 10.99 | 386 | 11.98 |
| MOE Less than Estimate | 2,684 | 83.33 | 3,070 | 95.31 |
| Est Zero, MOE Positive | 151 | 4.69 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino (B03002_12_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Not Reported | 2,503 | 77.71 | 2,503 | 77.71 |
| MOE Equals Estimate | 12 | 0.37 | 2,515 | 78.08 |
| MOE Greater than Estimate | 187 | 5.81 | 2,702 | 83.89 |
| MOE Less than Estimate | 476 | 14.78 | 3,178 | 98.67 |
| Est Zero, MOE Positive | 43 | 1.33 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: White alone (B03002_13_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 21 | 0.65 | 21 | 0.65 |
| MOE Greater than Estimate | 342 | 10.62 | 363 | 11.27 |
| MOE Less than Estimate | 2,757 | 85.59 | 3,120 | 96.86 |
| Est Zero, MOE Positive | 101 | 3.14 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Black or African American alone (B03002_14_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 23 | 0.71 | 23 | 0.71 |
| MOE Greater than Estimate | 883 | 27.41 | 906 | 28.13 |
| MOE Less than Estimate | 788 | 24.46 | 1,694 | 52.59 |
| Est Zero, MOE Positive | 1,527 | 47.41 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: American Indian and Alaska Native alone (B03002_15_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 34 | 1.06 | 34 | 1.06 |
| MOE Greater than Estimate | 1,062 | 32.97 | 1,096 | 34.03 |
| MOE Less than Estimate | 646 | 20.06 | 1,742 | 54.08 |
| Est Zero, MOE Positive | 1,479 | 45.92 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Asian alone (B03002_16_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 11 | 0.34 | 11 | 0.34 |
| MOE Greater than Estimate | 557 | 17.29 | 568 | 17.63 |
| MOE Less than Estimate | 274 | 8.51 | 842 | 26.14 |
| Est Zero, MOE Positive | 2,379 | 73.86 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone (B03002_17_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 6 | 0.19 | 6 | 0.19 |
| MOE Greater than Estimate | 363 | 11.27 | 369 | 11.46 |
| MOE Less than Estimate | 62 | 1.92 | 431 | 13.38 |
| Est Zero, MOE Positive | 2,790 | 86.62 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Some other race alone (B03002_18_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 24 | 0.75 | 24 | 0.75 |
| MOE Greater than Estimate | 581 | 18.04 | 605 | 18.78 |
| MOE Less than Estimate | 2,320 | 72.03 | 2,925 | 90.81 |
| Est Zero, MOE Positive | 296 | 9.19 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Two or more races (B03002_19_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 40 | 1.24 | 40 | 1.24 |
| MOE Greater than Estimate | 886 | 27.51 | 926 | 28.75 |
| MOE Less than Estimate | 1,573 | 48.84 | 2,499 | 77.58 |
| Est Zero, MOE Positive | 722 | 22.42 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Two or more races: Two races including Some other race (B03002_20_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 36 | 1.12 | 36 | 1.12 |
| MOE Greater than Estimate | 975 | 30.27 | 1,011 | 31.39 |
| MOE Less than Estimate | 1,263 | 39.21 | 2,274 | 70.60 |
| Est Zero, MOE Positive | 947 | 29.40 | 3,221 | 100.00 |

Neg MOE vs EST: Hispanic or Latino: Two or more races: Two races excluding Some other race, and three or more races (B03002_21_neg)

| Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| MOE Equals Estimate | 32 | 0.99 | 32 | 0.99 |
| MOE Greater than Estimate | 1,023 | 31.76 | 1,055 | 32.75 |
| MOE Less than Estimate | 906 | 28.13 | 1,961 | 60.88 |
| Est Zero, MOE Positive | 1,260 | 39.12 | 3,221 | 100.00 |

Appendix 2.

ACS Design and Methodology (Ch. 12 Revised 12/2010) Variance Estimation 12-1

U.S. Census Bureau

Chapter 12.

Variance Estimation

12.1 OVERVIEW

Sampling error is the uncertainty associated with an estimate that is based on data gathered from a sample of the population rather than the full population. Note that sample-based estimates will vary depending on the particular sample selected from the population. Measures of the magnitude of sampling error, such as the variance and the standard error (the square root of the variance), reflect the variation in the estimates over all possible samples that could have been selected from the population using the same sampling methodology.

The American Community Survey (ACS) is committed to providing its users with measures of sampling error along with each published estimate. To accomplish this, all published ACS estimates are accompanied either by 90 percent margins of error or confidence intervals, both based on ACS direct variance estimates. Due to the complexity of the sampling design and the weighting adjustments performed on the ACS sample, unbiased design-based variance estimators do not exist. As a consequence, the direct variance estimates are computed using a replication method that repeats the estimation procedures independently several times. The variance of the full sample is then estimated by using the variability across the resulting replicate estimates. Although the variance estimates calculated using this procedure are not completely unbiased, the current method produces variances that are accurate enough for analysis of the ACS data.

For Public Use Microdata Sample (PUMS) data users, replicate weights are provided to approximate standard errors for the PUMS-tabulated estimates. Design factors are also provided with the PUMS data, so PUMS data users can compute standard errors of their statistics using either the replication method or the design factor method.

12.2 VARIANCE ESTIMATION FOR ACS HOUSING UNIT AND PERSON ESTIMATES

Unbiased estimates of variances for ACS estimates do not exist because of the systematic sample design, as well as the ratio adjustments used in estimation. As an alternative, ACS implements a replication method for variance estimation. An advantage of this method is that the variance estimates can be computed without consideration of the form of the statistics or the complexity of the sampling or weighting procedures, such as those being used by the ACS.

The ACS employs the Successive Differences Replication (SDR) method (Wolter, 1984; Fay & Train, 1995; Judkins, 1990) to produce variance estimates. It has been the method used to calculate ACS estimates of variances since the start of the survey. The SDR was designed to be used with systematic samples for which the sort order of the sample is informative, as in the case of the ACS’s geographic sort. Applications of this method were developed to produce estimates of variances for the Current Population Survey (U.S. Census Bureau, 2006) and Census 2000 Long Form estimates (Gbur & Fairchild, 2002).

In the SDR method, the first step in creating a variance estimate is constructing the replicate factors. Replicate base weights are then calculated by multiplying the base weight for each housing unit (HU) by the factors. The weighting process then is rerun, using each set of replicate base weights in turn, to create final replicate weights. Replicate estimates are created by using the same estimation method as the original estimate, but applying each set of replicate weights instead of the original weights. Finally, the replicate and original estimates are used to compute the variance estimate based on the variability between the replicate estimates and the full sample estimate.

The following steps produce the ACS direct variance estimates:

1. Compute replicate factors.

2. Compute replicate weights.

3. Compute variance estimates.

Replicate Factors

Computation of replicate factors begins with the selection of a Hadamard matrix of order R (a multiple of 4), where R is the number of replicates. A Hadamard matrix H is a k-by-k matrix with all entries either 1 or −1, such that H'H = kI (that is, the columns are orthogonal). For ACS, the number of replicates is 80 (R = 80). Each of the 80 columns represents one replicate.

Next, a pair of rows in the Hadamard matrix is assigned to each record (HU or group quarters (GQ) person). An algorithm is used to assign two rows of an 80×80 Hadamard matrix to each HU. The ACS uses a repeating sequence of 780 pairs of rows in the Hadamard matrix to assign rows to each record, in sort order (Navarro, 2001a). The assignment of Hadamard matrix rows repeats every 780 records until all records receive a pair of rows from the Hadamard matrix. The first row of the matrix, in which every cell is always equal to one, is not used.

The replicate factor for each record then is determined from these two rows of the 80×80 Hadamard matrix. For record i (i = 1,…,n, where n is sample size) and replicate r (r = 1,…,80), the replicate factor is computed as:

$$f_{i,r} = 1 + 2^{-3/2} a_{R1i,r} - 2^{-3/2} a_{R2i,r}$$

where R1i and R2i are respectively the first and second row of the Hadamard matrix assigned to the i-th HU, and $a_{R1i,r}$ and $a_{R2i,r}$ are respectively the matrix elements (either 1 or −1) from the Hadamard matrix in rows R1i and R2i and column r. Note that the formula for $f_{i,r}$ yields replicate factors that can take one of three approximate values: 1.7, 1.0, or 0.3. That is:

• If $a_{R1i,r}$ = +1 and $a_{R2i,r}$ = +1, the replicate factor is 1.

• If $a_{R1i,r}$ = −1 and $a_{R2i,r}$ = −1, the replicate factor is 1.

• If $a_{R1i,r}$ = +1 and $a_{R2i,r}$ = −1, the replicate factor is approximately 1.7.

• If $a_{R1i,r}$ = −1 and $a_{R2i,r}$ = +1, the replicate factor is approximately 0.3.

The expectation is that 50 percent of replicate factors will be 1, and the other 50 percent will be evenly split between 1.7 and 0.3 (Gunlicks, 1996).

The following example demonstrates the computation of replicate factors for a sample of size five, using a Hadamard matrix of order four:

Table 12.1 presents an example of a two-row assignment developed from this matrix, and the values of replicate factors for each sample unit.

Table 12.1 Example of Two-Row Assignment, Hadamard Matrix Elements, and Replicate Factors

R1i and R2i are the assigned rows. For each replicate r, $a_{R1i,r}$ and $a_{R2i,r}$ are the Hadamard matrix elements and $f_{i,r}$ is the approximate replicate factor.

| Case # (i) | R1i | R2i | $a_{R1i,1}$ | $a_{R2i,1}$ | $a_{R1i,2}$ | $a_{R2i,2}$ | $a_{R1i,3}$ | $a_{R2i,3}$ | $a_{R1i,4}$ | $a_{R2i,4}$ | $f_{i,1}$ | $f_{i,2}$ | $f_{i,3}$ | $f_{i,4}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | −1 | +1 | +1 | +1 | −1 | −1 | +1 | −1 | 0.3 | 1 | 1 | 1.7 |
| 2 | 3 | 4 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | 1.7 | 1 | 0.3 | 1 |
| 3 | 4 | 2 | −1 | −1 | +1 | +1 | +1 | +1 | −1 | +1 | 1 | 1 | 1 | 0.3 |
| 4 | 2 | 3 | −1 | +1 | +1 | +1 | −1 | −1 | +1 | −1 | 0.3 | 1 | 1 | 1.7 |
| 5 | 3 | 4 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | 1.7 | 1 | 0.3 | 1 |

Note that row 1 is not used. For the third case (i = 3), rows four and two of the Hadamard matrix are used to calculate the replicate factors. For the second replicate (r = 2), the replicate factor is computed using the values in the second column of rows four (+1) and two (+1) as follows:

$$f_{3,2} = 1 + 2^{-3/2} (+1) - 2^{-3/2} (+1) = 1$$
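The replicate factor computation can be sketched in Python as follows. This is a minimal illustration, assuming the factor takes the form 1 + 2^(-3/2)·a(R1) − 2^(-3/2)·a(R2), which reproduces the three approximate values 1.7, 1.0, and 0.3 quoted in the text. The order-four matrix `HADAMARD_4` is an assumption: it is inferred from the elements listed for cases 1 and 2 of Table 12.1, with an all-ones first row, and is not copied from the source. The function names are hypothetical.

```python
# Order-4 Hadamard matrix (assumed; inferred from cases 1 and 2 of Table 12.1).
HADAMARD_4 = [
    [+1, +1, +1, +1],  # row 1: all ones, never assigned to a record
    [-1, +1, -1, +1],  # row 2
    [+1, +1, -1, -1],  # row 3
    [-1, +1, +1, -1],  # row 4
]

C = 2 ** -1.5  # about 0.354


def replicate_factor(a_r1, a_r2):
    """Replicate factor from the two assigned Hadamard elements (+1 or -1)."""
    return 1 + C * a_r1 - C * a_r2


def replicate_factors(row1, row2, hadamard=HADAMARD_4):
    """Factors for every replicate (column), given 1-based row numbers."""
    first, second = hadamard[row1 - 1], hadamard[row2 - 1]
    return [replicate_factor(a1, a2) for a1, a2 in zip(first, second)]


# Cases 1 and 2 of Table 12.1: rows (2, 3) and rows (3, 4).
case_1 = [round(f, 1) for f in replicate_factors(2, 3)]
case_2 = [round(f, 1) for f in replicate_factors(3, 4)]
print(case_1)
print(case_2)
```

With this assumed matrix the sketch yields 0.3, 1, 1, 1.7 for case 1 and 1.7, 1, 0.3, 1 for case 2, matching the table. Because the matrix is inferred rather than taken from the source, other rows of the table may not be reproduced exactly.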

Replicate Weights

Replicate weights are produced in a way similar to that used to produce full sample final weights. All of the weighting adjustment processes performed on the full sample final survey weights (such as applying noninterview adjustments and population controls) also are carried out for each replicate weight. However, collapsing patterns are retained from the full sample weighting and are not determined again for each set of replicate weights.

Before applying the weighting steps explained in Chapter 11, the replicate base weight (RBW) for replicate r is computed by multiplying the full sample base weight (BW; see Chapter 11 for the computation of this weight) by the replicate factor $f_{i,r}$; that is, $RBW_{i,r} = BW_i \times f_{i,r}$, where $RBW_{i,r}$ is the replicate base weight for the i-th HU and the r-th replicate (r = 1, …, 80).

One can elaborate on the previous example of the replicate construction using five cases and four replicates: Suppose the full sample BW values are given under the second column of the following table (Table 12.2). Then, the replicate base weight values are given in columns 7−10.

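Table 12.2 Example of Computation of Replicate Base Weight Factor (RBW)

Columns 3−6 are the approximate replicate factors; columns 7−10 are the replicate base weights.

| Case # | $BW_i$ | $f_{i,1}$ | $f_{i,2}$ | $f_{i,3}$ | $f_{i,4}$ | $RBW_{i,1}$ | $RBW_{i,2}$ | $RBW_{i,3}$ | $RBW_{i,4}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 0.3 | 1 | 1 | 1.7 | 29 | 100 | 100 | 171 |
| 2 | 120 | 1.7 | 1 | 0.3 | 1 | 205 | 120 | 35 | 120 |
| 3 | 80 | 1 | 1 | 1 | 0.3 | 80 | 80 | 80 | 23 |
| 4 | 120 | 0.3 | 1 | 1 | 1.7 | 35 | 120 | 120 | 205 |
| 5 | 110 | 1.7 | 1 | 0.3 | 1 | 188 | 110 | 32 | 110 |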
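The replicate base weights in Table 12.2 can be reproduced with a short Python sketch. It assumes the unrounded factors 1 ± 2·2^(-3/2) (about 1.707 and 0.293) rather than the rounded 1.7 and 0.3, which appears to be what the table does (case 1 shows 29 rather than 30). The names `replicate_base_weights`, `HIGH`, and `LOW` are hypothetical.

```python
C = 2 ** -1.5
HIGH, LOW = 1 + 2 * C, 1 - 2 * C  # about 1.707 and 0.293


def replicate_base_weights(base_weight, factors):
    """Multiply the full sample base weight by each replicate factor."""
    return [round(base_weight * f) for f in factors]


# Base weights and factor patterns for the five cases in Table 12.2.
cases = {
    1: (100, [LOW, 1, 1, HIGH]),
    2: (120, [HIGH, 1, LOW, 1]),
    3: (80, [1, 1, 1, LOW]),
    4: (120, [LOW, 1, 1, HIGH]),
    5: (110, [HIGH, 1, LOW, 1]),
}

rbw = {case: replicate_base_weights(bw, f) for case, (bw, f) in cases.items()}
for case, weights in rbw.items():
    print(case, weights)
```

In the full procedure these replicate base weights are only the starting point; the remaining weighting adjustments are then rerun for each replicate.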

The rest of the weighting process (Chapter 11) then is applied to each replicate weight $RBW_{i,r}$ (starting from the adjustment for CAPI subsampling and proceeding to the population control adjustment or raking). Basically, the weighting adjustment process is repeated independently 80 times and the $RBW_{i,r}$ is used in place of $BW_i$ (as in Chapter 11). By the end of this process, 80 final replicate weights for each HU and person record are produced.

Variance Estimates

Given the replicate weights, the computation of variance for any ACS estimate is straightforward. Suppose that $\hat{\theta}$ is an ACS estimate of any type of statistic, such as mean, total, or proportion. Let $\hat{\theta}_0$ denote the estimate computed based on the full sample weight, and $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_{80}$ denote the estimates computed based on the replicate weights. The variance of $\hat{\theta}_0$, $v(\hat{\theta}_0)$, is estimated as the sum of squared differences between each replicate estimate $\hat{\theta}_r$ (r = 1, …, 80) and the full sample estimate $\hat{\theta}_0$. The formula is as follows:¹

$$v(\hat{\theta}_0) = \frac{4}{80} \sum_{r=1}^{80} \left( \hat{\theta}_r - \hat{\theta}_0 \right)^2$$

This equation holds for count estimates as well as any other types of estimates, including percents, ratios, and medians.
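A minimal Python sketch of the replicate variance computation follows. It assumes the SDR multiplier 4/R described in the footnote (4/80 for the ACS); the function names and the replicate values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def sdr_variance(full_estimate, replicate_estimates):
    """Successive difference replication variance with multiplier 4/R."""
    r = len(replicate_estimates)
    squared_diffs = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return (4 / r) * squared_diffs


def sdr_standard_error(full_estimate, replicate_estimates):
    """Standard error is the square root of the variance."""
    return math.sqrt(sdr_variance(full_estimate, replicate_estimates))


# Hypothetical example: 80 replicate estimates, each one unit away from
# the full sample estimate of 100.
replicates = [101] * 40 + [99] * 40
variance = sdr_variance(100, replicates)
standard_error = sdr_standard_error(100, replicates)
print(variance, standard_error)
```

The same function applies to totals, proportions, or medians, since it only needs the full sample estimate and the 80 replicate estimates of the same statistic.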

There are certain cases, however, where this formula does not apply. The first and most important cases are estimates that are “controlled” to population totals and have their standard errors set to zero. These are estimates that are forced to equal intercensal estimates during the weighting process’s raking step, for example, total population and collapsed age, sex, and Hispanic origin estimates for weighting areas. Although race is included in the raking procedure, race group estimates are not controlled; the categories used in the weighting process (see Chapter 11) do not match the published tabulation groups because of multiple race responses and the “Some Other Race” category. Information on the final collapsing of the person post-stratification cells is passed from the weighting to the variance estimation process in order to identify estimates that are controlled. This identification is done independently for all weighting areas and then is applied to the geographic areas used for tabulation. Standard errors for those estimates are set to zero, and published margins of error are set to “*****” (with an appropriate accompanying footnote).

Another special case deals with zero-estimated counts of people, households, or HUs. A direct application of the replicate variance formula leads to a zero standard error for a zero-estimated count. However, there may be people, households, or HUs with that characteristic in that area that were not selected to be in the ACS sample, but a different sample might have selected them, so a zero standard error is not appropriate. For these cases, the following model-based estimation of standard error was implemented.

For ACS data in a census year, the ACS zero-estimated counts (for characteristics included in the 100 percent census (“short form”) count) can be checked against the corresponding census estimates. At least 90 percent of the census counts for the ACS zero-estimated counts should be within a 90 percent confidence interval based on our modeled standard error.² Let the variance of the estimate be modeled as some multiple (K) of the average final weight (for a state or the nation). That is:

$$v(0) = K \times (\text{average weight})$$

Then, set the 90 percent upper bound for the zero estimate equal to the Census count:

$$0 + 1.645 \times \sqrt{K \times (\text{average weight})} = \text{Census count}$$

Solving for K yields:

$$K = \frac{1}{\text{average weight}} \left( \frac{\text{Census count}}{1.645} \right)^2$$

K was computed for all ACS zero-estimated counts from 2000 which matched to Census 2000 100 percent counts, and then the 90th percentile of those Ks was determined. Based on the Census 2000 data, we use a value for K of 400 (Navarro, 2001b). As this modeling method requires census counts, the 400 value can next be updated using the 2010 Census and 2010 ACS data.

For publication, the standard error (SE) of the zero count estimate is computed as:

$$SE(0) = \sqrt{400 \times (\text{average weight})}$$
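The zero-estimate model can be sketched in Python as follows. This is an illustration under the assumptions stated in the text (variance modeled as K times the average weight, K = 400, and a 90 percent multiplier of 1.645); the function names and the input values are hypothetical.

```python
import math

Z_90 = 1.645  # multiplier for a 90 percent confidence level


def solve_k(census_count, average_weight):
    """K such that the 90 percent upper bound of a zero estimate equals the census count."""
    return (census_count / Z_90) ** 2 / average_weight


def zero_estimate_se(average_weight, k=400):
    """Modeled standard error for a zero-estimated count."""
    return math.sqrt(k * average_weight)


# Hypothetical area with an average final weight of 25.
se = zero_estimate_se(25)
moe = Z_90 * se
print(se, moe)
```

Under this model every zero-estimated count in a given state shares the same standard error, because the only input is that state's average weight.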

¹ A general replication-based variance formula can be expressed as

$$v(\hat{\theta}_0) = \sum_{r} c_r \left( \hat{\theta}_r - \hat{\theta}_0 \right)^2$$

where $c_r$ is the multiplier related to the r-th replicate determined by the replication method. For the SDR method, the value of $c_r$ is 4 / R, where R is the number of replicates (Fay & Train, 1995).

² This modeling was done only once, in 2001, prior to the publication of the 2000 ACS data.

The average weights (the maximum of the average housing unit and average person final weights) are calculated at the state and national level for each ACS single-year or multiyear data release. Estimates for geographic areas within a state use that state’s average weight, and estimates for geographic areas that cross state boundaries use the national average weight.

Finally, a similar method is used to produce an approximate standard error for both ACS zero and 100 percent estimates. We do not produce approximate standard errors for other zero estimates, such as ratios or medians.

Variance Estimation for Multiyear ACS Estimates – Finite Population Correction Factor

Through the 2008 and 2006-2008 data products, the same variance estimation methodology described above was implemented for both 1-year and 3-year estimates. No changes to the methodology were necessary due to using multiple years of sample data. However, beginning with the 2007-2009 and 2005-2009 data products, the ACS incorporated a finite population correction (FPC) factor into the 3-year and 5-year variance estimation procedures.

The Census 2000 long form, as noted above, used the same SDR variance estimation methodology as the ACS currently does. The long form methodology also included an FPC factor in its calculation. One-year ACS samples are not large enough for an FPC to have much impact on variances. However, with 5-year ACS estimates, up to 50 percent of housing units in certain blocks may have been in sample over the 5-year period. Applying an FPC factor to multi-year ACS replicate estimates will enable a more accurate estimate of the variance, particularly for small areas. It was decided to apply the FPC adjustment to 3-year and 5-year ACS products, but not to 1-year products.

The ACS FPC factor is applied in the creation of the replicate factors:

$$f^{*}_{i,r} = 1 + \left( 2^{-3/2} a_{R1i,r} - 2^{-3/2} a_{R2i,r} \right) \times \sqrt{1 - \frac{n}{N}}$$

where $\sqrt{1 - n/N}$ is the FPC factor. Generically, n is the unweighted sample size, and N is the unweighted universe size. The ACS uses two separate FPC factors: one for HUs responding by mail or telephone, and a second for HUs responding via personal visit follow-up.

The FPC is typically applied as a multiplicative factor “outside” the variance formula. However, under certain simplifying assumptions, the variance using the replicate factors after applying the FPC factor is equal to the original variance multiplied by the FPC factor. This method allows a direct application of the FPC to each housing unit’s or person’s set of replicate weights, and a seamless incorporation into the ACS’s current variance production methodology, rather than having to keep track of multiplicative factors when tabulating across areas of different sampling rates.

The adjusted replicate factors are used to create replicate base weights, and ultimately final replicate weights. It is expected that the improvement in the variance estimate will carry through the weighting, and will be seen when the final weights are used.

The ACS FPC factor could be applied at any geographic level. Since the ACS sampling rates are determined at the small area level (mainly census tracts and governmental units), a low level of geography was desirable. At higher levels, the high sampling rates in specific blocks would likely be masked by the lower rates in surrounding blocks. For that reason, the factors are applied at the census tract level.

Group quarters persons do not have an FPC factor applied to their replicate factors.

12.3 MARGIN OF ERROR AND CONFIDENCE INTERVAL

Once the standard errors have been computed, margins of error and/or confidence bounds are produced for each estimate. These are the measures of overall sampling error presented along with each published ACS estimate. All published ACS margins of error and the lower and upper bounds of confidence intervals presented in the ACS data products are based on a 90 percent confidence level, which is the Census Bureau’s standard (U.S. Census Bureau, 2010b). A margin of error contains two components: the standard error of the estimate, and a multiplication factor based on a chosen confidence level. For the 90 percent confidence level, the value of the multiplication factor used by the ACS is 1.645. The margin of error of an estimate $\hat{\theta}$ can be computed as:

$$\text{Margin of error}(\hat{\theta}) = 1.645 \times SE(\hat{\theta})$$

where $SE(\hat{\theta})$ is the standard error of the estimate $\hat{\theta}$. Given this margin of error, the 90 percent confidence interval can be computed as:

$$\hat{\theta} \pm \text{Margin of error}(\hat{\theta})$$

That is, the lower bound of the confidence interval is [$\hat{\theta}$ − margin of error ($\hat{\theta}$)], and the upper bound of the confidence interval is [$\hat{\theta}$ + margin of error ($\hat{\theta}$)]. Roughly speaking, this interval is a range that will contain the “full population value” of the estimated characteristic with a known probability.

Users are cautioned to consider “logical” boundaries when creating confidence bounds from the margins of error. For example, a small population estimate may have a calculated lower bound less than zero. A negative number of people does not make sense, so the lower bound should be set to zero instead. Likewise, bounds for percents should not go below zero percent or above 100 percent. For other characteristics, like income, negative values may be legitimate.
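The margin of error, the confidence interval, and the “logical” boundary adjustment can be sketched in Python as follows. The function names and the sample numbers are hypothetical; the 1.645 multiplier is the 90 percent factor given in the text, and the optional limits are supplied by the caller because the appropriate bounds depend on the characteristic (counts, percents, or income).

```python
Z_90 = 1.645  # multiplier for a 90 percent confidence level


def margin_of_error(standard_error):
    """90 percent margin of error from a standard error."""
    return Z_90 * standard_error


def confidence_interval(estimate, standard_error, lower_limit=None, upper_limit=None):
    """90 percent interval, optionally truncated at logical boundaries."""
    moe = margin_of_error(standard_error)
    lower, upper = estimate - moe, estimate + moe
    if lower_limit is not None:
        lower = max(lower, lower_limit)
    if upper_limit is not None:
        upper = min(upper, upper_limit)
    return lower, upper


# Hypothetical count of 10 people with a standard error of 20: the raw
# lower bound is negative, so it is set to zero for a count.
raw = confidence_interval(10, 20)
bounded = confidence_interval(10, 20, lower_limit=0)
print(raw, bounded)
```

Truncating the bound removes the impossible negative value but does not change the underlying symmetric approximation, so the interval for a small count remains only a rough guide.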

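Given the confidence bounds, a margin of error can be computed as the difference between an estimate and its upper or lower confidence bounds:

$$\text{Margin of error} = \max\left( \text{upper bound} - \hat{\theta},\ \hat{\theta} - \text{lower bound} \right)$$

Using the margin of error (as published or calculated from the bounds), the standard error is obtained as follows:

$$SE(\hat{\theta}) = \text{Margin of error}(\hat{\theta}) \, / \, 1.645$$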

For ranking tables and comparison profiles, the ACS provides an indicator as to whether two estimates, $Est_1$ and $Est_2$, are statistically significantly different at the 90 percent confidence level. That determination is made by initially calculating:

$$Z = \frac{Est_1 - Est_2}{\sqrt{SE(Est_1)^2 + SE(Est_2)^2}}$$

If Z < −1.645 or Z > 1.645, the difference between the estimates is significant at the 90 percent level. Determinations of statistical significance are made using unrounded values of the standard errors, so users may not be able to achieve the same result using the standard errors derived from the rounded estimates and margins of error as published. Only pairwise tests are used to determine significance in the ranking tables; no multiple comparison methods are used.
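The pairwise test can be sketched as follows. The sketch assumes the usual form of the statistic, the difference between the two estimates divided by the square root of the sum of their squared standard errors; the function names are illustrative only.

```python
import math


def z_statistic(est1, se1, est2, se2):
    """Z for the difference of two estimates, treating them as independent."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


def significantly_different(est1, se1, est2, se2, critical=1.645):
    """True when Z falls outside (-critical, critical), i.e. the 90 percent level."""
    z = z_statistic(est1, se1, est2, se2)
    return z < -critical or z > critical
```

Because published margins of error are rounded, standard errors recovered from them may not reproduce the Bureau's own determination exactly, as the text notes.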

12.4 VARIANCE ESTIMATION FOR THE PUMS

The Census Bureau cannot possibly predict all combinations of estimates and geography that may be of interest to data users. Data users can download PUMS files and tabulate the data to create estimates of their own choosing. Because the ACS PUMS contains only a subset of the full ACS sample, estimates from the ACS PUMS file will often be different from the published ACS estimates that are based on the full ACS sample.

Users of the ACS PUMS files can compute the estimated variances of their statistics using one of two options: (1) the replication method using replicate weights released with the PUMS data, and (2) the design factor method.

ACS Design and Methodology (Ch. 12 Revised 12/2010) Variance Estimation 12-7

U.S. Census Bureau

PUMS Replicate Variances

For the replicate method, direct variance estimates based on the SDR formula as described in Section 12.2 above can be implemented. Users can simply tabulate 80 replicate estimates in addition to their desired estimate by using the provided 80 replicate weights, and then apply the variance formula:

$$Var(\hat{X}) = \frac{4}{80} \sum_{r=1}^{80} \left(\hat{X}_r - \hat{X}\right)^2$$

where $\hat{X}$ is the estimate computed with the full PUMS weight and $\hat{X}_r$ is the estimate computed with the r-th replicate weight.
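A minimal sketch of the replicate method in Python follows. It assumes the successive difference replication form with the factor 4/80 applied to the squared deviations of the 80 replicate estimates from the full-weight estimate (Section 12.2 is outside this excerpt, so that form is stated here as an assumption). The data layout, with one 0/1 indicator per record and one list of weights per replicate, is hypothetical.

```python
import math


def weighted_total(indicator, weights):
    """Weighted count of records whose indicator is 1."""
    return sum(i * w for i, w in zip(indicator, weights))


def sdr_variance(full_estimate, replicate_estimates):
    """Variance from 80 replicate estimates: (4/80) * sum of squared deviations."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected exactly 80 replicate estimates")
    return 4.0 / 80.0 * sum((r - full_estimate) ** 2 for r in replicate_estimates)


def pums_estimate_and_se(indicator, full_weights, replicate_weight_sets):
    """Tabulate the full estimate and its replicate-based standard error."""
    full = weighted_total(indicator, full_weights)
    replicates = [weighted_total(indicator, w) for w in replicate_weight_sets]
    return full, math.sqrt(sdr_variance(full, replicates))
```

The same pattern applies to any statistic: compute it once with the full weight, once with each of the 80 replicate weights, and pass the results to `sdr_variance`.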

PUMS Design Factor Variances

Similar to methods used to calculate standard errors for PUMS data from Census 2000, the ACS PUMS provides tables of design factors for various topics such as age for persons or tenure for HUs. For example, the 2009 ACS PUMS design factors are published at national and state levels (U.S. Census Bureau, 2010a), and were calculated using 2009 ACS data. PUMS design factors are updated periodically, but not necessarily on an annual basis. The design factor approach was developed based on a model that uses a standard error from a simple random sample as the base, and then inflates it to account for an increase in the variance caused by the complex sample design. Standard errors for almost all counts and proportions of persons, households, and HUs are approximated using design factors. For 1-year ACS PUMS files beginning with 2005, use:

$$SE(\hat{X}) = DF \times \sqrt{99\, \hat{X} \left(1 - \frac{\hat{X}}{N}\right)}$$

for a total, and

$$SE(\hat{p}) = DF \times \sqrt{\frac{99}{B}\, \hat{p}\, (100 - \hat{p})}$$

for a percent, where:

$\hat{X}$ = the estimate of total or a count.

$\hat{p}$ = the estimate of a percent.

DF = the appropriate design factor based on the topic of the estimate.

N = the total for the geographic area of interest (if the estimate is of HUs, the number of HUs is used; if the estimate is of families or households, the number of households is used; otherwise the number of persons is used as N).

B = the denominator (base) of the percent.

The value 99 in the formula is the value of the 1-year PUMS FPC factor, which is computed as (100 − ƒ) / ƒ, where ƒ (given as a percent) is the sampling rate for the PUMS data. Since the PUMS is approximately a 1 percent sample of HUs, (100 − ƒ) / ƒ = (100 − 1) / 1 = 99.

For 3-year PUMS files beginning with 2005-2007, the 3 years’ worth of data represent approximately a 3 percent sample of HUs. Hence, the 3-year PUMS FPC factor is (100 − ƒ) / ƒ = (100 − 3) / 3 = 97 / 3. To calculate standard errors from 3-year PUMS data, substitute 97 / 3 for 99 in the above formulas.

Similarly, 5-year PUMS files, beginning with 2005-2009, represent approximately a 5 percent sample of HUs. So, the 5-year PUMS FPC is 95 / 5 = 19, which can be substituted for 99 in the above formulas.
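The design factor calculation can be sketched as below. The sketch assumes the two formulas take the form DF × sqrt(FPC × X × (1 − X/N)) for a total and DF × sqrt((FPC/B) × p × (100 − p)) for a percent, with the FPC factor defaulting to the 1-year value of 99; the function and parameter names are illustrative, and real design factors must be taken from the PUMS Accuracy of the Data statement.

```python
import math


def fpc_factor(sampling_rate_percent):
    """FPC factor (100 - f) / f for a sampling rate f given as a percent."""
    f = sampling_rate_percent
    return (100.0 - f) / f


def se_total(x_hat, n_total, design_factor, fpc=99.0):
    """Design-factor standard error of an estimated total or count."""
    return design_factor * math.sqrt(fpc * x_hat * (1.0 - x_hat / n_total))


def se_percent(p_hat, base, design_factor, fpc=99.0):
    """Design-factor standard error of an estimated percent (0 to 100 scale)."""
    return design_factor * math.sqrt(fpc / base * p_hat * (100.0 - p_hat))
```

For 3-year or 5-year files, pass `fpc=fpc_factor(3)` or `fpc=fpc_factor(5)` in place of the default.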

The design factor (DF) is defined as the ratio of the standard error of an estimated parameter (computed under the replication method described in Section 12.2) to the standard error based on a simple random sample of the same size. The DF reflects the effect of the actual sample design and estimation procedures used for the ACS. The DF for each topic was computed by modeling the relationship between the standard error under the replication method (RSE) with the standard error based on a simple random sample (SRSSE); that is, RSE = DF × SRSSE, where the SRSSE is computed as follows:

$$SRSSE(\hat{X}) = \sqrt{39\, \hat{X} \left(1 - \frac{\hat{X}}{N}\right)}$$

The value 39 in the formula above is the FPC factor based on an approximate sampling fraction of 2.5 percent in the ACS; that is, (100 − 2.5) / 2.5 = 97.5 / 2.5 = 39.

The value of DF is obtained by fitting the no-intercept regression model RSE = DF × SRSSE using standard errors (RSE, SRSSE) for various published table estimates at the national and state levels.
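A no-intercept fit of RSE on SRSSE has a closed-form slope, which the sketch below computes. It assumes ordinary (unweighted) least squares through the origin; the text does not say whether the Bureau applies any weighting, so this is an illustration of the model rather than a reproduction of the published factors.

```python
def fit_design_factor(srsse_values, rse_values):
    """Least-squares slope of RSE = DF * SRSSE with no intercept.

    The minimizer of sum((rse - DF * srsse)^2) is
    DF = sum(srsse * rse) / sum(srsse^2).
    """
    if len(srsse_values) != len(rse_values) or not srsse_values:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(s * r for s, r in zip(srsse_values, rse_values))
    denominator = sum(s * s for s in srsse_values)
    return numerator / denominator
```

Each pair would come from one published table estimate: its replication-based standard error and the simple-random-sample standard error computed for the same estimate.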

The values of DFs by topic can be obtained from the “PUMS Accuracy of the Data” statement that is published with each PUMS file. For example, 2009 1-year PUMS DFs can be found in U.S. Census Bureau (2010a). The documentation also provides examples on how to use the design factors to compute standard errors for the estimates of totals, means, medians, proportions or percentages, ratios, sums, and differences.

The topics for the 2009 PUMS design factors are, for the most part, the same ones that were available for the Census 2000 PUMS. We recommend to users that, in using the design factor approach, if the estimate is a combination of two or more characteristics, the largest DF for this combination of characteristics be used. The only exceptions to this are items crossed with race or Hispanic origin; for these items, the largest DF is used, after removing the race or Hispanic origin DFs from consideration.

12.5 REFERENCES

Fay, R., & Train, G. (1995). Aspects of Survey and Model Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties. Joint Statistical Meetings: Proceedings of the Section on Government Statistics (pp. 154-159). Alexandria, VA: American Statistical Association: http://www.census.gov/did/www/saipe/publications/files/FayTrain95.pdf

Gbur, P., & Fairchild, L. (2002). Overview of the U.S. Census 2000 Long Form Direct Variance Estimation. Joint Statistical Meetings: Proceedings of the Section on Survey Research Methods (pp. 1139-1144). Alexandria, VA: American Statistical Association.

Gunlicks, C. (1996). 1990 Replicate Variance System (VAR90-20). Washington, DC: U.S. Census Bureau.

Judkins, D. R. (1990). Fay's Method for Variance Estimation. Journal of Official Statistics, 6 (3), 223-239.

Navarro, A. (2001a). 2000 American Community Survey Comparison County Replicate Factors. American Community Survey Variance Memorandum Series #ACS-V-01. Washington, DC: U.S. Census Bureau.

Navarro, A. (2001b). Estimating Standard Errors of Zero Estimates. Washington, DC: U.S. Census Bureau.

U.S. Census Bureau. (2006). Current Population Survey: Technical Paper 66—Design and Methodology. Retrieved from U.S. Census Bureau: http://www.census.gov/prod/2006pubs/tp-66.pdf

U.S. Census Bureau. (2010a). PUMS Accuracy of the Data (2009). Retrieved from U.S. Census Bureau: http://www.census.gov/acs/www/Downloads/data_documentation/pums/Accuracy/2009AccuracyPUMS.pdf

U.S. Census Bureau. (2010b). Statistical Quality Standard E2: Reporting Results. Washington, DC: U.S. Census Bureau: http://www.census.gov/quality/standards/standarde2.html

Wolter, K. M. (1984). An Investigation of Some Estimators of Variance for Systematic Sampling. Journal of the American Statistical Association, 79, 781-790.

Appendix 3.

Statistical Science

2001, Vol. 16, No. 2, 101–133

Interval Estimation for

a Binomial Proportion

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

Abstract. We revisit the problem of interval estimation of a binomial

proportion. The erratic behavior of the coverage probability of the stan-

dard Wald conﬁdence interval has previously been remarked on in the

literature (Blyth and Still, Agresti and Coull, Santner and others). We

begin by showing that the chaotic coverage properties of the Wald inter-

val are far more persistent than is appreciated. Furthermore, common

textbook prescriptions regarding its safety are misleading and defective

in several respects and cannot be trusted.

This leads us to consideration of alternative intervals. A number of

natural alternatives are presented, each with its motivation and con-

text. Each interval is examined for its coverage probability and its length.

Based on this analysis, we recommend the Wilson interval or the equal-

tailed Jeffreys prior interval for small n and the interval suggested in

Agresti and Coull for larger n. We also provide an additional frequentist

justiﬁcation for use of the Jeffreys interval.

Key words and phrases: Bayes, binomial distribution, conﬁdence

intervals, coverage probability, Edgeworth expansion, expected length,

Jeffreys prior, normal approximation, posterior.

1. INTRODUCTION

This article revisits one of the most basic and methodologically important problems in statistical practice, namely, interval estimation of the probability of success in a binomial distribution. There is a textbook confidence interval for this problem that has acquired nearly universal acceptance in practice. The interval, of course, is $\hat{r} \pm z_{\alpha/2}\, n^{-1/2} (\hat{r}(1 - \hat{r}))^{1/2}$, where $\hat{r} = X/n$ is the sample proportion of successes, and $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. The interval is easy to present and motivate and easy to compute. With the exceptions of the t test, linear regression, and ANOVA, its popularity in everyday practical statistics is virtually unmatched. The standard interval is known as the Wald interval as it comes from the Wald large sample test for the binomial case.

Lawrence D. Brown is Professor of Statistics, The Wharton School, University of Pennsylvania, 3000 Steinberg Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia, Pennsylvania 19104-6302. T. Tony Cai is Assistant Professor of Statistics, The Wharton School, University of Pennsylvania, 3000 Steinberg Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia, Pennsylvania 19104-6302. Anirban DasGupta is Professor, Department of Statistics, Purdue University, 1399 Mathematical Science Bldg., West Lafayette, Indiana 47907-1399.

So at ﬁrst glance, one may think that the problem

is too simple and has a clear and present solution.

In fact, the problem is a difﬁcult one, with unantic-

ipated complexities. It is widely recognized that the

actual coverage probability of the standard inter-

val is poor for r near 0 or 1. Even at the level of

introductory statistics texts, the standard interval

is often presented with the caveat that it should be

used only when n· min(r, 1−r) is at least 5 (or 10).

Examination of the popular texts reveals that the

qualiﬁcations with which the standard interval is

presented are varied, but they all reﬂect the concern

about poor coverage when r is near the boundaries.

In a series of interesting recent articles, it has

also been pointed out that the coverage proper-

ties of the standard interval can be erratically

poor even if r is not near the boundaries; see, for

instance, Vollset (1993), Santner (1998), Agresti and

Coull (1998), and Newcombe (1998). Slightly older

literature includes Ghosh (1979), Cressie (1980)

and Blyth and Still (1983). Agresti and Coull (1998)


particularly consider the nominal 95% case and

show the erratic and poor behavior of the stan-

dard interval’s coverage probability for small n

even when r is not near the boundaries. See their

Figure 4 for the cases n = 5 and 10.

We will show in this article that the eccentric

behavior of the standard interval’s coverage prob-

ability is far deeper than has been explained or is

appreciated by statisticians at large. We will show

that the popular prescriptions the standard inter-

val comes with are defective in several respects and

are not to be trusted. In addition, we will moti-

vate, present and analyze several alternatives to the

standard interval for a general conﬁdence level. We

will ultimately make recommendations about choos-

ing a speciﬁc interval for practical use, separately

for different intervals of values of n. It will be seen

that for small n (40 or less), our recommendation

differs from the recommendation Agresti and Coull

(1998) made for the nominal 95% case. To facili-

tate greater appreciation of the seriousness of the

problem, we have kept the technical content of this

article at a minimal level. The companion article,

Brown, Cai and DasGupta (1999), presents the asso-

ciated theoretical calculations on Edgeworth expan-

sions of the various intervals’ coverage probabili-

ties and asymptotic expansions for their expected

lengths.

In Section 2, we ﬁrst present a series of exam-

ples on the degree of severity of the chaotic behav-

ior of the standard interval’s coverage probability.

The chaotic behavior does not go away even when

n is quite large and r is not near the boundaries.

For instance, when n is 100, the actual coverage

probability of the nominal 95% standard interval

is 0.952 if r is 0.106, but only 0.911 if r is 0.107.

The behavior of the coverage probability can be even

more erratic as a function of n. If the true r is 0.5,

the actual coverage of the nominal 95% interval is

0.953 at the rather small sample size n = 17, but

falls to 0.919 at the much larger sample size n = 40.

This eccentric behavior can get downright

extreme in certain practically important prob-

lems. For instance, consider defective proportions in

industrial quality control problems. There it would

be quite common to have a true r that is small. If

the true r is 0.005, then the coverage probability

of the nominal 95% interval increases monotoni-

cally in n all the way up to n = 591 to the level

0.945, only to drop down to 0.792 if n is 592. This

unlucky spell continues for a while, and then the

coverage bounces back to 0.948 when n is 953, but

dramatically falls to 0.852 when n is 954. Subse-

quent unlucky spells start off at n = 1279, 1583 and

on and on. It should be widely known that the cov-

erage of the standard interval can be signiﬁcantly

lower at quite large sample sizes, and this happens

in an unpredictable and rather random way.

Continuing, also in Section 2 we list a set of com-

mon prescriptions that standard texts present while

discussing the standard interval. We show what

the deﬁciencies are in some of these prescriptions.

Proposition 1 and the subsequent Table 3 illustrate

the defects of these common prescriptions.

In Sections 3 and 4, we present our alterna-

tive intervals. For the purpose of a sharper focus

we present these alternative intervals in two cat-

egories. First we present in Section 3 a selected

set of three intervals that clearly stand out in

our subsequent analysis; we present them as our

“recommended intervals.” Separately, we present

several other intervals in Section 4 that arise as

clear candidates for consideration as a part of a

comprehensive examination, but do not stand out

in the actual analysis.

The short list of recommended intervals contains

the score interval, an interval recently suggested

in Agresti and Coull (1998), and the equal tailed

interval resulting from the natural noninforma-

tive Jeffreys prior for a binomial proportion. The

score interval for the binomial case seems to

have been introduced in Wilson (1927); so we call

it the Wilson interval. Agresti and Coull (1998)

suggested, for the special nominal 95% case, the interval $\tilde{r} \pm z_{0.025}\, \tilde{n}^{-1/2} (\tilde{r}(1 - \tilde{r}))^{1/2}$, where $\tilde{n} = n + 4$ and $\tilde{r} = (X + 2)/(n + 4)$; this is an adjusted Wald

interval that formally adds two successes and

two failures to the observed counts and then uses

the standard method. Our second interval is the

appropriate version of this interval for a general

conﬁdence level; we call it the Agresti–Coull inter-

val. By a slight abuse of terminology, we call our

third interval, namely the equal-tailed interval

corresponding to the Jeffreys prior, the Jeffreys

interval.
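The adjusted Wald interval described above for the nominal 95% case is simple to compute. The following Python sketch adds two successes and two failures and then applies the standard formula; the function name is illustrative, and the general-confidence-level version that the authors call the Agresti–Coull interval is not reproduced here because its exact form is given later in the paper.

```python
import math
from statistics import NormalDist


def adjusted_wald_95(successes, n):
    """Nominal 95% adjusted Wald interval: add 2 successes and 2 failures."""
    z = NormalDist().inv_cdf(0.975)        # z_{0.025}, about 1.96
    n_tilde = n + 4
    r_tilde = (successes + 2) / n_tilde    # shrinks the estimate toward 1/2
    half_width = z * math.sqrt(r_tilde * (1 - r_tilde) / n_tilde)
    return r_tilde - half_width, r_tilde + half_width
```

Unlike the standard interval, this one does not collapse to a single point when the observed count is 0 or n, although its limits can still fall slightly outside [0, 1].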

In Section 3, we also present our ﬁndings on the

performances of our “recommended” intervals. As

always, two key considerations are their coverage

properties and parsimony as measured by expected

length. Simplicity of presentation is also sometimes

an issue, for example, in the context of classroom

presentation at an elementary level. On considera-

tion of these factors, we came to the conclusion that

for small n (40 or less), we recommend that either

the Wilson or the Jeffreys prior interval should

be used. They are very similar, and either may be

used depending on taste. The Wilson interval has a

closed-form formula. The Jeffreys interval does not.

One can expect that there would be resistance to

using the Jeffreys interval solely due to this rea-

son. We therefore provide a table simply listing the


limits of the Jeffreys interval for n up to 30 and

in addition also give closed form and very accurate

approximations to the limits. These approximations

do not need any additional software.

For larger n (n > 40), the Wilson, the Jeffreys

and the Agresti–Coull interval are all very simi-

lar, and so for such n, due to its simplest form,

we come to the conclusion that the Agresti–Coull

interval should be recommended. Even for smaller

sample sizes, the Agresti–Coull interval is strongly

preferable to the standard one and so might be the

choice where simplicity is a paramount objective.

The additional intervals we considered are two

slight modiﬁcations of the Wilson and the Jeffreys

intervals, the Clopper–Pearson “exact” interval,

the arcsine interval, the logit interval, the actual

Jeffreys HPD interval and the likelihood ratio

interval. The modiﬁed versions of the Wilson and

the Jeffreys intervals correct disturbing downward

spikes in the coverages of the original intervals very

close to the two boundaries. The other alternative

intervals have earned some prominence in the liter-

ature for one reason or another. We had to apply a

certain amount of discretion in choosing these addi-

tional intervals as part of our investigation. Since

we wish to direct the main part of our conversation

to the three “recommended” intervals, only a brief

summary of the performances of these additional

intervals is presented along with the introduction

of each interval. As part of these quick summaries,

we indicate why we decided against including them

among the recommended intervals.

We strongly recommend that introductory texts

in statistics present one or more of these recom-

mended alternative intervals, in preference to the

standard one. The slight sacriﬁce in simplicity

would be more than worthwhile. The conclusions

we make are given additional theoretical support

by the results in Brown, Cai and DasGupta (1999).

Analogous results for other one parameter discrete

families are presented in Brown, Cai and DasGupta

(2000).

2. THE STANDARD INTERVAL

When constructing a conﬁdence interval we usu-

ally wish the actual coverage probability to be close

to the nominal conﬁdence level. Because of the dis-

crete nature of the binomial distribution we cannot

always achieve the exact nominal conﬁdence level

unless a randomized procedure is used. Thus our

objective is to construct nonrandomized conﬁdence

intervals for r such that the coverage probability $P_r(r \in CI) \approx 1 - \alpha$ where α is some prespecified value between 0 and 1. We will use the notation $C(r, n) = P_r(r \in CI)$, $0 < r < 1$, for the coverage probability.

A standard conﬁdence interval for r based on nor-

mal approximation has gained universal recommen-

dation in the introductory statistics textbooks and

in statistical practice. The interval is known to guar-

antee that for any ﬁxed r ∈ (0, 1), C(r, n) → 1 −α

as n → ∞.

Let $\phi(z)$ and $\Phi(z)$ be the standard normal density and distribution functions, respectively. Throughout the paper we denote $\kappa \equiv z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$, $\hat{r} = X/n$ and $\hat{q} = 1 - \hat{r}$. The standard normal approximation confidence interval $CI_s$ is given by

$$CI_s = \hat{r} \pm \kappa\, n^{-1/2} (\hat{r}\hat{q})^{1/2}. \qquad (1)$$

This interval is obtained by inverting the acceptance region of the well known Wald large-sample normal test for a general problem:

$$\left| (\hat{\theta} - \theta) / \widehat{se}(\hat{\theta}) \right| \le \kappa, \qquad (2)$$

where θ is a generic parameter, $\hat{\theta}$ is the maximum likelihood estimate of θ and $\widehat{se}(\hat{\theta})$ is the estimated standard error of $\hat{\theta}$. In the binomial case, we have $\theta = r$, $\hat{\theta} = X/n$ and $\widehat{se}(\hat{\theta}) = (\hat{r}\hat{q})^{1/2}\, n^{-1/2}$.
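For concreteness, the standard interval can be written as a short Python function. This is a minimal sketch that assumes the usual reading of the formula, with the estimated proportion equal to the number of successes divided by n and κ equal to the upper α/2 normal quantile; the function name is illustrative.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Standard (Wald) interval: r_hat +/- kappa * sqrt(r_hat * q_hat / n)."""
    kappa = NormalDist().inv_cdf(1 - alpha / 2)
    r_hat = successes / n
    q_hat = 1 - r_hat
    half_width = kappa * math.sqrt(r_hat * q_hat / n)
    return r_hat - half_width, r_hat + half_width
```

When the observed count is 0 or n the estimated standard error is zero, so the interval degenerates to a single point; and when the estimate is close to 0 the lower limit can be negative, which is the same mechanism that produces "negative" lower bounds for small counts.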

The standard interval is easy to calculate and is heuristically appealing. In introductory statistics texts and courses, the confidence interval $CI_s$ is usually presented along with some heuristic justification based on the central limit theorem. Most students and users no doubt believe that the larger the number n, the better the normal approximation, and thus the closer the actual coverage would be to the nominal level 1 − α. Further, they would believe that the coverage probabilities of this method are close to the nominal value, except possibly when n is “small” or r is “near” 0 or 1. We will show how completely both of these beliefs are false. Let us take a close look at how the standard interval $CI_s$ really performs.

2.1 Lucky n, Lucky p

An interesting phenomenon for the standard

interval is that the actual coverage probability

of the conﬁdence interval contains nonnegligible

oscillation as both r and n vary. There exist some

“lucky” pairs (r, n) such that the actual coverage

probability C(r, n) is very close to or larger than

the nominal level. On the other hand, there also

exist “unlucky” pairs (r, n) such that the corre-

sponding C(r, n) is much smaller than the nominal

level. The phenomenon of oscillation is both in n,

for ﬁxed r, and in r, for ﬁxed n. Furthermore, dras-

tic changes in coverage occur in nearby r for ﬁxed

n and in nearby n for ﬁxed r. Let us look at ﬁve

simple but instructive examples.


Fig. 1. Standard interval; oscillation phenomenon for ﬁxed r = 0.2 and variable n = 25 to 100.

The probabilities reported in the following plots

and tables, as well as those appearing later in

this paper, are the result of direct probability

calculations produced in S-PLUS. In all cases

their numerical accuracy considerably exceeds the

number of signiﬁcant ﬁgures reported and/or the

accuracy visually obtainable from the plots. (Plots

for variable r are the probabilities for a ﬁne grid

of values of r, e.g., 2000 equally spaced values of r

for the plots in Figure 5.)

Example 1. Figure 1 plots the coverage prob-

ability of the nominal 95% standard interval for

r = 0.2. The number of trials n varies from 25 to

100. It is clear from the plot that the oscillation is

signiﬁcant and the coverage probability does not

steadily get closer to the nominal conﬁdence level

as n increases. For instance, C(0.2, 30) = 0.946 and

C(0.2, 98) = 0.928. So, as hard as it is to believe,

the coverage probability is signiﬁcantly closer to

0.95 when n = 30 than when n = 98. We see that

the true coverage probability behaves contrary to

conventional wisdom in a very signiﬁcant way.

Example 2. Now consider the case of r = 0.5. Since r = 0.5, conventional wisdom might suggest to an unsuspecting user that all will be well if n is about 20. We evaluate the exact coverage probability of the 95% standard interval for 10 ≤ n ≤ 50. In Table 1, we list the values of “lucky” n [defined as C(r, n) ≥ 0.95] and the values of “unlucky” n [defined for specificity as C(r, n) ≤ 0.92]. The conclusions presented in Table 1 are surprising. We note that when n = 17 the coverage probability is 0.951, but the coverage probability equals 0.904 when n = 18. Indeed, the unlucky values of n arise suddenly. Although r is 0.5, the coverage is still only 0.919 at n = 40. This illustrates the inconsistency, unpredictability and poor performance of the standard interval.

Table 1
Standard interval; lucky n and unlucky n for 10 ≤ n ≤ 50 and r = 0.5

Lucky n      17     20     25     30     35     37     42     44     49
C(0.5, n)    0.951  0.959  0.957  0.957  0.959  0.953  0.956  0.951  0.956

Unlucky n    10     12     13     15     18     23     28     33     40
C(0.5, n)    0.891  0.854  0.908  0.882  0.904  0.907  0.913  0.920  0.919
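The coverage figures quoted in Example 2 can be checked by direct calculation: for a given true r and n, sum the binomial probabilities of every outcome whose standard interval contains r. The Python sketch below does this. It is an independent illustration (the authors report using S-PLUS), the function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def wald_coverage(r, n, level=0.95):
    """Exact coverage C(r, n) of the standard interval at the given nominal level."""
    kappa = NormalDist().inv_cdf(1 - (1 - level) / 2)
    coverage = 0.0
    for x in range(n + 1):
        r_hat = x / n
        half_width = kappa * math.sqrt(r_hat * (1 - r_hat) / n)
        # Count this outcome only if its interval contains the true r.
        if r_hat - half_width <= r <= r_hat + half_width:
            coverage += math.comb(n, x) * r ** x * (1 - r) ** (n - x)
    return coverage


for n in (17, 18, 40):
    print(n, round(wald_coverage(0.5, n), 3))
```

The loop prints coverages of 0.951, 0.904 and 0.919 for n = 17, 18 and 40, matching the values in the text and showing the abrupt drop between adjacent sample sizes.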

Example 3. Now let us move r really close to

the boundary, say r = 0.005. We mention in the

introduction that such r are relevant in certain

practical applications. Since r is so small, now one

may fully expect that the coverage probability of

the standard interval is poor. Figure 2 and Table 2 show that there are still surprises and indeed

we now begin to see a whole new kind of erratic

behavior. The oscillation of the coverage probabil-

ity does not show until rather large n. Indeed, the

coverage probability makes a slow ascent all the

way until n = 591, and then dramatically drops to

0.792 when n = 592. Figure 2 shows that thereafter

the oscillation manifests in full force, in contrast

to Examples 1 and 2, where the oscillation started

early on. Subsequent “unlucky” values of n again

arise in the same unpredictable way, as one can see

from Table 2.

2.2 Inadequate Coverage

The results in Examples 1 to 3 already show that the standard interval can have coverage noticeably smaller than its nominal value even for values of n and of nr(1 − r) that are not small. This subsection contains two more examples that display further instances of the inadequacy of the standard interval.

Table 2
Standard interval; late arrival of unlucky n for small r

Unlucky n      592    954    1279   1583   1876
C(0.005, n)    0.792  0.852  0.875  0.889  0.898

Example 4. Figure 3 plots the coverage probabil-

ity of the nominal 95% standard interval with ﬁxed

n = 100 and variable r. It can be seen from Fig-

ure 3 that in spite of the “large” sample size, signiﬁ-

cant change in coverage probability occurs in nearby

r. The magnitude of oscillation increases signiﬁ-

cantly as r moves toward 0 or 1. Except for values

of r quite near r = 0.5, the general trend of this

plot is noticeably below the nominal coverage value

of 0.95.

Example 5. Figure 4 shows the coverage proba-

bility of the nominal 99% standard interval with n =

20 and variable r from 0 to 1. Besides the oscilla-

tion phenomenon similar to Figure 3, a striking fact

in this case is that the coverage never reaches the

nominal level. The coverage probability is always

smaller than 0.99, and in fact on the average the

coverage is only 0.883. Our evaluations show that

for all n ≤ 45, the coverage of the 99% standard

interval is strictly smaller than the nominal level

for all 0 < r < 1.

It is evident from the preceding presentation that the actual coverage probability of the standard interval can differ significantly from the nominal confidence level for moderate and even large sample sizes. We will later demonstrate that there are other confidence intervals that perform much better in this regard. See Figure 5 for such a comparison. The error in coverage comes from two sources: discreteness and skewness in the underlying binomial distribution. For a two-sided interval, the rounding error due to discreteness is dominant, and the error due to skewness is somewhat secondary, but still important for even moderately large n. (See Brown, Cai and DasGupta, 1999, for more details.) Note that the situation is different for one-sided intervals. There, the error caused by the skewness can be larger than the rounding error. See Hall (1982) for a detailed discussion on one-sided confidence intervals.

Fig. 2. Standard interval; oscillation in coverage for small r.

The oscillation in the coverage probability is

caused by the discreteness of the binomial dis-

tribution, more precisely, the lattice structure of

the binomial distribution. The noticeable oscil-

lations are unavoidable for any nonrandomized

procedure, although some of the competing proce-

dures in Section 3 can be seen to have somewhat

smaller oscillations than the standard procedure.

See the text of Casella and Berger (1990) for intro-

ductory discussion of the oscillation in such a

context.

The erratic and unsatisfactory coverage prop-

erties of the standard interval have often been

remarked on, but curiously still do not seem to

be widely appreciated among statisticians. See, for

example, Ghosh (1979), Blyth and Still (1983) and

Agresti and Coull (1998). Blyth and Still (1983) also

show that the continuity-corrected version still has

the same disadvantages.

2.3 Textbook Qualiﬁcations

The normal approximation used to justify the standard confidence interval for r can be significantly in error. The error is most evident when the true r is close to 0 or 1. See Lehmann (1999). In fact, it is easy to show that, for any fixed n, the confidence coefficient C(r, n) → 0 as r → 0 or 1. Therefore, most major problems arise as regards coverage probability when r is near the boundaries.

Fig. 3. Standard interval; oscillation phenomenon for fixed n = 100 and variable r.

Poor coverage probabilities for r near 0 or 1 are

widely remarked on, and generally, in the popu-

lar texts, a brief sentence is added qualifying when

to use the standard conﬁdence interval for r. It

is interesting to see what these qualiﬁcations are.

A sample of 11 popular texts gives the following

qualiﬁcations:

The conﬁdence interval may be used if:

1. $nr$, $n(1 - r)$ are ≥ 5 (or 10);

2. $nr(1 - r)$ ≥ 5 (or 10);

3. $n\hat{r}$, $n(1 - \hat{r})$ are ≥ 5 (or 10);

4. $\hat{r} \pm 3\sqrt{\hat{r}(1 - \hat{r})/n}$ does not contain 0 or 1;

5. n quite large;

6. n ≥ 50 unless r is very small.

It seems clear that the authors are attempting to

say that the standard interval may be used if the

central limit approximation is accurate. These pre-

scriptions are defective in several respects. In the

estimation problem, (1) and (2) are not veriﬁable.

Even when these conditions are satisﬁed, we see,

for instance, from Table 1 in the previous section,

that there is no guarantee that the true coverage

probability is close to the nominal conﬁdence level.

Fig. 4. Coverage of the nominal 99% standard interval for ﬁxed n = 20 and variable r.

For example, when n = 40 and r = 0.5, one has

nr = n(1 − r) = 20 and nr(1 − r) = 10, so clearly

either of the conditions (1) and (2) is satisﬁed. How-

ever, from Table 1, the true coverage probability in

this case equals 0.919 which is certainly unsatisfac-

tory for a conﬁdence interval at nominal level 0.95.

The qualiﬁcation (5) is useless and (6) is patently

misleading; (3) and (4) are certainly veriﬁable, but

they are also useless because in the context of fre-

quentist coverage probabilities, a data-based pre-

scription does not have a meaning. The point is that

the standard interval clearly has serious problems

and the inﬂuential texts caution the readers about

that. However, the caution does not appear to serve

its purpose, for a variety of reasons.
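The n = 40, r = 0.5 example can be checked by direct enumeration. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and κ is assumed to be the upper α/2 standard normal quantile (about 1.96 at the 0.95 level).

```python
from math import comb, sqrt
from statistics import NormalDist


def standard_interval(x, n, conf=0.95):
    """Standard interval: r_hat +/- kappa * sqrt(r_hat * (1 - r_hat) / n)."""
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    r_hat = x / n
    half = kappa * sqrt(r_hat * (1 - r_hat) / n)
    return r_hat - half, r_hat + half


def coverage(r, n, conf=0.95):
    """Exact coverage C(r, n): binomial mass of all x whose interval contains r."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = standard_interval(x, n, conf)
        if lo <= r <= hi:
            total += comb(n, x) * r**x * (1 - r) ** (n - x)
    return total


cov_40 = coverage(0.5, 40)
print(f"C(0.5, 40) = {cov_40:.3f}")  # the text quotes 0.919 from Table 1
```

Because the coverage is a finite sum over x = 0, ..., n, no simulation is needed; the same helper can be evaluated on a grid of r to see the oscillation described above.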

Here is a result that shows that sometimes the

qualiﬁcations are not correct even in the limit as

n → ∞.

Proposition 1. Let γ > 0. For the standard confidence interval,

$$\lim_{n\to\infty}\ \inf_{r:\ nr,\ n(1-r)\ \ge\ \gamma} C(r, n) \le P(a_\gamma < \mathrm{Poisson}(\gamma) \le b_\gamma), \tag{3}$$

INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 107

Fig. 5. Coverage probability for n = 50.

Table 3
Standard interval; bound (3) on limiting minimum coverage when nr, n(1 − r) ≥ γ

γ                                               5       7       10
lim_{n→∞} inf_{r: nr, n(1−r) ≥ γ} C(r, n)       0.875   0.913   0.926

where $a_\gamma$ and $b_\gamma$ are the integer parts of

$$\left(\kappa^2 + 2\gamma \pm \kappa\sqrt{\kappa^2 + 4\gamma}\right)/2,$$

where the − sign goes with $a_\gamma$ and the + sign with $b_\gamma$.

The proposition follows from the fact that the

sequence of Bin(n, γ/n) distributions converges

weakly to the Poisson(γ) distribution and so the

limit of the inﬁmum is at most the Poisson proba-

bility in the proposition by an easy calculation.

Let us use Proposition 1 to investigate the validity

of qualiﬁcations (1) and (2) in the list above. The

nominal conﬁdence level in Table 3 below is 0.95.

Table 4
Values of λ_x for the modified lower bound for the Wilson interval

1 − α    x = 1    x = 2    x = 3
0.90     0.105    0.532    1.102
0.95     0.051    0.355    0.818
0.99     0.010    0.149    0.436

It is clear that qualiﬁcation (1) does not work at

all and (2) is marginal. There are similar problems

with qualiﬁcations (3) and (4).
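The Poisson bound in Proposition 1 is straightforward to evaluate numerically. The sketch below is an illustration only: `poisson_cdf` and `limiting_bound` are hypothetical names, and κ is assumed to be the upper α/2 standard normal quantile. A direct evaluation need not reproduce every printed table entry to the last digit.

```python
from math import exp, floor, sqrt
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def limiting_bound(gamma, conf=0.95):
    """Right-hand side of (3): P(a_gamma < Poisson(gamma) <= b_gamma)."""
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    root = kappa * sqrt(kappa**2 + 4 * gamma)
    a = floor((kappa**2 + 2 * gamma - root) / 2)  # integer part, minus sign
    b = floor((kappa**2 + 2 * gamma + root) / 2)  # integer part, plus sign
    return poisson_cdf(b, gamma) - poisson_cdf(a, gamma)


print(round(limiting_bound(10), 3))  # about 0.926, in line with the gamma = 10 entry
```

Even with γ = 10 the bound stays visibly below the nominal 0.95, which is the point of the proposition.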

3. RECOMMENDED ALTERNATIVE INTERVALS

From the evidence gathered in Section 2, it seems

clear that the standard interval is just too risky.

This brings us to the consideration of alternative

intervals. We now analyze several such alternatives,

each with its motivation. A few other intervals are

also mentioned for their theoretical importance.

Among these intervals we feel three stand out in

their comparative performance. These are labeled

separately as the “recommended intervals”.

3.1 Recommended Intervals

3.1.1 The Wilson interval. An alternative to the

standard interval is the conﬁdence interval based

on inverting the test in equation (2) that uses the

null standard error $(rq)^{1/2} n^{-1/2}$ instead of the estimated standard error $(\hat r \hat q)^{1/2} n^{-1/2}$. This confidence interval has the form

$$CI_W = \frac{X + \kappa^2/2}{n + \kappa^2} \pm \frac{\kappa n^{1/2}}{n + \kappa^2}\left(\hat r \hat q + \kappa^2/(4n)\right)^{1/2}. \tag{4}$$

This interval was apparently introduced by Wilson

(1927) and we will call this interval the Wilson

interval.

The Wilson interval has theoretical appeal. The

interval is the inversion of the CLT approximation


to the family of equal tail tests of $H_0: r = r_0$. Hence, one accepts $H_0$ based on the CLT approximation if and only if $r_0$ is in this interval. As

Wilson showed, the argument involves the solution

of a quadratic equation; or see Tamhane and Dunlop

(2000, Exercise 9.39).
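Formula (4) translates directly into code. This is a minimal sketch with a hypothetical function name, assuming κ is the upper α/2 standard normal quantile and $\hat q = 1 - \hat r$.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Wilson interval, written term by term as in formula (4)."""
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    r_hat = x / n
    q_hat = 1 - r_hat
    center = (x + kappa**2 / 2) / (n + kappa**2)
    half = (kappa * sqrt(n) / (n + kappa**2)) * sqrt(r_hat * q_hat + kappa**2 / (4 * n))
    return center - half, center + half


lo, hi = wilson_interval(3, 25)
print(f"x = 3, n = 25: [{lo:.3f}, {hi:.3f}]")
```

Unlike the standard interval, the limits stay inside [0, 1], and the lower limit at x = 0 is 0 up to rounding error.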

3.1.2 The Agresti–Coull interval. The standard interval $CI_s$ is simple and easy to remember. For the purposes of classroom presentation and use in texts, it may be nice to have an alternative that has the familiar form $\hat r \pm z\sqrt{\hat r(1-\hat r)/n}$, with a better and new choice of $\hat r$ rather than $\hat r = X/n$. This can be accomplished by using the center of the Wilson region in place of $\hat r$. Denote $\tilde X = X + \kappa^2/2$ and $\tilde n = n + \kappa^2$. Let $\tilde r = \tilde X/\tilde n$ and $\tilde q = 1 - \tilde r$. Define the confidence interval $CI_{AC}$ for r by

$$CI_{AC} = \tilde r \pm \kappa(\tilde r \tilde q)^{1/2}\,\tilde n^{-1/2}. \tag{5}$$

Both the Agresti–Coull and the Wilson interval are

centered on the same value, $\tilde r$. It is easy to check

that the Agresti–Coull intervals are never shorter

than the Wilson intervals. For the case when α =

0.05, if we use the value 2 instead of 1.96 for κ,

this interval is the “add 2 successes and 2 failures”

interval in Agresti and Coull (1998). For this rea-

son, we call it the Agresti–Coull interval. To the

best of our knowledge, Samuels and Witmer (1999)

is the ﬁrst introductory statistics textbook that rec-

ommends the use of this interval. See Figure 5 for

the coverage of this interval. See also Figure 6 for

its average coverage probability.
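As a hedged illustration, formula (5) and the claim that the Agresti–Coull interval is never shorter than the Wilson interval can be checked numerically. The function names are hypothetical and κ is assumed to be the upper α/2 standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist


def _kappa(conf):
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa


def agresti_coull_interval(x, n, conf=0.95):
    """Formula (5): r_tilde +/- kappa * sqrt(r_tilde * q_tilde / n_tilde)."""
    kappa = _kappa(conf)
    n_tilde = n + kappa**2
    r_tilde = (x + kappa**2 / 2) / n_tilde
    half = kappa * sqrt(r_tilde * (1 - r_tilde) / n_tilde)
    return r_tilde - half, r_tilde + half


def wilson_interval(x, n, conf=0.95):
    """Wilson interval (formula (4)), included only for the length comparison."""
    kappa = _kappa(conf)
    r_hat = x / n
    center = (x + kappa**2 / 2) / (n + kappa**2)
    half = (kappa * sqrt(n) / (n + kappa**2)) * sqrt(r_hat * (1 - r_hat) + kappa**2 / (4 * n))
    return center - half, center + half


n = 10
for x in range(n + 1):
    ac_lo, ac_hi = agresti_coull_interval(x, n)
    w_lo, w_hi = wilson_interval(x, n)
    print(x, round(ac_hi - ac_lo, 3), round(w_hi - w_lo, 3))
```

Both intervals share the center $\tilde r$, so comparing lengths is the same as comparing half-widths; the gap is largest when x is near 0 or n.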

3.1.3 Jeffreys interval. Beta distributions are the

standard conjugate priors for binomial distributions

and it is quite common to use beta priors for infer-

ence on r (see Berger, 1985).

Suppose X ∼ Bin(n, r) and suppose r has a prior distribution Beta($a_1$, $a_2$); then the posterior distribution of r is Beta($X + a_1$, $n - X + a_2$). Thus a 100(1 − α)% equal-tailed Bayesian interval is given by

$$[B(\alpha/2;\ X + a_1,\ n - X + a_2),\ B(1 - \alpha/2;\ X + a_1,\ n - X + a_2)],$$

where $B(\alpha; m_1, m_2)$ denotes the α quantile of a Beta($m_1$, $m_2$) distribution.

The well-known Jeffreys prior and the uniform

prior are each a beta distribution. The noninforma-

tive Jeffreys prior is of particular interest to us.

Historically, Bayes procedures under noninforma-

tive priors have a track record of good frequentist

properties; see Wasserman (1991). In this problem

the Jeffreys prior is Beta(1/2, 1/2) which has the density function

$$f(r) = \pi^{-1} r^{-1/2}(1 - r)^{-1/2}.$$

The 100(1−α)% equal-tailed Jeffreys prior interval

is deﬁned as

$$CI_J = [L_J(x), U_J(x)], \tag{6}$$

where $L_J(0) = 0$, $U_J(n) = 1$ and otherwise

$$L_J(x) = B(\alpha/2;\ X + 1/2,\ n - X + 1/2), \tag{7}$$

$$U_J(x) = B(1 - \alpha/2;\ X + 1/2,\ n - X + 1/2). \tag{8}$$

The interval is formed by taking the central 1 − α

posterior probability interval. This leaves α/2 posterior probability in each omitted tail. The exception is for x = 0 (n), where the lower (upper) limits are modified to avoid the undesirable result that the coverage probability C(r, n) → 0 as r → 0 or 1.

The actual endpoints of the interval need to be

numerically computed. This is very easy to do using

software such as Minitab, S-PLUS or Mathematica.

In Table 5 we have provided the limits for the case

of the Jeffreys prior for 7 ≤ n ≤ 30.

The endpoints of the Jeffreys prior interval are

the α/2 and 1 − α/2 quantiles of the Beta(x + 1/2, n − x + 1/2) distribution. The psychological resistance

among some to using the interval is because of the

inability to compute the endpoints at ease without

software.

We provide two avenues to resolving this problem.

One is Table 5 at the end of the paper. The second

is a computable approximation to the limits of the

Jeffreys prior interval, one that is computable with

just a normal table. This approximation is obtained

after some algebra from the general approximation

to a Beta quantile given in page 945 in Abramowitz

and Stegun (1970).

The lower limit of the 100(1 − α)% Jeffreys prior

interval is approximately

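$$\frac{x + 1/2}{n + 1 + (n - x + 1/2)(e^{2\omega} - 1)}, \tag{9}$$

where

$$\omega = \frac{\kappa\sqrt{4\hat r\hat q/n + (\kappa^2 - 3)/(6n^2)}}{4\hat r\hat q} + \frac{(1/2 - \hat r)\left(\hat r\hat q(\kappa^2 + 2) - 1/n\right)}{6n(\hat r\hat q)^2}.$$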

The upper limit may be approximated by the same

expression with κ replaced by −κ in ω. The simple

approximation given above is remarkably accurate.

Berry (1996, page 222) suggests using a simpler nor-

mal approximation, but this will not be sufﬁciently

accurate unless n ˆ r(1 − ˆ r) is rather large.
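Approximation (9) needs only a normal quantile, so it can be sketched without any Beta routines. The function name below is hypothetical, κ is assumed to be the upper α/2 standard normal quantile, and the sketch is restricted to 0 < x < n because ω divides by $\hat r\hat q$.

```python
from math import exp, sqrt
from statistics import NormalDist


def jeffreys_approx_interval(x, n, conf=0.95):
    """Approximate Jeffreys limits from formula (9); valid only for 0 < x < n."""
    if not 0 < x < n:
        raise ValueError("formula (9) is undefined when r_hat * q_hat = 0")
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    r_hat = x / n
    rq = r_hat * (1 - r_hat)

    def limit(k):
        # omega as in the text; the upper limit replaces kappa by -kappa
        omega = (k * sqrt(4 * rq / n + (k**2 - 3) / (6 * n**2)) / (4 * rq)
                 + (0.5 - r_hat) * (rq * (k**2 + 2) - 1 / n) / (6 * n * rq**2))
        return (x + 0.5) / (n + 1 + (n - x + 0.5) * (exp(2 * omega) - 1))

    return limit(kappa), limit(-kappa)


lo, hi = jeffreys_approx_interval(2, 10)
print(f"x = 2, n = 10: [{lo:.3f}, {hi:.3f}]")  # Table 5 lists 0.044 and 0.503
```

For levels where κ² < 3 the square root can in principle receive a negative argument at very small $\hat r\hat q$; the sketch does not guard against that case.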


Table 5

95% Limits of the Jeffreys prior interval

x n=7 n=8 n=9 n=10 n=11 n=12

0 0 0.292 0 0.262 0 0.238 0 0.217 0 0.200 0 0.185

1 0.016 0.501 0.014 0.454 0.012 0.414 0.011 0.381 0.010 0.353 0.009 0.328

2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436

3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529

4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612

5 - - - - 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688

6 - - - - - - - - 0.270 0.800 0.243 0.757

x n=13 n=14 n=15 n=16 n=17 n=18

0 0 0.173 0 0.162 0 0.152 0 0.143 0 0.136 0 0.129

1 0.008 0.307 0.008 0.288 0.007 0.272 0.007 0.257 0.006 0.244 0.006 0.232

2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311

3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381

4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446

5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506

6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563

7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617

8 - - - - 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668

9 - - - - - - - - 0.303 0.746 0.284 0.716

x n=19 n=20 n=21 n=22 n=23 n=24

0 0 0.122 0 0.117 0 0.112 0 0.107 0 0.102 0 0.098

1 0.006 0.221 0.005 0.211 0.005 0.202 0.005 0.193 0.005 0.186 0.004 0.179

2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241

3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297

4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349

5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398

6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444

7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489

8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532

9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574

10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614

11 - - - - 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653

12 - - - - - - - - 0.325 0.713 0.310 0.690

x n=25 n=26 n=27 n=28 n=29 n=30

0 0 0.095 0 0.091 0 0.088 0 0.085 0 0.082 0 0.080

1 0.004 0.172 0.004 0.166 0.004 0.160 0.004 0.155 0.004 0.150 0.004 0.145

2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197

3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243

4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286

5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327

6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367

7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404

8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441

9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476

10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511

11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545

12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578

13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610

14 - - - - 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641

15 - - - - - - - - 0.341 0.690 0.328 0.672


Fig. 6. Comparison of the average coverage probabilities. From top to bottom: the Agresti–Coull interval $CI_{AC}$, the Wilson interval $CI_W$, the Jeffreys prior interval $CI_J$ and the standard interval $CI_s$. The nominal confidence level is 0.95.

In Figure 5 we plot the coverage probability of the

standard interval, the Wilson interval, the Agresti–

Coull interval and the Jeffreys interval for n = 50

and α = 0.05.

3.2 Coverage Probability

In this and the next subsections, we compare the

performance of the standard interval and the three

recommended intervals in terms of their coverage

probability and length.

Coverage of the Wilson interval ﬂuctuates accept-

ably near 1 − α, except for r very near 0 or 1. It

might be helpful to consult Figure 5 again. It can

be shown that, when 1 −α = 0.95,

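$$\lim_{n\to\infty}\ \inf_{\gamma\ge 1} C\!\left(\frac{\gamma}{n}, n\right) = 0.92,$$

$$\lim_{n\to\infty}\ \inf_{\gamma\ge 5} C\!\left(\frac{\gamma}{n}, n\right) = 0.936$$

and

$$\lim_{n\to\infty}\ \inf_{\gamma\ge 10} C\!\left(\frac{\gamma}{n}, n\right) = 0.938$$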

for the Wilson interval. In comparison, these three

values for the standard interval are 0.860, 0.870,

and 0.905, respectively, obviously considerably

smaller.

The modification $CI_{M-W}$ presented in Section 4.1.1 removes the first few deep downward spikes of the coverage function for $CI_W$. The resulting coverage function is overall somewhat conservative for r very near 0 or 1. Both $CI_W$ and $CI_{M-W}$ have the same coverage functions away from 0 or 1.

The Agresti–Coull interval has good minimum

coverage probability. The coverage probability of

the interval is quite conservative for r very close

to 0 or 1. In comparison to the Wilson interval it

is more conservative, especially for small n. This

is not surprising because, as we have noted, $CI_{AC}$ always contains $CI_W$ as a proper subinterval.

The coverage of the Jeffreys interval is qualitatively similar to that of $CI_W$ over most of the parameter space [0, 1]. In addition, as we will see in Section 4.3, $CI_J$ has an appealing connection to the mid-P corrected version of the Clopper–Pearson “exact” intervals. These are very similar to $CI_J$, over most of the range, and have similar appealing properties. $CI_J$ is a serious and credible candidate for practical use. The coverage has an unfortunate fairly deep spike near r = 0 and, symmetrically, another near r = 1. However, the simple modification of $CI_J$ presented in Section 4.1.2 removes these two deep downward spikes. The modified Jeffreys interval $CI_{M-J}$ performs well.

Let us also evaluate the intervals in terms of their

average coverage probability, the average being over

r. Figure 6 demonstrates the striking difference in

the average coverage probability among four intervals: the Agresti–Coull interval, the Wilson interval, the Jeffreys prior interval and the standard interval. The standard interval performs poorly. The interval $CI_{AC}$ is slightly conservative in terms of

average coverage probability. Both the Wilson inter-

val and the Jeffreys prior interval have excellent

performance in terms of the average coverage prob-

ability; that of the Jeffreys prior interval is, if

anything, slightly superior. The average coverage

of the Jeffreys interval is really very close to the

nominal level even for quite small n. This is quite

impressive.

Figure 7 displays the mean absolute errors, $\int_0^1 |C(r, n) - (1 - \alpha)|\,dr$, for n = 10 to 25, and n = 26 to 40. It is clear from the plots that among the four intervals, $CI_W$, $CI_{AC}$ and $CI_J$ are comparable, but the mean absolute errors of $CI_s$ are significantly larger.

3.3 Expected Length

Besides coverage, length is also very important

in evaluation of a conﬁdence interval. We compare


Fig. 7. The mean absolute errors of the coverage of the standard (solid), the Agresti–Coull (dashed), the Jeffreys (+) and the Wilson (dotted) intervals for n = 10 to 25 and n = 26 to 40.

both the expected length and the average expected

length of the intervals. By deﬁnition,

$$\text{Expected length} = E_{n,r}(\text{length}(CI)) = \sum_{x=0}^{n}\left(U(x, n) - L(x, n)\right)\binom{n}{x} r^x (1 - r)^{n-x},$$

where U and L are the upper and lower limits of the confidence interval CI, respectively. The average expected length is just the integral $\int_0^1 E_{n,r}(\text{length}(CI))\,dr$.

We plot in Figure 8 the expected lengths of the

four intervals for n = 25 and α = 0.05. In this case,

$CI_W$ is the shortest when 0.210 ≤ r ≤ 0.790, $CI_J$ is the shortest when 0.133 ≤ r ≤ 0.210 or 0.790 ≤ r ≤ 0.867, and $CI_s$ is the shortest when r ≤ 0.133 or r ≥ 0.867. It is no surprise that the standard interval is the shortest when r is near the boundaries. $CI_s$ is

not really in contention as a credible choice for such

values of r because of its poor coverage properties

in that region. Similar qualitative phenomena hold

for other values of n.
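The expected length is a finite sum, so the comparison for n = 25 can be reproduced for the two intervals that have closed forms. This is a sketch under stated assumptions: the function names are hypothetical, κ is the upper α/2 standard normal quantile, and the Jeffreys and Agresti–Coull intervals are left out for brevity.

```python
from math import comb, sqrt
from statistics import NormalDist

KAPPA = NormalDist().inv_cdf(0.975)  # assumed kappa for alpha = 0.05


def standard_interval(x, n):
    r_hat = x / n
    half = KAPPA * sqrt(r_hat * (1 - r_hat) / n)
    return r_hat - half, r_hat + half


def wilson_interval(x, n):
    r_hat = x / n
    center = (x + KAPPA**2 / 2) / (n + KAPPA**2)
    half = (KAPPA * sqrt(n) / (n + KAPPA**2)) * sqrt(r_hat * (1 - r_hat) + KAPPA**2 / (4 * n))
    return center - half, center + half


def expected_length(interval, r, n):
    """Sum over x of (U(x, n) - L(x, n)) times the Bin(n, r) probability of x."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        total += (hi - lo) * comb(n, x) * r**x * (1 - r) ** (n - x)
    return total


for r in (0.05, 0.5):
    print(r,
          round(expected_length(standard_interval, r, 25), 3),
          round(expected_length(wilson_interval, r, 25), 3))
```

At r = 0.5 the Wilson interval comes out shorter, while at r = 0.05 the standard interval is shorter, consistent with the ranges quoted above; the short standard interval near the boundary is the one with poor coverage.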

Figure 9 shows the average expected lengths of

the four intervals for n = 10 to 25 and n = 26 to

Fig. 8. The expected lengths of the standard (solid), the Wilson (dotted), the Agresti–Coull (dashed) and the Jeffreys (+) intervals for n = 25 and α = 0.05.

40. Interestingly, the comparison is clear and con-

sistent as n changes. Always, the standard interval

and the Wilson interval $CI_W$ have almost identical average expected length; the Jeffreys interval $CI_J$ is comparable to the Wilson interval, and in fact $CI_J$ is slightly more parsimonious. But the difference is not of practical relevance. However, especially when n is small, the average expected length of $CI_{AC}$ is noticeably larger than that of $CI_J$ and $CI_W$. In fact, for n up to about 20, the average expected length of $CI_{AC}$ is larger than that of $CI_J$ by 0.04 to 0.02, and

this difference can be of deﬁnite practical relevance.

The difference starts to wear off when n is larger

than 30 or so.

4. OTHER ALTERNATIVE INTERVALS

Several other intervals deserve consideration,

either due to their historical value or their theoret-

ical properties. In the interest of space, we had to

exercise some personal judgment in deciding which

additional intervals should be presented.

4.1 Boundary modiﬁcation

The coverage probabilities of the Wilson interval

and the Jeffreys interval ﬂuctuate acceptably near


Fig. 9. The average expected lengths of the standard (solid), the Wilson (dotted), the Agresti–Coull (dashed) and the Jeffreys (+) intervals for n = 10 to 25 and n = 26 to 40.

1−α for r not very close to 0 or 1. Simple modiﬁca-

tions can be made to remove a few deep downward

spikes of their coverage near the boundaries; see

Figure 5.

4.1.1 Modiﬁed Wilson interval. The lower bound

of the Wilson interval is formed by inverting a CLT

approximation. The coverage has downward spikes

when r is very near 0 or 1. These spikes exist for all

n and α. For example, it can be shown that, when

1 − α = 0.95 and r = 0.1765/n,

$$\lim_{n\to\infty} P_r(r \in CI_W) = 0.838$$

and when 1 − α = 0.99 and r = 0.1174/n, $\lim_{n\to\infty} P_r(r \in CI_W) = 0.889$. The particular

numerical values (0.1174, 0.1765) are relevant only

to the extent that divided by n, they approximate

the location of these deep downward spikes.
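One way to see where these constants come from, offered as a sketch rather than the authors' derivation: as n → ∞, n times the Wilson lower limit at X = 1 tends to $1 + \kappa^2/2 - \kappa\sqrt{1 + \kappa^2/4}$, and for r = γ/n just below that point only X = 0 yields an interval containing r, so the limiting coverage is the Poisson probability $e^{-\gamma}$. The function name is hypothetical and κ is assumed to be the upper α/2 standard normal quantile.

```python
from math import exp, sqrt
from statistics import NormalDist


def wilson_spike(conf):
    """Return (gamma, limiting coverage) for the first downward spike of the Wilson interval."""
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    # limit of n * (Wilson lower limit at X = 1) as n grows
    gamma = 1 + kappa**2 / 2 - kappa * sqrt(1 + kappa**2 / 4)
    # just below gamma / n only X = 0 covers, and P(Poisson(gamma) = 0) = exp(-gamma)
    return gamma, exp(-gamma)


for conf in (0.95, 0.99):
    gamma, cov = wilson_spike(conf)
    print(conf, round(gamma, 4), round(cov, 3))
```

Under these assumptions the two levels give values close to (0.1765, 0.838) and (0.1174, 0.889), matching the figures in the text.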

The spikes can be removed by using a one-sided

Poisson approximation for x close to 0 or n. Suppose we modify the lower bound for x = 1, ..., $x^*$. For a fixed 1 ≤ x ≤ $x^*$, the lower bound of $CI_W$ should be replaced by a lower bound of $\lambda_x/n$ where $\lambda_x$ solves

$$e^{-\lambda}\left(\lambda^0/0! + \lambda^1/1! + \cdots + \lambda^{x-1}/(x-1)!\right) = 1 - \alpha. \tag{10}$$

Fig. 10. Coverage probability for n = 50 and r ∈ (0, 0.15). The plots are symmetric about r = 0.5 and the coverage of the modified intervals (solid line) is the same as that of the corresponding interval without modification (dashed line) for r ∈ [0.15, 0.85].

A symmetric prescription needs to be followed to

modify the upper bound for x very near n. The value of $x^*$ should be small. Values which work reasonably well for 1 − α = 0.95 are

$x^* = 2$ for n < 50 and $x^* = 3$ for 51 ≤ n ≤ 100+.

Using the relationship between the Poisson and χ² distributions,

$$P(Y \le x) = P(\chi^2_{2(1+x)} > 2\lambda)$$

where Y ∼ Poisson(λ), one can also formally express $\lambda_x$ in (10) in terms of the χ² quantiles: $\lambda_x = (1/2)\chi^2_{2x,\,\alpha}$, where $\chi^2_{2x,\,\alpha}$ denotes the 100αth percentile of the χ² distribution with 2x degrees of freedom. Table 4 gives the values of $\lambda_x$ for selected values of x and α.

For example, consider the case 1 − α = 0.95 and

x = 2. The lower bound of $CI_W$ is ≈ 0.548/(n + 4). The modified Wilson interval replaces this by a lower bound of λ/n where $\lambda = (1/2)\chi^2_{4,\,0.05}$. Thus, from a χ² table, for x = 2 the new lower bound is 0.355/n.

Fig. 11. Coverage probability of other alternative intervals for n = 50.

We denote this modified Wilson interval by $CI_{M-W}$. See Figure 10 for its coverage.
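Equation (10) can also be solved without a χ² table. The sketch below uses plain bisection on the Poisson sum; the function names and the bracketing interval [0, 50] are assumptions made for illustration.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam)."""
    term = exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total


def lambda_x(x, conf=0.95):
    """Solve equation (10): P(Poisson(lam) <= x - 1) = 1 - alpha, by bisection."""
    lo, hi = 0.0, 50.0  # assumed bracket; the Poisson CDF decreases in lam
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(x - 1, mid) > conf:
            lo = mid  # still too much mass at or below x - 1, so lam must grow
        else:
            hi = mid
    return (lo + hi) / 2


for conf in (0.90, 0.95, 0.99):
    print(conf, [round(lambda_x(x, conf), 3) for x in (1, 2, 3)])
```

The modified lower bound for a given x is then `lambda_x(x, conf) / n`; for x = 2 at the 0.95 level this gives roughly 0.355/n, as in the worked example.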

4.1.2 Modified Jeffreys interval. Evidently, $CI_J$ has an appealing Bayesian interpretation, and its coverage properties are appealing again except for a very narrow downward coverage spike fairly near 0 and 1 (see Figure 5). The unfortunate downward spikes in the coverage function result because $U_J(0)$ is too small and symmetrically $L_J(n)$ is too large. To remedy this, one may revise these two specific limits as

$$U_{M-J}(0) = r_\ell \quad\text{and}\quad L_{M-J}(n) = 1 - r_\ell,$$

where $r_\ell$ satisfies $(1 - r_\ell)^n = \alpha/2$ or equivalently $r_\ell = 1 - (\alpha/2)^{1/n}$.

We also made a slight, ad hoc alteration of $L_J(1)$ and set

$$L_{M-J}(1) = 0 \quad\text{and}\quad U_{M-J}(n - 1) = 1.$$

In all other cases, $L_{M-J} = L_J$ and $U_{M-J} = U_J$. We denote the modified Jeffreys interval by $CI_{M-J}$.

This modiﬁcation removes the two steep down-

ward spikes and the performance of the interval is

improved. See Figure 10.
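The revised boundary limit has a closed form, so it can be computed directly. A minimal sketch with a hypothetical function name:

```python
def modified_jeffreys_boundary(n, conf=0.95):
    """Revised upper limit at x = 0: the r solving (1 - r)**n = alpha / 2."""
    alpha = 1 - conf
    return 1 - (alpha / 2) ** (1 / n)


for n in (7, 30):
    r_l = modified_jeffreys_boundary(n)
    # upper limit at x = 0 is r_l; lower limit at x = n is 1 - r_l
    print(n, round(r_l, 3), round(1 - r_l, 3))
```

For n = 7 and n = 30 this gives about 0.410 and 0.116, the x = 0 upper limits listed in Table A.1.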

4.2 Other intervals

4.2.1 The Clopper–Pearson interval. The Clopper–

Pearson interval is the inversion of the equal-tail

binomial test rather than its normal approxima-

tion. Some authors refer to this as the “exact”

procedure because of its derivation from the binomial distribution. If X = x is observed, then the Clopper–Pearson (1934) interval is defined by $CI_{CP} = [L_{CP}(x), U_{CP}(x)]$, where $L_{CP}(x)$ and $U_{CP}(x)$ are, respectively, the solutions in r to the equations

$$P_r(X \ge x) = \alpha/2 \quad\text{and}\quad P_r(X \le x) = \alpha/2.$$

It is easy to show that the lower endpoint is the α/2 quantile of a beta distribution Beta(x, n − x + 1), and the upper endpoint is the 1 − α/2 quantile of a beta distribution Beta(x + 1, n − x). The Clopper–

Pearson interval guarantees that the actual cov-

erage probability is always equal to or above the

nominal conﬁdence level. However, for any ﬁxed r,

the actual coverage probability can be much larger

than 1−α unless n is quite large, and thus the conﬁ-

dence interval is rather inaccurate in this sense. See

Figure 11. The Clopper–Pearson interval is waste-

fully conservative and is not a good choice for prac-

tical use, unless strict adherence to the prescription

C(r, n) ≥ 1−α is demanded. Even then, better exact

methods are available; see, for instance, Blyth and

Still (1983) and Casella (1986).
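The two defining equations can be solved by bisection on the binomial tail probabilities, which avoids Beta quantile routines. This is a sketch with hypothetical function names; the conventions L(0) = 0 and U(n) = 1 for the boundary cases are assumed here.

```python
from math import comb


def binom_cdf(k, n, r):
    """P_r(X <= k) for X ~ Bin(n, r)."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k + 1))


def _root(f, iters=200):
    """Bisection on [0, 1] for an increasing f with f(0) < 0 < f(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, conf=0.95):
    alpha = 1 - conf
    # lower limit solves P_r(X >= x) = alpha / 2 (left side increases in r)
    lower = 0.0 if x == 0 else _root(lambda r: (1 - binom_cdf(x - 1, n, r)) - alpha / 2)
    # upper limit solves P_r(X <= x) = alpha / 2 (left side decreases in r)
    upper = 1.0 if x == n else _root(lambda r: alpha / 2 - binom_cdf(x, n, r))
    return lower, upper


lo, hi = clopper_pearson_interval(0, 10)
print(f"x = 0, n = 10: [{lo:.4f}, {hi:.4f}]")
```

At x = 0 the upper equation reduces to $(1 - r)^n = \alpha/2$, which gives a closed-form check on the bisection.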


4.2.2 The arcsine interval. Another interval is

based on a widely used variance stabilizing transformation for the binomial distribution [see, e.g., Bickel and Doksum, 1977]: $T(\hat r) = \arcsin(\hat r^{1/2})$. This variance stabilization is based on the delta method and is, of course, only an asymptotic one. Anscombe (1948) showed that replacing $\hat r$ by $\check r = (X + 3/8)/(n + 3/4)$ gives better variance stabilization; furthermore

$$2n^{1/2}\left[\arcsin(\check r^{1/2}) - \arcsin(r^{1/2})\right] \to N(0, 1) \quad\text{as } n \to \infty.$$

This leads to an approximate 100(1 − α)% confidence interval for r,

$$CI_{Arc} = \left[\sin^2\!\left(\arcsin(\check r^{1/2}) - \tfrac{1}{2}\kappa n^{-1/2}\right),\ \sin^2\!\left(\arcsin(\check r^{1/2}) + \tfrac{1}{2}\kappa n^{-1/2}\right)\right]. \tag{11}$$

See Figure 11 for the coverage probability of this

interval for n = 50. This interval performs reason-

ably well for r not too close to 0 or 1. The coverage

has steep downward spikes near the two edges; in

fact it is easy to see that the coverage drops to zero

when r is sufﬁciently close to the boundary (see

Figure 11). The mean absolute error of the coverage of $CI_{Arc}$ is significantly larger than those of $CI_W$, $CI_{AC}$ and $CI_J$. We note that our evaluations show that the performance of the arcsine interval with the standard $\hat r$ in place of $\check r$ in (11) is much worse than that of $CI_{Arc}$.
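Formula (11) is a direct transcription exercise. The sketch below uses a hypothetical function name and assumes κ is the upper α/2 standard normal quantile; it applies the formula as written, with no special handling near x = 0 or x = n.

```python
from math import asin, sin, sqrt
from statistics import NormalDist


def arcsine_interval(x, n, conf=0.95):
    """Arcsine interval (11) with Anscombe's r_check = (x + 3/8) / (n + 3/4)."""
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    r_check = (x + 3 / 8) / (n + 3 / 4)
    center = asin(sqrt(r_check))
    half = kappa / (2 * sqrt(n))
    return sin(center - half) ** 2, sin(center + half) ** 2


lo, hi = arcsine_interval(25, 50)
print(f"x = 25, n = 50: [{lo:.3f}, {hi:.3f}]")
```

Near the boundaries the argument `center - half` can become negative, in which case the squared sine no longer decreases; this is one concrete way to see why the interval behaves poorly there.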

4.2.3 The logit interval. The logit interval is

obtained by inverting a Wald type interval for the

log odds $\lambda = \log\!\left(\frac{r}{1-r}\right)$ (see Stone, 1995). The MLE of λ (for 0 < X < n) is

$$\hat\lambda = \log\!\left(\frac{\hat r}{1 - \hat r}\right) = \log\!\left(\frac{X}{n - X}\right),$$

which is the so-called empirical logit transform. The variance of $\hat\lambda$, by an application of the delta theorem, can be estimated by

$$\hat V = \frac{n}{X(n - X)}.$$

This leads to an approximate 100(1 − α)% confidence interval for λ,

$$CI(\lambda) = [\lambda_\ell, \lambda_u] = [\hat\lambda - \kappa\hat V^{1/2},\ \hat\lambda + \kappa\hat V^{1/2}]. \tag{12}$$

The logit interval for r is obtained by inverting the interval (12),

$$CI_{Logit} = \left[\frac{e^{\lambda_\ell}}{1 + e^{\lambda_\ell}},\ \frac{e^{\lambda_u}}{1 + e^{\lambda_u}}\right]. \tag{13}$$

The interval (13) has been suggested, for example,

in Stone (1995, page 667). Figure 11 plots the cov-

erage of the logit interval for n = 50. This interval

performs quite well in terms of coverage for r away

from 0 or 1. But the interval is unnecessarily long;

in fact its expected length is larger than that of the

Clopper–Pearson exact interval.
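Steps (12) and (13) can be sketched in a few lines. The function name is hypothetical, κ is assumed to be the upper α/2 standard normal quantile, and the sketch rejects x = 0 and x = n because the empirical logit is undefined there.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def logit_interval(x, n, conf=0.95):
    """Logit interval: Wald interval (12) for the log odds, mapped back through (13)."""
    if not 0 < x < n:
        raise ValueError("the empirical logit requires 0 < x < n")
    kappa = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # assumed definition of kappa
    lam_hat = log(x / (n - x))
    v_hat = n / (x * (n - x))  # delta-method variance estimate
    lam_l = lam_hat - kappa * sqrt(v_hat)
    lam_u = lam_hat + kappa * sqrt(v_hat)
    return exp(lam_l) / (1 + exp(lam_l)), exp(lam_u) / (1 + exp(lam_u))


lo, hi = logit_interval(25, 50)
print(f"x = 25, n = 50: [{lo:.3f}, {hi:.3f}]")
```

Because the back-transformation is monotone, the limits always lie strictly inside (0, 1).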

Remark. Anscombe (1956) suggested that $\hat\lambda = \log\!\left(\frac{X + 1/2}{n - X + 1/2}\right)$ is a better estimate of λ; see also Cox and Snell (1989) and Santner and Duffy (1989). The variance of Anscombe’s $\hat\lambda$ may be estimated by

$$\hat V = \frac{(n + 1)(n + 2)}{n(X + 1)(n - X + 1)}.$$

A new logit interval can be constructed using the new estimates $\hat\lambda$ and $\hat V$. Our evaluations show that the new logit interval is overall shorter than $CI_{Logit}$

in (13). But the coverage of the new interval is not

satisfactory.

4.2.4 The Bayesian HPD interval. An exact

Bayesian solution would involve using the HPD

intervals instead of our equal-tails proposal. How-

ever, HPD intervals are much harder to compute

and do not do as well in terms of coverage proba-

bility. See Figure 11 and compare to the Jeffreys’

equal-tailed interval in Figure 5.

4.2.5 The likelihood ratio interval. Along with

the Wald and the Rao score intervals, the likeli-

hood ratio method is one of the most used methods

for construction of conﬁdence intervals. It is con-

structed by inversion of the likelihood ratio test

which accepts the null hypothesis $H_0: r = r_0$ if $-2\log(\Lambda_n) \le \kappa^2$, where $\Lambda_n$ is the likelihood ratio

$$\Lambda_n = \frac{L(r_0)}{\sup_r L(r)} = \frac{r_0^{X}(1 - r_0)^{n - X}}{(X/n)^{X}(1 - X/n)^{n - X}},$$

L being the likelihood function. See Rao (1973).

Brown, Cai and DasGupta (1999) show by analyt-

ical calculations that this interval has nice proper-

ties. However, it is slightly harder to compute. For

the purpose of the present article which we view as

primarily directed toward practice, we do not fur-

ther analyze the likelihood ratio interval.
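Although the likelihood ratio interval is not analyzed further here, the inversion itself is simple to sketch with bisection on each side of $\hat r = X/n$. The function names are hypothetical, κ is assumed to be the upper α/2 standard normal quantile, and terms of the form 0 · log 0 are taken as 0.

```python
from math import log
from statistics import NormalDist


def neg2_log_lr(r0, x, n):
    """-2 log(Lambda_n) for H0: r = r0, with 0 * log(0) treated as 0."""
    r_hat = x / n
    stat = 0.0
    if x > 0:
        stat += x * log(r_hat / r0)
    if x < n:
        stat += (n - x) * log((1 - r_hat) / (1 - r0))
    return 2 * stat


def lr_interval(x, n, conf=0.95):
    kappa2 = NormalDist().inv_cdf(1 - (1 - conf) / 2) ** 2  # assumed definition of kappa
    r_hat = x / n

    def boundary(inside, outside):
        # bisect between an accepted point (inside) and a rejected one (outside)
        for _ in range(200):
            mid = (inside + outside) / 2
            if neg2_log_lr(mid, x, n) <= kappa2:
                inside = mid
            else:
                outside = mid
        return inside

    lower = 0.0 if x == 0 else boundary(r_hat, 0.0)
    upper = 1.0 if x == n else boundary(r_hat, 1.0)
    return lower, upper


lo, hi = lr_interval(3, 25)
print(f"x = 3, n = 25: [{lo:.3f}, {hi:.3f}]")
```

The bisection relies on the statistic increasing as $r_0$ moves away from $\hat r$ in either direction, which holds for the binomial likelihood.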

4.3 Connections between Jeffreys Intervals

and Mid-P Intervals

The equal-tailed Jeffreys prior interval has some

interesting connections to the Clopper–Pearson

interval. As we mentioned earlier, the Clopper–Pearson interval $CI_{CP}$ can be written as

$$CI_{CP} = [B(\alpha/2;\ X,\ n - X + 1),\ B(1 - \alpha/2;\ X + 1,\ n - X)].$$

It therefore follows immediately that $CI_J$ is always contained in $CI_{CP}$. Thus $CI_J$ corrects the conservativeness of $CI_{CP}$.

It turns out that the Jeffreys prior interval,

although Bayesianly constructed, has a clear and

convincing frequentist motivation. It is thus no surprise that it does well from a frequentist perspective. As we now explain, the Jeffreys prior interval $CI_J$ can be regarded as a continuity corrected version of the Clopper–Pearson interval $CI_{CP}$.

The interval $CI_{CP}$ inverts the inequality $P_r(X \le L(r)) \le \alpha/2$ to obtain the lower limit and similarly for the upper limit. Thus, for fixed x, the upper limit of the interval for r, $U_{CP}(x)$, satisfies

$$P_{U_{CP}(x)}(X \le x) \le \alpha/2, \tag{14}$$

and symmetrically for the lower limit.

This interval is very conservative; undesirably so

for most practical purposes. A familiar proposal to

eliminate this over-conservativeness is to instead

invert

$$P_r(X \le L(r) - 1) + (1/2)P_r(X = L(r)) = \alpha/2. \tag{15}$$

This amounts to solving

$$(1/2)\{P_{U_{CP}(x)}(X \le x - 1) + P_{U_{CP}(x)}(X \le x)\} = \alpha/2, \tag{16}$$

which is the same as

$$U_{\text{mid-}P}(X) = (1/2)B(1 - \alpha/2;\ x,\ n - x + 1) + (1/2)B(1 - \alpha/2;\ x + 1,\ n - x) \tag{17}$$

and symmetrically for the lower endpoint. These are the “Mid-P Clopper-Pearson” intervals. They are known to have good coverage and length performance. $U_{\text{mid-}P}$ given in (17) is a weighted average of two incomplete Beta functions. The incomplete Beta function of interest, $B(1 - \alpha/2;\ x,\ n - x + 1)$, is continuous and monotone in x if we formally treat x as a continuous argument. Hence the average of the two functions defining $U_{\text{mid-}P}$ is approximately the same as the value at the halfway point, x + 1/2. Thus

$$U_{\text{mid-}P}(X) \approx B(1 - \alpha/2;\ x + 1/2,\ n - x + 1/2) = U_J(x),$$

exactly the upper limit for the equal-tailed Jeffreys

interval. Similarly, the corresponding approximate

lower endpoint is the Jeffreys’ lower limit.

Another frequentist way to interpret the Jeffreys

prior interval is to say that $U_J(x)$ is the upper limit for the Clopper–Pearson rule with x − 1/2 successes and $L_J(x)$ is the lower limit for the Clopper–Pearson rule with x + 1/2 successes. Strawderman and Wells (1998) contains a valuable discussion of mid-P intervals and suggests some variations based

on asymptotic expansions.

5. CONCLUDING REMARKS

Interval estimation of a binomial proportion is a

very basic problem in practical statistics. The stan-

dard Wald interval is in nearly universal use. We

ﬁrst show that the performance of this standard

interval is persistently chaotic and unacceptably

poor. Indeed its coverage properties defy conven-

tional wisdom. The performance is so erratic and

the qualiﬁcations given in the inﬂuential texts

are so defective that the standard interval should

not be used. We provide a fairly comprehensive

evaluation of many natural alternative intervals.

Based on this analysis, we recommend the Wilson

or the equal-tailed Jeffreys prior interval for small n (n ≤ 40). These two intervals are comparable in

both absolute error and length for n ≤ 40, and we

believe that either could be used, depending on

taste.

For larger n, the Wilson, the Jeffreys and the

Agresti–Coull intervals are all comparable, and the

Agresti–Coull interval is the simplest to present.

It is generally true in statistical practice that only

those methods that are easy to describe, remember

and compute are widely used. Keeping this in mind,

we recommend the Agresti–Coull interval for prac-

tical use when n ≥ 40. Even for small sample sizes,

the easy-to-present Agresti–Coull interval is much

preferable to the standard one.

We would be satisﬁed if this article contributes

to a greater appreciation of the severe ﬂaws of the

popular standard interval and an agreement that it

deserves not to be used at all. We also hope that

the recommendations for alternative intervals will

provide some closure as to what may be used in

preference to the standard method.

Finally, we note that the speciﬁc choices of the

values of n, r and α in the examples and ﬁgures

are artifacts. The theoretical results in Brown, Cai

and DasGupta (1999) show that qualitatively sim-

ilar phenomena as regarding coverage and length

hold for general n and r and common values of

the coverage. (Those results there are asymptotic

as n → ∞, but they are also sufﬁciently accurate

for realistically moderate n.)


APPENDIX

Table A.1

95% Limits of the modiﬁed Jeffreys prior interval

x n=7 n=8 n=9 n=10 n=11 n=12

0 0 0.410 0 0.369 0 0.336 0 0.308 0 0.285 0 0.265

1 0 0.501 0 0.454 0 0.414 0 0.381 0 0.353 0 0.328

2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436

3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529

4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612

5 - - - - 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688

6 - - - - - - - - 0.270 0.800 0.243 0.757

x n=13 n=14 n=15 n=16 n=17 n=18

0 0 0.247 0 0.232 0 0.218 0 0.206 0 0.195 0 0.185

1 0 0.307 0 0.288 0 0.272 0 0.257 0 0.244 0 0.232

2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311

3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381

4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446

5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506

6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563

7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617

8 - - - - 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668

9 - - - - - - - - 0.303 0.746 0.284 0.716

x n=19 n=20 n=21 n=22 n=23 n=24

0 0 0.176 0 0.168 0 0.161 0 0.154 0 0.148 0 0.142

1 0 0.221 0 0.211 0 0.202 0 0.193 0 0.186 0 0.179

2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241

3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297

4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349

5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398

6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444

7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489

8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532

9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574

10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614

11 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653

12 0.325 0.713 0.310 0.690

x n=25 n=26 n=27 n=28 n=29 n=30

0 0 0.137 0 0.132 0 0.128 0 0.123 0 0.119 0 0.116

1 0 0.172 0 0.166 0 0.160 0 0.155 0 0.150 0 0.145

2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197

3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243

4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286

5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327

6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367

7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404

8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441

9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476

10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511

11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545

12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578

13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610

14 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641

15 0.341 0.690 0.328 0.672


ACKNOWLEDGMENTS

We thank Xuefeng Li for performing some helpful

computations and Jim Berger, David Moore, Steve

Samuels, Bill Studden and Ron Thisted for use-

ful conversations. We also thank the Editors and

two anonymous referees for their thorough and con-

structive comments. Supported by grants from the

National Science Foundation and the National Secu-

rity Agency.

REFERENCES

Abramowitz, M. and Stegun, I. A. (1970). Handbook of Mathe-

matical Functions. Dover, New York.

Agresti, A. and Coull, B. A. (1998). Approximate is better than

“exact” for interval estimation of binomial proportions. Amer.

Statist. 52 119–126.

Anscombe, F. J. (1948). The transformation of Poisson, binomial

and negative binomial data. Biometrika 35 246–254.

Anscombe, F. J. (1956). On estimating binomial response rela-

tions. Biometrika 43 461–464.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian

Analysis, 2nd ed. Springer, New York.

Berry, D. A. (1996). Statistics: A Bayesian Perspective.

Wadsworth, Belmont, CA.

Bickel, P. and Doksum, K. (1977). Mathematical Statistics.

Prentice-Hall, Englewood Cliffs, NJ.

Blyth, C. R. and Still, H. A. (1983). Binomial conﬁdence inter-

vals. J. Amer. Statist. Assoc. 78 108–116.

Brown, L. D., Cai, T. and DasGupta, A. (1999). Conﬁdence inter-

vals for a binomial proportion and asymptotic expansions.

Ann. Statist to appear.

Brown, L. D., Cai, T. and DasGupta, A. (2000). Interval estima-

tion in discrete exponential family. Technical report, Dept.

Statistics. Univ. Pennsylvania.

Casella, G. (1986). Reﬁning binomial conﬁdence intervals

Canad. J. Statist. 14 113–129.

Casella, G. and Berger, R. L. (1990). Statistical Inference.

Wadsworth & Brooks/Cole, Belmont, CA.

Clopper, C. J. and Pearson, E. S. (1934). The use of conﬁdence

or ﬁducial limits illustrated in the case of the binomial.

Biometrika 26 404–413.

Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, 2nd

ed. Chapman and Hall, London.

Cressie, N. (1980). A ﬁnely tuned continuity correction. Ann.

Inst. Statist. Math. 30 435–442.

Ghosh, B. K. (1979). A comparison of some approximate conﬁ-

dence intervals for the binomial parameter J. Amer. Statist.

Assoc. 74 894–900.

Hall, P. (1982). Improving the normal approximation when

constructing one-sided conﬁdence intervals for binomial or

Poisson parameters. Biometrika 69 647–652.

Lehmann, E. L. (1999). Elements of Large-Sample Theory.

Springer, New York.

Newcombe, R. G. (1998). Two-sided conﬁdence intervals for the

single proportion; comparison of several methods. Statistics

in Medicine 17 857–872.

Rao, C. R. (1973). Linear Statistical Inference and Its Applica-

tions. Wiley, New York.

Samuels, M. L. and Witmer, J. W. (1999). Statistics for

the Life Sciences, 2nd ed. Prentice Hall, Englewood

Cliffs, NJ.

Santner, T. J. (1998). A note on teaching binomial conﬁdence

intervals. Teaching Statistics 20 20–23.

Santner, T. J. and Duffy, D. E. (1989). The Statistical Analysis

of Discrete Data. Springer, Berlin.

Stone, C. J. (1995). A Course in Probability and Statistics.

Duxbury, Belmont, CA.

Strawderman, R. L. and Wells, M. T. (1998). Approximately exact inference for the common odds ratio in several 2 × 2 tables (with discussion). J. Amer. Statist. Assoc. 93 1294–1320.

Tamhane, A. C. and Dunlop, D. D. (2000). Statistics and Data

Analysis from Elementary to Intermediate. Prentice Hall,

Englewood Cliffs, NJ.

Vollset, S. E. (1993). Conﬁdence intervals for a binomial pro-

portion. Statistics in Medicine 12 809–824.

Wasserman, L. (1991). An inferential interpretation of default

priors. Technical report, Carnegie-Mellon Univ.

Wilson, E. B. (1927). Probable inference, the law of succes-

sion, and statistical inference. J. Amer. Statist. Assoc. 22

209–212.

Comment

Alan Agresti and Brent A. Coull

In this very interesting article, Professors Brown, Cai and DasGupta (BCD) have shown that discreteness can cause havoc for much larger sample sizes than one would expect. The popular (Wald) confidence interval for a binomial parameter r has been known for some time to behave poorly, but readers will surely be surprised that this can happen for such large n values.

Alan Agresti is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: aa@stat.ufl.edu). Brent A. Coull is Assistant Professor, Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115 (e-mail: bcoull@hsph.harvard.edu).

Interval estimation of a binomial parameter is deceptively simple, as there are not even any nuisance parameters. The gold standard would seem to be a method such as the Clopper–Pearson, based on inverting an “exact” test using the binomial distribution rather than an approximate test using the normal. Because of discreteness, however, this method is too conservative. A more practical, nearly gold standard for this and other discrete problems seems to be based on inverting a two-sided test using the exact distribution but with the mid-P value. Similarly, with large-sample methods it is better not to use a continuity correction, as otherwise it approximates exact inference based on an ordinary P-value, resulting in conservative behavior. Interestingly, BCD note that the Jeffreys interval ($CI_J$) approximates the mid-P value correction of the Clopper–Pearson interval. See Gart (1966) for related remarks about the use of 1/2 additions to numbers of successes and failures before using frequentist methods.

Fig. 1. A comparison of mean expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W), and Agresti–Coull (AC) intervals for n = 5, 6, 7, 8, 9.

1. METHODS FOR ELEMENTARY

STATISTICS COURSES

It’s unfortunate that the Wald interval for r is so seriously deficient, because in addition to being the simplest interval it is the obvious one to teach in elementary statistics courses. By contrast, the Wilson interval ($CI_W$) performs surprisingly well even for small n. Since it is too complex for many such courses, however, our motivation for the “Agresti–Coull interval” ($CI_{AC}$) was to provide a simple approximation for $CI_W$. Formula (4) in BCD shows that the midpoint $\tilde r$ for $CI_W$ is a weighted average of $\hat r$ and 1/2 that equals the sample proportion after adding $z_{\alpha/2}^2$ pseudo observations, half of each type; the square of the coefficient of $z_{\alpha/2}$ is the same weighted average of the variance of a sample proportion when $r = \hat r$ and when $r = 1/2$, using $\tilde n = n + z_{\alpha/2}^2$ in place of n. The $CI_{AC}$ uses the $CI_W$ midpoint, but its squared coefficient of $z_{\alpha/2}$ is the variance $\tilde r \tilde q/\tilde n$ (with $\tilde q = 1 - \tilde r$) at the weighted average $\tilde r$ rather than the weighted average of the variances. The resulting interval $\tilde r \pm z_{\alpha/2}(\tilde r \tilde q/\tilde n)^{1/2}$ is wider than $CI_W$ (by Jensen’s inequality), in particular being conservative for r near 0 and 1 where $CI_W$ can suffer poor coverage probabilities.
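The relationship between the two intervals can be checked numerically. The following is a minimal Python sketch, not code from BCD or from the discussants: the function names and the constant `Z_95` are illustrative assumptions, the symbols follow the paragraph above (midpoint `r_tilde`, effective sample size `n_tilde`), and the Wilson half-width is written in its standard closed form. Neither interval is truncated to [0, 1] here.

```python
import math

Z_95 = 1.959963984540054  # z_{alpha/2} for alpha = 0.05 (upper 2.5% point of N(0, 1))


def wilson_interval(x, n, z=Z_95):
    """Wilson (score) interval for x successes in n trials."""
    r_hat = x / n
    n_tilde = n + z * z                    # n plus z^2 pseudo observations
    r_tilde = (x + z * z / 2) / n_tilde    # midpoint shared with Agresti-Coull
    # Half-width: z times the square root of the weighted average of variances.
    half = (z / n_tilde) * math.sqrt(n * r_hat * (1 - r_hat) + z * z / 4)
    return r_tilde - half, r_tilde + half


def agresti_coull_interval(x, n, z=Z_95):
    """Agresti-Coull interval: the Wald form evaluated at (r_tilde, n_tilde)."""
    n_tilde = n + z * z
    r_tilde = (x + z * z / 2) / n_tilde
    half = z * math.sqrt(r_tilde * (1 - r_tilde) / n_tilde)
    return r_tilde - half, r_tilde + half


if __name__ == "__main__":
    for x in (0, 2, 5):
        print(x, wilson_interval(x, 10), agresti_coull_interval(x, 10))
```

Because r(1 − r) is concave, the variance at the averaged point is at least the average of the variances, so the Agresti–Coull interval should contain the Wilson interval for every x; the sketch lets a reader confirm this for any n of interest.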

Regarding textbook qualifications on sample size for using the Wald interval, skewness considerations and the Edgeworth expansion suggest that guidelines for n should depend on r through $(1 - 2r)^2/[r(1 - r)]$. See, for instance, Boos and Hughes-Oliver (2000). But this does not account for the effects of discreteness, and as BCD point out, guidelines in terms of r are not verifiable. For elementary course teaching there is no obvious alternative (such as t methods) for smaller n, so we think it is sensible to teach a single method that behaves reasonably well for all n, as do the Wilson, Jeffreys and Agresti–Coull intervals.

2. IMPROVED PERFORMANCE WITH

BOUNDARY MODIFICATIONS

BCD showed that one can improve the behavior of the Wilson and Jeffreys intervals for r near 0 and 1 by modifying the endpoints for $CI_W$ when x = 1, 2, n − 2, n − 1 (and x = 3 and n − 3 for n > 50) and for $CI_J$ when x = 0, 1, n − 1, n, where x is the observed number of successes. Once one permits the modification of methods near the sample space boundary, other methods may perform decently besides the three recommended in this article.

For instance, Newcombe (1998) showed that when 0 < x < n the Wilson interval $CI_W$ and the Wald logit interval have the same midpoint on the logit scale. In fact, Newcombe has shown (personal communication, 1999) that the logit interval necessarily contains $CI_W$. The logit interval is the uninformative one [0, 1] when x = 0 or x = n, but substituting the Clopper–Pearson limits in those cases yields coverage probability functions that resemble those for $CI_W$ and $CI_{AC}$, although considerably more conservative for small n. Rubin and Schenker (1987) recommended the logit interval after 1/2 additions to numbers of successes and failures, motivating it as a normal approximation to the posterior distribution of the logit parameter after using the Jeffreys prior. However, this modification has coverage probabilities that are unacceptably small for r near 0 and 1 (see Vollset, 1993). Presumably some other boundary modification will result in a happy medium. In a letter to the editor about Agresti and Coull (1998), Rindskopf (2000) argued in favor of the logit interval partly because of its connection with logit modeling. We have not used this method for teaching in elementary courses, since logit intervals do not extend to intervals for the difference of proportions and (like $CI_W$ and $CI_J$) they are rather complex for that level.

Fig. 2. A comparison of expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W), and Agresti–Coull (AC) intervals for n = 5.

For practical use and for teaching in more

advanced courses, some statisticians may prefer the

likelihood ratio interval, since conceptually it is sim-

ple and the method also applies in a general model-

building framework. An advantage compared to the

Wald approach is its invariance to the choice of

scale, resulting, for instance, both from the origi-

nal scale and the logit. BCD do not say much about

this interval, since it is harder to compute. However,

it is easy to obtain with standard statistical soft-

ware (e.g., in SAS, using the LRCI option in PROC

GENMOD for a model containing only an intercept

term and assuming a binomial response with logit

or identity link function). Graphs in Vollset (1993)

suggest that the boundary-modiﬁed likelihood ratio

interval also behaves reasonably well, although con-

servative for r near 0 and 1.
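For readers without SAS, the likelihood ratio interval can also be obtained by direct root finding. The sketch below is an illustrative assumption about one way to do this in plain Python, not the algorithm used by PROC GENMOD: it collects the values of r whose likelihood ratio statistic does not exceed the 0.95 quantile of the chi-square distribution with one degree of freedom, locating each endpoint by bisection. The helper names are hypothetical, and no boundary modification is applied.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df


def loglik(r, x, n):
    """Binomial log-likelihood (up to a constant), with 0 * log(0) taken as 0."""
    ll = 0.0
    if x > 0:
        ll += x * math.log(r)
    if x < n:
        ll += (n - x) * math.log(1 - r)
    return ll


def _bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def lr_interval(x, n, crit=CHI2_1_95):
    """Likelihood ratio interval {r : 2[l(r_hat) - l(r)] <= crit}."""
    r_hat = x / n
    ll_max = loglik(r_hat, x, n)

    def excess(r):
        return 2 * (ll_max - loglik(r, x, n)) - crit

    eps = 1e-12
    lower = 0.0 if x == 0 else _bisect(excess, eps, r_hat)
    upper = 1.0 if x == n else _bisect(excess, r_hat, 1 - eps)
    return lower, upper


if __name__ == "__main__":
    print(lr_interval(2, 10))
    print(lr_interval(0, 10))
```

The interval is defined through the likelihood alone, so applying the same procedure on the logit scale and transforming back gives the same endpoints, which is the invariance property mentioned above.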

For elementary course teaching, a disadvantage of all such intervals using boundary modifications is that making exceptions from a general, simple recipe distracts students from the simple concept of taking the estimate plus and minus a normal score multiple of a standard error. (Of course, this concept is not sufficient for serious statistical work, but some oversimplification and compromise is necessary at that level.) Even with $CI_{AC}$, instructors may find it preferable to give a recipe with the same number of added pseudo observations for all α, instead of $z_{\alpha/2}^2$. Reasonably good performance seems to result, especially for small α, from the value $4 \approx z_{0.025}^2$ used in the 95% $CI_{AC}$ interval (i.e., the “add two successes and two failures” interval). Agresti and Caffo (2000) discussed this and showed that adding four pseudo observations also dramatically improves the Wald two-sample interval for comparing proportions, although again at the cost of rather severe conservativeness when both parameters are near 0 or near 1.

3. ALTERNATIVE WIDTH COMPARISON

In comparing the expected lengths of the three recommended intervals, BCD note that the comparison is clear and consistent as n changes, with the average expected length being noticeably larger for $CI_{AC}$ than $CI_J$ and $CI_W$. Thus, in their concluding remarks, they recommend $CI_J$ and $CI_W$ for small n. However, since BCD recommend modifying $CI_J$ and $CI_W$ to eliminate severe downward spikes of coverage probabilities, we believe that a more fair comparison of expected lengths uses the modified versions $CI_{M-J}$ and $CI_{M-W}$. We checked this but must admit that figures analogous to the BCD Figures 8 and 9 show that $CI_{M-J}$ and $CI_{M-W}$ maintain their expected length advantage over $CI_{AC}$, although it is reduced somewhat.

However, when n decreases below 10, the results change, with $CI_{M-J}$ having greater expected width than $CI_{AC}$ and $CI_{M-W}$. Our Figure 1 extends the BCD Figure 9 to values of n < 10, showing how the comparison differs between the ordinary intervals and the modified ones. Our Figure 2 has the format of the BCD Figure 8, but for n = 5 instead of 25. Admittedly, n = 5 is a rather extreme case, one for which the Jeffreys interval is modified unless x = 2 or 3 and the Wilson interval is modified unless x = 0 or 5, and for it $CI_{AC}$ has coverage probabilities that can dip below 0.90. Thus, overall, the BCD recommendations about choice of method seem reasonable to us. Our own preference is to use the Wilson interval for statistical practice and $CI_{AC}$ for teaching in elementary statistics courses.

4. EXTENSIONS

Other than near-boundary modiﬁcations, another

type of ﬁne-tuning that may help is to invert a test

permitting unequal tail probabilities. This occurs

naturally in exact inference that inverts a sin-

gle two-tailed test, which can perform better than

inverting two separate one-tailed tests (e.g., Sterne,

1954; Blyth and Still, 1983).

Finally, we are curious about the implications of

the BCD results in a more general setting. How

much does their message about the effects of dis-

creteness and basing interval estimation on the

Jeffreys prior or the score test rather than the Wald

test extend to parameters in other discrete distri-

butions and to two-sample comparisons? We have

seen that interval estimation of the Poisson param-

eter beneﬁts from inverting the score test rather

than the Wald test on the count scale (Agresti and

Coull, 1998).
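To make the Poisson remark concrete, here is a small Python sketch contrasting the two inversions for a single observed count x with mean μ. It is an illustration under stated assumptions rather than the computation reported in Agresti and Coull (1998): the function names are hypothetical, the Wald interval is x ± z√x, and the score interval solves (x − μ)²/μ = z² for μ, which gives x + z²/2 ± z√(x + z²/4).

```python
import math

Z_95 = 1.959963984540054  # z_{alpha/2} for alpha = 0.05


def poisson_wald_interval(x, z=Z_95):
    """Wald interval on the count scale: variance estimated by x itself."""
    half = z * math.sqrt(x)
    return x - half, x + half


def poisson_score_interval(x, z=Z_95):
    """Score interval: roots in mu of (x - mu)^2 / mu = z^2."""
    center = x + z * z / 2
    half = z * math.sqrt(x + z * z / 4)
    return center - half, center + half


if __name__ == "__main__":
    for x in (0, 1, 4, 25):
        print(x, poisson_wald_interval(x), poisson_score_interval(x))
```

At x = 0 the Wald interval collapses to the single point 0, while the score interval runs from 0 to z², which is one simple way to see why the score inversion behaves better for small counts.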

One would not think there could be anything

new to say about the Wald conﬁdence interval

for a proportion, an inferential method that must

be one of the most frequently used since Laplace

(1812, page 283). Likewise, the conﬁdence inter-

val for a proportion based on the Jeffreys prior

has received attention in various forms for some

time. For instance, R. A. Fisher (1956, pages 63–

70) showed the similarity of a Bayesian analysis

with Jeffreys prior to his ﬁducial approach, in a dis-

cussion that was generally critical of the conﬁdence

interval method but grudgingly admitted of limits

obtained by a test inversion such as the Clopper–

Pearson method, “though they fall short in logical

content of the limits found by the ﬁducial argument,

and with which they have often been confused, they

do fulﬁl some of the desiderata of statistical infer-

ences.” Congratulations to the authors for brilliantly

casting new light on the performance of these old

and established methods.

Comment

George Casella

1. INTRODUCTION

Professors Brown, Cai and DasGupta (BCD) are to be congratulated for their clear and imaginative look at a seemingly timeless problem. The chaotic behavior of coverage probabilities of discrete confidence sets has always been an annoyance, resulting in intervals whose coverage probability can be vastly different from their nominal confidence level. What we now see is that for the Wald interval, an approximate interval, the chaotic behavior is relentless, as this interval will not maintain 1 − α coverage for any value of n. Although fixes relying on ad hoc rules abound, they do not solve this fundamental defect of the Wald interval and, surprisingly, the usual safety net of asymptotics is also shown not to exist. So, as the song goes, “Bye-bye, so long, farewell” to the Wald interval.

George Casella is Arun Varma Commemorative Term Professor and Chair, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: casella@stat.ufl.edu).

Now that the Wald interval is out, what is in?

There are probably two answers here, depending

on whether one is in the classroom or the consult-

ing room.


Fig. 1. Coverage probabilities of the Blyth-Still interval (upper) and Agresti-Coull interval (lower) for n = 100 and 1 −α = 0.95.

2. WHEN YOU SAY 95%. . .

In the classroom it is (still) valuable to have a formula for a confidence interval, and I typically present the Wilson/score interval, starting from the test statistic formulation. Although this doesn’t have the pleasing $\hat r \pm$ something, most students can understand the logic of test inversion. Moreover, the fact that the interval does not have a symmetric form is a valuable lesson in itself; the statistical world is not always symmetric.

However, one thing still bothers me about this

interval. It is clearly not a 1 − α interval; that is,

it does not maintain its nominal coverage prob-

ability. This is a defect, and one that should not

be compromised. I am uncomfortable in present-

ing a conﬁdence interval that does not maintain its

stated conﬁdence; when you say 95% you should

mean 95%!

But the ﬁx here is rather simple: apply the “con-

tinuity correction” to the score interval (a technique

that seems to be out of favor for reasons I do not

understand). The continuity correction is easy to

justify in the classroom using pictures of the nor-

mal density overlaid on the binomial mass func-

tion, and the resulting interval will now maintain

its nominal level. (This last statement is not based

on analytic proof, but on numerical studies.) Anyone

reading Blyth (1986) cannot help being convinced

that this is an excellent approximation, coming at

only a slightly increased effort.

One other point that Blyth makes, which BCD do not mention, is that it is easy to get exact confidence limits at the endpoints. That is, for X = 0 the lower bound is 0 and for X = 1 the lower bound is $1 - (1 - \alpha)^{1/n}$ [the solution to $P(X = 0) = 1 - \alpha$].
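The endpoint formula follows in one line from the binomial mass function. As a worked sketch (writing r for the binomial parameter, as elsewhere in this discussion, and X for the number of successes):

```latex
\begin{align*}
P_r(X = 0) &= (1 - r)^n = 1 - \alpha \\
\Longrightarrow\quad 1 - r &= (1 - \alpha)^{1/n} \\
\Longrightarrow\quad r &= 1 - (1 - \alpha)^{1/n}.
\end{align*}
```

For example, with n = 100 and α = 0.05 this gives a lower bound of roughly 0.0005 when X = 1.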

3. USE YOUR TOOLS

The essential message that I take away from the

work of BCD is that an approximate/formula-based

approach to constructing a binomial conﬁdence

interval is bound to have essential ﬂaws. However,

this is a situation where brute force computing will

do the trick. The construction of a 1 − α binomial

conﬁdence interval is a discrete optimization prob-

lem that is easily programmed. So why not use the

tools that we have available? If the problem will

yield to brute force computation, then we should

use that solution.

Blyth and Still (1983) showed how to compute

exact intervals through numerical inversion of

tests, and Casella (1986) showed how to compute

exact intervals by reﬁning conservative intervals.

So for any value of n and α, we can compute an

exact, shortest 1 − α conﬁdence interval that will

not display any of the pathological behavior illus-

trated by BCD. As an example, Figure 1 shows the

Agresti–Coull interval along with the Blyth–Still

interval for n = 100 and 1 − α = 0.95. While

the Agresti–Coull interval fails to maintain 0.95

coverage in the middle r region, the Blyth–Still

interval always maintains 0.95 coverage. What is

more surprising, however, is that the Blyth–Still

interval displays much less variation in its cov-

erage probability, especially near the endpoints.

Thus, the simplistic numerical algorithm produces

an excellent interval, one that both maintains its

guaranteed coverage and reduces oscillation in the

coverage probabilities.
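Coverage curves of the kind shown in Figure 1 come from an exact finite sum, not from simulation. The Python sketch below shows that computation for two formula-based intervals; it does not implement the Blyth–Still construction, and the function names are illustrative assumptions. For a fixed n and true parameter r, the coverage probability is the total binomial probability of those outcomes x whose interval contains r.

```python
import math

Z_95 = 1.959963984540054  # z_{alpha/2} for alpha = 0.05


def binom_pmf(k, n, r):
    """Binomial probability of k successes in n trials with parameter r."""
    return math.comb(n, k) * r**k * (1 - r) ** (n - k)


def wald_interval(x, n, z=Z_95):
    r_hat = x / n
    half = z * math.sqrt(r_hat * (1 - r_hat) / n)
    return r_hat - half, r_hat + half


def agresti_coull_interval(x, n, z=Z_95):
    n_tilde = n + z * z
    r_tilde = (x + z * z / 2) / n_tilde
    half = z * math.sqrt(r_tilde * (1 - r_tilde) / n_tilde)
    return r_tilde - half, r_tilde + half


def coverage(interval, n, r):
    """Exact coverage at r: sum of pmf over all x whose interval contains r."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= r <= hi:
            total += binom_pmf(x, n, r)
    return total


if __name__ == "__main__":
    for r in (0.05, 0.2, 0.5):
        print(r, coverage(wald_interval, 100, r),
              coverage(agresti_coull_interval, 100, r))
```

Evaluating `coverage` on a fine grid of r values reproduces the oscillating curves discussed here; at n = 10 and r = 0.5, for instance, the sums reduce to 912/1024 for the Wald interval and 1002/1024 for the Agresti–Coull interval.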

ACKNOWLEDGMENT

Supported by NSF Grant DMS-99-71586.

Comment

Chris Corcoran and Cyrus Mehta

We thank the authors for a very accessible

and thorough discussion of this practical prob-

lem. With the availability of modern computa-

tional tools, we have an unprecedented opportu-

nity to carefully evaluate standard statistical pro-

cedures in this manner. The results of such work

are invaluable to teachers and practitioners of

statistics everywhere. We particularly appreciate

the attention paid by the authors to the gener-

ally oversimpliﬁed and inadequate recommenda-

tions made by statistical texts regarding when to

use normal approximations in analyzing binary

data. As their work has plainly shown, even in the simple case of a single binomial proportion, the discreteness of the data makes the use of some asymptotic procedures tenuous, even when the underlying probability lies away from the boundary or when the sample size is relatively large.

Chris Corcoran is Assistant Professor, Department of Mathematics and Statistics, Utah State University, 3900 Old Main Hill, Logan, Utah, 84322-3900 (e-mail: corcoran@math.usu.edu). Cyrus Mehta is Professor, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115 and is with Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, Massachusetts 02319.

The authors have evaluated various conﬁdence

intervals with respect to their coverage properties

and average lengths. Implicit in their evaluation

is the premise that overcoverage is just as bad as

undercoverage. We disagree with the authors on this

fundamental issue. If, because of the discreteness of

the test statistic, the desired conﬁdence level cannot

be attained, one would ordinarily prefer overcover-

age to undercoverage. Wouldn’t you prefer to hire

a fortune teller whose track record exceeds expec-

tations to one whose track record is unable to live

up to its claim of accuracy? With the exception of

the Clopper–Pearson interval, none of the intervals

discussed by the authors lives up to its claim of

95% accuracy throughout the range of r. Yet the

authors dismiss this interval on the grounds that

it is “wastefully conservative.” Perhaps so, but they

do not address the issue of how the wastefulness is

manifested.

What penalty do we incur for furnishing confidence intervals that are more truthful than was required of them? Presumably we pay for the conservatism by an increase in the length of the confidence interval. We thought it would be a useful exercise to actually investigate the magnitude of this penalty for two confidence interval procedures that are guaranteed to provide the desired coverage but are not as conservative as Clopper–Pearson. Figure 1 displays the true coverage probabilities for the nominal 95% Blyth–Still–Casella (see Blyth and Still, 1983; Casella, 1984) confidence interval (BSC interval) and the 95% confidence interval obtained by inverting the exact likelihood ratio test (LR interval; the inversion follows that shown by Aitken, Anderson, Francis and Hinde, 1989, pages 112–118).

Fig. 1. Actual coverage probabilities for BSC and LR intervals as a function of r (n = 50). Compare to authors’ Figures 5, 10 and 11.

There is no value of r for which the coverage of the

BSC and LR intervals falls below 95%. Their cover-

age probabilities are, however, much closer to 95%

than would be obtained by the Clopper–Pearson pro-

cedure, as is evident from the authors’ Figure 11.

Thus one could say that these two intervals are uni-

formly better than the Clopper–Pearson interval.

We next investigate the penalty to be paid for the

guaranteed coverage in terms of increased length of

the BSC and LR intervals relative to the Wilson,

Agresti–Coull, or Jeffreys intervals recommended

by the authors. This is shown by Figure 2.

In fact the BSC and LR intervals are actually shorter than Agresti–Coull for r < 0.2 or r > 0.8, and shorter than the Wilson interval for r < 0.1 and r > 0.9. The only interval that is uniformly shorter than BSC and LR is the Jeffreys interval.

Most of the time the difference in lengths is negligi-

ble, and in the worst case (at r = 0.5) the Jeffreys

interval is only shorter by 0.025 units. Of the three

asymptotic methods recommended by the authors,

the Jeffreys interval yields the lowest average prob-

ability of coverage, with signiﬁcantly greater poten-

tial relative undercoverage in the (0.05, 0.20) and

(0.80, 0.95) regions of the parameter space. Consid-

ering this, one must question the rationale for pre-

ferring Jeffreys to either BSC or LR.

The authors argue for simplicity and ease of com-

putation. This argument is valid for the teaching of

statistics, where the instructor must balance sim-

plicity with accuracy. As the authors point out, it is

customary to teach the standard interval in intro-

ductory courses because the formula is straight-

forward and the central limit theorem provides a

good heuristic for motivating the normal approxi-

mation. However, the evidence shows that the stan-

dard method is woefully inadequate. Teaching sta-

tistical novices about a Clopper–Pearson type inter-

val is conceptually difﬁcult, particularly because

exact intervals are impossible to compute by hand.

As the Agresti–Coull interval preserves the conﬁ-

dence level most successfully among the three rec-

ommended alternative intervals, we believe that

this feature when coupled with its straightforward

computation (particularly when α = 0.05) makes

this approach ideal for the classroom.

Simplicity and ease of computation have no role

to play in statistical practice. With the advent

of powerful microcomputers, researchers no longer

resort to hand calculations when analyzing data.

While the need for simplicity applies to the class-

room, in applications we primarily desire reliable,

accurate solutions, as there is no signiﬁcant dif-

ference in the computational overhead required by

the authors’ recommended intervals when compared

to the BSC and LR methods. From this perspec-

tive, the BSC and LR intervals have a substantial

advantage relative to the various asymptotic inter-

vals presented by the authors. They guarantee cov-

erage at a relatively low cost in increased length.

In fact, the BSC interval is already implemented in

StatXact (1998) and is therefore readily accessible to

practitioners.


Fig. 2. Expected lengths of BSC and LR intervals as a function of r compared, respectively, to Wilson, Agresti–Coull and Jeffreys

intervals (n = 25). Compare to authors’ Figure 8.

Comment

Malay Ghosh

This is indeed a very valuable article which brings out very clearly some of the inherent difficulties associated with confidence intervals for parameters of interest in discrete distributions. Professors Brown, Cai and DasGupta (henceforth BCD) are to be complimented for their comprehensive and thought-provoking discussion about the “chaotic” behavior of the Wald interval for the binomial proportion and an appraisal of some of the alternatives that have been proposed.

My remarks will primarily be confined to the discussion of Bayesian methods introduced in this paper. BCD have demonstrated very clearly that the modified Jeffreys equal-tailed interval works well in this problem and recommend it as a possible contender to the Wilson interval for n ≤ 40.

Malay Ghosh is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: ghoshm@stat.ufl.edu).

There is a deep-rooted optimality associated with Jeffreys prior as the unique first-order probability matching prior for a real-valued parameter of interest with no nuisance parameter. Roughly speaking, a probability matching prior for a real-valued parameter is one for which the coverage probability of a one-sided Bayesian credible interval is asymptotically equal to its frequentist counterpart. Before giving a formal definition of such priors, we provide an intuitive explanation of why Jeffreys prior is a matching prior. To this end, we begin with the fact that if $X_1, \ldots, X_n$ are iid $N(\theta, 1)$, then $\bar X_n = \sum_{i=1}^n X_i/n$ is the MLE of θ. With the uniform prior π(θ) ∝ c (a constant), the posterior of θ is $N(\bar X_n, 1/n)$. Accordingly, writing $z_\alpha$ for the upper 100α% point of the N(0, 1) distribution,

$$P(\theta \le \bar X_n + z_\alpha n^{-1/2} \mid \bar X_n) = 1 - \alpha = P(\theta \le \bar X_n + z_\alpha n^{-1/2} \mid \theta)$$

and this is an example of perfect matching. Now if $\hat\theta_n$ is the MLE of θ, under suitable regularity conditions, $\hat\theta_n \mid \theta$ is asymptotically (as n → ∞) $N(\theta, I^{-1}(\theta))$, where I(θ) is the Fisher Information number. With the transformation $g(\theta) = \int^{\theta} I^{1/2}(t)\,dt$, by the delta method, $g(\hat\theta_n)$ is asymptotically $N(g(\theta), 1)$. Now, intuitively one expects the uniform prior π(θ) ∝ c as the asymptotic matching prior for g(θ). Transforming back to the original parameter, Jeffreys prior is a probability matching prior for θ. Of course, this requires an invariance of probability matching priors, a fact which is rigorously established in Datta and Ghosh (1996). Thus a uniform prior for $\arcsin(\theta^{1/2})$, where θ is the binomial proportion, leads to Jeffreys Beta (1/2, 1/2) prior for θ. When θ is the Poisson parameter, the uniform prior for $\theta^{1/2}$ leads to Jeffreys’ prior $\theta^{-1/2}$ for θ.
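The two closing examples can be verified by carrying out the transformation explicitly. The following worked sketch uses the per-observation Fisher information and writes the variance-stabilizing transformation as g (a symbol chosen here for illustration); constant factors are dropped because a prior is only defined up to proportionality.

```latex
\begin{align*}
\text{Binomial:}\quad & I(\theta) = \frac{1}{\theta(1-\theta)}, \qquad
  g(\theta) = \int_0^{\theta} \frac{dt}{\sqrt{t(1-t)}} = 2\arcsin\!\big(\theta^{1/2}\big), \\
& \pi(\theta) \propto \left|\frac{dg}{d\theta}\right|
  = \theta^{-1/2}(1-\theta)^{-1/2}
  \quad\text{(the Beta}(1/2, 1/2)\text{ kernel)}; \\[4pt]
\text{Poisson:}\quad & I(\theta) = \frac{1}{\theta}, \qquad
  g(\theta) = \int_0^{\theta} t^{-1/2}\,dt = 2\,\theta^{1/2}, \qquad
  \pi(\theta) \propto \left|\frac{dg}{d\theta}\right| = \theta^{-1/2}.
\end{align*}
```

In both cases a flat prior on g(θ) pulls back to a prior proportional to $I^{1/2}(\theta)$, which is the Jeffreys prior.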

In a more formal set-up, let $X_1, \ldots, X_n$ be iid conditional on some real-valued θ. Let $\theta_\pi^{1-\alpha}(X_1, \ldots, X_n)$ denote a posterior (1 − α)th quantile for θ under the prior π. Then π is said to be a first-order probability matching prior if

$$P\big(\theta \le \theta_\pi^{1-\alpha}(X_1, \ldots, X_n) \mid \theta\big) = 1 - \alpha + o(n^{-1/2}). \qquad (1)$$

This definition is due to Welch and Peers (1963) who showed by solving a differential equation that Jeffreys prior is the unique first-order probability matching prior in this case. Strictly speaking, Welch and Peers proved this result only for continuous distributions. Ghosh (1994) pointed out a suitable modification of criterion (1) which would lead to the same conclusion for discrete distributions. Also, for small and moderate samples, due to discreteness, one needs some modifications of Jeffreys interval as done so successfully by BCD.

This idea of probability matching can be extended

even in the presence of nuisance parameters.

Suppose that $\theta = (\theta_1, \ldots, \theta_p)^T$, where $\theta_1$ is
the parameter of interest, while $(\theta_2, \ldots, \theta_p)^T$ is the
nuisance parameter. Writing $I(\theta) = ((I_{jk}))$ as the Fisher
information matrix, if $\theta_1$ is orthogonal to
$(\theta_2, \ldots, \theta_p)^T$ in the sense of Cox and Reid (1987), that
is, $I_{1k} = 0$ for all $k = 2, \ldots, p$, extending the previous
intuitive argument, $\pi(\theta) \propto I_{11}^{1/2}(\theta)$ is a
probability matching prior. Indeed, this prior belongs to the general class
of first-order probability matching priors

$$\pi(\theta) \propto I_{11}^{1/2}(\theta)\, h(\theta_2, \ldots, \theta_p)$$

as derived in Tibshirani (1989). Here $h(\cdot)$ is an arbitrary function
differentiable in its arguments.

In general, matching priors have a long success

story in providing frequentist conﬁdence intervals,

especially in complex problems, for example, the

Behrens–Fisher or the common mean estimation

problems where frequentist methods run into dif-

ﬁculty. Though asymptotic, the matching property

seems to hold for small and moderate sample sizes

as well for many important statistical problems.

One such example is Garvan and Ghosh (1997)

where such priors were found for general disper-

sion models as given in Jorgensen (1997). It may

be worthwhile developing these priors in the pres-

ence of nuisance parameters for other discrete cases

as well, for example when the parameter of interest

is the difference of two binomial proportions, or the

log-odds ratio in a 2 × 2 contingency table.

Having argued so strongly in favor of matching

priors, I wonder, though, whether there is any spe-

cial need for such priors in this particular problem of

binomial proportions. It appears that any Beta(a, a) prior will do well in
this case. As noted in this paper, by shrinking the MLE $X/n$ toward the
prior mean 1/2, one achieves a better centering for the construction of
confidence intervals. The two diametrically opposite priors Beta(2, 2)
(symmetric concave with maximum at 1/2, which provides the Agresti–Coull
interval) and Jeffreys prior Beta(1/2, 1/2) (symmetric convex with minimum
at 1/2) seem to be equally good for recentering. Indeed, I wonder whether
any Beta($\alpha$, $\beta$) prior which shrinks the MLE toward the prior
mean $\alpha/(\alpha + \beta)$ becomes appropriate for recentering.

The problem of construction of conﬁdence inter-

vals for binomial proportions occurs in ﬁrst courses

in statistics as well as in day-to-day consulting.

While I am strongly in favor of replacing Wald inter-

vals by the new ones for the latter, I am not quite

sure how easy it will be to motivate these new inter-

vals for the former. The notion of shrinking can be

explained adequately only to a few strong students

in introductory statistics courses. One possible solution for the classroom
may be to bring in the notion of continuity correction and somewhat
heuristically ask students to work with
$(X + \tfrac{1}{2},\ n - X + \tfrac{1}{2})$ instead of $(X,\ n - X)$. In
this way, one centers around $(X + \tfrac{1}{2})/(n + 1)$ à la Jeffreys
prior.
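The recentering described in this comment is easy to see numerically. The sketch below computes the MLE and the posterior means under the Jeffreys Beta(1/2, 1/2) and Beta(2, 2) priors, using the standard posterior mean (x + a)/(n + a + b) for a Beta(a, b) prior. The function names and the example data (2 successes in 20 trials) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def posterior_mean(x, n, a, b):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (Fraction(x) + Fraction(a)) / (Fraction(n) + Fraction(a) + Fraction(b))


def centers(x, n):
    """Return the MLE and two shrunken centers for x successes in n trials."""
    half = Fraction(1, 2)
    return {
        "mle": Fraction(x, n),
        "jeffreys": posterior_mean(x, n, half, half),  # (x + 1/2) / (n + 1)
        "beta_2_2": posterior_mean(x, n, 2, 2),        # (x + 2) / (n + 4)
    }


# Hypothetical data: 2 successes in 20 trials.
example = centers(2, 20)
for name, value in example.items():
    print(f"{name:9s} {value}  ({float(value):.4f})")
```

Both Bayesian centers lie between the MLE and 1/2, which is the shrinkage the comment refers to; the Beta(2, 2) center moves further than the Jeffreys one.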


Comment

Thomas J. Santner

I thank the authors for their detailed look at a well-studied problem. For
the Wald binomial $p$ interval, there has not been an appreciation of the
long persistence (in $n$) of $p$ locations having substantially deficient
achieved coverage compared with the nominal coverage. Figure 1 is indeed a
picture that says a thousand words. Similarly, the asymptotic lower limit in
Theorem 1 for the minimum coverage of the Wald interval is an extremely
useful analytic tool to explain this phenomenon, although other authors have
given fixed $p$ approximations of the coverage probability of the Wald
interval (e.g., Theorem 1 of Ghosh, 1979).

My ﬁrst set of comments concern the speciﬁc bino-

mial problem that the authors address and then the

implications of their work for other important dis-

crete data conﬁdence interval problems.

The results in Ghosh (1979) complement the cal-

culations of Brown, Cai and DasGupta (BCD) by

pointing out that the Wald interval is “too long” in

addition to being centered at the “wrong” value (the

MLE as opposed to a Bayesian point estimate such
as is used by the Agresti–Coull interval). His Table 3
lists the probability that the Wald interval is longer
than the Wilson interval for a central set of $p$ values
(from 0.20 to 0.80) and a range of sample sizes

n from 20 to 200. Perhaps surprisingly, in view of

its inferior coverage characteristics, the Wald inter-

val tends to be longer than the Wilson interval

with very high probability. Hence the Wald interval

is both too long and centered at the wrong place.

This is a dramatic effect of the skewness that BCD

mention.

When discussing any system of intervals, one is concerned with the
consistency of the answers given by the interval across multiple uses by a
single researcher or by groups of users. Formally, this is the reason why
various symmetry properties are required of confidence intervals. For
example, in the present case, requiring that the $p$ interval
$(L(X), U(X))$ satisfy the symmetry property

$$(L(x), U(x)) = (1 - U(n - x),\ 1 - L(n - x)) \tag{1}$$

for $x \in \{0, \ldots, n\}$ shows that investigators who reverse their
definitions of success and failure will be consistent in their assessment of
the likely values for $p$. Symmetry (1) is the minimal requirement of a
binomial confidence interval. The Wilson and equal-tailed Jeffreys intervals
advocated by BCD satisfy the symmetry property (1) and have coverage that is
centered (when coverage is plotted versus true $p$) about the nominal value.
They are also straightforward to motivate, even for elementary students, and
simple to compute for the outcome of interest.

Thomas J. Santner is Professor, Ohio State University, 404 Cockins Hall,
1958 Neil Avenue, Columbus, Ohio 43210 (e-mail: tjs@stat.ohio-state.edu).
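Symmetry property (1) can be checked numerically for a given interval system. The sketch below does so for the Wilson (score) interval, reading (1) as a statement about the interval as a set, so that the lower limit at x is paired with one minus the upper limit at n − x. The function names and the choice n = 20 are illustrative assumptions.

```python
import math

Z_95 = 1.959963984540054  # upper 2.5% point of the standard normal


def wilson_interval(x, n, z=Z_95):
    """Wilson (score) interval for x successes in n trials."""
    p_hat = x / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return center - half_width, center + half_width


def max_symmetry_gap(interval, n):
    """Largest violation of L(x) = 1 - U(n - x), U(x) = 1 - L(n - x)."""
    gap = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        mirror_lower, mirror_upper = interval(n - x, n)
        gap = max(
            gap,
            abs(lower - (1.0 - mirror_upper)),
            abs(upper - (1.0 - mirror_lower)),
        )
    return gap


gap = max_symmetry_gap(wilson_interval, 20)
print(f"largest symmetry violation for n = 20: {gap:.2e}")
```

Any violation reported here should be at the level of floating-point rounding; the same `max_symmetry_gap` helper can be applied to any other function with the signature `interval(x, n)`.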

However, regarding $p$ confidence intervals as the inversion of a family of
acceptance regions corresponding to size $\alpha$ tests of
$H_0\colon p = p_0$ versus $H_A\colon p \ne p_0$ for $0 < p_0 < 1$ has some
substantial advantages. Indeed, Brown et al. mention this inversion
technique when they remark on the desirable properties of intervals formed
by inverting likelihood ratio test acceptance regions of $H_0$ versus $H_A$.
In the binomial case, the acceptance region of any reasonable test of
$H_0\colon p = p_0$ is of the form $\{L_{p_0}, \ldots, U_{p_0}\}$. These
acceptance regions invert to intervals if and only if $L_{p_0}$ and
$U_{p_0}$ are nondecreasing in $p_0$ (otherwise the inverted $p$ confidence
set can be a union of intervals). Of course,

there are many families of size α tests that meet

this nondecreasing criterion for inversion, includ-

ing the very conservative test used by Clopper and

Pearson (1934). For the binomial problem, Blyth and

Still (1983) constructed a set of conﬁdence intervals

by selecting among size α acceptance regions those

that possessed additional symmetry properties and

were “small” (leading to short conﬁdence intervals).

For example, they desired that the interval should “move to the right” as
$x$ increases when $n$ is fixed and should “move to the left” as $n$
increases when $x$ is fixed. They also asked that their system of intervals
increase monotonically in the coverage probability for fixed $x$ and $n$ in
the sense that the higher nominal coverage interval contain the lower
nominal coverage interval.
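To make the inversion idea concrete, the following sketch inverts the two one-sided binomial tail tests, each at level α/2, which is the construction behind the conservative Clopper–Pearson interval mentioned above. It is not the Blyth–Still procedure, whose acceptance regions are chosen differently. The limits are found by bisection on the exact binomial tail probabilities; all function names, n = 20, and the grid of p values are illustrative assumptions.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def upper_tail(x, n, p):
    """P(X >= x), increasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


def lower_tail(x, n, p):
    """P(X <= x), decreasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=80):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x, n, alpha=0.05):
    """Invert the two one-sided tail tests, each of size alpha / 2."""
    half = alpha / 2.0
    # Lower limit: the p0 at which P(X >= x | p0) reaches alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(lambda p: upper_tail(x, n, p) - half)
    # Upper limit: the p0 at which P(X <= x | p0) falls to alpha / 2.
    upper = 1.0 if x == n else bisect_root(lambda p: half - lower_tail(x, n, p))
    return lower, upper


def exact_coverage(intervals, n, p):
    """Probability, under Binomial(n, p), that the interval contains p."""
    return sum(
        binom_pmf(x, n, p)
        for x, (lower, upper) in enumerate(intervals)
        if lower <= p <= upper
    )


n = 20
intervals = [clopper_pearson(x, n) for x in range(n + 1)]
lowers = [lo for lo, _ in intervals]
uppers = [hi for _, hi in intervals]
print("limits nondecreasing in x:",
      all(a <= b for a, b in zip(lowers, lowers[1:])),
      all(a <= b for a, b in zip(uppers, uppers[1:])))
print("minimum coverage on grid:",
      min(exact_coverage(intervals, n, p) for p in (0.1, 0.3, 0.5)))
```

Note that the whole table of intervals is built for every outcome x before coverage can be evaluated, which illustrates the first of the two handicaps discussed in the next paragraph.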

In addition to being less intuitive to unsophisti-

cated statistical consumers, systems of conﬁdence

intervals formed by inversion of acceptance regions

also have two other handicaps that have hindered

their rise in popularity. First, they typically require

that the conﬁdence interval (essentially) be con-

structed for all possible outcomes, rather than

merely the response of interest. Second, their rather

brute force character means that a specialized com-

puter program must be written to produce the

acceptance sets and their inversion (the intervals).


Fig. 1. Coverage of nominal 95% symmetric Duffy–Santner $p$ intervals for n = 20 (bottom panel) and n = 50 (top panel).

However, the beneﬁts of having reasonably short

and suitably symmetric conﬁdence intervals are suf-

ﬁcient that such intervals have been constructed for

several frequently occurring problems of biostatis-

tics. For example, Jennison and Turnbull (1983) and

Duffy and Santner (1987) present acceptance set–

inversion conﬁdence intervals (both with available

FORTRAN programs to implement their methods)

for a binomial $p$ based on data from a multistage
clinical trial; Coe and Tamhane (1989) describe a
more sophisticated set of repeated confidence intervals
for $p_1 - p_2$ also based on multistage clinical
trial data (and give a SAS macro to produce the
intervals). Yamagami and Santner (1990) present
an acceptance set–inversion confidence interval and
FORTRAN program for $p_1 - p_2$ in the two-sample
binomial problem. There are other examples.

To contrast with the intervals whose coverages

are displayed in BCD’s Figure 5 for n = 20 and

n = 50, I formed the multistage intervals of Duffy

and Santner that strictly attain the nominal confidence level for all $p$.
The computation was done naively in the sense that the multistage FORTRAN
program by Duffy that implements this method was applied using one stage
with stopping boundaries arbitrarily set at $(a, b) = (0, 1)$ in the
notation of Duffy and Santner, and a small adjustment was made to ensure
symmetry property (1). (The nonsymmetrical multiple stage stopping
boundaries that produce the data considered in Duffy and Santner do not
impose symmetry.) The coverages of these systems are shown in Figure 1. To
give an idea of computing time, the n = 50 intervals required less than two
seconds to compute on my 400 MHz PC.

To further facilitate comparison with the intervals

whose coverage is displayed in Figure 5 of BCD,

I computed the Duffy and Santner intervals for a

slightly lower level of coverage, 93.5%, so that the

average coverage was about the desired 95% nomi-

nal level; the coverage of this system is displayed

in Figure 2 on the same vertical scale and com-

pares favorably. It is possible to call the FORTRAN

program that makes these intervals within SPLUS

which makes for convenient data analysis.

I wish to mention that there are a number of other small sample interval
estimation problems of continuing interest to biostatisticians that may well
have very reasonable small sample solutions based on analogs of the methods
that BCD recommend. Most of these would be extremely difficult to handle by
the more brute force method of inverting acceptance sets. The first of these
is the problem of computing simultaneous confidence intervals for
$p_0 - p_i$, $1 \le i \le T$, that arises in comparing a control binomial
distribution with $T$ treatment ones. The second concerns forming
simultaneous confidence intervals for $p_i - p_j$, the cell probabilities of
a multinomial distribution. In particular, the equal-tailed Jeffreys prior
approach recommended by the authors has strong appeal for both of these
problems.

Fig. 2. Coverage of nominal 93.5% symmetric Duffy–Santner $p$ intervals for n = 50.

Finally, I note that the Wilson intervals seem

to have received some recommendation as the

method of choice in other elementary texts. In his
introductory text, Larson (1974) introduces the

Wilson interval as the method of choice although

he makes the vague, and indeed false, statement, as

BCD show, that the user can use the Wald interval if

“n is large enough.” One reviewer of Santner (1998),

an article that showed the coverage virtues of the

Wilson interval compared with Wald-like intervals

advocated by another author in the magazine Teach-

ing Statistics (written for high school teachers) com-

mented that the Wilson method was the “standard”

method taught in the U.K.

Rejoinder

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

We deeply appreciate the many thoughtful and

constructive remarks and suggestions made by the

discussants of this paper. The discussion suggests

that we were able to make a convincing case that

the often-used Wald interval is far more problem-

atic than previously believed. We are happy to see

a consensus that the Wald interval deserves to

be discarded, as we have recommended. It is not

surprising to us to see disagreement over the spe-

ciﬁc alternative(s) to be recommended in place of


this interval. We hope the continuing debate will

add to a greater understanding of the problem, and

we welcome the chance to contribute to this debate.

A. It seems that the primary source of disagree-

ment is based on differences in interpretation

of the coverage goals for conﬁdence intervals.

We will begin by presenting our point of view

on this fundamental issue.

We will then turn to a number of other issues,

as summarized in the following list:

B. Simplicity is important.

C. Expected length is also important.

D. Santner’s proposal.

E. Should a continuity correction be used?

F. The Wald interval also performs poorly in

other problems.

G. The two-sample binomial problem.

H. Probability-matching procedures.

I. Results from asymptotic theory.

A. Professors Casella, Corcoran and Mehta come

out in favor of making coverage errors always fall

only on the conservative side. This is a traditional

point of view. However, we have taken a different

perspective in our treatment. It seems more consis-

tent with contemporary statistical practice to expect

that a γ% conﬁdence interval should cover the true

value approximately γ% of the time. The approxi-

mation should be built on sound, relevant statisti-

cal calculations, and it should be as accurate as the

situation allows.

We note in this regard that most statistical mod-

els are only felt to be approximately valid as repre-

sentations of the true situation. Hence the result-

ing coverage properties from those models are at

best only approximately accurate. Furthermore, a

broad range of modern procedures is supported

only by asymptotic or Monte-Carlo calculations, and

so again coverage can at best only be approxi-

mately the nominal value. As statisticians we do

the best within these constraints to produce proce-

dures whose coverage comes close to the nominal

value. In these contexts when we claim γ% cover-

age we clearly intend to convey that the coverage is

close to γ%, rather than to guarantee it is at least

γ%.

We grant that the binomial model has a some-

what special character relative to this general dis-

cussion. There are practical contexts where one can

feel conﬁdent this model holds with very high preci-

sion. Furthermore, asymptotics are not required in

order to construct practical procedures or evaluate

their properties, although asymptotic calculations

can be useful in both regards. But the discrete-

ness of the problem introduces a related barrier

to the construction of satisfactory procedures. This

forces one to again decide whether γ% should mean

“approximately γ%,” as it does in most other con-

temporary applications, or “at least γ%” as can

be obtained with the Blyth–Still procedure or the

Clopper–Pearson procedure. An obvious price of the

latter approach is in its decreased precision, as mea-

sured by the increased expected length of the inter-

vals.
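Because the binomial model is discrete, the actual coverage of any interval at a given (n, p) is a finite sum and can be computed exactly, which makes the distinction between "approximately γ%" and "at least γ%" easy to inspect. The sketch below does this for the Wald and Wilson intervals; the function names and the choice n = 50, p = 0.05 are illustrative assumptions.

```python
import math

Z_95 = 1.959963984540054  # upper 2.5% point of the standard normal


def wald_interval(x, n, z=Z_95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(x, n, z=Z_95):
    """Wilson (score) interval."""
    p_hat = x / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return center - half_width, center + half_width


def exact_coverage(interval, n, p):
    """Sum the Binomial(n, p) probabilities of outcomes whose interval covers p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        if lower <= p <= upper:
            total += math.comb(n, x) * p**x * (1.0 - p) ** (n - x)
    return total


wald_cov = exact_coverage(wald_interval, 50, 0.05)
wilson_cov = exact_coverage(wilson_interval, 50, 0.05)
print(f"Wald   coverage at n = 50, p = 0.05: {wald_cov:.4f}")
print(f"Wilson coverage at n = 50, p = 0.05: {wilson_cov:.4f}")
```

Evaluating `exact_coverage` over a fine grid of p values reproduces the kind of oscillating coverage curve discussed throughout the paper.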

B. All the discussants agree that elementary

motivation and simplicity of computation are impor-

tant attributes in the classroom context. We of

course agree. If these considerations are paramount

then the Agresti–Coull procedure is ideal. If the

need for simplicity can be relaxed even a little, then

we prefer the Wilson procedure: it is only slightly

harder to compute, its coverage is clearly closer to

the nominal value across a wider range of values of

$p$, and it can be easier to motivate since its derivation is totally
consistent with Neyman–Pearson theory. Other procedures such as Jeffreys or
the mid-P Clopper–Pearson interval become plausible competitors whenever
computer software can be substituted for the possibility of hand derivation
and computation.

Corcoran and Mehta take a rather extreme posi-

tion when they write, “Simplicity and ease of com-

putation have no role to play in statistical practice

[italics ours].” We agree that the ability to perform

computations by hand should be of little, if any, rel-

evance in practice. But conceptual simplicity, parsi-

mony and consistency with general theory remain

important secondary conditions to choose among

procedures with acceptable coverage and precision.

These considerations will reappear in our discus-

sion of Santner’s Blyth–Still proposal. They also

leave us feeling somewhat ambivalent about the

boundary-modiﬁed procedures we have presented in

our Section 4.1. Agresti and Coull correctly imply

that other boundary corrections could have been

tried and that our choice is thus somewhat ad hoc.

(The correction to Wilson can perhaps be defended

on the principle of substituting a Poisson approx-

imation for a Gaussian one where the former is

clearly more accurate; but we see no such funda-

mental motivation for our correction to the Jeffreys

interval.)

C. Several discussants commented on the pre-

cision of various proposals in terms of expected

length of the resulting intervals. We strongly concur
that precision is the important balancing criterion
vis-à-vis coverage. We wish only to note that

there exist other measures of precision than inter-

val expected length. In particular, one may investi-

gate the probability of covering wrong values. In a


charming identity worth noting, Pratt (1961) shows

the connection of this approach to that of expected

length. Calculations on coverage of wrong values of

$p$ in the binomial case will be presented in DasGupta
(2001). This article also discusses a number

of additional issues and presents further analytical

calculations, including a Pearson tilting similar to

the chi-square tilts advised in Hall (1983).
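Expected length, like coverage, is a finite sum in the binomial case. As a sketch of how such comparisons can be made, the code below computes the exact expected length of the Wilson and Agresti–Coull intervals at a few values of p. The Agresti–Coull form used here (adding z²/2 successes and z²/2 failures) is the usual textbook version and is an assumption about which variant is meant; the function names, n = 25, and the p grid are illustrative. The limits are not truncated to [0, 1].

```python
import math

Z_95 = 1.959963984540054  # upper 2.5% point of the standard normal


def wilson_interval(x, n, z=Z_95):
    """Wilson (score) interval."""
    p_hat = x / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return center - half_width, center + half_width


def agresti_coull_interval(x, n, z=Z_95):
    """Wald-type interval computed from the adjusted counts x + z^2/2, n + z^2."""
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2.0) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


def expected_length(interval, n, p):
    """Exact expected length under Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        total += math.comb(n, x) * p**x * (1.0 - p) ** (n - x) * (upper - lower)
    return total


n = 25
lengths = {
    p: (
        expected_length(wilson_interval, n, p),
        expected_length(agresti_coull_interval, n, p),
    )
    for p in (0.05, 0.2, 0.5)
}
for p, (wilson_len, ac_len) in lengths.items():
    print(f"p = {p:.2f}: Wilson {wilson_len:.4f}, Agresti-Coull {ac_len:.4f}")
```

The two intervals share the same center, and the Agresti–Coull interval is never the shorter of the two for any outcome, so its expected length is at least that of Wilson at every p.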

Corcoran and Mehta’s Figure 2 compares average

length of three of our proposals with Blyth–Still and

with their likelihood ratio procedure. We note ﬁrst

that their LR procedure is not the same as ours.

Theirs is based on numerically computed exact per-

centiles of the ﬁxed sample likelihood ratio statistic.

We suspect this is roughly equivalent to adjustment

of the chi-squared percentile by a Bartlett correc-

tion. Ours is based on the traditional asymptotic

chi-squared formula for the distribution of the like-

lihood ratio statistic. Consequently, their procedure

has conservative coverage, whereas ours has cov-

erage ﬂuctuating around the nominal value. They

assert that the difference in expected length is “neg-

ligible.” How much difference qualiﬁes as negligible

is an arguable, subjective evaluation. But we note

that in their plot their intervals can be on aver-

age about 8% or 10% longer than Jeffreys or Wilson

intervals, respectively. This seems to us a nonneg-

ligible difference. Actually, we suspect their prefer-

ence for their LR and BSC intervals rests primarily

on their overriding preference for conservativity in

coverage whereas, as we have discussed above, our

intervals are designed to attain approximately the

desired nominal value.

D. Santner proposes an interesting variant of the

original Blyth–Still proposal. As we understand it,

he suggests producing nominal γ% intervals by constructing
the γ*% Blyth–Still intervals, with γ*%

chosen so that the average coverage of the result-

ing intervals is approximately the nominal value,

γ%. The coverage plot for this procedure compares

well with that for Wilson or Jeffreys in our Figure 5.

Perhaps the expected interval length for this proce-

dure also compares well, although Santner does not

say so. However, we still do not favor his proposal.

It is conceptually more complicated and requires a

specially designed computer program, particularly if

one wishes to compute γ*% with any degree of accuracy.
It thus fails with respect to the criterion of scientific
parsimony in relation to other proposals that

appear to have at least competitive performance

characteristics.

E. Casella suggests the possibility of perform-

ing a continuity correction on the score statistic

prior to constructing a conﬁdence interval. We do

not agree with this proposal from any perspec-

tive. These “continuity-corrected Wilson” intervals

have extremely conservative coverage properties,

though they may not in principle be guaranteed to

be everywhere conservative. But even if one’s goal,

unlike ours, is to produce conservative intervals,

these intervals will be very inefficient at their nominal
level relative to Blyth–Still or even Clopper–Pearson.
In Figure 1 below, we plot the coverage
of the Wilson interval with and without a continuity
correction for n = 25 and α = 0.05, and
the corresponding expected lengths. It seems

clear that the loss in precision more than neutral-

izes the improvements in coverage and that the

nominal coverage of 95% is misleading from any

perspective.

Fig. 1. Comparison of the coverage probabilities and expected lengths of the Wilson (dotted) and continuity-corrected Wilson (solid)

intervals for n = 25 and α = 0.05.


Fig. 2. Comparison of the systematic coverage biases. The y-axis is $nS_n(p)$. From top to bottom: the systematic coverage biases of the
Agresti–Coull, Wilson, Jeffreys, likelihood ratio and Wald intervals, with n = 50 and α = 0.05.

F. Agresti and Coull ask if the dismal perfor-

mance of the Wald interval manifests itself in

other problems, including nondiscrete cases. Indeed

it does. In other lattice cases such as the Poisson

and negative binomial, both the considerable neg-

ative coverage bias and inefﬁciency in length per-

sist. These features also show up in some continu-

ous exponential family cases. See Brown, Cai and

DasGupta (2000b) for details.

In the three important discrete cases, the bino-

mial, Poisson and negative binomial, there is in fact

some conformity in regard to which methods work

well in general. Both the likelihood ratio interval

(using the asymptotic chi-squared limits) and the

equal-tailed Jeffreys interval perform admirably in

all of these problems with regard to coverage and

expected length. Perhaps there is an underlying the-

oretical reason for the parallel behavior of these

two intervals constructed from very different foun-

dational principles, and this seems worth further

study.

G. Some discussants very logically inquire about
the two-sample binomial situation.

Curiously, in a way, the Wald interval in the two-

sample case for the difference of proportions is less

problematic than in the one-sample case. It can

nevertheless be somewhat improved. Agresti and

Caffo (2000) present a proposal for this problem,

and Brown and Li (2001) discuss some others.
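For orientation, the sketch below contrasts the two-sample Wald interval with an adjusted version. The adjustment shown, adding one success and one failure to each sample (two successes and two failures in total) before applying the Wald formula, is assumed here to be the proposal that the title of the Agresti and Caffo reference describes; the function names and the example counts are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # upper 2.5% point of the standard normal


def wald_difference(x1, n1, x2, n2, z=Z_95):
    """Wald interval for p1 - p2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def adjusted_difference(x1, n1, x2, n2, z=Z_95):
    """Wald interval after adding one success and one failure to each sample."""
    return wald_difference(x1 + 1, n1 + 2, x2 + 1, n2 + 2, z)


# Hypothetical data: no successes in either sample of size 10.
wald = wald_difference(0, 10, 0, 10)
adjusted = adjusted_difference(0, 10, 0, 10)
print("Wald:    ", wald)
print("Adjusted:", adjusted)
```

With no successes in either sample the Wald interval collapses to a single point, while the adjusted interval retains positive width around zero.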

H. The discussion by Ghosh raises several inter-

esting issues. The deﬁnition of “ﬁrst-order proba-

bility matching” extends in the obvious way to any

set of upper conﬁdence limits; not just those cor-

responding to Bayesian intervals. There is also an

obvious extension to lower conﬁdence limits. This

probability matching is a one-sided criterion. Thus a family of two-sided
intervals $[L_n, U_n]$ will be first-order probability matching if

$$\Pr_p(p \le L_n) = \alpha/2 + o(n^{-1/2}) = \Pr_p(p \ge U_n).$$

As Ghosh notes, this deﬁnition cannot usefully

be literally applied to the binomial problem here,

because the asymptotic expansions always have a
discrete oscillation term that is $O(n^{-1/2})$. However,

one can correct the deﬁnition.

One way to do so involves writing asymptotic

expressions for the probabilities of interest that can

be divided into a “smooth” part, $S$, and an “oscillating” part,
$\mathrm{Osc}$, that averages to $O(n^{-3/2})$ with respect to any smooth
density supported within (0, 1). Readers could consult BCD (2000a) for more
details about such expansions. Thus, in much generality one could write

$$\Pr_p(p \le L_n) = \alpha/2 + S_{L_n}(p) + \mathrm{Osc}_{L_n}(p) + O(n^{-1}), \tag{1}$$

where $S_{L_n}(p) = O(n^{-1/2})$, and $\mathrm{Osc}_{L_n}(p)$ has the
property informally described above. We would then say that the procedure is
first-order probability matching if $S_{L_n}(p) = o(n^{-1/2})$, with an
analogous expression for the upper limit, $U_n$.

In this sense the equal-tail Jeffreys procedure

is probability matching. We believe that the mid-P
Clopper–Pearson intervals also have this asymptotic
property. But several of the other proposals,

including the Wald, the Wilson and the likelihood

ratio intervals are not ﬁrst-order probability match-

ing. See Cai (2001) for exact and asymptotic calcula-

tions on one-sided conﬁdence intervals and hypoth-

esis testing in the discrete distributions.


The failure of this one-sided, ﬁrst-order property,

however, has no obvious bearing on the coverage

properties of the two-sided procedures considered

in the paper. That is because, for any of our procedures,

$$S_{L_n}(p) + S_{U_n}(p) = 0 + O(n^{-1}), \tag{2}$$

even when the individual terms on the left are only $O(n^{-1/2})$. All the
procedures thus make compensating one-sided errors, to $O(n^{-1})$, even
when they are not accurate to this degree as one-sided procedures.

This situation raises the question as to whether

it is desirable to add as a secondary criterion for

two-sided procedures that they also provide accurate
one-sided statements, at least to the probability
matching $O(n^{-1/2})$. While Ghosh argues strongly

for the probability matching property, his argument

does not seem to take into account the cancellation

inherent in (2). We have heard some others argue in

favor of such a requirement and some argue against

it. We do not wish to take a strong position on

this issue now. Perhaps it depends somewhat on the

practical context—if in that context the conﬁdence

bounds may be interpreted and used in a one-sided

fashion as well as the two-sided one, then perhaps

probability matching is called for.

I. Ghosh’s comments are a reminder that asymp-

totic theory is useful for this problem, even though

exact calculations here are entirely feasible and convenient.
But, as Ghosh notes, asymptotic expressions
can be startlingly accurate for moderate

sample sizes. Asymptotics can thus provide valid

insights that are not easily drawn from a series of

exact calculations. For example, the two-sided intervals also obey an
expression analogous to (1),

$$\Pr_p(L_n \le p \le U_n) = 1 - \alpha + S_n(p) + \mathrm{Osc}_n(p) + O(n^{-3/2}). \tag{3}$$

The term $S_n(p)$ is $O(n^{-1})$ and provides a useful expression for the
smooth center of the oscillatory coverage plot. (See Theorem 6 of BCD
(2000a) for a precise justification.) The following plot for n = 50 compares
$S_n(p)$ for five confidence procedures.

It shows how the Wilson, Jeffreys and chi-

squared likelihood ratio procedures all have cover-

age that well approximates the nominal value, with

Wilson being slightly more conservative than the

other two.

As we see it our article articulated three primary

goals: to demonstrate unambiguously that the Wald

interval performs extremely poorly; to point out that

none of the common prescriptions on when the interval
is satisfactory are correct; and to put forward

some recommendations on what is to be used in its

place. On the basis of the discussion we feel gratiﬁed

that we have satisfactorily met the ﬁrst two of these

goals. As Professor Casella notes, the debate about

alternatives in this timeless problem will linger on,

as it should. We thank the discussants again for a

lucid and engaging discussion of a number of rel-

evant issues. We are grateful for the opportunity

to have learned so much from these distinguished

colleagues.

ADDITIONAL REFERENCES

Agresti, A. and Caffo, B. (2000). Simple and effective conﬁ-

dence intervals for proportions and differences of proportions

result from adding two successes and two failures. Amer.

Statist. 54. To appear.

Aitkin, M., Anderson, D., Francis, B. and Hinde, J. (1989).

Statistical Modelling in GLIM. Oxford Univ. Press.

Boos, D. D. and Hughes-Oliver, J. M. (2000). How large does n

have to be for Z and I intervals? Amer. Statist. 54 121–128.

Brown, L. D., Cai, T. and DasGupta, A. (2000a). Conﬁdence

intervals for a binomial proportion and asymptotic expan-

sions. Ann. Statist. To appear.

Brown, L. D., Cai, T. and DasGupta, A. (2000b). Interval estima-

tion in exponential families. Technical report, Dept. Statis-

tics, Univ. Pennsylvania.

Brown, L. D. and Li, X. (2001). Conﬁdence intervals for

the difference of two binomial proportions. Unpublished

manuscript.

Cai, T. (2001). One-sided conﬁdence intervals and hypothesis

testing in discrete distributions. Preprint.

Coe, P. R. and Tamhane, A. C. (1993). Exact repeated conﬁdence

intervals for Bernoulli parameters in a group sequential clin-

ical trial. Controlled Clinical Trials 14 19–29.

Cox, D. R. and Reid, N. (1987). Orthogonal parameters and

approximate conditional inference (with discussion). J. Roy.

Statist. Soc. Ser. B 49 113–147.

DasGupta, A. (2001). Some further results in the binomial inter-

val estimation problem. Preprint.

Datta, G. S. and Ghosh, M. (1996). On the invariance of nonin-

formative priors. Ann. Statist. 24 141–159.

Duffy, D. and Santner, T. J. (1987). Conﬁdence intervals for

a binomial parameter based on multistage tests. Biometrics

43 81–94.

Fisher, R. A. (1956). Statistical Methods for Scientiﬁc Inference.

Oliver and Boyd, Edinburgh.

Gart, J. J. (1966). Alternative analyses of contingency tables. J.

Roy. Statist. Soc. Ser. B 28 164–179.

Garvan, C. W. and Ghosh, M. (1997). Noninformative priors for

dispersion models. Biometrika 84 976–982.

Ghosh, J. K. (1994). Higher Order Asymptotics. IMS, Hayward,

CA.

Hall, P. (1983). Chi-squared approximations to the distribution

of a sum of independent random variables. Ann. Statist. 11

1028–1036.

Jennison, C. and Turnbull, B. W. (1983). Conﬁdence intervals

for a binomial parameter following a multistage test with

application to MIL-STD 105D and medical trials. Techno-

metrics, 25 49–58.

Jorgensen, B. (1997). The Theory of Dispersion Models. CRC

Chapman and Hall, London.

Laplace, P. S. (1812). Th´ eorie Analytique des Probabilit´ es.

Courcier, Paris.

Larson, H. J. (1974). Introduction to Probability Theory and Sta-

tistical Inference, 2nd ed. Wiley, New York.

INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 133

Pratt, J. W. (1961). Length of conﬁdence intervals. J. Amer.

Statist. Assoc. 56 549–567.

Rindskopf, D. (2000). Letter to the editor. Amer. Statist. 54 88.

Rubin, D. B. and Schenker, N. (1987). Logit-based interval esti-

mation for binomial data using the Jeffreys prior. Sociologi-

cal Methodology 17 131–144.

Sterne, T. E. (1954). Some remarks on conﬁdence or ﬁducial

limits. Biometrika 41 275–278.

Tibshirani, R. (1989). Noninformative priors for one parameter

of many. Biometrika 76 604–608.

Welch, B. L. and Peers, H. W. (1963). On formulae for confidence

points based on integrals of weighted likelihoods. J.

Roy. Statist. Soc. Ser. B 25 318–329.

Yamagami, S. and Santner, T. J. (1993). Invariant small sample

conﬁdence intervals for the difference of two success proba-

bilities. Comm. Statist. Simul. Comput. 22 33–59.

Appendix 4.

B03002. HISPANIC OR LATINO ORIGIN BY RACE - Universe: TOTAL POPULATION

Data Set: 2005-2009 American Community Survey 5-Year Estimates

Survey: American Community Survey

NOTE. Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population

Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of

housing units for states and counties.

For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.

Autauga County, Alabama

                                                                      Estimate   Margin of Error
Total:                                                                  49,584   *****
  Not Hispanic or Latino:                                               48,572   *****
    White alone                                                         38,636   +/-43
    Black or African American alone                                      8,827   +/-83
    American Indian and Alaska Native alone                                183   +/-89
    Asian alone                                                            309   +/-94
    Native Hawaiian and Other Pacific Islander alone                         0   +/-119
    Some other race alone                                                   56   +/-45
    Two or more races:                                                     561   +/-157
      Two races including Some other race                                    0   +/-119
      Two races excluding Some other race, and three or more races         561   +/-157
  Hispanic or Latino:                                                    1,012   *****
    White alone                                                            579   +/-115
    Black or African American alone                                         27   +/-30
    American Indian and Alaska Native alone                                 10   +/-17
    Asian alone                                                              0   +/-119
    Native Hawaiian and Other Pacific Islander alone                         0   +/-119
    Some other race alone                                                  280   +/-109
    Two or more races:                                                     116   +/-87
      Two races including Some other race                                   56   +/-58
      Two races excluding Some other race, and three or more races          60   +/-68

Source: U.S. Census Bureau, 2005-2009 American Community Survey

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling

variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error

can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the

estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the

ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of

nonsampling error is not represented in these tables.
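The interval described in this note can be computed directly from the published figures. The short Python sketch below is an illustration only, not Census Bureau code; the function name `confidence_bounds` and the shortened row labels are hypothetical, while the estimates and margins of error are copied from the Autauga County table above. It shows that subtracting the margin of error from a small estimate gives a negative lower bound for a count of people.

```python
def confidence_bounds(estimate, moe):
    """Return (lower, upper) as estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe


# (label, estimate, 90 percent margin of error) copied from the table above.
# The labels are shortened here for readability.
rows = [
    ("Not Hispanic: White alone", 38636, 43),
    ("Not Hispanic: Native Hawaiian and Other Pacific Islander alone", 0, 119),
    ("Hispanic: Black or African American alone", 27, 30),
    ("Hispanic: American Indian and Alaska Native alone", 10, 17),
]

# Collect the rows whose lower bound falls below zero people.
negative_lower = [
    label for label, est, moe in rows if confidence_bounds(est, moe)[0] < 0
]

for label, est, moe in rows:
    lower, upper = confidence_bounds(est, moe)
    flag = "  <-- negative lower bound" if lower < 0 else ""
    print(f"{label}: {lower} to {upper}{flag}")
```

Three of the four sample rows produce a lower bound below zero, which cannot be a real population count.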

While the 2005-2009 American Community Survey (ACS) data generally reflect the November 2008 Office of Management and Budget

(OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal

cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census

2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the

ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:

1. An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to

compute a standard error and thus the margin of error. A statistical test is not appropriate.

2. An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to

compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or

upper interval of an open-ended distribution.

3. An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.

4. An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.

5. An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended

distribution. A statistical test is not appropriate.

6. An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not

appropriate.

Standard Error/Variance documentation for this dataset:

Accuracy of the Data

Appendix 5

Chapter 13.

Preparation and Review of Data Products

13.1 OVERVIEW

This chapter discusses the data products derived from the American Community Survey (ACS).

ACS data products include the tables, reports, and files that contain estimates of population and

housing characteristics. These products cover geographic areas within the United States and

Puerto Rico. Tools such as the Public Use Microdata Sample (PUMS) files, which enable data users

to create their own estimates, also are data products.

ACS data products will continue to meet the traditional needs of those who used the decennial

census long-form sample estimates. However, as described in Chapter 14, Section 3, the ACS will

provide more current data products than those available from the census long form, an especially

important advantage toward the end of a decade.

Most surveys of the population provide sufficient samples to support the release of data products

only for the nation, the states, and, possibly, a few substate areas. Because the ACS is a very large

survey that collects data continuously in every county, products can be released for many types of

geographic areas, including many smaller geographic areas such as counties, townships, and cen-

sus tracts. For this reason, geography is an important topic for all ACS data products.

The first step in the preparation of a data product is defining the topics and characteristics it will

cover. Once the initial characteristics are determined, they must be reviewed by the Census

Bureau Disclosure Review Board (DRB) to ensure that individual responses will be kept confiden-

tial. Based on this review, the specifications of the products may be revised. The DRB also may

require that the microdata files be altered in certain ways, and may restrict the population size of

the geographic areas for which these estimates are published. These activities are collectively

referred to as disclosure avoidance.

The actual processing of the data products cannot begin until all response records for a given year

or years are edited and imputed in the data preparation and processing phases, the final weights

are determined, and disclosure avoidance techniques are applied. Using the weights, the sample

data are tabulated for a wide variety of characteristics according to the predetermined content.

These tabulations are done for the geographic areas that have a sample size sufficient to support

statistically reliable estimates, with the exception of 5-year period estimates, which will be avail-

able for small geographic areas down to the census tract and block group levels. The PUMS data

files are created by different processes because the data are a subset of the full sample data.

After the estimates are produced and verified for correctness, Census Bureau subject matter ana-

lysts review them. When the estimates have passed the final review, they are released to the pub-

lic. A similar process of review and public release is followed for PUMS data.

While the 2005 ACS sample was limited to the housing unit (HU) population for the United States

and Puerto Rico, starting in sample year 2006, the ACS was expanded to include the group quar-

ters (GQ) population. Therefore, the ACS sample is representative of the entire resident population

in the United States and Puerto Rico. In 2007, 1-year period estimates for the total population and

subgroups of the total population in both the United States and Puerto Rico were released for

sample year 2006. Similarly, in 2008, 1-year period estimates were released for sample year 2007.

In 2008, the Census Bureau will, for the first time, release products based on 3 years of ACS

sample, 2005 through 2007. In 2010, the Census Bureau plans to release the first products based

on 5 years of consecutive ACS samples, 2005 through 2009. Since several years of samples form

the basis of these multiyear products, reliable estimates can be released for much smaller geo-

graphic areas than is possible for products based on single-year data.

Preparation and Review of Data Products 13−1 ACS Design and Methodology

U.S. Census Bureau

In addition to data products regularly released to the public, other data products may be

requested by government agencies, private organizations and businesses, or individuals. To

accommodate such requests, the Census Bureau operates a custom tabulations program for the

ACS on a fee basis. These tabulation requests are reviewed by the DRB to assure protection of

confidentiality before release.

Chapter 14 describes the dissemination of the data products discussed in this chapter, including

display of products on the Census Bureau’s Web site and topics related to data file formatting.

13.2 GEOGRAPHY

The Census Bureau strives to provide products for the geographic areas that are most useful to

users of those data. For example, ACS data products are already disseminated for many of the

nation’s legal and administrative entities, including states, American Indian and Alaska Native

(AIAN) areas, counties, minor civil divisions (MCDs), incorporated places, congressional districts,

as well as data for a variety of other geographic entities. In cooperation with state and local agen-

cies, the Census Bureau identifies and delineates geographic entities referred to as ‘‘statistical

areas.’’ These include regions, divisions, urban areas (UAs), census county divisions (CCDs), cen-

sus designated places (CDPs), census tracts, and block groups. Data users then can select the geo-

graphic entity or set of entities that most closely represent their geographic areas of interest and

needs.

‘‘Geographic summary level’’ is a term used by the Census Bureau to designate the different geo-

graphic levels or types of geographic areas for which data are summarized. Examples include the

entities described above, such as states, counties, and places (the Census Bureau’s term for

entities such as cities and towns, including unincorporated areas). Information on the types of

geographic areas for which the Census Bureau publishes data is available at

<http://www.census.gov/geo/www/garm.html>.

Single-year period estimates of ACS data are published annually for recognized legal, administra-

tive, or statistical areas with populations of 65,000 or more (based on the latest Census Bureau

population estimates). Three-year period estimates based on 3 successive years of ACS samples

are published for areas of 20,000 or more. If a geographic area met the 1-year or 3-year threshold

for a previous period but dropped below it for the current period, it will continue to be published

as long as the population does not drop more than 5 percent below the threshold. Plans are to

publish 5-year period estimates based on 5 successive years of ACS samples starting in 2010 with

the 2005−2009 data. Multiyear period estimates based on 5 successive years of ACS samples will

be published for all legal, administrative, and statistical areas down to the block-group level,

regardless of population size. However, there are rules from the Census Bureau’s DRB that must be

applied.
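The publication thresholds in this paragraph can be summarized as a small decision rule. The Python sketch below is a simplified illustration; the function `is_published` and its parameters are hypothetical names, and the sketch ignores the DRB rules that the text says must also be applied. The 5 percent allowance is read here as "population of at least 95 percent of the threshold."

```python
# Population thresholds stated in the text for each period estimate.
THRESHOLDS = {"1-year": 65_000, "3-year": 20_000}


def is_published(period, population, published_previously=False):
    """Return True if an area qualifies for the given period estimate.

    Simplified sketch: DRB disclosure rules are not modeled.
    """
    if period == "5-year":
        # 5-year estimates are published regardless of population size.
        return True
    threshold = THRESHOLDS[period]
    if population >= threshold:
        return True
    # An area that qualified before stays published as long as it has not
    # dropped more than 5 percent below the threshold (integer arithmetic
    # avoids floating-point rounding at the boundary).
    return published_previously and population * 100 >= threshold * 95
```

For example, an area of 63,000 people would receive 1-year estimates only if it had met the 65,000 threshold in a previous period.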

The Puerto Rico Community Survey (PRCS) also provides estimates for legal, administrative, and

statistical areas in Puerto Rico. The same rules as described above for the 1-year, 3-year, and

5-year period estimates for the U.S. resident population apply for the PRCS as well.

The ACS publishes annual estimates for hundreds of substate areas, many of which will undergo

boundary changes due to annexations, detachments, or mergers with other areas.¹ Each year, the

Census Bureau’s Geography Division, working with state and local governments, updates its files

to reflect these boundary changes. Minor corrections to the location of boundaries also can occur

as a result of the Census Bureau’s ongoing Master Address File (MAF)/Topologically Integrated

Geographic Encoding and Referencing (TIGER®) Enhancement Project. The ACS estimates must

reflect these legal boundary changes, so all estimates are based on Geography Division files that

show the geographic boundaries as they existed on January 1 of the sample year or, in the case of

multiyear data products, at the beginning of the final year of data collection.

¹ The Census Bureau conducts the Boundary and Annexation Survey (BAS) each year. This survey collects

information on a voluntary basis from local governments and federally recognized American Indian areas. The

information collected includes the correct legal place names, type of government, legal actions that resulted in

boundary changes, and up-to-date boundary information. The BAS uses a fixed reference date of January 1 of

the BAS year. In years ending in 8, 9, and 0, all incorporated places, all minor civil divisions, and all federally

recognized tribal governments are included in the survey. In other years, only governments at or above

various population thresholds are contacted. More detailed information on the BAS can be found at

<http://www.census.gov/geo/www/bas/bashome.html>.

13.3 DEFINING THE DATA PRODUCTS

For the 1999 through 2002 sample years, the ACS detailed tables were designed to be compa-

rable with Census 2000 Summary File 3 to allow comparisons between data from Census 2000

and the ACS. However, when Census 2000 data users indicated certain changes they wanted in

many tables, ACS managers saw the years 2003 and 2004 as opportunities to define ACS prod-

ucts based on users’ advice.

Once a preliminary version of the revised suite of products had been developed, the Census

Bureau asked for feedback on the planned changes from data users (including other federal agen-

cies) via a Federal Register Notice (Fed. Reg. #3510-07-P). The notice requested comments on cur-

rent and proposed new products, particularly on the basic concept of the product and its useful-

ness to the data users. Data users provided a wide variety of comments, leading to modifications

of planned products.

ACS managers determined the exact form of the new products in time for their use in 2005 for the

ACS data release of sample year 2004. This schedule allowed users sufficient time to become

familiar with the new products and to provide comments well in advance of the data release for

the 2005 sample.

Similarly, a Federal Register Notice issued in August 2007 shared with the public plans for the

data release schedule and products that would be available beginning in 2008. This notice was

the first that described products for multiyear estimates. Improvements will continue when multi-

year period estimates are available.

13.4 DESCRIPTION OF AGGREGATED DATA PRODUCTS

ACS data products can be divided into two broad categories: aggregated data products, and the

PUMS, which is described in Section 13.5 (‘‘Public Use Microdata Sample’’).

Data for the ACS are collected from a sample of housing units (HUs), as well as the GQ population,

and are used to produce estimates of the actual figures that would have been obtained by inter-

viewing the entire population. The aggregated data products contain the estimates from the sur-

vey responses. Each estimate is created using the sample weights from respondent records that

meet certain criteria. For example, the 2007 ACS estimate of people under the age of 18 in

Chicago is calculated by adding the weights from all respondent records from interviews com-

pleted in 2007 in Chicago with residents under 18 years old.
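The Chicago example amounts to summing the weights of the records that meet the criteria. The Python sketch below illustrates that idea; the record layout, the field names, and the weights are invented for illustration and do not reflect actual ACS file formats.

```python
def weighted_estimate(records, meets_criteria):
    """Sum the sample weights of all records that satisfy the criteria."""
    return sum(r["weight"] for r in records if meets_criteria(r))


# Hypothetical respondent records; each weight is the number of people
# in the population that the respondent represents.
records = [
    {"place": "Chicago", "year": 2007, "age": 12, "weight": 95.0},
    {"place": "Chicago", "year": 2007, "age": 40, "weight": 110.0},
    {"place": "Chicago", "year": 2007, "age": 17, "weight": 80.0},
    {"place": "Chicago", "year": 2006, "age": 9, "weight": 70.0},
]

# Estimate of people under 18 in Chicago from interviews completed in 2007.
under_18 = weighted_estimate(
    records,
    lambda r: r["place"] == "Chicago" and r["year"] == 2007 and r["age"] < 18,
)
print(under_18)
```

Only the first and third records qualify, so the estimate is the sum of their two weights.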

This section provides a description of each aggregated product. Each product described is avail-

able as single-year period estimates; unless otherwise indicated, they will be available as 3-year

estimates and are planned for the 5-year estimates. Chapter 14 provides more detail on the actual

appearance and content of each product.

These data products contain all estimates planned for release each year, including those from mul-

tiple years of data, such as the 2005−2007 products. Data release rules will prevent certain

single- and 3-year period estimates from being released if they do not meet ACS requirements for

statistical reliability.

Detailed Tables

The detailed tables provide basic distributions of characteristics. They are the foundation upon

which other data products are built. These tables display estimates and the associated lower and

upper bounds of the 90 percent confidence interval. They include demographic, social, economic,

and housing characteristics, and provide 1-, 3-, or 5-year period estimates for the nation and the

states, as well as for counties, towns, and other small geographic entities, such as census tracts

and block groups.

The Census Bureau’s goal is to maintain a high degree of comparability between ACS detailed

tables and Census 2000 sample-based data products. In addition, characteristics not measured in

the Census 2000 tables will be included in the new ACS base tables. The 2007 detailed table

products include almost 600 tables that cover a wide variety of characteristics, and another

380 race and Hispanic-origin iterations that cover 40 key characteristics. In addition to the tables

on characteristics, approximately 80 tables summarize allocation rates from the data edits for

many of the characteristics. These provide measures of data quality by showing the extent to

which responses to various questionnaire items were complete. Altogether, over 1,300 separate

detailed tables are provided.

Data Profiles

Data profiles are high-level reports containing estimates for demographic, social, economic, and

housing characteristics. For a given geographic area, the data profiles include distributions for

such characteristics as sex, age, type of household, race and Hispanic origin, school enrollment,

educational attainment, disability status, veteran status, language spoken at home, ancestry,

income, poverty, physical housing characteristics, occupancy and owner/renter status, and hous-

ing value. The data profiles include a 90 percent margin of error for each estimate. Beginning with

the 2007 ACS, a comparison profile that compares the 2007 sample year’s estimates with those of

the 2006 ACS also will be published. These profile reports include the results of a statistical sig-

nificance test for each previous year’s estimate, compared to the current year. This test result indi-

cates whether the previous year’s estimate is significantly different (at a 90 percent confidence

level) from that of the current year.
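The chapter does not give the formula behind this significance test. The Python sketch below shows the conventional z-test for the difference between two estimates, under the assumption that the two years' estimates are independent and that a 90 percent margin of error equals 1.645 standard errors. The function names are hypothetical, and this is not presented as the Bureau's actual implementation.

```python
import math

Z90 = 1.645  # critical value for a two-sided test at the 90 percent level


def moe_to_se(moe):
    """Convert a 90 percent margin of error to a standard error."""
    return moe / Z90


def significantly_different(est_a, se_a, est_b, se_b):
    """Return True if two estimates differ at the 90 percent level.

    Assumes the two estimates are independent, so the standard error of
    the difference is the root of the sum of squared standard errors.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        return est_a != est_b
    return abs(est_a - est_b) / se_diff > Z90
```

With standard errors of 50 each, estimates of 1,000 and 1,200 test as different; with standard errors of 100 each, estimates of 1,000 and 1,100 do not.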

Narrative Profiles

Narrative profiles cover the current sample year only. These are easy-to-read, computer-produced

profiles that describe main topics from the data profiles for the general-purpose user. These are

the only ACS products with no standard errors accompanying the estimates.

Subject Tables

These tables are similar to the Census 2000 quick tables, and like them, are derived from detailed

tables. Both quick tables and subject tables are predefined, covering frequently requested infor-

mation on a single topic for a single geographic area. However, subject tables contain more detail

than the Census 2000 quick tables or the ACS data profiles. In general, a subject table contains

distributions for a few key universes, such as the race groups and people in various age groups,

which are relevant to the topic of the table. The estimates for these universes are displayed as

whole numbers. The distribution that follows is displayed in percentages. For example, subject

table S1501 on educational attainment provides the estimates for two different age groups (18 to

24 years old, and 25 years and older) as whole numbers. For each age group, these estimates are

followed by the percentages of people in different educational attainment categories (high school

graduate, college undergraduate degree, etc.). Subject tables also contain other measures, such as

medians, and they include the imputation rates for relevant characteristics. More than 40 topic-

specific subject tables are released each year.

Ranking Products

Ranking products contain ranked results of many important measures across states. They are pro-

duced as 1-year products only, based on the current sample year. The ranked results among the

states for each measure are displayed in three ways—charts, tables, and tabular displays that

allow for testing statistical significance.

The rankings show approximately 80 selected measures. The data used in ranking products are

pulled directly from a detailed table or a data profile for each state.

Geographic Comparison Tables (GCTs)

GCTs contain the same measures that appear in the ranking products. They are produced as both

1-year and multiyear products. GCTs are produced for states as well as for substate entities, such

as congressional districts. The results among the geographic entities for each measure are dis-

played as tables and thematic maps (see next).

Thematic Maps

Thematic maps are similar to ranking tables. They show mapped values for geographic areas at a

given geographic summary level. They have the added advantage of visually displaying the geo-

graphic variation of key characteristics (referred to as themes). An example of a thematic map

would be a map showing the percentage of a population 65 years and older by state.

Selected Population Profiles (SPPs)

SPPs provide certain characteristics from the data profiles for a specific race or ethnic group (e.g.,

Alaska Natives) or some other selected population group (e.g., people aged 60 years and older).

SPPs are provided every year for many of the Census 2000 Summary File 4 iteration groups. SPPs

were introduced on a limited basis in the fall of 2005, using the 2004 sample. In 2008 (sample

year 2007), this product was significantly expanded. The earlier SPP requirement was that a sub-

state geographic area must have a population of at least 1,000,000 people. This threshold was

reduced to 500,000, and congressional districts were added to the list of geographic types that

can receive SPPs. Another change to SPPs in 2008 is the addition of many country-of-birth groups.

Groups too small to warrant an SPP for a geographic area based on 1 year of sample data may

appear in an SPP based on the 3- or 5-year accumulations of sample data. More details on these

profiles can be found in Hillmer (2005), which includes a list of selected race, Hispanic origin, and

ancestry populations.

13.5 PUBLIC USE MICRODATA SAMPLE

Microdata are the individual records that contain information collected about each person and HU.

PUMS files are extracts from the confidential microdata that avoid disclosure of information about

households or individuals. These extracts cover all of the same characteristics contained in the

full microdata sample files. Chapter 14 provides information on data and file organization for the

PUMS.

The only geography other than state shown on a PUMS file is the Public Use Microdata Area

(PUMA). PUMAs are special nonoverlapping areas that partition a state, each containing a popula-

tion of about 100,000. State governments drew the PUMA boundaries at the time of Census 2000.

They were used for the Census 2000 sample PUMS files and are known as the ‘‘5 percent PUMAs.’’

(For more information on these geographic areas, go to

<http://www.census.gov/prod/cen2000/doc/pums.pdf>.)

The Census Bureau has released a 1-year PUMS file from the ACS since the survey’s inception. In

addition to the 1-year ACS PUMS file, the Census Bureau plans to create multiyear PUMS files from

the ACS sample, starting with the 2005−2007 3-year PUMS file. The multiyear PUMS files combine

annual PUMS files to create larger samples in each PUMA, covering a longer period of time. This

will allow users to create estimates that are more statistically reliable.

13.6 GENERATION OF DATA PRODUCTS

Following conversations with users of census data, the subject matter analysts in the Census

Bureau’s Housing and Household Economic Statistics Division and Population Division specify the

organization of the ACS data products. These specifications include the logic used to calculate

every estimate in each data product and the exact textual description associated with each esti-

mate. Starting with the 2006 ACS data release, only limited changes to these specifications have

occurred. Changes to the data product specifications must preserve the ability to compare esti-

mates from one year to another and must be operationally feasible. Changes must be made no

later than late winter of each year to ensure that the revised specifications are finalized by the

spring of that year and ready for the data releases beginning in the late summer of the year.

After the edited data with the final weights are available (see Chapters 10 and 11), generation of

the data products begins with the creation of the detailed tables data products with the 1-year

period estimates. The programming teams of the American Community Survey Office (ACSO) gen-

erate these estimates. Another staff within ACSO verifies that the estimates comply with the speci-

fications from subject matter analysts. Both the generation and the verification activities are auto-

mated.

The 1-year data products are released on a phased schedule starting in the summer. Currently, the

Census Bureau plans to release the multiyear data products late each year, after the release of the

1-year products.

One distinguishing feature of the ACS data products system is that standard errors are calculated

for all estimates and are released with the latter in tables. Subject matter analysts also use the

standard errors in their internal reviews of estimates.

Disclosure Avoidance

Once plans are finalized for the ACS data products, the DRB reviews them to assure that confiden-

tiality of respondents has been protected.

Title 13 of the United States Code (U.S.C.) is the basis for the Census Bureau’s policies on disclo-

sure avoidance. Title 13 says, ‘‘Neither the Secretary, nor any other officer or employee of the

Department of Commerce may make any publication whereby the data furnished by any particular

establishment or individual under this title can be identified . . .’’ The DRB reviews all data prod-

ucts planned for public release to ensure adherence to Title 13 requirements, and may insist on

applying disclosure avoidance rules that could result in the suppression of certain measures for

small geographic areas. (More information about the DRB and its policies can be found at

<http://www.factfinder.census.gov/jsp/saff/SAFFInfo.jsp?_pageId=su5_confidentiality>.)

To satisfy Title 13 U.S.C., the Census Bureau uses several statistical methodologies during tabula-

tion and data review to ensure that individually identifiable data will not be released.

Swapping. The main procedure used for protecting Census 2000 tabulations was data swap-

ping. It was applied to both short-form (100 percent) and long-form (sample) data indepen-

dently. Currently, it also is used to protect ACS tabulations. In each case, a small percentage of

household records is swapped. Pairs of households in different geographic regions are

swapped. The selection process for deciding which households should be swapped is highly

targeted to affect the records with the most disclosure risk. Pairs of households that are

swapped match on a minimal set of demographic variables. All data products (tables and

microdata) are created from the swapped data files.

For PUMS data the following techniques are employed in addition to swapping:

Top-coding is a method of disclosure avoidance in which all cases in or above a certain per-

centage of the distribution are placed into a single category.

Geographic population thresholds prohibit the disclosure of data for individuals or HUs for

geographic units with population counts below a specified level.

Age perturbation (modifying the age of household members) is required for large households

containing 10 people or more due to concerns about confidentiality.

Detail for categorical variables is collapsed if the number of occurrences in each category

does not meet a specified national minimum threshold.
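Of the techniques listed above, top-coding is the simplest to illustrate. The Python sketch below is an assumption-laden example, not the Census Bureau's procedure: the function `top_code`, the choice of the top 20 percent, and the decision to represent the single top category by its cutoff value are all hypothetical.

```python
def top_code(values, top_percent=5):
    """Place all cases in the top share of the distribution into one category.

    Every value at or above the cutoff is replaced by the cutoff itself,
    which then stands for "cutoff or more". Returns (coded_values, cutoff).
    """
    if not values:
        return [], None
    ordered = sorted(values)
    # Number of cases in the top category (at least one).
    n_top = max(1, len(ordered) * top_percent // 100)
    cutoff = ordered[-n_top]
    return [min(v, cutoff) for v in values], cutoff


# Hypothetical incomes in thousands of dollars; the last is an outlier
# that could identify a respondent if published as reported.
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
coded, cutoff = top_code(incomes, top_percent=20)
print(coded, cutoff)
```

After top-coding, the outlier of 1,000 is no longer distinguishable from the other case in the top category.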

For more information on disclosure avoidance techniques, see Section 5, ‘‘Current disclosure

avoidance practices’’ at <http://www.census.gov/srd/papers/pdf/rrs2005-06.pdf>.

The DRB also may determine that certain tables are so detailed that other restrictions are required

to ensure that there is sufficient sample to avoid revealing information on individual respondents.

In such instances, a restriction may be placed on the size of the geographic area for which the

table can be published. Current DRB rules require that detailed tables containing more than 100

detailed cells may not be released below the census tract level.

The data products released in the summer of 2006 for the 2005 sample covered the HU popula-

tion of the United States and Puerto Rico only. In January 2006, data collection began for the

population living in GQ facilities. Thus, the data products released in summer 2007 (and each year

thereafter) covered the entire resident population of the United States and Puerto Rico. Most esti-

mates for person characteristics covered in the data products were affected by this expansion. For

the most part, the actual characteristics remained the same, and only the description of the popu-

lation group changed from HU to resident population.

Data Release Rules

Even with the population size thresholds described earlier, in certain geographic areas some very

detailed tables might include estimates with unacceptable reliability. Data release rules, based on

the statistical reliability of the survey estimates, were first applied in the 2005 ACS. These release

rules apply only to the 1- and 3-year data products.

The main data release rule for the ACS tables works as follows. Every detailed table consists of a

series of estimates. Each estimate is subject to sampling variability that can be summarized by its

standard error. If more than half of the estimates in the table are not statistically different from 0

(at a 90 percent confidence level), then the table fails. Dividing the standard error by the estimate

yields the coefficient of variation (CV) for each estimate. (If the estimate is 0, a CV of 100 percent

is assigned.) To implement this requirement for each table at a given geographic area, CVs are cal-

culated for each table’s estimates, and the median CV value is determined. If the median CV value

for the table is less than or equal to 61 percent, the table passes for that geographic area and is

published; if it is greater than 61 percent, the table fails and is not published.
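
The median-CV test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's production code: the function names and the sample cells are hypothetical, and only the 61 percent threshold and the 100 percent CV for zero estimates come from the text.

```python
from statistics import median

CV_THRESHOLD = 0.61  # 61 percent, the threshold stated in the release rule


def coefficient_of_variation(estimate, standard_error):
    """Return standard error / estimate as a fraction (hypothetical helper)."""
    if estimate == 0:
        return 1.0  # a zero estimate is assigned a CV of 100 percent
    return standard_error / abs(estimate)


def table_passes(cells):
    """cells: (estimate, standard_error) pairs for one table in one area."""
    cvs = [coefficient_of_variation(est, se) for est, se in cells]
    return median(cvs) <= CV_THRESHOLD


# Hypothetical cells for illustration only; these are not ACS figures.
reliable = [(1200, 150), (800, 200), (50, 45), (0, 30), (400, 100)]
sparse = [(40, 35), (0, 20), (15, 18), (600, 90), (10, 12)]

print(table_passes(reliable))  # median CV is 0.25, so the table passes
print(table_passes(sparse))    # median CV is 1.0, so the table fails
```

Because the test uses the median, a table can pass even when up to half of its cells have very large CVs, and a table dominated by small cells can fail even when a few cells are estimated precisely.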

Whenever a table fails, a simpler table that collapses some of the detailed lines together can be

substituted for the original. If the simpler table passes, it is released. If it fails, none of the

estimates for that table and geographic area are released. These release rules are applied to 1-year

estimates and to multiyear period estimates based on 3 years of sample data. Current plans are not to apply

data release rules to the estimates based on 5 years of sample data.
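
The fallback from a detailed table to a collapsed one can be sketched as a three-way decision. The sketch below is an assumption-laden illustration: the names `median_cv` and `release_decision` are hypothetical, the collapsed table is supplied with its own standard errors (the text does not say how they are derived), and the sample cells are invented.

```python
from statistics import median


def median_cv(cells):
    """Median coefficient of variation over (estimate, standard_error) pairs."""
    # A zero estimate is assigned a CV of 100 percent, as in the release rule.
    return median(1.0 if est == 0 else se / abs(est) for est, se in cells)


def release_decision(detailed, collapsed=None, threshold=0.61):
    """Return which version of a table, if any, would be released for one area."""
    if median_cv(detailed) <= threshold:
        return "detailed"
    if collapsed is not None and median_cv(collapsed) <= threshold:
        return "collapsed"
    return "suppressed"


# Hypothetical cells for illustration only; these are not ACS figures.
detailed = [(40, 35), (0, 20), (15, 18), (600, 90), (10, 12)]
collapsed = [(55, 40), (625, 92)]
weak_collapsed = [(55, 60), (10, 12)]

print(release_decision(detailed, collapsed))
print(release_decision(detailed, weak_collapsed))
```

In the first call the detailed table fails but the collapsed version passes, so the collapsed version is released. In the second call both fail, and nothing is released for that table and area.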

13.7 DATA REVIEW AND ACCEPTANCE

After the editing, imputation, data products generation, disclosure avoidance, and application of

the release rules have been completed, subject matter analysts perform a final review of the ACS

data and estimates before release. This final data review and acceptance process helps to ensure

that there are no missing values, obvious errors, or other data anomalies.

Each year, the ACS staff and subject matter analysts generate, review, and provide clearance of all

ACS estimates. At a minimum, the analysts subject their data to a specific multistep review pro-

cess before they are cleared and released to the public. Because of the short time available to

review such a large amount of data, an automated review tool (ART) has been developed to facili-

tate the process.

ART is a computer application that enables subject matter analysts to detect statistically signifi-

cant differences in estimates from one year to the next using several statistical tests. The initial

version of ART was used to review 2003 and 2004 data. It featured predesigned reports as well as

ad hoc, user-defined queries for hundreds of estimates and for 350 geographic areas. An ART

workgroup defined a new version of ART to address several issues that emerged. The improved

version has been used by the analysts since June 2005; it is designed to work on much larger data

sets and offers a wider range of capabilities, with faster response times to user commands. A team of

programmers, analysts, and statisticians then developed an automated tool to assist analysts in

their review of the multiyear estimates. This tool was used in 2008 for the review of the

2005−2007 estimates.
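
The text does not say which statistical tests ART applies. As an assumption, the sketch below uses one common choice for comparing two survey estimates: a z-test on their difference at the 90 percent confidence level used elsewhere in this chapter. The function name and the sample figures are hypothetical, and the two years are treated as independent samples.

```python
import math

Z_CRITICAL_90 = 1.645  # two-sided critical value at 90 percent confidence


def significant_change(est_prev, se_prev, est_curr, se_curr,
                       z_critical=Z_CRITICAL_90):
    """Return True if two estimates differ significantly (assumed z-test)."""
    # Standard error of the difference, assuming independent samples.
    se_diff = math.sqrt(se_prev ** 2 + se_curr ** 2)
    if se_diff == 0:
        return est_prev != est_curr
    z = (est_curr - est_prev) / se_diff
    return abs(z) > z_critical


# Hypothetical estimates and standard errors, for illustration only.
print(significant_change(10000, 300, 11000, 350))  # z is about 2.17
print(significant_change(10000, 300, 10400, 350))  # z is about 0.87
```

A review tool of this kind would flag the first pair for an analyst's attention and pass over the second.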

The ACSO staff, together with the subject matter analysts, also have developed two other auto-

mated tools to facilitate documentation and clearance for required data review process steps: the

edit management and messaging application (EMMA), and the PUMS management and messaging

application (PMMA). Both are used to track the progress of analysts’ review activities and both

enable analysts and managers to see the current status of files under review and determine which

review steps can be initiated.

13.8 IMPORTANT NOTES ON MULTIYEAR ESTIMATES

While the types of data products for the multiyear estimates are almost entirely identical to those

used for the 1-year estimates, there are several distinctive features of the multiyear estimates that

data users must bear in mind.

First, the geographic boundaries used for multiyear estimates are always the boundaries in effect as

of January 1 of the final year of the period. Therefore, if a geographic area has gained or lost terri-

tory during the multiyear period, this practice can have a bearing on the user’s interpretation of

the estimates for that geographic area.

Second, for multiyear period estimates of monetary characteristics (for example, median

earnings), inflation factors are applied to the data to create estimates that reflect the dollar values

in the final year of the multiyear period.
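
The inflation adjustment amounts to rescaling each dollar amount by the ratio of the final year's price index to the index for the year the amount was reported. The sketch below assumes that form; the text names neither the price index nor its values, so the function name and the index figures are hypothetical placeholders.

```python
def to_final_year_dollars(amount, year, price_index, final_year):
    """Express an amount reported in `year` in `final_year` dollars."""
    return amount * price_index[final_year] / price_index[year]


# Hypothetical index values for illustration only; not a published series.
price_index = {2005: 100.0, 2006: 103.0, 2007: 106.0}

# Earnings reported in each year of a 2005-2007 period, restated in 2007 dollars.
for year in (2005, 2006, 2007):
    adjusted = to_final_year_dollars(30000, year, price_index, 2007)
    print(year, round(adjusted, 2))
```

Amounts reported in the final year are unchanged, while amounts from earlier years are scaled up when prices have risen over the period.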

Finally, although the Census Bureau tries to minimize changes to the ACS questionnaire, such

changes will occur from time to time. Changes to a question can result in the inability to build cer-

tain estimates for a multiyear period containing the year in which the question was changed. In

addition, if a new question is introduced during the multiyear period, it may be impossible to

make estimates of characteristics related to the new question for the multiyear period.

13.9 CUSTOM DATA PRODUCTS

The Census Bureau offers a wide variety of general-purpose data products from the ACS designed

to meet the needs of the majority of data users. They contain predefined sets of data for standard

census geographic areas. For users whose data needs are not met by the general-purpose prod-

ucts, the Census Bureau offers customized special tabulations on a cost-reimbursable basis

through the ACS custom tabulation program. Custom tabulations are created by tabulating data

from ACS edited and weighted data files. These projects vary in size, complexity, and cost,

depending on the needs of the sponsoring client.

Each custom tabulation request is reviewed in advance by the DRB to ensure that confidentiality is

protected. The requestor may be required to modify the original request to meet disclosure avoid-

ance requirements. For more detailed information on the ACS Custom Tabulations program, go to

<http://www.census.gov/acs/www/Products/spec_tabs/index.htm>.

Memo Regarding ACS Confidence Interval With Response
Andrew A. Beveridge, Ph.D.
