Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/TAS

General

Exact Binomial Confidence Intervals for Randomized Response

Jesse FREY and Andrés PÉREZ

We consider the problem of finding an exact confidence interval for a proportion that is estimated using randomized response. For many randomized response schemes, this is equivalent to finding an exact confidence interval for a bounded binomial proportion. Such intervals can be obtained by truncating standard exact binomial confidence intervals, but the truncated intervals may be empty or misleadingly short. We address this problem by using exact confidence intervals obtained by inverting a likelihood ratio test that takes into account that the proportion is bounded. A simple adjustment is made to keep the intervals from being excessively conservative. An R function for computing the intervals is available as online supplementary material.

KEY WORDS: Binomial distribution; Bounded binomial proportion; Likelihood ratio; Simple random sampling.

1. INTRODUCTION

In surveys that ask sensitive questions, there is a danger that respondents will refuse to respond or will answer untruthfully. Randomized response techniques encourage truthful answers by ensuring that the respondent’s answers do not provide any definite sensitive information. In the original formulation of randomized response, in Warner (1965), one is interested in estimating the proportion of a population that belongs to a sensitive group $A$. Since respondents who belong to group $A$ are unlikely to admit membership under direct questioning, a randomizing device is brought in. Out of sight of the interviewer, the respondent uses a spinner, a die, or some other device to generate a binary outcome $X$ that is 1 with known probability $p \neq 1/2$ and 0 otherwise. If $X = 1$, then the respondent truthfully answers the question “Do you belong to group $A$?”, and if $X = 0$, then the respondent truthfully answers the opposite question “Do you belong to group $\bar{A}$?” A routine calculation shows that if $\pi_A$ is the chance of belonging to group $A$ and $\pi_{\mathrm{yes}}$ is the chance of responding “yes” to the question, then

$$\pi_{\mathrm{yes}} = p\pi_A + (1 - p)(1 - \pi_A).$$

If $\hat{\pi}_{\mathrm{yes}}$ is the sample proportion of “yes” answers obtained from a simple random sample of $n$ respondents, then one obtains an unbiased estimator for $\pi_A$ by setting $\hat{\pi}_A = (\hat{\pi}_{\mathrm{yes}} - (1 - p))/(2p - 1)$. However, it is possible for this unbiased estimator to be less than 0 or bigger than 1. When we take into account that the true $\pi_A$ must satisfy $\pi_A \in [0, 1]$, we find that the maximum likelihood estimator (MLE) for $\pi_A$ is $\tilde{\pi}_A = \operatorname{median}\{0, \hat{\pi}_A, 1\}$, which always lies in the interval $[0, 1]$.

Since the original work of Warner (1965), much additional work has been done on randomized response. Techniques for handling multichotomous and continuous variables have been developed in articles such as Greenberg, Kuebler, Abernathy, and Horvitz (1971), and improved methods have been proposed for estimating proportions $\pi_A$. Greenberg, Abul-Ela, Simmons, and Horvitz (1969) explored the unrelated question technique, in which Warner’s question “Do you belong to group $\bar{A}$?” is replaced by an innocuous question with a known “yes” probability. Kuk (1990) proposed a method in which, out of sight of the interviewer, the respondent generates two independent binary outcomes $X_1$ and $X_2$ with known success probabilities $p_1 \neq p_2$. If the respondent belongs to group $A$, then he/she reports $X_1$. Otherwise, he/she reports $X_2$. Mangat (1994) developed a method in which the Warner randomized response method is used only for members of group $\bar{A}$. Those in group $A$ respond “yes” regardless of the value of $X$. Campbell and Joiner (1973) gave an interesting introduction to randomized response, and more extensive reviews of the randomized response literature appear in van den Hout and van der Heijden (2002) and in Chaudhuri (2011). Recent contributions to the randomized response literature include Quatember (2009), Tan, Tian, and Tang (2009), and Nayak and Adeshiyan (2009).

The randomized response literature includes techniques for point estimation, techniques for estimating the variability of a point estimate, and techniques for comparing randomized response schemes in terms of how well they handle the tradeoff between privacy protection and efficiency of estimation. However, it seems that relatively little work has been done on confidence intervals. In the binomial case, the only confidence intervals that seem to have been used are the standard Wald intervals, which perform poorly (see Brown, Cai, and DasGupta 2001). In this article, we develop exact confidence intervals for proportions that are estimated using randomized response. These intervals

Jesse Frey is Associate Professor, Department of Mathematics and Statistics, Villanova University, 800 East Lancaster Avenue, Villanova, PA 19085 (E-mail: jesse.frey@villanova.edu). Andrés Pérez is PhD student, Department of Statistics, Temple University, 1801 N. Broad Street, Philadelphia, PA 19122 (E-mail: andres.perez@temple.edu). This work was done while A. Pérez was a Master’s student at Villanova University. The authors thank the editor, an associate editor, and the two referees for their comments and suggestions.
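
The Warner point estimates described in the introduction can be illustrated with a minimal Python sketch. The function names and the example counts below are hypothetical, chosen only for illustration; this is not the R function supplied with the article, and it computes point estimates only, not the exact confidence intervals.

```python
from statistics import median


def warner_yes_probability(pi_a: float, p: float) -> float:
    """Chance of a "yes" under Warner's scheme: p*pi_A + (1 - p)*(1 - pi_A)."""
    return p * pi_a + (1 - p) * (1 - pi_a)


def warner_estimates(n_yes: int, n: int, p: float) -> tuple[float, float]:
    """Return (unbiased estimate, MLE) of pi_A from n_yes "yes" answers out of n.

    The unbiased estimate can fall outside [0, 1]; the MLE is the median of
    0, the unbiased estimate, and 1, so it always lies in [0, 1].
    """
    if p == 0.5:
        raise ValueError("p must differ from 1/2, otherwise pi_A is not identifiable")
    pi_yes_hat = n_yes / n
    pi_a_hat = (pi_yes_hat - (1 - p)) / (2 * p - 1)
    pi_a_mle = median([0.0, pi_a_hat, 1.0])
    return pi_a_hat, pi_a_mle


if __name__ == "__main__":
    # Hypothetical survey: device probability p = 0.75, 100 respondents.
    for n_yes in (20, 40):
        unbiased, mle = warner_estimates(n_yes, 100, 0.75)
        print(n_yes, unbiased, mle)
```

With 20 “yes” answers out of 100 and p = 0.75, the unbiased estimate is about −0.1 and the MLE is 0, which is the boundary situation that motivates treating the proportion as bounded.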

8 © 2012 American Statistical Association DOI: 10.1080/00031305.2012.663680 The American Statistician, February 2012, Vol. 66, No. 1
P-Value Precision and Reproducibility

Dennis D. BOOS and Leonard A. STEFANSKI

P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels 0.05, 0.01, and 0.001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.

KEY WORDS: Log p-value; Measure of evidence; Prediction interval; Reproducibility probability.

1. INTRODUCTION

Good statistical practice demands reporting some measure of variability or reliability for important statistical estimates. For example, if a population mean $\mu$ is estimated by a sample mean $\bar{Y}$ using an independent and identically distributed (iid) sample $Y_1, \ldots, Y_n$, common practice is to report a standard error $s/\sqrt{n}$ or a confidence interval for $\mu$, or their Bayesian equivalents. However, the variability of a p-value is typically not assessed or reported in routine data analysis, even when it is the primary statistical estimate.

More generally, statisticians are quick to emphasize that statistics have sampling distributions and that it is important to interpret those statistics in view of their sampling variation. However, the caveat about interpretation is often overlooked when the statistic is a p-value. With any statistic, ignoring variability can have undesirable consequences. In the case of the p-value, the main problem is that too much stock may be placed in a finding that is deemed statistically significant by a p-value $< 0.05$. In this article we systematically study p-value variability with an eye toward developing a better appreciation of its magnitude and potential impacts, especially those related to the profusion of scientific results that fail to reproduce upon replication.

Under alternative hypotheses, p-values are less variable than they are under null hypotheses, where for continuous cases the Uniform(0,1) standard deviation $12^{-1/2} = 0.29$ applies. However, under alternatives their standard deviations are typically large fractions of their mean values, and thus p-values are inherently imprecise. Papers such as Goodman (1992) and Gelman and Stern (2006) address this variability, yet the subject deserves more attention because p-values play such an important role in practice. Ignoring variability in p-values is potentially misleading just as ignoring the standard error of a mean can be misleading.

The science of statistics has recently come under attack for its purported role in the failure of statistical findings to hold up under the litmus test of replication. In Odds Are, It’s Wrong: Science Fails to Face the Shortcomings of Statistics, Siegfried (2010) highlighted Science’s love affair with p-values and argued that their shortcomings, and the shortcomings of statistics more generally, are responsible for the profusion of faulty scientific claims. Pantula et al. (2010) responded on behalf of the American Statistical Association and the International Statistical Institute, pointing out that Siegfried’s failure to distinguish between the limitations of statistical science and the misuse of statistical methods results in erroneous conclusions about the role of statistics in the profusion of false scientific claims. Lack of replication in studies and related statistical issues have been highlighted recently in interesting presentations by Young (2008), who pointed to multiple testing, multiple methodologies (trying many different methods of analysis), and bias as particular problems. In fact, results fail to replicate for a number of reasons. In order to correctly assess their impacts, it is necessary to understand the role of p-value variability itself.

We study three approaches to quantifying p-value variability. First, we consider the statistician’s general purpose measure of variability, the standard deviation and its estimate, the standard error. We show that $-\log_{10}(\text{p-value})$ standard deviations are such that for a wide range of observed significance levels, only the magnitude of $-\log_{10}(\text{p-value})$ is reliably determined. That is, writing the p-value as $x \cdot 10^{-k}$, where $1 \le x < 10$ and $k = 1, 2, 3, \ldots$ is the magnitude so that $-\log_{10}(\text{p-value}) = -\log_{10}(x) + k$, the standard deviation of $-\log_{10}(\text{p-value})$ is so large relative to its value that only the magnitude $k$ is reliably determined as a measure of evidence. This phenomenon is manifest in standard errors derived from both the bootstrap and

Dennis D. Boos is Professor and Associate Department Head, and Leonard A. Stefanski is Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203 (E-mail for correspondence: boos@stat.ncsu.edu). This work was supported by NSF grant DMS-0906421 and NIH grant P01 CA142538-01. We thank the Editor, Associate Editor, and two referees for thoughtful comments that resulted in substantial improvements to the article’s content and its exposition.
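
The magnitude decomposition of a p-value described in the introduction, and the Uniform(0,1) standard deviation quoted there, can be checked with a short Python sketch. The function name `magnitude_decomposition` is hypothetical and the example p-values are arbitrary; the sketch only illustrates the arithmetic and does not reproduce the article's standard-error or bootstrap analysis.

```python
import math


def magnitude_decomposition(p_value: float) -> tuple[float, int]:
    """Write a p-value as x * 10**(-k) with 1 <= x < 10; return (x, k)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    k = -math.floor(math.log10(p_value))  # magnitude: k = 1, 2, 3, ...
    x = p_value * 10**k                   # leading factor
    return x, k


# Standard deviation of a Uniform(0,1) variable, i.e. of a p-value under a
# continuous null hypothesis: 12**(-1/2), roughly 0.29.
UNIFORM_SD = 12 ** -0.5

if __name__ == "__main__":
    for p in (0.5, 0.03, 0.0004):
        x, k = magnitude_decomposition(p)
        # The identity -log10(p) = -log10(x) + k holds up to rounding error.
        print(p, x, k, -math.log10(p), -math.log10(x) + k)
    print(round(UNIFORM_SD, 2))
```

For p-values that are exact powers of ten, floating-point rounding in `math.log10` could in principle shift the result by one magnitude, so the sketch is best read as an illustration of the decomposition.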

© 2011 American Statistical Association DOI: 10.1198/tas.2011.10129 The American Statistician, November 2011, Vol. 65, No. 4 213
Teaching Statistics at Google-Scale

Nicholas CHAMANDY, Omkar MURALIDHARAN, and Stefan WAGER

Modern data and applications pose very different challenges from those of the 1950s or even the 1980s. Students contemplating a career in statistics or data science need to have the tools to tackle problems involving massive, heavy-tailed data, often interacting with live, complex systems. However, despite the deepening connections between engineering and modern data science, we argue that training in classical statistical concepts plays a central role in preparing students to solve Google-scale problems. To this end, we present three industrial applications where significant modern data challenges were overcome by statistical thinking.

KEY WORDS: Big data; Data science.

1. INTRODUCTION

Technology companies like Google generate and consume data on a staggering scale. Massive, distributed data present novel and interesting challenges for the statistician, and have spurred much excitement among students, and even a new discipline: Data Science. Hal Varian famously quipped in 2009 that Statistician would be “the sexy job in the next 10 years” (Lohr 2009), a claim seemingly backed up by the proliferation of job postings for data scientists in high-tech. The McKinsey Global Institute took a more urgent tone in their 2011 report examining the explosion of big data in industry (Manyika et al. 2011). While extolling the huge productivity gains that untapping such data would bring, they predicted a shortage of hundreds of thousands of “people with deep analytical skills,” and millions of data-savvy managers, over the next few years.

Massive data present great opportunities for a statistician. Estimating tiny experimental effect sizes becomes routine, and practical significance is often more elusive than mere statistical significance. Moreover, the power of approaches that pool data across observations can be fully realized. But these exciting opportunities come with strings attached. The data sources and structures that a graduating statistician (or data scientist) faces are unlike those of the 1950s, or even the 1980s. Statistical models we take for granted are sometimes out of reach: constructing a matrix of dimensions n by p can be pure fantasy, and outliers are the rule, not the exception. Moreover, powerful computing tools have become a prerequisite to even reading the data. Modern data are in general unwieldy and raw, often contaminated by “spammy” or machine-generated observations. More than ever, data checking and sanitization are the domain of the statistician.

In this context, some might think that the future of data science education lies in engineering departments, with a focus on building ever more sophisticated data-analysis systems. Indeed, the American Statistical Association Guidelines Workgroup noted the “increased importance of data science” as the leading Key Point of its 2014 Curriculum Guidelines (ASA 2014). As Diane Lambert and others have commented, it is vital that today’s statisticians have the ability to “think with data” (Hardin et al. 2014). We agree wholeheartedly with the notion that students must be fluent in modern computational paradigms and data manipulation techniques. We present a counterbalance to this narrative, however, in the form of three data analysis challenges inspired by an industrial “big data” problem: click cost estimation. We illustrate how each example can be tackled not with fancy computation, but with new twists on standard statistical methods, yielding solutions that are not only principled, but also practical.

Our message is not at odds with the ASA’s recent recommendations; indeed, the guidelines highlight statistical theory, “flexible problem solving skills,” and “problems with a substantive context” as core components of the curriculum (ASA 2014). The methodological tweaks presented in this article are not particularly advanced, touching on well-known results in the domains of resampling, shrinkage, randomization, and causal inference. Their contributions are more conceptual than theoretical. As such, we believe that each example and solution would be accessible to an undergraduate student in a statistics or data science program. In relaying this message to such a student, it is not the specific examples or solutions presented here that should be stressed. Rather, we wish to emphasize the value of a solid understanding of classical statistical ideas as they apply to modern problems in preparing tomorrow’s students for large-scale data science challenges.

In many cases, the modern statistician acts as an interface between the raw data and the consumer of those data. The term “consumer” is used rather broadly here. Traditionally, it would include the key decision-makers of a business, and perhaps the

Nicholas Chamandy is Data Scientist, Lyft, San Francisco, CA; formerly with Google (E-mail: ). Omkar Muralidharan is Statistician in Ads Quality, Google, 1600 Amphitheater Pkwy, Mountain View, CA 94043 (E-mail: ). Stefan Wager is Ph.D. Candidate in Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Palo Alto, CA 94305-4065 (E-mail: ). This article is derived from a talk presented at the Joint Statistical Meetings on August 5, 2013, in Montreal, during the session entitled Toward Big Data in Teaching Statistics. The authors are grateful to Amir Najmi, Hal Varian, and three anonymous reviewers for their many helpful suggestions. They would also like to thank Nick Horton, Jo Hardin, and Tim Hesterberg for encouraging them to write the article. (E-mail: nickchamandy@gmail.com).

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/tas.

© 2015 American Statistical Association DOI: 10.1080/00031305.2015.1089790 The American Statistician, November 2015, Vol. 69, No. 4 283
Three Examples of Accurate Likelihood Inference

C. LOZADA-CAN and A. C. DAVISON


The modern theory of likelihood inference provides improved inferences in many parametric models, with little more effort than is required for application of standard first-order theory. We outline the relevant computations, and illustrate the calculations using a dilution assay, a zero-inflated Poisson regression model, and a short time series. In each case the effect of the higher order correction can be appreciable.

KEY WORDS: Autoregression; Bias reduction; Dilution assay; Higher order asymptotics; Likelihood; Zero-inflated Poisson distribution.

1. INTRODUCTION

Likelihood is a mainstay of statistical modeling. Inference in many applications is based on familiar large sample theory for the maximum likelihood estimator and the likelihood ratio statistic (Cox and Hinkley 1974, chap. 9). This theory involves approximations that may be justified when the sample size is large, but simulation is often needed to establish whether a given sample is sufficiently large in a new application. Thus the basic theory is often applied in cases where its performance is unclear, either because the sample is small but no better approach is readily available, or because although the sample appears large, the information content per observation is much smaller than is thought. The best-known situation where this second issue arises is in logistic regression, for which special approximations have been widely studied (Davison 1988; Strawderman and Wells 1998). It is not widely appreciated that standard likelihood theory can be readily improved and that the corresponding computations are relatively straightforward. The purpose of this article is to illustrate these ideas in three situations of increasing complexity: a single-parameter dilution assay; a zero-inflated Poisson regression model; and an autoregressive time series.

2. MODERN LIKELIHOOD THEORY

2.1 Basic Notions

We assume a parametric model with probability density function $f(y; \theta)$, with a $d$-dimensional parameter $\theta$ and a vector of continuous responses, $y = (y_1, \ldots, y_n)$. The log-likelihood, $\ell(\theta) = \log f(y; \theta)$, is maximized by the maximum likelihood estimator $\hat{\theta}$. Under standard regularity conditions $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance matrix $\jmath(\hat{\theta})^{-1}$, where $\jmath(\theta) = -\partial^2 \ell(\theta)/\partial\theta\,\partial\theta^{T}$ is the observed information matrix, and the Wald pivot $\{\jmath(\hat{\theta})\}^{1/2}(\hat{\theta} - \theta)$ has a standard normal distribution. The approximate normal distribution of $\hat{\theta}$ provides the most common basis for inference about elements of $\theta$, but the implicit assumption that the estimator is symmetrically distributed around its target means that the resulting confidence intervals for components of $\theta$ tend to be centered wrongly, or equivalently that true significance levels for one-sided tests on such components may differ substantially from their nominal values, because asymmetry of the log-likelihood about its maximum is not accommodated. Moreover, these inferences are not invariant to transformations of the parameters.

Write $\theta = (\psi, \lambda)$, where $\psi$ is a scalar component of $\theta$ for which inference is required and $\lambda$ represents the remaining components, and let $\hat{\theta} = (\hat{\psi}, \hat{\lambda})$ and $\hat{\theta}_\psi = (\psi, \hat{\lambda}_\psi)$ respectively denote the overall maximum likelihood estimator and the maximum likelihood estimator when $\psi$ is held fixed. In this notation a preferable basis for inference on $\psi$ is the likelihood root

$$r(\psi) = \operatorname{sign}(\hat{\psi} - \psi)\left[2\{\ell(\hat{\theta}) - \ell(\hat{\theta}_\psi)\}\right]^{1/2}, \qquad (1)$$

which takes potential asymmetry of the log-likelihood into account, and may be treated as an $N(0, 1)$ variable. The quantity $r(\psi)^2$ is the familiar likelihood ratio statistic. When testing the null hypothesis that $\psi = \psi_0$ against the one-sided hypothesis that $\psi > \psi_0$, we regard small values of the p-value $\Phi\{r(\psi_0)\}$ as casting doubt on the null hypothesis; here and below we use $\Phi$ to denote the cumulative probability function of the standard normal distribution. When $\theta = \psi$ is scalar, $\lambda$ disappears from the expressions above and $\jmath(\theta)$ is scalar. It is easy to verify that apart from a possible sign change, $r(\psi)$ is invariant to reparameterizations of the form $(\psi, \lambda) \mapsto \{g(\psi), h(\psi, \lambda)\}$, in which $g$ and $h$ are bijective, so-called interest-respecting reparameterizations.

2.2 Higher Order Approximations

Improved likelihood inferences may be obtained through higher order asymptotics, on which there is a large literature summarized in books by Barndorff-Nielsen and Cox (1994), Pace and Salvan (1997), Severini (2000), and Brazzale, Davison, and Reid (2007), and in review articles such as Reid (2003). One basic formula is the so-called $p^*$ approximation to the density of the maximum likelihood estimator conditioned on an ancillary statistic (Barndorff-Nielsen 1980, 1983, 1986), manipulation of which yields the modified likelihood root

$$r^*(\psi) = r(\psi) + \frac{1}{r(\psi)} \log\left\{\frac{q(\psi)}{r(\psi)}\right\}, \qquad (2)$$

Claudia Lozada-Can is a Postdoctoral Researcher (E-mail: Claudia_Lozada@live.com) and Anthony Davison is Professor of Statistics (E-mail: Anthony.Davison@epfl.ch), Institute of Mathematics, Ecole Polytechnique Fédérale de Lausanne, IMA-FSB-EPFL, Station 8, 1015 Lausanne, Switzerland. The work was supported by the Swiss National Science Foundation through the National Centre for Competence in Research on Plant Survival (http://www2.unine.ch/nccr). We are grateful to the referee, associate editor and editor, and to Alessandra Brazzale and Nancy Reid for their helpful comments on the work.
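
To make the contrast between the Wald pivot and the likelihood root (1) concrete, here is a minimal Python sketch for a scalar-parameter model. The model (iid exponential responses), the function names, and the data are assumptions chosen for illustration and are not one of the article's three examples. The sketch evaluates both quantities under the rate parameterization and under the mean parameterization (mean = 1/rate).

```python
import math


def likelihood_root(loglik, theta_hat: float, theta0: float) -> float:
    """Scalar likelihood root: sign(theta_hat - theta0) * sqrt(2 * loglik drop)."""
    drop = loglik(theta_hat) - loglik(theta0)
    return math.copysign(math.sqrt(2.0 * max(drop, 0.0)), theta_hat - theta0)


def wald_pivot(observed_info_at_hat: float, theta_hat: float, theta0: float) -> float:
    """Scalar Wald pivot: sqrt(observed information at the MLE) * (theta_hat - theta0)."""
    return math.sqrt(observed_info_at_hat) * (theta_hat - theta0)


def compare_parametrizations(ys: list[float], rate0: float) -> dict[str, float]:
    """Evaluate both pivots for exponential data under rate and mean scales."""
    n, total = len(ys), sum(ys)
    rate_hat, mean_hat = n / total, total / n
    mean0 = 1.0 / rate0  # the same hypothesis expressed on the mean scale

    def loglik_rate(rate: float) -> float:
        return n * math.log(rate) - rate * total

    def loglik_mean(mean: float) -> float:
        return -n * math.log(mean) - total / mean

    return {
        "r_rate": likelihood_root(loglik_rate, rate_hat, rate0),
        "r_mean": likelihood_root(loglik_mean, mean_hat, mean0),
        # Observed information at the MLE is n / theta_hat**2 on both scales.
        "wald_rate": wald_pivot(n / rate_hat**2, rate_hat, rate0),
        "wald_mean": wald_pivot(n / mean_hat**2, mean_hat, mean0),
    }


if __name__ == "__main__":
    # Hypothetical data; the hypothesised rate is 2 (equivalently, mean 0.5).
    print(compare_parametrizations([0.5, 1.2, 0.3, 2.0, 1.0], rate0=2.0))
```

Because the map from rate to mean is decreasing, the two likelihood roots agree in absolute value and differ only in sign, whereas the two Wald pivots differ in magnitude, which is the lack of invariance noted in Section 2.1.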

© 2010 American Statistical Association DOI: 10.1198/tast.2010.09004 The American Statistician, May 2010, Vol. 64, No. 2 131
Use of R as a Toolbox for Mathematical Statistics Exploration
Nicholas J. HORTON, Elizabeth R. BROWN, and Linjuan QIAN

The R language, a freely available environment for statistical computing and graphics, is widely used in many fields. This “expert-friendly” system has a powerful command language and programming environment, combined with an active user community. We discuss how R is ideal as a platform to support experimentation in mathematical statistics, both at the undergraduate and graduate levels. Using a series of case studies and activities, we describe how R can be used in a mathematical statistics course as a toolbox for experimentation. Examples include the calculation of a running average, maximization of a nonlinear function, resampling of a statistic, simple Bayesian modeling, sampling from multivariate normal, and estimation of power. These activities, often requiring only a few dozen lines of code, offer students the opportunity to explore statistical concepts and experiment. In addition, they provide an introduction to the framework and idioms available in this rich environment.

KEY WORDS: Mathematical statistics education; Statistical computing.

1. INTRODUCTION

The R language (Ihaka and Gentleman 1996; Hornik 2004) is a freely available environment for statistical computing and graphics. It allows the handling and storage of data, supports many operators for calculations, provides a number of tools for data analysis, and features excellent graphical capabilities and a straightforward programming language. R is widely used for statistical analysis in a variety of fields, and boasts a large number of add-on packages that extend the system.

This article considers the use of R not for analysis, but as a toolbox for exploration of mathematical statistics. Following the approach of Baglivo (1995), we introduce examples to illustrate how R can be used to experiment with concepts in mathematical statistics. In addition to providing insight into the underlying mathematical statistics, these example topics provide an introduction to R syntax, capabilities and idioms, and the power of this environment.

We begin by providing some motivation for the importance of statistical computing environments as a component of mathematical statistics education. Section 2 reviews the history of R (and its connection to S/S-Plus), details resources and documentation available for the system, and describes an introductory R session. Each of the activities in Section 3 includes a listing of the R commands and output, and a description to help introduce new syntax, structures, and idioms. Finally, we conclude with some overall comments about R as an environment for mathematical statistics education.

Over the past decade, there has been increasing consensus regarding the importance of computing skills as a component of statistics education. Numerous authors have described the importance of computing support for statistics, in the context of numerous curricular initiatives. As an example, Moore (2000) featured papers on technology that fall within the mantra of “more data, less lecturing” (Cobb 1991). Curriculum guidelines for undergraduate programs in statistical science from the American Statistical Association require familiarity with a standard software package and should encourage study of data management and algorithmic problem-solving (American Statistical Association 2004). While we are fully in agreement with the need for software packages to teach introductory statistics courses, in this article, we focus on aspects relating to algorithmic problem-solving, at the level of a more advanced course.

We also make a distinction between statistical computing and numerical analysis from the toolbox (or sandbox or testbed) for statistical education that we describe. Although there is a crucial role for statisticians with appropriate training in numerical analysis, computer science, graphical interfaces, and database design, in our experience relatively few students emerge from undergraduate or graduate statistics programs with this training. While somewhat dated, Eddy, Jones, Kass, and Schervish (1987) considered the role of computational statistics in graduate statistical education in the late 1980s. A series of articles in the February 2004 issue of The American Statistician (Gentle 2004; Monahan 2004; Lange 2004) readdress these issues. Lange (2004) described computational statistics and optimization courses that “would have been unrecognizable 20 years ago” [though considerable overlap is noted with topics given by Eddy et al. (1987)]. Gentle (2004) listed six areas of expertise valuable for statisticians:

1. data manipulation and analysis
2. symbolic computations
3. computer arithmetic
4. programming
5. algorithms and methods
6. data structures

Most of these topics, while quite valuable, often require additional training beyond the standard prerequisites for a mathematical statistics class (Monahan 2004; Gentle 2004; Lange 2004). The use of “rapid prototyping” and “quick-and-dirty” Monte Carlo studies (Gentle 2004) to better understand a given setting is particularly relevant for mathematical statistics classes (not just statistical computing). For many areas of modern statistics (e.g., resampling based tests, Bayesian inference, smoothing,

Nicholas J. Horton is Assistant Professor, Department of Mathematics, Smith College, Northampton, MA 01063 (E-mail: nhorton@email.smith.edu). Elizabeth R. Brown is Research Assistant Professor, Department of Biostatistics, University of Washington, Seattle, WA. Linjuan Qian is Undergraduate Research Assistant, Department of Mathematics, Smith College, Northampton, MA 01063. We are grateful to Ken Kleinman and Paul Kienzle for comments on an earlier draft of the article, and for the support provided by NIMH grant R01-MH54693 and a Career Development Fund Award from the Department of Biostatistics at the University of Washington.
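
As an illustration of the kind of “quick-and-dirty” Monte Carlo study mentioned above, the following minimal sketch estimates the power of a two-sided one-sample z-test by simulation and compares it with the closed-form value. It is written in Python for illustration, whereas the article's own activities use R; the function names, the test setting (known unit variance, 5% level), and the parameter values are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value


def simulated_power(n: int, delta: float, reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo power of the z-test of mean 0 when the true mean is delta."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(delta, 1.0) for _ in range(n)]
        z = fmean(sample) * math.sqrt(n)  # known unit variance
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / reps


def exact_power(n: int, delta: float) -> float:
    """Closed-form power of the same test, for comparison."""
    shift = delta * math.sqrt(n)
    std_normal = NormalDist()
    return 1.0 - std_normal.cdf(Z_CRIT - shift) + std_normal.cdf(-Z_CRIT - shift)


if __name__ == "__main__":
    print(simulated_power(25, 0.5), exact_power(25, 0.5))
```

With 2000 replications the Monte Carlo standard error is roughly 0.01, so the simulated and exact values should agree to about two decimal places; increasing `reps` tightens the agreement.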

© 2004 American Statistical Association DOI: 10.1198/000313004X5572 The American Statistician, November 2004, Vol. 58, No. 4 343
General

“Not Only Defended But Also Applied”: The Perceived Absurdity of Bayesian Inference

Andrew GELMAN and Christian P. ROBERT

The missionary zeal of many Bayesians of old has been matched, in the other direction, by an attitude among some theoreticians that Bayesian methods were absurd: not merely misguided but obviously wrong in principle. We consider several examples, beginning with Feller’s classic text on probability theory and continuing with more recent cases such as the perceived Bayesian nature of the so-called doomsday argument. We analyze in this note the intellectual background behind various misconceptions about Bayesian statistics, without aiming at a complete historical coverage of the reasons for this dismissal.

KEY WORDS: Bogosity; Doomsday argument; Foundations; Frequentist; Laplace law of succession.

1. A VIEW FROM 1950

Younger readers of this journal may not be fully aware of the passionate battles over Bayesian inference among statisticians in the last half of the twentieth century. During this period, the missionary zeal of many Bayesians was matched, in the other direction, by a view among some theoreticians that Bayesian methods are absurd: not merely misguided but obviously wrong in principle. Such anti-Bayesianism could hardly be maintained in the present era, given the many recent practical successes of Bayesian methods. But by examining the historical background of these beliefs, we may gain some insight into the statistical debates of today.

We begin with a Note on Bayes’ rule that appeared in William Feller’s classic probability text:

Unfortunately Bayes’ rule has been somewhat discredited by metaphysical applications of the type described above. In routine practice, this kind of argument can be dangerous. A quality control engineer is concerned with one particular machine and not with an infinite population of machines from which one was chosen at random. He has been advised to use Bayes’ rule on the grounds that it is logically acceptable and corresponds to our way of thinking. Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton’s mechanics. In our case it overlooks the circumstance that the engineer desires success and that he will do better by estimating and minimizing the sources of various types of errors in predicting and guessing. The modern method of statistical tests and estimation is less intuitive but more realistic. It may be not only defended but also applied. W. Feller, 1950 (pp. 124–125 of the 1970 edition)

Feller believed that Bayesian inference could be defended (i.e., supported via theoretical argument) but not applied to give reliable answers to problems in science or engineering, a claim that seems quaint in the modern context of Bayesian methods being used in problems from genetics, toxicology, and astronomy to economic forecasting and political science. As we discuss below, what struck us about Feller’s statement was not so much his stance as his apparent certainty.

One might argue that, whatever the merits of Feller’s statement today, it might have been true back in 1950. Such a claim, however, would have to ignore, for example, the success of Bayesian methods by Turing and others in code breaking during the Second World War, followed up by expositions such as Good (1950), as well as Jeffreys’s Theory of Probability, which came out in 1939. Consider this recollection from physicist and Bayesian E. T. Jaynes:

When, as a student in 1946, I decided that I ought to learn some probability theory, it was pure chance which led me to take the book Theory of Probability by Jeffreys, from the library shelf. In reading it, I was puzzled by something which, I am afraid, will also puzzle many who read the present book. Why was he so much on the defensive? It seemed to me that Jeffreys’ viewpoint and most of his statements were the most obvious common sense, I could not imagine any sane person disputing them. Why, then, did he feel it necessary to insert

Andrew Gelman, Departments of Statistics and Political Science, Columbia University, New York, NY 10027 (E-mail: gelman@stat.columbia.edu). Christian P. Robert, Université Paris-Dauphine, CEREMADE, IUF, and CREST, Paris, France (E-mail: xian@ceremade.dauphine.fr). We thank David Aldous, Ronald Christensen, the Associate Editor, and two reviewers for helpful comments. In addition, the first author (AG) thanks the Institute of Education Sciences, Department of Energy, National Science Foundation, and National Security Agency for partial support of this work. He remembers reading with pleasure much of Feller’s first volume in college, after taking probability but before taking any statistics courses. The second author’s (CPR) research is partly supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy 75012 Paris) through the 2007–2010 grant ANR-07-BLAN-0237 “SPBayes.” He remembers buying Feller’s first volume in a bookstore in Ann Arbor during a Bayesian econometrics conference where he was kindly supported by Jim Berger.

© 2013 American Statistical Association DOI: 10.1080/00031305.2013.760987 The American Statistician, February 2013, Vol. 67, No. 1

Statistical Computing and Graphics

Visualizing Longitudinal Data With Dropouts

Mithat GÖNEN

This article proposes a triangle plot to display longitudinal data with dropouts. The triangle plot is a tool of data visualization that can also serve as a graphical check for informativeness of the dropout process. There are similarities between the lasagna plot and the triangle plot, but the explicit use of dropout time as an axis is an advantage of the triangle plot over the more commonly used graphical strategies for longitudinal data. It is possible to interpret the triangle plot as a trellis plot, which gives rise to several extensions such as the triangle histogram and the triangle boxplot. R code is available to streamline the use of the triangle plot in practice. Supplementary materials for this article are available online.

KEY WORDS: Data visualization; Graph; Informative dropout; Trellis plots; Triangle plot.

1. INTRODUCTION

Data visualization is an essential component of data analysis. As stated in Cleveland (1993), “It provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way. We discover unimagined effects and we challenge imagined ones.” It is widely argued that graphical displays of information are more efficient in capturing the readers’ attention and result in a higher retention rate of the messages delivered in an article or presentation. In fact the American Statistical Association Style Guide (2011) states that “When feasible, put important conclusions into graphical form. Not everyone reads an entire article from beginning to end. When readers skim an article, they are drawn to graphs. Try to make the graphs and their captions tell the story of your article.”

This article is concerned with the longitudinal setting where subjects contribute data over multiple time points. Dropout in longitudinal data is common. It is rare that subjects drop out at random. If the dropout process is informative, data analysis needs to reflect this accordingly. The literature on the various methods proposed to take into account informative dropouts is very rich. Verbeke and Molenberghs (2000) and Little (2008) are good overviews of this literature.

There are several graphical methods for displaying various features of longitudinally collected data such as plotting the means over time, event charts, follow-up PLOTs (FU-PLOTs), spaghetti plots, and lasagna plots. Plotting the means, arguably the most commonly used method in practice, is simple to understand and communicate but can hide the salient features of the dropout process.

Event charts (Goldman 1992) and extensions (Lee, Hess, and Dubin 2000; Atherton et al. 2003) display the timing of multiple events of clinical interest. Related to the event chart is the FU-PLOT of Lesser et al. (1995), which shows the timing of visits or data collection (such as a blood draw). Event charts and the FU-PLOT can be helpful in visualizing the dropout process, but they are disconnected from the measured values and hence they cannot guide the analyst as to how informative dropouts are.

The spaghetti plot is simply a longitudinal profile of each patient, usually with a linear interpolation between the time points. It is useful for small datasets where the patients can be categorized in a few groups that can be coded either by color or by a plotting symbol. With large datasets or a large number of groups, the plot quickly becomes uninterpretable. The lasagna plot (Swihart et al. 2010), a cousin of the spaghetti plot, displays the longitudinal information in color-coded layers much like a heatmap, with subjects as rows and time as columns. In the last section, we will mention a connection between the lasagna plot and the triangle plot.

All of these plots have features that have made them useful in displaying various aspects of longitudinal data but none deals with the issue of informative dropout. The goal of this article is to close this gap using the triangle plot that retains the longitudinal aspect of the data while uncovering the amount of information in the dropout process. The triangle plot is most useful in situations where subjects dropping out do not return, leading to monotone missingness. This is further discussed in Section 5.

The rest of the article is organized as follows: Section 2 introduces the triangle plot and Section 3 presents an interpretation as a Trellis plot. Triangle plots from a study of quality of life (QOL) in patients with gastric cancer are shown in Section 4. The article is concluded with the discussion in Section 5.

2. TRIANGLE PLOT

The observed data will be denoted by $X_{it}$, where $i = 1, \ldots, n$ indexes the subject and $t = 1, \ldots, T$ denotes all the possible time points at which measurements may be taken. Also

Mithat Gönen, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue Box 44, New York, NY 10065 (E-mail: gonenm@mskcc.org)

© 2013 American Statistical Association DOI: 10.1080/00031305.2013.785980 The American Statistician, May 2013, Vol. 67, No. 2