

Tukey’s Paper After 40 Years
Colin MALLOWS
Avaya Labs
Basking Ridge, NJ 07920
(colinm@avaya.com)

The paper referred to is “The Future of Data Analysis,” published in 1962. Many authors have discussed
it, notably Peter Huber, who in 1995 reviewed the period starting with Hotelling’s 1940 article “The
Teaching of Statistics.” I extend the scope of Huber’s remarks by considering also the period before 1940
and developments since 1995. I ask whether statistics is a science and suggest that to attract bright students
to our subject, we need to show them the excitement and rewards of applied work.

KEY WORDS: Data analysis; Massive data; Statistical science; University College London; Zeroth problem.

This invited paper and the discussions were organized by Vijay Nair. The paper and the discussions by Andreas Buja and James Landwehr were originally presented at the conference on the "Future of Data Analysis" in honor of Jon Kettenring at Avaya Labs in October 2005.

1. INTRODUCTION

Tukey's 1962 paper (Tukey 1962) redefined our subject. It introduced the term "data analysis" as a name for what applied statisticians do, differentiating this from formal statistical inference. Tukey said:

Large parts of data analysis are inferential in the sample-to-population sense, but these are only parts, not the whole. Large parts of data analysis are incisive, laying bare indications which we could not perceive by simple and direct examination of the raw data, but these too are parts, not the whole. Some parts of data analysis . . . are allocation, in the sense that they guide us in the distribution of effort. . . . Data analysis is a larger and more varied field than inference, or incisive procedures, or allocation.

In an early section of the paper, Tukey asked: "How can new data analysis be initiated?" He suggested four ways:

1. We should seek out wholly new questions to be answered.
2. We need to tackle old problems in more realistic frameworks.
3. We should seek out unfamiliar summaries of observational material, and establish their useful properties.
4. Still more novelty can come from finding, and evading, still deeper lying constraints.

Tukey's paper has been enormously influential. In discussing it, I am at a great disadvantage, because many distinguished authors have addressed this topic, particularly in the 1997 Festschrift celebrating John's 80th birthday (Brillinger, Fernholz, and Morgenthaler 1997). Peter Huber gave a review of developments from Hotelling's 1940 article "The Teaching of Statistics" (Hotelling 1940) through Tukey's 1962 paper, the Madison (1967), Edmonton (1974), and Berkeley (1983) conferences, and the 1984 David report "Renewing U.S. Mathematics: Critical Resource for the Future." Huber then presented his own thoughts: "where are we now in 1995?", "current view of the path of statistics," "where should we go?", and "where will we go?". His final comment was:

Statistics will survive and flourish through the sheer mass of applications in most diverse fields. But whether the field as such will retain coherence is an altogether different question; the answer is up to us statisticians and data analysts, and to the actions we are going to take.

I cannot hope to better this masterly survey of 55 years of development. Perhaps the most useful thing I can do is to urge you to reread Tukey's paper, and Huber's commentary. I must also draw attention to an important 2002 National Science Foundation report, Statistics: Challenges and Opportunities for the Twenty-First Century (National Science Foundation 2002). So what can I hope to do here? I think the most I can hope to do is to give my assessment of where we are, and to point to the areas that seem most crucial to me.

2. UNIVERSITY COLLEGE LONDON

I start by extending the scope of Huber's review, beginning with some personal reminiscences. When I first went to University College London (UCL) in 1948 to study mathematics, I had not heard of statistics (the discipline), although for several years my father (who was what in England is called a Police Chief Inspector; there seems to be no equivalent rank here) had been responsible for developing a system for recording and analyzing data on traffic accidents. He introduced a degree of mechanization into the process by recording the data on edge-punched cards that could be sorted using a knitting needle. He also developed some graphical displays. So perhaps some of my aptitude was inherited. I was very fortunate to find that UCL had the world's premier Statistics Department, founded by Karl Pearson in 1911 as an outgrowth of his Biometric Laboratory, which he had established in 1895. As a junior in the Mathematics Department, I was required to take some lectures outside the department, and, serendipitously, I opted to take some lectures from F. N. David, who succeeded (as she did with many others) in awakening my interest and enthusiasm for our subject. I transferred to Statistics for my final year and stayed on to work for my doctorate. At that time, Egon Pearson was the senior professor; the staff included F. N. David and N. L. Johnson, who jointly supervised my thesis work, and H. O. Hartley was there for several years. Later, of course, these three emigrated to the United States.
I cannot overemphasize the sense of history that was pervasive at UCL at that time. The spirit of Karl Pearson was still present, although he had been dead for 14 years; the Neyman–Pearson theory was only 20 years old, but much of the theory of Normal-theory tests had been worked out; and Bayesianism had been in decline (due to the antipathy of Fisher and the impact of Neyman and the confidence-interval approach) for 25 years. This was before Dennis Lindley's conversion from frequentist to Bayesian; Jimmie Savage had not yet written his "Foundations of Statistics" book. Box had not yet applied experimental design concepts to response surfaces. At that time a computer was a young woman employed to sit at a mechanical desk calculator. H. O. Hartley lectured on numerical methods and showed us UCL's collection of ancient mechanical calculating machines. David Cox was at Birkbeck College, just down the street from UCL, and M. G. Kendall was a professor at the London School of Economics.

Hanging in the corridors of the department at UCL were a dozen or so framed aphorisms, quotations from such great figures of the past as Charles Darwin, Francis Galton, and Florence Nightingale. These mottoes were all taken down when Dennis Lindley became professor. In pride of place in the main classroom was an enormous two-way contingency table, showing heights of fathers and sons, dating from the time of Galton's development of regression in the 1880s. The journal Biometrika was published by UCL; it had been founded in 1901 by Karl Pearson.

The sense was that this was the place where modern statistics began: where Galton invented regression, where Karl Pearson invented the correlation coefficient, his family of distributions, and the chi-squared test, and where Egon Pearson and Jerzy Neyman invented the theory of hypothesis tests. Gosset ("Student") had been an intimate of Karl Pearson and had invented the t-test. It was impossible to forget that much of the theory developed out of questions raised in biometrics, which itself was stimulated by Darwin's theory of 90 years earlier. Statistics was basically a descriptive subject (as all sciences are to begin with), so Karl Pearson's family of distributions was a major contribution, as was his chi-squared test of goodness of fit. Experimental design was a subject of use in agriculture. Egon Pearson was friendly with Walter Shewhart, and had visited the United States in 1931. Some of the data that we studied were drawn from Shewhart's experience.

Figure 1. John Tukey.

3. R. A. FISHER

When Karl Pearson died in 1936, the Statistics Department was split in two, and Fisher became chairman of a new Eugenics Department, upstairs from the Statistics Department. He had moved out some years before I got there. Of course, Fisher had been stimulated by his study of the many decades of experiments at the agricultural research station, Rothamsted. The theory of statistics had been given a tremendous boost by his 1922 and 1925 papers, in which he set out the concept of a mathematical specification of a statistical problem; this led to all of the machinery of sufficiency, efficiency, and so on. Fisher identified three problems: (1) problems of specification, (2) problems of estimation, and (3) problems of distribution. Nowadays we would rephrase the second and third of these problems as being those of choosing one or more methods of analysis, and studying these methods. In the three volumes Breakthroughs in Statistics, edited by Kotz and Johnson (1992, 1997), by my count 3/4 of the papers are devoted to elaborations of Fisher's formulation of statistics as a theoretical subject. They are concerned with the mathematical analysis of a statistical specification, not of a statistical problem per se. Fisher's first problem can be criticized as being too restricted in scope, because it does not allow for the possibility that the correct formulation of the problem may not be understood when the analysis begins. In my Fisher lecture (Mallows 1998), I suggested that we need to consider a "zeroth problem," in which we consider what the relevant population is, what the relevant data are, and how they relate to the purpose of the statistical study. Later I draw attention to what I call the "fourth problem."

Fisher's approach made theoretical statistics into a mathematical subject, and the accident of Normal theory being so elegant led to tremendous developments in the theory. Probability theory was by this time a respectable subject, due to the efforts in the 1930s of Kolmogorov, Lévy, and Cramér, and later Feller and Loève. There was the sense that the basic concepts of statistical theory had been discovered and that all that remained was applying the theory to more sources of data and working out the details.

Now we realize that this was an illusion. Fisher had created a mathematical theory that was admirably suited to providing subjects for doctoral theses containing many theorems.

Figure 2. Sir Ronald Aylmer Fisher.


As Leo Breiman remarked at the Neyman–Kiefer conference in 1983, if all you have is a hammer, every problem looks like a nail. (This observation was not original with Breiman, of course.) So for a while, every statistical problem looked like a hypothesis-testing problem. Nowadays we have the Bayesian sledgehammer, which is guaranteed to make an impression on every problem, but only after it has been cast into a formal shape.

4. HAROLD HOTELLING

Hotelling's 1940 paper, reprinted in Statistical Science in 1988, argued two things: first, that statistics had assembled a considerable body of coherent techniques, which meant that it deserved to be taught in its own department rather than being dispersed to its applied fields, and second, that because so much of this body of knowledge was mathematical, these new statistics departments should be affiliated with departments of mathematics. This paper was very influential, especially in the United States. Huber (1997) pointed out that already in 1940 Deming had commented on Hotelling's paper and had remarked that:

Some of [Hotelling's] recommendations might be misunderstood. I take it that they are not supposed to embody all that there is in the teaching of statistics, because there are many other neglected phases that ought to be addressed. . . . The modern student, and too often his teacher, overlook the fact that such a simple thing as a scatter diagram is a more important tool of prediction than the correlation coefficient. . . . Above all, the statistician must be a scientist.

Please remember this last remark; I will return to it later.

One of the results of the mathematization of the subject was that ambitious and talented young professors, under pressure to publish scholarly research, found that the easiest way to do so was to write papers that could appear in the Annals of Mathematical Statistics. Working on applications was time-consuming and did not readily lead to research papers. Particularly in this country, this led to an emphasis on mathematical derivations and proofs. Asymptotics became a major industry. The Annals became uninteresting and unintelligible to most applied statisticians. The great power of Fisher's formulation was that it allowed statistical theory to develop divorced from application. By 1962, the time was ripe for a new vision, which Tukey's paper provided.

Figure 3. Harold Hotelling.

5. JOHN TUKEY

In his 1962 paper "The Future of Data Analysis," Tukey argued that statistics should not be thought of as a subfield of mathematics, and that statistics is more comprehensive than formal inference. It was this paper that introduced the term "data analysis," which has largely replaced the term "applied statistics."

Here are two more quotations from this eminently quotable paper:

To the extent that pieces of mathematical statistics fail to contribute, or are not intended to contribute . . . to the practice of data analysis, they must be judged as pieces of pure mathematics, and criticized according to its purest standards.

Quoting Martin Wilk: We must teach an understanding of why certain sorts of techniques . . . are indeed useful.

Tukey and others (notably George Box, at the 1967 Madison conference) argued that statisticians should aspire to be first-rate scientists, rather than second-rate mathematicians. All commentators have pointed to education as the key: education in schools, exposing young students to data and its display, education in service courses, and education in graduate schools of statistics. Students need to learn how to be scientists, not just technicians. The problem is how do we attract the brightest students to our subject? We all, I think, understand the excitement of our subject, its intellectual challenge, and the reward that comes from an insightful analysis of a statistical problem. How to convey this to a bright student who has some analytical aptitude but is attracted by the glamor of pure science (or math) or the promise of riches on Wall Street?

Tukey's approach can be criticized; his 1977 EDA book (Tukey 1977) discusses the methods of exploratory data analysis, but says nothing about how to use these methods. (In his Preface he pointed out that he was presenting examples, not case studies.)

6. 1962–1995

Between 1962 and 1995 we have the era described by Huber; refer to his paper for an insightful commentary. Highlights include a 1967 Conference on the Future of Statistics at Madison [edited by Watts (1968)], a 1974 Conference on Directions for Mathematical Statistics at Edmonton [edited by Ghurye (1975)], the 1983 Neyman–Kiefer Conference at Berkeley [edited by LeCam and Olshen (1985)], and the 1984 David report titled "Renewing U.S. Mathematics."

At the three conferences, much attention was paid to the impact of the computer, which was beginning to be felt. But more general topics were also considered. At the 1967 conference, Tukey asked: "Is statistics a computing science?" He commented:

Trying to be certain about uncertainty is a phenomenon of the present century. . . . Some would not call actions or problems statistical unless they explicitly involve the treatment of uncertainty. A few might even claim them not to be statistical unless a formal optimization, under specific formal hypotheses, admittedly underlay what was to be done. Such views, if too widely held, would have very unfortunate consequences for statistics.

At the 1974 Edmonton Conference, Herb Robbins' paper "Wither Mathematical Statistics" drew much attention. I quote:
An intense preoccupation with the latest technical minutiae, and indifference to the social and intellectual forces of tradition and revolutionary change, combine to produce the Mandarinism that some would now say already characterizes academic statistical theory and is most likely to describe its immediate future.

Peter Huber argued that "the cause of the apparent misery was that too many problems in mathematical statistics had reached maturity and were simply being squeezed dry." Huber identified data analysis as a promising growth area. He also contributed an insightful paper to the Neyman–Kiefer memorial conference in 1983. He argued that statistics evolves along a spiral path, so that "after some time the focus of concern returns, although in a different track, to an earlier stage of development, and takes a fresh look at business left unfinished during the last turn." Thus the current interest in graphics (made possible by advances in computing technology) revisits problems addressed in the nineteenth century, after an interlude in the twentieth century during which methods based primarily on models (Fisher, Neyman/Pearson, Wald) attracted more interest.

The 1984 David report on "Renewing U.S. Mathematics" is of interest to us mainly because of what it did not address. Although it drew attention to developments in probability theory, and pointed to data handling and analysis as one component of a rise in mathematics usage, it said very little about statistics.

7. SINCE 1995

What has happened since 1995? Here are some highlights. In 1996 a report of the CATS subcommittee of the National Research Council on Statistical Software Engineering identified this new interdisciplinary field (National Research Council/CATS 1996b). Also in 1996, the same subcommittee published the proceedings of a workshop on massive datasets (National Research Council/CATS 1996a). There Arthur Dempster stated that:

one of the major complaints about statistical theory as formulated and taught in the textbooks is that it is a theory about procedures. It is divorced from the specific phenomenon.

Both of these reports draw attention to how our subject is evolving in new directions.

The 1998 National Science Foundation report ("the Odom report") was concerned mainly with mathematics; it formulated three primary activities of mathematicians. For statisticians, these are:

• Generating concepts (and also methodologies)
• Interacting with areas that use statistics
• Attracting and developing the next generation of statisticians.

In 2002 JASA published a series of 52 vignettes that were collected in a volume titled Statistics in the 21st Century (Raftery et al. 2002). These short review articles highlight important advances and outline potentially fruitful areas of research.

Also in 2002, the National Science Foundation published a report titled Statistics: Challenges and Opportunities for the 21st Century, a shortened version of which appeared in Statistical Science in 2004. This report included a short history of the development of our subject. It identifies the "core" of our subject as "the subset of statistical activity that is focused inward, on the subject itself, rather than outward, towards the needs of statistics in particular scientific domains." It argued for several initiatives aimed at ensuring the health of this core.

In a commentary published in Statistical Science in 2004, Leo Breiman criticized the NSF report's emphasis on developing the "core," claiming (in my language) that this is a throwback to Hotelling's vision of statistics as an academic subject, with research ". . . focused on the development of statistical models, methods, and related theory . . ." (Breiman 2004). Breiman pointed out that this emphasis denigrates the way most important advances in statistics have occurred: not by introspection, but rather by involvement in challenging problems suggested by different disciplines and data. The report argues that more funding should be given to activity that is focused inward rather than outward. Breiman says that:

At a time when statistics is beginning to recover from its "overmathematization" in the post World War II years and engage in significant applications in many areas, the report is a step into the past and not into the future.

8. IS STATISTICS A SCIENCE?

Is statistics a science? I still remember the psychic shock I felt when I first heard the name of the new IMS journal, Statistical Science. It sounded pretentious. If statistics is a science, then what is its subject matter? Physicists study "stuff," biologists study life, astronomers study the universe—what do statisticians study? At the 2002 NSF conference, David Cox was asked to identify "what is statistics." His answer was that statistics is the discipline concerned with the study of variability, the study of uncertainty, and the study of decision making in the face of uncertainty. Similar definitions have been given many times. This seems to say that what statisticians study is their methodology, divorced from applications. In my 1997 Fisher lecture I criticized this kind of definition. Particularly with large datasets, the kind of uncertainty that statistical techniques know how to handle is not the primary issue; the difficulty comes from the complexity of the problem and the fact that we have not been "given" a specification on which to rely. The definition I prefer says that:

Statistics concerns the relation of quantitative data to a real-world problem, often in the presence of variability and uncertainty. It attempts to make precise and explicit what the data has to say about the problem of interest.

However we view this issue, the question remains: Is statistics a science? We all agree that statisticians should act like scientists, but is statistics itself a science? We are like carpenters, with the Neyman–Pearson hammer, Bayesian formalism, and now also a collection of powerful and delicate computing and graphical tools, but we need experience to know when and how to use them.

Perhaps statistics is the science whose domain is inference, from data to a substantive problem. In his 1962 paper Tukey said that:

Three constituents will be judged essential [for constituting a science]:
(a1) intellectual content;
(a2) organization into an understandable form;
(a3) reliance on the test of experience as the ultimate standard of validity.
. . . Data analysis passes all three tests. I would regard it as a science, one defined by a ubiquitous problem rather than by a concrete subject.
So is statistics (or, if you prefer, data analysis) a science? If it is, then it is still a science in the preanalytical stage. We have barely begun to organize our thinking about how statistical technology is applied to substantive problems. In 1980 Peter Walley and I pleaded for more work in this direction. Not much has happened since then.

In my years at Bell Labs I developed a sincere admiration for engineers, who have to make things work in the real world. Statistics is nothing more than a trivial exercise if it does not address real-world problems. I again quote John Tukey, who said (in an interview recorded in the 1997 Festschrift) that "statistics is a pure technology."

But it is a technology that faces new challenges; to deal with them, we need new ideas, new theories, and new methodologies. In 1995 Daryl Pregibon and I gave a definition of a massive-data problem. This is not just a problem for which we have massive data; that could be simply a problem where classical asymptotic theory is all that is needed. I have not seen many of those. No, a massive-data problem is a massive problem for which we have massive data. (If we have just the problem without the massive data, it is one for which statistics has little to offer.) In a massive problem, the difficulty is the complexity. This point was made repeatedly in the 1996 NRC workshop report.

9. AN EXAMPLE

Let me discuss a very simple example. For my last 3 years with AT&T, I was deeply involved in the parity problem. I published a summary of some of the technical problems that arose in Statistical Science in 2002 (Mallows 2002).

The Telecommunications Act of 1996 mandated that incumbent local exchange carriers (ILECs, Bell companies) must provide (on request, for a fair fee) certain services for incoming competitive companies (CLECs, such as AT&T, trying to enter the local telephone market), with these services to be "at least equal in quality to that provided by the local exchange carrier to itself." ILECs routinely collect data on many variables to monitor their performance.

The available data are the monthly reports that the ILEC prepares, and corresponding reports for the CLECs. A natural approach for a statistician is to postulate that for each recorded variable, the ILEC data represent a random sample from some distribution, whereas the CLEC data represent a sample from a possibly different distribution, the problem being to test whether these distributions are the same. A permutation test seems perfectly suited for the purpose, because its null hypothesis is exactly that the two distributions are identical, without relying on any assumption as to their shape. I testified to this effect before several state commissions, and my argument was accepted as valid. I argued that even with as few as one CLEC observation, on a variable where large values are bad, if this single CLEC observation is larger than each of, say, 99 ILEC observations, then this is strong evidence (at the 1% level) that discrimination against the CLEC customer has occurred. Such an argument would be easy to present to a statistics student. Note that there is no possibility of a Bayesian approach to this problem, because in addition to the fact that we do not know the distribution shapes, the ILEC will not accept any nonzero value for the prior probability that discrimination has occurred. What is needed is a decision procedure that will determine whether there was or was not discrimination. So a test procedure seems appropriate.
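To make the one-observation argument concrete, here is a minimal sketch in Python (my illustration, with simulated numbers; nothing in it comes from the actual parity filings). With a single CLEC value exceeding all 99 ILEC values, the permutation p-value is exactly 1/100, which is the 1% argument above.

import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(ilec, clec, n_perm=10_000):
    # One-sided permutation test on a variable where large values are bad:
    # under the null, ILEC and CLEC values come from one distribution, so
    # every relabeling of the pooled data is equally likely.
    pooled = np.concatenate([ilec, clec])
    k = len(clec)
    observed = clec.mean()
    exceed = sum(rng.permutation(pooled)[:k].mean() >= observed
                 for _ in range(n_perm))
    # Add-one correction: the observed labeling itself counts as a permutation.
    return (exceed + 1) / (n_perm + 1)

ilec = rng.exponential(scale=1.0, size=99)  # 99 ILEC measurements (simulated)
clec = np.array([ilec.max() + 1.0])         # one CLEC value, worse than all of them
print(permutation_pvalue(ilec, clec))       # roughly 0.01

With a single CLEC observation no simulation is needed: the exact p-value is simply the rank of the CLEC value among the pooled 100 values, divided by 100.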
But the validity of the permutation test depends on the validity of the model. Suppose that there is a strong day-of-week effect but that day of the week has not been recorded in the available data. Suppose in fact that data for Fridays tend to be larger than for other weekdays, and that the single CLEC value happened to fall on a Friday. Then we should be comparing this CLEC value not with the whole set of ILEC data, but only with the Friday values. Similarly, a CLEC value that occurred on a Wednesday may not be outlying when viewed against the whole ILEC dataset but may become so when compared with other Wednesday data. A careful survey of the methodology that has been proposed should draw attention to this kind of difficulty. The key idea seems to be exchangeability; if we ignore (or cannot observe) day of week, then the observations may appear to be exchangeable, but if we can see day of week, then they may not be (see Draper, Hodges, Mallows, and Pregibon 1993). It is the analyst's responsibility to determine how to organize the data so that comparisons are made of like with like. In the parity proceedings, much time was spent arguing about the proper levels of disaggregation for the data. These discussions were not illuminated by actual data. The problem of reaggregating the within-cell statistics to provide an overall criterion was an interesting technical challenge.
It seems that the most important contribution a statistician could make in this problem is not to design a "valid" test, but rather to ensure that disaggregation has been carried far enough. This is not Fisher's second or third problem, or even his first, but the zeroth problem! We can also identify the "fourth problem," which comes after the statistical analysis has been completed; it is to interpret the results in terms that are intelligible to the nonstatistical worker. Here I have little to suggest. I have no hope of explaining to an innumerate commissioner the procedure that I proposed, which can be described as a "balanced averaged adjusted truncated disaggregated modified t."

All one can hope to do in such a situation is to get the opposing technical experts and the commission technical staff to agree that the procedure makes sense, so that the commission will have no grounds for objecting.
10. A NEW VISION

I think the time is ripe for a new vision of our subject. What statisticians do is to exploit the idea of a probability model for data. In teaching the technical methods of the subject, the model is assumed to be "given," not in question. Bayesians go even further, assuming that the analyst can assign prior probabilities for every unknown quantity. And decision theory assumes that the consequences of every possible decision are known. But in real applications, models are not known; a major part of the problem lies in setting up an appropriate model. This is Fisher's "first problem."

Two Fisher lectures, in 1988 and 1989, by Erich Lehmann and David Cox (both printed in Statistical Science in 1990), addressed this problem. Nowadays students learn in school about collecting and presenting data, but from our point of view they are still stuck in the nineteenth century, when the emphasis was on describing populations and not on statistical inference. Much university teaching addresses Fisher's second and third problems, which deal with what we commonly think of as statistical theory, the technical tools of our profession. Fisher's first problem is setting up the probability model that will be used. My zeroth problem logically precedes that, considering what data to use and how they relate to the substantive problem. The fourth problem is interpreting the results of the statistical analysis. When I learned the Neyman–Pearson theory from Egon Pearson himself, it was applied only to very simple problems in which the meaning of a final conclusion that "the effect is statistically significant" was self-evident. But in anything more than the simplest problems, the results need to be interpreted, with all of the necessary caveats.

What we need to do is to attract bright university students to the excitement of the subject, by showing them how the statistical approach can lead to insights in many fields. To do this, I suggest that we need to show them how the five stages (zeroth through fourth) appear in applied problems. A first step might be to look at the 20 or so JASA vignettes that are concerned with applications and try to identify common elements. Surely, each of these application areas is not completely different from all the others. What has been done in the past is to organize applications by the methodologies they use: GLMs, survival analysis, ARIMA models, and so on. But this does not address the question of how one chooses an appropriate methodology. What we need is a classification of applications that does this.

I suggest that someone should take on the task of looking for the common features among the JASA vignettes, and also among the subject areas listed in the 2002 NSF report, and perhaps the essays in the collection edited by Tanur et al. (1987) and the collections of small problems collected by Cox and Snell (1981), Andrews and Herzberg (1985), and Hand, Daly, Lunn, McConway, and Ostrowski (1994), with the purpose of organizing the material so that the excitement of applications can be taught to university students. I am not the first to suggest this. In his 1962 paper, Tukey stated that "the ideas of data analysis ought to survive a look at how data is analyzed." Moreover, he noted that he once suggested at a statistical meeting that it might be useful if statisticians looked to see how data was actually analyzed by many sorts of people. And he was criticized by a "very eminent and senior statistician" who said that this idea might have merit, but that young statisticians should not indulge it too much, because it might distort their ideas.

In my 1997 Fisher lecture, I quoted from Rubin (1993, p. 204):

The special training statisticians receive in mapping real problems into formal probability models, computing inferences from data and models, and exploring the adequacy of these inferences, is not really part of any other formal discipline, yet is often crucial to the quality of empirical research.

I commented then, and say again now: Would that students indeed were so trained!

11. CONCLUSION

Statisticians need to be involved with real problems. If our discipline is to prosper and not have its growth areas taken over by people interested only in parts of the whole (such as machine learning, "data mining," or image analysis), then we must look for the common elements in these various fields and develop a framework that encompasses them all. This framework need not be mathematical; mathematics is seductively easy compared with data analysis. It should not merely organize the techniques. The key concept is "statistical thinking."

ACKNOWLEDGMENTS

A version of this article was presented at a conference in honor of Jon Kettenring, held at Avaya Research, September 30, 2005. At that conference I remarked that Jon Kettenring and I are professional brothers, because both of us were guided (on different continents) by Norman Johnson in our thesis work. Thanks to several commentators and referees for stimulating me to polish this article (a little).

[Received January 2006. Revised April 2006.]

REFERENCES

Andrews, D. F., and Herzberg, A. M. (1985), Data, New York: Springer-Verlag.
Breiman, L. (2004), "Comment on the NSF Report on the Future of Statistics," Statistical Science, 19, 411.
Brillinger, D., Fernholz, L. T., and Morgenthaler, S. (1997), The Practice of Data Analysis (the Tukey Festschrift), Princeton, NJ: Princeton University Press.
Cox, D. R. (1990), "Role of Models in Statistical Analysis (The 1989 Fisher Lecture)," Statistical Science, 5, 169–174.
Cox, D. R., and Snell, E. J. (1981), Applied Statistics: Principles and Examples, New York: Chapman & Hall.
Draper, D., Hodges, J. S., Mallows, C. L., and Pregibon, D. (1993), "Exchangeability and Data Analysis," Journal of the Royal Statistical Society, Ser. A, 156, 9–28.
Ghurye, S. G. (1975), "Proceedings of the Conference on Directions for Mathematical Statistics (The Edmonton Conference)," special supplement to Advances in Applied Probability.
Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E. (1994), A Handbook of Small Data Sets, London: Chapman & Hall.
Hotelling, H. (1940), "The Teaching of Statistics," The Annals of Mathematical Statistics, 11, 457–470; reprinted in Statistical Science (1988), 3, 63–71.
Huber, P. J. (1997), "Speculations on the Path of Statistics," in The Practice of Data Analysis, eds. D. R. Brillinger, L. T. Fernholz, and S. Morgenthaler, Princeton, NJ: Princeton University Press, pp. 175–191.
Kotz, S., and Johnson, N. L. (1992, 1997), Breakthroughs in Statistics, Vols. 1–3, New York: Springer-Verlag.
LeCam, L., and Olshen, R. A. (eds.) (1985), Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Belmont, CA: Wadsworth.
Lehmann, E. L. (1990), "Model Specification: The Views of Fisher and Neyman, and Later Developments (The 1988 Fisher Lecture)," Statistical Science, 5, 160–168.
Mallows, C. L. (1998), "The Zeroth Problem (1997 Fisher Lecture)," The American Statistician, 52, 1–9.
——— (2002), "Parity: Implementing the Telecommunications Act of 1996," Statistical Science, 17, 256–285.
Mallows, C. L., and Pregibon, D. (1995), "Some Statistical Principles for Massive Data Problems," in Proceedings of the Statistical Computing Section, American Statistical Association.
Mallows, C. L., and Walley, P. (1980), "A Theory of Data Analysis?" in Proceedings of the Business and Economics Section, American Statistical Association.
National Research Council/CATS (1992), Combining Information, Washington, DC: National Academies Press.
——— (1996a), Massive Data Sets, Washington, DC: National Academies Press.
——— (1996b), Statistical Software Engineering, Washington, DC: National Academies Press.
National Science Foundation (2002), Statistics: Challenges and Opportunities for the 21st Century, Arlington, VA: Author.
Raftery, A. E., Tanner, M. A., and Wells, M. T. (eds.) (2002), Statistics in the 21st Century, London: Chapman & Hall/CRC.
Rubin, D. R. (1993), "The Future of Statistics," Statistics and Computing, 3, 204.
Tanur, J. M., et al. (1987), Statistics: A Guide to the Unknown (2nd ed.), San Francisco: Holden-Day.
Tukey, J. W. (1962), "The Future of Data Analysis," The Annals of Mathematical Statistics, 33, 1–67.
——— (1977), Exploratory Data Analysis, Reading, MA: Addison-Wesley.
Watts, D. G. (1968), The Future of Statistics, New York: Academic Press.

Discussion
David R. BRILLINGER
University of California
Berkeley, CA 94720
(brill@stat.Berkeley.edu)

1. INTRODUCTION

It is a total pleasure to be invited to comment on Colin's timely paper. In it Colin refers to Bell Labs and AT&T several times. Further, the Tukey (JWT) paper lists his affiliations as Princeton University and Bell Telephone Laboratories, so I seize an opportunity to celebrate the Labs of the early 1960s as well as comment on his ideas.

Colin's paper brings back so many memories of the 1960–1964 period: anecdotes, FFTs, lunches, seminars, Hamming, Tukey, Hamming–Tukey, golf, learning, visitors, computing, books, history, open doors, pink paper drafts, technical reports, rides between Princeton and Murray Hill, shared offices, AMTSs, chiding, support (personal and financial), opportunities (both seized and missed), blackboards, air-conditioning, freedom, confidence, pranks, Tukey anecdotes, gossip, conferences, unpublished memos, and people who are no longer with us. Pursuit of excellence was the order of the day. I could write a page or more on each of these topics, but this is not the place.

I was at Bell Labs for the summers of 1960 and 1961, and then for the years 1962–1964. I was a summer student at first and next a Member of Technical Staff (MTS). These were magic years at a magic place. None of the involved persons with whom I have used the term have ever disagreed. I can say that everything important about statistics that I ever learned, I learned at lunch at Murray Hill. The rest of my career has been applying what I learned.

Colin reviews a place (University College London, 1948–1958) and people (Fisher, Hotelling, Tukey) in his paper. I will do the same.

2. THE PEOPLE

Colin is, of course, one of the key influences, drivers, critics, and contributors to the development of modern data analysis. He is a problem solver with few if any peers. At the Labs he used to be in his office (with door wide open), at lunch, always available and always interruptible. The others in the group with wide-open doors and a thirst for discovery included Martin Wilk, Ram Gnanadesikan, Bill Williams, Roger Pinkham, and a stream of visitors. Of course, John Tukey dropped in/appeared steadily from the management wing of the buildings. The fields of expertise included sampling, multivariate analysis, time series, analysis of variance, and the newly defined field of data analysis. [Gnanadesikan (2001) reminded me that JWT came up with the term "data analysis" at a party at my house in 1960. Ram's paper contains many reminiscences about the Labs and comments on data analysis.]

Martin Wilk went on to become a Vice President of AT&T and then Director of Statistics Canada. He was one of the few people who could cause John Tukey to really focus on the topic at hand. (JWT was one of the great multiprocessors and typically focused on several things at a time.) In particular, Martin could sum up mighty ideas in a pithy phrase or sentence. To give an example, there was a scorn for significance tests at the Labs. Martin remarked: "Significance tests are things to do while you are thinking about what you really want to do." Both Colin and Martin went on to write influential papers with Tukey on exploratory data analysis.

3. THE RESEARCH

The Labs' researchers' directions then were not specifically laid out by the higher-ups; rather, various management and engineering types would drop in with problems. It seemed that few, if any, in the statistics group could resist these problems, puzzles, or datasets. There were expected and unexpected discoveries. Terminology was created, graphic displays were basic, residuals were fodder, engineering and chemical science were ever present. Gnanadesikan (2001) used the word "synergy" to describe the milieu.

A theme of my discussion is that the Labs of the early 1960s were magic years for data analysis. They were also magic years for the digitization of the engineering sciences. The FFT (fast Fourier transform) has been mentioned, but also seismic records and speech were being digitized, and an analysis sometimes culminated with an analog record. I mention this because a great talent that Colin brought to the Statistics Group was skills in combinatorics and discrete mathematics.

4. TUKEY'S PAPER

"Tukey's paper" was the first article of the first number of the Annals of Mathematical Statistics of 1962. The editor at that time was J. L. Hodges Jr., who was renowned for both theoretical and applied statistics work. No thanks are given in the paper to referees, so perhaps the editor published it on his own authority. The paper had been received by the Annals on July 1, 1961 and was presented at the IMS Meeting in Seattle in 1961, so it was out in public.

Tukey's Foreword to the Collected Works (Jones 1986) is worth a read. For example, one finds at the beginning: "Besse Day (Mauss), who spent a year with R. A. Fisher, once told me that he told her that 'all he had learned he had learned over the (then hand-cranked) calculating machine'." I record this quote to lead into the remark that JWT was involved in more than pencil-and-paper data analyses. Tukey's paper presents an example. There are several analyses of one particular dataset, a 36 × 15 table of the values of some particular multiple regression coefficients. JWT presents a robust/resistant row/column fitting procedure. The Foreword is also interesting for JWT's comments on Bayesian statistics.
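For readers who have not met such a fit: Tukey's median polish is the canonical robust/resistant row-plus-column procedure, and the sketch below shows the general technique (my illustration, not a reproduction of the specific analysis in the 1962 paper). It decomposes a two-way table into overall, row, and column effects plus residuals by repeatedly sweeping out medians.

import numpy as np

def median_polish(table, n_iter=10):
    # Fit table[i, j] ~ overall + row_eff[i] + col_eff[j] + resid[i, j] by
    # alternately sweeping row medians and column medians out of the residuals;
    # medians rather than means make the fit resistant to a few wild cells.
    resid = np.array(table, dtype=float)
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        rmed = np.median(resid, axis=1)   # sweep out row medians
        resid -= rmed[:, None]
        row_eff += rmed
        m = np.median(col_eff)            # recenter column effects
        col_eff -= m
        overall += m
        cmed = np.median(resid, axis=0)   # sweep out column medians
        resid -= cmed[None, :]
        col_eff += cmed
        m = np.median(row_eff)            # recenter row effects
        row_eff -= m
        overall += m
    return overall, row_eff, col_eff, resid

The residuals, not the fit, are the interesting object; plotting and probing them is where the exploratory work begins.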
5. COLIN'S PAPER

Colin asks a sequence of questions:

• "How do we attract the brightest students to our subject?"
• "How to convey this to a bright student, who has some analytical attitude, but who is attracted to the glamour of pure science (or math), or the promise of riches in Wall Street?"
• "Is statistics a science?"
• "If statistics is a science, what is its subject matter?"
• "What do statisticians study?"
• "The question remains, is statistics a science?"
• "But is statistics itself a science?"
• "So is statistics, or data analysis if you prefer, a science?"
• "Surely each of these applications areas is not completely different from all the others?"
• "How does one choose an appropriate methodology?"

6. SOME ANSWERS TO THE QUESTIONS

First off, I am not going to get into the "is it a science?" discussion, because I just do not think that it matters much. I am happy to view "statistics/data analysis" as a fine endeavor that provides much amusement and contributions of insight and understanding to scientific researchers. I leave the question to others, but note that Colin mentions his "sincere admiration for engineers, who have to make things work in the real world" (I have heard this sentiment phrased as "every engineering problem has a solution"), and engineering statistics is one of our subfields (see Technometrics).

However, "how to involve students" is a question dear to my heart. I do have suggestions:

• Get them to read books like the Hoaglin–Mosteller–Tukey (1983, 1985, 1991) series. (I note Colin's chiding of JWT's EDA book with "his 1977 EDA book discusses the methods of exploratory data analysis, but says nothing about how to use these methods.")
• Get them to attend pertinent courses.
• Teach pertinent courses.
• Get them to attend talks, and get talks presented.
• Pay them well.
• Raid the computer science departments. (There are lots of straight computing problems, like how to work out bagplots and how to speed up computations, that can lure students in.)

My own serious attempt at an original course was Statistics 215a, taught in the fall semesters of 2003 and 2004 here at Berkeley. The syllabus, book list, and readings are provided in the Appendix.

Another attempt I made was to use the book of De Veaux, Velleman, and Bock (2006) as text in a third-year undergraduate course. In it many EDA techniques are illustrated, there is a chapter on "Regression Wisdom," and one finds the stricture "Make a picture. Make a picture. Make a picture." repeated many times. (This was a Labs mantra.) Students from a broad group of departments registered for the course and appeared to grasp the EDA concepts almost immediately.

I am sure others teach such courses. It strikes me that one does not have to yearn for a reincarnation of that 1960s Labs environment, because the ideas are out and Tukey-type data analysis is now the order of the day.

7. SUMMARY

I call this 1960–1964 period "magic years" because the seeds for high-quality statistical analyses were sown then, analyses in which electronic computers, graphics, and residuals became paramount. Sadly, one cannot say the same about the Labs; how the mighty have fallen.

I end with the following note. There was talk at the 1960s lunches of forming a Society of Data Analysis. My contribution was to suggest that Tukey could be called "soda pop."

APPENDIX: STATISTICS 215A "APPLIED STATISTICS AT AN ADVANCED LEVEL," UNIVERSITY OF CALIFORNIA BERKELEY 2003, 2004

Syllabus

Week 1. Stem-and-leaf, 5-number summary, boxplot, parallel boxplots, examples
Week 2. EDA vs. CDA vs. DM, magical thinking, scatterplots, pairs(), bagplot(), spin()
Week 3. Summaries of location, spread vs. level plot, empirical Q–Q plot, smoothing scatterplots, smoothing types
Week 4. The future of data analysis, linear fitting, OLS, WLS, NLS, multiple OLS, robust/resistant fitting of a straight line
Week 5. Optimization methods, the psi function, residual analysis, fitting by stages, the x-values
Week 6. Wavelets, NLS, robust/resistant variants, smoothing/nonparametric regression, sensitivity curve, two-way arrays
Week 7. Residuals analysis for two-way arrays, L1 approximation, median polish, diagnostic plot, data analysis and statistics: an expository overview
Week 9. Exploratory analysis of variance: terminology, overlays, ANOVA table, rob/res methods, examples
Week 10. Some principles of data analysis
Week 11. r², R², Simpson's paradox, lurking variables
Week 12. Exploratory time series analysis (ETSA), plotting time series, methods
Week 13. Data mining, definitions; contrasts with statistics
Week 14. Data mining for time series, for association rules, market basket analysis.

Book List

Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983), Graphical Methods for Data Analysis, Duxbury.
Cleveland, W. S. (1994), The Elements of Graphing Data, Belmont, CA: Wadsworth.
Hand, D., Mannila, H., and Smyth, P. (2000), Principles of Data Mining, Cambridge, MA: MIT Press.
Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning, New York: Springer-Verlag.
Hoaglin, D., Mosteller, F., and Tukey, J. (1983), Understanding Robust and Exploratory Data Analysis, New York: Wiley.
——— (1985), Exploring Data Tables, Trends, and Shapes, New York: Wiley.
——— (1991), Fundamentals of Exploratory Analysis of Variance, New York: Wiley.
Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression, Reading, MA: Addison-Wesley.
Rao, C. R. (2002), Linear Statistical Inference and Its Applications, New York: Wiley.
Tukey, J. W. (1977), Exploratory Data Analysis, Reading, MA: Addison-Wesley.
Venables, W. N., and Ripley, B. D. (2002), Modern Applied Statistics With S-PLUS, New York: Springer-Verlag.

Readings

Breiman, L. (2001), "Statistical Modeling: The Two Cultures," Statistical Science, 16, 199–231.
Diaconis, P. (1985), "Theories of Data Analysis: From Magical Thinking Through Classical Statistics," in Exploring Data Tables, Trends, and Shapes, eds. D. Hoaglin, F. Mosteller, and J. Tukey, New York: Wiley, pp. 1–36.
Friedman, J. H. (2001), "The Role of Statistics in the Data Revolution," International Statistical Review, 29, 5–10.
Hand, D. J. (1998), "Data Mining: Statistics and More," The American Statistician, 52, 112–118.
Mallows, C., and Pregibon, D. (1987), "Some Principles of Data Analysis," in Proceedings of the 46th Session ISI, Tokyo, pp. 267–278.
Mannila, H. (2001), "Theoretical Framework for Data Mining," SIGKDD, 1, 30–32.
Tukey, J. W. (1962), "The Future of Data Analysis," in The Collected Works of John W. Tukey, ed. L. V. Jones, Monterey, CA: Wadsworth & Brooks/Cole, pp. 391–484.
——— (1980), "We Need Both Exploratory and Confirmatory," in The Collected Works of John W. Tukey, ed. L. V. Jones, Monterey, CA: Wadsworth & Brooks/Cole, pp. 811–817.
Tukey, J. W., and Wilk, M. B. (1966), "Data Analysis and Statistics: An Expository Overview," in The Collected Works of John W. Tukey, ed. L. V. Jones, Monterey, CA: Wadsworth & Brooks/Cole, pp. 549–578.

ADDITIONAL REFERENCES

De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006), Introductory Statistics, Boston: Pearson, Addison-Wesley.
Gnanadesikan, R. (2001), "A Conversation With Ramanathan Gnanadesikan," Statistical Science, 16, 295–309.
Hoaglin et al., see the Appendix.
Jones, L. V. (1986), The Collected Works of John W. Tukey, Vols. III and IV, Monterey, CA: Wadsworth & Brooks/Cole.
Mallows, C., and Tukey, J. W. (1982), "An Overview of Techniques of Data Analysis, Emphasizing Its Exploratory Aspects," in The Collected Works of John W. Tukey, ed. L. V. Jones, Monterey, CA: Wadsworth & Brooks/Cole, pp. 891–968.

Discussion
Andreas BUJA

Department of Statistics
University of Pennsylvania
Philadelphia, PA 19104
(buja@wharton.upenn.edu)

Colin Mallows' discussion of Tukey's paper gives us an opportunity to clarify our thoughts about the state of the field. Before I enter into a debate with Colin, I will follow his lead by reminiscing about the past—a more recent past than his, however.

It used to be that self-identification as a statistician, at parties, say, produced rambling responses about "the worst class I had to take in college." The confession "I'm in statistics" was not exactly a conversation stopper, but it did not move the conversation in a desirable direction either. This I remember from the 1980s. Did we have a problem back then, and, if so, do we still have it today?

Recently, I had an experience of the opposite kind, and al- brilliance, but recently people who we may have thought of
though it may not (yet?) be typical, it may be an indicator of as archetypes of theoreticians have also gotten themselves in-
a climate change: One day at the start of my train ride home, volved in applications: Peter Bickel and Larry Brown come to
a young woman asked whether she could have the seat next to mind.
me. As the train began to roll, we independently opened our Colin asks whether statistics is a “science.” He criticizes de-
bags and both pulled out very similar looking paper: LaTex finitions of statistics that refer to the study of methodology, di-
hardcopy. A conversation ensued, part of which was “what do vorced from applications, and he prefers a definition that refers
you do?” When I told her that I am in statistics, she let out a sigh and said “I wish I had gone into statistics.” Why? She was a graduate student at Penn in astrophysics, and for her Ph.D. she analyzed quasar data and needed statistics to look for support of a theory according to which quasar signals look the same run forward and backward in time. Moreover, her husband was an econometrician on Wall Street, who applied statistical analysis to financial time series for good pay.

These days I feel quite comfortable being a statistician. More than ever it has what I find attractive: the license to do so many different things, ranging from pretty math to powerful computing to applications in almost any science. If you allow me to reminisce for one more moment, I am still fond of computer graphics, especially the collaboration with Debby Swayne and Di Cook, and I have fond memories of a programming error that I once made way back at Stanford. It amounted to unknowingly stuffing scrambled data into an algorithm (multidimensional scaling), which produced amazingly regular and pretty pictures. They were indeed so puzzling that later they became the inspiration for a very pretty mathematical problem, for which I was lucky enough to fall into a collaboration with Logan, Reeds, and Shepp that produced an even prettier solution.

I am also fond of the memories of one of my first consulting experiences at the University of Washington, which consisted of co-advising a Ph.D. student in musicology (of all fields) who had collected two fascinating lab datasets that recorded the responses of trained musicians to musical stimuli: how they felt a given tone fragment should be continued, and to what degree they felt a tone fitted into a chord. The datasets were so rich and so structured, the likes of which I have not seen since.

In all, I think statistics is one of the most satisfying intellectual environments, great for those who like dabbling in many areas, and also great for the focused and highly specialized minds. I am not too worried about finding promising young researchers in the recent crop of graduates, judging from our last recruiting season, in which we felt we just did not have enough positions for the available talent.

Having started on an optimistic note, let me continue right along. Colin discusses the question of whether the field is getting too fragmented. Some of us remember from the 1980s the call for application and relevance of statistical research. This call was heeded to such an extent that a later NSF report called for renewed focus on the “core,” an idea that the late Leo Breiman found misguided. Do we have a problem? Maybe not. The focus on the core is not incompatible with relevance and application. As a community, we should be able to attend to both. At worst, we will go the way of physics; physicists have always had a mostly healthy tension between experimentalists and theorists. Sure they tease each other, but physics remains the most successful science there is. In statistics, we may actually have reversed some of the tension between the theoretical and applied areas. Colin has always bridged the two with [. . .] to application, quantitative data, and meaning in data. I am not so sure; this seems dangerously close to us being the guys who know how to use tools. Of course we should know how to use tools, and the world will appreciate us if we do use them, but for us on the inside I think any particular application is just that: particular. I do not think we are interested in microarrays and transaction data as such, although we feel deeply rewarded if we help sciences and businesses achieve their substantive goals. Yet those of us who apply themselves to application areas hope to find new problems and challenges of a generality that transcends these areas. We hope for conceptual and methodological innovation and enjoy the successes in applications as beneficial and essential side effects that keep us grounded and make the world happy. Our preoccupation with methodological generality is not misguided, because it has long-term benefits in that generalizations abstracted from one application may pay off in future applications. Actually, Colin implicitly agrees at the end of his paper when he urges us to go over the JASA vignettes and look for common strands of ideas and approaches.

Colin’s statement that “statistics is nothing more than a trivial exercise if it does not address real-world problems” makes me somewhat uneasy. I would rather hold that solving real-world problems is essential for the practice of statistics, but it does not belong in its definition. Metaphorically, we are tool makers, not carpenters. We are into tools, not furniture, although we find it essential to spend a fair amount of time in carpenters’ workshops to see what new tools might be needed for future furniture making.

As a definition of statistics, the field, I propose the following: Statistics is the science of quantitative methodology for data analysis and data-based decision making. Not part of the definition are the concepts of uncertainty and variation, and neither are applications, just as the definition of physics should not include the concepts of heat and particles. Another issue: In the foregoing definition, I apply the word “quantitative” to methodology, not to data. Why? Because we are perfectly able to analyze aspects of qualitative data by quantitative means. In this proposal I also use the phrase “the science of. . . ,” because statistics is neither just mathematical theory nor just application; importantly, it includes pondering the meaning of rather deep concepts. I will return to this in a moment.

Note that any definition of statistics is really meant for our self-reflection and for people in friendly fields who already have some notion of statistics. Definitions typically cannot be used to explain to folks at parties what we do. If they ask, you may say “for example, we develop methods for telling from your genes whether you’ll get cancer,” on which you may hear “oh, so you’re into this bio stuff! That’s hot!,” and you say “not really, the same methods could be used to predict from your credit records whether you’ll go bankrupt.” The point is that the abstraction level at which we operate may be difficult to convey to folks on the outside.
We on the inside would miss the level also if we reduced statistics to toying in applications. It is not too weird if some of us (not all of us!) spend time on tools that have never seen an application. Take the Bayesians; theirs was a “trivial exercise” for a long time, “divorced from applications,” until computing unlocked their tools and unleashed this torrent of Bayesian modeling that is still on us.

Colin quotes an important Tukey sentence: “Statistics is pure technology.” I will have some bones to pick with this, too, but on the whole it jibes with the idea of statistics as tool making. This sentence is an excellent corrective, especially for first-year graduate students to whom statistics looks like a proving ground for math skills, because all they do is solve cute math problems for the math stat course. It helps to tell them that statistics is also a place where one invents tools. Indeed, making inventions may be more important than mathematical theory, although it is true that the inventions should be mathematically informed.

The quibble I have with Tukey’s quote is that it is too absolute. I would still say that statistics is more science than engineering, mainly because of the depth of the concepts and insights we bring to bear. Here is a small list of such concepts, many forming natural dualities: randomness, uncertainty, sample spaces (=sets of hypothetical worlds), probability of events and plausibility (likelihood) of hypotheses, populations and samples, sampling (=dataset-to-dataset) variability, priors and posteriors, structure-and-error or signal-and-noise, causality and correlation, intervention and observation, exploration and inference, explanation and prediction, confounding, type 1 and type 2 error and multiplicity, aggregation and disaggregation (mentioned by Colin), and, on a related note, Simpson’s paradox. These notions go beyond the purely technological aspects of tool making; they are deep, and some have ancestry in old traditions of philosophy. Indeed, similar to the way the natural sciences replaced what was formerly the “philosophy of nature,” statistics appropriated topics that used to belong to “epistemology.” Again similar to the natural sciences, statistics developed some aspects of epistemology beyond anything that philosophers of the past could have anticipated. Insofar as it is the business of statistics to ponder the question “how is it possible to extract knowledge from empirical observations?,” our field is the legitimate inheritor of the quantifiable aspects of epistemology.

Some of us may better know a newer strand of philosophy, called “philosophy of science,” which was initially driven by intense opposition to traditional philosophy. Among its figureheads were such well-known names as Carnap and Popper, both very capable of quantitative theorizing and therefore more congenial to us. Some of Popper’s ideas about “conjectures and refutations” (one of his book titles) are firmly embedded in the theory of statistical hypothesis testing; we too teach that hypotheses can never be verified, they can only be falsified. Putting it more grandiosely, truth is in those hypotheses that hold up to repeated testing in the long run.

As a corollary of this ramble, I state that statistics is neither just technology nor just application—it is science above all. “Science” still seems to be a badge of honor in our part of the world, and abrogating it from statistics would presumably lower its status. It should come to our attention, however, that science is not a universally appreciated value, and even in academia there are colleagues who see us as just playing a game. To them we are a crowd of privileged and largely white males (not all dead yet), who have instituted rules by which to play games of science. Rules being arbitrary, the games are arbitrary, and nothing essential sets us apart from other games being played the world over—political, religious, literary, artistic, athletic. In this thinking, our game is of a highbrow variety, but such varieties exist for all other types of games as well. Why do I go off on this seemingly unwarranted tangent? Why inveigh gratuitously against folks to whom we have never spoken? The reason is that we actually have something to say, and what we have to say throws light on our field and its role in the sciences.

The major point that we should insist on is that rules are not arbitrary, and not all games are equal. Indeed, few things are as empowering to humans as are good conventions. An example is the rules and conventions of the sciences, which have the power to produce true theories in the long run. And some of these rules and conventions are in fact owed to statistics. Statistics contributes to the conduct of science in the following two ways: (1) it develops and proposes rules for guarding against the overinterpretation of data, which is the traditional domain of statistical inference, and (2) it also constantly explores new language to express quantitative associations in data, which is the domain of modeling in all its incarnations, Bayesian or frequentist, parametric or nonparametric. Although we may be a little jaded by the vengeance with which vast areas of the sciences have embraced the use of p values and confidence intervals, the fact remains that these conventions provide protections that we would not want to miss. As for statistical modeling, the area of greatest creative effort today, we tend to think of it as technology. This view covers only the use of models for prediction, however. When using models for interpretation, it is more useful to think of them as languages that describe quantitative associations. A difficulty with models is often that there are too many to choose from, as may be the case with structural equation models, some types of Bayesian models, and, of course, nonparametric models from trees to boosting to SVMs. Facing an embarrassment of riches, we tend to complain, but instead should we not be happy with the expressive choices that we have? If there is a danger with the current wealth in modeling, then it is that we and our colleagues in the substantive fields are seduced to look for answers in the latest statistical models as opposed to substantive theory. These are minor problems, however, and the fact remains that our game is not just any game; our conventions, vocabulary, and expressive power do advance knowledge.

We heard much about the virtues of being involved in applications and real-world problems. Sure, applications have driven some of the recent developments in the field. But one important driver of the recent history of statistics matters above all. Note this: What made the bootstrap, Bayesian modeling, nonparametric fitting, and data visualization possible in 1999 but not 1949? Computers! Wasn’t computer technology a more pervasive driver of research than any particular area of application? Maybe “driver” is the wrong word; “enabler” or “catalyst” may be better, because computers allowed us to do things we always wanted to do but could not. For me, it is a curious personal memory that only a quarter-century ago Werner Stuetzle and Jerry Friedman custom-built hardware for computer graphics in anticipation of the concept of a “workstation.”
And today we take it for granted to have access to 4–6 orders of magnitude more power, all packed into a laptop with 60G of disk, 1G of memory, and a processor probably more than a thousand times faster than the SUN chip number 2 they used on their ORION workstation. And that is only the hardware. Equally important is software, such as the S and R languages, the BUGS package, C and Fortran (if we must), database software, spreadsheets, and of course LaTeX. Colin talked about large datasets: they too became possible because of computer technology. Related technologies produced microarrays, fMRI, communications networks, the Sloan Digital Sky Survey, and more.

At this point, one may wonder whether computer technology will keep enabling our field, or whether we are slowly approaching an asymptote. It may indeed be the case that the future stimuli will be new types of data, such as microarrays, genomics and proteomics data, image libraries, and transaction and network data. I expect, however, that computer speed and increase in storage will continue to play a role, but, realistically speaking, we have to wait for software standards to emerge so the manipulation of, for example, sets of images will be as easy as it is today for standard N × p data matrices. We are probably sufficiently parochial and computationally limited that we will not be the leaders in future data infrastructure, although I could be wrong, in view of the achievements of our colleagues in the bioconductor project. Something that may and should happen, as we move with faster computers and greater storage capacity into larger data problems, is an evolution of our high-level computing tools, the S and R languages, possibly away from the functional paradigm, which in an early incarnation even John Chambers called “flamboyant,” namely wasteful. Twenty years from now, S and R might still be recognizable as the same languages that they are today, but they will have grown to enable us to use them for the exploration of the Sloan Digital Sky Survey data, or the medical records of all U.S. patients.

Looking into the future of the fundamental concepts of statistics may be more difficult. Will there be breakthroughs similar to robustness, bootstrap, nonparametric fitting, Bayesian modeling, and data visualization? Colin may be on to something with his zeroth and fourth problems, which address the points where the rubber meets the road:

1. Before we perform an analysis, how do we make contact with reality? What data should we use? Are the data at hand relevant for the questions that we have in mind?
2. After we have performed an analysis, what does it mean in reality? Have we answered any questions at all? Going beyond the original questions, did we find surprises?

Do these questions not echo the concerns of epistemology: how can we extract knowledge from empirical observations? Maybe there exist fundamental concepts that would elucidate these important stages of statistical activity. Colin kicked the ball; it is up to us to keep it rolling. Meanwhile, we might just hold our heads a little higher than we have been used to doing. Thanks, Colin, for an inspiring piece!

Discussion
Bradley EFRON
Department of Statistics
Stanford University
Stanford, CA 94305
(brad@stat.stanford.edu)

Colin Mallows’ essay is intriguing, insightful, provocative, and a little strange. The same description can be applied to Tukey’s monumental 1962 paper, which turned out to be much fatter than I remembered when retrieved from the JSTOR archive (48 sections!). Much of that paper now seems idiosyncratic, even for Tukey, focusing on, for instance, “FUNOP” methodology for dealing with outliers, but the general message is still powerful: Statistical ideas should be ultimately judged by their utility in applications, not on theoretical grounds. I do not agree completely with Tukey’s sentiment, but I am grateful to Mallows for channeling it so forcefully. What follows are some responses to both Tukey and Mallows, with no real attempt at logical ordering.

I believe that statistics is an information science (actually, the first information science), joined now by such fields as computer science, information theory, optimization (operations research), and signal processing. As such, it operates at a second level, one step removed from direct contact with the natural world. Is information science real science? Tukey is more generous than Mallows in his answer. From my point of view, the question was settled in the 1920s and 1930s by the development of the Fisher information bound and the Neyman–Pearson lemma. These are optimality results, statements of the best it is possible to do in a given situation, and they moved statistics beyond its methodology stage into scientific adulthood. The theory/applications dichotomy looks shallow in this context; every MLE application reflects, at least tacitly, the blessing of the information bound, and similarly for hypothesis tests and Neyman–Pearson.
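For readers who want the two results themselves, both fit on a line apiece; these are the standard textbook statements, quoted here for convenience rather than taken from Efron's text. For an unbiased estimator based on n iid observations from a density f(x; θ),

```latex
% Fisher information bound (Cramer-Rao), simplest iid unbiased case:
\mathrm{Var}_\theta(\hat{\theta}) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) \;=\; E_\theta\!\left[\left(\frac{\partial}{\partial\theta}\,\log f(X;\theta)\right)^{\!2}\right],
```

and the Neyman–Pearson lemma states that, for testing one simple hypothesis against another at a given size, the likelihood-ratio test maximizes power. Both are "best possible" statements of exactly the kind described here.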
Tukey’s paper proposed a return to the world of pure methodology. It is easy to forget how radical data analysis (also proposed as a new name for our field) was intended to be. A look through EDA or Mosteller and Tukey’s green book on regression analysis reveals almost no theoretical structure at all, not even basic probability. Electronic computation also plays a surprisingly minor role. Section 47, “The Impact of the Computer” (one page), has a certain Luddite aspect, with Tukey as John Henry, carrying out “hand FUNOP” at lightning speeds. In fairness, this relates to small datasets, “36 values,” a forum where Tukey was unchallenged master, and arguably the favored setting for the whole data analysis program. Of course, this is not how data analysis has played out in the twenty-first century.

At least one of Tukey’s main points has come to pass, and in spectacular fashion: “We need to tackle old problems in more realistic frameworks.” As a sometimes-practicing biostatistician, I have been witness to nearly a complete makeover in the day-to-day methodology of statistical analysis: Kaplan–Meier, generalized linear models, proportional hazards, robust methods, jackknives and bootstraps, Cp (!), GEE, EM, MCMC, . . . . The computer broke the bottleneck of mathematical tractability that constrained classical statistics, and statisticians responded with a ferocious burst of algorithmic-based technology. Theory and applications worked together in this creative outburst, a healthy situation that continues today.

Mathematical statistics was a tired subject in 1962, as Huber suggests, making Tukey’s call for an applications-oriented reformation timely as well as exciting. The call was answered. The ensuing 40 years have seen a rising curve of applied statistical interest, with a notable upward acceleration in the past decade but with a twist that Tukey did not foresee: Massive data problems, in the terminology of Pregibon and Mallows, generated by new scientific equipment such as microarrays, have moved center stage. (Tukey himself worked on large-scale problems, notably in the halothane study, but they are not the main thrust of “the future of data analysis.”)

[Figure 1. Histogram of z-Values From a Prostate Cancer Microarray Study (Singh et al. 2002) (a) and Q–Q Plot of z-Values (b).]

I want to talk about one example, at the risk of it soon appearing as quaint as FUNOP. Figure 1 relates to a microarray experiment comparing two classes of prostate cancer. Here 50 patients in class 1 were compared with 52 patients in class 2, yielding two-sample t statistics for each of 6,033 genes, say t_i, i = 1, 2, . . . , 6,033. Each t_i has been converted to a putative z-score,

z_i = Φ^{-1}(F_{100}(t_i)),   i = 1, 2, . . . , 6,033,   (1)

where F_{100} is the cdf of a standard Student t variate with 100 degrees of freedom, and Φ is the standard normal cdf. The z_i would have a N(0, 1) null distribution under the classic Gaussian assumptions.

Panel (a) of the figure shows a histogram of the 6,033 z-values, compared with a permutation null distribution obtained by randomly interchanging the microarrays, recomputing the t statistics, and applying transformation (1). [In fact, the permutation null is almost perfectly N(0, 1); score one for the classic Gaussian assumptions.] A Q–Q plot of the z-values, in panel (b), shows a distribution approximately N(0, 1.1²) near the center but with long tails, presumably reflecting some nonnull genes, the kind that the scientists were looking for. Here are a few of the questions that come to mind:

• Which of the genes are “significantly” nonnull? The quotation marks are necessary because the classic definition of significance seems possibly inappropriate in a massive-data context. FDR analysis, another useful new statistical methodology, flags 60 genes at FDR level .1: 32 on the left and 28 on the right.
• As in Mallows’ example, is this permutation analysis the correct one?
• How powerful for detecting nonnull genes is the experiment?
• Is N(0, 1) the correct null hypothesis, or should we use N(0, 1.1²), as suggested by the Q–Q plot? (Doing so reduces the number of flagged genes to 16, with only 3 on the left.)
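In R, the computation might be sketched as follows. This is an illustrative reconstruction, not Efron's code: the matrix X below is a random stand-in for the Singh et al. expression data (so the sketch is self-contained), and Benjamini–Hochberg is only one of several ways to carry out the FDR analysis.

```r
## Stand-in for the 6,033 x 102 prostate matrix (genes x patients);
## random noise here, purely so the sketch runs end to end.
set.seed(1)
X <- matrix(rnorm(6033 * 102), nrow = 6033)
cls <- rep(1:2, c(50, 52))  # 50 class-1 and 52 class-2 patients

## Two-sample t statistic for each gene.
tt <- apply(X, 1, function(x)
  t.test(x[cls == 1], x[cls == 2], var.equal = TRUE)$statistic)

## Transformation (1): z_i = Phi^{-1}( F_100(t_i) ).
z <- qnorm(pt(tt, df = 100))
qqnorm(z)  # the Q-Q plot behind the N(0, 1.1^2) suggestion

## Permutation null: randomly interchange the microarrays (columns)
## and recompute; in practice one would repeat this many times.
perm <- sample(cls)
z0 <- qnorm(pt(apply(X, 1, function(x)
  t.test(x[perm == 1], x[perm == 2], var.equal = TRUE)$statistic),
  df = 100))

## One way to flag "significantly" nonnull genes at FDR level .1:
## Benjamini-Hochberg on two-sided N(0, 1) p-values.
p <- 2 * pnorm(-abs(z))
flagged <- which(p.adjust(p, method = "BH") <= 0.1)
```

With the real data, replacing the N(0, 1) p-values by N(0, 1.1²) ones, p <- 2 * pnorm(-abs(z) / 1.1), corresponds to the alternative null raised in the last bullet.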

I have given some possible answers in earlier work (Efron 2004, 2005). My reason for bringing this up here relates to the question of theory versus application. Microarray studies have generated another furious burst of statistical technology, as a look at the R library “CRAN” will show. But technology by itself is not enough. Sooner or later, the new applications will have to be grounded in new theory. This is exactly the point where statistics proves itself to be a science and not just “a pure technology,” in Tukey and Mallows’ unfortunate phrase. FDR theory is a good example of new theory already happening, and I have no doubt of further progress being close at hand.

The call to applications, appropriate in 1962, seems strange in today’s massive-data atmosphere. Applications and theory feed off of each other in an ideal science, and in fact I think the balance in statistics is rather nice right now. A critical spirit is natural to statistical training (I like Rubin’s way of saying this), and that includes a big dose of self-criticism. Mallows’ essay offers a healthy example of the self-critical genre, and makes some telling points, but we do not want to emulate the Gilbert and Sullivan character “who praises with enthusiastic tone, every century but this, and every country but his own.”

ADDITIONAL REFERENCES

Efron, B. (2004), “Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis,” Journal of the American Statistical Association, 99, 96–104.
——— (2005), “Local False-Discovery Rates,” technical report, Stanford University, available at http://www-stat.stanford.edu/brad/papers/False.pdf.
Singh, D., Febbo, P., Ross, K., Golub, T., and Sellers, R. (2002), “Gene Expression Correlates of Clinical Prostate Cancer Behavior,” Cancer Cell, 1, 203–209.

Discussion
Peter J. HUBER
Klosters, Switzerland
(peterj.huber@bluewin.ch)

When I was asked to comment on Colin Mallows’s paper and the surrounding issues, my first reaction was: What can I say that has not been said before? I felt that Mallows’ account was so insightful and elegant, that to comment on it would be to detract from it. However, I found a few snippets to add, and after looking once more at Tukey’s 1962 paper and at reports on the status and future of statistics, I felt that there are many things that should be said again, and again. And again.

I comment on four topics: the 2002 National Science Foundation (NSF) report (because it is a step in the wrong direction), data mining (because it invites programmed self-deception), models (because Tukey had eschewed them in 1962), and strategy (because I would like to add some alternative facets to Mallows’s five problems).

1. 2002 NSF REPORT ON THE FUTURE OF STATISTICS

This report sets its theme by defining the field in the following words:

Statistics is the discipline concerned with the study of variability, with the study of uncertainty and with the study of decision making in the face of uncertainty.

Is that all? And does it point in the right direction? Through the three-fold use of the words “the study of,” it stresses that statistics is concerned with the theory of the things rather than with the things themselves. It is a fine description of ivory tower theoretical statistics, and it pointedly excludes analysis of actual data. This is like giving a definition of physics that excludes experimental physics. Why did they not write: “Statistics is the theory and practice of data analysis,” with some explication of what is meant by theory and by practice? The report is sticking to its set theme by concentrating on the supposed theoretical “core” of statistics, and after seeing that, I gave up. I agree with Leo Breiman that the report is a step into the past and not into the future.

Already back in 1940, Deming had gently criticized Hotelling’s paper on the teaching of statistics by pointing out that “there are many other neglected phases that ought to be stressed.” He mentioned simple descriptive tools and the interpretation of data that arise from conditions not in statistical control. His suggestions fell on deaf ears; these items were, and obviously still are, considered dirty stuff, below the dignity of a “core” statistician.

Deming’s plea was reiterated more forcefully and in considerably more detail by Tukey in 1962. It is sad to witness the renewed attempts to keep statistics pure and narrow.

2. DATA MINING

By not paying attention to “dirty stuff,” the statistics community opened the field wide to others, particularly computer scientists, who then invented data mining and touted it as a cure-all for the problems caused by data glut. On the whole, I take a pretty dim view of data mining (see Huber 1999, p. 636). Of course, there are interesting and important aspects. But too many of the so-called “data mining tools” are nothing more than good old-fashioned methods of statistics, with fancier terminology and a glossier wrapping. Unfortunately, what had made those methods work in the first place—namely, the common sense judgment of a good old-fashioned statistician applying them—did not fit into a supposedly fully automated package and was omitted.

As a consequence, data miners have added a new twist to lying with statistics: programmed self-deception. Large datasets typically are heterogeneous, and automated methods inevitably fall into the pitfalls dug by heterogeneity.


The statistics community may be to blame here, too; how many statistics courses and texts ever draw attention to Simpson’s paradox? Freedman/Pisani/Purves is a laudable exception (see the example on sex bias in graduate admissions).
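That very example ships with R as the UCBAdmissions table, so exhibiting the reversal takes only a few lines (an illustrative aside, not from Huber's text):

```r
## Simpson's paradox in the Berkeley graduate-admissions data,
## the example discussed by Freedman/Pisani/Purves.
data(UCBAdmissions)

## Aggregated over departments, men appear to be admitted
## at a clearly higher rate than women.
prop.table(margin.table(UCBAdmissions, c(1, 2)), 2)["Admitted", ]

## Department by department, the apparent bias largely vanishes,
## and in department A it reverses.
apply(UCBAdmissions, 3, function(tab)
  prop.table(tab, 2)["Admitted", ])
```

The heterogeneity is exactly of the kind warned about here: women applied disproportionately to the departments that were hardest to get into.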
The danger inherent in neural network and similar black-box methods lies not so much in overfitting, as many statisticians seem to think, but rather in the difficulty of interpreting what is going on inside the black box. The more opaque the box, the less likely it is that one will spot potential problems. A case story from a data analysis exam may illustrate my point. In that exam, a student found that the best discriminator between carriers and noncarriers of a certain genetic disease was age. This was entirely correct but useless; what he had discovered, but misinterpreted, was that carriers and controls had not been matched with regard to age. Would we have noticed that if we had been presented with a black-box discrimination procedure without explicit identification of the discriminatory variable?

3. MODELS, COMPARISON AND SIMULATIONS

In his 1962 paper Tukey eschews modeling. This is interesting for several reasons. Among special growth areas Tukey singles out stochastic process data: “Data of the sort today thought of as generated by a stochastic process is a current challenge which both deserves more attention, and different sorts of attention, than it is now receiving” (p. 4). Today, most people would think that data of this sort cries out for modeling (e.g., Box–Jenkins, state-space models, Kalman filter). Of course, these approaches gained prominence only later. But in section 27, after discussing spectral aspects of stochastic process data in some detail, Tukey voices reservations about models: “If I were actively concerned with the analysis of data from stochastic processes (other than as related to spectra), I believe that I should try to seek out techniques of data processing which were not too closely tied to individual models.” Note that his EDA (1977) might have been subtitled “data analysis in the absence of models.”

Tukey’s idiosyncratic dislike of models is curious, because he regards data analysis as a science (p. 6), and modeling is a scientific, not a statistical, task. But models are viewed differently in the sciences and in (traditional) statistics. In the sciences, insight typically is gained by thinking in models. Models do not need to exactly represent the situation, only its relevant features, and this creates problems on the interface between science and statistics. Scientists interpret goodness-of-fit statistics rather differently from statisticians. I remember the puzzlement of a physicist comparing observed to theoretical spectra, when his data analysis program had declared a visually lousy fit as good and a visually perfect fit as poor, and he suspected a bug in the program. On inspection, it turned out that his program was perfectly in order; it calculated chi-squared goodness-of-fit statistics, and in the first case the observational errors were very large, and in the second case they were negligibly small (and the test statistic had picked up irrelevant systematic errors in a large dataset). There is no methodology in traditional statistics for assessing the adequacy of a model! The only thing orthodox frequentist theory can do about models is to reject them, and Bayesian statistics cannot even do that. Nonrejection does not imply that the model is correct; it does not even tell one whether it adequately represents the data. On the other hand, a rejected model might be perfectly adequate.

Rather than downplay and ignore statistical modeling (as Tukey seems to suggest), I recommend that data analysis should provide techniques for assessing the adequacy of models. A direct comparison between the data and the model is not good enough. For example, comparing an estimated spectrum with the theoretical spectrum is difficult, because it may be impossible to tell whether differences are real or are due to random errors or processing artifacts. But a more elaborate comparison, between judiciously chosen results computed from the data and analogous results computed from an ensemble of simulations of the model, is a most powerful way to judge the adequacy of a model.
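A skeletal version of this ensemble comparison, with every modeling choice invented for the occasion (an AR(1) fit, the lag-2 autocorrelation as the judiciously chosen result), might look like this in R:

```r
## Judge the adequacy of a fitted AR(1) model by comparing a chosen
## summary of the data with the same summary computed from an
## ensemble of simulations of the fitted model.
set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 200))  # stand-in data

fit <- arima(y, order = c(1, 0, 0))
T.stat <- function(x) acf(x, plot = FALSE)$acf[3]    # lag-2 autocorrelation

T.obs <- T.stat(y)
T.sim <- replicate(999, T.stat(arima.sim(
  list(ar = coef(fit)["ar1"]), n = length(y),
  sd = sqrt(fit$sigma2))))

## If T.obs sits far out in the tails of the simulated ensemble,
## the model fails to reproduce this aspect of the data.
mean(abs(T.sim - median(T.sim)) >= abs(T.obs - median(T.sim)))
```

Unlike the physicist's chi-squared, such a comparison is calibrated against what the model itself can produce, whatever the size of the observational errors.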
4. STRATEGY

In 1997, in an essay titled “Strategy Issues in Data Analysis,” I drew parallels to the famous treatise by Clausewitz (1832) on military strategy. This essay has relevance to several of the issues raised by Mallows, in particular to his five problems of Fisher (from zeroth to fourth), and I cull some aphorisms from it.

The standard statistics texts concentrate on techniques geared toward small and homogeneous datasets. They are concerned with the “tactics” of the field, whereas “strategy” deals with broader issues and with the question of when to use which technique. The need for strategic thinking in data analysis is imposed on us by the advent of ever larger datasets. What really forces the issue is not the size, but the fact that larger datasets almost invariably are less homogeneous and have more complex internal structure.

Clausewitz commented disdainfully on the theorists of his time, who “considered only factors that could be mathematically calculated.” Similarly, theoretical statistics should go beyond mathematical statistics.

According to Clausewitz, “war is the continuation of politics by other means. . . . The political object is the goal, war is the means of reaching it, and means can never be considered in isolation from their purpose.” If war is the continuation of politics, then data analysis is the continuation of science.

Data analysis is hard and often tedious work, so do not waste forces. Concentrate on essentials. Use the simplest approach that will do the job. Do not demonstrate your data-analytic prowess with exotic procedures. Remember the KISS principle: keep it simple and stupid. Do not overanalyze your data. Know when to stop. If it comes to the worst, accept defeat gracefully.

Data analysis ranges from planning the data collection to presenting the conclusions of the analysis, that is, from Mallows’s zeroth to fourth problem. The war may be lost already in the planning stage. Unfortunately, the data analyst rarely has any control over the earliest phases of an analysis—namely, over planning and design of the data collection, as well as over the act of collecting the data (which sorely complicates the zeroth problem!). There are cases (mostly hushed up and therefore rarely documented) where multimillion dollar data collections had to be junked because of poor design. I recounted a case (a large hail-prevention experiment) where an unusually resourceful and persuasive statistician was able to convince the sponsor that the data he was supposed to analyze were worthless because one had neglected to randomize, and to force an improved repetition of the experiment.

My recommendation is to plan the data collection with the subsequent analysis in mind. Clever planning may simplify the analysis and may make the spotting and correcting of the ever-present errors easier. Be aware of the dangers of gross errors and of systematic errors, of omitted or duplicated batches of data, and so on. The meta-data (i.e., the story behind the data, how they were preprocessed, the precise meaning of the variables, and so on) are just as important as the data themselves.

To assist with overall planning and preparation of resources, I proposed a kind of checklist by dividing the actual analysis into strategically important stages. I illustrated them with the help of examples. These stages tend to be encountered in the following order, but, strictly speaking, a linear ordering is not possible; one naturally and repeatedly cycles between different actions (I prefer this word to Fisher’s “problems”):

• Inspection
• Error checking
• Modification
• Comparison
• Modeling and model fitting
• Simulation
• “What-if” analyses
• Interpretation
• Presentation of conclusions.

Conceptually, Fisher had been concerned with homogeneous univariate populations, which was a perfect starting point for his time. But his three problems do not generalize very well to more complicated situations. His first two (specification and estimation) correspond to modeling and model fitting in my framework. I combine them because they are closely interconnected. With more complex data and more complex analysis procedures, Fisher’s third problem (distribution) is becoming too difficult to handle by theory alone, and one must resort to simulation. Model fitting may involve such techniques as nonlinear weighted least squares, generalizing Fisher’s maximum likelihood methods. The other items on my list of actions typically call for judgment rather than mathematics.

The final strategy issue, the presentation of conclusions, is related to Mallows’ fourth problem, and I should add some alternate considerations to it. The more massive and more complex the datasets, the more difficult it can be to present the conclusions. Also, the conclusions themselves may become massive and correspondingly difficult to manage. With high-dimensional data (we met examples in market research and highway quality maintenance problems), the number of potential questions can explode. We found that a kind of sophisticated decision support system (i.e., a customized software system to generate answers to questions of the customers) is almost always a better solution than a thick volume of precomputed tables and graphs.

I do endorse Mallows’ new vision and his conclusion. I personally think that the best way to teach applied statistics is through case studies and apprenticeship, particularly through active participation in substantial projects with real data. But I know from experience how difficult it is to involve students in such a fashion during their university education. His suggestion about the JASA vignettes is an interesting first step in systematizing such ideas. But deep immersion in one substantial scientific problem is more important for a future applied statistician than shallow immersion in many problems. I think it was Norbert Wiener who once claimed that a necessary prerequisite for successful collaboration between a mathematician and a scientist was that both are knowledgeable about the other’s field of expertise to such a degree that the scientist is able to suggest a novel theorem, and the mathematician can suggest a novel experiment. The full statement can be found in Wiener (1965, p. 3). It is ascribed there to the physiologist Arturo Rosenblueth.

ADDITIONAL REFERENCES

Huber, P. J. (1997), “Strategy Issues in Data Analysis,” in Proceedings of the Conference on Statistical Science Honoring the Bicentennial of Stefano Franscini’s Birth, Monte Verità, Switzerland, eds. C. Malaguerra, S. Morgenthaler, and E. Ronchetti, Basel: Birkhäuser-Verlag, pp. 221–238.
——— (1999), “Massive Datasets Workshop: Four Years After,” Journal of Computational and Graphical Statistics, 8, 635–652.
Wiener, N. (1965), Cybernetics, Cambridge, MA: MIT Press.

Discussion
James M. LANDWEHR
Avaya Labs
Basking Ridge, NJ 07920
(jml@avaya.com)

In this paper, Mallows shares his interesting insights into Tukey’s landmark 1962 paper through relating Tukey’s paper to several relevant statistical contexts: Sir R. A. Fisher’s earlier work, the research environment in London in the late 1940s and early 1950s, and several events and reports from the last 10 years or so. We benefit from Mallows’ reflections and also from his views on some important general issues for statistics today.

One point that I would especially like to note is Mallows’ definition of statistics:


“Statistics concerns the relation of quantitative data to a real-world problem, often in the presence of variability and uncertainty. It attempts to make precise and explicit what the data has to say about the problem of interest.” It seems to me that this focuses on the essence of our subject and yet is sufficiently broad to encompass the range of important problems that we statisticians attack today. Mallows has usefully characterized these activities into his five stages of problems.

Mallows writes early in this paper that “perhaps the most useful thing I can do is to urge you to reread Tukey’s paper. . . ” Having found Mallows’ advice well worth heeding, I did reread Tukey’s paper, which is eminently quotable. I would like to take this opportunity to share a few quotations from Tukey’s paper beyond those included by Mallows, and also to reflect on them a bit. Have Tukey’s words been borne out by events over the last 40 years? Are Tukey’s words relevant today and looking forward?

The quotations that follow are all taken from Tukey’s 1962 paper on the indicated pages. Italics are from the original paper.

1. WHAT IS DATA ANALYSIS?

“I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data” (p. 2).

“Many have forgotten that data analysis can, sometimes quite appropriately, precede probability models, that progress can come from asking what a specified indicator (=a specified function of the data) may reasonably be regarded as estimating. Escape from this constraint can do much to promote novelty” (p. 5).

Note the breadth of the definition of data analysis in the first quote. Through the last phrase Tukey includes mathematical statistics, perhaps not all of it but the portions that he feels actually apply to analyzing data. My sense is that this definition is broader than the way in which the term “data analysis” has actually come to be used over the years. As for data analysis preceding probability models or being done separately from probability models, the situation has changed, and that certainly seems to have come to pass over the years.

2. THE PROCESS OF DATA ANALYSIS, DANGERS AND GOALS

“There is but one natural chain of growth in dealing with a specific problem of data analysis, viz:

(a1) recognition of problem,
(a1′) one technique used,
(a2) competing techniques used,
(a3) rough comparisons of efficacy,
(a4) comparison in terms of a precise (and thereby inadequate) criterion,
(a5′) optimization in terms of a precise, and similarly inadequate criterion,
(a5″) comparison in terms of several criteria

(Number of primes does not indicate relative order.)

. . .

(A) Praise and use work which reaches stage (a3), or only stage (a2), or even stage (a1′). . . ” (p. 7).

Here Tukey laid out his view of the data analysis process, apparently unique at this level of specification. I am struck by a few points. He states that any precise criterion is inadequate (for the purpose of data analysis) but still requires one or several criteria as key components of the process. The data analysis process is not totally ad hoc. He refers to comparing efficacy, not efficiency under some modeling assumptions. He suggests praising and using work that gets only to the first stage of the overall process. Tukey wanted real analyses, not just theoretical investigations of statistical properties of new procedures.

To what extent have we incorporated this paradigm into our applied work? My sense is that in many serious application environments it is followed roughly, with plenty of iterations. But I think it is hard to see this approach very often in textbooks or in the literature we write for each other, where the goals—for whatever reasons—generally seem to focus on in-depth treatment of specific pieces of the process rather than providing a sense of the overall problem solving and data analysis processes for serious applications. I would be delighted to have someone convince me that I am wrong about this; specifically, that it is relatively easy to get a good sense of the data analysis process from our statistics journals. (And, as a former journal editor myself, I realize that journals and authors have multiple purposes, and this may often not be a realistic one.)

“There is a corresponding danger for data analysis, particularly in its statistical aspects. This is the view that all statisticians should treat a given set of data in the same way, just as all British admirals, in the days of sail, maneuvered in accord with the same principles” (p. 13).

“The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned, is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise’ ” (pp. 13–14).

Concerning the first of these quotes, perhaps the view that Tukey attacked came from the notion that statistics amounts to solving a precisely defined mathematical problem. My sense is that we statisticians have indeed gotten away from this notion, as well as from the views that there is only one “correct” way to analyze a set of data and that all good statisticians must treat the data the same way. I believe that alternative, sensible approaches should give basically similar important conclusions about the data; moreover, if they do not, we should at least be able to understand and explain to others what aspects of the approaches lead to the different conclusions. But my sense is also that others who analyze data from time to time but are not highly trained in statistics may not see the situation this way, and that many continue to believe that there must be one and only one “right” way to analyze some data.

The second statement here is arguably Tukey’s most famous single quotation (at least about data analysis). Probably we all agree with it, but probably we all also need to keep it in mind more than we do.

3. METHODOLOGICAL AREAS: MULTIVARIATE ANALYSIS, GRAPHICS, AND COMPUTING

“The analysis of multiple-response data has similarly been much discussed, but, with the exception of psychological uses of factor analysis, we see few examples of multiple-response data today which make essential use of its multiple-response character” (p. 4).

“In view of this difficulty of description, it is not surprising that we do not have a good collection of ideal, or prototype multivariate problems and solutions, indeed it is doubtful if we have even one (where many are needed). A better grasp of just what we want from a multivariate situation, and why, could perhaps come without the aid of better description, but only with painful slowness” (p. 33).


Applications of multivariate analysis have come a long way, and I doubt if many statisticians today would agree with the first statement. Descriptive multivariate statistical techniques, which Tukey saw as a gap, still provide in my opinion plenty of opportunities, even though much progress has been made through creative and computationally intensive graphics. It is interesting that Tukey made these statements on data analysis for multivariate problems approximately ten years after T. W. Anderson’s classic book on multivariate analysis was published.

“The importance of the introduction of half-normal plotting is probably not quite large enough to be regarded as marking a whole new cycle of data analysis, though this remains to be seen. The half-normal plot itself is important, but the developments leading out of it are likely to be many and varied. . . . These techniques, like the half-normal plot itself, will begin with indication, and will only later pass on to significance, confidence, or other explicitly probabilistic types of inference” (p. 42).

Certainly probability plotting methods in general are now widely used by statisticians and are very powerful tools in modeling, so I would say that Tukey’s prediction has been borne out. But their integration and use with significance and confidence procedures has not happened yet to a large degree, in my opinion. I understand that half-normal plots are part of six-sigma black-belt training programs, but how widely used are probability plotting methods outside the community of those with advanced statistical training? I think that use of and comfort with probability plotting methods is almost a reliable marker for those with strong data analysis interests and skills, so there is still a long way to go in terms of advancing and disseminating this general topic.
ing this general topic. difficulties, and the provision of a great service to all fields of science and tech-
nology. Will it? That remains to us, to our willingness to take up the rocky
“How vital, and how important, to the matters we have discussed is the rise
road of real problems in preference to the smooth road of unreal assumptions,
of the stored-program electronic computer? In many instances the answer may
arbitrary criteria, and abstract results without real attachments. Who is for the
surprise many by being ‘important but not vital,’ although in others there is no
challenge?” (p. 64).
doubt but what the computer has been ‘vital’. . . . On the other hand, there are
situation[s] where the computer makes feasible what would have been wholly
unfeasible. Analysis of highly incomplete medical records is almost sure to
These were the closing words of Tukey’s paper. The paper
prove an outstanding example” (pp. 63–64). was a call to action, an attempt to move, if not rock, the foun-
“Some would say that one should not automate such procedures of examina- dations of the establishment. It was emotional, not dry—an ex-
tion, that one should encourage the study of the data. (Which is somehow hortation. Although containing some technical content, it was
discouraged by automation?) To this view there are at least three strong counter- primarily a polemic. It came around 40 years after Fisher’s pa-
arguments:
pers of the 1920s laying out some of the foundations of math-
1. Most data analysis is going to be done by people who are not sophisti- ematical statistics. We now are 40-plus years from Tukey’s
cated data analysts and who have very limited time; if you do not provide
them tools the data will be even less studied. . . . paper, roughly the same length of time. We have come a long
way, I believe, as measured against Tukey’s statements. The
I look forward to the automation of as many standardizable statistical proce-
dures as possible. When these are available, we can teach the man who will range, depth, and technology of serious data-analytic appli-
have access to them the ‘why’ and the ‘which,’ and let the ‘how’ follow along” cations today—whether led by people calling themselves sta-
(p. 22).
tisticians or by others—are impressive. But is there today a
Tukey was not proselytizing for a data analysis approach to corresponding call to rock the establishment, or could there be,
statistics because computing enabled it, nor was he saying that or should there be? If so, what is it?
heavy-duty computing is required for it. Instead, he argued that Thanks again to Mallows for a stimulating paper and for mo-
data analysis was the correct approach and mindset to take. tivating me to reread Tukey’s paper, and to the editors for invit-
Computing just made it easier and more widely available, and ing this discussion and causing me to think further about the
permitted attacking bigger and more realistic problems. Today, issues.
