
Statistical Science

2003, Vol. 18, No. 3, 342–345


© Institute of Mathematical Statistics, 2003

John W. Tukey as Teacher


Stephan Morgenthaler

Stephan Morgenthaler is Professor, Institute of Mathematics, Swiss Federal Institute of Technology (EPFL), 1015 Lausanne, Switzerland (e-mail: stephan.morgenthaler@epfl.ch).

John W. Tukey was an atypical intellectual for our times, a thinker of surprising inventiveness. He spent his whole academic career at Princeton University, almost from the start dividing his time between Bell Labs and the University. Each term he taught courses for undergraduates and doctoral students. He cherished teaching and was beloved and highly respected by his students.

He had many interests outside of science, but was such a consummate statistician that we thought it might bring him closer to you, the reader, if we offer you a glimpse of one of his courses.

A COURSE ON COMBINING DATASETS

In the spring term of 1982, John W. Tukey taught a graduate course entitled "Combining Datasets." At that time, John's teaching method was firmly established. John wrote transparencies by hand and Eileen Olszewski, for many years his secretary at Princeton, typed them for distribution to the students. During lectures John used two projectors, with each slide being first discussed and then moved to the second projector.

Combining Datasets was announced as follows:

    The purpose of this course will be to review methods of combining results. That is, the simplest problems of data analysis—treating single batches, regression, and analysis of variance—all involve a single (often internally structured) body of data. The next natural step is to put together—to combine—the results of such separate analyses.

    We do this in many ways, using as little, from each individual data body, as an apparent direction and as much as an estimated amount; together with an estimated variance for that estimate, and an indicated number of degrees of freedom for that estimated variance. We will try to work our way through the most general of these methods, beginning with the simplest and adding complexity step by step.

The chapter headings included:

    General outline; Combining independent results and assessing their significance; Combination of directionalities and indications of directionality; Combining tests of fit; Combining values to get a value; Combining values to get an interval; Combining intervals to get a value: group by group (scared, Paull-2, Paull-2F and PL combinations); Externally weighted combination of intervals to get an interval; Which combination when?

Chapter 12 discussed a statistical problem, the combination of intervals to get a value, that had fascinated John over a long period, and the remainder of this section consists essentially of an extract of the course notes.

    In this chapter and the next, we start from intervals—essentially values with estimated uncertainties—and combine them to get a value. Essentially, then, our problem is to choose the weights with which our values are to be combined, in the first instance on the basis of the given interval lengths (and in the second, if we go over to resistive combination, with a view to a more precise result).

    The crucial consideration, which we met casually above, but which now drives and determines our choice of method, is the question of whether the values to be combined estimate the same thing or whether each value to be combined estimates a different thing—and we want to estimate a summary of the different things that are to be combined. Exhibit 1 illustrates one extreme, where it is clear that both:

    • the intervals are NOT estimating the same thing, and


EXHIBIT 1.  EXHIBIT 2.

    • the individual estimates are not of the same precision.

    As a result, we would be foolish indeed to let the choice of weights depend upon the interval lengths. While this might give us a combination of minimum variance, its interpretation would be at best blurred, and probably quite unreasonable. In such a case, the weights ought to depend upon the intrinsic importance of the targets of the several series, which may often mean that they should be equal.

    There are intermediate cases, which will be the subject of the following chapter.

    Exhibit 2 illustrates the situation to be treated in the present chapter, where there is no reason—either from the data or from our understanding of the subject-matter situation—to believe that different series estimate different values. [Ha! Ha!]

    Here, if there is no reason to weight the individual-series values to reflect some known intrinsic importance (and there is no reason to believe that length of interval is related to value of center?), we will want to weight the values for the individual series in a way that reflects their apparent precisions (accuracies, perhaps).

    Just how to reflect is the main topic of this chapter.

    What NOT to Do!

    The most natural—to the unthinking—way to reflect the interval lengths is also just what we SHOULD NOT DO. This is to treat the s's underlying the interval lengths as if they were σ's, which means weighting like 1/s_i^2.

    Why is this dangerous—and a loser on average? We have repeatedly stressed two points:

    • most s^2 are poor estimates of their σ^2—some high, some low,
    • the worst thing one can do, when weighting, is to put a big weight on what deserves a small one.
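The danger named in the two points above can be made concrete with a small numerical sketch. The sample variances below are hypothetical (they are not from the course notes): five series all estimate the same quantity with the same true σ² = 10, but each s_i², based on few degrees of freedom, scatters widely around 10.

```python
# Hypothetical observed sample variances for five series whose true
# common variance is sigma^2 = 10; the spread around 10 is the kind of
# sampling noise a small number of degrees of freedom produces.
s2 = [3.0, 6.0, 9.0, 12.0, 30.0]

def combine(values, weights):
    """Weighted combination of estimates: sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Naive weighting treats each s_i as if it were the true sigma_i,
# i.e. weights proportional to 1/s_i^2 (here 1/s_i^2 with s2 = s_i^2).
naive = [1.0 / s for s in s2]
shares = [w / sum(naive) for w in naive]

# The series whose s^2 happened to come out smallest grabs nearly half
# of the total weight -- ten times the share of the largest-s^2 series --
# purely on the strength of sampling noise.
print([round(sh, 3) for sh in shares])  # -> [0.458, 0.229, 0.153, 0.115, 0.046]
```

Since all five series deserve roughly equal weight here, the 10:1 ratio between the first and last shares is exactly the "big weight on what deserves a small one" that the notes warn against.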

    If we weight by 1/s_i^2, some s_i^2 will be much too small, so that 1/s_i^2 will be much too big, so that we will have done just exactly what we should not!

    Partial Weighting

    For the last quarter century, the standard response to this challenge has been partial weighting as expounded in Cochran's paper of 1954 in which after a careful inquiry Cochran confirmed the usefulness of the following procedure:

    • find a naive weight for each series;
    • order the series by these naive weights;
    • take 1/2 to 2/3 of the series with the highest naive weights (no recipe for how to pick the exact fraction);
    • for the series in this low-variability group, use the mean of all their naive weights as their weight, or, better, do this with the reciprocals;
    • for the remaining series, use their naive weights.

    This procedure of "partial weighting" meets the major difficulties head on, and overcomes them. So long as most series will all tend to have about the same accuracy, partial weighting will ensure that no individual series gets a catastrophically high weight. It is not surprising that it has been a standard for so many years.

    If we want to look for difficulties, we are likely to focus upon:

    1. the absence of a recipe for the size of the lower group, and
    2. the possible misweighting of the series NOT in the lower group.

    Grouped Weighting

    Let us plan to take the natural variation of the naive weights (which we suppose to be 1/s_i^2's) into account. How can we do this? In the overutopian case, where the series summaries come from individual measurements that follow a Gaussian distribution, each s_i^2 is distributed like a multiple of χ_f^2/f. We could certainly try to allow for this much variability.

    In the real world, where series summaries come from individual measurements that follow some stretched tail distribution, each s_i^2 will be more widely dispersed than a multiple of χ_f^2/f. We get part way there by allowing for χ_f^2/f variability. Even part way is good! (We return, in the next section, to a way of going further.)

    The Order Statistics of χ_f^2/f. The simplest way to deal with an unknown multiple of χ_f^2/f is to focus on ratios, and adjust by division. Suppose we have a number of s_i^2, for convenience based on a common number of df. How might we bring them more nearly to a common value?

    If they are indeed from the distribution of

        k χ_f^2/f

    for a common k, and if we order them, the ordered values will behave like order statistics from k χ_f^2/f, or, equivalently, as k times the order statistics of a sample from χ_f^2/f, so that it is natural to look at

        s_i^2 / c(i|n, f),

    where c(i|n, f) is a typical value for the ith order statistic of n from χ_f^2/f. These ought all to behave as though they were estimating k—which in our context is the common σ^2 from which all s_i^2 came.

    A reasonable approximation, good enough for many purposes, to the c(i|n, f) can be had from

        c(i|n) = Gau^{-1}((3i − 1)/(3n + 1))

    and

        c(i|n, f) = (1 − 2/(9f) + c(i|n) sqrt(2/(9f)))^3,    f ≥ 3,

    so that we have numbers for c(i|n, f) when we want them.

    One way to proceed would be to replace the naive weight

        w_i* = 1/s_i^2

    by the pushed-back weights

        W_i* = 1/(s_i^2 / c(i|n, f)) = c(i|n, f)/s_i^2.
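Cochran's partial-weighting recipe, bulleted above, can be sketched in a few lines. This is a minimal illustration, not Cochran's own code: the choice of one half for the low-variability group and the function name are mine, and the common weight is computed the "better" way the notes suggest, by averaging the reciprocals of the naive weights (i.e., the s_i^2) and then inverting.

```python
def partial_weights(s2, frac=0.5):
    """Cochran-style partial weighting for series with sample variances s2.

    The series whose naive weights 1/s_i^2 are largest (equivalently,
    whose s_i^2 are smallest) form the low-variability group and share
    one common weight: the reciprocal of the mean of their s_i^2 (the
    'do this with the reciprocals' variant).  The remaining series keep
    their naive weights 1/s_i^2.
    """
    n = len(s2)
    order = sorted(range(n), key=lambda i: s2[i])  # smallest s2 = highest naive weight
    k = max(1, round(frac * n))                    # size of the low-variability group
    group = set(order[:k])
    common = 1.0 / (sum(s2[i] for i in group) / k)
    return [common if i in group else 1.0 / s2[i] for i in range(n)]

# The ten sample variances from the worked example later in the notes:
s2 = [4, 7, 8, 9, 10, 15, 18, 20, 200, 500]
w = partial_weights(s2)
```

With these inputs the five smallest-variance series all receive the common weight 1/7.6 ≈ 0.132, while the series with s^2 = 500 keeps its tiny naive weight 0.002, so no single series is catastrophically over-weighted.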

    Doing this would be a real gain, since the largest weights would be pulled down, but this does not feel like the place we should stop.

    If, for example, we had n = 10, f = 6 and

        s_i^2 = 4, 7, 8, 9, 10, 15, 18, 20, 200, 500,

    the values of s_i^2 / c(i|n, 6) would be, roughly, respectively, 13.2, 15.2, 13.6, 12.7, 12.1, 15.6, 16.3, 15.5, 130.4, 252.9. We see that we have increased the weights (decreased the reciprocal weights, just given) on the two series with s_i^2 = 200 and 500 without any real justification, because, even after adjustment, it is clear that these two series were more variable than the others. Accordingly, instead of being divided by

        c(9|10, 6) = 1.534

    and

        c(10|10, 6) = 1.977,

    they deserve, if anything, to be divided by

        c(1|2, 6) = .623

    and

        c(2|2, 6) = 1.231,

    giving 321 and 406, respectively, instead of 130 and 253.

The course goes on to develop a rule to decide on the natural lower group on the basis of the observation that misweighting by a factor of no more than 2 is almost harmless.

FINAL REMARKS

The material reprinted here is quite typical of John's course notes. He put the emphasis on problems, on general approaches and on a few specific methods he considered to be reasonable. The historical development of an area and the systematic discussion of all the relevant ideas were not his style. In private discussion with him or in asking questions during his lectures, however, the students got a glimpse of the depth of his knowledge. Because his courses were not of the standard type, he rarely recommended a textbook. In the tradition of Princeton's mathematics department, he expected students to read related material on their own and to ask questions if something remained unclear. John also seldom set exercises for the students. His notes explained the computations necessary for implementing the methods, and he expected the readers, and especially his research students, to have understood the formulas and algorithms, if necessary by performing them on a few simple examples by hand. This expectation was not always met, and this could make conversations with him faintly surreal. One could be drawn into a discussion of a new idea without fully understanding it, because it would have been necessary to reread some course material beforehand.

Teaching took a large amount of John's time and remained one of his loves throughout his career. He met with his graduate students once a week and always tried hard not to miss this appointment. My impression was that he gave as much time to each as he thought was needed, but his personal interest in the topic the student worked on also had an impact. He was at times keen on seeing the results of a new idea.
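The numbers in the worked example above can be reproduced from the Gau^{-1}-based approximation to c(i|n, f) given in the course notes; Python's standard library provides the Gaussian quantile as NormalDist().inv_cdf. This is a sketch for checking the arithmetic, and the function name is mine.

```python
from statistics import NormalDist

def c_approx(i, n, f):
    """Approximate typical value of the i-th order statistic of n draws
    from chi-squared_f / f, via the formula in the notes:
    c(i|n) = Gau^{-1}((3i - 1)/(3n + 1)), then the cube transform
    c(i|n, f) = (1 - 2/(9f) + c(i|n) * sqrt(2/(9f)))^3, for f >= 3."""
    z = NormalDist().inv_cdf((3 * i - 1) / (3 * n + 1))
    return (1 - 2 / (9 * f) + z * (2 / (9 * f)) ** 0.5) ** 3

s2 = [4, 7, 8, 9, 10, 15, 18, 20, 200, 500]
n, f = len(s2), 6

# Divisors for the two largest variances, as quoted in the example:
print(round(c_approx(9, n, f), 3), round(c_approx(10, n, f), 3))  # -> 1.534 1.977

# Adjusted values s_i^2 / c(i|n, 6): these reproduce the "roughly
# 13.2, ..., 130.4, 252.9" list in the example.
adj = [s / c_approx(i + 1, n, f) for i, s in enumerate(s2)]

# Re-dividing the two suspect series by c(1|2, 6) and c(2|2, 6)
# instead, as the notes propose:
print(round(200 / c_approx(1, 2, f)), round(500 / c_approx(2, 2, f)))  # -> 321 406
```

Running this confirms the divisors 1.534 and 1.977, the adjusted values 130.4 and 252.9, and the alternative values 321 and 406 quoted in the extract.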
