
Should Computer Scientists Experiment More?

16 Excuses to Avoid Experimentation


Walter F. Tichy
University of Karlsruhe, Germany
Nov. 1997

Abstract

Computer scientists and practitioners defend the lack of experimentation with a wide range of arguments. Some arguments suggest that experimentation may be inappropriate, too difficult, useless, and even harmful. This article discusses several such arguments to illustrate the importance of experimentation for computer science.

This is a preprint of an article with the same title that appeared in IEEE Computer, 31(5), May 1998, 32-40.

Keywords: Empiricism, experiments, laboratory, scientific method.

1 Is computer science an experimental science?

Do computer scientists need to experiment at all? Only if we answer "yes" does it make sense to ask whether there is enough of it.

In his Allen Newell Award Lecture, Fred Brooks suggests that computer science is "not a science, but a synthetic, an engineering discipline"[2]. In an engineering field, testing theories by experiments would be misplaced. Brooks and others seem troubled by the fact that the phenomena studied by computer scientists appear manufactured: computers and programs are human creations, so we could conclude that computer science is not a natural science in the traditional sense.

I think that the engineering view of computer science is too narrow, too computer-myopic. First of all, the primary subjects of inquiry in computer science are not merely computers, but information and information processes[13]. Computers play a dominant role because they make information processes easier to model and observe. However, by no means are computers the only place where information processes occur. In fact, computer models compare poorly with information processes found in nature, say in nervous systems, in immune systems, in genetic processes, or, if you will, in the brains of programmers and computer users. The phenomena studied in computer science are much broader than those arising around computers.

Regarding "syntheticness", I prefer to think about computers and programs as models. Modeling is in the best tradition of science, because it helps us study phenomena closely. For example, for studying lasing, one needs to build a laser. Regardless of whether lasers occur in nature, building a laser does not make the phenomenon of massive stimulated emission artificial. Superheavy elements must be synthesized in the lab for study, because they are unstable and do not occur naturally, yet nobody assumes that particle physics is synthetic. Similarly, computers and software don't occur naturally, but they help us model and study information processes closely. Using these devices does not render information processes artificial.

A major difference to traditional sciences is that information is neither energy nor matter. Could this difference be the reason we see little experimentation in computer science? To answer this question, let's look at the purpose of experiments.

2 Why should we experiment?

When I discuss the purpose of experiments with mathematicians, they often exclaim that experiments don't prove a thing. It is true that no amount of experimentation provides proof with absolute certainty. What then are experiments good for? We use experiments for theory testing and for exploration.

Experimentalists test theoretical predictions against reality. A community gradually accepts a theory if all known facts within its domain can be deduced from the theory, if it has withstood numerous experimental tests and if it correctly predicts new phenomena. Nevertheless, there is always an element of suspense: To paraphrase Dijkstra, an experiment can only show the presence of bugs in a theory, not their absence. Scientists are keenly aware of this uncertainty and are therefore ready to shoot down a theory if contradicting evidence comes to light.

A good example of theory falsification in computer science is the famous Knight-and-Leveson experiment[8]. The experiment was concerned with the failure probabilities of multi-version programs. Conventional theory predicted that the failure probability of a multi-version program was the product of the failure probabilities of the individual versions. However, Knight and Leveson observed in an experiment that the failure probabilities of real multi-version programs were significantly higher. In essence, the experiment falsified the basic assumption of conventional theory, namely that faults in program versions are statistically independent.
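The falsified assumption is easy to state numerically. As a purely illustrative sketch (the failure rates and the fault model below are invented, not taken from the Knight-and-Leveson study): under independence, three versions that each fail on 1% of inputs should fail simultaneously on only about one input in a million; if even a small fraction of inputs is hard for every version, the joint failure rate can be thousands of times higher.

```python
import random

random.seed(0)

p_each = 0.01      # per-version failure probability (hypothetical)
p_common = 0.005   # fraction of inputs assumed hard enough to break every version (hypothetical)
N = 1_000_000

# Prediction of the conventional theory: versions fail independently,
# so the joint failure probability is the product of the individual ones.
print("independence prediction:", p_each ** 3)               # 1e-06

# Simulation with correlated faults: hard inputs defeat all three versions at once;
# the remaining failures strike each version independently.
joint_failures = 0
for _ in range(N):
    hard = random.random() < p_common
    fails = [hard or random.random() < (p_each - p_common) for _ in range(3)]
    joint_failures += all(fails)

print("simulated joint failure rate:", joint_failures / N)   # about 0.005, thousands of times higher
```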
Experiments are also used for exploring areas where theory and deductive analysis do not reach. Experiments probe the influence of assumptions, eliminate alternative explanations of phenomena, and unearth new phenomena in need of explanation. In this mode, experiments help with induction: deriving theories from observation.

Artificial neural networks are a good example of this process. After neural networks had been discarded on theoretical grounds, experiments demonstrated properties better than predicted. Researchers have now developed better theories to account for these properties.

2.1 Traditional scientific method isn't applicable

The fact that, in the field of computer science, the subject of inquiry is information rather than matter or energy makes no difference to the applicability of the traditional scientific method. In order to understand the nature of information processes, computer scientists must observe phenomena, formulate explanations and theories, and test them.

There are plenty of computer science theories that haven't been tested. For instance, functional programming, object-oriented programming, and formal methods are all thought to improve programmer productivity, program quality, or both. It is surprising that none of these obviously important claims have ever been tested in a systematic way, even though they are all 30 years old and a lot of effort has been invested in developing programming languages and formal techniques.

Traditional sciences use theory testing and exploration iteratively, because observations help formulate new theories that can be tested later. An important requirement for any experiment, however, is repeatability. Repeatability makes sure that results can be checked independently and thus raises confidence in the results and helps eliminate errors, hoaxes, and frauds.

2.2 The current level of experimentation is good enough

Suggesting that the current level of experimentation doesn't need to change is based on the assumption that computer scientists, as a group, know what they are doing. This argument maintains that if we need more experiments, we'll simply do them.

But this argument is tenuous; let's look at the data. In [15], 400 papers were classified. Only those papers were considered further whose claims required empirical evaluation. For example, papers that proved theorems were excluded, because mathematical theory needs no experiment. In a random sample of all papers ACM published in 1993, the study found that of the papers with claims that would need empirical backup, 40% had none at all. In journals related to software, this fraction was 50%. The same study also analyzed a non-CS journal, Optical Engineering, and found that in this journal, the fraction of papers lacking quantitative evaluation was merely 15%.

The study by Zelkowitz and Wallace[17] found similar results. When applying consistent classification schemes, both studies report between 40% and 50% unvalidated papers in software engineering. Zelkowitz and Wallace also surveyed journals in physics, psychology, and anthropology and again found much smaller percentages of unvalidated papers there than in computer science.

Relative to other sciences, the data shows that computer scientists validate a smaller percentage of their claims. One could argue that computer science at age 50 is still young and hence a comparison with other sciences is of limited value. I disagree, because 50 years seems plenty of time for two to three generations of scientists to establish solid principles. But even on an absolute scale, I think it is scary when half of the non-mathematical papers make unvalidated claims. Assume that each idea published without validation would have to be followed up by at least two validation studies (that's a very mild requirement). It follows trivially that no more than one third of papers published could contain unvalidated claims. The data suggests that either computer scientists publish a lot of untested ideas or the ideas published are not worth testing.
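Spelling out the arithmetic behind the one-third bound (my reading of the argument): let $u$ be the fraction of published papers whose claims remain unvalidated. If every such paper must eventually be matched by at least two validation papers, then idea papers and their validations together cannot exceed the whole body of publications:

$$ u + 2u \le 1 \quad\Longrightarrow\quad u \le \tfrac{1}{3}. $$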
I'm not advocating replacing theory and engineering by experiment, but I am advocating a better balance. I advocate balance not because it would be desirable for computer science to appear more scientific, but because of the following principal benefits:

- Experiment can help build up a reliable base of knowledge and thus reduce uncertainty about which theories, methods, and tools are adequate.

- Observation and experiment can lead to new, useful, and unexpected insights and open up whole new areas of investigation. Experimentation can push into unknown areas where engineering alone progresses only slowly, if at all.

- Experimentation can accelerate progress by quickly eliminating fruitless approaches, erroneous assumptions, and fads. It also helps orient engineering and theory into promising directions.

Conversely, when we ignore experimentation and avoid contact with reality, we hamper progress.

2.3 Experiments cost too much

The first line of defense against experimentation typically goes like the following: "Doing an experiment would be incredibly expensive" or "For doing this right, I would need hundreds of subjects, I would be busy for years without being able to publish, and the cost would be enormous."

To this, a hard-nosed scientist might say: "So what?" Instead of being paralyzed by cost considerations, he or she would first probe the importance of the research question. When convinced that a fundamental problem is being addressed, an experienced experimentalist would then go about planning an appropriate research program, actively look for affordable experimental techniques, and suggest intermediate steps with partial results along the way.

For a scientist, funding potential should not be the only or primary criterion for deciding what questions to ask. In the traditional sciences, there is a complex social process at work in which important questions crystallize. These become the foci of research, the breakthrough goals that open up new areas, and scientists actively search for economic ways to conduct the necessary experiments. For instance, the first experimental validation of General Relativity was tremendously expensive and barely showed the effect. The experiment was performed by Sir Arthur Eddington in 1919. Eddington used a total solar eclipse to check Einstein's theory that gravity bends light when it passes near a massive star. At the time, this was a truly expensive experiment, since it involved an expedition to Principe Island (West Africa) and the technology of photographic emulsions had to be pushed to its limits. However, it was important to test whether Einstein was correct or not.

Not many investigations are of a scope comparable to General Relativity, but there are many smaller, but still important, questions to answer. How can such work be done economically? Since cost seems to be uppermost in everybody's mind, I will spend more space on this issue than on the others. My goal is to help the cost-conscious scientist or engineer overcome the cost barrier.

Experiments can indeed be expensive. But are all of them prohibitively expensive? I think not. There are meaningful experiments that fit the budget of small laboratories. There are also expensive experiments that are worth much more than their cost. And there is a wide spectrum in between.

Benchmarking. Though often criticized, benchmarks are an effective and affordable way of conducting experiments. Essentially, a benchmark is a sample of a task domain; this sample is executed by a computer or by human and computer. During execution, well-defined performance measurements are taken. Benchmarks have been used successfully in widely differing areas, including speech understanding, information retrieval, pattern recognition, software reuse, computer architecture, performance evaluation, applied numerical analysis, algorithms, data compression, logic synthesis, and robotics. A benchmark provides a level playing field for competing ideas, and assuming the benchmark is sufficiently representative, it allows repeatable and objective comparisons. At the very least, a benchmark can quickly eliminate unpromising approaches and exaggerated claims.
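To make the recipe concrete, here is a minimal sketch of a benchmark harness, assuming a deliberately simple task domain (sorting) and two hypothetical competitors; the function names, the task sample, and the measurements are invented for illustration and are not taken from any of the benchmarks listed above.

```python
import random
import time

def baseline_sort(xs):
    # Hypothetical competitor 1: the library sort.
    return sorted(xs)

def clever_sort(xs):
    # Hypothetical competitor 2: a hand-rolled insertion sort.
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

# The benchmark proper: a fixed, documented sample of the task domain.
random.seed(42)
benchmark = [[random.randint(0, 10_000) for _ in range(n)] for n in (100, 500, 2_000)]

for name, solver in [("baseline_sort", baseline_sort), ("clever_sort", clever_sort)]:
    start = time.perf_counter()
    results = [solver(case) for case in benchmark]
    elapsed = time.perf_counter() - start
    # Well-defined measurements: output correctness and total wall-clock time.
    correct = all(r == sorted(c) for r, c in zip(results, benchmark))
    print(f"{name:13s} correct={correct} time={elapsed:.4f}s")
```

Because the task sample and the measurements are fixed and documented, anyone can rerun the comparison; debate then concentrates on whether the sample is representative, a point taken up again in Section 2.5.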
Constructing a benchmark is usually intensive work, but the burden can be shared among several laboratories. Once a benchmark is defined, it can be executed repeatedly at moderate cost. In practice, it is necessary to evolve benchmarks to prevent overfitting.

Regarding benchmark tests in speech recognition, Raj Reddy writes: "Using common databases, competing models are evaluated within operational systems. The successful ideas then seem to appear magically in other systems within a few months, leading to a validation or refutation of specific mechanisms for modeling speech."[14]

In many of the examples cited above, benchmarks have caused a sudden blossoming of the area, because they made it easy to identify promising approaches and discard poor ones. I agree with Reddy that "all of experimental computer science could benefit from such disciplined experiments."

Costly experiments. When human subjects are involved in an experiment, the cost often goes up dramatically, while significance goes down. When are expensive experiments justified? When the implications of the insights gained outweigh the cost. Let us take an example. A significant segment of the software industry has converted from C to C++ at a substantial cost in retraining. One might well ask how solidly grounded the decision to switch to C++ was. Other than case studies (which are questionable because they don't generalize easily and may be under pressure to demonstrate desired outcomes), I'm not aware of any solid evidence showing that C++ is superior to C with respect to programmer productivity or software quality. Nor am I aware of any independent confirmation of such evidence. However, while training students in improving their personal software processes, my research group has recently observed that C++ programmers may make many more mistakes and take much longer than C programmers of comparable training, both during initial development and maintenance. Suppose this observation is not a fluke.[fn 1] Then running experiments to test the fundamental tenets of object-oriented programming would be truly valuable. These experiments might save resources far in excess of their cost. The experiments might also have a lasting and positive effect on the direction of programming language research. They may not only save industry money, but also save research effort.

[fn 1] Just as this article went to press, we learned that a paper by Les Hatton, "Does OO Really Match the Way We Think?" will appear in the May issue of IEEE Software, reporting strong evidence of the negative effects of C++.

It is useful to check what scientists in other disciplines spend on experimentation. Everyone realizes that drug testing in medicine is extremely expensive, but only desperate patients accept poorly tested drugs and therapies. In aeronautics, we demand that airfoils be tested; expensive wind tunnels have been built for just this purpose. Numerical simulation has reduced the number of such tests, but not eliminated them. In many sciences, simulation has become an important form of experimentation, and computer science might also benefit from good simulation techniques. In biology, Wilson names the Forest Fragmentation Project in Brazil as the most expensive biological experiment ever[16]. While clearing a large tract of the Amazon jungle, isolated patches of various sizes (1 to 1000 hectares) were left standing. The purpose was to test hypotheses regarding the relationship between habitat size and the number of species remaining. And the list of experiments continues: in physics, chemistry, ecology, geology, climatology, and on and on. Any reader of Scientific American can find experiments in every issue. Computer scientists need not be afraid or ashamed of conducting large experiments when exploring important questions.

2.4 Demonstrations will suffice

In his 1994 Turing Award lecture, Juris Hartmanis argues that computer science differs sufficiently from other sciences to permit different standards in experimentation, and that demonstrations can take the place of experiments[5]. I couldn't disagree more. Demos can provide proof-of-concepts in the engineering sense, or provide incentives to study a question further. Too often, however, these demos merely illustrate a potential. Demonstrations depend critically on the observers' imagination and their willingness to extrapolate; they do not normally produce solid evidence. To obtain such evidence, a careful analysis is necessary, involving experiments, data, and replication.

What would therefore be interesting questions amenable to experimentation in the traditional sense? Here are a few examples. The programming process is poorly understood; computer scientists could therefore introduce different theories of how requirements are refined into programs and test them experimentally. Similarly, a deeper understanding of intelligence might be discovered and tested. The same applies to research in perception, questions about the quality of man-machine interfaces, or human-computer interaction in general. Also, the behavior of algorithms on typical problems or on computers with storage hierarchies cannot be predicted accurately. Better algorithm theories are needed and should be tested in the laboratory. Research in parallel systems is currently generating a number of machine models; their relative merits can only be explored experimentally. This list is certainly not exhaustive, but the examples all involve experiments in the tradition of science: They require a clear question, an experimental apparatus to test the question, data collection, interpretation, and sharing of the results.

2.5 There is too much noise in the way

The second line of defense against experimentation goes like this: "There are too many variables to control, and the results would be meaningless, because the effects I'm looking for are swamped by noise."

True, experimentation is difficult for researchers in all disciplines, not just computer science. I think researchers who are invoking this excuse are looking for an easy way out.

An effective simplification for repeated experiments is benchmarking. Fortunately, benchmarking can be used for many questions in computer science. The subjective and weakest part in a benchmark test is the composition of the benchmark; everything else, if properly documented, can be checked by the skeptic. Hence, the composition of the benchmark is always hotly debated (is it representative enough?), and benchmarks must evolve over time to get them closer to what one wants to test.

Experiments with human subjects involve many additional challenges. Several fields have found techniques for dealing with human variability, notably medicine and psychology. We've all heard about control groups, random assignments, placebos, pre- and post-testing, balancing, blocking, blind and double-blind studies, and the battery of statistical tests. The fact that a drug influences different people in different ways doesn't stop medical researchers from testing. And when control is impossible, then case studies, observational studies and an assortment of other investigative techniques are used. Indeed, medicine offers many important lessons on experimental design, on how to control variables and how to minimize errors. Eschewing experimentation because of difficulties is not acceptable.
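As a concrete, minimal illustration of how these borrowed techniques fit together, here is a sketch of a randomized two-group comparison; the scenario (time to complete a task with one of two tools), the numbers, and the subject labels are all hypothetical, and the analysis is simply the standard two-sample t-test from SciPy.

```python
import random
from scipy.stats import ttest_ind

random.seed(1)

subjects = [f"subject_{i:02d}" for i in range(40)]

# Random assignment spreads unknown differences between people across both groups.
random.shuffle(subjects)
group_a, group_b = subjects[:20], subjects[20:]

# Completion times in minutes, as they might be recorded in a real study.
# Here they are simulated; tool B is assumed to be about 10% faster on average.
times_a = [random.gauss(60, 10) for _ in group_a]
times_b = [random.gauss(54, 10) for _ in group_b]

# Welch's t-test: is the observed difference larger than noise alone would explain?
stat, p_value = ttest_ind(times_a, times_b, equal_var=False)
print(f"mean A = {sum(times_a) / len(times_a):.1f} min, "
      f"mean B = {sum(times_b) / len(times_b):.1f} min, p = {p_value:.3f}")
```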

2.6 Progress will slow

The argument here is that if everything must be backed up by experiment before publication, then the number of ideas that can be generated and discussed in the scientific community will be throttled and progress will slow.

This is not an argument to be taken lightly. In a fast-paced field such as computer science, the number of ideas being discussed is obviously important. However, experimentation need not have an adverse effect; quite the contrary.

First, increasing the ratio of papers with meaningful validation has a good chance of actually accelerating progress: Questionable ideas will be weeded out more quickly and scientists will concentrate their energies on more promising approaches.

Second, I'm confident that good conceptual papers and papers formulating new hypotheses will continue to be valued by readers and will therefore get published. It should be understood that experimental testing of these hypotheses will come later.

So it is a matter of balance once more. Presently, non-theory research rarely moves beyond the assertive state, a state characterized by such weak justification as "it seems intuitively obvious", or "it looks like a good idea", or "I tried it on a small example and it worked." Reaching ground firmer than assertion is desirable.

2.7 Technology changes too fast

This concern comes up frequently in computer architecture. Trevor Mudge summarizes it: "...the rate of change in computing is so great that by the time results are confirmed they may no longer be of any relevance."[9] The same can be said about software. What good is an experiment when the duration of the experiment exceeds the useful life of the experimental subject, i.e., of a software product or tool?

If a question becomes irrelevant quickly, it is perhaps too narrow and not worth spending a lot of effort on. But behind many questions with a short lifetime lurks a fundamental problem with a long lifetime. My first advice to scientists here is to probe the fundamental and not the ephemeral, and to learn to tell the difference. My second advice hinges on the observation that technological change often shifts or eliminates assumptions that were taken for granted. Therefore, scientists should anticipate changes in assumptions and proactively employ experiments to explore the consequences of such changes. Note that this type of work is much more demanding, and can have much higher long-term value, than merely comparing software products.

2.8 You'll never get it published

This is actually partly true. Some established computer science journals have difficulty finding editors and reviewers capable of evaluating empirical work. Promotion committees may be dominated by theoreticians. The experimenter is often confronted with reviewers who expect perfection and absolute certainty. However, experiments are conducted in the real world and are therefore always flawed somehow. Reviewers may also build up impossibly high barriers. I've seen demands for experiments to be conducted with hundreds of subjects over a span of many years, involving several industrial projects, before publication. That smaller steps are still worth publishing, because they improve our understanding and raise new questions, is a line of thinking that some are not familiar with.

However, this situation is changing. In my experience, publication of experimental results is not a problem if one chooses the right outlet. I'm on the editorial board of three journals; I review for quite a number of additional journals and have served on numerous conference committees. All non-theory journals and conferences that I've seen would greatly welcome papers reporting on solid experiments. The occasional rejection of high-quality papers notwithstanding, I'm convinced that the low number of good experimental papers is a supply problem.

The funding situation for experimentation is more difficult, especially in industry/academia collaborations. However, it helps to note that experimentation may give industry a three to five year lead over the competition. For example, suppose an experiment discovered an effective way to reduce maintenance costs by using software design patterns. The industrial partner of such an experiment could exploit this result immediately, especially since the experiment prepared the groundwork for adopting the technology. Given a two-year publication time lag and various other delays (such as the results being noticed by others, let alone adopted), the industrial partner in such an experiment can exploit at least a three-year lead. Lucent Technologies estimates that it is presently benefiting from a five-year lead in software inspection methods based on a series of in-house experiments,[fn 2] apparently despite (or because of) vigorous publication of the results.

[fn 2] Larry Votta, private communication, Lucent Technologies.

On the negative side, I fear that the "systems researcher" of old will face difficulties. Just building systems is not enough unless the system demonstrates some kind of a "first," a breakthrough. Computer science continues to be favored with such breakthroughs and we should continue to strive for them. The majority of systems researchers, however, works on incremental improvements of existing ideas. These researchers should try to become respectable experimentalists. They must articulate how their systems contribute to our knowledge. Systems come and go; insights about the concepts and phenomena underlying systems are what is needed. I have great expectations for systems researchers who use their skills in setting up interesting experiments.

3 Why substitutes won't work

Can we get by with forms of validation that are weaker than experiments? It depends on what question we're asking, but here are some excuses that I find less than satisfactory.

3.1 Feature comparison is good enough

A frequently found model of a scientific paper is the following. The work describes a new idea, prototyped perhaps in a small system. The claim to "scientificness" is then made by feature comparison. The report sets out a list of features and qualitatively compares older approaches with the new one, feature by feature.

I find this method satisfactory when a radically new idea or a significant breakthrough is presented, such as the first compiler for a block-structured language, the first timesharing system, the first object-oriented language, the first web browser. Unfortunately, the majority of papers published take much smaller steps forward. As computer science becomes a harder science, mere discussions of advantages and disadvantages or long feature comparisons will no longer be sufficient. Any PC magazine can provide those. A science, on the other hand, cannot live off such weak phenomenological inferences in the long run. Instead, scientists should create models, formulate hypotheses, and test them using experiments.
3.2 Trust your intuition

In his March 1996 column, Al Davis, the editor of IEEE Software, suggests that gut feeling is enough when adopting new software technology; experimentation and data are superfluous[3]. He even suggests ignoring evidence that contradicts one's intuition.

However, instinct and personal experience sometimes lead down the wrong path, and computer science is no exception. Here are some examples. For about twenty years, it was thought that meetings were essential for software reviews. However, recently Porter and Johnson found that reviews without meetings are neither substantially more nor less effective than those with meetings[11]. Meetingless reviews also cost less and cause fewer delays, which can lead to a more effective inspection process overall. Another example where observation contradicts conventional wisdom is that small software components are proportionally less reliable than larger ones. This observation was first reported by Basili [1] and has been confirmed by a number of disparate sources; see Hatton [6] for summaries and an explanatory theory. As mentioned, the failure probabilities of multi-version programs were incorrectly believed to be the product of the failure probabilities of the component versions. Another example is type checking in programming languages. Type checking is thought to reveal programming errors, but there are contexts in which it does not help [12]. Pfleeger et al. [10] provide further discussion of the pitfalls of intuition.

What we can learn from these examples is that intuition may provide a starting point, but must be backed up by empirical evidence. Without grounding, intuition is highly questionable. What one thinks obvious may turn out to be dead wrong sometimes.

3.3 Trust the experts

During a recent talk at a top US university, I was about to present my data when a colleague interrupted and suggested that I skip that part and go on to the conclusions. "We trust you" was the explanation. Flattering as that was, it shows a disturbing misunderstanding of the scientific process (or someone in a hurry). Any scientific claim is initially suspect and must be examined closely. Imagine what would have happened if physicists hadn't been skeptical about the claims by Pons and Fleischmann regarding cold fusion.

Frankly, I'm continually surprised how much the computer industry and sometimes even university teaching relies on so-called "experts" of all kinds, who fail to back up their assertions with evidence. Science, on the other hand, is built on healthy skepticism. It is a good system to carefully check results and to accept them only provisionally until they have been confirmed independently.

4 Problems do exist

Here are some excuses that are influenced by the quality of experiments in computer science.

4.1 Flawed experiments

"Experiments make unrealistic assumptions", or "The data was manipulated", or "It is impossible to quantify the variable of interest" are some of the criticisms. There are many more potential flaws: Experimenters may pick irrelevant questions, may neglect to provide enough detail for repeating experiments, may be nonchalant about control, may not validate observations, forget to bound errors, use inappropriate measurements, over-interpret their results, produce results that do not generalize, etc.

Good examples of solid experimentation in computer science are rare. And there will always be questionable, even bad, experiments. However, the conclusion from this observation is not to discard the concept of experimentation. We should keep in mind that other scientific fields have been faced with bad experiments, even frauds. But the scientific process on the whole has been self-correcting. Bad ideas, errors, and downright hoaxes have been weeded out, sometimes promptly (see cold fusion), sometimes belatedly (see the Piltdown man).[fn 3]

[fn 3] Piltdown man refers to fossil remains found in England in 1912. The fossils were thought to be a species of prehistoric man and generated scholarly controversy that lasted about 40 years. In 1954, intense re-examination showed the remains to be fraudulent. The fossils consisted of skillfully disguised fragments of a quite modern human cranium (50,000 years old), the jaw and teeth of an orangutan, and the tooth of a chimpanzee.

We can be sure of one thing, though: If scientists overlook experimentation or neglect re-examining others' claims, an important source of self-correction will be cut off and the field may drift in the wrong direction.

4.2 Competing theories

A science is most exciting when there are two or more strong, competing theories. When a new, major theory replaces an older one, one speaks of a paradigm shift, while the stable periods in between are called "normal science". Physics provides interesting examples of paradigm shifts.

There are a few competing theories in computer science, none of them earth-shaking. The physical symbol system theory vs. the knowledge processing theory in AI is one of them. These two theories attempt to explain intelligence. The weak reasoning methods of the first theory have gradually given way to, or have been coupled with, knowledge bases [4]. Other examples include symbolic vs. subsymbolic processing, RISC vs. CISC, the various models for predicting the performance of (parallel) computers, and the competition among programming language families (logic, functional, imperative, object-oriented, rule-based, constraint-based). Another important example is algorithms theory. The present theory has many drawbacks; in particular, it does not account for the behavior of algorithms on "typical" problems[7]. A more accurate theory that applies to modern computers would be valuable.

A prerequisite for competition among theories is falsifiability. Unfortunately, computer science theorists rarely produce falsifiable theories; they tend to pursue mathematical theories that are disconnected from the real world.[fn 4] Thus, it has largely fallen to experimentalists and engineers to formulate falsifiable theories.

[fn 4] In Ch. 9 of The Quark and the Jaguar, W.H. Freeman (1994), Gell-Mann provides a lucid discussion of the relationship between mathematics and science. If science is concerned with describing nature and its laws, then mathematics is not a science, because it is not concerned with nature; it is concerned with the logical consequences of certain assumptions. On the other hand, mathematics can also be viewed as the rigorous study of what might have been, i.e., the study of hypothetical worlds, including the real one. In that case, mathematics is the most fundamental science of all.

While computer science is perhaps too young to have brought forth grand theories, my greatest fear is that the lack of such theories might be caused by a lack of experimentation. If scientists neglect experiment and observation, they'll have difficulties discovering new and interesting phenomena worthy of better theories.

4.3 Soft science

"Soft science" means that experimental results cannot be reproduced. Experiments with human subjects are not necessarily soft. There are stacks of books on how to conduct experiments with humans. Experimental computer scientists can learn the relevant techniques or ask for help. The side-bar provides some starting points.

4.4 Misuse

The argument goes along the following lines: "Give the managers or funding agencies a single figure of merit and they will use it blindly to promote or eliminate the wrong research." I think this is a red herring. Good managers, good scientists, and good engineers all know better than to rely on a single figure of merit. Second, there is a much greater danger in relying on intuition and expert assertion alone. Keeping decision makers in the dark has an overwhelmingly higher damage potential than informing them to the best of one's abilities.

5 Conclusion

Experimentation is central to the scientific process. Only experiments test theories. Only experiments can explore critical factors and bring new phenomena to light so theories can be formulated in the first place. Without experiments in the tradition of science, computer science is in danger of drying up and becoming an auxiliary discipline. The current pressure to concentrate on applications is the writing on the wall.

I have no doubt that computer science is a fundamental science of great intellectual depth and importance. Much has already been achieved. Computer technology has changed society, and computer science is in the process of deeply affecting the weltanschauung of the general public. There is also much evidence suggesting that the scientific method applies. As computer science leaves adolescence behind, I hope to see the experimental branch of this discipline flourish.

Acknowledgments. This essay has benefited tremendously from numerous discussions with colleagues. I'm especially grateful for thought-provoking comments by Les Hatton, Ernst Heinz, James Hunt, Paul Lukowicz, Anneliese v. Mayrhauser, David Notkin, Shari Lawrence Pfleeger, Adam Porter, Lutz Prechelt, and Larry Votta.

References

[1] Victor R. Basili and B.T. Perricone. Software errors and complexity: An empirical investigation. Communications of the ACM, 27(1):42-52, January 1984.

[2] Frederick P. Brooks. Toolsmith II. Communications of the ACM, 39(3):61-68, March 1996.

[3] Al Davis. From the editor. IEEE Software, 13(2):4-7, March 1996.

[4] Edward A. Feigenbaum. How the What becomes the How. Communications of the ACM, 39(5):97-104, May 1996.

[5] Juris Hartmanis. Turing award lecture: On computational complexity and the nature of computer science. Communications of the ACM, 37(10):37-43, October 1994.

[6] Les Hatton. Reexamining the fault density-component size connection. IEEE Software, 14(2):89-97, 1997.

[7] John N. Hooker. Needed: An empirical science of algorithms. Operations Research, 42(2):201-212, March 1994.
[8] John C. Knight and Nancy G. Leveson. An experimental evaluation of the assumption of independence in multiversion programming. IEEE Transactions on Software Engineering, SE-12(1):96-109, January 1986.

[9] Trevor Mudge. Report on the panel: How can computer architecture researchers avoid becoming the society for irreproducible results? Computer Architecture News, 24(1):1-5, March 1996.

[10] Shari Lawrence Pfleeger, Victor Basili, Lionel Briand, and Khaled El-Emam. Rebuttal to March 96 editorial. IEEE Software, 13(4), July 1996.

[11] Adam A. Porter and P.M. Johnson. Assessing software review meetings: Results of a comparative analysis of two experimental studies. IEEE Transactions on Software Engineering, 23(3):129-145, March 1997.

[12] Lutz Prechelt and Walter F. Tichy. An experiment to assess the benefits of inter-module type checking. In Proc. Third Intl. Software Metrics Symposium, pages 112-119, Berlin, March 1996. IEEE Computer Society Press.

[13] Anthony Ralston and Edwin D. Reilly. Encyclopedia of Computer Science, Third Edition. Van Nostrand Reinhold, 1993.

[14] Raj Reddy. To dream the possible dream. Communications of the ACM, 39(5):105-112, May 1996.

[15] Walter F. Tichy, Paul Lukowicz, Lutz Prechelt, and Ernst A. Heinz. Experimental evaluation in computer science: A quantitative study. The Journal of Systems and Software, 28(1):1-18, January 1995.

[16] Edward O. Wilson. The Diversity of Life. Harvard University Press, 1992.

[17] Marvin V. Zelkowitz and Dolores Wallace. Experimental models for validating computer technology. IEEE Computer, 31(5), May 1998.
